BSc/MSci project ideas

The Software Development Team at King's College London develops programming languages and tools for them. If you have a genuine interest in programming, you should strongly consider choosing one of the projects below; if you merely want to survive your project, you probably want to look elsewhere for less challenging projects.

Each project will be associated with Laurence Tratt and a member of the Software Development Team so that you will have expert technical supervision. Successful projects have the real opportunity to contribute code back to projects such as PyPy, meaning that your project has the potential to be of genuine use beyond the confines of your degree. Some of the projects also have a strong research component and could potentially lead to research papers. Contact Laurence Tratt to discuss projects further. Please note that I will only supervise projects from the list below.

Static optimisation of RPython code

RPython VMs allow users to write a simple interpreter for a language and get a fast Just-In-Time (JIT) compiler for free (see e.g. my blog post for an overview). Users write their interpreters in a subset of Python, which is then converted to C, and optimised by GCC. At run-time, the JIT records code execution paths, optimises them, and converts them into machine code. The C code benefits from GCC's static optimisations; and the run-time code benefits from the JITs run-time optimisations.

However, a reasonable proportion of the run-time optimisations could be performed statically, reducing the overhead on the JIT. One possibility is to introduce classic static compiler optimisations (e.g. loop-invariant code motion or common subexpression elimination) to the code that the JIT will eventually run. A more interesting possibility is to perform static partial escape analysis on RPython code, in similar style to work on Truffle (RPython already performs regular escape analysis at run-time, which works well; we thus expect the static partial variant to be very effective) This will reduce start-up costs and make usefully lower the running times of RPython VMs. There is also a good possibility of improving the code passed to GCC, such that some optimisations that GCC currently misses may then be performed at the RPython level.

A previous student project that prototyped optimisations for PyPy ended up in code that was integrated into the main system (see some of the post-prototype code).

Cassowary constraint solver for GUI toolkits in Rust

The Cassowary constraint solver algorithm (see also Wikipedia's overview) is commonly used in GUI toolkits to produce visually appealing layouts. Existing GUI toolkits may use explicit positioning — where the position and size of each element must be explicitly specified — or box-model layouts (e.g. GTK+) — where elements are arranged in horizontal or vertical box or in tables. In contrast more modern toolkits such as those in MacOSX and iOS specify constraints that relate the position and size of elements to one another, e.g. the left edge of element B must be 10 units to the right of the right edge of element A and element A must be at least 50 units wide. The Cassowary algorithm is then used to solve these constraints, generating positions and sizes that satisfy the constraints as much as possible (there are some situations where the constraints are impossible to resolve due to be contradictory, in which case a best attempt must be made).

This project will implement a Cassowary constraint solver in Rust. It will first need to implement Rust data structures to represent variables, and linear equality/inequality expressions that combine variables, and a simple API to allow users to construct such data structures (possibly using operator overloading). A basic Cassowary solver will then solve the system of linear equations, and a simple visualizer (in Gtk or Cairo) constructed to see the results. Two possible extensions are as follows. First, the solver could be made incremental, so that some of the variables that are assigned initial constant values can be modified, with the algorithm incrementally modifying the solution, rather than re-computing from scratch. Second, the implementation could be made multi-threaded either by: running the solver in one thread and sending results to another thread that uses them, so that the solver can operate in a pipeline fashion; or (more challengingly) using multiple threads in the solver.

The Cassowary constraint solver algorihtm has been implemented in a number of languages: The original toolkit can be found at its original home at the University of Washington The original toolkit and its associated GitHub repository; a Java implementation; a JavaScript implementation; and a Python implementation.

A parser generator in Rust

As part of our wider research, the Software Development Team has constructed a basic LR parser which creates LR parsing tables from a Yacc-like grammar. This is also the first stage in creating a parser generator. This means that one can take a grammar for real languages (e.g. Java 1.0's grammar is a common example) and create a parser that can take in programs written in that language and parse them. Theoretical parsers simply answer yes/no as to whether a given file conforms to the grammar. More usefully, we would like to automatically create a parse tree. Similarly, we would like to experiment with error recovery and error reporting (see e.g. this 2003 paper and this blog post of an implementation of this idea).

A spreadsheet in Rust

As part of our work on language composition (see e.g. Eco or my 2014 talk at Mozilla SF), we would like to experiment with combining non-textual languages with textual languages. This project will create a simple spreadsheet in Rust, with a basic Gtk or Cairo GUI, and a simple expression language, such that the spreadsheet can be embedded into other programs. Advanced possibilities include creating graph representations of tabular data and having them updated live.

Extending storage strategies

Storage strategies are used in PyPy to optimize lists, dictionaries, and sets. As the paper we published shows, they are extremely effective, speeding programs up by 18% on average.

However, this optimization has a weak point, which is the fact that adding an element of a different type to a big container that is optimized needs to change the internal representation, which can be very inefficient. A list containing many integers is stored efficiently, but adding a string to it requires work proportional to the size of the list.

The goal of this project is to find approaches for making this less likely to happen. One idea is to take inspiration from Google's V8 kinds, which record which points in a program undergo such transitions, and prevent later collections created from that point being optimised. However, implemented naively, such a mechanism may simply cause many lists to be stored inefficiently, negating storage strategies benefit. More information can be found in the related work section of the storage strategies paper above.