BSc/MSci project ideas

The Software Development Team at King's College London develops programming languages and tools for them, often in collaboration with industry. If you have a genuine interest in programming, you should strongly consider choosing one of the projects below; if you merely want to survive your project, you probably want to look elsewhere for less challenging projects.

Each project will be associated with Laurence Tratt and a member of the Software Development Team so that you will have additional expert technical supervision. Projects in previous years have made contributions to projects such as Eco and PyPy, meaning that your project has the potential to be of genuine use beyond the confines of your degree. Some of the projects also have a strong research component and could potentially lead to research papers. Contact Laurence Tratt to discuss projects further. Please note that I will only supervise projects from the list below.

Incremental name binding

The Software Development Team's Eco editor incrementally parses text as the user types, giving it accurate and fast information about the program structure the user is editing. However, modern editors also provide feedback on name binding (e.g. has the user typed in a variable name that is not defined in the current scope?). Eco currently implements NBL to give such feedback. However, this approach is not incremental, and the entire program structure must be rescanned upon every change. In practise, these checks are too slow to be performed on every keystroke and are performed in batches, reducing the quality of feedback the user perceives.

This project's goal is to make name binding analysis in Eco incremental. At first this would involve extending the current implementation to make it minimally incremental, to get a feel for the topic. Once familiarity has been achieved, Spoofax's new name binding theory (which uses a graph structure that is probably more amenable to incrementality) can then be implemented and made incremental.

A faster parser generator in Rust

As part of our wider research, the Software Development Team has constructed the grmtools libraries, which process grammars for parsing computer languages. One of these libraries is lrpar, a Yacc-like parser generator. This means that one can take a grammar for real languages (e.g. Java 7) and create a parser that can take in programs written in that language and parse them.

Currently the compile-time parser is somewhat inefficient, using Serde to serialise large chunks of the system into a user binary. Quite a bit of this information is unneeded, and some of it could be represented more efficiently. This project will therefore move lrpar in the direction of Yacc-like efficiency.

A spreadsheet in Rust

As part of our work on language composition (see e.g. Eco or my 2014 talk at Mozilla SF), we would like to experiment with combining non-textual languages with textual languages. This project will create a simple spreadsheet in Rust, with a basic Gtk or Cairo GUI, and a simple expression language, such that the spreadsheet can be embedded into other programs. Advanced possibilities include creating graph representations of tabular data and having them updated live.

Components for a meta-tracer in Rust

We are currently working on a new research-level meta-tracing system written in Rust (i.e. an alternative, of sorts, to RPython). There are many components to this rapidly moving project (ranging from processor tracing to machine code generation to optimisation), and thus potential to jump in and find a distinct part of the project to work on. Because of the rapidly moving nature of this project, it is hard to say which will be the most appropriate component in advance.

Extending storage strategies

Storage strategies are used in PyPy to optimize lists, dictionaries, and sets. As the paper we published shows, they are extremely effective, speeding programs up by 18% on average.

However, this optimization has a weak point, which is the fact that adding an element of a different type to a big container that is optimized needs to change the internal representation, which can be very inefficient. A list containing many integers is stored efficiently, but adding a string to it requires work proportional to the size of the list.

The goal of this project is to find approaches for making this less likely to happen. One idea is to take inspiration from Google's V8 kinds, which record which points in a program undergo such transitions, and prevent later collections created from that point being optimised. However, implemented naively, such a mechanism may simply cause many lists to be stored inefficiently, negating storage strategies benefit. More information can be found in the related work section of the storage strategies paper above.

A bitset library based on Vob

Vob (“Vector Of Bits”) is a fast library for representing boolean sequences in Rust. This project will first add a “bitset” library “Sob” (“Set Of Bits”) based on Vob (similar to how bit-set relies on bit-vec). It will then take existing Rust crates which use bit-vec/bit-set and alter them to use vob/sob, benchmarking to ensure that performance has, at a minimum, not been negatively effected.

Extending DynamoRIO to use Processor Trace

DynamoRIO is a system for analysing and altering running binaries. It currently uses a slow, software-based mechanism to identify frequently executed parts of a program. This project will extend DynamoRIO to use Processor Trace (and probably libipt) to move this tracing into hardware, which should lead to a significant performance improvement.

Porting warmup_stats to Rust

The warmup_stats software that forms an important part of our Virtual Machine Warmup Blows Hot and Cold paper (see also the two blogposts on the topic) is written in a mix of Python (with parts using both CPython and PyPy) and R. For the large quantities of data that we now want to use with it, is rather slow. This project will port the Python parts of warmup_stats to Rust.