BSc/MSci project ideas

RSS feed: whole site

The Software Development Team at King’s College London develops programming languages and tools for them, often in collaboration with industry. If you have a genuine interest in programming, you should strongly consider choosing one of the projects below; if you merely want to survive your project, you probably want to look elsewhere for less challenging projects.

Each project will be associated with Laurence Tratt and a member of the Software Development Team so that you will have additional expert technical supervision. Projects in previous years have made contributions to projects such as Eco PyPy, meaning that your project has the potential to be of genuine use beyond the confines of your degree. Some of the projects also have a strong research component and could potentially lead to research papers. Contact Laurence Tratt to discuss projects further. Please note that I will only supervise projects from the list below.

Source code differencing with COPY and GLUE

Source code differencing (i.e. comparing one version of a program to another) is a common operation in software development. Tools such as diff display a simple line-by-line comparison of files, that show lines that have been deleted from the “old” version of the code and inserted into the “new” version. More sophisticated algorithms, such as GumTree, also take account of code that has been moved from one part of the file to another, or words that have been updated (e.g. the word “private” might be changed to “public” in a Java program).

The DELETE, INSERT, MOVE and UPDATE operations are common to many modern source code differencing systems, but a small number of other tools have added novel operations to this list, such as COPY and GLUE (see e.g. mh-diff).

As part of our wider research, the Software Development Team has created diffract, an implementation of GumTree in the Rust programming language. In this project, you will implement a source code differencer (either based on our code or written from scratch) that includes the operations COPY and GLUE from mk-diff, and compare its performance (in terms of both time and accuracy) to the more common algorithms which use the standard set of four operations.

Incremental name binding

The Software Development Team’s Eco editor incrementally parses text as the user types, giving it accurate and fast information about the program structure the user is editing. However, modern editors also provide feedback on name binding (e.g. has the user typed in a variable name that is not defined in the current scope?). Eco currently implements NBL to give such feedback. However, this approach is not incremental, and the entire program structure must be rescanned upon every change. In practise, these checks are too slow to be performed on every keystroke and are performed in batches, reducing the quality of feedback the user perceives.

This project’s goal is to make name binding analysis in Eco incremental. At first this would involve extending the current implementation to make it minimally incremental, to get a feel for the topic. Once familiarity has been achieved, Spoofax’s new name binding theory (which uses a graph structure that is probably more amenable to incrementality) can then be implemented and made incremental.

A compile-time parser generator in Rust

As part of our wider research, the Software Development Team has constructed an LR parser in Rust which creates LR parsing tables from a Yacc-like grammar. This is also the first stage in creating a parser generator. This means that one can take a grammar for real languages (e.g. Java 1.0’s grammar is a common example) and create a parser that can take in programs written in that language and parse them.

Currently the parser is a run-time component: i.e. you can give it a grammar and an input and it will create a parse tree. Traditional parser generators such as Yacc can also operate at compile-time: they generate code which then has a minimal run-time overhead. This project will look at extending lrpar so that it can also generate code at compile-time

A spreadsheet in Rust

As part of our work on language composition (see e.g. Eco or my 2014 talk at Mozilla SF), we would like to experiment with combining non-textual languages with textual languages. This project will create a simple spreadsheet in Rust, with a basic Gtk or Cairo GUI, and a simple expression language, such that the spreadsheet can be embedded into other programs. Advanced possibilities include creating graph representations of tabular data and having them updated live.

Components for a meta-tracer in Rust

We are currently working on a new research-level meta- tracing system written in Rust (i.e. an alternative, of sorts, to RPython). There are many components to this rapidly moving project (ranging from processor tracing to machine code generation to optimisation), and thus potential to jump in and find a distinct part of the project to work on. Because of the rapidly moving nature of this project, it is hard to say which will be the most appropriate component in advance.

Extending storage strategies

Storage strategies are used in PyPy to optimize lists, dictionaries, and sets. As the paper we published shows, they are extremely effective, speeding programs up by 18% on average.

However, this optimization has a weak point, which is the fact that adding an element of a different type to a big container that is optimized needs to change the internal representation, which can be very inefficient. A list containing many integers is stored efficiently, but adding a string to it requires work proportional to the size of the list.

The goal of this project is to find approaches for making this less likely to happen. One idea is to take inspiration from Google’s V8 kinds, which record which points in a program undergo such transitions, and prevent later collections created from that point being optimised. However, implemented naively, such a mechanism may simply cause many lists to be stored inefficiently, negating storage strategies benefit. More information can be found in the related work section of the storage strategies paper above.