BSc project ideas

RSS feed: whole site

The Software Development Team at King’s College London develops programming languages and tools for them. If you have a genuine interest in programming, you should strongly consider choosing one of the projects below; if you merely want to survive your project, you probably want to look elsewhere for less challenging projects.

Each project will be associated with Laurence Tratt and a member of the Software Development Team so that you will have expert technical supervision. Successful projects have the real opportunity to contribute code back to projects such as PyPy, meaning that your project has the potential to be of genuine use beyond the confines of your degree. Some of the projects also have a strong research component and could potentially lead to research papers. Contact Laurence Tratt to discuss projects further.

Static optimisation of RPython code

RPython VMs allow users to write a simple interpreter for a language and get a fast Just-In-Time (JIT) compiler for free (see e.g. my blog post for an overview). Users write their interpreters in a subset of Python, which is then converted to C, and optimised by GCC. At run-time, the JIT records code execution paths, optimises them, and converts them into machine code. The C code benefits from GCC’s static optimisations; and the run-time code benefits from the JITs run-time optimisations.

However, a reasonable proportion of the run-time optimisations could be performed statically, reducing the overhead on the JIT. In other words, we can apply some classic compiler optimisations (e.g. loop-invariant code motion or Common subexpression elimination) to the code that the JIT will eventually run. This will reduce start-up costs and make usefully lower the running times of RPython VMs. There is also a good possibility of improving the code passed to GCC, such that some optimisations that GCC currently misses may then be performed at the RPython level.

Extending storage strategies

Storage strategies are used in PyPy to optimize lists, dictionaries, and sets. As the paper we published shows, they are extremely effective, speeding programs up by 18% on average.

However, this optimization has a weak point, which is the fact that adding an element of a different type to a big container that is optimized needs to change the internal representation, which can be very inefficient. A list containing many integers is stored efficiently, but adding a string to it requires work proportional to the size of the list.

The goal of this project is to find approaches for making this less likely to happen. One idea is to take inspiration from Google’s V8 kinds, which record which points in a program undergo such transitions, and prevent later collections created from that point being optimised. However, implemented naively, such a mechanism may simply cause many lists to be stored inefficiently, negating storage strategies benefit. More information can be found in the related work section of the storage strategies paper above.

Composing SQPyte and PyPy

One extremely common use case of language composition is a language and a database. A common database used for small tasks in Python is Sqlite, a very lightweight database written in C. Sqlite’s implementation is small, well-documented and readable. It is also internally based on a virtual machine that has fairly standard opcodes (together with database-specific extensions).

We have already converted parts of the core virtual machine into RPython such that we have a JIT compiler version of Sqlite—we call this SQPyte. This already gives some speed-up, but we are likely to see much more impressive results when PyPy (or perhaps Hippy) are composed with SQPyte. The basic idea is to have the in-memory database be a part of the language implementation such that database calls and the like are dynamically optimised into the user’s program. This should remove a useful portion of the overhead associated with using databases.

An RPython C VM

C is not an obvious choice for an RPython interpreter: after all, we would rarely expect to generate code which runs faster than GCC. However, when we call C from other languages, there is great potential for cross-language optimisations that are impossible in traditional static compilation. This project will thus create an RPython interpreter for the C99 language. The problem can be made substantially easier by having another system (e.g. LLVM) do the initial processing of the C program, producing an AST which will make producing the interpreter much easier.

A wireless daemon for OpenBSD

OpenBSD lacks a simple wireless network manage to detect when networks become available / unavailable and to switch between them. Such support can be developed entirely in userland, but has not yet done so. The wireless daemon will be configured by a user file in /etc and semi-regularly poll the wireless and ethernet cards to determine if a new network can be connected to. It should detect when networks require authorisation to connect and execute a user-defined command (e.g. open the authorization page in a web browser).