In this post I'm going to flesh out an observation I made in that post, which is that some software needs thoughtful adaption to CHERI if we want to get the security advantages we hope for. Exactly what that thoughtful adaption might look like will vary, probably substantially, between different pieces of software. What, for instance, might it look like for critical, widely used, components? In this post I'm going to look at how memory allocators (henceforth "allocators"), one of software's most fundamental building blocks, can be adapted to CHERI. If you find this interesting, but want greater depth than I'll go into here, you might be interested in the paper Picking a CHERI Allocator: Security and Performance Considerations that this post is based upon.
A Simple Allocator
It is a truism that virtually every program needs to dynamically allocate memory. Our collective folklore tells us that allocators like dlmalloc or jemalloc are impressive pieces of software that improve on their predecessors, but very few of us can really explain why. We call malloc, realloc, and free and magically chunks of memory are allocated, resized, or freed on our behalf.
As is often the case, one can get a useful insight into allocators by stripping away as much of the cleverness as possible. It turns out that we can write an allocator sufficient to run some real programs in just 25 lines of C:
```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

static char *heap_start;
static char *heap;
static size_t HEAP_SIZE = 1024 * 1024 * 1024;

void *malloc(size_t sz) {
  if (!heap)
    heap = heap_start = mmap(NULL, HEAP_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
  sz = __builtin_align_up(sz, _Alignof(max_align_t));
  if (heap + sz > heap_start + HEAP_SIZE) return NULL;
  heap += sz;
  return heap - sz;
}

void free(void *ptr) { }

void *realloc(void *ptr, size_t sz) {
  void *new_ptr = malloc(sz);
  if (ptr && new_ptr)
    memmove(new_ptr, ptr, sz);
  return new_ptr;
}
```

What we've implemented is a "bump allocator": every time we want to allocate or reallocate a block of memory, we "bump" (i.e. increment) the pointer stored in the variable heap.
The first time that we call malloc, no heap has been allocated, so we use the operating system primitive mmap to reserve 1GiB of RAM for our process (lines 8, 11, and 12). Because many systems become unhappy if we return unaligned memory (e.g. memory which starts on addresses ending in an odd number), we round the size requested up (on my machine to the next multiple of 16) to avoid such unhappiness (line 13). We then check if there is enough space left in the heap for the allocation, returning NULL if there is not (line 14). If there is enough space in the heap we then "bump" the heap pointer (line 15) and return a pointer to the beginning of the block (line 16).
Unsurprisingly, I have cut a few corners to keep things simple (e.g. let's hope that mmap doesn't fail!). Most obviously, free doesn't "return" memory to the heap: this is semantically correct, though it will cause programs to run out of memory rather quickly! You may also find the definition of realloc surprising, because it copies however many bytes are now requested from the old to the new block. This works because our heap is contiguous so, if there's enough space for the new block, by definition we can copy at least that many bytes from the old block.
So, does this really work? Let's start with the classic binary trees benchmark which does a lot of memory allocation. What we want to do is to compare a normal allocator against our bump allocator. There are various ways we could do this, but the simplest is to put our allocator in a shared object and use the "LD_PRELOAD trick" [1]. Setting the LD_PRELOAD environment variable to point at our shared object allows us to override the system allocator on a per-execution basis.
In the following video you can see me (in order): run the benchmark on my normal desktop computer with OpenBSD's default system allocator; compile the bump allocator; and then use LD_PRELOAD to override the system allocator when I run the benchmark for a second time.
Not only does it run, but it's three times faster than when run with the system allocator!
Before you get too excited, let's remember that the bump allocator is,
to put it mildly, less general than the system allocator — this is not, in
general, a sensible performance comparison [2].
Still, it does give us a good degree of confidence
that we're really running the bump allocator in the second run of the
benchmark.
As that suggests, it's notoriously easy to fool yourself into thinking you're testing your new allocator ("wow, I write software without bugs!") when you're actually running the system allocator. We won't always be able to tell easily from time which allocator we're running so, to give us some assurance for later examples in this post – and also because it's fun – let's add this little chunk of code to our allocator:
```c
#include <stdio.h>

__attribute__((destructor))
static void malloc_exit() {
  fprintf(stderr, "heap used %lu\n", heap - heap_start);
}
```

Now, when a program uses our bump allocator, it will print out how many bytes of heap were used during execution just before the program exits. Have you ever wanted to know how many bytes grep or vi allocate during execution? Now's your chance to find out:
I had to be a little careful in the choice of programs I run above, because
Unix has, over time, slowly added more functions to the allocator API (most
of which are rarely used). Fortunately, we can handle most of those (again
with minor corner cutting) fairly easily.
It's quite good fun to see which software runs with this extended allocator
— assuming you've got enough RAM!
The Bump Allocator and CHERI purecap
What does it mean to run an allocator under CHERI? Let's start by assuming we're using "pure capability" (henceforth "purecap") CHERI. For our purposes, that means that 64-bit "pointers" become 128-bit capabilities, where the additional 64 bits contain various permissions. The main permissions we're interested in are bounds, which record the range of memory that a capability – and, vitally, those capabilities derived from it – are allowed to read/write from. When we compile a C program for CHERI we're actually compiling a "CHERI C" program. For our purposes, the semantics of CHERI C can be thought of as mostly the same as normal C's semantics, though "pointer types" are now "capability types" (and hence double the width they are in normal C).

Perhaps surprisingly, our bump allocator not only compiles out of the box on purecap
CHERI, but both binary trees and vi run successfully. That sounds good —
until we start digging a bit. In order for CHERI to do something useful, capabilities need to have meaningfully restricted permissions — in our case, restricted bounds. What bounds does the capability returned by our bump allocator's malloc have? The easiest way to see this is to allocate some memory and then use the CHERI API to print out the capability's lower bound, and the length of the bounds. Let's allocate two blocks of memory and see what the result is:
```c
#include <cheriintrin.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
  void *c1 = malloc(4);
  void *c2 = malloc(4);
  printf("c1 address 0x%lx lower 0x%lx length 0x%lx\n",
    cheri_address_get(c1), cheri_base_get(c1), cheri_length_get(c1));
  printf("c2 address 0x%lx lower 0x%lx length 0x%lx\n",
    cheri_address_get(c2), cheri_base_get(c2), cheri_length_get(c2));
}
```

For each of the two capabilities returned by malloc this will print: the capability's address (equivalent to the address a "pointer" normally stores); the lower bound of memory the capability can address; and the length of memory (relative to the lower bound) that the capability can read/write from. I can derive other (valid!) capabilities from these that have different addresses, provided that those addresses stay within the bounds.
So that you can play along at home, I'm going to run an emulator of Arm's Morello, built with cheribuild: you can build your own with ./cheribuild.py run-morello-purecap -d. I'll warn you now that the emulator isn't fast, but it is more than usable for our purposes. If I run the code snippet above with, first, the default allocator and, second, our bump allocator I see the following:
To make things a bit easier to follow, here's the critical part of
the video as text:
```
root@cheribsd-morello-purecap:~/cheri_malloc_blog # ./print_bounds
c1 address 0x40c0e010 lower 0x40c0e010 length 0x4
c2 address 0x40c0e018 lower 0x40c0e018 length 0x4
root@cheribsd-morello-purecap:~/cheri_malloc_blog # LD_PRELOAD=$(pwd)/malloc2.so ./print_bounds
c1 address 0x41400000 lower 0x41400000 length 0x40000000
c2 address 0x41400010 lower 0x41400000 length 0x40000000
heap used 4128
```
Both allocators return sensible looking addresses, but the bounds are very different: the default allocator returns capabilities whose bounds are restricted to the 4 bytes requested; but our bump allocator returns capabilities that can read/write from 1GiB of memory!
More formally, with the default allocator, neither c1 nor c2 can be derived from the other, but with the bump allocator either capability can be derived from the other. While this isn't exactly wrong, it means that we haven't gained much from CHERI: code with access to one of those two capabilities can read/write from the same memory as the other capability. Indeed, it turns out that every block of memory our bump allocator returns will share the same bounds!
The clue to where those bounds come from is that they span 1GiB of memory — exactly the same quantity of memory our bump allocator requested from mmap. As that suggests, mmap in CheriBSD returns a capability whose bounds are at least as big as the quantity of memory that you requested, and our bump allocator has simply copied those bounds over unchanged to the caller of malloc.
What Should an Allocator do on CHERI purecap?
Our bump allocator works without changes on CHERI purecap, but gives little meaningful security improvement relative to a non-CHERI system. We thus need to pause for a brief philosophical interlude: what should an allocator do on CHERI purecap?

Naively we might think we want "maximum security" but, even if we could define what that means, we probably couldn't achieve it. Like most things in life, software nearly always requires trade-offs: if we want more of one thing (e.g. security) we'll have to accept less of another (e.g. performance or ease of use). Although I'm not going to dwell on it in this post, there is unlikely to be a "one true secure allocator" that works well in all cases.
Let's start with what security folk call "threat models". In our case, let's assume that our threat model is that external attackers might be able to take over one part of our program and then "upgrade" the attack to read all the program's heap data. Given this threat model, we can then think about techniques and technologies that might help mitigate the threat.
In our case, we have a promising technology (CHERI purecap) and we may then decide that having the program store capabilities with the most restrictive bounds possible is the right way to mitigate the "upgrade" attack. That implies that allocators should return capabilities with bounds restricted to just the memory allocated. Doing so will clearly make an attacker's life harder, even though it might not make it impossible [3].
Adjusting the Bump Allocator
Let's adapt the bump allocator's malloc so that it returns capabilities whose bounds only allow access to the requested range of memory:
```c
#include "cheriintrin.h"

void *malloc(size_t sz) {
  if (!heap)
    heap = heap_start = mmap(NULL, HEAP_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);

  char *new_ptr = __builtin_align_up(
    heap, -cheri_representable_alignment_mask(sz));
  size_t bounds = cheri_representable_length(sz);
  sz = __builtin_align_up(sz, _Alignof(max_align_t));

  if (new_ptr + sz > heap_start + HEAP_SIZE)
    return NULL;
  heap = new_ptr + sz;
  return cheri_bounds_set_exact(new_ptr, bounds);
}
```

The main changes we've made are lines 7-10 and 15 where we perform an odd-looking dance with CHERI's APIs to create a pointer with the appropriate bounds. In essence, because CHERI permissions don't have enough spare bits to represent all possible lower bounds, we might have to align the heap pointer to an arbitrary amount (lines 7-8), and we may also align the requested length upwards for similar reasons (line 9). We then make sure we always align future pointers to max_align_t as in the normal malloc (line 10). Finally we return a capability with the heap pointer we want, with the bounds set to those we've calculated (line 15).
If I compile our CHERI-fied malloc and run our little bounds printing program again, I can see that I really am getting capabilities whose bounds mean that they cannot be derived from each other:
```
$ LD_PRELOAD=$(pwd)/malloc4.so ./print_bounds
c1 address 0x41400000 lower 0x41400000 length 0x4
c2 address 0x41400010 lower 0x41400010 length 0x4
heap used 4128
```

That all looks good! But if I write this little program:

```c
#include <cheriintrin.h>
#include <stdlib.h>

int main() {
  void *c1 = malloc(4);
  c1 = realloc(c1, 8);
  return 0;
}
```

and run it, the program is terminated with a SIGPROT (the CHERI equivalent of a SEGFAULT):
```
$ LD_PRELOAD=$(pwd)/malloc4.so ./print_bounds
In-address space security exception (core dumped)
```

How can such a simple program cause a problem? The problem is in our original realloc. Earlier I wrote:
You may also find the definition of realloc surprising, because it copies however many bytes are now requested from the old to the new block. This works because our heap is contiguous so, if there's enough space for the new block, by definition we can copy at least that many bytes from the old block.
My corner cutting has come back to bite us! The problem is that when we pass to realloc a capability whose bounds allow access to 4 bytes and then we try and copy 8 bytes from it, CHERI – quite rightly! – considers this a violation of its rules and stops the program.
The fix for this is obvious: we have to know how big the block of memory we want to reallocate currently is. However, our simple bump allocator doesn't store that length. Adjusting it to do so wouldn't be rocket science, but it would be annoying [4]. Fortunately, since the input capability records the length of its bounds, we can use that information to make sure that we don't copy more from the input capability than we should:
```c
void *realloc(void *ptr, size_t sz) {
  void *new_ptr = malloc(sz);
  if (ptr && new_ptr)
    memmove(new_ptr, ptr,
      cheri_length_get(ptr) < sz ? cheri_length_get(ptr) : sz);
  return new_ptr;
}
```

Note that we only copy the minimum of the input capability's length or the requested size (lines 4 and 5). With this modification, our allocator now works as expected:
```
$ LD_PRELOAD=$(pwd)/malloc4.so ./print_bounds
heap used 32
```
What Have We Actually Done?
Amongst all the detail, it's easy to miss that our manual adaptions of the bump allocator to CHERI have meaningfully improved its security. If an attacker gets access to one capability returned by malloc, they have no way of turning it into any other capability returned by malloc. In other words, we have given the user a new tool for "compartmentalising" their heap allocated memory. Consider this simple program:
```c
#include <cheriintrin.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
  char *c1 = malloc(4);
  char *c2 = malloc(4);
  printf("c1 address 0x%lx lower 0x%lx length 0x%lx\n",
    cheri_address_get(c1), cheri_base_get(c1), cheri_length_get(c1));
  printf("c2 address 0x%lx lower 0x%lx length 0x%lx\n",
    cheri_address_get(c2), cheri_base_get(c2), cheri_length_get(c2));

  c1[0] = 'a';
  printf("c1[0] %c\n", c1[0]);

  printf("(c2 - 0x10)[0] %c\n", (c2 - 0x10)[0]);
}
```

Because we – as might a cunning attacker – know that the bump allocator places consecutively allocated blocks at predictable locations, we can try rederiving c1 from c2 (line 16).
This code executes successfully (i.e. insecurely!) with our original bump allocator on CHERI purecap, but is terminated with a SIGPROT with our adapted bump allocator.
This is not to say that our adjusted allocator magically makes programs impervious to attackers. Most obviously, the user still has to think carefully about which parts of the system should have access to which capabilities returned by the allocator: handing out capabilities without sufficient thought tends to undermine the security properties the user was hoping for. Less obviously, even our adjusted allocator can be tricked into doing things we might not expect. For example, I can take a capability returned by the allocator, narrow its bounds or downgrade its permissions, and hand it off to a less trusted part of my system. An attacker can then use realloc to obtain a more powerful capability than they started with.
This isn't merely a theoretical concern. In Picking a CHERI Allocator we introduced 5 simple "attacks" which, in essence, model situations where an attacker has gained initial access to a system and is trying to widen the scope of the attack. We then tried these attacks on a number of different allocators that have been ported to CHERI purecap. Only one allocator was invulnerable to these simple attacks — and the default CheriBSD allocator, a jemalloc fork, was vulnerable to 3 of them! We didn't even try that hard to find such attacks — it seems likely to me that more attacks on CHERI purecap allocators could be found if someone has the time to investigate further.
Summary
If nothing else, I hope this post has made allocators seem a little less magical to you. On top of that, I hope that you believe that while CHERI (purecap or hybrid) can help you improve a program's security, it isn't magic: just because something runs correctly on CHERI doesn't necessarily mean it's any more secure than running it on a traditional system.
The adjustments we made to the bump allocator to take advantage of what CHERI has to offer are not huge, but they aren't merely mechanical either. As that suggests, to take full advantage of CHERI you have to think not just about the code itself, but how that code is expected to be used.
Acknowledgements: my co-authors on Picking a CHERI Allocator (Jacob Bramley, Dejice Jacob, Andrei Lascu, and Jeremy Singer) did all the hard work and generously offered comments on this post. In focussing on one aspect of that paper for this post, I've probably introduced various mistakes, for which I am solely responsible!
Footnotes
[1] LD_PRELOAD isn't a trick but using it is trickier than people often realise. First, you must give it an absolute path to a shared object. Second, it's astonishingly easy to create an input shared object that doesn't contain all of the functions you need, or expected, to override.

[2] There are a handful of use cases for making free a no-op. First, some people really need their programs to run faster and are prepared to buy a lot of RAM to make that happen — yes, this really does happen! Second, sometimes we want to know what the relative proportion of time we spend in an allocator is — our simple bump allocator gives us a good idea of the "intrinsic" costs of a program with the allocator factored out. That said, modern hardware, with its many layers of caches, increasingly requires some caution in interpreting these numbers, because fragmentation (or lack of it) can have noticeable effects.
[3] For example, unless we explicitly protect the "super capability" returned by mmap, that is an obvious weakness in our mitigation. CHERI gives us tools which can make such protection plausible, but I won't cover them in this post.

[4] It is sometimes considered realloc's original sin that it doesn't require the caller to track the input block's current size. Every allocator then has to record (or otherwise derive) the size of a block, which can be a costly operation (in terms of memory use or processor time). That said, it can be rather annoying to use an allocator where you have to track the size of all allocations yourself. Swings and roundabouts...

"Programming" and "Programmers" Mean Different Things to Different People
August 23 2023
Late last year on my blog I published "How Might Generative AI Change Programming?". Although I'm slightly biased, I'm still comfortable with that post's high-level message. However, there is one important aspect that I think I failed to bring out as explicitly as I should have: different communities' programmers will be affected in different ways.

One reason I made this mistake is that I had overlooked how different communities' assumptions about "programming" and "programmer" [1] have diverged. As with most instances of incommensurability, the people involved (e.g. me) often don't realise that they're talking about slightly different things.
As a concrete example, a couple of weeks back, the Economist – whose coverage of areas I know about I find to be rather good – published an article with the following claim:
An AI "co-pilot" on GitHub, a Microsoft-owned platform for open-source programs, improves coders' productivity by 30%.

My initial reaction was incredulity. AI tools are not currently even remotely close to speeding up my "day job" programming – which involves careful development of programming language tools like compilers – by anywhere near 30%.
But, after a little thought, I realised that this statement is true for some of the programming I have to do. Recently I had to do a little bit of JavaScript programming, a language which I barely know, and which I am therefore rather unproductive in. After a quick query using ChatGPT I had gained enough insight to find a webpage that let me quickly do the task I needed. I expect that, overall, I was able to complete this task in half the time I would have been able to before ChatGPT came along.
What this highlighted to me is that there are at least two very different styles of programming that I engage in. In one style, AI/ML has, so far, had virtually no effect on me; in the other style, it can be a noticeable productivity boon.
It didn't take me long to realise that this was a specific instance of something more general: different people increasingly mean different things when they talk about "programming" and "programmers". Some of this is due to the gradual, and inevitably incomplete, diffusion of understanding about software throughout wider society. But even within the communities of people whose livelihoods involve staring at screens, there are important differences of understanding that can mean we end up talking past each other.
In this post I'm going to define what seem to me to be the three major categories of "programming" and "programmers" I come across, moving from the narrowest to the broadest. While I have my own preferences as to which categories I personally attach the terms "programming" and "programmers", I'm not interested in dying in a ditch over terminology. Rather, it's already helped me better understand what other people mean when I'm talking to them.
The "narrow" category
When I have to write a short description of what I do for work, I use two words: programmer, researcher. The ordering is deliberate: I was a programmer before I was a researcher [2] and I think of programming as my craft.

My self-image as a programmer is reflected in the way I approach programming. Most obviously, I am willing to attempt – though I may not always succeed at! – hard programming tasks (e.g. writing low-level and/or highly optimised code). Less obviously, I think of programs and programming itself as something worth careful and continual consideration. I put deliberate effort into practising programming and I spend a lot of time trying to make existing programs better. In many ways, how I program, and the quality of the result, is more important to me than what I am programming.
Although I am nervous of suggesting I am representative of anything other than my own odd self, I believe that many of the people I meet professionally have a similar outlook. We call ourselves "programmers" not just because that's how we spend most of our time, but because it is the activity we care most about. We therefore tend to put a lot of effort into thinking about programming, and how to improve the programs we're working on. We are willing to move between organisations who want us to program very different things because we want to program and we're often less worried what we program.
The "midpoint" category
You will probably not be surprised to learn that the "narrow" definition of "programmer" and "programming" is the one that I implicitly used for much of my programming journey.

The first time I realised that my definition didn't work well was when I interacted more closely with system administrators. These are the folk who keep computing services such as servers running. When I started my PhD, my department had a small but gifted group of system administrators. I enjoyed the company of the "sysadmins" (as we abbreviated their title) a great deal, in no small part because they were the most technical people in the entire department [3]! I learnt a great deal from this group because a good proportion of the tasks they undertook clearly involved programming. Their programming wasn't the same as mine – it was smaller scale, with many of the programs deliberately "throwaway" – but it was similar enough that we could talk productively to one another.
Since then, I've encountered ever more groups of people who are using computers in a way that, at the very least, can be thought of as "programming lite". One obvious group are the most sophisticated of spreadsheet users who create complex expressions that, in conjunction with a spreadsheet's inherently reactive mode of operation, make my head hurt. More generally, there are a surprising number of users of "domain specific languages" (of which spreadsheet expressions are arguably a special, albeit wide-spread, case) who are commanding a computer in ways that are often like full-blown programming.
It seems to me that two things unite "programmers" and "programming" in this category.
First, the people involved rarely call themselves "programmers". They are "system administrators" or "scientists" or "accountants" and so on. In other words, they invariably have a non-programming role that programming just so happens to help them with.
Second, "programming" is both a help and a hindrance to them. If you're trying to crunch a load of data from astronomy experiments, you probably need to program a computer to deal with the sheer quantity of data. However, the time spent programming is a cost to be borne in pursuit of a bigger task: the less time you spend programming, the more time you spend on the thing you care most about.
These two factors combine in an interesting way. In general (though there are exceptions), the programs that programmers in this category write tend to be smaller than those in the "narrow" category. Typically, the people involved also put less effort into learning exotic tooling [4] and writing "better" programs than in the "narrow" category. This has some interesting knock-on effects. For example, those of us in the "narrow" category are often obsessed with making software as reusable as possible; in the "midpoint" category, there tends to be a lot more "copy, paste, and tweak". It's easy for people like me to dismiss this as sloppy workmanship, but that misses the point: people in this category necessarily have to reserve much (probably most) of their time and mental bandwidth for the non-programming aspects of their role.
The "broad" definition
I can still remember the first time that I saw a CV that claimed one of the programming languages this person knew was HTML. I scoffed: HTML isn't a programming language! After all, all one is doing with it is writing down how text and images should be formatted for humans to read — that's a long way from doing the sort of fiddly control flow manipulation that I associate with programming!

I've now lost track of how many people I've seen claim HTML as one of the programming languages they know. I've also encountered a growing number of people who talk about "programming" HTML (by which they mean HTML and CSS, though not JavaScript unless stated explicitly). The latter group are interesting as virtually none of them would claim that they could program non-HTML/CSS software and, indeed, I don't think I've ever heard any of them call themselves a "programmer" even when they say they're "programming".
Before I start to sound too snobby, it's now clear to me that creating a good modern web page is a difficult task, which is why my own website is both basic and ugly. Whenever I want a webpage to do something even moderately complex, such as display nicely on a mobile device, I either spend inordinate time on the task, or plead with someone for help. For some reason, I struggle to make sense of the way that CSS elements interact with each other, and it's not like I've only spent 5 minutes trying to understand it. Clearly this stuff is more complex [5] than I once gave it credit for!
It seems to me that what makes programming in this category (which one also sees in some "domain specific languages") different to the previous two categories is that it has little or no notion of "time". In most of the programming I do, I have to think about "if X happens then Y happens which means that Z can then happen". Put formally, the system continually changes state (in the sense of "state machine" state), and I have to reason about each state individually. In contrast, one can generally think of HTML as occupying a single state. That means that the reasoning programmers in this group have to do about the system state is generally much more restricted in nature than those in the "narrow" definition from earlier. This isn't a bad thing — anyone who's tried to write a GUI in a fully-fledged programming language will know that the flexibility this offers comes with costs that can be borne repeatedly over time!
Summary
A pithy summary of the three categories, in order:

- Programmers who always call themselves programmers, call the activity they engage in programming, and who write arbitrarily complicated programs.
- Programmers who generally don't call themselves programmers, generally call the activity they engage in programming, and who tend to write slightly less complicated programs.
- Non-programmers who don't call themselves programmers but who sometimes call the activity they engage in programming.
Exactly where you consider the cut-off point for a "programmer" or for "programming" is an individual choice. I have my own preferences, but it doesn't worry me if you use a different definition. I also hope that I'll now be aware enough to realise how you're using the term, so that you and I will communicate more effectively!
Acknowledgements: thanks to Martin Berger, Lukas Diekmann, and Hillel Wayne for comments.
Footnotes
[1] "Programmer" often doesn't sound grand enough as a job title, so one finds "software developer" or "software engineers" or sometimes just "engineer" used instead. Interestingly, I find that when people with these job titles refer to the craft element of their job, they frequently revert back to using the term "programmer".

[2] It would not surprise me at all if, in the future, I am a programmer after I'm no longer a researcher.
[3] The head of the group, Neil Faulks, even created his own Linux distribution "Neilix" (a pun on Linus Torvalds' naming of "Linux" after himself). In contrast, the academic group I was part of consisted almost entirely of theoreticians. I learnt a lot from them too, but not about programming.
[4]For example, until recently, I believe that the majority of the people I came across in this category did not use version control software (e.g. git). Interestingly, that seems to have changed rapidly in the last 3 or 4 years, although I'm not sure why.
[5]CSS is in fact Turing complete but I think it's fair to say that very few of us get to the point that we realise that.
pizauth: First Stable Release
August 13 2023
In pizauth: dump and restore I said:

    However, there was one feature I couldn't find a good design for: persisting state to disk. This missing feature had been nagging away at me (and a small, though growing, number of other people) for quite a while: recently I finally alighted upon a design that I thought would work well. pizauth-0.2.2 is the first release to support this. Perhaps this will be the last release before 1.0.0!

I was a little too optimistic. One of pizauth's configuration option names had long bothered me, and addressing that caused a backwards-incompatible change that forced a 0.3.0 release.
Since then, I've added only one feature to pizauth: an info sub-command. This was explicitly inspired by Chris Siebenmann's "Everything that uses configuration files should report where they're located" post. pizauth info [-j] informs the user where pizauth is looking for configuration files and so on. For example:
    $ pizauth info
    pizauth version 1.0.0:
      cache directory: /home/ltratt/.cache/pizauth
      config file: /home/ltratt/.config/pizauth.conf

Adding -j outputs the same information in JSON format [1] for integration with external tools. The info_format_version field is an integer value specifying the version of the JSON output: if incompatible changes are made, this integer will be monotonically increased.
It's just under a year since pizauth's first alpha release. While I have made three breaking changes since that first release, I did so early on. Two of the three changes are fairly trivial renamings of configuration options; the third is only marginally more "breaking". All in all, pizauth's initial interface seems to have held up fairly well, and while I can think of some possible additions I might make one day, I can't think of any breaking changes.
It thus seems that the time is right for pizauth's first stable release: 1.0.0 is out! By "stable" I mean that, as far as possible, future releases will not change (though I might add to) the interface — and if such changes are necessary, the major version number will be bumped [2]. In one sense, the 1.0.0 version number means very little, but in another it means a great deal. When releasing software, it's easy to jump too early to "stable" or to be scared to ever do so. 11 months after pizauth's first release, and with a growing number of users, feels about the right time to me to commit to a stable release. Enjoy!
Footnotes
[1] I am not especially fond of JSON per se, but it is now a widely accepted interchange format. The older I get, the more I tend to prefer standards, even if they're not the standard that I would most have preferred.
[2] Yes, I will be following Semantic Versioning — another standard that I'm not especially fond of, but which is better than nothing.
The Need to Explain
July 18 2023
One of my in-built quirks – present, I'm told, from a very young age – is spotting patterns. Whenever I see X, I also see Y; there are 3 As for every B; if this happens, then that is more likely to happen afterwards; and so on. It's not something I do particularly consciously, and few of my observations even reach the lofty heights of banal. Fortunately, it's not all wasted effort, and a small number of these observations end up making my life a little better. My train home always leaves from platforms 1 or 8. If I arrive at the station as my train is due to leave, my best bet is not to check the departure boards but to run past platform 8 to see if my train is there and, if it isn't, run straight to platform 1.

As I've grown older, I've learnt to keep more of the observations I've made to myself. This isn't because my observations very often offend, but because some people view them as creating a void which must be filled, as soon as possible, by an explanation. The problem with spontaneous explanations on topics beyond our purview is that they are often obviously wrong. One of my other quirks is that I find it difficult not to rebut obviously wrong explanations. This can then cause an endless loop of further explanations and rebuttals, which is rarely useful or enjoyable.
To my surprise, I've realised that further attempts at spontaneous explanation become less convincing more often than they become more convincing. One obvious lesson to draw from this is that it is easier to disprove incorrect explanations for a phenomenon than it is to generate a correct explanation. As a researcher, I have learnt that lesson the hard way more than once!
But, over time, I've come to be fascinated by the meta-observation: many of us seem to feel a need to find explanations for everything. I certainly felt that need when I was younger, due to a combination of insecurity and egotism. But, at some point, I realised that while I really want to find out the explanations for a small handful of observations, I'm generally comfortable with not having an explanation.
Why does my train leave from platforms 1 or 8? I probably could find out the explanation for this if I really wanted to, but it doesn't seem like a good use of my time. In many cases – for example, pretty much anything involving physics – I'm probably not capable of uncovering a compelling explanation even if I devote the remainder of my days to the investigation!
Of course, I sometimes can't help myself idly speculating as to possible explanations for an observation. Perhaps this suggests that we have a deep need to feel that we live in a world governed by well-defined laws? Perhaps our willingness to spontaneously think of explanations is the enabler of progress? Honestly, I don't have a clue. The observation alone has been enough to change my behaviour, and make my life a little easier.
Two Stories for "What is CHERI?"
July 5 2023
Some readers might have heard of CHERI. Of those who've heard of it, most have probably got the rough idea that it's a "more secure" CPU — but, after that, I find that most people start to get rather confused when they try and nail down "what is CHERI?".

Having grappled with this myself, and having helped others grapple it, I think that the major reason is that the high-level story requires understanding quite a bit of detail — hardware, Operating Systems (OSs), programming languages (and some end user programs) all have to be adapted for CHERI [1]. Most people tend to focus at first solely on the CPU, but one can't form a coherent story from the CPU alone. Those who persist and piece together the other parts of CHERI can then get swept up in the novelty of the idea, and assume that CHERI magically secures everything it touches. However, CHERI is just a tool and, like all tools, it's better at some things than others — and, even then, only when used thoughtfully.
In this blog post I'm going to explain just enough of CHERI to put together two coherent stories for what it is and where it might be used. I'm not going to try and cover all the possibilities, since, as you'll see, this post is long enough as it is! One reason for that is that CHERI contains overlapping concepts that can be combined in more than one way. Because of that, I'll warn you in advance that there is a fairly hard switch between the two stories in this post — sorry in advance! I'm also not going to be afraid to diverge from CHERI's terminology, which has a number of historical quirks that I find add to the confusion.
The rough idea
Let's start with a quote from the main CHERI site:
    CHERI extends conventional hardware Instruction-Set Architectures (ISAs) with new architectural features to enable fine-grained memory protection and highly scalable software compartmentalization. The CHERI memory-protection features allow historically memory-unsafe programming languages such as C and C++ to be adapted to provide strong, compatible, and efficient protection against many currently widely exploited vulnerabilities.

In simpler terms, a CHERI CPU is one that supports extra security features, most notably "capabilities" [2] – pointers with extra information – that can help catch flaws such as buffer overruns before they become security problems. Programs written in languages such as C can be recompiled, mostly with few source-level changes, so that they can make use of capabilities when run.
Let's imagine I have this seemingly innocuous C code:
    char *buf = malloc(4);
    strcpy(buf, "abcd");
    printf("%s\n", buf);

At first glance, you could be forgiven for thinking that strcpy copies 4 bytes into buf. However, strcpy adds a NULL byte to the end of the string it copies, so 5 bytes are written into buf! This is a classic buffer overrun, of the sort that has led to a great many security flaws in real software.
When compiled for a normal CPU and OS, the chances are that this snippet will appear to run correctly, though it may corrupt the program's internal state and cause security problems. If I compile it with a CHERI C compiler, for a CHERI CPU, and run the resulting executable on a CHERI-aware OS, the incorrect write of the fifth byte will be detected by the CPU, and the program terminated with a SIGPROT (the capability equivalent of a SEGFAULT). We can see that happening in the video below, where in the split tmux we have my normal OpenBSD desktop running in the top and an Arm CHERI machine with a CHERI-aware OS in the bottom:
The video shows OpenBSD statically warning me that strcpy is dangerous, which is clearly true — my buggy program ran to completion on what is often considered the most security conscious Unix around! But on the bottom, with a CHERI-aware OS running on CHERI hardware, the buffer overrun was detected and the program stopped before it could become a security issue. Running gdb shows that the buffer overrun ("Capability bounds fault") is detected while strcpy is running.
I've been deliberately explicit that there are three "CHERI" factors involved in this: a variant of C ("CHERI C") that knows that CHERI capabilities are pointers that carry extra information about the size of a block, and a compiler capable of dealing with this; an operating system (as in the video above, that currently typically means CheriBSD, a FreeBSD variant, although some CHERI machines now have a Linux available) whose malloc [3] returns capabilities that record the size of the block it has allocated; and a CHERI CPU which detects the write past the end of the block.
We thus immediately see an obvious point of confusion: when we say "CHERI" do we mean the language, OS, CPU, an abstract definition of what CHERI is capable of, a model of capabilities, or some combination of all of these factors? In informal conversation, I've heard all of these possibilities occur — frequently, the two speakers don't even realise that they're using the term to refer to different things! I don't think it's possible to nail down a single meaning for "CHERI". Instead, I use the unqualified term "CHERI" to refer to the very high-level idea and concepts and then, when talking about a specific component, to qualify it accordingly (e.g. "CHERI CPU" or "CHERI OS" and so on).
CHERI CPUs
You might have noticed above that I said "a" CHERI CPU, not "the" CHERI CPU. We thus see another point of confusion: CHERI features can be added to any CPU. Indeed, there are at least three different CHERI CPUs available. Chronologically, the first was based on a MIPS processor, though I believe that's no longer supported. Arm's experimental Morello CPU is a mostly normal, fairly modern, AArch64 ARMv8-A processor, extended with CHERI features (they're not available commercially, but are/were available to researchers such as myself). There are also at least two RISC-V CHERI processors (desktop and microcontroller) under design: I'm not sure if physical versions are available yet, but there is at least simulator/emulator support.
A CHERI CPU can be thought of as a combination of the traditional things that make up a CPU, plus the things necessary to support the capability features that CHERI defines. For example, Morello supports the normal AArch64 instruction set "A64" as well as a second "pure capability" instruction set "C64": programs can switch between the two on-the-fly (though this is normally not exposed to the high-level programmer). This might sound sci-fi, but it's not unprecedented: for example, some CPUs can switch between different instruction sets on-the-fly [4].
Practically speaking, even when taking advantage of CHERI-related features that the CPU offers, one can write "CHERI software" that is largely ignorant of the specific CHERI CPU it's running on. There are of course some visible differences between the different CHERI CPUs, but it's rather like the situation with normal CPUs, where I can write code that's largely ignorant of whether it's running on, say, AArch64 or x86_64.
Capabilities
It's now time to define a capability. Semi-formally, I think of a capability as a token that a program can pass to the CPU to perform actions on its behalf. Unfortunately, I find that that definition only really makes sense when you already know what a capability is! To try and break that cycle, I'll explain things using a series of gradually less inaccurate approximations.
For our first approximation, let's say that a capability is a pointer that comes with permissions. Amongst a capability's permissions are traditional things like "can this pointer read and/or write to the address pointed to?" If I try and use a capability in a situation for which it doesn't have sufficient permissions, my program will be terminated with a SIGPROT.
New capabilities can be derived from existing capabilities. The simplest derivation is simply to copy the capability unchanged. I can also derive a new capability that has fewer permissions: for example, I can take a capability which can read and write to an address and derive a new capability which can read but not write to the address. However, I cannot add to a capability's permissions — this property is crucial to CHERI's security. There are some interesting implications to this. First, if a program destroys (e.g. by writing NULL over) the only copy it has of a powerful capability, it can never recover those permissions. Second, a CHERI system must start with a "root" or "super" capability that has the maximum set of permissions that will ever be needed.
To understand how one can be prevented from extending a capability's permissions, we need a slightly better approximation. On a normal CPU, pointers are interchangeable with integers representing memory addresses [5] — I can magic an integer out of thin air and load a value from the address in memory that matches that integer. On a CHERI CPU, capabilities are not interchangeable with integers / addresses. I cannot magic a new capability out of thin air — only the CPU can create new capabilities, and it will only let me do so by deriving them from existing capabilities. This allows the CPU to enforce the rules about how an (authentic) capability can be derived from an (authentic) input capability.
We can improve our approximation further by realising that every location in memory that can store a capability has an authenticity bit [6] associated with it [7]. If a program asks the CPU to perform a capability-related operation on an inauthentic capability, it will be terminated with a SIGPROT. It might seem odd that one can have inauthentic capabilities, but amongst their uses is the need to represent arbitrary binary data (e.g. a JPEG file in memory). I can always derive an inauthentic capability from an authentic capability, but never vice versa.
We can now answer the seemingly simple question: how big is a capability? Let's first refine our approximation by saying that a capability contains a memory address and a set of permissions. The address is the same size as normal — on a 64-bit CPU like Morello or CHERI RISC-V, that's 64 bits. On those CPUs the set of permissions are also 64 bits in size [8]. Depending on how you look at it, the size of a capability can be thought of as 128 bits (64 and 64, after all!) or 129 bits if you include the authenticity bit — however, since the authenticity bit can only be read separately from the other bits, we typically refer to a capability as 128 bits in size.
Putting a CHERI story together
At this point you can be forgiven a degree of confusion: I've given you lots of low-level detail, but no high-level story. Fortunately, with one more bit of detail, we can start to see a CHERI story come together.
Earlier I used read/write as an example of capability permissions. However, by far the most commonly used permissions are bounds, which record the range of memory that a capability can read/write from. We saw earlier an example of a capability-aware malloc returning a capability that encoded how big the block was. For example, malloc(4) might return a capability with an address of (say) 0x200, a lower bound address of 0x200, and an upper bound address of 0x204 [9]. If I create a capability with an address outside of the bounds 0x200-0x204 [10] and try to read/write from it, my program will terminate with a SIGPROT.
Not coincidentally, capability bounds work very well with pointer arithmetic as found in C. Consider the following snippet:
    char *buf = malloc(4);
    char *c1 = buf + 1; // In-bounds
    char *c2 = buf + 2; // In-bounds
    char *c3 = c1 + 1;  // In-bounds (and equivalent to c2)
    char *c4 = buf - 1; // Out-of-bounds
    char *c5 = buf + 5; // Out-of-bounds

I can read/write to any of buf, c1, c2, or c3 without issue: but trying to read or write from c4 or c5 will terminate my program with a SIGPROT.
We now have enough detail to see a high-level CHERI story come together for the first time:
- Programming language: CHERI C treats what are normally pointers as capabilities, but does so in a way that allows the vast majority of normal C to be compiled unchanged with CHERI C.
- Operating system: malloc returns capabilities with appropriately restricted bounds.
- CPU: programs that do things that are potential or actual security problems (which includes, but is not limited to, code with undefined behaviour) tend to be terminated by a SIGPROT.

A program terminated by a SIGPROT might suffer a denial-of-service, but that's a lot better than giving an attacker access to your system.
What does CHERI protect us from?
Because CHERI can transparently protect us from many classic security flaws in C programs, it can end up seeming like magic. Indeed, I haven't even talked about the other protections we gain almost for free, such as preventing a program from executing code at arbitrary addresses. Part of the reason that CHERI can do so much for C (or, indeed, C++) programs is because C/C++ make little effort to protect the programmer from such mistakes, even though the necessary information to do so is already present in the source code. However, as soon as we want to enforce security properties that are not directly inferable from source code, we realise that we have to use CHERI explicitly rather than implicitly.

For example, I've assumed in this post that malloc returns a capability whose bounds are restricted to the size of the block I requested. This seemingly obvious property cannot, in the innards of any realistic malloc, be inferred from the source code: the malloc author will have to explicitly use the CHERI C API (e.g. cheri_bounds_set) to ensure that the bounds are restricted to the appropriate memory range.
In some cases, such as malloc, the bounds-related security properties we want are fairly obvious and only require small tweaks to small sections of code [11]. However, many security properties are harder to define and/or require many sections of code coordinating correctly in order to enforce them.
For example, my program may have a secret that I want to hide away from most of the program so that it can't accidentally leak to the outside world. Capabilities seem perfect for this, because one of the fundamental properties of a capability system is that it restricts the reachable set of capabilities [12] at a given point in time (unlike a normal system where I can access nearly any part of virtual memory at any point). Indeed, CHERI gives me a number of exotic features such as sentries that allow me to enforce much more complex security properties.
While capabilities clearly have potential to make programs more secure in such cases, I'm sceptical that it will be something people want to do a great deal of. Capabilities are a dynamic (i.e. run-time) concept and humans are notoriously bad at reasoning about the dynamic (or "temporal") behaviour of programs. My guess is that programmers will often make mistakes and give overly-powerful capabilities to parts of a program that should not have access to them, undermining the expected security guarantees.
In a sense, the problem is more general: the more complex my use of capabilities, the less confident I am that they will enforce the security properties I care about. In most cases, capabilities are used to stop bad things happening: a program being terminated by a SIGPROT generally indicates a bug, which will then surely be fixed. That means that I'm left hoping that capabilities will catch the bad cases that I haven't thought of. History suggests that if a programmer hasn't thought of, or hasn't tested, something then it will not work as the programmer would hope.
A related question is: what happens if I'm using a language that offers me more protection than C/C++? For example, if I'm writing code in Python, or (non-unsafe) Rust, I can't be subject to buffer overruns [13]. Are bounds checks useful in such cases? As belt and braces, yes, but capabilities are not cost-free — they take up more memory than traditional pointers (e.g. placing more pressure on caches) and checking their validity must surely have some sort of performance impact on the CPU too. I might be willing to pay this cost on the small parts of a Rust program that use unsafe, but probably not elsewhere.
In summary, C/C++ programs have enough to gain from using capabilities everywhere that many users might choose to do so. However, for other languages, the situation is more complex: bounds checks are much less useful, but still impose a performance cost that people are probably unwilling to pay; and more complex uses of capabilities are hard to reason about. Interestingly, though, CHERI has another trick up its sleeve that I think is of much wider interest than is currently thought.
The hybrid CHERI story
Up until this point in this post I have explained CHERI as if it requires explicitly using 128-bit capabilities everywhere. CHERI has a second "mode" of operation where one can use both normal 64-bit pointers and 128-bit capabilities alongside each other. Conventionally, when a program uses capabilities everywhere it's said to be a purecap program; when it mixes capabilities and normal pointers it's said to be a hybrid program.
To many people's surprise, many current practical CHERI systems use hybrid mode in, at least, their OS kernels, because converting a large OS kernel to purecap is often too much work to contemplate. When the kernel interacts with a purecap user program, it must bridge between the hybrid and purecap worlds [14].
What's interesting about hybrid mode is that 64-bit pointer accesses are implicitly made against global 128-bit capabilities. The global capability I'll use as an example is the Default Data Capability (DDC). One way of thinking about how this works is as follows: if I'm on Morello, executing normal A64 (not C64!) code, and then execute a load with a 64-bit pointer, the CPU will derive a new capability from the DDC with the address of the pointer and then do all its normal capability checks.
Showing how this works is a bit awkward, but I hope the following slightly simplified code helps:
    // Get two normal 64-bit pointers to chunks of memory
    char *b1 = malloc(4);
    char *b2 = malloc(4);
    // Restrict the (128-bit) DDC to only cover b1's block of memory
    void *__capability new_ddc = cheri_ddc_get();
    new_ddc = cheri_address_set(new_ddc, b1);
    new_ddc = cheri_bounds_set(new_ddc, 4);
    cheri_ddc_set(new_ddc);
    printf("%c\n", b1[0]); // Succeeds
    printf("%c\n", b2[0]); // SIGPROTs

This chunk is written in "hybrid" CHERI C, which is most easily thought of as "normal" C with 64-bit pointers, with an additional __capability qualifier which denotes a 128-bit capability. We allocate two normal blocks of memory (b1 and b2), receiving back two normal 64-bit pointers. We then read the DDC (which, by default, has bounds from 0 to the maximum possible virtual memory address) into a capability, restrict that capability to only cover b1, and then set the DDC to that new capability [15]. We can then read and write via normal 64-bit pointers to b1 successfully, but as soon as we try and read from b2 our program is terminated with a SIGPROT [16].
Hybrid mode is considered passé in the CHERI world, but I see it as potentially interesting because by restricting the DDC, we implicitly define a sort-of "subprocess" within a process. Unix-esque processes offer simple, highly effective isolation but they are very heavyweight. Various sandboxing techniques have been developed over the years, but most are awkward to use and, often, easily bypassed. Hybrid "subprocesses" seem to me to be a new point in the design space.
Splitting programs up into multiple processes for security isn't a new idea. A handful of existing programs, most notably OpenSSH, already do so ("privilege separation"): an attacker who takes over an individual process has a hard time exploiting the system beyond that process. However, inter-process communication is either comically slow (pipes) or difficult to use reliably (shared memory). CHERI's hybrid mode can still use capabilities, and those are not checked relative to the DDC: thus capabilities can be used to communicate between compartments with very little performance penalty.
Of course, hybrid "subprocesses" are not magic — they clearly provide less isolation than processes, and CHERI's hybrid support has received much less attention than purecap. Still, at the very least hybrid mode offers a very interesting alternative to a purecap world. I think it's fairly easy to imagine a future where software uses both processes and subprocesses to gain different security/performance trade-offs.
Summing up
There is a lot more I could have said: CHERI contains many, many details I haven't even hinted at; and there are certainly other ways of using CHERI than the two stories I've outlined. I think, though, that this post is long enough already!

In terms of the two CHERI stories I've presented, it's worth being explicit that the purecap story is very much the standard CHERI story at the moment, with the hybrid story being more speculative. The very different implications each story has for today's and tomorrow's software might give you some idea that CHERI is best thought of as a toolbox – and many tools within that toolbox have more than one possible use. Exactly which use(s) of CHERI will end up being most useful – which, to my mind, probably means "provides the best balance between security and performance" – remains unclear. And, of course, there are no guarantees that CHERI will enter the mainstream! But, whatever happens to CHERI, I think it is extremely helpful in opening our eyes to possibilities that most of us have previously not even considered.
Update (2023-07-05): I originally stated that all CHERI OS kernels were hybrid. I have been reliably informed that is no longer the case — there is a CheriBSD purecap kernel. CHERI Linux, though, is a hybrid-only kernel.
Acknowledgements: thanks to Jacob Bramley, Andrei Lascu, and Myoung Jin Nam for comments.
Footnotes
[1] Although it's difficult to say with absolute certainty, due to a lack of evidence, I think that any CHERI-esque system would require a similarly wide range of changes.
[2] Capabilities, and capability aware hardware, are not new ideas (see e.g. the CAP or E). While it is possible that these ideas didn't hit the mainstream because they are flawed, it seems more likely to me that random chance meant they were not developed to the point that we could meaningfully evaluate them. Put another way, capabilities were, until CHERI, a "what if?".
[3] For simplicity's sake, I'm conflating "kernel" and "userland libraries" into "operating system". In reality – as is the case in CheriBSD today – I need the kernel to return an appropriate capability from mmap which the malloc I'm using can then restrict further.
[4] Indeed, Morello reuses the same BX mnemonic that previous Arm processors used to switch to / from normal ("A32") / Thumb ("T32") modes. Similarly, like Thumb, Morello can also switch between instruction sets based on a bit set/unset in branch instructions.
[5] Well, mostly. As I've previously written about, real, though mostly historic, hardware has done all sorts of odd things with addresses and pointers.
[6]CHERI mostly calls this the "tag", or occasionally the "validity", bit.
[7]The physical location of the authenticity bit is a hardware-dependent detail. An excellent analogy I've heard is to think of the authenticity bit as being similar to ECC memory, where there are extra bits somewhere helping ensure the integrity of the memory but you don't know (and can't find out) where those bits are actually stored.
[8]On CHERI MIPS the address was 64 bits and the permissions 192 bits!
[9] Since a capability's permissions are only 64 bits, the bounds cannot precisely represent every possible range of memory. Modern CHERI systems thus use a clever encoding scheme that allows small bounds to be precisely represented, with larger bounds becoming progressively less precise. I can see this in action with the following code:

    char *b1 = malloc(4);
    printf("%lu\n", cheri_length_get(b1));
    char *b2 = malloc(4097);
    printf("%lu\n", cheri_length_get(b2));
    char *b3 = malloc(16385);
    printf("%lu\n", cheri_length_get(b3));

cheri_length_get is a CHERI C intrinsic (i.e. "builtin CHERI C function") which asks the CPU to report how many bytes the capability can refer to above its lower bound. [Note, since I've used cheri_length_get my program can now only be compiled by CHERI C: it is no longer a valid "normal C" program.] On Morello this prints 4, 4097, 16392 — note that that last number has been rounded up from 16385! Not only are bounds imprecise, but different CHERI CPUs round things up at slightly different points — on CHERI RISC-V the code prints out 4, 4104 (note: rounded up!), and 16392. The reason I'm going into this in detail is because newcomers often check small examples, see that their bounds are precisely represented, and understandably, but incorrectly, generalise from there. Instead, CHERI C code must account for the imprecision of capability bounds, which I think of as a kind of alignment requirement. For example, memory allocators will probably want to leave "blank" memory after a block if its bounds would otherwise overlap with the next block.
[10] Perhaps surprisingly, an out-of-bounds capability is not necessarily inauthentic. From the perspective of this blog post, we don't need to worry about this detail other than to know that it's at least in part to cope with C, which allows a pointer to point to the first byte past the end of a block of memory.
[11] Though it turns out to be much more fiddly than one might expect. More on that in a subsequent post.
[12] Capabilities form a graph, so if I have access to a capability C1 that points to C2 then my reachable set is {C1, C2}. If a capability C3 points to C1 but I still only have a reference to C1, then my reachable set remains {C1, C2}.
[13] Such languages are often called "memory safe", though I find that a rather vague term myself. For example, Python might be a "memory safe" language, but CPython is written in the "memory unsafe" language C: bugs in the low-level code can undermine high-level guarantees. We stand on the shoulders of footguns.
[14] As this might suggest, "purecap" and "hybrid" modes are separate compile-time and run-time concepts. For this post, I'm going to ignore the difference.
[15]Oddly there isn't a cheri_ddc_set in the CHERI API, but on Morello we can define it thus:
void cheri_ddc_set(void *__capability cap) {
asm volatile("MSR DDC, %[cap]"
: : [cap] "C"(cap) : "memory");
}
[16]This fragment elides one important detail: the call stack isn't included in our new DDC, so as soon as we read or write to it, the program will SIGPROT. In this example, calling printf will fail. I'll soon explain, at a high level, how to avoid this happening.