Can We Retain the Benefits of Transitive Dependencies Without Undermining Security?

One of life’s great pleasures is trust: having confidence in another person to do the right thing warms the hearts of both parties. Despite the cynicism that we sometimes mistake for profundity, modern society would be impossible without a large degree of trust, including trust between people who don’t know each other.

However, we all know that trust has limits. I have a legal and moral right to leave my home for a week’s holiday, jam the doors wide open, pin a sign to the gate saying “back in a week”, and to expect the contents of my house to be untouched. I would be unwise, though, to trust that everyone will respect my rights. Some people, alas, spend their lives looking for opportunities to abuse others’ trust; some may only act when an “opportunity” confronts them and their willpower buckles. Either way, it is sensible for me to acknowledge this reality and to lock my door.

Just as with people, we place a great degree of trust in software, and different pieces of software place a great degree of trust in each other. I trust that, after I log into my bank account, my web browser won’t transfer my money behind my back; my web browser itself trusts an image processing library to decode arbitrary data from the internet without allowing an attacker to take over the computer; and so on.

In this post I’m going to argue that the growth in transitive dependencies in software is the equivalent of jamming our door open and hoping for the best — we are putting too much trust in things we don’t and can’t know in detail. However, I don’t think that the best long-term solution is to avoid transitive dependencies altogether — we’re increasing our use of direct and indirect dependencies because it makes us more productive and our software better. Is it possible to get the advantages without the disadvantages?

Trust in modern software

Writing modern software increasingly involves the use of dependencies, that is other pieces of software [1]. For example, I “wrote” the software that produces the website you’re looking at right now by gluing together a number of Rust libraries. If those libraries hadn’t existed – and if cargo [2] didn’t make it so easy to use them – I doubt I’d have attempted to do so: it would have taken me too long [3], and the results too fragile, to be practical. Only by placing trust in those libraries – both in their overt “quality” and the motives of their authors – could I tackle and complete my task.

However, there is a problem: my trust in those direct dependencies ends up being transitively extended to their dependencies. My website software uses 20 libraries directly [4], but transitive dependencies mean that it ends up building 181 libraries! It’s easy for me to forget that the trust I place in those 20 direct dependencies is extended unchanged to the 161 indirect dependencies.

This is very unlike the real world, where trust decays exponentially the further it extends. If a good friend introduces me to one of their friends and says that they trust them 100%, I will immediately upgrade my level of trust in that person — but not to 100%. If that person immediately introduces me to another person, and repeats the 100% trust claim, I may well not upgrade my default trust at all.

In software, in contrast, a flaw – deliberate or otherwise – in a dependency of a dependency of a dependency of a dependency of a dependency is implicitly a flaw in my software, because I have to extend identical levels of trust to each part of the dependency chain equally.

Processes are our main line of defence

Our current software security model was designed for a more innocent age: a greater degree of trust was assumed, and software was vastly smaller and more easily reasoned about.

For most of us, the fundamental aspect of our security model is the process [5], the runtime “cage” that an operating system like Unix will run a program in. Processes have been such a successful security abstraction that we often forget to think of them as an explicit concept: they enable us to place reliable, easily reasoned about, and impermeable boundaries between different pieces of software. It is really difficult for one process to subvert the security of another process [6].

However, within a process there is virtually no security. There are various aspects to this, but let’s consider just a process’s memory (i.e. the RAM it is using). Simplifying only slightly, every machine code instruction executed has the ability to read from, and write to, anywhere within a process’s memory.

If the software building my website does something clever with passwords [7], any one of those 181 dependencies could decide that it will scan my process’s memory for passwords, and send any it finds over the internet to a bad person.
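
To make that concrete, here is a minimal sketch – everything in it is invented for illustration, and not taken from any real library – of the kind of thing a malicious dependency could do once it has been compiled into my process:

```rust
// A hypothetical function a malicious dependency could hide somewhere: any
// code linked into a process can dereference raw pointers into that
// process's memory.
fn peek(addr: usize, len: usize) -> Vec<u8> {
    let ptr = addr as *const u8;
    // Undefined behaviour if the range isn't mapped; a real attacker would
    // first discover mapped regions (e.g. by parsing /proc/self/maps on
    // Linux) and then scan them for anything that looks like a password.
    unsafe { std::slice::from_raw_parts(ptr, len).to_vec() }
}
```

Nothing in the process model stops a function like this from running: once code is inside the process, it shares the process’s full set of privileges.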

Possible mitigations

Is there anything I can do to stop one “part” of a process doing something with another “part” that I would prefer it didn’t?

For example, let’s assume that my “threat model” for Rust code is that I consider anything which uses unsafe to be a security threat [8]. I could then try banning any direct or indirect dependency which uses unsafe, such that my dependencies really are guaranteed not to undermine security. However, since large parts of the Rust standard library use unsafe, would I ban libraries which call such functions? Depending on how strict my ban was, I may well find that there are virtually no dependencies I could actually use.
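
For the code I write myself, Rust does at least give me a switch that enforces this rule; it is the transitive half of the ban that has no equivalent switch. A minimal sketch:

```rust
// Rust lets me ban unsafe in the code I compile directly...
#![forbid(unsafe_code)]

fn main() {
    // ...but this attribute says nothing about the unsafe blocks inside the
    // 181 dependencies that end up compiled into the same process.
    println!("my code is unsafe-free; my dependencies might not be");
}
```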

I could manually examine, and then whitelist, trustworthy Rust libraries, but that’s incredibly difficult to do reliably, and what happens when a new version of a dependency is released — do I have to examine it again from scratch?

We very quickly see that a seemingly easy rule – ban use of unsafe – is impractical. Interestingly – and this is a problem with most such schemes – it seems I can never find the right balance between too little and too much trust.

There are, of course, many different approaches. For example, modern CPUs have started to provide additional intra-process defences. Most of these try to aid “control-flow integrity”, which aims to restrict a program to executing only the known, good machine code that the process itself put in place. This makes it harder for an attacker to upload valid machine code to a process and then force the process to execute that code. It also makes it harder to exploit existing “gadgets” in a process’s binary. Control-flow integrity does improve security — but it wouldn’t mitigate the simple password-stealing attack I outlined earlier.

Capability architectures

Capability architectures more actively try to “compartmentalise” processes in a wider sense. Probably the best known such architecture right now is CHERI, though there are other interesting designs, such as E and Fil-C [9].

Most of us tend to focus on “memory safety” when thinking about capability systems, that is preventing things like buffer overflows, use-after-frees, and so on. This is undoubtedly the major use case for such systems, and the sole focus of some. This is not a minor advantage: even today, too many security flaws (though many fewer than in the past [10]) result from mistakes made when programming in C/C++. By adding memory safety to memory-unsafe languages, we make the use of those languages – and dependencies written in those languages! – harder to exploit.

However, capability systems – particularly CHERI – give us a tempting glimpse of a more advanced future. Using CHERI, one can create various flavours of ‘compartments’, that is different portions of a process which are at least somewhat prevented from attacking other portions of that process. There are different ways of doing this, which I’ve explored in more detail elsewhere, but the one most people will think of is to use pure capabilities.

In essence, capabilities in CHERI are double-width pointers with fine-grained permissions: code can only access memory for which it holds a suitable capability. One can use this, for example, to lock code into a subset of a process, only able to escape via a single well-defined exit point. Exploring the possibilities is great fun, but the more sophisticated one’s compartment mechanism becomes, the more likely it is to be incomplete.

I have come to think that this style of compartmentalisation is most useful for ensuring that cooperative (i.e. trusted) software doesn’t go wrong accidentally. Ensuring that actively malicious code doesn’t undermine the hoped-for security guarantees is much harder.

In particular, most compartments require very careful temporal [11] reasoning about a program, and this is something that we humans find difficult to do. A single mistake – and my compiler tells me that I can make low hundreds of obvious mistakes a day, so who knows how many non-obvious mistakes I make? – can accidentally gift a capability with unexpectedly high privileges. One can use systems like CHERI to raise the bar fairly high, but they might not be the right tool to raise the bar all the way.

The problem will only increase

One could argue that the password-scanning attack I outlined earlier will be so obvious to anyone looking at the code that it will soon be spotted. I agree — but attacks don’t have to be obvious to be successful. I know a lot of people who laughed at JavaScript for the leftpad incident, but I didn’t see anyone laugh at the more recent xz attack — it shocked many people, who had assumed that open-source software could not be exploited in such a manner.

Arguably all that such things are doing is forcing us to acknowledge the possibilities for bad things to occur. For example, one brave soul tried installing every Python library available, and found all sorts of unpleasant things happening to, and ending up on, their computer [12]. Or there was the time when some foolhardy researchers thought it would be a good idea to see how easily one could slip vulnerabilities into the Linux kernel without anyone noticing. Or, just yesterday, it turned out that an “AI” had cloned and then at least in part made worse a well-known Rust library — which had then become a direct dependency of 134 other libraries!

It’s not as if this is a recent idea. I would have loved to have been present at Ken Thompson’s Reflections on Trusting Trust talk in 1984 [13], where an unsuspecting audience were led, step-by-step, through a series of innocuous observations until they slowly realised that Thompson had undermined the integrity of the Unix login program. That attack undermined the operating system, so it’s marginally different from the point I’m making in this post — but only marginally.

All this has led me, slowly and reluctantly, to the conclusion that our dependency-heavy approach to building software is fundamentally incompatible with security. I say this with great reluctance — I find it much easier to write large, reliable software than I did 10 years ago, and the quality and quantity of dependencies that is now available is a big part of that. However, I am painfully aware that this approach means that I’m taking on more risk than I should be comfortable with.

It is possible to dismiss this as a theoretical worry. Yes, someone could inject a security vulnerability into a library and use it for nefarious reasons later, but there’s no evidence for it, right?

I have little time for head-in-the-sand thinking like this. History is replete with examples of people who thought that they could avoid bad behaviour by others by asserting that it couldn’t happen — and to whom the bad behaviour later happened. Money spent on defence might seem wasted, but war is much more expensive.

I find it hard to believe that bad actors have seen things like the xz attack and not then thought “I could do something like that”. Indeed, I would be astonished if multiple such ‘bugs’ have not been inserted into various pieces of software over many years. Quantifying the existence of “secret attacks” is, of course, rather difficult: in general, all we can do is observe explicit attacks that were spotted in the wild.

Several factors are working against us. First, modern package systems – Rust’s cargo, Python’s pip, and so on – make publishing and using large quantities of dependencies ever easier. Second, our software continues to increase in size [14], and the larger it becomes, the easier it is for problems to seep through the cracks. Third, we have surprisingly few security mechanisms beyond the Unix process.

What might we be able to do?

The most fundamental part of our current security model – the process – would be entirely recognisable to programmers from the late 1960s. Perhaps we stumbled across the perfect security abstraction so early in computing’s history, but such a belief requires more optimism than I can muster. The more dependencies we use within a single process, the less suitable the process is as a security mechanism.

However, stating that there is a problem that needs solving doesn’t mean that there is an obvious solution, or that any such solution is practical, or that there is even a “solution” at all. Sometimes we have to accept that all the trade-offs available to us are unpalatable.

With that qualification in mind, let me outline where I would one day like our software to go. I would like to run software, built from multiple components (i.e. dependencies of some kind), in such a way that:

  1. Components are isolated from each other as much as possible.
  2. Each component only has the minimum permissions it needs.

For example, I don’t want my image decoding component to have network access, or the ability to access RAM containing passwords; but I do want my network downloading component to have network access, and I do want to be able to create a component that can manage and use passwords.
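
As a rough sketch of what that could feel like at the language level – the traits and functions below are invented for this post, not a real API – each component would be handed only the capabilities it needs, and nothing more:

```rust
// Hypothetical capabilities, passed around explicitly as trait objects.
trait Net {
    fn fetch(&self, url: &str) -> Vec<u8>;
}
trait Secrets {
    fn password(&self, name: &str) -> Option<String>;
}

// The image decoder receives neither capability: it is purely a function of
// its input and has no way to reach the network or any passwords.
fn decode_image(data: &[u8]) -> Result<Vec<u8>, String> {
    Ok(data.to_vec()) // stand-in for real decoding
}

// The downloader is handed a `Net` capability and nothing else.
fn download(net: &dyn Net, url: &str) -> Vec<u8> {
    net.fetch(url)
}
```

Within a single ordinary process this is only a convention – unsafe code or shared global state can bypass it – which is exactly why sketches like this need help from the hardware and operating system before they become real guarantees.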

What do I mean by “component”? I’ve deliberately introduced a new term to abstract away from some of the mental hang-ups that come with terms such as “dependency”, “library”, “process”, and the like.

Roughly speaking, I imagine a “component” to encompass both a static and dynamic thing: statically, a component corresponds to our notion of a “program”; dynamically, a component corresponds to our notion of a “process”.

In other words, I want to split software up into mutually distrusting dynamic “cells”, like processes, but with the ability to communicate more easily, frequently, and cheaply. The communications between dynamic components would need to be tightly specified, and if a component fails to communicate in exactly the required way, other components should ignore all interactions. Another way of looking at this is that it is a more rigorous enforcement of the age-old principle of least privilege: our current approach to software hands out far too many privileges to dependencies, and we need to rethink how we build software for this to become a thing of the past.
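
As a very rough sketch of the flavour I have in mind – ordinary Rust threads and channels stand in for whatever cheap inter-cell mechanism would really be needed, and the message types are invented for illustration – a cell only reacts to interactions that fit its specification:

```rust
use std::sync::mpsc;
use std::thread;

// The only way into the decoder cell is via this tightly specified message
// type; anything that doesn't fit the specification is refused.
enum Request {
    Decode(Vec<u8>),
}
enum Reply {
    Decoded(Vec<u8>),
    Refused,
}

fn spawn_decoder_cell() -> (mpsc::Sender<Request>, mpsc::Receiver<Reply>) {
    let (req_tx, req_rx) = mpsc::channel();
    let (rep_tx, rep_rx) = mpsc::channel();
    thread::spawn(move || {
        for req in req_rx {
            match req {
                // Even well-typed requests are checked against the protocol.
                Request::Decode(bytes) if bytes.len() <= 1 << 20 => {
                    let _ = rep_tx.send(Reply::Decoded(bytes)); // stand-in decode
                }
                _ => {
                    let _ = rep_tx.send(Reply::Refused);
                }
            }
        }
    });
    (req_tx, rep_rx)
}
```

Threads within one process share memory, of course, so this only models the programming interface: the isolation itself is exactly what a replacement for the process would have to provide.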

The general outline of what I’m suggesting has some obvious antecedents:

  • Privilege separation, as found in OpenSSH, which splits single “static programs” into multiple dynamic processes: compromising one process does not compromise another.
  • The actor model, which defines how interacting “things” [15] can communicate with each other. This is a fairly large umbrella term, ranging from languages such as Erlang to various libraries and frameworks; few have security as an explicit aim.

My guess is that the ideal solution is a combination of these two ideas. That doesn’t mean this is going to be easy: both solutions are well known, but – depending on how liberally you interpret “actor” – neither has quite taken over the world. There are a variety of reasons for that, some shallow, some deeper. Combining these two ideas will, I think, pose two fundamental challenges:

  1. Performance. Components will have to be able to communicate efficiently and in a suitably restricted manner. IPC (Inter-Process Communication) on a typical Unix is something like 5-7 orders of magnitude slower than intra-process communication, which is far too slow for the component model. Unix-esque shared memory, though much faster, is far too difficult to use reliably for untrusted components.
  2. Expressivity. Different components will need to maintain as much of the ease of use of current-era libraries as possible, without descending into the horrors that RPC (Remote Procedure Call) tends to involve.

Addressing the first of these points requires at least some rethinking of hardware and operating systems; the second requires rethinking programming languages. Neither will be easy — but both, I believe, are doable.

Indeed, there are plenty of clues that what I’m suggesting is doable: from reimagined operating systems (e.g. seL4), to reimagined CPUs (e.g. Arm’s Morello), to the evolution of programming languages (from Erlang to Rust). WebAssembly, too, seems to be aiming for a version of componentisation, though I have the impression this is mostly aimed at lightweight compartmentalisation of small processes [16]. Put another way, I think this is a partial solution to point (1) above, but on its own it can’t address point (2).

There will definitely be challenges along the way. In particular, what do we do with all our existing software? Some sort of migration process is inevitable, but I think it unlikely that we can magically “upgrade” our existing software to the model I’m suggesting. Fundamentally, most software does not contain the information we would need to retrospectively make it really secure. Indeed, much software does the opposite, deliberately doing things that are insecure [17]. Realistically, we’d need a transition period where we would have to accept that “old style” software was not as secure as we would like.

I also think we will find opportunities in rethinking how we structure software: perhaps the componentisation I’m suggesting will allow us – finally! – to meaningfully and consistently distribute computations across cores and machines.

Too much, not enough, or just right?

It is tempting to recoil from the change I’m proposing; to consider the problems I’ve highlighted not all that bad; or to say that it would have been good to do this 20 years ago, but it’s too late now. For longer than I should have, I have recoiled in the same way! But, ultimately, two questions have helped focus my mind:

  1. Will we come to view the proliferation of transitive dependencies as the point when we lost the ability to secure software?

Looking to the future, I find it very difficult to answer this in any other way than “yes”. Despite this depressing answer, I don’t think we want to turn the clock back to the time before the explosion in transitive dependencies: they have allowed us to give our software more features while making it cheaper to write and more reliable — quite the trio! Which brings us to the next question:

  2. Have we already written most of the software we will ever need, or does most of it remain to be written?

Imagine if, 100 years from now, people were to look back to 2025: would they say “they already had nearly all the software tools that we’ve ever needed”? If the answer to that question turns out to be “yes” then it means that we’ve done a good enough job already, and we don’t need to rethink anything we’re currently doing. I do not find this plausible: my strong intuition is that most of the software we will ever need remains to be written.

Now, how this will happen is an open question. The software industry is more productive than ever, but arguably less imaginative. The operating systems we use are almost exclusively recognisably 1960s/1970s in style, and our programming languages and CPUs are mostly recognisably 1970s/1980s in style. I hope that this reflects a current period of consolidation, and that it is not indicative of permanent stasis.

Indeed, I am hopeful that other imaginative, optimistic souls might take the plunge on ideas along the lines I’m suggesting. Perhaps, just as likely, they will imagine a different, but equally fundamentally rethought, model for how we create software. Either way, anyone who tackles this problem will have my heartfelt gratitude: we need a better future for our software, one that can exploit the advantages of dependencies — without the downsides!


Footnotes

[1]

There are several kinds of dependencies, including but not only libraries. I’m not going to delve into all of these kinds, because all are subject to the same general forces I’m describing.

[2]

Rust’s package manager. I suspect that, for me, cargo is the biggest individual productivity boost when moving to Rust. The language has made me more productive too, but because cargo the tool has made publishing and using libraries easy, the wider Rust ecosystem of libraries is now more significant in productivity terms.

[3]

The generic part of the website builder is about 800LoC and there’s another 1100LoC for the parts specific to my site. To put that into perspective, the markdown parser I use is at least 13KLoC!

[4]

Including both Iffy’s dependencies and a couple more that are only relevant to this site.

[5]

Many other security concepts can be boiled down to variants of processes. For example, I tend to think of virtual machines, in the sense of Docker et al., as providing isolation to a group of processes.

[6]

There are, of course, exceptions. Sometimes, operating systems or hardware have had flaws which allow one process to nobble another — but it’s rare. I’m also not considering denial of service attacks: on most Unices, a nefarious (or, more likely, incompetent) process can easily create a “fork bomb” which brings the whole system to a halt.

[7]

It doesn’t. Nothing clever, nothing to do with passwords.

[8]

Because of hole(s) in the type system, one can also undermine Rust code in equivalent ways to unsafe without using unsafe. See this GitHub issue and the exploits one can build upon it. For the purposes of this post, I’m going to assume that such holes will eventually be closed.

[9]

I expect to see more systems along these lines in the medium term, varying design factors such as “hardware or software?” and “more features or more security or more performance?” and so on.

[10]

We sometimes underappreciate how many simple security mitigations included in modern operating systems have made it harder for programming flaws to result in security flaws.

[11]

That is, reasoning about a program as it goes through various state changes. For example, in different states, it might require different permissions.

[12]

Admittedly this seems mostly to have been the result of build scripts. Still, those are a part of libraries – and a very obvious attack vector! And, more recently, it looks like the Python packaging community are trying to make this route harder to exploit.

[13]

Given my age at the time, it is unlikely I would have been present. And, if by some miracle I had been, I think it unlikely that I would have fully understood the content.

[14]

As two random examples, Chromium is something like 35MLoC (Million Lines of Code) and the Linux kernel heading towards 30MLoC. I don’t know if there is yet a 100MLoC system, but if such a thing does not yet exist, I would guess it will do so in not too many years from now.

[15]

These are often called processes, but in most actor systems they don’t correspond to a Unix “process”: they tend to correspond to intra-process things.

[16]

These were originally called nanoprocesses. I think that name has fallen out of use now, though I am not entirely sure.

[17]

Mostly for performance reasons. Many such reasons are of questionable relevance on modern hardware.
