Laurence Tratt: The Missing Level of Abstraction?

Levels of abstraction

I feel fairly confident in stating that everyone who is familiar with the details of computing will have encountered the phrases "high level of abstraction" and "low level of abstraction", probably rather often. Abstraction is one of those words which is used rather more frequently than it is considered. In short, in our context, saying that X is an abstraction of Y implies three things:

That X is at a higher level of abstraction than Y (alternatively one could say that X is more abstract than Y).
That X does not introduce anything fundamentally new over Y; indeed, it may well remove fundamental things, or present them in an easier fashion.
That, provided one does not need any of the fundamental things in Y that may have been lost in the abstraction X, X is in some sense easier to use than Y.

Abstractions abound in computing, as they do in life in general. To take a simple example, the first computers were programmed in machine code (zeros and ones); the second generation in assembly code, which translated easily into zeros and ones but provided an easier syntax for humans; and the third generation and beyond in languages such as C, whose translation into machine code became increasingly complex. Though abstractions are often nebulous and, indeed, often rather hard to spot and understand, they are what make modern computing possible; life without them would be like trying to walk from Land’s End to John o’ Groats with one’s eyes pointed downwards and half an inch above the ground.

Abstractions are typically relative things. From the statements "X is at a higher level of abstraction than Y" and "Y is at a higher level of abstraction than Z" we can deduce that X is at a higher level of abstraction than Z, but we cannot say that Z is the lowest level of abstraction of all: it may well be at a higher level of abstraction than some other thing of which we are currently unaware.

Regrettably, my professional life has taught me that the notion of relative levels of abstraction is not universally shared, probably because it is harder to understand than the notion of absolute levels of abstraction. The fallacy of absolute levels is commonplace; I first encountered it in the MDA (Model Driven Architecture) world, where the terms PIM (Platform Independent Model) and PSM (Platform Specific Model) are widely, and incorrectly, used to imply absolute levels of abstraction. Similarly, one often hears talk of high-level versus low-level languages as if they were absolutes when they clearly are not. Provided one bears in mind that these notions are always relative, and that omitting the relative qualification is merely an aid to brevity, many things make a good deal more sense.

Objective-C

I have recently been doing some programming in Objective-C for reasons that most people can probably easily guess. Learning a new language is rarely a wasted opportunity, and this has certainly been an interesting experience. As I have noted before, I have a definite soft spot for C as a language. If it wasn’t for its brain-dead approach to arrays (which don’t know their size, and therefore can’t automatically resize themselves, bloating code and causing untold bugs) and, to a lesser extent, the baroque system of standard types that has accreted during decades of porting to different platforms (and consequent misunderstandings), I would not have a bad word to say about it. This praise is of course conditional on the fact that C has a particular niche: it is commonly called a low-level language, because it is little more than a slim layer above assembly code.
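To make the array complaint concrete: a C array decays to a pointer to its first element when passed to a function, so the callee cannot recover its length (the familiar `sizeof arr / sizeof arr[0]` trick only works where the array itself is in scope), and nothing grows the array when it fills up. One common workaround is to carry the length and capacity alongside the data. The sketch below assumes only `int` elements are needed; `IntVec` and its functions are hypothetical names, not a standard API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical size-carrying, growable array of ints. Plain C arrays
   record neither their length nor their capacity, so both are stored
   explicitly next to the data pointer. */
typedef struct {
    int *data;
    size_t len;   /* number of elements in use */
    size_t cap;   /* number of elements allocated */
} IntVec;

void intvec_init(IntVec *v) {
    v->data = NULL;
    v->len = 0;
    v->cap = 0;
}

/* Appends x, doubling the allocation when it is full. Returns false,
   leaving the vector unchanged, if realloc fails. */
bool intvec_push(IntVec *v, int x) {
    if (v->len == v->cap) {
        size_t new_cap = v->cap == 0 ? 4 : v->cap * 2;
        int *p = realloc(v->data, new_cap * sizeof *p);  /* realloc(NULL, n) acts like malloc(n) */
        if (p == NULL)
            return false;
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = x;
    return true;
}

/* Bounds-checked read (in debug builds): possible only because the
   length travels with the data. */
int intvec_get(const IntVec *v, size_t i) {
    assert(i < v->len);
    return v->data[i];
}

/* Releases the storage and resets the vector to the empty state. */
void intvec_free(IntVec *v) {
    free(v->data);
    intvec_init(v);
}
```

Every function that receives an `IntVec *` can now ask for its length, at the price of the extra bookkeeping that C arrays themselves do not provide.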

Objective-C, alas, I find a more difficult language to like. In small part this is because it generally implies (and did in my case) working under OS X, an operating system beloved of those who do not truly love computers; its idiot-proof lack of customisation and innumerable small bugs came close to breaking me. Objective-C grafts higher-level Smalltalk-like features onto a lower-level C base. The resulting language is not only more verbose than either of its parents (I particularly enjoyed needing to type class attributes into three different places) but also distorts many of their defining features. For example, Smalltalk’s collection hierarchy (i.e. lists, sets etc.) is a thing of genuine beauty and utility, allowing a syntactically minimalist language to succinctly express complex constraints; this is impossible in Objective-C, which doesn’t allow blocks (small anonymous functions) and whose collection hierarchy is, in any case, less well designed. Another example is typing. C is a statically (if weakly) typed language, catching errors at compile time that would otherwise cause hard-to-debug segfaults; Objective-C’s message sending, however, is dynamically typed, as in Smalltalk. In Smalltalk this is not an issue: type errors are simply triggered, lead to predictable backtraces, and are easily fixed. While some dynamic typing errors in Objective-C lead to similar backtraces, others fall foul of C’s free-wheeling approach to memory management and cause horrible run-time results, leading to little more than a splat sound; debugging those is a chore.
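As an illustration of what losing blocks costs: in Smalltalk, keeping the elements of a collection above some threshold is a one-liner along the lines of `xs select: [:x | x > threshold]`, with the block capturing `threshold` automatically. Without blocks, C-family code has to name a separate top-level function, pass it as a function pointer, and thread any captured state through an explicit context pointer. The sketch below is a hypothetical illustration in plain C; `select_ints`, `Predicate` and `greater_than` are invented names.

```c
#include <stdbool.h>
#include <stddef.h>

/* A predicate with an explicit context pointer, standing in for the
   variables a Smalltalk block would capture automatically. */
typedef bool (*Predicate)(int value, void *ctx);

/* Hypothetical helper: copies the elements of src that satisfy pred into
   dst (which must have room for n ints) and returns how many were kept. */
size_t select_ints(const int *src, size_t n, int *dst,
                   Predicate pred, void *ctx) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (pred(src[i], ctx))
            dst[kept++] = src[i];
    return kept;
}

/* The "block" body must live in a separately named function, and the
   threshold it compares against arrives as an untyped void pointer. */
static bool greater_than(int value, void *ctx) {
    return value > *(const int *)ctx;
}
```

A call then reads `select_ints(xs, n, out, greater_than, &threshold)`: the logic is the same as the Smalltalk one-liner, but it is split across two definitions and loses type checking on the captured value.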

Memory management

One of C’s many lower-level delights is its approach to memory management: put simply, the user is responsible for all memory allocation and deallocation. The virtue of C’s approach is that it’s simple to understand (for users) and to implement (for compiler and library writers). The disadvantage is that it’s easy to use incorrectly; in particular, it’s very easy to forget to free memory, causing hard-to-debug memory leaks. A less common, but more dangerous, error is to try to free an already freed chunk of memory (the so-called double free). It is thus reasonable to say that C’s memory management is low-level. In contrast, high-level memory management has, in my mind, been largely synonymous with garbage collection, which automatically frees memory (and implicitly prevents double frees). Garbage collection is not without its drawbacks (for example, it is notoriously difficult to work out when garbage collection pauses will strike), but overall it is generally a good thing. Indeed, garbage collection is now near-ubiquitous outside of systems and embedded programming; anyone weaned on Java (or even much older languages such as Lisp and Smalltalk) will know nothing else.
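A minimal sketch of the manual discipline described above, using only standard `malloc`/`free`: whoever receives allocated memory owns it and must free it exactly once. `copy_string` and `free_and_clear` are hypothetical helpers; the second shows one common defensive idiom against the double free, relying on the guarantee that `free(NULL)` does nothing.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of s, or NULL if allocation fails.
   By convention the caller now owns the memory and must free it exactly
   once: forgetting to do so leaks it, doing so twice is a double free. */
char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;   /* include the terminating NUL */
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Frees *pp and then clears the caller's pointer. Because free(NULL) is
   a no-op, a second call through the same variable becomes harmless
   instead of undefined behaviour. */
void free_and_clear(char **pp) {
    free(*pp);
    *pp = NULL;
}
```

The idiom only protects the one variable passed in: any other pointer aliasing the same memory still dangles after the free, which is one reason such bugs remain hard to rule out in C.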

As I understand things, modern Objective-C under OS X uses garbage collection. On some other platforms, such as the one I was targeting, there is no garbage collection. I initially assumed that in such cases Objective-C would revert to a standard C-style system of malloc and free; to my surprise, it does not. Instead, Objective-C uses an unusual system of sometimes-implicit, sometimes-explicit memory management. For someone used to traditional high-level garbage collection or low-level C-style memory management, it’s rather hard to get your head around: there don’t seem to be hard and fast rules; some of the documentation is ambiguous about the user’s responsibilities; and there is a good deal of obfuscating cruft which I assume has gathered over time. However, the basic idea is as follows. If a user explicitly allocates memory (via an alloc message), he is responsible for freeing it. If another function allocates memory, it will (or, more accurately, should) put it into a memory pool; when the memory pool is freed, the objects in it are freed too. New memory pools can be created, and they are kept in a stack so that objects are put into the most recently created pool.
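The pool scheme described above can be modelled in a few dozen lines of C. This is a minimal sketch under stated assumptions: `pool_push`, `pool_add` and `pool_pop` are hypothetical names, and, following the simplified description in the paragraph above, popping a pool frees its objects outright. In Objective-C proper (e.g. `NSAutoreleasePool`), draining a pool instead sends each object a `release` message, which only frees it if no other owner still holds it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a stack of memory pools: each pool records the
   pointers handed to it, and popping a pool frees everything it recorded. */
typedef struct Entry {
    void *ptr;
    struct Entry *next;
} Entry;

typedef struct Pool {
    Entry *entries;        /* objects registered with this pool */
    struct Pool *parent;   /* the pool beneath this one on the stack */
} Pool;

static Pool *pool_top = NULL;   /* the most recently pushed pool */

/* Pushes a new, empty pool; returns 0 on success, -1 on failure. */
int pool_push(void) {
    Pool *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->entries = NULL;
    p->parent = pool_top;
    pool_top = p;
    return 0;
}

/* Registers ptr with the most recent pool and returns it, so a function
   can hand back memory that its caller need not free explicitly. If there
   is no pool, or no memory for the bookkeeping entry, ptr is freed and
   NULL is returned rather than leaking it. */
void *pool_add(void *ptr) {
    Entry *e;
    if (ptr == NULL)
        return NULL;
    if (pool_top == NULL || (e = malloc(sizeof *e)) == NULL) {
        free(ptr);
        return NULL;
    }
    e->ptr = ptr;
    e->next = pool_top->entries;
    pool_top->entries = e;
    return ptr;
}

/* Pops the most recent pool, freeing every object registered with it.
   Returns the number of objects freed (0 if the stack is empty). */
size_t pool_pop(void) {
    size_t freed = 0;
    Pool *p = pool_top;
    if (p == NULL)
        return 0;
    while (p->entries != NULL) {
        Entry *e = p->entries;
        p->entries = e->next;
        free(e->ptr);
        free(e);
        freed++;
    }
    pool_top = p->parent;
    free(p);
    return freed;
}
```

Because objects always go into the innermost pool, a caller can bound the lifetime of temporaries by pushing a pool before a loop body and popping it afterwards; the cost is that anything that must outlive the pool has to be managed some other way.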

Memory pools are an attempt to pair implicit memory allocation (something which is, for good reason, deeply frowned on in traditional C libraries) with implicit memory deallocation. In practice, they resemble a poor man’s attempt at garbage collection or, perhaps, garbage collection as designed by someone who hadn’t fully grasped the idea. For the life of me, I cannot fully prevent memory leaks in a medium-sized Objective-C system; ironically, I have found it much easier to debug memory leaks in pure C applications. Different parts of code can retain and release objects, which I assumed was a synonym for reference counting, but which in practice doesn’t always seem to work quite as expected: what are, according to the documentation, valid combinations of calls can cause crashes or leaks. Was this due to bugs in my code, someone else’s, or a flaw in the memory management design? Who knows? At some point, I got the leaks down to a few bytes a minute and gave up.
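For comparison, plain reference counting, the reading of retain and release mentioned above, amounts to very little code. The sketch below is a hypothetical illustration in C (`Obj`, `obj_new`, `obj_retain` and `obj_release` are invented names); it is not Objective-C’s actual implementation, which additionally interacts with memory pools.

```c
#include <stdlib.h>

/* Hypothetical reference-counted object. */
typedef struct {
    int refcount;   /* number of current owners */
    int payload;
} Obj;

/* Allocates an object with a count of 1: the caller is its first owner. */
Obj *obj_new(int payload) {
    Obj *o = malloc(sizeof *o);
    if (o != NULL) {
        o->refcount = 1;
        o->payload = payload;
    }
    return o;
}

/* Records an additional owner and returns the object for convenience. */
Obj *obj_retain(Obj *o) {
    o->refcount++;
    return o;
}

/* Drops one owner and frees the object when the last owner lets go.
   Returns 1 if the object was freed, 0 otherwise. Every release must be
   balanced by an earlier obj_new or obj_retain: one release too many
   touches freed memory, one too few leaks the object. */
int obj_release(Obj *o) {
    if (--o->refcount == 0) {
        free(o);
        return 1;
    }
    return 0;
}
```

In this pure form the rules are mechanical; the difficulty described in the paragraph above arises once some releases are deferred to a pool, so that ownership depends on which function allocated an object and when the surrounding pool is freed.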

The middle-level of abstraction

A good question at this point is: what does all this have to do with abstractions? Well, the two examples above are instances of something that I’ve seen once or twice before, but which is hardly common. In normal computing conversation, C is at a low level of abstraction and Smalltalk at a high level. As I wrote earlier, levels of abstraction are relative, but that doesn’t mean we can’t talk about the gaps between two levels. Objective-C, which is a child of these two languages, explicitly pitches itself as being the middle-level of abstraction between C and Smalltalk. Similarly, its memory management is the middle-level of abstraction between malloc/free and garbage collection.

The interesting thing to me is not that Objective-C is the middle-level of abstraction between C and Smalltalk as such (after all, given two points on the abstraction scale, there’s a high chance that there are other levels in between) but that it appears to me to have been deliberately designed to slot into this place. This made me wonder about two things. First, why are there not more systems designed to slot between two existing levels of abstraction? Second, why have I never heard the term middle-level of abstraction before?

A couple of answers immediately suggested themselves to me; no doubt others can think of better ones. Perhaps systems designed to slot between two existing levels are, like Objective-C, more likely than not to be worse than the things on either side of them? Alternatively, perhaps under normal circumstances we only need to think of one thing at a higher level of abstraction and one at a lower level in order to work satisfactorily, and what comes in the middle is generally irrelevant? Whatever the reason may be, I suspect we’re unlikely ever to hear the term middle-level of abstraction used much: it may remain forever the missing level of abstraction.

2009-09-15 08:00