The Missing Level of Abstraction?


September 15 2009

Levels of abstraction

I feel fairly confident in stating that everyone who is familiar with the details of computing will have encountered the phrases "high level of abstraction" and "low level of abstraction" - probably rather often. Abstraction is one of those words which is used rather more frequently than it is considered. In short, an X that is an abstraction of Y implies three things in our context:
  1. That X is at a higher level of abstraction than Y (alternatively one could say that X is more abstract than Y).
  2. That X does not introduce anything fundamentally new over Y; indeed, it may well remove fundamental things, or present them in an easier fashion.
  3. Assuming that one does not need any of the fundamental things in Y that may have been lost in the abstraction X, then X is in some sense easier to use than Y.
Abstractions abound in computing, as they do in life in general. To take a simple example, the first computers were programmed in machine code (zeros and ones); the second generation, in assembly code, which translated easily into zeros and ones but provided an easier syntax for humans; the third generation and beyond, in languages such as C, whose translation into machine code became increasingly complex. Though they are often nebulous and, indeed, often rather hard to spot and understand, abstractions are what make modern computing possible; life without them would be like trying to walk from Land's End to John o' Groats with one's eyes pointed downwards and half an inch above the ground.

Abstractions are typically relative things. From the statements "X is at a higher level of abstraction than Y" and "Y is at a higher level of abstraction than Z" we can deduce that X is at a higher level of abstraction than Z, but we cannot say that Z is the lowest level of abstraction of all - it may well be at a higher level of abstraction than some other thing of which we are currently unaware.

Regrettably, my professional life has taught me that the notion of relative levels of abstraction is not universally shared, probably because it is harder to understand than the notion of absolute levels of abstraction. The fallacy of absolute levels is commonplace; I first encountered it in the MDA (Model Driven Architecture) world, where the terms PIM (Platform Independent Model) and PSM (Platform Specific Model) are widely abused to imply absolute levels of abstraction. Similarly, one often hears talk of high-level versus low-level languages as if they were absolutes when they clearly are not. Provided one bears in mind that these notions are always relative, and that omitting the relative qualification is simply an aid to brevity, many things make a good deal more sense.

Objective-C

I have recently been doing some programming in Objective-C for reasons that most people can probably easily guess. Learning a new language is rarely a wasted opportunity, and this has certainly been an interesting experience. As I have noted before, I have a definite soft spot for C as a language. If it wasn't for its brain-dead approach to arrays (which don't know their size, and therefore can't automatically resize themselves, bloating code and causing untold bugs) and, to a lesser extent, the baroque system of standard types that has accreted during decades of porting to different platforms (and consequent misunderstandings), I would not have a bad word to say about it. This praise is of course conditional on the fact that C has a particular niche: it is commonly called a low-level language, because it is little more than a slim layer above assembly code.
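To make the complaint about arrays concrete, here is a minimal sketch in C. The names sum, IntVec, intvec_push and intvec_free are hypothetical and invented for illustration; they are not part of any standard library. Once an array is passed to a function it decays to a pointer, so its length has to travel alongside it, and anything resembling a resizable array has to be built by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A C array decays to a pointer when passed to a function, so the
 * callee cannot ask it for its length: the caller must supply it. */
long sum(const int *xs, size_t len)
{
    long total = 0;
    for (size_t i = 0; i < len; i++)
        total += xs[i];
    return total;
}

/* Hypothetical hand-rolled growable array: the bookkeeping that the
 * language does not do for us. */
typedef struct {
    int *items;  /* heap storage, or NULL when empty */
    size_t len;  /* number of elements in use */
    size_t cap;  /* number of elements allocated */
} IntVec;

/* Appends x, doubling the capacity when the storage is full.
 * Returns 0 on success and -1 if memory could not be allocated. */
int intvec_push(IntVec *v, int x)
{
    assert(v != NULL);
    if (v->len == v->cap) {
        size_t new_cap = v->cap == 0 ? 4 : v->cap * 2;
        int *new_items = realloc(v->items, new_cap * sizeof *new_items);
        if (new_items == NULL)
            return -1;  /* the old storage is still valid */
        v->items = new_items;
        v->cap = new_cap;
    }
    v->items[v->len++] = x;
    return 0;
}

/* Releases the storage and resets the vector to its empty state. */
void intvec_free(IntVec *v)
{
    free(v->items);
    v->items = NULL;
    v->len = v->cap = 0;
}
```

A caller would start from an empty vector (IntVec v = {NULL, 0, 0};), push values with intvec_push(&v, x), and pass both v.items and v.len to sum. Every one of those steps is a place where a length can drift out of sync with the memory it describes.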

Objective-C, alas, I find a more difficult language to like. In small part this is because it generally implies (and did in my case) working under OS X, an operating system beloved of those who do not truly love computers - its idiot-proof lack of customisation and innumerable small bugs came close to breaking me. Objective-C grafts higher-level Smalltalk-like features onto a lower-level C base. The resulting language is not only more verbose than either of its parents - I particularly enjoyed needing to type class attributes into three different places - but distorts many of their defining features. For example, Smalltalk's collection hierarchy (i.e. lists, sets etc.) is a thing of genuine beauty and utility, allowing a syntactically minimalist language to succinctly express complex constraints; since Objective-C doesn't allow blocks (small anonymous functions), not to mention that its collection hierarchy is not as well designed, this is impossible. Another aspect is that C is a statically (if weakly) typed language, catching errors at compile-time that would otherwise cause hard-to-debug segfaults; however, Objective-C's message sending is, like Smalltalk's, dynamically typed. In Smalltalk this is not an issue - type errors are simply triggered, lead to predictable backtraces, and are easily fixed. While some dynamic typing errors in Objective-C lead to similar backtraces, others fall foul of C's free-wheeling approach to memory management and cause horrible run-time results, leading to little more than a splat sound; debugging those is a chore.

Memory management

One of C's many lower-level delights is its approach to memory management: put simply, the user is responsible for all memory allocation and deallocation. The virtue of C's approach is that it's simple to understand (for users) and implement (for compiler and library writers). The disadvantage is that it's easy to use incorrectly; in particular, it's very easy to forget to free memory, causing hard-to-debug memory leaks. A less common, but more dangerous, error is to try to free an already freed chunk of memory (the so-called double free). It is thus reasonable to say that C's memory management is low-level. In contrast, high-level memory management has, in my mind, been largely synonymous with garbage collection, which automatically frees memory (and implicitly prevents double frees). Garbage collection is not without some negative implications - for example, it is notoriously difficult to work out when garbage collection pauses will strike - but overall is generally a good thing. Indeed, garbage collection is now largely ubiquitous outside of systems and embedded programming; anyone weaned on Java (or even much older languages such as Lisp and Smalltalk) will know nothing else.
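The two classic mistakes described above can be sketched in a few lines of C. The helper names copy_string and free_and_clear are hypothetical, chosen only for this illustration; malloc, free, strlen and memcpy are the standard library calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of s, or NULL if allocation fails.
 * The caller owns the result and must free it exactly once:
 *   - never freeing it is a memory leak;
 *   - freeing it twice is a double free, which is undefined behaviour. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;  /* include the terminating '\0' */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* A common defensive idiom: free the memory and null out the caller's
 * pointer. Because free(NULL) is defined to do nothing, calling this
 * twice on the same variable is harmless. */
void free_and_clear(char **p)
{
    assert(p != NULL);
    free(*p);
    *p = NULL;
}
```

The idiom only guards against a double free through the same variable. If a second pointer still refers to the freed memory, nothing in the language stops it from being used or freed again, which is exactly the bookkeeping that garbage collection takes off the programmer's hands.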

As I understand things, modern Objective-C under OS X uses garbage collection. On some other platforms, such as the one I was targeting, there is no garbage collection. I initially assumed that in such cases Objective-C would revert to a standard C-style system of malloc and free - to my surprise, it does not. Instead, Objective-C uses an unusual system of sometimes-implicit, sometimes-explicit memory management. For someone used to traditional high-level garbage collection or low-level C-style memory management, it's rather hard to get your head around: there don't seem to be hard and fast rules; some of the documentation is ambiguous about the user's responsibilities; and there is a good deal of obfuscating cruft which I assume has gathered over time. However, the basic idea is as follows. If a user explicitly allocates memory (via an alloc message), he is responsible for freeing it. If another function allocates memory, it will (or, more accurately, should) be put into a memory pool; when the memory pool is freed, the objects in it are freed too. New memory pools can be created, and they are kept in a stack so objects are put into the most recent memory pool.
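The basic idea of a stack of memory pools can be sketched in plain C. This is an illustration of the concept as described above, not Objective-C's actual implementation; every name here (Pool, pool_push, pool_alloc, pool_pop, and the fixed size limits) is an assumption made for the sake of the example.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_POOLS 8     /* arbitrary limits, chosen to keep the sketch short */
#define MAX_OBJECTS 64

/* A pool simply remembers the objects that were handed out through it. */
typedef struct {
    void *objects[MAX_OBJECTS];
    size_t count;
} Pool;

static Pool pool_stack[MAX_POOLS];
static size_t pool_depth = 0;   /* number of pools currently on the stack */
static size_t total_freed = 0;  /* bookkeeping for illustration only */

/* Creates a new pool, which becomes the most recent one.
 * Returns 0 on success, -1 if the stack of pools is full. */
int pool_push(void)
{
    if (pool_depth == MAX_POOLS)
        return -1;
    pool_stack[pool_depth].count = 0;
    pool_depth++;
    return 0;
}

/* Allocates size bytes owned by the most recent pool. The caller does
 * not free the result: it dies when that pool is popped.
 * Returns NULL if the pool is full or malloc fails. */
void *pool_alloc(size_t size)
{
    assert(pool_depth > 0);
    Pool *p = &pool_stack[pool_depth - 1];
    if (p->count == MAX_OBJECTS)
        return NULL;
    void *obj = malloc(size);
    if (obj != NULL)
        p->objects[p->count++] = obj;
    return obj;
}

/* Frees the most recent pool along with every object put into it. */
void pool_pop(void)
{
    assert(pool_depth > 0);
    Pool *p = &pool_stack[pool_depth - 1];
    for (size_t i = 0; i < p->count; i++) {
        free(p->objects[i]);
        total_freed++;
    }
    p->count = 0;
    pool_depth--;
}
```

Pushing a pool, allocating inside it, and popping it frees everything allocated in between, while objects in older pools survive until their own pool is popped. The awkward part in practice is knowing which of the two regimes, explicit ownership or pool ownership, a given object falls under.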

Memory pools are an attempt to allow implicit memory allocation (something which is, for good reason, deeply frowned on in traditional C libraries) with implicit memory deallocation. In practice, they resemble a poor man's attempt at garbage collection or, perhaps, garbage collection as designed by someone who hadn't fully grasped the idea. For the life of me, I cannot fully prevent memory leaks in a medium-sized Objective-C system; ironically, I have found it much easier to debug memory leaks in pure C applications. Different parts of code can retain and release objects, which I assumed was a synonym for reference counting, but in reality it doesn't always seem to quite work: what, according to the documentation, are valid combinations of calls can cause crashes or leaks. Was this due to bugs in my code, someone else's, or a flaw in the memory management design? Who knows - at some point, I got the leaks down to the level of a few bytes a minute and gave up.
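For comparison, this is what I would expect retain and release to mean if they were plain reference counting. It is a sketch in C under that assumption, not a description of how Objective-C implements them; Object, object_new, object_retain, object_release and the live_objects counter are all hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    unsigned refcount;  /* number of owners still holding the object */
    int value;          /* stand-in for the object's real contents */
} Object;

static int live_objects = 0;  /* bookkeeping for illustration only */

/* Creates an object with a single owner: the caller.
 * Returns NULL if allocation fails. */
Object *object_new(int value)
{
    Object *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->refcount = 1;
    o->value = value;
    live_objects++;
    return o;
}

/* Registers one more owner and returns the same object. */
Object *object_retain(Object *o)
{
    assert(o != NULL && o->refcount > 0);
    o->refcount++;
    return o;
}

/* Drops one owner. The object is freed when the last owner lets go.
 * Returns 1 if the object was freed, 0 if other owners remain. */
int object_release(Object *o)
{
    assert(o != NULL && o->refcount > 0);
    if (--o->refcount > 0)
        return 0;
    free(o);
    live_objects--;
    return 1;
}
```

Under this model every retain must be balanced by exactly one release: one release too few is a leak, and one too many frees the object while someone still holds it. Those are the same two failure modes as with malloc and free, only spread across more call sites.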

The middle-level of abstraction

A good question at this point is: what does all this have to do with abstractions? Well, the two examples above are instances of something that I've seen once or twice before, but which is hardly common. In normal computing conversation, C is at a low level of abstraction, and Smalltalk is at a high level. As I wrote earlier, levels of abstraction are relative, but that doesn't mean we can't talk about the gaps between two levels. Objective-C, which is a child of these two languages, explicitly pitches itself as being the middle-level of abstraction between C and Smalltalk. Similarly, its memory management is the middle-level of abstraction between malloc/free and garbage collection.

The interesting thing to me is not that Objective-C is the middle-level of abstraction between C and Smalltalk as such - after all, given two points on the abstraction scale, there's a high chance that there are other levels in between - but that it appears to me to have been deliberately designed to slot into this place. This made me wonder about two things. First, why are there not more systems designed to slot between two existing levels of abstraction? Second, why have I never heard the term middle-level of abstraction before?

A couple of answers immediately suggested themselves to me; no doubt others can think of better ones. Perhaps systems designed to slot between two existing levels are more likely than not (as with Objective-C) to be worse than the things on either side of them? Alternatively, perhaps under normal circumstances we only need to think of one higher-level thing and one lower-level thing in order to work satisfactorily, and what comes in the middle is generally irrelevant? Whatever the reason may be, I suspect we're unlikely to ever hear many uses of the term middle-level of abstraction - it may remain forever the missing level of abstraction.

Follow me on Twitter @laurencetratt
