Some Lessons Learned from Icon

[RSS feed]
 

December 3 2007
Updated: December 10 2007
Updated: October 26 2010

Updated (October 26 2007): Before reading this article, please be aware that it has been superseded by the paper Experiences with an Icon-like expression evaluation system.

One of Converge's lesser known influences is Icon. Although it's a relatively obscure language itself, a surprising number of people have heard of Icon's direct predecessor SNOBOL. I first heard of Icon through Tim Peters and Python, where Icon was the inspiration for Python's generator feature (basically procedures that whose execution can be resumed to produce multiple values). However Icon has a much richer, and definitely unique, approach to expression evaluation than any other imperative programming language. When I started working on Converge, I decided to go back to the source, and see how much of Icon's unique approach I could incorporate into Converge.

This article is a personal look back at Icon's influence on Converge, and the successes and failures resulting from that influence. It's quite probable that much of it reflects my slightly superficial understanding of Icon - apart from its innovative approach to expression evaluation, much of the language has an old fashioned feel to it, sometimes appearing like a dynamically typed version of Pascal, and this prevented me from ever wanting to write anything large in it. This isn't a criticism of Icon as such - it was ahead of its time in many ways - but is merely an observation that modern programmers expect something slightly different from their programming languages. Hopefully, even with the above caveat, this article contains something of value to those interested in programming languages.

Why is Icon interesting

In a nutshell, Icon is interesting due to what it calls goal-directed evaluation. This is an evaluation mechanism which gives to an imperative programming language some of the flexibility more often associated with languages like Prolog, such as a limited form of backtracking. It is built around the concepts of success and failure, and generators (see the Converge documentation for more details). Expressions can succeed or fail; if they succeed, they produce a value; if they fail, they do not produce a value and transmit failure to their containing scope. Generators are functions which can produce more than one value.

Icon's syntax is derived from Algol, so the following complete program prints 1 to 9 inclusive:

procedure upto(x)
  i := 0
  while i < x do {
    suspend i
    i := i + 1
  }
end

procedure main()
  every x := upto(10) do write(x)
end
Most of this is probably reasonably intuitive, apart from the every and suspend keywords. every can be considered to be equivalent to the typical understanding of a for statement (indeed, in Converge, the same construct is called for). suspend is equivalent to yield in Python or Converge, returning a value, but allowing the procedure calls execution to be resumed directly after the suspend statement.

Failure and if: a beautiful combination

If I had to pick one thing from Icon that I would not be without in Converge, it would be the concept of failure in if statements. There is something beautiful about saying if this function succeeded, assign the value to this variable and then do xyz. This idiom is used to build a very useful convention that is present throughout Converge's libraries. As in most languages, container items such as dictionaries have a get function which returns the value associated with a certain key; geting a key which doesn't exist generates an error. There is a very common variation on this use case, which is to check whether a dictionary has a certain key, and if it does to do something with its value; otherwise a different code path is taken. In most languages this idiom is expressed roughly as follows:
d := Dict{"a" : 2, "b" : 8}
if d.has_key("a"):
  Sys::println(d["a"])
Not only is the double lookup of a an eyesore, and a maintenance accident waiting to happen, but it can be a significant overhead in tight loops. In Python (and probably other languages), it's common to see the following idiom:
d := {"a" : 2, "b" : 8}
try:
  v = d.has_key("a")
  print v
except KeyError:
  pass
This idiom makes use of the fact that it's nearly always quicker to catch an exception if a key isn't found than to perform two lookups. In my opinion, this idiom is far from pleasant, for reasons that I'm sure most readers will intuitively understand.

In Converge one can express this common idiom thus:

x := Dict{"a" : 2, "b" : 8}
if v := x.find("a"):
  Sys::println(v)
In this example, v only gets assigned a value if find succeeds. There are some other advantages to this approach (e.g. there's no fiddling around checking for null), but the symmetry between get and find, and the consistency of this convention across libraries has proved a real success in Converge. For me, this feature alone justifies the Icon experiment in Converge.

Variables should not spring into existence with a default value

The idiom mentioned in the previous section is, unfortunately, somewhat dangerous in Icon itself, as all variables have a default value of sorts. What this means is that the following Icon code runs without error:
procedure main()
  if 1 < 0 then x := 0
  write(x)
end
To me, this is odd, because x never has a value assigned to it (as the expression in the if statement is clearly false); any errors resulting from such a mistake turn out to be very difficult to debug. In Converge I therefore made it so that reading from a currently unassigned variable raises an error, which protects against programming slips such as the following:
x := Dict{"a" : 2, "b" : 8}
if v := x.find("c"):
  ...
else:
  Sys::println(v)
As the lookup of c fails, v doesn't have a value assigned to it, so reading from v in the else branch of the if statement raises an Unassigned_Var_Exception, pinpointing the offending code.

Procedures shouldn't fail by default

In Icon, the default return value of a procedure is failure. This makes quite a lot of sense for generators such as the following:
procedure upto(x)
  i := 0
  while i < x do {
    suspend i
    i := i + 1
  }
end
As this suggests, the typical action for a generator is (via a loop) to generate several values; once that loop is finished, the generator fails. Thus each Icon procedure effectively has return &fail at its end. Very early versions of Converge inherited this behaviour.

What became quickly apparent is that while this is reasonable behaviour for generators, it can be a disaster for normal procedures. Look at this code and see if you can work out what will happen when it's executed:

procedure f(x)
  if x > 0 then return 1
end

procedure main()
  write(f(-1))
end
If you guessed that nothing would be printed to screen, congratulations. If you didn't, then don't worry, because this frequently surprised me. This happens since the failure of the if's condition means that the procedure executes its default return action, which is to fail. Thus the procedure doesn't return a value, and the call to write is never even evaluated.

Although this might seem trivial, the problem becomes significantly worse when it's embedded in a large body of code. f is an example of what I informally think of as the dangling if return problem - in other words, it's quite easy to write a procedure where different values should be returned, but where one branch of an if incorrectly neglects to actually return anything. I find this occurs quite often during the early stages of development when one is fleshing out functions. On several occasions I spent several hours tracking down instances of this problem.

The question I asked myself was whether this feature was worth the pain. A quick analysis of real world code quickly showed that the vast majority of functions aren't generators, and indeed only ever return a single value. Therefore optimising for the exceptional case (generators) is an odd design choice.

I considered two solutions to the problem. First, one could syntactically differentiate generators and procedures, with the former failing by default and the latter not. Second, one can make all procedures not fail by default. Adding new syntax is a decision that should never be taken lightly, and since generators aren't that common, I couldn't justify the added conceptual complexity. Therefore Converge functions became similar to Python functions, with their default return action being to return the null object.

Generators are useful, but easily hidden

Generators are an integral part of Icon and Converge, but as well as having no specific syntax to define them, there is no specific syntax to call them. Assuming that g is a generator, I have grown quite fond of the following idiom:
l := []
...
for l.append(g())
which appends every element generated by g to l. However the problem in such cases is that it can be difficult to work out what part of the expression is the generator. Converge now tries to solve this problem by prefixing generator function names, whenever it makes sense, with iter_ which highlights that the function in question is a generator. This makes it easier to look at a piece of code and work out what it does without having to understand every function called.

Backtracking is rarely useful

One of the major features of goal-directed evaluation is that backtracking can be performed. The most common way for this to be achieved is by linking expressions together with &. The resulting conjunction of these expressions only succeeds if all of its component expressions succeed. The expressions are evaluated in order. If one of them fails, and a previous expression is a generator, then backtracking occurs. The previous generator is resumed to produce a new value, and the conjunction then resumes execution from that point.

Combining conjunction with for allows compact expressions such as the following to be expressed:

x := "abcabaacvcabcbab"
for Sys::println(i := 0.iter_to(x.len()) & i % 2 == 0 & x[i] == "a" & i)
This prints out every index position in a which is a multiple of two, and which contains the character a (0 6 10 14 for those who are interested).

There are two problems with the above. Most obviously, it is incredibly difficult to read from a human perspective; indeed, it would be far better written as a couple of if statements within a for body. The second problem is that the form of backtracking used is often too weak to be useful. Icon allows variables in conjunctions to have assignments undone during backtracking, but even this isn't really enough (I didn't even bother implementing such functionality in Converge, because I couldn't really see its use). Full backtracking often involves undoing assignments within objects and so on, and no imperative language can ever hope to do this automatically.

Out of all the systems I've written in Converge, only one has made any practical use of the built-in backtracking, and even then it needed some manual help to be truly useful.

The fail variable causes bizarre behaviour

One way in which Icon shows its age is its differentiation between values and references. In keeping with most object orientated languages, Converge makes no such distinction to the user (although within the VM it does so for efficiencies sake). One of Icon's main oddities is that fail is a synonym for return &fail, while &fail is a sort-of reference to an invisible object.

Converge simplified this, so that fail was a variable pointing to a semi-special object (in similar fashion to the null object). However the fail variable proved, in admittedly rare cases, to be conceptually troubling. It's troubling in Icon too, although in slightly different ways. What do you think happens in Icon if I define l as the following and then iterate over each of its elements and print them out?

l := [1, &fail, 3]
If you guessed that an error is raised because l didn't get assigned a value (because evaluating &fail cause the whole thing to fail), pat yourself on the back. Now try this:
l := []
put(l, 1)
put(l, &fail)
put(l, 3)
If you guessed that 1, then 2 get printed out, you're doing very well. And finally consider this:
l := []
put(l, 1)
t := &fail
put(l, t)
put(l, 3)
If you guessed that 1, then a blank line, and then 2 get printed out, you're doing much better than I ever managed to.

Converge's attempts to simplify thus were somewhat successful, but left one hole. While one could successfully evaluate the following list in Converge:

l := [1, fail, 2]
the following mysteriously printed nothing:
Sys::println(l[1])
since l[1] is a synonym for l.get(1), which of course tried to evaluate return fail, which then caused get to fail. While this might seem a contrived example, it unavoidably cropped up when getting a module to return the value of a specific definition, since one of those definitions was fail.

fail was edged out of Converge in stages. A while ago, the recommendation became that the only safe idiom involving fail was return fail. Perhaps surprisingly, this was not an onerous restriction. Nevertheless it still failed to prevent the problem of fail being a definition in a module. Therefore the fail variable was finally banished entirely from Converge recently (although it lives on internally in the VM). fail is now an expression in Converge's grammar, and is equivalent to return &fail in Icon. Thus finally, all of the bizarre behaviour associated with the fail variable is forgotten, and I now sleep easier at night.

Do something different

When I was looking for information in Icon, I stumbled across a few interviews with Ralph Griswold (Icon's main designer), and a paper on its history. One thing that became clear to me was that a major design goal of Icon was not to be just another (boring) synthesis of a few existing language features - like the vast majority of programming languages - but to try out genuinely new things. Have a look at e.g. p38 of this interview or p6 and 13 of this paper looking back at Icon's history. Whatever you might think of Icon, it undoubtedly fulfilled this aim: its expression evaluation system alone has no precedent in contemporary programming languages (and there are other interesting parts of the language).

When I was starting Converge, I agreed whole heartedly with this aim, but had no idea how to achieve it. Indeed, when Converge started, I only had the vague intention of chucking a few seemingly useful influences into the melting pot (initially Python, Icon, ObjVLisp, and ultimately Template Haskell when I'd got the mix of the former three languages about right). I had that vague feeling that either everything interesting had been done before, or that the remaining interesting things were outside of my grasp. To be honest, I sort of gave up on the aim of being innovative, and just concentrated on trying to mix Converge's influences together in a coherent way. What has been interesting is that as Converge has developed, new challenges have presented themselves, and I've had to find answers (some better than others) to such challenges. And in so doing, I think Converge has grown some genuinely innovative features, such as DSL blocks, and its approach to error information for macros amongst others. And in Converge's case - unlike Icon, as I understand it - the journey isn't over. Indeed, since I think Converge is currently nearing the end of the beginning, there's bound to be new challenges ahead, some of which will lead to new innovations of one sort or another, some ultimately successful, some ultimately not.

When I read about Icon's aim, and looked at the end result, I assumed that the innovative ideas in Icon had sprung out of nowhere. I, on the other hand, had no idea how to bring such ideas, fully formed, into the world. What's become clear to me is a restating of Edison's famous rule. Innovation is often a matter of perspiration over inspiration. Now my advice to those who wish to do innovative work is simple: if you put the miles in, you'll raise your head one day and discover that you're in a place where no-one else has ever found themselves in before. It's a good feeling, and one that benefits us all, as collectively we gradually discover what works and what doesn't.

Updated (December 10 2007): Used a clearer example in the Failure and if: a beautiful combination section.

Updated (October 26 2007): This article has been superseded by the paper Experiences with an Icon-like expression evaluation system.

Follow me on Twitter @laurencetratt

Link to this entry

 

All posts

 

Last 10 posts

The Bootstrapped Compiler and the Damage Done
Relative and Absolute Levels
General Purpose Programming Languages' Speed of Light
Another Non-Argument in Type Systems
Server Failover For the Cheap and Forgetful
Fast Enough VMs in Fast Enough Time
Problems with Software 3: Creating Crises Where There Aren't Any
Problems with Software 2: Failing to Use the Computing Lever
Problems with Software 1: Confusing Problems Whose Solutions Are Easy to State With Problems Whose Solutions Are Easy to Realise
Parsing: The Solved Problem That Isn't
 
 

DSLs

Tony Clark
Zef Hemel
 

Modelling

Mark Delgano
Steven Kelly
Jim Steel
 

OS

Marc Balmer
Ross Burton
Peter Hansteen
OpenBSD Journal
Ted Unangst
 

Programming

Peter Bell
Gilad Bracha
Tony Clark
Cliff Click
William Cook
Jonathan Edwards
Daniel Ehrenberg
Fabien Fleutot
Martin Fowler
John Goerzen
Grace
James Hague
James Iry
JOT
Ralf Laemmel
Lambda the Ultimate
Daniel Lemire
Michael Lucas
Bertrand Meyer
Keith Packard
Havoc Pennington
Brown PLT
John Regehr
Software Engineering Radio
Diomidis Spinellis
Shin Tai
Markus Voelter
Phil Wadler
Russel Winder
Steve Yegge