Tail Call Optimization

December 21 2004
Updated: April 4 2006

Lambda the Ultimate might well be the last resort of the programming scoundrel, but it can be quite informative on occasion. Plus there's always the unintended entertainment value of those who refuse to accept that people who don't use functional programming very much may still be worthy computer users: it can be genuinely hilarious. It's as if Tony Hancock's spirit had risen from the ashes and decided to inhabit the body of a 21st century academically-inclined computer programmer.

However, I digress. A recent thread on dynamic languages on LtU has some comments on tail call optimization (inevitably courtesy in part of some griping from disgruntled functional programmers), which have prompted me to write down some thoughts I've had on this subject over the past six months or so.

The first question many people will ask is "what is a tail call?". Whilst there's a slightly deeper general principle at work, it's easiest to see this via a simple example of a recursive function. Let's take the standard Fibonacci number generator in a simple Python example:

def fib(i):
  if i == 0:
    return 0
  elif i == 1:
    return 1
  else:
    return fib(i - 1) + fib(i - 2)
and then use this function in a simple interpreter session:
>>> fib(0)
0
>>> fib(1)
1
>>> fib(2)
1
>>> fib(10)
55
Everything seems fine. But if I try to compute the 1000th member of the Fibonacci sequence I find the following:
>>> fib(1000)
Traceback (most recent call last):
  File "fib.py", line 7, in fib
    return fib(i - 1) + fib(i - 2)
  File "fib.py", line 7, in fib
    return fib(i - 1) + fib(i - 2)
RuntimeError: maximum recursion depth exceeded
[I have elided the huge numbers of repetitive exception entries that Python prints.]

Essentially Python is saying "you can't have a function calling a function this many levels deep". Interestingly, Python sets a fairly strict limit to prevent programmers from unintentionally bogging their programs down. However, there is a standard problem here: if one has a recursive function then every time it recurses one needs extra stack space. At some point the stack will run out. If you're a functional programmer, or coding in a similar style, this is not very appealing - one can never be sure if, or when, recursive functions will mysteriously fail.
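Python's depth limit is just a configurable counter inside the interpreter, not the true size of the machine stack. A minimal sketch of running into it, using the standard `sys` module (the function is the doubly-recursive fib from above):

```python
import sys

def fib(i):
    if i == 0:
        return 0
    elif i == 1:
        return 1
    else:
        return fib(i - 1) + fib(i - 2)

# The limit is a per-interpreter setting; 1000 is CPython's default.
sys.setrecursionlimit(1000)
print(sys.getrecursionlimit())

try:
    fib(5000)
except RuntimeError:  # RecursionError in Python >= 3.5 is a subclass
    print("maximum recursion depth exceeded")
```

Raising the limit with `sys.setrecursionlimit` merely postpones the problem: each call still consumes real stack space, so a deep enough recursion eventually crashes the interpreter itself.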

So in order to make this style of programming practical, a way is needed to allow recursive functions to run in a fixed amount of stack space. This is where tail calls come in. Let us first rewrite our Fibonacci generator as follows (shamelessly adapted from John David Stone's & Ben Gum's example):

def fib(i, current = 0, next = 1):
  if i == 0:
    return current
  else:
    return fib(i - 1, next, current + next)

Our function now has a tail call. That is, the branch of the if construct that recursively calls the function is a tail call. What this means is that the very last thing the function does before returning is to recursively call itself. At this point, why do we need to allocate more stack space? The current invocation of the function will never need the stack space again since as soon as it receives a value from the recursive call it will return it to its caller. So instead of allocating stack space, we could simply reuse the stack space used by the current function (and inherit the return address of the current function rather than making the current function the address to which we will return). This way, it doesn't matter how many times the function recursively calls itself - it will only ever use a constant amount of stack space. Taking advantage of this fact is called tail call optimization.
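To see what the optimization buys, here is a hypothetical hand-translation of the tail-recursive fib into the loop that a tail call optimizing implementation would effectively execute (`fib_iter` is my name for illustration, not part of any runtime):

```python
def fib_iter(i, current=0, next_=1):
    # Each iteration overwrites the same local variables, mirroring how
    # tail call optimization reuses a single stack frame instead of
    # pushing a new one per recursive call.
    while i != 0:
        i, current, next_ = i - 1, next_, current + next_
    return current
```

`fib_iter(1000)` now completes happily without going anywhere near the recursion limit, because the "recursion" uses constant stack space.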

The utility of tail call optimization is an accepted part of functional programming: in fact, one has to buy into the undisputed utility of tail call optimization if one wishes to be accepted as a functional programmer by one's peers. What the LtU chaps are unhappy with is that most languages, such as Python and Java, aren't capable of tail call optimization in any form. The standard argument in favour of tail call optimization runs along the lines of "it's so useful, and it doesn't cost anything to implement". It is this latter point which I wish to address. Whilst I agree tail calling is sometimes useful, it is not a feature without costs. Those costs may be worth paying in some, or maybe even all, circumstances but they are most definitely there.

There are two chief costs associated with tail calls and their optimization. The first is fairly obvious, and is evident in the forced rewriting of the Fibonacci function: many functions have their most natural form without tail calls. Thus functions often need to be designed very specifically with tail calls in mind. Writing functions in this style, or rewriting existing functions in this style, can be far from trivial. What's more, as can be seen from the Fibonacci example, the resulting function is frequently hard to understand because it often requires state to be passed around in parameters.

The second cost associated with tail calls is something that I have not seen mentioned elsewhere, which may well mean that I've got the wrong end of the stick. However I suspect it might be the case that it is not mentioned because the problem is only really evident in languages which give decent stack trace error reports when exceptions occur. Consider this: since tail call optimization involves overwriting the stack, what happens if an exception is raised at some point deep within a tail calling part of a program?

Let's go back to our Fibonacci program and assume that, at some point during its execution, an exception relating to the overflow of a numerical operation is raised [in reality, this error is remarkably hard to trigger in modern Python, which generally automatically switches to a non-word representation of integers in the presence of overflow]. We would expect an error report along the following lines:

Traceback (most recent call last):
  File "fib.py", line 13, in ?
    print fib(10)
  File "fib.py", line 9, in fib
    return fib(i - 1, next, current + next)
  File "fib.py", line 9, in fib
    return fib(i - 1, next, current + next)
  File "fib.py", line 9, in fib
    return fib(i - 1, next, current + next)
  File "fib.py", line 9, in fib
    return fib(i - 1, next, current + next)
  File "fib.py", line 9, in fib
    return fib(i - 1, next, current + next)
  File "fib.py", line 3, in fib
    raise OverflowError

Now the key here is that I earlier referred to such reports as "stack traces": this information is tightly related to information kept on the stack detailing function call histories. If our example had tail call optimization turned on, then no such information would be available on the stack in the presence of tail calling. It seems the best we can hope for is a report along the lines of the following:

Traceback (most recent call last):
  File "fib.py", line 13, in ?
    print fib(10)
  File "fib.py", line 9, in fib: tail call, unknown recursion depth
    return fib(i - 1, next, current + next)
  File "fib.py", line 3, in fib
    raise OverflowError

In other words, the report could tell you that one entry in the traceback was at a tail calling location, but that the number of times that tail calling occurred is unknown. However, in this example we can in fact restore our error report to its full glory and still maintain constant stack space. All we need to do is have a counter associated with the stack entry of a tail call detailing how many times the tail call has recursed. As the Fibonacci function calls itself, this value is incremented, and when it returns a value it is decremented. It seems tail calls and decent error reports are compatible with relatively little implementation effort.
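The counter idea can be sketched by modelling the interpreter's frame reuse explicitly in Python; the stack representation and the `"fib.py:9"` location string are invented for illustration:

```python
def fib_traced(i, current=0, next_=1):
    # One stack entry for the tail calling location, plus a counter of
    # how many times that entry has been reused - instead of one entry
    # per recursive call.
    stack = [["fib.py:9", 0]]
    while i != 0:
        stack[-1][1] += 1  # reuse the frame; just bump the counter
        i, current, next_ = i - 1, next_, current + next_
    return current, stack
```

`fib_traced(10)` yields `(55, [["fib.py:9", 10]])`: constant stack space, yet the full recursion depth remains recoverable for an error report.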

Unfortunately, whilst this trick works for tail recursive functions such as the Fibonacci function, I am not aware of how one could generalize it to mutually recursive functions, or to functions which contain two or more tail calling points. Consider the following slightly contrived function:

def f(i, accum = 0):
  if i == 0:
    return accum
  elif i % 2 == 0:
    return f(i - 1, accum + i)
  else:
    return f(i - 1, accum)

This function essentially sums all even integers from 0 to i inclusive. However, this function contains two tail calls and, what's worse, as the function recurses those two tail calls are interleaved (in this case, the pattern of that interleaving is simple and easily determined in advance, but there is no reason why that would be the case in general). As far as I can see, this means that it is now impossible to record, or calculate, a full error report in the presence of tail calls in a static amount of stack space. The previous trick of recording a counter is of little use, since every time tail recursion occurs via a different line of code than the previous call, a new counter needs to be created pointing to the new tail call location: in the worst case, this will lead to exactly the same number of stack levels being recorded as in the unoptimized case.
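To illustrate why the counter trick breaks down, here is a sketch that run-length encodes the tail call sites as the function above executes; the `"line N"` strings stand in for hypothetical source locations:

```python
def f_traced(i, accum=0):
    # Run-length encoded trace: a new [call site, count] entry must be
    # started whenever the tail call comes from a different line than
    # the previous one did.
    trace = []
    while i != 0:
        site = "line 5" if i % 2 == 0 else "line 7"
        if trace and trace[-1][0] == site:
            trace[-1][1] += 1
        else:
            trace.append([site, 1])
        if i % 2 == 0:
            accum += i
        i -= 1
    return accum, trace
```

For `f_traced(6)` the call sites alternate on every step, so the trace ends up with one entry per call - six `[site, 1]` pairs - which is no more compact than keeping the whole stack.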

I don't know whether the designers of languages such as Python and Java have considered this problem, but it may well be that this issue (or something related) is what has prevented them from automatically optimizing tail calls. For me personally, it is currently more important that error reports are full, detailed and comprehensible. If someone can point out to me how tail call optimization can be made compatible with this wish, then I may add such a feature to the Converge VM and compiler, since it is a relatively easy feature to implement. Until then, I fear I have little choice but to take the approach of most other languages, and leave such a feature out, despite the inevitable scorn of some of the LtU crowd.

Updated (April 4 2006): It turns out that other people had noted that tail call optimization has drawbacks when it comes to stack backtraces; there is a very clear hint in the paper "Programming as an Experience: The Inspiration for Self" by Randall B. Smith and David Ungar, although they don't articulate the specific reasons outlined in this entry.
