Filling in a Gap

March 21 2007

One of life's little pleasures is filling in a gap and then making a new connection between the altered object and one of its neighbours. Performing this action on software is just as satisfying as doing the same to any real-world object.

Recently I've made a previously lop-sided part of Converge much more internally consistent, and in so doing realised a useful new connection between two language features. Since it concerns a unique part of Converge, it makes an interesting little study. In essence, Converge has a macro-like system that inverts the traditional LISP/Scheme style. In LISP, macros are special constructs, and some ordinary-looking function calls are in fact macro calls. In Converge (as in Template Haskell), macros are normal functions or expressions, but macro calls are explicitly identified. In the following example, m is a normal function that is, intuitively, used as a macro by the splice $<...> in main:

import Sys

func m():
  return [| Sys::println("m") |]

func main():
  $<m()>
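For readers more familiar with Python, the inverted style can be loosely mimicked with Python's standard ast module. This is only an analogy, not how Converge works: m is an ordinary function that returns an AST (standing in for the quasi-quote), and the "splice" is an explicit step that replaces a placeholder statement in main with m()'s result before compiling.

```python
import ast

def m():
    # An ordinary function that returns an AST, playing the role of
    # Converge's quasi-quote [| Sys::println("m") |].
    return ast.parse('print("m")').body[0]

# Source for main, with a placeholder statement marking the splice point.
tree = ast.parse("def main():\n    pass\n")
main_def = tree.body[0]

# The explicit "splice": call m() at transformation time and put its AST
# where the placeholder was.
main_def.body[0] = m()
ast.fix_missing_locations(tree)

namespace = {}
exec(compile(tree, "<spliced>", "exec"), namespace)
namespace["main"]()  # prints: m
```

The spliced statement keeps the line number it was given when m built it (line 1 of the string parsed inside m), not the location of the splice.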
The $<...> operator was the first splice operator implemented in Converge, and can be considered the traditional splice operator. It quickly became obvious to me that the following idiom has two practical problems when embedding a DSL:

func my_dsl(dsl_str):
  return [| ... |]

$<my_dsl("""...
...
...""")>
The first problem is an aesthetic one: passing a big string, typically split over multiple lines, to the my_dsl function is ugly, and somehow feels wrong. The second problem is far deeper. If there's an error in the user's DSL input, then the resulting error message will, at best, pinpoint that error as starting at the beginning of the string. Can you imagine debugging a program that only told you which file an error occurred in, but not the line number? In practical terms, it's too painful to contemplate (trust me: I've tried it).

Therefore Converge soon grew a second splice operator which I subsequently named the DSL splice operator. It is used as follows:

func my_dsl(dsl_block, src_info):
  ...

$<<my_dsl>>:
  ...
  ...
  ...
Basically, the DSL splice operator removes the need to wrap DSL input up as a big string. It simply takes the indented block of code underneath the operator as the DSL input, and passes it raw to the DSL implementation function my_dsl (via the dsl_block argument). This solves the aesthetic problem, and also allows a neat solution to error reporting. When the DSL implementation function translates the DSL string into a Converge Abstract Syntax Tree (AST), it can record where each part of the string came from in the user's input by manually adding src infos (source location information) to the AST it creates (these src infos are relative to the src_info argument, but that's more detail than is necessary here). So if an error in the user's input is raised (at compile-time or run-time), the DSL can pinpoint exactly where within the input the error occurred.
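The following is a minimal sketch of this idea in Python rather than Converge. Everything in it is hypothetical: a toy DSL with one "name = value" assignment per line, SrcInfo, Node and DSLError classes invented for illustration, and the assumption that src_info points at the first character of the DSL block. It shows how a DSL implementation function can compute a source location for each node relative to src_info, so that an error points at the offending line rather than at the start of the block.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SrcInfo:
    # Hypothetical source location: file path plus 1-based line and column.
    path: str
    line: int
    col: int

@dataclass
class Node:
    # Hypothetical AST node carrying a list of source locations.
    kind: str
    text: str
    src_infos: list = field(default_factory=list)

class DSLError(Exception):
    pass

def my_dsl(dsl_block, src_info):
    # Toy DSL: one "name = value" per line. Each node's location is
    # computed relative to src_info, the (assumed) start of the block.
    nodes = []
    for offset, raw in enumerate(dsl_block.splitlines()):
        stripped = raw.strip()
        if not stripped:
            continue
        indent = len(raw) - len(raw.lstrip())
        where = SrcInfo(src_info.path, src_info.line + offset, src_info.col + indent)
        if "=" not in stripped:
            raise DSLError(f'File "{where.path}", line {where.line}, '
                           f'column {where.col}: expected "name = value"')
        nodes.append(Node("assign", stripped, [where]))
    return nodes

nodes = my_dsl("x = 1\ny = 2", SrcInfo("test.cv", 10, 3))
print(nodes[1].src_infos[0])  # SrcInfo(path='test.cv', line=11, col=3)

try:
    my_dsl("x = 1\noops", SrcInfo("test.cv", 10, 3))
except DSLError as e:
    print(e)  # File "test.cv", line 11, column 3: expected "name = value"
```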

So that was where things stood for a long time: the traditional splice operator spliced in unmodified ASTs, and the DSL splice operator spliced in ASTs with extra src infos.

Recently it occurred to me that something was missing. Consider the following code:

func f():
  return [| 1.foo |]

func main():
  $<f()>
Since integers don't have a foo slot, this raises a run-time error such as:
Traceback (most recent call at bottom):
  1: File "test.cv", line 2, column 12
Slot_Exception: No such slot 'foo' in instance of 'Int'.
where line 2 refers to the line within f which generates the AST. If f is only called from one splice, this exception gives one enough information to debug the problem (if somewhat indirectly). But if there are two separate splices which call f, one can't distinguish which of those two calls led to the incorrect AST being generated.

This might sound quite limited, but it is, at worst, on a par with any existing macro system I've come across. Many macro systems don't record any error information when creating or splicing in ASTs. The best I've seen are some Scheme variants, which record the splice location of any error; however, if a complex AST was spliced in, the user is given no clue as to which part of that AST led to the error.

At this point, the analogy with the DSL splice operator should be obvious, although it escaped me for quite some time: the DSL implementation function called by a DSL splice can customise its error reporting based on the input DSL block. However, what we want for the above example isn't manual customisation of the error reporting: we want it to be created automatically. I therefore recently merged a patch into Converge so that when a traditional splice is performed, the src info for the splice location is automatically added to the spliced-in AST. For the above example, one now gets the following run-time exception:

Traceback (most recent call at bottom):
  1: File "test.cv", line 2, column 12
     File "test.cv", line 5, column 12
Slot_Exception: No such slot 'foo' in instance of 'Int'.
What this means is that the single entry in the call stack is associated with two source file locations. Larger examples show this more clearly. For example, the following exception (created by injecting the same error from above into some real Converge code) shows a backtrace with 3 entries, 2 of which are associated with more than one source location:
Traceback (most recent call at bottom):
  1: File "test.cv", line 125, column 2
  2: File "test.cv", line 54, column 4
     File "test.cv", line 112, column 2
  3: File "test.cv", line 78, column 118
     File "test.cv", line 113, column 3
Slot_Exception: No such slot 'foo' in instance of 'Int'
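The automatic behaviour of the traditional splice can be modelled in a few lines of Python. Again, this is a sketch, not Converge's implementation: SrcInfo, Node, splice and format_entry are hypothetical names, src infos are plain lists on each node, and the splice location is appended after the location where the AST was created, matching the order of the locations in the backtraces above.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SrcInfo:
    path: str
    line: int
    col: int

@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)
    src_infos: list = field(default_factory=list)

def splice(root, splice_loc):
    # Traditional splice (sketch): append the splice location to the
    # src infos of every node in the spliced-in AST.
    stack = [root]
    while stack:
        node = stack.pop()
        node.src_infos.append(splice_loc)
        stack.extend(node.children)
    return root

def f():
    # Models f() from the example: the quasi-quote [| 1.foo |] on line 2
    # records where the AST was created. Each call builds a fresh AST.
    created = SrcInfo("test.cv", 2, 12)
    return Node("slot_lookup", [Node("int", src_infos=[created])], [created])

def format_entry(n, src_infos):
    # Render one backtrace entry; extra locations are indented under the first.
    prefix = f"  {n}: "
    lines = []
    for i, s in enumerate(src_infos):
        lead = prefix if i == 0 else " " * len(prefix)
        lines.append(f'{lead}File "{s.path}", line {s.line}, column {s.col}')
    return "\n".join(lines)

node = splice(f(), SrcInfo("test.cv", 5, 12))
print(format_entry(1, node.src_infos))
# prints:
#   1: File "test.cv", line 2, column 12
#      File "test.cv", line 5, column 12
```

Because f() builds a fresh AST on each call, two splices of f produce nodes tagged with different splice locations, which is what lets the two call sites be told apart. In this sketch, splicing the very same node object twice would append both locations to it; an implementation could copy the AST first to avoid that.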

While implementing this new functionality, it became obvious to me that I had created a symmetry of sorts (in the sense of a mirror image) between the two types of splice. Traditional splices automatically add source information about the splice location, whereas DSL splices leave it to the DSL implementation function to add source information manually. If I can think of shorter names to capture this, I may well retrospectively rename the two splice operators, as this concept is the one that most usefully captures the difference between them. More importantly, this exercise gave me (if no one else) a deeper insight into splicing. I find this sort of insight, which is relatively rare, deeply satisfying; and it all came from solving a little problem, filling a little gap, and then making connections after the fact.

Follow me on Twitter @laurencetratt
