Filling in a Gap

One of life’s little pleasures is filling in a gap and then making a new connection between the altered object and one of its neighbours. Performing this action on software is just as satisfying as doing the same to any real-world object.

Recently I’ve made a previously lop-sided part of Converge much more internally consistent, and in so doing realised a useful new connection between two language features. Since it concerns a unique part of Converge, it makes an interesting little study. In essence, Converge has a macro-like system that inverts the traditional LISP/Scheme style. In LISP, macros are special constructs, and some ordinary-looking function calls are actually macro calls. In Converge (as in Template Haskell), macros are normal functions or expressions, but macro calls are explicitly identified. In the following example, m is a normal function which is used as a macro by the splice $<...> in main:

func m():
  return [| Sys::println("m") |]

func main():
  $<m()>
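For readers more at home in Python, the same division of labour can be roughly mimicked with the standard ast module. This is only an analogy, not how Converge works internally: m is an ordinary function that returns an AST fragment, and the point where that fragment is inserted into main is marked explicitly by the code that builds main.

```python
import ast


def m():
    # An ordinary function: it only builds and returns an AST fragment,
    # loosely analogous to Converge's `return [| Sys::println("m") |]`.
    return ast.parse('print("m")').body


# A skeleton for `main` whose body will be filled in by an explicit splice.
module = ast.parse("def main():\n    pass\n")

# The explicit "splice point": the result of calling m() is inserted here,
# before the code is compiled. Nothing about the call to m() is implicit.
module.body[0].body = m()
ast.fix_missing_locations(module)

namespace = {}
exec(compile(module, "<spliced>", "exec"), namespace)
namespace["main"]()  # prints: m
```

The contrast with LISP is that nothing in main's call syntax hides a macro call: whoever builds main decides, visibly, where m's output goes.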

The splice operator was the first to be implemented in Converge and can be considered the traditional splice operator. It quickly became obvious to me that the following idiom has two practical problems when embedding a DSL:

func my_dsl(dsl_str):
  return [| ... |]

$<my_dsl("""...
...
...""")>

The first problem is an aesthetic one: passing a big string, typically split over multiple lines, to the my_dsl function is ugly. It seems somehow wrong. The second problem is far deeper. If there’s an error in the user’s DSL input then the resulting error message will, at best, pinpoint that error as starting at the beginning of the string. Can you imagine debugging a program that only told you which file an error occurred in, and not the line number too? In practical terms, it’s too painful to contemplate (trust me: I’ve tried it).

Therefore Converge soon grew a second splice operator which I subsequently named the DSL splice operator. It is used as follows:

func my_dsl(dsl_block, src_info):
  ...

$<<my_dsl>>:
  ...
  ...
  ...

Basically, the DSL splice operator removes the need to wrap DSL input up as a big string. It simply takes the indented block of code underneath the operator as the DSL input, and passes it raw to the DSL implementation function my_dsl (via the dsl_block argument). This solves the aesthetic problem, and also allows a neat solution to error reporting. When the DSL implementation function translates the DSL string into a Converge Abstract Syntax Tree (AST), it can record where each part of the string came from in the user’s input by manually adding src infos to the AST it creates (the src infos are created relative to the src_info argument, but that’s more detail than is necessary here). So if an error in the user’s input is raised (at compile-time or run-time) the DSL can pinpoint exactly where within the input the error occurred.
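To make the "relative to the src_info argument" idea concrete, here is a minimal Python sketch. It is not Converge's API: SrcInfo, Node and the toy `name = value` DSL are hypothetical, and src_info is assumed to be the (1-based) line and column where the DSL block starts. Every node the DSL creates is tagged with a location computed from that starting point, so an error can name the exact line in the user's file rather than the start of an anonymous string.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SrcInfo:
    """A source location (assumed convention: 1-based line and column)."""
    file: str
    line: int
    column: int


@dataclass
class Node:
    """A toy AST node that remembers which source locations produced it."""
    kind: str
    value: str
    src_infos: list = field(default_factory=list)


def my_dsl(dsl_block: str, src_info: SrcInfo) -> list:
    """Hypothetical DSL implementation function.

    Each non-blank line of the form `name = value` becomes an "assign" node.
    Locations are computed relative to `src_info`, the assumed position of
    the block's first line.
    """
    nodes = []
    for offset, text in enumerate(dsl_block.splitlines()):
        if not text.strip():
            continue  # skip blank lines
        indent = len(text) - len(text.lstrip())
        loc = SrcInfo(src_info.file, src_info.line + offset, src_info.column + indent)
        if "=" not in text:
            # The error names the exact line and column in the user's file.
            raise SyntaxError(
                f'File "{loc.file}", line {loc.line}, column {loc.column}: '
                'expected "name = value"'
            )
        name, value = (part.strip() for part in text.split("=", 1))
        nodes.append(Node("assign", f"{name} = {value}", [loc]))
    return nodes


# Suppose the DSL block starts at line 10, column 3 of test.cv.
nodes = my_dsl("x = 1\ny = 2\n", SrcInfo("test.cv", 10, 3))
print(nodes[1].src_infos)  # [SrcInfo(file='test.cv', line=11, column=3)]
```

Had the same input arrived as a plain string through a traditional splice, my_dsl would have no starting location to add offsets to, which is exactly the second problem described earlier.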

So that was where it was left for a long time. The traditional splice operator spliced unmodified ASTs in, and the DSL splice operator spliced in ASTs with extra src infos.

Recently it occurred to me that something was missing. Consider the following code:

func f():
  return [| 1.foo |]

func main():
  $<f()>

Since integers don’t have a foo slot, this raises a run-time error such as:

Traceback (most recent call at bottom):
  1: File "test.cv", line 2, column 12
Slot_Exception: No such slot 'foo' in instance of 'Int'.

where line 2 refers to the line within f which generates the AST. If f is only called from one splice, this exception gives one enough information to debug the problem (if somewhat indirectly). But if there are two separate splices which call f one can’t distinguish which of those two calls led to the incorrect AST being generated.

This might sound quite limited, but it is, at worst, on a par with any existing macro system I’ve come across. Many macro systems don’t record any source location information when creating or splicing in ASTs. The best I’ve seen are some Scheme variants, which record the splice location of any error; however, if a complex AST was spliced in, the user is given no clue as to which part of the AST led to the error.

At this point, the comparison with the DSL splice operator should be obvious, although it escaped me for quite some time: the DSL implementation function called by a DSL splice can customise its error reporting based on the input DSL block. However, what we want for the above example isn’t manual customisation of the error reporting: we want it to happen automatically. I therefore recently merged a patch into Converge so that when a traditional splice is performed, the src info for the splice location is automatically added to the spliced-in AST. For the above example one now gets the following run-time exception:
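The idea behind the patch can be modelled in a few lines of Python. This is a sketch under stated assumptions rather than Converge's implementation: Node, SrcInfo and splice are hypothetical names, and each node is assumed to hold a list of src infos. A traditional splice copies the tree and appends the splice location to every node, keeping the quasi-quote location first, which matches the order of the two locations in the traceback above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SrcInfo:
    file: str
    line: int
    column: int


@dataclass
class Node:
    """A toy AST node carrying every source location it derives from."""
    kind: str
    children: list = field(default_factory=list)
    src_infos: list = field(default_factory=list)


def f() -> Node:
    # Stand-in for `return [| 1.foo |]` on line 2 of test.cv: the quasi-quote
    # tags each node with where it was written inside f.
    here = SrcInfo("test.cv", 2, 12)
    return Node("slot_lookup:foo", [Node("int:1", src_infos=[here])], [here])


def splice(tree: Node, splice_loc: SrcInfo) -> Node:
    """Hypothetical traditional splice: copy the tree, appending the splice
    location to the src infos of every node (existing locations come first)."""
    return Node(
        tree.kind,
        [splice(child, splice_loc) for child in tree.children],
        tree.src_infos + [splice_loc],
    )


# Two separate splices of f(); the second location is made up for illustration.
first = splice(f(), SrcInfo("test.cv", 5, 12))
second = splice(f(), SrcInfo("test.cv", 9, 3))
print(first.src_infos[-1].line, second.src_infos[-1].line)  # 5 9
```

Because the splice location is appended rather than substituted, both pieces of information survive: where in f the faulty AST was written, and which of the two splices put it into the program.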

Traceback (most recent call at bottom):
  1: File "test.cv", line 2, column 12
     File "test.cv", line 5, column 12
Slot_Exception: No such slot 'foo' in instance of 'Int'.

What this means is that the single entry in the call stack is associated with two source file locations. Larger examples show this more clearly. For example, the following exception (created by injecting the same error as above into some real Converge code) shows a backtrace with 3 entries, 2 of which are associated with more than one source location:
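A minimal Python sketch of how such a backtrace could be rendered, assuming each call-stack entry simply carries a list of (file, line, column) tuples. The function name and data layout are hypothetical, not taken from Converge's VM; the output format copies the tracebacks shown in this post.

```python
def format_traceback(frames, exc_name, message):
    """Render a backtrace in which each call-stack entry may map to several
    (file, line, column) locations, e.g. a quasi-quote site plus a splice site.

    `frames` is a list (outermost call first) of lists of location tuples;
    this data layout is an assumption made for the sketch.
    """
    out = ["Traceback (most recent call at bottom):"]
    for number, locations in enumerate(frames, start=1):
        prefix = f"  {number}: "
        for i, (file, line, column) in enumerate(locations):
            # Only the first location shows the entry number; the rest are
            # indented to line up underneath it.
            lead = prefix if i == 0 else " " * len(prefix)
            out.append(f'{lead}File "{file}", line {line}, column {column}')
    out.append(f"{exc_name}: {message}")
    return "\n".join(out)


print(format_traceback(
    [[("test.cv", 2, 12), ("test.cv", 5, 12)]],
    "Slot_Exception",
    "No such slot 'foo' in instance of 'Int'.",
))
```

Run as is, this reproduces the four-line traceback from the earlier 1.foo example; adding more frames, some with extra locations, gives the shape of the larger backtrace below.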

Traceback (most recent call at bottom):
  1: File "test.cv", line 125, column 2
  2: File "test.cv", line 54, column 4
     File "test.cv", line 112, column 2
  3: File "test.cv", line 78, column 118
     File "test.cv", line 113, column 3
Slot_Exception: No such slot 'foo' in instance of 'Int'

So while implementing this new functionality, it became obvious to me that I had created a symmetry of sorts (in the sense of a mirror image) between the two types of splice. Traditional splices automatically add source information about the splice location, whereas DSL splices don’t. If I can think of shorter names that capture this, I may well retrospectively rename the two splice operators, as this is the concept that most usefully captures the difference between them. More importantly, this exercise gave me (if no one else) a more profound insight into splicing. I find this sort of insight, which is relatively rare, deeply satisfying; and it all comes from solving a little problem, filling a little gap, and then making connections after the fact.

2007-03-21 08:00