SREP


October 3 2005
Updated: October 7 2005

In my opinion, what separates the men from the boys when it comes to programming is debugging. No matter how good one is, bugs are an inevitable part of a programmer's life. The difference in the amount of time it takes different people to notice a bug, track down its cause, and provide a fix can be quite amazing. Maybe I am a troglodyte, but I believe that the best advice I have seen on the subject of debugging is from Brian Kernighan, who once said that the best tools for debugging are printf and common sense. Frankly, I have never found debuggers of any practical use (apart from when using those languages so crude that one needs a debugger to view a stack trace).

There is however one other technique in my debugging armoury, and it involves the humble grep utility. [For those unfamiliar with grep, it searches one or more files for matches against a given regular expression]. I use it to hunt for every occurrence of a function name or data-type in a large code base while trying to track down a problem. If I have a hunch about a possible problem, this often enables me to track down the offending calling code far faster than any other mechanism I am aware of. A very handy idiom that I use is the following, which finds every file containing Func and loads it straight into my text editor:

grep -iRl Func | xargs nedit
This does a case-insensitive (-i), recursive (-R) search and then prints just the names of the matching files (-l). Cunning uses of grep's regular expressions can result in a very powerful debugging aid which unfortunately seems to be severely underutilized by most people.
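
For readers without grep to hand, the same kind of search can be roughly sketched in Python. This is only an illustration of what the -i, -R and -l flags do together; the function name files_matching is hypothetical and the sketch makes no attempt to match grep's speed or its handling of binary files.

```python
import re
from pathlib import Path


def files_matching(root, pattern):
    """Yield paths under `root` whose contents match `pattern`, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)   # like grep -i
    for path in sorted(Path(root).rglob("*")):   # like grep -R
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file: skip it and carry on
        if regex.search(text):
            yield path                           # like grep -l: name only
```

Calling list(files_matching(".", "Func")) would then give the list of files to hand to an editor.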

Despite my fondness for grep, I have always felt that it is lacking in one important regard: it cannot replace what it matches with another string. I have therefore long had a simple utility in my ~/bin directory of useful little programs which is a simple wrapper around the sub function in Python's regular expression library. It essentially does a recursive search through a list of files, replacing matches of the regular expression R with the string S. I have used this utility extensively for debugging and non-debugging related purposes and it is incredibly useful. However it is something of a crude tool. Experience has taught me that a regular expression often matches more things than one intended, and that it is therefore a very good idea to take a backup of all relevant data before running the utility.
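
A minimal sketch of such a wrapper, not the actual utility, might look like the following. The function name replace_in_files and the .orig backup suffix are hypothetical choices made for the illustration; the only real API relied upon is Python's re.sub (via a compiled pattern).

```python
import re
import shutil
from pathlib import Path


def replace_in_files(paths, pattern, replacement, backup_suffix=".orig"):
    """Replace every match of `pattern` with `replacement` in each file.

    Returns the list of files that were changed. A copy of each changed
    file is kept next to it with `backup_suffix` appended to its name.
    """
    regex = re.compile(pattern)
    changed = []
    for path in map(Path, paths):
        old = path.read_text()
        new = regex.sub(replacement, old)
        if new != old:
            # Back up first: a regex often matches more than intended.
            shutil.copy2(path, path.with_name(path.name + backup_suffix))
            path.write_text(new)
            changed.append(path)
    return changed
```

Taking the backup inside the loop automates the precaution described above, but it does nothing to help one check the matches before the files are modified.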

Recently I have had much cause to make use of my simple utility on an evolving code base. Continually backing up data, and refining a regular expression until it matches only its intended target, is highly repetitive and tedious, and in my experience anything that is repetitive and tedious leads, sooner or later, to boredom-induced errors. So I sat down and quickly cooked up a new variant of my utility which I have flippantly named srep (Search and REPlace). srep has one novelty of particular interest: rather than always directly modifying files, it can be told to produce output in a format acceptable to the patch utility, i.e. it outputs diffs. This has some interesting benefits:

  • One can inspect the diff output of srep, and check that it has only matched against what was expected.
  • One can edit any incorrect changes manually.
  • The standard patch utility can be used to actually commit the changes verified by the user to the data in question.
  • If for any reason the changes turn out not to be correct, running patch with the -R ('reverse') switch backs out the changes from the data in question.
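
The core idea can be sketched with Python's standard difflib module. This is an assumption about one way to do it rather than a description of srep's internals; the function name replacement_diff is hypothetical, and the sketch assumes the text ends with a newline.

```python
import difflib
import re


def replacement_diff(filename, text, pattern, replacement):
    """Return a unified diff (as a string) of applying the replacement.

    Nothing is written to disk; an empty string means nothing matched.
    """
    old_lines = text.splitlines(keepends=True)
    new_lines = re.sub(pattern, replacement, text).splitlines(keepends=True)
    # Same name on both sides so that patch modifies the file in place.
    return "".join(
        difflib.unified_diff(old_lines, new_lines,
                             fromfile=filename, tofile=filename)
    )
```

Because the original file is never touched, the resulting text can be read, hand-edited, and only then given to patch.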
Here's an example of using srep on a code base of C files. The following command executes srep on all .c and .h files in and below the current directory, and outputs a unified diff (-u) into the changes file:
find . | grep "\\.[ch]$" | xargs srep -u Con_Func_Obj \
  Con_Func_Seg > changes
A fragment of the changes file is as follows (the full unified diff can be found here):
--- ./VM.c Sun Oct  2 14:31:38 2005
+++ ./VM.c Sun Oct  2 14:31:38 2005
@@ -183,12 +183,12 @@
 Con_Obj * Con_VM_apply(Con_EC_Obj *ec, Con_Obj *func)
 {
     jmp_buf env;
-    Con_Func_Obj *func_seg;
+    Con_Func_Seg *func_seg;
     Con_Obj *return_obj;
 
     if ((func->seg_c_class != ec->vm->builtins[CON_BUILTIN_FUNC_CLASS]))
         return NULL;
-    func_seg = (Con_Func_Obj *) func;
+    func_seg = (Con_Func_Seg *) func;
     
     if (func_seg->pc_type == PC_TYPE_C_FUNCTION) {
         if (sigsetjmp(env, 0) == 0) {
Once I have verified that the changes that will be made are what I expect, I can then apply this diff in the normal fashion:
patch -p0 < changes
srep has a useful variant on this, which is to output a unified diff augmented with extra output from Tim Peters's ndiff utility. The -n flag tells srep to produce a hybrid unified / ndiff patch such as the following fragment (the full hybrid diff can be found here):
--- ./VM.c     Sun Oct  2 14:31:38 2005
+++ ./VM.c     Sun Oct  2 14:31:38 2005
@@ -154,12 +154,12 @@
 Con_Obj * Con_VM_apply(Con_EC_Obj *ec, Con_Obj *func)
 {
     jmp_buf env;
-    Con_Func_Obj *func_seg;
?             ^^^
+    Con_Func_Seg *func_seg;
?             ^^^
     Con_Obj *return_obj;
 
     if ((func->seg_c_class != ec->vm->builtins[CON_BUILTIN_FUNC_CLASS]))
         return NULL;
-    func_seg = (Con_Func_Obj *) func;
?                         ^^^
+    func_seg = (Con_Func_Seg *) func;
?                         ^^^
     
     if (func_seg->pc_type == PC_TYPE_C_FUNCTION) {
         if (sigsetjmp(env, 0) == 0) {
What srep takes from ndiff is the lines beginning with ?, which show which characters within a line are affected by the diff. This can be very useful when you are trying to visually track which intra-line changes will be made by applying a diff. Unfortunately the patch utility complains about such lines, so one cannot directly feed such a diff into patch. One can however use srep to automatically turn the changes diff into valid patch input by removing all lines starting with ?:
srep "^\\?.*?\n" "" changes
As this example suggests, if srep is run without either the unified (-u) or ndiff (-n) output options, it modifies files in situ.
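
The ? lines, and the effect of stripping them, can be seen in a small standalone sketch using difflib.ndiff from Python's standard library. Note that this shows plain ndiff output rather than srep's hybrid format, and the use of re.MULTILINE is an assumption about how such a pattern needs to be applied, not a statement about srep's implementation.

```python
import difflib
import re

old = ["    Con_Func_Obj *func_seg;\n"]
new = ["    Con_Func_Seg *func_seg;\n"]

# ndiff emits "- " and "+ " lines, plus "? " lines marking changed characters.
hybrid = "".join(difflib.ndiff(old, new))

# Drop every line starting with "?"; re.MULTILINE lets ^ match at each line start.
cleaned = re.sub(r"^\?.*\n", "", hybrid, flags=re.MULTILINE)

print(hybrid)
print(cleaned)
```

After the substitution only the - and + lines remain, which is the part of the output that patch understands.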

srep is what I would consider a hack in the best sense of that term. It's basically a simple idea with a correspondingly simple implementation. I wouldn't necessarily trust it not to eat my hard disk, although so far I've not had any problems; I can however tell you with some confidence that it is grossly resource inefficient, so don't expect it to run particularly fast. If you're feeling a touch brave and would like to try out this potentially interesting way to debug and change programs, download the put-together-very-quickly version of srep and feel free to play. And if you think of any new uses for it, please let me know! Perhaps one day someone might make srep a fully fledged utility rather than the slightly ugly hack it currently is.

Updated (October 7 2005): Attributed the paraphrased "common-sense" quote to Kernighan.
