SREP

In my opinion, what separates the men from the boys when it comes to programming is debugging. It doesn’t matter how good one is: bugs are an inevitable part of a programmer’s life. The difference in the amount of time it takes different people to notice a bug, track down its cause, and provide a fix can be quite amazing. Maybe I am a troglodyte, but I believe that the best advice I have seen on the subject of debugging is from Brian Kernighan, who once said that the best tool for debugging is printf and common sense. Frankly, I have never found debuggers of any practical use (apart from when using those languages so crude that one needs a debugger to view a stack trace).

There is, however, one other technique in my debugging armoury, and it involves the humble grep utility. [For those unfamiliar with grep, it searches through one or more files for lines matching a given regular expression.] I use it to hunt for every occurrence of a function name or data type in a large code base while trying to track down a problem. If I have a hunch about a possible problem, this often enables me to track down the offending calling code far faster than any other mechanism I am aware of. A very handy idiom that I use is the following, which finds every file containing Func and loads it straight into my text editor:

grep -iRl Func | xargs nedit

This does a case-insensitive (-i), recursive (-R) search and then prints out just the filenames of matching files (-l). Cunning uses of grep’s regular expressions can result in a very powerful debugging aid which unfortunately seems to be severely underutilized by most people.

Despite my fondness for grep, I have always felt that it is lacking in one important regard: it cannot replace what it matches with another string. I have therefore long had a simple utility in my ~/bin directory of useful little programs which was a simple wrapper around the sub function in Python’s regular expression library. It essentially did a recursive search through a list of files, replacing every match of the regular expression R with the string S. I have used this utility extensively for debugging and non-debugging related purposes and it is incredibly useful. However, it is something of a crude tool. Experience has taught me that a regular expression often matches more things than one intended, and that it is therefore a very good idea to take a backup of all relevant data before running the utility.
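For the curious, the kind of wrapper described above can be sketched in a few lines of Python. This is not the original utility, only a minimal illustration of the idea: the function name replace_in_tree and its arguments are made up for this sketch, and it walks a directory tree rather than taking a list of files.

```python
import os
import re


def replace_in_tree(root, pattern, replacement):
    """Replace every match of `pattern` with `replacement` in files under `root`.

    Hypothetical helper, not the original utility. Files are rewritten in
    place, so take a backup first. Returns the paths that were changed.
    """
    regex = re.compile(pattern)
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # surrogateescape lets non-UTF-8 bytes survive the round trip;
            # newline="" stops Python translating line endings.
            with open(path, encoding="utf-8", errors="surrogateescape",
                      newline="") as f:
                old = f.read()
            new = regex.sub(replacement, old)
            if new != old:
                with open(path, "w", encoding="utf-8",
                          errors="surrogateescape", newline="") as f:
                    f.write(new)
                changed.append(path)
    return changed
```

Calling replace_in_tree(".", r"Con_Func_Obj", "Con_Func_Seg") would rewrite every file under the current directory that contains a match, which is exactly why a backup is a good idea.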

Recently I have had much cause to make use of my simple utility on an evolving code base. Continually backing up data, and refining a regular expression until it matches only its intended target, is highly repetitive and tedious, and in my experience anything that is repetitive and tedious leads, sooner or later, to boredom-induced errors. So I sat down and quickly cooked up a new variant of my utility which I have flippantly named srep (Search and REPlace). srep has one novelty of particular interest: rather than always directly modifying files, it can be told to produce output in a format acceptable to the patch utility, i.e. it outputs diffs. This has some interesting benefits:

  • One can inspect the diff output of srep, and check that it has only matched against what was expected.
  • One can edit any incorrect changes manually.
  • The standard patch utility can be used to actually commit the changes verified by the user to the data in question.
  • If for any reason the changes turn out not to be correct, running patch with the -R (‘reverse’) switch backs out the changes from the data in question.
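The diff-instead-of-modify idea can be sketched with Python’s standard difflib module. This is an illustration rather than srep’s actual code: the function name diff_for_replacement is invented here, and it works on a string held in memory rather than reading the file itself.

```python
import difflib
import re


def diff_for_replacement(path, text, pattern, replacement):
    """Return a unified diff of applying the substitution to `text`.

    Hypothetical helper for illustration. Nothing is written to disk; an
    empty string means the pattern did not match anything.
    """
    new_text = re.sub(pattern, replacement, text)
    diff = difflib.unified_diff(
        text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=path,  # the same name on both sides, as in a patch
        tofile=path,    # meant to be applied to the file in place
    )
    return "".join(diff)
```

The returned text can be saved to a file, inspected or edited by hand, and only then handed to patch.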

Here’s an example of using srep on a code base of C files. The following command executes srep on all .c and .h files in and below the current directory, and outputs a unified diff (-u) into the changes file.

find . | grep "\\.[ch]$" | xargs srep -u Con_Func_Obj \
  Con_Func_Seg > changes

A fragment of the changes file is as follows (the full unified diff can be found here)

--- ./VM.c Sun Oct  2 14:31:38 2005
+++ ./VM.c Sun Oct  2 14:31:38 2005
@@ -183,12 +183,12 @@
 Con_Obj * Con_VM_apply(Con_EC_Obj *ec, Con_Obj *func)
 {
     jmp_buf env;
-    Con_Func_Obj *func_seg;
+    Con_Func_Seg *func_seg;
     Con_Obj *return_obj;

     if ((func->seg_c_class != ec->vm->builtins[CON_BUILTIN_FUNC_CLASS]))
         return NULL;
-    func_seg = (Con_Func_Obj *) func;
+    func_seg = (Con_Func_Seg *) func;

     if (func_seg->pc_type == PC_TYPE_C_FUNCTION) {
         if (sigsetjmp(env, 0) == 0) {

Once I have verified that the changes that will be made are what I expect, I can then apply this diff in the normal fashion:

patch -p0 < changes

srep has a useful variant on this, which is to output files in a unified diff but with the additional output from Tim Peters’s ndiff utility. The -n flag tells srep to produce a hybrid unified / ndiff patch such as the following fragment (the full hybrid diff can be found here):

--- ./VM.c     Sun Oct  2 14:31:38 2005
+++ ./VM.c     Sun Oct  2 14:31:38 2005
@@ -154,12 +154,12 @@
 Con_Obj * Con_VM_apply(Con_EC_Obj *ec, Con_Obj *func)
 {
     jmp_buf env;
-    Con_Func_Obj *func_seg;
?             ^^^
+    Con_Func_Seg *func_seg;
?             ^^^
     Con_Obj *return_obj;

     if ((func->seg_c_class != ec->vm->builtins[CON_BUILTIN_FUNC_CLASS]))
         return NULL;
-    func_seg = (Con_Func_Obj *) func;
?                         ^^^
+    func_seg = (Con_Func_Seg *) func;
?                         ^^^

     if (func_seg->pc_type == PC_TYPE_C_FUNCTION) {
         if (sigsetjmp(env, 0) == 0) {

What srep takes from ndiff are the lines beginning with ?, which show you which characters within a line are affected by the diff. This can be very useful when you are trying to visually track which intra-line changes will be made by applying a diff. Unfortunately the patch utility complains about such lines, so one cannot directly feed such a diff into patch. One can however use srep to automatically modify the changes diff into valid patch input by removing all lines starting with ?:

srep "^\\?.*?\n" "" changes

As this example suggests, if srep is run without either the unified (-u) or ndiff (-n) output options, it modifies files in situ.
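The ? hint lines themselves are easy to reproduce with the ndiff function in Python’s difflib module, and stripping them again is a one-line substitution. The sketch below is only an illustration: strip_hint_lines is a made-up name, and the use of re.MULTILINE (so that ^ matches at the start of every line) is an assumption about how such a pattern needs to be applied in Python, not a statement about srep’s internals.

```python
import difflib
import re


def strip_hint_lines(diff_text):
    """Remove every line starting with '?' (hypothetical helper)."""
    return re.sub(r"^\?.*\n", "", diff_text, flags=re.MULTILINE)


old = ["    Con_Func_Obj *func_seg;\n"]
new = ["    Con_Func_Seg *func_seg;\n"]

# ndiff marks removed and added lines with "- " and "+ ", and follows
# similar lines with a "? " line pointing at the characters that differ.
hybrid = "".join(difflib.ndiff(old, new))

# With the hint lines gone, only the "- " and "+ " lines remain.
cleaned = strip_hint_lines(hybrid)
```

Printing hybrid and then cleaned shows the before and after; note that plain ndiff output is not itself in patch format, which is why srep combines the hint lines with a unified diff.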

srep is what I would consider a hack in the best sense of that term. It’s basically a simple idea with a correspondingly simple implementation. I wouldn’t necessarily trust it not to eat my hard disk, although so far I’ve not had any problems; I can however tell you with some confidence that it is grossly resource inefficient, so don’t expect it to run particularly fast. If you’re feeling a touch brave and would like to try out this potentially interesting way to debug and change programs, download the put-together-very-quickly version of srep and feel free to play. And if you think of any new uses for it, please let me know! Perhaps one day someone might make srep a fully fledged utility rather than the slightly ugly hack it currently is.

Updated (October 7 2005): Attributed the paraphrased “common-sense” quote to Kernighan.

2005-10-03 08:00