snare: a Minimalistic GitHub Webhooks Runner

Starting with extsmail, I seem to have developed the accidental hobby of occasionally developing new tools in the Unix tradition. In this post I’m going to introduce snare, a minimalistic GitHub webhooks runner daemon for those cases where something like GitHub Actions doesn’t give you sufficient control. I’ve been using snare in production for over 2 years for tasks ranging from running CI on specific hardware, to sending out email diffs, to automatically restarting Unix daemons when their configuration has been changed. It’s a simple, but now stable, tool [1].

What is a webhooks runner? Well, whenever something happens to a GitHub repository – for example, a new PR is raised, or a commit is pushed to a PR – GitHub can send an HTTP request to a specified URL informing it of what happened. Configuring a webhook for a given GitHub repository is relatively simple: go to that repository, then Settings > Webhooks > Add webhook.

snare is a simple program which listens for GitHub webhook events and runs Unix shell commands based on them. You need a server of your own to run snare on. When you add a GitHub webhook, you then need to specify http://yourmachine.com:port/ [2], a secret (in essence, a shared password between GitHub and snare) and then choose which events you wish GitHub to deliver. For example, the default “Just the push event” works well if you want to send email diffs whenever someone pushes commits to a repository.

snare needs a configuration file to tell it what to do when an event comes in. A very simple snare.conf file looks as follows [3]:

listen = "<ip-address>:<port>";

github {
  match ".*" {
    cmd = "somecmd";
    secret = "<secret>";
  }
}

In essence, snare will listen on <ip-address>:<port> for webhook events, verifying that they were created with the secret <secret>. Each request is relative to a repository: match blocks match against an “owner/repository” string using Rust’s regex crate for regular expressions. Thus ".*" matches against any repository and the Unix shell command somecmd will be run when (any) event is received for that repository.
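To make the role of the secret concrete: GitHub computes an HMAC-SHA256 of each request body, keyed with the shared secret, and sends it in the X-Hub-Signature-256 header, so that the receiver can recompute it and compare. snare performs its own check, so the following is only a minimal sketch of the idea and not snare’s implementation. It assumes the openssl command-line tool is installed; sign_body and verify_body are hypothetical helper names, and the body and secret are made up.

```shell
#! /bin/sh

set -euf

# Print the signature GitHub would send for the request body read from stdin.
# $1 is the shared secret. (Passing a secret as an argument exposes it to
# other local users via ps: acceptable in a sketch, not in production.)
sign_body() {
    digest=$(openssl dgst -sha256 -hmac "$1" | sed 's/^.*= *//')
    printf 'sha256=%s\n' "$digest"
}

# Succeed only if the signature in $2 matches the body on stdin under the
# secret $1. A real implementation should use a constant-time comparison.
verify_body() {
    [ "$(sign_body "$1")" = "$2" ]
}

body='{"zen":"Keep it logically awesome."}'
sig=$(printf '%s' "$body" | sign_body "mysecret")
if printf '%s' "$body" | verify_body "mysecret" "$sig"; then
    echo "signature ok"
fi
```

Changing either the body or the secret changes the digest, which is why a request that wasn’t created with the right secret can be rejected before any command is run.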

Let’s imagine we want to send out email diffs when someone pushes to the repositories “owner/repo1” or “owner/repo2”. We might create a github block along the lines of the following:

github {
  match "owner/repo[12]" {
    cmd = "ghemaildiff %o %r %e %j email@example.com";
    secret = "<secret>";
  }
}

This only matches against the particular repositories we wanted to match. The command we’re now going to execute is called ghemaildiff (I’ll show an example of this below) and it takes five or more arguments: the repository’s owner (%o), name (%r), the GitHub event type (%e), a path to the full JSON of the GitHub event (%j), and one or more email addresses to send diffs to. As you’ve probably guessed, snare searches for text like %e and replaces it with other text; %% escapes percentage characters, should you need to do so.
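As a rough illustration of what this substitution amounts to, here is a minimal sketch in shell. It is not how snare does it internally: expand_cmd is a hypothetical function, the owner, repository and JSON path are made up, and unknown escapes are simply passed through (snare itself may well treat them differently).

```shell
#! /bin/sh

set -euf

# expand_cmd <template> <owner> <repo> <event> <json-path>
# Scan the template left to right, replacing each escape as it is met, so
# that "%%" always yields a literal "%" and is never re-read as an escape.
expand_cmd() {
    tmpl=$1
    out=
    while [ -n "$tmpl" ]; do
        case $tmpl in
            %o*) out=$out$2; tmpl=${tmpl#??} ;;
            %r*) out=$out$3; tmpl=${tmpl#??} ;;
            %e*) out=$out$4; tmpl=${tmpl#??} ;;
            %j*) out=$out$5; tmpl=${tmpl#??} ;;
            %%*) out="${out}%"; tmpl=${tmpl#??} ;;
            *)
                # Copy one ordinary character across unchanged.
                rest=${tmpl#?}
                out=$out${tmpl%"$rest"}
                tmpl=$rest
                ;;
        esac
    done
    printf '%s\n' "$out"
}

expand_cmd 'ghemaildiff %o %r %e %j email@example.com' \
    owner repo1 push /tmp/event.json
```

With those made-up values, the final call prints ghemaildiff owner repo1 push /tmp/event.json email@example.com.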

One of the big problems when executing commands like this is when something goes wrong – it’s easy for the error to sit unnoticed in a log. Instead, snare allows one to add an errorcmd, which is very similar to cmd, except that a) it’s only executed when cmd fails, and b) it has an additional %s modifier, which is a path to a file with the stdout / stderr of the failed command. I typically use it as follows:

github {
  match "owner/repo[12]" {
    cmd = "ghemaildiff %o %r %e %j email@example.com";
    errorcmd = "cat %s | mailx -s \"snare error: github.com/%o/%r\" email@example.com";
  }
}

so that if executing a command fails, I’m sent an email that helps me debug the problem.

Security

For most purposes, the example configuration above is enough to use snare in anger. However, any program which takes input from a network and runs commands based on it is a security risk. snare tries to reduce these worries by rejecting incoming requests if any part of the input isn’t exactly as expected. The % escape sequences available to cmd are guaranteed to:

  1. satisfy the regular expression [a-zA-Z0-9._-]+
  2. not be the strings “.” or “..”.

This means that the escape sequences are safe to use as shell arguments and/or to be included in file system paths.
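If your own scripts extract further values (for example, from the JSON file) that you want to hold to the same standard, the two rules are easy to express in shell. This is a minimal sketch, not snare’s code: is_safe_component is a hypothetical name, and forcing the C locale is my assumption to keep the character ranges ASCII-only.

```shell
#! /bin/sh

set -euf

# Force the C locale so that ranges such as a-z cover ASCII letters only.
LC_ALL=C
export LC_ALL

# Succeed only if $1 is non-empty, consists solely of the characters
# a-z, A-Z, 0-9, ".", "_" and "-", and is neither "." nor "..".
is_safe_component() {
    case $1 in
        '' | . | ..) return 1 ;;
        *[!a-zA-Z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}

for candidate in grmtools .. 'a/b' 'x;rm'; do
    if is_safe_component "$candidate"; then
        echo "accept: $candidate"
    else
        echo "reject: $candidate"
    fi
done
```

Only grmtools is accepted: “..” is excluded explicitly, and the other two contain characters outside the permitted set.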

However, the user still has to be thoughtful about the commands they run, which boils down to:

  • All input (including JSON files) must be treated as potentially suspect: I urge you to accept input only if it precisely matches the format you expect, rather than merely rejecting input if it does something that you happen to recognise as unexpected or bad. The problem with the latter approach is that it’s easy to overlook things that will subsequently turn out to be bad. Put another way: it’s better to be overstrict and relax later.
  • Use at least set -euf (and perhaps more) in shell scripts so that errors in subcommands cause your script to immediately terminate rather than limp on in a way that you almost certainly didn’t anticipate.
  • Think carefully about who can cause an event to be triggered: for example, if you run webhooks when a pull request is merged, can someone outside your organisation cause a merge to occur?
  • If a command fails, think about whether your errorcmd (if you have one) can unintentionally leak private information.

I am deliberately making the above scary-sounding, because I want to emphasise that you need to use snare in “don’t trust until proven trustworthy” mode. If you do so, I believe that snare can be used in a way that is wholly secure.

An example command

You can execute whatever command you want with snare, but here’s an example ghemaildiff script which creates simple, but useful, diffs which it sends via email:

#! /bin/sh

set -euf

if [ $# -lt 5 ]; then
    echo "Usage: ghemaildiff <owner> <repository> <event> </path/to/JSON> <email_1> [...<email_n>]" >&2
    exit 1
fi

# We only generate diffs for push events
if [ "$3" != "push" ]; then
    exit 0
fi

# jq -r prints the raw strings, without the surrounding JSON quotes
before_hash=$(jq -r .before "$4")
after_hash=$(jq -r .after "$4")
# Under set -e, the script exits here unless both look like commit IDs
echo "$before_hash" | grep -Eq '^[a-fA-F0-9]+$'
echo "$after_hash" | grep -Eq '^[a-fA-F0-9]+$'

owner=$1
repo=$2
shift 4

git clone "https://github.com/$owner/$repo" repo
cd repo
for email in "$@"; do
    git log --reverse -p "$before_hash..$after_hash" | mail -s "Push to $owner/$repo" "$email"
done

Notice that this script doesn’t check inputs which snare has already validated (e.g. $1 is snare’s %o and has thus already been validated as a sensible input) but is careful to check that the git commit IDs extracted via jq satisfy a very narrow regular expression before passing them on as shell arguments.

Advanced configuration

As you can see from the snare.conf man page, snare doesn’t have a huge number of configuration options. That’s deliberate, because I wanted to keep snare simple: snare doesn’t even provide a built-in way to fetch a repository! However, there are two additional configuration tricks that are worth knowing about.

When a request comes in, snare “executes” all the match statements in the config file, from top to bottom: later settings override earlier settings [4]. This allows the user to set, or override, defaults in a predictable manner. Indeed, snare inserts an implicit match block before the user’s configuration:

match ".*" {
  queue = sequential;
  timeout = 3600;
}

I’ll explain queue shortly; the timeout is 1 hour. If, for example, the user has this configuration file:

github {
  match ".*" {
    cmd = "somecmd";
    errorcmd = "cat %s | mailx -s \"snare error: github.com/%o/%r\" abc@def.com";
    secret = "sec";
  }
  match "a/b" {
    errorcmd = "lpr %s";
  }
}

then the following repositories will have these settings:

a/b:
  queue = sequential
  timeout = 3600
  cmd = "somecmd"
  errorcmd = "lpr %s"
  secret = "sec"
c/d:
  queue = sequential
  timeout = 3600
  cmd = "somecmd"
  errorcmd = "cat %s | mailx -s \"snare error: github.com/%o/%r\" abc@def.com"
  secret = "sec"

You can override settings as many times as you want in a file: it’s a powerful technique!
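The “later settings override earlier settings” rule can be sketched in a few lines of shell. This is an illustration only, not how snare is implemented: resolve is a hypothetical function, the rules are the example configuration above flattened to one setting per line (with the implicit defaults first), and I’m assuming that a pattern must match the whole “owner/repository” string. grep -E also uses POSIX extended regular expressions rather than the syntax of Rust’s regex crate, though the two agree on patterns this simple.

```shell
#! /bin/sh

set -euf

# resolve <owner/repo> <setting>
# Read the rules top to bottom; every rule whose pattern matches and which
# sets the requested setting overwrites the previous value.
resolve() {
    value=
    while read -r pattern key setting; do
        if [ "$key" = "$2" ] &&
            printf '%s\n' "$1" | grep -Eqx -- "$pattern"; then
            value=$setting
        fi
    done <<'EOF'
.* queue sequential
.* timeout 3600
.* cmd somecmd
.* errorcmd cat %s | mailx -s "snare error: github.com/%o/%r" abc@def.com
.* secret sec
a/b errorcmd lpr %s
EOF
    printf '%s\n' "$value"
}

resolve a/b errorcmd
resolve c/d errorcmd
```

The first call prints lpr %s, because the a/b rule comes last and overrides the default; the second prints the mailx command, because only the ".*" rule matches c/d.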

By default, snare queues requests for any given repository and only executes the next in the queue when the previous command has finished. This is a safe default, but can lead to undue work and delay, particularly for repositories with significant activity. There are two other queue modes. queue = parallel executes requests in parallel with each other. I’ve not used this much myself, but there are obvious use cases for it.

In contrast, I use queue = evict extensively: it means a repository has a maximum queue length of 1, with any new request coming in replacing the existing queue entry (if it exists). For example, we have many webhooks which build documentation for a repository after a pull request is merged. If several pull requests are merged in quick succession (which is common), there’s no point waiting to build the documentation for all the pull requests: we might as well only build the documentation relating to the “latest and greatest” merge. Note that evict does not stop any currently running job.
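To make the evict behaviour concrete, here is a toy simulation of a single repository’s queue in shell. It only models the description above (one running job, at most one queued request) and says nothing about how snare implements it; arrive, finish and the job names are all made up.

```shell
#! /bin/sh

set -euf

running=    # the job currently executing, if any
pending=    # the single queue slot that evict allows

# A new request arrives for the repository.
arrive() {
    if [ -z "$running" ]; then
        running=$1
        echo "start $1"
    else
        if [ -n "$pending" ]; then
            echo "evict $pending"
        fi
        pending=$1
    fi
}

# The running job finishes; the queued request (if any) starts.
finish() {
    echo "done $running"
    running=$pending
    pending=
    if [ -n "$running" ]; then
        echo "start $running"
    fi
}

arrive merge1
arrive merge2
arrive merge3
finish
finish
```

Three merges arrive in quick succession: merge1 starts immediately and runs to completion, merge2 is evicted from the queue by merge3 without ever running, and merge3 starts once merge1 is done.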

Summary

snare is a niche tool, but I suspect more people could benefit from this niche than currently realise it: certainly, we’ve found ourselves using snare in more ways than I ever expected.

An obvious example is where we use it to automatically rebuild and release websites when a commit is pushed to a repository. Less obviously, we frequently pair it with bors and buildbot. Sometimes that’s because we need to run actions on specific hardware, but there are other simpler uses too. For example, we use it to build grmtools documentation and force push it to a gh-pages branch on every pull request merge: this way the grmtools documentation is always up to date, but we don’t have to share a GitHub access token in the globally visible .buildbot.sh file. I’m sure other people can think of uses for snare which would never have occurred to me!

At a later date, I’ll write a short blog post about my experiences of writing snare in Rust.

2022-05-04 08:00

Footnotes

[1]

Rust’s cargo is, by some distance, the best language package manager I’ve used and, in my opinion, a significant factor in Rust’s success. However, the culture of having many (many!) small dependencies means that it’s not possible to take the traditional Unix approach of OS-level packaging to crates. That means that if I want to make sure that snare users have access to the latest security release of a dependency-of-a-dependency, the expectation is that I release a new version of snare. Most recent updates of snare have thus really just been about updating dependencies.

[2]

If, as I recommend, you want to put snare behind HTTPS, you’ll need to use a reverse proxy (a proxy server that terminates HTTPS and forwards requests on to snare). It would be nice if snare could support HTTPS directly, perhaps including automatic certificate support to avoid problems with untrusted SSL certificates.

[3]

The reason for an explicit github block is that I can imagine snare easily being extended in the future to cope with the webhooks-equivalents for other sites such as GitLab and the like.

[4]

I don’t know who first came up with this style of config file, but it’s certainly become a common idiom in OpenBSD daemons over the years, which is what influenced me.
