
Making a Video of a Single Window

August 9 2022

I recently wanted to send someone a video of a program doing some interesting things in a single X11 window. Recording the whole desktop is easy (some readers may remember my post on Aeschylus which does just that) but it will include irrelevant (and possibly unwanted) parts of the screen, leading to unnecessarily large files. I couldn't immediately find a tool which did what I wanted on OpenBSD [1] but through a combination of xwininfo, FFmpeg, and hk I was able to put together exactly what I needed in short order. Even better, I was able to easily post-process the video to shrink its file size, speed it up, and contort it to the dimension requirements of various platforms. Here's a video straight out of the little script I put together:

In this post I'm going to quickly go through what I did. I hope you get two things from this. First, FFmpeg, hk, and xwininfo are great examples of tools whose power can be magnified by combining them with other tools. Second, I'm not presenting an end product as much as I am presenting example recipes which you can alter and adjust to fit your circumstances.

Selecting the right portion of the screen

The first problem is automatically determining what portion of the screen I want to capture. Fortunately, xwininfo does exactly what I want: when I run it, the cursor changes to a "+" and, when I click on a window, it outputs information about that window to stdout. Here's what happens when I click on the editor window I'm typing this post into:

$ xwininfo
xwininfo: Please select the window about which you
          would like information by clicking the
          mouse in that window.

xwininfo: Window id: 0x2800006 "making_a_video_of_a_single_window.incomplete + (~/web/tratt.net/templates/laurie/blog/2022) - NVIM — Neovim"

  Absolute upper-left X:  309
  Absolute upper-left Y:  104
  Relative upper-left X:  5
  Relative upper-left Y:  29
  Width: 3000
  Height: 2400
  Depth: 24
  Visual: 0x57c
  Visual Class: TrueColor
  Border width: 0
  Class: InputOutput
  Colormap: 0x2800005 (installed)
  Bit Gravity State: NorthWestGravity
  Window Gravity State: NorthWestGravity
  Backing Store State: NotUseful
  Save Under State: no
  Map State: IsViewable
  Override Redirect State: no
  Corners:  +309+104  -531+104  -531-56  +309-56
  -geometry 1500x1200+309-56
What I need to extract are:
  1. the window's width and height,
  2. and the coordinates of the window's top-left corner (where 0,0 is the top-left of the screen).
Let's assume I've put xwininfo's output into /tmp/t. I can then easily extract all four pieces of information we need:
$ grep Width /tmp/t \
    | cut -d ":" -f 2 \
    | tr -d " "
3000
$ grep Height /tmp/t \
    | cut -d ":" -f 2 \
    | tr -d " "
2400
$ grep "Absolute upper-left X" /tmp/t \
    | cut -d ":" -f 2 \
    | tr -d " "
309
$ grep "Absolute upper-left Y" /tmp/t \
    | cut -d ":" -f 2 \
    | tr -d " "
104
grep obtains the single line we're interested in, cut selects the text after the colon (":"), and tr removes the leading spaces that cut leaves behind.

Rather than continually copy those numbers around by hand, I'll put them into variables so we can easily reference them in the rest of the post:

width=`grep Width /tmp/t | cut -d ":" -f 2 | tr -d " "`
height=`grep Height /tmp/t | cut -d ":" -f 2 | tr -d " "`
top_x=`grep "Absolute upper-left X" /tmp/t | cut -d ":" -f 2 | tr -d " "`
top_y=`grep "Absolute upper-left Y" /tmp/t | cut -d ":" -f 2 | tr -d " "`
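
Since the same pipeline crops up four times, it can be wrapped in a small helper function. Here's a minimal sketch (the xfield name is my own); as above, it assumes xwininfo's output is in /tmp/t:

xfield() {
    # Pull out the value of a single "Name: value" line from xwininfo's
    # output, stripping the spaces that cut leaves behind.
    grep "$1" /tmp/t | cut -d ":" -f 2 | tr -d " "
}
width=`xfield Width`
height=`xfield Height`
top_x=`xfield "Absolute upper-left X"`
top_y=`xfield "Absolute upper-left Y"`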

Starting and stopping FFmpeg

Now that we know what portion of the screen we want to capture, I can start FFmpeg recording:
ffmpeg \
  -f x11grab \
  -framerate 30 \
  -video_size ${width}x${height} \
  -i +${top_x},${top_y} \
  -c:v libx264rgb \
  -preset ultrafast \
  -crf 0 \
  output.nut
I always record to a lossless format, specifically the NUT format, which is best thought of as FFmpeg's "native" format — since I've been using that, I've had none of the problems I encountered with other formats. I do as little processing as I can get away with when recording (-preset ultrafast does minimal compression, otherwise the resulting files can become unwieldy), because it's easy to do heavy processing as a subsequent step. How many frames per second one prefers is a personal choice, though in the above example I've gone for a neutral 30 [2].

When I'm finished recording, pressing "q" in the terminal running FFmpeg causes it to stop recording. However, this is a bit annoying, because I have to change keyboard focus from the window I'm recording to the one running FFmpeg. If those two windows overlap (bearing in mind that FFmpeg is capturing a fixed portion of the screen), the result of FFmpeg's voluminous terminal output suddenly heaving into view is discombobulating:

What I really want to do is run FFmpeg in the background and stop it when I press a keyboard shortcut. This is a perfect use-case for hk, which waits until a specific key combination has been pressed and then executes an arbitrary shell command:

ffmpeg \
  -f x11grab \
  -framerate 30 \
  -video_size ${width}x${height} \
  -i +${top_x},${top_y} \
  -c:v libx264rgb \
  -preset ultrafast \
  -crf 0 \
  output.nut &
ffmpeg_pid=$!
hk Shift+F4 kill -SIGINT $ffmpeg_pid
wait $ffmpeg_pid
I first run FFmpeg in the background (with "&"). I then tell hk to wait for Shift+F4 to be pressed, at which point hk will run kill, which sends SIGINT to the FFmpeg process (whose PID is stored in the $! shell variable). When FFmpeg receives the SIGINT signal it will stop recording and finish writing to output.nut.

Although we don't need this feature quite yet, if I want to execute commands on output.nut, there is no guarantee that ffmpeg will have finished writing to it when hk terminates — indeed, it's likely that hk will terminate first. It's thus safest to wait [3] until the FFmpeg process has definitely terminated.
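
As an optional extra of my own (not part of the recipe above), a trap placed just before the wait means that interrupting or terminating the script itself also asks the background FFmpeg to stop cleanly:

# My addition: on interruption/termination, forward SIGINT to the background
# FFmpeg so it still finishes writing output.nut before we wait for it.
trap 'kill -s INT $ffmpeg_pid 2>/dev/null' INT TERM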

Post processing

At this point, I have a .nut file which has captured exactly the portion of the screen we want. However, many people are scared by .nut files, and because I did only minimal compression, the .nut file can be alarmingly large.

I thus want to convert the .nut file to a more widely recognised .mp4 file, and heavily compress the video while doing so:

ffmpeg \
  -i output.nut \
  -vf "pad=ceil(iw/2)*2:ceil(ih/2)*2" \
  -c:v libx264 \
  -pix_fmt yuv420p \
  -preset veryslow \
  -crf 10 \
  output.mp4
I've chosen FFmpeg's highest compression level (-preset veryslow). Since I'm also willing to sacrifice a little bit of visual quality in order to get better compression, I've specified -crf 10. Roughly speaking, -crf 0 means "lossless" and every increase of that number by 3 doubles the lossiness. -crf 10 is below the threshold where I can see any real differences in the video. -pix_fmt is a slightly irritating detail that I don't want to go into: suffice it to say that some programs (e.g. Firefox and some versions of VLC) won't play the resulting mp4 if I don't use this option. However, using -pix_fmt in this way means we need a video with dimensions divisible by two, hence the pad video filter (-vf). Unfortunately, video encoding is full of these sorts of bizarre details.

On a few tests, the command above leads to a .mp4 file that's at least 5x smaller, and often 10x or more smaller, than the .nut input.

If you actually try recording yourself typing, you'll probably find that, like me, you type much slower than you expected. In my case, the original videos I recorded would be frustratingly slow for other people, so the videos you've seen above have been sped up by 2x. Comparing the original recording with the 2x-faster version makes the difference obvious. Speeding the video up simply requires FFmpeg's setpts filter:

ffmpeg \
  -i output.nut \
  -vf "\
    setpts=0.5*PTS[v]; \
    [v]pad=ceil(iw/2)*2:ceil(ih/2)*2" \
  -c:v libx264 \
  -pix_fmt yuv420p \
  -preset veryslow \
  -crf 10 \
  output.mp4
What happens if I want to upload the video to a site where videos can't exceed a certain width or height? I don't want to scale the video up if it's smaller than that and I always want to maintain the aspect ratio. Let's say I want a maximum width of 1920 and a maximum height of 1200:
ffmpeg \
  -i output.nut \
  -vf " \
    setpts=0.5*PTS[v]; \
    [v]scale='min(1920,iw)':'min(1200,ih)': \
       force_original_aspect_ratio=decrease[v]; \
    [v]pad=ceil(iw/2)*2:ceil(ih/2)*2 " \
  -preset veryslow \
  -crf 12 \
  -c:v libx264 \
  -pix_fmt yuv420p \
  output_capped.mp4
That means that I can record a huge window (e.g. a browser window at 3366x2468) and automatically scale it down to the largest possible resolution that preserves the aspect ratio (in this case to 1638x1200):
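
If you want to double check what dimensions a capped video ended up with, ffprobe (which ships with FFmpeg) can report them; a quick sketch for the example above:

$ ffprobe -v error -select_streams v:0 \
    -show_entries stream=width,height \
    -of csv=p=0 output_capped.mp4
1638,1200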

An example combination

One can put all the bits above together in various different ways, but at the moment I've bundled them into this simple script:
#! /bin/sh

if [ $# -ne 1 ]; then
    echo "video_single_window <base_file_name>" > /dev/stderr
    exit 1
fi

t=`mktemp`
xwininfo > $t
width=`grep Width $t | cut -d ":" -f 2 | tr -d " "`
height=`grep Height $t | cut -d ":" -f 2 | tr -d " "`
top_x=`grep "Absolute upper-left X" $t | cut -d ":" -f 2 | tr -d " "`
top_y=`grep "Absolute upper-left Y" $t | cut -d ":" -f 2 | tr -d " "`

ffmpeg \
  -f x11grab \
  -framerate 30 \
  -video_size ${width}x${height} \
  -i +${top_x},${top_y} \
  -c:v libx264rgb \
  -preset ultrafast \
  -crf 0 \
  $1.nut &
ffmpeg_pid=$!
hk Shift+F4 kill -SIGINT $ffmpeg_pid
wait $ffmpeg_pid

ffmpeg \
  -i $1.nut \
  -c:v libx264 \
  -pix_fmt yuv420p \
  -vf " \
    setpts=0.5*PTS[v]; \
    [v]pad=ceil(iw/2)*2:ceil(ih/2)*2" \
  -preset veryslow \
  -crf 10 \
  $1.mp4

ffmpeg \
  -i $1.nut \
  -vf " \
    setpts=0.5*PTS[v]; \
    [v]scale='min(1920,iw)':'min(1200,ih)': \
       force_original_aspect_ratio=decrease[v]; \
    [v]pad=ceil(iw/2)*2:ceil(ih/2)*2" \
  -preset veryslow \
  -crf 10 \
  -c:v libx264 \
  -pix_fmt yuv420p \
  $1_scaled.mp4

rm $t
In essence, after running video_single_window base I end up with three files: base.nut (the "raw" recording), base.mp4 (a compressed MP4), and base_scaled.mp4 (a compressed and possibly scaled-down MP4).

One of the best bits about the simplicity of this script is how easy it is to create variants. For example, I created a variant with a different hotkey, which allowed me to record video_single_window base running (at normal speed, so you can see FFmpeg at its real speed; the recording FFmpeg is killed at about 16 seconds in). I've long been impressed by FFmpeg's versatility, but co-opting xwininfo and hk leads to a surprisingly powerful toolbox!

Acknowledgements: thanks to Edd Barrett and Lukas Diekmann for comments.


Footnotes

[1] I could perhaps have ported ssr, but it was too much fun to do things myself!
[2] For the sort of desktop videos I do, 30fps is overkill: I regularly use 10 or 12 fps, and no-one has ever noticed or complained.
[3] Until I'd fully understood how zombie processes come to be zombies (with much debugging in extsmail), I did not realise how unsatisfactory, and potentially unsafe, the Unix PID system and wait are. That said, I've never seen "PID reuse" problems manifest in real life, even though they clearly could. Since there's no alternative, carrying on is the only possibility!

Two researcher jobs in soft-dev

August 8 2022

The soft-dev research team is growing and there are two open jobs, both looking at the security side of systems in the context of CHERI. Roughly speaking, the CapableVMs position is looking to secure programming language virtual machines, and the Chrompartments position is looking to secure web browsers.

Come and join our happy band of researchers! We're open-minded about who the right sort of people might be for either job: you might, for example, be a researcher who wants to work on your programming chops; or a programmer who wants to work on your researcher chops. Most importantly, you need to be enthusiastic about software, partly because the rest of us are, but mostly because, with that, you can learn nearly everything else you need on the job. You do need to be eligible to work in the UK, though we are flexible about where you work within the UK.

If you have questions about either job, please send me an email!


Another Reorganisation

August 1 2022

To every idea there is a season, though some ideas seem to have purpose only under faulty assumptions. In April I decided to rethink how I went about my "informal" writing, which had previously been highly intermittent, rather formal, and interminably long. In When is a Blog a Blog? I renamed my old blog to "essays" and aimed for more frequent, less formal, shorter updates in this blog.

After 4 months it's time to evaluate how this split is working. I have managed rather more frequent updates, in a less formal style, and they are shorter in length. But – and this is perhaps unsurprising for someone whom a school report once pinpointed as prone to "prolixity" [1] – I'd hardly call many of my recent posts short in an absolute sense. Now I've met the new boss, I realise that he acts rather similarly to the old boss.

I have thus been forced to acknowledge that it makes little sense to divide my writing into separate "blogs" and "essays". There is more that unites my writing, for better and worse, than divides it. I have therefore merged all my old "essays" back into my blog. Suddenly my blog archive has grown from 18 entries to 74. At some point I intend creating a separate list of what I consider the posts which are most likely to be of longer-term interest, because there is now a definite divide between more substantial and ephemeral posts.

With luck this merge has been done in a way that rewrites all old links to the right content. I automated parts of the migration (including generating most of the rewrite rules), because splitting old posts up by year was a tedious job, but I had to do some parts by hand. Even with my long-standing rewrite test suite (which checks that URLs are rewritten to the correct target) I'm bound to have made multiple mistakes. If you spot links that don't work as expected, please send me an email, and I'll fix things as soon as I can.

And I promise that I will make no further reorganisations — until the next one.


Footnotes

[1] I remember having to ask what it meant.

July Links

July 31 2022

  • On Turing machines I found this overview of Turing's (and others') early work on the theory of computers / software enlightening — even though I knew in advance that I wouldn't know all of the details, I was surprised by how little I actually knew!
  • Self-hosting a static site with OpenBSD, httpd, and relayd I've been meaning to hide several of my server processes behind relayd for ages. This blog post gave me the kick up the behind I needed to do so!
  • RETBLEED: Arbitrary Speculative Code Execution with Return Instructions Another Spectre mitigation down. "we invalidate some of the key assumptions behind retpoline... RETBLEED leaks privileged memory at the rate of 219 bytes/s on Intel Coffee Lake and 3.9 kB/s on AMD Zen 2."
  • Mechanical watch, a stunning animation and description of a mechanical watch — for the first time I (roughly) understand how they work!
  • Twenty years of Valgrind When I first came across valgrind I thought "surely that can't work with any real software?" Yet it did, and I'm still using it regularly 20 years later!

What's the Most Portable Way to Include Binary Blobs in an Executable?

July 25 2022

I recently needed to include an arbitrary blob of data in an executable, in a manner that's easily ported across platforms. I soon discovered that there are various solutions to including blobs, but finding out what the trade-offs are has been a case of trial and error [1]. In this post I'm going to try and document the portability (or lack thereof...) of the solutions I've tried, give a rough idea of performance, and then explain why I'll probably use a combination of several solutions in the future.

Outlining the problem

Let's assume that I want to embed an arbitrary null-terminated string into an ELF executable and that I'm OK with that string having the fixed symbol name string_blob. My C program may then look as simple as:

#include <stdio.h>
extern char string_blob[];
int main() {
    printf("%s\n", string_blob);
    return 0;
}
Let's compile my C program ex1.c into an object file (i.e. a '.o' file):
$ cc -c -o ex1.o ex1.c
Here's my actual blob of data:
$ echo -n "blobby blobby blobby\0" > string_blob.txt
What do I do now? Well, I need to produce a second object file that contains my blob of data. On both my OpenBSD and Linux amd64 machines I can use objcopy to convert a blob into an object file:
$ objcopy -I binary -O elf64-x86-64 -B i386:x86-64 \
    string_blob.txt string_blob.o
Then I can link the two files together and run them:
$ cc -o ex1 ex1.o string_blob.o
ld: error: undefined symbol: string_blob
>>> referenced by ex1.c
>>>               ex1.o:(main)
cc: error: linker command failed with exit code 1 (use -v to see invocation)
Perhaps unsurprisingly this has failed, as objcopy hasn't created a symbol called string_blob. Let's see what symbols string_blob.o actually defines:
$ readelf -Ws string_blob.o

Symbol table '.symtab' contains 5 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 
     2: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    1 _binary_string_blob_txt_start
     3: 0000000000000015     0 NOTYPE  GLOBAL DEFAULT    1 _binary_string_blob_txt_end
     4: 0000000000000015     0 NOTYPE  GLOBAL DEFAULT  ABS _binary_string_blob_txt_size
It turns out that objcopy creates three symbol names, with the one I care about being _binary_<escaped file name>_start [2]. Let's rewrite my C program to use the correct symbol name:
#include <stdio.h>
extern char _binary_string_blob_txt_start[];
int main() {
    printf("%s\n", _binary_string_blob_txt_start);
    return 0;
}
I'll call that version ex2.c and try again:
$ cc -c -o ex2.o ex2.c
$ cc -o ex2 ex2.o string_blob.o
$ ./ex2
blobby blobby blobby
Success!

Trying to solve objcopy's ugliness

There are at least two pieces of ugliness in my solution above. Let's look again at the objcopy command-line:
$ objcopy -I binary -O elf64-x86-64 -B i386:x86-64 \
    string_blob.txt string_blob.o
Where did the values elf64-x86-64 [3] and i386:x86-64 come from? Well, for me, they came from the tooth fairy, also known in the land of programming as StackOverflow. As happy as I am to lean heavily on search engines to peer into StackOverflow, that's not going to help me work out the right values for platforms I don't know about: what happens if someone tries to run my example on, say, an Arm box?

Is there a portable way to automatically determine the right values? On Linux, with GNU's ld I can easily get the right value for -O with:

$ ld --print-output-format
elf64-x86-64
but getting the -B value is a bit fiddlier [4]:
$ ld --verbose | grep OUTPUT_ARCH \
    | sed -E "s/OUTPUT_ARCH.(.*)./\\1/g"
i386:x86-64
Unfortunately on OpenBSD, which uses LLVM's lld linker:
$ ld --print-output-format
ld: error: unknown argument '--print-output-format'
$ ld --verbose
ld: error: no input files
GNU ld and lld aren't the only linkers I tend to encounter. gold (another GNU linker, different from the "classic" BFD-based ld) and mold (a new performance-focussed linker, broadly in the spirit of lld) are a mixed bag [5]:
$ gold --print-output-format
elf64-x86-64
$ gold --verbose
gold: fatal error: no input files
$ mold --print-output-format
mold: fatal: unknown command line option: --print-output-format
$ mold --verbose
mold: fatal: option -m: argument missing
In short, there doesn't seem to be a portable way of discovering the right values to pass to objcopy. But it's OK, because newer versions of GNU objcopy will create object files in the way I want simply with:
$ objcopy --version | head -n 1
GNU objcopy (GNU Binutils for Debian) 2.35.2
$ objcopy -I binary -O default string_blob.txt string_blob.o
$ cc -o ex2 ex2.o string_blob.o
$ ./ex2
blobby blobby blobby
Unfortunately OpenBSD's much older version of objcopy creates object files which don't seem usable:
$ objcopy --version | head -n 1
GNU objcopy 2.17
$ objcopy -I binary -O default string_blob.txt \
    string_blob.o
$ cc -o ex2 ex2.o string_blob.o
ld: error: string_blob.o is incompatible with /usr/lib/crt0.o
cc: error: linker command failed with exit code 1 (use -v to see invocation)
The key difference seems to be that the object file produced by newer GNU objcopy has a sensible Machine value:
$ objcopy -I binary -O default string_blob.txt \
    string_blob.o
$ readelf -h string_blob.o|grep Machine
  Machine:                           Advanced Micro Devices X86-64
whereas OpenBSD's older GNU objcopy produces an object file with no Machine at all:
$ objcopy -I binary -O default string_blob.txt \
    string_blob.o
$ readelf -h string_blob.o|grep Machine
  Machine:                           None
On OpenBSD I have to specify -B to fix this:
$ objcopy -I binary -O default -B i386:x86-64 string_blob.txt string_blob.o
$ readelf -h string_blob.o|grep Machine
  Machine:                           Advanced Micro Devices X86-64
which is unfortunate as it's not obvious to me, at least, how to interrogate the compiler toolchain to find out what the right value to pass to -B might be.

But there is an alternative! LLVM has a completely different, but mostly compatible, objcopy called llvm-objcopy and most boxes I have access to have a copy. It certainly works fine if I give it complete values for -O and -B:

$ llvm-objcopy -I binary -O elf64-x86-64 -B i386:x86-64 \
    string_blob.txt string_blob.o
$ cc -o ex2 ex2.o string_blob.o
$ ./ex2
blobby blobby blobby
It's a promising start, but -O doesn't support default as a value:
$ llvm-objcopy -I binary -O default -B i386:x86-64 string_blob.txt string_blob.o
llvm-objcopy: error: invalid output format: 'default'
However, and unlike GNU objcopy, I can leave -B out:
$ llvm-objcopy -I binary -O elf64-x86-64 \
    string_blob.txt string_blob.o
$ cc -o ex2 ex2.o string_blob.o
$ ./ex2
blobby blobby blobby
So, in conclusion it seems that the situation with the various versions of objcopy is:
  1. I have to assume some operating systems have quite an old version of GNU objcopy.
  2. For old versions of GNU objcopy I have to specify -B.
  3. For llvm-objcopy I have to specify -O.
  4. There is no portable way of automatically determining the right values for -O or -B.

Put another way: objcopy doesn't seem to satisfy my initial constraints when it comes to portability.

Using ld

On Debian, which uses the GNU linker, I can use the linker to perform the same task as objcopy without specifying any tricky arguments:
$ ld --version | head -n 1
GNU ld (GNU Binutils for Debian) 2.35.2
$ ld -r -o string_blob.o -b binary string_blob.txt
$ cc -o ex2 ex2.o string_blob.o
$ ./ex2
blobby blobby blobby
However lld isn't as forgiving:
$ ld --version
LLD 13.0.0 (compatible with GNU linkers)
$ ld -r -o string_blob.o -b binary string_blob.txt
ld: error: target emulation unknown: -m or at least one .o file required
I have to pass lld a value similar to objcopy's -O parameter (but note that, for reasons unknown to me, hyphens have now become underscores) for it to work:
$ ld -r -m elf_x86_64 -o string_blob.o \
    -b binary string_blob.txt
$ cc -o ex2 ex2.o string_blob.o
$ ./ex2
blobby blobby blobby
OpenBSD does include GNU's ld (though called ld.bfd because it uses GNU's BFD library) which is of a similar vintage to its version of objcopy but which, surprisingly, is less pernickety:
$ ld.bfd --version | head -n 1
GNU ld version 2.17
$ ld.bfd -r -o string_blob.o -b binary \
    string_blob.txt
$ cc -o ex2 ex2.o string_blob.o
$ ./ex2
blobby blobby blobby
gold works as well as GNU ld:
$ gold -r -o string_blob.o -b binary \
    string_blob.txt
$ cc -o ex2 ex2.o string_blob.o
$ ./ex2
blobby blobby blobby
but mold points me back to objcopy which, as we know from above, isn't viable:
$ mold -r -o out.o -b binary LICENSE
mold: fatal: mold does not support `-b binary`. If you want to convert a binary
file into an object file, use `objcopy -I binary -O default <input-file>
<output-file>` instead.
At least for Linux and OpenBSD the situation with linkers is thus [6]:
  1. GNU ld and gold work fine.
  2. lld only works with similar limitations to llvm-objcopy.
  3. mold doesn't work at all.

At least at the moment, it seems that I can reasonably expect to find a copy of GNU ld on many systems (which is good) but it might not be the linker the user wants me to use (which is bad). I'm also reasonably sure that some systems (e.g. OS X?) only have lld. Furthermore, because it's so much faster than any other linker I've tried, it seems possible that some systems will make mold their default linker in the future. In summary, I don't really think I can rely on using a linker for my task.

Assembler tricks

Many assemblers support the GNU .incbin directive, which allows us to embed an arbitrary binary blob. Given the following assembly file (which I'll call string_blob.S):
    .global string_blob
string_blob:
    .incbin "string_blob.txt"
and a C file ex3.c using the (under my control!) symbol name string_blob:
#include <stdio.h>
extern char string_blob[];
int main() {
    printf("%s\n", string_blob);
    return 0;
}
everything works nicely on Linux and OpenBSD:
$ cc -c -o string_blob.o string_blob.S
$ cc -c -o ex3.o ex3.c
$ cc -o ex3 ex3.o string_blob.o
$ ./ex3
blobby blobby blobby
It looks like I have a winner! However, .incbin isn't supported by all assemblers; some call it incbin (without the leading '.'); some don't seem to have it at all. The incbin C library does an excellent job of hiding away most of these portability horrors (at least until one comes to MSVC, at which point there's more work involved). Unfortunately it doesn't seem to be available as a package on (at least) Debian or OpenBSD and, since there is no widely agreed upon package manager for C, that means slurping its source code into your repository, which you may or may not be keen on doing.
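
For completeness, here's a rough sketch of the trick that libraries like incbin build upon: putting the assembler directives in the C file itself via a top-level asm statement, so that no separate .S file is needed. This is my own example rather than incbin's code, and it assumes a GNU-compatible compiler, an ELF target, and an assembler which understands .incbin:

#include <stdio.h>

/* GNU C extension: emit assembler directives at file scope. The blob ends up
   in .rodata; the extra .byte 0 guarantees NUL termination for printf even if
   the input file lacks one. */
__asm__(".section .rodata\n"
        ".global string_blob\n"
        "string_blob:\n"
        ".incbin \"string_blob.txt\"\n"
        ".byte 0\n");

extern char string_blob[];

int main(void) {
    printf("%s\n", string_blob);
    return 0;
}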

Preprocessing

The "obvious" way of including binary blobs is to convert them into C source code and compile that into an object file. One possibility is to use xxd, which generates exactly the sort of C source code I want:
$ xxd -i string_blob.txt
unsigned char string_blob_txt[] = {
  0x62, 0x6c, 0x6f, 0x62, 0x62, 0x79, 0x20, 0x62, 0x6c, 0x6f, 0x62, 0x62,
  0x79, 0x20, 0x62, 0x6c, 0x6f, 0x62, 0x62, 0x79, 0x00
};
unsigned int string_blob_txt_len = 21;
However, perhaps surprisingly, xxd is part of Vim (but not Neovim?), which is a rather heavyweight (and odd) dependency to require just to include a binary blob. I've also seen references to various other programs which supposedly do the same job, but they don't seem widely available as OS-level packages.

Fortunately, we can make use of the venerable and widely available hexdump tool. Although little used these days, its -e parameter allows us to format its output in a manner of our choosing. It's not difficult to get it to produce output that looks very similar to the code produced by xxd:

$ hexdump -v -e '"0x" 1/1 "%02X" ", "' string_blob.txt
0x62, 0x6C, 0x6F, 0x62, 0x62, 0x79, 0x20, 0x62, 0x6C, 0x6F, 0x62, 0x62, 0x79, 0x20, 0x62, 0x6C, 0x6F, 0x62, 0x62, 0x79, 0x00
It's then trivial to use echo to make this into a valid C source file:
$ echo "unsigned char string_blob[] = {" \
    > string_blob.c
$ hexdump -v -e '"0x" 1/1 "%02X" ", "' \
    string_blob.txt >> string_blob.c
$ echo "\n};" >> string_blob.c
which produces this:
unsigned char string_blob[] = {
0x62, 0x6C, 0x6F, 0x62, 0x62, 0x79, 0x20, 0x62, 0x6C, 0x6F, 0x62, 0x62, 0x79, 0x20, 0x62, 0x6C, 0x6F, 0x62, 0x62, 0x79, 0x00, 
};
which I can then compile:
$ cc -c -o string_blob.o string_blob.c
$ cc -o ex3 ex3.o string_blob.o
$ ./ex3
blobby blobby blobby

However, this route is not fast, and for large binary blobs, especially if they frequently change, it would be a definite bottleneck. As a quick test, I took an 82MiB input file on my desktop machine: hexdump took about 15 seconds to produce the C output; and clang took about 90 seconds, and used just under 10GiB of RAM at its peak, to produce an object file. That's 3 orders of magnitude longer than the 0.2 seconds it took objcopy and the 0.8 seconds it took using incbin in assembler!

A variant of this approach is to use hexdump to produce output suitable for an assembler:

$ echo ".global string_blob\nstring_blob:" \
    > string_blob.S
$ hexdump -v -e '".byte 0x" 1/1 "%02X" "\n"' \
    string_blob.txt >> string_blob.S
$ as -o string_blob.o string_blob.S
$ cc -o ex3 ex3.o string_blob.o
$ ./ex3
blobby blobby blobby
The good news is that while hexdump still takes about 15 seconds to convert my huge file, GNU as takes only 17 seconds and uses a peak of just under 100MiB RAM. That's a lot better than when using clang! However, I'm not sure whether there are any modern assemblers that support .byte that don't support .incbin. In other words, if I've become desperate enough to use hexdump, I suspect it's because I've found myself in a situation where the assembler is expecting a syntax I don't know about.

Summary

There are almost certainly other ways of achieving what I want [7] and in the future it looks like we might finally have compilers which can include blobs directly with a #embed directive. However, realistically it will take many years before I can rely on every compiler I come across supporting this. In the interim, I feel like the routes I've outlined above give a reasonable spread of options. What would I actually use in practice? Well, if I'm only dealing with small binary blobs, I'd probably use the hexdump route because it works easily on every platform I have access to [8].
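
For reference, here's roughly what the C23 #embed route mentioned above looks like; treat it as illustrative only, since at the time of writing compiler support is scarce:

#include <stdio.h>

/* #embed expands to a comma-separated list of the file's bytes; the trailing
   ", 0" NUL-terminates the array (and would be a syntax error if the file
   were empty, which is one reason to treat this as a sketch). */
static const unsigned char string_blob[] = {
#embed "string_blob.txt"
, 0
};

int main(void) {
    printf("%s\n", (const char *)string_blob);
    return 0;
}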

If performance was an issue, I would be forced to interrogate the system to see if one of the faster routes worked, gradually falling back on slower routes otherwise. For example, in a configure script I would, in order (see the sketch after this list):

  1. test whether objcopy -I binary -O default text.txt out.o (where text.txt is a small file whose contents are irrelevant) produces an object file which can be linked to produce an executable.
  2. test whether the assembler works with .incbin or incbin.
  3. otherwise use the hexdump-into-C route.
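
As a rough sketch of that probe order (the conftest file names are mine, only the .incbin spelling is checked, and error handling is minimal):

# Probe 1: does objcopy produce an object file that actually links?
printf 'blobby' > conftest.bin
cat > conftest.c <<EOF
extern char _binary_conftest_bin_start[];
int main(void) { return _binary_conftest_bin_start[0]; }
EOF
if objcopy -I binary -O default conftest.bin conftest_blob.o 2>/dev/null \
   && cc -o conftest conftest.c conftest_blob.o 2>/dev/null; then
    blob_method="objcopy"
# Probe 2: does the assembler (driven via cc) understand .incbin?
elif printf '.incbin "conftest.bin"\n' | \
     cc -c -x assembler -o conftest_blob.o - 2>/dev/null; then
    blob_method="incbin"
# Fallback: the hexdump-into-C route.
else
    blob_method="hexdump"
fi
rm -f conftest.bin conftest.c conftest_blob.o conftest
echo "embedding blobs with: ${blob_method}"
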
At least on Linux and OpenBSD (and, I suspect, on most other modern Unices including OS X) one of the first two routes (which are roughly equivalently fast) would succeed. But if they didn't, I'd be fairly confident that hexdump would either be available or easily installed by the user. As a pleasant bonus the hexdump route will work equally well on non-ELF platforms (though I couldn't be entirely sure that all compilers would cope with huge binary blobs).

Update (2022-07-25): David Chisnall points out that (at least) clang can process blobs-in-C-source-code faster if they're embedded as a string (but watch out for the null byte at the end!).

Acknowledgements: thanks to Edd Barrett, Stephen Kell, and Davin McCall for comments.


Footnotes

[1] After I'd put most of the post together, I discovered that C23 will include an #embed directive. In the long term that will probably end up being the easiest way of achieving what I want — but it will take quite a while before I can rely on compilers on random boxes supporting it.
[2] As far as I know, GNU objcopy doesn't specify what the file name escaping rules are though llvm-objcopy says that "non-alphanumeric characters [are] converted to _".

One can also use objcopy to rename symbols using objcopy --redefine-sym "_binary_string_blob_txt_start=string_blob" string_blob.o. Although objcopy has the -w switch to allow wildcards to be specified, none of the 3 versions of objcopy I'm using in this post supports that syntax with --redefine-sym, so you have to work out the "full" name yourself.

[3] In the GNU toolchain this is the BFDName. You can see a list of those supported by GNU objcopy with --info, though without any indication of what the "native" BFDName is. llvm-objcopy does not support --info.
[4] Amusingly if I take the same approach to get the value of -O, I find that ld --verbose likes elf64-x86-64 so much that it specifies it thrice:
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64",
              "elf64-x86-64")
[5] Stephen Kell points out that, since linker scripts are optional in gold and mold, there is no concept of a default script.
[6] After I wrote this I stumbled across this mind-boggling example of how hard it is to deal with GNU ld, as well as OS X's and mingw's approach (though, if you want to try it out, I think the command-line given for ld is missing -b binary immediately after the -r).
[7] For example, I have only lightly looked into linker scripts because they don't seem to promise greater portability than other routes. Based on a pointer from some brave souls, the best I managed was:
TARGET(binary)
OUTPUT_FORMAT("elf32-i386")
OUTPUT(string_blob.o)
INPUT (string_blob.txt)
with GNU ld, which works, but isn't an improvement over what I could manage via the command line.

In other situations I have used Rust's include_bytes macro, but as rapidly as Rust is growing, I still wouldn't expect to find rustc available on a random box.

[8] I don't have a Windows box to test on, but I presume its newish Unix subsystem includes hexdump, or it's easily available as a package. I also assume (but don't know) that the objcopy routes aren't available on Windows.

OS X does, I believe, include hexdump but OS X is not an ELF platform (it uses the Mach-O format). I assume that the standard developer packages include llvm-objcopy (but I would not expect to find GNU binutils installed on most boxes).

