I recently wanted to send someone a video of a program doing some interesting things in a single X11 window. Recording the whole desktop is easy (some readers may remember my post on Aeschylus which does just that) but it will include irrelevant (and possibly unwanted) parts of the screen, leading to unnecessarily large files. I couldn’t immediately find a tool which did what I wanted on OpenBSD [1] but through a combination of xwininfo, FFmpeg, and hk I was able to put together exactly what I needed in short order. Even better, I was able to easily post-process the video to shrink its file size, speed it up, and contort it to the dimension requirements of various platforms. Here’s a video straight out of the little script I put together:
In this post I’m going to quickly go through what I did. I hope you get two things from this. First, FFmpeg, hk, and xwininfo are great examples of tools whose power can be magnified by combining them with other tools. Second, I’m not presenting an end product as much as I am presenting example recipes which you can alter and adjust to fit your circumstances.
Selecting the right portion of the screen
The first problem is automatically determining what portion of the screen I want to capture. Fortunately, xwininfo does exactly what I want: when I run it, the cursor changes to a “+”; and, when I click on a window, it outputs information to stdout about that window. Here’s what happens when I click on the editor window I’m typing this post into:
$ xwininfo

xwininfo: Please select the window about which you
          would like information by clicking the
          mouse in that window.

xwininfo: Window id: 0x2800006 "making_a_video_of_a_single_window.incomplete + (~/web/tratt.net/templates/laurie/blog/2022) - NVIM --- Neovim"

  Absolute upper-left X:  309
  Absolute upper-left Y:  104
  Relative upper-left X:  5
  Relative upper-left Y:  29
  Width: 3000
  Height: 2400
  Depth: 24
  Visual: 0x57c
  Visual Class: TrueColor
  Border width: 0
  Class: InputOutput
  Colormap: 0x2800005 (installed)
  Bit Gravity State: NorthWestGravity
  Window Gravity State: NorthWestGravity
  Backing Store State: NotUseful
  Save Under State: no
  Map State: IsViewable
  Override Redirect State: no
  Corners:  +309+104  -531+104  -531-56  +309-56
  -geometry 1500x1200+309-56
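As an aside, if you already know the window's id (for example, from a previous run), xwininfo can also be run non-interactively with its standard -id option; a small sketch reusing the id shown in the output above:

xwininfo -id 0x2800006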
What I need to extract are:
- the window’s width and height,
- and the coordinates of the window’s top-left corner (where 0,0 is the top-left of the screen).
Let’s assume I’ve put xwininfo’s output into /tmp/t. I can then easily extract all four pieces of information we need:
$ grep Width /tmp/t \
    | cut -d ":" -f 2 \
    | tr -d " "
3000
$ grep Height /tmp/t \
    | cut -d ":" -f 2 \
    | tr -d " "
2400
$ grep "Absolute upper-left X" /tmp/t \
    | cut -d ":" -f 2 \
    | tr -d " "
309
$ grep "Absolute upper-left Y" /tmp/t \
    | cut -d ":" -f 2 \
    | tr -d " "
104
grep obtains the single line we’re interested in, cut selects the text after the colon (“:”), and tr removes the leading characters that cut leaves behind.
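As an aside, the same extraction can be collapsed into a single awk invocation per field; this is only a sketch of an equivalent pipeline, not what the rest of the post uses:

# Equivalent to the grep | cut | tr pipeline above: match the line,
# strip the spaces from the second colon-separated field, and print it.
awk -F: '/Width/ { gsub(/ /, "", $2); print $2 }' /tmp/t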
Rather than continually copy those numbers around by hand, I’ll put them into variables so we can easily reference them in the rest of the post:
width=`grep Width /tmp/t | cut -d ":" -f 2 | tr -d " "`
height=`grep Height /tmp/t | cut -d ":" -f 2 | tr -d " "`
top_x=`grep "Absolute upper-left X" /tmp/t | cut -d ":" -f 2 | tr -d " "`
top_y=`grep "Absolute upper-left Y" /tmp/t | cut -d ":" -f 2 | tr -d " "`
Starting and stopping FFmpeg
Now that we know what portion of the screen we want to capture, I can start FFmpeg recording:
ffmpeg \
    -f x11grab \
    -framerate 30 \
    -video_size ${width}x${height} \
    -i +${top_x},${top_y} \
    -c:v libx264rgb \
    -preset ultrafast \
    -crf 0 \
    output.nut
I always record to a lossless format, specifically the NUT format, which is best thought of as FFmpeg’s “native” format — since I’ve been using that, I’ve had none of the problems I encountered with other formats. I do as little processing as I can get away with when recording (-preset ultrafast does minimal compression, otherwise the resulting files can become unwieldy), because it’s easy to do heavy processing as a subsequent step. How many frames per second one prefers is a personal choice, though in the above example I’ve gone for a neutral 30 [2].
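As footnote [2] suggests, 30fps is overkill for most desktop recordings; if file size matters, a variant of the capture command with a lower frame rate works just as well (only -framerate changes, everything else is as above):

ffmpeg \
    -f x11grab \
    -framerate 12 \
    -video_size ${width}x${height} \
    -i +${top_x},${top_y} \
    -c:v libx264rgb \
    -preset ultrafast \
    -crf 0 \
    output.nut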
When I’m finished recording, pressing “q” in the terminal running FFmpeg causes it to stop recording. However, this is a bit annoying, because I have to change keyboard focus from the window I’m recording to the one running FFmpeg. If those two windows overlap (bearing in mind that FFmpeg is capturing a fixed portion of the screen), the result of FFmpeg’s voluminous terminal output suddenly heaving into view is discombobulating:
What I really want to do is run FFmpeg in the background and stop it when I press a keyboard shortcut. This is a perfect use-case for hk, which waits until a specific key combination has been pressed and then executes an arbitrary shell command:
ffmpeg \
    -f x11grab \
    -framerate 30 \
    -video_size ${width}x${height} \
    -i +${top_x},${top_y} \
    -c:v libx264rgb \
    -preset ultrafast \
    -crf 0 \
    output.nut &
ffmpeg_pid=$!
hk Shift+F4 kill -SIGINT $ffmpeg_pid
wait $ffmpeg_pid
I first run FFmpeg in the background (with “&”). I then tell hk to wait for Shift+F4 to be pressed, at which point hk will run kill, which sends SIGINT to the FFmpeg process (whose PID is stored in the $! shell variable). When FFmpeg receives the SIGINT signal it will stop recording and finish writing to output.nut.
Although we don’t need this feature quite yet, if I want to execute commands on output.nut, there is no guarantee that FFmpeg will have finished writing to it when hk terminates — indeed, it’s likely that hk will terminate first. It’s thus safest to wait [3] until the FFmpeg process has definitely terminated.
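One small defensive addition I would consider here (a sketch, not part of the original recipe): a trap so that FFmpeg is also asked to stop if the script itself exits before hk returns:

# Ask FFmpeg to stop on script exit too; harmless if it has already
# finished (the error from kill is suppressed).
trap 'kill -SIGINT "$ffmpeg_pid" 2>/dev/null || true' EXIT
hk Shift+F4 kill -SIGINT $ffmpeg_pid
wait $ffmpeg_pid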
Post processing
At this point, I have a .nut file which has captured exactly the portion of the screen we want. However, many people are scared by .nut files, and because I did only minimal compression, the .nut file can be alarmingly large.

I thus want to convert the .nut file to a more widely recognised .mp4 file, and heavily compress the video while doing so:
ffmpeg \
    -i output.nut \
    -vf "pad=ceil(iw/2)*2:ceil(ih/2)*2" \
    -c:v libx264 \
    -pix_fmt yuv420p \
    -preset veryslow \
    -crf 10 \
    output.mp4
I’ve chosen FFmpeg’s highest compression level (-preset veryslow). Since I’m also willing to sacrifice a little bit of visual quality in order to get better compression, I’ve specified -crf. Roughly speaking, -crf 0 means “lossless” and every increase of that number by 3 doubles the lossiness. -crf 10 is below the threshold where I can see any real differences in the video.

pix_fmt is a slightly irritating detail that I don’t want to go into: suffice it to say that some programs (e.g. Firefox and some versions of VLC) won’t play the resulting mp4 if I don’t use this option. However, using -pix_fmt in this way means we need a video with dimensions divisible by two, hence the pad video filter (-vf). Unfortunately, video encoding is full of these sorts of bizarre details.
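To make the pad expression concrete: a window of, say, 1501x1199 (a hypothetical size, not one from this post) becomes 1502x1200, since ceil(1501/2)*2 = 1502 and ceil(1199/2)*2 = 1200. If you want to check what actually ended up in the file, ffprobe (shipped with FFmpeg) will report it:

# Confirm the output really is yuv420p with even dimensions.
ffprobe -hide_banner \
    -show_entries stream=width,height,pix_fmt \
    -of default=noprint_wrappers=1 \
    output.mp4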
On a few tests, the command above leads to a .mp4 file that’s at least 5x smaller, and often 10x or more smaller, than the .nut input.
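If you want to see the difference on your own recordings, a quick comparison of the two files is enough:

ls -lh output.nut output.mp4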
If you actually try recording yourself typing, you’ll probably find that, like me, you type much slower than you expected. In my case, the original videos I recorded would be frustratingly slow for other people, so the videos you’ve seen above have been sped up by 2x. Compare the original recording:
with the 2x-faster version:
Speeding the video up simply requires FFmpeg’s setpts filter:
ffmpeg \
    -i output.nut \
    -vf "\
        setpts=0.5*PTS[v]; \
        [v]pad=ceil(iw/2)*2:ceil(ih/2)*2" \
    -c:v libx264 \
    -pix_fmt yuv420p \
    -preset veryslow \
    -crf 10 \
    output.mp4
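The setpts multiplier is the reciprocal of the speed-up, so other factors drop straight in. A sketch of a 4x-faster variant, using a comma to chain the filters (which is equivalent to the labelled [v] form above):

# 0.5*PTS = 2x faster, 0.25*PTS = 4x faster, 2.0*PTS = half speed.
ffmpeg \
    -i output.nut \
    -vf "setpts=0.25*PTS,pad=ceil(iw/2)*2:ceil(ih/2)*2" \
    -c:v libx264 \
    -pix_fmt yuv420p \
    -preset veryslow \
    -crf 10 \
    output_4x.mp4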
What happens if I want to upload the video to a site where videos can’t exceed a certain width or height? I don’t want to scale the video up if it’s smaller than that and I always want to maintain the aspect ratio. Let’s say I want a maximum width of 1920 and a maximum height of 1200:
ffmpeg \
    -i output.nut \
    -vf " \
        setpts=0.5*PTS[v]; \
        [v]scale='min(1920,iw)':'min(1200,ih)': \
        force_original_aspect_ratio=decrease[v]; \
        [v]pad=ceil(iw/2)*2:ceil(ih/2)*2" \
    -preset veryslow \
    -crf 12 \
    -c:v libx264 \
    -pix_fmt yuv420p \
    output_capped.mp4
That means that I can record a huge window (e.g. a browser window at 3366x2468) and automatically scale it down to the largest possible resolution that preserves the aspect ratio (in this case to 1638x1200):
An example combination
One can put all the bits above together in various different ways, but at the moment I’ve bundled them into this simple script:
#! /bin/sh

if [[ $# -ne 1 ]]; then
    echo "video_single_window <base_file_name>" > /dev/stderr
    exit 1
fi

t=`mktemp`
xwininfo > $t
width=`grep Width $t | cut -d ":" -f 2 | tr -d " "`
height=`grep Height $t | cut -d ":" -f 2 | tr -d " "`
top_x=`grep "Absolute upper-left X" $t | cut -d ":" -f 2 | tr -d " "`
top_y=`grep "Absolute upper-left Y" $t | cut -d ":" -f 2 | tr -d " "`

ffmpeg \
    -f x11grab \
    -framerate 30 \
    -video_size ${width}x${height} \
    -i +${top_x},${top_y} \
    -c:v libx264rgb \
    -preset ultrafast \
    -crf 0 \
    $1.nut &
ffmpeg_pid=$!
hk Shift+F4 kill -SIGINT $ffmpeg_pid
wait $ffmpeg_pid

ffmpeg \
    -i $1.nut \
    -c:v libx264 \
    -pix_fmt yuv420p \
    -vf " \
        setpts=0.5*PTS[v]; \
        [v]pad=ceil(iw/2)*2:ceil(ih/2)*2" \
    -preset veryslow \
    -crf 10 \
    $1.mp4

ffmpeg \
    -i $1.nut \
    -vf " \
        setpts=0.5*PTS[v]; \
        [v]scale='min(1920,iw)':'min(1200,ih)': \
        force_original_aspect_ratio=decrease[v]; \
        [v]pad=ceil(iw/2)*2:ceil(ih/2)*2" \
    -preset veryslow \
    -crf 10 \
    -c:v libx264 \
    -pix_fmt yuv420p \
    $1_scaled.mp4

rm $t
In essence, after running video_single_window base I end up with three files: base.nut (the “raw” recording), base.mp4 (a compressed MP4), and base_scaled.mp4 (a compressed and possibly scaled-down MP4).
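For completeness, a typical run might look like this (assuming the script is saved as video_single_window in the current directory):

chmod +x video_single_window
./video_single_window demo
# click the window to record, press Shift+F4 when done, and
# demo.nut, demo.mp4, and demo_scaled.mp4 are left behind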
One of the best bits about the simplicity of this script is how easy it is to create variants. For example, I created a variant with a different hotkey, which allowed me to record video_single_window base running (at normal speed, so you can see FFmpeg at its real speed; the recording FFmpeg is killed at about 16 seconds in):
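The variant I used for that recording simply has a different key combination hard-coded. If you wanted to go one step further, a sketch of parameterising the hotkey via a (hypothetical) environment variable might look like:

# VSW_HOTKEY is a made-up variable name; default to Shift+F4 if unset.
hotkey=${VSW_HOTKEY:-Shift+F4}
hk $hotkey kill -SIGINT $ffmpeg_pid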
I’ve long been impressed by FFmpeg’s versatility, but co-opting xwininfo and hk leads to a surprisingly powerful toolbox!
Acknowledgements: thanks to Edd Barrett and Lukas Diekmann for comments.
Footnotes
I could perhaps have ported ssr, but it was too much fun to do things myself!
For the sort of desktop videos I do, 30fps is overkill: I regularly use 10 or 12 fps, and no-one has ever noticed or complained.
Until I’d fully understood how zombie processes come to be zombies (with much debugging in extsmail), I did not realise how unsatisfactory, and potentially unsafe, the Unix PID system and wait are. That said, I’ve never seen “PID reuse” problems manifest in real life, even though they clearly could. Since there’s no alternative, carrying on is the only possibility!