Laurence Tratt: Making a Video of a Single Window

I recently wanted to send someone a video of a program doing some interesting things in a single X11 window. Recording the whole desktop is easy (some readers may remember my post on Aeschylus which does just that) but it will include irrelevant (and possibly unwanted) parts of the screen, leading to unnecessarily large files. I couldn’t immediately find a tool which did what I wanted on OpenBSD[1] but through a combination of xwininfo, FFmpeg, and hk I was able to put together exactly what I needed in short order. Even better, I was able to easily post-process the video to shrink its file size, speed it up, and contort it to the dimension requirements of various platforms. Here’s a video straight out of the little script I put together:

In this post I’m going to quickly go through what I did. I hope you get two things from this. First, FFmpeg, hk, and xwininfo are great examples of tools whose power can be magnified by combining them with other tools. Second, I’m not presenting an end product as much as I am presenting example recipes which you can alter and adjust to fit your circumstances.

Selecting the right portion of the screen

The first problem is automatically determining what portion of the screen I want to capture. Fortunately, xwininfo does exactly what I want: when I run it, the cursor changes to a “+”; and, when I click on a window, it outputs information to stdout about that window. Here’s what happens when I click on the editor window I’m typing this post into:

$ xwininfo
xwininfo: Please select the window about which you
          would like information by clicking the
          mouse in that window.

xwininfo: Window id: 0x2800006 "making_a_video_of_a_single_window.incomplete + (~/web/tratt.net/templates/laurie/blog/2022) - NVIM --- Neovim"

  Absolute upper-left X:  309
  Absolute upper-left Y:  104
  Relative upper-left X:  5
  Relative upper-left Y:  29
  Width: 3000
  Height: 2400
  Depth: 24
  Visual: 0x57c
  Visual Class: TrueColor
  Border width: 0
  Class: InputOutput
  Colormap: 0x2800005 (installed)
  Bit Gravity State: NorthWestGravity
  Window Gravity State: NorthWestGravity
  Backing Store State: NotUseful
  Save Under State: no
  Map State: IsViewable
  Override Redirect State: no
  Corners:  +309+104  -531+104  -531-56  +309-56
  -geometry 1500x1200+309-56

What I need to extract are:

the window’s width and height,
and the coordinates of the window’s top-left corner (where 0,0 is the top-left of the screen).

Let’s assume I’ve put xwininfo’s output into /tmp/t. I can then easily extract all four pieces of information we need:

$ grep Width /tmp/t \
    | cut -d ":" -f 2 \
    | tr -d " "
3000
$ grep Height /tmp/t \
    | cut -d ":" -f 2 \
    | tr -d " "
2400
$ grep "Absolute upper-left X" /tmp/t \
    | cut -d ":" -f 2 \
    | tr -d " "
309
$ grep "Absolute upper-left Y" /tmp/t \
    | cut -d ":" -f 2 \
    | tr -d " "
104

grep obtains the single line we’re interested in, cut selects the text after the colon (“:”), and tr removes the leading spaces that cut leaves behind.

Rather than continually copy those numbers around by hand, I’ll put them into variables so we can easily reference them in the rest of the post:

width=`grep Width /tmp/t | cut -d ":" -f 2 | tr -d " "`
height=`grep Height /tmp/t | cut -d ":" -f 2 | tr -d " "`
top_x=`grep "Absolute upper-left X" /tmp/t | cut -d ":" -f 2 | tr -d " "`
top_y=`grep "Absolute upper-left Y" /tmp/t | cut -d ":" -f 2 | tr -d " "`
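The grep pipelines above match a word anywhere in a line, so a window whose title happens to contain "Width" would produce two matches. As a minimal sketch of a stricter alternative, the following uses awk to compare the whole label before the colon. The helper name xwin_field and the sample data (including its awkward window title) are assumptions made for illustration, not part of the original script:

```shell
# Sample xwininfo output, trimmed to the fields used in this post. The window
# title is made up to show a case where a plain "grep Width" matches twice.
sample=$(mktemp)
cat > "$sample" <<'EOF'
xwininfo: Window id: 0x2800006 "Width: 9999 - NVIM"

  Absolute upper-left X:  309
  Absolute upper-left Y:  104
  Relative upper-left X:  5
  Relative upper-left Y:  29
  Width: 3000
  Height: 2400
  Border width: 0
EOF

# xwin_field (hypothetical helper): print the value of the first line in file
# $2 whose label (the text before the first colon, leading blanks stripped)
# is exactly $1.
xwin_field() {
    awk -F ':' -v label="$1" '
        { key = $1; sub(/^[ \t]+/, "", key) }
        key == label { gsub(/[ \t]/, "", $2); print $2; exit }
    ' "$2"
}

width=$(xwin_field "Width" "$sample")
height=$(xwin_field "Height" "$sample")
top_x=$(xwin_field "Absolute upper-left X" "$sample")
top_y=$(xwin_field "Absolute upper-left Y" "$sample")

echo "${width}x${height}+${top_x},${top_y}"
rm -f "$sample"
```

Because the title line's label is "xwininfo" rather than "Width", it is skipped, as is the "Border width" line.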

Starting and stopping FFmpeg

Now that we know what portion of the screen we want to capture, I can start FFmpeg recording:

ffmpeg \
  -f x11grab \
  -framerate 30 \
  -video_size ${width}x${height} \
  -i +${top_x},${top_y} \
  -c:v libx264rgb \
  -preset ultrafast \
  -crf 0 \
  output.nut

I always record to a lossless format, specifically the NUT format, which is best thought of as FFmpeg’s “native” format: since I started using it, I’ve had none of the problems I encountered with other formats. I do as little processing as I can get away with when recording (-preset ultrafast does only minimal compression, but without even that the resulting files can become unwieldy), because it’s easy to do heavy processing as a subsequent step. How many frames per second one prefers is a personal choice, though in the above example I’ve gone for a neutral 30[2].
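Before handing the four values to FFmpeg, it may be worth checking that each really is a non-negative integer: if xwininfo fails the variables will be empty, and a window that hangs off the left or top edge of the screen can have a negative coordinate, which x11grab may well refuse. A minimal sketch, where is_uint is a hypothetical helper and the last two test values are invented to show the failure cases:

```shell
# is_uint (hypothetical helper): succeed only if $1 is a non-empty string
# made up entirely of decimal digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# The first four values come from the xwininfo output above; the last two
# stand in for a partly off-screen window and for missing output.
for value in 3000 2400 309 104 -12 ""; do
    if is_uint "$value"; then
        echo "ok:  '$value'"
    else
        echo "bad: '$value'"
    fi
done
```

In a real script one would exit with an error message on the first bad value rather than print a report.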

When I’m finished recording, pressing “q” in the terminal running FFmpeg causes it to stop recording. However, this is a bit annoying, because I have to change keyboard focus from the window I’m recording to the one running FFmpeg. If those two windows overlap (bearing in mind that FFmpeg is capturing a fixed portion of the screen), the result of FFmpeg’s voluminous terminal output suddenly heaving into view is discombobulating:

What I really want to do is run FFmpeg in the background and stop it when I press a keyboard shortcut. This is a perfect use-case for hk, which waits until a specific key combination has been pressed and then executes an arbitrary shell command:

ffmpeg \
  -f x11grab \
  -framerate 30 \
  -video_size ${width}x${height} \
  -i +${top_x},${top_y} \
  -c:v libx264rgb \
  -preset ultrafast \
  -crf 0 \
  output.nut &
ffmpeg_pid=$!
hk Shift+F4 kill -SIGINT $ffmpeg_pid
wait $ffmpeg_pid

I first run FFmpeg in the background (with “&”) and copy its PID from the $! shell variable into ffmpeg_pid. I then tell hk to wait for Shift+F4 to be pressed, at which point hk will run kill, which sends SIGINT to the FFmpeg process. When FFmpeg receives the SIGINT signal it will stop recording and finish writing to output.nut.

Although we don’t need this feature quite yet, if I want to execute commands on output.nut, there is no guarantee that ffmpeg will have finished writing to it when hk terminates; indeed, it’s likely that hk will terminate first. It’s thus safest to wait[3] until the FFmpeg process has definitely terminated.
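The background, signal, and wait pattern can be tried without FFmpeg or hk. The sketch below uses a small shell process as a stand-in recorder that takes a second to "finalise" its output after being signalled. Note one deliberate difference: the stand-in is signalled with SIGTERM rather than SIGINT because, as far as I understand POSIX, a shell started in the background of a non-interactive script inherits SIGINT as ignored and cannot trap it, whereas a program like FFmpeg installs its own handler. All names here are illustrative:

```shell
out=$(mktemp)

# Stand-in for the recorder: loops until signalled, then takes a moment to
# finish writing its output file before exiting.
sh -c '
    trap "sleep 1; echo finished > \"\$1\"; exit 0" TERM
    while :; do sleep 1; done
' recorder "$out" &
rec_pid=$!

sleep 1                 # give the stand-in time to install its trap
kill -TERM "$rec_pid"   # in the real script, the hotkey triggers this step
wait "$rec_pid"         # block until the stand-in has really exited
status=$?

# Only now is it safe to read the output file.
result=$(cat "$out")
echo "recorder exited with $status; file says: $result"
rm -f "$out"
```

Without the wait, the cat would most likely run while the stand-in was still finalising and see an empty file.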

Post processing

At this point, I have a .nut file which has captured exactly the portion of the screen we want. However, many people are scared by .nut files, and because I did only minimal compression, the .nut file can be alarmingly large.

I thus want to convert the .nut file to a more widely recognised .mp4 file, and heavily compress the video while doing so:

ffmpeg \
  -i output.nut \
  -vf "pad=ceil(iw/2)*2:ceil(ih/2)*2" \
  -c:v libx264 \
  -pix_fmt yuv420p \
  -preset veryslow \
  -crf 10 \
  output.mp4

I’ve chosen FFmpeg’s highest compression level (-preset veryslow). Since I’m also willing to sacrifice a little bit of visual quality in order to get better compression, I’ve specified -crf. Roughly speaking, -crf 0 means “lossless” and every increase of that number by 3 doubles the lossiness. -crf 10 is below the threshold where I can see any real differences in the video. -pix_fmt is a slightly irritating detail that I don’t want to go into: suffice it to say that some programs (e.g. Firefox and some versions of VLC) won’t play the resulting mp4 if I don’t use this option. However, using -pix_fmt in this way means we need a video with dimensions divisible by two, hence the pad video filter (-vf). Unfortunately, video encoding is full of these sorts of bizarre details.
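The pad expression ceil(iw/2)*2 simply rounds a dimension up to the next even number, which yuv420p needs because it stores colour information at half resolution in each direction. The same arithmetic can be sketched with shell integers; round_up_even is a hypothetical helper used only for illustration:

```shell
# round_up_even (hypothetical helper): the integer equivalent of the pad
# filter's ceil(n/2)*2 for a non-negative n.
round_up_even() {
    echo $(( ($1 + 1) / 2 * 2 ))
}

for n in 1200 1637 3000; do
    echo "$n -> $(round_up_even "$n")"
done
```

Even values pass through unchanged, while an odd value such as 1637 gains a single pixel of padding.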

On a few tests, the command above leads to a .mp4 file that’s at least 5x smaller, and often 10x or more smaller, than the .nut input.

If you actually try recording yourself typing, you’ll probably find that, like me, you type much slower than you expected. In my case, the original videos I recorded would be frustratingly slow for other people, so the videos you’ve seen above have been sped up by 2x. Compare the original recording:

with the 2x-faster version:

Speeding the video up simply requires FFmpeg’s setpts filter:

ffmpeg \
  -i output.nut \
  -vf "\
    setpts=0.5*PTS[v]; \
    [v]pad=ceil(iw/2)*2:ceil(ih/2)*2" \
  -c:v libx264 \
  -pix_fmt yuv420p \
  -preset veryslow \
  -crf 10 \
  output.mp4
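If the speed-up factor changes often, the filter string can be generated rather than edited by hand. The sketch below is one way to do that, assuming a hypothetical helper speedup_vf; it writes the chain with commas, FFmpeg's usual separator for a linear sequence of filters, instead of the labelled form used above, and it only prints the resulting command rather than running it:

```shell
# speedup_vf (hypothetical helper): print a -vf value that plays the video
# $1 times faster and then pads it to even dimensions.
speedup_vf() {
    printf 'setpts=PTS/%s,pad=ceil(iw/2)*2:ceil(ih/2)*2\n' "$1"
}

vf=$(speedup_vf 2)

# Print, rather than run, the command that would use the filter string.
echo "ffmpeg -i output.nut -vf \"$vf\" -c:v libx264 -pix_fmt yuv420p output.mp4"
```

Dividing each presentation timestamp by 2 is the same as multiplying it by 0.5, so speedup_vf 2 matches the command above.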

What happens if I want to upload the video to a site where videos can’t exceed a certain width or height? I don’t want to scale the video up if it’s smaller than that and I always want to maintain the aspect ratio. Let’s say I want a maximum width of 1920 and a maximum height of 1200:

ffmpeg \
  -i output.nut \
  -vf " \
    setpts=0.5*PTS[v]; \
    [v]scale='min(1920,iw)':'min(1200,ih)': \
       force_original_aspect_ratio=decrease[v]; \
    [v]pad=ceil(iw/2)*2:ceil(ih/2)*2 " \
  -preset veryslow \
  -crf 12 \
  -c:v libx264 \
  -pix_fmt yuv420p \
  output_capped.mp4

That means that I can record a huge window (e.g. a browser window at 3366x2468) and automatically scale it down to the largest possible resolution that preserves the aspect ratio (in this case to 1638x1200):
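The arithmetic behind that result can be sketched with shell integers. The helper fit_within below is hypothetical and only approximates what the scale filter does (FFmpeg's exact rounding may differ by a pixel): it caps the target box at the input size so nothing is scaled up, then shrinks whichever side is not the limiting one:

```shell
# fit_within (hypothetical helper): print the "WxH" obtained by fitting a
# $1 x $2 video inside a $3 x $4 box, never scaling up.
fit_within() {
    iw=$1 ih=$2
    box_w=$(( iw < $3 ? iw : $3 ))
    box_h=$(( ih < $4 ? ih : $4 ))
    if [ $(( iw * box_h )) -gt $(( ih * box_w )) ]; then
        # Width is the limiting side: derive the height, rounding to nearest.
        w=$box_w
        h=$(( (box_w * ih + iw / 2) / iw ))
    else
        # Height is the limiting side: derive the width, rounding to nearest.
        h=$box_h
        w=$(( (box_h * iw + ih / 2) / ih ))
    fi
    echo "${w}x${h}"
}

fit_within 3366 2468 1920 1200   # the browser window from the text
fit_within 800 600 1920 1200     # a small window is left alone
```

For the browser window this gives 1637x1200; the pad filter then rounds the odd width up to give the 1638x1200 quoted above.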

An example combination

One can put all the bits above together in various different ways, but at the moment I’ve bundled them into this simple script:

#! /bin/sh

if [ $# -ne 1 ]; then
    echo "video_single_window <base_file_name>" >&2
    exit 1
fi

t=`mktemp`
xwininfo > $t
width=`grep Width $t | cut -d ":" -f 2 | tr -d " "`
height=`grep Height $t | cut -d ":" -f 2 | tr -d " "`
top_x=`grep "Absolute upper-left X" $t | cut -d ":" -f 2 | tr -d " "`
top_y=`grep "Absolute upper-left Y" $t | cut -d ":" -f 2 | tr -d " "`

ffmpeg \
  -f x11grab \
  -framerate 30 \
  -video_size ${width}x${height} \
  -i +${top_x},${top_y} \
  -c:v libx264rgb \
  -preset ultrafast \
  -crf 0 \
  $1.nut &
ffmpeg_pid=$!
hk Shift+F4 kill -SIGINT $ffmpeg_pid
wait $ffmpeg_pid

ffmpeg \
  -i $1.nut \
  -c:v libx264 \
  -pix_fmt yuv420p \
  -vf " \
    setpts=0.5*PTS[v]; \
    [v]pad=ceil(iw/2)*2:ceil(ih/2)*2" \
  -preset veryslow \
  -crf 10 \
  $1.mp4

ffmpeg \
  -i $1.nut \
  -vf " \
    setpts=0.5*PTS[v]; \
    [v]scale='min(1920,iw)':'min(1200,ih)': \
       force_original_aspect_ratio=decrease[v]; \
    [v]pad=ceil(iw/2)*2:ceil(ih/2)*2" \
  -preset veryslow \
  -crf 10 \
  -c:v libx264 \
  -pix_fmt yuv420p \
  $1_scaled.mp4

rm $t

In essence, after running video_single_window base I end up with three files: base.nut (the “raw” recording), base.mp4 (a compressed MP4), and base_scaled.mp4 (a compressed and possibly scaled-down MP4).
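One small variant worth considering: the script only removes its temporary file if it reaches the final rm, so an early exit leaves the file behind. A minimal sketch of the usual remedy, an EXIT trap, is below. The script body is simulated in a subshell so that the effect can be observed, and the early failure is invented for the demonstration:

```shell
# Run a stand-in for the script body in a subshell and capture the name of
# the temporary file it created, along with its exit status.
status=0
leftover=$(
    t=$(mktemp)
    trap 'rm -f "$t"' EXIT   # runs however the subshell exits
    echo "$t"                # report the temporary file's name to the caller
    exit 1                   # simulate an early failure
) || status=$?

if [ -e "$leftover" ]; then
    cleaned=no
else
    cleaned=yes
fi
echo "exit status $status; temporary file removed: $cleaned"
```

In the real script the trap line would go directly after the mktemp call, and the trailing rm would become unnecessary.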

One of the best bits about the simplicity of this script is how easy it is to create variants. For example, I created a variant with a different hotkey, which allowed me to record video_single_window base running (at normal speed, so you can see FFmpeg at its real speed; the recording FFmpeg is killed at about 16 seconds in):

I’ve long been impressed by FFmpeg’s versatility, but co-opting xwininfo and hk leads to a surprisingly powerful toolbox!

Acknowledgements: thanks to Edd Barrett and Lukas Diekmann for comments.

2022-08-09 08:00

Footnotes

[1] I could perhaps have ported ssr, but it was too much fun to do things myself!

[2] For the sort of desktop videos I do, 30fps is overkill: I regularly use 10 or 12 fps, and no-one has ever noticed or complained.

[3] Until I’d fully understood how zombie processes come to be zombies (with much debugging in extsmail), I did not realise how unsatisfactory, and potentially unsafe, the Unix PID system and wait are. That said, I’ve never seen “PID reuse” problems manifest in real life, even though they clearly could. Since there’s no alternative, carrying on is the only possibility!
