Laurence Tratt: pizauth: differentiating transient from permanent errors

As and when I’ve found some spare moments, I’ve been working towards getting pizauth in a state where I think it’s ready for a stable release. I’m now making what I hope is the final alpha release available.

Since the last alpha release I have been in many more situations where pizauth only has intermittent internet access. Previously pizauth treated a wide category of errors as transient — that is, the sorts of errors that generally resolve themselves given a little bit of time (e.g. a temporary loss of internet access). However, while some errors (e.g. DNS lookup failing) are probably benign, some (e.g. HTTP requests returning non-200 codes) show something serious has definitely occurred.

The new alpha release of pizauth first differentiates (possibly) transient errors from (definitely) permanent errors. If a permanent error occurs when refreshing an access token, then the token is invalidated, and the error logged. The reason that the user isn’t explicitly informed is that the true error (e.g. “your account is no longer valid”) is generally masked by another more generic error (e.g. “server refused to refresh token”). By invalidating the token, the user will then be asked to reauthenticate, at which point the true error is more likely to be properly reported. In the (hopefully) rare cases where there are persistent permanent refresh errors, users can look through logs to find out what happened.

However, possibly transient errors might not actually be transient. For example, imagine DNS lookup of your OAuth2 server fails: if that happens when there’s no internet access, you have a transient error; but if it happens when you do have internet access, it suggests your OAuth2 server has fallen off the internet, and that no amount of retrying will refresh the access token.

It took me quite a long time to work out how I might sensibly handle this. I was tempted to add a default “check there’s internet access” concept to pizauth but decided against this because doing so in a platform independent way is hard, and certain types of check can be viewed as a security leak. Instead, I have eventually settled on an optional global setting with the awkward name of not_transient_error_if which contains a shell command. This command is run when transient errors have occurred several times in a row. If the command succeeds (i.e. returns a zero exit code), pizauth will then treat the last transient error as a permanent error, log it, and invalidate the relevant access token.
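To make the rule above concrete, here is a minimal sketch of the decision in POSIX shell. It is not pizauth's actual implementation: the function name `classify_transient` and the `THRESHOLD` value are assumptions made purely for illustration, since the text only says the command runs after "several" consecutive transient errors.

```shell
#!/bin/sh
# Hypothetical sketch, NOT pizauth's real code.
# THRESHOLD is an assumed value for "several times in a row".
THRESHOLD=3

# classify_transient COUNT CHECK_CMD
#   COUNT     - number of consecutive transient errors seen so far
#   CHECK_CMD - the not_transient_error_if shell command ("" if unset)
# Prints "retry" if the error should still be treated as transient, or
# "permanent" if it should be logged and the token invalidated.
classify_transient() {
    count=$1
    check_cmd=$2
    # Too few consecutive failures, or no check configured: keep retrying.
    if [ "$count" -lt "$THRESHOLD" ] || [ -z "$check_cmd" ]; then
        echo retry
        return
    fi
    # A zero exit code from the check means "this is not a transient error".
    if sh -c "$check_cmd" >/dev/null 2>&1; then
        echo permanent
    else
        echo retry
    fi
}
```

For example, `classify_transient 3 "nc -z example.com 80"` would print `permanent` only when the (placeholder) host is reachable, i.e. when the machine appears to have internet access and yet refreshing keeps failing.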

I mostly expect not_transient_error_if to be used as a means for determining if the machine has internet access. Here are some possibilities:

```
nc -z <website> 80
```
Opens a connection to <website> on port 80, and returns with a zero exit code immediately if it succeeds, or non-zero otherwise.
```
ping -c 5 <ip address>
```
Returns a zero exit code if it can ping <ip address> or non-zero otherwise.
```
curl http://www.msftconnecttest.com/connecttest.txt | grep "Microsoft Connect Test"
```
Emulates Windows connection test (using whatever program you use to download webpages; e.g. on OpenBSD I might prefer the always-installed ftp -V -o - to curl).

I suspect many people will default to using ping, but I sometimes have to use networks that drop ping traffic [1]. I’m sure that some people will have other mechanisms that make more sense in their context.
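If no single probe works on every network you use, one option is a small wrapper that tries several checks in turn and succeeds as soon as one of them does. The sketch below is only an illustration: the helper name `any_succeeds` is made up, and the hosts in the usage comment are placeholders rather than recommendations.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if any of the given shell commands succeeds,
# and non-zero if all of them fail (or if none were given).
any_succeeds() {
    for cmd in "$@"; do
        # Run each check quietly; the first success is enough.
        if sh -c "$cmd" >/dev/null 2>&1; then
            return 0
        fi
    done
    return 1
}

# Example usage (placeholder hosts; substitute your own):
#   any_succeeds "nc -z example.com 80" "ping -c 5 192.0.2.1"
```

A script built around something like this could then be named in not_transient_error_if, so that a network which drops ping traffic but allows TCP connections is still recognised as having internet access.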

It took me quite a while to find a design for this that I was happy with. I can’t say I’m hugely enamoured of the name not_transient_error_if but my other attempts (some of which you can see in commits) were even worse. Since it’s mostly a “set once and forget” setting, I think I can live with the slightly unwieldy name.

The other major change to this version of pizauth is that refreshing now runs in a thread, so if one account stalls when refreshing, other accounts continue being refreshed. Refreshing should probably always have been done in this way but finding a simple internal design proved harder than I expected: I tried at least 5 or 6 designs before settling on the final approach.
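The effect of refreshing concurrently can be illustrated with a toy shell sketch. To be clear about the substitution: this uses background processes rather than threads, it says nothing about pizauth's internal design, and `refresh_account` and the account names are invented for the example.

```shell
#!/bin/sh
# Hypothetical sketch: background processes stand in for pizauth's threads.
# refresh_account DELAY_SECONDS ACCOUNT_NAME
refresh_account() {
    sleep "$1"              # simulate a slow (or stalled) OAuth2 server
    echo "$2 refreshed"
}

refresh_all() {
    # Start every refresh at once, then wait for all of them to finish.
    refresh_account 1 slow &
    refresh_account 0 fast &
    wait
}
```

Calling `refresh_all` reports the fast account first even though the slow one was started earlier, which is the property that matters: a stalled account no longer holds up the others.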

As before, testing and comments are hugely appreciated. pizauth now has a small but gradually growing user base, so I’m increasingly hoping that pizauth is not only close to having a stable interface, but is also stable when it runs.

Acknowledgements: thanks to Edd Barrett for comments.

2022-12-14 08:00

Footnotes

[1] This is nearly always the result of system administrators making two false assumptions. First, that the more restrictions they place on their network, the more secure it is (there are nearly always ways around such restrictions). Second, that any restriction is justified, no matter the inconvenience it causes users. I see this slightly less often than I did a few years ago, when I used to semi-regularly use networks that did things like drop all ssh traffic.
