December 10 2005
My life sometimes feels overly peripatetic. One way in which I feel this pinch is that I regularly use three computers: my desktop at home, my laptop, and my desktop at work. I also have a server at work (on which you are probably reading this) and a server at home. This creates an obvious problem in synchronizing files between machines. When I was only regularly using two machines, I used the traditional approach to the problem, which was to manually copy files (using scp) from machine to machine. This was a pain, and even with just two machines I occasionally overwrote files with old versions, or went on a trip only to discover I didn't have the latest version of a particular file on my laptop. With more than two computers the problem becomes disproportionately more difficult.
My solution to this problem is not a novel one, but nor does it seem to be particularly well known. I had the germ of the idea around three years ago, and largely got it working before finding that Joey Hess had already eloquently described most of the important steps; I used some of Joey's ideas to refine my setup. The idea that's fairly completely described by Joey is 'use version control to store files in your home directory.'
Version control systems such as CVS are generally used so that multiple developers can work on the same source code and share their changes with each other in a controlled fashion. As this implies, on each developer's machine lies a (largely) identical copy of the shared source code. However, there's no reason to restrict this to being of use only when multiple people are involved. If one has multiple computers, using version control software simply means that each contains an identical copy of shared files.
The benefits of taking this approach are, from my experience, almost impossible to overstate. My life has not only become much easier because the chance for mistakes is greatly reduced, but I've also been able to be far more cavalier about moving between machines, adding new machines to my menagerie, and even simply reinstalling existing machines. Of course for most normal people out there, this won't be an advantage at all since it fulfils a need you don't have, and uses a mechanism you won't want to understand, but if you're a serious computer user I think you should consider it.
I suspect one of the reasons why this method is rarely used - I know a grand total of one person in real life who uses something approaching this technique - is because of the use of "version control system" in the above text. Version control software is traditionally scary (most of the tools were one or more of big, slow, and unreliable), and of course it is seen as being applicable only to source code.
In practice, even with simple tools, neither of these points is valid. Using this technique does require some thought, and it does take getting used to, but once one is used to it, the benefits significantly outweigh the disadvantages. One thing that's interesting is that I see the list of pros, cons and irrelevancies a little differently from Joey and the authors of other similar write-ups.
So now that I've been using this technique for a few years I feel that I have a few useful suggestions for anyone tempted to go down this highly recommended route.
Use a commonly available version control system. At some point you will probably want to ensure that you can synchronize your data on a machine where it might be a liability to have unusual software. I use CVS since most of its (well known) deficiencies relate to problems encountered with multiple developers. The only significant remaining pain relates to directory handling and renaming files, and I can live with that, as annoying as it is. An oft-used alternative is Subversion but I wouldn't touch that with a barge pole, since it appears to be a project with the limited ambition of just replacing CVS. Unfortunately while they fixed some of CVS's more obvious deficiencies, they've introduced some tear-inducingly stupid new flaws. I've seen several corrupted repositories because using BSD-DB or similar for a storage backend is an obviously bad move. At some point, one of the more advanced systems like Darcs or bzr might be well known enough to use here. But not for a few years yet, I suspect.
Think before you name and add files. Especially with CVS, renaming of files and directories is a slow and tedious task. But no matter what your system, a useful consequence of using this approach is that you will probably carry a copy of every file you add to your repository for life. If you choose an inappropriate name in haste, or locate a file in an inappropriate location, you will make life difficult for yourself in the long run. A corollary of this is that the layout of the top-level directories in your home directory is extremely important. I have the following:
.private only gets checked out on trusted machines).
Divide your binary data into three types. Since binary data tends to be much bigger than text files, I split binary data into three groups:
Some lateral thinking can lead to useful savings in terms of the amount of binary data you store. For example I store only the large versions of my photos in my repository, but I've set up Makefiles so that the thumbnails and web pages that allow one to sensibly view these files are created after checkout (or after any changes to the photos). Although the saving of around 15% that I get in this particular case might not seem very significant, it translates to a useful saving when checking out a fresh repository or manipulating files, because binary data tends to dwarf textual data in size.
E-mail is special. Using either version control or the binary data technique outlined for e-mail would be masochistic. I use OfflineIMAP to synchronize my e-mail because it's better suited to the task and I use some other useful tricks on it (which I will document in a later entry).
Automate your setup. I have a couple of small scripts which make my life a lot easier. The first is an obvious one which I call cvssync (not the best name in retrospect) and which takes one of two arguments: ci or up. It goes through all my various CVS modules and updates them or commits the changes, runs some Unison commands, calls my cvsfix script (see Joey's article for suggestions on what this should do), and performs a few other minor tasks, none of which I need to explicitly remember.
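A minimal sketch of what a wrapper in the spirit of cvssync might look like, written in Python purely for illustration. It is not the script described above: the module directories, the Unison profile name, the commit message, and the order of the steps are all assumptions. The plan function only builds the list of commands, so it can be inspected without touching any repository; main runs them in order.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical names, for illustration only: replace with your own
# CVS module directories (relative to $HOME) and Unison profiles.
MODULES = ["bin", "docs", "src"]
UNISON_PROFILES = ["photos"]


def plan(action, modules, profiles, home):
    """Return (working_directory, argv) pairs for one sync run, without running them."""
    if action == "up":
        # -d creates directories that are new in the repository,
        # -P prunes directories that have become empty.
        cvs_args = ["cvs", "-q", "update", "-d", "-P"]
    elif action == "ci":
        # The commit message is an arbitrary placeholder.
        cvs_args = ["cvs", "-q", "commit", "-m", "cvssync"]
    else:
        raise ValueError(f"expected 'ci' or 'up', got {action!r}")
    steps = [(home / module, cvs_args) for module in modules]
    # -batch makes Unison run without asking questions.
    steps += [(home, ["unison", profile, "-batch"]) for profile in profiles]
    # Assumes a cvsfix script is on the PATH; what it does is up to you.
    steps.append((home, ["cvsfix"]))
    return steps


def main(argv):
    """Run every planned step in order; argv is expected to look like sys.argv."""
    if len(argv) != 2:
        print("usage: cvssync ci|up", file=sys.stderr)
        return 2
    try:
        steps = plan(argv[1], MODULES, UNISON_PROFILES, Path.home())
    except ValueError as err:
        print(err, file=sys.stderr)
        return 2
    for cwd, cmd in steps:
        # check=True stops at the first failing step rather than carrying on.
        subprocess.run(cmd, cwd=cwd, check=True)
    return 0
```

To use this as a command, one would call sys.exit(main(sys.argv)) at the bottom of the file. Keeping the command list separate from its execution is a design choice of the sketch: it makes it easy to print the plan first when adding a new module or profile.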
The second script is much less obvious: I call it
Create a complete backup before you try this. Trust me on this one. At first you will either forget to add files, not add them correctly, not fully understand the software you're using, or suffer some similar problem. If you have a backup you can fix these problems with little penalty; after a month or so without problems, you may well feel comfortable discarding the backup.
Copyright © 1995-2010 Laurence Tratt laurie@tratt.net