Last modified on 09/26/06 13:58:30

Linux backups

/!\ The latest Linux backups are always available at


The latest /home/ and /usr/local/data* backups are available from any machine in /usr/local/backup/home/current/ and /usr/local/backup/data/current/.

The backup scripts are run on vastus from /etc/cron.d/backup. The scripts create files based on configurations in /etc/backup*conf in the following directories:


The /data/rsync-data and /data/rsync-home directories are exported via NFS and mounted as /usr/local/backup/. The /data/rsync directory contains system files and so is not exported.

The backup system is implemented with rsync over ssh and the magic of hard links. The backup server (vastus, which has a 1.1TB RAID array of storage space) runs a nightly cron job that runs a home-grown perl script to manage the logistical details of the backup (creating directories, etc.); the script in turn rsyncs the directories to be backed up. Using rsync saves both time and network traffic, as only the differences between modified files are sent (and sent efficiently via the rsync algorithm), using a 'copy' of the previous day's backup as the basis.

The scheme relies on how rsync updates files: by default it does not modify a changed file in place, but writes a new file and renames it over the old name, which removes (unlinks) the old directory entry. This is essential to the operation of our system, because the 'copy' of the previous day's backup used as the basis for the sync is not actually a copy at all: all the files are hard links (additional names) to the same data on the hard drive. When rsync replaces a modified file, the unlink merely removes that particular hard link, and (here's the kicker) the data is not actually deleted until all hard links to it are removed. This allows us to present a full snapshot (full hierarchy) of the backup each day without having to duplicate files that haven't changed.
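The hard-link mechanics described above can be reproduced with a few coreutils commands. The following is only a sketch, not the perl script used on vastus: the function names, file names, and the temporary directory are hypothetical, and rsync's write-then-rename behaviour is imitated with mv rather than by running rsync itself.

```shell
#!/bin/sh
# Sketch of the hard-link snapshot idea (hypothetical names throughout).

# Create a new snapshot as a tree of hard links to the previous one.
link_snapshot() {
    prev=$1
    new=$2
    cp -al "$prev" "$new"    # GNU cp: -a keeps attributes, -l links instead of copying
}

# Replace a file by writing a new file and renaming it over the old name,
# imitating rsync's default update. Other links to the old data are untouched.
replace_file() {
    target=$1
    content=$2
    tmp="$target.tmp.$$"
    printf '%s\n' "$content" > "$tmp"
    mv -f "$tmp" "$target"
}

demo() {
    root=$(mktemp -d)
    mkdir "$root/day1"
    echo "version 1" > "$root/day1/notes.txt"
    echo "unchanged" > "$root/day1/static.txt"

    link_snapshot "$root/day1" "$root/day2"
    replace_file "$root/day2/notes.txt" "version 2"

    cat "$root/day1/notes.txt"           # the older snapshot keeps the old content
    cat "$root/day2/notes.txt"           # the newer snapshot has the new content
    stat -c %h "$root/day2/static.txt"   # link count of the unchanged, shared file
    rm -rf "$root"
}

demo
```

The unchanged file occupies disk space only once even though it appears in both snapshot directories, which is what makes a full daily hierarchy affordable.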

Nothing is without tradeoff, and this is no exception. If a file with multiple links were modified in place, the modification would be visible in every 'copy' (hard link) of that file, since they all point to the same data. Thus, no one but the administrators has access to the backups of the system files, and the data directory and home directory backups are exported via NFS read-only (which is a good idea with NFS anyway). These NFS mounts of the backups are available at:


and the selection of these directories (as well as of the directories to be backed up) is enumerated in /etc/backup.conf, /etc/backup-data.conf, and /etc/backup-home.conf.
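The hazard that motivates the read-only exports is easy to demonstrate. This is a minimal sketch with hypothetical file names in a temporary directory: an in-place write through one hard link shows up under every other name of the same file.

```shell
#!/bin/sh
# Sketch: an in-place edit through one hard link changes every "copy".
# File names are hypothetical; nothing here touches the real backups.
in_place_hazard() {
    dir=$(mktemp -d)
    echo "original" > "$dir/snap1.txt"
    ln "$dir/snap1.txt" "$dir/snap2.txt"          # second name for the same data
    echo "edited in place" >> "$dir/snap2.txt"    # appends to the shared data
    cat "$dir/snap1.txt"                          # the edit is visible here too
    rm -rf "$dir"
}

in_place_hazard
```

Because ordinary tools such as editors or shell redirection may write in place like this, keeping the snapshots read-only is the simplest way to keep older snapshots trustworthy.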

There is yet another level of backup onto removable Firewire hard drives, but this is still under development. The idea is that the two Firewire drives can be swapped every X days, and the one not in use can be taken home by the systems administrator (with care: hard drives are delicate machines!) so as to have a complete off-site backup that is at most X days old. To accomplish this, the most recent snapshot of each backup directory (i.e., the "current" symlink in each of the backup destination directories is dereferenced) is backed up manually at arbitrary intervals using:

nohup /usr/local/sbin/backup-cron-firewire <&- >/root/firewire.log 2>&1 &
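The dereferencing of the "current" symlink mentioned above can be illustrated as follows. This is a sketch under assumptions, not an excerpt from backup-cron-firewire: the function name, the dated directory name, and the temporary directory are hypothetical, and readlink -f is the GNU coreutils form.

```shell
#!/bin/sh
# Sketch: find the real snapshot directory behind a "current" symlink.
# resolve_current and the directory layout are hypothetical.
resolve_current() {
    readlink -f "$1/current"
}

demo_resolve() {
    dest=$(mktemp -d)
    mkdir "$dest/2006-09-26"                # a dated snapshot directory (made-up name)
    ln -s 2006-09-26 "$dest/current"        # "current" points at the newest snapshot
    resolve_current "$dest"                 # prints the dated directory's full path
    rm -rf "$dest"
}

demo_resolve
```

Copying the resolved directory rather than the symlink means the Firewire drive receives the snapshot's actual files instead of a dangling link.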

Under the current configuration, the Firewire drive must be mounted on /firewire. If it is not already mounted, this involves powering up the drive (the big blue button on the front), reconnecting the Firewire cable (it needs to be disconnected and reconnected so that the Firewire driver is notified properly), and running /usr/local/sbin/

Windows Backup

This section has been moved to WindowsBackupDetails.