Version 3 (modified by k1@…, 12 years ago)

Linux backups

/!\ The latest /home/ and /usr/local/data* backups are always available from any machine in /usr/local/backup/home/current/ and /usr/local/backup/data/current/.

The backup scripts are run on vastus from /etc/cron.d/backup. The scripts create files, based on the configurations in /etc/backup*conf, in the following directories:

 * /data/rsync
 * /data/rsync-data
 * /data/rsync-home
The /data/rsync-data and /data/rsync-home directories are exported via NFS and mounted as /usr/local/backup/. The /data/rsync directory contains system files and so is not exported.

The backup system is implemented with rsync over ssh and the magic of hard links. The backup server (vastus, which has a 1.1TB RAID array of storage space) runs a nightly cron job which runs a home-grown Perl script to manage the logistical details of the backup (creating directories, etc.), which in turn rsyncs the directories to be backed up. Using rsync saves a lot of both time and network traffic, as only the differences between modified files are sent (and sent efficiently via the rsync protocol/algorithm), using a 'copy' of the previous day's backup as the basis.

An interesting (and questionably intentional) quirk in rsync has it always remove (unlink) any file that has been modified. This is essential to the operation of our system, because the 'copy' of the previous day's backup used as the basis for the sync is not actually a copy at all: all of its files are hard links (pointers, aliases, etc.) to the same entries on disk. When rsync is about to modify a file, the aforementioned unlink merely removes that particular hard link, but (here's the kicker) the data is not actually deleted until all hard links are removed. This allows us to present a full snapshot (a full hierarchy) of the backup each day without duplicating files that haven't changed.

Nothing is without tradeoff, and this is no exception. If a file with multiple links were modified in place, the modification would be visible in every 'copy' (hard link) of that file, since they all point to the same data. Thus, no one but the administrators has access to the backups of the system files, and the data-dir and home-directory backups are exported via NFS as read-only (a good idea with NFS anyway). These NFS mounts of the backups are available at:

 * /usr/local/backup/home/
 * /usr/local/backup/data/
The exported directories (as well as the directories to be backed up) are enumerated in /etc/backup.conf, /etc/backup-data.conf, and /etc/backup-home.conf.
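
The danger of writing through a hard link, and why rsync's unlink-then-rewrite behavior is safe, is easy to demonstrate in a throwaway directory (an illustrative sketch, not part of the backup system):

```shell
#!/bin/sh
# Why the exported backup trees are read-only: hard links share one data blob.
set -eu
d=$(mktemp -d)

echo "v1" > "$d/snap1"
ln "$d/snap1" "$d/snap2"       # two names, one inode -- like two daily snapshots

# An in-place write through one name is visible through the other:
echo "scribbled" > "$d/snap1"
cat "$d/snap2"                 # prints "scribbled" -- yesterday's "copy" is gone

# rsync's unlink-then-rewrite, by contrast, leaves the other link untouched:
echo "v1" > "$d/snap1"         # restore the shared content
rm "$d/snap1"                  # unlink one name...
echo "v2" > "$d/snap1"         # ...and write a fresh file (a new inode)
cat "$d/snap2"                 # still prints "v1"
```

This is exactly the property the read-only NFS export protects: nothing a client does can reach the shared data behind the links.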

There is yet another level of backup onto removable Firewire hard drives, but this is still under development. The idea is that the two Firewire drives can be swapped every X days, and the one not in use can be taken home (with care -- hard drives are delicate machines!) by the systems administrator, so as to have a complete off-site backup that is at most X days old. To accomplish this, the most recent snapshot of the backup directories (i.e., the "current" symlink in each of the backup destination directories is dereferenced) is backed up manually at arbitrary intervals using:

nohup /usr/local/sbin/backup-cron-firewire <&- >/root/firewire.log 2>&1 &

The Firewire drive must be mounted on /firewire under the current configuration. If it is not already mounted, this involves powering up the drive (the big blue button on the front), reconnecting the Firewire cable (it needs to be disconnected and reconnected to notify the Firewire driver properly), and running /usr/local/sbin/
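
Since the backup silently lands in an empty /firewire directory on the root filesystem if the drive is not mounted, it may be worth guarding the launch; a sketch (the require_mount helper is hypothetical, mountpoint(1) is from util-linux):

```shell
#!/bin/sh
# Hypothetical guard around the Firewire backup launch: refuse to run
# unless the target is actually a mounted filesystem.
set -eu

require_mount() {
    # mountpoint(1) (util-linux) exits 0 iff its argument is a mount point
    if ! mountpoint -q "$1"; then
        echo "error: $1 is not a mounted filesystem" >&2
        return 1
    fi
}

# On vastus the launch from the text would then be guarded as:
#   require_mount /firewire && \
#       nohup /usr/local/sbin/backup-cron-firewire <&- >/root/firewire.log 2>&1 &
require_mount /    # demo: the root filesystem is always a mount point
```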

Windows Backup

Windows backups are kept on viscus and run from Task Scheduler. Users must keep their data in their profile or the C:\Users\ directory for it to be backed up.

To restore:

  1. Start->Run ntbackup and select a backup set on the left. Unfortunately it does not give any indication of the date of each backup set.
  2. Select a backup set (for example, \\ROSSE\users\) by clicking its checkbox. When ntbackup asks you to find the data, point it to F:\Backup Data\RosseDataBackup.bkf (or the appropriate file for your system).
  3. Select the files to restore via the annoyingly small left pane (it's often easiest to just restore everything, since it's a pretty fast disk copy).
  4. By default ntbackup will want to restore back to the "Original Locations" on that system. To put the files in a temporary location instead, create a directory, then click Advanced and tell it to restore to an "Alternate Location" such as F:\tmp\.
  5. Click through the wizard; the default settings are normally fine.
  6. Click OK when ntbackup asks again for the backup file (I think it asks in case your indexes are on disk but the data is on tape).
  7. Wait for the restore to complete.

We currently use Microsoft's built-in simple backup software. The current scheme is to have full backups done weekly and incrementals done daily. Only the two previous full backups are stored, giving access to data from up to two weeks prior; the oldest of these is deleted when a new full backup is stored. Two tools I've heard good reviews of are Robocopy and BackupPC; maybe someday we'll investigate those, but for now this is working fine.
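
The retention rule (keep the two newest full backups, delete older ones) can be sketched generically; the directory layout and full-YYYY-MM-DD.bkf naming below are hypothetical, and this is not how the ntbackup jobs themselves manage their files:

```shell
#!/bin/sh
# Sketch of the retention rule: keep the $keep newest full backups, delete
# the rest. File names and layout are hypothetical, for illustration only.
set -eu

prune_fulls() {
    keep=$1; dir=$2
    # Fulls named full-YYYY-MM-DD.bkf sort chronologically by name; reverse
    # the sort and everything past the $keep newest gets removed.
    ls "$dir"/full-*.bkf 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
        while read -r old; do rm -- "$old"; done
}

# Demo on a throwaway directory: four weekly fulls, prune down to two.
d=$(mktemp -d)
for day in 2006-07-15 2006-07-22 2006-07-29 2006-08-05; do
    : > "$d/full-$day.bkf"
done
prune_fulls 2 "$d"
ls "$d"            # only the two newest fulls remain
```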

Backups are on viscus, run from the Task Scheduler. To see a command, right-click a task and click Properties. An example single task, for the ROSSE machine, looks similar to this (lines broken for readability):

C:\WINNT\system32\NTBACKUP.EXE backup 
"@C:\Documents and Settings\Administrator\Local Settings\Application Data\Microsoft\Windows NT\NTBackup\data\Rosse Normal Backup.bks" 
/n "Rosse Normal Backup" /d "Rosse Normal Backup" 
/v:no /r:no /rs:no /hc:off /m normal 
/j "Rosse Normal Backup" 
/f "F:\Backup Data\RosseDataBackup.bkf"

The bks (backup script) files are very simple, for example:
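
The actual selection files are not reproduced here, but a .bks file is essentially a list of paths to back up, one per line (ntbackup stores it as Unicode text, with /Subdirs marking a recursive selection); hypothetically, a set covering the users share might contain just:

```
C:\Users\ /Subdirs
```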


Archives for Windows Backup

The backup data (in the form of .bkf files) for inactive machines is kept in the folder c:\archives\ on the Windows domain server Cor.

Here is a list of the machines that have archived backup data (as of 8/9/2006):


Please note that even though these machines are not generating new data (whether they are turned off or gone completely), the backup scripts that point to them are still intact should they ever be used again in the future.