I have made a point of backing up just about everything I do for a long time. Why? Backup systems (especially automated ones) will eventually be helpful no matter what you do. Whether it is changing or deleting some file that you did not have under version control (oops), hardware going bad, software going bad, hardware getting lost.. I have had all of them, but I have yet to lose a significant amount of data once I started taking backups consistently.

To keep the text to a reasonable length, I am not going to talk much about hardware, mainly about software and how it has been used. While I do have funny numbers about the storage of my 90s backup systems, they are perhaps better left for another post (or to history). I am also skipping how I have backed up (and continue to back up) non-essential devices, e.g. phones, tablets, gaming PC, ..

3-2-1 rule

Attributed to Peter Krogh, it is a relatively reasonable basis for a backup strategy:

  • 3 copies of data
  • 2 different media
  • 1 copy offsite

I have tried to follow this, to some extent, although admittedly my backup strategy is considerably weirder. And I think it is missing something: I believe it should say encrypted copies, as dealing with non-encrypted backups is simply a bad idea.

History of my backup approaches

In the early 90s and before, I did not really consistently make backups. Due to that, I lost some things I would still rather have.

Manual backups from the mid-90s until 2000

After that, I think I mostly took manual zip, cpio or tar backups of data I cared about, and I did not deal with it particularly consistently. Most of the time I had about 2 copies of the data, but there were no really catastrophic losses. I still have some of those backup files floating around.

2000+ a script enters the picture (for UNIX use)

I created a script called perform_backup.sh which literally evolved over 20 years until I stopped using it a few years ago. The basic idea of the script was to create a gzipped tar archive, which was then encrypted using gpg. The main command within was as follows (in the last 2020 edition; obviously the paths varied a bit over time):

# Create a gzipped tar of selected system directories and most of the home
# directory (excluding nobackup/share/.local), then encrypt and sign it with gpg
sudo nice tar clfz - /boot/grub2 /etc /var /usr/local \
  `find /home/mstenber/ -mindepth 1 -maxdepth 1 -print | egrep -v '(/nobackup|/share|/.local)'` \
  | nice gpg --passphrase-fd 3 -es -r fingon -u AUTOMATIC 3< $HOME/.pgppass \
  > $STOREDIR/${CRYPT_BACKUP_BASENAME}

and beyond that it mostly did some housecleaning (e.g. grabbing dpkg package selections on Debian hosts).
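
That particular bit of housecleaning boils down to the standard dpkg commands; a minimal sketch (the output filename here is made up):

# Save the list of installed packages next to the backup archive
dpkg --get-selections > "$STOREDIR/dpkg-selections.txt"

# On a fresh Debian install, the selections can be restored with:
#   dpkg --set-selections < dpkg-selections.txt
#   apt-get dselect-upgrade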

I made copies of the backups to both my personal as well as networked file servers (using rsync over ssh), and no data was lost (that I noticed, anyway). I used a similar script for both my personal and my work laptop, and also minor variations of it elsewhere.
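
The copying itself was nothing fancy, just rsync over ssh, roughly along these lines (host and path names here are made up):

# Push the encrypted archives to a file server; only changed files get transferred
rsync -av -e ssh "$STOREDIR/" backupuser@fileserver.example.com:backups/$(hostname)/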

The key downside of this approach was still that backups were made manually (and therefore rarely) by invoking the script, and they were not incremental, so they took a long time if the dataset was large. However, the 3-2-1 rule was more or less fulfilled for those backups (as I had a number of (full) local backups, and typically some offsite network backups, and those were frequently backed up too by whoever ran the file server).

A lot of ‘not interesting’ stuff was in nobackup folders, which were not backed up at all.

2007+ Mac backups - Apple Time Machine

I switched to using Macs as my personal machines around 2006 or so, after having dabbled with them a bit before (~2001-2002, when OS X initially came out).

This was the first time I had actual real-time backups - when they worked, anyway. The first few years were a bit shaky, but from the start I dedicated a (directly connected) hard drive for Time Machine use. But in addition to that, I also kept taking manual backups to a local NAS box and external USB drives, as well as making Time Machine backups to the NAS too. At the time I had a relatively lousy dedicated NAS box (Buffalo Terastation), but a few years later (2012), after some really bad experiences with btrfs, I wound up with my current NAS setup.

I have been taking Time Machine backups to both a local disk as well as the NAS (2 copies out of the box), and then manually taking backups to drives that are not usually onsite. So the 3-2-1 rule was covered here too, kind of (‘different media’ is a bit questionable).
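
For what it is worth, macOS supports multiple Time Machine destinations side by side; setting that up is just a couple of tmutil calls, roughly like this (the volume name is an example):

# Add the NAS share as an additional destination (-a appends instead of replacing)
sudo tmutil setdestination -a /Volumes/TimeMachineNAS

# Check which destinations are configured, and kick off a backup manually
tmutil destinationinfo
tmutil startbackup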

2012+ NAS with ZFS

I added ZFS and its snapshots to my NAS around 2012. Initially I used raidz (with 4 to 6 disks), but over the years the price of storage decreased and the duration of ZFS resilvering got longer and longer, so I eventually moved to just RAID-10 (mirrored disks, striped), which performed better than raidz, and resilvering was also faster.
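
In ZFS terms that ‘RAID-10’ layout is just a pool of striped mirror vdevs; creating one looks roughly like this (pool and device names are made up):

# Two mirrored pairs, striped together; resilvering a failed disk only needs to
# copy from its mirror partner, instead of reading the whole pool as with raidz
zpool create tank mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd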

Taking automated zfs snapshots of (mainly) backups felt a bit weird, but my zfs pool was always much larger than the amount of data I had in the NAS, so using the extra space for snapshots was quite nice.
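
The snapshots themselves are nearly free to take; an automated snapshot job essentially boils down to this (the dataset name is hypothetical):

# Take a timestamped snapshot of the backups dataset; old ones can later be
# pruned with zfs destroy
zfs snapshot tank/backups@auto-$(date +%Y-%m-%d)
zfs list -t snapshot -r tank/backups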

2021+ Dedicated backup tool(!)

I had looked at dedicated backup software every now and then, but my occasional manual script use on machines with little disk + Time Machine for my main datasets + rsync had kept my data more or less safe for 20 years at that point. However, as the script was not incremental, I had started just rsyncing systems to backup systems instead of using the script.

The downside with rsync was that I had to trust the destination systems; I avoided this somewhat by using a combination of sshfs for access and encfs for filesystem encryption (on the client end), which kept the backups relatively safe both on the remote system, as well as in backups of the remote system.
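
In practice this meant stacking two FUSE filesystems, roughly like this (hosts and mount points are made up):

# Mount the remote backup directory locally over ssh
sshfs backupuser@fileserver.example.com:backups/laptop /mnt/remote-ciphertext

# Layer encfs on top: the ciphertext lives on the remote side, the plaintext
# view only ever exists on the client
encfs /mnt/remote-ciphertext /mnt/backup-plain

# rsync against the plaintext view; only encrypted data reaches the server
rsync -av ~/important/ /mnt/backup-plain/important/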

I wanted a version history of the backups, and while I rotated backup media, it felt somewhat inelegant. So, I started looking yet again at backup software..

I eventually settled on restic/restic: Fast, secure, efficient backup program. It fulfilled all of my requirements:

  • various backends where it can store the backups (I use mostly the local file backend, as well as over-ssh backend)
  • incremental backups
  • encryption
  • performance (it scans quite fast for incremental changes)

and I transitioned to it over 2021 and 2022. There are also some bonus aspects I have come to appreciate:

  • it deals with both small and huge files quite well (packing small files to packs, splitting large files to pieces)
  • deduplication across all hosts that take backups to the same location (I have some files on a number of machines but my backups store them only once)
  • compression (admittedly most of my big files are my photos, which do not compress, but it is the thought that counts)
  • quite robust development style - I have yet to encounter a bug

Now I use it (and Time Machine) to back up almost all of my systems.
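
Day-to-day use is just a handful of restic invocations, roughly like this (the repository paths are examples):

# One-time repository setup (a local path here; an sftp: URL works the same way over ssh)
restic -r /mnt/nas/restic-repo init

# Incremental, encrypted, deduplicated backup of the home directory
restic -r /mnt/nas/restic-repo backup ~/ --exclude ~/nobackup

# Inspect the history, or restore the latest snapshot somewhere
restic -r /mnt/nas/restic-repo snapshots
restic -r /mnt/nas/restic-repo restore latest --target /tmp/restore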

Current backup flow

  • Time Machine backs up my Macs to a local SSD (and to the NAS, when it is up; I fire it up every week or two at least)
  • I manually run the cu command a couple of times a week (really, a horribly complicated nested set of zsh functions; a simplified sketch follows after this list), which:
    • bidirectionally synchronises configuration files of my main machine with other machines (both local and remote), as well as to a local git store of configuration
    • takes restic backups of my most valuable data (on my local laptop as well as some other network hosts, e.g. home router 2023) to a non-local networked storage location (which also has its own backups)
    • takes restic backups of all my data on my home machines to the NAS (if the NAS is available and I am at home)
  • I take a monthly offsite backup of the NAS, which I rotate
  • Once or twice per year I take an offsite backup from the NAS backup disk set to another location, and rotate that too
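
I will not inflict the real cu functions on anyone, but the restic part of them amounts to a thin wrapper, conceptually something like this (function name, hosts and paths are made up for illustration):

# Hypothetical, heavily simplified sketch of the restic wrapper
backup_to() {
  local repo=$1; shift
  restic -r "$repo" backup "$@" --exclude-file ~/.backup-excludes
}

# Most valuable data to the non-local networked storage
backup_to sftp:backuphost:/restic/valuable ~/Documents ~/projects

# Everything to the NAS, but only when it is reachable
ping -c 1 nas.local >/dev/null 2>&1 && backup_to /Volumes/nas/restic-full ~/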

So far, I think I have only once looked at the backups of the NAS (mostly because they cover historic periods that are no longer stored on the NAS, or in the ZFS snapshots within). But the local-ish backups (e.g. Time Machine, or the always-available restic backup) I seem to be using once per month or so, mostly when I have done something I shouldn’t have done on one of my machines.

I have not had a proper hardware failure in over 10 years; we shall see how long this record lasts, but when those happened in the past, I also did not lose data, thanks to the number of backups floating around.

I am wondering about getting rid of the physical offsite and offsite^2 backups, as they are some hassle, but having them makes me feel safer, so I guess that is a small price to pay for my peace of mind. That said, Amazon’s Glacier is getting cheap enough these days that the temptation is definitely there to stop playing with physical hardware just for backups.