Syncing two backup drives, take two: ZFS

Sep 26, 2023

Following up on my previous setup for syncing external data backups: my fear of bitrot keeps growing as my data becomes more fragmented, spends more time at rest, and gets copied more often through RAM that is sometimes non-ECC. So I’m moving my external drives to ZFS, which is nothing short of magic.

I’ve used ZFS via TrueNAS for years for “hot”, bulk storage data on spinning rust, but it never occurred to me that I could use it on my externals as well. And because my data on cold backups is very important, I’m opting to mirror two drives.

Setting up the main storage pool

  • zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase main/encrypted
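
The command above assumes that a pool named main already exists. A minimal sketch of creating it, by analogy with the backup pool further down: the device path is a hypothetical placeholder, the single-disk layout is an assumption, and the function only prints the commands unless DRY_RUN=0 is set, because zpool create wipes the disk it is given.

```shell
#!/bin/sh
# MAIN_DISK is a hypothetical placeholder; substitute your own device path.
MAIN_DISK="${MAIN_DISK:-/dev/disk/by-id/EXAMPLE-MAIN-DISK}"

# Print each command by default; run it for real only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

create_main_pool() {
    # Single-disk pool named "main" (an assumption mirroring the backup pool).
    run zpool create main "$MAIN_DISK" || return 1
    # The encrypted dataset, with the same options as the command above.
    run zfs create -o encryption=on -o keylocation=prompt \
        -o keyformat=passphrase main/encrypted
}

create_main_pool
```

Run it once as-is to review the printed commands, then again with DRY_RUN=0 (as root) once the device path is right.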

Check encryption and dataset creation:

  • zfs list -r main
  • zfs get encryption main/encrypted

Loading keys and mounting:

  • zfs load-key -r main/encrypted (check key status using zfs get keystatus main/encrypted)
  • zfs mount -a

Unmounting or unloading keys:

  • zfs unmount main/encrypted
  • zfs unload-key -r main/encrypted
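
The load and unload steps can be wrapped in two small helpers. This is only a sketch: the function names unlock_and_mount and lock are made up for illustration, and the ZFS variable exists just so the zfs binary can be swapped out.

```shell
#!/bin/sh
# ZFS defaults to the real binary; override it to point at a stub if needed.
ZFS="${ZFS:-zfs}"

# Load the key only if it is not loaded yet, then mount everything.
unlock_and_mount() {
    ds=$1
    status=$("$ZFS" get -H -o value keystatus "$ds") || return 1
    if [ "$status" != "available" ]; then
        # Prompts for the passphrase (keylocation=prompt).
        "$ZFS" load-key -r "$ds" || return 1
    fi
    "$ZFS" mount -a
}

# Unmount the dataset and drop its key from memory.
lock() {
    ds=$1
    "$ZFS" unmount "$ds" || return 1
    "$ZFS" unload-key -r "$ds"
}
```

Usage would be unlock_and_mount main/encrypted before working with the data and lock main/encrypted afterwards.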

Setting up the backup storage pool

  • zpool create backup /dev/sdb

Backing up to the backup pool

With zfs send --raw (-w), encrypted datasets are sent exactly as they are stored on disk, so the data stays encrypted in transit. zfs receive then creates an encrypted dataset at the destination without ever needing the key, which may be desirable if the destination is offsite or an untrusted storage provider.

A non-raw zfs send is also an option: the data is decrypted first, sent unencrypted in transit, and encrypted again at the destination dataset. However, raw and non-raw streams cannot be mixed, so for a given backup destination the choice must be made at the very first backup; it cannot be changed for later incremental snapshots.

For first backup

The first backup has to be a full send of a snapshot.

  • Take a snapshot of the main pool using zfs snapshot -r main@root.
  • zfs send -w -R main@root | pv | zfs receive -F backup to force-write this snapshot into the backup pool.
  • Verify that the destination pool now has the encrypted dataset: zfs list -r backup

Because the stream is raw, the data stays encrypted in transit and at the destination, so there is no need to load the key at all!
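
The first backup can be sketched as one function. The name first_backup and the ZFS override variable are made up for illustration; the pool names main and backup are the ones used above, and pv is treated as optional.

```shell
#!/bin/sh
# ZFS defaults to the real binary; override it to point at a stub if needed.
ZFS="${ZFS:-zfs}"

first_backup() {
    snap=$1
    # Recursive snapshot of every dataset in the main pool.
    "$ZFS" snapshot -r "main@$snap" || return 1
    # Raw (-w), replicated (-R) full send; pv only adds a progress display.
    if command -v pv >/dev/null 2>&1; then
        "$ZFS" send -w -R "main@$snap" | pv | "$ZFS" receive -F backup
    else
        "$ZFS" send -w -R "main@$snap" | "$ZFS" receive -F backup
    fi || return 1
    # Show encryption and key status for everything that was received.
    "$ZFS" get -r -H -o name,property,value encryption,keystatus backup
}
```

Calling first_backup root reproduces the steps above. After a raw receive, the final listing should show encryption enabled on backup/encrypted without its key having been loaded.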

For subsequent backups

  • zpool import the main and backup pools
  • Take a snapshot of the main pool using zfs snapshot -r main@newsnapshot (-r recursively creates snapshots of all datasets)
  • Find the most recent snapshot that both main and backup have using zfs list -t snapshot main backup; for example, say oldsnapshot is the common one
  • zfs send -w -R -I main@oldsnapshot main@newsnapshot | zfs receive backup to incrementally transfer everything between the two snapshots. (Note: I added -w for --raw here, but once this is done it has to be done forever, because raw and non-raw sends cannot be mixed for the same destination.) Pipe through pv if you want to see progress.
  • Delete any old snapshots (except the latest common ancestor) using zfs destroy -r main@nolongerneeded
  • Export the pool before removing the hardware: zpool export backup
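
Finding the latest common snapshot by eye gets tedious, so here is a minimal sketch that automates it. The function names and the ZFS override variable are made up for illustration, and it only compares snapshots of the two pool root datasets, assuming recursive snapshots share the same name everywhere.

```shell
#!/bin/sh
# ZFS defaults to the real binary; override it to point at a stub if needed.
ZFS="${ZFS:-zfs}"

# Print the snapshot names (the part after "@") of one dataset, oldest first.
snapshot_names() {
    "$ZFS" list -H -t snapshot -d 1 -o name -s creation "$1" | sed 's/^[^@]*@//'
}

# Print the newest snapshot of $1 that also exists on $2 (nothing if none).
latest_common_snapshot() {
    dst_snaps=$(snapshot_names "$2")
    snapshot_names "$1" | while IFS= read -r s; do
        if printf '%s\n' "$dst_snaps" | grep -qxF -- "$s"; then
            printf '%s\n' "$s"
        fi
    done | tail -n 1
}

# Snapshot main, then send everything since the common snapshot to backup.
incremental_backup() {
    new=$1
    old=$(latest_common_snapshot main backup)
    if [ -z "$old" ]; then
        echo "no common snapshot found; a full send is needed first" >&2
        return 1
    fi
    "$ZFS" snapshot -r "main@$new" || return 1
    "$ZFS" send -w -R -I "main@$old" "main@$new" | "$ZFS" receive backup
}
```

With both pools imported, incremental_backup newsnapshot covers the snapshot, lookup, and send steps; cleaning up old snapshots and exporting the pool are still manual.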

Note: Never alter data in the backup destination pool, or later snapshot sends will fail. If you are using encrypted streams, -F will fail and you have to roll back all the changes in backup manually. Use zfs rollback backup@lastbackupsnapshot to roll back the data.
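
One caveat worth sketching: zfs rollback works on one dataset at a time (its -r flag destroys newer snapshots; it does not recurse into child datasets). A minimal loop over every dataset in the pool, assuming each one has a snapshot with the same name; rollback_all and the ZFS override variable are made-up names.

```shell
#!/bin/sh
# ZFS defaults to the real binary; override it to point at a stub if needed.
ZFS="${ZFS:-zfs}"

# Roll every filesystem and volume under $1 back to the snapshot named $2.
rollback_all() {
    pool=$1
    snap=$2
    "$ZFS" list -H -o name -r "$pool" | while IFS= read -r ds; do
        # Stop at the first failure so problems are not silently skipped.
        "$ZFS" rollback "$ds@$snap" || exit 1
    done
}
```

For example, rollback_all backup lastbackupsnapshot would cover both backup and backup/encrypted.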

Notes

  • Use ECC RAM if you can. ZFS checksums catch corruption on disk, but they cannot catch data that was already corrupted in memory before it was checksummed.
  • zfs receive does not recover gracefully from a damaged stream. This makes zfs send output piped to a file dangerous: if the file gets corrupted and zfs receive cannot import it, the saved output is completely useless.
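
If a stream does have to live in a file, storing a checksum next to it at least lets you detect damage before relying on it. A minimal sketch, assuming GNU sha256sum is available; the function names and the example path are hypothetical. A checksum only detects corruption; it cannot repair the stream.

```shell
#!/bin/sh
# Write <file>.sha256 next to a saved stream file.
save_checksum() {
    (
        cd "$(dirname "$1")" || exit 1
        sha256sum "$(basename "$1")" > "$(basename "$1").sha256"
    )
}

# Return 0 only if the file still matches its stored checksum.
verify_checksum() {
    (
        cd "$(dirname "$1")" || exit 1
        sha256sum -c --quiet "$(basename "$1").sha256"
    )
}
```

For example, after zfs send -w -R main@root > /some/path/main-root.zfs, run save_checksum /some/path/main-root.zfs, and run verify_checksum on the same path before ever feeding the file to zfs receive.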

References