Automating ZFS pool maintenance


Updated: 2014-01-27

If you 'value your data', as some people would say, there are not a lot of filesystems you should entrust it to. The main Linux and Windows filesystems are not fit for it, since they were not designed to handle things like bit rot. "But I have a pimped-out RAID 1/5/6 array!", you might say. Well, that won't help either; in fact, it might even make things worse. If all this sounds new or unlikely to you, Ars Technica has an excellent writeup on the so-called 'next-gen' filesystems like ZFS and Btrfs, which also points out the weaknesses of traditional filesystems like NTFS or ext4.

ZFS was developed by Sun Microsystems and first released in 2005. I will not expand on its features and possibilities; it is a buzzword in its own right, so there are plenty of sources that thoroughly explain what it can do. The Ars Technica article, for example, is a good start.

I/O issues with my Samsung HD204UI drives corrupted quite a bit of data. Ext4 was unable to detect this, so I decided it was time to move to a better filesystem, with data integrity as a strong requirement. Although Btrfs might be the more obvious choice on Linux, at the time it still had some catching up to do with ZFS in both features and stability, so I decided on ZFS. I started out with the FreeBSD-based ZFSguru, which is a nice solution for people wanting to test ZFS. It offers easy setup and a handy web interface, but its rather limited set of add-ons means you need to muck around on the command line if you want anything non-standard. That considerably reduces maintainability in the long run, so I then moved to Debian's GNU/kFreeBSD port. I finally settled on the regular GNU/Linux flavour, with the native ZFS port developed by the Lawrence Livermore National Laboratory.

At the moment my server harbours two ZFS mirror pools with two 4 TB 5K4000 drives each, the mirror disks being in eSATA enclosures. At first I ran ZFS scrubs manually, but that is neither feasible nor reliable in the long run. To automate most ZFS maintenance, I have written a script that does the following:

  • Run a scrub on every pool every two weeks. I expected cron to be able to handle the two-week interval, but this does not seem to work reliably: a step such as sat/2 in the day-of-week field counts weekdays, not weeks.
  • E-mail a report when errors are detected
  • Simplify bringing the mirror disks of the pools online and offline
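
The two-week interval from the first item can be enforced in the script rather than in cron. What follows is only a minimal sketch, assuming cron calls the script every Saturday at the same time and the script itself decides whether this is a scrub week. The function name is_scrub_week and the even/odd week rule are illustrative assumptions, not taken from the actual script:

```shell
# Hypothetical helper: succeed only in every other week.
# Accepts an optional Unix timestamp (defaults to now) so it can be tested.
is_scrub_week() (
    now=${1:-$(date +%s)}
    # Whole weeks elapsed since the Unix epoch; even-numbered weeks scrub.
    week=$(( now / 604800 ))
    [ $(( week % 2 )) -eq 0 ]
)

# Usage (commented out so that sourcing this has no side effects;
# "tank" is a placeholder pool name):
# is_scrub_week && zpool scrub tank
```

Epoch weeks roll over on Thursdays at 00:00 UTC, so two consecutive Saturday runs always land in different weeks and the check alternates as intended.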

ZFS offers two administration tools: zfs and zpool. Their syntax is not complex, but they do require you to provide drive names for tasks such as onlining disks in a pool, and using block device names like sda is highly discouraged because the drive mapping is not guaranteed to stay the same across reboots. My script uses a separate configuration file in which you define the following:

  • Language (English by default, but Dutch if preferred)
  • Pool names
  • Mirror disk names
  • E-mail address to send reports to
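
To give an idea of what such settings could look like, here is a minimal sketch. Every variable name, pool name and disk ID in it is a made-up placeholder, not the actual contents of zpm.conf. The point it illustrates is that mirror disks are referred to by their stable /dev/disk/by-id names rather than by sdX names:

```shell
# Hypothetical zpm.conf-style settings (all names and values are placeholders).
LANGUAGE="en"
POOLS="tank backup"
MAILTO="root@example.org"

# Map a pool name to its mirror disk, using a stable /dev/disk/by-id name.
mirror_disk_for() {
    case "$1" in
        tank)   echo "/dev/disk/by-id/ata-EXAMPLE_DISK_SERIAL_A" ;;
        backup) echo "/dev/disk/by-id/ata-EXAMPLE_DISK_SERIAL_B" ;;
        *)      echo "unknown pool: $1" >&2; return 1 ;;
    esac
}

# Print (but do not run) the zpool command that would bring the mirror
# disk of a pool online or take it offline.
mirror_command() (
    case "$1" in
        online|offline) ;;
        *) echo "action must be online or offline" >&2; exit 1 ;;
    esac
    disk=$(mirror_disk_for "$2") || exit 1
    echo "zpool $1 $2 $disk"
)
```

Calling mirror_command offline tank prints the command instead of executing it, which makes it easy to check the mapping before letting a script touch a pool. The exact device name zpool expects depends on how the pool was created, so compare it with the names shown by zpool status.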

The script sources the configuration file and will display its usage if called with no arguments:

root@amalthea:~# zpm
Possible arguments:
* scrub                Start a ZFS scrub on all pools defined in the array
* report               Check pool status and send an e-mail report to a predefined address
* online $pool         Bring the mirror disk of said pool online
* offline $pool        Take the mirror disk of said pool offline
Configuration file is /etc/default/zpm.conf.

The e-mail functionality requires a way to send mail from the server. I picked sSMTP for this, a lightweight send-only Sendmail replacement that relays messages to an existing SMTP server. Configuration is pretty straightforward and excellently documented online.
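
For illustration, the report step could look roughly like the sketch below. It relies on zpool status -x, which prints "all pools are healthy" when there is nothing to report, and it assumes a mail command (for example from bsd-mailx) is installed to hand the message to sSMTP. The function names and the fallback address are hypothetical:

```shell
# Hypothetical check: does this `zpool status -x` output warrant a report?
# Returns 0 (report needed) unless the output is the all-clear message.
needs_report() {
    [ "$1" != "all pools are healthy" ]
}

# Hypothetical report step; the fallback recipient is a placeholder.
# Assumes a `mail` command is available on the system.
send_report() (
    status=$(zpool status -x 2>&1)
    if needs_report "$status"; then
        printf '%s\n' "$status" |
            mail -s "ZFS pool problem on $(hostname)" "${1:-root@example.org}"
    fi
)

# Usage (commented out so that sourcing this has no side effects):
# send_report "admin@example.org"
```

Keeping the decision in a small function of its own means it can be tested with sample text, without needing a degraded pool.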

To automate the scrubs and e-mail reports, we add some cron entries. These are the ones I have:

root@amalthea:~# crontab -l|grep zpm
00 02 * * sat/2 /bin/bash /usr/local/bin/scripts/zpm scrub
00 17 * * sat/2 /bin/bash /usr/local/bin/scripts/zpm report

As you might gather from the cron entries, I tried to make the jobs run every second Saturday, but as mentioned earlier cron ignores my wishes. Since scrubs can take a long time on bigger drives, it's wise to leave a safe margin and run the report about half a day later. Once a scrub has been running for a few minutes, zpool status usually shows a reasonably reliable estimate of how long it is going to take. Use this to determine your margin.
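
One way to make the margin less critical is to let the report step check whether a scrub is still running. A minimal sketch, assuming the zpool status output contains the phrase "scrub in progress" while a scrub runs; verify this against the output of your own ZFS version. The function name is hypothetical:

```shell
# Hypothetical helper: succeed if the given `zpool status` text
# reports a scrub that is still running.
scrub_running() {
    printf '%s\n' "$1" | grep -q "scrub in progress"
}

# Usage (commented out so that sourcing this has no side effects;
# "tank" is a placeholder pool name):
# if scrub_running "$(zpool status tank)"; then
#     echo "Scrub on tank still running, postponing report" >&2
# fi
```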