well not exactly, but I am throwing a few more speak and spells onto the pile. I’m currently running a collection of 8 refurbished 4tb enterprise drives (the max the case can hold). I run snapraid to protect them with dual parity, leaving me with 6 drives for capacity. I’ve filled this homebrew NAS to within 1tb of full, so I need an upgrade. I kinda want to rebuild the system, but honestly it’s still healthy, so I’m just going to pull off an in-place upgrade and this post will detail how.
I intentionally keep this as simple as possible. I avoid ZFS and other such systems because this is ultimately a collection of smaller, independent resources, so in the event of a total catastrophic failure I still have a reasonable chance of recovering some useful data. so, how do I do it?
first up, the initial state:
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p2 234G 111G 111G 50% /
mergerfs 21T 21T 817G 95% /mnt/aggr1
/dev/sdg1 3.6T 3.3T 154G 96% /mnt/parity2
/dev/sdj1 3.6T 3.3T 192G 95% /mnt/data5
/dev/sdi1 3.6T 3.3T 192G 95% /mnt/data6
/dev/sdf1 3.6T 3.2T 221G 94% /mnt/data3
/dev/sdh1 3.6T 3.2T 220G 94% /mnt/data1
/dev/sde1 3.6T 3.2T 213G 94% /mnt/data4
/dev/sdc1 3.6T 3.3T 154G 96% /mnt/parity1
/dev/sdd1 3.6T 3.3T 200G 95% /mnt/data2
/dev/md0 916G 577G 294G 67% /mnt/raid1
the “array” is made up of the parity and data drives shown above (ignore /dev/md0, it’s a mirrored pair of SSDs for my docker test environment). these are simply ext4 mounts at /mnt/dataX and /mnt/parityX. ext4 was chosen for its simplicity, maturity and ubiquity; if I ever lose the system, the drives can be read by any linux machine. I’ll get into the data storage on these partitions later but, safety being the #1 goal, there are recovery options there as well.
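as a quick illustration of that escape hatch, any single data disk can be pulled and read on its own; a minimal sketch, assuming the disk shows up as /dev/sdb on whatever machine you plug it into (check lsblk first):
# read a single data disk on any linux box; /dev/sdb1 is an example device
sudo mkdir -p /mnt/rescue
sudo mount -o ro /dev/sdb1 /mnt/rescue
ls /mnt/rescue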
next, as mentioned above, I use SnapRAID to protect the data; it’s essentially a file-level RAID that runs on a schedule. instead of an array that does parity calculation at the block level, SnapRAID hashes the files and stores parity information on a separate disk (or disks). it’s functionally similar to a backup process in that it runs on a schedule to protect your data; it is not a backup though, as it does not duplicate data. snapraid is widely used and quite mature, so its example config is a perfect starting point and pretty much self-documents.
# Example configuration for snapraid
# Defines the file to use as parity storage
# It must NOT be in a data disk
# Format: "parity FILE [,FILE] ..."
#parity /mnt/diskp/snapraid.parity
parity /mnt/parity1/snapraid.parity
# Defines the files to use as additional parity storage.
# If specified, they enable the multiple failures protection
# from two to six level of parity.
# To enable, uncomment one parity file for each level of extra
# protection required. Start from 2-parity, and follow in order.
# It must NOT be in a data disk
# Format: "X-parity FILE [,FILE] ..."
#2-parity /mnt/diskq/snapraid.2-parity
#3-parity /mnt/diskr/snapraid.3-parity
#4-parity /mnt/disks/snapraid.4-parity
#5-parity /mnt/diskt/snapraid.5-parity
#6-parity /mnt/disku/snapraid.6-parity
2-parity /mnt/parity2/snapraid.parity
# Defines the files to use as content list
# You can use multiple specification to store more copies
# You must have least one copy for each parity file plus one. Some more don't hurt
# They can be in the disks used for data, parity or boot,
# but each file must be in a different disk
# Format: "content FILE"
#content /var/snapraid.content
#content /mnt/disk1/snapraid.content
#content /mnt/disk2/snapraid.content
content /var/snapraid.content
content /mnt/data1/.snapraid.content
content /mnt/data2/.snapraid.content
content /mnt/data3/.snapraid.content
content /mnt/data4/.snapraid.content
content /mnt/data5/.snapraid.content
content /mnt/data6/.snapraid.content
# Defines the data disks to use
# The name and mount point association is relevant for parity, do not change it
# WARNING: Adding here your /home, /var or /tmp disks is NOT a good idea!
# SnapRAID is better suited for files that rarely changes!
# Format: "data DISK_NAME DISK_MOUNT_POINT"
data d1 /mnt/data1/
data d2 /mnt/data2/
data d3 /mnt/data3/
data d4 /mnt/data4/
data d5 /mnt/data5/
data d6 /mnt/data6/
# Excludes hidden files and directories (uncomment to enable).
#nohidden
# Defines files and directories to exclude
# Remember that all the paths are relative at the mount points
# Format: "exclude FILE"
# Format: "exclude DIR/"
# Format: "exclude /PATH/FILE"
# Format: "exclude /PATH/DIR/"
exclude *.unrecoverable
exclude /tmp/
exclude /lost+found/
exclude appdata/
exclude *.!sync
# Defines the block size in kibi bytes (1024 bytes) (uncomment to enable).
# WARNING: Changing this value is for experts only!
# Default value is 256 -> 256 kibi bytes -> 262144 bytes
# Format: "blocksize SIZE_IN_KiB"
#blocksize 256
# Defines the hash size in bytes (uncomment to enable).
# WARNING: Changing this value is for experts only!
# Default value is 16 -> 128 bits
# Format: "hashsize SIZE_IN_BYTES"
#hashsize 16
# Automatically save the state when syncing after the specified amount
# of GB processed (uncomment to enable).
# This option is useful to avoid to restart from scratch long 'sync'
# commands interrupted by a machine crash.
# It also improves the recovering if a disk break during a 'sync'.
# Default value is 0, meaning disabled.
# Format: "autosave SIZE_IN_GB"
#autosave 500
# Defines the pooling directory where the virtual view of the disk
# array is created using the "pool" command (uncomment to enable).
# The files are not really copied here, but just linked using
# symbolic links.
# This directory must be outside the array.
# Format: "pool DIR"
#pool /pool
# Defines a custom smartctl command to obtain the SMART attributes
# for each disk. This may be required for RAID controllers and for
# some USB disk that cannot be autodetected.
# In the specified options, the "%s" string is replaced by the device name.
# Refers at the smartmontools documentation about the possible options:
# RAID -> https://www.smartmontools.org/wiki/Supported_RAID-Controllers
# USB -> https://www.smartmontools.org/wiki/Supported_USB-Devices
#smartctl d1 -d sat %s
#smartctl d2 -d usbjmicron %s
#smartctl parity -d areca,1/1 /dev/sg0
#smartctl 2-parity -d areca,2/1 /dev/sg0
I don’t need to change much; the important bits are adding the 2-parity drive option and ensuring each data disk is defined. I also put the content list on each of the drives plus a copy in /var; likely overkill but again, each individual drive carries as much recovery info as possible.
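with the config in place, day-to-day interaction is just a handful of standard snapraid commands, for example (the scrub percentage here is only an example):
snapraid diff          # preview what changed since the last sync
snapraid sync          # update parity and the content files
snapraid scrub -p 12   # re-verify a percentage of the array against its hashes
snapraid status        # overall health and scrub coverage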
so, that’s snapraid installed, but it doesn’t do anything on its own. typically a cron job (or scheduled task in windows) is set up to run your chosen snapraid commands. since moving to linux however, I have embraced the snapraid-runner script; here’s its config:
[snapraid]
; path to the snapraid executable (e.g. /bin/snapraid)
executable = /usr/local/bin/snapraid
; path to the snapraid config to be used
config = /etc/snapraid.conf
; abort operation if there are more deletes than this, set to -1 to disable
deletethreshold = 250
; if you want touch to be ran each time
touch = true
[logging]
; logfile to write to, leave empty to disable
file = /var/log/snapraid/snapraid.log
; maximum logfile size in KiB, leave empty for infinite
maxsize = 5120
[email]
; when to send an email, comma-separated list of [success, error]
sendon = error
; set to false to get full program output via email
;short = false
subject = [SnapRAID] Status Report:
from = [email protected]
to = [email protected]
; maximum email size in KiB
maxsize = 1024
[smtp]
host = smtp.snand.org
; leave empty for default port
port =
; set to "true" to activate
ssl = true
tls = true
user = [email protected]
password = IMNOTTHATDUMB
[scrub]
; set to true to run scrub after sync
enabled = true
; scrub plan - either a percentage or one of [bad, new, full]
plan = 12
; minimum block age (in days) for scrubbing. Only used with percentage plans
older-than = 10
once again, only minimal changes were needed to the configuration here. I basically just added my email details, changed some of the log sizes and made sure the scrub settings were appropriate.
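the runner itself still gets kicked off by cron; a sketch of the entry, assuming the script lives in /opt/snapraid-runner and the config above is saved as /etc/snapraid-runner.conf (paths and schedule are examples, check the script’s --help for the exact flags):
# /etc/cron.d/snapraid-runner -- nightly run at 03:30 as root
30 3 * * * root /usr/bin/python3 /opt/snapraid-runner/snapraid-runner.py -c /etc/snapraid-runner.conf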
so, the data is protected; now how do I access it? I’ll leave the NFS/Samba configuration out of this for now, that’s its own post if folks are interested. in fact, I’m planning on moving to NFSv4, so perhaps I’ll write that up then. for now, let’s move on to MergerFS, which is used to aggregate the individual data drives into one larger “volume.”
mergerfs is a FUSE-based union filesystem. to put it simply, it’s what allows me to aggregate multiple drives into one single path. it does this in userspace, so the underlying filesystem can be whatever you like; it simply stitches multiple paths into one. see this example, stolen straight from their github page:
A        +       B        =       C
/disk1           /disk2           /merged
|                |                |
+-- /dir1        +-- /dir1        +-- /dir1
|   |            |   |            |   |
|   +-- file1    |   +-- file2    |   +-- file1
|                |   +-- file3    |   +-- file2
+-- /dir2        |                |   +-- file3
|   |            +-- /dir3        |
|   +-- file4        |            +-- /dir2
|                    +-- file5    |   |
+-- file6                         |   +-- file4
                                  |
                                  +-- /dir3
                                  |   |
                                  |   +-- file5
                                  |
                                  +-- file6
this all operates at the file level and simply aggregates paths. files end up spread across the entire ‘array’, so each folder may have files on multiple drives. each drive/path may only hold part of a folder’s contents, but the paths are correct and so the tree can be manually rebuilt if necessary (ask me how I know). in the end, it’s just a nice luxury to only worry about my giant blob of data rather than juggle a bunch of individual drives.
back to my previous df -h output:
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p2 234G 111G 111G 50% /
mergerfs 21T 21T 817G 95% /mnt/aggr1
/dev/nvme0n1p1 511M 6.1M 505M 2% /boot/efi
/dev/sdg1 3.6T 3.3T 154G 96% /mnt/parity2
/dev/sdj1 3.6T 3.3T 192G 95% /mnt/data5
/dev/sdi1 3.6T 3.3T 192G 95% /mnt/data6
/dev/sdf1 3.6T 3.2T 221G 94% /mnt/data3
/dev/sdh1 3.6T 3.2T 220G 94% /mnt/data1
/dev/sde1 3.6T 3.2T 213G 94% /mnt/data4
/dev/sdc1 3.6T 3.3T 154G 96% /mnt/parity1
/dev/sdd1 3.6T 3.3T 200G 95% /mnt/data2
/dev/md0 916G 573G 297G 66% /mnt/raid1
make note of all the partitions mounted at /mnt/data*. once mergerfs is installed, all you need to do to combine these drives is add a single line to /etc/fstab:
/mnt/data* /mnt/aggr1 fuse.mergerfs direct_io,defaults,allow_other,category.create=mfs,moveonenospc=true,minfreespace=50G,fsname=mergerfs 0 0
in the above df output, the mergerfs filesystem mounted at /mnt/aggr1 (the name is a NetApp joke for nerds) is the result.
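if you’re setting this up fresh, bringing it up after editing fstab is just a mount away (these paths are from my setup):
sudo mkdir -p /mnt/aggr1
sudo mount /mnt/aggr1     # picks up the new fstab entry
df -h /mnt/aggr1          # should report the combined size of the data drives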
this has many other benefits for me as well. for example, since only the drive actually in use needs to be available, it can save a bit of juice by not keeping the entire array spinning. to be fair, it carries its limitations as well, but none that concern me at the moment, so it’s been a perfect solution for years.
that’s the current state, or was when I started this novel anyway. I was nearly out of space, my server had all its drive bays full and then some (I’ve got 2 SSDs taped to the side as well). time for the upgrade I’ve been putting off for too long.
first, some backstory. when I started this project years ago I had 1 or 2TB consumer-grade drives, which I would generally upgrade as they failed. over time I settled on these refurbs because I’d lost so many drives that I was pretty confident in my ability to rebuild. back then, 4tb was a pretty hefty drive, so refurbished was the only way I could make it work. I fully expected to be replacing them frequently, so I even made sure I kept a spare on hand.
all but one consumer-grade drive eventually failed, so the plan has been to start swapping out these 4tb drives as they fail. only problem is, these damn things just won’t die. I have a perfectly running refurb drive with nearly 50,000 power_on_hours, and it sat in a case that would constantly overheat for YEARS. it pains me to upgrade a perfectly good drive, but it’s time, and right now it seems that 10TB drives are the best bang for my buck, so three of them arrived at my door.
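for anyone curious about their own drives, those hours come straight from SMART; /dev/sdc here is only an example device:
sudo smartctl -A /dev/sdc | grep -i power_on_hours   # SMART attribute 9 on most ATA drives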
the upgrade process was easy, and this post is long enough, so I’ll get to the point. the upgrade goes as follows (a rough sketch of the commands follows the list):
- format, mount new drive to a temporary mountpoint.
- rsync data from old drive to new
- rsync again to ensure no changes/missed files
- edit /etc/fstab, comment out old drive, add mountpoint for new drive
- reboot
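per drive, that works out to something like this; a sketch only, with example device names, label and temporary mountpoint (the new disk gets partitioned first):
# new 10TB drive shows up as /dev/sdk in this example and already has a single partition
sudo mkfs.ext4 -L data1 /dev/sdk1
sudo mkdir -p /mnt/newdata1
sudo mount /dev/sdk1 /mnt/newdata1

# first pass moves the bulk of the data, second pass catches stragglers
sudo rsync -aHAX --info=progress2 /mnt/data1/ /mnt/newdata1/
sudo rsync -aHAX --info=progress2 --delete /mnt/data1/ /mnt/newdata1/

# then swap the /etc/fstab entries (old drive commented out, new drive on /mnt/data1) and reboot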
here are my fstab entries. I like to mount disks by their ID; that way, if I move them to a different SATA port or even pull them out and use them in a USB enclosure, they still mount properly.
#/dev/disk/by-id/ata-Hitachi_HUS724040ALE641_PAKM9NLS-part1 /mnt/parity1 ext4 defaults 0 0
#/dev/disk/by-id/ata-ST4000DM004-2CV104_WFN2AZ0Z-part1 /mnt/parity2 ext4 defaults 0 0
/dev/disk/by-id/ata-HGST_HUH721010ALE600_JEGTGTBZ-part1 /mnt/parity1 ext4 defaults 0 0
/dev/disk/by-id/ata-HGST_HUH721010ALE600_1SJ4X9WZ-part1 /mnt/parity2 ext4 defaults 0 0
#/dev/disk/by-id/ata-HGST_HMS5C4040BLE640_PL2331LAGW4HVJ-part1 /mnt/data1 ext4 defaults 0 0
/dev/disk/by-id/ata-Hitachi_HUS724040ALE641_PAJXAUMX-part1 /mnt/data2 ext4 defaults 0 0
/dev/disk/by-id/ata-HGST_HMS5C4040BLE640_PL2331LAGWLV3J-part1 /mnt/data3 ext4 defaults 0 0
/dev/disk/by-id/ata-Hitachi_HUS724040ALE641_PAJT7VHT-part1 /mnt/data4 ext4 defaults 0 0
/dev/disk/by-id/ata-ST4000DM004-2CV104_ZFN1K65B-part1 /mnt/data5 ext4 defaults 0 0
/dev/disk/by-id/ata-ST4000DM004-2CV104_ZFN1K6FF-part1 /mnt/data6 ext4 defaults 0 0
/dev/disk/by-id/ata-HGST_HUH721010ALE600_JEH4AH2M-part1 /mnt/data1 ext4 defaults 0 0
/mnt/data* /mnt/aggr1 fuse.mergerfs direct_io,defaults,allow_other,category.create=mfs,moveonenospc=true,minfreespace=50G,fsname=mergerfs 0 0
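to find those by-id names in the first place, they’re just symlinks under /dev/disk/by-id that point back at whatever /dev/sdX the drive currently is:
ls -l /dev/disk/by-id/ata-*-part1   # serial-based names -> current sdX1 devices
sudo blkid /dev/sdk1                # sanity-check the filesystem on the new drive (example device)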
I was well into this plan when it was time for vacation. I’d made it through replacing one parity drive and one data drive, but they were sitting in a SATA enclosure next to the server; this was enough to at least get the array back online. I knew I was risking data redundancy if the new data drive grew beyond the smaller 4tb parity capacity (snapraid parity disks need to be equal to or larger than the largest data drive), but since there was no incoming data I figured I’d be fine.
and thankfully I was, but I don’t like leaving things in a less-than-ideal state, and with the basement disaster I was sweating a little. I have since completed the upgrade and moved on to a backup project.
here’s the current state:
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p2 234G 110G 112G 50% /
mergerfs 28T 20T 5.9T 78% /mnt/aggr1
/dev/sdc1 9.1T 4.1T 4.6T 47% /mnt/parity1
/dev/sdh1 3.6T 3.2T 237G 94% /mnt/data6
/dev/sdf1 9.1T 4.1T 4.6T 47% /mnt/parity2
/dev/sdg1 9.1T 4.0T 4.6T 47% /mnt/data1
/dev/sdi1 3.6T 3.2T 220G 94% /mnt/data5
/dev/sdj1 3.6T 3.2T 227G 94% /mnt/data2
/dev/sdd1 3.6T 3.2T 240G 94% /mnt/data4
/dev/sde1 3.6T 3.2T 244G 94% /mnt/data3
/dev/md0 916G 577G 294G 67% /mnt/raid1
you can see I’ve got both parity drives and one data drive upgraded; they are out of the USB enclosure and in the correctly labeled drive bays. the /mnt/aggr1 mountpoint now shows nearly 6tb free, enough space for my newest vacation toy (which will also get its own post).
reflecting on this, I’m actually a little proud of myself, this garbage endured and my plans were successful. can’t wait to finish this next project.