Sharing a recent experience that might help others.
The tl;dr
- Don't panic, unless your drive is clickety-clunking and you don't have offline/offsite backups of the important stuff.
- Don't use 8-year-old drives.
- Spend the couple of minutes it takes to configure notifications.
- Knowing how to use dd or ddrescue, mount and xfs_repair can save the day.
I had an old (8-year-old) 750GB drive disappear from its slot in the array. I power cycled the server. The drive still wasn't in the array, but it did show up under Unassigned Devices. I mounted it from there to check its contents: empty. The SMART report showed a bunch of bad sectors.
I unmounted it, stopped the array and tried to re-add the drive. The drive was assigned to the slot, but unRAID couldn't mount it and offered to format it, warning that doing so would erase its contents. I also noticed that some files from recent work had disappeared from the array, so I knew unRAID wasn't emulating the contents of the failed disk. My best guess is that in its dying moments the disk somehow lost its contents, and unRAID dutifully updated parity to reflect the 'deleted' files.
I have offline/offsite backups of the critical stuff, so I knew I'd only lost data I could re-obtain from other sources. However, I didn't have a firm understanding of what had been lost, so I set about getting the data back.
I replaced the 750GB drive with a new 3TB drive. Once the new drive was up and running, I used dd to create an image file of the failed disk on it, since it had plenty of free space for a 750GB file.
dd if=/dev/sdm of=/mnt/disk2/drive.img
I left the copy running overnight. Luckily dd worked fine. If you have ddrescue on your system, it may give better results for the same job, since it's built to work around read errors rather than stop at them.
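A minimal sketch of the imaging step, assuming a bash shell with GNU coreutils and, optionally, GNU ddrescue installed. The function name `image_disk` and the `.map` file placed next to the image are hypothetical choices, not part of the original procedure. By default plain dd aborts at the first unreadable sector, so the fallback adds `conv=noerror,sync`: `noerror` keeps going after read errors, and `sync` pads unreadable blocks with zeros so everything after them stays at the right offset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a failing disk (or any file) to an image file.
# Prefers GNU ddrescue, which records progress in a map file so an
# interrupted copy can be resumed; falls back to plain dd otherwise.
image_disk() {
    local src="$1" dst="$2"
    local map="${dst}.map"   # assumption: keep the map file next to the image

    if command -v ddrescue >/dev/null 2>&1; then
        # -r3: make up to 3 retry passes over areas that failed to read
        ddrescue -r3 "$src" "$dst" "$map"
    else
        # noerror: continue after read errors
        # sync:    pad unreadable blocks with zeros so offsets stay aligned
        dd if="$src" of="$dst" bs=64K conv=noerror,sync status=none
    fi
}

# Example with the device and path from the post (double-check the device!):
# image_disk /dev/sdm /mnt/disk2/drive.img
```

One caveat on the dd fallback: `sync` also pads a short final block, so if the disk's size isn't a multiple of `bs`, the image comes out slightly larger than the disk.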
I tried mounting the image file:
mkdir /mnt/loop
mount -o ro,loop,offset=32256 /mnt/disk2/drive.img /mnt/loop
But mount failed, saying the structure needed cleaning first.
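The `offset=32256` value (taken from the major.io article) is the byte position of a partition that starts at sector 63 on a disk with 512-byte sectors: 63 × 512 = 32256. It only works if the imaged disk's partition really starts there. A disk partitioned with 4K alignment would typically start at sector 64 instead, i.e. offset 32768. Assuming util-linux `fdisk` is available, `fdisk -l /mnt/disk2/drive.img` lists the partition's Start sector and the sector size. The hypothetical helper below just does the arithmetic:

```shell
#!/usr/bin/env bash
# Hypothetical helper: byte offset of a partition inside a whole-disk image.
#   $1 = partition start sector (e.g. the Start column of `fdisk -l image`)
#   $2 = sector size in bytes (optional, defaults to 512)
partition_offset() {
    local start="$1" sector_size="${2:-512}"
    echo $(( start * sector_size ))
}

# partition_offset 63   -> 32256 (classic MBR layout, the value used in the post)
# partition_offset 64   -> 32768 (4K-aligned layout)
```

Alternatively, `losetup --find --show --partscan` on the whole image asks the kernel to expose its partitions as `/dev/loopNp1` and so on, which avoids computing the offset by hand.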
So I attached the image to a loop device instead:
losetup --offset 32256 /dev/loop2 /mnt/disk2/drive.img
(thanks to https://major.io/2010/12/14/mounting-a-raw-partition-file-made-with-dd-or-dd_rescue-in-linux/)
Then I tried to repair it:
xfs_repair /dev/loop2
This failed. xfs_repair reported that the log held metadata changes that should first be written to disk by mounting the filesystem, and that if the filesystem couldn't be mounted, the -L option would purge (zero) the log.
xfs_repair -L /dev/loop2
The log was purged and xfs_repair ran to completion. It reported some errors, but it did finish.
Finally, I could mount the image:
mount /dev/loop2 /mnt/loop
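The loop-device, repair and mount steps above can be sketched as a single bash function. This assumes util-linux `losetup` and xfsprogs are installed and uses the image path and 32256 offset from the post. The names `run`, `repair_xfs_image` and `DRY_RUN` are hypothetical. The sketch departs from the commands in the post in three ways. It asks `losetup --find --show` for a free loop device instead of hard-coding /dev/loop2. It runs a no-modify check (`xfs_repair -n`) first. It stops rather than running `xfs_repair -L` automatically, because zeroing the log can discard the most recent metadata changes. Since xfs_repair writes to the image in place, keeping an untouched copy of drive.img before repairing is also worth considering if space allows.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: attach a disk image to a loop device, check and repair
# the XFS filesystem on it, then mount it read-only to copy data off.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repair_xfs_image() {
    local img="$1" offset="$2" mnt="${3:-/mnt/loop}"
    local dev

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ losetup --find --show --offset $offset $img"
        dev=/dev/loopN   # placeholder: nothing is attached in a dry run
    else
        # Use the first free loop device instead of hard-coding /dev/loop2
        dev=$(losetup --find --show --offset "$offset" "$img") || return 1
    fi

    # No-modify check: reports problems without writing to the image
    run xfs_repair -n "$dev"

    # Normal repair; xfs_repair refuses to run if the log is dirty
    if ! run xfs_repair "$dev"; then
        echo "xfs_repair failed on $dev. If it asks for the log to be replayed" >&2
        echo "and the filesystem won't mount, 'xfs_repair -L $dev' zeroes the" >&2
        echo "log as a last resort (recent metadata changes may be lost)." >&2
        return 2
    fi

    run mkdir -p "$mnt"
    run mount -o ro "$dev" "$mnt"
}

# Example, using the paths from the post:
# DRY_RUN=1 repair_xfs_image /mnt/disk2/drive.img 32256 /mnt/loop
```

Mounting read-only at the end is a cautious choice for a recovery: the goal is to copy data off the image, not to keep writing to it.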
The good news: the bulk of my data was there. Almost my entire directory structure was back, along with a bunch of random things in lost+found that I can sort through on a rainy day.
The above procedure could (and probably should) be done on a separate system booted from a Linux USB stick. Unfortunately I didn't have one available with enough space, so I had to use my unRAID box.
I've now correctly configured notifications to alert me of disk warnings, and will be proactively replacing old (>5yr) disks. Lesson learned.
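unRAID's own notification settings are the main fix here. As a supplementary sketch, assuming the attribute table printed by smartmontools' `smartctl -A` for an ATA/SATA drive, the hypothetical function below flags nonzero raw counts for attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable). It also flags a Power_On_Hours (attribute 9) value above a threshold. The default threshold is roughly 5 years: 5 × 365 × 24 = 43800 hours. Attribute meanings and raw-value formats vary by vendor, so treat this as a rough check, not a verdict.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag warning signs in `smartctl -A` output read from stdin.
#   $1 = power-on-hours limit (optional, defaults to 43800, about 5 years)
smart_warnings() {
    local max_hours="${1:-43800}"
    awk -v max_hours="$max_hours" '
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th field.
        # Adding 0 keeps only the leading number (e.g. "12345h+05m" -> 12345).
        $1 ~ /^[0-9]+$/ {
            raw = $10 + 0
            if (($1 == 5 || $1 == 197 || $1 == 198) && raw > 0)
                printf "WARN %s=%d\n", $2, raw
            if ($1 == 9 && raw > max_hours)
                printf "WARN %s=%d (over %d hours)\n", $2, raw, max_hours
        }'
}

# Example: smartctl -A /dev/sdm | smart_warnings
```

Something like this could run from a scheduled script, with its output fed into whatever notification channel is already configured.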
My experience may be a cautionary tale about using inexpensive (desktop) disks, particularly old ones. But hey, that's why there's an I in unRAID.