Hello unRAID Community!
I had the misfortune of having one of my disks fail over the weekend. The series of events:
OLD Disk 3 shows up as disabled in the unRAID GUI with a red cross and is reported as failed by SMART.
The array is stopped and OLD Disk 3 is replaced with NEW Disk 3 in the chassis.
The array is started and the data rebuild begins. Shortly after, unRAID reports read errors on Disk 2. Everything still seems fine: the array looks OK and the rebuild is progressing.
Some time later during the rebuild, Disk 2 shows up both as a green disk in the array and as an Unassigned Device with the option to mount it. At this stage I touch nothing and let the rebuild finish. See the link for how it looked with the Unassigned Device. Note that this picture was taken at a later stage, after Disk 2 had failed and NEW Disk 3 had already been rebuilt; Disk 2 was green during the rebuild. (And yes, I know my disks are too hot at the moment, but that has nothing to do with this topic. I am not questioning the drive failure itself or the reason behind it.) https://imgur.com/a/1aknzNP
When the rebuild of NEW Disk 3 is done, it turns green in the array. I SSH into the server, go to /mnt/disk3/movies/, and ls gives me "Structure needs cleaning".
The unRAID forums tell me this is caused by XFS corruption, so I start the array in maintenance mode and run an XFS repair.
After stopping the array from maintenance mode, the Start array button is gone and the only buttons visible are shutdown and reboot. This seems to be a bug. I reboot the server without first saving the diagnostics logs, so I do not have them.
After the reboot it is possible to start the array. Running ls in /mnt/disk3/movies/ now works, but all data previously on OLD Disk 3 is gone; the disk is practically empty.
Shortly after this, Disk 2 gets disabled with a red cross, but SMART shows Pass.
XFS Repair gets stuck on:
Phase 1 - find and verify superblock...
couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!
attempting to find secondary superblock...
...found candidate secondary superblock...
unable to verify superblock, continuing...
....found candidate secondary superblock...
unable to verify superblock, continuing...
I'm left with data loss on two disks.
I basically have three questions here:
Why did I lose all the data on Disk 3? I rebuilt it and repaired XFS, so why is the data gone? Did the weird behavior shown by Disk 2 actually mean that I lost it during the rebuild, and that having only one parity disk led to data loss?
What do I do with the currently disabled Disk 2? Should I rebuild it? If it actually failed during the rebuild of NEW Disk 3, I guess the data on that one is gone as well?
I still have OLD Disk 3 (I do not know whether it is actually broken). Can I add it as an Unassigned Device and decrypt it in an attempt to retrieve data from it?
Thanks in advance!