Hi all, first off, here is an overview of my server:
Unraid version: 6.9.2 (trial)
Plugins: CA (Community Applications), FCP (Fix Common Problems), My Servers, Nerd Tools, Tips and Tweaks, Unassigned Devices, Dynamix System Buttons
No Docker containers or VMs
Hardware: ASUS P8B-E/4L motherboard, Xeon E3-1220 V2 CPU @ 3.10GHz, 16 GB ECC RAM, 6 disks total, all connected to the onboard Intel C204 chipset controller
Array:
Parity: (1) WD60EFZX - 6TB
Data disks: (4) WD30EFRX - 3TB
Cache: (1) Samsung 870 EVO 500GB
Background: I originally set this hardware up about 9 years ago with Windows Server 2012 Essentials, and it ran pretty much fine until recently. The OS drive (a Samsung 840 Pro) appeared to develop a disk error and became unbootable. I restored the backup to a new 870 EVO, but then one of the disks in the storage pool (a WD30EFRX) started having problems. Since I knew I was overdue to move to an OS that wasn't 10 years old, I decided to scrap the whole setup and try Unraid instead of another version of Windows Server.
I set up the Unraid server last week with a new WD60EFZX (6TB) as the parity drive and the 870 EVO (basically brand new) as the cache drive. I used 4 WD30EFRXs as the data disks. During the first parity check, one of the WD30EFRXs had read errors. I checked all the cables and swapped that disk for another of my WD30EFRXs (I have 6 in total), making sure not to use the one that had given errors in the old Windows Server setup. This time the parity check came back clean, so I started the array and began restoring all my data.
Problem: After I restored all my data, things seemed to be running well. Then I started setting up backups to iDrive using their Linux scripts. I got it mostly figured out and ran two selective backups (as tests) successfully. On the third one, the server hung and I lost connectivity to the WebGUI. I logged into the console directly but could not get it to shut down cleanly. I read a lot of pages explaining how to do this, but apparently not all of the commands ran successfully before I rebooted.
After the reboot, a parity check started immediately. During that check, Disk 3 began showing a large number of read errors. At roughly 48k read errors, Unraid disabled the disk, and the rest of the parity check finished with no other errors (all other disks show as healthy). I stopped the array, downloaded the diagnostics, then shut down the server and checked all the cables. I restarted the server and tried to run an extended SMART test on Disk 3, but the test had barely started before it aborted with errors.
Unfortunately, I did not capture diagnostics before the unclean restart, but I have attached the diagnostics from after the parity check, along with the SMART test report for Disk 3.
Questions:
1) Based on the attached reports, is Disk 3 definitely dead? (I assume it is.)
2) Assuming yes to 1, is this the correct procedure for replacing it? https://wiki.unraid.net/Manual/Storage_Management#Replacing_failed.2Fdisabled_disk.28s.29
3) Should I assume the remaining old WD30EFRXs are likely to fail soon as well? So far I have lost 3 of the 6. I am currently running extended SMART tests on the others.
4) My plan is to add a second parity disk (another WD60EFZX 6TB) and a second cache disk (another 870 EVO 500GB), and to replace Disk 3 with a new WD60EFZX 6TB. But I am wondering whether the other 3 WD30EFRXs are ticking time bombs and I should just decommission and replace all of them.
That's all I can think of for now. I really appreciate any help and insights. I'm new to Unraid and currently drinking from the fire hose of the manual, guides, etc., but I'm still very green at this point!
bnt-unraid-smart-20220223-0909.zip
bnt-unraid-diagnostics-20220223-0908.zip