ridewithjoe (January 4, 2018): I have a dilemma I need some help resolving. I have an array with 16 drives. About 3 times in 4 months, drive 15 has dropped out and needed a rebuild. Generally this is not a problem, but the drive tests good, so I have been waiting for further symptoms before diagnosing it. It may or may not be related, but I was about 3% into the rebuild and lost drive 11. Now my challenge is how to get the array online without compromising data. A second parity drive would obviously have covered this, but unfortunately I only have a single parity drive at the moment. Screenshot and diags attached if anyone has any thoughts. nasvm-diagnostics-20180104-1848.zip
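Why one parity drive can't cover a second failure mid-rebuild: as far as I understand it, unRAID's first parity disk holds a byte-wise XOR (even parity) of the data disks. The toy Python sketch below uses made-up two-byte "disks"; the names `DISKS`, `PARITY`, `xor_blocks` and `rebuild` are hypothetical and this is not unRAID's code. It only illustrates the principle, and the second parity disk uses a different calculation that isn't shown here.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (even parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical 2-byte "blocks" from three data disks, invented for illustration.
DISKS = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
PARITY = xor_blocks(DISKS)

def rebuild(disks, parity, missing):
    """XOR the parity with every surviving disk.

    With one missing disk this yields that disk's exact contents; with two
    missing disks it only yields the XOR of both, which can't be split apart.
    """
    survivors = [d for i, d in enumerate(disks) if i not in missing]
    return xor_blocks(survivors + [parity])

print(rebuild(DISKS, PARITY, {1}) == DISKS[1])     # True: one failure is recoverable
print(rebuild(DISKS, PARITY, {0, 1}) == DISKS[0])  # False: two failures are not
```

With two disks gone, the result is `DISKS[0] XOR DISKS[1]`, which is exactly the information a second, independently computed parity disk would be needed to untangle.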
JorgeB (January 5, 2018): Disk11 dropped offline, so there's no SMART report for it. Post new diags after power cycling the server. Also, since it's using a fiber connection, the SMART reports look like those for SAS drives; I'd like to see a standard SATA SMART report.
ridewithjoe (January 5, 2018): Swapped drive 11 with drive 2. Diags attached. Interesting that the system sees it now. nasvm-diagnostics-20180105-1043.zip
JorgeB (January 5, 2018): There are no pending sectors, but SMART shows a triple-digit raw read error rate. That's not good on a WD disk: anything above single digits is bad news. You can try re-enabling that disk and rebuilding disk15 again to see how it holds up, but only if the array has been 100% unchanged (this includes Docker and/or VMs using the array) since before the second disk got disabled.
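The check above reads two attributes from the SMART report: pending sectors and the raw read error rate. A minimal Python sketch of that check, assuming the attribute table layout printed by `smartctl -A` for SATA disks (RAW_VALUE as the 10th column, plain integer values). The sample text is invented, not taken from this thread's diagnostics, and `raw_value` and `wd_read_errors_suspicious` are hypothetical helpers. The threshold is the rule of thumb for WD disks given above, not a vendor specification; some vendors (Seagate, for example) encode this raw value differently, so it wouldn't apply to them.

```python
# Hypothetical excerpt in the layout of `smartctl -A` output for a SATA disk.
# The values are invented for illustration.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       312
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
"""

def raw_value(smart_text, attribute):
    """Return the RAW_VALUE column (10th field) for the named attribute, or None."""
    for line in smart_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == attribute:
            return int(fields[9])  # assumes a plain integer raw value
    return None

def wd_read_errors_suspicious(raw):
    """Rule of thumb for WD disks: anything above single digits is bad news."""
    return raw >= 10

rate = raw_value(SAMPLE, "Raw_Read_Error_Rate")
pending = raw_value(SAMPLE, "Current_Pending_Sector")
print(rate, pending, wd_read_errors_suspicious(rate))  # 312 0 True
```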
ridewithjoe (January 5, 2018): OK, I was able to re-enable the disk in the JBOD controller, and I disabled SMART monitoring for now. I disabled Docker and VM support in unRAID. All the drives are online again and I'm rebuilding drive 15. It will take a while, but by tomorrow my two new 8TB drives should arrive: one to replace drive 11 and one as a second parity drive to reduce the chances of this happening again. I love unRAID, but when things go wrong it's nerve-racking. I should have looked a little deeper instead of panicking; I could have done this yesterday. 2% so far... 17 hours to go. Thanks for the guidance.
ridewithjoe (January 8, 2018): OK, I rebuilt drive 15. By then my 2 new 8TB drives had arrived, so I replaced the bad drive 11, added a second parity drive, and rebuilt those. Now the array is all in sync, but I fear I have corruption. I cannot add, change, or remove shares, and my appdata share gives an I/O error when I try to add or change anything. Diagnostics attached. I'm not sure what my viable options are at this point. I'm guessing drive 11 and possibly drive 15 are corrupted. nasvm-diagnostics-20180107-1959.zip
JorgeB (January 8, 2018): Disk15 at least has filesystem corruption. Run xfs_repair, and it's best to reboot when it's done to clear the logs. If there are more issues, grab and post new diagnostics. http://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_XFS
ridewithjoe (January 8, 2018): Done. Disk 15 was the only one that seemed to have a lot of issues. Diags attached. nasvm-diagnostics-20180108-0911.zip
JorgeB (January 8, 2018): Everything looks fine so far, apart from some abnormal entries which I assume are related to your controller.
ridewithjoe (January 8, 2018): Yes... it seems I have a bunch of files in lost+found on drive 15, but hopefully nothing so important that I cannot recover it. Thanks for the help. Hopefully I have enough redundancy now for a while. I have to watch these older drives more carefully and swap them out before they puke. I have 94TB online right now. It's painful when things go wrong.
JorgeB (January 8, 2018), quoting ridewithjoe: "I have 94TB online right now. It's painful when things go wrong." Yep, dual parity helps, but don't forget unRAID is not a backup.
ridewithjoe (January 8, 2018): Trust me... that I know. The critical stuff goes to an external drive as well, but that's much less than the 78 TB I have.