Hard drive dropped during parity sync



I had been running Unraid 6.8.3 for a few months without any issues on my HPE MicroServer Gen10 with 4 drives (5TB, 4TB, 2x2TB) and an SSD cache, but I recently bought two 14TB Western Digital drives to replace all the existing drives. After pre-clearing each of the 14TB drives successfully in their external enclosures and moving everything I needed to a backup drive, I reset the config, installed the two 14TB drives, and let the parity sync run.

During the parity sync I was alerted to errors. The UI showed millions (billions?) of reads/writes on the two drives in the array along with errors, and then the two drives showed up in the unassigned devices list instead of in the array. In the logs around the time this happened (11/24/2020 22:08) I see lines about hard resetting the SATA link and the link being down.

This is actually the second time this has happened. After the first time I double-checked all the connections in the drive cage and the motherboard end of the breakout cable the server uses, but I forgot to download the logs before shutting it down. I'm not sure what else to check or what could be going on. Could this be an issue with the Marvell 88SE9230 controller that this server uses?

 

Attached are the diagnostics.

allthethings-diagnostics-20201124-2244.zip
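
For anyone going through the syslog from a diagnostics zip like this one, here is a minimal sketch for pulling out lines that look like SATA link resets and counting them per port. It assumes the kernel's usual libata wording ("hard resetting link", "SATA link down"); the function names and the `logs/syslog.txt` path in the usage note are hypothetical, so adjust them to whatever the extracted archive actually contains.

```python
import re
from collections import Counter

# Assumed libata message fragments; the exact wording can vary by kernel version.
LINK_EVENT = re.compile(
    r"\b(ata\d+(?:\.\d+)?): .*?(hard resetting link|SATA link down)",
    re.IGNORECASE,
)


def find_link_events(lines):
    """Return (port, event, line) tuples for lines that look like SATA link trouble."""
    events = []
    for line in lines:
        match = LINK_EVENT.search(line)
        if match:
            events.append((match.group(1), match.group(2).lower(), line.rstrip("\n")))
    return events


def count_by_port(events):
    """Count matched link events per ATA port, e.g. {'ata1': 12}."""
    return Counter(port for port, _event, _line in events)


def scan_file(path):
    """Read a syslog file (path supplied by the caller) and return its link events."""
    with open(path, errors="replace") as handle:
        return find_link_events(handle)
```

Calling `count_by_port(scan_file("logs/syslog.txt"))` (path assumed) gives a per-port tally. If the resets all land on the same one or two ports, that points toward those ports, their cabling, or the controller rather than the drives themselves, though it is not conclusive on its own.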

Edited by bptillman

So I added an LSI SAS9211 controller card, plugged the two 14TB drives into it instead, and ran the parity sync again. After almost an hour I started getting disk read errors. Nothing like the millions of errors from last time, but still errors. I didn't see anything in the log about the SATA link being dropped this time, though. Attached are the diagnostic logs.

allthethings-diagnostics-20201126-2314.zip


I re-ran the SMART test on both disk1 and parity, and both passed. But when I ran a second SMART test on disk1, the disk for some reason spun down during the test, so I spun it back up and started the test again, and it passed. I double-checked the spin-down delay in settings: it's currently set to 4 hours, and disk1 is set to use the default, so I'm not sure why it spun down when the server had only been online for half an hour at that point. Attached are the latest diagnostics after all this.

 

allthethings-diagnostics-20201127-1129.zip

Edited by bptillman
updated for running second smart test
