Hard drive dropped during parity sync

bptillman · November 26, 2020

I have been running Unraid 6.8.3 for a few months without any issues in my HPE MicroServer Gen10 with 4 drives (5TB, 4TB, 2x2TB) and an SSD cache, but I recently bought two 14TB Western Digital drives to replace all the existing drives. After pre-clearing each of the 14TB drives successfully in their external enclosures and moving everything I needed to a backup drive, I reset the config, installed the two 14TB drives, and let it run the parity check.

During the parity check I was alerted to errors. The UI showed millions (billions?) of reads/writes on the two drives in the array along with errors, and then the two drives showed up in the unassigned devices list instead of in the array. In the logs around the time this happened (11/24/2020 22:08), I see lines about hard resetting the SATA link and the link being down.

This is actually the second time this has happened. After the first time I double-checked all the connections in the drive cage and the connection to the motherboard on the breakout cable the server uses, but I forgot to download the logs before shutting it down. I'm not sure what else to check or what could be going on. Is this possibly related to the Marvell 88SE9230 controller that this server uses?

Attached are the diagnostics.

allthethings-diagnostics-20201124-2244.zip
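For anyone wanting to find the same kind of link-reset lines in their own syslog, here is a minimal Python sketch. The message patterns are an assumption based on what the Linux kernel's libata driver typically logs (`hard resetting link`, `SATA link down`); the sample lines are made up for illustration and are not taken from the attached diagnostics.

```python
import re
from typing import Iterable, List, Tuple

# Assumed patterns: typical libata messages for a link reset or a dropped link.
# Adjust the alternatives if your kernel words these events differently.
LINK_EVENT = re.compile(
    r"\b(ata\d+(?:\.\d+)?): .*?(hard resetting link|SATA link down)",
    re.IGNORECASE,
)


def find_link_events(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (port, event, full_line) for every line that looks like a link event."""
    events = []
    for line in lines:
        match = LINK_EVENT.search(line)
        if match:
            events.append((match.group(1), match.group(2).lower(), line.rstrip("\n")))
    return events


if __name__ == "__main__":
    # Illustrative lines only; real syslog entries will differ.
    sample = [
        "Nov 24 22:08:01 server kernel: ata2: hard resetting link",
        "Nov 24 22:08:07 server kernel: ata2: SATA link down (SStatus 0 SControl 300)",
        "Nov 24 22:08:09 server kernel: some unrelated message",
    ]
    for port, event, line in find_link_events(sample):
        print(f"{port}: {event} | {line}")
```

To run it against a real log, pass an open file object instead of the sample list, for example `find_link_events(open(path))`, where `path` points at the syslog extracted from the diagnostics zip.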

Edited November 27, 2020 by bptillman

itimpi · November 26, 2020

Quite likely to be due to the Marvell controller. They are known for dropping drives for no obvious reason.

bptillman · November 26, 2020

That's what I was starting to think.

Edited November 27, 2020 by bptillman

bptillman · November 27, 2020

So I added an LSI SAS9211 controller card, plugged the two 14TB drives into it instead, and ran a parity sync. After almost an hour I started getting disk read errors. Nothing like the millions of errors from last time, but still errors. I didn't see anything in the log about the SATA link being dropped this time, though. Attached are the diagnostic logs.

allthethings-diagnostics-20201126-2314.zip

JorgeB · November 27, 2020

There's no SMART report for disk1, check connections and post new diags.

bptillman · November 27, 2020

I re-ran the SMART test on both disk1 and parity, and both showed success. But then I ran a second SMART test on disk1 and for some reason the disk spun down during the test, so I spun it back up and started the test again, and it passed. I double-checked the spin-down delay in settings: it's currently set to 4 hours and disk1 is set to use the default, so I'm not sure why it spun down, since the server had only been online for half an hour at that point. Attached are the latest diagnostics after all this.

allthethings-diagnostics-20201127-1129.zip
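As a rough way to check whether a saved SMART report contains a health verdict at all, here is a small Python sketch. It assumes the report is the text output of `smartctl -H` for an ATA/SATA drive, which normally includes a line starting with `SMART overall-health self-assessment test result:`. The function name and sample text are hypothetical, for illustration only.

```python
import re
from typing import Optional

# Assumed format: the overall-health line that smartctl prints for ATA drives.
HEALTH_LINE = re.compile(
    r"SMART overall-health self-assessment test result:\s*(\S+)"
)


def smart_health(report: str) -> Optional[str]:
    """Return the health verdict (e.g. 'PASSED'), or None if the report has none.

    None usually means the drive did not answer the SMART query, which is
    worth treating as a possible connection problem rather than a pass.
    """
    match = HEALTH_LINE.search(report)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Illustrative text only, not copied from the attached diagnostics.
    sample_report = (
        "=== START OF READ SMART DATA SECTION ===\n"
        "SMART overall-health self-assessment test result: PASSED\n"
    )
    print(smart_health(sample_report))
    print(smart_health(""))
```

An empty or missing report returns `None` rather than a failure string, so the caller can tell "the drive reported a problem" apart from "the drive did not report anything".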

Edited November 27, 2020 by bptillman
updated for running second smart test

JorgeB · November 27, 2020

Disk looks fine, most likely a connection issue, you can replace/swap cables and try again.

bptillman · November 27, 2020

7 minutes ago, JorgeB said:

Disk looks fine, most likely a connection issue, you can replace/swap cables and try again.

Okay, I'll see about finding a replacement breakout cable and test things again once I get it.
