November 26, 2020 I had been running Unraid 6.8.3 for a few months without any issues on my HPE MicroServer Gen10 with 4 drives (5TB, 4TB, 2x2TB) and an SSD cache, but I recently bought two 14TB Western Digital drives to replace all the existing drives. After pre-clearing each of the 14TB drives successfully in their external enclosures and moving everything I needed onto a backup drive, I reset the config, installed the two 14TB drives, and let it run the parity check. During the parity check I was alerted to errors, and the UI showed millions (billions?) of reads/writes on the two drives in the array along with errors. The two drives then showed up in the unassigned devices list instead of in the array. Looking in the logs around the time this happened (11/24/2020 22:08), I see lines about hard resetting the SATA link and the link being down. This is actually the second time this has happened. After the first time I double-checked all the connections in the drive cage, and the motherboard end of the breakout cable the server uses, to make sure everything was connected, but I forgot to download the logs before shutting it down. I'm not sure what else to check or what could be going on. Is this possibly an issue related to the Marvell 88SE9230 controller that this server uses? Attached are the diagnostics. allthethings-diagnostics-20201124-2244.zip Edited November 27, 2020 by bptillman
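For anyone reading along with similar symptoms, here is a minimal sketch of how the syslog from a diagnostics zip could be scanned for the link resets described above. It assumes the kernel's usual libata wording ("hard resetting link", "SATA link down"). The function name `count_link_events` and the sample lines are hypothetical and are not taken from the attached diagnostics.

```python
import re
from collections import Counter

# Assumed libata message wording; adjust the pattern if your kernel logs differ.
LINK_EVENT = re.compile(
    r"\b(ata\d+(?:\.\d+)?): (hard resetting link|SATA link down)"
)

def count_link_events(lines):
    """Count link reset / link down messages per ATA port."""
    counts = Counter()
    for line in lines:
        match = LINK_EVENT.search(line)
        if match:
            port, event = match.groups()
            counts[(port, event)] += 1
    return counts

# Hypothetical sample lines for illustration, not copied from the posted logs.
sample = [
    "Nov 24 22:08:01 server kernel: ata1: hard resetting link",
    "Nov 24 22:08:06 server kernel: ata1: SATA link down (SStatus 0 SControl 300)",
    "Nov 24 22:08:07 server kernel: ata2: hard resetting link",
    "Nov 24 22:08:09 server kernel: some unrelated message",
]

for (port, event), n in sorted(count_link_events(sample).items()):
    print(f"{port}: {event} x{n}")
```

If the same one or two ports account for nearly all of the events, that would point at the controller, cable, or backplane path for those ports rather than at the disks themselves.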
November 26, 2020 Community Expert Quite likely to be due to the Marvell controller. They are known for dropping drives for no obvious reason.
November 26, 2020 Author That's what I was starting to think. Edited November 27, 2020 by bptillman
November 27, 2020 Author So I added an LSI SAS9211 controller card, plugged the two 14TB drives into it instead, and ran a parity sync. After almost an hour I started getting disk read errors. Nothing like the millions of errors last time, but still errors. I didn't see anything in the log about the SATA link being dropped this time, though. Attached are the diagnostic logs. allthethings-diagnostics-20201126-2314.zip
November 27, 2020 Community Expert There's no SMART report for disk1, check connections and post new diags.
November 27, 2020 Author I re-ran the SMART test on both disk1 and parity, and both passed. But when I ran a second SMART test on disk1, the disk for some reason spun down during the test, so I spun it back up and started the test again, and it passed. I double-checked the spin-down delay in settings: it's currently set to 4 hours and disk1 is set to use the default, so I'm not sure why it spun down, since the server had only been online for half an hour at that point. Attached are the latest diagnostics after all this. allthethings-diagnostics-20201127-1129.zip Edited November 27, 2020 by bptillman (updated for running second SMART test)
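A rough sketch of how the SMART reports in a diagnostics zip could be checked for an overall health result, in case it helps anyone following the same steps. It assumes the reports are plain `smartctl` text output for ATA drives, which contains a "SMART overall-health self-assessment test result" line. The helper name `parse_health` and the sample text are hypothetical, not taken from the attached files.

```python
import re

# Line printed by smartctl for ATA drives when health status is available.
HEALTH = re.compile(
    r"SMART overall-health self-assessment test result:\s*(\S+)"
)

def parse_health(report_text):
    """Return the overall health result (e.g. 'PASSED'), or None if the
    report has no health line, as with a missing or empty SMART report."""
    match = HEALTH.search(report_text)
    return match.group(1) if match else None

# Hypothetical report fragment for illustration only.
sample_report = """\
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
"""

print(parse_health(sample_report))
print(parse_health(""))
```

A `None` result here only means the report had no health line. It says nothing about the disk itself, and would be a reason to recheck the connection and collect the report again.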
November 27, 2020 Community Expert Disk looks fine, most likely a connection issue, you can replace/swap cables and try again.
November 27, 2020 Author JorgeB said: Disk looks fine, most likely a connection issue, you can replace/swap cables and try again. Okay, I'll see about finding a replacement breakout cable and test things again once I get it.