bptillman Posted November 26, 2020 (edited)

I had been running Unraid 6.8.3 for a few months without any issues on my HPE MicroServer Gen10 with 4 drives (5TB, 4TB, 2x 2TB) and an SSD cache, but I recently bought two 14TB Western Digital drives to replace all the existing ones. After pre-clearing each 14TB drive successfully in its external enclosure and moving everything I needed to a backup drive, I reset the config, installed the two 14TB drives, and let it run the parity check.

During the parity check I was alerted to errors. The UI showed millions (billions?) of reads/writes on the two array drives, along with errors, and then both drives showed up in the Unassigned Devices list instead of in the array. In the logs around the time this happened (11/24/2020 22:08) there are lines about hard resetting the SATA link and the link being down.

This is actually the second time this has happened. After the first time, I double-checked all the connections in the drive cage and the breakout cable's connection to the motherboard to make sure everything was seated, but I forgot to download the logs before shutting down. I'm not sure what else to check or what could be going on. Could this be related to the Marvell 88SE9230 controller this server uses? The diagnostics are attached.

allthethings-diagnostics-20201124-2244.zip

Edited November 27, 2020 by bptillman
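For anyone checking their own diagnostics for the same symptom, here is a minimal Python sketch that counts link resets and link-down events per ATA port in a syslog. It assumes the kernel messages follow typical Linux libata wording (e.g. "ata5: hard resetting link" and "ata5: SATA link down ..."); the exact wording in a given log may differ, and the file name used below (syslog.txt) is a hypothetical example. A single port showing repeated resets helps narrow the problem to one drive, cable, or controller path.

```python
import re
import sys
from collections import Counter

# Assumed patterns, modeled on typical Linux libata kernel messages such as
# "ata5: hard resetting link" and "ata5: SATA link down (SStatus 0 SControl 300)".
LINK_EVENT = re.compile(r"\b(ata\d+(?:\.\d+)?): (hard resetting link|SATA link down)")


def count_link_events(lines):
    """Count link resets and link-down events per ATA port in syslog lines."""
    counts = Counter()
    for line in lines:
        match = LINK_EVENT.search(line)
        if match:
            # Key is (port, event), e.g. ("ata5", "hard resetting link").
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the path of a syslog file as the first argument.
    with open(sys.argv[1], errors="replace") as log:
        for (port, event), n in sorted(count_link_events(log).items()):
            print(f"{port}: {event} x{n}")
```

Saved under any name, it could be run as `python count_link_events.py syslog.txt`. Counting per port rather than just grepping makes it easier to see whether the resets cluster on the ports the two 14TB drives are attached to.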
itimpi Posted November 26, 2020

Quite likely due to the Marvell controller. They are known for dropping drives for no obvious reason.
bptillman Posted November 26, 2020 Author (edited)

That's what I was starting to think.

Edited November 27, 2020 by bptillman
bptillman Posted November 27, 2020 Author

So I added an LSI SAS9211 controller card, plugged the two 14TB drives into it instead, and ran a parity sync. After almost an hour I started getting disk read errors. Nothing like the millions of errors last time, but still errors. I didn't see anything in the log about the SATA link being dropped this time, though. The diagnostics are attached.

allthethings-diagnostics-20201126-2314.zip
JorgeB Posted November 27, 2020

There's no SMART report for disk1. Check the connections and post new diagnostics.
bptillman Posted November 27, 2020 Author (edited)

I re-ran the SMART test on both disk1 and parity, and both passed. Then I ran a second SMART test on disk1, and for some reason it spun down during the test, so I spun it back up, started the test again, and it passed. I double-checked the spin-down delay in Settings: it's currently set to 4 hours, and disk1 is set to use the default, so I'm not sure why it spun down, since the server had only been online for half an hour at that point. Attached are the latest diagnostics after all this.

allthethings-diagnostics-20201127-1129.zip

Edited November 27, 2020 by bptillman (updated for running second SMART test)
JorgeB Posted November 27, 2020

Disk looks fine; it's most likely a connection issue. You can replace/swap cables and try again.
bptillman Posted November 27, 2020 Author

JorgeB said: "Disk looks fine, most likely a connection issue, you can replace/swap cables and try again."

Okay, I'll look for a replacement breakout cable and test things again once I get it.