AnyColourYouLike, posted October 1, 2019:

Hey guys, it feels like I've been here a lot lately. I thought I had my array just about in working order, but this error is a new one for me. My array started a scheduled parity scan at some point in the early morning, and I woke up to find that 2 newer drives had been disabled and showed an alarming number of writes. The parity scan was canceled with 0 errors only 5 minutes in. There don't appear to be any SMART errors on the affected drives, but I'm afraid they will need to be rebuilt regardless. I'm really not sure what happened or how to prevent it in the future. Hopefully someone wiser can shed some light on it. Was it a controller issue?

Unraid 6.6.0, dual parity

Attachment: tower-diagnostics-20191001-1255-Anon.zip
AnyColourYouLike (thread author), posted October 3, 2019 (edited):

One of my intended replacement drives is failing the post-read after clearing. I have a feeling there may be something wrong with my HBA, a supposedly genuine "LSI Logic SAS 9207-8i Storage Controller with SAS/SATA Cables H3-25412-00G" that I picked up on eBay this summer. I've attached the log from the preclear; the drive itself passed an extended SMART test without error. I get the feeling that the I/O errors from my original post and this new error while preclearing a replacement drive all point to this HBA card. I'm at a loss. I reseated and checked all the cables, and even replaced the mini-SAS breakout cables with new ones to see if they were the issue. One of the drives from my original post is being rebuilt right now (fingers crossed); otherwise I'd move the disk with the errors below to a different bay to see whether one part of my backplane/case is affected.

Attachment: tower-smart-20191003-1440.zip
JorgeB, posted October 5, 2019:

On 10/3/2019 at 10:43 PM, AnyColourYouLike said: "or I'd swap the disk from the errors below to a different bay to see if it's one affected part of my back-plane/case."

That's what I would suggest, since it looks more like a connection/power issue than a controller problem.
AnyColourYouLike (thread author), posted October 7, 2019:

This whole thing is a mess. I got one disk replaced, and it rebuilt with only 390,116 errors 😫. I assume they came from what look like (read?) errors attributed to disk 4, which, by the way, was a recently replaced fresh drive with only 2 months of power-on hours. The next replacement drive was supposed to be the one from my second post, which originally got kicked out in my last thread 2 months ago. It passed extended SMART tests and never showed any reallocated sectors or other notable issues, though it is older, at 1 year of power-on hours. That drive finally cleared successfully, so I started a rebuild, but the rebuild failed about 2 minutes in with errors similar to the ones above: I/O and write errors. I really don't know what is happening. The disks all seem fine on their own; I can mount the 2 failed drives from my original post and no errors are present. But my array is all sorts of messed up. I'm wondering what to do, because it seems very fragile: I brace myself at every parity scan, hoping it won't rack up a million errors or disable drives altogether. I can't afford all-new hardware, or I'd just build a new server that didn't have me questioning every individual variable: my LSI card, the 750 W PSU, the Kingwin KM-5000 adapter backplates, even the RAM.

Attachment: tower-diagnostics-20191007-1506.zip
Squid, posted October 7, 2019:

The real key thing here is to reseat every connection to the drives and the motherboard. SATA connectors aren't very robust, and the slightest touch when you're replacing a drive can cause this.
AnyColourYouLike (thread author), posted October 8, 2019:

Squid said: "The real key thing here is to reseat everything to the drives and mobo. Sata connectors aren't very robust, and the slightest touch when you're replacing a drive can cause this."

I did exactly this today: I removed everything and blew it all out with compressed air. I'm trying to preclear and rebuild again, and I'll hopefully report back this week.
JorgeB, posted October 8, 2019:

Your diagnostics were taken after rebooting, so they don't cover any of the issues you describe; the logs reset after every reboot.