Scheduled Parity Check Aborts with 2 Disabled Drives!



Hey guys, feels like I've been here a lot lately. I thought I had my array just about in working order, but this error is a new one for me.

 

My array started its scheduled parity check at some point in the early AM. I woke up to find that 2 newer drives had been disabled, showing an alarming number of writes. The parity check was canceled with 0 errors only 5 minutes in. There don't appear to be any SMART errors on the affected drives, but I'm afraid they will need to be rebuilt regardless. I'm really not sure what happened or how to prevent this in the future.
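For reference, this is roughly how I double-checked SMART from the console (the /dev/sdb below is just a placeholder, not my actual device; substitute your own):

# Quick pass/fail health verdict for a suspect drive
smartctl -H /dev/sdb

# Full attribute dump and error log, to look for reallocated
# or pending sectors that the dashboard might not surface
smartctl -a /dev/sdb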

 

Hopefully someone wiser can shed some light on what happened. Was it a controller issue?

 

Unraid 6.6.0

Dual Parity

 

 

tower-diagnostics-20191001-1255-Anon.zip


One of my hopeful replacement drives is failing its post-read after clearing. I have the feeling there may be something wrong with my HBA controller, a supposedly genuine "LSI Logic SAS 9207-8i Storage Controller with SAS/SATA Cables H3-25412-00G" I picked up on eBay this summer. I've attached the log from the preclear; the drive itself has passed an extended SMART test without error. I just get the feeling that the I/O errors from my OP, and now this error while trying to preclear a replacement drive, are all pointing to this HBA card. I'm at a loss. I re-seated and checked all cables, and even replaced the mini-SAS breakout cables with new ones to see if that was the issue. One of my OP drives is being rebuilt (fingers crossed), or I'd swap the disk from the errors below to a different bay to see if it's an affected part of my backplane/case.

preclearlog.PNG

tower-smart-20191003-1440.zip
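
For what it's worth, here's roughly how I've been sanity-checking the HBA itself from the console (a rough sketch; the grep patterns assume the standard LSI mpt2sas/mpt3sas driver that the 9207-8i uses):

# Confirm the kernel still sees the controller on the PCIe bus
lspci | grep -i lsi

# Scan the driver's kernel messages for link resets or faults
dmesg | grep -i mpt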


This whole thing is a mess. I got one disk replaced, and it rebuilt with only 390,116 errors 😫. I assume those came from what look like read errors attributed to disk 4, which, by the way, was a recently replaced fresh drive with only 2 months of power-on hours.

The next replacement was supposed to be the drive from my second post above, the one that originally got kicked out in my last thread 2 months ago. It passed extended SMART tests and never showed any reallocated sectors or notable issues, though it is older, at 1 year of power-on hours. That drive did finally clear successfully, so I began a rebuild, but it failed about 2 minutes in with errors similar to the above: I/O and write errors.

I really don't know what is happening. The disks all seem fine on their own; I can mount the 2 failed drives from my OP here and no errors are present. But my array is all sorts of messed up. I'm wondering what to do, because it seems very fragile; I brace for every parity check, hoping it won't rack up a million errors or disable drives altogether. I can't afford all new hardware, or I'd just build a new server that didn't have me questioning all the individual variables like my LSI card, 750W PSU, Kingwin KM-5000 adapter backplanes, and even RAM.

tower-diagnostics-20191007-1506.zip
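
If anyone wants to see the raw errors without digging into the diagnostics zip, something like this against the syslog is how I've been checking which devices the kernel is blaming (a rough tally, not definitive):

# Show the kernel's I/O error lines
grep -i 'i/o error' /var/log/syslog

# Rough count of which sdX devices appear in those lines --
# errors spread across several drives on one controller would
# point at the HBA/cabling rather than the disks themselves
grep -i 'i/o error' /var/log/syslog | grep -o 'sd[a-z]\+' | sort | uniq -c | sort -rn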

4 hours ago, Squid said:

The real key thing here is to reseat everything to the drives and mobo. SATA connectors aren't very robust, and the slightest touch when you're replacing a drive can cause this.

I did precisely this today: removed everything and air-dusted it all out. I'm trying to preclear and rebuild again; I'll report back, hopefully this week.
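
While the preclear and rebuild run, I'm keeping an eye on the log live so any reset or I/O error shows up the moment it happens (nothing fancy, just a plain tail):

# Follow the system log in real time during the preclear/rebuild
tail -f /var/log/syslog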

