Error state during parity check



After a recent monthly parity check, one of my drives went into error state. I pulled the data off the drive and removed it. Now the parity check gets a lot farther, but a different disk is throwing 79k errors and has gone into error state. The parity check paused.

 

The drive that originally threw errors wasn't the healthiest, with about 70 bad sectors. It accumulated these one and three years ago and hasn't developed any new ones since. All of them were corrected. The second drive's diagnostics are attached.

I've had issues with a faulty flash drive recently and had rather a lot of trouble with Unraid. In the past month, I had to redo my drive configuration a number of times and thus ran more than half a dozen parity checks, these two being the only ones to throw errors.

 

Edit: Fifteen minutes after the disk went into error state, all other drives connected to that HBA card disappeared. I guess it's an issue with my Adaptec ASR-71605, then?

 

Edit: Got an agent notification that the parity sync just aborted without input. It found 21,341,196 errors.

 

What is the recommended course of action here?

 

 

Edit: I zeroed the two drives throwing errors. One returned to normal operation according to SMART, the other got better. I'll use them for unimportant data.

stower20-diagnostics-20210203-1852.zip stower20-smart-20210203-1851.zip

Edited by DesertCookie
added more information

I will try a reseat when I have physical access to the hardware again. I still have two free x16 slots I can try. For now, after a restart, it picks them all up again.

 

I have found that the drive that threw errors this time also has some SMART alerts. I have attached the SMART reports (the first is for said drive, the second for the other drive I already knew to have some issues).

stower20-smart-20210203-2040.zip stower20-smart-20210203-2058.zip


I've pulled both out of the array for the moment and run the extended diagnostics; the original drive with known bad sectors wouldn't even run it. A completely healthy drive also reported errors and went into error state. I'll try swapping the HBA card as soon as possible. It's an Adaptec ASR-71605 and it's not actively cooled. I suspect that might have caused the issues, as those cards are known to run hot.

I'm running xfs_repair -vn on these 4TB drives. Is it normal for it to take multiple hours and only display dots for an extended period of time? There have been 2M reads on the drive so far.

2 hours ago, DesertCookie said:

I'm running xfs_repair -vn on these 4TB drives. Is it normal for it to take multiple hours and only display dots for an extended period of time? There have been 2M reads on the drive so far.

 

Normally these checks only take seconds or minutes. What device name did you use? Getting the dots suggests that the primary superblock was not found and that a search is being made to find a backup copy.
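To illustrate why the device name matters: a common cause of the long run of dots is pointing xfs_repair at the whole disk (for example /dev/sdb) instead of the partition that holds the filesystem (for example /dev/sdb1). The sketch below is only a sanity check, not an official Unraid tool. The function names are hypothetical, and it assumes sdX/mdX-style device names where a trailing digit marks the partition or array device. It prints a suggested read-only command rather than running anything.

```shell
# Hypothetical helper: guess whether a device name points at a partition.
# Assumption: sdX/mdX naming, where a trailing digit means partition or md device.
looks_like_partition() {
    case "$1" in
        /dev/sd*[0-9]|/dev/md*[0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the read-only check command (-n: no modify, -v: verbose) or a warning.
suggest_check() {
    if looks_like_partition "$1"; then
        echo "xfs_repair -vn $1"
    else
        echo "warning: $1 looks like a whole disk; try ${1}1"
    fi
}

suggest_check /dev/sdb
suggest_check /dev/sdb1
```

If the suggested target still produces only dots, the primary superblock on that partition is probably damaged and the search for a backup copy is genuine.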
