DesertCookie Posted February 3, 2021 Share Posted February 3, 2021 (edited) After a recent monthly parity check, one of my drives went into an error state. I pulled the data off the drive and removed it. Now, running the parity check, I get a lot farther, but a different disk is throwing 79k errors and has gone into an error state. The parity check paused. The drive originally throwing errors wasn't the healthiest, with about 70 bad sectors. It amassed these bad sectors one and three years ago and hasn't gained any new ones since; they were all corrected. The second drive's diagnostics are attached. I've had issues with a faulty flash drive recently and rather a lot of trouble with Unraid. In the past month, I had to redo my drive configuration several times and thus ran more than half a dozen parity checks - these two being the only ones to throw errors. Edit: Fifteen minutes after the disk went into an error state, all other drives connected to that HBA card are gone. I guess it's an issue with my Adaptec ASR-71605 then? Edit: Got an agent notification that the parity sync just aborted without input. It found 21,341,196 errors. What is the recommended course of action here? Edit: I zeroed the two drives throwing errors. One returned to normal operation according to SMART; the other got better. I'll use them for unimportant data. stower20-diagnostics-20210203-1852.zip stower20-smart-20210203-1851.zip Edited February 6, 2021 by DesertCookie added more information
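For readers landing here later: the "zeroed the drives" step can be sketched roughly as below. This is a hedged sketch, not the exact commands from the post - `/dev/sdX` is a placeholder you must replace, and the block refuses to run against anything that is not a block device, since `dd` irreversibly wipes its target.

```shell
#!/bin/sh
# Hedged sketch of the "zero the drive" step described above.
# "$disk" is a PLACEHOLDER -- triple-check the device name before running,
# as dd will irreversibly destroy whatever it points at.
disk="${DISK:-/dev/sdX}"

if [ -b "$disk" ]; then
  # Overwrite the whole device with zeros, forcing every sector to be
  # rewritten; pending sectors either get remapped or cleared.
  dd if=/dev/zero of="$disk" bs=1M status=progress
  # Afterwards, re-read the SMART attributes the thread is worried about:
  smartctl -A "$disk" | grep -Ei 'reallocated|pending|uncorrect'
else
  echo "refusing to run: $disk is not a block device"
fi
```

Whether zeroing actually "heals" a drive is debatable; it only forces the firmware to deal with weak sectors, which matches the mixed result reported here (one drive recovered, one merely improved).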
JorgeB Posted February 3, 2021 1 hour ago, DesertCookie said: I guess it's an issue of my Adaptec ASR-71605 then? Most likely: Feb 3 11:22:24 STOWER20 kernel: aacraid: Host bus reset request. SCSI hang ? You can try a different slot if available.
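To see how often the controller is resetting, you can count lines like the one JorgeB quoted. Shown here against a sample line from this thread so the snippet is self-contained; on a live Unraid server you would point `grep` at `/var/log/syslog` instead (path assumed, as on stock Unraid).

```shell
#!/bin/sh
# Count HBA reset events like the one quoted above.
# Demonstrated on a sample line; on the server, grep the real log instead:
#   grep -ci 'aacraid.*reset' /var/log/syslog
sample='Feb 3 11:22:24 STOWER20 kernel: aacraid: Host bus reset request. SCSI hang ?'
hits=$(printf '%s\n' "$sample" | grep -ci 'aacraid.*reset')
echo "reset events found: $hits"
```

A cluster of these resets right before several drives vanish at once points at the controller (or its cooling/seating), not at the individual disks.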
DesertCookie Posted February 3, 2021 Author I will try a reseat when I have physical access to the hardware again. I still have two free x16 slots I can try. For now, after a restart, it picks them all up again. I have found that the drive that threw errors this time also has some SMART alerts. I have attached the SMART reports (the first is said drive, the second the other drive I already knew had some issues). stower20-smart-20210203-2040.zip stower20-smart-20210203-2058.zip
JorgeB Posted February 4, 2021 Neither drive looks good; you can run an extended SMART test to confirm.
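The extended SMART test can be started from the command line as sketched below (Unraid's web UI can also trigger it from the disk's attributes page). `/dev/sdX` is a placeholder for the suspect drive; the guard keeps the sketch harmless if the device doesn't exist.

```shell
#!/bin/sh
# Sketch of the extended SMART self-test suggested above.
# "$disk" is a placeholder -- substitute the suspect drive.
disk="${DISK:-/dev/sdX}"

if [ -b "$disk" ]; then
  smartctl -t long "$disk"      # starts the test inside the drive's firmware
  smartctl -l selftest "$disk"  # run again later to read the result log
else
  echo "no block device at $disk -- substitute the suspect drive"
fi
```

An extended test reads every sector, so expect it to take several hours on a 4TB drive; a drive that aborts the test (as one does later in this thread) is itself a failure sign.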
DesertCookie Posted February 4, 2021 Author I've pulled both out of the array for the moment and run the extended diagnostics - the original drive with known bad sectors wouldn't even run it. I also got errors on a completely healthy drive that went into error mode. I'll try swapping the HBA card asap! It's an Adaptec ASR-71605 and it's not actively cooled; I suspect that might have caused issues, as those are known to run hot. I'm running xfs_repair -vn on these 4TB drives. Is it normal for it to take multiple hours and only display dots for an extended period of time? There have been 2M reads on the drive so far.
itimpi Posted February 4, 2021 2 hours ago, DesertCookie said: on these 4TB drives. Is it normal for it to take multiple hours and only display dots for an extended period of time? There have been 2M reads on the drive so far. Normally these checks only take seconds/minutes. What device name did you use? Getting the dots suggests that the master superblock was not found and a search is being made to try and find a backup copy.
JorgeB Posted February 4, 2021 3 hours ago, DesertCookie said: Is it normal for it to take multiple hours and only display dots for an extended period of time? Make sure you use the md device if they are still in the array, or specify the partition if not, e.g.: xfs_repair /dev/md1 xfs_repair /dev/sdX1
DesertCookie Posted February 4, 2021 (edited) 2 hours ago, itimpi said: What device name did you use? I used the name displayed in the web UI, "sdh". 1 hour ago, JorgeB said: Make sure you use the md device if they are still in the array [...]. I used "/dev/sdh". I see how I might have messed up... Edited February 4, 2021 by DesertCookie
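The mix-up above is worth spelling out: the XFS filesystem lives on the partition (or on Unraid's parity-protected `mdN` device for array members), so pointing `xfs_repair` at the bare disk `/dev/sdh` finds no superblock at offset zero and triggers the dot-printing backup-superblock search. A minimal sketch, using the names from this thread:

```shell
#!/bin/sh
# Why "/dev/sdh" produced endless dots: the filesystem is on the
# partition, not the raw disk. Names below are the ones from this thread.
dev=sdh
part="/dev/${dev}1"     # the partition, e.g. /dev/sdh1

# Outside the array: check the partition (-n = read-only check).
echo "outside the array: xfs_repair -n $part"
# Inside the array: use the md device so parity stays consistent.
echo "inside the array:  xfs_repair -n /dev/mdN   # N = disk slot number"
```

Repairs made through `/dev/mdN` are reflected in parity; repairs made directly on `/dev/sdX1` while the disk is an array member would silently invalidate it.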
DesertCookie Posted February 4, 2021 Author Alright, the repair runs correctly now. I don't know if it actually did anything to alleviate the drive errors that seem to move with the data from disk to disk.