DesertCookie Posted February 3, 2021 Share Posted February 3, 2021 (edited) After a recent monthly parity check, one of my drives went into an error state. I pulled the data off the drive and removed it. Now, running the parity check, I get a lot farther, but a different disk is throwing 79k errors and has gone into an error state. The parity check paused. The drive originally throwing errors wasn't the healthiest, with about 70 bad sectors. It amassed these bad sectors one and three years ago and hasn't gained any new ones since; they were all corrected. The second drive's diagnostics are attached. I've had issues with a faulty flash drive recently and rather a lot of trouble with Unraid. In the past month, I had to redo my drive configuration several times and thus ran more than half a dozen parity checks - these two being the only ones to throw errors. Edit: Fifteen minutes after the disk went into an error state, all other drives connected to that HBA card are gone. I guess it's an issue with my Adaptec ASR-71605 then? Edit: Got an agent notification that the parity sync just aborted without input. It found 21,341,196 errors. What is the recommended course of action here? Edit: I zeroed the two drives throwing errors. One returned to normal operation according to SMART; the other got better. I'll use them for unimportant data. stower20-diagnostics-20210203-1852.zip stower20-smart-20210203-1851.zip Edited February 6, 2021 by DesertCookie added more information
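For readers landing here later: the "zeroed the drives" step can be sketched roughly as below. This is a hedged sketch, not the exact commands from the post - `/dev/sdX` is a placeholder you must replace, and the block refuses to run against anything that is not a block device, since `dd` irreversibly wipes its target.

```shell
#!/bin/sh
# Hedged sketch of the "zero the drive" step described above.
# "$disk" is a PLACEHOLDER -- triple-check the device name before running,
# as dd will irreversibly destroy whatever it points at.
disk="${DISK:-/dev/sdX}"

if [ -b "$disk" ]; then
  # Overwrite the whole device with zeros, forcing every sector to be
  # rewritten; pending sectors either get remapped or cleared.
  dd if=/dev/zero of="$disk" bs=1M status=progress
  # Afterwards, re-read the SMART attributes the thread is worried about:
  smartctl -A "$disk" | grep -Ei 'reallocated|pending|uncorrect'
else
  echo "refusing to run: $disk is not a block device"
fi
```

Whether zeroing actually "heals" a drive is debatable; it only forces the firmware to deal with weak sectors, which matches the mixed result reported here (one drive recovered, one merely improved).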
JorgeB Posted February 3, 2021 1 hour ago, DesertCookie said: I guess it's an issue of my Adaptec ASR-71605 then? Most likely: Feb 3 11:22:24 STOWER20 kernel: aacraid: Host bus reset request. SCSI hang ? You can try a different slot if available.
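To see how often the controller is resetting, you can count lines like the one JorgeB quoted. Shown here against a sample line from this thread so the snippet is self-contained; on a live Unraid server you would point `grep` at `/var/log/syslog` instead (path assumed, as on stock Unraid).

```shell
#!/bin/sh
# Count HBA reset events like the one quoted above.
# Demonstrated on a sample line; on the server, grep the real log instead:
#   grep -ci 'aacraid.*reset' /var/log/syslog
sample='Feb 3 11:22:24 STOWER20 kernel: aacraid: Host bus reset request. SCSI hang ?'
hits=$(printf '%s\n' "$sample" | grep -ci 'aacraid.*reset')
echo "reset events found: $hits"
```

A cluster of these resets right before several drives vanish at once points at the controller (or its cooling/seating), not at the individual disks.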
DesertCookie Posted February 3, 2021 Author I will try a reseat when I have physical access to the hardware again. I still have two free x16 slots I can try. For now, after a restart, it picks them all up again. I have found that the drive that threw errors this time also has some SMART alerts. I have attached the SMART reports (the first is said drive, the second the other drive I already knew had some issues). stower20-smart-20210203-2040.zip stower20-smart-20210203-2058.zip
JorgeB Posted February 4, 2021 Neither drive looks good; you can run an extended SMART test to confirm.
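The extended SMART test can be started from the command line as sketched below (Unraid's web UI can also trigger it from the disk's attributes page). `/dev/sdX` is a placeholder for the suspect drive; the guard keeps the sketch harmless if the device doesn't exist.

```shell
#!/bin/sh
# Sketch of the extended SMART self-test suggested above.
# "$disk" is a placeholder -- substitute the suspect drive.
disk="${DISK:-/dev/sdX}"

if [ -b "$disk" ]; then
  smartctl -t long "$disk"      # starts the test inside the drive's firmware
  smartctl -l selftest "$disk"  # run again later to read the result log
else
  echo "no block device at $disk -- substitute the suspect drive"
fi
```

An extended test reads every sector, so expect it to take several hours on a 4TB drive; a drive that aborts the test (as one does later in this thread) is itself a failure sign.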
DesertCookie Posted February 4, 2021 Author I've pulled both out of the array for the moment and run the extended diagnostics - the original drive with known bad sectors wouldn't even run it. I also got errors on a completely healthy drive that went into error mode. I'll try swapping the HBA card asap! It's an Adaptec ASR-71605 and it's not actively cooled; I suspect that might have caused issues, as those are known to run hot. I'm running xfs_repair -vn on these 4TB drives. Is it normal for it to take multiple hours and only display dots for an extended period of time? There have been 2M reads on the drive so far.
itimpi Posted February 4, 2021 2 hours ago, DesertCookie said: on these 4TB drives. Is it normal for it to take multiple hours and only display dots for an extended period of time? There have been 2M reads on the drive so far. Normally these checks only take seconds/minutes. What device name did you use? Getting the dots suggests that the master superblock was not found and a search is being made to try and find a backup copy.
JorgeB Posted February 4, 2021 3 hours ago, DesertCookie said: Is it normal for it to take multiple hours and only display dots for an extended period of time? Make sure you use the md device if they are still in the array, or specify the partition if not, e.g.: xfs_repair /dev/md1 xfs_repair /dev/sdX1
DesertCookie Posted February 4, 2021 (edited) 2 hours ago, itimpi said: What device name did you use? I used the name displayed in the web UI, "sdh". 1 hour ago, JorgeB said: Make sure you use the md device if they are still in the array [...]. I used "/dev/sdh". I see how I might have messed up... Edited February 4, 2021 by DesertCookie
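The mix-up above is worth spelling out: the XFS filesystem lives on the partition (or on Unraid's parity-protected `mdN` device for array members), so pointing `xfs_repair` at the bare disk `/dev/sdh` finds no superblock at offset zero and triggers the dot-printing backup-superblock search. A minimal sketch, using the names from this thread:

```shell
#!/bin/sh
# Why "/dev/sdh" produced endless dots: the filesystem is on the
# partition, not the raw disk. Names below are the ones from this thread.
dev=sdh
part="/dev/${dev}1"     # the partition, e.g. /dev/sdh1

# Outside the array: check the partition (-n = read-only check).
echo "outside the array: xfs_repair -n $part"
# Inside the array: use the md device so parity stays consistent.
echo "inside the array:  xfs_repair -n /dev/mdN   # N = disk slot number"
```

Repairs made through `/dev/mdN` are reflected in parity; repairs made directly on `/dev/sdX1` while the disk is an array member would silently invalidate it.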
DesertCookie Posted February 4, 2021 Author Alright, the repair runs correctly now. I don't know if it actually did anything to alleviate the drive errors that seem to move with the data from disk to disk.