Sanity Check for Read Errors

April 28, 2023

I think I've been pretty lucky with my Unraid server: this is the first time I've received alerts for read errors, so I'm dealing with this for the first time. I received an alert that one of the hard drives in my array had 32 errors last night. I shut down all the containers running on the server, so nothing has been writing to the array for the past few hours. I checked SMART and ran an extended SMART test (it took about 10 hours), and it completed without error. In the disk log I see a single error, which corresponds to the time when I received the alert:

Apr 26 02:06:51 Tower kernel: I/O error, dev sdf, sector 9206914224 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
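For anyone reading a line like this for the first time, here is a minimal Python sketch that pulls out the device, the operation, and the sector, and converts the sector to a byte offset. It assumes the kernel reports these sectors in 512-byte units regardless of the drive's physical sector size; the function and constant names are made up for illustration.

```python
import re

# Pattern for the kernel block-layer message format shown in the log line above.
IO_ERROR_RE = re.compile(
    r"I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+) "
    r"op 0x[0-9a-fA-F]+:\((?P<op>\w+)\)"
)

# Assumption: sectors in this message are counted in 512-byte units.
KERNEL_SECTOR_BYTES = 512


def parse_io_error(line):
    """Return device, operation, sector and byte offset, or None if no match."""
    match = IO_ERROR_RE.search(line)
    if match is None:
        return None
    sector = int(match.group("sector"))
    return {
        "dev": match.group("dev"),
        "op": match.group("op"),
        "sector": sector,
        "byte_offset": sector * KERNEL_SECTOR_BYTES,
    }


line = ("Apr 26 02:06:51 Tower kernel: I/O error, dev sdf, sector 9206914224 "
        "op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0")
info = parse_io_error(line)
print(info["dev"], info["op"], f"{info['byte_offset'] / 1e12:.2f} TB into the disk")
# prints: sdf READ 4.71 TB into the disk
```

The offset only says where on the disk the failed read landed; it does not by itself tell a bad sector apart from a link or power glitch.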

I've also attached the SMART report for this drive. From what I can tell, it looks like the drive might be fine. I dumped all the diagnostic information too, but it's a little overwhelming and nothing really jumped out at me.
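As a rough aid for reading a SMART report, the sketch below scans an attribute table in the usual `smartctl -A` column layout and flags a few attributes whose raw value is nonzero. The choice of attributes and the "media" versus "cable/link" hints are a common rule of thumb, not an official diagnosis, and the sample rows are invented; they are not taken from the attached report.

```python
# Attributes commonly checked first; a nonzero raw value is worth a closer look.
# The hint strings are a rule of thumb (assumption), not a vendor statement.
WATCHED = {
    "Reallocated_Sector_Ct": "media",
    "Current_Pending_Sector": "media",
    "Offline_Uncorrectable": "media",
    "UDMA_CRC_Error_Count": "cable/link",
}


def flag_attributes(smart_table):
    """Return {attribute: (raw_value, hint)} for watched attributes with raw > 0."""
    flagged = {}
    for row in smart_table.splitlines():
        fields = row.split()
        # Attribute rows have 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw value.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged[name] = (int(raw), WATCHED[name])
    return flagged


# Made-up sample rows, NOT taken from the attached report.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
"""

print(flag_attributes(SAMPLE))
# prints: {'UDMA_CRC_Error_Count': (3, 'cable/link')}
```

An empty result is consistent with a healthy drive but does not rule out cabling, backplane, or power problems.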

Based on what I've read, it seems like the drive may be OK, and I should reboot to clear the statistics and keep an eye on the drive. I actually just bought a new 8TB hard drive that was pre-clearing when the errors were reported. So I was contemplating swapping the drive with errors for the new one to be safe, but I was thinking maybe I should first run another parity check to check the entire array (with corrections turned off)? The last check was completed on the first of the month with no errors.

When researching this, it seems like an error like this could be caused by data / power cables. I have my hard drives in a server chassis with hot swap bays and they connect to the motherboard using this card:

https://www.amazon.com/dp/B002RL8I7M/?coliid=I1JDPV4RBNXVPJ&colid=3RNCTRI7WGC20&psc=1&ref_=lv_ov_lig_dp_it

and these cables:

https://www.amazon.com/dp/B07CKXFKHT/?coliid=IJ7PN6L2328TJ&colid=3RNCTRI7WGC20&ref_=lv_ov_lig_dp_it&th=1

Thought I'd mention it in case someone spotted some issues with the cards / cables I might be using. When I shut down the server, I was going to check cable connections to make sure things are seated right, but I don't see errors with other drives, so maybe that's not the likely cause. Here's what I was thinking of doing:

1. Shut down the server and check connections.

2. Reboot and start running a parity check with corrections turned off.

3. Depending on the parity check results, run tests on the RAM.

If all that completes without issue, I'm thinking about leaving the disk in the array, but part of me wants to swap it for the new drive. Any thoughts or observations? I can attach the entire diagnostic output if it would help. Any help is appreciated.

ST8000VN004-2M2101_WKD37GL8-20230428-0444.txt

April 28, 2023

Community Expert

24 minutes ago, Geth said:

attach the entire diagnostic output

Yes post that zip file.

April 28, 2023

Author

Here's the entire output. Thanks for checking it out.

tower-diagnostics-20230428-0733.zip

April 28, 2023

Community Expert

Since it happened at spin up it's likely this issue:

April 28, 2023

Author

Interesting, let me read through that. I only have two Seagate drives, so I forget it's not all Western Digital. Maybe I should swap them for Western Digital drives until things work better...

May 1, 2023

Author

@trurl and @JorgeB Alright, the plot has thickened. I was preparing to replace the two Seagate drives, and the monthly parity check started this morning without my realizing it. It was about 40% through the parity check when 4 drives suddenly showed ~3 million read errors. One of them was the same Seagate drive, but 3 other non-Seagate drives had errors too. I cancelled the parity check (which maybe I shouldn't have) and rebooted, and the read errors are gone. I want to replace the Seagate drives, but now I'm worried parity isn't correct, and if I try to rebuild their contents on a new disk it might not rebuild right. Any thoughts on what I should do? Should I restart the parity check and write corrections, then try to replace the two disks? I downloaded the diagnostics again and attached them.

tower-diagnostics-20230501-1409.zip

May 1, 2023

Community Expert

You rebooted before getting diagnostics so can't tell what happened before reboot.

5 minutes ago, Geth said:

monthly parity check started this am

Scheduled parity checks should be configured to be non-correcting. Is it?

7 minutes ago, Geth said:

restart the parity check and write corrections

You should never correct parity until all other problems are eliminated.
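To illustrate the reasoning, here is a toy Python sketch of single XOR parity. It is a simplified model under stated assumptions (tiny made-up byte strings, one parity disk, a bad read that silently returns wrong bytes) and is not a description of how Unraid handles read errors internally.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three tiny made-up "data disks" and the parity computed from them.
disk1, disk2, disk3 = b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"
good_parity = xor_blocks([disk1, disk2, disk3])

# Healthy case: a missing disk2 is rebuilt from parity plus the other disks.
assert xor_blocks([good_parity, disk1, disk3]) == disk2

# Faulty case (assumption): a flaky link makes disk3 return wrong bytes during
# a CORRECTING check, so parity is rewritten to match the bad read.
bad_read_of_disk3 = b"\xaa\x54"
corrupted_parity = xor_blocks([disk1, disk2, bad_read_of_disk3])

# Later disk3 reads correctly again, but rebuilding disk2 from the rewritten
# parity now yields wrong data.
rebuilt_disk2 = xor_blocks([corrupted_parity, disk1, disk3])
print(rebuilt_disk2 == disk2)  # False
```

In this model, a non-correcting check only reports the mismatch and leaves parity untouched, so a later rebuild still has good parity to work from.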

May 1, 2023

Author

Shit, my bad. I checked, and parity correction is disabled for the monthly check. The server had been running fine until the parity check this morning. So I think I'm gonna run the short SMART test on all the drives first and see if any errors show up. Assuming no issues, do you think I should start the parity check again?

May 1, 2023

Community Expert

Probably not drive problems since multiple drives are involved.

You could start a NON-correcting parity check and let it run until errors start showing up then post new diagnostics so we can see what is happening.

May 1, 2023

Author

Ok, just kicked off a non-correcting check. We'll see what happens. Thanks for the help.

May 2, 2023

Author

So the parity check just completed without any errors. I'm going to apply the fix mentioned in the thread to those Seagate drives when I get home from work. I have everything set up and ready to go, although I installed the Seagate utilities a bit differently. Maybe I'll write up how I ended up doing it. It seems like things have changed a bit since the original instructions were written. I'm going to shut the server down and inspect the backplane connections and the power supply as well, mostly to tick the boxes. It seems strange that the other Western Digital drives would have read errors if it was the Seagate issue, though. But maybe I've misunderstood exactly what the problem is.

May 7, 2023

Author

@trurl So I just got read errors on 4 disks again, and I'm pretty sure these were the same disks that had issues last time. I applied the Seagate fix a few days ago and hadn't had any issues until this morning. I downloaded the diagnostics and haven't rebooted. Let me know if you see anything. I'm gonna check if these 4 disks are on the same backplane, but I don't want to reboot yet.

tower-diagnostics-20230507-1228.zip

May 7, 2023

Author

It also looks like two of the 4 disks are stuck in standby mode, and I can't see the SMART attributes for any of the 4 disks. You can see in this screenshot.

May 7, 2023

Author

I dumped the diagnostics logs again and shut the server down. All 4 drives with issues are in the same row and use the same backplane. So I'm thinking that the backplane has gone bad and that might be the issue. Problem is I have a NORCO RPC-4224 case and the company has gone under. Seems like there are a lot of problems with backplanes dying and taking drives with them. Maybe I should move to a new case and be done with this one.
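The "same row, same backplane" check can be sketched in Python by counting kernel I/O error lines per device and summing them by chassis row. The device-to-row map and the log lines below are hypothetical placeholders; the real mapping has to come from tracing which bay each drive sits in.

```python
import re
from collections import Counter, defaultdict

IO_ERROR_DEV_RE = re.compile(r"I/O error, dev (\w+),")


def errors_per_device(log_lines):
    """Count kernel I/O error lines per block device."""
    counts = Counter()
    for line in log_lines:
        match = IO_ERROR_DEV_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def errors_per_row(counts, device_to_row):
    """Sum error counts by chassis row using a hand-written device map."""
    rows = defaultdict(int)
    for dev, n in counts.items():
        rows[device_to_row.get(dev, "unknown")] += n
    return dict(rows)


# Hypothetical layout and log lines; fill in from your own chassis and syslog.
DEVICE_TO_ROW = {"sdb": "row1", "sdc": "row1", "sdf": "row2", "sdg": "row2"}
LOG = [
    "kernel: I/O error, dev sdf, sector 100 op 0x0:(READ) flags 0x0",
    "kernel: I/O error, dev sdg, sector 200 op 0x0:(READ) flags 0x0",
    "kernel: I/O error, dev sdf, sector 300 op 0x0:(READ) flags 0x0",
    "kernel: some unrelated message",
]

print(errors_per_row(errors_per_device(LOG), DEVICE_TO_ROW))
# prints: {'row2': 3}
```

Device letters such as `sdf` can change between boots, so a map keyed on drive serial numbers would be more robust than the one used here. If every error lands in one row, a shared backplane, cable, or power feed becomes a more likely suspect than four drives failing at once.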

May 8, 2023

Community Expert

The problem occurred after a spin up, suggesting the Seagate fix didn't work. Try disabling spin down.

May 8, 2023

Author

When the issues first happened, I set the spin down delay to never on both of those drives. The morning of the latest issues, I set the spin down delay back to the default for both drives, so you might be right. The only thing that I don't understand is why other non-Seagate drives show read errors too. I followed the instructions in that support thread and disabled EPC and Low Current Spinup, but I guess it hasn't helped. I'm thinking of just moving all the files off these two disks using the unbalance plugin and removing them from the array entirely. What do you think of that?

May 8, 2023

Community Expert

That may be a good idea.
