Disk read errors but SMART test PASSED


Solved by trurl


For the last 5 days I have been getting notified that the health check of my disks is failing. One disk in particular, "disk1", is having read errors. When I run a SMART test, both short and extended, it says PASSED. I will note that my extended test took about 8 hours to complete. Is there a next step for testing the drive or gathering additional info on the "read" errors? Could the filesystem be causing it? I have attached the SMART logs.

Thank you.

SMART-REPORT.txt


You should click on each of your WD disks and add attributes 1 and 200 for monitoring. Disk1 does have value 1 for each of those attributes, but I'm not sure how critical that is since it passed extended self-test. On the other hand, syslog says critical medium error for that disk and nothing that would indicate communication problems such as cabling.
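
As a rough illustration of what watching attributes 1 and 200 amounts to, here is a minimal Python sketch that scans text laid out like the attribute table printed by smartctl -A and reports watched attributes with a nonzero raw value. The sample text is made up for illustration (its raw values only echo numbers mentioned in this thread; it is not the attached SMART-REPORT.txt), and the names SAMPLE, WATCHED, parse_attributes and nonzero_watched are hypothetical. This is not how Unraid itself implements its monitoring.

```python
import re

# Illustrative sample only: laid out like the `smartctl -A` attribute table.
# The values are made up and are NOT taken from the poster's SMART report.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       53
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1
"""

# Attribute IDs to watch (hypothetical default, matching the advice above).
WATCHED = {1, 200}

# One table row: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, then the leading integer of the raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Map attribute ID -> (name, raw value) for every row that parses."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attrs[int(match.group(1))] = (match.group(2), int(match.group(3)))
    return attrs


def nonzero_watched(text, watched=WATCHED):
    """Return only the watched attributes whose raw value is above zero."""
    return {
        attr_id: info
        for attr_id, info in parse_attributes(text).items()
        if attr_id in watched and info[1] > 0
    }


if __name__ == "__main__":
    for attr_id, (name, raw) in sorted(nonzero_watched(SAMPLE).items()):
        print(f"attribute {attr_id} ({name}) has raw value {raw}")
```

A nonzero raw value here is only a prompt to look closer; as noted above, how critical it is remains a judgment call when the extended self-test passes.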

 

Looks like sdc (cache) also has some disk problems.

I added 1 and 200 to all my WD drives. Disk 1 popped alerts right away. Another disk has those same values for 1 and 200, but it's not in error.

(screenshot attached)

The disk that was throwing "Raw read error rate" errors (raw value of 1) is now back at zero ("Returned to normal"). However, its Multi zone error rate is still 1. Now my parity disk has a Raw read error rate of 1 and a UDMA CRC error count of 53. So something is up, but I am not sure how to keep chasing this one. I do have one 8TB drive with less than 48 hours on it, so my plan was to put that in the parity slot and then build parity. Once parity was built correctly, I was going to replace all my 4TB disks with new 8TB disks. I guess my question now is: I can just shut down, remove the parity disk, add the new parity disk, boot up, and let it build, right?

 

According to this page (https://docs.unraid.net/legacy/FAQ/parity-swap-procedure/), I am right, but maybe I am doing it wrong, lol. It says:

"This procedure is strictly for replacing data drives in an Unraid array. If all you want to do is replace your Parity drive with a larger one, then you don't need the Parity Swap procedure. Just remove the old parity drive and add the new one, and start the array. The process of building parity will immediately begin. (If something goes wrong, you still have the old parity drive that you can put back!)"

  • Solution

CRC errors are just communication problems logged by the disk firmware. They are almost always connection or cable problems. You can acknowledge any SMART warning on the Dashboard page by clicking on it, and it will warn again if the count increases. I usually just acknowledge the occasional CRC error and maybe reseat the connection the next time I'm in the case. If the count continues to increase, investigate further and replace cables or whatever else is involved. Power cables and splitters can also cause problems. Connection problems are much more common than actual disk problems, so be careful with connections.
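
The acknowledge-then-warn-on-increase idea can be sketched in a few lines of Python. This is only an illustration of the logic described above, not Unraid's actual implementation; the function name new_warnings and the dictionaries are hypothetical, and the count of 57 in the usage is an invented later reading.

```python
def new_warnings(acknowledged, current):
    """Return attributes whose raw value has risen above the acknowledged one.

    Both arguments map a SMART attribute ID to a raw value. An attribute
    that was never acknowledged is treated as having a baseline of 0.
    The result maps attribute ID -> (acknowledged value, current value).
    """
    warnings = {}
    for attr_id, raw in current.items():
        baseline = acknowledged.get(attr_id, 0)
        if raw > baseline:
            warnings[attr_id] = (baseline, raw)
    return warnings


if __name__ == "__main__":
    # Attribute 199 is the UDMA CRC error count; 53 is the value reported
    # earlier in this thread, acknowledged here as the baseline.
    acknowledged = {199: 53}
    print(new_warnings(acknowledged, {199: 53}))  # unchanged count, no warning
    print(new_warnings(acknowledged, {199: 57}))  # hypothetical later increase
```

An unchanged count stays quiet, while any rise above the acknowledged value shows up again, which is the signal to go check cables and connections.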

 

You are correct, you don't need the parity swap procedure.

The documentation talks a lot about replacing a data disk, but replacing a parity disk is exactly the same.

You can't change disk assignments with the array started.

With the array stopped, assign the new disk to the slot to be replaced. Start the array to begin the rebuild. Simple as that.

Shut down, install the new disk, remove the old disk if necessary: do whatever you need to do to get to the point of assigning the new disk, then start the array.

 

Thanks for your help with this. I have moved the disks, and parity is rebuilding now. Good to know about the CRC errors; I will monitor them for now. I have 10 drives of varying size. It would be ideal to pare them down so that I am not using SATA power splitters, which I am now. Could they be causing these issues? I am using a SAS HBA for the disks that have issues now.
