Errors found on scheduled parity check - Should I be worried?



Hello! I've been running Unraid for a few months now and my system has completed two scheduled parity checks without any errors.

 

Today, the scheduled parity check returned 7 errors.

 

Since the last parity check, I have added a new disk (passed a pre-clear) and moved some data to it using unbalance. I haven't had any unclean shutdowns, and I use a UPS.

 

Is there an easy way to find out where these errors came from? Could it have been a result of using unbalance? Do I have a hardware problem?

 

So my plan is to run another parity check tonight and see if there are any more errors. If there are, I'll be running memtest.

 

Is there any other action I should be taking right now?

tower-diagnostics-20211105-2312.zip

Further info on hardware specs:

 

Motherboard: Asus Z170M-PLUS

CPU: Intel i7 7700

RAM: 32GB DDR4 Corsair (Non-ECC)

 

Cache: 2x 1TB SSD

 

Array: 1x 10TB WD White Label (Parity)

2x 8TB WD White Label (Data 1 & 2)

1x 6TB WD Blue (Data 3) (Recently added, older than the other disks)

 

Unraid version: 6.9.2

3 hours ago, Dalarielus said:

the scheduled parity check returned 7 errors

Looks like you have it set to correct parity errors. You should only correct parity errors after you have determined that parity needs correcting. You don't want a problem with another disk or maybe RAM or other hardware issue to cause parity to be changed.

 

After a scheduled NON-correcting parity check says you have parity errors is the time to ask what might have caused it.

 

I didn't see any I/O errors in your syslog. Do any of your disks have SMART warnings on the Dashboard page? Have you done memtest lately?
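To illustrate why a correcting check can be risky before the cause is known, here is a minimal sketch. It assumes plain single-parity XOR over toy byte strings; the names `xor_parity` and `count_mismatches` are made up for this example and are not Unraid's actual code. The point it shows: a mismatch tells you that data and parity disagree, not which one is wrong, and a correcting check resolves the disagreement by rewriting parity.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks (single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def count_mismatches(data_blocks, parity):
    """Count byte positions where the stored parity disagrees with the data."""
    expected = xor_parity(data_blocks)
    return sum(1 for e, p in zip(expected, parity) if e != p)


# Three toy "data disks" and the parity computed from them (hypothetical values).
disks = [bytes([1, 2, 3, 4]), bytes([10, 20, 30, 40]), bytes([7, 7, 7, 7])]
parity = xor_parity(disks)
assert count_mismatches(disks, parity) == 0

# Simulate one silently corrupted byte on the second data disk.
corrupted = [disks[0], bytes([10, 21, 30, 40]), disks[2]]
errors = count_mismatches(corrupted, parity)  # the check reports a mismatch

# A correcting check rewrites parity to match whatever it read from the data
# disks, so the mismatch disappears even though the data is still wrong.
corrected_parity = xor_parity(corrupted)
errors_after_correction = count_mismatches(corrupted, corrected_parity)

# Rebuilding the second disk from the rewritten parity now reproduces the
# corrupted contents, not the original ones.
rebuilt = xor_parity([corrupted[0], corrupted[2], corrected_parity])
```

In this toy case the rebuilt disk equals the corrupted one, which is the scenario a non-correcting check leaves room to investigate first.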

16 hours ago, trurl said:

Looks like you have it set to correct parity errors. You should only correct parity errors after you have determined that parity needs correcting. You don't want a problem with another disk or maybe RAM or other hardware issue to cause parity to be changed.

 

After a scheduled NON-correcting parity check says you have parity errors is the time to ask what might have caused it.

 

I didn't see any I/O errors in your syslog. Do any of your disks have SMART warnings on the Dashboard page? Have you done memtest lately?

Right - I'll change that to non-correcting immediately.

 

Is there any way of determining which files are occupying a sector that has thrown a parity error so that they can be checked manually?

 

My second parity check is almost complete, and hasn't found any errors so far. I have no SMART warnings, all of my drives are passing SMART testing and there are no worrying noises coming from any of them. I haven't run memtest lately, but that's going to be my next step. Is that something that I should consider working into a regular maintenance cycle?

 

Thanks!

13 minutes ago, itimpi said:

This will only help if you have already been running this in the past so that there are existing checksums to be checked against.

I have been running it, but somehow it managed to completely escape my mind! When the file integrity check finishes, I'll be running memtest.
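The point about needing existing checksums can be sketched as follows. This is only an illustration of the general idea, not how any particular integrity tool is implemented; `build_manifest` and `verify` are hypothetical helper names, and the demo runs against a temporary directory, not a real share.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return (changed, unknown): files whose hash differs from the baseline,
    and files that have no baseline entry and so cannot be judged."""
    current = build_manifest(root)
    changed = sorted(n for n, h in current.items() if n in manifest and manifest[n] != h)
    unknown = sorted(n for n in current if n not in manifest)
    return changed, unknown


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.bin").write_bytes(b"hello")
    baseline = build_manifest(root)           # checksums recorded "in the past"
    (root / "a.bin").write_bytes(b"hellp")    # simulated corruption
    (root / "b.bin").write_bytes(b"new")      # file added after the baseline
    changed, unknown = verify(root, baseline)
```

Only `a.bin` can be flagged as changed; `b.bin` has no earlier checksum, so nothing can be said about it, which is exactly the limitation itimpi describes.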

4 hours ago, Dalarielus said:

cancelling an incomplete job on Unbalance

How exactly did you cancel it? Unbalance itself can't do anything different from any other process that writes to the array, and parity updates happen at a very low level when writing array disks. About the only way you can get them out of sync is a write failure that disables a disk, an unclean shutdown that doesn't allow parity update to complete, or some undetected hardware problem such as RAM.
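As a rough sketch of the "parity updates happen at write time" point, the following assumes a read-modify-write scheme with single XOR parity. That is an assumption for illustration; the thread does not show how the array driver actually does it, and `rmw_parity` and `full_parity` are invented names. It shows that a completed write keeps parity in sync regardless of which program issued it, while a write whose parity update never lands leaves mismatches for the next check to find.

```python
def rmw_parity(old_parity, old_data, new_data):
    """Read-modify-write update: new parity = old parity XOR old data XOR new data."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


def full_parity(blocks):
    """Recompute parity from scratch by XOR-ing every data block together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


# Two toy data disks and their parity (hypothetical values).
disk1 = bytes([1, 2, 3, 4])
disk2 = bytes([9, 9, 9, 9])
parity = full_parity([disk1, disk2])

new_disk1 = bytes([5, 6, 7, 8])

# Normal case: the data write and the parity update both complete.
parity_ok = rmw_parity(parity, disk1, new_disk1)
in_sync = parity_ok == full_parity([new_disk1, disk2])

# Failure case: the new data reached disk1 but the parity update did not
# (for example an unclean shutdown). The old parity is now stale.
stale_mismatches = sum(
    a != b for a, b in zip(parity, full_parity([new_disk1, disk2]))
)
```

Here every byte that changed on `disk1` shows up as a parity mismatch in the failure case, while the normal case stays in sync.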

5 hours ago, trurl said:

How exactly did you cancel it? Unbalance itself can't do anything different from any other process that writes to the array, and parity updates happen at a very low level when writing array disks. About the only way you can get them out of sync is a write failure that disables a disk, an unclean shutdown that doesn't allow parity update to complete, or some undetected hardware problem such as RAM.

I cancelled it in the usual manner from the Unbalance WebUI, but at one point the interface became unresponsive for a few minutes. When it terminated, the file that had been in the process of moving was corrupted (literally cut in half).

 

At the time, I simply replaced the file and moved on, though in hindsight this is the first parity check that has been run since then.
