How to figure out what files/drives were involved in corrected parity check (Teach a man to fish?)



Running the latest Unraid Pro 6.10.3 with dual parity, using mirrored SSDs (240GB, mover nightly) as cache drives...

 

I run a large array and run parity checks automatically every month. Most times I get no parity errors, but sometimes get a few thousand corrected parity errors.

 

I have a UPS that does a graceful shutdown (though I guess it's possible that the shutdown process takes longer than the UPS can hold out while waiting for Unraid to stop the array). I do have power outages, but the UPS can sustain 20 minutes of run time before it tells Unraid to issue a shutdown.

 

No drives have red balls or show any issues with SMART attributes 5, 187, 188, 197, or 198 (the ones Backblaze recommends watching).

 

The physical server has not been moved/opened in several months.

 

Two questions:

1.) What files in the diagnostics download (saved immediately after sync errors) would show me what files/drives reported the sync errors?  What am I looking for in the files that would be able to tell me the details?

 

2.) Do corrected sync parity errors (with dual parity) mean that the data was corrected and no corruption has occurred?

 

 

 

Link to comment

Interested in the answer as well, because it seems that when a parity error is detected all that could be known is that something changed on any of the drives, but I don't see how it would be possible to know which is correct.

 

24 minutes ago, bsim said:

(though I guess it's possible that the shutdown process takes longer than the UPS can hold out while waiting for Unraid to stop the array). I do have power outages, but the UPS can sustain 20 minutes of run time before it tells Unraid to issue a shutdown.

If you have a UPS, it makes sense to use it to ensure a clean shutdown: set the timeouts so the shutdown completes within the available runtime, or have it shut down earlier to be sure.

Link to comment
35 minutes ago, bsim said:

sometimes get a few thousand corrected parity errors.

You should not run a correcting parity check until you have determined that you have sync errors that need to be corrected and they are not caused by some drive or other hardware issue.

 

36 minutes ago, bsim said:

UPS can sustain 20 minutes of run time before it tells Unraid to issue a shutdown.

The purpose of a UPS is to let you keep running through a very brief power outage, and to shut down when an outage continues for somewhat longer. You shouldn't try to run on UPS power for any significant amount of time. You don't want to drain the battery, and you don't want to try to restart when the UPS doesn't have plenty of charge.

 

Now on to your actual question.

 

There is no way to determine which disk is the cause of a sync error, much less which file. Parity is a single bit that just shows that the other bits aren't consistent. Parity can't know anything about files or specific disks.

 

https://wiki.unraid.net/Manual/Overview#Parity-Protected_Array

Link to comment

Are corrected parity sync errors truly corrected or will there be some sort of hidden corruption?

 

1 hour ago, trurl said:

not caused by some drive or other hardware issue.

 

The errors are not recurring, and I often go several months/checks with no errors detected. The hardware has been stable/unchanged for years now. If I can't determine the issue using SMART, obvious Unraid errors, or any log files, then why wouldn't a correcting parity check just save me time?

Link to comment
3 minutes ago, bsim said:

If I can't determine the issue using SMART, obvious Unraid errors, or any log files

You don't know until you look.

 

If your scheduled parity checks are correcting, you don't know if you will have a hardware issue when the scheduled check runs.

 

4 minutes ago, bsim said:

why wouldn't a correcting parity check just save me time?

It would save time if you have no other explanation for sync errors. You probably don't know that in advance.

 

If you correct parity when something else is wrong, then you could change parity when it shouldn't be changed. And then you are out-of-sync and won't be able to rebuild a failed disk.
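To make that concrete, here is a minimal sketch in Python of one sector position protected by single XOR parity. The byte values, the three-disk layout, and the "flaky cable" fault are all made up for illustration; this is not Unraid's actual driver code, and it ignores the second parity disk.

```python
from functools import reduce
from operator import xor


def xor_parity(values):
    """XOR all values together (single-parity calculation)."""
    return reduce(xor, values, 0)


# Hypothetical byte stored at the same sector on three data disks.
disks = [0x5A, 0x3C, 0xF0]
good_parity = xor_parity(disks)  # parity is in sync with the data

# Assumed fault: during the check, disk 1 returns a wrong byte
# (bad cable, controller glitch). The data on the disk itself is fine.
bad_read = [0x5A, 0x3D, 0xF0]

# A correcting check rewrites parity to match what it just read.
wrongly_corrected_parity = xor_parity(bad_read)

# Later, disk 2 fails. It is rebuilt from the surviving disks
# (which now read correctly) plus whatever parity holds.
rebuilt_with_good_parity = disks[0] ^ disks[1] ^ good_parity
rebuilt_with_wrong_parity = disks[0] ^ disks[1] ^ wrongly_corrected_parity

print(hex(rebuilt_with_good_parity))   # matches the original disk 2 byte
print(hex(rebuilt_with_wrong_parity))  # differs from the original disk 2 byte
```

With a non-correcting check, the mismatch would only have been reported, leaving parity untouched until the cause was found.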

 

Link to comment
20 minutes ago, bsim said:

Are corrected parity sync errors truly corrected

Based on the data received from all other disks, parity is calculated and rewritten. Whether that is "truly" or not...

 

1 hour ago, trurl said:

Parity is a single bit that just shows that the other bits aren't consistent.

When you correct, parity is made consistent with those other bits.

 

Don't know if you looked at the wiki I linked. Parity is basically the same concept wherever it is used in computers and communications. It lets you determine whether the corresponding bits are consistent (but it can only detect a single-bit error; if 2 bits are wrong, they cancel out). And it lets you calculate a missing bit from the other corresponding bits (this is how it can reconstruct a drive).
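A small Python sketch of those three properties, using one bit position across four hypothetical data disks. The function names are made up for illustration, and this only models a single XOR parity bit; the second parity disk in a dual-parity array uses a different calculation that is not shown here.

```python
from functools import reduce
from operator import xor


def parity_bit(bits):
    """XOR of all bits: 1 if an odd number of bits are set, else 0."""
    return reduce(xor, bits, 0)


def is_consistent(data_bits, stored_parity):
    """True if the data bits still agree with the stored parity bit."""
    return parity_bit(data_bits) == stored_parity


def rebuild_missing(surviving_bits, stored_parity):
    """Recover the one missing bit from the survivors plus parity."""
    return parity_bit(surviving_bits) ^ stored_parity


data = [1, 0, 1, 1]          # same bit position on four data disks
stored = parity_bit(data)    # what the parity disk holds

one_flip = [1, 1, 1, 1]      # disk 1 flipped
two_flips = [0, 1, 1, 1]     # disks 0 and 1 both flipped

print(is_consistent(data, stored))       # True
print(is_consistent(one_flip, stored))   # False: detected, but not located
print(is_consistent(two_flips, stored))  # True: the two errors cancel out

# Disk 2 is missing; its bit is recomputed from disks 0, 1, 3 and parity.
print(rebuild_missing([data[0], data[1], data[3]], stored) == data[2])  # True
```

Note that the failed check on `one_flip` says nothing about which of the four bits (or the parity bit itself) is the wrong one.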

Link to comment

I see the point of being careful with automatic parity corrections, but my system has been stable hardware-wise and has worked flawlessly for years. Just every once in a while I get a burst of sync errors on a 140TB array, and the number of errors doesn't seem like a major issue compared with the potential problems automatic parity correction would save me from.

 

I considered installing some type of indexing/checksum software to watch for any type of bit rot or actual corruption...just haven't got around to it.
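For reference, the basic idea behind that kind of checksum tool can be sketched in a few lines of Python: hash every file once, keep the result, and compare against a fresh scan later. This is only an illustration using the standard library, not any particular plugin; the function names are made up, and the choice of SHA-256 is an assumption.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large media files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def changed_files(old, new):
    """Paths present in both manifests whose digests differ."""
    return sorted(p for p in old if p in new and old[p] != new[p])
```

You would run `build_manifest` against a single disk's mount point (for example `/mnt/disk1`), save the result, and compare it with a new scan after a parity check reports errors. A changed digest only means the contents differ, so a file you edited on purpose will show up too; the interesting hits are files you know you never touched.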

 

It would be awesome if there were a way to translate the location of incorrect bits to at least a controller/drive/file; it would help greatly in my case. I don't see why the main Unraid driver couldn't spit out the details of the parity issue when doing corrections. It seems like it would be a great diagnostic tool.

Link to comment
