
[SOLVED] Parity errors - which files?



Hi,

 

I've been running UnRAID 4.7 on an HP Microserver for the past 2 years and it's been running brilliantly. I have 3 x 2TB disks and every so often I run a read-only parity check. This has reported a couple of parity errors; I'm at work at the moment so can't check the exact number, but I think it's something like 4. The question is, how do I know whether the errors are on the data disk(s) or on the parity disk? Is it possible to find out which files are showing the parity mismatch so I can check them?

 

Any suggestions as to what I should do?  I haven't tried a parity sync in case the parity drive is wrong and the data disks were correct!

 

Many thanks in advance.

Tim.


The easiest way to be SURE whether the errors are in the data or on the parity drive is to run a comparison between your backups and the UnRAID array. If the data's wrong, the compare will fail; if the compare is okay, the issue is on the parity disk.

 

I've NEVER found a data error in these cases -- the error has always been on the parity drive.  In fact, I never bother with non-correcting parity checks ... my monthly checks are always "correcting"  (although it's also been a LONG time since they found any errors).
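A minimal sketch of that backup-versus-array comparison, in Python purely for illustration. It assumes the backup and the array share are both mounted as ordinary directory trees; `compare_trees` is a hypothetical helper, not an UnRAID or unMENU tool. It only reports files that differ or are missing on the array and never touches parity.

```python
import filecmp
import os


def compare_trees(backup_root, array_root):
    """Compare every file under backup_root with its counterpart under array_root.

    Returns a list of "DIFFERS <path>" / "MISSING <path>" strings, using paths
    relative to backup_root. An empty list means every backed-up file matched.
    """
    problems = []
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        for name in filenames:
            backup_path = os.path.join(dirpath, name)
            rel = os.path.relpath(backup_path, backup_root)
            array_path = os.path.join(array_root, rel)
            if not os.path.isfile(array_path):
                problems.append(f"MISSING {rel}")
            # shallow=False compares file contents instead of trusting
            # identical size and modification time.
            elif not filecmp.cmp(backup_path, array_path, shallow=False):
                problems.append(f"DIFFERS {rel}")
    return problems
```

For example, `compare_trees("/mnt/backup1", "/mnt/user/Photos")` (paths hypothetical) returns an empty list when every backed-up file matches byte for byte, which under the reasoning above points at the parity disk.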

 


If you don't have backups like garycase suggested, then hopefully you have something like MD5 checksums of your files that you could compare against to see if there are any differences. That's what I do, since the majority of my files are recordings that I could get again if I had to, so I don't have backups of them. But I DO have MD5 checksums of all of them, so I can run md5deep again and see if any checksums differ. If there are no differences, you can just run a correcting parity check to eliminate the errors by correcting the parity data.
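With an older checksum list on hand, the "see if any checksums differ" step is just a comparison of two manifests. This is a sketch assuming md5sum-style lines (hex digest, whitespace, path); check that your md5deep output really uses that layout before relying on it. Paths that begin with whitespace are not handled, and the function names are illustrative only.

```python
def load_manifest(lines):
    """Parse md5sum-style lines ("<hex digest>  <path>") into {path: digest}."""
    manifest = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        digest, path = line.split(maxsplit=1)
        manifest[path] = digest.lower()
    return manifest


def diff_manifests(old, new):
    """Return (changed, missing, added) path lists between two manifests."""
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    missing = sorted(p for p in old if p not in new)
    added = sorted(p for p in new if p not in old)
    return changed, missing, added
```

Usage: `with open("before.md5") as f: old = load_manifest(f)`, do the same for a fresh run, then `diff_manifests(old, new)`. An empty `changed` list is the "no differences" case described above.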


Virtually all of my media is stuff I could "... get again if I had to ..." ==> but the hassle factor of getting it; re-ripping it; categorizing it;  etc. is probably big enough that if it was lost I'd simply never replace it.    The cost of one or two extra drives/year (actually less than one drive/year now that I can get 4TB drives) is cheap insurance to never have to do that !!

 

As for parity sync errors: in 5 years with UnRAID I've had 3 parity checks with non-zero sync error results. I always do correcting checks, so I've simply re-run the check on those occasions, and the 2nd run has always shown zero. Every time it happened, I've run a complete set of comparisons against my backup disks ... and every time the data proved to be just fine, so it was indeed an error on the parity disk (which is what UnRAID assumes, and for good reason). If you had a "real" data error, the error column for the disk itself should be non-zero.
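Why a check can't tell you which disk is wrong is easiest to see with a toy model of single parity, where each parity block is the XOR of the data blocks at the same position on every data disk. This is a simplified illustration, not UnRAID's driver code: a one-bit change on a data disk and a one-bit change on the parity disk produce exactly the same symptom, and a correcting pass simply rewrites parity from whatever the data disks currently hold.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (the single-parity calculation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def check_stripe(data_blocks, parity_block, correct=False):
    """Check one stripe; return (parity_to_keep, mismatch_found).

    With correct=True a mismatch is "fixed" by recomputing parity from the
    data, i.e. the data disks are assumed to be right.
    """
    expected = xor_blocks(data_blocks)
    if expected == parity_block:
        return parity_block, False
    # A mismatch only says the stripe is inconsistent; with one parity
    # block there is no way to tell which member holds the bad bits.
    return (expected if correct else parity_block), True
```

The flip side shows up if a data block is the one that went bad: `check_stripe(..., correct=True)` then computes parity from the bad data, and the original good parity is gone.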

 


Thanks, all.  Unfortunately I don't have checksums for my data and only have separate backups for things I don't have originals for - photos, documents, etc, whereas most of the data by size is music and films for which I have the original media.

 

Have just checked and I'm showing 2 sync errors (207 days ago... :-[ - time to run another check!), but zero errors against all three disks in the console - by the sounds of it I should be OK to assume it's an error on the parity and the best thing to do would therefore be a parity correction?  I'm using unmenu, so I assume the button to press is "Check and correct parity"?

 

Thanks,

Tim



Yes, run a correcting parity check.  The result will be non-zero, since it has to correct those sync errors.    So after you run it once, do it again -- this time the result should be zero.

 


Hi,

 

Just to follow up: thanks for your advice, everything seems fine now. I did a correcting parity check, which fixed the 2 previously reported errors, then a non-correcting check, which returned zero errors.

 

Cheers,

Tim


Looks like all's well. As I noted earlier, I've NEVER seen a parity sync error that wasn't simply an error on the parity drive ... a variety of things can cause a sync error, but a parity-drive error is by far the most likely (which is why UnRAID always assumes that's where the error is).

 

Doesn't hurt to run non-correcting checks, followed by a correcting check if necessary ... but I always just run correcting checks (what else are you going to do if there are errors ??).

 

Quote: "I always just run correcting checks (what else are you going to do if there are errors ??)."
Running 2 non-correcting checks in a row and coming up with different results (been there done that) indicates there is something deeper going on that must be addressed before you can trust the correcting run to do the right thing.

 

Just because you have never experienced a particular error mode doesn't mean it doesn't exist. There is a very good reason why the community lobbied for non-correcting checks in the first place, and I still would rather all checks be non-correcting until I've intervened and have a reasonable explanation for the disparity.

 

Granted, an unclean shutdown is typically a very good reason for parity to be out of sync (it's almost guaranteed to be), but the REASON for the unclean shutdown is still very important, to me at least. I'd rather not kick off the parity correction until I'm sure that's the best thing to do for my data.

 

A full correcting parity run is the most stressful thing most systems do, so it's the time when silently degrading components like to go into full failure mode.
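The "two non-correcting checks in a row" test described in this post can be expressed as a small comparison. This sketch assumes you have already extracted the mismatching sector numbers from each run's log as lists of integers; getting them out of your syslog is not shown, and the function names and messages are hypothetical.

```python
def compare_check_runs(run1, run2):
    """Split mismatch locations from two read-only checks into repeatable and one-off.

    run1, run2: iterables of sector numbers reported as mismatched in each run
    (hypothetical input, assumed already parsed from the logs).
    """
    first, second = set(run1), set(run2)
    repeatable = sorted(first & second)  # seen in both runs
    one_off = sorted(first ^ second)     # seen in only one run
    return repeatable, one_off


def verdict(run1, run2):
    """Rough triage following the reasoning in this thread (not an official rule)."""
    repeatable, one_off = compare_check_runs(run1, run2)
    if one_off:
        # Results changed between identical read-only passes: suspect
        # cables, controller, RAM or a failing disk before writing anything.
        return "results differ: investigate hardware first"
    if repeatable:
        return "repeatable mismatches: consider a correcting check"
    return "no mismatches"
```

The design choice mirrors the argument above: nothing is written to parity until two read-only passes agree.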


I don't disagree -- it certainly doesn't hurt to do the non-correcting run first. But as you noted, to really tell you anything, you then have to run a 2nd non-correcting check ... so overall it triples the time to correct a parity error. But it DOES provide the opportunity to see if there are other factors involved (e.g. a loose cable or connection).

 

Quote: "so overall it triples the time to correct a parity error."
Actually, it's pretty much the minimum amount of time to find out that you have an error that really should be corrected on the parity disk. Running a correcting check the first time guarantees you don't get a second chance to get it right. I lost a significant amount of data because of a correcting check that clobbered good parity data. At that point the data resided only on UnRAID; it was a backup from a silently failing RAID5 array. My fault for not having good backups on offline media, but it could have been avoided if I had been able to catch the error before it wiped out my good parity.

It would be slick to have a Windows-based utility that would:

 

(a)  Compute SHA checksums on the complete UnRAID array;

 

and

 

(b)  Run a verify on the array that recomputed and checked those sums.  [This would, of course, run for DAYS on any large server ... but it'd be a useful thing to be able to do automatically -- 2 minutes of "human time",  days of "computer time" is no big deal.]

 

I can do effectively that now by running compares against all my backup disks, but I have to intervene and change the disk every 12 hours or so, so it typically takes me a week to do it.
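A rough sketch of the create/verify utility described in (a) and (b), written in Python with the standard `hashlib` module rather than as a Windows program. It hashes every file under a directory with SHA-256, saves the list in the "digest, two spaces, path" layout that GNU `sha256sum -c` reads, and later re-hashes the files and reports any that changed or disappeared. Function names are illustrative only.

```python
import hashlib
import os


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large media files don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """(a) Map each file's path, relative to root, to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_file(full)
    return manifest


def save_manifest(manifest, path):
    """Write "<digest>  <relative path>" lines."""
    with open(path, "w", encoding="utf-8") as f:
        for rel, digest in sorted(manifest.items()):
            f.write(f"{digest}  {rel}\n")


def verify_manifest(root, manifest):
    """(b) Re-hash every listed file; return (reason, path) for each failure."""
    failures = []
    for rel, digest in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            failures.append(("missing", rel))
        elif sha256_file(full) != digest:
            failures.append(("changed", rel))
    return failures
```

Because paths are stored relative to the root you pass in, checking the saved file with `sha256sum -c` would have to be done from that same directory. Save the manifest outside the tree being hashed, or it will show up in the next run.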

 


I'm currently working on a bash script to do file inventory and optional SHA checksums, all on the UnRAID server. It's functional but in an alpha state. I'll post the script when I feel it's in a beta state.

 

The md5deep package in unMenu does this.

2 months later...

So for the first time ever I have run into a parity error (using the nocorrect option).

 

My plan was always to find the file that the error is in and determine which was wrong: was the file corrupted, or was the parity corrupt? (By manually reviewing each affected file.)

 

Looking at the syslog, I've just come to the conclusion that there is no way to determine which file is affected by the parity error.

 

Is that true?  Is there no way to determine which is the potentially affected file other than the two options earlier in this thread?  (Neither of which will help me here)

 

Thanks :)
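One partial answer, offered as a conceptual sketch rather than a supported feature: because single parity is computed across the same sector position on every data disk, a mismatch at a given sector implicates at most one file per data disk, namely whatever occupies that sector on each disk. The sketch assumes you already know the mismatching sector and have per-file extent lists for each disk in the same sector units; whether your unRAID version logs the sector, and how to obtain extents for your filesystem, are assumptions not covered here. `Extent` and `candidate_files` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """One contiguous run of sectors belonging to a file (hypothetical input)."""
    path: str
    start: int   # first sector of the run
    length: int  # number of sectors in the run


def candidate_files(sector, extents_by_disk):
    """For each data disk, list the files whose extents cover `sector`.

    An empty list for a disk means the sector is not part of any listed file
    on that disk (free space or filesystem metadata), which still counts
    toward parity.
    """
    return {
        disk: [e.path for e in extents if e.start <= sector < e.start + e.length]
        for disk, extents in extents_by_disk.items()
    }
```

Even then this only narrows the search to a handful of candidates; single parity still cannot say which of them (or the parity block itself) is wrong, so each candidate would have to be checked against a backup or checksum.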



This topic is now archived and is closed to further replies.
