tj80 Posted June 12, 2013
Hi, I've been running UnRAID 4.7 on an HP Microserver for the past 2 years and it's been running brilliantly. I have 3 x 2TB disks and every so often I run a read-only parity check. This has reported a couple of parity errors - I'm at work at the moment so can't check the exact number, but I think it's something like 4. Question is, how do I know if the errors are with the data disk(s) or the parity disk? Is it possible to find out which files are showing the parity mismatch so I can check them? Any suggestions as to what I should do? I haven't tried a parity sync in case the parity drive is wrong and the data disks were correct! Many thanks in advance. Tim.
garycase Posted June 12, 2013
The easiest way to be SURE whether or not the errors are in the data or on the parity drive is to just run a comparison between your backups and the UnRAID array. If the data's wrong, the compare will fail; if the compare is okay, the issue is on the parity disk. I've NEVER found a data error in these cases -- the error has always been on the parity drive. In fact, I never bother with non-correcting parity checks ... my monthly checks are always "correcting" (although it's also been a LONG time since they found any errors).
BobPhoenix Posted June 12, 2013
If you don't have backups like garycase suggested then hopefully you have something like an MD5 checksum of your files that you could compare against to see if there are any differences. That's what I do since the majority of my files are recordings that I could get again if I had to - so I don't have backups of them. But I DO have MD5 checksums of all of them so that I can run MD5Deep again and see if there are any checksum differences. If no differences then you can just do a correcting parity check again to eliminate your errors by correcting parity data.
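The checksum comparison described above can be sketched roughly as follows. This is a minimal Python illustration, not md5deep itself: the function names (`md5_of`, `write_manifest`, `verify_manifest`) and the `<md5>  <relative path>` manifest layout are assumptions made for the example, and the manifest is assumed to live outside the share being checked.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_path):
    """Record '<md5>  <relative path>' for every file under root (hypothetical layout)."""
    root = Path(root)
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        lines.append(f"{md5_of(path)}  {path.relative_to(root).as_posix()}")
    Path(manifest_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(root, manifest_path):
    """Return (changed, missing): relative paths whose content differs or is gone."""
    root = Path(root)
    changed, missing = [], []
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file():
            missing.append(rel)
        elif md5_of(target) != expected:
            changed.append(rel)
    return changed, missing
```

If `verify_manifest` returns two empty lists after a parity check has reported sync errors, the data still matches what was recorded, which points at the parity disk rather than the data disks.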
garycase Posted June 12, 2013
Virtually all of my media is stuff I could "... get again if I had to ..." ==> but the hassle factor of getting it; re-ripping it; categorizing it; etc. is probably big enough that if it was lost I'd simply never replace it. The cost of one or two extra drives/year (actually less than one drive/year now that I can get 4TB drives) is cheap insurance to never have to do that !! As for parity sync errors -- in 5 years with UnRAID I've had 3 parity checks that had non-zero sync error results. I always do correcting checks, so I've simply re-run the check on those occasions, and they always have zeroes the 2nd time. Every time it happened, I've run a complete set of comparisons against my backup disks ... and every time the data proved to be just fine, so it was indeed an error on the parity disk (which is what UnRAID assumes, and for good reason). If you had a "real" data error, the error column for the disk itself should be non-zero.
tj80 Posted June 12, 2013
Thanks, all. Unfortunately I don't have checksums for my data and only have separate backups for things I don't have originals for - photos, documents, etc, whereas most of the data by size is music and films for which I have the original media. Have just checked and I'm showing 2 sync errors (207 days ago... - time to run another check!), but zero errors against all three disks in the console - by the sounds of it I should be OK to assume it's an error on the parity and the best thing to do would therefore be a parity correction? I'm using unmenu, so I assume the button to press is "Check and correct parity"? Thanks, Tim
garycase Posted June 12, 2013
Yes, run a correcting parity check. The result will be non-zero, since it has to correct those sync errors. So after you run it once, do it again -- this time the result should be zero.
tj80 Posted June 13, 2013
Many thanks. I kicked off another check last night, so when I get home I'll double check it's still showing zero errors against the disks themselves and then run a correcting parity check. Cheers, Tim
tj80 Posted June 18, 2013
Hi, Just to follow up: thanks for your advice, everything seems fine now. Did a correcting parity check which fixed the 2 previously reported errors, then a non-correcting check which returned zero errors. Cheers, Tim
garycase Posted June 18, 2013
Looks like all's well. As I noted earlier, I've NEVER seen a parity sync error that wasn't simply an error on the parity drive ... there are a variety of things that can cause that, but it's by far the most likely (which is why UnRAID always assumes that's where the error is). Doesn't hurt to run non-correcting checks, followed by a correcting check if necessary ... but I always just run correcting checks (what else are you going to do if there are errors ??).
JonathanM Posted June 18, 2013
Quote (garycase): I always just run correcting checks (what else are you going to do if there are errors ??).
Running 2 non-correcting checks in a row and coming up with different results (been there, done that) indicates there is something deeper going on that must be addressed before you can trust the correcting run to do the right thing. Just because you have never experienced a particular error mode doesn't mean it doesn't exist. There is a very good reason why the community lobbied for non-correcting checks in the first place, and I still would rather all checks be non-correcting until I've intervened and have a reasonable explanation for the disparity. Granted, an unclean shutdown is typically a very good reason for parity to be out of sync (it's almost guaranteed to be out of sync), but the REASON for the unclean shutdown is still very important, to me at least. I'd rather not kick off the parity correction until I'm sure that's the best thing to do for my data. A full correcting parity run is the most stressful thing most systems do, so it's the time when silently degrading components like to go into full failure mode.
garycase Posted June 18, 2013
I don't disagree -- it certainly doesn't hurt to do the non-correcting run first. But as you noted, to really tell you anything, you then have to run a 2nd non-correcting check ... so overall it triples the time to correct a parity error. But it DOES provide the opportunity to see if there are other factors involved (e.g. a loose cable or connection).
JonathanM Posted June 18, 2013
Quote (garycase): so overall it triples the time to correct a parity error.
Actually, it's pretty much the minimum amount of time to find out that you have an error that really should be corrected on the parity disk. Running a correcting check the first time guarantees you don't get a second chance to get it right. I lost a significant amount of data because of a correcting check that clobbered good parity data. That data at that point in time only resided on UnRAID; it was a backup from a silently failing RAID5 array. My fault for not having good backups on offline media, but it could have been avoided if I had been able to catch the error before it wiped out my good parity.
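To illustrate why a sync error on its own cannot say which disk is wrong, and why a correcting check can overwrite good parity, here is a minimal sketch. It assumes single parity computed as a bytewise XOR across the data disks; the tiny 4-byte "disks" and the function names are made up for the example and are not how UnRAID is implemented internally.

```python
from functools import reduce


def xor_parity(blocks):
    """Bytewise XOR of equal-length blocks (one block per disk)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def sync_errors(data_blocks, parity_block):
    """Return the byte offsets where stored parity disagrees with the data."""
    expected = xor_parity(data_blocks)
    return [i for i, (e, p) in enumerate(zip(expected, parity_block)) if e != p]


# Two hypothetical data disks and their correct parity.
disk1 = bytes([0x10, 0x20, 0x30, 0x40])
disk2 = bytes([0x01, 0x02, 0x03, 0x04])
parity = xor_parity([disk1, disk2])

# Case A: one byte goes bad on the parity disk.
bad_parity = bytes([parity[0], parity[1] ^ 0xFF, parity[2], parity[3]])
# Case B: the byte at the same offset goes bad on a data disk instead.
bad_disk1 = bytes([disk1[0], disk1[1] ^ 0xFF, disk1[2], disk1[3]])

errors_a = sync_errors([disk1, disk2], bad_parity)
errors_b = sync_errors([bad_disk1, disk2], parity)

# A correcting check in case B recomputes parity from the (bad) data ...
corrected_parity = xor_parity([bad_disk1, disk2])
# ... so rebuilding disk1 from that parity later reproduces the bad byte.
rebuilt_disk1 = xor_parity([corrected_parity, disk2])
```

Both `errors_a` and `errors_b` come out as the same single offset, so the check result looks identical in the two cases. In case B, `rebuilt_disk1` equals `bad_disk1`, not the original `disk1`: once parity has been "corrected" to match bad data, it can no longer be used to recover the good data.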
jbartlett Posted June 18, 2013
I'm currently working on a bash script to do file inventory and optional SHA checksums all on the UNRAID server. It's functional but in an alpha state. I'll be posting the script when I feel it's in a beta state.
garycase Posted June 18, 2013
It would be slick to have a Windows-based utility that would: (a) Compute SHA checksums on the complete UnRAID array; and (b) Run a verify on the array that recomputed and checked those sums. [This would, of course, run for DAYS on any large server ... but it'd be a useful thing to be able to do automatically -- 2 minutes of "human time", days of "computer time" is no big deal.] I can do effectively that now by running compares against all my backup disks, but I have to intervene and change the disk every 12 hours or so, so it typically takes me a week to do it.
dgaschk Posted June 19, 2013
Quote (jbartlett): I'm currently working on a bash script to do file inventory and optional SHA checksums all on the UNRAID server.
The md5deep package in unMenu does this.
jbartlett Posted June 19, 2013
Yup, that's where I got the apps to generate the SHA values. I'll be bundling those apps and the sqlite3 app with the script so users won't need to install unmenu to get them.
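As a rough idea of what a file inventory with SHA checksums stored in SQLite could look like, here is a minimal Python sketch. It is not jbartlett's bash script: the table layout, the function names and the "same mtime but different hash" heuristic are all assumptions made for illustration.

```python
import hashlib
import os
import sqlite3


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def update_inventory(conn, root):
    """Hash every file under root and return paths that look silently changed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inventory ("
        "path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, sha256 TEXT)"
    )
    suspicious = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            digest = sha256_of(path)
            row = conn.execute(
                "SELECT mtime_ns, sha256 FROM inventory WHERE path = ?", (path,)
            ).fetchone()
            # Same modification time but a different hash suggests the content
            # changed without a normal write; keep the old record for reference.
            if row is not None and row[0] == info.st_mtime_ns and row[1] != digest:
                suspicious.append(path)
                continue
            conn.execute(
                "INSERT OR REPLACE INTO inventory VALUES (?, ?, ?, ?)",
                (path, info.st_size, info.st_mtime_ns, digest),
            )
    conn.commit()
    return suspicious
```

Running `update_inventory(sqlite3.connect("inventory.db"), "/mnt/disk1")` periodically (both paths are placeholders) would refresh the records for files that were legitimately modified and list the ones whose content changed while the timestamp did not.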
garycase Posted June 19, 2013
Nice ... looking forward to it
JackBauer Posted September 17, 2013
So for the first time ever I have run into a parity error (using the nocorrect option). My plan was always to check the file that the error is in and determine which was wrong - was the file corrupted or was the parity corrupt (by manually reviewing each affected file). I just came to the conclusion, looking at syslog, that there is no way to determine which file is affected by the parity error. Is that true? Is there no way to determine which is the potentially affected file other than the two options earlier in this thread? (Neither of which will help me here.) Thanks
JackBauer Posted September 17, 2013
Thank you. I had intended on doing just that (SMART reports). I'm just really surprised to be honest that I'm the only one who really had this intention of detecting which set of data was "good".
JackBauer Posted September 18, 2013
Anything specific I might want to pay close attention to in the smartctl output when I run it for each drive? TY
dgaschk Posted September 18, 2013
Any drive with a pending sector(s) or FAILING needs attention.
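A minimal sketch of that check, assuming the usual ten-column attribute table that `smartctl -A` prints for ATA drives (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The function name and the sample table are made up for illustration; the sample is not output from a real drive.

```python
def smart_warnings(attribute_table):
    """Scan the text of a smartctl attribute table for rows that need attention."""
    warnings = []
    for line in attribute_table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns;
        # this skips the header row and any surrounding text.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, when_failed, raw = fields[1], fields[8], fields[9]
        if when_failed == "FAILING_NOW":
            warnings.append(f"{name}: FAILING_NOW")
        elif name == "Current_Pending_Sector" and raw.isdigit() and int(raw) > 0:
            warnings.append(f"{name}: {raw} pending sector(s)")
    return warnings


# Invented sample table, for demonstration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   045   045   051    Pre-fail  Always   FAILING_NOW 12345
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

sample_warnings = smart_warnings(SAMPLE)
```

For the sample above, `sample_warnings` flags the `Raw_Read_Error_Rate` row as `FAILING_NOW` and reports 8 pending sectors; a healthy drive would give an empty list.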
This topic is now archived and is closed to further replies.