David Bott Posted October 1, 2015 Hi, just wondering. I just ran my monthly parity check. On the main screen, the PARITY drive shows ERRORS 156. However, down below it reads: "Last checked on Thu Oct 1 10:23:46 2015 EDT, finding 0 errors." Not sure what to make of this. 156 errors on parity, yet it found 0 errors? Looking in the log, I believe they were sector read errors. (I say "believe" because I restarted the server, so the log was reset.) So I am wondering about the above, and also whether I should rerun the check with or without "Correct any Parity-Check errors by writing the Parity disk with corrected parity" checked. (I ran it with that option off overnight; it finished this morning after 10 hr 37 min.) Thanks. BTW, SMART shows the following for the parity drive at this time: » reallocated_sector_ct=64 » reported_uncorrect=10 » high_fly_writes=1011 » ata_error_count=10. I am not sure how long it has had those numbers, however.
Squid Posted October 1, 2015 It means that during the parity check unRAID got 156 read errors from the parity drive. It then recalculated those sectors and rewrote them. The writes succeeded, because it didn't wind up red-balling the drive. I would definitely rerun a parity check on the system, and also look carefully at the SMART results for the drive and possibly replace it.
David Bott Posted October 1, 2015 Author Thanks. For that drive SMART shows the following: » reallocated_sector_ct=64 » reported_uncorrect=10 » high_fly_writes=1011 » ata_error_count=10. Looking up info on those, it does not seem like the drive is bad. The reallocated_sector_ct is not all that high seeing as it is a 4TB drive, and I think I recall it being 64 for a long, long time.
JonathanM Posted October 1, 2015 If it stays at 64, and you don't have any pending sectors through another parity check, then great, just monitor it. If the number grows, I'd replace the drive.
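The rule of thumb above can be written down as a small comparison of two SMART snapshots. This is only a sketch under assumptions: the snapshots are plain dictionaries you fill in yourself, the function name `assess_drive` and the `current_pending_sector` key are hypothetical and not part of unRAID, and the "investigate" outcome for pending sectors is an assumed reading, since the post only says what to do when there are none.

```python
def assess_drive(previous, current):
    """Compare two SMART snapshots (dicts of attribute name -> raw value).

    Hypothetical helper: returns "replace", "investigate", or "monitor".
    """
    before = previous.get("reallocated_sector_ct", 0)
    after = current.get("reallocated_sector_ct", 0)

    # A growing reallocated count is the signal to replace the drive.
    if after > before:
        return "replace"

    # Assumption: pending sectors after a check deserve a closer look.
    if current.get("current_pending_sector", 0) > 0:
        return "investigate"

    # Count is stable and nothing is pending: just keep watching it.
    return "monitor"


if __name__ == "__main__":
    earlier = {"reallocated_sector_ct": 64}
    later = {"reallocated_sector_ct": 64, "current_pending_sector": 0}
    print(assess_drive(earlier, later))
```

Taking the snapshots before and after a full parity check fits the advice, since the check reads every sector and gives the drive a chance to surface new problems.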
David Bott Posted October 1, 2015 Author Thanks, that was my thought. I just found it interesting that on the home page the drive reported the errors, but the parity check area said it found no errors.
JonathanM Posted October 1, 2015 "I just found it interesting that the home page the drive reported the errors, but the parity check area said it found no errors." Since there was a read error, there was no way of knowing whether or not there was a parity error. The only recourse was to calculate what should be there and attempt to write it. Since the write succeeded, the error count for that drive was incremented, but the parity was not in error. A parity error means all the drives read successfully for that location, but the calculated sum didn't match. A non-correcting parity check notes the discrepancy as a parity error and leaves it uncorrected. A correcting parity check assumes the data drives are right and the parity drive is wrong, changes the parity drive to match, and increments the parity error counter. If you were to have a drive failure and one of your other drives generated a read error while the failed drive was being reconstructed, then the reconstructed drive would have errors.
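The difference between a read error and a parity error can be shown with a toy model. This is a sketch, not unRAID's actual code: it assumes single parity computed as a byte-wise XOR of the data drives, assumes the data drives always read successfully, and uses hypothetical names (`xor_blocks`, `check_stripe`), with `None` standing in for a parity block that could not be read.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def check_stripe(data_blocks, parity_block, correcting):
    """Classify one stripe; return (kind, parity_to_write or None)."""
    expected = xor_blocks(data_blocks)

    if parity_block is None:
        # Parity could not be read, so there is nothing to compare against.
        # The computed value is written back and the drive's error count
        # goes up, but this is not a parity error.
        return ("read_error", expected)

    if parity_block != expected:
        # Everything was readable but the sum does not match.
        # Only a correcting check rewrites the parity block.
        return ("parity_error", expected if correcting else None)

    return ("ok", None)


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x04\x08"]
    parity = xor_blocks(data)
    print(check_stripe(data, None, correcting=False))
    print(check_stripe(data, b"\x00\x00", correcting=False))
    print(check_stripe(data, b"\x00\x00", correcting=True))
    # Rebuilding a lost data block uses the same XOR over what survives,
    # which is why a bad surviving block would corrupt the rebuild.
    print(xor_blocks([data[1], parity]) == data[0])
```

In this model, 156 unreadable parity blocks that were all rewritten successfully would give 156 drive errors and 0 parity errors, which matches what the two screens reported.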
David Bott Posted October 1, 2015 Author Thanks so much for this. Might I ask: when you run a monthly check (assuming you do), do you have it correct or not correct? (BTW, I am now on V6. Just moved.)
JonathanM Posted October 1, 2015 "Might I ask...When you run a monthly check (assuming you do), do you have it correct or not correct?" I definitely run monthly checks, and I am one of a minority that runs non-correcting. You can find arguments for both sides on here. If you are going to stay on top of things 100% of the time, monitor SMART reports for your drives, and be proactive in maintaining your server, then non-correcting is probably for you. I prefer that no writes be made until I can make an attempt to figure out WHY there was an error. However, most people just want the parity back in sync ASAP, because 95% of the time it IS the parity disk that's wrong, and while it's wrong, a rebuild will be corrupt. So sooner corrected = less time at risk, especially if the server is never managed and is just used to consume and store media.
David Bott Posted October 1, 2015 Author Thank you once again for the great detail.
This topic is now archived and is closed to further replies.