April 3, 2017 Hi guys, I ran my monthly parity check yesterday and for the first time it logged 1751 errors, which were corrected by default. My question is what should I be doing or looking at now? Any specific logs to take a look at or disk checks to run? All the disks show as healthy. The system comprises 4 x 8TB plus the parity disk. It has been fault free for many months. Any advice would be welcome.
April 3, 2017 Community Expert 9 minutes ago, superloopy1 said: Power has been down since unfortunately. Do you mean the server was powered down or there was a power cut? Sync errors are normal after an unclean shutdown and should be corrected ASAP. If there wasn't any unclean shutdown, grab the diagnostics before rebooting the next time it happens.
April 3, 2017 Author No, it was a clean shutdown. I obviously made the wrong call in doing that then. Are there any disk checks I can make to find any errors? Edited April 3, 2017 by superloopy1
April 3, 2017 Community Expert You can look at the SMART reports, but without the syslog they may not be much help.
April 3, 2017 You had 1751 "errors". Are those sync errors, or are you seeing that number in the errors column of the main GUI? (I'm assuming you mean sync errors.)

It is not unusual to have a handful of sync errors after a dirty shutdown, but 1751 exceeds "handful" IMO. If you are using XFS or RFS, I would tend to trust that the file systems are being maintained. Most file systems are extremely robust. So after a dirty shutdown, it is overwhelmingly likely that parity is what's wrong and not a data disk. But even if it was a data disk, unless you knew definitively which one, it would be hard to recommend a course other than letting parity be repaired to match the data. Having MD5s is useful for tracking down corruption. You can run file system checks to confirm that the structure of the disks is OK.

My first thought would be to run another parity check. If you get another large number of sync errors, that would certainly point to a problem. If it comes back clean, it would give you more confidence that the array is maintaining parity properly. Keep running parity checks every few days and making sure they stay clean as the array is used. If they do, I'd have a certain amount of confidence going forward. But if parity checks keep returning errors, even 1, you have a hardware problem of some kind.

The first and easiest thing to check is your RAM. Run memtest for 24 hours. Otherwise, it could be the controller, the motherboard, or a disk. Hard to tell. I'd post your diagnostics file as there may be clues, especially after a parity check that returned errors.
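To illustrate why a sync error on its own cannot say which disk is wrong, here is a minimal Python sketch of single XOR parity. It is a toy model and not unRAID's actual driver code; the function names and byte values are made up for illustration.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def sync_error_offsets(blocks, parity):
    """Return the byte offsets where stored parity disagrees with the data."""
    expected = xor_parity(blocks)
    return [i for i, (p, e) in enumerate(zip(parity, expected)) if p != e]

# Four toy "data disks" of 4 bytes each, plus their parity (hypothetical values).
disks = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]),
         bytes([9, 10, 11, 12]), bytes([13, 14, 15, 16])]
parity = xor_parity(disks)

# Case A: the parity byte at offset 2 is wrong, the data is fine.
stale_parity = bytearray(parity)
stale_parity[2] ^= 0xFF

# Case B: parity is fine, but one data disk has flipped bits at offset 2.
bad_disks = list(disks)
bad_disks[1] = bytes([5, 6, 7 ^ 0xFF, 8])

# Both cases are reported identically: a mismatch at offset 2.
print(sync_error_offsets(disks, bytes(stale_parity)))
print(sync_error_offsets(bad_disks, parity))
```

Both cases look the same to the check, which is why, without something extra such as per-file MD5s, repairing parity to match the data is usually the only practical option.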
April 3, 2017 Community Expert 51 minutes ago, bjp999 said: It is not unusual after a dirty shutdown to have a handful of sync errors, but 1751 exceeds "handful" IMO. Agree with everything you posted, but just want to add that while it's normal to have just a few sync errors following an unclean shutdown, there can be considerably more if it happens during writes to the array; see the example below that happened to me: And the next parity check found 1855 errors:
April 3, 2017 Author Thanks gents. This was 1751 corrected errors reported on the main GUI. There was no unclean shutdown to my knowledge. I'll run it again even though it takes 18 hours!!
April 4, 2017 Author Update ... I'm busy running a second parity check, non-correcting this time around, and it logged 1592 errors before a 'stopped logging' message. They read 'md: recovery thread P incorrect' followed by a sector identifier. The check has some 3 hrs to go and I'm guessing that as it has stopped logging it won't report any more. How do I capture the diagnostics here? I don't want to restart like yesterday, d'oh!
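If the syslog is still available, the reported sectors can be pulled out with a few lines of Python so they can be compared between runs. This is only a sketch: the sample lines below are invented, and the exact message layout after the 'P incorrect' text is an assumption, so adjust the pattern to match what your log actually shows.

```python
import re

# Assumed layout: "recovery thread", optional colon, "P incorrect", then
# some non-digit text followed by the sector number. The real message on a
# given unRAID version may be punctuated differently.
PATTERN = re.compile(r"recovery thread:?\s+P incorrect\D*(\d+)")

def incorrect_sectors(lines):
    """Return the sector numbers found in 'P incorrect' log lines, in order."""
    sectors = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            sectors.append(int(match.group(1)))
    return sectors

# Hypothetical sample lines, not copied from a real log.
sample = [
    "Apr  4 07:01:12 Tower kernel: md: recovery thread: P incorrect, sector=1000",
    "Apr  4 07:01:12 Tower kernel: md: recovery thread P incorrect sector 1008",
    "Apr  4 07:01:13 Tower kernel: some unrelated message",
]
print(incorrect_sectors(sample))
```

Since the log stops recording after a certain number of entries, a list built this way may cover only part of the errors found by the check.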
April 4, 2017 Author Thanks. Parity check is still running, but strangely the number of sync errors now logged is the same as yesterday ... 1751. Yesterday's check was a correcting run; today's I chose not to correct. So yesterday's run seems to have done nothing to correct any issues. Any ideas? I'll post diagnostics once everything's done.
April 4, 2017 Community Expert Possibly the errors yesterday were wrongly corrected, so now it's detecting actual errors, but without yesterday's logs we may never be sure.
April 4, 2017 This is not common, but it has happened before, a number of times in fact. I have no explanation, but I think it is good news. In essence, every parity error that was reported on the earlier run was improperly detected, and the corrections, rather than correcting, actually introduced parity errors. The subsequent run detected them. I say it's happened before, but I've never seen it in such dramatic fashion with so many parity sectors involved.

You'll need to run yet another parity check, correcting this time. Hopefully it will detect and correct the sectors again. Then, as I suggested before, keep running parity checks to see if they stay clean. I'd say run at least 3 non-correcting checks. If this repeats, my thought would be a controller issue.

@RobJ, here is an example of a disk returning invalid data. Maybe it shouldn't ever happen, but it does sometimes. It may not be the disk itself, but the OS sees it as a valid read.
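One way to picture that scenario is the toy Python model below. It is not taken from unRAID itself, and every name and value in it is hypothetical. If a disk returns wrong data for a few sectors during a correcting check, parity gets "corrected" to match the bad read, and a later check that reads the disk properly flags exactly those sectors.

```python
from functools import reduce
from operator import xor

def run_check(disks, parity, read, correct):
    """One parity pass; returns the sectors where parity disagreed with the reads."""
    errors = []
    for sector in range(len(parity)):
        expected = reduce(xor, (read(disks, i, sector) for i in range(len(disks))))
        if parity[sector] != expected:
            errors.append(sector)
            if correct:
                parity[sector] = expected  # parity is rewritten to match what was read
    return errors

def good_read(disks, i, sector):
    return disks[i][sector]

def flaky_read(disks, i, sector):
    # Hypothetical fault: disk 2 returns flipped bits for sectors 3 and 5.
    value = disks[i][sector]
    return value ^ 0xFF if (i == 2 and sector in (3, 5)) else value

# Four toy data disks of 8 one-byte "sectors", with parity that starts out valid.
disks = [[(7 * i + s) % 256 for s in range(8)] for i in range(4)]
parity = [reduce(xor, (d[s] for d in disks)) for s in range(8)]

first = run_check(disks, parity, flaky_read, correct=True)    # bad reads, parity "fixed"
second = run_check(disks, parity, good_read, correct=False)   # same sectors now really wrong
third = run_check(disks, parity, good_read, correct=True)     # genuine repair
fourth = run_check(disks, parity, good_read, correct=False)   # clean
print(first, second, third, fourth)
```

In this model the first two passes report the same sectors and the same count, and parity only comes back clean after a correcting pass in which the reads are good.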
April 4, 2017 Community Expert In the cases I remember where it happened before, each check would have the same number of errors, and at the same sectors, so it would be good to have multiple checks logged.
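Once two checks have been logged, comparing them can be as simple as a set comparison. A small Python sketch, with made-up sector numbers standing in for what each check logged (the function name and the result layout are my own choices):

```python
def compare_runs(run_a, run_b):
    """Split the sectors from two parity checks into shared and unique groups."""
    a, b = set(run_a), set(run_b)
    return {
        "both": sorted(a & b),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
    }

# Hypothetical sector lists from two logged checks.
monday = [1000, 1008, 2048, 4096]
tuesday = [1000, 1008, 2048, 8192]

result = compare_runs(monday, tuesday)
print(result)
```

If "only_first" and "only_second" both come back empty, the two checks hit the same sectors, which matches the pattern described above.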
April 6, 2017 On 4/4/2017 at 7:14 AM, superloopy1 said: Thanks. Parity check is still running, but strangely the number of sync errors now logged is the same as yesterday ... 1751. Yesterday's check was a correcting run; today's I chose not to correct. So yesterday's run seems to have done nothing to correct any issues. Any ideas? I'll post diagnostics once everything's done. How's it going?
April 6, 2017 It's going/gone well. I reran the parity check and corrected the errors which, now that I think back, may well have come from an unclean powerdown some time since the last check ran on the 1st of last month. With the errors corrected I ran another check and all is still clean. I'm now busy running MD5 checks (corz) against each disk with minimal reported errors. So far, so good. Thanks for all the help everyone. If nothing else I now know the procedures if it ever happens again!
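For anyone without a checksum tool to hand, the same idea can be sketched in Python with the standard library: record an MD5 per file, then re-hash later and report mismatches. This is not how corz checksum works internally, just an illustration; the function names and the manifest layout (a dict of relative path to hex digest) are my own assumptions.

```python
import hashlib
from pathlib import Path

def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large media files do not need to fit in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 hex digest."""
    root = Path(root)
    return {str(p.relative_to(root)): md5_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def verify(root, manifest):
    """Return the relative paths that are missing or whose MD5 has changed."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or md5_of(target) != expected:
            bad.append(rel)
    return bad
```

Typical use would be to call `build_manifest` on a disk's mount point while the data is known to be good (a path such as `/mnt/disk1` is only an example), store the result somewhere off the array, and run `verify` against it after an incident.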
April 6, 2017 4 hours ago, skylark said: If nothing else I now know the procedures if it ever happens again! Yep! Most of what I know about unRaid has come from the mistakes I've made and how I've recovered!