June 2, 2017
I'm having issues chasing parity check errors. I added an additional memory stick about a month ago and had parity check errors. I was able to remove the memory stick and solve them. I replaced it and ran the new one through 3 memtest passes with no issues. I have just completed my monthly parity check with the new memory in place, and this time I have almost 3000 errors. Help troubleshooting would be greatly appreciated. My diagnostic info is attached. You can see the errors at: Jun 1 18:13:48
Thanks!!!
bunker-diagnostics-20170601-2148.zip
June 2, 2017
2 hours ago, shooga said: "I was able to remove the memory stick and solve them"
By this do you mean that you've run at least 2 consecutive checks without any errors? If yes, run another non-correcting check and see whether you get exactly the same number of errors. If not, run a correcting check and then run a non-correcting one right after.
June 2, 2017, Author
Here's the full sequence of events:
1. The monthly parity check setting had inadvertently changed from non-correcting to correcting. It ran and found/corrected 5 errors.
2. I then changed the setting and ran a non-correcting check. This check found something like 239 errors.
3. I pulled the memory and ran a non-correcting check. This found 5 errors (from the correcting check that found false positives).
4. I ran a correcting check, which found and corrected 5 errors.
5. I ran a non-correcting check, which found zero errors.
6. I RMA'd the new stick of memory (a second 8GB stick), while keeping my original 8GB stick in place.
7. While waiting for the replacement, I had a server crash that required a hard reset of the server.
8. After coming back up, the server did a non-correcting parity check and found zero errors.
9. After coming back up, I got a "Warning [BUNKER] - current pending sector is 1" for one of my drives (unrelated, I think).
10. Received the new stick of memory and completed 3 cycles of memtest with no errors.
11. Non-correcting monthly parity check completed with 2789 errors.
I will run another non-correcting check to see if I get the same number of errors.
June 3, 2017, Author
The second non-correcting check just finished with the exact same number of errors. Any advice?
June 3, 2017
That would suggest it's not bad RAM or something similar, so run a correcting check to correct them. Are you sure there wasn't an unclean shutdown between the last check with 0 errors and now?
June 3, 2017, Author
I guess I'm not 100% sure, but I think the only recent unclean shutdown triggered a non-correcting check, which came back with zero errors. Isn't there a chance that I will be writing bad data to parity? Is there further troubleshooting that can be done to figure out whether the parity or data is correct? (Thanks for the replies.)
June 3, 2017
17 minutes ago, shooga said: "Isn't there a chance that I will be writing bad data to parity?"
Yes, but unless you have checksums for all your files there's no way to tell. Also, you could only fix them if just one disk is incorrect and every other disk plus parity is correct. If you do have checksums, and IMO everyone should have them, check all files first.
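To illustrate why a parity mismatch alone can't say which side is wrong, here is a minimal sketch that assumes plain single XOR parity over equally sized blocks. It is a simplified model, not Unraid's actual implementation, and the names `xor_parity`, `rebuild`, `disks` and `corrupted` are made up for the example.

```python
from functools import reduce

def xor_parity(blocks):
    """Byte-wise XOR of equally sized blocks (simplified single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(missing_index, blocks, parity):
    """Rebuild one block from all the others plus parity.

    This only gives the right answer if every other block AND parity are correct.
    """
    others = [b for i, b in enumerate(blocks) if i != missing_index]
    return xor_parity(others + [parity])

# Three hypothetical data "disks", two bytes each, and their parity.
disks = [b"\x10\x20", b"\x0f\x01", b"\xaa\x55"]
parity = xor_parity(disks)

# Flip one bit on disk 1: a parity check now reports a mismatch...
corrupted = list(disks)
corrupted[1] = b"\x0e\x01"
mismatch = xor_parity(corrupted) != parity

# ...but the same mismatch would appear if parity itself had been damaged
# instead, so the check cannot tell which side holds the bad data.
damaged_parity = bytes([parity[0] ^ 0x01, parity[1]])
same_symptom = xor_parity(disks) != damaged_parity

# If we somehow know disk 1 is the bad one, it can be rebuilt from the rest.
recovered = rebuild(1, corrupted, parity)
```

A correcting check in this model simply overwrites parity with `xor_parity(corrupted)`, which is why it would bake in bad data whenever the error is actually on a data disk.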
June 3, 2017, Author
Unfortunately, I don't have checksums. Is there a recommended tool for calculating and then checking the files? Just did a quick search and I see that Squid has a plugin, but it's no longer being actively maintained.
June 3, 2017
btrfs automatically checksums all files; folks using xfs can use the Dynamix File Integrity plugin.
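For anyone who wants to see the general idea behind file checksumming, here is a rough sketch in Python. It is not how btrfs or the plugin work internally; the function names and the choice of SHA-256 are assumptions made for the example. It records a hash per file under a directory and later reports files whose contents no longer match.

```python
import hashlib
import os

def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large media files don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest

def verify_manifest(root, manifest):
    """Return a sorted list of relative paths that are missing or have changed."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_sha256(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

The manifest has to be built while the data is known to be good and stored somewhere safe (for example, serialized with `json`); after a suspicious parity check, `verify_manifest` would then point to the files that changed on disk.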