October 26, 2016 OK, that's confusing. I did a non-correcting parity check last night. One part of the web page tells me I have valid parity, but the 'last check completed' entry says it found 396 errors. Is it telling me that there is a drive with errors, and that those errors would be recreated if a data drive were rebuilt? I have about 100 lines like these in my system log:

Oct 25 18:38:53 Tower2 kernel: md: recovery thread: PQ incorrect, sector=81804808
Oct 25 18:38:53 Tower2 kernel: md: recovery thread: PQ incorrect, sector=81804824
Oct 25 18:38:53 Tower2 kernel: md: recovery thread: PQ incorrect, sector=81804840
. .
Oct 25 18:38:53 Tower2 kernel: md: recovery thread: PQ incorrect, sector=81808136
Oct 25 18:38:53 Tower2 kernel: md: recovery thread: PQ incorrect, sector=81808152

That wording from the parity check is confusing. If I ticked the box to correct errors, would it help anything?
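For anyone who wants a quick overview of log lines like the ones quoted above, here is a minimal Python sketch that counts the "PQ incorrect" entries and reports the lowest and highest sector. The function name summarize_sync_errors and the SAMPLE_LOG variable are made up for illustration; the pattern matches only the message format shown in this post, and other message variants are not covered.

```python
import re

# Matches only the message format quoted in the post above.
PQ_LINE = re.compile(r"md: recovery thread: PQ incorrect, sector=(\d+)")

# The five log lines quoted in the post (the elided middle is not filled in).
SAMPLE_LOG = """\
Oct 25 18:38:53 Tower2 kernel: md: recovery thread: PQ incorrect, sector=81804808
Oct 25 18:38:53 Tower2 kernel: md: recovery thread: PQ incorrect, sector=81804824
Oct 25 18:38:53 Tower2 kernel: md: recovery thread: PQ incorrect, sector=81804840
Oct 25 18:38:53 Tower2 kernel: md: recovery thread: PQ incorrect, sector=81808136
Oct 25 18:38:53 Tower2 kernel: md: recovery thread: PQ incorrect, sector=81808152
"""


def summarize_sync_errors(log_text):
    """Count 'PQ incorrect' lines and report the sector range they cover."""
    sectors = [int(m.group(1)) for m in PQ_LINE.finditer(log_text)]
    if not sectors:
        return {"count": 0, "first": None, "last": None}
    return {"count": len(sectors), "first": min(sectors), "last": max(sectors)}


if __name__ == "__main__":
    print(summarize_sync_errors(SAMPLE_LOG))
```

A tight sector range, as in these lines, suggests the mismatches sit in one small region of the array rather than being scattered across it, but the sketch itself only summarizes; it does not say which disk is at fault.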
October 26, 2016 I had a similar thing a while back, when I first implemented dual parity. A non-correcting parity check found a handful of errors (correction: a single error) in both P and Q. That suggested to me the possibility of a real data error, but the problem was how to find which data disk was affected. Some people argue that with some complex maths it's possible with only two parity bits to work out which data disk is affected, but I'm not yet completely convinced. Either way, unRAID does nothing about errors in both P and Q. I sought help and was persuaded that, since I didn't know whether I had some files subtly corrupted or whether there had been an error when the original parity values were calculated, my best option was to run a correcting parity check to at least bring the parity back into agreement with the data. I did that and it's been fine ever since. I'm struggling to find the link to the discussion at the moment, but if I succeed I'll add it.

EDIT: I found the link (http://lime-technology.com/forum/index.php?topic=48193.msg468265#msg468265) and refreshed my memory. It wasn't a handful of errors, but in fact just the one. Perhaps your problem is more serious. Post your diagnostics.
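To illustrate the "complex maths" argument mentioned above, here is a rough Python sketch of how a single bad data byte can be located from P and Q under the textbook RAID-6 scheme. Everything here is an assumption for illustration: it uses arithmetic in GF(2^8) with polynomial 0x11D and generator 2, it assumes exactly one data disk is wrong and both stored parity values are correct, and nothing in this thread confirms that unRAID computes Q this way. The helper names (compute_pq, locate_bad_disk) are hypothetical.

```python
# Textbook RAID-6 style P/Q over GF(2^8); an assumption, not unRAID's confirmed method.

def gf_mul2(x):
    """Multiply by the generator 2 in GF(2^8), reducing by polynomial 0x11D."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x & 0xFF


# Build exponent and logarithm tables for the generator 2.
EXP = [0] * 255
LOG = [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = _v
    LOG[_v] = _i
    _v = gf_mul2(_v)


def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]


def compute_pq(data_bytes):
    """P is the XOR of the data bytes; Q weights disk i by 2**i in the field."""
    p = q = 0
    for i, d in enumerate(data_bytes):
        p ^= d
        q ^= gf_mul(EXP[i % 255], d)
    return p, q


def locate_bad_disk(data_bytes, stored_p, stored_q):
    """Return the index of the single wrong data byte, or None if it cannot be attributed."""
    p, q = compute_pq(data_bytes)
    syn_p, syn_q = p ^ stored_p, q ^ stored_q
    if syn_p == 0 or syn_q == 0:
        return None  # consistent, or only one of P/Q disagrees (parity itself suspect)
    z = (LOG[syn_q] - LOG[syn_p]) % 255
    return z if z < len(data_bytes) else None


if __name__ == "__main__":
    data = [0x11, 0x22, 0x33, 0x44]
    p, q = compute_pq(data)
    data[2] ^= 0x5A  # corrupt the byte on hypothetical disk 2
    print(locate_bad_disk(data, p, q))
```

The catch is the assumption: after an unclean shutdown the stored P and Q may themselves be stale, in which case the computed index means nothing, which is one reason a correcting check that simply rewrites parity can be the pragmatic choice.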
October 26, 2016 Author It's still confusing to see 'parity valid' and also see errors listed. Diagnostics attached, thanks for looking! tower2-diagnostics-20161026-0641.zip
October 26, 2016 Community Expert "Parity valid" just means that parity was built successfully at the time and the disk is working. You have some sync errors, and many things can cause them, most commonly unclean shutdowns. If there have been any since the last parity check or build, run a correcting check and all should be fine. If there were none and there's no apparent reason for the errors, you should run a correcting check anyway; if the next check finds more errors, you need to investigate what's causing them. It could be RAM, disks, controller, etc.
October 26, 2016 Author Oh, I certainly do know of one unclean shutdown. A blinking clock radio in the house a few weeks ago tells me we lost power. Yet my other unRAID server has no parity errors. I fully expected the drives had already spun down at the time of the power outage, since both systems were probably idle. OK, new mental note: it's a good idea to manually run a parity check after an unclean shutdown.
October 26, 2016 Community Expert Your servers should be on a UPS.