cybersteel8 Posted May 27, 2022
My monthly parity check completed with 5 errors. This is in my syslog:
kernel: md: recovery thread: P incorrect, sector=1962934168
kernel: md: recovery thread: P incorrect, sector=1962934176
kernel: md: recovery thread: P incorrect, sector=1962934184
kernel: md: recovery thread: P incorrect, sector=1962934192
kernel: md: recovery thread: P incorrect, sector=1962934200
[...]
kernel: md: sync done. time=82684sec
kernel: md: recovery thread: exit status: 0
There are no other messages in my syslog related to the parity check. I don't actually know what happens now. Unraid says there were 5 errors, but all my disks are still active, nothing's simulated or whatever.
What action did Unraid take once these errors appeared? As it was non-correcting, did nothing happen? It just reports them and that's it?
Which files are these in particular - is there a way to know? Do I need to know?
I don't know what action I should take once errors appear in a non-correcting parity check.
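The sector numbers in the log lines above are easier to reason about once they are extracted. Below is a minimal Python sketch that pulls them out of syslog text and summarizes their spacing. The function names `parse_incorrect_sectors` and `summarize` are hypothetical, and the 512-byte sector size is an assumption about how the md driver counts sectors, not something stated in this thread.

```python
import re

# Matches the "P incorrect" lines exactly as they appear in the post above.
SECTOR_RE = re.compile(r"recovery thread: P incorrect, sector=(\d+)")


def parse_incorrect_sectors(log_text):
    """Return the sorted sector numbers reported as incorrect."""
    return sorted(int(m.group(1)) for m in SECTOR_RE.finditer(log_text))


def summarize(sectors, sector_size=512):
    """Summarize a sorted list of sectors.

    sector_size=512 is an assumption, used only to convert the first
    sector number into a byte offset on the device.
    """
    if not sectors:
        return None
    # Distinct distances between neighbouring reported sectors.
    gaps = sorted({b - a for a, b in zip(sectors, sectors[1:])})
    return {
        "count": len(sectors),
        "first_sector": sectors[0],
        "last_sector": sectors[-1],
        "gaps": gaps,
        "first_byte_offset": sectors[0] * sector_size,
    }
```

For the five lines quoted above, neighbouring sectors are 8 apart, which would correspond to adjacent 4 KiB blocks if the 512-byte assumption holds. That points to one small contiguous region rather than errors scattered across the disk.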
JorgeB Posted May 27, 2022
Just now, cybersteel8 said: It just reports them and that's it?
Yes. Unless you have pre-existing checksums or are using btrfs, there's no way of knowing if the sync errors are on the data or parity; in that case you should just run a correcting check.
cybersteel8 (Author) Posted May 28, 2022
12 hours ago, JorgeB said: using btrfs
I am using btrfs - how do I find out if the errors are in data or parity?
I understand that a correcting parity check will update parity, but what if the errors are in the data, not the parity? How would I fix the data using the parity information?
itimpi Posted May 28, 2022
41 minutes ago, cybersteel8 said: How would I fix the data using the parity information?
You would have to force Unraid to rebuild a complete drive - you cannot do anything at a lower level of granularity with parity. It is much easier to restore any affected files from your backups.
JorgeB Posted May 28, 2022
2 hours ago, cybersteel8 said: I am using btrfs - how do I find out if the errors are in data or parity?
Run a scrub on all array drives; if no corruption is found, run a correcting check.
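As a rough illustration of "scrub all array drives", the Python sketch below builds and runs one `btrfs scrub start -B` command per data disk. It assumes the array disks are mounted at `/mnt/disk1`, `/mnt/disk2`, and so on, and that every one of them is formatted with btrfs; neither is confirmed in this thread. The helper names are hypothetical, and the script needs to run as root on the server.

```python
import subprocess


def array_mount_points(disk_count):
    """Assumed mount points for the array data disks: /mnt/disk1, /mnt/disk2, ..."""
    return [f"/mnt/disk{n}" for n in range(1, disk_count + 1)]


def scrub_command(mount_point):
    """Build the scrub command for one filesystem.

    -B keeps the scrub in the foreground, so the call only returns
    once the scrub has finished.
    """
    return ["btrfs", "scrub", "start", "-B", mount_point]


def run_scrubs(disk_count):
    """Scrub each assumed mount point in turn and collect the results.

    Returns {mount_point: (return_code, output_text)}. A non-zero return
    code is treated here simply as "needs a closer look".
    """
    results = {}
    for mount_point in array_mount_points(disk_count):
        proc = subprocess.run(
            scrub_command(mount_point), capture_output=True, text=True
        )
        results[mount_point] = (proc.returncode, proc.stdout + proc.stderr)
    return results
```

Calling `run_scrubs(3)` would scrub three disks one after another. Running them sequentially keeps the load predictable, at the cost of a longer total run time.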
cybersteel8 (Author) Posted May 30, 2022
On 5/28/2022 at 3:29 PM, itimpi said: You would have to force Unraid to rebuild a complete drive - you cannot do anything at a lower level of granularity with parity. Much easier is restore any affected files from your backups.
I still don't know how to determine which files are affected. I don't even know which drive I would want to rebuild, if I took this approach. How do I know which files are affected?
On 5/28/2022 at 5:06 PM, JorgeB said: Run a scrub on all array drives, if no corruptions are found run a correcting check.
Thanks. I have done this on all my array drives, and no errors were found on any of them. Does this mean the error was on the parity drive and not on my data drives? Do I know this with confidence?
Edited May 30, 2022 by cybersteel8
trurl Posted May 30, 2022
Have you had an unclean shutdown since the last good parity check?
If you haven't rebooted since this parity check, get diagnostics so they can be compared to the next.
cybersteel8 (Author) Posted May 30, 2022
3 minutes ago, trurl said: Have you had unclean shutdown since last good parity check?
My last good parity check was a month ago, and unfortunately I have rebooted since then, as I upgraded from 6.9.2 to 6.10.1 and that requires a reboot. It was not unclean.
JorgeB Posted May 30, 2022
4 hours ago, cybersteel8 said: Do I know this with confidence?
If no checksum errors were found you can have confidence that the data is OK, so just correct parity.
trurl Posted May 30, 2022 (Solution)
Just to make sure there isn't something else causing sync errors, you should run a correcting check, followed by a non-correcting check, all without rebooting.
If there are still sync errors after the non-correcting check then diagnostics would be needed to compare the checks in case you have some other hardware (RAM or something) causing these.
cybersteel8 (Author) Posted June 2, 2022
On 5/30/2022 at 5:01 PM, JorgeB said: If no checksum errors were found you can have confidence that the data is OK, so just correct parity.
Thanks. Due to the following test, I have now run a correcting parity check, and it corrected the 5 errors.
On 5/30/2022 at 9:39 PM, trurl said: Just to make sure there isn't something else causing sync errors, you should run a correcting check, followed by a non-correcting check, all without rebooting.
I have done this and the results seem successful. After the correcting check, I ran a non-correcting check immediately without rebooting at all. There were no errors detected in the non-correcting check.
Would the diagnostics still be useful, considering the second parity check resulted in no errors? I am unsure if it is safe to assume that my hardware is fault-free, but I suppose if I keep getting parity errors, diagnostics would become more useful. I think this is something I will keep an eye on, but not worry about. I'd like your opinion on this.
--
It seems the answer to my original question is that, when a non-correcting parity check results in errors, no action is taken by Unraid. A second parity check will show the same errors.
The action that ought to be taken by the user is to run a btrfs scrub on all of the array drives to determine if there are any problems with the data on the drive.
If the btrfs scrubs result in no errors, then the errors are on the parity drive, so a correcting parity check is appropriate.
Another non-correcting parity check after correcting parity is appropriate, to ensure that parity is indeed in a valid state.
"If there are errors as a result of the scrub, then that drive should be rebuilt from parity, as the parity mismatch is from the data not the parity" - This is my assumption. I would appreciate a comment on this to know if my assumption is correct.
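The procedure summarized in the post above can be written down as a small decision helper. This is only a sketch of the advice given in this thread, not anything Unraid itself runs; the function name `next_step`, the action labels, and the per-disk error-count input are all hypothetical.

```python
def next_step(scrub_errors_by_disk, have_backups=True):
    """Suggest a next step after a non-correcting check reported sync errors.

    scrub_errors_by_disk maps a disk name to the number of checksum errors
    its btrfs scrub reported (hypothetical input format).
    Returns (action, affected_disks).
    """
    damaged = sorted(
        disk for disk, errors in scrub_errors_by_disk.items() if errors > 0
    )
    if not damaged:
        # All data checksums verified, so the mismatch is taken to be in
        # parity: correct it, then confirm with a non-correcting check
        # without rebooting in between.
        return ("correcting_check_then_noncorrecting_check", [])
    if have_backups:
        # Restoring the files named by the scrub is normally much quicker
        # than rebuilding a whole drive.
        return ("restore_corrupt_files_from_backup", damaged)
    # Parity only works at whole-drive granularity.
    return ("rebuild_drive_from_parity", damaged)
```

For example, `next_step({"disk1": 0, "disk2": 0})` yields the "correct parity, then re-check" path that was followed in this thread.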
itimpi Posted June 2, 2022
2 minutes ago, cybersteel8 said: "If there are errors as a result of the scrub, then that drive should be rebuilt from parity, as the parity mismatch is from the data not the parity" - This is my assumption. I would appreciate a comment on this to know if my assumption is correct.
At one level that is correct, but if you have backups it would normally be MUCH quicker to simply restore the files identified by the scrub as being corrupt.
JorgeB Posted June 2, 2022
2 hours ago, cybersteel8 said: Would the diagnostics still be useful, considering the second parity check resulted in no errors?
Could still be, if for example you're using a controller or other hardware that has known issues.
cybersteel8 (Author) Posted June 2, 2022
2 hours ago, itimpi said: At one level that is correct, but if you have backups it would normally be MUCH quicker to simply restore the files identified by the scrub as being corrupt.
Ah, so a btrfs scrub actually documents the names of the files that are corrupt? That's incredibly useful to know, thanks!
JorgeB Posted June 2, 2022
1 hour ago, cybersteel8 said: so a btrfs scrub actually documents the names of the files that are corrupt?
Yes, it will list them in the syslog.
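To collect the file names a scrub writes to the syslog, something like the Python sketch below could be used. It assumes the kernel reports each corrupt file on a line containing `BTRFS`, the words `checksum error`, and a trailing `(path: ...)` field; that format is an assumption here and may differ between kernel versions. The function name `corrupt_paths` is hypothetical.

```python
import re

# Assumed shape of a scrub warning line:
#   ... BTRFS ... checksum error ... (path: some/file)
# The greedy match up to the final ")" keeps parentheses inside file names.
PATH_RE = re.compile(r"BTRFS.*checksum error.*\(path: (.*)\)\s*$")


def corrupt_paths(log_text):
    """Return the unique file paths named in scrub checksum warnings,
    in the order they first appear."""
    seen = []
    for line in log_text.splitlines():
        match = PATH_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

The returned paths are relative to the filesystem that was scrubbed, so they still have to be matched to the disk the warning came from before restoring anything from backup.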