Parity Check finding 1 error



Is it a correcting check? If so, run a non-correcting check and see if it's still there.

 

If it's a non-correcting check, verify the health of all the array drives then run a correcting check. Follow up with a non-correcting check to see if it goes away.

 

Don't run a correcting check with a known bad drive involved.

12 minutes ago, autumnwalker said:

By validate health do you mean "green balls"?

That's one sign of health. Clicking a drive in the main GUI opens a page that lists its SMART attributes and lets you run SMART tests and view the results. Ideally, each drive would have recently passed a long SMART test. However, those take several hours to complete and don't need to be run regularly, since Unraid reads every sector during a parity check.

 

Drive health is a complicated subject.

16 hours ago, autumnwalker said:

Just ran my monthly parity check. It found exactly one error. Thoughts?

Assuming it was a non-correcting check, run another one to see if it still finds the same error. A single error could be from a bit flip if you're not using ECC RAM. If it was a correcting check, still run it again: if it was a bit flip, it should also find 1 error.

 

Also check or post the syslog from around the time the error happened. If there are no disk-related entries, the disks are unlikely to be the reason.


Log from start to finish of the parity check (non-correcting) here:

 

Oct 8 22:58:37 nas01 kernel: mdcmd (64): check nocorrect

Oct 8 22:58:37 nas01 kernel: md: recovery thread: check P ...

Oct 8 22:58:37 nas01 kernel: md: using 1536k window, over a total of 3907018532 blocks.

Oct 9 02:26:14 nas01 afpd[5310]: Reading IPC header failed (-1 of 14 bytes read): Connection reset by peer

Oct 9 10:33:18 nas01 kernel: md: recovery thread: P incorrect, sector=4646561192

Oct 9 11:18:28 nas01 kernel: md: recovery thread: completion status: 0

 

The system uses non-ECC RAM.
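To compare two runs without reading the syslog by eye, the flagged sectors can be pulled out with a short script. This is only a sketch: the regular expression is modelled on the single "P incorrect, sector=..." line in the log above, and the function names `parity_error_sectors` and `repeated_sectors` are made up for illustration. Other Unraid versions may word the message differently.

```python
import re

# Assumption: the message format matches the one line seen in this thread's log.
INCORRECT_RE = re.compile(r"md: recovery thread: (\w+) incorrect, sector=(\d+)")


def parity_error_sectors(log_text):
    """Return the set of sector numbers reported as incorrect in a syslog excerpt."""
    return {int(match.group(2)) for match in INCORRECT_RE.finditer(log_text)}


def repeated_sectors(first_run_log, second_run_log):
    """Return the sectors flagged in both runs (hypothetical helper)."""
    return parity_error_sectors(first_run_log) & parity_error_sectors(second_run_log)


sample = """Oct 8 22:58:37 nas01 kernel: md: recovery thread: check P ...
Oct 9 10:33:18 nas01 kernel: md: recovery thread: P incorrect, sector=4646561192
Oct 9 11:18:28 nas01 kernel: md: recovery thread: completion status: 0"""

print(parity_error_sectors(sample))
```

If a second non-correcting check reports the same sector, the mismatch is actually stored on disk rather than being a one-off read glitch during the first run.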


As I see it, if all your disks are healthy, your only real option is to run a correcting parity check. That will update the parity disk to match what's on the data disks, on the assumption that the error is in the parity, since there is no way of knowing which disk is actually in error. How valid that assumption is remains debatable. Before you run a correcting parity check, though, you might want to do either or both of the following:

  • Post your diagnostics or check the SMART data of your disks yourself
  • Run a MemTest to check for bad RAM
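Why a parity check can't say which disk is wrong can be shown with a toy model. The sketch below assumes plain XOR parity over three tiny made-up "disks" of two bytes each; the helper `xor_parity` and all byte values are illustrative and are not taken from Unraid's code.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of three data disks and their parity.
data_disks = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = xor_parity(data_disks)

# Case 1: one bit flips on the second data disk.
corrupted_data = [data_disks[0], b"\x32\x55", data_disks[2]]
mismatch_from_data = xor_parity(corrupted_data + [parity])

# Case 2: the same bit flips on the parity disk instead.
corrupted_parity = bytes([parity[0] ^ 0x01, parity[1]])
mismatch_from_parity = xor_parity(data_disks + [corrupted_parity])

# Both cases look identical to the check: the same position fails to XOR to zero.
print(mismatch_from_data == mismatch_from_parity)

# A correcting check recomputes parity from the data disks as they are now.
# In case 1 that clears the error but leaves the flipped data bit in place.
rewritten_parity = xor_parity(corrupted_data)
print(xor_parity(corrupted_data + [rewritten_parity]))
```

In this model, rewriting parity always makes the next check come back clean, whichever disk actually held the bad bit.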

You might want to consider the use of a checksumming technique to detect data corruption going forward. It won't help with your current situation but it might help in the future. I use the Dynamix File Integrity plugin. Some people use btrfs as the format on their data disks.
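The general idea behind file checksumming can be sketched in a few lines. This is not how the Dynamix File Integrity plugin or btrfs works internally; it is a minimal illustration using SHA-256 from Python's standard library, and the function names `sha256_of`, `build_manifest` and `changed_files` are made up for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """List files from the manifest whose current digest differs or that are missing."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)
```

You would build a manifest while the data is known to be good, store it somewhere safe, and run `changed_files` after a parity error. A file that shows up as changed but was never deliberately modified is a candidate for restoring from backup.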


As mentioned, the best option now is to run a correcting check. If it were bad RAM, it wouldn't be finding an error on the exact same sector, so it's not clear what caused the error. Hopefully it was a single event and no more will be detected in the near future. This is one of those situations where having checksums is valuable.

