autumnwalker

Parity Check finding 1 error


Is it a correcting check? If so, run a non-correcting check and see if it's still there.

 

If it's a non-correcting check, verify the health of all the array drives then run a correcting check. Follow up with a non-correcting check to see if it goes away.

 

Don't run a correcting check with a known bad drive involved.


It was a non-correcting check.

 

By validate health do you mean "green balls"? If so, each is showing healthy. None have failed SMART.

12 minutes ago, autumnwalker said:

By validate health do you mean "green balls"?

That's one sign of health. When you click on each drive in the main GUI, it takes you to a page that lists SMART attributes and lets you run SMART tests and view the results. Ideally, each drive would have recently passed a long SMART test. However, those take several hours to complete and don't need to be run regularly, since Unraid reads every sector during a parity check.

 

Drive health is a complicated subject.

16 hours ago, autumnwalker said:

Just ran my monthly parity check. It found exactly one error. Thoughts?

Assuming it was non-correcting, run another one to see if it still finds the same error. A single error could be from a bit flip if you're not using ECC RAM. If it was a correcting check, still run it again: if the cause was a bit flip, the second check should also find 1 error.

 

Also check or post the syslog from around the time the error happened. If there are no disk-related entries, the disks are unlikely to be the cause.


Log from start to finish of the parity check (non-correcting) here:

 

Oct 8 22:58:37 nas01 kernel: mdcmd (64): check nocorrect

Oct 8 22:58:37 nas01 kernel: md: recovery thread: check P ...

Oct 8 22:58:37 nas01 kernel: md: using 1536k window, over a total of 3907018532 blocks.

Oct 9 02:26:14 nas01 afpd[5310]: Reading IPC header failed (-1 of 14 bytes read): Connection reset by peer

Oct 9 10:33:18 nas01 kernel: md: recovery thread: P incorrect, sector=4646561192

Oct 9 11:18:28 nas01 kernel: md: recovery thread: completion status: 0

 

The system is using non-ECC RAM.

Edited by autumnwalker


So run another non-correcting parity check and see if I get the same error as below?

 

Oct 9 10:33:18 nas01 kernel: md: recovery thread: P incorrect, sector=4646561192

6 hours ago, autumnwalker said:

So run another non-correcting parity check and see if I get the same error as below?

Correct. If the first one was also non-correcting and you don't get an error this time, it was likely a memory bit flip. If it was correcting, you should now get the same error.


The non-correcting check returned the same error: Oct 10 22:12:58 nas01 kernel: md: recovery thread: P incorrect, sector=4646561192
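
For anyone who wants to compare two runs without reading the syslog by eye, here is a minimal Python sketch. It assumes the mismatch message always looks like the "P incorrect, sector=" lines quoted in this thread; the function name `incorrect_sectors` is made up for illustration, and the two strings are just the log lines from the checks above.

```python
import re

# Pattern based only on the log lines quoted in this thread (an assumption,
# not a documented format).
PATTERN = re.compile(r"md: recovery thread: P incorrect, sector=(\d+)")

def incorrect_sectors(log_text):
    """Return the set of sector numbers reported as parity mismatches."""
    return {int(m.group(1)) for m in PATTERN.finditer(log_text)}

# The two mismatch lines from the first and second non-correcting checks.
first_run = "Oct 9 10:33:18 nas01 kernel: md: recovery thread: P incorrect, sector=4646561192"
second_run = "Oct 10 22:12:58 nas01 kernel: md: recovery thread: P incorrect, sector=4646561192"

# Sectors reported by both runs point at a persistent mismatch on disk
# rather than a one-off event during a single check.
repeated = incorrect_sectors(first_run) & incorrect_sectors(second_run)
print(sorted(repeated))
```

In practice you would read the text from saved copies of the syslog for each run instead of pasting the lines in.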


As I see it, if all your disks are healthy, your only real option is to run a correcting parity check. That will update the parity disk to match what's on the data disks, making the assumption that the error is in the parity since there is no way of knowing which actual disk is in error. How valid that assumption is is debatable. Before you do run a correcting parity check though you might want to do either or both of the following:

  • Post your diagnostics or check the SMART data of your disks yourself
  • Run a MemTest to check for bad RAM
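
To illustrate why a single-parity check cannot tell which disk holds the bad bit, here is a small Python sketch. It assumes parity is a plain bitwise XOR across the data disks, and the disk values are invented for the example; it is a toy model, not the md driver's actual code.

```python
from functools import reduce

def xor_parity(blocks):
    """XOR all blocks together (toy model of single parity)."""
    return reduce(lambda a, b: a ^ b, blocks)

# Hypothetical contents of three data disks at the same position.
data = [0b1011, 0b0110, 0b1100]
parity = xor_parity(data)

# Case A: one bit flips on a data disk, parity is untouched.
corrupt_data = [data[0], data[1] ^ 0b0100, data[2]]
mismatch_a = xor_parity(corrupt_data) ^ parity

# Case B: the same bit flips on the parity disk, data is untouched.
corrupt_parity = parity ^ 0b0100
mismatch_b = xor_parity(data) ^ corrupt_parity

# The check sees the same mismatch in both cases, so it cannot say
# which disk is wrong.
print(mismatch_a == mismatch_b)
```

A correcting check simply rewrites parity to match the data, which is the right fix in case B and silently accepts the bad data in case A.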

You might want to consider the use of a checksumming technique to detect data corruption going forward. It won't help with your current situation but it might help in the future. I use the Dynamix File Integrity plugin. Some people use btrfs as the format on their data disks.
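
As a rough idea of what a checksumming approach looks like, here is a minimal Python sketch that records a SHA-256 hash per file and later reports files whose hash no longer matches. This is not how the Dynamix File Integrity plugin or btrfs works internally; the function names are made up for illustration.

```python
import hashlib
import os

def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hash."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest

def changed_files(root, manifest):
    """Return relative paths that are missing or whose hash has changed."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_sha256(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

You would build the manifest while the data is known to be good, store it somewhere safe, and run `changed_files` after a parity error. Keep in mind that a file you edited on purpose will also show up as changed, so the manifest has to be refreshed after legitimate writes.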


SMART is coming back clean on each disk.

 

Is there any way to run a memtest with the system live? (AFAIK there is not.)

 

I will check out the File Integrity plugin, but I'm still on reiserfs so I cannot use that until I migrate. Yet another reason for me to migrate now.


Running MemTest86 involves a reboot, I'm afraid.


As mentioned, the best option now is to run a correcting check. If it were bad RAM, it wouldn't be finding an error on the exact same sector, so it's not clear what caused the error. Hopefully it was a single event and no more will be detected in the near future. This is one of those situations where having checksums is valuable.


Correcting check ran, fixed one error. Just ordered a new drive to start migrating away from reiserfs and I'll look at checksums.

 

I suspect this was related to my failing PSU which was causing my SATA card to crap out (remember that?).

2 hours ago, autumnwalker said:

I suspect this was related to my failing PSU

Failing PSUs can be responsible for the most obscure problems. I think that's a very plausible explanation.
