
Errors on Parity Check


Ross


I have been running Unraid for years without any issue, but the day has finally arrived.  A scheduled parity check finished but reported:

Last check completed on Wed 02 Feb 2022 03:30:45 PM CST (yesterday)
Finding 511132 errors  Duration: 14 hours, 30 minutes, 44 seconds. Average speed: 153.2 MB/sec

 

I have three 8TB WD Red drives, and a small SSD cache drive.  

 

There are no SMART errors reported and all disks report as "Healthy".

 

Parity was reported as Valid.

 

When the errors were emailed to me I shut down the server, because I did not know when I would have time to get to it.  I guess that was a mistake since it emptied the log files...   So I think I need to run another parity check, without correction, and see what I get?

 

I also think I may want to get a second parity drive after correcting this problem, because I realize how important it might be to identify which disk is having issues...

 

I thought having a parity check report errors would be a common problem and that there would be a guide set up for my scenario, but under the troubleshooting areas I have not yet found one for my situation.  Is there one out there?

 

Thanks in advance for any help!

 

Ross

Link to comment

I manually started a parity check and immediately got errors:

Feb 3 11:31:59 WServer1 kernel: md: recovery thread: check P ...
Feb 3 11:32:00 WServer1 kernel: md: recovery thread: P incorrect, sector=2488
Feb 3 11:32:00 WServer1 kernel: md: recovery thread: P incorrect, sector=12720
Feb 3 11:32:00 WServer1 kernel: md: recovery thread: P incorrect, sector=14768
Feb 3 11:32:00 WServer1 kernel: md: recovery thread: P incorrect, sector=15792
Feb 3 11:32:00 WServer1 kernel: md: recovery thread: P incorrect, sector=16824
Feb 3 11:32:00 WServer1 kernel: md: recovery thread: P incorrect, sector=10680
Feb 3 11:32:00 WServer1 kernel: md: recovery thread: P incorrect, sector=24656
Feb 3 11:32:00 WServer1 kernel: md: recovery thread: P incorrect, sector=30192
Feb 3 11:32:00 WServer1 kernel: md: recovery thread: P incorrect, sector=39408
Feb 3 11:32:00 WServer1 kernel: md: recovery thread: P incorrect, sector=47264
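
For anyone who wants to tally lines like these instead of eyeballing them, here is a minimal Python sketch. It only assumes the line format shown above; the function name `summarize_parity_errors` is made up for illustration, and it is not an Unraid tool.

```python
import re

# Matches lines such as:
#   Feb 3 11:32:00 WServer1 kernel: md: recovery thread: P incorrect, sector=2488
# Only the "P incorrect" wording seen in this log is handled (assumption).
PATTERN = re.compile(r"md: recovery thread: P incorrect, sector=(\d+)")


def summarize_parity_errors(lines):
    """Return (count, lowest_sector, highest_sector) for parity mismatch lines."""
    sectors = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            sectors.append(int(match.group(1)))
    if not sectors:
        return 0, None, None
    return len(sectors), min(sectors), max(sectors)


# A few lines copied from the log above; the first one is not an error line.
sample = [
    "Feb 3 11:31:59 WServer1 kernel: md: recovery thread: check P ...",
    "Feb 3 11:32:00 WServer1 kernel: md: recovery thread: P incorrect, sector=2488",
    "Feb 3 11:32:00 WServer1 kernel: md: recovery thread: P incorrect, sector=12720",
    "Feb 3 11:32:00 WServer1 kernel: md: recovery thread: P incorrect, sector=14768",
]

count, lowest, highest = summarize_parity_errors(sample)
print(count, lowest, highest)
```

The same function can be fed the lines of a saved syslog file, which makes it easier to see whether the mismatches are clustered or spread across the whole disk.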

Link to comment

I can do that, but I have figured out that two correcting parity checks were run, and both reported a lot of errors (the first was a week ago), and the server was not rebooted between those two runs.  The latest non-correcting one I ran for a few minutes and then aborted, and it logged over 3k errors.   I realize now that my scheduled checks should not have had correcting on, but they did.  So I have had a repeat of this error.

 

So I can run two more back to back, which will take over a day, but I am wondering if the above information confirms the issue?   I suppose now that I have run two correcting ones already, that if I corrupted my data, it is probably already too late?  That's why my next addition will be a second parity drive...

 

Thanks again for any help!

 

Ross

Link to comment

So after rebooting the server I ran a correcting parity check and it corrected thousands of errors.  I have since run two non-correcting parity checks and they finished without errors. None of the drives have any SMART errors reported.  
 

After years of no parity errors, I’m still puzzling over this situation.  The array is mostly a media file server and some backup storage, so not a lot of write activity.  
 

So when parity is “corrected”, I’m not sure I understand how it determines truth.  If there were drive errors, then I could see how I might figure out which drive was the source of error.   I’m assuming that either a data drive or the parity drive could develop an issue.  
 

So even in my small array I was thinking now that having a second parity drive might help figure out which drive has the “bad” data needing to be corrected.

 

I also saw discussions about an app that calculates a checksum that allows detection of file degradation aside from disk failure.

 

So my server seems fine now but I’m not sure I did not lose something…

 

Is a second parity drive a good idea?

 

Thanks for any guidance!

 

Ross

Link to comment

Thanks for the quick reply.
 

But is that assumption, that the data drives are always good, valid?  
 

Why do we believe the data drives are always valid and only the parity drive can be wrong?

 

Maybe the app that creates file checksums is the way to go…

 

It’s a little disconcerting to see over 500,000 errors need correction on a system that aside from auto updates of the software is just being read.   And the drives are old enough that failure is definitely getting to be more likely soon.

 

I’m not sure how I can validate that I do not have file corruption…

 

Thanks again for any information or suggestions.

 

Ross

Link to comment
14 minutes ago, Ross said:

Why do we believe the data drives are always valid and only the parity drive can be wrong?

Because if there is a problem on a drive, there is no way to identify which one it might be.
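
To illustrate the point, here is a toy Python sketch of XOR parity with two data disks and one parity disk. It is a simplified model under that assumption, not Unraid's actual driver code, and the names (`xor_parity`, `disk1`, and so on) are made up for the example. It shows that a flipped bit on a data disk and the same flipped bit on the parity disk produce exactly the same mismatch, so the check alone cannot say which disk is wrong.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two tiny "data disks" and the parity computed from them.
disk1 = bytes([0b10101010, 0x01])
disk2 = bytes([0b11001100, 0x02])
parity = xor_parity([disk1, disk2])

# A healthy stripe XORs to all zeros.
healthy = xor_parity([disk1, disk2, parity])

# Case A: one bit silently flips on data disk 1.
bad_disk1 = bytes([disk1[0] ^ 0x04, disk1[1]])
mismatch_a = xor_parity([bad_disk1, disk2, parity])

# Case B: the same bit silently flips on the parity disk instead.
bad_parity = bytes([parity[0] ^ 0x04, parity[1]])
mismatch_b = xor_parity([disk1, disk2, bad_parity])

# Both cases leave the same non-zero result, so they are indistinguishable.
print(healthy == bytes(2), mismatch_a == mismatch_b)
```

A correcting check in this model would simply rewrite parity to match the data, which is the right fix in case B and bakes in the bad data in case A.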


 

17 minutes ago, Ross said:

 

I’m not sure how I can validate that I do not have corruption.


With modern drives the assumption is that they will return an error if they do not read a sector successfully, but it is possible that is not always the case.

 

Checksums (either built into the file system or via an add-on) are the only way to be certain.
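
As a rough idea of what the checksum approach looks like, here is a minimal Python sketch that hashes every file under a folder and later reports files whose hash has changed. It is not the plugin mentioned in this thread, and the function names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large media files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List files present in both manifests whose digest differs."""
    return sorted(
        name
        for name, digest in old_manifest.items()
        if name in new_manifest and new_manifest[name] != digest
    )
```

The idea is to build a manifest while the files are known to be good, store it somewhere safe, and rebuild it after an incident. A changed hash on a file nobody edited is a sign of corruption, and that file can then be restored from a backup.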

 

 

Link to comment

Got it.

 

I was thinking that a dual parity drive setup would allow the system to figure out which drive was the “odd man out”, and therefore the one drive needing to be corrected.  But that would not be true if the parity process had an issue and wrote the wrong parity to both parity drives.

 

I think I’m going to look at that file checksum program to see if that is the way to go.  
 

What is the theoretical cause of a stable array (for years) developing over 500,000 parity errors in one week?

 

Ross

Link to comment
  • 1 year later...
On 2/6/2022 at 3:01 PM, itimpi said:

When ‘correcting’ parity the assumption is that the data drives are good and parity needs to be updated to match.   This is the same whether you have single or dual parity, in neither case is a problem drive identifiable as the cause of a parity error.

Hello, 

I came across parity sync errors as well.
There are no disk errors at the moment.
I am having trouble pinpointing what may be the cause of this.

 

I did run into a concern last weekend when updating a plugin

  • the GUI download halted.
  • I performed a clean shutdown of the server
  • rebooted the server
  • the USB /sda was erroring or corrupted (server would not boot)
  • I pulled the USB and restored my backup on a WinOS system
  • booted the server and ran a parity check
  • the first run saw 2 errors
  • yesterday's scheduled run saw 592 errors

image.png.ce339bd32979c8cc1661363999037994.png
 

Wondering if I should be concerned with this, or where the errors are. 
Are they possible sync errors? 

Looking forward to some clarification.

Thank you kindly,

Cheers

diagnostics-20230303-0705.zip

Link to comment
On 3/3/2023 at 7:43 AM, JorgeB said:

Yes, run another check and post new diags if new errors are found.

Hello,

Appreciate the prompt follow-up.
I will kick off another parity sync, or possibly wait for the next monthly scheduled sync to see how things pan out.

I will report back regardless of the findings.

Thank you kindly!

Link to comment
  • 1 month later...
On 3/3/2023 at 7:43 AM, JorgeB said:

Yes, run another check and post new diags if new errors are found.

Hello again, 

 

Ran another pass on both my servers and this monthly pass showed no errors reported. Strange that they appeared in the first place.
As always I appreciate the feedback, support and assistance with this inquiry. 

 

Thank you again!

Link to comment
