
[ALL] Parity Check Correct Errors by Default


WizADSL


Not sure if this topic belongs here; please move as needed.

 

It has been my experience that checking array parity is one of the more stressful activities the array typically performs, and drives/hardware seem more prone to failure while the parity check is running. The reason for this post is that I've changed my scheduled parity check to NOT correct parity errors, because if a drive fails (or starts to) during a correcting check you'll end up with parity that is consistent with the array ("parity valid") but useless for reconstructing the failed drive. This has happened to me more than once. Thankfully it was a controller/power problem each time, so the drives did not actually have an issue, but parity had to be rebuilt. Should the parity check default be changed to NOT correct errors? I'm curious to know if anyone else has had a similar experience.
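To make the failure mode concrete, here is a minimal Python sketch. It assumes a single parity drive computed as a plain XOR across the data drives and uses toy one-byte "blocks"; the function names (`xor_parity`, `rebuild`) are made up for illustration and are not Unraid code. It shows how a correcting check run against a drive that is returning bad reads can leave parity "valid" but unable to rebuild that drive.

```python
from functools import reduce

def xor_parity(blocks):
    """Single-parity value for one stripe: XOR of every data drive's block."""
    return reduce(lambda a, b: a ^ b, blocks, 0)

def rebuild(surviving_blocks, parity):
    """Reconstruct the one missing block from parity plus the surviving drives."""
    return xor_parity(surviving_blocks) ^ parity

# One stripe across three data drives; parity starts out in sync.
on_disk = [0x11, 0x22, 0x33]
parity = xor_parity(on_disk)

# Drive 1 (or its controller/cable) starts returning garbage on reads.
# What it should hold is still 0x22.
bad_read = [0x11, 0xFF, 0x33]

# Non-correcting check: the mismatch is reported, parity is left alone.
mismatch = xor_parity(bad_read) != parity
rebuilt_ok = rebuild([on_disk[0], on_disk[2]], parity)

# Correcting check: parity is rewritten to agree with what was read.
corrected_parity = xor_parity(bad_read)
rebuilt_bad = rebuild([on_disk[0], on_disk[2]], corrected_parity)

print(mismatch, hex(rebuilt_ok), hex(rebuilt_bad))
```

With the original parity, the replaced drive is rebuilt to the real contents; with the "corrected" parity, the rebuild reproduces the garbage that was read during the check.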

Link to comment

As I mentioned in my original post, I have my scheduled check set to be non-correcting. One concern I have is the automatic parity check that occurs when the array comes up "dirty". If the previous shutdown was caused by a drive going bad (which shouldn't happen, but does), then the automatic parity check can invalidate your parity. Is there any way to make the automatic check non-correcting? In all cases where a parity check might fail, I'd like to evaluate the situation before I allow any corrections, just in case a faulty drive or other hardware is the cause.

Link to comment
12 minutes ago, WizADSL said:

As I mentioned in my original post, I have my scheduled check set to be non-correcting. One concern I have is the automatic parity check that occurs when the array comes up "dirty". If the previous shutdown was caused by a drive going bad (which shouldn't happen, but does), then the automatic parity check can invalidate your parity. Is there any way to make the automatic check non-correcting? In all cases where a parity check might fail, I'd like to evaluate the situation before I allow any corrections, just in case a faulty drive or other hardware is the cause.

How do you go about evaluating the situation? 

Literally. I have no idea what I would actually do to make a decision in any scenario where the parity check fails for some reason. 

 

Or is there a guide somewhere? 

I've been all over the original wiki and that gave me a basic understanding of the parity check, but that was it.

Link to comment
1 hour ago, whipdancer said:

How do you go about evaluating the situation? 

Literally. I have no idea what I would actually do to make a decision in any scenario where the parity check fails for some reason. 

Examine the syslog from around the time of the error and the SMART reports for all involved drives. Sometimes you really can't find a direct cause, and as long as you can't pin down any corrupt files, a correcting check is the best option.

 

If you just had a hard shutdown with active writes, then a correcting check is probably warranted. The parity drive is lowest priority for writes, so it's quite feasible for the data drive to be correct and the parity a step behind. That's why a correcting check is set as default after a hard shutdown. In almost all cases, it's the correct thing to do if your hardware is healthy.
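As a rough illustration of the "parity a step behind" case, the Python sketch below again assumes plain XOR single parity and one-byte toy blocks; `stripe_parity` and `parity_check` are hypothetical helpers, not anything from Unraid itself. A write reaches a data drive, power is lost before parity is updated, and the two kinds of check are compared.

```python
from functools import reduce
from operator import xor

def stripe_parity(blocks):
    """XOR of all data blocks in one stripe."""
    return reduce(xor, blocks, 0)

def parity_check(data, parity, correct):
    """Return (sync_errors, parity_after_check) for a single stripe."""
    expected = stripe_parity(data)
    if expected == parity:
        return 0, parity
    # Mismatch: only a correcting check rewrites parity.
    return 1, (expected if correct else parity)

data = [0x0A, 0x0B, 0x0C]
parity = stripe_parity(data)

# The write lands on data drive 0, then the machine goes down hard
# before the matching parity update is written.
data[0] = 0x5A
stale_parity = parity

errors_nc, parity_nc = parity_check(data, stale_parity, correct=False)
errors_c, parity_c = parity_check(data, stale_parity, correct=True)

print(errors_nc, parity_nc == stale_parity)       # error reported, parity unchanged
print(errors_c, parity_c == stripe_parity(data))  # error reported, parity back in sync
```

Here the data drives are right and parity is stale, so the correcting check does the right thing. The check itself cannot tell this case apart from a drive returning bad data, which is why the hardware needs to be healthy for correction to be safe.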

 

It's the failing hardware scenario where things get sticky, and hopefully you have some warning that things may be going sideways before stuff goes completely bonkers. Unfortunately failing hardware likes to cause hard shutdowns, so I prefer to NOT set the array to auto start, so I have a chance to look things over EACH and EVERY time the machine is fired up.

Link to comment
7 minutes ago, jonathanm said:

Examine the syslog from around the time of the error and the SMART reports for all involved drives. Sometimes you really can't find a direct cause, and as long as you can't pin down any corrupt files, a correcting check is the best option.

 

If you just had a hard shutdown with active writes, then a correcting check is probably warranted. The parity drive is lowest priority for writes, so it's quite feasible for the data drive to be correct and the parity a step behind. That's why a correcting check is set as default after a hard shutdown. In almost all cases, it's the correct thing to do if your hardware is healthy.

 

It's the failing hardware scenario where things get sticky, and hopefully you have some warning that things may be going sideways before stuff goes completely bonkers. Unfortunately failing hardware likes to cause hard shutdowns, so I prefer to NOT set the array to auto start, so I have a chance to look things over EACH and EVERY time the machine is fired up.


Maybe I missed it, but this summary is pretty much exactly what I (think I was) missing.

Link to comment
1 hour ago, whipdancer said:

Maybe I missed it, but this summary is pretty much exactly what I (think I was) missing.

That's how I run my servers - I don't autostart them and my scheduled parity checks are non-correcting. That way I stay in control. I think a lot of users beg to differ on both counts though.

Link to comment

I agree with what jonathanm has suggested, but I would still like a way (Limetech?) to set the automatic parity check after an unclean shutdown to be non-correcting. I had been thinking about writing a post about this for quite some time but kept putting it off. A few days ago I came home to an array that had been working fine earlier that day but was unresponsive. I forced it to power off and then turned it back on. Everything seemed fine, so I walked away, then suddenly remembered that a parity check would probably have been kicked off, so I went to check on it. To my surprise, about 2 gigabytes in, one of my drives had 450 or so read errors and got taken offline. I stopped the parity check and the array; the drive that was marked bad was still online and showed no problems in the SMART attributes (surprising how often this happens). I brought the system down and replaced the drive; it was rebuilt from parity and everything is fine. If that drive had started returning invalid data and that had been written to parity, this would not have turned out well for me. The point is that even manual array starts (when doing so will immediately start a parity check due to a dirty shutdown) could still be dangerous if a drive has problems right away, as it did in my case.

Link to comment
5 minutes ago, WizADSL said:

The point is that even manual array starts (when doing so will immediately start a parity check due to a dirty shutdown) could still be dangerous if a drive has problems right away as it did in my case.

But with a manual array start you have the option to do a thorough check first. In the situation where you know that starting the array will force a parity check, have you tried rebooting before starting the array? The last time I tried it, no automatic parity check started when the array was eventually brought up, and I had the opportunity to kick off a manual non-correcting check instead, which found no errors. I don't know if that's a bug (if so, it might have been fixed) or by design, but I mention it because it might be useful.

Link to comment
1 minute ago, trurl said:

Isn't the unclean shutdown parity check non-correcting on recent versions of Unraid? I haven't had one in a very long time but I seem to remember this change sometime in the 6.x versions.

I don't actually know.  I assumed (shame on me) that it was a correcting check because that is the default everywhere else.

Link to comment

Archived

This topic is now archived and is closed to further replies.
