parity errors, possibly the same ones recurring from test to test



Actually, I thought I should clarify my issue. I don't actually get errors on the parity check... I get them on the parity drive.

 

If I set spindown to never I get a perfect parity check. If I set the spindown to default I get 128 errors on the parity drive almost immediately.

 

In hindsight this is very different than the issue opentoe and JustinChase are getting - though it's interesting that my suggestion on the spindown of the parity drive helped their issue as well - in fact, it helped them more than it does me.

 

Doh...

Link to comment

Hmmm, that is very interesting.  I'm sorry your solution doesn't resolve your issue, but happy it seems to have resolved mine.

 

Hopefully this information helps LT get to the bottom of this.

 

I still don't understand how a spin down setting can have such effects on a parity check, but it sure seems to.

Link to comment

I agree, though having to leave my parity drive spun up is not that big a deal... it's just a minor annoyance knowing this shouldn't be the case. That said, it would be great to find the solution to this niggling issue.

Link to comment

I had a similar issue regarding spindown>parity errors, mine went away though. I don't know exactly what resolved it. I consolidated lots of smaller drives onto a few larger drives and went to all motherboard sata connections (SASLP currently not in use anymore) and redid the cabling.

Link to comment

It's pretty much been confirmed that this is not related to a cabling issue. I ran another parity check and got no errors. I'll change the parity drive back to DEFAULT, run another one tomorrow, and see what happens.

 

Link to comment

Just thought I'd chime in.

 

I recently upgraded my server to unRAID 6.0.1 and I also now receive 5 parity check errors. (I performed a Clean upgrade, meaning I backed up my v5.0.3 Flash, then reformatted, and loaded a fresh installation of unRAID v6.0.1)

 

I vaguely remember getting errors in v5.0.3 but honestly I can't remember all the details and specifics, and sadly I didn't have much time to research it. (remember 3 being the magical number).  I normally don't run correcting Parity Checks, (Is this something I should have enabled?).  I do BELIEVE I ran correcting parity checks in v5.0.3 after I saw the 3 errors a couple times in a row, BUT I can't remember if I ran subsequent checks after to verify they had been corrected. (I know ...  I'm just so much help..)

 

 

Before I jump into too much detail I'll test out the solutions provided here with disable/re-enable parity Spin-Down, and report back.

 

It should be noted that I used a much different HW configuration than the other posters, my error sectors are completely different, and I've never checked/compared between parity checks (this was my first with v6.0.1).  My last parity check was set to not correct the errors.

 

Sector Errors from my syslog:

 

Jul  1 01:55:37 unRAID kernel: md: parity incorrect, sector=1177606472

Jul  1 01:55:37 unRAID kernel: md: parity incorrect, sector=1177606480

Jul  1 01:55:37 unRAID kernel: md: parity incorrect, sector=1177606488

Jul  1 01:55:37 unRAID kernel: md: parity incorrect, sector=1177606496

Jul  1 01:55:37 unRAID kernel: md: parity incorrect, sector=1177606504
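For anyone wanting to compare sectors between checks, here is a minimal sketch that pulls the sector numbers out of syslog lines like the ones above and looks at their spacing. The line format is copied from this post; the function names are only illustrative, and this is not an unRAID tool.

```python
import re

# Matches kernel lines of the form shown above, e.g.
# "Jul  1 01:55:37 unRAID kernel: md: parity incorrect, sector=1177606472"
PATTERN = re.compile(r"md: parity incorrect, sector=(\d+)")


def parity_error_sectors(lines):
    """Return the sector numbers reported in the given syslog lines, in order."""
    sectors = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            sectors.append(int(match.group(1)))
    return sectors


def strides(sectors):
    """Return the differences between consecutive sector numbers."""
    return [b - a for a, b in zip(sectors, sectors[1:])]


sample = [
    "Jul  1 01:55:37 unRAID kernel: md: parity incorrect, sector=1177606472",
    "Jul  1 01:55:37 unRAID kernel: md: parity incorrect, sector=1177606480",
    "Jul  1 01:55:37 unRAID kernel: md: parity incorrect, sector=1177606488",
    "Jul  1 01:55:37 unRAID kernel: md: parity incorrect, sector=1177606496",
    "Jul  1 01:55:37 unRAID kernel: md: parity incorrect, sector=1177606504",
]

sectors = parity_error_sectors(sample)
print(sectors)
print(strides(sectors))  # [8, 8, 8, 8]
```

Saving the list from each check and comparing them as sets (`set(run1) == set(run2)`) would show whether the same sectors recur from test to test.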

 

I'll start one tonight with error correcting. (Should I also disable the Parity drive Spin-Down? Or would we prefer to test variables one at a time controlled?).

I'll report back my findings, I suspect I'll have the same results.

I'll then run another Parity check, With or Without correcting? With or Without Spin-down on the Parity?

 

Thanks,

 

-D

Link to comment

Whether or not you are experiencing the same issue, it sounds as if you have not really been paying attention to whether or not your parity errors were zero in the past. The only correct answer for parity errors is zero, not some low number. If you don't have some real reason to believe parity is more correct than the data disks you should correct parity errors and then check again to make sure they were corrected.
Link to comment

My Parity checks were always Zero, for the 3 years I've had my unRAID server running.  It was only about 3-4 months ago I remember seeing errors, again I think it was 3, over and over.  That being said, I know my data is not in a controlled environment, due to my lack of time to investigate and troubleshoot.  I just found it ironic that I also now have 5 parity check errors, and I am willing to help troubleshoot if I as well may be seeing the same occurrence.

 

I'm not sure I fully understand your reply.  Are you saying I should set the monthly parity to correct errors?  If so, what if hypothetically I have a failing disk and the parity corrects itself with incorrect data? (Again not sure if you were answering that question)

Link to comment

For the monthly parity check, a non-correcting check is fine for the reason you state, but if there is a parity error then something should be done to correct it, whether by replacing a data disk or doing a correcting parity check. I wasn't sure from your initial description whether or not you had been neglecting parity errors.

 

The apparent parity errors which are the subject of this thread are a different matter, since a correcting parity check does not appear to correct them. In fact it is not clear whether there are any actual parity errors or whether the issue would impact attempts to rebuild a data disk. You may indeed be experiencing the same issue, so please continue to contribute to the discussion.

Link to comment

There have been several folks now who have experienced this issue => although the specific addresses noted differ; one thing is very consistent ... the 5 errors are always a set of sectors that differ by 8.

 

The weird "fix" of simply setting the parity drive to not spin down is REALLY strange.  It definitely makes me wonder if there are indeed any errors at all.  [I believe it was Justin who actually tried correcting checks and they had no impact -- which indeed suggests there aren't any real errors to correct => and that this may in fact simply be a reporting glitch.]

 

 

Link to comment

... Are you saying I should set the monthly parity to correct errors?  If so, what if hypothetically I have a failing disk and the parity corrects itself with incorrect data?

 

Personally, I never run a non-correcting check.  If there are errors, I want them fixed.  It's far more likely that a parity error is an actual error in the parity bit than a data error on one of your data disks (at least unless that data disk has been reporting errors).    If you have a failing disk with reported errors that's a different story ... in that case you should replace the bad disk and let it rebuild BEFORE you do another parity check.

 

Link to comment

Just a quick update.

 

Ran the correcting parity the night before, unRAID reported errors, and corrected them.

Ran another correcting parity check, without changing drive Spin-Down, no errors reported.

 

It appears my errors are/were not the same as the OP's.

 

Hope you guys figure this out.

 

-D

Link to comment

I had to shut down the server a few times yesterday, including one hard boot when unRAID froze up completely.

 

This caused a parity check upon startup.

 

Issue has returned :( :( :(

 

*I also suspect it's just a reporting error, not 'real' errors, but still, it's annoying

 

A parity error means that, on those specific sectors, the parity is not aligned with the disk values.

 

This can mean two things - either parity has not been maintained properly, or a disk is being updated outside of parity protection (e.g., hardware issue, updating disk using its device (like sdf1)).

 

If parity is not being maintained properly, a correcting parity check will fix it. It will align parity to the values on the data disks. This can legitimately happen, for example, after a server crash or power cut.
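To make that concrete, here is a toy model of a check versus a correcting check. It assumes single parity is a plain byte-wise XOR across the data disks, which is a simplification for illustration and not unRAID's actual md driver code; `xor_parity` and `check_parity` are made-up names.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def check_parity(data_blocks, parity_block, correct=False):
    """Return (ok, parity).

    ok is True when the stored parity matches the data blocks. With
    correct=True, a mismatched parity is replaced by the value recomputed
    from the data blocks, i.e. the data disks are trusted.
    """
    expected = xor_parity(data_blocks)
    if expected == parity_block:
        return True, parity_block
    return False, (expected if correct else parity_block)


# Three tiny "data disks" holding two bytes each (illustrative values).
disks = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_parity(disks)

# Simulate a parity disk whose first byte is out of sync with the data.
stale = b"\x00" + parity[1:]

print(check_parity(disks, parity))               # in sync
print(check_parity(disks, stale))                # mismatch, left as-is
print(check_parity(disks, stale, correct=True))  # mismatch, parity rewritten
```

Note that the correcting branch always rewrites parity from whatever is on the data disks. It cannot tell which side is wrong, which is why a data disk that is silently being corrupted is the bigger worry.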

 

But worse and of more immediate concern, is if one of my drives is being corrupted. I think this is very unlikely, and RobJ (if he is watching), would be giving all sorts of reasons that could not happen. But if it were me, that would be my #1 concern with recurring parity errors.

 

To figure that out, I would create MD5s on all of my disks, and perform checks to see if the MD5s are changing. If they are, that is a very bad problem. Things that come to mind are memory errors, disk controller failures, and other hardware level issues.
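A rough sketch of what that MD5 check could look like. The helper names are hypothetical, and the mount point you pass in (for example a per-disk path such as `/mnt/disk1`) is an assumption about your setup, not something taken from this thread.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its MD5 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = md5_of_file(full)
    return manifest


def changed_files(old, new):
    """Return sorted paths present in both manifests whose digests differ."""
    return sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
```

Run `build_manifest` once per disk, keep the result (for example with `json.dump`), then rebuild it after the next parity check and pass both to `changed_files`. Files you edited on purpose will show up too, so only unexpected entries point to corruption.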

 

I have seen, over the years, situations where specific parity locations "flip flop". For example, you run a correcting parity check and it reports 5 errors. Then you run it again and the exact same 5 errors are found. What this actually means is that the first parity check detected a parity error that did not truly exist, so its correction was wrong. The second parity check works properly, noting that parity is in fact out of sync with the data, and corrects it. This type of thing is often traced back to a hardware issue like a memory error.

Link to comment

A parity error means that, on those specific sectors, the parity is not aligned with the disk values.

 

I suspect you've not read this whole thread ;)

 

Guilty, but after an end-to-end read, I think many of my comments are still relevant.

 

I cannot explain the fact that changing the parity drive spindown would affect parity errors. Never seen that.

 

I cannot explain two separate systems reporting the exact same parity error locations. Firmware bug in a controller?

 

I cannot explain the mysterious "5 parity errors". Nothing special about 5. Having users say "the same 5 parity errors" is an oversimplification.

 

I CAN explain two consecutive parity checks reporting the same issues (see my prior post). This has happened a number of times.

 

I can also explain the difference of 8 between the blocks. It's just the way unRAID works. unRAID is not doing sector level checks - it is checking on blocks of 8 sectors.
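The arithmetic behind the spacing of 8, as a small sketch. It assumes 512-byte sectors, which is my assumption and not something stated in this thread; under that assumption each 8-sector block is 4096 bytes. The sector numbers are the ones quoted from the syslog earlier in the thread.

```python
SECTOR_BYTES = 512       # assumed logical sector size
SECTORS_PER_BLOCK = 8    # block granularity described above


def block_index(sector):
    """Return the index of the 8-sector block that contains this sector."""
    return sector // SECTORS_PER_BLOCK


def byte_offset(sector):
    """Return the byte offset of a sector, given the assumed sector size."""
    return sector * SECTOR_BYTES


reported = [1177606472, 1177606480, 1177606488, 1177606496, 1177606504]
blocks = [block_index(s) for s in reported]

print(blocks)                          # five consecutive block indices
print(byte_offset(SECTORS_PER_BLOCK))  # 4096 bytes per block
```

So "5 errors, 8 sectors apart" is the same thing as 5 consecutive blocks, one reported error per block.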

 

My concern would continue to be whether files are corrupted. MD5 checks are something I have instituted. I absolutely want to know if my files are getting corrupted (be it related to this or some future issue), and exactly what files are impacted. Users with this problem don't know if parity is just getting broken, or if actual data files are getting scrambled and the parity errors are just the symptom. THIS WOULD BE THE FIRST THING I WOULD DO.

 

Frustrating one. Good luck!

Link to comment

Yeah, MD5 checking is something on my short list, but I'm not there yet, and suspect it's going to take a while to get the initial data read/setup.

 

After seeing several users reporting the same basic issue, I really feel this is an unRAID reporting problem/bug, not a real data failure problem; but I don't know that for sure either.

 

Hopefully the growing list of affected users, and their supplied diagnostic data will be enough for LimeTech to get to the bottom of this issue.

 

Thinking on it a bit more; I wonder if our spin down settings relate to the time when these sectors 'fail'.

 

I'll change the default spin down setting for my array, then run another test.  Perhaps the sectors failing will change?

Link to comment

A parity error means that, on those specific sectors, the parity is not aligned with the disk values.

 

I suspect you've not read this whole thread ;)

 

Guilty

 

It's also not the only thread discussing this very interesting problem.

 

It's VERY clear that (a) the errors aren't real;  (b)  it's not due to a specific controller (not everyone having the issue has the same controller ... although I don't know if it may be a common SATA chipset);  (c) disabling spindown for the parity drive makes it "go away" (REALLY strange !!);  and (d) it's very frustrating to those who are having the issue.

 

The problem is always 5 consecutive blocks;  it doesn't change after a correcting check;  and it "goes away" if  parity drive spindown is disabled (which is why I'm pretty well convinced the errors aren't real).

 

 

... I'll change the default spin down setting for my array, then run another test.  Perhaps the sectors failing will change?

 

Good idea.  If somehow the spindown timer is expiring for the parity drive (it shouldn't ... but these errors also shouldn't be happening) and generating these "errors", then changing the time when that happens may indeed change the sector values.    Won't resolve it -- but if that turns out to be the case; and we know that disabling spindown "fixes" it; then that may be enough of a clue for LimeTech to figure out what's happening.

 

What's really strange is that it's not a much more universal issue => there almost has to be something common amongst those who are having this problem; but it's not at all clear what that might be.

 

 

 

Link to comment

I changed the system spindown from 30 minutes to 45 minutes, parity check finished without any errors.  I have not rebooted again yet to test again, but will try to do so soon.

 

tl;dr: The first time, I changed the parity disk from system default to never and got no errors; changed back and still got no errors; rebooted and the errors returned.  This time I changed the system spindown setting from 30 minutes to 45, left the parity disk at system default, and got no errors.

Link to comment
