Unraid


Uncorrectable parity errors?


  • Community Expert

Running checks multiple times always gives me entries like these:

Aug  4 08:16:21 F kernel: mdcmd (36): check correct
Aug  4 08:16:21 F kernel: md: recovery thread: check P ...
Aug  4 08:51:21 F kernel: md: recovery thread: P corrected, sector=796917656
Aug  4 08:51:21 F kernel: md: recovery thread: P corrected, sector=796917664
Aug  4 08:51:21 F kernel: md: recovery thread: P corrected, sector=796917672
Aug  4 08:51:21 F kernel: md: recovery thread: P corrected, sector=796917680
Aug  4 08:51:21 F kernel: md: recovery thread: P corrected, sector=796917688

These 5 errors are always the same; there are no other messages, no read errors, and no SMART errors.

It says "P corrected", but that's obviously a lie.

What to do?

f-diagnostics-20220804-0908.zip

  • Community Expert

That suggests a problem with a controller or disk; unfortunately, there's no easy way to tell which without testing them one by one.

  • Author
  • Community Expert

Now, on the 5th run (thank god it only takes an hour to reach the range where the "errors" showed up), the messages have vanished (at least so far, and that region has already been passed).

But why did 4 runs fail to correct them?!

And why did they show up at all? There was no power outage or improper shutdown.

It's a bit strange...

1 hour ago, JorgeB said:

a problem with a controller or disk

I've ordered some new 18 TB drives to replace an old, "slow as a dog" 10 TB one, but I doubt this will solve the parity problem.

The controller shouldn't be the problem either: the box has 4 separate SATA controllers, and each currently has only one drive attached. I will put the new drives on one of the other controllers and stop using the 4th one for now.

  • Community Expert

disk3 has only passed a short self-test; no other disks have had self-tests.

  • Author
  • Community Expert
20 minutes ago, trurl said:

disk3 has only passed a short self-test; no other disks have had self-tests.

This is only an optical illusion. I do my long-term burn-in tests for new or reused drives in a separate machine, so normal operation is not interrupted.

If the disks do not show any problems, they can move over to the production machine.

Disk3 showed some slowdowns recently, so I ran some short tests.

It will be replaced soon. The replacements should arrive tomorrow, and after a day or so of testing, disk3 will be pulled and a new drive put in.

(Apart from those slowdowns (which have also vanished for now, very mystical...), there were no read errors, no seek errors, and no sector reassignments.)

And the main question is still unsolved: why was the parity not corrected even though UNRAID tells me that it was?

  • Community Expert
8 minutes ago, MAM59 said:

why was the parity not corrected even though UNRAID tells me that it was?

There's no evidence it wasn't corrected, just that there were errors again in the next check. This was common, for example, for some users with a SAS2LP controller and certain disks: after every check there were the same 5 sync errors.

  • Author
  • Community Expert

Hmm, I don't have any SAS2LP controller, just plain SATA ones (2 onboard and 2 simple 4-port cards taken from the recommendations on this forum).

The only SAS-like thing here is the backplane; it uses 4 SAS connectors (4 drives each) with "reverse SATA to SAS" cables.

For now I will forget about those "errors", but I will check monthly whether they reappear.

  • Community Expert
3 hours ago, MAM59 said:

This is only an optical illusion.

Apparently not the same as the self-tests the drive firmware does, since your tests are not logged in the SMART report. And burn-in tests, of course, don't say anything about how things are working now.

15 minutes ago, MAM59 said:

For now I will forget about those "errors", but I will check monthly whether they reappear.

Exactly zero sync errors is the desired result. If parity isn't all correct, how can it be expected to rebuild all of a disk correctly?
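A minimal sketch of why parity that isn't all correct matters for a rebuild, assuming Unraid's P parity behaves like plain byte-wise XOR across the data disks (the usual single-parity scheme). The disk contents and the `xor_blocks` helper are hypothetical: a failed disk is reconstructed as the XOR of parity and every surviving disk, so one stale parity bit lands directly in the rebuilt data.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (assumed model of single P parity)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one 4-byte block.
d1 = b"\x10\x20\x30\x40"
d2 = b"\x01\x02\x03\x04"
d3 = b"\xaa\xbb\xcc\xdd"
parity = xor_blocks(d1, d2, d3)

# Rebuild a failed disk2 from parity plus the surviving disks: correct parity, correct data.
rebuilt = xor_blocks(parity, d1, d3)
print(rebuilt == d2)  # True

# Same rebuild, but with one stale parity bit (an uncorrected sync error).
stale_parity = bytes([parity[0] ^ 0x01]) + parity[1:]
rebuilt_from_stale = xor_blocks(stale_parity, d1, d3)
print(rebuilt_from_stale == d2)  # False: byte 0 of the rebuilt disk is wrong
```

Nothing flags the bad byte during the rebuild itself; the reconstructed disk simply differs from what was originally stored.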

  • Author
  • Community Expert
1 minute ago, trurl said:

Exactly zero sync errors is the desired result.

The current run has zero errors.

For a long time I had zero errors on every monthly check.

The 5 errors just appeared again this week and did not vanish after the first three reruns.

Now, in the fourth run, they are gone again.

I already had these mystical 5 errors early last year (I don't remember whether they were the same sectors). Like now, they went away after a few retries and then stayed away for almost a year.

I think this is rather strange behaviour, so I asked here.

But of course, all this may be Murphy's Law #452 ("Shit happens") and just random...

  • 4 weeks later...
  • Author
  • Community Expert

and here we go again 😞

New month, old errors:

Sep  1 00:26:24 F kernel: md: recovery thread: P corrected, sector=796917656
Sep  1 00:26:24 F kernel: md: recovery thread: P corrected, sector=796917664
Sep  1 00:26:24 F kernel: md: recovery thread: P corrected, sector=796917672
Sep  1 00:26:24 F kernel: md: recovery thread: P corrected, sector=796917680
Sep  1 00:26:24 F kernel: md: recovery thread: P corrected, sector=796917688

 

The same sectors as before are flagged. As usual, if I stop the run ("corrected") and start from scratch, they are gone, but next month they are back again.
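Whether the same sectors really recur can be checked mechanically from the two syslog excerpts quoted in this thread. A small Python sketch; the regex and the assumption that `sector=` counts 512-byte sectors are illustrative assumptions, not taken from Unraid documentation:

```python
import re

# Matches the "P corrected, sector=N" lines quoted in this thread.
SECTOR_RE = re.compile(r"P corrected, sector=(\d+)")
SECTOR_BYTES = 512  # assumption: sector numbers count 512-byte sectors

AUG = """\
Aug  4 08:51:21 F kernel: md: recovery thread: P corrected, sector=796917656
Aug  4 08:51:21 F kernel: md: recovery thread: P corrected, sector=796917664
Aug  4 08:51:21 F kernel: md: recovery thread: P corrected, sector=796917672
Aug  4 08:51:21 F kernel: md: recovery thread: P corrected, sector=796917680
Aug  4 08:51:21 F kernel: md: recovery thread: P corrected, sector=796917688
"""

SEP = """\
Sep  1 00:26:24 F kernel: md: recovery thread: P corrected, sector=796917656
Sep  1 00:26:24 F kernel: md: recovery thread: P corrected, sector=796917664
Sep  1 00:26:24 F kernel: md: recovery thread: P corrected, sector=796917672
Sep  1 00:26:24 F kernel: md: recovery thread: P corrected, sector=796917680
Sep  1 00:26:24 F kernel: md: recovery thread: P corrected, sector=796917688
"""


def corrected_sectors(log_text: str) -> list[int]:
    """Return the sector number of every 'P corrected' line, in log order."""
    return [int(m.group(1)) for m in SECTOR_RE.finditer(log_text)]


aug, sep = corrected_sectors(AUG), corrected_sectors(SEP)
gaps = {nxt - cur for cur, nxt in zip(aug, aug[1:])}
step = min(gaps)  # distance between consecutive entries, in sectors
span_bytes = (aug[-1] - aug[0] + step) * SECTOR_BYTES  # assumes the last entry covers one step too

print("same sectors in Aug and Sep:", aug == sep)        # True
print("step between entries (sectors):", gaps)           # {8}
print("first byte offset:", aug[0] * SECTOR_BYTES)       # 408021839872
print("bytes covered:", span_bytes)                      # 20480
```

On these excerpts both months list the same five sectors, 8 sectors apart, which under the 512-byte assumption is one contiguous 20 KiB region at a byte offset of roughly 408 GB. The log still names no disk or controller.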

 

What's wrong???

(Again, there are no disk errors, not even warnings; it looks like UNRAID is producing the errors by itself!)

  • Community Expert

Most likely a controller or one of the disks; unfortunately, there's no easy way to test unless you start swapping them.

  • Author
  • Community Expert

but it does not even tell WHERE the error should be?

Swapping is not really an option: I don't have any spare 18 TB drives right now, and I don't want to take one out of the backup server.

In general, the "errors" don't worry me much; they seem to be a fata morgana.

What makes me angry is that they show up again after a long period of normal operation (there were no outages, no read/write errors, or anything else in between; I think I booted the box once in that period).

  • Community Expert
42 minutes ago, MAM59 said:

but it does not even tell WHERE the error should be?

It's not possible due to how parity works, it's just possible to know that it's wrong.
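To illustrate why single parity can only detect the problem: a minimal Python sketch, assuming P is plain byte-wise XOR, where flipping the same bit on any one data disk, or on the parity disk itself, gives exactly the same non-zero mismatch. Block contents and helper names are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (assumed model of single P parity)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def syndrome(data: list[bytes], parity: bytes) -> bytes:
    """XOR of all data blocks and parity: all zero means the stripe is consistent."""
    return xor_blocks(*data, parity)


def flip(block: bytes, index: int = 0, mask: int = 0x01) -> bytes:
    """Return a copy of block with the masked bit of one byte inverted."""
    out = bytearray(block)
    out[index] ^= mask
    return bytes(out)


data = [b"\x10\x20", b"\x01\x02", b"\xaa\xbb"]  # three hypothetical data disks
parity = xor_blocks(*data)
print(syndrome(data, parity))  # b'\x00\x00': consistent stripe

# Corrupt the same bit in each possible location, one at a time.
results = {}
for k in range(len(data)):
    corrupted = list(data)
    corrupted[k] = flip(corrupted[k])
    results[f"disk{k + 1}"] = syndrome(corrupted, parity)
results["parity"] = syndrome(data, flip(parity))

# Every location yields the identical syndrome b'\x01\x00'.
print(results)
```

With only P there is nothing to tell those four cases apart, which matches the log reporting a sector but never a disk.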

  • Author
  • Community Expert
11 minutes ago, JorgeB said:

It's not possible due to how parity works, it's just possible to know that it's wrong.

Yeah, that's clear, but still very unsatisfying...

Obviously there must have been a bad write to these specific sectors (I don't think there was ANY hardware problem with a disk, cable, or controller). Maybe the way the parity is calculated is .... hmm... not deterministic?

(Which still doesn't answer whether it happens on writes or reads.)

But always the same sector numbers? That can't be accidental.

  • Community Expert
3 hours ago, MAM59 said:

Maybe the way the parity is calculated is .... hmm... not deterministic?

It's deterministic and a very simple calculation. It must be getting different input data due to hardware issue.
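trurl's point can be made concrete with a hypothetical model of a correcting check on one stripe (a simplification, not Unraid's actual md code). Parity is a pure function of the bytes read, so a mismatch requires different input. The sketch also shows that if one pass in `check correct` mode receives a bad read, it rewrites parity from the bad data, and the next clean pass then reports a sync error again while putting the correct value back. The flaky read on disk2 is an assumption made purely for illustration.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (assumed model of P parity)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def check_correct_pass(read_blocks: list[bytes], stored_parity: bytes) -> tuple[bytes, bool]:
    """One hypothetical 'check correct' pass over a single stripe.

    Recomputes P from whatever the disks returned on this pass. On a mismatch
    the pass reports a sync error and overwrites the stored parity.
    """
    computed = xor_blocks(*read_blocks)
    if computed != stored_parity:
        return computed, True
    return stored_parity, False


on_disk = [b"\x10\x20", b"\x01\x02", b"\xaa\xbb"]  # what is really stored
parity = xor_blocks(*on_disk)                      # parity starts out correct

# Hypothetical flaky path: on the first pass only, disk2's second byte reads back wrong.
bad_read = [on_disk[0], b"\x01\x03", on_disk[2]]
reads_per_pass = [bad_read, on_disk, on_disk]

errors_per_pass = []
for reads in reads_per_pass:
    parity, error = check_correct_pass(reads, parity)
    errors_per_pass.append(error)

print(errors_per_pass)                 # [True, True, False]
print(parity == xor_blocks(*on_disk))  # True: parity is back to the correct value
```

In this model, one transient misread produces mismatches on two consecutive passes before a clean one; it does not identify where the bad read came from.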

  • Author
  • Community Expert
1 hour ago, trurl said:

It must be getting different input data due to hardware issue.

Almost impossible. But then...

Another try:

  • What is the difference between running the check automatically and starting it manually?

(Yeah, I know, there SHOULD be NONE, but if I wait for the autostart every month I get these 5 errors, while starting a run manually gives me ZERO errors...)

  • Community Expert

It should be the same. Disable the scheduled check and run a manual one next month. Note that some errors of this type, when they are controller related, have been known to only happen after a reboot, i.e., if you run two consecutive checks without rebooting there won't be any errors, but if you reboot there will be.

  • Author
  • Community Expert
1 minute ago, JorgeB said:

have been known to only happen after a reboot

Hmm... doesn't really sound convincing... but OK, I will disable the scheduler and launch the check manually next time.