[SOLVED] parity errors on check; same locations

May 8, 2012

Unraid 4.6; 10 drives (the first 6, logically and physically, through onboard SATA, then 2 through a Sil3132, then 2 through another Sil3132).

I had no parity drive for a long time (long story, no lectures please; I understand the value). But with this config, I had previously used parity drives and run parity checks, and they came back clean.

Now:

- I do a parity check, and I come up with 25 or so errors.

- I repeat the check, and there are 24 errors, in the same places.

- I change memory from 16GB down to a smaller 1GB config that also previously worked cleanly

- I do a parity check, and I get 7 errors

- Repeat the parity check, same 7 errors in the same locations:

root@Tower:/var/log# cat syslog | grep parity

May 7 22:28:52 Tower kernel: md: recovery thread checking parity...

May 7 22:29:06 Tower kernel: md: parity incorrect: 1457392

May 7 22:30:32 Tower kernel: md: parity incorrect: 11666128

May 7 22:31:07 Tower kernel: md: parity incorrect: 16181704

May 7 22:31:09 Tower kernel: md: parity incorrect: 16373960

May 7 22:34:02 Tower kernel: md: parity incorrect: 38137720

May 8 00:46:17 Tower kernel: md: parity incorrect: 1033482040

May 8 00:53:42 Tower kernel: md: parity incorrect: 1091494984

What should I look for? I'm grinding on the memory with memtest86+ right now, and I'm thinking later I should grind on it with google's stressapptest (perhaps it's the pathways at fault more than the memory itself).
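To confirm that two checks really report the same locations, the "parity incorrect" lines can be grouped per run and compared. This is only a rough sketch: the function names are made up for illustration, the regular expressions are based solely on the log lines pasted above, and the second run is simulated by repeating the first one (the post says the locations matched, but that log is not shown).

```python
import re

# Patterns based only on the syslog lines quoted in this post.
START_RE = re.compile(r"md: recovery thread checking parity")
ERROR_RE = re.compile(r"md: parity incorrect: (\d+)")

def split_runs(lines):
    """Group reported error locations by parity-check run."""
    runs = []
    for line in lines:
        if START_RE.search(line):
            runs.append([])              # a new check started
            continue
        m = ERROR_RE.search(line)
        if m:
            if not runs:                 # errors seen before any start marker
                runs.append([])
            runs[-1].append(int(m.group(1)))
    return runs

def same_locations(run_a, run_b):
    """True when both runs flagged exactly the same locations."""
    return set(run_a) == set(run_b)

SAMPLE = [
    "May 7 22:28:52 Tower kernel: md: recovery thread checking parity...",
    "May 7 22:29:06 Tower kernel: md: parity incorrect: 1457392",
    "May 7 22:30:32 Tower kernel: md: parity incorrect: 11666128",
    "May 7 22:31:07 Tower kernel: md: parity incorrect: 16181704",
    "May 7 22:31:09 Tower kernel: md: parity incorrect: 16373960",
    "May 7 22:34:02 Tower kernel: md: parity incorrect: 38137720",
    "May 8 00:46:17 Tower kernel: md: parity incorrect: 1033482040",
    "May 8 00:53:42 Tower kernel: md: parity incorrect: 1091494984",
]

# Assumption: the repeat run is simulated by duplicating the sample.
runs = split_runs(SAMPLE + SAMPLE)
print(len(runs), same_locations(runs[0], runs[1]))
```

On a real box the input would be the lines of /var/log/syslog rather than the SAMPLE list.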


May 8, 2012

How are you invoking the parity "check"?

(I'm trying to determine whether you are performing a "correcting" check or a "nocorrect" check.)

Your symptoms, a set of blocks found incorrect and then found incorrect again on a subsequent check, could mean either:

You are performing a NOCORRECT type of check, in which case the exact same blocks will show up again and again until you correct them.

OR

You are performing a correcting type of check and something in the hardware is intermittent: the FIRST parity check changed parity to what it thought was correct based on incorrectly returned data, and the second parity check read correct data and so changed parity once more on the same blocks, restoring parity to what it should be.

If that were the case, a third parity check would then find no errors.
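Both scenarios can be illustrated with a toy model. This is a simplified sketch, not how the md driver is implemented: it assumes plain XOR parity across the data drives, uses tiny integers as "blocks", and every name in it is invented for the example.

```python
from functools import reduce
from operator import xor

def expected_parity(stripe):
    """XOR of the data blocks at one position across all data drives."""
    return reduce(xor, stripe, 0)

def check(data_reads, parity, correct):
    """Return (positions with bad parity, parity as stored after the check)."""
    parity = list(parity)
    bad = []
    for pos, stripe in enumerate(zip(*data_reads)):
        want = expected_parity(stripe)
        if parity[pos] != want:
            bad.append(pos)
            if correct:                  # only a correcting check rewrites parity
                parity[pos] = want
    return bad, parity

drives = [[1, 2, 3, 4], [5, 6, 7, 8]]    # two toy data drives
good_parity = [expected_parity(s) for s in zip(*drives)]

# Scenario 1: parity is stale at position 2 and the check is NOCORRECT.
stale = list(good_parity)
stale[2] ^= 0xFF
bad1, p = check(drives, stale, correct=False)
bad2, p = check(drives, p, correct=False)    # same position reported again
bad3, p = check(drives, p, correct=True)     # reported once more, then fixed
bad4, p = check(drives, p, correct=True)     # clean

# Scenario 2: correcting checks, but one read returns bad data the first time.
flaky = [list(d) for d in drives]
flaky[0][1] ^= 0x10                          # intermittent bad read at position 1
c1, q = check(flaky, good_parity, correct=True)  # "fixes" parity to a wrong value
c2, q = check(drives, q, correct=True)           # good read: same spot, fixed back
c3, q = check(drives, q, correct=True)           # clean

print(bad1, bad2, bad3, bad4)
print(c1, c2, c3)
```

In both scenarios the same position is reported on consecutive checks, which is why the error list alone does not tell the two cases apart; the type of check that was run does.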

Joe L.


May 8, 2012

Author

Actually, funny you mention it. The parity check started on its own when I booted the server.

But at least one of them I invoked manually by pressing the big "check parity" button in the web GUI.

Keep in mind this is 4.6 -- How do I know what kind of check I'm running? I don't think I saw that in /var/log/ ...

I'm starting to think that in the case of the 2 consecutive checks, what I saw was a "NOCORRECT" check, followed by an implicit "CORRECT" check (from the web GUI button). Perhaps a third check now will come back clean?


May 9, 2012

Author

Ok, as Joe suspected, I think what was happening was that unclean shutdowns were inducing a NOCORRECT parity check on startup.

I see the command now in the log (when I press the "parity check" button in the GUI):

May 8 20:30:02 Tower kernel: mdcmd (26): check CORRECT

And this check ran with no errors. So, we're good here. Thanks :-)
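For anyone else wondering which kind of check ran, the mdcmd line can be pulled out of the syslog. A small sketch follows; the only format actually confirmed in this thread is the "check CORRECT" line above. The "check NOCORRECT" line in the sample is an assumption about what a non-correcting check logs, and the function name is made up.

```python
import re

# Based on the one confirmed line: "... kernel: mdcmd (26): check CORRECT"
MDCMD_RE = re.compile(r"mdcmd \(\d+\): check(?:\s+(\S+))?")

def check_modes(lines):
    """Return the mode of every parity check command found in the log lines."""
    modes = []
    for line in lines:
        m = MDCMD_RE.search(line)
        if m:
            # A bare "check" with no argument is reported as UNSPECIFIED.
            modes.append((m.group(1) or "UNSPECIFIED").upper())
    return modes

SAMPLE = [
    # Hypothetical line: assumed format for a non-correcting check.
    "May 8 18:00:00 Tower kernel: mdcmd (12): check NOCORRECT",
    # Real line from this thread.
    "May 8 20:30:02 Tower kernel: mdcmd (26): check CORRECT",
]

print(check_modes(SAMPLE))
```

On the server itself, the lines would come from /var/log/syslog instead of SAMPLE.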
