Parity check error

February 2, 2017

I had my monthly parity check and it came back with 1 error. So I ran it again with the correct-errors option checked, and then ran it again without it, and I'm still getting:

Event: unRAID Parity check

Subject: Notice [TOWER] - Parity check finished (1 errors)

Description: Duration: 18 hours, 34 minutes, 18 seconds. Average speed: 119.7 MB/s

Importance: warning
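As a quick sanity check on those figures, here is a minimal Python sketch, assuming the notification's "MB" means 10^6 bytes. 18 h 34 min 18 s is 66,858 s, the same runtime the md driver logs as "time=66858sec" at the end of the manual, correcting check quoted later in this thread. 66,858 s at 119.7 MB/s comes to roughly 8 TB, which would fit an 8 TB parity disk, though the disk size is an inference and is not stated anywhere in the thread.

```python
# Cross-check a parity-check notification: runtime x average speed ~= data scanned.
# Assumption: "MB" in the notification means 10**6 bytes.


def duration_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert hours/minutes/seconds to whole seconds."""
    return hours * 3600 + minutes * 60 + seconds


def scanned_tb(seconds: int, mb_per_s: float) -> float:
    """Approximate amount covered by the check, in decimal terabytes."""
    return seconds * mb_per_s / 1_000_000  # 10**6 MB per TB


secs = duration_seconds(18, 34, 18)
print(secs)                                # 66858
print(round(scanned_tb(secs, 119.7), 1))   # 8.0
```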

I had this last month as well, as I tend to run monthly parity checks.

I'm currently at work and will post the syslog when I get home. I'm not sure why I keep getting this.

Thanks


February 2, 2017

... will post syslog when I get home...

Please don't post syslog.

Go to Tools - Diagnostics and post the complete diagnostics zip.


February 2, 2017

Author

Diagnostics attached.

tower-diagnostics-20170202-1851.zip


February 3, 2017

Your syslog only shows two parity checks, not three as you suggested. First, a scheduled check runs at midnight:

Feb 1 00:00:01 Tower kernel: mdcmd (59): check

Feb 1 00:00:01 Tower kernel: md: recovery thread: check P ...

...

Feb 1 01:27:36 Tower kernel: md: recovery thread: P corrected, sector=1072051656

...

Feb 1 18:52:49 Tower kernel: md: sync done. time=67967sec

Feb 1 18:52:49 Tower kernel: md: recovery thread: completion status: 0

then a manual check is run:

Feb 1 19:41:34 Tower kernel: mdcmd (64): check correct

Feb 1 19:41:34 Tower kernel: md: recovery thread: check P ...

...

Feb 1 21:09:09 Tower kernel: md: recovery thread: P corrected, sector=1072051656

...

Feb 2 14:15:52 Tower kernel: md: sync done. time=66858sec

Feb 2 14:15:53 Tower kernel: md: recovery thread: completion status: 0

and the log ends a few hours later, so I don't see your third, non-correcting check.
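Pulling these lines out of a large syslog by hand is tedious. A hypothetical Python helper along these lines groups the md messages by check, recording the mdcmd argument, any corrected sectors and the runtime. The function name is made up for illustration, and the regexes are written only against the line formats quoted above, so other driver messages are simply ignored.

```python
import re

# Hypothetical helper; the patterns only cover the line shapes quoted in this thread.
CHECK_START = re.compile(r"mdcmd \(\d+\): check(?: (\S+))?$")
CORRECTED = re.compile(r"recovery thread: P corrected, sector=(\d+)")
SYNC_DONE = re.compile(r"sync done\. time=(\d+)sec")


def summarize_checks(lines):
    """Return one dict per check: mdcmd argument, corrected sectors, runtime."""
    checks = []
    for line in lines:
        line = line.rstrip()
        m = CHECK_START.search(line)
        if m:
            # A new "mdcmd (N): check ..." line starts a new record.
            checks.append({"mode": m.group(1), "sectors": [], "seconds": None})
            continue
        if not checks:
            continue  # ignore anything logged before the first check
        m = CORRECTED.search(line)
        if m:
            checks[-1]["sectors"].append(int(m.group(1)))
            continue
        m = SYNC_DONE.search(line)
        if m:
            checks[-1]["seconds"] = int(m.group(1))
    return checks


log = """\
Feb 1 00:00:01 Tower kernel: mdcmd (59): check
Feb 1 01:27:36 Tower kernel: md: recovery thread: P corrected, sector=1072051656
Feb 1 18:52:49 Tower kernel: md: sync done. time=67967sec
Feb 1 19:41:34 Tower kernel: mdcmd (64): check correct
Feb 1 21:09:09 Tower kernel: md: recovery thread: P corrected, sector=1072051656
Feb 2 14:15:52 Tower kernel: md: sync done. time=66858sec
""".splitlines()

for record in summarize_checks(log):
    print(record)
# {'mode': None, 'sectors': [1072051656], 'seconds': 67967}
# {'mode': 'correct', 'sectors': [1072051656], 'seconds': 66858}
```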

Now, what I find odd is that while your manual, correcting check puts

Feb 1 19:41:34 Tower kernel: mdcmd (64): check correct

in the log, your automated, non-correcting check puts

Feb 1 00:00:01 Tower kernel: mdcmd (59): check

in the log. Compare that with the output from my server, running the same version of unRAID (6.2.4), when it starts its automatic monthly non-correcting check:

Feb 1 05:00:01 Northolt kernel: mdcmd (216): check NOCORRECT

Feb 1 05:00:01 Northolt kernel:

Feb 1 05:00:01 Northolt kernel: md: recovery thread: check P Q ...

Is that difference significant? Is it because you have single parity and I have dual? I can't find anything else of any relevance, though a couple of other, probably unrelated issues caught my eye.
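The three forms of the mdcmd line quoted so far can be told apart mechanically. Below is a small hypothetical Python classifier; it deliberately reports a bare "check" as "unspecified" rather than guessing what the driver defaults to, since that is exactly the open question here.

```python
import re

# Hypothetical classifier for "mdcmd (N): check ..." lines. What the driver does
# for a bare "check" is not decided here; it is reported as "unspecified".
MDCMD_CHECK = re.compile(r"mdcmd \(\d+\): check(?:\s+(\S+))?\s*$", re.IGNORECASE)


def check_mode(line: str) -> str:
    m = MDCMD_CHECK.search(line)
    if m is None:
        return "not an mdcmd check"
    arg = (m.group(1) or "").upper()
    if not arg:
        return "unspecified"
    if arg == "NOCORRECT":
        return "non-correcting"
    if arg == "CORRECT":
        return "correcting"
    return f"unknown argument {arg!r}"


for line in [
    "Feb 1 00:00:01 Tower kernel: mdcmd (59): check",
    "Feb 1 19:41:34 Tower kernel: mdcmd (64): check correct",
    "Feb 1 05:00:01 Northolt kernel: mdcmd (216): check NOCORRECT",
]:
    print(check_mode(line))
# unspecified
# correcting
# non-correcting
```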

Every day at 04:00 the mover tries to move your system folder from the array to the cache but fails due to files being in use:

Jan 31 04:00:01 Tower root: mover started

Jan 31 04:00:01 Tower root: moving "s..m" to cache

Jan 31 04:00:01 Tower shfs/user0: err: shfs_rmdir: rmdir: /mnt/disk1/system/docker (39) Directory not empty

Jan 31 04:00:01 Tower move: rmdir: /mnt/user0/./system/docker Directory not empty

Jan 31 04:00:01 Tower shfs/user0: err: shfs_rmdir: rmdir: /mnt/disk1/system/libvirt (39) Directory not empty

Jan 31 04:00:01 Tower move: rmdir: /mnt/user0/./system/libvirt Directory not empty

Jan 31 04:00:01 Tower shfs/user0: err: shfs_rmdir: rmdir: /mnt/disk1/system (39) Directory not empty

Jan 31 04:00:01 Tower move: rmdir: /mnt/user0/./system Directory not empty

Jan 31 04:00:01 Tower root: mover finished

I recommend that you put it out of its misery once and for all by stopping your dockers and then stopping the docker service before running the mover manually. Then you can re-enable the docker service and start your dockers again.
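Since rmdir only fails with "Directory not empty" when something is still inside, it can help to see exactly what is left on the array copy of the share before and after running the mover. A minimal Python sketch; the helper name is hypothetical and the path is the one from the mover errors above.

```python
import os


def leftover_files(root):
    """Return sorted paths of every file still present anywhere under root."""
    found = []
    # os.walk yields nothing if root does not exist, so this returns [] then.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


if __name__ == "__main__":
    # /mnt/disk1/system is the array-side path from the mover errors above.
    for path in leftover_files("/mnt/disk1/system"):
        print(path)
```

If the list is empty after the mover has run with the docker service stopped, the move went through.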

The other thing I see is a lot of this:

Jan 31 20:56:31 Tower shfs/user: err: shfs_rmdir: rmdir: /mnt/cache/appdata/FoldingAtHome/work/02 (39) Directory not empty

Jan 31 20:56:31 Tower shfs/user: err: shfs_rmdir: rmdir: /mnt/cache/appdata/FoldingAtHome/work/02 (39) Directory not empty

Jan 31 20:57:31 Tower shfs/user: err: shfs_rmdir: rmdir: /mnt/cache/appdata/FoldingAtHome/work/02 (39) Directory not empty

It appears hundreds and hundreds of times. Perhaps a restart will fix it.
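To put a number on "hundreds and hundreds", a short Python sketch can tally the shfs rmdir failures per path. It matches only the shfs_rmdir line format quoted above, so the mover's own "move: rmdir:" lines are not double-counted; the (39) in those lines is the errno value, which on Linux is ENOTEMPTY.

```python
import re
from collections import Counter

# Matches the shfs rmdir failures quoted above; group 2 is the errno in parentheses.
RMDIR_ERR = re.compile(r"shfs_rmdir: rmdir: (\S+) \((\d+)\) Directory not empty")


def rmdir_failures(lines):
    """Count shfs 'Directory not empty' rmdir failures per path."""
    counts = Counter()
    for line in lines:
        m = RMDIR_ERR.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


sample = [
    "Jan 31 20:56:31 Tower shfs/user: err: shfs_rmdir: rmdir: /mnt/cache/appdata/FoldingAtHome/work/02 (39) Directory not empty",
    "Jan 31 20:57:31 Tower shfs/user: err: shfs_rmdir: rmdir: /mnt/cache/appdata/FoldingAtHome/work/02 (39) Directory not empty",
    "Jan 31 04:00:01 Tower move: rmdir: /mnt/user0/./system Directory not empty",
]
print(rmdir_failures(sample).most_common(5))
# [('/mnt/cache/appdata/FoldingAtHome/work/02', 2)]
```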

I don't see any disk or controller issues. I suggest you reboot, sort out your system share, and then run another parity check. It might as well be a correcting one. When it has finished, grab a new set of diagnostics and post them.


February 3, 2017

Both parity checks in the log are correcting checks. For some reason the scheduled one only shows "check", but you can tell both were correcting because the sync error was corrected, as in:

Feb 1 01:27:36 Tower kernel: md: recovery thread: P corrected, sector=1072051656

The same error being corrected twice rules out a memory issue and other random factors, but the disks look fine, so I have no clue.

PS: Unrelated, but this is the first time I've noticed this SMART attribute:

22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100

Curious to see how it holds up as years go by.
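For anyone else reading that line: it is one row of smartctl's attribute table (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). A minimal Python sketch that splits such a row and reports how far the normalized value sits above the threshold, assuming the usual SMART convention that an attribute counts as failing once VALUE drops to THRESH or below. The class and function names are hypothetical.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    flag: str
    value: int      # normalized current value
    worst: int      # lowest normalized value seen
    thresh: int     # vendor failure threshold
    attr_type: str  # Pre-fail or Old_age
    updated: str
    when_failed: str
    raw: str


def parse_attr(line: str) -> SmartAttr:
    """Split one row of smartctl's attribute table into named fields."""
    f = line.split()
    # RAW_VALUE is the last column and can itself contain spaces, so join the tail.
    return SmartAttr(int(f[0]), f[1], f[2], int(f[3]), int(f[4]), int(f[5]),
                     f[6], f[7], f[8], " ".join(f[9:]))


def headroom(a: SmartAttr) -> int:
    """Points left before VALUE reaches THRESH (assumed failing at VALUE <= THRESH)."""
    return a.value - a.thresh


row = "22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100"
attr = parse_attr(row)
print(attr.name, attr.value, attr.thresh, headroom(attr))
# Helium_Level 100 25 75
```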


February 3, 2017

This flip-flopping of the same sector back and forth has been seen before.

If I read this correctly, a correcting check found one error and fixed it.

And a second correcting check ran and found the same error and fixed it.

I believe, in truth, the first error was a misdetection. It could have been caused by a memory error, a disk error, a cable issue, or something else. But the misdetection caused parity to be wrong. One mistake in how many memory writes? It seems unlikely, but it does happen.

The second "fix" was correct, it was correctly undoing the first fix.

I think I remember this did turn out to be a memory error. ECC memory is a wonderful thing.
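The flip-flop explanation is easier to see with a toy model. This Python sketch assumes single parity P is a plain bytewise XOR of the data disks and that a correcting check simply overwrites P with whatever it computed from what it read. The read_fault parameter is a hypothetical knob standing in for a one-off memory, cable or disk error during the first check; it is not how unRAID itself is implemented.

```python
from functools import reduce


def parity(blocks):
    """Toy single parity: bytewise XOR across all data blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def correcting_check(blocks, stored_parity, read_fault=None):
    """Recompute parity from what was read and overwrite the stored copy on mismatch.

    read_fault is a hypothetical (disk, byte, bitmask) that flips bits in what the
    check *reads*, standing in for a one-off memory/cable/disk error; the data on
    disk is unchanged. Returns (parity_after_check, corrected).
    """
    seen = [bytearray(b) for b in blocks]
    if read_fault is not None:
        disk, offset, mask = read_fault
        seen[disk][offset] ^= mask
    computed = parity(seen)
    if computed != stored_parity:
        return computed, True
    return stored_parity, False


data = [b"\x10\x20", b"\x01\x02", b"\xff\x00"]  # three tiny "data disks"
good = parity(data)

# Check 1: a transient fault corrupts one read, so correct parity is "fixed" wrongly.
p1, fixed1 = correcting_check(data, good, read_fault=(0, 1, 0x04))
# Check 2: a clean read disagrees with the bad parity and writes the original back.
p2, fixed2 = correcting_check(data, p1)

print(fixed1, fixed2, p1 != good, p2 == good)
# True True True True
```

With a clean second read, the "correction" just restores the parity the first check overwrote, which is why the same sector shows up as corrected twice.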


February 3, 2017

The second "fix" was correct, it was correctly undoing the first fix.

I think I remember this did turn out to be a memory error. ECC memory is a wonderful thing.

I didn't think of that, but it makes sense. OP, time to run memtest.

