mickeykool Posted February 2, 2017
I had my monthly parity check and it came back with 1 error. So I ran it again with error correction checked, and then ran it again without error correction, and I'm still getting:
Event: unRAID Parity check
Subject: Notice [TOWER] - Parity check finished (1 errors)
Description: Duration: 18 hours, 34 minutes, 18 seconds. Average speed: 119.7 MB/s
Importance: warning
I had this last month as well, as I tend to run monthly parity checks. I'm currently at work and will post the syslog when I get home. I'm not sure why I keep getting this. Thanks
trurl Posted February 2, 2017
mickeykool wrote: "... will post syslog when I get home..."
Please don't post syslog. Go to Tools - Diagnostics and post complete diagnostics zip.
mickeykool Posted February 2, 2017 (Author)
Diagnostics attached. tower-diagnostics-20170202-1851.zip
John_M Posted February 3, 2017
Your syslog only shows two parity checks, not three as you suggested. First, a scheduled check runs at midnight:
Feb 1 00:00:01 Tower kernel: mdcmd (59): check
Feb 1 00:00:01 Tower kernel: md: recovery thread: check P ...
...
Feb 1 01:27:36 Tower kernel: md: recovery thread: P corrected, sector=1072051656
...
Feb 1 18:52:49 Tower kernel: md: sync done. time=67967sec
Feb 1 18:52:49 Tower kernel: md: recovery thread: completion status: 0
then a manual check is run:
Feb 1 19:41:34 Tower kernel: mdcmd (64): check correct
Feb 1 19:41:34 Tower kernel: md: recovery thread: check P ...
...
Feb 1 21:09:09 Tower kernel: md: recovery thread: P corrected, sector=1072051656
...
Feb 2 14:15:52 Tower kernel: md: sync done. time=66858sec
Feb 2 14:15:53 Tower kernel: md: recovery thread: completion status: 0
and the log ends a few hours later so I don't see your third, non-correcting check.
Now, what I find odd is that while your manual, correcting check puts
Feb 1 19:41:34 Tower kernel: mdcmd (64): check correct
in the log, your automated, non-correcting check puts
Feb 1 00:00:01 Tower kernel: mdcmd (59): check
in the log. Compare that with the output from my server running the same version 6.2.4 of unRAID when it starts its automatic monthly non-correcting check:
Feb 1 05:00:01 Northolt kernel: mdcmd (216): check NOCORRECT
Feb 1 05:00:01 Northolt kernel:
Feb 1 05:00:01 Northolt kernel: md: recovery thread: check P Q ...
Is that difference significant? Is it because you have single parity and I have dual?
I can't find anything else of any relevance, though a couple of other, probably unrelated issues caught my eye.
Every day at 04:00 the mover tries to move your system folder from the array to the cache but fails due to files being in use:
Jan 31 04:00:01 Tower root: mover started
Jan 31 04:00:01 Tower root: moving "s..m" to cache
Jan 31 04:00:01 Tower shfs/user0: err: shfs_rmdir: rmdir: /mnt/disk1/system/docker (39) Directory not empty
Jan 31 04:00:01 Tower move: rmdir: /mnt/user0/./system/docker Directory not empty
Jan 31 04:00:01 Tower shfs/user0: err: shfs_rmdir: rmdir: /mnt/disk1/system/libvirt (39) Directory not empty
Jan 31 04:00:01 Tower move: rmdir: /mnt/user0/./system/libvirt Directory not empty
Jan 31 04:00:01 Tower shfs/user0: err: shfs_rmdir: rmdir: /mnt/disk1/system (39) Directory not empty
Jan 31 04:00:01 Tower move: rmdir: /mnt/user0/./system Directory not empty
Jan 31 04:00:01 Tower root: mover finished
I recommend that you put it out of its misery once and for all by stopping your dockers and then stopping the docker service before running the mover manually. Then you can re-enable the docker service and start your dockers again.
The other thing I see is a lot of this:
Jan 31 20:56:31 Tower shfs/user: err: shfs_rmdir: rmdir: /mnt/cache/appdata/FoldingAtHome/work/02 (39) Directory not empty
Jan 31 20:56:31 Tower shfs/user: err: shfs_rmdir: rmdir: /mnt/cache/appdata/FoldingAtHome/work/02 (39) Directory not empty
Jan 31 20:57:31 Tower shfs/user: err: shfs_rmdir: rmdir: /mnt/cache/appdata/FoldingAtHome/work/02 (39) Directory not empty
It appears hundreds and hundreds of times. Perhaps a restart will fix it.
I don't see any disk or controller issues. I suggest you reboot, sort out your system share, and then run another parity check. It might as well be a correcting one. When it has finished grab a new set of diagnostics and post them.
JorgeB Posted February 3, 2017
Both parity checks in the log are correcting checks ("check correct"). For some reason the scheduled one only shows "check", but you can tell both were correcting because the sync error was corrected, as in:
Feb 1 01:27:36 Tower kernel: md: recovery thread: P corrected, sector=1072051656
The same error being corrected twice rules out a memory issue and other random factors, but the disks look fine, so no clue.
PS: Unrelated, but this is the first time I've noticed this SMART attribute:
22 Helium_Level 0x0023 100 100 025 Pre-fail Always - 100
Curious to see how it holds up as years go by.
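The observation that both checks corrected the same sector can be confirmed mechanically. Below is a minimal Python sketch that groups the "P corrected" entries by the `mdcmd` sequence number that started each check. The sample lines are copied from the excerpts quoted in this thread; the function names and the regular expressions are illustrative assumptions, not an unRAID tool, and a dual-parity log may need extra patterns.

```python
import re
from collections import defaultdict

# Sample lines copied from the syslog excerpts quoted in this thread.
SYSLOG = """\
Feb 1 00:00:01 Tower kernel: mdcmd (59): check
Feb 1 00:00:01 Tower kernel: md: recovery thread: check P ...
Feb 1 01:27:36 Tower kernel: md: recovery thread: P corrected, sector=1072051656
Feb 1 18:52:49 Tower kernel: md: sync done. time=67967sec
Feb 1 19:41:34 Tower kernel: mdcmd (64): check correct
Feb 1 19:41:34 Tower kernel: md: recovery thread: check P ...
Feb 1 21:09:09 Tower kernel: md: recovery thread: P corrected, sector=1072051656
Feb 2 14:15:52 Tower kernel: md: sync done. time=66858sec
"""

# Assumed patterns, based only on the log lines shown above.
CHECK_START = re.compile(r"mdcmd \((\d+)\): check\b")
P_CORRECTED = re.compile(r"recovery thread: P corrected, sector=(\d+)")


def corrected_sectors_by_check(text):
    """Map each mdcmd sequence number to the sectors corrected in that check."""
    result = {}
    current = None
    for line in text.splitlines():
        start = CHECK_START.search(line)
        if start:
            current = int(start.group(1))
            result[current] = []
            continue
        fixed = P_CORRECTED.search(line)
        if fixed and current is not None:
            result[current].append(int(fixed.group(1)))
    return result


def repeated_sectors(by_check):
    """Return sectors that were corrected in more than one check."""
    seen = defaultdict(list)
    for check_id, sectors in by_check.items():
        for sector in sectors:
            seen[sector].append(check_id)
    return {s: ids for s, ids in seen.items() if len(ids) > 1}


by_check = corrected_sectors_by_check(SYSLOG)
print(by_check)
print(repeated_sectors(by_check))
```

A sector that shows up in `repeated_sectors` was rewritten by two consecutive correcting checks, which is the pattern discussed here.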
SSD Posted February 3, 2017
This flip-flopping of the same sector back and forth has been seen before. If I read this correctly, a correcting check found one error and fixed it. And a second correcting check ran and found the same error and fixed it.
I believe, in truth, the first error was a mis-detection. It could have been caused by a memory error, a disk error, a cable issue, or something else. But the mis-detection caused parity to be wrong. One mistake in how many memory writes? Seems unlikely, but it does happen.
The second "fix" was correct; it was correctly undoing the first fix.
I think I remember this did turn out to be a memory error. ECC memory is a wonderful thing.
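The flip-flop described above can be illustrated with a toy model. This is a sketch only, assuming single parity is a plain XOR across the data disks; the data words, the flipped bit, and the function names are hypothetical and are not taken from the diagnostics.

```python
from functools import reduce
from operator import xor


def compute_parity(data_words):
    """Single parity modelled as the XOR of one word from each data disk."""
    return reduce(xor, data_words, 0)


def correcting_check(data_as_read, stored_parity):
    """Return (parity_after_check, was_corrected) for one stripe."""
    expected = compute_parity(data_as_read)
    if expected != stored_parity:
        # A correcting check trusts the data it read and rewrites parity.
        return expected, True
    return stored_parity, False


# Hypothetical stripe: three data disks, parity initially correct.
data_on_disk = [0b1010, 0b0110, 0b1111]
good_parity = compute_parity(data_on_disk)
parity = good_parity

# Check 1: one bit is flipped in transit (bad RAM, cable, ...),
# so the check "sees" a mismatch that is not really on the disks.
misread = list(data_on_disk)
misread[1] ^= 0b0001
parity, fixed_first = correcting_check(misread, parity)
parity_wrong_after_first = parity != good_parity

# Check 2: a clean read finds the bad parity and undoes the first "fix".
parity, fixed_second = correcting_check(data_on_disk, parity)

# Check 3: another clean read finds nothing left to correct.
parity, fixed_third = correcting_check(data_on_disk, parity)

print(fixed_first, parity_wrong_after_first, fixed_second, fixed_third)
```

In this model both the first and the second check report a correction for the same stripe, even though the data on disk never changed, and a third clean check reports none.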
JorgeB Posted February 3, 2017
SSD wrote: "The second 'fix' was correct, it was correctly undoing the first fix. I think I remember this did turn out to be a memory error. ECC memory is a wonderful thing."
Didn't think of that, but it makes sense. OP, time to run memtest.