
Unraid 4.6 "Parity is valid" (all green), but still reporting 3 errors?



I'm on an old Unraid, version 4.6. I know this is the v5.0 forum, but I didn't know a more appropriate sub-forum to put this in. Moderator(s), please feel free to move this if appropriate. Thanks.

 

Today's monthly parity run reported 3 errors. All drives are green. Here are the relevant lines from the log file. You can see the scheduled parity check start at 4am, and the 3 errors appear to have occurred back to back, about 21 seconds after the check began.

 

Jul 31 20:30:10 Tower dhcpcd[2628]: dhcpIPaddrLeaseTime=86400 in DHCP server response. (Routine)
Jul 31 20:30:10 Tower dhcpcd[2628]: DHCP_ACK received from (192.168.1.1) (Routine)
Aug 1 04:00:01 Tower kernel: mdcmd (39): check NOCORRECT (unRAID engine)
Aug 1 04:00:01 Tower kernel: (Routine)
Aug 1 04:00:01 Tower kernel: md: recovery thread woken up ... (unRAID engine)
Aug 1 04:00:01 Tower kernel: md: recovery thread checking parity... (unRAID engine)
Aug 1 04:00:01 Tower kernel: md: using 1152k window, over a total of 1953514552 blocks. (unRAID engine)
Aug 1 04:00:22 Tower kernel: md: parity incorrect: 7680 (Errors)
Aug 1 04:00:22 Tower kernel: md: parity incorrect: 7688 (Errors)
Aug 1 04:00:22 Tower kernel: md: parity incorrect: 7696 (Errors)
Aug 1 07:11:18 Tower dhcpcd[2628]: sending DHCP_REQUEST for 192.168.1.102 to 192.168.1.1 (Routine)
Aug 1 07:11:18 Tower dhcpcd[2628]: dhcpIPaddrLeaseTime=86400 in DHCP server response. (Routine)
Aug 1 07:11:18 Tower dhcpcd[2628]: DHCP_ACK received from (192.168.1.1) (Routine)
Aug 1 13:13:49 Tower kernel: md: sync done. time=33228sec rate=58791K/sec (unRAID engine)
Aug 1 13:13:49 Tower kernel: md: recovery thread sync completion status: 0 (unRAID engine)
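For anyone who wants to pull these lines out of a longer syslog without scrolling, here is a minimal Python sketch. It assumes the message format shown in the excerpt above (`md: parity incorrect: <number>`); the function name `parity_error_positions` and the shortened sample text are made up for illustration, and this is not an unRAID tool.

```python
import re

# Matches kernel lines such as "md: parity incorrect: 7680".
# The pattern is inferred from the log excerpt above, not from unRAID documentation.
PARITY_ERROR = re.compile(r"md: parity incorrect: (\d+)")

def parity_error_positions(syslog_text):
    """Return the numbers reported on 'parity incorrect' lines, in log order."""
    return [int(m.group(1)) for m in PARITY_ERROR.finditer(syslog_text)]

# Shortened copy of the excerpt above (hypothetical input for the sketch).
sample = """\
Aug 1 04:00:01 Tower kernel: md: recovery thread checking parity... (unRAID engine)
Aug 1 04:00:22 Tower kernel: md: parity incorrect: 7680 (Errors)
Aug 1 04:00:22 Tower kernel: md: parity incorrect: 7688 (Errors)
Aug 1 04:00:22 Tower kernel: md: parity incorrect: 7696 (Errors)
Aug 1 13:13:49 Tower kernel: md: sync done. time=33228sec rate=58791K/sec (unRAID engine)
"""

positions = parity_error_positions(sample)
print(len(positions), positions)
```

Comparing the list from one check with the list from the next shows quickly whether the same positions are being reported again.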

 

By the way, the parity drive does report the following:

 

reallocated_sector_ct=1
reallocated_event_count=1
current_pending_sector=1

 

But it's been showing those 3 values for years, and they've never incremented. I've had 100% successful rebuilds for two failed drives while those three issues have been reported. I've posted about it a few years back here: http://lime-technology.com/forum/index.php?topic=15521.msg145003

 

All my data drives show zero for the following within their quick SMART reports:

reallocated_sector_ct
reallocated_event_count
current_pending_sector
offline_uncorrectable
multi_zone_error_rate

 

I really don't know what 3 errors Unraid is referring to.

 

The last red-balled drive was replaced about a month ago, and the follow-up parity check reported no errors. The only out-of-the-ordinary thing I've done since then was power down the drives/computer two or three times in the past 30 days.

 

I've read about running a parity check again and choosing the "correct" option, but it appears that option is unavailable to me because I'm on v4.6 of Unraid. I understand I'd have to upgrade to 4.7 then 5.0 (or above), but I'd like to avoid upgrading for now if possible. I'm not really confident I won't mess things up, and if it's not broke, I want to avoid unnecessarily fixing something.

 

So, these 3 errors? Significant, worth ignoring for now? I'm wondering if the next parity check is going to keep reporting it.

 

Thanks.

Link to comment

Parity checks and drive SMART values are different things.

 

Errors in the parity check mean that parity was not correct in three places. This means that if you rebuild a drive using parity, what you rebuild will not be the same as the original drive. You do not know, and cannot know, what was wrong or where. Meaning:

 

- You do not know what disk caused the parity error

- You do not know what file caused the parity error

- You do not know if it even was a file or an error in the empty space on the drive..
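To illustrate why a parity mismatch alone cannot point at a particular disk, here is a toy Python sketch of single XOR parity. The three "disks" and their byte values are invented, and a real array works on whole drives rather than two-byte strings, so treat this only as a demonstration of the principle.

```python
from functools import reduce

def xor_parity(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical contents of the same stripe on three data disks.
disks = [bytes([0x10, 0x20]), bytes([0x0F, 0x01]), bytes([0xA0, 0x55])]
parity = xor_parity(disks)

def mismatch_after_flip(which):
    """Flip one bit on the chosen device, then XOR stored parity with recomputed parity."""
    data, stored = list(disks), parity
    if which == "parity":
        stored = bytes([stored[0] ^ 0x01, stored[1]])
    else:
        data[which] = bytes([data[which][0] ^ 0x01, data[which][1]])
    recomputed = xor_parity(data)
    return bytes(s ^ r for s, r in zip(stored, recomputed))

# The mismatch pattern is identical no matter which device holds the bad bit,
# so a check can say "parity disagrees here" but not "this disk is wrong".
patterns = {which: mismatch_after_flip(which) for which in (0, 1, 2, "parity")}
print(patterns)
```

Every entry in `patterns` comes out the same, which is the situation described in the list above: the check sees a disagreement but has nothing that identifies its source.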

 

If all your SMART reports are OK, then it was most likely an unclean shutdown. Run a correcting parity check, then check again a couple of days later to see whether the errors come back; most likely they will not.

Link to comment
Run a correcting parity check, then check again a couple of days later to see whether the errors come back; most likely they will not.

 

Helmonder, thanks for responding. With my version of unraid (4.6) I don't see an option to run a correcting parity check. I've attached a screenshot of the admin area to show you what I mean.

 

Thanks.

[Attached image: parity-check-report.png]

Link to comment

I just started another parity check, and the log file at the outset is showing the same three errors. It's showing "3 sync errors":

 

Aug 2 00:57:12 Tower kernel: md: recovery thread woken up ...
Aug 2 00:57:12 Tower kernel: md: recovery thread checking parity...
Aug 2 00:57:12 Tower kernel: md: using 1152k window, over a total of 1953514552 blocks.
Aug 2 00:57:34 Tower kernel: md: parity incorrect: 7680
Aug 2 00:57:34 Tower kernel: md: parity incorrect: 7688
Aug 2 00:57:34 Tower kernel: md: parity incorrect: 7696

 

I stopped the parity check after 15 minutes, as it looks like it's reporting the same 3 errors a minute or so after starting.

 

If all your SMART reports are OK, then it was most likely an unclean shutdown. Run a correcting parity check, then check again a couple of days later to see whether the errors come back; most likely they will not.

 

@Helmonder I did want to respond to the first part of this. I don't believe I've experienced any recent unclean shutdowns. I've always stopped the array, then chosen power down. The computer is attached to an APC backup battery. It also has a clean auto-power-down script that shuts the server down properly if the APC battery is close to running out of juice. As far as I know, there haven't been any power outages recently.

 

I was looking at the package manager in unmenu, and it shows the following for the monthly parity check scheduler:

Monthly Parity Check

Currently Installed. Will be automatically Re-Installed upon Re-Boot.

Description: This package installs a script that will schedule a monthly parity check on the 1st of the month at midnight.

Use NOCORRECT if you do not want parity to be automatically updated.

Use CORRECT if you do want it automatically updated. It is recommended that you NOT automatically correct parity, since it might be a data drive that is in error, and the parity drive might be correct.

Challenge is determining which is correct, and which is in error. unRAID normally assumes the data is correct and parity is wrong. Pressing the "Check" button on the web-interface will check AND update parity based on the data disks.

 

So the option to run a CORRECT instead of the default NOCORRECT parity check does appear to be available in v4.6 of Unraid. There's a similar post from someone discussing NOCORRECT with their 4.6 setup:

http://lime-technology.com/forum/index.php?topic=11972.0

 

So it seems clear to me, based on what @Helmonder originally recommended, that I should run a parity check with the "CORRECT" option/flag chosen. But how do I do this? There's no option within the GUI.

 

Is there a command I could execute via command line?

 

Thanks.

Link to comment

Never used 4.6, but I would assume that if the GUI doesn't present an option, then unRAID is doing a correcting parity check, as that has always been Tom's position on correcting vs non-correcting, and for good reasons.

 

Hi @trurl. When I ran the parity check a second time, it reported the same three errors a minute or so after beginning, so it looks like it's doing a non-correcting (i.e., NOCORRECT) parity check by default. It looks like I'm going to have to figure out how to run a parity check with the "CORRECT" parameter via the command line instead of the GUI. The option/parameter does appear to exist within v4.6; it's just not available within the GUI. That option/checkbox must have been added to the GUI in a later version.

 

Thanks.

Link to comment

There is this

Aug  1 04:00:01 Tower kernel: mdcmd (39): check NOCORRECT
Aug  1 04:00:01 Tower kernel: 
Aug  1 04:00:01 Tower kernel: md: recovery thread woken up ...
Aug  1 04:00:01 Tower kernel: md: recovery thread checking parity...
Aug  1 04:00:01 Tower kernel: md: using 1152k window, over a total of 1953514552 blocks.
Aug  1 04:00:22 Tower kernel: md: parity incorrect: 7680
Aug  1 04:00:22 Tower kernel: md: parity incorrect: 7688
Aug  1 04:00:22 Tower kernel: md: parity incorrect: 7696
Aug  1 07:11:18 Tower dhcpcd[2628]: sending DHCP_REQUEST for 192.168.1.102 to 192.168.1.1 
Aug  1 07:11:18 Tower dhcpcd[2628]: dhcpIPaddrLeaseTime=86400 in DHCP server response. 
Aug  1 07:11:18 Tower dhcpcd[2628]: DHCP_ACK received from  (192.168.1.1) 
Aug  1 13:13:49 Tower kernel: md: sync done. time=33228sec rate=58791K/sec
Aug  1 13:13:49 Tower kernel: md: recovery thread sync completion status: 0

which must be the scheduled non-correcting parity check with normal completion status.

 

And this near the end

Aug  2 00:57:12 Tower kernel: mdcmd (51): check CORRECT
Aug  2 00:57:12 Tower kernel: md: recovery thread woken up ...
Aug  2 00:57:12 Tower kernel: md: recovery thread checking parity...
Aug  2 00:57:12 Tower kernel: md: using 1152k window, over a total of 1953514552 blocks.
Aug  2 00:57:34 Tower kernel: md: parity incorrect: 7680
Aug  2 00:57:34 Tower kernel: md: parity incorrect: 7688
Aug  2 00:57:34 Tower kernel: md: parity incorrect: 7696
Aug  2 01:31:46 Tower kernel: NTFS driver 2.1.29 [Flags: R/O MODULE].
Aug  2 01:31:56 Tower unmenu[2369]: which: no bwm-ng in (/bin:/usr/bin:/sbin:/usr/sbin)
Aug  2 01:41:18 Tower kernel: mdcmd (52): nocheck 
Aug  2 01:41:18 Tower kernel: md: md_do_sync: got signal, exit...
Aug  2 01:41:18 Tower kernel: md: recovery thread sync completion status: -4

which is a correcting check that would have corrected the errors that the previous non-correcting check found but did not correct. This one didn't complete since you stopped it.
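The same reading of the syslog can be sketched in Python. This is only an illustration under assumptions: the patterns are inferred from the excerpts quoted in this thread rather than from any documented format, and `summarize_checks` is a made-up helper name. In these excerpts, status 0 accompanied the check that ran to the end and status -4 accompanied the one that was stopped.

```python
import re

# Patterns inferred from the syslog excerpts in this thread (assumptions, not a documented format).
START = re.compile(r"mdcmd \(\d+\): check (CORRECT|NOCORRECT)")
ERROR = re.compile(r"md: parity incorrect: \d+")
STATUS = re.compile(r"sync completion status: (-?\d+)")

def summarize_checks(lines):
    """Group syslog lines into parity-check runs: mode, error count, completion status."""
    runs, current = [], None
    for line in lines:
        start = START.search(line)
        if start:
            current = {"mode": start.group(1), "errors": 0, "status": None}
            runs.append(current)
        elif current is not None and ERROR.search(line):
            current["errors"] += 1
        elif current is not None:
            status = STATUS.search(line)
            if status:
                current["status"] = int(status.group(1))
                current = None  # this run has ended (finished or stopped)
    return runs

# Shortened copy of the two excerpts quoted above.
sample = """\
Aug  1 04:00:01 Tower kernel: mdcmd (39): check NOCORRECT
Aug  1 04:00:22 Tower kernel: md: parity incorrect: 7680
Aug  1 04:00:22 Tower kernel: md: parity incorrect: 7688
Aug  1 04:00:22 Tower kernel: md: parity incorrect: 7696
Aug  1 13:13:49 Tower kernel: md: recovery thread sync completion status: 0
Aug  2 00:57:12 Tower kernel: mdcmd (51): check CORRECT
Aug  2 00:57:34 Tower kernel: md: parity incorrect: 7680
Aug  2 00:57:34 Tower kernel: md: parity incorrect: 7688
Aug  2 00:57:34 Tower kernel: md: parity incorrect: 7696
Aug  2 01:41:18 Tower kernel: mdcmd (52): nocheck
Aug  2 01:41:18 Tower kernel: md: recovery thread sync completion status: -4
"""

runs = summarize_checks(sample.splitlines())
for run in runs:
    print(run)
```

A later run whose mode is NOCORRECT, with zero errors and status 0, would be the confirmation that parity now agrees with the data.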

 

So, I guess we would have to have another parity check to confirm the parity was corrected.

Link to comment

So, I guess we would have to have another parity check to confirm the parity was corrected.

 

Thanks for catching that. I didn't realize the second time was running the CORRECT parity check.

 

Thank you @trurl for looking into the syslog and discovering this. I would never have noticed it. I should have followed @Helmonder's original recommendation and let the second parity check run its course.

 

I'm running the third parity check right now and will report what occurs after it's done.

 

One thing I've already noticed with this third parity check is that the GUI is showing "Sync errors: 0"

 

When I ran the second, incomplete parity check this morning, it immediately showed "Sync errors: 3", so perhaps that means the second parity check I previously cut short did in fact correct the errors, and this currently running third check will just confirm that?

 

Thanks again.

Link to comment

Archived

This topic is now archived and is closed to further replies.
