Parity check error and docker updates not available

omartian · May 26, 2021

Hi Everyone-

Ran my monthly parity check (for the first time where i unchecked "write corrections to parity") and came up with 9 errors.

Last 2 months, i had 1 error each and the month before that I had 9.

Don't recall any unclean shutdowns in the last 30 days. I occasionally get a message about once a week from unraid stating, "the connection to your UPS has been restored" for my APC UPS. Haven't had power go out in months.

When i look under the main tab, all of my disks have 0 errors.

Is this something to be worried about? Any idea how i identify which files/disks are involved?

Attached is my diagnostics.

I'm currently on 6.8.3 and haven't updated to 6.9.2

The other issue is my krusader and plex docker programs won't let me update them. It states "version not available". I've restarted both dockers and clicked the check for updates box. Any idea? Are the 2 problems somehow related?

Thanks in advance.

nasgard-diagnostics-20210526-1506.zip

JorgeB · May 27, 2021

You're overclocking the RAM, Ryzen with overclocked RAM is known to corrupt data resulting in sync errors, see here.

ChatNoir · May 27, 2021

Under 6.8, the "not available" part is normal, Docker changed stuff on their side.

The best solution is to upgrade to 6.9.

If not possible, you can find a workaround there. (also look for the go-file if you want it to survive a reboot)

omartian · May 27, 2021

7 hours ago, JorgeB said:

You're overclocking the RAM, Ryzen with overclocked RAM is known to corrupt data resulting in sync errors, see here.

That's strange. I updated the bios a few months ago but don't recall manually doing it.

I'll take a look and see. Hoping that's the issue.

How would I identify what the 9 errors are? Should I click write corrections to parity for my next check?

Edited May 27, 2021 by omartian

omartian · May 27, 2021

5 hours ago, ChatNoir said:

Under 6.8, the "not available" part is normal, Docker changed stuff on their side.

The best solution is to upgrade to 6.9.

If not possible, you can find a workaround there. (also look for the go-file if you want it to survive a reboot)

Awesome.

I'll just upgrade. Thank you.

omartian · May 27, 2021

10 hours ago, JorgeB said:

You're overclocking the RAM, Ryzen with overclocked RAM is known to corrupt data resulting in sync errors, see here.

Checked in the BIOS and XMP was enabled for my RAM. I disabled the XMP profile, which took speeds from 3200 to 2100 MHz. Hopefully that won't affect my Plex server performance. Will try doing a correcting check now, hoping for only 9 errors.

Edited May 27, 2021 by omartian
error

JorgeB · May 27, 2021

First check after fixing the problem can still find errors, but after that it should always be 0 errors, which is the only acceptable number of sync errors.

omartian · June 1, 2021

On 5/27/2021 at 1:43 PM, JorgeB said:

First check after fixing the problem can still find errors, but after that it should always be 0 errors, which is the only acceptable number of sync errors.

So i just ran two parity checks. after switching off xmp on my ram, i ran a non-correcting check and only got 2 errors which i thought was weird since i was expecting the 9 from before.

I attached that diagnostic.

I then went to my server and re-seated all my sata cables and my sata-sas adapter (LSI SAS 9207-8i SATA/SAS 6Gb/s PCI-E 3.0 Host Bus Adapter IT Mode SAS9207-8i US).

Decided to run another non-correcting check, and received 6 errors. attached below.

I noticed that when the process is running, i get about 65% of the way through w/0 errors. It seems like when i get to disk 6 + 7 (or maybe just 7), the parity errors occur. Do you think it might be that connector or that disk based on the diagnostics?

2 errors noncorrecting check.zip 6 errors noncorrecting check.zip

Edited June 1, 2021 by omartian
updated

itimpi · June 1, 2021

The first set of diagnostics shows the parity check starting but only lasts for a few minutes more, so there is no indication of which sectors had the problem. Are you sure they were taken AFTER the 2 errors occurred?

The syslog in the second diagnostics shows which sectors had the problem, as it has these entries:

May 31 13:51:10 Nasgard kernel: md: recovery thread: Q incorrect, sector=18795850688
May 31 14:32:14 Nasgard dhcpcd[1903]: br0: failed to renew DHCP, rebinding
May 31 15:52:33 Nasgard emhttpd: spinning down /dev/sdj
May 31 15:52:33 Nasgard emhttpd: spinning down /dev/sdh
May 31 15:52:33 Nasgard emhttpd: spinning down /dev/sdi
May 31 15:53:47 Nasgard kernel: md: recovery thread: Q incorrect, sector=20466240808
May 31 18:14:49 Nasgard kernel: md: recovery thread: Q incorrect, sector=22417914176
May 31 19:46:33 Nasgard kernel: md: recovery thread: Q incorrect, sector=23565254352
May 31 20:12:20 Nasgard kernel: md: recovery thread: Q incorrect, sector=23959071080
May 31 20:38:19 Nasgard emhttpd: spinning down /dev/sdb
May 31 20:38:19 Nasgard emhttpd: spinning down /dev/sdc
May 31 21:30:28 Nasgard kernel: md: recovery thread: Q incorrect, sector=25097677136

indicating the error sectors.

However, since there were no corresponding entries in the syslog in the first diagnostics, it is not possible to see whether the same sectors reported errors in both checks.
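As a rough illustration of how those entries could be pulled out of a syslog and compared between two checks, here is a minimal Python sketch. It assumes the log lines look like the excerpt above. The function name `parity_error_sectors` and the regular expression are my own and not part of Unraid or any plugin, and only the "Q incorrect" wording is confirmed by this thread.

```python
import re

# Matches lines such as:
#   "... kernel: md: recovery thread: Q incorrect, sector=18795850688"
# Assumption: other mismatch types use the same "<word> incorrect, sector=N" shape.
PATTERN = re.compile(r"recovery thread: \S+ incorrect, sector=(\d+)")

def parity_error_sectors(syslog_text):
    """Return a sorted list of the unique sector numbers flagged in the log text."""
    return sorted({int(m.group(1)) for m in PATTERN.finditer(syslog_text)})

sample = """\
May 31 13:51:10 Nasgard kernel: md: recovery thread: Q incorrect, sector=18795850688
May 31 14:32:14 Nasgard dhcpcd[1903]: br0: failed to renew DHCP, rebinding
May 31 15:53:47 Nasgard kernel: md: recovery thread: Q incorrect, sector=20466240808
"""

first_run = parity_error_sectors(sample)
# In practice this would be the syslog text from a second, separate check.
second_run = parity_error_sectors(sample)

# Sectors that show up in both checks point at a persistent problem;
# sectors that move around between checks point at something intermittent.
repeated = sorted(set(first_run) & set(second_run))
print(first_run)
print(repeated)
```

If the same sectors repeat across non-correcting checks, the mismatch is probably stored on disk; if they change every time, the fault is more likely somewhere in the read path (RAM, controller, cabling).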

itimpi · June 1, 2021

It has just occurred to me that if you have the Parity Check Tuning plugin installed then you might be able to investigate this far more rapidly using its Tools -> Parity Problem Assistant feature. That feature was developed for exactly your scenario, but I have never had any feedback on how useful it turns out to be in practice, so I would be interested to get some (plus any suggestions for making it more useful).

omartian · June 1, 2021

18 minutes ago, itimpi said:

It has just occurred to me that if you have the Parity Check Tuning plugin installed then you might be able to investigate this far more rapidly using its Tools -> Parity Problem Assistant feature. That feature was developed for exactly your scenario, but I have never had any feedback on how useful it turns out to be in practice, so I would be interested to get some (plus any suggestions for making it more useful).

Will check out that plugin. Thank you.

Weird. Could have sworn I downloaded the 2-error diagnostics after a full scan.

Anything else you can make out of those bad sectors?

JorgeB · June 1, 2021

Run memtest.

omartian · June 1, 2021

23 minutes ago, JorgeB said:

Run memtest.

Will do.

Are you seeing anything on the newer diagnostics to indicate a memory issue or is this based on the original diagnostic on the OP?

JorgeB · June 1, 2021

The #1 reason for unexpected sync errors is RAM related; if just downclocking didn't fix it, it could be a bad DIMM.

omartian · June 2, 2021

On 6/1/2021 at 4:18 AM, JorgeB said:

The #1 reason for unexpected sync errors is RAM related; if just downclocking didn't fix it, it could be a bad DIMM.

Memtest is currently on pass 9 and has been running for about 20 hrs w/0 errors.

At this point, should i run a correcting check, or is there anything else i should be doing?

JorgeB · June 2, 2021

You can, but the problem is still likely there.

omartian · June 2, 2021

1 hour ago, JorgeB said:

You can, but the problem is still likely there.

Going to try that parity check tuning plugin next but any other suggestions on next steps?

JorgeB · June 2, 2021

If there are still errors on consecutive checks you basically need to rule out the hardware involved. RAM is still a good candidate even without memtest finding errors, but it could also be the board/CPU or a disk. I would start by using just one DIMM at a time since it's the easiest thing to rule out.

omartian · June 2, 2021

38 minutes ago, JorgeB said:

If there are still errors on consecutive checks you basically need to rule out the hardware involved. RAM is still a good candidate even without memtest finding errors, but it could also be the board/CPU or a disk. I would start by using just one DIMM at a time since it's the easiest thing to rule out.

Ok. One dimm at a time on a non correcting check.

Do these sync errors mean that the media files on the data disks are no longer valid, or just that there is a discrepancy with the parity?

I'm wondering if I fixed this issue but since I never ran a correcting check, the same random sync issues pop up. If I run a correcting now, I'm hoping the next non correcting check would be clean.

I wish unraid made it easier to isolate the issue. Too many variables....

Edited June 2, 2021 by omartian

JorgeB · June 2, 2021

34 minutes ago, omartian said:

Do these sync errors mean that the media files on the data disks are no longer valid, or just that there is a discrepancy with the parity?

It means the stored parity doesn't match the parity calculated from the array's data devices. But with data corruption the problem can be anywhere: the data could already have been written corrupt, the parity itself could be wrong, or just the calculation at that time could be wrong.
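To illustrate why a sync error alone can't say where the fault is, here is a simplified Python sketch using plain XOR parity over made-up byte values. This is an assumption-laden toy: the second ("Q") parity reported in the syslog uses a different calculation, and the helper name `xor_parity` is hypothetical.

```python
from functools import reduce
from operator import xor

def xor_parity(blocks):
    """XOR the bytes at each position across all data blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

def difference(a, b):
    """Byte-wise XOR of two equal-length blocks; all zeros means they match."""
    return bytes(x ^ y for x, y in zip(a, b))

# Three made-up "data disks", two bytes each.
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
stored_parity = xor_parity(data)

# Case A: one bit flips on a data disk, parity is untouched.
corrupted_data = [data[0], bytes([data[1][0] ^ 0x04, data[1][1]]), data[2]]
diff_a = difference(xor_parity(corrupted_data), stored_parity)

# Case B: the data is fine, but the same bit flips on the parity disk.
corrupted_parity = bytes([stored_parity[0] ^ 0x04, stored_parity[1]])
diff_b = difference(xor_parity(data), corrupted_parity)

# Both cases yield the identical mismatch, so the check alone
# cannot tell which device holds the bad bit.
print(diff_a, diff_b)
```

A flip that happens in RAM during the calculation would look the same again, which is why the advice is to rule out hardware piece by piece.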

omartian · June 2, 2021

14 minutes ago, JorgeB said:

It means the stored parity doesn't match the parity calculated from the array's data devices. But with data corruption the problem can be anywhere: the data could already have been written corrupt, the parity itself could be wrong, or just the calculation at that time could be wrong.

Thanks for all of your help Jorge. I'll keep tinkering, run a correcting check.

omartian · June 3, 2021

On 6/1/2021 at 2:50 AM, itimpi said:

It has just occurred to me that if you have the Parity Check Tuning plugin installed then you might be able to investigate this far more rapidly using its Tools -> Parity Problem Assistant feature. That feature was developed for exactly your scenario, but I have never had any feedback on how useful it turns out to be in practice, so I would be interested to get some (plus any suggestions for making it more useful).

Which sectors should i point it to. I have a hard time

So tried using this plugin, but getting an error message.

Based on the syslog that jorge highlighted above, it looks like the error happens between sectors 18795850000 and 25097678000

When i try to set it to go, i get the error message: "end point too large: The end has been set to more than the size of the disk." i punched in the above #'s as start and endpoint bc it's asking for sector numbers. Do i need to adjust it somehow?

itimpi · June 3, 2021

3 hours ago, omartian said:

Which sectors should i point it to. I have a hard time

At the moment you have to manually search the syslog for the affected sectors (it will typically be near the end if you have just run a check).

I was intending to add a button on the input page that would scan the syslog and pop up a dialog with any sectors found so that it is much easier to both know what sectors are involved and to make it easier to select them. I had been waiting on some feedback on people trying to use this assistant and finding it useful before putting the work in to implement that option. Sounds as if it is definitely going to be wanted

3 hours ago, omartian said:

So tried using this plugin, but getting an error message.

Based on the syslog that jorge highlighted above, it looks like the error happens between sectors 18795850000 and 25097678000

Looks like you added an extra 0 on the end. I will improve the error message to include the acceptable range which might help with picking this up.
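For a quick sanity check of whether a sector range can fit on a given disk, something like the following Python sketch could be used. It assumes sector numbers are in 512-byte units, and the disk sizes are hypothetical since the actual drive sizes aren't stated in this thread. It is not how the plugin itself validates input.

```python
SECTOR_BYTES = 512  # assumption: sector numbers count 512-byte units

def sector_to_tb(sector):
    """Convert a sector number to an offset in decimal terabytes."""
    return sector * SECTOR_BYTES / 1e12

def range_fits(start, end, disk_bytes):
    """True if start <= end and the end sector lies within the disk."""
    return 0 <= start <= end and end * SECTOR_BYTES <= disk_bytes

# The range quoted above.
start, end = 18795850000, 25097678000
print(round(sector_to_tb(start), 2), round(sector_to_tb(end), 2))

# Hypothetical disk sizes, for illustration only.
print(range_fits(start, end, 14 * 10**12))  # a 14 TB disk
print(range_fits(start, end, 8 * 10**12))   # an 8 TB disk
```

Under the 512-byte assumption, any data disk smaller than the offset of the first error sector can't be involved in that error, which can help narrow down the candidates.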

Tigerherz · June 3, 2021

I had similar problems.

It was a problem with disks spinning down.

I set spin down in Disk Settings to never and cleared the stats on the Main page.

I think my controller has a problem with spin down.

omartian · June 3, 2021

4 hours ago, Tigerherz said:

I had similar problems.

It was a problem with disks spinning down.

I set spin down in Disk Settings to never and cleared the stats on the Main page.

I think my controller has a problem with spin down.

Do you disable spin down for parity checks or at all times? If your disks are spinning all the time, isn't that bad for longevity?

Also, if you got an error this way, do you run a correcting check afterwards?

Edited June 3, 2021 by omartian
