Parity check finding 325939093 errors


erikbak
Go to solution Solved by JorgeB,

I set up unRAID on my PC a few years back. Everything runs well, except that every time I do a parity check it comes back with a huge number of errors. I tried swapping out the SATA cables once, but the issue persists. Is there anything else I can try? I'm not sure what to look for in the diagnostics to point me in the right direction. Would greatly appreciate some guidance.

phoenix-diagnostics-20220120-0639.zip

Edited by erikbak
spelling
Link to comment

Are these correcting or non-correcting parity checks?   

 

Give us a better idea of how long this condition has existed?  (Six days, six months, six years...)

 

Have you ever run a correcting parity check?

 

Does all of your data seem to be there and does it appear to be correct?

 

Now for a bit of explanation.  Non-correcting parity checks only report that there are errors in the parity calculation.  They do nothing to fix/correct them.  The notification that there is a problem gives the user (you) the opportunity to decide what to do about the situation.  Most of the time (~99.9% of the time), if there is no obvious data problem, you would run a correcting parity check to update parity to reflect the state of the array.  Then you run a non-correcting parity check to verify that everything is working as it should.
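To make the difference between the two kinds of check concrete, here is a minimal sketch in Python. It assumes simple single-parity XOR across equally sized disks, modelled as lists of integers; the function names and the toy data are made up for illustration and are not Unraid's actual implementation.

```python
from functools import reduce
from operator import xor


def compute_parity(blocks):
    """XOR together the blocks found at one position on every data disk."""
    return reduce(xor, blocks, 0)


def parity_check(data_disks, parity_disk, correct=False):
    """Count positions where stored parity disagrees with the data.

    correct=False: only report the mismatches (non-correcting check).
    correct=True:  also overwrite parity with the recomputed value
                   (correcting check). The data disks are never modified.
    """
    errors = 0
    for pos in range(len(parity_disk)):
        expected = compute_parity(disk[pos] for disk in data_disks)
        if parity_disk[pos] != expected:
            errors += 1
            if correct:
                parity_disk[pos] = expected
    return errors


# Toy array: three data disks and one parity disk with a stale middle block.
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
parity = [1 ^ 4 ^ 7, 0, 3 ^ 6 ^ 9]

print(parity_check(data, parity))                # non-correcting: reports only
print(parity_check(data, parity, correct=True))  # correcting: reports and fixes
print(parity_check(data, parity))                # verification pass
```

In this model the verification pass only comes back clean if the disks return the same values on every read. If the hardware returns different data from one read to the next, errors keep appearing even after a correcting pass, which is why the second, non-correcting check is useful.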

 

I see that @JorgeB has given you the short version of my post...

Link to comment
1 hour ago, JorgeB said:

Run a correcting check, then run a non correcting one, without rebooting, and post new diags.

Just to confirm, a correcting check would be a parity check with the "write correction to parity" box check marked, correct? Just want to make sure there isn't any other setting I'm missing before starting the process.

Link to comment
1 hour ago, Frank1940 said:

Are these correcting or non-correcting parity checks?   

 

Give us a better idea of how long this condition has existed?  (Six days, six months, six years...)

 

Have you ever run a correcting parity check?

 

Does all of your data seem to be there and does it appear to be correct?

I believe they have all been correcting checks as the "write corrections to parity" box has been check marked this whole time and I have no memory of ever changing this setting.

 

Parity check history screenshot attached. Looking at the numbers, the errors seem to have started in January 2020. I'm trying to remember what settings I may have changed at that time but nothing is coming to mind.

 

All of my data has been there and I haven't noticed any errors in actual use. I use the server for Plex most of the time, as well as a NAS where I connect to it to download and upload files from my computer through the network.

 

Some more info that could be useful:

1x 8TB Parity drive

3x 4TB Data drives

1x 120GB Cache SSD

1x 16GB USB Boot drive

The only dockers I use are Plex, DuckDNS for my Wireguard VPN and Krusader for moving some files around internally.

Also, the SMART report for all of my drives has consistently shown a "healthy" status, for whatever that's worth.

Screen Shot 2022-01-20 at 8.42.53 AM.png

Link to comment
49 minutes ago, erikbak said:

believe they have all been correcting checks as the "write corrections to parity" box has been check marked this whole time and I have no memory of ever changing this setting.

Are you talking about this setting:

image.png.a28dfc00271fef90825bb75998afd8f0.png

Or this one:

image.thumb.png.4e0eb36df34242da6fe4f21e262cc775.png

 

As I look at your Parity checks, it appears that you are not using the Scheduler to do regular parity checks.  I also notice that from May 2021 to recently, you have had the same number of errors (basically).  Your data must be alright or you would have noticed it by now and be questioning us about that situation rather than parity errors...

 

I think you need to follow @JorgeB's advice and do the correcting check, followed by a non-correcting one without shutting down or rebooting your server.  Then upload the Diagnostics file after the second check is completed.  We need a complete history of what happened during these events to figure out what is going on. 

 

EDIT: Run both checks to completion.  Do not stop them before they finish!

Edited by Frank1940
Link to comment
3 hours ago, Frank1940 said:

Are you talking about this setting:

 

Attached are screenshots of both settings. It looks like both are set to write corrections to parity.

I will leave both of these the way they are and run the first check (correcting), then run a non-correcting parity check afterwards as recommended. Will follow up with the diagnostics afterwards. Thank you, see you in a few days!

Screen Shot 2022-01-20 at 12.39.06 PM.png

Screen Shot 2022-01-20 at 12.39.20 PM.png

Link to comment

You shouldn't have scheduled parity checks set to correct parity. You don't want an unnoticed problem with hardware to corrupt parity by "correcting" it.

 

The usual recommendation is to only do noncorrecting parity checks, and if that turns out to have parity sync errors, determine the cause if you can and fix it.

 

Parity sync errors must be corrected, but not while you have hardware problems.  The only acceptable number of sync errors is exactly zero.

Link to comment
2 minutes ago, JorgeB said:

memtest

Where can I find memtest? Is it an application or plugin? I haven't done much maintenance on my server besides the initial setup and occasional docker installs, so my experience is below average to say the least. Appreciate the guidance.

Link to comment
On 1/23/2022 at 9:14 AM, Frank1940 said:

It is one of the options on the Unraid boot menu.  I would suggest running it for 24 hours.  Zero errors is the only acceptable result.

Got it. Started running the test a few hours ago, will update tomorrow with the results. Would you need diagnostics following the 24 hour memtest as well or just a screenshot of the memtest menu?

Link to comment
  • Solution

One thing I've remembered, you have two ports using IDE mode:

 

00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 IDE Controller [1002:439c] (rev 40)
    Subsystem: Gigabyte Technology Co., Ltd SB7x0/SB8x0/SB9x0 IDE Controller [1458:5002]
    Kernel driver in use: pata_atiixp
    Kernel modules: pata_atiixp

 

IIRC this can cause sync errors with these AMD chipsets. Change those ports (usually SATA5/6) to AHCI/SATA and try 2 consecutive checks again.
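For anyone who wants to look for the same thing in their own system, here is a rough Python sketch that scans text shaped like the excerpt above (for example, output of `lspci -knn`) for controllers reported as an "IDE interface". The function name and the returned dictionary keys are made up for this example. A hit only means the port is worth checking in the BIOS, since a genuine PATA controller would be listed the same way.

```python
import re


def find_ide_mode_controllers(lspci_text):
    """Return one dict per device whose class is listed as 'IDE interface'.

    Each dict holds the PCI slot, the device description, and the kernel
    driver in use (None if no driver line follows the device header).
    """
    results = []
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Unindented line: the header of a new device entry.
            m = re.match(r"^(\S+)\s+IDE interface\b.*?:\s*(.*)$", line)
            current = None
            if m:
                current = {"slot": m.group(1), "device": m.group(2), "driver": None}
                results.append(current)
        elif current is not None:
            # Indented detail line that belongs to the current IDE device.
            m = re.match(r"\s*Kernel driver in use:\s*(\S+)", line)
            if m:
                current["driver"] = m.group(1)
    return results


# The excerpt quoted in this post.
excerpt = """\
00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 IDE Controller [1002:439c] (rev 40)
    Subsystem: Gigabyte Technology Co., Ltd SB7x0/SB8x0/SB9x0 IDE Controller [1458:5002]
    Kernel driver in use: pata_atiixp
    Kernel modules: pata_atiixp
"""

for dev in find_ide_mode_controllers(excerpt):
    print(dev["slot"], dev["driver"], dev["device"])
```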

Link to comment
14 hours ago, JorgeB said:

change those (usually SATA5/6) to AHCI/SATA and try again 2 consecutive checks.

Cancelled the memtest and went into the BIOS to change the 2 ports using IDE mode into SATA. Should I run the parity checks the same as before? First parity check as a correcting check, then the second as non correcting? 

Link to comment

Update on second set of checks so far:

1st check (correcting) found & corrected the same number of errors as before.

2nd check (non correcting) is about halfway through right now and has found 0 errors so far. 

 

It's looking like the IDE setting on my last 2 SATA ports was the culprit. 

Will post final diagnostics after 2nd check is complete just for reference.
