erikbak Posted January 20, 2022 (edited January 20, 2022, reason: spelling) I set up unRAID on my PC a few years back, and everything runs well except that every parity check comes back with a huge number of errors. I swapped out the SATA cables once, but the issue persists. Is there anything else I can try? I'm not sure what to look for in the diagnostics to point me in the right direction. Would greatly appreciate some guidance. Attachment: phoenix-diagnostics-20220120-0639.zip
JorgeB Posted January 20, 2022 Run a correcting check, then run a non-correcting one, without rebooting, and post new diags.
Frank1940 Posted January 20, 2022 Are these correcting or non-correcting parity checks? Can you give us a better idea of how long this condition has existed? (Six days, six months, six years...) Have you ever run a correcting parity check? Does all of your data seem to be there, and does it appear to be correct? Now for a bit of explanation. A non-correcting parity check only reports that there are errors in the parity calculation. It does nothing to fix or correct them. The notification that there is a problem gives the user (you) the opportunity to decide what to do about the situation. Most of the time (~99.9% of the time), if there is no obvious data problem, you would run a correcting parity check to update parity to reflect the state of the array. Then you run a non-correcting parity check to verify that everything is working as it should. I see that @JorgeB has given you the short version of my post...
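Editor's illustration of the difference Frank1940 describes, as a minimal sketch only. It assumes single parity is a byte-wise XOR across the data disks and models each disk as a short `bytes` object; the names `compute_parity` and `parity_check` are hypothetical and are not Unraid code or an Unraid API.

```python
from functools import reduce
from typing import List


def compute_parity(data_blocks: List[bytes]) -> bytes:
    """XOR equally sized data blocks byte by byte (toy single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_blocks))


def parity_check(data_blocks: List[bytes], parity: bytearray, correct: bool) -> int:
    """Count positions where stored parity disagrees with the data.

    With correct=False the mismatches are only counted (non-correcting check).
    With correct=True each mismatch is also overwritten in `parity`
    (correcting check); the data blocks are never modified.
    """
    expected = compute_parity(data_blocks)
    errors = 0
    for i, (want, have) in enumerate(zip(expected, parity)):
        if want != have:
            errors += 1
            if correct:
                parity[i] = want
    return errors


if __name__ == "__main__":
    # Hypothetical three-disk array with three bytes per disk.
    disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\x0f"]
    parity = bytearray(compute_parity(disks))
    parity[1] ^= 0xFF  # simulate one out-of-sync parity byte
    print("non-correcting:", parity_check(disks, parity, correct=False))
    print("correcting:    ", parity_check(disks, parity, correct=True))
    print("non-correcting:", parity_check(disks, parity, correct=False))
```

In this toy model a non-correcting pass can be repeated indefinitely and will keep reporting the same mismatch, while a single correcting pass rewrites parity so that a following non-correcting pass reports zero, provided nothing corrupts the reads in between.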
trurl Posted January 20, 2022 With such a large number, my guess is you did something that invalidated parity, such as New Config / Trust Parity when it wasn't appropriate. 45 minutes ago, JorgeB said: "Run a correcting check, then run a non correcting one, without rebooting, and post new diags."
erikbak Posted January 20, 2022 Author 1 hour ago, JorgeB said: "Run a correcting check, then run a non correcting one, without rebooting, and post new diags." Just to confirm, a correcting check would be a parity check with the "write corrections to parity" box checked, correct? I just want to make sure there isn't any other setting I'm missing before starting the process.
erikbak Posted January 20, 2022 Author 1 hour ago, Frank1940 said: "Are these correcting or non-correcting parity checks? Give us a better idea of how long this condition has existed? (Six days, six months, six years...) Have you ever run a correcting parity check? Does all of your data seems to be there and does it appear to be correct?" I believe they have all been correcting checks, as the "write corrections to parity" box has been checked this whole time and I have no memory of ever changing this setting. Parity check history screenshot attached. Looking at the numbers, the errors seem to have started in January 2020. I'm trying to remember what settings I may have changed at that time, but nothing is coming to mind. All of my data has been there and I haven't noticed any errors in actual use. I use the server for Plex most of the time, and as a NAS that I connect to over the network to download and upload files from my computer. Some more info that could be useful: 1x 8TB parity drive, 3x 4TB data drives, 1x 120GB cache SSD, 1x 16GB USB boot drive. The only dockers I use are Plex, DuckDNS for my Wireguard VPN, and Krusader for moving some files around internally. Also, the SMART report for all of my drives has consistently shown a "healthy" status, for whatever that's worth.
Frank1940 Posted January 20, 2022 (edited January 20, 2022) 49 minutes ago, erikbak said: "believe they have all been correcting checks as the "write corrections to parity" box has been check marked this whole time and I have no memory of ever changing this setting." Are you talking about this setting: Or this one: As I look at your parity checks, it appears that you are not using the Scheduler to do regular parity checks. I also notice that from May 2021 to recently, you have had basically the same number of errors. Your data must be all right, or you would have noticed by now and be asking us about that rather than about parity errors... I think you need to follow @JorgeB's advice and do the correcting check, followed by a non-correcting one, without shutting down or rebooting your server. Then upload the Diagnostics file after the second check is completed. We need a complete history of what happened during these events to figure out what is going on. EDIT: Run both checks to completion. Do not stop them before they finish!
itimpi Posted January 20, 2022 Even if you do not use any of its features, it might be worth installing the Parity Check Tuning plugin, as that will start enhancing the Parity History entries with the type of check that was run.
erikbak Posted January 20, 2022 Author 3 hours ago, Frank1940 said: "Are you talking about this setting:" Attached are screenshots of both settings; it seems both are set to write corrections to parity. I will leave them both the way they are and run this first check (correcting), then run a non-correcting parity check afterwards as recommended. Will follow up with the diagnostics afterwards. Thank you, see you in a few days!
erikbak Posted January 20, 2022 Author 2 hours ago, itimpi said: "it might be worth installing the Parity Check Tuning plugin" Thanks for the tip, will look into this after I have this error situation sorted out.
trurl Posted January 20, 2022 You shouldn't have scheduled parity checks set to correct parity. You don't want an unnoticed hardware problem to corrupt parity by "correcting" it. The usual recommendation is to schedule only non-correcting parity checks, and if one turns up parity sync errors, determine the cause if you can and fix it. Parity sync errors must be corrected, but not while you have hardware problems. The only acceptable number of sync errors is exactly zero.
erikbak Posted January 20, 2022 Author 3 minutes ago, trurl said: "You shouldn't have scheduled parity checks set to correct parity." I wasn't aware of this, thanks for shedding some light on the topic for me. I'll do as recommended from here on out.
erikbak Posted January 23, 2022 Author On 1/20/2022 at 7:03 AM, JorgeB said: "Run a correcting check, then run a non correcting one, without rebooting, and post new diags." Circling back with updated diagnostics. It seems to have found the same number of errors on both checks. Any suggested next action? Attachment: phoenix-diagnostics-20220123-0800.zip
JorgeB Posted January 23, 2022 There's clearly a hardware issue there. Start by running memtest; if that doesn't find any issues, the board/controller would be my next suspect.
erikbak Posted January 23, 2022 Author 2 minutes ago, JorgeB said: "memtest" Where can I find memtest? Is it an application or plugin? I haven't done much maintenance on my server besides the initial setup and occasional docker installs, so my experience is below average to say the least. Appreciate the guidance.
Frank1940 Posted January 23, 2022 1 hour ago, erikbak said: "Where can I find memtest?" It is one of the options on the Unraid boot menu. I would suggest running it for 24 hours. Zero errors is the only acceptable result.
erikbak Posted January 25, 2022 Author On 1/23/2022 at 9:14 AM, Frank1940 said: "It is one of the options on the Unraid boot menu. I would suggest running it for 24 hours. Zero errors is the only acceptable result." Got it. Started running the test a few hours ago, will update tomorrow with the results. Would you need diagnostics following the 24-hour memtest as well, or just a screenshot of the memtest screen?
Frank1940 Posted January 25, 2022 (edited January 25, 2022) Basically, get the diagnostics file so it is ready to upload if requested. In the same vein, capture a screenshot of anything that you think might help in diagnosing your problem. It is much better to have collected too much data than not enough...
Solution JorgeB Posted January 25, 2022 One thing I've remembered: you have two ports using IDE mode:
00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 IDE Controller [1002:439c] (rev 40)
    Subsystem: Gigabyte Technology Co., Ltd SB7x0/SB8x0/SB9x0 IDE Controller [1458:5002]
    Kernel driver in use: pata_atiixp
    Kernel modules: pata_atiixp
IIRC this can cause sync errors with these AMD chipsets. Change those ports (usually SATA5/6) to AHCI/SATA and run 2 consecutive checks again.
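Editor's illustration: a minimal sketch of how the controller entry quoted by JorgeB could be spotted automatically. It assumes text in the style of `lspci -knn` output, where each device starts on an unindented line and its "Kernel driver in use:" line is indented beneath it. The function name `find_pata_controllers` is hypothetical, and treating any driver whose name starts with `pata_` as "IDE mode" is a simplifying assumption, not an official rule.

```python
import re
from typing import List, Tuple

# The entry quoted in the thread, laid out the way lspci usually prints it.
SAMPLE = (
    "00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD/ATI] "
    "SB7x0/SB8x0/SB9x0 IDE Controller [1002:439c] (rev 40)\n"
    "\tSubsystem: Gigabyte Technology Co., Ltd SB7x0/SB8x0/SB9x0 IDE Controller [1458:5002]\n"
    "\tKernel driver in use: pata_atiixp\n"
    "\tKernel modules: pata_atiixp\n"
)

DRIVER_RE = re.compile(r"Kernel driver in use:\s*(\S+)")


def find_pata_controllers(lspci_text: str) -> List[Tuple[str, str]]:
    """Return (pci_slot, driver) pairs for devices bound to a pata_* driver."""
    found = []
    slot = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Unindented line: a new device entry; its first token is the PCI slot.
            slot = line.split(" ", 1)[0]
            continue
        match = DRIVER_RE.search(line)
        if match and slot and match.group(1).startswith("pata_"):
            found.append((slot, match.group(1)))
    return found


if __name__ == "__main__":
    for slot, driver in find_pata_controllers(SAMPLE):
        print(f"{slot} uses {driver}: consider switching these ports to AHCI in the BIOS")
```

Any hit only points at ports worth reviewing in the BIOS; whether switching them is appropriate depends on the board, as discussed in the post above.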
erikbak Posted January 25, 2022 Author Update: I've been running memtest for almost 24 hours now with 0 errors. Any opinions on whether I should let it keep running at this point? The next thing I'm going to try is @JorgeB's suggestion, if I'm good to close the memtest now.
Frank1940 Posted January 25, 2022 If it hasn't found any errors by this point, memory is probably not the problem.
erikbak Posted January 25, 2022 Author 14 hours ago, JorgeB said: "change those (usually SATA5/6) to AHCI/SATA and try again 2 consecutive checks." Cancelled the memtest and went into the BIOS to change the 2 ports using IDE mode to SATA. Should I run the parity checks the same as before? First parity check as a correcting check, then the second as non-correcting?
erikbak Posted January 25, 2022 Author Thanks, started checks now. Will be back in a few days again. Hopefully this fixes it!
erikbak Posted January 27, 2022 Author Update on the second set of checks so far: the 1st check (correcting) found and corrected the same number of errors as before. The 2nd check (non-correcting) is about halfway through right now and has found 0 errors so far. It's looking like the IDE setting on my last 2 SATA ports was the culprit. Will post final diagnostics after the 2nd check is complete, just for reference.
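Editor's illustration: the two-check procedure used throughout this thread (a correcting check followed, without rebooting, by a non-correcting check) can be summarised as a small decision rule. This is a sketch of the reasoning given by the posters, not a diagnostic tool; the function name `interpret_checks` and the returned strings are hypothetical.

```python
def interpret_checks(correcting_errors: int, noncorrecting_errors: int) -> str:
    """Interpret a correcting check followed by a non-correcting check.

    Both arguments are the sync error counts reported by each completed check.
    """
    if correcting_errors < 0 or noncorrecting_errors < 0:
        raise ValueError("error counts cannot be negative")
    if noncorrecting_errors > 0:
        # Parity was just rewritten, yet errors are still reported:
        # something is returning inconsistent data (RAM, board, controller, cabling).
        return "hardware suspect"
    if correcting_errors > 0:
        # Errors were fixed and did not come back on the verification pass.
        return "parity resynced"
    return "clean"


if __name__ == "__main__":
    # Outcome reported on January 23: errors on both checks.
    print(interpret_checks(correcting_errors=1, noncorrecting_errors=1))
    # Outcome hoped for after the BIOS change: errors corrected, then none.
    print(interpret_checks(correcting_errors=1, noncorrecting_errors=0))
```

The error counts in the demo are placeholders, since the thread never states the actual numbers; only whether each count is zero matters to the rule. Both checks need to run to completion before the counts mean anything.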