DocHodges Posted August 27, 2021: Hey guys, I've been using Unraid for a few years and never had an issue. Recently (around 3 months ago) I began seeing large numbers of parity errors (currently 42,000+). I would run the parity check again, and it was hit or miss whether it finished the second time with errors remaining. Nowadays it seems that the errors come on every single parity check. I have no idea if it's bad memory, a failing drive, or a failing LSI SAS controller. I do not have a replacement SAS controller or memory to test with, so I am wondering if anyone would mind taking a look at my diagnostics to see whether anything in them helps guide me to the solution. I'm at my wits' end with this one. Attachment: tower-diagnostics-20210827-0733.zip
JorgeB Posted August 27, 2021: Start by running memtest from the Unraid flash drive boot menu (you need to boot in legacy/CSM mode; it won't work with UEFI boot).
DocHodges Posted August 27, 2021: 1 minute ago, JorgeB said: "Start by running memtest from the Unraid flash drive boot menu (need to boot in legacy/CSM mode, it won't work with UEFI boot)." Thanks for the quick reply. I will start running that now and get back with results.
DocHodges Posted August 27, 2021 (edited August 27, 2021): 24 minutes ago, JorgeB said: "Start by running memtest from the Unraid flash drive boot menu (need to boot in legacy/CSM mode, it won't work with UEFI boot)." Got it running. Is it normal for the memory to be read incorrectly? It says the timings are all wrong. I just want to confirm before letting it run, in case the test would be invalid. Also, it looks like it's only running on a single core?
JorgeB Posted August 27, 2021: It's normal with very recent CPUs; it should still find a major problem.
DocHodges Posted August 29, 2021: On 8/27/2021 at 9:32 AM, JorgeB said: "It's normal with very recent CPUs, should still find a major problem." OK, so I let it run while I was out of town. I came back to 47 hours of runtime, and it appears that there were no errors.
JorgeB Posted August 29, 2021: Boot Unraid, run two consecutive correcting parity checks and post new diags.
DocHodges Posted August 29, 2021: 11 minutes ago, JorgeB said: "Boot Unraid, run two consecutive correcting parity checks and post new diags." First off, I want to thank you for your support, as this has been driving me crazy trying to figure out the root cause. I am running the first parity check now. It typically takes around 12-14 hours, so it may be a minute before I've got the logs, but I will post them as soon as both correcting checks have completed.
trurl Posted August 29, 2021: 5 hours ago, JorgeB said: "run two consecutive correcting parity checks and post new diags." Without any reboots.
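Editorial note: the reasoning behind two back-to-back correcting checks can be sketched in a few lines. This is a toy model, not Unraid's code. It assumes single parity computed as the XOR of the same block position on every array data drive, and the names `compute_parity` and `correcting_check` and all block values are made up for illustration. A correcting check rewrites parity wherever it disagrees with the data, so on healthy hardware a second check run right afterwards should find nothing left to fix.

```python
from functools import reduce
from operator import xor


def compute_parity(blocks):
    """XOR together the blocks at one position across all data drives."""
    return reduce(xor, blocks, 0)


def correcting_check(data_drives, parity_drive):
    """Compare stored parity with recomputed parity and fix mismatches.

    Returns the number of mismatches (sync errors) found in this pass.
    """
    errors = 0
    for i in range(len(parity_drive)):
        expected = compute_parity(drive[i] for drive in data_drives)
        if parity_drive[i] != expected:
            errors += 1
            parity_drive[i] = expected  # the "correcting" part of the check
    return errors


# Hypothetical two-drive array with three blocks per drive.
data_drives = [[0x11, 0x22, 0x33], [0x44, 0x55, 0x66]]
parity_drive = [0x55, 0x00, 0x55]  # middle block is deliberately stale

first = correcting_check(data_drives, parity_drive)
second = correcting_check(data_drives, parity_drive)
print(first, second)
```

In this model the first pass reports one error and the second reports none. If a real second pass still reports errors with no reboot in between, the stored data or the path used to read it is presumably not returning the same bytes twice, which is why the two checks are a useful test.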
DocHodges Posted August 30, 2021: 22 hours ago, DocHodges said: "First off I want to thank you for your support as this has been driving me crazy trying to figure out the root cause. I am running the first parity check now. Typically takes around 12-14 hours so may be a minute before I've got the logs but will post them as soon as I get both correction checks completed." OK, so the first parity check finished without errors. Sometimes it will do that. To clarify, I do fully understand that on any shutdown it's possible to have errors, and that a correcting parity check will then need to be run. That said, when I get the thousands of errors, I have not shut down the PC, nor has it been shut down. I will keep trying to recreate the issue and report back after the second parity check is completed. Since this one did complete without errors, I started thinking: could the Windows VM be causing the issue? I have a drive passed through specifically for the VM so it runs at near bare-metal speeds. During the last parity check I did not have the VM running; most of the time I do. I am beginning to wonder if there is a correlation between the VM running and the tons of parity errors. Is there any backing to this thinking?
JorgeB Posted August 30, 2021: 26 minutes ago, DocHodges said: "I have a drive passed through specifically for the VM to run at near bare metal speeds." Is this an array drive?
DocHodges Posted August 30, 2021: 6 minutes ago, JorgeB said: "Is this an array drive?" No, it's not mounted to the array. It's a drive I previously used as an SSD cache, but I ended up buying a larger drive, so the old one was used to pass through to the VM. I followed one of spaceinvader's tutorials. To be honest, I'm still learning as much as I can about all of this.
JorgeB Posted August 30, 2021: 9 minutes ago, DocHodges said: "No it's not mounted to the array." Then it's not parity protected.
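Editorial note: a minimal sketch of why a drive outside the array has no bearing on the parity calculation. It assumes the usual single-parity model, where parity is the XOR of the array data drives only. The function name `parity_block` and every value below are hypothetical.

```python
from functools import reduce
from operator import xor


def parity_block(array_blocks):
    """Parity for one block position: XOR over array data drives only."""
    return reduce(xor, array_blocks, 0)


# Hypothetical block values at one position on three array data drives.
array_blocks = [0b1010, 0b0110, 0b0001]
# Hypothetical block on the passed-through drive, which is not in the array.
passthrough_block = 0b1111

parity_before = parity_block(array_blocks)

# The VM writes to its passed-through drive; parity never takes it as input.
passthrough_block = 0b0000
parity_unchanged = parity_block(array_blocks) == parity_before

# By contrast, changing an array block without updating parity leaves it stale.
array_blocks[1] = 0b0111
parity_stale = parity_block(array_blocks) != parity_before

print(parity_unchanged, parity_stale)
```

This only shows that writes to the passed-through drive are not part of the parity calculation. It says nothing about whether a running VM could expose a hardware fault in some other way.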
DocHodges Posted August 30, 2021: 10 minutes ago, JorgeB said: "Then it's not parity protected." OK, thank you for the clarification. I figured as much. I guess my line of thinking was that by assigning some logical cores and dedicated memory to the VM, maybe something was happening there, but you are right, I am probably just trying to find a correlation that doesn't exist.