adamreid Posted September 14, 2021

Hi there,

When I've had failing disks before, Unraid showed an error count on the Main screen. However, after the most recent parity check I noticed that the system found and fixed 142 errors. I've attached my diagnostics and SMART reports, but I can't figure out which drive or controller is causing the issue. Could somebody please point me in the right direction?

My system is a Ryzen 2600, MSI B450-A Pro, 24 GB RAM @ 2933, an LSI 9200-8i (= 9211-8i) 6 Gbps SAS HBA in IT mode, and an Asus 1050 Ti.

Thanks in advance for your help.

Unraid issues.zip
JorgeB Posted September 14, 2021

First we need to know whether this was a one-time thing or whether you keep getting sync errors after they were fixed, so if that was a correcting check, run another one. Also, you're overclocking your RAM, and that is known to corrupt data with some Ryzen servers; see here for max officially supported speed for your config.
adamreid Posted September 14, 2021 (Author)

I had 12 errors two weeks ago but nothing before that. The RAM has been stable in this system since I built it in 2018, but maybe the memory controller has degraded, so I'll downclock it. Is there any way to tell from my logs which disks or files were impacted by the errors?
JorgeB Posted September 14, 2021

57 minutes ago, adamreid said: "Is there any way to tell from my logs which disks or files were impacted by the errors?"

The diags you posted don't show any parity sync, but even if they did it's not possible to know where the errors come from.
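Why a sync error can't be traced to a disk can be shown with a toy model. This is only a sketch, assuming single XOR parity across the array; the three tiny "disks" and the helper names `parity_of` and `sync_errors` are made up for illustration and are not Unraid code.

```python
from functools import reduce
from operator import xor


def parity_of(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def sync_errors(data_blocks, parity_block):
    """Return byte offsets where stored parity disagrees with the data."""
    expected = parity_of(data_blocks)
    return [i for i, (e, p) in enumerate(zip(expected, parity_block)) if e != p]


# Three hypothetical data disks, three bytes each.
disks = [b"\x10\x20\x30", b"\x01\x02\x03", b"\x0f\x0e\x0d"]
parity = parity_of(disks)

# Case A: one bit flips on data disk 1.
corrupt_data = [disks[0], b"\x01\x03\x03", disks[2]]
# Case B: the same bit flips on the parity disk instead.
corrupt_parity = bytes([parity[0], parity[1] ^ 0x01, parity[2]])

report_a = sync_errors(corrupt_data, parity)
report_b = sync_errors(disks, corrupt_parity)

# Both cases produce the identical report: a mismatch at offset 1.
print(report_a, report_b)
```

The check only learns that the stripe no longer XORs to zero. Which member of the stripe changed is not recoverable from that single fact, and the same report would result if the bad byte came from RAM or the controller rather than a disk.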
adamreid Posted September 14, 2021 (Author)

8 minutes ago, JorgeB said: "The diags you posted don't show any parity sync, but even if they did it's not possible to know where the errors come from."

Sorry for being a noob, I just followed the guide to download diagnostics. Could you please let me know where I could find these logs?
JorgeB Posted September 14, 2021

Syslog starts over after every reboot.
adamreid Posted September 14, 2021 (Author)

Ughhh. One of the breakers in our house tripped last night, so I shut everything down (gracefully) before resetting the RCD, and I didn't see the results of my parity check until this afternoon. I'll watch for this happening again and then check the syslog. Thank you.
trurl Posted September 14, 2021

7 hours ago, adamreid said: "found and fixed 142 errors"

Why were you running a correcting parity check? You should run non-correcting parity checks until you determine you have sync errors that need to be corrected. If you always run correcting checks, you may only discover that another disk is causing problems after it has already corrupted parity.
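The risk with always-correcting checks can be sketched the same way. This is a toy model assuming single XOR parity, with hypothetical helpers `parity_check` and `rebuild`; it is not how Unraid is implemented, only an illustration of the reasoning.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def parity_check(data_blocks, parity_block, correct):
    """Count mismatching bytes; if correct=True, return parity rewritten to match the data."""
    expected = xor_blocks(data_blocks)
    errors = sum(e != p for e, p in zip(expected, parity_block))
    return errors, (expected if correct else parity_block)


def rebuild(data_blocks, parity_block, missing):
    """Reconstruct one disk from parity plus all the other disks."""
    others = [b for i, b in enumerate(data_blocks) if i != missing]
    return xor_blocks(others + [parity_block])


good = [b"\xaa\xbb", b"\x11\x22"]
parity = xor_blocks(good)

# Disk 1 silently returns one wrong byte.
bad = [good[0], b"\x11\x23"]

errors_nc, parity_nc = parity_check(bad, parity, correct=False)
errors_c, parity_c = parity_check(bad, parity, correct=True)

# Non-correcting: parity is untouched, so disk 1 can still be rebuilt correctly.
rebuilt_nc = rebuild(bad, parity_nc, missing=1)
# Correcting: parity now agrees with the bad data, so a rebuild reproduces it.
rebuilt_c = rebuild(bad, parity_c, missing=1)

print(errors_nc, rebuilt_nc == good[1])
print(errors_c, rebuilt_c == good[1])
```

Both checks report the same single error, but only the non-correcting one leaves parity in a state from which the original contents of disk 1 can still be recovered.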
adamreid Posted September 14, 2021 (Author)

1 hour ago, trurl said: "Why were you running a correcting parity check? You should run non-correcting parity checks until you determine you have sync errors that need to be corrected. You don't want to discover that you have another disk obviously causing problems and corrupting parity because you always run correcting checks."

You know what, I have no idea. I must've set 'Write corrections to parity disk' to yes without thinking about it years ago. Should I set this to no?
ChatNoir Posted September 14, 2021

1 hour ago, adamreid said: "Should I set this to no?"

Yes.
adamreid Posted September 17, 2021 (Author, edited)

OK, so I set it to no and re-ran the check. My parity1 drive is now showing 493 errors and Unraid reports the following:

Last check completed on Fri 17 Sep 2021 04:11:25 PM AEST (today)
Finding 5789 errors
Duration: 19 hours, 7 minutes, 57 seconds. Average speed: 87.1 MB/sec

unraid-diagnostics-20210917-1615.zip

Could this be from a failing SAS-to-SATA cable or my HBA failing? I suppose it could be the disk, but it seems random that I've had two fail like this in the last month. At one stage during the check it also slowed down to single-digit megabytes per second for a while. How should I proceed from here?

Edit: I'm a fucking idiot and probably didn't un-tick write corrections to parity when I ran the check; I only un-ticked it in the scheduler.

Edited September 17, 2021 by adamreid
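As a rough sanity check on the reported figures, duration times average speed gives the amount of data the check covered. This sketch assumes Unraid's "MB" means decimal megabytes and that the average applies to the whole run; the drive sizes are not stated in the thread, so the result is only an estimate.

```python
# Figures copied from the parity check report above.
hours, minutes, seconds = 19, 7, 57
avg_speed_mb_s = 87.1  # assumed to be decimal MB (10**6 bytes) per second

duration_s = hours * 3600 + minutes * 60 + seconds
covered_tb = duration_s * avg_speed_mb_s / 1_000_000  # MB -> TB (decimal)

print(f"{duration_s} s at {avg_speed_mb_s} MB/s covers about {covered_tb:.2f} TB")
```

That works out to roughly 6 TB, so the overall average is what a full pass over a parity disk of about that size would give, even though the check dropped to single-digit MB/s for part of the run.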
JorgeB Posted September 17, 2021

RAM is still overclocked. After fixing that, run a couple of correcting checks without rebooting and post new diags.

On 9/14/2021 at 9:15 AM, JorgeB said: "you're overclocking your RAM and that is known to corrupt data with some Ryzen servers, see here for max officially supported speed for your config."