prymordial Posted February 27, 2022 (edited)
Hey everyone! Fairly new to unRAID and looking for some guidance. My scheduled parity check with correcting enabled found 320 errors. I'm not sure what kind of errors they are, as my disks are all showing 0 errors in the Main tab. I'm re-running a non-correcting check now to validate. Any information or guidance would be greatly appreciated!
deathstar-diagnostics-20220227-1625.zip
trurl Posted February 27, 2022
8 minutes ago, prymordial said: "scheduled parity check with correcting"
You should set the scheduled check to not correct. You only want to correct when there are sync errors that you know aren't caused by some other problem. Don't corrupt parity when another disk has a bad connection, for example. On mobile now so can't look at Diagnostics.
trurl Posted February 27, 2022
You have csum errors on cache. Have you done memtest? Bad RAM might also explain sync errors. You shouldn't even try to run a computer if the RAM isn't perfect.
prymordial Posted February 28, 2022 (Author)
22 minutes ago, trurl said: "You have csum errors on cache. Have you done memtest? Bad RAM might also explain sync errors. You shouldn't even try to run a computer if the RAM isn't perfect."
I have not. Although, I put 2 new kits of memory in the server about a month ago. I haven't noticed any issues with the server outside of the errors that the parity check called out recently.
trurl Posted February 28, 2022
1 hour ago, prymordial said: "I have not"
You should.
trurl Posted February 28, 2022
3 hours ago, prymordial said: "found 320 errors"
Probably because:
Jan 29 15:43:02 Deathstar emhttpd: unclean shutdown detected
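
For anyone wanting to look for the same message in their own diagnostics, here is a minimal Python sketch. It assumes the syslog has already been read into a string and that lines follow the format of the one quoted above. The function name `find_unclean_shutdowns` is made up for illustration and is not part of unRAID.

```python
import re

# Matches syslog-style lines shaped like the one quoted in this thread:
#   "Jan 29 15:43:02 Deathstar emhttpd: unclean shutdown detected"
# Assumption: other unRAID versions may word or prefix this message differently.
UNCLEAN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"emhttpd: unclean shutdown detected"
)


def find_unclean_shutdowns(log_text):
    """Return a (timestamp, host) tuple for every unclean-shutdown line."""
    hits = []
    for line in log_text.splitlines():
        match = UNCLEAN.match(line.strip())
        if match:
            hits.append((match.group("stamp"), match.group("host")))
    return hits


# The only log line used here is the one posted in this thread.
sample = "Jan 29 15:43:02 Deathstar emhttpd: unclean shutdown detected"
print(find_unclean_shutdowns(sample))
```

A hit dated before the first failing parity check would be consistent with the explanation given in this post, although it does not rule out other causes such as bad RAM.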
prymordial Posted February 28, 2022 (Author)
1 hour ago, trurl said: "You should"
Will do! And thanks for the help. Do you think that the unclean shutdown could have caused these memory issues?
trurl Posted February 28, 2022
Unclean shutdowns often cause a small number of sync errors. Do you know why you had an unclean shutdown?
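
To illustrate why an unclean shutdown can leave sync errors behind, here is a toy Python sketch. It assumes a single XOR parity disk and tiny four-byte "disks"; the real check works on whole drives and its internals are not shown in this thread. The names `xor_parity` and `parity_check` are hypothetical.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR the same-position bytes of every data disk into one parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_check(data_disks, parity, correct=False):
    """Count byte positions where stored parity disagrees with the data.

    With correct=True the stored parity is overwritten to match the data,
    which is only safe when the data disks themselves can be trusted.
    """
    expected = xor_parity(data_disks)
    errors = sum(1 for e, p in zip(expected, parity) if e != p)
    if correct:
        parity[:] = expected
    return errors


# Toy array: three data disks of four bytes each, plus one parity disk.
disks = [
    bytearray(b"\x01\x02\x03\x04"),
    bytearray(b"\x10\x20\x30\x40"),
    bytearray(b"\xaa\xbb\xcc\xdd"),
]
parity = bytearray(xor_parity(disks))
assert parity_check(disks, parity) == 0

# Simulated power cut: two bytes reach a data disk, the parity update is lost.
disks[1][0:2] = b"\xff\xee"
errors_found = parity_check(disks, parity)                # non-correcting pass
errors_fixed = parity_check(disks, parity, correct=True)  # correcting pass
errors_after = parity_check(disks, parity)                # verification pass
print(errors_found, errors_fixed, errors_after)
```

The correcting pass simply trusts the data disks. That is fine after a known power cut, but it is the reason a scheduled check should not correct: if the mismatch comes from bad RAM or a bad connection, the same step would write wrong parity.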
prymordial Posted February 28, 2022 (Author)
2 hours ago, trurl said: "Unclean shutdowns often cause a small number of sync errors. Do you know why you had an unclean shutdown?"
Yes I do. I'm an idiot and cut the power to that section of the house and completely forgot about the server. The second parity check finished this morning and is showing 549 errors now. I just did a clean shutdown to start the memtest. Also, I believe I found what was causing the memory errors. I had done some OCing on my RAM while this mobo was still being used as my gaming rig. I thought I had applied the optimized defaults, but I guess not. I reset those back to default, applied XMP, and am now running a memtest. Will follow up with my findings in the next day or so. Thanks again!
itimpi Posted February 28, 2022
Just now, prymordial said: "applied XMP"
XMP is an overclock.
prymordial Posted February 28, 2022 (Author)
4 minutes ago, itimpi said: "XMP is an overclock."
I know, and I should have clarified a bit more 😅 I had done some manual tuning to the timings and clock speeds on an older set of memory while I was still using those parts in my gaming PC. I then moved those parts over to my new server build. Mid-January (prior to the unclean shutdown), I bought two 32GB kits of memory to add to my server as a replacement for the old 16GB kit. I was in the BIOS, thought I had hit apply optimized defaults, and shut down the server. I dropped in the new kits and thought everything was right as rain. Jumping into the BIOS again this morning, I discovered those manual timings were still being applied as if it were my previous set of memory. That's when I realized I didn't load the defaults when I replaced the memory, haha. I reset everything back to stock settings in the mobo, made the adjustments I needed (boot order, fan curve, etc.) and applied the XMP profile to the current set of memory so that all the timings were set to the specs that the kit is rated for. Hope that clarifies a bit more about the steps that I've taken to get to this point. Right now, the memtest has been running for about 30 minutes and is showing as 16% passed with no errors. I plan on running this test for 24 hours to validate that everything is OK with my memory.
trurl Posted February 28, 2022
16 minutes ago, prymordial said: "cut the power"
Get a UPS.
prymordial Posted February 28, 2022 (Author, edited)
4 minutes ago, trurl said: "Get a UPS"
Way ahead of you. I bought a UPS the day that I cut the power like a fucking idiot 😆 It's all set up to cleanly shut the server down. I don't remember what I set it to, but I do know that it's all configured and I'm able to monitor it from the web UI.
JonathanM Posted February 28, 2022
1 hour ago, prymordial said: "applied the XMP profile to the current set of memory so that all the timings were set to the specs that the kit is rated for."
Many times the RAM is rated for a higher speed than the CPU / chipset is. You should verify that your motherboard / CPU is rated to run at those speeds for that combination of memory size and layout. The motherboard will happily try to run at whatever speeds you tell it, regardless of whether it's stable. Usually it's the CPU that's the speed-limiting factor, NOT the memory chips.
prymordial Posted February 28, 2022 (Author)
55 minutes ago, JonathanM said: "Many times the RAM is rated for a higher speed than the CPU / chipset is. You should verify that your motherboard / CPU is rated to run at those speeds for that combination of memory size and layout. The motherboard will happily try to run at whatever speeds you tell it, regardless of whether it's stable. Usually it's the CPU that's the speed-limiting factor, NOT the memory chips."
This is looking more and more to be my issue, as I have NEVER run my Ryzen 2700 + Asus B450-F with 64GB and all 4 slots filled. After about 90 minutes of memtest, I started getting tons of errors. I stopped the test and dropped the speed to 'Auto', which runs at 2666 MHz. The new memtest has been running for about 90 minutes now with 0 errors. Fingers crossed I'm on the right path! Thanks again everyone for the help and for letting me think out loud.
JorgeB Posted February 28, 2022
3 minutes ago, prymordial said: "dropped the speed to 'Auto' which runs at 2666 MHz."
That's still above the maximum supported speed and is known to corrupt data in some cases (see here). For that config it's 2133 or 1866 MT/s, depending on the number of ranks.
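
The check described here boils down to a table lookup. Below is a small Python sketch of it, using only the two limits mentioned in this post for the four-DIMM Ryzen 2700 setup. Mapping 2133 to single-rank and 1866 to dual-rank is an assumption, consistent with the 1866 figure the thread settles on for the dual-rank kit. The names are hypothetical, and other CPUs or slot layouts are not covered.

```python
# Maximum supported speeds (MT/s) with all four slots filled, as quoted in
# this thread for the Ryzen 2700 setup. Assumption: 2133 applies to
# single-rank DIMMs and 1866 to dual-rank DIMMs.
MAX_SPEED_4_DIMMS = {"single": 2133, "dual": 1866}


def exceeds_supported_speed(configured_mts, rank):
    """Return True if the configured speed is above the quoted maximum."""
    return configured_mts > MAX_SPEED_4_DIMMS[rank]


# Compare the speeds discussed in this thread against the dual-rank limit.
for speed in (2666, 2133, 1866):
    print(speed, exceeds_supported_speed(speed, "dual"))
```

A clean memtest pass at a higher speed does not change the result of this lookup; the limit is about what the CPU's memory controller is specified for, not about what happens to pass a given test run.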
prymordial Posted February 28, 2022 (Author)
Well shit. That shows I have a ton to learn still. Looks like I'm going to need to drop the speed down to 1866, as this Crucial kit is dual rank. That being said, @JorgeB, should I disable C-states (or look for the applicable setting) just to be on the safe side?
JorgeB Posted March 1, 2022
14 hours ago, prymordial said: "should I disable C-states (or look for the applicable setting) just to be on the safe side?"
Make sure the correct Power Supply Idle Control is set; this way you can still have C-States enabled to improve power consumption.
prymordial Posted March 2, 2022 (Author)
Here's where I'm at with this so far: I corrected my memory speed to 1866 per the link that JorgeB provided. I set the "Power Supply Idle Control" to "Typical Current Idle." Once I set the memory to 1866, I let memtest bake in for another 12 hours and it returned 0 errors (I only ran this second round of testing because at 2666 I had no reported errors). The server is back up and found 154 errors with a non-correcting check, and I'm now running a correcting check. I've also been monitoring the logs for any additional csum errors and have not seen any. I plan to run another non-correcting check this evening to validate that the errors have been resolved. Thanks again for everyone's help!
prymordial Posted March 8, 2022 (Author)
Sorry for the delay y'all, but here is where I landed. My latest parity check has reported 0 errors after fixing them. I reduced my memory speed down to 1866 per the link above. Additionally, I believe I found what was causing the csum errors on my cache drive: a corrupted file. I deleted the file from the cache drive and have yet to see any further errors of this nature. I think my server is finally in a good, stable state.
JorgeB Posted March 9, 2022
The corrupted file on cache was likely the result of the same RAM issue; if that's fixed, there shouldn't be any more.