July 30, 2022 Hi all, thank you for your help! My Unraid build hasn't been the most stable: since I built it last week, it has gone unresponsive twice. I'm experimenting with turning off "Aggressive Link Power Management", which might help. But I've also been receiving "Machine Check Error" warnings, which may be the true source of the flakiness. I first noticed the warning yesterday, and then a new one popped up overnight during a parity build. (In case the mcelog data isn't included, I copied the output here: https://pastebin.com/V4UzpeL3) donatobox-diagnostics-20220730-1012.zip Edited July 30, 2022 by donatobox
July 30, 2022 mcelog is reporting memory errors: "SOCKET 0 CHANNEL 0 DIMM any corrected memory errors: 1 total 1 in 24h". You should replace the DIMM.
August 3, 2022 Author I replaced the DIMM, but I keep getting the same warning for the same DIMM slot. I'm going to try a different slot next (even though the Supermicro manual recommends against skipping the first slot).
August 3, 2022 Maybe try reseating the CPU and checking the pins. A bad contact can sometimes cause RAM issues, as the memory controller is in the CPU. Or maybe the motherboard slot itself is faulty?
August 4, 2022 Author It's been about 24 hours without any MCE errors, so I'm hopeful that using a different slot fixed the problem. But I'll keep checking for a few more days before I get complacent.
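For anyone who wants to keep an eye on the counters over several days rather than waiting for a notification, here is a minimal Python sketch that pulls the corrected-error counts out of text shaped like the mcelog summary quoted earlier in the thread. The regular expression, the `DimmErrors` type, and the `parse_corrected_errors` function are my own illustrative names, and the pattern assumes the summary always has the "SOCKET / CHANNEL / DIMM ... corrected memory errors: N total N in 24h" shape seen here; real mcelog output may differ, so treat it as a starting point only.

```python
import re
from typing import List, NamedTuple


class DimmErrors(NamedTuple):
    """Corrected-error counters for one reported DIMM location (hypothetical type)."""
    socket: int
    channel: int
    dimm: str
    total: int
    last_24h: int


# Assumed shape, based only on the summary quoted in this thread.
# \s+ also matches newlines, so the fields may be on one line or several.
PATTERN = re.compile(
    r"SOCKET\s+(\d+)\s+CHANNEL\s+(\d+)\s+DIMM\s+(\S+)\s+"
    r"corrected memory errors:\s+(\d+)\s+total\s+(\d+)\s+in\s+24h"
)


def parse_corrected_errors(text: str) -> List[DimmErrors]:
    """Return one DimmErrors entry per matching summary block in the text."""
    return [
        DimmErrors(int(sock), int(chan), dimm, int(total), int(day))
        for sock, chan, dimm, total, day in PATTERN.findall(text)
    ]


if __name__ == "__main__":
    sample = "SOCKET 0 CHANNEL 0 DIMM any corrected memory errors: 1 total 1 in 24h"
    for entry in parse_corrected_errors(sample):
        print(entry)
```

Saving the summary text to a file once a day and running it through something like this would show whether the 24h counter stays at zero after the slot change.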