Lawlanator Posted July 23, 2018

So I just upgraded my board from an i7 build to a SuperMicro build in a rack mount, but since then UnRaid has been hard-locking, seemingly at random. It lasted 3 days once, then 8 hours, then only a few hours (it happened while watching Plex), then almost another day before it locked up in the middle of the night. The only intensive thing that was running at the time, or should have been running, was a pre-clear on two disks I'm trying to add to the array.

tower-diagnostics-20180723-0213.zip

This is the last diagnostics I have before the most recent crash; it was the latest thing in my syslog file.

So I tried to do some digging, as I'm assuming it's a RAM issue. I couldn't get memtest to work on the server (at boot, I'd scroll down to memtest and it would just keep trying to start and then return me to the menu). Then I found out ECC RAM doesn't actually work well with MemTest, so I went into the SEL log. It was throwing single-bit DIMM errors on one slot every minute. I pulled that DIMM out, reconfigured, and am giving it time to test. I did an extended run of Fix Common Problems and it threw an MCE error and told me to run for the hills and hope someone smarter than I am can help. I've done so; the diagnostics are attached here: tower-diagnostics-20180723-0612.zip. And when I check my log, it's still throwing memory errors.

Mobo: Supermicro - X9DRi-LN4+
CPU: 2x Xeon® CPU E5-2660 v2 @ 2.20GHz
RAM: Currently 24GB Multi-bit ECC
Cache: 1TB ADATA SSD

The new additions are the Mobo, CPU, RAM, LSI-9811, and an SAS Expander. For drives, I upgraded the SSD and added a larger parity drive (the parity built fine; I just can't clear the other two drives to add them, or the clear just doesn't finish before the server locks up). Currently, I'm still trying to run preclear on one of the two HDDs and it can't get past the starting phase, so I've cancelled that.

I'm not really sure what else might be the issue, and I don't know what to try besides letting it stay up and testing the RAM one stick at a time. It all seems a bit above my google-fu. Any help is mightily appreciated.
Vr2Io Posted July 23, 2018 (edited)

Any memory errors must be fixed first. I recently added more RAM, bringing my system to 48GB (non-ECC), and I couldn't boot into the memtest+ tool either. I tested the memory thoroughly under Windows instead and, after confirming that no issues were found, just started unRAID as usual.

Edited July 23, 2018 by Benson
Lawlanator Posted July 24, 2018 Author (edited)

So I took it down to just 2 sticks of RAM for the time being (I didn't know whether it would be able to function with 2 CPUs without two sticks). I didn't see any memory errors in the log like before, but my server froze up once again. It was unresponsive even at the physical console, so I had to hard-power it off.

So, any ideas? I pulled the diagnostics from it again and can't see what the issue might be: tower-diagnostics-20180724-0340.zip

At this point I'm debating installing on a new USB and seeing if it still locks up over time. At least then I'd know it was definitely hardware, but because it locks up completely I'm not sure where to look for the cause. I can't just sit and stare at the log all day and hope something pops up. I've put Fix Common Problems back in Troubleshooting Mode, so I hope it catches something this time.

Edited July 24, 2018 by Lawlanator
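Since watching the log by hand isn't practical, one option is to filter a saved syslog for memory-related lines after each lockup. Below is a minimal Python sketch of that idea. The search patterns are assumptions based on strings the Linux kernel commonly uses for machine-check and ECC reports (`mce:`, `EDAC`, `[Hardware Error]`), and the function names `scan_log_lines` and `scan_file` are made up for illustration. None of this is taken from the attached diagnostics, so adjust the patterns to whatever your own log actually shows.

```python
import re
from collections import Counter

# Assumed patterns: typical kernel markers for machine-check (MCE) and
# ECC/EDAC reports. They are illustrative, not an exhaustive list.
PATTERNS = {
    "mce": re.compile(r"\bmce:|machine check", re.IGNORECASE),
    "edac": re.compile(r"\bEDAC\b"),
    "hardware_error": re.compile(r"\[Hardware Error\]"),
}


def scan_log_lines(lines):
    """Return (counts, hits) for an iterable of log lines.

    counts: Counter mapping each pattern name to the number of matching lines.
    hits:   the matching lines, in original order, without trailing newlines.
    """
    counts = Counter()
    hits = []
    for line in lines:
        matched = [name for name, rx in PATTERNS.items() if rx.search(line)]
        if matched:
            counts.update(matched)
            hits.append(line.rstrip("\n"))
    return counts, hits


def scan_file(path):
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as fh:
        return scan_log_lines(fh)
```

For example, `counts, hits = scan_file("syslog.txt")` on a syslog copied out of a diagnostics zip (the file name here is hypothetical) would give a per-pattern count plus the matching lines to post in the thread. A hard lock can stop logging before anything useful is written, so an empty result does not rule out a hardware fault.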