May 8, 2024
I built a new server in December 2023, and it has been crashing ever since, and I can't find the reason. Sometimes it runs for a month; other times it's less than an hour. I've run memory tests and recently reduced it to a single RAM stick so I could rotate through them. The server crashes with each of the 4 sticks when installed on its own. For the latest test, I removed the Nvidia drivers and my P4000 and ran in safe mode with the VM and Docker services disabled. It ran for just over 7 days. Crash errors are not being saved to the syslog. Here are some of the errors (pictures); for the most recent crash, I didn't get to the server before the monitor turned off, I guess. I think I've tried every recommendation I could find in the forums. Any assistance or guidance would be appreciated.
ozunraidold-diagnostics-20240508-1159.zip syslog-192.168.21.90.log
May 8, 2024, Community Expert, marked as Solution
Apr 30 21:15:54 Ozunraid kernel: mce: [Hardware Error]: Machine check events logged
Apr 30 21:15:55 Ozunraid kernel: mce: [Hardware Error]: Machine check events logged
These, together with all the other crashes without an apparent reason, suggest a hardware issue. If it's not the RAM, it could be the board or the CPU.
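To see how often these machine check lines show up and whether they cluster before a crash, a quick count per day can help. This is a minimal sketch, assuming a plain-text syslog (like the attached syslog-192.168.21.90.log) whose lines start with the classic "Mon DD HH:MM:SS host" timestamp seen above; the helper name count_mce_by_day is hypothetical. It only counts the kernel's summary lines. Decoding the actual bank/status values needs a tool such as mcelog or rasdaemon, if one is available on the system.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Matches the start of a classic syslog line, e.g.
# "Apr 30 21:15:54 Ozunraid kernel: mce: [Hardware Error]: ..."
# Assumption: the log uses this timestamp format (single-digit days may be space-padded).
LINE_RE = re.compile(r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s")

MCE_MARKER = "mce: [Hardware Error]"


def count_mce_by_day(lines: Iterable[str]) -> Counter:
    """Count kernel MCE hardware-error lines per 'Mon D' date (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        if MCE_MARKER not in line:
            continue
        m = LINE_RE.match(line)
        # Lines without a recognizable timestamp are still counted, under "unknown".
        key = f"{m.group('month')} {int(m.group('day'))}" if m else "unknown"
        counts[key] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_mce.py syslog-192.168.21.90.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        # Counter keeps first-seen order, which is chronological for a single log file.
        for day, n in count_mce_by_day(fh).items():
            print(f"{day}: {n}")
```

A steady trickle of these lines on days without a crash would point the same way as the reply above: toward a hardware fault rather than a software one.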
July 18, 2024, Author
After swapping around every hardware component, and a few false hopes of finding a resolution, I had eliminated everything except the processor. I RMAed the processor, and the server has now been running for a week without a crash. It had gotten so bad that it was crashing every 1-2 days, and I was also seeing errors in the logs that were not causing crashes. The logs have been clean for the last 7 days as well. Glad to have a reliable server up and running again.