Machine Check Events



Hello and Happy New Year!

I was having horrible issues with my server on a cheap Chinese mobo with an older Xeon E5, so I decided to migrate to the Asus/Ryzen setup I had previously been using in my Windows machine.  I have a second Unraid server that I use to back up my more important media; it has been running on random ancient AMD parts I had lying around and has been bulletproof.  I swapped everything over, plugged in the USB drive, and like magic it booted and worked like a champ instantly!  I was so impressed.  It's much more responsive, and I'm now able to remotely stream 4K reliably.  The server was stable through the night into this afternoon.

However, it looks like it crashed and rebooted itself this afternoon.  Fix Common Problems showed a "Machine Check Events" warning indicating a hardware problem, and told me to add the mcelog plugin from Nerd Pack and post diagnostics here.  As I'm typing this, the server froze again and I had to force a shutdown.  It was a joyous 18 hours or so.  Hopefully someone can let me know what's going on so I can get back up and running.

Thanks in advance for your help!

tower-diagnostics-20210109-1347.zip


The MCE happened during initialization of the CPU cores.  This happens on occasion with certain combinations of CPUs and motherboards, is nothing to be particularly worried about, and can be safely ignored.  Alternatively, your server is involved in a conspiracy theory.  Take your pick ;)


Thanks Squid! That's good news.  But I'm not sure why it froze up again.  I did see a line about receiving its own packet on BR0, which was a common issue in the last configuration.  It seems this NIC may not handle the 802.3ad aggregation.  I have the identical NIC in my other Unraid server on the same switch with the same settings and never have an issue.  I had to undo the bonding, and for now it seems stable.  Is there a way to update the NIC or confirm BIOS compatibility with it?  Any other way to troubleshoot this?
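
One low-effort way to keep an eye on both symptoms mentioned so far is to scan a saved copy of the syslog for the two kinds of lines: machine check reports and the bridge "own address" warning. Below is a minimal Python sketch, not an official Unraid tool. The function name `scan_syslog` and the two regular expressions are my own assumptions about how these kernel messages are usually worded, so adjust them to match what your log actually shows.

```python
import re

# Assumed patterns (not taken from the posted diagnostics):
# - machine check lines usually contain "mce:" or "Machine check"
# - the bridge warning usually reads
#   "received packet on <iface> with own address as source address"
PATTERNS = {
    "machine_check": re.compile(r"mce:|machine check", re.IGNORECASE),
    "bridge_own_address": re.compile(
        r"received packet on \S+ with own address as source address"
    ),
}


def scan_syslog(lines):
    """Group matching log lines by category.

    `lines` is any iterable of strings, for example an open text file.
    Returns a dict mapping each category name to its matching lines.
    """
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits
```

To use it, extract the plain-text syslog from a diagnostics zip and pass the open file, e.g. `scan_syslog(open(path, errors="replace"))` where `path` is wherever you saved it. If the bridge warning count drops to zero after undoing the bond, that at least supports the idea that the aggregation setup was involved, though it does not prove it caused the freezes.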

4 hours ago, Squid said:

The MCE happened during initialization of the CPU cores.  This happens on occasion with certain combinations of CPUs and motherboards, is nothing to be particularly worried about, and can be safely ignored.  Alternatively, your server is involved in a conspiracy theory.  Take your pick ;)

I changed my mind... it's definitely possessed.  It won't stay up for more than an hour or two at a time now.  I removed just about every plugin and turned off all script schedules.  I keep the log window open since I won't be able to pull a diagnostic log after the fact, and no errors pop up at all before it freezes.  I attached another diagnostics file after this latest reboot, although I don't know if it will be helpful.

tower-diagnostics-20210109-2014.zip
