Machine Check Events detected on your server


Fix Common Problems (FCP) is telling me that I have a hardware error: "Machine Check Events detected on your server".

 

I downloaded the diagnostics and looked at the syslog. I can see the MCEs, but I can't make heads or tails of what they mean. Any help?

Aug  3 10:21:24 Tower kernel: mce: CPU supports 9 MCE banks
Aug  3 10:21:24 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug  3 10:21:24 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: be00000000800400
Aug  3 10:21:24 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR fffff80205102395 MISC fffff80205102395 
Aug  3 10:21:24 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1533306062 SOCKET 0 APIC 0 microcode 22
Aug  3 10:21:24 Tower kernel: Performance Events: PEBS fmt2+, Haswell events, 16-deep LBR, full-width counters, Intel PMU driver.

 

Aug  3 10:21:24 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug  3 10:21:24 Tower kernel: mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 3: be00000000800400
Aug  3 10:21:24 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR fffff80205102395 MISC fffff80205102395 
Aug  3 10:21:24 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1533306062 SOCKET 0 APIC 4 microcode 22
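For anyone skimming a syslog for the same thing, the records above can be pulled out with a short script. This is only a rough sketch: the regular expressions are assumptions based on the exact line format shown in this post, and `parse_mce` is a made-up helper name, not part of mcelog or unRAID.

```python
import re
from datetime import datetime, timezone

# Sample lines copied from the syslog excerpt above.
SAMPLE = """\
Aug  3 10:21:24 Tower kernel: mce: CPU supports 9 MCE banks
Aug  3 10:21:24 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: be00000000800400
Aug  3 10:21:24 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1533306062 SOCKET 0 APIC 0 microcode 22
Aug  3 10:21:24 Tower kernel: mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 3: be00000000800400
Aug  3 10:21:24 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1533306062 SOCKET 0 APIC 4 microcode 22
"""

# Assumed patterns, matched against the format seen in this thread only.
EVENT_RE = re.compile(r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-f]{16})")
TIME_RE = re.compile(r"\bTIME (\d+)\b")


def parse_mce(text):
    """Return one dict per 'Machine Check' line, with the TIME field attached."""
    events = []
    for line in text.splitlines():
        event = EVENT_RE.search(line)
        if event:
            events.append({
                "cpu": int(event.group(1)),
                "bank": int(event.group(2)),
                "status": int(event.group(3), 16),
                "time": None,
            })
            continue
        stamp = TIME_RE.search(line)
        if stamp and events:
            # TIME is seconds since the Unix epoch; attach it to the latest event.
            events[-1]["time"] = datetime.fromtimestamp(int(stamp.group(1)), tz=timezone.utc)
    return events


for ev in parse_mce(SAMPLE):
    print(ev["cpu"], ev["bank"], hex(ev["status"]), ev["time"])
```

The TIME value works out to 14:21:02 UTC on 3 August 2018, which lines up with the 10:21 syslog stamps if the server's clock is four hours behind UTC.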

 

tower-diagnostics-20180824-0814.zip

Link to comment

You have a hardware fault that's showing up early in the boot sequence while the multiple cores of your CPU are being initialised.
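If you want to see what the status word `be00000000800400` in those lines is made of, here is a rough sketch that unpacks it. The bit positions are the architectural IA32_MCi_STATUS layout as I understand it from Intel's Software Developer's Manual; `decode_mci_status` is just a name I picked for illustration, and mcelog will give you a fuller, model-aware decode.

```python
# Flag bits of IA32_MCi_STATUS (architectural layout per the Intel SDM).
FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting for this error was enabled
    59: "MISCV",  # MISC register holds valid extra information
    58: "ADDRV",  # ADDR register holds a valid address
    57: "PCC",    # processor context may be corrupted
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit machine check status word into its main fields."""
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


decoded = decode_mci_status(0xBE00000000800400)
print(decoded["flags"])
print(hex(decoded["mca_error_code"]), hex(decoded["model_specific_code"]))
```

For your value every flag except OVER is set, so the CPU is reporting an uncorrected error. The MCA error code comes out as 0x0400, which I believe the SDM's table of simple error codes lists as an internal timer error, but check that against the manual for your CPU before relying on it.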

 

I'd run memtest first (select it from the boot menu, or download the newer version, install it on its own separate USB flash drive and boot from that) for a good long time (48 hours, say). If it passes, consider reseating the CPU and checking for bent pins. You can install mcelog using the Nerd Pack plugin.

 

Other worthwhile things you can do include checking for a newer BIOS and updating to unRAID 6.5.3.

Link to comment

I have mcelog installed from Nerd Pack.  I have been running it for a while.

 

I can try memtest, but this is a pretty heavily used server: it runs a Win 10 VM that I game on, as well as many dockers, including Plex, which transcodes. I would think that if I had a stability or memory issue I would have run into it by now, no?

Link to comment
22 hours ago, Squid said:

Upgrading to 6.5.3 may also help as it will include later microcode updates than your 6.5.0

 

Thanks for the heads up @Squid. I didn't realize there was a new version out. I do not like the new update feature of unRAID. It used to be easy to tell when there was a new version; now I find I never get notified and can't see it.

 

I will see if the issues continue with the new version.
