CPU Machine Check Error


joeskii


Version: Unraid 6.5.3

 

I'm getting this hardware error on my CPU. What does it mean? I've attached my logs as well.

 

Thank you for your help!

Sep  5 12:16:04 Tower kernel: mce: [Hardware Error]: Machine check events logged
Sep  5 12:16:04 Tower kernel: mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 8: cc0004c00001009f
Sep  5 12:16:04 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR c957ef740 MISC 102040800016c4c 
Sep  5 12:16:04 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1536174945 SOCKET 1 APIC 20 microcode a

 

einstein-diagnostics-20180906-0842.zip

Sep  7 18:36:15 einstein kernel: smpboot: CPU0: Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz (family: 0x6, model: 0x2c, stepping: 0x2)
Sep  7 18:36:15 einstein kernel: Performance Events: PEBS fmt1+, Westmere events, 16-deep LBR, Intel PMU driver.
Sep  7 18:36:15 einstein kernel: core: CPUID marked event: 'bus cycles' unavailable
Sep  7 18:36:15 einstein kernel: ... version:                3
Sep  7 18:36:15 einstein kernel: ... bit width:              48
Sep  7 18:36:15 einstein kernel: ... generic registers:      4
Sep  7 18:36:15 einstein kernel: ... value mask:             0000ffffffffffff
Sep  7 18:36:15 einstein kernel: ... max period:             000000007fffffff
Sep  7 18:36:15 einstein kernel: ... fixed-purpose events:   3
Sep  7 18:36:15 einstein kernel: ... event mask:             000000070000000f
Sep  7 18:36:15 einstein kernel: Hierarchical SRCU implementation.
Sep  7 18:36:15 einstein kernel: smp: Bringing up secondary CPUs ...
Sep  7 18:36:15 einstein kernel: x86: Booting SMP configuration:
Sep  7 18:36:15 einstein kernel: .... node  #1, CPUs:        #1
Sep  7 18:36:15 einstein kernel: mce: [Hardware Error]: Machine check events logged
Sep  7 18:36:15 einstein kernel: mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 8: cc0004c00001009f
Sep  7 18:36:15 einstein kernel: mce: [Hardware Error]: TSC 0 ADDR c79edeb80 MISC 102040800016040 
Sep  7 18:36:15 einstein kernel: mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1536370556 SOCKET 1 APIC 20 microcode a
Sep  7 18:36:15 einstein kernel: .... node  #0, CPUs:    #2
Sep  7 18:36:15 einstein kernel: .... node  #1, CPUs:    #3
Sep  7 18:36:15 einstein kernel: .... node  #0, CPUs:    #4
Sep  7 18:36:15 einstein kernel: .... node  #1, CPUs:    #5
Sep  7 18:36:15 einstein kernel: .... node  #0, CPUs:    #6
Sep  7 18:36:15 einstein kernel: .... node  #1, CPUs:    #7
Sep  7 18:36:15 einstein kernel: .... node  #0, CPUs:    #8
Sep  7 18:36:15 einstein kernel: .... node  #1, CPUs:    #9
Sep  7 18:36:15 einstein kernel: .... node  #0, CPUs:   #10
Sep  7 18:36:15 einstein kernel: .... node  #1, CPUs:   #11
Sep  7 18:36:15 einstein kernel: smp: Brought up 2 nodes, 12 CPUs
Sep  7 18:36:15 einstein kernel: smpboot: Total of 12 processors activated (57598.65 BogoMIPS)

Does this mean anything to anyone? I found it in my syslog; the line in question is "CPU 1: Machine Check: 0 Bank 8: cc0004c00001009f".
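For anyone who wants to pull the fields out of these lines before interpreting them, here is a minimal Python sketch. The regular expressions and the `parse_mce` helper are hypothetical, written only against the line format shown in this thread, and the bit positions used for the processor signature follow the usual CPUID layout (an assumption worth checking against Intel's documentation for your CPU).

```python
import re
from datetime import datetime, timezone

# Two of the lines posted above, copied from the first syslog excerpt.
LINES = [
    "Sep  5 12:16:04 Tower kernel: mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 8: cc0004c00001009f",
    "Sep  5 12:16:04 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1536174945 SOCKET 1 APIC 20 microcode a",
]

# Hypothetical patterns matching only the format seen in this thread.
BANK_RE = re.compile(r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-f]+)")
PROC_RE = re.compile(r"PROCESSOR \d+:([0-9a-f]+) TIME (\d+) SOCKET (\d+)")


def parse_mce(lines):
    """Collect CPU, bank, status, CPU signature and timestamp from mce lines."""
    info = {}
    for line in lines:
        m = BANK_RE.search(line)
        if m:
            info["cpu"] = int(m.group(1))
            info["bank"] = int(m.group(2))
            info["status"] = int(m.group(3), 16)
        m = PROC_RE.search(line)
        if m:
            sig = int(m.group(1), 16)
            # Assumed CPUID signature layout: stepping [3:0], model [7:4],
            # family [11:8], extended model [19:16]. The extended model is
            # folded in here, which is how family 6 parts are numbered.
            info["stepping"] = sig & 0xF
            info["family"] = (sig >> 8) & 0xF
            info["model"] = (((sig >> 16) & 0xF) << 4) | ((sig >> 4) & 0xF)
            # TIME is treated as Unix epoch seconds.
            info["time_utc"] = datetime.fromtimestamp(int(m.group(2)), tz=timezone.utc)
            info["socket"] = int(m.group(3))
    return info


info = parse_mce(LINES)
print(info["cpu"], info["bank"], hex(info["status"]))
print(info["family"], hex(info["model"]), info["stepping"], info["time_utc"].isoformat())
```

The decoded signature (family 6, model 0x2c, stepping 2) lines up with the "family: 0x6, model: 0x2c, stepping: 0x2" that the kernel prints for the Xeon E5645 in the boot log above.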


I have encountered Machine Check Exceptions (the "mce:" lines) in the past, and for me they have always been caused by a failing ECC memory chip. Memtest reports the memory as OK because, as far as it is concerned, correct values are being written and read back. Behind the scenes, though, the ECC hardware has had to correct bits on the chip, and that raises an MCE that Memtest doesn't detect or isn't hooked into.

I have a machine that is currently generating MCEs. If I run Windows on it, I can't tell it's happening, but if I run Linux I see the errors occasionally. They don't happen often: the machine has 256GB of ECC RAM, so it's not often using the bit of RAM that's 'iffy'.

This has been my experience, although there could be other reasons MCEs are being raised. 🤔
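As a rough cross-check of the ECC explanation, the status word from the log can be decoded by hand. The sketch below assumes the Bank 8 value is a standard Intel IA32_MCi_STATUS register and uses the flag positions and the memory-controller error-code pattern (`000F 0000 1MMM CCCC`) as I understand them from the Intel SDM. The `decode_status` helper is hypothetical, and the corrected-error count is only meaningful if the CPU actually reports that field, so verify against the SDM for your processor.

```python
STATUS = 0xCC0004C00001009F  # "Bank 8" value from the posted log

# Assumed IA32_MCi_STATUS flag bit positions (from the Intel SDM).
FLAG_BITS = {
    "VAL": 63,    # register contents are valid
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error
    "EN": 60,     # error reporting enabled
    "MISCV": 59,  # MISC register holds extra info
    "ADDRV": 58,  # ADDR register holds an address
    "PCC": 57,    # processor context corrupt
}

# Memory-transaction types for the MMM field of a memory-controller code.
MEM_TRANSACTION = {
    0: "generic/undefined",
    1: "memory read",
    2: "memory write",
    3: "address/command",
    4: "memory scrubbing",
}


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    out = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    # Bits 52:38: corrected error count (assumes the CPU supports this field).
    out["corrected_count"] = (status >> 38) & 0x7FFF
    code = status & 0xFFFF  # bits 15:0: MCA error code
    out["mca_code"] = code
    # Memory-controller pattern 000F 0000 1MMM CCCC; the mask ignores F (bit 12).
    if (code & 0xEF80) == 0x0080:
        out["kind"] = "memory controller error"
        out["transaction"] = MEM_TRANSACTION.get((code >> 4) & 0x7, "reserved")
        channel = code & 0xF
        out["channel"] = None if channel == 0xF else channel  # 0xF: not specified
    else:
        out["kind"] = "other (not decoded in this sketch)"
    return out


for key, value in decode_status(STATUS).items():
    print(f"{key}: {value}")
```

Under those assumptions, UC and PCC come out clear and the error code matches the memory-controller pattern with a read transaction. That is what a corrected memory read error looks like, which fits the failing-ECC-module experience described above.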

  • 9 months later...
Replying to binky's post of 9/10/2018, 11:15 AM:

I can confirm this! Thank you for the information.

I have a Windows workstation, and I started noticing WHEA-related errors in the event log (one every minute), yet the system ran smoothly without crashes for days. Memtest showed no errors for any of the ECC RAM modules.

When I installed Unraid, I noticed errors related to CPU hardware and memory. After I removed the faulty RAM module, neither error appears anymore.

Replying to ghost82's post from 25 minutes earlier:

Good to know. I used to get these MCE errors on my previous servers, but they didn't cause any issues, so I ignored them.
