MCE Errors, Just trying to confirm my suspicions!


Hey everyone! My server's been pretty upset lately with a bunch of freezes, so I finally took a look at the MCE log and saw this:

 

Apr  5 08:47:22  kernel: smpboot: CPU0: Intel(R) Xeon(R) CPU           L5640  @ 2.27GHz (family: 0x6, model: 0x2c, stepping: 0x2)
Apr  5 08:47:22  kernel: mce: [Hardware Error]: Machine check events logged
Apr  5 08:47:22  kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 8: 8c0000400001009f
Apr  5 08:47:22  kernel: mce: [Hardware Error]: TSC 0 ADDR 34dbbd500 MISC 3a40080100021083 
Apr  5 08:47:22  kernel: mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1617626815 SOCKET 0 APIC 0 microcode 1f
Apr  5 08:47:22  kernel: Performance Events: PEBS fmt1+, Westmere events, 16-deep LBR, Intel PMU driver.
Apr  5 08:47:22  kernel: core: CPUID marked event: 'bus cycles' unavailable
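For reference, the 64-bit status word on the "Bank 8" line can be unpacked by hand. The sketch below assumes the architectural IA32_MCi_STATUS layout as described in the Intel SDM (flag bits 63 to 57, corrected error count in bits 52:38, MCA error code in bits 15:0) and the compound "memory controller error" code format `000F 0000 1MMM CCCC`. The function names are illustrative and not part of any tool.

```python
# Sketch: decode the architectural fields of an IA32_MCi_STATUS value.
# Bit positions are assumed from the Intel SDM machine-check chapter.
STATUS = 0x8C0000400001009F  # value from the "Bank 8" log line

# MMM field of a memory controller error code (assumed SDM encoding)
MEM_TRANSACTIONS = {
    0b000: "generic undefined request",
    0b001: "memory read",
    0b010: "memory write",
    0b011: "address/command",
    0b100: "memory scrubbing",
}


def bit(value, n):
    """Return bit n of value as 0 or 1."""
    return (value >> n) & 1


def decode_status(status):
    """Split a status word into its flag bits and code fields."""
    return {
        "valid": bool(bit(status, 63)),
        "overflow": bool(bit(status, 62)),
        "uncorrected": bool(bit(status, 61)),
        "enabled": bool(bit(status, 60)),
        "misc_valid": bool(bit(status, 59)),
        "addr_valid": bool(bit(status, 58)),
        "context_corrupt": bool(bit(status, 57)),
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": status & 0xFFFF,
    }


def decode_memory_code(mca_code):
    """Return (transaction, channel) for a memory controller error code,
    or None if the code does not match 000F 0000 1MMM CCCC."""
    if (mca_code & 0xEF80) != 0x0080:
        return None
    transaction = MEM_TRANSACTIONS.get((mca_code >> 4) & 0x7, "reserved")
    channel = mca_code & 0xF
    # 0b1111 means the channel is not specified in the error code
    return transaction, (None if channel == 0xF else channel)


if __name__ == "__main__":
    fields = decode_status(STATUS)
    for name, value in fields.items():
        print(f"{name}: {value}")
    print("memory code:", decode_memory_code(fields["mca_code"]))
```

Under those assumptions, the word decodes to a valid, corrected (not uncorrected) event with an error count of 1, and the low 16 bits fall in the memory controller class. That would make "Bank 8" a machine-check reporting bank rather than a core or a DIMM bank.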

 

 

Now, I always had a suspicion that this CPU was a little weird. I'd have issues every once in a while, and the freezes have happened before, but just recently it's been about every other day. This happened at startup, so would it point to a faulty CPU? I ran a memory test a while ago and everything came up good. I originally thought it meant bank 8 of the memory, but now realize it's talking about the CPU itself! (CPU 0 core 8)

 

When it brings up the other CPU, it just passes through like a champ in the logs! L5640s are pretty cheap, so I don't feel bad scooping up another and giving it a go!

 

Thanks!

Apr  5 19:27:20 barkbox root: Memory ECC error occurred during scrub
Apr  5 19:27:20 barkbox root: Memory corrected error count (CORE_ERR_CNT): 1
Apr  5 19:27:20 barkbox root: Memory transaction Tracker ID (RTId): 83
Apr  5 19:27:20 barkbox root: Memory DIMM ID of error: 2
Apr  5 19:27:20 barkbox root: Memory channel ID of error: 0

You have memory issues.

 

The other MCE that you referenced is semi-normal; it happens to a fair number of users when the OS initializes the cores. That one can be safely ignored.

39 minutes ago, Squid said:

Apr  5 19:27:20 barkbox root: Memory ECC error occurred during scrub
Apr  5 19:27:20 barkbox root: Memory corrected error count (CORE_ERR_CNT): 1
Apr  5 19:27:20 barkbox root: Memory transaction Tracker ID (RTId): 83
Apr  5 19:27:20 barkbox root: Memory DIMM ID of error: 2
Apr  5 19:27:20 barkbox root: Memory channel ID of error: 0

You have memory issues.

 

The other MCE that you referenced is semi-normal; it happens to a fair number of users when the OS initializes the cores. That one can be safely ignored.

So if I'm reading that correctly, it would be CPU 0's bank of memory, slot 2?
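One way to sanity-check that reading is to pull the DIMM and channel fields out of the MISC value on the "TSC 0 ADDR ... MISC ..." line of the original log. This is only a sketch: the bit layout (RTId in bits 7:0, DIMM ID in bits 17:16, channel ID in bits 19:18, ECC syndrome in bits 63:32) is an assumption based on how mcelog's Nehalem/Westmere decoder appears to treat this register, and it is model-specific rather than architectural. The helper names are made up for illustration.

```python
# Sketch: extract the memory fields from the model-specific MISC register.
# ASSUMPTION: Nehalem/Westmere memory controller layout, not architectural.
MISC = 0x3A40080100021083  # value from the "TSC 0 ADDR ... MISC ..." line


def field(value, low, high):
    """Return bits low..high (inclusive) of value."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def decode_misc(misc):
    """Split the MISC word into the assumed memory error fields."""
    return {
        "rtid": field(misc, 0, 7),        # transaction tracker ID
        "dimm": field(misc, 16, 17),      # DIMM ID within the channel
        "channel": field(misc, 18, 19),   # memory channel ID
        "syndrome": field(misc, 32, 63),  # ECC syndrome
    }


if __name__ == "__main__":
    info = decode_misc(MISC)
    print(f"RTId:     {info['rtid']:x}")
    print(f"DIMM:     {info['dimm']}")
    print(f"Channel:  {info['channel']}")
    print(f"Syndrome: {info['syndrome']:08x}")
```

With that layout, the fields come out as RTId 83 (hex), DIMM 2, and channel 0, which lines up with the decoded mcelog lines quoted above and suggests both logs describe the same corrected memory event on socket 0. How "DIMM 2, channel 0" maps to a physical slot depends on the motherboard, so the board manual is the place to confirm which stick that is.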
