
MCE Errors, Just trying to confirm my suspicions!


Hey everyone! My server's been pretty upset lately, with a bunch of freezes. I finally took a look at the MCE log and saw this:

 

Apr  5 08:47:22  kernel: smpboot: CPU0: Intel(R) Xeon(R) CPU           L5640  @ 2.27GHz (family: 0x6, model: 0x2c, stepping: 0x2)
Apr  5 08:47:22  kernel: mce: [Hardware Error]: Machine check events logged
Apr  5 08:47:22  kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 8: 8c0000400001009f
Apr  5 08:47:22  kernel: mce: [Hardware Error]: TSC 0 ADDR 34dbbd500 MISC 3a40080100021083 
Apr  5 08:47:22  kernel: mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1617626815 SOCKET 0 APIC 0 microcode 1f
Apr  5 08:47:22  kernel: Performance Events: PEBS fmt1+, Westmere events, 16-deep LBR, Intel PMU driver.
Apr  5 08:47:22  kernel: core: CPUID marked event: 'bus cycles' unavailable

 

 

Now, I always had a suspicion that the CPU was a little weird. I'd have issues every once in a while, and the freezes have happened before; just recently it's been about every other day. This happened at startup, so would it point to a faulty CPU? I ran a memory test a while ago and everything came up good. I originally thought it meant bank 8 of the memory, but now realize it's talking about the CPU itself! (CPU 0 core 8)
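
For anyone trying to read the status word by hand: the hex value after "Bank 8:" is the machine-check status register for that reporting bank, and its fields can be pulled apart with a few shifts. The sketch below is not output from mcelog; it assumes the generic IA32_MCi_STATUS bit layout and the memory controller error code format (`0000 0000 1MMM CCCC`) described in the Intel SDM. The function name and dictionary keys are made up for illustration. What hardware unit a given bank number corresponds to is model-specific and is not decoded here.

```python
# Minimal sketch: decode the generic fields of an Intel machine-check
# status word. Bit positions assume the IA32_MCi_STATUS layout from the
# Intel SDM; names below are illustrative, not from any tool.

STATUS = 0x8C0000400001009F  # value from the "Bank 8" log line above

# MMM field of a memory controller error code (0000 0000 1MMM CCCC)
MEM_OPS = {
    0: "generic undefined request",
    1: "memory read error",
    2: "memory write error",
    3: "address/command error",
    4: "memory scrubbing error",
}


def decode_mci_status(status: int) -> dict:
    """Return the architectural fields of a 64-bit MCi_STATUS value."""
    mca_code = status & 0xFFFF
    info = {
        "valid": bool((status >> 63) & 1),        # VAL
        "overflow": bool((status >> 62) & 1),     # OVER
        "uncorrected": bool((status >> 61) & 1),  # UC
        "misc_valid": bool((status >> 59) & 1),   # MISCV
        "addr_valid": bool((status >> 58) & 1),   # ADDRV
        "context_corrupt": bool((status >> 57) & 1),  # PCC
        "corrected_count": (status >> 38) & 0x7FFF,   # bits 52:38
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": mca_code,
        "memory_controller_error": False,
    }
    # Memory controller errors have bit 7 set and bits 15:8 clear.
    if (mca_code & 0xFF80) == 0x0080:
        info["memory_controller_error"] = True
        info["mem_op"] = MEM_OPS.get((mca_code >> 4) & 0x7, "reserved")
        channel = mca_code & 0xF
        info["channel"] = None if channel == 0xF else channel  # 0xF: not specified
    return info


if __name__ == "__main__":
    for key, value in decode_mci_status(STATUS).items():
        print(f"{key}: {value}")
```

Under those assumptions, the value from the log decodes as a valid, corrected (UC clear) event with a corrected count of 1, and the MCA code `0x009f` falls in the memory controller class as a memory read error with no channel specified in the status word itself.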

 

When it brings up the other CPU, it just passes through like a champ in the logs! L5640s are pretty cheap, so I don't feel bad scooping up another and giving it a go!

 

Thanks!

  • Community Expert

Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.

  • Community Expert

What makes you think you had a failed drive?

  • Author

Had about 1500 read errors. Tried swapping cables and PSU, and the error counts continued to grow. Unraid disabled the drive around 3 times.

Edited by SeveNx7

  • Community Expert

Did you examine the SMART report for the disk?

 

Do you still have the disk?

 

Unraid disables a disk when a write to it fails. A write can fail for several reasons, and most often it isn't due to a bad disk.

  • Community Expert
Apr  5 19:27:20 barkbox root: Memory ECC error occurred during scrub
Apr  5 19:27:20 barkbox root: Memory corrected error count (CORE_ERR_CNT): 1
Apr  5 19:27:20 barkbox root: Memory transaction Tracker ID (RTId): 83
Apr  5 19:27:20 barkbox root: Memory DIMM ID of error: 2
Apr  5 19:27:20 barkbox root: Memory channel ID of error: 0

You have memory issues.

 

The other MCE that you referenced is semi-normal; it happens to a fair number of users when the OS initializes the cores. That one can be safely ignored.

  • Author

So if I’m reading that correctly it would be cpu 0’s bank of memory, slot 2?
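
If you want to pull the location fields out of the syslog without eyeballing them, something like the following works on the lines quoted in this thread. It is only a sketch: the regular expressions are written against the exact wording of those five lines, and the function name is made up. It reports the DIMM and channel IDs as logged; how those IDs map to a physical slot on the board depends on the motherboard and is not something the script can tell you.

```python
import re

# Log lines copied from the diagnostics excerpt earlier in the thread.
LOG = """\
Apr  5 19:27:20 barkbox root: Memory ECC error occurred during scrub
Apr  5 19:27:20 barkbox root: Memory corrected error count (CORE_ERR_CNT): 1
Apr  5 19:27:20 barkbox root: Memory transaction Tracker ID (RTId): 83
Apr  5 19:27:20 barkbox root: Memory DIMM ID of error: 2
Apr  5 19:27:20 barkbox root: Memory channel ID of error: 0
"""

# Patterns match the wording of the lines above; other versions of the
# logging tool may phrase these differently.
LOCATION = re.compile(r"Memory (DIMM|channel) ID of error: (\d+)")
COUNT = re.compile(r"Memory corrected error count \(CORE_ERR_CNT\): (\d+)")


def locate_memory_error(log_text: str) -> dict:
    """Collect the DIMM ID, channel ID and corrected count from log text."""
    result = {}
    for match in LOCATION.finditer(log_text):
        result[match.group(1).lower()] = int(match.group(2))
    count = COUNT.search(log_text)
    if count:
        result["corrected_count"] = int(count.group(1))
    return result


if __name__ == "__main__":
    print(locate_memory_error(LOG))
```

Run against a full syslog, this keeps only the last DIMM and channel pair it sees, so for a log with several events you would want to split on the "Memory ECC error" line first.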

  • Community Expert

Your System Event Log in the BIOS should also have more info.

  • Author

Yeah, it seems bank 2 was the culprit. As a bonus, when I start up now I no longer get the first MCE error I used to get about CPU bank 8.

I'll run some more tests but hopefully that was it!

 

 

Thanks again everyone!

 
