Posted April 6, 2021
Hey everyone! My server's been pretty upset lately, with a bunch of freezes, so I finally took a look at the MCE log and saw this:
Apr 5 08:47:22 kernel: smpboot: CPU0: Intel(R) Xeon(R) CPU L5640 @ 2.27GHz (family: 0x6, model: 0x2c, stepping: 0x2)
Apr 5 08:47:22 kernel: mce: [Hardware Error]: Machine check events logged
Apr 5 08:47:22 kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 8: 8c0000400001009f
Apr 5 08:47:22 kernel: mce: [Hardware Error]: TSC 0 ADDR 34dbbd500 MISC 3a40080100021083
Apr 5 08:47:22 kernel: mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1617626815 SOCKET 0 APIC 0 microcode 1f
Apr 5 08:47:22 kernel: Performance Events: PEBS fmt1+, Westmere events, 16-deep LBR, Intel PMU driver.
Apr 5 08:47:22 kernel: core: CPUID marked event: 'bus cycles' unavailable
I always had a suspicion that this CPU was a little weird. I'd have issues every once in a while, and the freezes have happened before, but just recently it's been about every other day. This error happened at startup. Would it point to a faulty CPU? I ran a memory test a while ago and everything came up good. I originally thought it meant bank 8 of the memory, but now I realize it's talking about the CPU itself (CPU 0 core)! When it brings up the other CPU it just passes through like a champ in the logs. L5640s are pretty cheap, so I don't feel bad scooping up another and giving it a go! Thanks!
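For anyone who wants to pick apart the Bank 8 status value from the log above by hand, here is a rough sketch in Python. The bit positions follow the generic IA32_MCi_STATUS layout from Intel's Software Developer's Manual as I understand it. The function name and dictionary keys are made up for illustration, the corrected-error-count field only means something on CPUs that report it, and the memory-controller check is my reading of the SDM's compound error code table, so treat it as a starting point rather than a diagnosis.

```python
def decode_mci_status(status: int) -> dict:
    """Split a 64-bit machine-check status value into its generic fields.

    Hypothetical helper: field names are illustrative, bit positions are
    taken from the architectural IA32_MCi_STATUS layout.
    """
    def bit(n: int) -> bool:
        return bool((status >> n) & 1)

    mca_code = status & 0xFFFF
    return {
        "valid": bit(63),             # VAL: the register holds a logged error
        "overflow": bit(62),          # OVER: an earlier error was overwritten
        "uncorrected": bit(61),       # UC: hardware could not correct the error
        "enabled": bit(60),           # EN: error reporting was enabled
        "misc_valid": bit(59),        # MISCV: the MISC value is meaningful
        "addr_valid": bit(58),        # ADDRV: the ADDR value is meaningful
        "context_corrupt": bit(57),   # PCC: processor context may be corrupt
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": mca_code,
        # Assumption: codes of the form 0000 0000 1MMM CCCC are
        # memory-controller errors (MMM = operation, CCCC = channel).
        "looks_like_memory_controller": (mca_code & 0xFF80) == 0x0080,
    }


# Status value copied from the "Bank 8" line in the log above.
fields = decode_mci_status(0x8C0000400001009F)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Run against the value in the log, this reports a valid, corrected (not uncorrected) error with a corrected count of 1 and an MCA error code of 0x009f. It does not say which part is at fault; it just makes the raw hex a little easier to read.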
April 6, 2021 Community Expert
Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.
April 6, 2021 Author
Don't mind the parity rebuild from a failed drive! barkbox-diagnostics-20210405-2027.zip
April 6, 2021 Author
The drive had about 1500 read errors. I tried swapping cables and the PSU, but the error counts kept growing. Unraid disabled the drive around 3 times.
Edited April 6, 2021 by SeveNx7
April 6, 2021 Community Expert
Did you examine the SMART report for the disk? Do you still have the disk? Unraid disables a disk when a write to it fails. There are several reasons a write can fail, and most often it isn't due to a bad disk.
April 6, 2021 Community Expert
Apr 5 19:27:20 barkbox root: Memory ECC error occurred during scrub
Apr 5 19:27:20 barkbox root: Memory corrected error count (CORE_ERR_CNT): 1
Apr 5 19:27:20 barkbox root: Memory transaction Tracker ID (RTId): 83
Apr 5 19:27:20 barkbox root: Memory DIMM ID of error: 2
Apr 5 19:27:20 barkbox root: Memory channel ID of error: 0
You have memory issues. The other MCE that you referenced is semi-normal; it happens to a fair number of users when the OS initializes the cores. That one can be safely ignored.
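If you want to pull the DIMM and channel numbers out of a report like the one above without squinting at the syslog, something along these lines should work. This is only a sketch: the regular expressions are written against the exact wording of the lines quoted in this thread, the names `PATTERNS` and `parse_ecc_report` are made up, and how a DIMM ID and channel ID map to a physical slot depends on the motherboard, which the log does not tell you.

```python
import re

# Log text copied from the diagnostics excerpt quoted above.
LOG = """\
Apr 5 19:27:20 barkbox root: Memory ECC error occurred during scrub
Apr 5 19:27:20 barkbox root: Memory corrected error count (CORE_ERR_CNT): 1
Apr 5 19:27:20 barkbox root: Memory transaction Tracker ID (RTId): 83
Apr 5 19:27:20 barkbox root: Memory DIMM ID of error: 2
Apr 5 19:27:20 barkbox root: Memory channel ID of error: 0
"""

# Hypothetical field names mapped to patterns matching the wording above.
PATTERNS = {
    "corrected_count": r"Memory corrected error count \(CORE_ERR_CNT\): (\d+)",
    "dimm_id": r"Memory DIMM ID of error: (\d+)",
    "channel_id": r"Memory channel ID of error: (\d+)",
}


def parse_ecc_report(text: str) -> dict:
    """Return the numeric fields found in an ECC error report."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            found[name] = int(match.group(1))
    return found


report = parse_ecc_report(LOG)
print(report)
```

Fields that are missing from the text are simply left out of the result, so the same function can be pointed at a larger syslog dump without failing on unrelated lines.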
April 6, 2021 Author
39 minutes ago, Squid said:
Apr 5 19:27:20 barkbox root: Memory ECC error occurred during scrub
Apr 5 19:27:20 barkbox root: Memory corrected error count (CORE_ERR_CNT): 1
Apr 5 19:27:20 barkbox root: Memory transaction Tracker ID (RTId): 83
Apr 5 19:27:20 barkbox root: Memory DIMM ID of error: 2
Apr 5 19:27:20 barkbox root: Memory channel ID of error: 0
You have memory issues. The other mce that you referenced is semi-normal, happens to a fair amount of users when the OS initializes the cores. That one can be safely ignored.
So if I'm reading that correctly, it would be CPU 0's bank of memory, slot 2?
April 6, 2021 Author
Yeah. Seems bank 2 was the culprit. As a bonus, when I start up now I don't get the first MCE error I used to get about CPU bank 8. I'll run some more tests, but hopefully that was it! Thanks again everyone!