Help with Machine Check Events detected on your server from Fix Common Problems



Looks like a memory issue

 

Quote

Mar 26 04:30:31 TowerOfPower root: Fix Common Problems: Error: Machine Check Events detected on your server
Mar 26 04:30:31 TowerOfPower root: Hardware event. This is not a software error.
Mar 26 04:30:31 TowerOfPower root: MCE 0
Mar 26 04:30:31 TowerOfPower root: CPU 6 BANK 8 
Mar 26 04:30:31 TowerOfPower root: MISC 5cfbfb0000006040 ADDR c2c340b40 
Mar 26 04:30:43 TowerOfPower root: TIME 1490441889 Sat Mar 25 04:38:09 2017
Mar 26 04:30:43 TowerOfPower root: MCG status:
Mar 26 04:30:43 TowerOfPower root: MCi status:
Mar 26 04:30:43 TowerOfPower root: Corrected error
Mar 26 04:30:43 TowerOfPower root: MCi_MISC register valid
Mar 26 04:30:43 TowerOfPower root: MCi_ADDR register valid
Mar 26 04:30:43 TowerOfPower root: MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Mar 26 04:30:43 TowerOfPower root: Transaction: Memory read error
Mar 26 04:30:43 TowerOfPower root: Memory read ECC error

Mar 26 04:30:43 TowerOfPower root: Memory corrected error count (CORE_ERR_CNT): 1
Mar 26 04:30:43 TowerOfPower root: Memory transaction Tracker ID (RTId): 40
Mar 26 04:30:43 TowerOfPower root: Memory DIMM ID of error: 0
Mar 26 04:30:43 TowerOfPower root: Memory channel ID of error: 0
Mar 26 04:30:43 TowerOfPower root: Memory ECC syndrome: 5cfbfb00
Mar 26 04:30:43 TowerOfPower root: STATUS 8c0000400001009f MCGSTATUS 0
Mar 26 04:30:43 TowerOfPower root: MCGCAP 1c09 APICID 20 SOCKETID 1 
Mar 26 04:30:43 TowerOfPower root: CPUID Vendor Intel Family 6 Model 44

 

(@RobJ - FYI, last month FCP started including the output from mcelog if it's installed, and I'm impressed with the detail it gives.)

Edited by Squid
25 minutes ago, Squid said:

(@RobJ - FYI, last month FCP started including the output from mcelog if it's installed, and I'm impressed with the detail it gives.)

 

I saw you had added it and was glad to see it, and yes, I've been very pleased with the reporting it gives, most of the time. I don't remember the details, but I believe I've seen a report or two of MCE issues in other hardware subsystems that were too cryptic for me, and I couldn't find good advice online. But mostly it's great.

 

SDguy, I believe it's reporting that your ECC RAM detected and corrected a memory read error, so this event by itself looks harmless. But the fact that it's having them at all could be related to your past lockups, which may have been caused by a memory error that *couldn't* be corrected. You may want to run the PassMark MemTest86 on your RAM (it has ECC RAM support).
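
For anyone who wants to check that reading against the raw register, here is a rough Python sketch that decodes the `STATUS 8c0000400001009f` value from the log. It assumes the architectural IA32_MCi_STATUS bit layout from the Intel SDM (valid, overflow, uncorrected and the other flags in the top bits, corrected error count in bits 52:38, MCA error code in bits 15:0) and the generic memory controller error code format `0000 0000 1MMM CCCC`. The function and key names are made up for illustration; this is not how mcelog itself is implemented, and the DIMM/channel IDs and syndrome in the log are model-specific fields that this sketch does not decode.

```python
# Value copied from the "STATUS" line of the log above.
STATUS = 0x8C0000400001009F


def bit(value, n):
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_mci_status(status):
    """Decode the architectural fields of an IA32_MCi_STATUS value.

    Hypothetical helper for illustration only; bit positions are taken
    from the Intel SDM description of IA32_MCi_STATUS.
    """
    mca_code = status & 0xFFFF
    info = {
        "valid": bit(status, 63),
        "overflow": bit(status, 62),
        "uncorrected": bit(status, 61),
        "misc_valid": bit(status, 59),
        "addr_valid": bit(status, 58),
        "context_corrupt": bit(status, 57),
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "mca_code": mca_code,
    }
    # Memory controller errors use the pattern 0000 0000 1MMM CCCC.
    if (mca_code & 0xFF80) == 0x0080:
        ops = {0: "generic", 1: "read", 2: "write",
               3: "address/command", 4: "scrub"}
        channel = mca_code & 0xF
        info["memory_op"] = ops.get((mca_code >> 4) & 0x7, "reserved")
        # Channel 1111 means "channel not specified" in this field.
        info["channel"] = None if channel == 0xF else channel
    return info


if __name__ == "__main__":
    for key, value in decode_mci_status(STATUS).items():
        print(f"{key}: {value}")
```

With the value from this log, the uncorrected flag comes out clear, the MISC and ADDR valid flags are set, the corrected count is 1, and the error code decodes as a memory controller read with the channel unspecified, which lines up with the "Corrected error" and "RD_CHANNELunspecified_ERR" lines mcelog printed.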
