(SOLVED) mce: [Hardware Error]: Machine check events logged


Can anyone please help me decipher this MCE error? :(  This is a brand new build and it is already acting funny :(

 

 

[241831.858564] mce: [Hardware Error]: Machine check events logged

[241831.858570] mce: [Hardware Error]: Machine check events logged

 

 

root@NAS-UNRAID:~# mcelog

Hardware event. This is not a software error.

MCE 0

CPU 0 BANK 5

ADDR 22f43abc0

TIME 1591113315 Tue Jun  2 10:55:15 2020

MCG status:

MCi status:

Corrected error

Error enabled

MCi_ADDR register valid

MCA: MEMORY CONTROLLER RD_CHANNEL1_ERR

Transaction: Memory read error

STATUS 9400004000910091 MCGSTATUS 0

MCGCAP 806 APICID 0 SOCKETID 0

MICROCODE 12d

CPUID Vendor Intel Family 6 Model 77

Hardware event. This is not a software error.

MCE 1

CPU 1 BANK 5

ADDR 22f43abc0

TIME 1591113315 Tue Jun  2 10:55:15 2020

MCG status:

MCi status:

Corrected error

Error enabled

MCi_ADDR register valid

MCA: MEMORY CONTROLLER RD_CHANNEL1_ERR

Transaction: Memory read error

STATUS 9400004000910091 MCGSTATUS 0

MCGCAP 806 APICID 2 SOCKETID 0

MICROCODE 12d

CPUID Vendor Intel Family 6 Model 77

root@NAS-UNRAID:~#
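
For anyone who wants to cross-check mcelog's reading of the STATUS value, here is a minimal Python sketch that decodes the flag bits and the memory-controller error code held in the low 16 bits. It assumes the architectural IA32_MCi_STATUS layout described in Intel's Software Developer's Manual (VAL bit 63, OVER 62, UC 61, EN 60, MISCV 59, ADDRV 58, PCC 57, and the `0000 0000 1MMM CCCC` format for memory-controller codes). The function name `decode_mci_status` and the dictionary keys are made up for illustration; they are not part of mcelog.

```python
# Sketch only: decode_mci_status is a hypothetical helper, not an mcelog API.
# Bit positions assume the architectural IA32_MCi_STATUS layout (Intel SDM).

FLAG_BITS = {
    "VAL": 63,    # register contains a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error (clear means corrected)
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MCi_MISC register valid
    "ADDRV": 58,  # MCi_ADDR register valid
    "PCC": 57,    # processor context corrupt
}

# MMM field of a memory-controller error code (0000 0000 1MMM CCCC).
MEM_TRANSACTIONS = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_mci_status(status: int) -> dict:
    """Return the flag bits and, if applicable, the memory-controller fields."""
    result = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    mca_code = status & 0xFFFF  # low 16 bits: MCA error code
    result["mca_code"] = mca_code
    # Memory-controller errors have bits 15:8 clear and bit 7 set.
    if (mca_code & 0xFF80) == 0x0080:
        mmm = (mca_code >> 4) & 0b111
        cccc = mca_code & 0xF
        result["transaction"] = MEM_TRANSACTIONS.get(mmm, "reserved")
        # 1111 means the channel is not specified.
        result["channel"] = None if cccc == 0xF else cccc
    return result


info = decode_mci_status(0x9400004000910091)
print(info)
```

For the value posted above, this reports VAL, EN and ADDRV set with UC clear (a corrected error), a memory read error, and channel 1, which agrees with the text mcelog printed.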

 

 

nas-unraid-diagnostics-20200602-2313.zip

Edited by johnwhicker

Nope, that's pretty much it: two lines in the syslog, so I ran mcelog to get more info.  This was buried among some USB/UPS issues I am having.

 

[240021.315170] usb 1-1.1: USB disconnect, device number 10

[240021.487384] usb 1-1.1: new full-speed USB device number 11 using ehci-pci

[240021.572741] hid-generic 0003:0764:0501.0009: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0

 

[241831.858564] mce: [Hardware Error]: Machine check events logged

[241831.858570] mce: [Hardware Error]: Machine check events logged

 

[243974.212849] usb 1-1.1: USB disconnect, device number 11

[243974.386787] usb 1-1.1: new full-speed USB device number 12 using ehci-pci

[243974.471684] hid-generic 0003:0764:0501.000A: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
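
When the MCE lines are buried among unrelated messages like this, a small filter can pull them out of a saved log. The following is a rough Python sketch, assuming the log lines keep the `[seconds.microseconds] mce: [Hardware Error]: ...` shape shown above; `find_mce_events` and `MCE_LINE` are hypothetical names chosen for this example.

```python
import re

# Matches kernel log lines such as:
# [241831.858564] mce: [Hardware Error]: Machine check events logged
MCE_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+mce: \[Hardware Error\]: (.+)$")


def find_mce_events(lines):
    """Return (timestamp_seconds, message) tuples for MCE lines only."""
    events = []
    for line in lines:
        match = MCE_LINE.match(line.strip())
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


# Sample lines copied from the log excerpt in this thread.
sample = [
    "[240021.315170] usb 1-1.1: USB disconnect, device number 10",
    "[240021.487384] usb 1-1.1: new full-speed USB device number 11 using ehci-pci",
    "[241831.858564] mce: [Hardware Error]: Machine check events logged",
    "[241831.858570] mce: [Hardware Error]: Machine check events logged",
    "[243974.212849] usb 1-1.1: USB disconnect, device number 11",
]

events = find_mce_events(sample)
for timestamp, message in events:
    print(timestamp, message)
```

The same function can be given the lines of a syslog file read from disk; the USB lines are skipped because they do not match the `mce:` prefix.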

On 6/3/2020 at 8:57 AM, johnnie.black said:

I said the board's system event log, usually accessible in the BIOS or over IPMI.

Thanks much, Sir. I didn't see anything in the BIOS or IPMI logs.  I even sent the IPMI logs to syslog, and there is nothing there about this MCE error.

 

That being said, I did run an extensive memtest86 Pro test for 24 hours straight with no memory errors, so I guess it was just ECC doing its job during this heavy data set copy?  Perhaps ECC corrected some corrupted data?

On 6/8/2020 at 3:24 AM, johnnie.black said:

Memtest won't show errors with ECC RAM, if you can't find more info on the affected DIMM, just remove one at a time and test for a few days, or disable ECC in the BIOS (if that's an option) and run memtest again.

Thanks, partner. I ran the Pro version of memtest for 2 days and found nothing.  I think it is OK, as I haven't seen that error anymore.  It only showed up during a heavy copy and md5sum check from drive to drive, about 8 TB of data.
