DivideBy0 Posted June 3, 2020 (edited)
Can anyone please help me decipher this MCE error? This is a brand new build and it is already acting funny.

[241831.858564] mce: [Hardware Error]: Machine check events logged
[241831.858570] mce: [Hardware Error]: Machine check events logged

root@NAS-UNRAID:~# mcelog
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 5
ADDR 22f43abc0
TIME 1591113315 Tue Jun 2 10:55:15 2020
MCG status:
MCi status:
Corrected error
Error enabled
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNEL1_ERR
Transaction: Memory read error
STATUS 9400004000910091 MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
MICROCODE 12d
CPUID Vendor Intel Family 6 Model 77
Hardware event. This is not a software error.
MCE 1
CPU 1 BANK 5
ADDR 22f43abc0
TIME 1591113315 Tue Jun 2 10:55:15 2020
MCG status:
MCi status:
Corrected error
Error enabled
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNEL1_ERR
Transaction: Memory read error
STATUS 9400004000910091 MCGSTATUS 0
MCGCAP 806 APICID 2 SOCKETID 0
MICROCODE 12d
CPUID Vendor Intel Family 6 Model 77
root@NAS-UNRAID:~#

Attachment: nas-unraid-diagnostics-20200602-2313.zip
Edited June 14, 2020 by johnwhicker
JorgeB Posted June 3, 2020
That looks like an ECC RAM corrected error; there might be more information in the board's SEL (system event log).
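For readers who want to check the "corrected error" reading against the raw register, here is a minimal Python sketch that decodes the STATUS value from the mcelog output above. It assumes the architectural IA32_MCi_STATUS layout described in the Intel SDM (flag bits 63 down to 57, MCA error code in bits 15:0, memory controller errors encoded as 0000 0000 1MMM CCCC). The function and table names are made up for illustration and are not part of mcelog.

```python
# STATUS value copied from the mcelog output in the first post.
STATUS = 0x9400004000910091

# Architectural flag bits of IA32_MCi_STATUS (assumed layout, per the Intel SDM).
FLAG_BITS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error (clear means corrected)
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}

# MMM field of a memory controller error code (0000 0000 1MMM CCCC).
MEM_TRANSACTION = {
    0: "generic undefined request",
    1: "memory read error",
    2: "memory write error",
    3: "address/command error",
    4: "memory scrubbing error",
}


def decode_status(status):
    """Return the set flags and, for memory controller errors, the channel."""
    flags = [name for bit, name in FLAG_BITS.items() if (status >> bit) & 1]
    code = status & 0xFFFF  # MCA error code, bits 15:0
    result = {"flags": flags, "mca_code": code}
    # Memory controller errors have bit 7 set and bits 15:8 clear.
    if (code & 0xFF80) == 0x0080:
        result["transaction"] = MEM_TRANSACTION.get((code >> 4) & 0x7, "reserved")
        channel = code & 0xF
        # 0xF means the channel was not specified.
        result["channel"] = None if channel == 0xF else channel
    return result


if __name__ == "__main__":
    print(decode_status(STATUS))
```

With UC clear and a memory read error on channel 1, the decode agrees with what mcelog printed (Corrected error, RD_CHANNEL1_ERR). Which physical DIMM slot channel 1 maps to depends on the board, so that still has to come from the manual or the SEL.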
DivideBy0 Posted June 3, 2020 Author Share Posted June 3, 2020 Nope that's pretty much 2 lines in the syslog so I run the mcelog to get more info. This was buried between some USB/UPS issues I am having. 240021.315170] usb 1-1.1: USB disconnect, device number 10 [240021.487384] usb 1-1.1: new full-speed USB device number 11 using ehci-pci [240021.572741] hid-generic 0003:0764:0501.0009: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0 [241831.858564] mce: [Hardware Error]: Machine check events logged [241831.858570] mce: [Hardware Error]: Machine check events logged [243974.212849] usb 1-1.1: USB disconnect, device number 11 [243974.386787] usb 1-1.1: new full-speed USB device number 12 using ehci-pci [243974.471684] hid-generic 0003:0764:0501.000A: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0 Quote Link to comment
JorgeB Posted June 3, 2020
I said the board's system event log, usually accessible in the BIOS or over IPMI.
DivideBy0 Posted June 7, 2020 Author Share Posted June 7, 2020 On 6/3/2020 at 8:57 AM, johnnie.black said: I said the board's system event log, usually accessible in the BIOS or over IPMI. Thanks much Sir. I didn't see anything in the BIOS or IMPI logs. I even send the IPMI logs to syslog and nothing on this MCE error. That being said I did run an extensive memtest86 Pro test for 24 hours straight and no errors on memory so I guess it was just ECC doing its job during this heavy data set copy? perhaps ECC corrected some corrupted data? Quote Link to comment
JorgeB Posted June 8, 2020
Memtest won't show errors with ECC RAM, since corrected errors are fixed before the test ever sees them. If you can't find more info on the affected DIMM, just remove one at a time and test for a few days, or disable ECC in the BIOS (if that's an option) and run memtest again.
DivideBy0 Posted June 12, 2020 Author Share Posted June 12, 2020 On 6/8/2020 at 3:24 AM, johnnie.black said: Memtest won't show errors with ECC RAM, if you can't find more info on the affected DIMM, just remove one at a time and test for a few days, or disable ECC in the BIOS (if that's an option) and run memtest again. Thanks partner. I run memtest pro version for 2 days and nothing. I think is ok as I haven't seen that error anymore. It was just during a heavy copy and mdsum check from drive to drive, about 8TG of data. Quote Link to comment