Kcirtap1423 Posted April 22, 2022
Fix Common Problems informed me that an MCE was detected. I'm not familiar with this, and it was suggested that I make a post with my diagnostics. Any help on what caused this, how to fix it, and what to do in the meantime would be appreciated.
enterprise-diagnostics-20220422-1747.zip
Squid Posted April 23, 2022
Apr 21 14:09:58 Enterprise kernel: mce: [Hardware Error]: Machine check events logged
Apr 21 14:09:58 Enterprise kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Apr 21 14:09:58 Enterprise kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 9: 8c000041000800c0
Apr 21 14:09:58 Enterprise kernel: EDAC sbridge MC1: TSC 5d47b8b4313ff8
Apr 21 14:09:58 Enterprise kernel: EDAC sbridge MC1: ADDR f030c7000
Apr 21 14:09:58 Enterprise kernel: EDAC sbridge MC1: MISC 918c0008000828c
Apr 21 14:09:58 Enterprise kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1650568198 SOCKET 1 APIC 20
Apr 21 14:09:58 Enterprise kernel: EDAC MC1: 1 CE memory scrubbing error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 page:0xf030c7 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0008:00c0 socket:1 ha:0 channel_mask:1 rank:255)
Bad memory. Replace. Your system event log in the BIOS may have more information beyond DIMM #0 as to which is beginning to fail.
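For anyone who wants to pull the DIMM location out of a line like the final EDAC entry above, here is a minimal Python sketch. The regular expression is written against that one line only, so treat the pattern as an assumption: other EDAC drivers or kernel versions may format the label differently. The function name `parse_edac_line` is made up for illustration, and the address calculation assumes 4 KiB pages.

```python
import re

# Sample line copied from the syslog excerpt in this thread.
SAMPLE = (
    "Apr 21 14:09:58 Enterprise kernel: EDAC MC1: 1 CE memory scrubbing error on "
    "CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 page:0xf030c7 offset:0x0 grain:32 "
    "syndrome:0x0 - area:DRAM err_code:0008:00c0 socket:1 ha:0 channel_mask:1 rank:255)"
)

# Pattern derived from the single line above (assumption: the sbridge-style
# "CPU_SrcID#..._DIMM#..." label; other drivers may use a different label).
EDAC_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) (?P<msg>.+?) on "
    r"CPU_SrcID#(?P<socket>\d+)_Ha#(?P<ha>\d+)_Chan#(?P<chan>\d+)_DIMM#(?P<dimm>\d+)"
    r".*?page:(?P<page>0x[0-9a-fA-F]+) offset:(?P<offset>0x[0-9a-fA-F]+)"
)


def parse_edac_line(line):
    """Return a dict describing the reported DIMM, or None if the line does not match."""
    match = EDAC_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("mc", "count", "socket", "ha", "chan", "dimm"):
        info[key] = int(info[key])
    # Assuming 4 KiB pages: physical address = page number * 4096 + offset.
    info["address"] = (int(info["page"], 16) << 12) + int(info["offset"], 16)
    # CE = corrected error (ECC fixed it); UE = uncorrected error.
    info["corrected"] = info["kind"] == "CE"
    return info


if __name__ == "__main__":
    result = parse_edac_line(SAMPLE)
    print(result["socket"], result["chan"], result["dimm"], hex(result["address"]))
```

With the sample line, the computed address agrees with the `ADDR f030c7000` entry logged a few lines earlier, and the `CE` marker indicates the error was corrected rather than fatal.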
Kcirtap1423 Posted April 23, 2022 Author
Does this mean the memory has actually gone bad, or could it be caused by error correcting? The RAM I'm using is ECC. Would this kind of error show up if the module had to actually correct something, or does it only come up if the memory is actually dead or dying? Also, what kind of test should I run to make sure I replaced the correct module?
Squid Posted April 23, 2022 Solution
I'd say it's bad, because the error is currently being corrected by virtue of the memory being ECC. IMO, ECC is purchased so that you know when a module is beginning to return errors and can replace it then, rather than waiting for it to get bad enough that the errors can no longer be corrected. The SEL (system event log) should tell you exactly which DIMM it is. mce says that it's DIMM0, but I'd trust the SEL more than mce for identification.
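One way to keep an eye on whether corrected errors continue after a module swap is to read the kernel's EDAC counters. This is only a sketch and not something suggested in the thread itself: it assumes the EDAC driver is loaded and exposes `ce_count` and `ue_count` under `/sys/devices/system/edac/mc/mc*/`, which may not be true on every system. The function name `read_edac_counts` is hypothetical.

```python
from pathlib import Path

# Assumed location of the EDAC memory-controller counters in sysfs.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: {"ce_count": int|None, "ue_count": int|None}}."""
    counts = {}
    for mc_dir in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc_dir / name).read_text().strip())
            except (OSError, ValueError):
                # Counter missing or unreadable on this system.
                entry[name] = None
        counts[mc_dir.name] = entry
    return counts


if __name__ == "__main__":
    for controller, entry in read_edac_counts().items():
        print(controller, entry)
```

A `ce_count` that keeps climbing on the same controller after the swap would suggest the wrong module was replaced. The counters do not name the physical slot, so the SEL remains the better source for identification.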