May 7, 2018

All,

I'm getting the following (flagged by FCP) in my syslog:

May 6 22:37:21 unServer kernel: mce: [Hardware Error]: Machine check events logged
May 6 22:37:21 unServer kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
May 6 22:37:21 unServer kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 9: 8c000041000800c0
May 6 22:37:21 unServer kernel: EDAC sbridge MC0: TSC 397c3a0867fb60
May 6 22:37:21 unServer kernel: EDAC sbridge MC0: ADDR 1d0543000
May 6 22:37:21 unServer kernel: EDAC sbridge MC0: MISC 90000004000428c
May 6 22:37:21 unServer kernel: EDAC sbridge MC0: PROCESSOR 0:50662 TIME 1525646241 SOCKET 0 APIC 0
May 6 22:37:21 unServer kernel: EDAC MC0: 1 CE memory scrubbing error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x1d0543 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0008:00c0 socket:0 ha:0 channel_mask:1 rank:0)

These have been occurring about every 1-3 days for the last month. Is this a bad ECC DIMM? I haven't rebooted since this started, so I'll start there. Any ideas how to read this error?

I'm running a Xeon D-1520 setup on an ASRock D1520D4I with unRAID 6.4.1. It's been in service about 20 months.

Thanks!
Josh
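On the question of how to read the error: the 64-bit value after "Bank 9:" is the machine-check status register for that bank, and its fields can be pulled apart by hand. Below is a minimal sketch in Python, assuming the architectural IA32_MCi_STATUS bit layout from the Intel SDM. The function name and dictionary keys are made up for illustration, and the corrected-error count field is only meaningful on CPUs that report it.

```python
# Sketch only: decodes the architectural fields of an IA32_MCi_STATUS value.
# Bit positions follow the Intel SDM; names below are illustrative, not a real API.

# Memory-controller request types (the MMM bits of the MCA error code).
MMM_TYPES = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit machine-check status word into its main fields."""
    mca_code = status & 0xFFFF
    info = {
        "valid": bool((status >> 63) & 1),            # VAL
        "overflow": bool((status >> 62) & 1),         # OVER
        "uncorrected": bool((status >> 61) & 1),      # UC (0 = corrected)
        "misc_valid": bool((status >> 59) & 1),       # MISCV
        "addr_valid": bool((status >> 58) & 1),       # ADDRV
        "context_corrupt": bool((status >> 57) & 1),  # PCC
        "corrected_count": (status >> 38) & 0x7FFF,   # bits 52:38, if supported
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": mca_code,
    }
    # Memory-controller errors use the pattern 0000 0000 1MMM CCCC.
    if (mca_code & 0xFF80) == 0x0080:
        info["request"] = MMM_TYPES.get((mca_code >> 4) & 0x7, "reserved")
        channel = mca_code & 0xF
        info["channel"] = None if channel == 0xF else channel  # 0xF = unspecified
    return info


if __name__ == "__main__":
    for key, value in decode_mci_status(0x8C000041000800C0).items():
        print(f"{key}: {value}")
    # The reported page is the physical address shifted right by 12 bits (4 KiB pages).
    print("page:", hex(0x1D0543000 >> 12))
```

For the value in the log, this gives a valid, corrected (UC clear) event with a count of 1, MCA code 0x00c0 (memory scrubbing error on channel 0) and model-specific code 0x0008, which lines up with "1 CE memory scrubbing error" and "err_code:0008:00c0" in the final EDAC line. In other words, ECC corrected a single error found during scrubbing on the DIMM in channel 0, slot 0. Whether that DIMM is actually failing is a separate question that the status word alone doesn't answer.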
May 20, 2018 (Author)

I'm not seeing any corresponding events logged there. I'm reconfiguring the log rotation in the IPMI to make sure I'm not missing anything from a full log file.

After rebooting the server, I've had three recurrences of the error in two weeks. Not sure what I'm dealing with here.

Josh
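To keep track of whether the recurrences always land on the same DIMM, the corrected-error lines can be tallied per DIMM label. This is a rough sketch in Python that assumes the summary lines keep the "EDAC MCn: N CE ... on LABEL" shape shown in the first post. The regex, function name and log path are illustrative assumptions, not part of any EDAC tooling.

```python
import re
from collections import Counter

# Matches EDAC summary lines such as:
#   "... kernel: EDAC MC0: 1 CE memory scrubbing error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (...)"
# Group 1 is the corrected-error count, group 2 the DIMM label.
CE_LINE = re.compile(r"EDAC MC\d+: (\d+) CE .*? on (\S+)")


def tally_corrected_errors(lines):
    """Sum corrected-error (CE) counts per DIMM label from syslog lines."""
    counts = Counter()
    for line in lines:
        match = CE_LINE.search(line)
        if match:
            counts[match.group(2)] += int(match.group(1))
    return counts


# Example use (the path is an assumption; adjust to wherever the syslog lives):
#   with open("/var/log/syslog") as log:
#       for dimm, total in tally_corrected_errors(log).items():
#           print(dimm, total)
```

If every hit points at the same channel and slot, that narrows things to one module or its socket. Hits spread across several DIMMs would point somewhere else.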