uiuc_josh Posted May 7, 2018

All,

I'm getting the following (flagged by FCP) in my syslog:

May 6 22:37:21 unServer kernel: mce: [Hardware Error]: Machine check events logged
May 6 22:37:21 unServer kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
May 6 22:37:21 unServer kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 9: 8c000041000800c0
May 6 22:37:21 unServer kernel: EDAC sbridge MC0: TSC 397c3a0867fb60
May 6 22:37:21 unServer kernel: EDAC sbridge MC0: ADDR 1d0543000
May 6 22:37:21 unServer kernel: EDAC sbridge MC0: MISC 90000004000428c
May 6 22:37:21 unServer kernel: EDAC sbridge MC0: PROCESSOR 0:50662 TIME 1525646241 SOCKET 0 APIC 0
May 6 22:37:21 unServer kernel: EDAC MC0: 1 CE memory scrubbing error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x1d0543 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0008:00c0 socket:0 ha:0 channel_mask:1 rank:0)

These errors have been occurring every 1-3 days for about the last month. Is this a bad ECC DIMM? I haven't rebooted since this started, so I'll start there. Any ideas how to read this error?

I'm running a Xeon D-1520 setup on an ASRock D1520D4I with unRAID 6.4.1. It's been in service for about 20 months.

Thanks!
Josh
JorgeB Posted May 7, 2018

Check the board's system event log; there might be more info there.
uiuc_josh Posted May 20, 2018

I'm not seeing any corresponding events logged there. I'm reconfiguring the IPMI log rotation to make sure I'm not missing anything because of a full log file.

Since rebooting the server, I've had three recurrences of the error in two weeks. Not sure what I'm dealing with here.

Josh
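On the question of how to read the error: the 64-bit value after "Bank 9:" is the machine check status register for that bank. Below is a minimal Python sketch that splits it into fields, assuming the architectural IA32_MCi_STATUS layout described in the Intel Software Developer's Manual. The function name `decode_mci_status` and the dictionary keys are made up for illustration, and the corrected-error count field is only meaningful on CPUs that report it. Treat it as a reading aid, not a diagnosis tool.

```python
# Hypothetical helper: decode an MCi_STATUS value from an EDAC/mce log line.
# Bit positions follow the architectural layout in the Intel SDM (assumed here).

# MMM field of a memory controller error code (0000 0000 1MMM CCCC).
MEMORY_OPERATIONS = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit machine check status value into named fields."""
    mca_code = status & 0xFFFF
    fields = {
        "valid": bool((status >> 63) & 1),            # VAL
        "overflow": bool((status >> 62) & 1),         # OVER
        "uncorrected": bool((status >> 61) & 1),      # UC (False means corrected)
        "misc_valid": bool((status >> 59) & 1),       # MISCV
        "addr_valid": bool((status >> 58) & 1),       # ADDRV
        "context_corrupt": bool((status >> 57) & 1),  # PCC
        "corrected_count": (status >> 38) & 0x7FFF,   # bits 52:38
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": mca_code,
    }
    # Memory controller errors have the form 0000 0000 1MMM CCCC.
    if (mca_code & 0xFF80) == 0x0080:
        operation = (mca_code >> 4) & 0x7
        channel = mca_code & 0xF
        fields["memory_operation"] = MEMORY_OPERATIONS.get(operation, "reserved")
        # 1111 means the channel is not specified.
        fields["channel"] = None if channel == 0xF else channel
    return fields


if __name__ == "__main__":
    # Status value taken from the syslog excerpt in the first post.
    for name, value in decode_mci_status(0x8C000041000800C0).items():
        print(f"{name}: {value}")
```

The two 16-bit codes it extracts correspond to the `err_code:0008:00c0` pair in the EDAC summary line, and the decoded operation and channel agree with the "CE memory scrubbing error" on channel 0 that the kernel already reports. UC being clear is what makes this a corrected (CE) event rather than an uncorrected one.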