statecowboy
Posted February 18, 2018

Hi guys. I was curious how to reconcile the unraid memory errors with my actual DIMMs. In my case, I am able to log in to my web console and see errors. I also have an LED that blinks when an error is registered. In this case DIMM H2 is lit up and my web console output the following:

1679 02/17/2018 23:29:30 Mmry ECC Sensor Memory Correctable ECC. CPU: 2, DIMM: H2. - Asserted

That said, this is the error I get in unraid:

Feb 17 17:27:57 someflix-unraid kernel: mce: [Hardware Error]: Machine check events logged
Feb 17 17:27:57 someflix-unraid kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Feb 17 17:27:57 someflix-unraid kernel: EDAC sbridge MC1: CPU 10: Machine Check Event: 0 Bank 12: 8c000043000800c3
Feb 17 17:27:57 someflix-unraid kernel: EDAC sbridge MC1: TSC 5365d5a58cd7c
Feb 17 17:27:57 someflix-unraid kernel: EDAC sbridge MC1: ADDR 9dd90f000
Feb 17 17:27:57 someflix-unraid kernel: EDAC sbridge MC1: MISC 122100008000868c
Feb 17 17:27:57 someflix-unraid kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1518910077 SOCKET 1 APIC 20
Feb 17 17:27:57 someflix-unraid kernel: EDAC MC1: 1 CE memory scrubbing error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#1 (channel:0 slot:1 page:0x9dd90f offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0008:00c3 socket:1 ha:0 channel_mask:1 rank:4)

Can someone explain how to tell from the unraid log which DIMM I am getting an error on? Obviously I can check my web console, but I was curious what the methodology is. Thanks
statecowboy (Author)
Posted February 22, 2018

Hi guys. I have another memory error, which I've added below to illustrate my question above (I've also attached my latest diagnostics).

From unRAID logs:

Feb 20 11:30:45 someflix-unraid kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Feb 20 11:30:45 someflix-unraid kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 10: 8c000047000800c1
Feb 20 11:30:45 someflix-unraid kernel: EDAC sbridge MC0: TSC 160afab75fbec
Feb 20 11:30:45 someflix-unraid kernel: EDAC sbridge MC0: ADDR 142592000
Feb 20 11:30:45 someflix-unraid kernel: EDAC sbridge MC0: MISC 908400800080e8c
Feb 20 11:30:45 someflix-unraid kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1519147845 SOCKET 0 APIC 0
Feb 20 11:30:45 someflix-unraid kernel: EDAC MC0: 1 CE memory scrubbing error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x142592 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0008:00c1 socket:0 ha:0 channel_mask:1 rank:1)

From BMC Web Console:

Event ID Time Stamp Sensor Name Sensor Type Description
22 02/20/2018 17:31:40 Mmry ECC Sensor Memory Correctable ECC. CPU: 1, DIMM: B1. - Asserted

Attachment: someflix-unraid-diagnostics-20180221-1837.zip
JorgeB
Posted February 22, 2018

On 2/18/2018 at 12:24 AM, statecowboy said:
"but I was curious what the methodology is."

The methodology is to check the board's SEL (System Event Log); you likely won't be able to decipher the problem DIMM from the unRAID logs.
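For anyone who wants to line the two sources up, here is a minimal Python sketch (not from the thread) that pulls the socket, channel, slot and rank fields out of the final "EDAC MCn:" summary line shown in the logs above. The function name `parse_edac_line` and the regular expression are assumptions made for illustration, written against the two log lines posted here; other kernels or drivers may format the line differently. It only extracts the coordinates the kernel reports. Translating them into a silk-screen label such as H2 or B1 is board-specific, so the SEL or the motherboard manual is still needed for that step.

```python
import re

# Pattern written against the "EDAC MCn: ..." summary lines quoted in this thread.
# It deliberately does not match the "EDAC sbridge MCn:" detail lines.
EDAC_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) .*? on "
    r"(?P<label>\S+) \((?P<details>[^)]*)\)"
)


def parse_edac_line(line):
    """Return the location fields from an EDAC summary line, or None if it does not match."""
    m = EDAC_LINE.search(line)
    if m is None:
        return None
    # The parenthesised part is a run of key:value pairs, e.g. "channel:0 slot:1 ...".
    fields = dict(re.findall(r"(\w+):(\S+)", m.group("details")))
    return {
        "mc": int(m.group("mc")),          # memory controller number
        "count": int(m.group("count")),    # number of errors reported
        "kind": m.group("kind"),           # CE = correctable, UE = uncorrectable
        "label": m.group("label"),         # driver-generated DIMM label
        "socket": int(fields["socket"]),
        "channel": int(fields["channel"]),
        "slot": int(fields["slot"]),
        "rank": int(fields["rank"]),
    }


if __name__ == "__main__":
    sample = (
        "Feb 17 17:27:57 someflix-unraid kernel: EDAC MC1: 1 CE memory scrubbing "
        "error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#1 (channel:0 slot:1 page:0x9dd90f "
        "offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0008:00c3 socket:1 "
        "ha:0 channel_mask:1 rank:4)"
    )
    print(parse_edac_line(sample))
```

In the two examples posted above, the socket 1, channel 0, slot 1 error appeared alongside the BMC entry for CPU 2, DIMM H2, and the socket 0, channel 0, slot 0 error alongside CPU 1, DIMM B1. Two data points are not enough to infer the full slot map for the board.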