DIMM Error Identification


statecowboy

Posted

Hi guys.  I was curious how to match the memory errors unRAID reports to my actual DIMMs.  In my case, I am able to log in to my web console and see the errors.  I also have an LED that blinks when an error is registered.  In this case DIMM H2 is lit up, and my web console output the following:

 

1679 02/17/2018 23:29:30 Mmry ECC Sensor Memory Correctable ECC. CPU: 2, DIMM: H2. - Asserted

 

That said, this is the error I get in unraid.

 

Feb 17 17:27:57 someflix-unraid kernel: mce: [Hardware Error]: Machine check events logged
Feb 17 17:27:57 someflix-unraid kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Feb 17 17:27:57 someflix-unraid kernel: EDAC sbridge MC1: CPU 10: Machine Check Event: 0 Bank 12: 8c000043000800c3
Feb 17 17:27:57 someflix-unraid kernel: EDAC sbridge MC1: TSC 5365d5a58cd7c 
Feb 17 17:27:57 someflix-unraid kernel: EDAC sbridge MC1: ADDR 9dd90f000 
Feb 17 17:27:57 someflix-unraid kernel: EDAC sbridge MC1: MISC 122100008000868c 
Feb 17 17:27:57 someflix-unraid kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1518910077 SOCKET 1 APIC 20
Feb 17 17:27:57 someflix-unraid kernel: EDAC MC1: 1 CE memory scrubbing error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#1 (channel:0 slot:1 page:0x9dd90f offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0008:00c3 socket:1 ha:0 channel_mask:1 rank:4)

 

Can someone explain how to tell from the unraid log which DIMM I am getting an error on?  Obviously I can check my web console, but I was curious what the methodology is.

 

Thanks

Posted

Hi guys.  I have another error in my memory which I've added below to explain my question above (I've also attached my last diagnostics).

From unRAID logs:

Feb 20 11:30:45 someflix-unraid kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Feb 20 11:30:45 someflix-unraid kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 10: 8c000047000800c1
Feb 20 11:30:45 someflix-unraid kernel: EDAC sbridge MC0: TSC 160afab75fbec 
Feb 20 11:30:45 someflix-unraid kernel: EDAC sbridge MC0: ADDR 142592000 
Feb 20 11:30:45 someflix-unraid kernel: EDAC sbridge MC0: MISC 908400800080e8c 
Feb 20 11:30:45 someflix-unraid kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1519147845 SOCKET 0 APIC 0
Feb 20 11:30:45 someflix-unraid kernel: EDAC MC0: 1 CE memory scrubbing error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x142592 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0008:00c1 socket:0 ha:0 channel_mask:1 rank:1)

 

From BMC Web Console:

Event ID   Time Stamp   Sensor Name   Sensor Type   Description
22 02/20/2018 17:31:40 Mmry ECC Sensor Memory Correctable ECC. CPU: 1, DIMM: B1. - Asserted

someflix-unraid-diagnostics-20180221-1837.zip

Posted
On 2/18/2018 at 12:24 AM, statecowboy said:

but I was curious what the methodology is.

The methodology is to check the board's SEL (System Event Log); you likely won't be able to decipher the problem DIMM from the unRAID logs.
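
For anyone who still wants to pull the location fields out of the syslog, here is a minimal Python sketch. It assumes the "EDAC MCn: ... CE ..." summary line looks like the two posted above. The function name `parse_edac_ce` and the `BOARD_LABELS` table are made up for illustration. The table holds only the two pairings reported in this thread, on the assumption that each BMC event and the kernel line posted with it describe the same error. It is not a general mapping, and any other slot would still have to come from the SEL or the board manual.

```python
import re

# Matches the summary line the EDAC core logs for a corrected error (CE).
# The per-register "EDAC sbridge MCn:" lines are deliberately not matched.
EDAC_CE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) CE .*? on (?P<label>\S+) "
    r"\(channel:(?P<channel>\d+) slot:(?P<slot>\d+) .*?"
    r"socket:(?P<socket>\d+) ha:(?P<ha>\d+)"
)

# HYPOTHETICAL lookup keyed by (socket, ha, channel, slot). Only the two
# pairings seen in this thread are listed; this is an assumption, not a
# documented mapping for any board.
BOARD_LABELS = {
    (1, 0, 0, 1): "H2",
    (0, 0, 0, 0): "B1",
}

def parse_edac_ce(line):
    """Return the location fields of an EDAC CE line, or None if no match."""
    m = EDAC_CE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    label = fields.pop("label")
    result = {key: int(value) for key, value in fields.items()}
    result["label"] = label  # driver-generated name, not the silkscreen label
    key = (result["socket"], result["ha"], result["channel"], result["slot"])
    result["board_label"] = BOARD_LABELS.get(key)  # None when unknown
    return result

line = (
    "Feb 17 17:27:57 someflix-unraid kernel: EDAC MC1: 1 CE memory scrubbing "
    "error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#1 (channel:0 slot:1 page:0x9dd90f "
    "offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0008:00c3 "
    "socket:1 ha:0 channel_mask:1 rank:4)"
)
print(parse_edac_ce(line))
```

The kernel line only gives socket, home agent, channel and slot numbers. Turning those into a label like H2 depends on how the board is wired, which is why the SEL is the authoritative source.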
