MCE errors causing random crashes



Hi,

 

I have this computer:

CPU:            2 x Intel Xeon E5-2650 SR0KQ octa-core CPU, 8 x 2.00 GHz, socket 2011, matched pair
Motherboard:    Intel S2600CP2J, dual socket 2011
Memory:         Used Micron VLP 32GB (8x4GB) DDR3 PC3-10600R
CPU Cooler:     2 x Arctic Alpine 11 Pro Rev.2
Storage:        HGST Deskstar NAS H3IKNAS400012872SWW 4TB
Cache:          250GB Samsung 850 EVO
PSU:            Seasonic S12II-520 Bronze, 520 W
Case:           Phanteks Enthoo Pro midi tower with window, titanium green
Case fans:      2 x Arctic F12 PWM 120mm fan, Rev. 2

 

It has been giving me MCE errors, but I can't figure out how to fix them.

k Event: 0 Bank 11: cc000090000800c3
Mar 2 14:04:56 GerardServer kernel: EDAC sbridge MC1: TSC 0 
Mar 2 14:04:56 GerardServer kernel: EDAC sbridge MC1: ADDR 600b64000 
Mar 2 14:04:56 GerardServer kernel: EDAC sbridge MC1: MISC 90000000000208c 
Mar 2 14:04:56 GerardServer kernel: EDAC sbridge MC1: PROCESSOR 0:206d7 TIME 1519995896 SOCKET 1 APIC 20
Mar 2 14:04:56 GerardServer kernel: EDAC MC1: 2 CE memory scrubbing error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x600b64 offset:0x0 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0008:00c3 socket:1 ha:0 channel_mask:1 rank:0)
Mar 2 14:04:57 GerardServer kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Mar 2 14:04:57 GerardServer kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 5: cc16b44000010091
Mar 2 14:04:57 GerardServer kernel: EDAC sbridge MC1: TSC 0 
Mar 2 14:04:57 GerardServer kernel: EDAC sbridge MC1: ADDR 82e215d40 
Mar 2 14:04:57 GerardServer kernel: EDAC sbridge MC1: MISC 214240a086 
Mar 2 14:04:57 GerardServer kernel: EDAC sbridge MC1: PROCESSOR 0:206d7 TIME 1519995897 SOCKET 1 APIC 20
Mar 2 14:04:57 GerardServer kernel: EDAC MC1: 23249 CE memory read error on CPU_SrcID#1_Ha#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x82e215 offset:0xd40 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0091 socket:1 ha:0 channel_mask:2 rank:0)
Mar 2 14:04:58 GerardServer kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Mar 2 14:04:58 GerardServer kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 5: cc12768000010091
Mar 2 14:04:58 GerardServer kernel: EDAC sbridge MC1: TSC 0 
Mar 2 14:04:58 GerardServer kernel: EDAC sbridge MC1: ADDR 82e20f140 
Mar 2 14:04:58 GerardServer kernel: EDAC sbridge MC1: MISC 21421cfc86 
Mar 2 14:04:58 GerardServer kernel: EDAC sbridge MC1: PROCESSOR 0:206d7 TIME 1519995898 SOCKET 1 APIC 20
Mar 2 14:04:58 GerardServer kernel: EDAC MC1: 18906 CE memory read error on CPU_SrcID#1_Ha#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x82e20f offset:0x140 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0091 socket:1 ha:0 channel_mask:2 rank:1)
Mar 2 14:04:59 GerardServer kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Mar 2 14:04:59 GerardServer kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 5: cc16b34000010091
Mar 2 14:04:59 GerardServer kernel: EDAC sbridge MC1: TSC 0 
Mar 2 14:04:59 GerardServer kernel: EDAC sbridge MC1: ADDR 81b166040 
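
For anyone trying to read these lines: the 64-bit hex value after "Bank N:" is the machine-check status word, and a few of its fields can be pulled apart by hand. The sketch below is only an illustration, assuming the architectural IA32_MCi_STATUS layout from Intel's Software Developer's Manual (flags in the top bits, corrected error count in bits 52:38, model-specific code in bits 31:16, MCA error code in bits 15:0). The function name `decode_mci_status` and the dictionary keys are made up for this example, and the corrected count is only meaningful on CPUs that implement that field.

```python
# Memory transaction types (the MMM bits) for memory-controller MCA error codes
# of the form 0000 0000 1MMM CCCC. Values 5-7 are reserved.
MEM_TRANSACTIONS = {
    0: "generic undefined request",
    1: "memory read error",
    2: "memory write error",
    3: "address/command error",
    4: "memory scrubbing error",
}


def decode_mci_status(status_hex):
    """Split a 64-bit machine-check status value into a few named fields.

    Hypothetical helper: bit positions assume the architectural
    IA32_MCi_STATUS layout, not anything specific to this board.
    """
    status = int(status_hex, 16)
    mca_code = status & 0xFFFF
    fields = {
        "valid": bool((status >> 63) & 1),        # VAL: register holds an error
        "overflow": bool((status >> 62) & 1),     # OVER: earlier errors were overwritten
        "uncorrected": bool((status >> 61) & 1),  # UC: clear for corrected errors
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_code": (status >> 16) & 0xFFFF,
        "mca_code": mca_code,
    }
    # Only memory-controller codes carry a transaction type.
    if (mca_code & 0xFF80) == 0x0080:
        fields["transaction"] = MEM_TRANSACTIONS.get((mca_code >> 4) & 0x7, "reserved")
    return fields


# Status values copied from the log above.
for value in ("cc000090000800c3", "cc16b44000010091", "cc12768000010091"):
    print(value, decode_mci_status(value))
```

For cc16b44000010091 the corrected count comes out as 23249 and for cc12768000010091 as 18906, which matches the "CE" counts EDAC prints on the lines that follow. The uncorrected flag is clear in all three, consistent with these being corrected errors, and the overflow flag is set, consistent with the OVERFLOW note in the EDAC summary.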

 

 

I have tried booting with just one stick of memory and with different memory configurations, but these errors still show up.

Can anyone help me sort this out? Thanks.

Edited by gertab
26 minutes ago, Squid said:

Does the DIMM # referenced change if you swap the chips around?  

This is from when I swapped some of the RAM modules, so the referenced DIMM does seem to change.

 

Feb 24 08:29:37 GerardServer kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 8: cc080810000800c0
Feb 24 08:29:37 GerardServer kernel: EDAC sbridge MC0: TSC 0 
Feb 24 08:29:37 GerardServer kernel: EDAC sbridge MC0: ADDR 42ab9000 
Feb 24 08:29:37 GerardServer kernel: EDAC sbridge MC0: MISC 90000000000208c 
Feb 24 08:29:37 GerardServer kernel: EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1519457377 SOCKET 0 APIC 0
Feb 24 08:29:37 GerardServer kernel: EDAC MC0: 8224 CE memory scrubbing error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x42ab9 offset:0x0 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0008:00c0 socket:0 ha:0 channel_mask:1 rank:0)
Feb 24 08:29:38 GerardServer kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Feb 24 08:29:38 GerardServer kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 5: cc11edc000010090
Feb 24 08:29:38 GerardServer kernel: EDAC sbridge MC0: TSC 0 
Feb 24 08:29:38 GerardServer kernel: EDAC sbridge MC0: ADDR 1ff2d00 
Feb 24 08:29:38 GerardServer kernel: EDAC sbridge MC0: MISC 20422aaa86 
Feb 24 08:29:38 GerardServer kernel: EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1519457378 SOCKET 0 APIC 0
Feb 24 08:29:38 GerardServer kernel: EDAC MC0: 18359 CE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x1ff2 offset:0xd00 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
Feb 24 08:29:38 GerardServer kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
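
To compare runs before and after swapping modules, it can help to total the corrected errors per DIMM label instead of reading the log line by line. This is a minimal sketch, assuming the "EDAC MCn: <count> CE <message> on <label> (" shape seen in the summary lines of these logs; the regular expression and the name `tally_edac_errors` are illustrative, and other kernel versions may format the line differently.

```python
import re
from collections import Counter

# Matches only the "EDAC MCn: <count> CE|UE ... on <label> (" summary lines,
# not the "EDAC sbridge MCn:" detail lines.
EDAC_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) (?P<msg>.+?) on "
    r"(?P<label>\S+) \("
)


def tally_edac_errors(lines):
    """Sum reported error counts per (kind, DIMM label). Hypothetical helper."""
    totals = Counter()
    for line in lines:
        match = EDAC_LINE.search(line)
        if match:
            totals[(match["kind"], match["label"])] += int(match["count"])
    return totals


# Summary lines from the Mar 2 log, with the tail of each parenthesis trimmed.
sample = [
    "Mar 2 14:04:56 GerardServer kernel: EDAC MC1: 2 CE memory scrubbing error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0)",
    "Mar 2 14:04:57 GerardServer kernel: EDAC MC1: 23249 CE memory read error on CPU_SrcID#1_Ha#0_Chan#1_DIMM#0 (channel:1 slot:0)",
    "Mar 2 14:04:58 GerardServer kernel: EDAC MC1: 18906 CE memory read error on CPU_SrcID#1_Ha#0_Chan#1_DIMM#0 (channel:1 slot:0)",
    "Mar 2 14:04:58 GerardServer kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR",
]

for (kind, label), count in tally_edac_errors(sample).most_common():
    print(f"{count:>8} {kind} {label}")
```

If the totals stay on the same socket and channel after the modules are moved, that points away from a single stick; if they follow a module, that stick is the likely suspect. To run it on a full log, pass an open file object instead of the sample list.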

On 3/3/2018 at 1:42 PM, johnnie.black said:

It's the SEL (system event log), page 64 of the manual, and according to the manual your board does have IPMI.

Sorry, I tried to find it, but I figure that my board is an Intel S2600CP2 (no J), which does not have IPMI.


According to the manual these are the only differences between the available models:

 

The Intel® Server Board S2600CP family includes different board configurations:
• Intel® Server Board S2600CP2: dual NIC ports
• Intel® Server Board S2600CP4: quad NIC ports
• Intel® Server Board S2600CP2J: dual NIC ports and no SCU ports

 

And all of them have IPMI.

8 hours ago, johnnie.black said:

If you haven't found the system event log yet, see here; ask the OP where it is, since he's using the same board . . .

 

OP from the linked thread here; I happened to stumble across this thread.  I used the guide from Intel here, which worked like a charm.  You need to load the software onto a thumb drive and run it from the internal EFI shell.  You can then save the output as a .sel file back onto the thumb drive and open it in Notepad++ on Windows or whatever.  Hope that helps.

 

EDIT: words.

Edited by blocker85