6.8.3 - Machine Check Event - HANDLING MCE MEMORY ERROR


Grrrreg


Hello Everyone,

I noticed some instability and rebooted my server the other day. Today I saw the notice in Fix Common Problems, with instructions to install the mcelog plugin. It looks like there are some memory errors. I have about an hour left on the parity check, but am wondering what my next steps should be. I could definitely use some recommendations on best practice with server memory. From reading the SuperMicro guide, it seems that simply removing the faulty module is not recommended, but maybe I'm reading that wrong. I was thinking I'd shut down the array, reboot, and run a memory check from the BIOS. SEL logging was turned off in the BIOS; it is now enabled.

Also, when replacing the bad DIMM, I was thinking of getting four new DIMMs of the same spec but larger capacity (8GB or 16GB) and replacing the other three in the same channel/rank as well, for a small upgrade. I'm still trying to make sense of all the complexity of server memory, so all advice is welcome and appreciated.

Thanks in advance for your help!

-Greg

Server

UNRAID 6.8.3
SuperMicro - SuperStorage 6047R-E1R24N
    MB: Super X9DRi-LN4F+
    Processor 1: Intel Xeon E5-2660 v2 2.2GHz 10 Core 25MB Cache Processor
    Processor 2: Intel Xeon E5-2660 v2 2.2GHz 10 Core 25MB Cache Processor
    Memory: 64GB (16x4GB) PC3-10600R 1333MHz DDR3 ECC

Errors

Dec 15 16:05:47 Seine kernel: mce: [Hardware Error]: Machine check events logged
Dec 15 16:05:47 Seine kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Dec 15 16:05:47 Seine kernel: EDAC sbridge MC1: CPU 10: Machine Check Event: 0 Bank 7: 8c00004000010090
Dec 15 16:05:47 Seine kernel: EDAC sbridge MC1: TSC d7f4672046556 
Dec 15 16:05:47 Seine kernel: EDAC sbridge MC1: ADDR bce285600 
Dec 15 16:05:47 Seine kernel: EDAC sbridge MC1: MISC 207e5286 
Dec 15 16:05:47 Seine kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1608077147 SOCKET 1 APIC 20
Dec 15 16:05:47 Seine kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbce285 offset:0x600 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:1 ha:0 channel_mask:1 rank:0)


Dec 15 20:01:16 Seine kernel: mce: [Hardware Error]: Machine check events logged
Dec 15 20:01:16 Seine kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Dec 15 20:01:16 Seine kernel: EDAC sbridge MC1: CPU 10: Machine Check Event: 0 Bank 7: 8c00004000010090
Dec 15 20:01:16 Seine kernel: EDAC sbridge MC1: TSC d9b8bf06f72be 
Dec 15 20:01:16 Seine kernel: EDAC sbridge MC1: ADDR bce285600 
Dec 15 20:01:16 Seine kernel: EDAC sbridge MC1: MISC 407c0086 
Dec 15 20:01:16 Seine kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1608091276 SOCKET 1 APIC 20
Dec 15 20:01:16 Seine kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbce285 offset:0x600 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:1 ha:0 channel_mask:1 rank:0)
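For anyone trying to read these lines, here is a rough Python sketch of how the two events above can be picked apart. It is only an illustration, not an official tool: the function names `parse_edac_line` and `decode_mci_status` are made up for this post, the regex is written against the exact line format shown above, and the bit positions follow my reading of the IA32_MCi_STATUS layout in the Intel SDM (VAL 63, OVER 62, UC 61, MISCV 59, ADDRV 58, corrected count 52:38, MSCOD 31:16, MCACOD 15:0).

```python
import re

# Matches the EDAC summary line, e.g.
# "EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x... offset:0x...)"
EDAC_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) .*? on (?P<label>\S+) "
    r"\(channel:(?P<channel>\d+) slot:(?P<slot>\d+) "
    r"page:(?P<page>0x[0-9a-f]+) offset:(?P<offset>0x[0-9a-f]+)"
)


def parse_edac_line(line):
    """Return the DIMM location fields from an EDAC summary line, or None."""
    m = EDAC_LINE.search(line)
    if m is None:
        return None
    page = int(m["page"], 16)
    offset = int(m["offset"], 16)
    return {
        "mc": int(m["mc"]),
        "kind": m["kind"],            # CE = corrected, UE = uncorrected
        "count": int(m["count"]),
        "label": m["label"],
        "channel": int(m["channel"]),
        "slot": int(m["slot"]),
        # page is the address shifted right by 12 bits (4 KiB pages)
        "address": (page << 12) | offset,
    }


def decode_mci_status(status):
    """Split a 64-bit machine check status word into its main fields."""
    return {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "misc_valid": bool((status >> 59) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "corrected_count": (status >> 38) & 0x7FFF,
        "mscod": (status >> 16) & 0xFFFF,
        "mcacod": status & 0xFFFF,
    }


if __name__ == "__main__":
    line = ("EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 "
            "(channel:0 slot:0 page:0xbce285 offset:0x600 grain:32 syndrome:0x0)")
    print(parse_edac_line(line))
    print(decode_mci_status(0x8C00004000010090))
```

Run against the log above, both events come out as corrected (not uncorrected) errors on the same label, `CPU_SrcID#1_Ha#0_Chan#0_DIMM#0`, and the page and offset recombine to the same address as the `ADDR bce285600` lines, which is why I suspect a single DIMM rather than something random.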


seine-diagnostics-20201216-1138.zip


Thanks Squid, I have two replacement DIMMs on order. I just need to figure out which DIMM to replace. No new errors have been logged since the original event. Once the replacements arrive, I'll run memtest or the Supermicro offline memory test and hope it identifies which DIMM to replace. I think I've figured out the Supermicro recommended memory configuration, and what would happen if I just tried to upgrade the memory, with regard to the reduced speed depending on DIMMs per channel, etc.
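In the meantime, one way to keep an eye out for new corrected errors is to read the per-DIMM counters that the EDAC driver exposes in sysfs. The snippet below is a minimal sketch, assuming the kernel exposes `mc*/dimm*/dimm_label` and `dimm_ce_count` under `/sys/devices/system/edac/mc` (I have not confirmed the layout on this exact Unraid build); `read_ce_counts` is just a name I made up.

```python
from pathlib import Path

# Assumed location of the EDAC memory controller entries in sysfs.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_ce_counts(root=EDAC_ROOT):
    """Return {"mcN/dimmM <label>": corrected_error_count} for each DIMM found."""
    counts = {}
    for dimm_dir in sorted(Path(root).glob("mc*/dimm*")):
        label_file = dimm_dir / "dimm_label"
        count_file = dimm_dir / "dimm_ce_count"
        # Skip entries that do not expose both files.
        if not (label_file.is_file() and count_file.is_file()):
            continue
        label = label_file.read_text().strip()
        key = f"{dimm_dir.parent.name}/{dimm_dir.name} {label}"
        counts[key] = int(count_file.read_text())
    return counts


if __name__ == "__main__":
    for name, count in read_ce_counts().items():
        marker = "  <-- corrected errors seen" if count else ""
        print(f"{name}: {count}{marker}")
```

The counters reset at boot as far as I know, so a DIMM that shows a non-zero count again after the reboot would be the one I'd swap first.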

 

I love my older server config, with its number of cores and so on; overall it's worked well. Things have changed a lot since I bought my first 20MB SCSI hard drive in 1990. I think there's a lot of great hardware out there with plenty of life left in it, but at times some of us don't fully understand how best to keep it running. I really appreciate all the advice and help from the forums.

