Hardware Error - Need help with mcelog



It's my understanding that Unraid has a memory check that's available when you log in with a monitor attached to the physical computer running Unraid. I know it's really there because I've seen it. Try running that. I'm scared to run it on my ECC RAM.

 

PS: I did not look at the files you attached.

 


Link to comment

@mhowland24, did you run a Memtest to confirm? Reseating the DIMMs is good advice in general, but if the test still detects errors, that would be something to fix anyway.

 

The Memtest that ships with Unraid does not detect errors on ECC memory, so you should make a boot drive from https://www.memtest86.com/ instead.

 

These look like real memory errors that are being corrected by ECC. The log shows one corrected (CE) memory read error on socket 1, channel 2, DIMM 0:

 

May 25 20:28:16 Tower kernel: mce: [Hardware Error]: Machine check events logged
May 25 20:28:16 Tower kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
May 25 20:28:16 Tower kernel: EDAC sbridge MC1: CPU 10: Machine Check Event: 0 Bank 7: 8c00004000010092
May 25 20:28:16 Tower kernel: EDAC sbridge MC1: TSC a0e86f108cc10 
May 25 20:28:16 Tower kernel: EDAC sbridge MC1: ADDR fec0de980 
May 25 20:28:16 Tower kernel: EDAC sbridge MC1: MISC 1407ed086 
May 25 20:28:16 Tower kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1621985296 SOCKET 1 APIC 20
May 25 20:28:16 Tower kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#2_DIMM#0 (channel:2 slot:0 page:0xfec0de offset:0x980 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0092 socket:1 ha:0 channel_mask:4 rank:1)
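If it helps to see which DIMM the kernel is pointing at, the last EDAC line can be picked apart with a few lines of Python. This is only a rough sketch: the function name `parse_edac_ce` and the regular expressions are my own, and the 4 KiB page size used to rebuild the address is an assumption (it does reproduce the `ADDR fec0de980` value logged above).

```python
import re

# The EDAC summary line from the log above, without the syslog prefix.
EDAC_CE_LINE = (
    "EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#2_DIMM#0 "
    "(channel:2 slot:0 page:0xfec0de offset:0x980 grain:32 syndrome:0x0 -  "
    "area:DRAM err_code:0001:0092 socket:1 ha:0 channel_mask:4 rank:1)"
)

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def parse_edac_ce(line):
    """Extract the DIMM location and physical address from an EDAC error line."""
    match = re.search(
        r"EDAC MC(\d+): (\d+) (CE|UE) (.*?) on (\S+) \((.*)\)", line
    )
    if match is None:
        return None
    mc, count, kind, message, label, details = match.groups()
    # The parenthesised part is a list of key:value pairs separated by spaces.
    fields = dict(re.findall(r"(\w+):(\S+)", details))
    page = int(fields["page"], 16)
    offset = int(fields["offset"], 16)
    return {
        "controller": int(mc),
        "count": int(count),
        "kind": kind,  # CE = corrected error, UE = uncorrected error
        "message": message,
        "label": label,
        "channel": int(fields["channel"]),
        "slot": int(fields["slot"]),
        "address": (page << PAGE_SHIFT) | offset,
    }


info = parse_edac_ce(EDAC_CE_LINE)
print(info["kind"], info["label"], hex(info["address"]))
```

The label (`CPU_SrcID#1_Ha#0_Chan#2_DIMM#0`) is the part to match against the motherboard manual when deciding which stick to reseat or swap.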

 

May 26 04:40:11 Tower root: Fix Common Problems: Error: Machine Check Events detected on your server
May 26 04:40:11 Tower root: Hardware event. This is not a software error.
May 26 04:40:11 Tower root: MCE 0
May 26 04:40:11 Tower root: CPU 10 BANK 7 TSC a0e86f108cc10 
May 26 04:40:11 Tower root: MISC 1407ed086 ADDR fec0de980 
May 26 04:40:11 Tower root: TIME 1621985296 Tue May 25 20:28:16 2021
May 26 04:40:11 Tower root: MCG status:
May 26 04:40:11 Tower root: MCi status:
May 26 04:40:11 Tower root: Corrected error
May 26 04:40:11 Tower root: MCi_MISC register valid
May 26 04:40:11 Tower root: MCi_ADDR register valid
May 26 04:40:11 Tower root: MCA: MEMORY CONTROLLER RD_CHANNEL2_ERR
May 26 04:40:11 Tower root: Transaction: Memory read error
May 26 04:40:11 Tower root: STATUS 8c00004000010092 MCGSTATUS 0
May 26 04:40:11 Tower root: MCGCAP 1000c1b APICID 20 SOCKETID 1 
May 26 04:40:11 Tower root: PPIN abdb2681abd60c88
May 26 04:40:11 Tower root: MICROCODE 42e
May 26 04:40:11 Tower root: CPUID Vendor Intel Family 6 Model 62
May 26 04:40:11 Tower root: mcelog: warning: 8 bytes ignored in each record
May 26 04:40:11 Tower root: mcelog: consider an update
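For anyone who wants to double-check what mcelog printed, the `STATUS 8c00004000010092` value can be decoded by hand. The sketch below follows the IA32_MCi_STATUS bit layout as I understand it from the Intel SDM; the function name and dictionary keys are made up for illustration, and the corrected-error count field (bits 52:38) is only meaningful when the CPU reports CMCI support in MCGCAP, which I am assuming is the case here.

```python
# MMM sub-field of a memory controller error code (0000 0000 1MMM CCCC).
MEMORY_TRANSACTIONS = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into its main architectural fields."""
    mca_code = status & 0xFFFF
    decoded = {
        "valid": bool((status >> 63) & 1),        # VAL
        "overflow": bool((status >> 62) & 1),     # OVER
        "uncorrected": bool((status >> 61) & 1),  # UC
        "enabled": bool((status >> 60) & 1),      # EN
        "miscv": bool((status >> 59) & 1),        # MCi_MISC register valid
        "addrv": bool((status >> 58) & 1),        # MCi_ADDR register valid
        "pcc": bool((status >> 57) & 1),          # processor context corrupt
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": mca_code,
    }
    # Memory controller errors have the form 0000 0000 1MMM CCCC.
    if (mca_code & 0xFF80) == 0x0080:
        mmm = (mca_code >> 4) & 0x7
        cccc = mca_code & 0xF
        decoded["transaction"] = MEMORY_TRANSACTIONS.get(mmm, "reserved")
        decoded["channel"] = None if cccc == 0xF else cccc  # 0xF = unspecified
    return decoded


status = decode_mci_status(0x8C00004000010092)
for key, value in status.items():
    print(f"{key}: {value}")
```

With the value from this log, the UC bit comes out clear, MISCV and ADDRV are set, and the low 16 bits (0x0092) decode to a memory read error on channel 2, which agrees with the "Corrected error" and "RD_CHANNEL2_ERR" lines above.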

 

Link to comment
