Random system crashes, possibly due to a bad memory stick. How can I diagnose this?


hawnkey


When the crash occurs, the system appears to try to reboot but never comes back up successfully. It shows LOGIN: on the console, but the UI never starts and I can't connect via the IP address or via PuTTY.

Here is an excerpt from the syslog immediately after the event:

Nov 27 07:38:06 Tower kernel: mce: CMCI storm detected: switching to poll mode
Nov 27 07:38:06 Tower kernel: mce: [Hardware Error]: Machine check events logged
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: cc04200000010090
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: TSC 3ee1e59de211c
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: ADDR 24be480
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: MISC 142184e86
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1669556286 SOCKET 0 APIC 0
Nov 27 07:38:06 Tower kernel: EDAC MC0: 4224 CE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x24be offset:0x480 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:1)
Nov 27 07:38:06 Tower kernel: mce: [Hardware Error]: Machine check events logged
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: cc000f8000010090
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: TSC 3ee1e59e32afc
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: ADDR 2519e00
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: MISC 40181486
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1669556286 SOCKET 0 APIC 0
Nov 27 07:38:06 Tower kernel: EDAC MC0: 62 CE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x2519 offset:0xe00 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: cc00064000010090
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: TSC 3ee1e59e4c06c
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: ADDR 12b60bb40
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: MISC 4218ca86
Nov 27 07:38:06 Tower kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1669556286 SOCKET 0 APIC 0
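
For anyone trying to read the "Machine Check Event: 0 Bank 7: ..." values above, here is a minimal sketch that pulls a few fields out of the status word. It assumes the bit layout documented for Intel's IA32_MCi_STATUS register (valid = bit 63, overflow = bit 62, uncorrected = bit 61, corrected error count = bits 52:38, MCA error code = bits 15:0); the function name is made up for illustration, and whether every field applies to this particular CPU is an assumption.

```python
def decode_mci_status(status_hex):
    """Decode selected fields of a machine-check status word given as hex text.

    Bit positions are assumed from the Intel IA32_MCi_STATUS layout.
    """
    status = int(status_hex, 16)
    return {
        "valid": bool((status >> 63) & 1),            # bit 63: entry is valid
        "overflow": bool((status >> 62) & 1),         # bit 62: earlier errors were overwritten
        "uncorrected": bool((status >> 61) & 1),      # bit 61: error was NOT corrected
        "corrected_count": (status >> 38) & 0x7FFF,   # bits 52:38: corrected error count
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
        "mca_error_code": status & 0xFFFF,            # bits 15:0
    }


if __name__ == "__main__":
    # Status words copied from the syslog excerpt above.
    for word in ("cc04200000010090", "cc000f8000010090", "cc00064000010090"):
        print(word, decode_mci_status(word))
```

Under that layout, the first two status words decode to corrected counts of 4224 and 62, which line up with the "4224 CE" and "62 CE" figures in the EDAC lines, and the uncorrected bit is clear in all three. That is consistent with ECC catching and correcting a burst of read errors on one DIMM rather than with an uncorrectable fault being logged at that moment.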

 

I've attached a file with all of the events immediately before, during, and after the error.

Crash Log 27 Nov.txt
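
One way to narrow down which stick is involved is to tally the EDAC error lines per DIMM label across the whole log. The sketch below is only an illustration: the regular expression is written against the two "EDAC MC0: ... CE memory read error on ..." lines quoted above, and the helper name `tally_edac_errors` is hypothetical. Other kernels or EDAC drivers may format these lines differently, and because the lines carry an OVERFLOW flag the summed counts should be treated as a rough lower bound.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: EDAC MC0: 4224 CE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 ..."
# The "EDAC sbridge MC0: ..." detail lines are intentionally not matched.
EDAC_LINE = re.compile(r"EDAC MC\d+: (\d+) (CE|UE) .*? on (\S+) \(")


def tally_edac_errors(lines):
    """Sum reported error counts per (DIMM label, error kind).

    CE = corrected error, UE = uncorrected error.
    """
    totals = Counter()
    for line in lines:
        match = EDAC_LINE.search(line)
        if match:
            count = int(match.group(1))
            kind = match.group(2)
            label = match.group(3)
            totals[(label, kind)] += count
    return totals


def tally_edac_errors_in_file(path):
    """Convenience wrapper: read a saved syslog file and tally its EDAC lines."""
    with open(path, errors="replace") as handle:
        return tally_edac_errors(handle)
```

Calling `tally_edac_errors_in_file("Crash Log 27 Nov.txt")` on the attached log would list each DIMM label with its corrected and uncorrected totals. If nearly all errors land on one label, such as `CPU_SrcID#0_Ha#0_Chan#0_DIMM#0` here, that slot is the first candidate to reseat, swap, or test; how the label maps to a physical slot depends on the motherboard.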
