log hitting 100% after only 4 days


Brydezen

Hello guys,

 

I know some of you leave your servers on for maybe months before restarting. I left my server on while I was away, for some downloads and for hosting VMs for a friend. But now the "Fix Common Problems" plugin says my log is almost at 100% usage. A reboot would fix this temporarily, but I do want to fix it permanently. Here are some screenshots and my diagnostics. Hope someone can help me :(

 

91d0e6948418e7d547c80fe389ceb7c3.png

345769d363c07e8fc0eed85e75dabdbf.png

ef7a2b1fe544f26e25b4f9543fe1eb9f.png

tower-diagnostics-20180312-1531.zip

Link to comment

Syslog is getting spammed with memory errors:

 

Mar  9 07:07:06 Tower kernel: EDAC MC1: 24441 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x1052674 offset:0xc00 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0001:0090 socket:1 ha:0 channel_mask:1 rank:0)
Mar  9 07:07:06 Tower kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Mar  9 07:07:06 Tower kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 5: cc16544000010090
Mar  9 07:07:06 Tower kernel: EDAC sbridge MC1: TSC 0
Mar  9 07:07:06 Tower kernel: EDAC sbridge MC1: ADDR 107ec70600
Mar  9 07:07:06 Tower kernel: EDAC sbridge MC1: MISC 204a00e086
Mar  9 07:07:06 Tower kernel: EDAC sbridge MC1: PROCESSOR 0:206d7 TIME 1520575626 SOCKET 1 APIC 20
Mar  9 07:07:06 Tower kernel: EDAC MC1: 22865 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x107ec70 offset:0x600 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0001:0090 socket:1 ha:0 channel_mask:1 rank:0)
Mar  9 07:07:06 Tower kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Mar  9 07:07:06 Tower kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 9: cc0001d0000800c1
Mar  9 07:07:06 Tower kernel: EDAC sbridge MC1: TSC 0
Mar  9 07:07:06 Tower kernel: EDAC sbridge MC1: ADDR 89b248000
Mar  9 07:07:06 Tower kernel: EDAC sbridge MC1: MISC 90840000000208c
Mar  9 07:07:06 Tower kernel: EDAC sbridge MC1: PROCESSOR 0:206d7 TIME 1520575626 SOCKET 1 APIC 20
Mar  9 07:07:06 Tower kernel: EDAC MC1: 7 CE memory scrubbing error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x89b248 offset:0x0 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0008:00c1 socket:1 ha:0 channel_mask:1 rank:1)
Mar  9 07:07:07 Tower kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Mar  9 07:07:07 Tower kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 5: cc11044000010090

 

These are hardware errors; the system event log (SEL) should have more info on the affected slots.

Link to comment

Oh, how do I check the system event log or post it? I do want to fix this problem if I can, or contact the seller who sold me the RAM, as it's "kinda" new, but still used.

 

Is it only DIMM 0 throwing errors? I have tried to skim through syslog.1 and syslog.2, and it looks like only Channel 0, DIMM 0 is throwing errors.
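Rather than reading the rotated logs by eye, the EDAC lines can be tallied per DIMM label. The following is only a rough sketch: the regular expression is written against the lines quoted above, the function name `tally_edac_errors` is made up for illustration, and it assumes the number right after `MCn:` is the error count reported for that event. Other kernels or EDAC drivers may format these lines differently.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: EDAC MC1: 24441 CE memory read error on <label> (channel:0 slot:0 ..."
# The pattern is an assumption based on the log excerpt in this thread.
EDAC_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) .*? on (?P<label>\S+) "
    r"\(channel:(?P<channel>\d+) slot:(?P<slot>\d+)"
)


def tally_edac_errors(lines):
    """Return (events, errors) Counters keyed by (controller, DIMM label, CE/UE)."""
    events = Counter()  # number of log lines per DIMM
    errors = Counter()  # sum of the leading per-line error counts per DIMM
    for line in lines:
        match = EDAC_LINE.search(line)
        if match is None:
            continue  # e.g. the "EDAC sbridge MC1: ..." detail lines
        key = (int(match["mc"]), match["label"], match["kind"])
        events[key] += 1
        errors[key] += int(match["count"])
    return events, errors


# Three of the lines posted earlier in this thread, used as sample input.
SAMPLE = [
    "Mar  9 07:07:06 Tower kernel: EDAC MC1: 24441 CE memory read error on "
    "CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x1052674 offset:0xc00 "
    "grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0001:0090 socket:1 ha:0 "
    "channel_mask:1 rank:0)",
    "Mar  9 07:07:06 Tower kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR",
    "Mar  9 07:07:06 Tower kernel: EDAC MC1: 22865 CE memory read error on "
    "CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x107ec70 offset:0x600 "
    "grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0001:0090 socket:1 ha:0 "
    "channel_mask:1 rank:0)",
    "Mar  9 07:07:06 Tower kernel: EDAC MC1: 7 CE memory scrubbing error on "
    "CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x89b248 offset:0x0 "
    "grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0008:00c1 socket:1 ha:0 "
    "channel_mask:1 rank:1)",
]

if __name__ == "__main__":
    events, errors = tally_edac_errors(SAMPLE)
    for key in sorted(events):
        print(key, "events:", events[key], "errors:", errors[key])
```

To run it over the real logs, pass an open file instead of `SAMPLE`, for example `tally_edac_errors(open("syslog.1"))`. If only one label shows up in the result, that supports the guess that a single DIMM or slot is responsible.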

Link to comment
1 minute ago, Brydezen said:

Do I need to look at the SEL logs, or can it maybe be fixed just by rebooting?

You need to identify the memory module or slot causing the errors to fix it. Rebooting will fix the log size problem but not the memory errors; the log will fill up again after a few days.
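Besides the SEL, the kernel's EDAC driver usually exposes per-DIMM corrected-error counters under sysfs, which can help narrow down the module. Below is a minimal sketch, assuming the per-DIMM layout (`mc*/dimm*/dimm_label` and `dimm_ce_count` under `/sys/devices/system/edac/mc`). Whether those files exist depends on the kernel version and the EDAC driver in use, and the function name `read_dimm_ce_counts` is made up for illustration.

```python
from pathlib import Path


def read_dimm_ce_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {(controller, dimm, label): corrected_error_count}.

    Assumes the per-DIMM EDAC sysfs layout; entries missing either file
    are skipped rather than treated as errors.
    """
    counts = {}
    for dimm_dir in sorted(Path(edac_root).glob("mc*/dimm*")):
        label_file = dimm_dir / "dimm_label"
        count_file = dimm_dir / "dimm_ce_count"
        if not (label_file.is_file() and count_file.is_file()):
            continue
        label = label_file.read_text().strip()
        count = int(count_file.read_text().strip())
        counts[(dimm_dir.parent.name, dimm_dir.name, label)] = count
    return counts


if __name__ == "__main__":
    # Print the DIMMs with the most corrected errors first.
    ranked = sorted(read_dimm_ce_counts().items(), key=lambda item: -item[1])
    for (controller, dimm, label), count in ranked:
        print(f"{controller}/{dimm} ({label}): {count} corrected errors")
```

The label printed here still has to be matched to a physical slot using the motherboard manual or the SEL. Swapping the suspect module into another slot and checking whether the errors follow it is the usual way to tell a bad DIMM from a bad slot.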

Link to comment
On 3/12/2018 at 6:48 PM, Brydezen said:

Do I need to look at the SEL logs, or can it maybe be fixed just by rebooting? :D

 

Kinda wish I still had the manual under my bed, but put it in the basement in the box and now a lot of stuff is on top of it -.-

I make it a rule to always keep downloaded manuals for motherboards etc. easily accessible from some other machine, without needing a working network.

Link to comment

Archived

This topic is now archived and is closed to further replies.
