
Kernel Panic and Out Of Memory Errors [SOLVED]


I'm not good with decoding syslog kernel panics and need some help please.

 

System Specs:

Supermicro X11SSH-LN4F - C-states disabled in BIOS

Xeon E3-1240 v6

64 GB DDR4 ECC

 

2 ZFS pools, one Nytro SSD pool and one HDD pool

UNRAID disk is on an nvme, using a Samsung Fit flash drive.

No longer running VMs or Dockers but ran stable with them.

 

Got some remote replication to another machine.

Enabled saving syslog to the flash drive and waited for another failure.

 

Found a kernel panic on Feb 23rd at 14:12, and it repeats going forward. The system still worked but was cranky.

System gets worse around Feb 24th 14:00.

Out of memory errors start on Feb 24th at 14:08 and repeat: kernel: Out of memory: Kill process 12302 (monitor) score 0 or sacrifice child

System required a hard reboot on Feb 24th at 14:43.
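
For anyone who wants to pull the OOM-killer hits out of a saved syslog instead of scrolling, here is a minimal Python sketch. It assumes the lines look like the one quoted above; other kernel versions may word the message differently, and the function name is just made up for illustration.

```python
import re

# Matches the OOM-killer wording quoted in this post. Assumption: other
# kernel versions may phrase this line differently and would need a new pattern.
OOM_RE = re.compile(r"Out of memory: Kill process (\d+) \(([^)]+)\) score (\d+)")


def find_oom_kills(lines):
    """Return a list of (pid, process_name, score) for each OOM-killer line."""
    kills = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            pid, name, score = match.groups()
            kills.append((int(pid), name, int(score)))
    return kills
```

Feed it the lines of the saved syslog file (for example `find_oom_kills(open(path))`, where `path` is wherever your syslog copy lives) and check which processes keep getting killed.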

 

Been digging through forums and Google and came up with possible bad RAM issues, but it's ECC RAM.

Possibly leftover routes from VMs and Dockers. Easy to remove, but I can't see that causing it.

Takes about 6 days to blow up. Currently rebooting on day 4.

 

There is a record of a boot where the two pools tried mounting to the same mount point. Ignore that, it got fixed.

Another Kernel panic at Feb 24th 17:24 but it doesn't repeat. System stable from this point on.

 

Thanks in advance.

 

syslog

Edited by Holmesware

Can you also post your diagnostics so the task lists etc can be put into perspective.

  • Author

This is the issue I'm having: one CPU core pegged at 100% by the wsdd process. This goes on until the kernel starts panicking or running out of memory.

Trying the restart script at the end of the thread and will report back.

 

https://forums.unraid.net/topic/85073-wsdd-100-using-1-core/page/2/

 

EDIT: Script did not reset the 100% CPU usage. Disabled WSD. Kept script running for now.

Edited by Holmesware

  • Author

Finally found this, looks like I got a bad stick of RAM.

I'm running ECC RAM and reseated the RAM during the first server crash.

memtest didn't show anything after a quick run; I didn't have time to do a full test.

Heat is not an issue with my setup. I have a good quality 750W PSU.

Going to swap the DIMM on channel 0 with the one in channel 3 and see if this shows up again.

 

Mar  4 07:03:17 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 07:04:56 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 07:15:59 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 07:19:42 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 07:22:31 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 07:23:28 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 07:25:47 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 07:26:18 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 07:26:38 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 07:55:16 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 07:58:00 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 07:59:02 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 08:01:05 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 08:09:12 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 08:09:58 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 08:10:03 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 08:18:00 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 08:21:35 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
Mar  4 08:21:56 kernel: EDAC MC0: 1 CE ie31200 CE on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:8 syndrome:0x52)
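
If the syslog has a lot of these, a small script can tally the corrected errors (CE) per memory controller, csrow and channel. This is only a sketch that assumes the EDAC lines look like the ones above (ie31200 driver); the regex and function name are mine, not anything official.

```python
import re
from collections import Counter

# Captures: memory controller number, CE count on that line, csrow, channel.
# Assumption: the line layout matches the ie31200 EDAC messages pasted above.
EDAC_RE = re.compile(r"EDAC MC(\d+): (\d+) CE .*?\(csrow:(\d+) channel:(\d+)")


def tally_corrected_errors(lines):
    """Sum corrected-error counts keyed by (mc, csrow, channel)."""
    totals = Counter()
    for line in lines:
        match = EDAC_RE.search(line)
        if match:
            mc, count, csrow, channel = (int(g) for g in match.groups())
            totals[(mc, csrow, channel)] += count
    return totals
```

If every hit lands on the same (mc, csrow, channel) key, as it does in the log above, that points at one DIMM rather than something system-wide.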

 

Edit: To help find which DIMM is having the error:

root@system~: grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count

/sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow1/ch0_ce_count:34 <- ERROR COUNT /mc0/csrow1/ch0
/sys/devices/system/edac/mc/mc0/csrow1/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow2/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow3/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow3/ch1_ce_count:0
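
The same counters can be read from a script, which is handy for a cron job or a notification. A rough Python sketch, assuming the sysfs layout shown in the grep output above; the function names are hypothetical and the root path is a parameter so it can be pointed elsewhere.

```python
from pathlib import Path


def read_ce_counts(edac_root="/sys/devices/system/edac/mc"):
    """Read every chN_ce_count file under edac_root.

    Returns a dict keyed by (mc, csrow, channel) directory/file names,
    e.g. ("mc0", "csrow1", "ch0") -> 34.
    """
    counts = {}
    for path in sorted(Path(edac_root).glob("mc*/csrow*/ch*_ce_count")):
        mc = path.parent.parent.name
        csrow = path.parent.name
        channel = path.name.split("_", 1)[0]  # "ch0_ce_count" -> "ch0"
        counts[(mc, csrow, channel)] = int(path.read_text().strip())
    return counts


def nonzero(counts):
    """Keep only the locations that have logged at least one corrected error."""
    return {key: value for key, value in counts.items() if value}


if __name__ == "__main__":
    for (mc, csrow, channel), value in nonzero(read_ce_counts()).items():
        print(f"{mc}/{csrow}/{channel}: {value} corrected errors")
```

On a machine without EDAC loaded the glob simply finds nothing and the script prints nothing.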

 

mcX = Memory Controller (single, dual CPU)

chX = Channel 0, Channel 1, Channel 2 (single, dual, triple channel RAM)

csrowX = see chart

 

            Channel 0     Channel 1     Channel 2
==================================================
csrow0   |  DIMM_A0   |   DIMM_B0   |   DIMM_C0   |
csrow1   |  DIMM_A0   |   DIMM_B0   |   DIMM_C0   |
==================================================
csrow2   |  DIMM_A1   |   DIMM_B1   |   DIMM_C1   |
csrow3   |  DIMM_A1   |   DIMM_B1   |   DIMM_C1   |
==================================================
csrow4   |  DIMM_A2   |   DIMM_B2   |   DIMM_C2   |
csrow5   |  DIMM_A2   |   DIMM_B2   |   DIMM_C2   |
==================================================
csrow6   |  DIMM_A3   |   DIMM_B3   |   DIMM_C3   |
csrow7   |  DIMM_A3   |   DIMM_B3   |   DIMM_C3   |
==================================================

 

root@system~: dmidecode -t memory | grep 'Locator'

Locator: DIMMA1 <- THIS ONE - DIMM_A0
Bank Locator: P0_Node0_Channel0_Dimm0
Locator: DIMMA2
Bank Locator: P0_Node0_Channel0_Dimm1
Locator: DIMMB1
Bank Locator: P0_Node0_Channel1_Dimm0
Locator: DIMMB2
Bank Locator: P0_Node0_Channel1_Dimm1
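
Putting the two together: a sketch that turns the `dmidecode` output into a lookup table and maps an EDAC csrow/channel pair to the slot name printed on the board. It assumes two csrows per DIMM as in the chart (csrow0-1 is the first DIMM on a channel, csrow2-3 the second) and Bank Locator strings in the `ChannelN_DimmN` form shown above; other boards may label things differently.

```python
import re

# Assumption: Bank Locator strings contain "Channel<N>_Dimm<N>" as on this board.
BANK_RE = re.compile(r"Channel(\d+)_Dimm(\d+)")


def parse_locators(dmidecode_text):
    """Map (channel, dimm_index) -> slot label from Locator/Bank Locator pairs."""
    slots = {}
    label = None
    for raw in dmidecode_text.splitlines():
        line = raw.strip()
        if line.startswith("Bank Locator:"):
            match = BANK_RE.search(line)
            if match and label is not None:
                slots[(int(match.group(1)), int(match.group(2)))] = label
            label = None
        elif line.startswith("Locator:"):
            label = line.split(":", 1)[1].strip()
    return slots


def slot_for(csrow, channel, slots):
    """Look up the slot label for an EDAC location.

    Assumption from the chart: each DIMM spans two csrows (ranks),
    so the DIMM index on a channel is csrow // 2.
    """
    return slots.get((channel, csrow // 2))
```

With the output above, csrow 1 on channel 0 comes back as DIMMA1, which matches the slot marked "THIS ONE".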

 

EDIT: Moved the stick of RAM and got an error in another slot. Ordering new stick of RAM.

/sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow1/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow1/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow2/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow3/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow3/ch1_ce_count:1 <- NEW ERROR

 

EDIT2: 

/sys/devices/system/edac/mc/mc0/csrow3/ch1_ce_count:2 <- NEW ERROR

Errors have slowed but at least I have a second error now. RAM incoming.
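
To check whether the errors follow the stick after a swap, it helps to compare two saved copies of the grep output. A small sketch, assuming each line is `path:count` as above (trailing notes like `<- NEW ERROR` are ignored); the function names are only illustrative.

```python
def parse_counts(grep_output):
    """Parse 'path:count' lines (as printed by the grep above) into a dict."""
    counts = {}
    for line in grep_output.splitlines():
        # sysfs paths contain no colon, so split on the first one.
        path, sep, rest = line.strip().partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            counts[path] = int(fields[0])
    return counts


def new_errors(before, after):
    """Return the paths whose count went up, with the size of the increase."""
    return {
        path: count - before.get(path, 0)
        for path, count in after.items()
        if count > before.get(path, 0)
    }
```

Keep in mind the counters start again from zero after a reboot, so take the "before" snapshot after the machine is back up with the DIMM in its new slot.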

 

EDIT3:
Replaced the defective DIMM, 4 days with no errors. Turned WSD back on. Watching logs and CPU usage. WSD script still running.

 

EDIT4:

No more memory errors and WSD is running without eating a full CPU. Calling this solved.

 

 

Edited by Holmesware

  • Holmesware changed the title to Kernel Panic and Out Of Memory Errors [SOLVED]

Archived

This topic is now archived and is closed to further replies.
