Kernel Panic and remote access issues

monghuz · December 6, 2021

Hi All,

I have been experiencing OS crashes with Unraid for a while now.

The most common one is "Kernel Panic - not syncing: Fatal exception in interrupt". It is tricky because, even with syslog enabled (to a remote host plus a copy to flash), I have not been able to find any log entries that reveal the smoking gun.

In the excerpt below (from syslog-127.0.0.1-kernelpanic.zip), the backup completes successfully and the next message is the one logged after I hard-reset the machine.

Dec  6 05:03:07 Pandora-NAS CA Backup/Restore: Backup Complete
Dec  6 05:03:07 Pandora-NAS CA Backup/Restore: Verifying backup
Dec  6 05:03:07 Pandora-NAS CA Backup/Restore: Using command: cd '/mnt/user/appdata/' && /usr/bin/tar --diff -C '/mnt/user/appdata/' -af '/mnt/user/backups/appdata-monthly/[email protected]/CA_backup.tar' > /var/lib/docker/unraid/ca.backup2.datastore/appdata_backup.log & echo $! > /tmp/ca.backup2/tempFiles/verifyInProgress
Dec  6 13:28:25 Pandora-NAS root: Delaying execution of fix common problems scan for 10 minutes
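The silent window in an excerpt like this can be located automatically. Below is a minimal Python sketch that scans syslog lines and reports gaps between consecutive timestamps. The function names, the one-hour threshold and the assumed year are illustrative choices, not part of Unraid or of the attached logs (classic syslog lines carry no year, so 2021 is taken from the post date).

```python
from datetime import datetime, timedelta

ASSUMED_YEAR = 2021  # assumption: syslog lines have no year; taken from the post date


def parse_timestamp(line, year=ASSUMED_YEAR):
    """Return the datetime at the start of a classic syslog line, or None."""
    parts = line.split(None, 3)  # month, day, time, rest of the line
    if len(parts) < 3:
        return None
    try:
        stamp = f"{year} {parts[0]} {parts[1]} {parts[2]}"
        return datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(hours=1)):
    """Yield (gap, last_line_before, first_line_after) for long silences."""
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is None:
            continue  # skip lines that do not start with a timestamp
        if prev_ts is not None and ts - prev_ts > threshold:
            yield ts - prev_ts, prev_line, line
        prev_ts, prev_line = ts, line


# Two lines shortened from the excerpt above.
SAMPLE = [
    "Dec  6 05:03:07 Pandora-NAS CA Backup/Restore: Verifying backup",
    "Dec  6 13:28:25 Pandora-NAS root: Delaying execution of fix common problems scan for 10 minutes",
]

for gap, before, after in find_gaps(SAMPLE):
    print(f"silence of {gap} between:\n  {before}\n  {after}")
```

The last line before each gap is the closest the log gets to the crash, so it is the first place to look for a trigger.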

The diagnostics package pandora-nas-diagnostics-20211206-1331.zip, created after this most recent crash, is attached as well.

I believe the issue started after I added a quad NIC to my system, although I ran with it for a few months without any issues.

Then the OS crashed once, and since then it has crashed countless times, at intervals ranging from 1 to 14 days.

My docker settings:

I have many Docker containers, but the 3 that run most of the time on br0 are qBittorrent, Plex and Tautulli. The rest are either in bridge mode or not running.

I have read many articles and saw that it could be related to VLANs, but I'm not sure how to adjust my settings. It also seems that others are experiencing the same thing even on 6.10.0-rc2.

Although the remote symptoms are the same (Web UI, shares and SSH are all unavailable), I had 2 cases where the console did not show the kernel panic message and the command prompt was "working".

I got an "invalid password" error although I entered the correct one, so the authentication engine must have failed.
I managed to log in; the IP settings were good and I was able to ping my gateway, but the OS was still not accessible. Generating diagnostics failed, but the syslog (syslog-127.0.0.1-notaccesible-20nov.zip) showed some kernel messages around midnight, and the next entry marks the time I did a hard reset.

Nov 20 00:12:17 Pandora-NAS kernel: igb 0000:04:00.1 eth1: Reset adapter
Nov 20 00:12:17 Pandora-NAS kernel: bond0: (slave eth0): link status definitely down, disabling slave
Nov 20 00:12:17 Pandora-NAS kernel: device eth0 left promiscuous mode
Nov 20 00:12:17 Pandora-NAS kernel: bond0: now running without any active interface!
Nov 20 00:12:17 Pandora-NAS kernel: br0: port 1(bond0) entered disabled state
Nov 20 00:12:17 Pandora-NAS kernel: igb 0000:04:00.0 eth0: Reset adapter
Nov 20 09:54:32 Pandora-NAS root: Delaying execution of fix common problems scan for 10 minutes
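To see which interfaces were involved before a hang, the kernel lines can be grouped by interface. The following Python sketch does this with regular expressions built only from the phrases that appear in the excerpt above. The event names and the `collect_nic_events` helper are hypothetical, and other drivers or kernel versions may word these messages differently.

```python
import re
from collections import defaultdict

# Patterns derived from the excerpt above; the event names are illustrative.
EVENT_PATTERNS = {
    "adapter_reset": re.compile(r"igb \S+ (?P<iface>eth\d+): Reset adapter"),
    "slave_down": re.compile(
        r"\(slave (?P<iface>eth\d+)\): link status definitely down"
    ),
    "bond_no_active": re.compile(
        r"(?P<iface>bond\d+): now running without any active interface"
    ),
}


def collect_nic_events(lines):
    """Map each interface name to the list of events seen for it, in order."""
    events = defaultdict(list)
    for line in lines:
        if " kernel: " not in line:
            continue  # only kernel messages are of interest here
        for name, pattern in EVENT_PATTERNS.items():
            match = pattern.search(line)
            if match:
                events[match.group("iface")].append(name)
    return dict(events)


# Lines copied from the excerpt above.
SAMPLE = [
    "Nov 20 00:12:17 Pandora-NAS kernel: igb 0000:04:00.1 eth1: Reset adapter",
    "Nov 20 00:12:17 Pandora-NAS kernel: bond0: (slave eth0): link status definitely down, disabling slave",
    "Nov 20 00:12:17 Pandora-NAS kernel: device eth0 left promiscuous mode",
    "Nov 20 00:12:17 Pandora-NAS kernel: bond0: now running without any active interface!",
    "Nov 20 00:12:17 Pandora-NAS kernel: br0: port 1(bond0) entered disabled state",
    "Nov 20 00:12:17 Pandora-NAS kernel: igb 0000:04:00.0 eth0: Reset adapter",
    "Nov 20 09:54:32 Pandora-NAS root: Delaying execution of fix common problems scan for 10 minutes",
]

for iface, names in sorted(collect_nic_events(SAMPLE).items()):
    print(iface, names)
```

Running the same function over the full syslog would show whether adapter resets cluster right before each period of inaccessibility.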

Any help would be much appreciated, since every crash puts my array through a parity check (7+ hours), and I have had many...

Regards

monghuz

pandora-nas-diagnostics-20211206-1331.zip syslog-127.0.0.1-kernelpanic.zip syslog-127.0.0.1-notaccesible-20nov.zip

Edited December 6, 2021 by monghuz

JorgeB · December 6, 2021

Macvlan call traces are usually the result of having Docker containers with a custom IP address. Upgrading to v6.10 and switching to ipvlan might fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right), or see the link below for more info.

https://forums.unraid.net/topic/70529-650-call-traces-when-assigning-ip-address-to-docker-containers/
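To check which Docker networks currently use the macvlan driver, the output of `docker network ls` can be parsed. This is a minimal Python sketch: the helper names are hypothetical, the sample output is made up for illustration, and it assumes the custom br0 network shows up as an ordinary Docker network with a `macvlan` or `ipvlan` driver.

```python
import subprocess


def parse_network_drivers(output):
    """Parse lines of the form '<name> <driver>' into a {name: driver} dict."""
    drivers = {}
    for line in output.splitlines():
        parts = line.rsplit(None, 1)  # the driver is the last field
        if len(parts) == 2:
            drivers[parts[0]] = parts[1]
    return drivers


def macvlan_networks(drivers):
    """Return the sorted names of networks that use the macvlan driver."""
    return sorted(name for name, driver in drivers.items() if driver == "macvlan")


def list_network_drivers():
    """Query the local Docker daemon (requires the docker CLI on PATH)."""
    result = subprocess.run(
        ["docker", "network", "ls", "--format", "{{.Name}} {{.Driver}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_network_drivers(result.stdout)


# Illustrative output only, not taken from the poster's system.
SAMPLE_OUTPUT = "bridge bridge\nbr0 macvlan\nhost host\nnone null\n"

print(macvlan_networks(parse_network_drivers(SAMPLE_OUTPUT)))
```

On a live system, `macvlan_networks(list_network_drivers())` would give the same kind of list; after switching the custom network type to ipvlan it should come back empty.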
