• 6.9.0/6.9.1 - Kernel Panic due to netfilter (nf_nat_setup_info) - Docker Static IP (macvlan)


    CorneliousJD
    • Urgent

    So I had posted another thread about how, after a kernel panic, Docker host access to custom networks doesn't work until Docker is stopped and restarted on 6.9.0.

     

     

    After further investigation and setting up syslogging, it appears it may actually be that host access that's CAUSING the kernel panic?

    EDIT 3/16: I guess I needed to create a VLAN for my dockers with static IPs. So far that's working, so it's probably not HOST access causing the issue, but rather static IPs being assigned on br0. See the follow-up posts below.
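    In case it helps anyone trying the same workaround, here is roughly what that VLAN approach boils down to at the Docker level. This is only a sketch: the VLAN ID, subnet, and network name are made-up placeholders, and on Unraid the VLAN sub-interface itself comes from enabling VLANs under Settings > Network Settings rather than from the CLI.

    # Sketch: macvlan network parented on a VLAN sub-interface instead of br0
    # (br0.5, 192.168.5.0/24 and "dockervlan" are placeholder values)
    docker network create -d macvlan \
        --subnet=192.168.5.0/24 --gateway=192.168.5.1 \
        -o parent=br0.5 dockervlan

    # Containers then take their static IP on the VLAN, not on br0
    docker run -d --network=dockervlan --ip=192.168.5.10 <image>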

     

    Here's my last kernel panic that thankfully got logged to syslog. It references macvlan and netfilter. I don't know enough to be super useful here, but this is my docker setup.

     

    [screenshot: Docker network configuration]

     

    Mar 12 03:57:07 Server kernel: ------------[ cut here ]------------
    Mar 12 03:57:07 Server kernel: WARNING: CPU: 17 PID: 626 at net/netfilter/nf_nat_core.c:614 nf_nat_setup_info+0x6c/0x652 [nf_nat]
    Mar 12 03:57:07 Server kernel: Modules linked in: ccp macvlan xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_MASQUERADE iptable_nat nf_nat xfs md_mod ip6table_filter ip6_tables iptable_filter ip_tables bonding igb i2c_algo_bit cp210x usbserial sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd ipmi_ssif isci glue_helper mpt3sas i2c_i801 rapl libsas i2c_smbus input_leds i2c_core ahci intel_cstate raid_class led_class acpi_ipmi intel_uncore libahci scsi_transport_sas wmi ipmi_si button [last unloaded: ipmi_devintf]
    Mar 12 03:57:07 Server kernel: CPU: 17 PID: 626 Comm: kworker/17:2 Tainted: G        W         5.10.19-Unraid #1
    Mar 12 03:57:07 Server kernel: Hardware name: Supermicro PIO-617R-TLN4F+-ST031/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
    Mar 12 03:57:07 Server kernel: Workqueue: events macvlan_process_broadcast [macvlan]
    Mar 12 03:57:07 Server kernel: RIP: 0010:nf_nat_setup_info+0x6c/0x652 [nf_nat]
    Mar 12 03:57:07 Server kernel: Code: 89 fb 49 89 f6 41 89 d4 76 02 0f 0b 48 8b 93 80 00 00 00 89 d0 25 00 01 00 00 45 85 e4 75 07 89 d0 25 80 00 00 00 85 c0 74 07 <0f> 0b e9 1f 05 00 00 48 8b 83 90 00 00 00 4c 8d 6c 24 20 48 8d 73
    Mar 12 03:57:07 Server kernel: RSP: 0018:ffffc90006778c38 EFLAGS: 00010202
    Mar 12 03:57:07 Server kernel: RAX: 0000000000000080 RBX: ffff88837c8303c0 RCX: ffff88811e834880
    Mar 12 03:57:07 Server kernel: RDX: 0000000000000180 RSI: ffffc90006778d14 RDI: ffff88837c8303c0
    Mar 12 03:57:07 Server kernel: RBP: ffffc90006778d00 R08: 0000000000000000 R09: ffff889083c68160
    Mar 12 03:57:07 Server kernel: R10: 0000000000000158 R11: ffff8881e79c1400 R12: 0000000000000000
    Mar 12 03:57:07 Server kernel: R13: 0000000000000000 R14: ffffc90006778d14 R15: 0000000000000001
    Mar 12 03:57:07 Server kernel: FS:  0000000000000000(0000) GS:ffff88903fc40000(0000) knlGS:0000000000000000
    Mar 12 03:57:07 Server kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Mar 12 03:57:07 Server kernel: CR2: 000000c000b040b8 CR3: 000000000200c005 CR4: 00000000001706e0
    Mar 12 03:57:07 Server kernel: Call Trace:
    Mar 12 03:57:07 Server kernel: <IRQ>
    Mar 12 03:57:07 Server kernel: ? activate_task+0x9/0x12
    Mar 12 03:57:07 Server kernel: ? resched_curr+0x3f/0x4c
    Mar 12 03:57:07 Server kernel: ? ipt_do_table+0x49b/0x5c0 [ip_tables]
    Mar 12 03:57:07 Server kernel: ? try_to_wake_up+0x1b0/0x1e5
    Mar 12 03:57:07 Server kernel: nf_nat_alloc_null_binding+0x71/0x88 [nf_nat]
    Mar 12 03:57:07 Server kernel: nf_nat_inet_fn+0x91/0x182 [nf_nat]
    Mar 12 03:57:07 Server kernel: nf_hook_slow+0x39/0x8e
    Mar 12 03:57:07 Server kernel: nf_hook.constprop.0+0xb1/0xd8
    Mar 12 03:57:07 Server kernel: ? ip_protocol_deliver_rcu+0xfe/0xfe
    Mar 12 03:57:07 Server kernel: ip_local_deliver+0x49/0x75
    Mar 12 03:57:07 Server kernel: ip_sabotage_in+0x43/0x4d
    Mar 12 03:57:07 Server kernel: nf_hook_slow+0x39/0x8e
    Mar 12 03:57:07 Server kernel: nf_hook.constprop.0+0xb1/0xd8
    Mar 12 03:57:07 Server kernel: ? l3mdev_l3_rcv.constprop.0+0x50/0x50
    Mar 12 03:57:07 Server kernel: ip_rcv+0x41/0x61
    Mar 12 03:57:07 Server kernel: __netif_receive_skb_one_core+0x74/0x95
    Mar 12 03:57:07 Server kernel: process_backlog+0xa3/0x13b
    Mar 12 03:57:07 Server kernel: net_rx_action+0xf4/0x29d
    Mar 12 03:57:07 Server kernel: __do_softirq+0xc4/0x1c2
    Mar 12 03:57:07 Server kernel: asm_call_irq_on_stack+0x12/0x20
    Mar 12 03:57:07 Server kernel: </IRQ>
    Mar 12 03:57:07 Server kernel: do_softirq_own_stack+0x2c/0x39
    Mar 12 03:57:07 Server kernel: do_softirq+0x3a/0x44
    Mar 12 03:57:07 Server kernel: netif_rx_ni+0x1c/0x22
    Mar 12 03:57:07 Server kernel: macvlan_broadcast+0x10e/0x13c [macvlan]
    Mar 12 03:57:07 Server kernel: macvlan_process_broadcast+0xf8/0x143 [macvlan]
    Mar 12 03:57:07 Server kernel: process_one_work+0x13c/0x1d5
    Mar 12 03:57:07 Server kernel: worker_thread+0x18b/0x22f
    Mar 12 03:57:07 Server kernel: ? process_scheduled_works+0x27/0x27
    Mar 12 03:57:07 Server kernel: kthread+0xe5/0xea
    Mar 12 03:57:07 Server kernel: ? __kthread_bind_mask+0x57/0x57
    Mar 12 03:57:07 Server kernel: ret_from_fork+0x22/0x30
    Mar 12 03:57:07 Server kernel: ---[ end trace b3ca21ac5f2c2720 ]---

     




    User Feedback

    Recommended Comments



    With a watchdog timing out, it looks like a hardware-related issue.

     

    Have you tried to reseat the NIC or move it to a different slot?
     

    Link to comment
    7 minutes ago, bonienl said:

    With a watchdog timing out, it looks like a hardware-related issue.

     

    Have you tried to reseat the NIC or move it to a different slot?
     

    I've tried three different LAN cards in various slots over the last few months.

     

    This happens using onboard NICs and PCI NICs

    Edited by Mr_Jay84
    Link to comment
    11 hours ago, Mr_Jay84 said:

    This happens using onboard NICs and PCI NICs

    Could also be a motherboard problem ...

     

    Link to comment
    On 3/12/2022 at 10:09 AM, bonienl said:

    Could also be a motherboard problem ...

     

    I've had the onboard NICs turned off too.

     

    Assigning IPs to containers is a sure way of causing a crash in under 24hrs.

     

    I'm not buying any more hardware. I've already spent a lot of money replacing pretty much everything and trying out various hardware combinations over at least a year.

     

    Upgraded to RC3 today... so fingers crossed. If this doesn't work, I'll have to look into other solutions, as Unraid isn't reliable enough.

    Edited by Mr_Jay84
    Link to comment

    Another lockup today, not sure if it's related though.

     

    Mar 16 15:06:55 Ultron kernel: veth437980e: renamed from eth0
    Mar 16 15:06:55 Ultron kernel: docker0: port 26(veth244df90) entered disabled state
    Mar 16 15:06:55 Ultron kernel: docker0: port 26(veth244df90) entered disabled state
    Mar 16 15:06:55 Ultron kernel: device veth244df90 left promiscuous mode
    Mar 16 15:06:55 Ultron kernel: docker0: port 26(veth244df90) entered disabled state
    Mar 16 15:07:55 Ultron kernel: docker0: port 26(veth90a6157) entered blocking state
    Mar 16 15:07:55 Ultron kernel: docker0: port 26(veth90a6157) entered disabled state
    Mar 16 15:07:55 Ultron kernel: device veth90a6157 entered promiscuous mode
    Mar 16 15:07:55 Ultron kernel: eth0: renamed from vethc67d2ac
    Mar 16 15:07:55 Ultron kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth90a6157: link becomes ready
    Mar 16 15:07:55 Ultron kernel: docker0: port 26(veth90a6157) entered blocking state
    Mar 16 15:07:55 Ultron kernel: docker0: port 26(veth90a6157) entered forwarding state
    Mar 16 15:07:57 Ultron kernel: docker0: port 26(veth90a6157) entered disabled state
    Mar 16 15:07:57 Ultron kernel: vethc67d2ac: renamed from eth0
    Mar 16 15:07:57 Ultron kernel: docker0: port 26(veth90a6157) entered disabled state
    Mar 16 15:07:57 Ultron kernel: device veth90a6157 left promiscuous mode
    Mar 16 15:07:57 Ultron kernel: docker0: port 26(veth90a6157) entered disabled state
    Mar 16 18:48:14 Ultron kernel: br-caf2c2672c89: port 1(vethdf85790) entered disabled state
    Mar 16 18:48:14 Ultron kernel: veth6c760c6: renamed from eth0
    Mar 16 18:48:15 Ultron kernel: br-caf2c2672c89: port 1(vethdf85790) entered disabled state
    Mar 16 18:48:15 Ultron kernel: device vethdf85790 left promiscuous mode
    Mar 16 18:48:15 Ultron kernel: br-caf2c2672c89: port 1(vethdf85790) entered disabled state
    Mar 16 18:48:15 Ultron kernel: br-caf2c2672c89: port 1(veth400e60b) entered blocking state
    Mar 16 18:48:15 Ultron kernel: br-caf2c2672c89: port 1(veth400e60b) entered disabled state
    Mar 16 18:48:15 Ultron kernel: device veth400e60b entered promiscuous mode
    Mar 16 18:48:15 Ultron kernel: br-caf2c2672c89: port 1(veth400e60b) entered blocking state
    Mar 16 18:48:15 Ultron kernel: br-caf2c2672c89: port 1(veth400e60b) entered forwarding state
    Mar 16 18:48:15 Ultron kernel: docker0: port 1(vethc04644b) entered blocking state
    Mar 16 18:48:15 Ultron kernel: docker0: port 1(vethc04644b) entered disabled state
    Mar 16 18:48:15 Ultron kernel: device vethc04644b entered promiscuous mode
    Mar 16 18:48:15 Ultron kernel: docker0: port 1(vethc04644b) entered blocking state
    Mar 16 18:48:15 Ultron kernel: docker0: port 1(vethc04644b) entered forwarding state
    Mar 16 18:48:15 Ultron kernel: eth0: renamed from veth3e19731
    Mar 16 18:48:15 Ultron kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth400e60b: link becomes ready

     

    Attachments: ultron.log, 20220316_183623.jpg

    Edited by Mr_Jay84
    Link to comment

    Upgraded to RC3 a few weeks back and still saw the same crashing, so as soon as RC4 came out I tried that.

    It's been up for almost 9 days with no issues so far. This is a record for me. Tomorrow I'll turn Pi-hole back on, as that's a sure way of triggering the previous crashes. Fingers crossed!

    Link to comment

    The issue has not returned since the RC4 upgrade, even with Pi-hole and other known offending containers re-enabled.

    This is excellent!

    Link to comment

    There is a lot of info here, so I'm just going to chime in so that I receive notifications as this thread progresses. I'm not speaking to the cause of anyone else's kernel panics, but mine match a few of these screenshots verbatim.

     

    I've been running Unraid for as long as I can remember and running dockers since they were originally introduced.  My server has been rock solid this entire time, even after upgrading MB/Proc/RAM, adding 10G NICs, replacing HDDs, etc.

     

    Even with 6.9.2, never had any issues. Out of the blue, I decided to upgrade to 6.10.0 and everything went south. Random kernel panics. Obviously blaming it on the upgrade, I reverted to the previous version, 6.9.2, but the kernel panics continued (in hindsight, is it possible that 6.10.0 made changes to the Docker configs that caused 6.9.2 to continue having issues?).

    Doing the research, I assumed it was hardware related and replaced the MB/Proc/RAM, and the kernel panics continued. I noticed that the kernel panics seemed to happen around the time the Appdata Backup/Restore was scheduled, so I started diving into that. Found some errors accessing files, so I thought it might be corruption issues. Resolved all that, moved appdata to different drives, on to/off of cache, etc. Turned off all containers and slowly turned them on, one at a time (over the course of a month), to see if I could pinpoint the cause, but nothing was repeatable. Everything appeared to be random.

    As of two days ago, my server had been running for 28 days with all dockers running without issues. So when it kernel panicked the other day, I removed Appdata Backup/Restore and moved everything Docker-related onto the protected array, and it kernel panicked that evening.

     

    Anyway, the latest change I am testing (fingers crossed) is changing the Docker setup to use ipvlan instead of macvlan. If that doesn't work, I'm going to plug in another NIC and try using eth# instead of br0 for the dockers that need a static IP to operate properly.
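    (For reference, the macvlan/ipvlan choice is a toggle in the Docker settings on newer Unraid builds; the rough CLI equivalent of the ipvlan variant is sketched below. The subnet, parent interface, and network name are placeholders, not values from any setup in this thread.)

    # Sketch: the same kind of custom network, but using the ipvlan driver
    docker network create -d ipvlan \
        --subnet=192.168.1.0/24 --gateway=192.168.1.1 \
        -o parent=br0 -o ipvlan_mode=l2 br0-ipvlan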

     

    PS: I learned of ctop, which is like top for Docker containers. Loving it. I wish it was built into Unraid, or that Unraid had that information available within the web GUI.
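    If anyone else wants to try ctop, it can be run straight from its Docker image without installing anything; the invocation below is the one from the ctop project's README (treat the image name as an assumption if it has moved since):

    docker run --rm -ti \
        --name=ctop \
        -v /var/run/docker.sock:/var/run/docker.sock \
        quay.io/vektorlab/ctop:latest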

    Edited by jzawacki
    Link to comment

    Ipvlan caused more issues for me.

    Then I went back to macvlan with VLANs on Docker and it was fine, until I got a new gateway (UDM-SE), and then it started crashing again.

     

    I am now on a dedicated NIC for containers with their own IP. 
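    (That setup is the same macvlan pattern as the earlier sketches, just parented on the spare NIC instead of br0; eth1, the subnet, and the name below are placeholders.)

    # Sketch: macvlan network on a dedicated interface
    docker network create -d macvlan --subnet=192.168.2.0/24 --gateway=192.168.2.1 -o parent=eth1 docker-eth1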

     

    Things keep changing... but I'm hopeful that this keeps things stable long term. ;)

    Link to comment

    Not sure exactly what fixed it for me, but running the latest stable release the issue has not appeared for quite a few months now. I did make some minor changes to macvlan/ipvlan, but I seem to remember that didn't help.

    Link to comment

    I'm afraid to post because I don't want to jinx myself, but Unraid has been up for 5 days after switching to ipvlan, running all dockers except for one (that is known to cause issues). I hope I don't have to post again any time soon.

    Link to comment

    Well, I had to switch back to macvlan due to some network weirdness. Unraid had been running for 20 days without issues, but I had some dockers that wouldn't stay connected, and it wasn't until I realized that Unraid couldn't check whether a version update was available that I blamed it on the Docker ipvlan setting. Switched it back to macvlan, all the network weirdness went away, and the server hasn't kernel panicked yet, so I'm keeping my fingers crossed.

    Link to comment
    8 minutes ago, jzawacki said:

    Well, I had to switch back to macvlan due to some network weirdness...

     

    I ran into the same issues with ipvlan.

     

    Ipvlan's issues have been less drastic but more sporadic, and IMO it's the worse of the two.

    Link to comment

    Glad I found this thread; I've been dealing with kernel panics for the last month or so and can't for the life of me find the solution. It started after I installed a GPU and set up a VM to pass it to. I never had any issues before, and disabling the VM doesn't stop the kernel panics. I upgraded from 6.11 to 6.12.0-rc2 and am having the same problems. Going to dig through this thread a bit.

    Edited by sage2050
    Link to comment
    22 minutes ago, sage2050 said:

    glad I found this thread...

     

    I wish there was a solid answer/fix. For me personally, it had to do with one of the dockers. I installed Docker and Portainer on my backup server and moved a bunch of dockers to it, and although I still have 9 dockers running full time on Unraid, my server uptime is currently 21 days and I can't remember the last time it kernel panicked. Now, since I'm posting this, it'll kernel panic by the end of the night.

    Link to comment

    Knocking on every piece of wood and tree in sight, but I crossed over 1 week of uptime last night after changing Docker from macvlan to ipvlan. I also disabled some less-than-crucial dockers, so I'll start enabling them one by one.

    Link to comment
    4 minutes ago, sage2050 said:

    Knocking on every piece of wood and tree in sight, but I crossed over 1 week of uptime last night after changing Docker from macvlan to ipvlan...

     

    I was running fine, but after that I would randomly see all sorts of network issues on my server: not able to update containers, not able to reach the internet from some, etc. It seems the way ipvlan networks those containers together didn't play nice with my UniFi router.

    Link to comment

    Same for me. I couldn't even get the host to update after switching to ipvlan, and I'm not using UniFi. So I believe that if ipvlan fixes it for anyone, it's because it stops the actual cause (one of the dockers?) from talking to the network. But that's just a guess.

    Link to comment

    *sigh* got a call trace this morning, but the server recovered at least. I hadn't even started activating dockers yet.

     

    I haven't seen any network issues in the log but i'll keep an eye out.

     

    Edit: this trace was related to a scheduled CA Backup/Restore run, which is currently incompatible with 6.12 and which I hadn't disabled. I'll chalk it up to that.

    Edited by sage2050
    Link to comment

    hmm

     

    Apr  1 18:40:17 Servbot kernel: e1000e 0000:00:1f.6 eth0: NIC Link is Down
    Apr  1 18:40:17 Servbot kernel: bond0: (slave eth0): link status definitely down, disabling slave
    Apr  1 18:40:17 Servbot kernel: device eth0 left promiscuous mode
    Apr  1 18:40:17 Servbot kernel: bond0: now running without any active interface!
    Apr  1 18:40:17 Servbot kernel: br0: port 1(bond0) entered disabled state
    Apr  1 18:40:20 Servbot kernel: e1000e 0000:00:1f.6 eth0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Apr  1 18:40:20 Servbot kernel: bond0: (slave eth0): link status definitely up, 1000 Mbps full duplex
    Apr  1 18:40:20 Servbot kernel: bond0: (slave eth0): making interface the new active one
    Apr  1 18:40:20 Servbot kernel: device eth0 entered promiscuous mode
    Apr  1 18:40:20 Servbot kernel: bond0: active interface up!
    Apr  1 18:40:20 Servbot kernel: br0: port 1(bond0) entered blocking state
    Apr  1 18:40:20 Servbot kernel: br0: port 1(bond0) entered forwarding state
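    (A few standard commands for checking whether that link/bond is really flapping; interface names as in the log above:)

    # Negotiated speed/duplex and link state of the physical port
    ethtool eth0

    # Active slave and link-failure counters for the bond
    cat /proc/net/bonding/bond0

    # Per-interface error/drop counters
    ip -s link show eth0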

     

    Link to comment

    15 days of uptime. Going on a trip for 4 days; I've got a feeling it's going to go down as soon as I walk out the door.

    Link to comment

    I did some more messing around, and after another short period of instability I feel fairly confident that my crashes were related to binding the two unused USB controllers that come with my GPU. In my VM settings I set PCIe ACS override to "both", which let me bind only the video and audio devices on the GPU for passthrough, and I haven't seen any call traces in 10 days now.

     

    During my previous stable period I wasn't binding any IOMMU groups.
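    For anyone searching later: as far as I can tell, the "Both" setting just adds a kernel boot parameter, so the append line on the flash drive ends up looking roughly like the sketch below (the rest of the line depends on what is already there), while the per-device vfio binding itself is written by the Tools > System Devices page to config/vfio-pci.cfg.

    # /boot/syslinux/syslinux.cfg (sketch only; "Both" = downstream,multifunction)
    label Unraid OS
      menu default
      kernel /bzimage
      append pcie_acs_override=downstream,multifunction initrd=/bzroot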

    Edited by sage2050
    Link to comment

    It does, it's a:

     

    08:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Rage 3 [Rage XL PCI] (rev 27)
     

    Link to comment


