Docker macvlan kernel panic - ipvlan fix does not work



Since moving to Unraid versions newer than 6.12, my server has been quite unstable: I have to reboot it every 48 hours.

Every time, I see a network-related kernel panic in the syslog.

I have used Docker with macvlan for a long time. I have read multiple posts about macvlan kernel issues, so I tried to use ipvlan instead.

I tried many things (new custom networks, etc.), but nothing works with ipvlan.

Moreover, when I switch br0 to ipvlan, after a couple of minutes my whole Unraid server can no longer reach the internet (but it still works locally).

Simply switching back to macvlan fixes the issue.

 

Could you please help me diagnose my situation? I don't know where I should look first.

I have a swag server proxying a nextcloud instance. With macvlan it works like a charm; with ipvlan, the host is unreachable even though I can see all containers running fine and they can even ping each other.
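One place to start looking when the host loses internet access but still works locally is the host's default route. The sketch below is only a generic first check, not a known fix for this problem: `default_gateway` is a hypothetical helper that extracts the gateway address from the output of `ip route show default`, and the `1.1.1.1` address in the usage comment is just an arbitrary public address chosen for illustration.

```shell
# Hypothetical helper (not from the thread): read `ip route show default`
# output on stdin and print the gateway address of the first default route.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Example use on the host (commented out so the snippet has no side effects):
#   gw=$(ip route show default | default_gateway)
#   ping -c 3 "$gw"        # is the router still reachable?
#   ping -c 3 1.1.1.1      # is anything beyond the router reachable?
```

Running the two pings once with macvlan and once with ipvlan shows whether the default route or the path beyond the router changes between the two setups, which narrows down where to look next.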

 

 

thanks

 

tower-diagnostics-20230715-2254.zip

Link to comment

And while I was writing this post, I just had a new kernel panic. Does it help?


Jul 15 22:54:28 Tower kernel: ------------[ cut here ]------------
Jul 15 22:54:28 Tower kernel: WARNING: CPU: 9 PID: 7702 at net/netfilter/nf_nat_core.c:594 nf_nat_setup_info+0x8c/0x7d1 [nf_nat]
Jul 15 22:54:28 Tower kernel: Modules linked in: veth xt_nat xt_tcpudp macvlan xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter nvidia_uvm(PO) xfs md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag nct6775 nct6775_core hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc bonding tls nvidia_drm(PO) nvidia_modeset(PO) x86_pkg_temp_thermal intel_powerclamp coretemp si2157(O) kvm_intel si2168(O) nvidia(PO) kvm drm_kms_helper drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 mei_hdcp mei_pxp aesni_intel tbsecp3(O) gx1133(O) tas2101(O) i2c_mux dvb_core(O) videobuf2_vmalloc(O) videobuf2_memops(O) videobuf2_common(O) wmi_bmof
Jul 15 22:54:28 Tower kernel: crypto_simd cryptd rapl mei_me nvme i2c_i801 intel_cstate syscopyarea i2c_smbus mc(O) ahci sysfillrect e1000e intel_uncore nvme_core sysimgblt mei i2c_core libahci fb_sys_fops thermal fan video tpm_crb tpm_tis wmi tpm_tis_core backlight tpm intel_pmc_core button acpi_pad acpi_tad unix
Jul 15 22:54:28 Tower kernel: CPU: 9 PID: 7702 Comm: kworker/u24:10 Tainted: P S      W  O       6.1.36-Unraid #1
Jul 15 22:54:28 Tower kernel: Hardware name: ASUS System Product Name/PRIME B560M-K, BIOS 1605 05/13/2022
Jul 15 22:54:28 Tower kernel: Workqueue: events_unbound macvlan_process_broadcast [macvlan]
Jul 15 22:54:28 Tower kernel: RIP: 0010:nf_nat_setup_info+0x8c/0x7d1 [nf_nat]
Jul 15 22:54:28 Tower kernel: Code: a8 80 75 26 48 8d 73 58 48 8d 7c 24 20 e8 18 bb fd ff 48 8d 43 0c 4c 8b bb 88 00 00 00 48 89 44 24 18 eb 54 0f ba e0 08 73 07 <0f> 0b e9 75 06 00 00 48 8d 73 58 48 8d 7c 24 20 e8 eb ba fd ff 48
Jul 15 22:54:28 Tower kernel: RSP: 0018:ffffc9000030cc78 EFLAGS: 00010282
Jul 15 22:54:28 Tower kernel: RAX: 0000000000000180 RBX: ffff88818325ea00 RCX: ffff888104c26780
Jul 15 22:54:28 Tower kernel: RDX: 0000000000000000 RSI: ffffc9000030cd5c RDI: ffff88818325ea00
Jul 15 22:54:28 Tower kernel: RBP: ffffc9000030cd40 R08: 00000000870aa8c0 R09: 0000000000000000
Jul 15 22:54:28 Tower kernel: R10: 0000000000000158 R11: 0000000000000000 R12: ffffc9000030cd5c
Jul 15 22:54:28 Tower kernel: R13: 0000000000000000 R14: ffffc9000030ce40 R15: 0000000000000001
Jul 15 22:54:28 Tower kernel: FS:  0000000000000000(0000) GS:ffff888255c40000(0000) knlGS:0000000000000000
Jul 15 22:54:28 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 15 22:54:28 Tower kernel: CR2: 0000147e36709840 CR3: 000000000420a005 CR4: 00000000003706e0
Jul 15 22:54:28 Tower kernel: Call Trace:
Jul 15 22:54:28 Tower kernel

Link to comment

Hi Kilrah,

 

I have tried new ipvlan and macvlan custom networks multiple times. Every time, ipvlan just does not work (even though it looks like it does), and I lose internet connectivity on my Unraid server. With macvlan it works, but whether I use br0 or any custom network, I get kernel panics every 48 hours.

Do I have to create custom routes or port forwarding with ipvlan to make it work?

Macvlan works out of the box. From the posts I have read, it looks like ipvlan should work as easily as macvlan. In my case, it does not.

For example, my nextcloud in macvlan is up and reachable, but with ipvlan and exactly the same configuration, the container is up but no traffic comes in.

Do I need to create a custom network with specific parameters? What do you recommend?

 

 

 

 

Link to comment
21 minutes ago, tapodufeu said:

For example, my nextcloud in macvlan is up and reachable

You should not even need macvlan or ipvlan in most cases; nextcloud certainly doesn't need it in a standard setup. Having dedicated IPs per container is heavily discouraged, apart from the few rare services that absolutely need it.

 

21 minutes ago, tapodufeu said:

Do I need to create a custom network with specific parameters? What do you recommend?

For me it was simply 

docker network create -d ipvlan --subnet=192.168.0.0/24 --gateway=192.168.0.1 -o parent=br0 lan_ipvlan
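The subnet, gateway, and parent interface in that command are the values from this particular setup and would need to match the local LAN. As a hedged sketch, the hypothetical helper below (`ipvlan_create_cmd` is not a Docker command) only prints the same `docker network create` invocation for other values, so it can be reviewed before being run by hand.

```shell
# Hypothetical helper: print (do not run) a `docker network create` command
# in the same form as the one above, for a given subnet, gateway, parent
# interface, and network name.
ipvlan_create_cmd() {
    subnet=$1 gateway=$2 parent=$3 name=$4
    printf 'docker network create -d ipvlan --subnet=%s --gateway=%s -o parent=%s %s\n' \
        "$subnet" "$gateway" "$parent" "$name"
}

# With the values from the post, this prints the original command:
ipvlan_create_cmd 192.168.0.0/24 192.168.0.1 br0 lan_ipvlan
```

Once the network has been created, `docker network inspect lan_ipvlan` shows the driver, subnet, and gateway that Docker actually recorded.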

 

Edited by Kilrah
Link to comment

I also have a similar backtrace on my configuration:

 

Jul 18 09:36:01 Arthur kernel: ------------[ cut here ]------------
Jul 18 09:36:01 Arthur kernel: WARNING: CPU: 0 PID: 437 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Jul 18 09:36:01 Arthur kernel: Modules linked in: udp_diag veth xt_nat xt_tcpudp macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs af_packet 8021q garp mrp bridge stp llc bonding tls zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) intel_rapl_msr mei_hdcp mei_pxp wmi_bmof i915 intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 drm_kms_helper aesni_intel btusb btrtl btbcm btintel crypto_simd cryptd rapl intel_cstate e1000e intel_uncore drm bluetooth i2c_i801 i2c_smbus nvme mei_me intel_gtt video ecdh_generic ahci agpgart nvme_core i2c_core ecc mei intel_pch_thermal syscopyarea sysfillrect
Jul 18 09:36:01 Arthur kernel: libahci sysimgblt fb_sys_fops thermal fan wmi backlight intel_pmc_core acpi_pad button unix
Jul 18 09:36:01 Arthur kernel: CPU: 0 PID: 437 Comm: kworker/u12:6 Tainted: P           O       6.1.38-Unraid #2
Jul 18 09:36:01 Arthur kernel: Hardware name: ASUSTeK COMPUTER INC. VC65-C1/VC65-C1, BIOS 0602 08/09/2018
Jul 18 09:36:01 Arthur kernel: Workqueue: events_unbound macvlan_process_broadcast [macvlan]
Jul 18 09:36:01 Arthur kernel: RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Jul 18 09:36:01 Arthur kernel: Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01
Jul 18 09:36:01 Arthur kernel: RSP: 0018:ffffc90000003d98 EFLAGS: 00010202
Jul 18 09:36:01 Arthur kernel: RAX: 0000000000000001 RBX: ffff8882138e9600 RCX: d35ba3b4373dc17c
Jul 18 09:36:01 Arthur kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8882138e9600
Jul 18 09:36:01 Arthur kernel: RBP: 0000000000000001 R08: 87ace0eed9699699 R09: 175ed443f1bc65da
Jul 18 09:36:01 Arthur kernel: R10: d490a8eaa63e3d03 R11: ffffc90000003d60 R12: ffffffff82a11d00
Jul 18 09:36:01 Arthur kernel: R13: 0000000000025735 R14: ffff8881035c9800 R15: 0000000000000000
Jul 18 09:36:01 Arthur kernel: FS:  0000000000000000(0000) GS:ffff88845dc00000(0000) knlGS:0000000000000000
Jul 18 09:36:01 Arthur kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 18 09:36:01 Arthur kernel: CR2: 000014e8890eeca0 CR3: 000000036e9b4002 CR4: 00000000003706f0
Jul 18 09:36:01 Arthur kernel: Call Trace:
Jul 18 09:36:01 Arthur kernel: <IRQ>
Jul 18 09:36:01 Arthur kernel: ? __warn+0xab/0x122
Jul 18 09:36:01 Arthur kernel: ? report_bug+0x109/0x17e
Jul 18 09:36:01 Arthur kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Jul 18 09:36:01 Arthur kernel: ? handle_bug+0x41/0x6f
Jul 18 09:36:01 Arthur kernel: ? exc_invalid_op+0x13/0x60
Jul 18 09:36:01 Arthur kernel: ? asm_exc_invalid_op+0x16/0x20
Jul 18 09:36:01 Arthur kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Jul 18 09:36:01 Arthur kernel: ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]
Jul 18 09:36:01 Arthur kernel: ? nf_nat_inet_fn+0x60/0x1a8 [nf_nat]
Jul 18 09:36:01 Arthur kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
Jul 18 09:36:01 Arthur kernel: nf_hook_slow+0x3a/0x96
Jul 18 09:36:01 Arthur kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Jul 18 09:36:01 Arthur kernel: NF_HOOK.constprop.0+0x79/0xd9
Jul 18 09:36:01 Arthur kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Jul 18 09:36:01 Arthur kernel: __netif_receive_skb_one_core+0x77/0x9c
Jul 18 09:36:01 Arthur kernel: process_backlog+0x8c/0x116
Jul 18 09:36:01 Arthur kernel: __napi_poll.constprop.0+0x28/0x124
Jul 18 09:36:01 Arthur kernel: net_rx_action+0x159/0x24f
Jul 18 09:36:01 Arthur kernel: __do_softirq+0x126/0x288
Jul 18 09:36:01 Arthur kernel: do_softirq+0x7f/0xab
Jul 18 09:36:01 Arthur kernel: </IRQ>
Jul 18 09:36:01 Arthur kernel: <TASK>
Jul 18 09:36:01 Arthur kernel: __local_bh_enable_ip+0x4c/0x6b
Jul 18 09:36:01 Arthur kernel: netif_rx+0x52/0x5a
Jul 18 09:36:01 Arthur kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Jul 18 09:36:01 Arthur kernel: ? _raw_spin_unlock+0x14/0x29
Jul 18 09:36:01 Arthur kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Jul 18 09:36:01 Arthur kernel: process_one_work+0x1a8/0x295
Jul 18 09:36:01 Arthur kernel: worker_thread+0x18b/0x244
Jul 18 09:36:01 Arthur kernel: ? rescuer_thread+0x281/0x281
Jul 18 09:36:01 Arthur kernel: kthread+0xe4/0xef
Jul 18 09:36:01 Arthur kernel: ? kthread_complete_and_exit+0x1b/0x1b
Jul 18 09:36:01 Arthur kernel: ret_from_fork+0x1f/0x30
Jul 18 09:36:01 Arthur kernel: </TASK>
Jul 18 09:36:01 Arthur kernel: ---[ end trace 0000000000000000 ]---
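Both traces in this thread were raised from the `macvlan_process_broadcast` workqueue. As a rough sketch for checking how often that happens on a given server, the hypothetical helper below counts the matching `Workqueue:` lines in a log file; the `/var/log/syslog` path in the usage comment is an assumption about where the log lives.

```shell
# Hypothetical helper: count kernel log lines whose workqueue is
# macvlan_process_broadcast (one such line appears per trace in this thread).
count_macvlan_traces() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0 (or on error), so `|| true` keeps the status at success.
    grep -c 'Workqueue: .*macvlan_process_broadcast' "$1" || true
}

# Example use (path is an assumption):
#   count_macvlan_traces /var/log/syslog
```

Comparing the count before and after a network change gives a simple way to tell whether the warnings have really stopped.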

 

I have a bunch of docker containers running in a separate VLAN from the main system, so I am using macvlan to support that.

 

Edited by jmshrtn
removing diags
Link to comment

I have moved all containers that are connected to the internet onto the default br0 using macvlan, and it looks like my server no longer has kernel panics.

I only have a week of observation so far... will see after the holidays.

Kernel panics happened often when I used more than one Docker network (host and bridge do not count).

Using ipvlan on br0 just does not work in my case. I don't know why.

 

Link to comment
  • 2 weeks later...

I have reconfigured all my containers to use only the bridge and host networks, and deleted br0.

After some reading online, I found that the issue may be caused by my network card, the Intel® I219-V 1Gb Ethernet embedded on the motherboard.

I also saw a few posts about the same issue with some Broadcom network chipsets.

I had no issues prior to 6.10. Maybe a kernel update? Does anyone know?

Edited by tapodufeu
Link to comment
