tapodufeu Posted July 15, 2023
Since > 6.12, my Unraid server has been quite unstable: I have to reboot it every 48 hours, and every time I see a network-related kernel panic in the syslog. I have used Docker with macvlan for a long time. I have read multiple posts about macvlan kernel issues, so I tried to use ipvlan instead. I tried many things, including new custom networks, but nothing works with ipvlan. Moreover, when I switch br0 to ipvlan, after a couple of minutes my whole Unraid server can no longer reach the internet (though it still works locally). Simply switching back to macvlan fixes the issue.
Could you please help me diagnose my situation? I don't know where I should look first. I have a swag server proxying a nextcloud. With macvlan it works like a charm; with ipvlan, the host is unreachable even though I can see all Docker containers running fine and can even ping between them.
Thanks
tower-diagnostics-20230715-2254.zip
tapodufeu Posted July 15, 2023 Author
And while I was writing this post, I just had a new kernel panic. Does it help?
Jul 15 22:54:28 Tower kernel: ------------[ cut here ]------------
Jul 15 22:54:28 Tower kernel: WARNING: CPU: 9 PID: 7702 at net/netfilter/nf_nat_core.c:594 nf_nat_setup_info+0x8c/0x7d1 [nf_nat]
Jul 15 22:54:28 Tower kernel: Modules linked in: veth xt_nat xt_tcpudp macvlan xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter nvidia_uvm(PO) xfs md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag nct6775 nct6775_core hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc bonding tls nvidia_drm(PO) nvidia_modeset(PO) x86_pkg_temp_thermal intel_powerclamp coretemp si2157(O) kvm_intel si2168(O) nvidia(PO) kvm drm_kms_helper drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 mei_hdcp mei_pxp aesni_intel tbsecp3(O) gx1133(O) tas2101(O) i2c_mux dvb_core(O) videobuf2_vmalloc(O) videobuf2_memops(O) videobuf2_common(O) wmi_bmof
Jul 15 22:54:28 Tower kernel: crypto_simd cryptd rapl mei_me nvme i2c_i801 intel_cstate syscopyarea i2c_smbus mc(O) ahci sysfillrect e1000e intel_uncore nvme_core sysimgblt mei i2c_core libahci fb_sys_fops thermal fan video tpm_crb tpm_tis wmi tpm_tis_core backlight tpm intel_pmc_core button acpi_pad acpi_tad unix
Jul 15 22:54:28 Tower kernel: CPU: 9 PID: 7702 Comm: kworker/u24:10 Tainted: P S W O 6.1.36-Unraid #1
Jul 15 22:54:28 Tower kernel: Hardware name: ASUS System Product Name/PRIME B560M-K, BIOS 1605 05/13/2022
Jul 15 22:54:28 Tower kernel: Workqueue: events_unbound macvlan_process_broadcast [macvlan]
Jul 15 22:54:28 Tower kernel: RIP: 0010:nf_nat_setup_info+0x8c/0x7d1 [nf_nat]
Jul 15 22:54:28 Tower kernel: Code: a8 80 75 26 48 8d 73 58 48 8d 7c 24 20 e8 18 bb fd ff 48 8d 43 0c 4c 8b bb 88 00 00 00 48 89 44 24 18 eb 54 0f ba e0 08 73 07 <0f> 0b e9 75 06 00 00 48 8d 73 58 48 8d 7c 24 20 e8 eb ba fd ff 48
Jul 15 22:54:28 Tower kernel: RSP: 0018:ffffc9000030cc78 EFLAGS: 00010282
Jul 15 22:54:28 Tower kernel: RAX: 0000000000000180 RBX: ffff88818325ea00 RCX: ffff888104c26780
Jul 15 22:54:28 Tower kernel: RDX: 0000000000000000 RSI: ffffc9000030cd5c RDI: ffff88818325ea00
Jul 15 22:54:28 Tower kernel: RBP: ffffc9000030cd40 R08: 00000000870aa8c0 R09: 0000000000000000
Jul 15 22:54:28 Tower kernel: R10: 0000000000000158 R11: 0000000000000000 R12: ffffc9000030cd5c
Jul 15 22:54:28 Tower kernel: R13: 0000000000000000 R14: ffffc9000030ce40 R15: 0000000000000001
Jul 15 22:54:28 Tower kernel: FS: 0000000000000000(0000) GS:ffff888255c40000(0000) knlGS:0000000000000000
Jul 15 22:54:28 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 15 22:54:28 Tower kernel: CR2: 0000147e36709840 CR3: 000000000420a005 CR4: 00000000003706e0
Jul 15 22:54:28 Tower kernel: Call Trace:
Jul 15 22:54:28 Tower kernel
JorgeB Posted July 16, 2023
Call traces are about macvlan, so likely that's what's crashing the server.
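One way to check whether a given syslog shows the same pattern is to count the kernel reports whose workqueue line names macvlan, as in the `Workqueue: events_unbound macvlan_process_broadcast [macvlan]` lines in the traces posted in this thread. This is only a minimal sketch: the function name `count_macvlan_traces` is made up for illustration, and the default path `/var/log/syslog` is an assumption about where the log lives.

```shell
#!/bin/bash
# Count kernel reports in a syslog file whose "Workqueue:" line mentions
# the macvlan module. The default path below is an assumption; pass the
# real syslog path as the first argument if it differs.
count_macvlan_traces() {
    local logfile="${1:-/var/log/syslog}"
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'Workqueue:.*\[macvlan\]' "$logfile"
}
```

Calling `count_macvlan_traces /var/log/syslog` prints a number; anything above 0 suggests the warnings are being raised from macvlan's broadcast handling, which matches the diagnosis above.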
Kilrah Posted July 16, 2023
Would you by any chance have made some of your own custom macvlan networks? Those would need to go too, not just the default br0.
Edited July 16, 2023 by Kilrah
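To find leftover custom macvlan networks, one option is to list every Docker network together with its driver and keep only the macvlan ones. The sketch below assumes the `docker` CLI is available on the host; the function names `filter_macvlan_networks` and `list_macvlan_networks` are hypothetical helpers, not part of Docker or Unraid.

```shell
#!/bin/bash
# Read "name driver" pairs on stdin and print the names of the
# networks whose driver is macvlan.
filter_macvlan_networks() {
    awk '$2 == "macvlan" { print $1 }'
}

# Hypothetical wrapper: ask Docker for each network's name and driver,
# then keep only the macvlan ones.
list_macvlan_networks() {
    docker network ls --format '{{.Name}} {{.Driver}}' | filter_macvlan_networks
}
```

Running `list_macvlan_networks` should print one network name per line. Any name it prints besides br0 would be a custom macvlan network of the kind described above.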
tapodufeu Posted July 16, 2023 Author
Hi Kilrah, I have tried new ipvlan and macvlan custom networks multiple times. Every time with ipvlan, it just does not work (even though it looks like it does), and I lose internet connectivity on my Unraid server. With macvlan it works, but with br0 or any custom network, I get kernel panics every 48 hours.
Do I have to create custom routes or port forwarding to make ipvlan work? Macvlan works out of the box. From the posts I have read, it looks like ipvlan should work as easily as macvlan. In my case, it does not.
For example, my nextcloud in macvlan is up and reachable. With ipvlan and exactly the same configuration, the container is up but no traffic comes in.
Do I need to create a custom network with specific parameters? What do you recommend?
Kilrah Posted July 16, 2023
21 minutes ago, tapodufeu said: For example, my nextcloud in macvlan is up and reachable
You should not even need macvlan or ipvlan in most cases; nextcloud certainly doesn't need it in a standard setup. Having dedicated IPs per container is heavily discouraged apart from the couple of rare services that absolutely need it.
21 minutes ago, tapodufeu said: Do i need to create a custom networks with specific parameters ? what do you recommend ?
For me it was simply:
docker network create -d ipvlan --subnet=192.168.0.0/24 --gateway=192.168.0.1 -o parent=br0 lan_ipvlan
Edited July 16, 2023 by Kilrah
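Since the subnet, gateway and parent interface differ from one LAN to another, it may help to build the command from parameters and review it before running it. The sketch below only prints the command quoted above; `ipvlan_create_cmd` is a hypothetical helper name, and the values in the usage note are the ones from that command, not recommendations for any other network.

```shell
#!/bin/bash
# Print (do not run) a "docker network create" command for an ipvlan
# network. Arguments: subnet, gateway, parent interface, network name.
ipvlan_create_cmd() {
    if [ "$#" -ne 4 ]; then
        echo "usage: ipvlan_create_cmd SUBNET GATEWAY PARENT NAME" >&2
        return 1
    fi
    local subnet="$1" gateway="$2" parent="$3" name="$4"
    printf 'docker network create -d ipvlan --subnet=%s --gateway=%s -o parent=%s %s\n' \
        "$subnet" "$gateway" "$parent" "$name"
}
```

For example, `ipvlan_create_cmd 192.168.0.0/24 192.168.0.1 br0 lan_ipvlan` prints the same command as above. After creating the network, `docker network inspect lan_ipvlan` can be used to confirm that the driver, subnet and parent were applied as intended.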
jmshrtn Posted July 17, 2023
I have a similar backtrace on my configuration also:
Jul 18 09:36:01 Arthur kernel: ------------[ cut here ]------------
Jul 18 09:36:01 Arthur kernel: WARNING: CPU: 0 PID: 437 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Jul 18 09:36:01 Arthur kernel: Modules linked in: udp_diag veth xt_nat xt_tcpudp macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs af_packet 8021q garp mrp bridge stp llc bonding tls zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) intel_rapl_msr mei_hdcp mei_pxp wmi_bmof i915 intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 drm_kms_helper aesni_intel btusb btrtl btbcm btintel crypto_simd cryptd rapl intel_cstate e1000e intel_uncore drm bluetooth i2c_i801 i2c_smbus nvme mei_me intel_gtt video ecdh_generic ahci agpgart nvme_core i2c_core ecc mei intel_pch_thermal syscopyarea sysfillrect
Jul 18 09:36:01 Arthur kernel: libahci sysimgblt fb_sys_fops thermal fan wmi backlight intel_pmc_core acpi_pad button unix
Jul 18 09:36:01 Arthur kernel: CPU: 0 PID: 437 Comm: kworker/u12:6 Tainted: P O 6.1.38-Unraid #2
Jul 18 09:36:01 Arthur kernel: Hardware name: ASUSTeK COMPUTER INC. VC65-C1/VC65-C1, BIOS 0602 08/09/2018
Jul 18 09:36:01 Arthur kernel: Workqueue: events_unbound macvlan_process_broadcast [macvlan]
Jul 18 09:36:01 Arthur kernel: RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Jul 18 09:36:01 Arthur kernel: Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01
Jul 18 09:36:01 Arthur kernel: RSP: 0018:ffffc90000003d98 EFLAGS: 00010202
Jul 18 09:36:01 Arthur kernel: RAX: 0000000000000001 RBX: ffff8882138e9600 RCX: d35ba3b4373dc17c
Jul 18 09:36:01 Arthur kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8882138e9600
Jul 18 09:36:01 Arthur kernel: RBP: 0000000000000001 R08: 87ace0eed9699699 R09: 175ed443f1bc65da
Jul 18 09:36:01 Arthur kernel: R10: d490a8eaa63e3d03 R11: ffffc90000003d60 R12: ffffffff82a11d00
Jul 18 09:36:01 Arthur kernel: R13: 0000000000025735 R14: ffff8881035c9800 R15: 0000000000000000
Jul 18 09:36:01 Arthur kernel: FS: 0000000000000000(0000) GS:ffff88845dc00000(0000) knlGS:0000000000000000
Jul 18 09:36:01 Arthur kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 18 09:36:01 Arthur kernel: CR2: 000014e8890eeca0 CR3: 000000036e9b4002 CR4: 00000000003706f0
Jul 18 09:36:01 Arthur kernel: Call Trace:
Jul 18 09:36:01 Arthur kernel: <IRQ>
Jul 18 09:36:01 Arthur kernel: ? __warn+0xab/0x122
Jul 18 09:36:01 Arthur kernel: ? report_bug+0x109/0x17e
Jul 18 09:36:01 Arthur kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Jul 18 09:36:01 Arthur kernel: ? handle_bug+0x41/0x6f
Jul 18 09:36:01 Arthur kernel: ? exc_invalid_op+0x13/0x60
Jul 18 09:36:01 Arthur kernel: ? asm_exc_invalid_op+0x16/0x20
Jul 18 09:36:01 Arthur kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Jul 18 09:36:01 Arthur kernel: ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]
Jul 18 09:36:01 Arthur kernel: ? nf_nat_inet_fn+0x60/0x1a8 [nf_nat]
Jul 18 09:36:01 Arthur kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
Jul 18 09:36:01 Arthur kernel: nf_hook_slow+0x3a/0x96
Jul 18 09:36:01 Arthur kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Jul 18 09:36:01 Arthur kernel: NF_HOOK.constprop.0+0x79/0xd9
Jul 18 09:36:01 Arthur kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Jul 18 09:36:01 Arthur kernel: __netif_receive_skb_one_core+0x77/0x9c
Jul 18 09:36:01 Arthur kernel: process_backlog+0x8c/0x116
Jul 18 09:36:01 Arthur kernel: __napi_poll.constprop.0+0x28/0x124
Jul 18 09:36:01 Arthur kernel: net_rx_action+0x159/0x24f
Jul 18 09:36:01 Arthur kernel: __do_softirq+0x126/0x288
Jul 18 09:36:01 Arthur kernel: do_softirq+0x7f/0xab
Jul 18 09:36:01 Arthur kernel: </IRQ>
Jul 18 09:36:01 Arthur kernel: <TASK>
Jul 18 09:36:01 Arthur kernel: __local_bh_enable_ip+0x4c/0x6b
Jul 18 09:36:01 Arthur kernel: netif_rx+0x52/0x5a
Jul 18 09:36:01 Arthur kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Jul 18 09:36:01 Arthur kernel: ? _raw_spin_unlock+0x14/0x29
Jul 18 09:36:01 Arthur kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Jul 18 09:36:01 Arthur kernel: process_one_work+0x1a8/0x295
Jul 18 09:36:01 Arthur kernel: worker_thread+0x18b/0x244
Jul 18 09:36:01 Arthur kernel: ? rescuer_thread+0x281/0x281
Jul 18 09:36:01 Arthur kernel: kthread+0xe4/0xef
Jul 18 09:36:01 Arthur kernel: ? kthread_complete_and_exit+0x1b/0x1b
Jul 18 09:36:01 Arthur kernel: ret_from_fork+0x1f/0x30
Jul 18 09:36:01 Arthur kernel: </TASK>
Jul 18 09:36:01 Arthur kernel: ---[ end trace 0000000000000000 ]---
I have a bunch of docker containers running in a separate VLAN from the main system, so I am using macvlan to support that.
Edited July 19, 2023 by jmshrtn (removing diags)
JorgeB Posted July 18, 2023
10 hours ago, jmshrtn said: so I am using macvlan to support that.
You can change to ipvlan.
tapodufeu Posted July 22, 2023 Author
I have moved all the containers that are exposed to the internet onto the default br0 using macvlan, and it looks like my server no longer has kernel panics. I only have a week of observation so far; I will see after the holidays.
Kernel panics often happened when I used more than one Docker network (host and bridge do not count). Using ipvlan on br0 just does not work in my case. I don't know why.
tapodufeu Posted August 6, 2023 Author
Argh, after 2 weeks, a new kernel panic. I had to reboot my server, and within 2 hours (just using Plex) there was another kernel panic. It really is a nightmare.
Please help. How can I downgrade to 6.9?
tapodufeu Posted August 7, 2023 Author
I have reconfigured all my containers to use only the bridge and host networks, and deleted br0.
After some reading on the internet, I found that the issue may be caused by my network card (embedded on the motherboard), an Intel® I219-V 1Gb Ethernet. I also saw a few posts about the same issue with some Broadcom network chipsets.
I had no issue prior to 6.10. Maybe a kernel update? Does anyone know?
Edited August 9, 2023 by tapodufeu