• [6.12] Networking not working correctly after upgrading to 6.12 and changing docker to ipvlan


    fireplex
    • Solved

    Hi,

     

    Just upgraded to 6.12 from 6.11.5 and noticed I was getting repeated kernel macvlan errors in syslog, so I changed the docker network type to ipvlan as recommended. Note that apart from these kernel errors everything seemed OK, and networking was OK on 6.12 prior to the docker change.

     

    Changing docker to ipvlan results in unRAID being unable to check dockers or plugins for updates; under status it reports "not available". I also seem to lose remote access to my swag docker, and the Plex docker seems unable to resolve its IP.

     

    Diagnostics attached, any ideas please?

     

    Thanks!

    tower-diagnostics-20230618-2206.zip







    OK, so think I figured this out.

     

    My unRAID networking was set to IPv4+IPv6 due to SMB related networking issues I had in the past.

     

    Changing the network type to IPv4 seems to fix this issue.

     

    I tried this due to a comment I saw about IPvlan use "Finally, auto-configured EUI-64 IPv6 type addresses are based on MAC type addresses. Therefore, all the Virtual Machines or containers that share the same parent interface, will auto generate the same IPv6 address. We advise the user to make sure that all the Virtual Machines or containers use static IPv6 addresses or IPv6 privacy addresses with SLAAC disabled."
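For anyone wondering why a shared MAC leads to identical auto-configured addresses, here is a minimal Python sketch of the modified EUI-64 derivation described in the quote. The function name and the example MAC are made up for illustration; this is not taken from Unraid or Docker code.

```python
import ipaddress

def eui64_link_local(mac: str) -> ipaddress.IPv6Address:
    """Derive the fe80::/64 link-local address that modified EUI-64 yields for a MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    octets[0] ^= 0x02  # flip the universal/local bit of the first octet
    # Insert ff:fe between the two halves of the MAC to form the 64-bit interface ID.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return ipaddress.IPv6Address(b"\xfe\x80" + b"\x00" * 6 + interface_id)

# Hypothetical MAC of the parent interface; every ipvlan child shares it.
parent_mac = "00:1a:2b:3c:4d:5e"
container_a = eui64_link_local(parent_mac)
container_b = eui64_link_local(parent_mac)
print(container_a, container_a == container_b)
```

Because the address depends only on the MAC, two containers on the same parent interface end up with the same result, which is why the quoted advice suggests static or privacy addresses.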

     

    Whether the above is root cause I have no idea but IPv4 is working so far with IPvlan.

     

    Whether my SMB related network issues now return I will have to see....

    Link to comment
    14 minutes ago, dwells said:

    Still having same issue and I was already set to ipv4

    May be a slightly different issue from mine then, worthwhile starting your own thread with diagnostics

    Edited by fireplex
    Link to comment

    switching docker back to macvlan seems to keep stuff running for me, at least for now.

    Edited by dwells
    Link to comment

    image.png

     

    You can resolve bbc.co.uk; please try pinging the router at 192.168.1.254.

     

    If pinging bbc.co.uk still fails, try rebooting the router.

    Or try disabling docker "host access", because the combination of the "shim" network, your router and 6.12 may have an issue.

    Link to comment

    Just done another reboot on unRAID, everything worked fine for a period of time then after approx. 40 minutes unRAID seems to lose external access.

     

    image.png.e26a35f093131d627f79ff8e19ab2a0c.png

    Link to comment

    I found the fix for the issue of losing the external connection after about 40 minutes on 6.12.0:

     

    I set "Host access to custom networks" to disabled and since then the network has been fine.

     

    I will install 6.12.1 anyway

    Link to comment

    My main server is configured with "ipvlan" and "host access enabled"

    I have no issues reaching the Internet, but it appears certain home routers do not properly support the network functionality used by ipvlan and host access.

     

    One solution to try: instead of using DHCP, set a static IP address on the server. This leaves the home router out of the picture for that IP address.

     

     

    image.png

     

    image.png

    Link to comment

    Hi, yes my unRAID server has a static IP address already.

     

    It would be nice to know which "features" the home router specifically needs in order to support ipvlan. I've had a dig around and can't find any specific feature/support that it needs.

     

    I am running OpenWRT on my router so it's pretty flexible.

    Link to comment

    ipvlan uses a shared MAC address for multiple IP addresses.

    This means the router must accept local IP addresses which may have the same MAC address.

     

    Host access makes use of so-called "more specific routing"; this means the router must be able to handle multiple routes to the same destination.
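As a rough illustration of "more specific routing", the sketch below picks the route with the longest matching prefix, which is how overlapping routes to the same destination are normally resolved. The route table is invented for the example (only the interface names br0 and shim-br0 come from this thread), so treat it as an assumption rather than what Unraid actually installs.

```python
import ipaddress

def pick_route(destination, routes):
    """Return the (network, device) pair with the longest prefix containing destination."""
    dest = ipaddress.ip_address(destination)
    candidates = [(ipaddress.ip_network(prefix), dev) for prefix, dev in routes]
    matching = [(net, dev) for net, dev in candidates if dest in net]
    if not matching:
        return None
    # The most specific route is the one with the largest prefix length.
    return max(matching, key=lambda item: item[0].prefixlen)

# Hypothetical table: one /24 on br0 plus two more specific /25 routes on the shim.
routes = [
    ("0.0.0.0/0", "br0"),
    ("192.168.1.0/24", "br0"),
    ("192.168.1.0/25", "shim-br0"),
    ("192.168.1.128/25", "shim-br0"),
]

for target in ("192.168.1.200", "8.8.8.8"):
    network, device = pick_route(target, routes)
    print(target, "->", network, "via", device)
```

Both the /24 and a /25 cover 192.168.1.200, and the /25 wins because it is more specific; a device that cannot cope with such overlapping routes may behave unpredictably.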

     

    Link to comment

    Hmmm, that's weird; as I said in the other post, I experienced the same error with the same resolution. My router is an Asus with Asuswrt-Merlin fully updated, and my two 6.12 Unraid servers are using static IP addresses; the other one, on 6.11.5, is working just fine. In my case I don't use the ipvlan feature; none of my dockers receive exclusive IP addresses. Over the weekend, I will try to replicate the issue and provide diagnostics.

    Edited by Iker
    Link to comment

    I'm not using docker exclusive IPs as far as I know?

     

    image.png.7ed4c4e57b5ccf0c8004325dae8ef2d0.png

     

    root@Tower:~# docker network list
    NETWORK ID     NAME       DRIVER    SCOPE
    76bd4672c776   br0        ipvlan    local
    f16d1d53f08a   bridge     bridge    local
    9631fda6a856   host       host      local
    dbc961e617d5   none       null      local
    51e76c81f980   proxynet   bridge    local
    08dbbc14524f   wg0        bridge    local
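If it helps anyone check the same thing on their own box, here is a small Python sketch that parses the table printed by `docker network list` and picks out the networks using the ipvlan or macvlan driver. The helper name is made up, and the sample text is simply the output pasted above.

```python
import re

def parse_network_list(output):
    """Turn the column-aligned table into a list of dicts keyed by the header names."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    # Columns are separated by runs of two or more spaces ("NETWORK ID" has one space).
    header = re.split(r"\s{2,}", lines[0])
    return [dict(zip(header, re.split(r"\s{2,}", line))) for line in lines[1:]]

sample = """\
NETWORK ID     NAME       DRIVER    SCOPE
76bd4672c776   br0        ipvlan    local
f16d1d53f08a   bridge     bridge    local
9631fda6a856   host       host      local
dbc961e617d5   none       null      local
51e76c81f980   proxynet   bridge    local
08dbbc14524f   wg0        bridge    local
"""

networks = parse_network_list(sample)
custom = [n["NAME"] for n in networks if n["DRIVER"] in ("ipvlan", "macvlan")]
print(custom)
```

With the output above, only br0 is reported, which matches the single ipvlan network in the listing.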

     

    Link to comment

    I am having the same issue with 6.12.1. I was using macvlan but changed it to ipvlan, since I had kernel errors and it was recommended to change docker to use ipvlan.

     

    Server has a static IPv4 address and IPv6 is disabled. After a reboot everything works well for a while, then the server can't access the network (pings from a console to the internal network fail), although routing looks valid. The server is accessible from the network.

     

    Diagnostics attached

     

    tower-diagnostics-20230626-0049.zip

    Link to comment
    16 minutes ago, thecode said:

    I am having the same issue with 6.12.1. I was using macvlan but changed it to ipvlan, since I had kernel errors and it was recommended to change docker to use ipvlan.

     

    Server has a static IPv4 address and IPv6 is disabled. After a reboot everything works well for a while, then the server can't access the network (pings from a console to the internal network fail), although routing looks valid. The server is accessible from the network.

     

    Diagnostics attached

     

    tower-diagnostics-20230626-0049.zip


    Update: Just for testing I stopped the array to change the docker network type to macvlan. Stopping the array (which also stops docker services) is enough to get network access back without a reboot.

    With macvlan the kernel errors are back:

    Jun 25 14:59:06 Tower kernel: ------------[ cut here ]------------
    Jun 25 14:59:06 Tower kernel: WARNING: CPU: 15 PID: 0 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
    Jun 25 14:59:06 Tower kernel: Modules linked in: vhost_net tun vhost tap kvm_amd ccp kvm macvlan md_mod xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb xt_nat xt_tcpudp veth ipvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag ipmi_devintf nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls igb wmi_bmof amd64_edac edac_mce_amd edac_core ast drm_vram_helper drm_ttm_helper ttm drm_kms_helper drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ipmi_ssif nvme sha512_ssse3 backlight aesni_intel crypto_simd cryptd rapl agpgart k10temp i2c_piix4 nvme_core i2c_algo_bit syscopyarea joydev i2c_core input_leds ahci
    Jun 25 14:59:06 Tower kernel: sysfillrect sysimgblt led_class fb_sys_fops libahci acpi_ipmi wmi ipmi_si button acpi_cpufreq unix [last unloaded: md_mod]
    Jun 25 14:59:06 Tower kernel: CPU: 15 PID: 0 Comm: swapper/15 Tainted: P           O       6.1.34-Unraid #1
    Jun 25 14:59:06 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470D4U, BIOS P4.20 04/14/2021
    Jun 25 14:59:06 Tower kernel: RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
    Jun 25 14:59:06 Tower kernel: Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01
    Jun 25 14:59:06 Tower kernel: RSP: 0018:ffffc900004ec838 EFLAGS: 00010202
    Jun 25 14:59:06 Tower kernel: RAX: 0000000000000001 RBX: ffff888151b68400 RCX: 1fd18afb63311d0b
    Jun 25 14:59:06 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff888151b68400
    Jun 25 14:59:06 Tower kernel: RBP: 0000000000000001 R08: e8344bd8ece47fd3 R09: 3d43667ad226f28c
    Jun 25 14:59:06 Tower kernel: R10: 00afe721eab120e0 R11: ffffc900004ec800 R12: ffffffff82a11440
    Jun 25 14:59:06 Tower kernel: R13: 000000000000b05e R14: ffff88817f02cf00 R15: 0000000000000000
    Jun 25 14:59:06 Tower kernel: FS:  0000000000000000(0000) GS:ffff889fbebc0000(0000) knlGS:0000000000000000
    Jun 25 14:59:06 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jun 25 14:59:06 Tower kernel: CR2: 0000000000537d30 CR3: 0000000151966000 CR4: 0000000000350ee0
    Jun 25 14:59:06 Tower kernel: Call Trace:
    Jun 25 14:59:06 Tower kernel: <IRQ>
    Jun 25 14:59:06 Tower kernel: ? __warn+0xab/0x122
    Jun 25 14:59:06 Tower kernel: ? report_bug+0x109/0x17e
    Jun 25 14:59:06 Tower kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
    Jun 25 14:59:06 Tower kernel: ? handle_bug+0x41/0x6f
    Jun 25 14:59:06 Tower kernel: ? exc_invalid_op+0x13/0x60
    Jun 25 14:59:06 Tower kernel: ? asm_exc_invalid_op+0x16/0x20
    Jun 25 14:59:06 Tower kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
    Jun 25 14:59:06 Tower kernel: ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]
    Jun 25 14:59:06 Tower kernel: ? nf_nat_inet_fn+0xc0/0x1a8 [nf_nat]
    Jun 25 14:59:06 Tower kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
    Jun 25 14:59:06 Tower kernel: nf_hook_slow+0x3d/0x96
    Jun 25 14:59:06 Tower kernel: ? ip_protocol_deliver_rcu+0x164/0x164
    Jun 25 14:59:06 Tower kernel: NF_HOOK.constprop.0+0x79/0xd9
    Jun 25 14:59:06 Tower kernel: ? ip_protocol_deliver_rcu+0x164/0x164
    Jun 25 14:59:06 Tower kernel: ip_sabotage_in+0x52/0x60 [br_netfilter]
    Jun 25 14:59:06 Tower kernel: nf_hook_slow+0x3d/0x96
    Jun 25 14:59:06 Tower kernel: ? ip_rcv_finish_core.constprop.0+0x3e8/0x3e8
    Jun 25 14:59:06 Tower kernel: NF_HOOK.constprop.0+0x79/0xd9
    Jun 25 14:59:06 Tower kernel: ? ip_rcv_finish_core.constprop.0+0x3e8/0x3e8
    Jun 25 14:59:06 Tower kernel: __netif_receive_skb_one_core+0x77/0x9c
    Jun 25 14:59:06 Tower kernel: netif_receive_skb+0xbf/0x127
    Jun 25 14:59:06 Tower kernel: br_handle_frame_finish+0x438/0x472 [bridge]
    Jun 25 14:59:06 Tower kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
    Jun 25 14:59:06 Tower kernel: br_nf_hook_thresh+0xe5/0x109 [br_netfilter]
    Jun 25 14:59:06 Tower kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
    Jun 25 14:59:06 Tower kernel: br_nf_pre_routing_finish+0x2c1/0x2ec [br_netfilter]
    Jun 25 14:59:06 Tower kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
    Jun 25 14:59:06 Tower kernel: ? NF_HOOK.isra.0+0xe4/0x140 [br_netfilter]
    Jun 25 14:59:06 Tower kernel: ? br_nf_hook_thresh+0x109/0x109 [br_netfilter]
    Jun 25 14:59:06 Tower kernel: br_nf_pre_routing+0x236/0x24a [br_netfilter]
    Jun 25 14:59:06 Tower kernel: ? br_nf_hook_thresh+0x109/0x109 [br_netfilter]
    Jun 25 14:59:06 Tower kernel: br_handle_frame+0x27a/0x2e0 [bridge]
    Jun 25 14:59:06 Tower kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
    Jun 25 14:59:06 Tower kernel: __netif_receive_skb_core.constprop.0+0x4fd/0x6e9
    Jun 25 14:59:06 Tower kernel: __netif_receive_skb_list_core+0x8a/0x11e
    Jun 25 14:59:06 Tower kernel: netif_receive_skb_list_internal+0x1d2/0x20b
    Jun 25 14:59:06 Tower kernel: gro_normal_list+0x1d/0x3f
    Jun 25 14:59:06 Tower kernel: napi_complete_done+0x7b/0x11a
    Jun 25 14:59:06 Tower kernel: igb_poll+0xd88/0xf8e [igb]
    Jun 25 14:59:06 Tower kernel: __napi_poll.constprop.0+0x2b/0x124
    Jun 25 14:59:06 Tower kernel: net_rx_action+0x159/0x24f
    Jun 25 14:59:06 Tower kernel: __do_softirq+0x129/0x288
    Jun 25 14:59:06 Tower kernel: __irq_exit_rcu+0x5e/0xb8
    Jun 25 14:59:06 Tower kernel: common_interrupt+0x9b/0xc1
    Jun 25 14:59:06 Tower kernel: </IRQ>
    Jun 25 14:59:06 Tower kernel: <TASK>
    Jun 25 14:59:06 Tower kernel: asm_common_interrupt+0x22/0x40
    Jun 25 14:59:06 Tower kernel: RIP: 0010:cpuidle_enter_state+0x11d/0x202
    Jun 25 14:59:06 Tower kernel: Code: 17 39 a0 ff 45 84 ff 74 1b 9c 58 0f 1f 40 00 0f ba e0 09 73 08 0f 0b fa 0f 1f 44 00 00 31 ff e8 39 f8 a4 ff fb 0f 1f 44 00 00 <45> 85 e4 0f 88 ba 00 00 00 48 8b 04 24 49 63 cc 48 6b d1 68 49 29
    Jun 25 14:59:06 Tower kernel: RSP: 0018:ffffc900001cfe98 EFLAGS: 00000246
    Jun 25 14:59:06 Tower kernel: RAX: ffff889fbebc0000 RBX: ffff8881086d4c00 RCX: 0000000000000000
    Jun 25 14:59:06 Tower kernel: RDX: 00004f3fbe4d29c7 RSI: ffffffff8209093c RDI: ffffffff82090e45
    Jun 25 14:59:06 Tower kernel: RBP: 0000000000000002 R08: 0000000000000002 R09: 0000000000000002
    Jun 25 14:59:06 Tower kernel: R10: 0000000000000020 R11: 000000000000afc8 R12: 0000000000000002
    Jun 25 14:59:06 Tower kernel: R13: ffffffff823235a0 R14: 00004f3fbe4d29c7 R15: 0000000000000000
    Jun 25 14:59:06 Tower kernel: ? cpuidle_enter_state+0xf7/0x202
    Jun 25 14:59:06 Tower kernel: cpuidle_enter+0x2a/0x38
    Jun 25 14:59:06 Tower kernel: do_idle+0x18d/0x1fb
    Jun 25 14:59:06 Tower kernel: cpu_startup_entry+0x1d/0x1f
    Jun 25 14:59:06 Tower kernel: start_secondary+0xeb/0xeb
    Jun 25 14:59:06 Tower kernel: secondary_startup_64_no_verify+0xce/0xdb
    Jun 25 14:59:06 Tower kernel: </TASK>
    Jun 25 14:59:06 Tower kernel: ---[ end trace 0000000000000000 ]---
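To see how often a trace like the one above recurs, a small Python sketch such as the following can tally the kernel WARNING headers in a syslog dump. The regular expression is based only on the header line shown above, so it is an assumption that other warnings follow the same layout.

```python
import re
from collections import Counter

# Matches e.g. "kernel: WARNING: CPU: 15 PID: 0 at file.c:1210 symbol+0xa4/0x2b0"
WARNING_RE = re.compile(
    r"kernel: WARNING: CPU: \d+ PID: \d+ at (?P<source>\S+):(?P<line>\d+) "
    r"(?P<symbol>[^\s+]+)\+"
)

def count_kernel_warnings(syslog_text):
    """Count warnings grouped by (source file, line number, function symbol)."""
    counts = Counter()
    for match in WARNING_RE.finditer(syslog_text):
        key = (match["source"], int(match["line"]), match["symbol"])
        counts[key] += 1
    return counts

sample = (
    "Jun 25 14:59:06 Tower kernel: ------------[ cut here ]------------\n"
    "Jun 25 14:59:06 Tower kernel: WARNING: CPU: 15 PID: 0 at "
    "net/netfilter/nf_conntrack_core.c:1210 "
    "__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]\n"
)

for (source, line, symbol), n in count_kernel_warnings(sample).items():
    print(f"{n}x {symbol} at {source}:{line}")
```

Grouping by source location makes it easier to tell whether a change of docker network type removed the warning or merely made it less frequent.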

     

    Edited by thecode
    Link to comment

    Something to add, as I'm having similar problems - I noticed that even though my server and docker config are both set to "IPv4 only" I had some IPv6 routes in my route table (settings --> network settings --> bottom of the page)

    Turns out one of the dockers I recently installed is creating an IPv6 interface regardless of setting. 

     

    Annoyingly, it's actually rather difficult to forcibly remove. I was going to make sure it was still gone just now and it's back, so yea, worth double checking IPv6 is ACTUALLY off.
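One way to double check that IPv6 is really off is to look at which interfaces currently hold an IPv6 address. The Python sketch below reads the Linux file /proc/net/if_inet6 for that purpose; the helper names are made up, and it assumes the usual six-column layout of that file (address, interface index, prefix length, scope, flags, name, with the numbers in hex).

```python
import ipaddress
from pathlib import Path

def parse_if_inet6(text):
    """Parse /proc/net/if_inet6 content into (interface, 'address/prefix') tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        raw_addr, _ifindex, prefix_hex, _scope, _flags, ifname = fields
        addr = ipaddress.IPv6Address(bytes.fromhex(raw_addr))
        entries.append((ifname, f"{addr}/{int(prefix_hex, 16)}"))
    return entries

def ipv6_interfaces(path="/proc/net/if_inet6"):
    """Return the sorted names of non-loopback interfaces that have an IPv6 address."""
    proc_file = Path(path)
    if not proc_file.exists():
        return []  # file is absent when the kernel has no IPv6 support loaded
    entries = parse_if_inet6(proc_file.read_text())
    return sorted({name for name, _ in entries if name != "lo"})

if __name__ == "__main__":
    print(ipv6_interfaces())
```

An empty list suggests nothing besides loopback has an IPv6 address; any name that shows up points at the interface (or container) that brought IPv6 back.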

    Link to comment

    Please retest in 6.12.3-rc3, it includes several changes to networking that should help:

    https://forums.unraid.net/bug-reports/prereleases/unraid-os-version-6123-rc3-available-r2572/

    If there are still issues please restate the problem as it exists in 6.12.3-rc3 and provide fresh diagnostics. Thanks!

     

     

    Keep in mind that it is very difficult to resolve multiple problems in one thread/topic. Also, there is nothing actionable in a post that says "Same" without providing diagnostics. (These are just general statements, not calling anyone out specifically)

    Link to comment

    Looks like your router gets confused by the shim-br0 interface.

    Test again with host access to custom networks disabled.

     

    Link to comment

    Did you modify your "network.cfg" file manually?

    It is missing the last entry, which should be

    SYSNICS="1"
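A quick way to check for the missing entry is to parse the file as simple KEY="value" lines, as in the Python sketch below. The sample content and helper names are hypothetical, and the location /boot/config/network.cfg mentioned in the comment is an assumption about where the file lives.

```python
def parse_cfg(text):
    """Parse shell-style KEY="value" lines into a dict, ignoring comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings

def missing_keys(text, required=("SYSNICS",)):
    """Return the required keys that do not appear in the config text."""
    settings = parse_cfg(text)
    return [key for key in required if key not in settings]

# Hypothetical file content; on a real server one would read the file instead,
# e.g. open("/boot/config/network.cfg").read() (path is an assumption).
sample_cfg = (
    "# Generated settings:\n"
    'BONDING[0]="yes"\n'
    'USE_DHCP[0]="no"\n'
    'IPADDR[0]="192.168.1.10"\n'
)

print(missing_keys(sample_cfg))
print(missing_keys(sample_cfg + 'SYSNICS="1"\n'))
```

The first call reports SYSNICS as missing and the second returns an empty list, which is the state the file should be in according to the post above.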

     

    Tip: since you are using a single interface, you can disable bonding.

    Link to comment

    One other thing.

    Do you change interface settings using the "Tips and Tweaks" plugin?

    If yes, disable this (or temporarily uninstall the plugin) and retest.

     

    Link to comment
    53 minutes ago, bonienl said:

    Looks like your router gets confused by the shim-br0 interface.

    Test again with host access to custom networks disabled.

     

    Disabling host access to custom networks does solve the issue but creates another issue for which I haven't found a solution yet (I have 30 dockers on a custom network and 2 on br0).

     

    37 minutes ago, bonienl said:

    Did you modify your "network.cfg" file manually?

    It is missing the last entry, which should be

    SYSNICS="1"

    No, I did not modify it. This is interesting, since I have 3 servers and all of them are missing it. What is the meaning of this parameter?


     

    40 minutes ago, bonienl said:

    Tip: since you are using a single interface, you can disable bonding.

    The 2nd interface was used to make a direct connection between 2 unraid servers. My setup has two servers which act as Active/Passive so if one falls I can switch to the other one. Due to the network issues I physically disconnected the link between them and let the passive server take control since it doesn't suffer from the macvlan issue.
    I will try to disable bonding.


     

    22 minutes ago, bonienl said:

    One other thing.

    Do you change interface settings using the "Tips and Tweaks" plugin?

    If yes, disable this (or temporarily uninstall the plugin) and retest.

     

    That was my first guess when starting to debug this issue. To isolate it I uninstalled the plugin, confirmed that this had no effect, and installed it again, changing only the CPU governor to powersave. Looking at the config file, everything else there is set to "default".

     

    Link to comment




