• [6.8.3] Docker Containers on VLANs with Fixed IPs cause kernel panic


    MrGeeza
    • Minor

    I have been having a problem for many weeks now, and it has taken a very long time to figure out what's going on. I've found that Docker containers that are assigned to specific VLANs and have static IP addresses cause kernel panics. The configuration works for a few hours, but over time the system becomes unresponsive and eventually the entire server hangs, requiring a hard power down.

     

    So far I have only tried this with Plex and Let's Encrypt, after setting up a reverse proxy. You can see my Plex configuration attached. Plex is assigned to VLAN10 with a custom IP address of 192.168.10.11:

     

    691840128_ScreenShot2020-08-10at09_38_34.thumb.png.0e3f40de6bb68e9bfd98e0d0987f21a0.png 

     

    To test my theory, I booted my server normally (NOT in safe mode) and stopped all plugins that have specific VLANs and fixed IP addresses. The server has now been up for over a week with no problems.

     

    My server network configuration

    1 x dual-port NIC

    Running in bonded mode

    2005749304_ScreenShot2020-08-10at09_45_34.thumb.png.cb78dc940c31d2c40a29e9c905514011.png

    51161820_ScreenShot2020-08-10at09_45_44.thumb.png.a770494fb243c2274f640a98ff7d5fc6.png

     

    rex-diagnostics-20200810-0949.zip




    User Feedback


    I've run into this issue after adding a Mellanox x-2 10GbE card. I never saw it with the built-in Intel Corporation Ethernet Connection (7) I219-LM (rev 10). Adding the Mellanox card, a Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0), to the bridge and unplugging the Intel card results in kernel panics after a while. I have 2 VLANs and split my Docker containers between the VLANs and the untagged LAN. Following the panics I've also had cache pool issues: writes to the cache pool fail and the pool gets corrupted, which results in a freeze with no output to the screen and loss of network.


    I am running multiple Docker containers on different VLANs (br0.xx), where the central bridge (br0) is also a bonded interface on a quad-port i350-T (mode 4 / 802.3ad), without any trouble. Uptime has been over a year without a single glitch.

     

    Edit: the only difference I can spot in your config is that I not only specify the IP of each Docker container, but also fully configure each VLAN bridge interface in Unraid with an IP and gateway, e.g. for br0.10: IP 192.168.10.25/24, gw 192.168.10.1

    All network gateways are set up on my main router (linked via a 10G uplink to the switch, where the bonding interface terminates).
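
The addressing described in this comment can be sanity-checked with a short script. This is only a minimal sketch using Python's standard `ipaddress` module: the function name `check_container_ip` is hypothetical, the host interface and gateway values are the br0.10 example above, and 192.168.10.11 is the Plex address from the original report. It checks that the addresses are consistent with each other and does not touch Unraid or Docker.

```python
import ipaddress


def check_container_ip(host_iface, gateway, container_ip):
    """Return a list of problems; an empty list means the addresses are consistent.

    host_iface   -- host VLAN interface in CIDR form, e.g. "192.168.10.25/24"
    gateway      -- gateway for that VLAN, e.g. "192.168.10.1"
    container_ip -- fixed IP given to the container, e.g. "192.168.10.11"
    """
    iface = ipaddress.ip_interface(host_iface)
    net = iface.network
    gw = ipaddress.ip_address(gateway)
    ip = ipaddress.ip_address(container_ip)

    problems = []
    if gw not in net:
        problems.append(f"gateway {gw} is outside {net}")
    if ip not in net:
        problems.append(f"container IP {ip} is outside {net}")
    if ip == iface.ip:
        problems.append(f"container IP {ip} collides with the host interface")
    if ip == gw:
        problems.append(f"container IP {ip} collides with the gateway")
    if ip in (net.network_address, net.broadcast_address):
        problems.append(f"container IP {ip} is a reserved address in {net}")
    return problems


if __name__ == "__main__":
    # Values taken from this thread (assumed to belong to the same VLAN 10).
    for problem in check_container_ip("192.168.10.25/24", "192.168.10.1", "192.168.10.11"):
        print(problem)
```

An empty result only means the addresses do not conflict; it says nothing about whether the kernel warning will occur.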

    Edited by Ford Prefect

    Here is the call stack from my setup. Note that this only happens with one of the ports. I have not changed any settings other than which port br0 is using.

     

    Nov  6 10:14:04 Tower kernel: WARNING: CPU: 0 PID: 13593 at net/netfilter/nf_conntrack_core.c:945 __nf_conntrack_confirm+0xa0/0x69e
    Nov  6 10:14:04 Tower kernel: Modules linked in: xt_nat macvlan iptable_filter xfs dm_crypt dm_mod dax md_mod i915 i2c_algo_bit iosf_mbi drm_kms_helper drm intel_gtt agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops nct6775 hwmon_vid iptable_nat ipt_MASQUERADE nf_nat_ipv4 nf_nat ip_tables wireguard ip6_udp_tunnel udp_tunnel mlx4_en mlx4_core e1000e igb(O) x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd mpt3sas glue_helper wmi_bmof intel_cstate intel_uncore i2c_i801 i2c_core wmi intel_rapl_perf raid_class ahci libahci video scsi_transport_sas pcc_cpufreq backlight thermal acpi_pad button fan [last unloaded: mlx4_core]
    Nov  6 10:14:04 Tower kernel: CPU: 0 PID: 13593 Comm: kworker/0:2 Tainted: G           O      4.19.107-Unraid #1
    Nov  6 10:14:04 Tower kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/WS C246 PRO, BIOS 1201 04/15/2020
    Nov  6 10:14:04 Tower kernel: Workqueue: events macvlan_process_broadcast [macvlan]
    Nov  6 10:14:04 Tower kernel: RIP: 0010:__nf_conntrack_confirm+0xa0/0x69e
    Nov  6 10:14:04 Tower kernel: Code: 04 e8 56 fb ff ff 44 89 f2 44 89 ff 89 c6 41 89 c4 e8 7f f9 ff ff 48 8b 4c 24 08 84 c0 75 af 48 8b 85 80 00 00 00 a8 08 74 26 <0f> 0b 44 89 e6 44 89 ff 45 31 f6 e8 95 f1 ff ff be 00 02 00 00 48
    Nov  6 10:14:04 Tower kernel: RSP: 0018:ffff88884b803d90 EFLAGS: 00010202
    Nov  6 10:14:04 Tower kernel: RAX: 0000000000000188 RBX: ffff8887b4cf7c00 RCX: ffff888716a4c198
    Nov  6 10:14:04 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff81e08e94
    Nov  6 10:14:04 Tower kernel: RBP: ffff888716a4c140 R08: 00000000701ca1cd R09: ffffffff81c8aa80
    Nov  6 10:14:04 Tower kernel: R10: 0000000000000098 R11: ffff8887c109a800 R12: 0000000000006225
    Nov  6 10:14:04 Tower kernel: R13: ffffffff81e91080 R14: 0000000000000000 R15: 00000000000060b7
    Nov  6 10:14:04 Tower kernel: FS:  0000000000000000(0000) GS:ffff88884b800000(0000) knlGS:0000000000000000
    Nov  6 10:14:04 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Nov  6 10:14:04 Tower kernel: CR2: 0000000000d4c270 CR3: 0000000001e0a001 CR4: 00000000003606f0
    Nov  6 10:14:04 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Nov  6 10:14:04 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Nov  6 10:14:04 Tower kernel: Call Trace:
    Nov  6 10:14:04 Tower kernel: <IRQ>
    Nov  6 10:14:04 Tower kernel: ipv4_confirm+0xaf/0xb9
    Nov  6 10:14:04 Tower kernel: nf_hook_slow+0x3a/0x90
    Nov  6 10:14:04 Tower kernel: ip_local_deliver+0xad/0xdc
    Nov  6 10:14:04 Tower kernel: ? ip_sublist_rcv_finish+0x54/0x54
    Nov  6 10:14:04 Tower kernel: ip_rcv+0xa0/0xbe
    Nov  6 10:14:04 Tower kernel: ? ip_rcv_finish_core.isra.0+0x2e1/0x2e1
    Nov  6 10:14:04 Tower kernel: __netif_receive_skb_one_core+0x53/0x6f
    Nov  6 10:14:04 Tower kernel: process_backlog+0x77/0x10e
    Nov  6 10:14:04 Tower kernel: net_rx_action+0x107/0x26c
    Nov  6 10:14:04 Tower kernel: __do_softirq+0xc9/0x1d7
    Nov  6 10:14:04 Tower kernel: do_softirq_own_stack+0x2a/0x40
    Nov  6 10:14:04 Tower kernel: </IRQ>
    Nov  6 10:14:04 Tower kernel: do_softirq+0x4d/0x5a
    Nov  6 10:14:04 Tower kernel: netif_rx_ni+0x1c/0x22
    Nov  6 10:14:04 Tower kernel: macvlan_broadcast+0x111/0x156 [macvlan]
    Nov  6 10:14:04 Tower kernel: macvlan_process_broadcast+0xea/0x128 [macvlan]
    Nov  6 10:14:04 Tower kernel: process_one_work+0x16e/0x24f
    Nov  6 10:14:04 Tower kernel: worker_thread+0x1e2/0x2b8
    Nov  6 10:14:04 Tower kernel: ? rescuer_thread+0x2a7/0x2a7
    Nov  6 10:14:04 Tower kernel: kthread+0x10c/0x114
    Nov  6 10:14:04 Tower kernel: ? kthread_park+0x89/0x89
    Nov  6 10:14:04 Tower kernel: ret_from_fork+0x1f/0x40
    Nov  6 10:14:04 Tower kernel: ---[ end trace 0daf68da82e3639c ]---
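
For comparing traces like the one above across reports, a small parser can pull out the warning location and the frames in the call trace. The following is a rough sketch, not a diagnostic tool: the function name `summarize_trace` and the regular expressions are assumptions written against the syslog format shown in this post, and frames marked with `?` (which the kernel flags as unreliable) are skipped.

```python
import re

# "kernel: WARNING: ... at <file>:<line> <function>+0x..."
WARNING_RE = re.compile(r"kernel:\s+WARNING: .* at (\S+):(\d+) (\w+)\+")

# "kernel: [? ]<function>+0x<off>/0x<size> [<module>]"
FRAME_RE = re.compile(
    r"kernel:\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def summarize_trace(lines):
    """Extract the warning location, reliable frames, and modules from syslog lines."""
    summary = {"warning_at": None, "function": None, "frames": [], "modules": set()}
    in_trace = False
    for line in lines:
        warning = WARNING_RE.search(line)
        if warning:
            summary["warning_at"] = f"{warning.group(1)}:{warning.group(2)}"
            summary["function"] = warning.group(3)
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if "end trace" in line:
            in_trace = False
            continue
        if in_trace:
            frame = FRAME_RE.search(line)
            if frame and not frame.group(1):  # skip "?" (unreliable) frames
                summary["frames"].append(frame.group(2))
                if frame.group(3):
                    summary["modules"].add(frame.group(3))
    return summary


if __name__ == "__main__":
    import sys

    # Usage (assumed): python summarize_trace.py < syslog.txt
    result = summarize_trace(sys.stdin)
    print(result["warning_at"], result["function"], sorted(result["modules"]))
```

Run against the log in this post, this would report the warning in `__nf_conntrack_confirm` and list `macvlan` among the modules on the stack, which makes it easier to tell whether two reports show the same signature.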




