• [unRAID 6.10.0-rc1] - Seemingly random crashes


    danioj
    • Urgent

    I previously mentioned (on the release thread) that I had experienced a few random crashes since upgrading to 6.10.0-rc1. Each crash required a hard reset of the server, making it difficult to capture diagnostics.

     

    I have since enabled syslog mirroring, though, and have been able to capture the following error:

     

    Aug 24 08:58:01 unraid kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000dc000-0x000dffff window]
    Aug 24 08:58:01 unraid kernel: caller _nv000722rm+0x1ad/0x200 [nvidia] mapping multiple BARs
    Aug 24 09:56:19 unraid kernel: ------------[ cut here ]------------
    Aug 24 09:56:19 unraid kernel: WARNING: CPU: 3 PID: 4821 at net/netfilter/nf_conntrack_core.c:1132 __nf_conntrack_confirm+0xa0/0x1eb [nf_conntrack]
    Aug 24 09:56:19 unraid kernel: Modules linked in: nvidia_modeset(PO) nvidia_uvm(PO) veth xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle xt_nat xt_tcpudp ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod nvidia(PO) nct6775 hwmon_vid jc42 ip6table_filter ip6_tables iptable_filter ip_tables x_tables igb x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ast drm_vram_helper drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel drm crypto_simd cryptd rapl ahci ipmi_ssif agpgart intel_cstate mpt3sas syscopyarea intel_uncore sysfillrect sysimgblt i2c_i801 libahci fb_sys_fops input_leds intel_pch_thermal video i2c_algo_bit raid_class i2c_smbus scsi_transport_sas i2c_core led_class backlight thermal button acpi_ipmi fan ipmi_si [last unloaded: igb]
    Aug 24 09:56:19 unraid kernel: CPU: 3 PID: 4821 Comm: kworker/3:1 Tainted: P           O      5.13.8-Unraid #1
    Aug 24 09:56:19 unraid kernel: Hardware name: Supermicro X10SL7-F/X10SL7-F, BIOS 3.2 06/09/2018
    Aug 24 09:56:19 unraid kernel: Workqueue: events macvlan_process_broadcast [macvlan]
    Aug 24 09:56:19 unraid kernel: RIP: 0010:__nf_conntrack_confirm+0xa0/0x1eb [nf_conntrack]
    Aug 24 09:56:19 unraid kernel: Code: e8 7e f6 ff ff 44 89 fa 89 c6 41 89 c4 48 c1 eb 20 89 df 41 89 de e8 92 f4 ff ff 84 c0 75 bb 48 8b 85 80 00 00 00 a8 08 74 18 <0f> 0b 89 df 44 89 e6 31 db e8 c6 ed ff ff e8 09 f3 ff ff e9 22 01
    Aug 24 09:56:19 unraid kernel: RSP: 0018:ffffc9000015cd20 EFLAGS: 00010202
    Aug 24 09:56:19 unraid kernel: RAX: 0000000000000188 RBX: 0000000000008091 RCX: 00000000b4e7b974
    Aug 24 09:56:19 unraid kernel: RDX: 0000000000000000 RSI: 000000000000033c RDI: ffffffffa0264eb0
    Aug 24 09:56:19 unraid kernel: RBP: ffff888390b048c0 R08: 00000000f0e26b20 R09: ffff88818087a6a0
    Aug 24 09:56:19 unraid kernel: R10: ffff88822f778040 R11: 0000000000000000 R12: 000000000000233c
    Aug 24 09:56:19 unraid kernel: R13: ffffffff82168b00 R14: 0000000000008091 R15: 0000000000000000
    Aug 24 09:56:19 unraid kernel: FS:  0000000000000000(0000) GS:ffff8887ffcc0000(0000) knlGS:0000000000000000
    Aug 24 09:56:19 unraid kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Aug 24 09:56:19 unraid kernel: CR2: 00007f9528a8af73 CR3: 000000000200a005 CR4: 00000000001726e0
    Aug 24 09:56:19 unraid kernel: Call Trace:
    Aug 24 09:56:19 unraid kernel: <IRQ>
    Aug 24 09:56:19 unraid kernel: nf_conntrack_confirm+0x2f/0x36 [nf_conntrack]
    Aug 24 09:56:19 unraid kernel: nf_hook_slow+0x3e/0x93
    Aug 24 09:56:19 unraid kernel: ? ip_protocol_deliver_rcu+0x115/0x115
    Aug 24 09:56:19 unraid kernel: NF_HOOK.constprop.0+0x70/0xc8
    Aug 24 09:56:19 unraid kernel: ? ip_protocol_deliver_rcu+0x115/0x115
    Aug 24 09:56:19 unraid kernel: ip_sabotage_in+0x4c/0x59 [br_netfilter]
    Aug 24 09:56:19 unraid kernel: nf_hook_slow+0x3e/0x93
    Aug 24 09:56:19 unraid kernel: ? ip_rcv_finish_core.constprop.0+0x351/0x351
    Aug 24 09:56:19 unraid kernel: NF_HOOK.constprop.0+0x70/0xc8
    Aug 24 09:56:19 unraid kernel: ? ip_rcv_finish_core.constprop.0+0x351/0x351
    Aug 24 09:56:19 unraid kernel: __netif_receive_skb_one_core+0x77/0x98
    Aug 24 09:56:19 unraid kernel: process_backlog+0xab/0x143
    Aug 24 09:56:19 unraid kernel: __napi_poll+0x2a/0x114
    Aug 24 09:56:19 unraid kernel: net_rx_action+0xe8/0x1f2
    Aug 24 09:56:19 unraid kernel: __do_softirq+0xef/0x21b
    Aug 24 09:56:19 unraid kernel: do_softirq+0x50/0x68
    Aug 24 09:56:19 unraid kernel: </IRQ>
    Aug 24 09:56:19 unraid kernel: netif_rx_ni+0x56/0x8b
    Aug 24 09:56:19 unraid kernel: macvlan_broadcast+0x116/0x144 [macvlan]
    Aug 24 09:56:19 unraid kernel: macvlan_process_broadcast+0xc7/0x10b [macvlan]
    Aug 24 09:56:19 unraid kernel: process_one_work+0x196/0x274
    Aug 24 09:56:19 unraid kernel: worker_thread+0x19c/0x240
    Aug 24 09:56:19 unraid kernel: ? rescuer_thread+0x2a2/0x2a2
    Aug 24 09:56:19 unraid kernel: kthread+0xdf/0xe4
    Aug 24 09:56:19 unraid kernel: ? set_kthread_struct+0x32/0x32
    Aug 24 09:56:19 unraid kernel: ret_from_fork+0x22/0x30
    Aug 24 09:56:19 unraid kernel: ---[ end trace 44186f4b6dd2c3e1 ]---

     

    After a reset all is well again, and there appears to be no obvious trigger or regular pattern to the crashes.

     

    EDIT: Set to Urgent as per the priority definition instructions, given it was a server crash.

     

     







    Interestingly, I also noticed this in the log this morning:

     

    Aug 25 10:59:38 unraid kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000dc000-0x000dffff window]
    Aug 25 10:59:38 unraid kernel: caller _nv000722rm+0x1ad/0x200 [nvidia] mapping multiple BARs
    [...the same pair of messages repeats once per second...]
    Aug 25 11:00:03 unraid kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000dc000-0x000dffff window]
    Aug 25 11:00:03 unraid kernel: caller _nv000722rm+0x1ad/0x200 [nvidia] mapping multiple BARs

     

    Only posting this as it appears just before the crash in the post above. Not sure if it is a precursor to a crash.

     


    From the release thread:


    The new ipvlan mode is introduced to combat the crashes some people experience when using macvlan mode. If that is your case, change to ipvlan mode and test. Changing mode does not require reconfiguring anything at the Docker level; internally everything is taken care of.

     

    35 minutes ago, trurl said:

    From the release thread:

     

     

    Thanks @trurl. I think I must have missed that because I had not had macvlan crash issues previously.

     

    I will change to ipvlan mode and report back after a week (or if another crash occurs). 


    Update: I made the change to ipvlan. Things have not worked as expected.

     

    Things seemed to work fine with the change until I tried to watch Plex in the evening. My Plex clients could not connect. Rather than debug, I switched back to macvlan and, hey presto, Plex was working again.

     

    My setup is not as standard as most but not overly unusual.

     

    - Plex uses br0 for a custom IP and runs on my main VLAN

    - My Plex clients sit on my IoT VLAN

    - Firewall rules and mDNS are set to allow clients discovery and access to Plex

    - Plex works fine from my phone which is on my main VLAN

     

    Additional info:

     

    Host access to custom networks is set to yes

    Preserve user defined networks is set to yes

    4 minutes ago, bonienl said:

    Disable host access to custom networks and retest.

     

    Btw ipvlan should have worked in your situation...

     

    I am afraid I need to keep this enabled. This is critical to other aspects of my setup.

    7 minutes ago, danioj said:

     

    I am afraid I need to keep this enabled. This is critical to other aspects of my setup.

     

    OK, I tried it anyway. Disabling host access to custom networks made no difference; Plex was still unavailable. I tested further.

     

    It turns out it is not just Plex that my Google clients (on another VLAN) can't access after switching from macvlan to ipvlan; it is any container running on another network.

     

    So here is my setup, to make it clearer (all my containers have their own IPs, and I use the secondary interface on the server to provide access to the VLANs):

     

    Dockers 1-5 on br1.66 - for VPN routing

    Docker 6 on br1.77 - for main LAN segregation but normal gateway

    Dockers 7-8 on br0 - for main LAN

    Dockers 9-10 on custom network proxynet - for external access

     

    With ipvlan enabled, Dockers 1-5 can access each other. Docker 6 can't access Dockers 1-5. Dockers 7-8 can't access Dockers 1-6 but can access each other. There appears to be no issue with Dockers 9-10.

     

    Enable macvlan, and all is well until it crashes.

     

    It appears that traffic does not cross networks when I enable ipvlan; traffic within the same network is fine.

     

    Not sure what is going on in the background between the two, but something is different.
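
    In case it helps anyone reproduce this outside of the unRAID GUI, the two modes map onto standard Docker network drivers. The commands below are only a sketch - the subnet, gateway and network names are illustrative, not copied from my actual config:

    # ipvlan (L2) network bound to a VLAN sub-interface - roughly what unRAID
    # sets up internally when ipvlan mode is selected
    docker network create -d ipvlan -o parent=br1.66 -o ipvlan_mode=l2 --subnet=192.168.66.0/24 --gateway=192.168.66.1 vlan66-ipvlan

    # the macvlan equivalent, for comparison
    docker network create -d macvlan -o parent=br1.66 --subnet=192.168.66.0/24 --gateway=192.168.66.1 vlan66-macvlan

    The practical difference is that ipvlan containers all share the parent interface's MAC address, while macvlan gives each container its own virtual MAC - which may be why discovery and cross-VLAN behaviour differs between the two.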

    45 minutes ago, ljm42 said:

    @danioj check out the 2nd post here, does it help?

     


    Thanks for the suggestion @ljm42.

     

    Comparing the logs of the two crashes, they look different - certainly the one in the post you linked has many references to netfilter where mine doesn’t - but I guess there is no harm in giving it a go.
     

    It will only take 24-48 hours to test and find out. 


    Ah yes, sometimes once you have a hammer all you see are nails :) But I did notice this in your log snippet, so maybe? 
     
    Aug 24 09:56:19 unraid kernel: ------------[ cut here ]------------
    Aug 24 09:56:19 unraid kernel: WARNING: CPU: 3 PID: 4821 at net/netfilter/nf_conntrack_core.c:1132 __nf_conntrack_confirm+0xa0/0x1eb [nf_conntrack]

     


    Update: I had another hard lock-up overnight. Same issues in the log.

     

    Tried the fix linked to me by @ljm42 above:

     

    sysctl net/netfilter/nf_conntrack_max=131072

     

    Let's see how it goes.
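
    For reference, a quick way to keep an eye on it from the shell is just to read the live values back and watch for the warning returning - plain sysctl and grep, nothing unRAID-specific (the grep pattern is only an example):

    # current table limit and how close the table is to it
    sysctl net.netfilter.nf_conntrack_max
    sysctl net.netfilter.nf_conntrack_count

    # check whether the conntrack warning has reappeared in the syslog
    grep -i nf_conntrack /var/log/syslog | tail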

    On 8/28/2021 at 9:15 PM, danioj said:

    Update: I had another hard lock-up overnight. Same issues in the log.

     

    Tried the fix linked to me by @ljm42 above:

     

    sysctl net/netfilter/nf_conntrack_max=131072

     

    Let's see how it goes.

    I reviewed your logs and you are experiencing call traces due to your networking adapter; however, it appears you are somehow reaching the limit of your NIC. If the fix above doesn't work, splitting across a couple of NICs may, or even upgrading your existing NIC or its firmware. I didn't review the logs enough to find the system's hardware as it's late, so I will look again tomorrow and let you know if I can see anything deeper.

    On 8/26/2021 at 7:14 AM, danioj said:

    My setup is not as standard as most but not overly unusual.

     

    - Plex uses br0 for a custom IP and runs on my main VLAN

    - My Plex clients sit on my IoT VLAN

    - Firewall rules and mDNS are set to allow clients discovery and access to Plex

    - Plex works fine from my phone which is on my main VLAN

    Are the different VLANs set in the same subnet?

    I apply the same subnet to some VLANs but with no inter-VLAN communication; macvlan had a minor issue, and I use ipvlan mode now with host access. BTW, I don't think inter-VLAN communication within the same subnet is good practice. (I use switch ACL rules / port isolation to control access.)

     

     


    OK - awaiting your reply on testing. I am 100% seeing a netfilter call trace as the cause. Your experience with ipvlan is expected, and you can overcome it by properly building your Docker network: creating custom networks and some other config. Having "host access" enabled will cause issues and is generally not advised; it would take you some time to correct your config so as not to need it. But that's a different thread.

     

     

     

    22 hours ago, Vr2Io said:

    Are the different VLANs set in the same subnet?

    I apply the same subnet to some VLANs but with no inter-VLAN communication; macvlan had a minor issue, and I use ipvlan mode now with host access. BTW, I don't think inter-VLAN communication within the same subnet is good practice. (I use switch ACL rules / port isolation to control access.)

     

     

     

    23 hours ago, fmp4m said:

    I reviewed your logs and you are experiencing call traces due to your networking adapter; however, it appears you are somehow reaching the limit of your NIC. If the fix above doesn't work, splitting across a couple of NICs may, or even upgrading your existing NIC or its firmware. I didn't review the logs enough to find the system's hardware as it's late, so I will look again tomorrow and let you know if I can see anything deeper.

     

    Thanks for your input, guys. Rather than try to pick apart your advice, I thought I would just share my network config and Docker config.

     

    In short, I am using two interfaces. I am not sure how I could have hit the limit of my NIC, but I am also sure that I am not doing anything that wild either.

     

    eth0 is my primary unRAID interface, and eth1 is used to allow Dockers and VMs access to the other VLANs on my network.

     

    I have minimal inter-VLAN routing established in pfSense to allow for things like administering Dockers on other networks and letting some of them interact with one another where needed.

     

    Docker Network.png

    Interface eth0.png

    Interface eth1.png

    On 8/28/2021 at 8:35 AM, ljm42 said:

    Ah yes, sometimes once you have a hammer all you see are nails :) But I did notice this in your log snippet, so maybe? 
     
    Aug 24 09:56:19 unraid kernel: ------------[ cut here ]------------
    Aug 24 09:56:19 unraid kernel: WARNING: CPU: 3 PID: 4821 at net/netfilter/nf_conntrack_core.c:1132 __nf_conntrack_confirm+0xa0/0x1eb [nf_conntrack]

     

     

    Update since I applied this "Fix" - no call traces in the log or hard crashes yet.

     

    If this works I will probably need to add the command to a user script to execute on array start.
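
    Something along these lines should do it - a minimal sketch for a User Scripts job set to run at array start (the value simply mirrors the fix above, and it assumes, as I understand it, that sysctl changes on unRAID live in RAM and do not persist across reboots):

    #!/bin/bash
    # re-apply the conntrack table limit after every boot / array start
    sysctl -w net.netfilter.nf_conntrack_max=131072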

    14 hours ago, danioj said:

     

    Update since I applied this "Fix" - no call traces in the log or hard crashes yet.

     

    If this works I will probably need to add the command to a user script to execute on array start.

     

    You should not need to. It is a once-and-done command UNLESS something in the system sets the conntrack value too high, which shouldn't happen and hasn't since kernel 5.12.2: https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.12.2

     

    Since the netfilter conntrack fix is working as of now, the NIC limitation was a false flag. I don't know what NICs you are using, but some consumer-level ones do not like multiple MAC addresses and fail with as few as two. Creating a virtual VLAN or a br0 IP etc. creates new virtual MAC addresses, and the card can only handle so many. Example of an enterprise card: "Many Mellanox adapters are capable of exposing up to 127 virtual instances." Consumer card: "Realtek 1Gb NICs are often limited to 6-12 virtual instances."

     

    Looking at your config, you have (I believe) 7 instances on one card and 1 on the other, handled by an integrated Intel® i210AT dual-port 1Gb controller, which is limited to 5 vectors (instances) per port, so it is technically over the limit - HOWEVER, you're not assigning IPs to the VLANs, and I believe this stops them from being true virtual instances, so theoretically it should be fine.
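
    If you want to sanity-check the instance count yourself, the host-side interfaces and the MACs they present can be listed with stock iproute2 (this won't see MACs that only exist inside container network namespaces, but it covers the bridges and VLAN sub-interfaces):

    # one line per interface: name, state, MAC address
    ip -brief link show

    # rough count of distinct MACs exposed on the host side
    ip -brief link show | awk '{print $3}' | sort -u | wc -l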

     

    Give it some more time, but please advise if you experience any call traces and, if so, post diagnostics with the syslog. I don't expect you to have any, though.

    6 hours ago, fmp4m said:

     

    You should not need to. It is a once-and-done command UNLESS something in the system sets the conntrack value too high, which shouldn't happen and hasn't since kernel 5.12.2: https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.12.2

     

    Is this right? My understanding is that any command I issue on unRAID would not survive a reboot, as the OS is loaded into RAM each time from a baseline config.

     

    Given that I am setting the conntrack max to 131072 and this seemingly "fixes" the call traces, would I have to set this each time I reboot?

     

    My NICs are:

     

    2 x Intel Corporation I210 Gigabit Network Connection (rev 03), onboard a Supermicro X10SL7-F motherboard.
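
    For anyone checking their own hardware, the model and driver can be read straight from the console - assuming lspci and ethtool are available, which they are on a stock unRAID install as far as I know:

    # NIC models as seen on the PCI bus
    lspci | grep -i ethernet

    # driver and firmware version for a given interface
    ethtool -i eth0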

     

    Diagnostics attached.

    unraid-diagnostics-20210901-0925.zip

    8 hours ago, fmp4m said:

    I can state that I set it once; it survived many reboots and now an upgrade to 6.10-rc2d.


    I have just tested this. The command does not survive reboots. 

    2 hours ago, danioj said:


    I have just tested this. The command does not survive reboots. 

     

    Without re-issuing the command after a reboot, have you had the call trace? The reason I ask is that the number will change, as designed, so verifying it with cat /proc/sys/net/netfilter/nf_conntrack_max or similar is not valid.

    12 hours ago, fmp4m said:

     

    Without re-issuing the command after a reboot, have you had the call trace? The reason I ask is that the number will change, as designed, so verifying it with cat /proc/sys/net/netfilter/nf_conntrack_max or similar is not valid.

     

    I have to openly admit that I do not have the technical insights as to how it works. What I can share is what I experienced.

     

    The server was stable ever since I issued the command initially. I had to shut off the server as I had an electrician in to install some smart light switches and we had to cut the power.

     

    When I turned back on - well, 5 hours after I turned back on - the server crashed.

     

    I hard reset, went back in and issued the same command as above, and it's been stable again ever since. I concluded from that (and it will be interesting to see if I get a crash again over the coming week - noting they came daily previously) that the command has to keep being issued.

    18 hours ago, danioj said:

     

    I have to openly admit that I do not have the technical insights as to how it works. What I can share is what I experienced.

     

    The server was stable ever since I issued the command initially. I had to shut off the server as I had an electrician in to install some smart light switches and we had to cut the power.

     

    When I turned back on - well, 5 hours after I turned back on - the server crashed.

     

    I hard reset, went back in and issued the same command as above, and it's been stable again ever since. I concluded from that (and it will be interesting to see if I get a crash again over the coming week - noting they came daily previously) that the command has to keep being issued.

     

    Thank you for detailing this.  I am looking for extra information on how this solves the issue to begin with and how it reacts for others.  Hopefully someone better than I can chime in and expand.


    Just checking in.

     

    Uptime is now 3 days, 13 hours and 9 minutes since I last issued the netfilter fix command.

     

    Not one call trace in the log or hard lock-up / crash since.





