• [6.9.2-6.10-rc1] netfilter causing call traces


    DieFalse
    • Minor

    I have been tracking a recurring call trace problem for many days, and it seems the built-in Nvidia kernel drivers are causing it. Any insight to alleviate this would be great. I am not sure whether this is a "bug" or a "support issue", so I wanted to start here and be moved if needed.

     

    System Info

    Unraid Version: 6.9.2 and 6.10-rc1

    Kernel: 5.10.28-Unraid

    Compile Date: Wed Apr 7 08:23:18 PDT 2021

     

    Nvidia Info:

    Nvidia Driver Version: 470.63.01 (latest stable)

    Installed GPU(s): 0: Quadro P1000 (43:00.0)

     

    Aug 20 07:10:58 GSA kernel: ------------[ cut here ]------------
    Aug 20 07:10:58 GSA kernel: WARNING: CPU: 15 PID: 0 at net/netfilter/nf_nat_core.c:614 nf_nat_setup_info+0x6c/0x6aa [nf_nat]
    Aug 20 07:10:58 GSA kernel: Modules linked in: nvidia_uvm(PO) xt_mark xt_comment xt_nat veth nfsv3 nfs nfs_ssc xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle nf_tables vhost_net tun vhost vhost_iotlb tap macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs nfsd lockd grace sunrpc md_mod nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops nvidia(PO) drm backlight agpgart ip6table_filter ip6_tables iptable_filter ip_tables x_tables mlx4_en mlx4_core tg3 sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ipmi_ssif rapl intel_cstate i2c_core intel_uncore mpt3sas input_leds led_class raid_class scsi_transport_sas megaraid_sas wmi acpi_power_meter ipmi_si button [last unloaded: mlx4_core]
    Aug 20 07:10:58 GSA kernel: CPU: 15 PID: 0 Comm: swapper/15 Tainted: P        W  O      5.10.28-Unraid #1
    Aug 20 07:10:58 GSA kernel: Hardware name: Dell Inc. PowerEdge R720xd/0JP31P, BIOS 2.9.0 12/06/2019
    Aug 20 07:10:58 GSA kernel: RIP: 0010:nf_nat_setup_info+0x6c/0x6aa [nf_nat]
    Aug 20 07:10:58 GSA kernel: Code: 89 fb 49 89 f6 41 89 d4 76 02 0f 0b 48 8b 93 80 00 00 00 89 d0 25 00 01 00 00 45 85 e4 75 07 89 d0 25 80 00 00 00 85 c0 74 07 <0f> 0b e9 77 05 00 00 48 8b 83 90 00 00 00 4c 8d 6c 24 20 48 8d 73
    Aug 20 07:10:58 GSA kernel: RSP: 0018:ffffc90000494810 EFLAGS: 00010202
    Aug 20 07:10:58 GSA kernel: RAX: 0000000000000080 RBX: ffff88830f355b80 RCX: ffff88821645e500
    Aug 20 07:10:58 GSA kernel: RDX: 0000000000000180 RSI: ffffc900004948ec RDI: ffff88830f355b80
    Aug 20 07:10:58 GSA kernel: RBP: ffffc900004948d8 R08: 000000007313a8c0 R09: 0000000000000000
    Aug 20 07:10:58 GSA kernel: R10: 0000000000000158 R11: ffff88814f13bf00 R12: 0000000000000000
    Aug 20 07:10:58 GSA kernel: R13: 000000007313a800 R14: ffffc900004948ec R15: 0000000000000001
    Aug 20 07:10:58 GSA kernel: FS:  0000000000000000(0000) GS:ffff88debf5c0000(0000) knlGS:0000000000000000
    Aug 20 07:10:58 GSA kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Aug 20 07:10:58 GSA kernel: CR2: 00000000017d62f0 CR3: 000000000200a001 CR4: 00000000000606e0
    Aug 20 07:10:58 GSA kernel: Call Trace:
    Aug 20 07:10:58 GSA kernel: 
    Aug 20 07:10:58 GSA kernel: ? __fib_validate_source+0x24c/0x2a5
    Aug 20 07:10:58 GSA kernel: ? ipt_do_table+0x4bb/0x5c0 [ip_tables]
    Aug 20 07:10:58 GSA kernel: ? ipt_do_table+0x570/0x5c0 [ip_tables]
    Aug 20 07:10:58 GSA kernel: __nf_nat_alloc_null_binding+0x5f/0x76 [nf_nat]
    Aug 20 07:10:58 GSA kernel: nf_nat_inet_fn+0x91/0x183 [nf_nat]
    Aug 20 07:10:58 GSA kernel: nf_nat_ipv4_local_in+0x25/0xa9 [nf_nat]
    Aug 20 07:10:58 GSA kernel: nf_hook_slow+0x39/0x8e
    Aug 20 07:10:58 GSA kernel: nf_hook.constprop.0+0xb1/0xd8
    Aug 20 07:10:58 GSA kernel: ? ip_protocol_deliver_rcu+0xfe/0xfe
    Aug 20 07:10:58 GSA kernel: ip_local_deliver+0x49/0x75
    Aug 20 07:10:58 GSA kernel: ip_sabotage_in+0x43/0x4d [br_netfilter]
    Aug 20 07:10:58 GSA kernel: nf_hook_slow+0x39/0x8e
    Aug 20 07:10:58 GSA kernel: nf_hook.constprop.0+0xb1/0xd8
    Aug 20 07:10:58 GSA kernel: ? l3mdev_l3_rcv.constprop.0+0x50/0x50
    Aug 20 07:10:58 GSA kernel: ip_rcv+0x41/0x61
    Aug 20 07:10:58 GSA kernel: __netif_receive_skb_one_core+0x74/0x95
    Aug 20 07:10:58 GSA kernel: netif_receive_skb+0x79/0xa1
    Aug 20 07:10:58 GSA kernel: br_handle_frame_finish+0x30d/0x351
    Aug 20 07:10:58 GSA kernel: ? skb_copy_bits+0xe8/0x197
    Aug 20 07:10:58 GSA kernel: ? ipt_do_table+0x570/0x5c0 [ip_tables]
    Aug 20 07:10:58 GSA kernel: ? br_pass_frame_up+0xda/0xda
    Aug 20 07:10:58 GSA kernel: br_nf_hook_thresh+0xa3/0xc3 [br_netfilter]
    Aug 20 07:10:58 GSA kernel: ? br_pass_frame_up+0xda/0xda
    Aug 20 07:10:58 GSA kernel: br_nf_pre_routing_finish+0x23d/0x264 [br_netfilter]
    Aug 20 07:10:58 GSA kernel: ? br_pass_frame_up+0xda/0xda
    Aug 20 07:10:58 GSA kernel: ? br_handle_frame_finish+0x351/0x351
    Aug 20 07:10:58 GSA kernel: ? nf_nat_ipv4_pre_routing+0x1e/0x4a [nf_nat]
    Aug 20 07:10:58 GSA kernel: ? br_nf_forward_finish+0xd0/0xd0 [br_netfilter]
    Aug 20 07:10:58 GSA kernel: ? br_handle_frame_finish+0x351/0x351
    Aug 20 07:10:58 GSA kernel: NF_HOOK+0xd7/0xf7 [br_netfilter]
    Aug 20 07:10:58 GSA kernel: ? br_nf_forward_finish+0xd0/0xd0 [br_netfilter]
    Aug 20 07:10:58 GSA kernel: br_nf_pre_routing+0x229/0x239 [br_netfilter]
    Aug 20 07:10:58 GSA kernel: ? br_nf_forward_finish+0xd0/0xd0 [br_netfilter]
    Aug 20 07:10:58 GSA kernel: br_handle_frame+0x25e/0x2a6
    Aug 20 07:10:58 GSA kernel: ? br_pass_frame_up+0xda/0xda
    Aug 20 07:10:58 GSA kernel: __netif_receive_skb_core+0x335/0x4e7
    Aug 20 07:10:58 GSA kernel: ? dev_gro_receive+0x55d/0x578
    Aug 20 07:10:58 GSA kernel: __netif_receive_skb_list_core+0x78/0x104
    Aug 20 07:10:58 GSA kernel: netif_receive_skb_list_internal+0x1bf/0x1f2
    Aug 20 07:10:58 GSA kernel: gro_normal_list+0x1d/0x39
    Aug 20 07:10:58 GSA kernel: napi_complete_done+0x79/0x104
    Aug 20 07:10:58 GSA kernel: mlx4_en_poll_rx_cq+0xa8/0xc7 [mlx4_en]
    Aug 20 07:10:58 GSA kernel: net_rx_action+0xf4/0x29d
    Aug 20 07:10:58 GSA kernel: __do_softirq+0xc4/0x1c2
    Aug 20 07:10:58 GSA kernel: asm_call_irq_on_stack+0x12/0x20
    Aug 20 07:10:58 GSA kernel: 
    Aug 20 07:10:58 GSA kernel: do_softirq_own_stack+0x2c/0x39
    Aug 20 07:10:58 GSA kernel: __irq_exit_rcu+0x45/0x80
    Aug 20 07:10:58 GSA kernel: common_interrupt+0x119/0x12e
    Aug 20 07:10:58 GSA kernel: asm_common_interrupt+0x1e/0x40
    Aug 20 07:10:58 GSA kernel: RIP: 0010:arch_local_irq_enable+0x4/0x8
    Aug 20 07:10:58 GSA kernel: Code: d4 39 18 00 48 83 c4 28 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 9c 58 66 66 90 66 90 c3 fa 66 66 90 66 66 90 c3 fb 66 66 90 <66> 66 90 c3 55 8b af 28 04 00 00 b8 01 00 00 00 45 31 c9 53 45 31
    Aug 20 07:10:58 GSA kernel: RSP: 0018:ffffc900000f7ea0 EFLAGS: 00000246
    Aug 20 07:10:58 GSA kernel: RAX: ffff88debf5e2380 RBX: 0000000000000004 RCX: 000000000000001f
    Aug 20 07:10:58 GSA kernel: RDX: 0000000000000000 RSI: 000000003333348b RDI: 0000000000000000
    Aug 20 07:10:58 GSA kernel: RBP: ffffe8fffddfed00 R08: 00003d6121a8df03 R09: 00003d667688e0ff
    Aug 20 07:10:58 GSA kernel: R10: 00000000000664e6 R11: 071c71c71c71c71c R12: 00003d6121a8df03
    Aug 20 07:10:58 GSA kernel: R13: ffffffff820c5dc0 R14: 0000000000000004 R15: 0000000000000000
    Aug 20 07:10:58 GSA kernel: cpuidle_enter_state+0x101/0x1c4
    Aug 20 07:10:58 GSA kernel: cpuidle_enter+0x25/0x31
    Aug 20 07:10:58 GSA kernel: do_idle+0x1a6/0x214
    Aug 20 07:10:58 GSA kernel: cpu_startup_entry+0x18/0x1a
    Aug 20 07:10:58 GSA kernel: secondary_startup_64_no_verify+0xb0/0xbb
    Aug 20 07:10:58 GSA kernel: ---[ end trace d61aac45b3f9ccb8 ]---

     




    User Feedback

    Recommended Comments

    On Aug 20th, I posted the above because debugging seemed to point to Nvidia; however, after in-depth troubleshooting it was determined that netfilter was causing the call traces.

     

    I was asked to try "ipvlan" instead of "macvlan". This made no change, so I reverted to macvlan.

     

    :: Placeholder for details outlining the issue, original values, etc. :: :: At work, so limited in what I can pull; will edit to add later ::

     

    After reviewing other similar call traces, I found references to setting the conntrack maximum (nf_conntrack_max) in an effort to resolve them. Just over 36 hours ago I ran "sysctl net/netfilter/nf_conntrack_max=131072" in a terminal and verified it with "cat /proc/sys/net/netfilter/nf_conntrack_max", which showed the new value of 131072. I have not had a single call trace since.

     

    TLDR: setting this 

    sysctl net/netfilter/nf_conntrack_max=131072

    stopped my call traces.

     

    If anyone knows how to help me gather what's needed to see why this stopped the call traces and prevent them from happening to others - please assist.
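
    One data point that might help here is how full the connection-tracking table actually gets compared with the configured maximum. The sketch below reads the standard nf_conntrack_count and nf_conntrack_max entries under /proc/sys/net/netfilter and prints the usage as a percentage. The function name and its optional directory argument are made up for illustration, and nothing in this thread confirms that table usage is what triggers the WARNING in nf_nat_setup_info, so treat it as something to log and compare, not as a diagnosis.

```shell
#!/bin/bash
# Sketch only: report how full the conntrack table is.
# conntrack_usage and its optional directory argument are hypothetical;
# the two /proc entries are the standard netfilter sysctls.
conntrack_usage() {
    local dir="${1:-/proc/sys/net/netfilter}"
    local count max
    count=$(cat "$dir/nf_conntrack_count") || return 1
    max=$(cat "$dir/nf_conntrack_max") || return 1
    if [ "$max" -le 0 ]; then
        echo "nf_conntrack_max is $max, cannot compute usage" >&2
        return 1
    fi
    # Integer percentage of the table currently in use.
    echo "count=$count max=$max used=$(( count * 100 / max ))%"
}

# Only report on systems where the nf_conntrack module is loaded.
if [ -r /proc/sys/net/netfilter/nf_conntrack_count ]; then
    conntrack_usage
fi
```

    Running this from a scheduled script and appending the output to a log would show whether the count was approaching the old maximum around the time of the traces.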


    I have been up for 24 hours on Version: 6.10.0-rc2g without the issue so far.  Will report any changes.

     

    Edit: This still occurs in rc2g.
     

    Edited by fmp4m

    Just wanted to add that I'm having this exact problem as well on 6.9.2. Running a Ryzen 3600, 32GB ECC memory, Nvidia Quadro P400. It seems to happen every week or two and forces me to restart the server.

    On 8/27/2021 at 3:49 PM, DieFalse said:

    TLDR: setting this 

    sysctl net/netfilter/nf_conntrack_max=131072

    stopped my call traces.

     

    Is there a way to make this change so that it survives a reboot?

    On 12/11/2021 at 8:37 AM, Newyorkone said:

     

    Is there a way to make this change so that it survives a reboot?

     

    Create a user script that runs on array startup
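
    A minimal sketch of what such a script could look like, assuming it is run as root at array start (for example through a user-scripts plugin) and that the nf_conntrack module is already loaded by then. The helper name apply_conntrack_max and its optional file argument are hypothetical; 131072 is the value quoted above. Writing to the /proc entry has the same effect as the sysctl command from the earlier post.

```shell
#!/bin/bash
# Sketch only: reapply the conntrack limit after a reboot.
# apply_conntrack_max and its optional file argument are hypothetical names.
apply_conntrack_max() {
    local value="$1"
    local file="${2:-/proc/sys/net/netfilter/nf_conntrack_max}"
    # Same effect as: sysctl net/netfilter/nf_conntrack_max=VALUE
    echo "$value" > "$file" || return 1
    # Read the value back so the script log shows what is in effect.
    echo "nf_conntrack_max is now $(cat "$file")"
}

if [ -w /proc/sys/net/netfilter/nf_conntrack_max ]; then
    apply_conntrack_max 131072
else
    echo "nf_conntrack_max is not writable (module not loaded, or not root?)" >&2
fi
```

    If the module loads later than the script runs, the write is skipped and the message on stderr says so, which means the script may need to be scheduled later or rerun.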


