Call trace issue started 6.5.3



Quote

Oct 13 16:54:08 Tower kernel: ------------[ cut here ]------------
Oct 13 16:54:08 Tower kernel: WARNING: CPU: 16 PID: 0 at net/netfilter/nf_conntrack_core.c:769 __nf_conntrack_confirm+0x97/0x4d6
Oct 13 16:54:08 Tower kernel: Modules linked in: vhost_net vhost tap kvm_intel kvm md_mod xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables iptable_mangle macvlan tun xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat reiserfs xfs dm_crypt algif_skcipher af_alg dm_mod dax nct7904 bonding mlx4_en mlx4_core igb ptp pps_core i2c_algo_bit sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ahci libahci mpt3sas isci libsas i2c_i801 ipmi_ssif intel_cstate intel_uncore i2c_core intel_rapl_perf raid_class nvme scsi_transport_sas nvme_core wmi ipmi_si button [last unloaded: md_mod]
Oct 13 16:54:08 Tower kernel: CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W       4.14.49-unRAID #1
Oct 13 16:54:08 Tower kernel: Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
Oct 13 16:54:08 Tower kernel: task: ffff88105bdf9b00 task.stack: ffffc900063f0000
Oct 13 16:54:08 Tower kernel: RIP: 0010:__nf_conntrack_confirm+0x97/0x4d6
Oct 13 16:54:08 Tower kernel: RSP: 0018:ffff88105f1838c8 EFLAGS: 00010202
Oct 13 16:54:08 Tower kernel: RAX: 0000000000000188 RBX: 000000000000121b RCX: 00000000a136fe14
Oct 13 16:54:08 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000236 RDI: ffffffff81c08ed8
Oct 13 16:54:08 Tower kernel: RBP: ffff88103ce11200 R08: ffff8808f6d73480 R09: ffff880c78e45b00
Oct 13 16:54:08 Tower kernel: R10: 00000000000002b8 R11: 0000000000000006 R12: ffffffff81c88480
Oct 13 16:54:08 Tower kernel: R13: 0000000000006a36 R14: ffff8808f6d73480 R15: ffff8808f6d734d8
Oct 13 16:54:08 Tower kernel: FS:  0000000000000000(0000) GS:ffff88105f180000(0000) knlGS:0000000000000000
Oct 13 16:54:08 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 16:54:08 Tower kernel: CR2: 000014e583e4b8e4 CR3: 0000000001c0a004 CR4: 00000000001606e0
Oct 13 16:54:08 Tower kernel: Call Trace:
Oct 13 16:54:08 Tower kernel: <IRQ>
Oct 13 16:54:08 Tower kernel: ipv4_confirm+0xac/0xb4 [nf_conntrack_ipv4]
Oct 13 16:54:08 Tower kernel: nf_hook_slow+0x37/0x96
Oct 13 16:54:08 Tower kernel: ip_local_deliver+0x97/0xb0
Oct 13 16:54:08 Tower kernel: ? inet_del_offload+0x3e/0x3e
Oct 13 16:54:08 Tower kernel: ip_sabotage_in+0x2b/0x31
Oct 13 16:54:08 Tower kernel: nf_hook_slow+0x37/0x96
Oct 13 16:54:08 Tower kernel: ip_rcv+0x2e3/0x32a
Oct 13 16:54:08 Tower kernel: ? ip_local_deliver_finish+0x1aa/0x1aa
Oct 13 16:54:08 Tower kernel: __netif_receive_skb_core+0x69f/0x718
Oct 13 16:54:08 Tower kernel: netif_receive_skb_internal+0x8f/0x95
Oct 13 16:54:08 Tower kernel: br_pass_frame_up+0x111/0x11e
Oct 13 16:54:08 Tower kernel: ? br_port_flags_change+0xf/0xf
Oct 13 16:54:08 Tower kernel: br_handle_frame_finish+0x41a/0x44a
Oct 13 16:54:08 Tower kernel: ? br_pass_frame_up+0x11e/0x11e
Oct 13 16:54:08 Tower kernel: br_nf_hook_thresh+0x93/0x9e
Oct 13 16:54:08 Tower kernel: ? br_pass_frame_up+0x11e/0x11e
Oct 13 16:54:08 Tower kernel: br_nf_pre_routing_finish+0x225/0x237
Oct 13 16:54:08 Tower kernel: ? br_pass_frame_up+0x11e/0x11e
Oct 13 16:54:08 Tower kernel: ? nf_nat_ipv4_in+0x21/0x68 [nf_nat_ipv4]
Oct 13 16:54:08 Tower kernel: br_nf_pre_routing+0x2be/0x2ce
Oct 13 16:54:08 Tower kernel: ? br_nf_forward_ip+0x313/0x313
Oct 13 16:54:08 Tower kernel: nf_hook_slow+0x37/0x96
Oct 13 16:54:08 Tower kernel: br_handle_frame+0x279/0x2a3
Oct 13 16:54:08 Tower kernel: ? br_pass_frame_up+0x11e/0x11e
Oct 13 16:54:08 Tower kernel: ? br_handle_local_finish+0x31/0x31
Oct 13 16:54:08 Tower kernel: __netif_receive_skb_core+0x448/0x718
Oct 13 16:54:08 Tower kernel: ? kmem_cache_alloc+0xdc/0xe8
Oct 13 16:54:08 Tower kernel: ? recalibrate_cpu_khz+0x6/0x6
Oct 13 16:54:08 Tower kernel: netif_receive_skb_internal+0x8f/0x95
Oct 13 16:54:08 Tower kernel: napi_gro_frags+0x14d/0x185
Oct 13 16:54:08 Tower kernel: mlx4_en_process_rx_cq+0x83d/0x98b [mlx4_en]
Oct 13 16:54:08 Tower kernel: ? __radix_tree_lookup+0x5a/0x7e
Oct 13 16:54:08 Tower kernel: ? mlx4_cq_completion+0x1e/0x63 [mlx4_core]
Oct 13 16:54:08 Tower kernel: ? mlx4_en_rx_irq+0x23/0x3e [mlx4_en]
Oct 13 16:54:08 Tower kernel: mlx4_en_poll_rx_cq+0x66/0xc6 [mlx4_en]
Oct 13 16:54:08 Tower kernel: net_rx_action+0xfb/0x24f
Oct 13 16:54:08 Tower kernel: __do_softirq+0xcd/0x1c2
Oct 13 16:54:08 Tower kernel: irq_exit+0x4f/0x8e
Oct 13 16:54:08 Tower kernel: do_IRQ+0xa5/0xbb
Oct 13 16:54:08 Tower kernel: common_interrupt+0x7d/0x7d
Oct 13 16:54:08 Tower kernel: </IRQ>
Oct 13 16:54:08 Tower kernel: RIP: 0010:cpuidle_enter_state+0xe3/0x135
Oct 13 16:54:08 Tower kernel: RSP: 0018:ffffc900063f3ef8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff3e
Oct 13 16:54:08 Tower kernel: RAX: ffff88105f1a0840 RBX: 0000000000000000 RCX: 000000000000001f
Oct 13 16:54:08 Tower kernel: RDX: 0000535816dd0cf5 RSI: 0000000000020040 RDI: 0000000000000000
Oct 13 16:54:08 Tower kernel: RBP: ffff88105f1a8a00 R08: 0000b79a07f4bd4c R09: 0000000000000060
Oct 13 16:54:08 Tower kernel: R10: ffffc900063f3ed8 R11: 000000003f2a3ad4 R12: 0000000000000004
Oct 13 16:54:08 Tower kernel: R13: 0000535816dd0cf5 R14: ffffffff81c56618 R15: 0000535816b4e400
Oct 13 16:54:08 Tower kernel: ? cpuidle_enter_state+0xbb/0x135
Oct 13 16:54:08 Tower kernel: do_idle+0x11a/0x179
Oct 13 16:54:08 Tower kernel: cpu_startup_entry+0x18/0x1a
Oct 13 16:54:08 Tower kernel: secondary_startup_64+0xa5/0xb0
Oct 13 16:54:08 Tower kernel: Code: 48 c1 eb 20 89 1c 24 e8 31 f9 ff ff 8b 54 24 04 89 df 89 c6 41 89 c5 e8 a9 fa ff ff 84 c0 75 b9 49 8b 86 80 00 00 00 a8 08 74 02 <0f> 0b 4c 89 f7 e8 03 ff ff ff 49 8b 86 80 00 00 00 0f ba e0 09 
Oct 13 16:54:08 Tower kernel: ---[ end trace c9362264acbc3717 ]---


Can anyone point to what's going on here? I think it started after I added a disk and then removed it because it wouldn't partition correctly; I'm running a preclear on that disk now. I also put in a Mellanox 10GbE NIC (ConnectX), which is working fine, but I'm not sure whether that's what is making it throw errors.

Here are the errors:
 

Quote

Oct 12 19:38:58 Tower kernel: CPU: 16 PID: 23101 Comm: kworker/16:1 Not tainted 4.14.49-unRAID #1
Oct 12 19:38:58 Tower kernel: Call Trace:
Oct 13 02:34:00 Tower kernel: CPU: 11 PID: 19070 Comm: kworker/11:0 Tainted: G        W       4.14.49-unRAID #1
Oct 13 02:34:00 Tower kernel: Call Trace:
Oct 13 05:29:02 Tower kernel: CPU: 15 PID: 18306 Comm: kworker/15:8 Tainted: G        W       4.14.49-unRAID #1
Oct 13 05:29:02 Tower kernel: Call Trace:
Oct 13 09:34:04 Tower kernel: CPU: 11 PID: 0 Comm: swapper/11 Tainted: G        W       4.14.49-unRAID #1
Oct 13 09:34:04 Tower kernel: Call Trace:
Oct 13 14:44:07 Tower kernel: CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W       4.14.49-unRAID #1
Oct 13 14:44:07 Tower kernel: Call Trace:
Oct 13 16:54:08 Tower kernel: CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W       4.14.49-unRAID #1
Oct 13 16:54:08 Tower kernel: Call Trace:
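
To see the pattern in those header lines at a glance, they can be tallied with a short script. This is only a sketch: it assumes the syslog lines look exactly like the ones quoted above, and `summarize_traces` and `SAMPLE` are names made up for the example.

```python
import re
from collections import Counter

# Matches the header the kernel prints for each trace, e.g.
# "Oct 13 16:54:08 Tower kernel: CPU: 16 PID: 0 Comm: swapper/16 Tainted: ..."
HEADER = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: "
    r"CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) Comm: (?P<comm>\S+)"
)

def summarize_traces(lines):
    """Return (events, counts): one event per trace header, plus a tally by task type."""
    events = []
    for line in lines:
        m = HEADER.match(line)
        if m:
            # "kworker/16:1" -> "kworker", "swapper/16" -> "swapper"
            task = m.group("comm").split("/")[0]
            events.append((m.group("stamp"), int(m.group("cpu")), task))
    counts = Counter(task for _, _, task in events)
    return events, counts

# Header lines copied from the syslog excerpt in this post.
SAMPLE = [
    "Oct 12 19:38:58 Tower kernel: CPU: 16 PID: 23101 Comm: kworker/16:1 Not tainted 4.14.49-unRAID #1",
    "Oct 12 19:38:58 Tower kernel: Call Trace:",
    "Oct 13 02:34:00 Tower kernel: CPU: 11 PID: 19070 Comm: kworker/11:0 Tainted: G        W       4.14.49-unRAID #1",
    "Oct 13 05:29:02 Tower kernel: CPU: 15 PID: 18306 Comm: kworker/15:8 Tainted: G        W       4.14.49-unRAID #1",
    "Oct 13 09:34:04 Tower kernel: CPU: 11 PID: 0 Comm: swapper/11 Tainted: G        W       4.14.49-unRAID #1",
    "Oct 13 14:44:07 Tower kernel: CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W       4.14.49-unRAID #1",
    "Oct 13 16:54:08 Tower kernel: CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W       4.14.49-unRAID #1",
]

events, counts = summarize_traces(SAMPLE)
for stamp, cpu, task in events:
    print(stamp, "cpu", cpu, task)
print(dict(counts))
```

On a real system the same function could be fed the lines of the syslog file instead of `SAMPLE`. The tally only shows when the traces fired and in which task context; it says nothing about the cause.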


 

8 hours ago, slimshizn said:

Oct 13 16:54:08 Tower kernel: mlx4_en_process_rx_cq+0x83d/0x98b [mlx4_en]
Oct 13 16:54:08 Tower kernel: ? __radix_tree_lookup+0x5a/0x7e
Oct 13 16:54:08 Tower kernel: ? mlx4_cq_completion+0x1e/0x63 [mlx4_core]
Oct 13 16:54:08 Tower kernel: ? mlx4_en_rx_irq+0x23/0x3e [mlx4_en]
Oct 13 16:54:08 Tower kernel: mlx4_en_poll_rx_cq+0x66/0xc6 [mlx4_en]

mlx4 is the Mellanox driver, so this is likely related to the NIC.
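
For anyone who wants to check which loadable modules show up in a trace, the bracketed names at the end of each frame can be pulled out with a few lines of Python. This is a rough sketch, assuming frames are formatted like the ones quoted in this thread; `modules_on_stack` and `SAMPLE` are made-up names. Frames prefixed with `?` are ones the unwinder is not sure about, so they are skipped by default.

```python
import re
from collections import Counter

# A frame looks like "func+0xOFF/0xSIZE [module]". Frames without the
# bracketed suffix belong to code built into the kernel image.
FRAME = re.compile(
    r"kernel: (?P<guess>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?\s*$"
)

def modules_on_stack(lines, include_guesses=False):
    """Count frames per loadable module in a pasted call trace."""
    mods = Counter()
    for line in lines:
        m = FRAME.search(line)
        if not m or not m.group("mod"):
            continue  # not a frame, or a frame from built-in kernel code
        if m.group("guess") and not include_guesses:
            continue  # "?" frames are unreliable, skip unless asked for
        mods[m.group("mod")] += 1
    return mods

# A few frames copied from the Oct 13 16:54:08 trace.
SAMPLE = [
    "Oct 13 16:54:08 Tower kernel: ipv4_confirm+0xac/0xb4 [nf_conntrack_ipv4]",
    "Oct 13 16:54:08 Tower kernel: nf_hook_slow+0x37/0x96",
    "Oct 13 16:54:08 Tower kernel: ? nf_nat_ipv4_in+0x21/0x68 [nf_nat_ipv4]",
    "Oct 13 16:54:08 Tower kernel: mlx4_en_process_rx_cq+0x83d/0x98b [mlx4_en]",
    "Oct 13 16:54:08 Tower kernel: ? mlx4_cq_completion+0x1e/0x63 [mlx4_core]",
    "Oct 13 16:54:08 Tower kernel: ? mlx4_en_rx_irq+0x23/0x3e [mlx4_en]",
    "Oct 13 16:54:08 Tower kernel: mlx4_en_poll_rx_cq+0x66/0xc6 [mlx4_en]",
]

reliable = modules_on_stack(SAMPLE)
everything = modules_on_stack(SAMPLE, include_guesses=True)
print(dict(reliable))
print(dict(everything))
```

A module appearing on the stack only shows that the packet passed through it. The WARNING line itself points at net/netfilter/nf_conntrack_core.c, so the NIC driver being present is a hint rather than proof.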


Got up this morning with a few new ones. These look different.

 

Quote

Oct 15 02:39:16 Tower kernel: WARNING: CPU: 20 PID: 20336 at net/netfilter/nf_conntrack_core.c:763 __nf_conntrack_confirm+0x96/0x4fc
Oct 15 02:39:16 Tower kernel: Modules linked in: macvlan xt_CHECKSUM iptable_mangle ipt_REJECT xt_nat ebtable_filter ebtables veth ip6table_filter ip6_tables vhost_net tun vhost tap ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs dm_crypt algif_skcipher af_alg md_mod dm_mod dax nct7904 bonding mlx4_en mlx4_core igb i2c_algo_bit sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper mpt3sas isci libsas ahci libahci i2c_i801 ipmi_ssif intel_cstate intel_uncore i2c_core intel_rapl_perf raid_class nvme scsi_transport_sas nvme_core wmi pcc_cpufreq ipmi_si button [last unloaded: md_mod]
Oct 15 02:39:16 Tower kernel: CPU: 20 PID: 20336 Comm: kworker/20:2 Tainted: G        W         4.18.14-unRAID #1
Oct 15 02:39:16 Tower kernel: Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
Oct 15 02:39:16 Tower kernel: Workqueue: events macvlan_process_broadcast [macvlan]
Oct 15 02:39:16 Tower kernel: RIP: 0010:__nf_conntrack_confirm+0x96/0x4fc
Oct 15 02:39:16 Tower kernel: Code: c1 ed 20 89 2c 24 e8 26 f7 ff ff 8b 54 24 04 89 ef 89 c6 41 89 c5 e8 bc f8 ff ff 84 c0 75 b9 49 8b 86 80 00 00 00 a8 08 74 02 <0f> 0b 4c 89 f7 e8 04 ff ff ff 49 8b 86 80 00 00 00 0f ba e0 09 73 
Oct 15 02:39:16 Tower kernel: RSP: 0018:ffff88085fa83d30 EFLAGS: 00010202
Oct 15 02:39:16 Tower kernel: RAX: 0000000000000188 RBX: ffff880152a1bc00 RCX: 0000000000000101
Oct 15 02:39:16 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff81e09160
Oct 15 02:39:16 Tower kernel: RBP: 000000000000aad8 R08: 000000004b01a867 R09: ffff88105ab8c000
Oct 15 02:39:16 Tower kernel: R10: 0000000000000098 R11: ffff880dced60000 R12: ffffffff81e8cc80
Oct 15 02:39:16 Tower kernel: R13: 0000000000008895 R14: ffff8801113e5400 R15: ffff8801113e5458
Oct 15 02:39:16 Tower kernel: FS:  0000000000000000(0000) GS:ffff88085fa80000(0000) knlGS:0000000000000000
Oct 15 02:39:16 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 15 02:39:16 Tower kernel: CR2: 000000c420c80060 CR3: 0000000001e0a006 CR4: 00000000001606e0
Oct 15 02:39:16 Tower kernel: Call Trace:
Oct 15 02:39:16 Tower kernel: <IRQ>
Oct 15 02:39:16 Tower kernel: ipv4_confirm+0xaf/0xb7 [nf_conntrack_ipv4]
Oct 15 02:39:16 Tower kernel: nf_hook_slow+0x37/0x96
Oct 15 02:39:16 Tower kernel: ip_local_deliver+0xa7/0xd5
Oct 15 02:39:16 Tower kernel: ? inet_del_offload+0x3e/0x3e
Oct 15 02:39:16 Tower kernel: ip_rcv+0x2dc/0x317
Oct 15 02:39:16 Tower kernel: ? ip_local_deliver_finish+0x1aa/0x1aa
Oct 15 02:39:16 Tower kernel: __netif_receive_skb_core+0x6b2/0x740
Oct 15 02:39:16 Tower kernel: process_backlog+0x7e/0x116
Oct 15 02:39:16 Tower kernel: net_rx_action+0x10b/0x274
Oct 15 02:39:16 Tower kernel: __do_softirq+0xce/0x1c8
Oct 15 02:39:16 Tower kernel: do_softirq_own_stack+0x2a/0x40
Oct 15 02:39:16 Tower kernel: </IRQ>
Oct 15 02:39:16 Tower kernel: do_softirq+0x4d/0x59
Oct 15 02:39:16 Tower kernel: netif_rx_ni+0x1c/0x22
Oct 15 02:39:16 Tower kernel: macvlan_broadcast+0x10f/0x153 [macvlan]
Oct 15 02:39:16 Tower kernel: ? __switch_to_asm+0x34/0x70
Oct 15 02:39:16 Tower kernel: macvlan_process_broadcast+0xd5/0x131 [macvlan]
Oct 15 02:39:16 Tower kernel: process_one_work+0x16e/0x243
Oct 15 02:39:16 Tower kernel: ? cancel_delayed_work_sync+0xa/0xa
Oct 15 02:39:16 Tower kernel: worker_thread+0x1dc/0x2ac
Oct 15 02:39:16 Tower kernel: kthread+0x10b/0x113
Oct 15 02:39:16 Tower kernel: ? kthread_flush_work_fn+0x9/0x9
Oct 15 02:39:16 Tower kernel: ret_from_fork+0x35/0x40
Oct 15 02:39:16 Tower kernel: ---[ end trace c068249681dd6fe7 ]---

 


This has never happened to this server before. I came home to find that I could not connect to it at all, and I had to do an unclean power down to get it back (a parity check is running as I type this). I changed the network bonding mode from active-backup to 802.3ad, using the 10GbE cable and two 1Gb Ethernet cables, and have link aggregation set on my switch. So far I haven't seen any issues, but we'll see.
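
A quick way to confirm the bond actually came up in 802.3ad mode is to read the status text the Linux bonding driver exposes under /proc/net/bonding/. The sketch below only parses that text. The bond name `bond0`, the interface names in `SAMPLE`, and the helper `parse_bonding_status` are assumptions for illustration, and the sample is a trimmed imitation of the usual layout rather than output from this server.

```python
def parse_bonding_status(text):
    """Pull the bonding mode and member interfaces out of the status text."""
    info = {"mode": None, "slaves": []}
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            info["mode"] = value
        elif key == "Slave Interface":
            info["slaves"].append(value)
    return info

# Hypothetical excerpt in the style of /proc/net/bonding/bond0.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth4
MII Status: up
"""

info = parse_bonding_status(SAMPLE)
print(info["mode"])
print(info["slaves"])

# On the server itself, assuming the bond is named bond0:
# with open("/proc/net/bonding/bond0") as f:
#     print(parse_bonding_status(f.read()))
```

If the reported mode is still the active-backup one after the change, the new setting has not taken effect, which is worth ruling out before judging whether the change helped.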

On 10/15/2018 at 6:54 PM, slimshizn said:

Ok I'll give it a shot. Only two that are custom are pihole and steam cache. 

I fought the macvlan call traces for a very long time.  In my case, they only happened with custom IP addresses on br0.  I created a docker VLAN and assigned the dockers I wished to have their own IP address to br0.3 (the docker VLAN).  Since then, I have had no macvlan call traces.

 

I got them more frequently with pihole, but the call traces are not generally related to a specific docker.  Again, for me, they only occurred on br0, and for a couple of other users, on br1.  Since creating the docker VLAN, I have had no call traces for months on br0.3.

 

A couple of times I went several days without call traces, but sometimes I would get several in a day, and they would completely lock up the server so that only a hard reboot would work.

2 hours ago, Hoopster said:

I fought the macvlan call traces for a very long time.  In my case, they only happened with custom IP addresses on br0.  I created a docker VLAN and assigned the dockers I wished to have their own IP address to br0.3 (the docker VLAN).  Since then, I have had no macvlan call traces.

 

I got them more frequently with pihole, but the call traces are not generally related to a specific docker.  Again, for me, they only occurred on br0, and for a couple of others, on br1.  Since creating the docker VLAN, I have had no call traces for months on br0.3.

 

A couple of times I went several days without call traces, but sometimes I would get several in a day, and they would completely lock up the server so that only a hard reboot would work.

Thanks for the response. I already have pihole running on a Raspberry Pi; the one on unRAID was just a backup. I'm going to transfer my custom IP dockers over to another server and see if that fixes the issues. I had a few lockups not too long ago, but they seem to have fixed themselves, and there is nothing in the syslog about them happening...

3 minutes ago, slimshizn said:

Thanks for the response. I already have pihole running on a Raspberry Pi; the one on unRAID was just a backup.

Same here.  As a result of my macvlan problems, I moved Pihole to an RPi with the docker on unRAID now only a backup.  It's disabled 99.9% of the time and there is no chance that unRAID being down means no ad blocking on the network (although that is easily fixed by a non-Pihole secondary DNS on the router).
