slimshizn

Call trace issue started 6.5.3

18 posts in this topic

Quote

Oct 13 16:54:08 Tower kernel: ------------[ cut here ]------------
Oct 13 16:54:08 Tower kernel: WARNING: CPU: 16 PID: 0 at net/netfilter/nf_conntrack_core.c:769 __nf_conntrack_confirm+0x97/0x4d6
Oct 13 16:54:08 Tower kernel: Modules linked in: vhost_net vhost tap kvm_intel kvm md_mod xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables iptable_mangle macvlan tun xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat reiserfs xfs dm_crypt algif_skcipher af_alg dm_mod dax nct7904 bonding mlx4_en mlx4_core igb ptp pps_core i2c_algo_bit sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ahci libahci mpt3sas isci libsas i2c_i801 ipmi_ssif intel_cstate intel_uncore i2c_core intel_rapl_perf raid_class nvme scsi_transport_sas nvme_core wmi ipmi_si button [last unloaded: md_mod]
Oct 13 16:54:08 Tower kernel: CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W       4.14.49-unRAID #1
Oct 13 16:54:08 Tower kernel: Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
Oct 13 16:54:08 Tower kernel: task: ffff88105bdf9b00 task.stack: ffffc900063f0000
Oct 13 16:54:08 Tower kernel: RIP: 0010:__nf_conntrack_confirm+0x97/0x4d6
Oct 13 16:54:08 Tower kernel: RSP: 0018:ffff88105f1838c8 EFLAGS: 00010202
Oct 13 16:54:08 Tower kernel: RAX: 0000000000000188 RBX: 000000000000121b RCX: 00000000a136fe14
Oct 13 16:54:08 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000236 RDI: ffffffff81c08ed8
Oct 13 16:54:08 Tower kernel: RBP: ffff88103ce11200 R08: ffff8808f6d73480 R09: ffff880c78e45b00
Oct 13 16:54:08 Tower kernel: R10: 00000000000002b8 R11: 0000000000000006 R12: ffffffff81c88480
Oct 13 16:54:08 Tower kernel: R13: 0000000000006a36 R14: ffff8808f6d73480 R15: ffff8808f6d734d8
Oct 13 16:54:08 Tower kernel: FS:  0000000000000000(0000) GS:ffff88105f180000(0000) knlGS:0000000000000000
Oct 13 16:54:08 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 16:54:08 Tower kernel: CR2: 000014e583e4b8e4 CR3: 0000000001c0a004 CR4: 00000000001606e0
Oct 13 16:54:08 Tower kernel: Call Trace:
Oct 13 16:54:08 Tower kernel: <IRQ>
Oct 13 16:54:08 Tower kernel: ipv4_confirm+0xac/0xb4 [nf_conntrack_ipv4]
Oct 13 16:54:08 Tower kernel: nf_hook_slow+0x37/0x96
Oct 13 16:54:08 Tower kernel: ip_local_deliver+0x97/0xb0
Oct 13 16:54:08 Tower kernel: ? inet_del_offload+0x3e/0x3e
Oct 13 16:54:08 Tower kernel: ip_sabotage_in+0x2b/0x31
Oct 13 16:54:08 Tower kernel: nf_hook_slow+0x37/0x96
Oct 13 16:54:08 Tower kernel: ip_rcv+0x2e3/0x32a
Oct 13 16:54:08 Tower kernel: ? ip_local_deliver_finish+0x1aa/0x1aa
Oct 13 16:54:08 Tower kernel: __netif_receive_skb_core+0x69f/0x718
Oct 13 16:54:08 Tower kernel: netif_receive_skb_internal+0x8f/0x95
Oct 13 16:54:08 Tower kernel: br_pass_frame_up+0x111/0x11e
Oct 13 16:54:08 Tower kernel: ? br_port_flags_change+0xf/0xf
Oct 13 16:54:08 Tower kernel: br_handle_frame_finish+0x41a/0x44a
Oct 13 16:54:08 Tower kernel: ? br_pass_frame_up+0x11e/0x11e
Oct 13 16:54:08 Tower kernel: br_nf_hook_thresh+0x93/0x9e
Oct 13 16:54:08 Tower kernel: ? br_pass_frame_up+0x11e/0x11e
Oct 13 16:54:08 Tower kernel: br_nf_pre_routing_finish+0x225/0x237
Oct 13 16:54:08 Tower kernel: ? br_pass_frame_up+0x11e/0x11e
Oct 13 16:54:08 Tower kernel: ? nf_nat_ipv4_in+0x21/0x68 [nf_nat_ipv4]
Oct 13 16:54:08 Tower kernel: br_nf_pre_routing+0x2be/0x2ce
Oct 13 16:54:08 Tower kernel: ? br_nf_forward_ip+0x313/0x313
Oct 13 16:54:08 Tower kernel: nf_hook_slow+0x37/0x96
Oct 13 16:54:08 Tower kernel: br_handle_frame+0x279/0x2a3
Oct 13 16:54:08 Tower kernel: ? br_pass_frame_up+0x11e/0x11e
Oct 13 16:54:08 Tower kernel: ? br_handle_local_finish+0x31/0x31
Oct 13 16:54:08 Tower kernel: __netif_receive_skb_core+0x448/0x718
Oct 13 16:54:08 Tower kernel: ? kmem_cache_alloc+0xdc/0xe8
Oct 13 16:54:08 Tower kernel: ? recalibrate_cpu_khz+0x6/0x6
Oct 13 16:54:08 Tower kernel: netif_receive_skb_internal+0x8f/0x95
Oct 13 16:54:08 Tower kernel: napi_gro_frags+0x14d/0x185
Oct 13 16:54:08 Tower kernel: mlx4_en_process_rx_cq+0x83d/0x98b [mlx4_en]
Oct 13 16:54:08 Tower kernel: ? __radix_tree_lookup+0x5a/0x7e
Oct 13 16:54:08 Tower kernel: ? mlx4_cq_completion+0x1e/0x63 [mlx4_core]
Oct 13 16:54:08 Tower kernel: ? mlx4_en_rx_irq+0x23/0x3e [mlx4_en]
Oct 13 16:54:08 Tower kernel: mlx4_en_poll_rx_cq+0x66/0xc6 [mlx4_en]
Oct 13 16:54:08 Tower kernel: net_rx_action+0xfb/0x24f
Oct 13 16:54:08 Tower kernel: __do_softirq+0xcd/0x1c2
Oct 13 16:54:08 Tower kernel: irq_exit+0x4f/0x8e
Oct 13 16:54:08 Tower kernel: do_IRQ+0xa5/0xbb
Oct 13 16:54:08 Tower kernel: common_interrupt+0x7d/0x7d
Oct 13 16:54:08 Tower kernel: </IRQ>
Oct 13 16:54:08 Tower kernel: RIP: 0010:cpuidle_enter_state+0xe3/0x135
Oct 13 16:54:08 Tower kernel: RSP: 0018:ffffc900063f3ef8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff3e
Oct 13 16:54:08 Tower kernel: RAX: ffff88105f1a0840 RBX: 0000000000000000 RCX: 000000000000001f
Oct 13 16:54:08 Tower kernel: RDX: 0000535816dd0cf5 RSI: 0000000000020040 RDI: 0000000000000000
Oct 13 16:54:08 Tower kernel: RBP: ffff88105f1a8a00 R08: 0000b79a07f4bd4c R09: 0000000000000060
Oct 13 16:54:08 Tower kernel: R10: ffffc900063f3ed8 R11: 000000003f2a3ad4 R12: 0000000000000004
Oct 13 16:54:08 Tower kernel: R13: 0000535816dd0cf5 R14: ffffffff81c56618 R15: 0000535816b4e400
Oct 13 16:54:08 Tower kernel: ? cpuidle_enter_state+0xbb/0x135
Oct 13 16:54:08 Tower kernel: do_idle+0x11a/0x179
Oct 13 16:54:08 Tower kernel: cpu_startup_entry+0x18/0x1a
Oct 13 16:54:08 Tower kernel: secondary_startup_64+0xa5/0xb0
Oct 13 16:54:08 Tower kernel: Code: 48 c1 eb 20 89 1c 24 e8 31 f9 ff ff 8b 54 24 04 89 df 89 c6 41 89 c5 e8 a9 fa ff ff 84 c0 75 b9 49 8b 86 80 00 00 00 a8 08 74 02 <0f> 0b 4c 89 f7 e8 03 ff ff ff 49 8b 86 80 00 00 00 0f ba e0 09 
Oct 13 16:54:08 Tower kernel: ---[ end trace c9362264acbc3717 ]---


Can anyone point to what's going on here? I think it started after I added a disk and then removed it because it wouldn't partition correctly. I'm running a preclear on that disk now. I also installed a Mellanox 10GbE NIC (ConnectX), which is working fine. Not sure if that's what is making it throw errors.

Here are the errors:
 

Quote

Oct 12 19:38:58 Tower kernel: CPU: 16 PID: 23101 Comm: kworker/16:1 Not tainted 4.14.49-unRAID #1
Oct 12 19:38:58 Tower kernel: Call Trace:
Oct 13 02:34:00 Tower kernel: CPU: 11 PID: 19070 Comm: kworker/11:0 Tainted: G        W       4.14.49-unRAID #1
Oct 13 02:34:00 Tower kernel: Call Trace:
Oct 13 05:29:02 Tower kernel: CPU: 15 PID: 18306 Comm: kworker/15:8 Tainted: G        W       4.14.49-unRAID #1
Oct 13 05:29:02 Tower kernel: Call Trace:
Oct 13 09:34:04 Tower kernel: CPU: 11 PID: 0 Comm: swapper/11 Tainted: G        W       4.14.49-unRAID #1
Oct 13 09:34:04 Tower kernel: Call Trace:
Oct 13 14:44:07 Tower kernel: CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W       4.14.49-unRAID #1
Oct 13 14:44:07 Tower kernel: Call Trace:
Oct 13 16:54:08 Tower kernel: CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W       4.14.49-unRAID #1
Oct 13 16:54:08 Tower kernel: Call Trace:
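
A summary like the one quoted above can be pulled out of a syslog with a short script. The following is a minimal sketch, assuming the header lines keep the `CPU: ... PID: ... Comm: ...` layout seen in this thread; the names `HEADER`, `TraceHeader` and `trace_headers` are made up for illustration and are not part of any Unraid tool.

```python
import re
from typing import NamedTuple

# Matches the "CPU: .. PID: .. Comm: .." header the kernel prints at the top
# of each warning, in the layout shown in the syslog excerpts of this thread.
HEADER = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"CPU:\s+(?P<cpu>\d+)\s+PID:\s+(?P<pid>\d+)\s+Comm:\s+(?P<comm>\S+)\s+"
    r"(?P<taint>Not tainted|Tainted:)"
)


class TraceHeader(NamedTuple):
    timestamp: str
    cpu: int
    pid: int
    comm: str
    tainted: bool


def trace_headers(lines):
    """Return one TraceHeader for every call-trace header line in `lines`."""
    found = []
    for line in lines:
        m = HEADER.match(line)
        if m:
            found.append(
                TraceHeader(
                    timestamp=m["ts"],
                    cpu=int(m["cpu"]),
                    pid=int(m["pid"]),
                    comm=m["comm"],
                    tainted=m["taint"] != "Not tainted",
                )
            )
    return found


# Three header lines copied from the excerpt above, plus one non-header line.
SAMPLE = [
    "Oct 12 19:38:58 Tower kernel: CPU: 16 PID: 23101 Comm: kworker/16:1 Not tainted 4.14.49-unRAID #1",
    "Oct 12 19:38:58 Tower kernel: Call Trace:",
    "Oct 13 02:34:00 Tower kernel: CPU: 11 PID: 19070 Comm: kworker/11:0 Tainted: G        W       4.14.49-unRAID #1",
    "Oct 13 09:34:04 Tower kernel: CPU: 11 PID: 0 Comm: swapper/11 Tainted: G        W       4.14.49-unRAID #1",
]

for header in trace_headers(SAMPLE):
    print(header.timestamp, header.comm, "tainted" if header.tainted else "clean")
```

The `W` taint flag only records that an earlier warning already fired, which is why every trace after the first one shows up as tainted.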


 

8 hours ago, slimshizn said:

Oct 13 16:54:08 Tower kernel: mlx4_en_process_rx_cq+0x83d/0x98b [mlx4_en]
Oct 13 16:54:08 Tower kernel: ? __radix_tree_lookup+0x5a/0x7e
Oct 13 16:54:08 Tower kernel: ? mlx4_cq_completion+0x1e/0x63 [mlx4_core]
Oct 13 16:54:08 Tower kernel: ? mlx4_en_rx_irq+0x23/0x3e [mlx4_en]
Oct 13 16:54:08 Tower kernel: mlx4_en_poll_rx_cq+0x66/0xc6 [mlx4_en]

mlx4 is the Mellanox driver, so likely related to the NIC
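
The bracketed names at the end of a stack frame are the kernel modules that own each function, so counting them gives a rough idea of which drivers a trace passes through. Below is a minimal sketch of that idea, assuming the frame layout shown in the quoted lines; `FRAME` and `module_hits` are hypothetical names. Frames prefixed with `?` are addresses found on the stack that the unwinder could not confirm as part of the call chain, so the sketch lets you leave them out.

```python
import re
from collections import Counter

# A stack frame looks like: "kernel: [? ]symbol+0xOFF/0xSIZE [module]".
# Frames for built-in kernel code carry no "[module]" suffix.
FRAME = re.compile(
    r"kernel:\s+(?P<unreliable>\?\s+)?(?P<symbol>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def module_hits(lines, include_unreliable=True):
    """Count how many stack frames belong to each loadable module."""
    hits = Counter()
    for line in lines:
        m = FRAME.search(line)
        if not m or not m["module"]:
            continue  # not a frame, or a frame in built-in kernel code
        if m["unreliable"] and not include_unreliable:
            continue  # skip "?" frames when asked to
        hits[m["module"]] += 1
    return hits


# The five frames quoted in the post above.
SAMPLE = [
    "Oct 13 16:54:08 Tower kernel: mlx4_en_process_rx_cq+0x83d/0x98b [mlx4_en]",
    "Oct 13 16:54:08 Tower kernel: ? __radix_tree_lookup+0x5a/0x7e",
    "Oct 13 16:54:08 Tower kernel: ? mlx4_cq_completion+0x1e/0x63 [mlx4_core]",
    "Oct 13 16:54:08 Tower kernel: ? mlx4_en_rx_irq+0x23/0x3e [mlx4_en]",
    "Oct 13 16:54:08 Tower kernel: mlx4_en_poll_rx_cq+0x66/0xc6 [mlx4_en]",
]

print(module_hits(SAMPLE).most_common())
print(module_hits(SAMPLE, include_unreliable=False).most_common())
```

A count like this is only a pointer to where the packet travelled, not a diagnosis: a module can appear in a trace simply because the packet arrived through it.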

1 hour ago, johnnie.black said:

mlx4 is the Mellanox driver, so likely related to the NIC

I'm assuming it's time to upgrade to 6.6.1 to get the latest, then? Or is there another way?

Got up this morning with a few new ones. These look different.

 

Quote

Oct 15 02:39:16 Tower kernel: WARNING: CPU: 20 PID: 20336 at net/netfilter/nf_conntrack_core.c:763 __nf_conntrack_confirm+0x96/0x4fc
Oct 15 02:39:16 Tower kernel: Modules linked in: macvlan xt_CHECKSUM iptable_mangle ipt_REJECT xt_nat ebtable_filter ebtables veth ip6table_filter ip6_tables vhost_net tun vhost tap ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs dm_crypt algif_skcipher af_alg md_mod dm_mod dax nct7904 bonding mlx4_en mlx4_core igb i2c_algo_bit sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper mpt3sas isci libsas ahci libahci i2c_i801 ipmi_ssif intel_cstate intel_uncore i2c_core intel_rapl_perf raid_class nvme scsi_transport_sas nvme_core wmi pcc_cpufreq ipmi_si button [last unloaded: md_mod]
Oct 15 02:39:16 Tower kernel: CPU: 20 PID: 20336 Comm: kworker/20:2 Tainted: G        W         4.18.14-unRAID #1
Oct 15 02:39:16 Tower kernel: Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
Oct 15 02:39:16 Tower kernel: Workqueue: events macvlan_process_broadcast [macvlan]
Oct 15 02:39:16 Tower kernel: RIP: 0010:__nf_conntrack_confirm+0x96/0x4fc
Oct 15 02:39:16 Tower kernel: Code: c1 ed 20 89 2c 24 e8 26 f7 ff ff 8b 54 24 04 89 ef 89 c6 41 89 c5 e8 bc f8 ff ff 84 c0 75 b9 49 8b 86 80 00 00 00 a8 08 74 02 <0f> 0b 4c 89 f7 e8 04 ff ff ff 49 8b 86 80 00 00 00 0f ba e0 09 73 
Oct 15 02:39:16 Tower kernel: RSP: 0018:ffff88085fa83d30 EFLAGS: 00010202
Oct 15 02:39:16 Tower kernel: RAX: 0000000000000188 RBX: ffff880152a1bc00 RCX: 0000000000000101
Oct 15 02:39:16 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff81e09160
Oct 15 02:39:16 Tower kernel: RBP: 000000000000aad8 R08: 000000004b01a867 R09: ffff88105ab8c000
Oct 15 02:39:16 Tower kernel: R10: 0000000000000098 R11: ffff880dced60000 R12: ffffffff81e8cc80
Oct 15 02:39:16 Tower kernel: R13: 0000000000008895 R14: ffff8801113e5400 R15: ffff8801113e5458
Oct 15 02:39:16 Tower kernel: FS:  0000000000000000(0000) GS:ffff88085fa80000(0000) knlGS:0000000000000000
Oct 15 02:39:16 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 15 02:39:16 Tower kernel: CR2: 000000c420c80060 CR3: 0000000001e0a006 CR4: 00000000001606e0
Oct 15 02:39:16 Tower kernel: Call Trace:
Oct 15 02:39:16 Tower kernel: <IRQ>
Oct 15 02:39:16 Tower kernel: ipv4_confirm+0xaf/0xb7 [nf_conntrack_ipv4]
Oct 15 02:39:16 Tower kernel: nf_hook_slow+0x37/0x96
Oct 15 02:39:16 Tower kernel: ip_local_deliver+0xa7/0xd5
Oct 15 02:39:16 Tower kernel: ? inet_del_offload+0x3e/0x3e
Oct 15 02:39:16 Tower kernel: ip_rcv+0x2dc/0x317
Oct 15 02:39:16 Tower kernel: ? ip_local_deliver_finish+0x1aa/0x1aa
Oct 15 02:39:16 Tower kernel: __netif_receive_skb_core+0x6b2/0x740
Oct 15 02:39:16 Tower kernel: process_backlog+0x7e/0x116
Oct 15 02:39:16 Tower kernel: net_rx_action+0x10b/0x274
Oct 15 02:39:16 Tower kernel: __do_softirq+0xce/0x1c8
Oct 15 02:39:16 Tower kernel: do_softirq_own_stack+0x2a/0x40
Oct 15 02:39:16 Tower kernel: </IRQ>
Oct 15 02:39:16 Tower kernel: do_softirq+0x4d/0x59
Oct 15 02:39:16 Tower kernel: netif_rx_ni+0x1c/0x22
Oct 15 02:39:16 Tower kernel: macvlan_broadcast+0x10f/0x153 [macvlan]
Oct 15 02:39:16 Tower kernel: ? __switch_to_asm+0x34/0x70
Oct 15 02:39:16 Tower kernel: macvlan_process_broadcast+0xd5/0x131 [macvlan]
Oct 15 02:39:16 Tower kernel: process_one_work+0x16e/0x243
Oct 15 02:39:16 Tower kernel: ? cancel_delayed_work_sync+0xa/0xa
Oct 15 02:39:16 Tower kernel: worker_thread+0x1dc/0x2ac
Oct 15 02:39:16 Tower kernel: kthread+0x10b/0x113
Oct 15 02:39:16 Tower kernel: ? kthread_flush_work_fn+0x9/0x9
Oct 15 02:39:16 Tower kernel: ret_from_fork+0x35/0x40
Oct 15 02:39:16 Tower kernel: ---[ end trace c068249681dd6fe7 ]---

 


Are you using custom IPs for dockers? Macvlan is usually related to that, and there was recently an issue with it; I'm not sure whether it was fixed. If you are, try temporarily disabling them to see if that's the culprit.

This has never happened to this server before: I came home to find that I could not connect to it at all. I had to do an unclean power down to get it back (a parity check is running as I type this). I changed the network bonding mode from active-backup to 802.3ad, using the 10GbE cable and two 1Gb Ethernet cables, and have aggregation set on my switch. So far I haven't seen any issues, but we'll see.
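
After a bonding change like this, the Linux bonding driver reports the active mode and member interfaces in a status file under `/proc/net/bonding/`. The sketch below parses the text of such a file; it assumes the usual `Bonding Mode:` and `Slave Interface:` lines, and the sample text, the interface names and the function name `bonding_summary` are illustrative, not taken from this server.

```python
def bonding_summary(text):
    """Return (mode, slave_names) parsed from bonding status text."""
    mode = None
    slaves = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            slaves.append(value)
    return mode, slaves


# Illustrative text in the style of /proc/net/bonding/bond0 (assumed path);
# eth0, eth1 and eth2 are placeholder interface names.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: up

Slave Interface: eth2
MII Status: up
"""

mode, slaves = bonding_summary(SAMPLE)
print(mode, slaves)
```

On a real system the text would come from reading the status file for the bond in question instead of `SAMPLE`.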

On 10/15/2018 at 6:54 PM, slimshizn said:

Ok I'll give it a shot. Only two that are custom are pihole and steam cache. 

I fought the macvlan call traces for a very long time. In my case, they only happened with custom IP addresses on br0. I created a docker VLAN and assigned the dockers I wanted to have their own IP address to br0.3 (the docker VLAN). Since then, I have had no macvlan call traces.

I got them more frequently with pihole, but the call traces are not generally tied to a specific docker. Again, for me they only occurred on br0, and for a couple of other users on br1. Since creating the docker VLAN, I have had no call traces for months on br0.3.

A couple of times I went several days without call traces, but sometimes I would get several in a day, and they would completely lock up the server so that only a hard reboot would work.

2 hours ago, Hoopster said:

I fought the macvlan call traces for a very long time. In my case, they only happened with custom IP addresses on br0. I created a docker VLAN and assigned the dockers I wanted to have their own IP address to br0.3 (the docker VLAN). Since then, I have had no macvlan call traces.

I got them more frequently with pihole, but the call traces are not generally tied to a specific docker. Again, for me they only occurred on br0, and for a couple of other users on br1. Since creating the docker VLAN, I have had no call traces for months on br0.3.

A couple of times I went several days without call traces, but sometimes I would get several in a day, and they would completely lock up the server so that only a hard reboot would work.

Thanks for the response. I already have pihole running on a Raspberry Pi; the one on unRAID was just a backup. I'm going to move my custom IP dockers over to another server and see if that fixes the issues. I had a few lock-ups not too long ago, but they seem to have cleared on their own, and there is nothing in the syslog about them...
3 minutes ago, slimshizn said:

Thanks for the response. I already have pihole running on a raspberry pi, the one on unraid was just as a backup.

Same here.  As a result of my macvlan problems, I moved Pihole to an RPi with the docker on unRAID now only a backup.  It's disabled 99.9% of the time and there is no chance that unRAID being down means no ad blocking on the network (although that is easily fixed by a non-Pihole secondary DNS on the router).


The call traces came back and crashed the server today. I believe it was due to having SteamCacheBundle on a custom br0 fixed IP. Funny, though, that none of this happened before I started using a 10GbE NIC.

@Hoopster I'm trying out your solution with creating a VLAN.


Do I need to set anything up in my network to have this work with my VM? The VM is on 192.168.0.0/24, whereas I set up the VLAN on 192.168.1.0/24. The VLAN is br0.10.
Anything else I need to do to have this work right? Thanks
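
For reference, the two ranges in this post are separate, non-overlapping subnets, so traffic between the VM and the containers has to go through something that routes between them. A minimal sketch of that check with Python's standard `ipaddress` module follows; the two networks come from the post, while the host addresses and the helper name `needs_router` are made up for illustration.

```python
import ipaddress

vm_net = ipaddress.ip_network("192.168.0.0/24")      # VM network from the post
docker_net = ipaddress.ip_network("192.168.1.0/24")  # br0.10 VLAN from the post


def needs_router(src, dst, networks):
    """True if no single network in `networks` contains both addresses."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    return not any(src_ip in net and dst_ip in net for net in networks)


networks = [vm_net, docker_net]

# The two /24 ranges do not share any addresses.
print(vm_net.overlaps(docker_net))

# Hypothetical hosts: a VM at .0.50 and a container at .1.20.
print(needs_router("192.168.0.50", "192.168.1.20", networks))

# Two hypothetical hosts on the VM network can talk directly.
print(needs_router("192.168.0.50", "192.168.0.60", networks))
```

Whether the router or switch in a given setup actually passes traffic between the two subnets is a separate question that this check does not answer.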

 

32 minutes ago, slimshizn said:

Anything else I need to do to have this work right? Thanks

 

I followed bonienl's guide in this thread and it worked like a charm.  It's a great overview about docker/VM/VLAN networking in general.
