slimshizn Posted October 13, 2018 Share Posted October 13, 2018 (edited) Quote Oct 13 16:54:08 Tower kernel: ------------[ cut here ]------------ Oct 13 16:54:08 Tower kernel: WARNING: CPU: 16 PID: 0 at net/netfilter/nf_conntrack_core.c:769 __nf_conntrack_confirm+0x97/0x4d6 Oct 13 16:54:08 Tower kernel: Modules linked in: vhost_net vhost tap kvm_intel kvm md_mod xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables iptable_mangle macvlan tun xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat reiserfs xfs dm_crypt algif_skcipher af_alg dm_mod dax nct7904 bonding mlx4_en mlx4_core igb ptp pps_core i2c_algo_bit sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ahci libahci mpt3sas isci libsas i2c_i801 ipmi_ssif intel_cstate intel_uncore i2c_core intel_rapl_perf raid_class nvme scsi_transport_sas nvme_core wmi ipmi_si button [last unloaded: md_mod] Oct 13 16:54:08 Tower kernel: CPU: 16 PID: 0 Comm: swapper/16 Tainted: G W 4.14.49-unRAID #1 Oct 13 16:54:08 Tower kernel: Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015 Oct 13 16:54:08 Tower kernel: task: ffff88105bdf9b00 task.stack: ffffc900063f0000 Oct 13 16:54:08 Tower kernel: RIP: 0010:__nf_conntrack_confirm+0x97/0x4d6 Oct 13 16:54:08 Tower kernel: RSP: 0018:ffff88105f1838c8 EFLAGS: 00010202 Oct 13 16:54:08 Tower kernel: RAX: 0000000000000188 RBX: 000000000000121b RCX: 00000000a136fe14 Oct 13 16:54:08 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000236 RDI: ffffffff81c08ed8 Oct 13 16:54:08 Tower kernel: RBP: ffff88103ce11200 R08: ffff8808f6d73480 R09: ffff880c78e45b00 Oct 13 16:54:08 Tower kernel: R10: 00000000000002b8 R11: 0000000000000006 R12: ffffffff81c88480 Oct 13 16:54:08 Tower kernel: R13: 0000000000006a36 R14: 
ffff8808f6d73480 R15: ffff8808f6d734d8 Oct 13 16:54:08 Tower kernel: FS: 0000000000000000(0000) GS:ffff88105f180000(0000) knlGS:0000000000000000 Oct 13 16:54:08 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Oct 13 16:54:08 Tower kernel: CR2: 000014e583e4b8e4 CR3: 0000000001c0a004 CR4: 00000000001606e0 Oct 13 16:54:08 Tower kernel: Call Trace: Oct 13 16:54:08 Tower kernel: <IRQ> Oct 13 16:54:08 Tower kernel: ipv4_confirm+0xac/0xb4 [nf_conntrack_ipv4] Oct 13 16:54:08 Tower kernel: nf_hook_slow+0x37/0x96 Oct 13 16:54:08 Tower kernel: ip_local_deliver+0x97/0xb0 Oct 13 16:54:08 Tower kernel: ? inet_del_offload+0x3e/0x3e Oct 13 16:54:08 Tower kernel: ip_sabotage_in+0x2b/0x31 Oct 13 16:54:08 Tower kernel: nf_hook_slow+0x37/0x96 Oct 13 16:54:08 Tower kernel: ip_rcv+0x2e3/0x32a Oct 13 16:54:08 Tower kernel: ? ip_local_deliver_finish+0x1aa/0x1aa Oct 13 16:54:08 Tower kernel: __netif_receive_skb_core+0x69f/0x718 Oct 13 16:54:08 Tower kernel: netif_receive_skb_internal+0x8f/0x95 Oct 13 16:54:08 Tower kernel: br_pass_frame_up+0x111/0x11e Oct 13 16:54:08 Tower kernel: ? br_port_flags_change+0xf/0xf Oct 13 16:54:08 Tower kernel: br_handle_frame_finish+0x41a/0x44a Oct 13 16:54:08 Tower kernel: ? br_pass_frame_up+0x11e/0x11e Oct 13 16:54:08 Tower kernel: br_nf_hook_thresh+0x93/0x9e Oct 13 16:54:08 Tower kernel: ? br_pass_frame_up+0x11e/0x11e Oct 13 16:54:08 Tower kernel: br_nf_pre_routing_finish+0x225/0x237 Oct 13 16:54:08 Tower kernel: ? br_pass_frame_up+0x11e/0x11e Oct 13 16:54:08 Tower kernel: ? nf_nat_ipv4_in+0x21/0x68 [nf_nat_ipv4] Oct 13 16:54:08 Tower kernel: br_nf_pre_routing+0x2be/0x2ce Oct 13 16:54:08 Tower kernel: ? br_nf_forward_ip+0x313/0x313 Oct 13 16:54:08 Tower kernel: nf_hook_slow+0x37/0x96 Oct 13 16:54:08 Tower kernel: br_handle_frame+0x279/0x2a3 Oct 13 16:54:08 Tower kernel: ? br_pass_frame_up+0x11e/0x11e Oct 13 16:54:08 Tower kernel: ? 
br_handle_local_finish+0x31/0x31 Oct 13 16:54:08 Tower kernel: __netif_receive_skb_core+0x448/0x718 Oct 13 16:54:08 Tower kernel: ? kmem_cache_alloc+0xdc/0xe8 Oct 13 16:54:08 Tower kernel: ? recalibrate_cpu_khz+0x6/0x6 Oct 13 16:54:08 Tower kernel: netif_receive_skb_internal+0x8f/0x95 Oct 13 16:54:08 Tower kernel: napi_gro_frags+0x14d/0x185 Oct 13 16:54:08 Tower kernel: mlx4_en_process_rx_cq+0x83d/0x98b [mlx4_en] Oct 13 16:54:08 Tower kernel: ? __radix_tree_lookup+0x5a/0x7e Oct 13 16:54:08 Tower kernel: ? mlx4_cq_completion+0x1e/0x63 [mlx4_core] Oct 13 16:54:08 Tower kernel: ? mlx4_en_rx_irq+0x23/0x3e [mlx4_en] Oct 13 16:54:08 Tower kernel: mlx4_en_poll_rx_cq+0x66/0xc6 [mlx4_en] Oct 13 16:54:08 Tower kernel: net_rx_action+0xfb/0x24f Oct 13 16:54:08 Tower kernel: __do_softirq+0xcd/0x1c2 Oct 13 16:54:08 Tower kernel: irq_exit+0x4f/0x8e Oct 13 16:54:08 Tower kernel: do_IRQ+0xa5/0xbb Oct 13 16:54:08 Tower kernel: common_interrupt+0x7d/0x7d Oct 13 16:54:08 Tower kernel: </IRQ> Oct 13 16:54:08 Tower kernel: RIP: 0010:cpuidle_enter_state+0xe3/0x135 Oct 13 16:54:08 Tower kernel: RSP: 0018:ffffc900063f3ef8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff3e Oct 13 16:54:08 Tower kernel: RAX: ffff88105f1a0840 RBX: 0000000000000000 RCX: 000000000000001f Oct 13 16:54:08 Tower kernel: RDX: 0000535816dd0cf5 RSI: 0000000000020040 RDI: 0000000000000000 Oct 13 16:54:08 Tower kernel: RBP: ffff88105f1a8a00 R08: 0000b79a07f4bd4c R09: 0000000000000060 Oct 13 16:54:08 Tower kernel: R10: ffffc900063f3ed8 R11: 000000003f2a3ad4 R12: 0000000000000004 Oct 13 16:54:08 Tower kernel: R13: 0000535816dd0cf5 R14: ffffffff81c56618 R15: 0000535816b4e400 Oct 13 16:54:08 Tower kernel: ? 
cpuidle_enter_state+0xbb/0x135 Oct 13 16:54:08 Tower kernel: do_idle+0x11a/0x179 Oct 13 16:54:08 Tower kernel: cpu_startup_entry+0x18/0x1a Oct 13 16:54:08 Tower kernel: secondary_startup_64+0xa5/0xb0 Oct 13 16:54:08 Tower kernel: Code: 48 c1 eb 20 89 1c 24 e8 31 f9 ff ff 8b 54 24 04 89 df 89 c6 41 89 c5 e8 a9 fa ff ff 84 c0 75 b9 49 8b 86 80 00 00 00 a8 08 74 02 <0f> 0b 4c 89 f7 e8 03 ff ff ff 49 8b 86 80 00 00 00 0f ba e0 09 Oct 13 16:54:08 Tower kernel: ---[ end trace c9362264acbc3717 ]---

Can anyone point to what's going on here? I think it started after I added a disk and then removed it because it wouldn't partition correctly. I'm running a preclear on that disk now. I also put in a Mellanox 10GbE NIC (ConnectX), which is working fine, but I'm not sure whether that's what is making it throw errors. Here are the errors:

Quote
Oct 12 19:38:58 Tower kernel: CPU: 16 PID: 23101 Comm: kworker/16:1 Not tainted 4.14.49-unRAID #1
Oct 12 19:38:58 Tower kernel: Call Trace:
Oct 13 02:34:00 Tower kernel: CPU: 11 PID: 19070 Comm: kworker/11:0 Tainted: G W 4.14.49-unRAID #1
Oct 13 02:34:00 Tower kernel: Call Trace:
Oct 13 05:29:02 Tower kernel: CPU: 15 PID: 18306 Comm: kworker/15:8 Tainted: G W 4.14.49-unRAID #1
Oct 13 05:29:02 Tower kernel: Call Trace:
Oct 13 09:34:04 Tower kernel: CPU: 11 PID: 0 Comm: swapper/11 Tainted: G W 4.14.49-unRAID #1
Oct 13 09:34:04 Tower kernel: Call Trace:
Oct 13 14:44:07 Tower kernel: CPU: 16 PID: 0 Comm: swapper/16 Tainted: G W 4.14.49-unRAID #1
Oct 13 14:44:07 Tower kernel: Call Trace:
Oct 13 16:54:08 Tower kernel: CPU: 16 PID: 0 Comm: swapper/16 Tainted: G W 4.14.49-unRAID #1
Oct 13 16:54:08 Tower kernel: Call Trace:
Edited October 14, 2018 by slimshizn
JorgeB Posted October 14, 2018
8 hours ago, slimshizn said:
Oct 13 16:54:08 Tower kernel: mlx4_en_process_rx_cq+0x83d/0x98b [mlx4_en]
Oct 13 16:54:08 Tower kernel: ? __radix_tree_lookup+0x5a/0x7e
Oct 13 16:54:08 Tower kernel: ? mlx4_cq_completion+0x1e/0x63 [mlx4_core]
Oct 13 16:54:08 Tower kernel: ? mlx4_en_rx_irq+0x23/0x3e [mlx4_en]
Oct 13 16:54:08 Tower kernel: mlx4_en_poll_rx_cq+0x66/0xc6 [mlx4_en]

mlx4 is the Mellanox driver, so this is likely related to the NIC.
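For anyone reading along: the name in square brackets at the end of a call-trace frame is the kernel module that owns that function, which is how the trace points at mlx4. Here is a minimal Python sketch that pulls those module names out of a pasted trace. The function name `modules_in_trace` and the regular expression are my own illustration, not part of any Unraid tool, and the sample text is just the five lines quoted above.

```python
import re
from typing import Dict, List

# Matches a frame such as "mlx4_en_poll_rx_cq+0x66/0xc6 [mlx4_en]" and
# captures the symbol name and the module name in square brackets.
# Frames without a bracketed module (built-in kernel code) are skipped.
FRAME_RE = re.compile(r"([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]")


def modules_in_trace(log_text: str) -> Dict[str, List[str]]:
    """Map each module name to the symbols it contributed to the trace."""
    modules: Dict[str, List[str]] = {}
    for symbol, module in FRAME_RE.findall(log_text):
        modules.setdefault(module, []).append(symbol)
    return modules


sample = """\
Oct 13 16:54:08 Tower kernel: mlx4_en_process_rx_cq+0x83d/0x98b [mlx4_en]
Oct 13 16:54:08 Tower kernel: ? __radix_tree_lookup+0x5a/0x7e
Oct 13 16:54:08 Tower kernel: ? mlx4_cq_completion+0x1e/0x63 [mlx4_core]
Oct 13 16:54:08 Tower kernel: ? mlx4_en_rx_irq+0x23/0x3e [mlx4_en]
Oct 13 16:54:08 Tower kernel: mlx4_en_poll_rx_cq+0x66/0xc6 [mlx4_en]
"""

for module, symbols in modules_in_trace(sample).items():
    print(module, symbols)
```

A module showing up in a trace only means its code was on the stack when the warning fired; it is a hint about where to look, not proof of the cause.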
slimshizn Posted October 14, 2018
1 hour ago, johnnie.black said: mlx4 is the Mellanox driver, so likely related to the NIC

I'm assuming it's time to upgrade to 6.6.1 to get the latest, then? Or is there another way?
JorgeB Posted October 14, 2018
Upgrade to v6.6.2 and see if it makes a difference.
slimshizn Posted October 14, 2018
I'll post back with results.
slimshizn Posted October 14, 2018
It's been over an hour since I updated and there hasn't been a single issue. Thank you. (Came from 6.5.3.)
slimshizn Posted October 15, 2018 Author Share Posted October 15, 2018 Got up this morning with a few new ones. These look different. Quote Oct 15 02:39:16 Tower kernel: WARNING: CPU: 20 PID: 20336 at net/netfilter/nf_conntrack_core.c:763 __nf_conntrack_confirm+0x96/0x4fc Oct 15 02:39:16 Tower kernel: Modules linked in: macvlan xt_CHECKSUM iptable_mangle ipt_REJECT xt_nat ebtable_filter ebtables veth ip6table_filter ip6_tables vhost_net tun vhost tap ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs dm_crypt algif_skcipher af_alg md_mod dm_mod dax nct7904 bonding mlx4_en mlx4_core igb i2c_algo_bit sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper mpt3sas isci libsas ahci libahci i2c_i801 ipmi_ssif intel_cstate intel_uncore i2c_core intel_rapl_perf raid_class nvme scsi_transport_sas nvme_core wmi pcc_cpufreq ipmi_si button [last unloaded: md_mod] Oct 15 02:39:16 Tower kernel: CPU: 20 PID: 20336 Comm: kworker/20:2 Tainted: G W 4.18.14-unRAID #1 Oct 15 02:39:16 Tower kernel: Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015 Oct 15 02:39:16 Tower kernel: Workqueue: events macvlan_process_broadcast [macvlan] Oct 15 02:39:16 Tower kernel: RIP: 0010:__nf_conntrack_confirm+0x96/0x4fc Oct 15 02:39:16 Tower kernel: Code: c1 ed 20 89 2c 24 e8 26 f7 ff ff 8b 54 24 04 89 ef 89 c6 41 89 c5 e8 bc f8 ff ff 84 c0 75 b9 49 8b 86 80 00 00 00 a8 08 74 02 <0f> 0b 4c 89 f7 e8 04 ff ff ff 49 8b 86 80 00 00 00 0f ba e0 09 73 Oct 15 02:39:16 Tower kernel: RSP: 0018:ffff88085fa83d30 EFLAGS: 00010202 Oct 15 02:39:16 Tower kernel: RAX: 0000000000000188 RBX: ffff880152a1bc00 RCX: 0000000000000101 Oct 15 02:39:16 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff81e09160 Oct 15 02:39:16 Tower kernel: RBP: 000000000000aad8 R08: 
000000004b01a867 R09: ffff88105ab8c000 Oct 15 02:39:16 Tower kernel: R10: 0000000000000098 R11: ffff880dced60000 R12: ffffffff81e8cc80 Oct 15 02:39:16 Tower kernel: R13: 0000000000008895 R14: ffff8801113e5400 R15: ffff8801113e5458 Oct 15 02:39:16 Tower kernel: FS: 0000000000000000(0000) GS:ffff88085fa80000(0000) knlGS:0000000000000000 Oct 15 02:39:16 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Oct 15 02:39:16 Tower kernel: CR2: 000000c420c80060 CR3: 0000000001e0a006 CR4: 00000000001606e0 Oct 15 02:39:16 Tower kernel: Call Trace: Oct 15 02:39:16 Tower kernel: <IRQ> Oct 15 02:39:16 Tower kernel: ipv4_confirm+0xaf/0xb7 [nf_conntrack_ipv4] Oct 15 02:39:16 Tower kernel: nf_hook_slow+0x37/0x96 Oct 15 02:39:16 Tower kernel: ip_local_deliver+0xa7/0xd5 Oct 15 02:39:16 Tower kernel: ? inet_del_offload+0x3e/0x3e Oct 15 02:39:16 Tower kernel: ip_rcv+0x2dc/0x317 Oct 15 02:39:16 Tower kernel: ? ip_local_deliver_finish+0x1aa/0x1aa Oct 15 02:39:16 Tower kernel: __netif_receive_skb_core+0x6b2/0x740 Oct 15 02:39:16 Tower kernel: process_backlog+0x7e/0x116 Oct 15 02:39:16 Tower kernel: net_rx_action+0x10b/0x274 Oct 15 02:39:16 Tower kernel: __do_softirq+0xce/0x1c8 Oct 15 02:39:16 Tower kernel: do_softirq_own_stack+0x2a/0x40 Oct 15 02:39:16 Tower kernel: </IRQ> Oct 15 02:39:16 Tower kernel: do_softirq+0x4d/0x59 Oct 15 02:39:16 Tower kernel: netif_rx_ni+0x1c/0x22 Oct 15 02:39:16 Tower kernel: macvlan_broadcast+0x10f/0x153 [macvlan] Oct 15 02:39:16 Tower kernel: ? __switch_to_asm+0x34/0x70 Oct 15 02:39:16 Tower kernel: macvlan_process_broadcast+0xd5/0x131 [macvlan] Oct 15 02:39:16 Tower kernel: process_one_work+0x16e/0x243 Oct 15 02:39:16 Tower kernel: ? cancel_delayed_work_sync+0xa/0xa Oct 15 02:39:16 Tower kernel: worker_thread+0x1dc/0x2ac Oct 15 02:39:16 Tower kernel: kthread+0x10b/0x113 Oct 15 02:39:16 Tower kernel: ? 
kthread_flush_work_fn+0x9/0x9 Oct 15 02:39:16 Tower kernel: ret_from_fork+0x35/0x40 Oct 15 02:39:16 Tower kernel: ---[ end trace c068249681dd6fe7 ]---
slimshizn Posted October 15, 2018
Anyone have an idea?
JorgeB Posted October 15, 2018
Are you using custom IPs for dockers? Macvlan is usually related to that, and there was recently an issue with it; I'm not sure whether it has been fixed. If you are, try temporarily disabling them to see if that's the culprit.
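One way to check whether any Docker networks on the host use the macvlan driver is to ask the Docker CLI directly. Below is a rough Python sketch, assuming the `docker` command is on the PATH and the script runs on the server itself. The helper names `parse_network_names` and `list_macvlan_networks` are made up for this example; the `docker network ls` options used (`--filter driver=...` and `--format`) are standard Docker CLI options.

```python
import subprocess
from typing import List


def parse_network_names(output: str) -> List[str]:
    """Split the one-name-per-line CLI output into a clean list of names."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_macvlan_networks() -> List[str]:
    """Return the names of Docker networks that use the macvlan driver."""
    result = subprocess.run(
        ["docker", "network", "ls",
         "--filter", "driver=macvlan",
         "--format", "{{.Name}}"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the docker command fails
    )
    return parse_network_names(result.stdout)
```

Calling `list_macvlan_networks()` on the server should return something like the custom network names (for example `br0`) if containers with their own IP addresses are configured; an empty list would suggest macvlan is not in use.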
slimshizn Posted October 16, 2018
OK, I'll give it a shot. The only two with custom IPs are pihole and steam cache.
slimshizn Posted October 16, 2018
This has never happened to this server before: I came home to find that I could not connect to it at ALL. I had to do an unclean power-down to get it back (a parity check is running as I type this). I changed the network bonding mode from active-backup to 802.3ad, using the 10GbE link and two 1Gb Ethernet cables, and set up aggregation on my switch. So far I haven't seen any issues, but we'll see.
Edited October 16, 2018 by slimshizn
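After a bonding mode change like this, the Linux bonding driver reports the active mode and the state of each member link in `/proc/net/bonding/<bond name>`. Here is a small Python sketch that parses that file. It assumes the bond is called `bond0` (check the actual name on your system), and the sample text is illustrative only, not output from the server in this thread.

```python
from typing import Dict, Tuple


def parse_bond_status(text: str) -> Tuple[str, Dict[str, str]]:
    """Return (bonding mode, {slave interface: MII status}) from the proc file."""
    mode = ""
    slaves: Dict[str, str] = {}
    current = None  # slave interface whose section we are currently inside
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # The bond's own "MII Status" line comes before any slave
            # section, so it is ignored here.
            slaves[current] = value
    return mode, slaves


# Illustrative sample; on a real system read the file instead:
#   text = open("/proc/net/bonding/bond0").read()   # bond0 is an assumption
sample = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth4
MII Status: down
"""

mode, slaves = parse_bond_status(sample)
print(mode)
print(slaves)
```

If the mode line still shows active-backup, or a member link reports `down`, the switch-side aggregation and the server-side bond are probably not agreeing yet.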
Hoopster Posted October 16, 2018
On 10/15/2018 at 6:54 PM, slimshizn said: Ok I'll give it a shot. Only two that are custom are pihole and steam cache.

I fought the macvlan call traces for a very long time. In my case, they only happened with custom IP addresses on br0. I created a docker VLAN and assigned the dockers that I wanted to have their own IP addresses to br0.3 (the docker VLAN). Since then, I have had no macvlan call traces.

I got them more frequently with pihole, but the call traces are not generally tied to a specific docker. Again, for me they only occurred on br0, and for a couple of other users on br1. Since creating the docker VLAN, I have had no call traces for months on br0.3.

A couple of times I went several days without call traces, but sometimes I would get several in a day, and they would completely lock up the server so that only a hard reboot would work.
Edited October 17, 2018 by Hoopster
slimshizn Posted October 16, 2018
2 hours ago, Hoopster said: I fought the macvlan call traces for a very long time. In my case, they only happened with custom IP addresses on br0. I created a docker VLAN and assigned the dockers I wished to have their own IP address to br0.3 (the docker VLAN). Since then, I have had no macvlan call traces. I got them more frequently with pihole, but, the call traces are not generally related to a specific docker. Again, for me, they only occurred on br0 and for a couple of others, br1. Since creating the docker VLAN, I have had no call traces for months on br0.3 A couple of times, I went several days without call traces, but, sometimes I would get several in a day and they would completely lock up the server and only a hard reboot would work.

Thanks for the response. I already have pihole running on a Raspberry Pi; the one on unRAID was just a backup. I'm going to move my custom-IP dockers over to another server and see if that fixes the issues. I had a few lock-ups not too long ago, but they seem to have cleared on their own, and there is nothing in the syslog about them.
Hoopster Posted October 17, 2018
3 minutes ago, slimshizn said: Thanks for the response. I already have pihole running on a raspberry pi, the one on unraid was just as a backup.

Same here. As a result of my macvlan problems, I moved Pihole to an RPi, and the docker on unRAID is now only a backup. It's disabled 99.9% of the time, so there is no chance that unRAID being down means no ad blocking on the network (although that is easily fixed with a non-Pihole secondary DNS on the router).
slimshizn Posted October 17, 2018
Disabled pihole; no problems since.
slimshizn Posted October 24, 2018
The problem came back and crashed the server today. I believe it was due to running SteamCacheBundle with a fixed custom IP on br0. Funny, though, that none of this happened before I started using a 10GbE NIC.
slimshizn Posted November 13, 2018
@Hoopster I'm trying out your solution of creating a VLAN. Do I need to set anything up in my network to have this work with my VM? The VM is on 192.168.0.0/24, and I set up the VLAN on 192.168.1.0/24. The VLAN is br0.10. Is there anything else I need to do to have this work right? Thanks.
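A note on the addressing in this question: 192.168.0.0/24 and 192.168.1.0/24 are separate subnets, so traffic between the VM and anything on the VLAN has to pass through a router (or a layer 3 switch) that has an interface in both. A quick Python sketch with the standard `ipaddress` module shows the check. The two networks are taken from the post; the host addresses in the usage lines are hypothetical examples.

```python
import ipaddress

# Networks as described in the post above.
VM_NET = ipaddress.ip_network("192.168.0.0/24")
DOCKER_VLAN_NET = ipaddress.ip_network("192.168.1.0/24")


def needs_router(src: str, dst: str) -> bool:
    """True when src and dst do not share one of the two subnets."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for net in (VM_NET, DOCKER_VLAN_NET):
        if src_ip in net and dst_ip in net:
            return False  # same subnet: hosts can reach each other directly
    return True  # different subnets: traffic must be routed


# Hypothetical host addresses, for illustration only.
print(VM_NET.overlaps(DOCKER_VLAN_NET))               # the two ranges are disjoint
print(needs_router("192.168.0.50", "192.168.1.20"))   # VM to VLAN container
print(needs_router("192.168.1.10", "192.168.1.20"))   # two hosts on the VLAN
```

In practice this means the router and switch need to know about the VLAN as well, which is what the guide mentioned in the reply covers.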
Hoopster Posted November 13, 2018
32 minutes ago, slimshizn said: Anything else I need to do to have this work right? Thanks

I followed bonienl's guide in this thread and it worked like a charm. It's a great overview about docker/VM/VLAN networking in general.