gdeyoung Posted January 13, 2021

Can anyone take a look and let me know what is causing this kernel panic? Here is the trace from the syslog. I'm getting these fairly regularly on three different 6.9-RC2 Unraid servers.

Jan 12 07:15:08 Homeserver kernel: ------------[ cut here ]------------
Jan 12 07:15:08 Homeserver kernel: WARNING: CPU: 0 PID: 0 at net/netfilter/nf_conntrack_core.c:1120 __nf_conntrack_confirm+0x99/0x1e1
Jan 12 07:15:08 Homeserver kernel: Modules linked in: xt_CHECKSUM ipt_REJECT macvlan ip6table_mangle ip6table_nat iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_MASQUERADE iptable_filter iptable_nat nf_nat ip_tables xfs nfsd lockd grace sunrpc md_mod nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper drm backlight agpgart syscopyarea sysfillrect nvidia_uvm(PO) sysimgblt fb_sys_fops nvidia(PO) bonding ixgbe mdio igb i2c_algo_bit edac_mce_amd kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mpt3sas aesni_intel crypto_simd mxm_wmi raid_class wmi_bmof cryptd i2c_piix4 scsi_transport_sas wmi glue_helper i2c_core k10temp ccp ahci rapl libahci button acpi_cpufreq [last unloaded: mdio]
Jan 12 07:15:08 Homeserver kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: P S O 5.10.1-Unraid #1
Jan 12 07:15:08 Homeserver kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B78/X470 GAMING PRO CARBON (MS-7B78), BIOS 2.20 04/24/2018
Jan 12 07:15:08 Homeserver kernel: RIP: 0010:__nf_conntrack_confirm+0x99/0x1e1
Jan 12 07:15:08 Homeserver kernel: Code: e4 e3 ff ff 8b 54 24 14 89 c6 41 89 c4 48 c1 eb 20 89 df 41 89 de e8 54 e1 ff ff 84 c0 75 b8 48 8b 85 80 00 00 00 a8 08 74 18 <0f> 0b 89 df 44 89 e6 31 db e8 89 de ff ff e8 af e0 ff ff e9 1f 01
Jan 12 07:15:08 Homeserver kernel: RSP: 0018:ffffc90000003898 EFLAGS: 00010202
Jan 12 07:15:08 Homeserver kernel: RAX: 0000000000000188 RBX: 00000000000034f1 RCX: 000000003317c1ec
Jan 12 07:15:08 Homeserver kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8200a078
Jan 12 07:15:08 Homeserver kernel: RBP: ffff8882471a2f00 R08: 000000005cf7db60 R09: ffff888182b03440
Jan 12 07:15:08 Homeserver kernel: R10: 0000000000000158 R11: ffff888104bed100 R12: 000000000000d20e
Jan 12 07:15:08 Homeserver kernel: R13: ffffffff8210da40 R14: 00000000000034f1 R15: ffff8882471a2f0c
Jan 12 07:15:08 Homeserver kernel: FS: 0000000000000000(0000) GS:ffff8887fec00000(0000) knlGS:0000000000000000
Jan 12 07:15:08 Homeserver kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 12 07:15:08 Homeserver kernel: CR2: 0000145dc4003428 CR3: 00000001aceda000 CR4: 00000000003506f0
Jan 12 07:15:08 Homeserver kernel: Call Trace:
Jan 12 07:15:08 Homeserver kernel: <IRQ>
Jan 12 07:15:08 Homeserver kernel: nf_conntrack_confirm+0x2f/0x36
Jan 12 07:15:08 Homeserver kernel: nf_hook_slow+0x39/0x8e
Jan 12 07:15:08 Homeserver kernel: nf_hook.constprop.0+0xb1/0xd8
Jan 12 07:15:08 Homeserver kernel: ? ip_protocol_deliver_rcu+0xfe/0xfe
Jan 12 07:15:08 Homeserver kernel: ip_local_deliver+0x49/0x75
Jan 12 07:15:08 Homeserver kernel: ip_sabotage_in+0x43/0x4d
Jan 12 07:15:08 Homeserver kernel: nf_hook_slow+0x39/0x8e
Jan 12 07:15:08 Homeserver kernel: nf_hook.constprop.0+0xb1/0xd8
Jan 12 07:15:08 Homeserver kernel: ? l3mdev_l3_rcv.constprop.0+0x50/0x50
Jan 12 07:15:08 Homeserver kernel: ip_rcv+0x41/0x61
Jan 12 07:15:08 Homeserver kernel: __netif_receive_skb_one_core+0x74/0x95
Jan 12 07:15:08 Homeserver kernel: netif_receive_skb+0x79/0xa1
Jan 12 07:15:08 Homeserver kernel: br_handle_frame_finish+0x30d/0x351
Jan 12 07:15:08 Homeserver kernel: ? ipt_do_table+0x570/0x5c0 [ip_tables]
Jan 12 07:15:08 Homeserver kernel: ? br_pass_frame_up+0xda/0xda
Jan 12 07:15:08 Homeserver kernel: br_nf_hook_thresh+0xa3/0xc3
Jan 12 07:15:08 Homeserver kernel: ? br_pass_frame_up+0xda/0xda
Jan 12 07:15:08 Homeserver kernel: br_nf_pre_routing_finish+0x23d/0x264
Jan 12 07:15:08 Homeserver kernel: ? br_pass_frame_up+0xda/0xda
Jan 12 07:15:08 Homeserver kernel: ? br_handle_frame_finish+0x351/0x351
Jan 12 07:15:08 Homeserver kernel: ? nf_nat_ipv4_in+0x1e/0x4a [nf_nat]
Jan 12 07:15:08 Homeserver kernel: ? br_nf_forward_finish+0xd0/0xd0
Jan 12 07:15:08 Homeserver kernel: ? br_handle_frame_finish+0x351/0x351
Jan 12 07:15:08 Homeserver kernel: NF_HOOK+0xd7/0xf7
Jan 12 07:15:08 Homeserver kernel: ? br_nf_forward_finish+0xd0/0xd0
Jan 12 07:15:08 Homeserver kernel: br_nf_pre_routing+0x229/0x239
Jan 12 07:15:08 Homeserver kernel: ? br_nf_forward_finish+0xd0/0xd0
Jan 12 07:15:08 Homeserver kernel: br_handle_frame+0x25e/0x2a6
Jan 12 07:15:08 Homeserver kernel: ? br_pass_frame_up+0xda/0xda
Jan 12 07:15:08 Homeserver kernel: __netif_receive_skb_core+0x335/0x4e7
Jan 12 07:15:08 Homeserver kernel: ? find_busiest_group+0x3b/0x2bc
Jan 12 07:15:08 Homeserver kernel: __netif_receive_skb_list_core+0x78/0x104
Jan 12 07:15:08 Homeserver kernel: netif_receive_skb_list_internal+0x1bf/0x1f2
Jan 12 07:15:08 Homeserver kernel: ? dev_gro_receive+0x55d/0x578
Jan 12 07:15:08 Homeserver kernel: gro_normal_list+0x1d/0x39
Jan 12 07:15:08 Homeserver kernel: napi_complete_done+0x79/0x104
Jan 12 07:15:08 Homeserver kernel: ixgbe_poll+0xc95/0xd4f [ixgbe]
Jan 12 07:15:08 Homeserver kernel: ? update_cfs_rq_load_avg+0x14b/0x154
Jan 12 07:15:08 Homeserver kernel: net_rx_action+0xf4/0x29d
Jan 12 07:15:08 Homeserver kernel: __do_softirq+0xc4/0x1c2
Jan 12 07:15:08 Homeserver kernel: asm_call_irq_on_stack+0x12/0x20
Jan 12 07:15:08 Homeserver kernel: </IRQ>
Jan 12 07:15:08 Homeserver kernel: do_softirq_own_stack+0x2c/0x39
Jan 12 07:15:08 Homeserver kernel: __irq_exit_rcu+0x45/0x80
Jan 12 07:15:08 Homeserver kernel: common_interrupt+0x119/0x12e
Jan 12 07:15:08 Homeserver kernel: asm_common_interrupt+0x1e/0x40
Jan 12 07:15:08 Homeserver kernel: RIP: 0010:arch_local_irq_enable+0x7/0x8
Jan 12 07:15:08 Homeserver kernel: Code: 00 48 83 c4 28 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 9c 58 0f 1f 44 00 00 c3 fa 66 0f 1f 44 00 00 c3 fb 66 0f 1f 44 00 00 <c3> 55 8b af 28 04 00 00 b8 01 00 00 00 45 31 c9 53 45 31 d2 39 c5
Jan 12 07:15:08 Homeserver kernel: RSP: 0018:ffffffff82003e60 EFLAGS: 00000246
Jan 12 07:15:08 Homeserver kernel: RAX: ffff8887fec22300 RBX: 0000000000000002 RCX: 000000000000001f
Jan 12 07:15:08 Homeserver kernel: RDX: 0000000000000000 RSI: 00000000229839cd RDI: 0000000000000000
Jan 12 07:15:08 Homeserver kernel: RBP: ffff888101ae5400 R08: 000016ef236af812 R09: 0000000000000000
Jan 12 07:15:08 Homeserver kernel: R10: 0000000000004653 R11: 071c71c71c71c71c R12: 000016ef236af812
Jan 12 07:15:08 Homeserver kernel: R13: ffffffff820cad00 R14: 0000000000000002 R15: 0000000000000000
Jan 12 07:15:08 Homeserver kernel: cpuidle_enter_state+0x101/0x1c4
Jan 12 07:15:08 Homeserver kernel: cpuidle_enter+0x25/0x31
Jan 12 07:15:08 Homeserver kernel: do_idle+0x1a1/0x20f
Jan 12 07:15:08 Homeserver kernel: cpu_startup_entry+0x18/0x1a
Jan 12 07:15:08 Homeserver kernel: start_kernel+0x4c0/0x4e3
Jan 12 07:15:08 Homeserver kernel: secondary_startup_64_no_verify+0xb0/0xbb
Jan 12 07:15:08 Homeserver kernel: ---[ end trace 20ff384d3480aa64 ]---
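When comparing traces from several servers, it can help to reduce each one to its headline (WARNING, BUG, general protection fault, RCU stall) and the first kernel function shown in its `RIP:` line. The sketch below is a minimal, hypothetical helper (`summarize_traces` is not an existing tool). It assumes syslog lines in the `Mon DD HH:MM:SS host kernel: message` form used throughout this thread, and it deliberately skips user-space RIPs such as `RIP: 0033:0x14d90dbcddc0`.

```python
import re
from typing import Iterable, List, Tuple

# "<timestamp> <host> kernel: <message>", as in the Unraid syslog excerpts.
KERNEL_LINE = re.compile(r"^(\w{3} +\d+ [\d:]+) (\S+) kernel: (.*)$")
# Messages that open a new trace in these logs (assumed list, extend as needed).
HEADLINES = ("WARNING:", "BUG:", "general protection fault", "rcu: INFO:")
# "RIP: 0010:__nf_conntrack_confirm+0x99/0x1e1" -> "__nf_conntrack_confirm"
RIP = re.compile(r"RIP: [0-9a-f]{4}:([A-Za-z0-9_.]+)\+0x")


def summarize_traces(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, headline, first symbolic RIP function) per trace."""
    results: List[Tuple[str, str, str]] = []
    current = None  # [timestamp, headline, rip_function]
    for line in lines:
        m = KERNEL_LINE.match(line.strip())
        if not m:
            continue
        ts, _host, msg = m.groups()
        if msg.startswith(HEADLINES):
            if current:
                results.append(tuple(current))
            current = [ts, msg, ""]
            continue
        rip = RIP.search(msg)
        # Keep only the first symbolic RIP after the headline.
        if current and rip and not current[2]:
            current[2] = rip.group(1)
    if current:
        results.append(tuple(current))
    return results
```

Feeding it a whole syslog (for example `summarize_traces(open("syslog"))`) gives one row per event. That makes it easy to see whether the same functions, such as `__nf_conntrack_confirm` or `nf_nat_setup_info`, keep recurring across machines.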
gdeyoung Posted January 15, 2021

It happened again 24 hours later, this time as a page fault:

Jan 11 08:30:00 Homeserver kernel: BUG: unable to handle page fault for address: 00000000000053d8

Whole trace:

Jan 11 04:07:30 Homeserver kernel: br0: port 1(bond0) entered forwarding state
Jan 11 04:08:25 Homeserver flash_backup: adding task: php /usr/local/emhttp/plugins/dynamix.unraid.net/include/UpdateFlashBackup.php update
Jan 11 04:15:05 Homeserver kernel: br0: received packet on bond0 with own address as source address (addr:30:9c:23:af:51:e0, vlan:0)
Jan 11 04:15:05 Homeserver kernel: br0: received packet on bond0 with own address as source address (addr:30:9c:23:af:51:e0, vlan:0)
Jan 11 04:15:06 Homeserver kernel: br0: received packet on bond0 with own address as source address (addr:30:9c:23:af:51:e0, vlan:0)
Jan 11 04:15:07 Homeserver kernel: br0: received packet on bond0 with own address as source address (addr:30:9c:23:af:51:e0, vlan:0)
Jan 11 04:15:07 Homeserver kernel: br0: received packet on bond0 with own address as source address (addr:30:9c:23:af:51:e0, vlan:0)
Jan 11 04:15:06 Homeserver kernel: br0: received packet on bond0 with own address as source address (addr:30:9c:23:af:51:e0, vlan:0)
Jan 11 05:08:32 Homeserver flash_backup: adding task: php /usr/local/emhttp/plugins/dynamix.unraid.net/include/UpdateFlashBackup.php update
Jan 11 06:08:38 Homeserver flash_backup: adding task: php /usr/local/emhttp/plugins/dynamix.unraid.net/include/UpdateFlashBackup.php update
Jan 11 07:04:35 Homeserver vsftpd[1787]: connect from 192.168.1.1 (192.168.1.1)
Jan 11 07:04:35 Homeserver sshd[1788]: Connection from 192.168.1.1 port 50482 on 192.168.1.251 port 22 rdomain ""
Jan 11 07:04:35 Homeserver sshd[1788]: error: kex_exchange_identification: Connection closed by remote host
Jan 11 07:04:35 Homeserver sshd[1788]: Connection closed by 192.168.1.1 port 50482
Jan 11 07:04:35 Homeserver vsftpd[1794]: connect from 192.168.1.1 (192.168.1.1)
Jan 11 07:04:35 Homeserver vsftpd[1796]: connect from 192.168.1.1 (192.168.1.1)
Jan 11 07:04:35 Homeserver vsftpd[1798]: connect from 192.168.1.1 (192.168.1.1)
Jan 11 07:04:35 Homeserver vsftpd[1800]: connect from 192.168.1.1 (192.168.1.1)
Jan 11 07:04:35 Homeserver vsftpd[1802]: connect from 192.168.1.1 (192.168.1.1)
Jan 11 07:04:46 Homeserver smbd[1790]: [2021/01/11 07:04:46.002732, 0] ../../source3/smbd/process.c:341(read_packet_remainder)
Jan 11 07:04:46 Homeserver smbd[1790]: read_fd_with_timeout failed for client 192.168.1.1 read error = NT_STATUS_END_OF_FILE.
Jan 11 07:08:45 Homeserver flash_backup: adding task: php /usr/local/emhttp/plugins/dynamix.unraid.net/include/UpdateFlashBackup.php update
Jan 11 08:08:51 Homeserver flash_backup: adding task: php /usr/local/emhttp/plugins/dynamix.unraid.net/include/UpdateFlashBackup.php update
Jan 11 08:30:00 Homeserver kernel: BUG: unable to handle page fault for address: 00000000000053d8
Jan 11 08:30:00 Homeserver kernel: #PF: supervisor write access in kernel mode
Jan 11 08:30:00 Homeserver kernel: #PF: error_code(0x0002) - not-present page
Jan 11 08:30:00 Homeserver kernel: PGD 0 P4D 0
Jan 11 08:30:00 Homeserver kernel: Oops: 0002 [#1] SMP NOPTI
Jan 11 08:30:00 Homeserver kernel: CPU: 6 PID: 11516 Comm: php7 Tainted: P S W O 5.10.1-Unraid #1
Jan 11 08:30:00 Homeserver kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B78/X470 GAMING PRO CARBON (MS-7B78), BIOS 2.20 04/24/2018
Jan 11 08:30:00 Homeserver kernel: RIP: 0010:slab_post_alloc_hook+0xe1/0x14a
Jan 11 08:30:00 Homeserver kernel: Code: 0b 48 8b 55 08 f0 48 83 02 01 eb 04 65 48 ff 02 48 89 44 24 08 e8 da 4b f6 ff 49 8b 57 38 48 89 ef 48 8b 44 24 08 48 83 e2 fe <48> 89 2c c2 49 8b 07 8b 53 08 8b 4b 18 48 c1 e8 3a 48 8b 34 c5 a0
Jan 11 08:30:00 Homeserver kernel: RSP: 0000:ffffc90008cbfd90 EFLAGS: 00010202
Jan 11 08:30:00 Homeserver kernel: RAX: 0000000000000a79 RBX: ffff888100045500 RCX: 0000000000000005
Jan 11 08:30:00 Homeserver kernel: RDX: 0000000000000010 RSI: ffff888187c80000 RDI: ffff888164310f40
Jan 11 08:30:00 Homeserver kernel: RBP: ffff888164310f40 R08: ffffc90008cbfdd8 R09: 0000000080000000
Jan 11 08:30:00 Homeserver kernel: R10: ffffea00061f2000 R11: 000000000000ffff R12: 0000000000000000
Jan 11 08:30:00 Homeserver kernel: R13: 0000000000000cc0 R14: ffffc90008cbfdd8 R15: ffffea00061f2000
Jan 11 08:30:00 Homeserver kernel: FS: 000014d90dc12d48(0000) GS:ffff8887fed80000(0000) knlGS:0000000000000000
Jan 11 08:30:00 Homeserver kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 11 08:30:00 Homeserver kernel: CR2: 00000000000053d8 CR3: 0000000272046000 CR4: 00000000003506e0
Jan 11 08:30:00 Homeserver kernel: Call Trace:
Jan 11 08:30:00 Homeserver kernel: ? __anon_vma_prepare+0x2b/0x110
Jan 11 08:30:00 Homeserver kernel: kmem_cache_alloc+0x108/0x130
Jan 11 08:30:00 Homeserver kernel: __anon_vma_prepare+0x2b/0x110
Jan 11 08:30:00 Homeserver kernel: handle_mm_fault+0xd5d/0xec3
Jan 11 08:30:00 Homeserver kernel: exc_page_fault+0x253/0x36d
Jan 11 08:30:00 Homeserver kernel: ? asm_exc_page_fault+0x8/0x30
Jan 11 08:30:00 Homeserver kernel: asm_exc_page_fault+0x1e/0x30
Jan 11 08:30:00 Homeserver kernel: RIP: 0033:0x14d90dbcddc0
Jan 11 08:30:00 Homeserver kernel: Code: 27 48 89 47 2f 48 89 47 37 48 89 44 17 c1 48 89 44 17 c9 48 89 44 17 d1 48 89 44 17 d9 48 89 f8 c3 f7 c7 0f 00 00 00 49 89 f8 <48> 89 44 17 f8 48 89 d1 75 0b 48 c1 e9 03 f3 48 ab 4c 89 c0 c3 31
Jan 11 08:30:00 Homeserver kernel: RSP: 002b:00007ffd4698f118 EFLAGS: 00010206
Jan 11 08:30:00 Homeserver kernel: RAX: 0000000000000000 RBX: 00007ffd4698f258 RCX: 000014d90dbb2c32
Jan 11 08:30:00 Homeserver kernel: RDX: 0000000000000f14 RSI: 0000000000000000 RDI: 000014d90c9910ec
Jan 11 08:30:00 Homeserver kernel: RBP: 00007ffd4698f630 R08: 000014d90c9910ec R09: 0000000000012000
Jan 11 08:30:00 Homeserver kernel: R10: 0000000000000012 R11: 000014d90c992000 R12: 0000000000016000
Jan 11 08:30:00 Homeserver kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000003
Jan 11 08:30:00 Homeserver kernel: Modules linked in: xt_CHECKSUM ipt_REJECT macvlan ip6table_mangle ip6table_nat iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_MASQUERADE iptable_filter iptable_nat nf_nat ip_tables xfs nfsd lockd grace sunrpc md_mod nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper drm backlight agpgart syscopyarea sysfillrect nvidia_uvm(PO) sysimgblt fb_sys_fops nvidia(PO) bonding ixgbe mdio igb i2c_algo_bit edac_mce_amd kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel i2c_piix4 crypto_simd i2c_core mxm_wmi mpt3sas wmi_bmof cryptd ahci ccp raid_class scsi_transport_sas glue_helper wmi k10temp libahci rapl button acpi_cpufreq [last unloaded: mdio]
Jan 11 08:30:00 Homeserver kernel: CR2: 00000000000053d8
Jan 11 08:30:00 Homeserver kernel: ---[ end trace 5ed146f2988e06a2 ]---
Jan 11 08:30:00 Homeserver kernel: RIP: 0010:slab_post_alloc_hook+0xe1/0x14a
Jan 11 08:30:00 Homeserver kernel: Code: 0b 48 8b 55 08 f0 48 83 02 01 eb 04 65 48 ff 02 48 89 44 24 08 e8 da 4b f6 ff 49 8b 57 38 48 89 ef 48 8b 44 24 08 48 83 e2 fe <48> 89 2c c2 49 8b 07 8b 53 08 8b 4b 18 48 c1 e8 3a 48 8b 34 c5 a0
Jan 11 08:30:00 Homeserver kernel: RSP: 0000:ffffc90008cbfd90 EFLAGS: 00010202
Jan 11 08:30:00 Homeserver kernel: RAX: 0000000000000a79 RBX: ffff888100045500 RCX: 0000000000000005
Jan 11 08:30:00 Homeserver kernel: RDX: 0000000000000010 RSI: ffff888187c80000 RDI: ffff888164310f40
Jan 11 08:30:00 Homeserver kernel: RBP: ffff888164310f40 R08: ffffc90008cbfdd8 R09: 0000000080000000
Jan 11 08:30:00 Homeserver kernel: R10: ffffea00061f2000 R11: 000000000000ffff R12: 0000000000000000
Jan 11 08:30:00 Homeserver kernel: R13: 0000000000000cc0 R14: ffffc90008cbfdd8 R15: ffffea00061f2000
Jan 11 08:30:00 Homeserver kernel: FS: 000014d90dc12d48(0000) GS:ffff8887fed80000(0000) knlGS:0000000000000000
Jan 11 08:30:00 Homeserver kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 11 08:30:00 Homeserver kernel: CR2: 00000000000053d8 CR3: 0000000272046000 CR4: 00000000003506e0
Jan 11 08:44:18 Homeserver kernel: general protection fault, probably for non-canonical address 0xdead000000000122: 0000 [#2] SMP NOPTI
Jan 11 08:44:18 Homeserver kernel: CPU: 12 PID: 0 Comm: swapper/12 Tainted: P S D W O 5.10.1-Unraid #1
Jan 11 08:44:18 Homeserver kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B78/X470 GAMING PRO CARBON (MS-7B78), BIOS 2.20 04/24/2018
Jan 11 08:44:18 Homeserver kernel: RIP: 0010:nf_nat_setup_info+0x5f3/0x652 [nf_nat]
Jan 11 08:44:18 Homeserver kernel: Code: 8b 05 01 56 00 00 48 8d 8b 98 00 00 00 4a 8d 14 e0 48 8b 02 48 89 93 a0 00 00 00 48 89 83 98 00 00 00 48 85 c0 48 89 0a 74 04 <48> 89 48 08 4c 89 ef e8 bf 34 5b e1 eb 13 41 83 fc 01 75 0d 48 81
Jan 11 08:44:18 Homeserver kernel: RSP: 0018:ffffc900003a0838 EFLAGS: 00010286
Jan 11 08:44:18 Homeserver kernel: RAX: dead000000000122 RBX: ffff88826a62c3c0 RCX: ffff88826a62c458
Jan 11 08:44:18 Homeserver kernel: RDX: ffff888187ca9e60 RSI: 00000000bdba70db RDI: ffffffffa0146550
Jan 11 08:44:18 Homeserver kernel: RBP: ffffc900003a0900 R08: 000000003221cea3 R09: ffff888235dfdf80
Jan 11 08:44:18 Homeserver kernel: R10: 0000000000000098 R11: ffff88826ca1df00 R12: 00000000000053cc
Jan 11 08:44:18 Homeserver kernel: R13: ffffffffa0146550 R14: ffffc900003a0914 R15: 0000000000000001
Jan 11 08:44:18 Homeserver kernel: FS: 0000000000000000(0000) GS:ffff8887fef00000(0000) knlGS:0000000000000000
Jan 11 08:44:18 Homeserver kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 11 08:44:18 Homeserver kernel: CR2: 000034f46d1b1000 CR3: 00000001e1848000 CR4: 00000000003506e0
Jan 11 08:44:18 Homeserver kernel: Call Trace:
Jan 11 08:44:18 Homeserver kernel: <IRQ>
Jan 11 08:44:18 Homeserver kernel: ? ipt_do_table+0x4a2/0x5c0 [ip_tables]
Jan 11 08:44:18 Homeserver kernel: nf_nat_alloc_null_binding+0x71/0x88 [nf_nat]
Jan 11 08:44:18 Homeserver kernel: nf_nat_inet_fn+0x91/0x182 [nf_nat]
Jan 11 08:44:18 Homeserver kernel: nf_hook_slow+0x39/0x8e
Jan 11 08:44:18 Homeserver kernel: nf_hook.constprop.0+0xb1/0xd8
Jan 11 08:44:18 Homeserver kernel: ? ip_protocol_deliver_rcu+0xfe/0xfe
Jan 11 08:44:18 Homeserver kernel: ip_local_deliver+0x49/0x75
Jan 11 08:44:18 Homeserver kernel: __netif_receive_skb_one_core+0x74/0x95
Jan 11 08:44:18 Homeserver kernel: netif_receive_skb+0x79/0xa1
Jan 11 08:44:18 Homeserver kernel: br_handle_frame_finish+0x30d/0x351
Jan 11 08:44:18 Homeserver kernel: ? ipt_do_table+0x570/0x5c0 [ip_tables]
Jan 11 08:44:18 Homeserver kernel: ? br_pass_frame_up+0xda/0xda
Jan 11 08:44:18 Homeserver kernel: br_nf_hook_thresh+0xa3/0xc3
Jan 11 08:44:18 Homeserver kernel: ? br_pass_frame_up+0xda/0xda
Jan 11 08:44:18 Homeserver kernel: br_nf_pre_routing_finish+0x23d/0x264
Jan 11 08:44:18 Homeserver kernel: ? br_pass_frame_up+0xda/0xda
Jan 11 08:44:18 Homeserver kernel: ? br_handle_frame_finish+0x351/0x351
Jan 11 08:44:18 Homeserver kernel: ? nf_nat_ipv4_in+0x1e/0x4a [nf_nat]
Jan 11 08:44:18 Homeserver kernel: ? br_nf_forward_finish+0xd0/0xd0
Jan 11 08:44:18 Homeserver kernel: ? br_handle_frame_finish+0x351/0x351
Jan 11 08:44:18 Homeserver kernel: NF_HOOK+0xd7/0xf7
Jan 11 08:44:18 Homeserver kernel: ? br_nf_forward_finish+0xd0/0xd0
Jan 11 08:44:18 Homeserver kernel: br_nf_pre_routing+0x229/0x239
Jan 11 08:44:18 Homeserver kernel: ? br_nf_forward_finish+0xd0/0xd0
Jan 11 08:44:18 Homeserver kernel: br_handle_frame+0x25e/0x2a6
Jan 11 08:44:18 Homeserver kernel: ? br_pass_frame_up+0xda/0xda
Jan 11 08:44:18 Homeserver kernel: __netif_receive_skb_core+0x335/0x4e7
Jan 11 08:44:18 Homeserver kernel: __netif_receive_skb_list_core+0x78/0x104
Jan 11 08:44:18 Homeserver kernel: netif_receive_skb_list_internal+0x1bf/0x1f2
Jan 11 08:44:18 Homeserver kernel: ? dev_gro_receive+0x55d/0x578
Jan 11 08:44:18 Homeserver kernel: gro_normal_list+0x1d/0x39
Jan 11 08:44:18 Homeserver kernel: napi_complete_done+0x79/0x104
Jan 11 08:44:18 Homeserver kernel: ixgbe_poll+0xc95/0xd4f [ixgbe]
Jan 11 08:44:18 Homeserver kernel: net_rx_action+0xf4/0x29d
Jan 11 08:44:18 Homeserver kernel: __do_softirq+0xc4/0x1c2
Jan 11 08:44:18 Homeserver kernel: asm_call_irq_on_stack+0x12/0x20
Jan 11 08:44:18 Homeserver kernel: </IRQ>
Jan 11 08:44:18 Homeserver kernel: do_softirq_own_stack+0x2c/0x39
Jan 11 08:44:18 Homeserver kernel: __irq_exit_rcu+0x45/0x80
Jan 11 08:44:18 Homeserver kernel: common_interrupt+0x119/0x12e
Jan 11 08:44:18 Homeserver kernel: asm_common_interrupt+0x1e/0x40
Jan 11 08:44:18 Homeserver kernel: RIP: 0010:arch_local_irq_enable+0x7/0x8
Jan 11 08:44:18 Homeserver kernel: Code: 00 48 83 c4 28 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 9c 58 0f 1f 44 00 00 c3 fa 66 0f 1f 44 00 00 c3 fb 66 0f 1f 44 00 00 <c3> 55 8b af 28 04 00 00 b8 01 00 00 00 45 31 c9 53 45 31 d2 39 c5
Jan 11 08:44:18 Homeserver kernel: RSP: 0018:ffffc9000016fea0 EFLAGS: 00000246
Jan 11 08:44:18 Homeserver kernel: RAX: ffff8887fef22300 RBX: 0000000000000002 RCX: 000000000000001f
Jan 11 08:44:18 Homeserver kernel: RDX: 0000000000000000 RSI: 0000000022983893 RDI: 0000000000000000
Jan 11 08:44:18 Homeserver kernel: RBP: ffff8881059e0c00 R08: 0000b5548bebb872 R09: 0000b55487f71fc0
Jan 11 08:44:18 Homeserver kernel: R10: 00000000000012f5 R11: 071c71c71c71c71c R12: 0000b5548bebb872
Jan 11 08:44:18 Homeserver kernel: R13: ffffffff820cad00 R14: 0000000000000002 R15: 0000000000000000
Jan 11 08:44:18 Homeserver kernel: cpuidle_enter_state+0x101/0x1c4
Jan 11 08:44:18 Homeserver kernel: cpuidle_enter+0x25/0x31
Jan 11 08:44:18 Homeserver kernel: do_idle+0x1a1/0x20f
Jan 11 08:44:18 Homeserver kernel: cpu_startup_entry+0x18/0x1a
Jan 11 08:44:18 Homeserver kernel: secondary_startup_64_no_verify+0xb0/0xbb
Jan 11 08:44:18 Homeserver kernel: Modules linked in: xt_CHECKSUM ipt_REJECT macvlan ip6table_mangle ip6table_nat iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_MASQUERADE iptable_filter iptable_nat nf_nat ip_tables xfs nfsd lockd grace sunrpc md_mod nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper drm backlight agpgart syscopyarea sysfillrect nvidia_uvm(PO) sysimgblt fb_sys_fops nvidia(PO) bonding ixgbe mdio igb i2c_algo_bit edac_mce_amd kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel i2c_piix4 crypto_simd i2c_core mxm_wmi mpt3sas wmi_bmof cryptd ahci ccp raid_class scsi_transport_sas glue_helper wmi k10temp libahci rapl button acpi_cpufreq [last unloaded: mdio]
Jan 11 08:44:18 Homeserver kernel: ---[ end trace 5ed146f2988e06a3 ]---
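The two fault addresses in this trace look quite different. 0x53d8 is a tiny address, which usually means a NULL or near-NULL pointer plus an offset. 0xdead000000000122 is the value x86-64 kernels write as `LIST_POISON2` (0x122 + `POISON_POINTER_DELTA`, which is 0xdead000000000000 on x86-64) into list entries that have been deleted. Faulting on that value commonly indicates a list entry was used after removal, or list corruption, though it does not by itself identify the culprit. The sketch below classifies an address under two assumptions: 48-bit virtual addressing (4-level paging) and a "near-NULL" cutoff of 0x10000, which is a heuristic chosen for illustration and not a kernel constant. The function names are hypothetical.

```python
# Poison values from include/linux/poison.h, assuming the x86-64 default
# POISON_POINTER_DELTA (CONFIG_ILLEGAL_POINTER_VALUE) of 0xdead000000000000.
POISON_DELTA = 0xDEAD000000000000
LIST_POISON1 = 0x100 + POISON_DELTA
LIST_POISON2 = 0x122 + POISON_DELTA
NEAR_NULL_LIMIT = 0x10000  # heuristic cutoff, not a kernel constant


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) are all 0 or all 1."""
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (64 - va_bits + 1)) - 1


def classify_fault_address(addr: int) -> str:
    """Rough label for an address reported in an oops or GPF message."""
    if addr == LIST_POISON1:
        return "LIST_POISON1 (list entry accessed after deletion)"
    if addr == LIST_POISON2:
        return "LIST_POISON2 (list entry accessed after deletion)"
    if addr < NEAR_NULL_LIMIT:
        return "near-NULL (likely NULL pointer plus an offset)"
    if not is_canonical(addr):
        return "non-canonical"
    return "canonical"
```

For example, `classify_fault_address(0xdead000000000122)` reports `LIST_POISON2`, and `classify_fault_address(0x53d8)` reports near-NULL.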
gdeyoung Posted January 15, 2021

Are the above traces connected to the Nvidia driver at all? After a reboot this morning, I'm seeing this repeated many times in the log:

Jan 15 10:12:30 Homeserver kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Jan 15 10:12:30 Homeserver kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Jan 15 10:12:32 Homeserver kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Jan 15 10:12:32 Homeserver kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Jan 15 10:12:34 Homeserver kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Jan 15 10:12:34 Homeserver kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Jan 15 10:12:35 Homeserver kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Jan 15 10:12:35 Homeserver kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Jan 15 10:12:36 Homeserver kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Jan 15 10:12:36 Homeserver kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Jan 15 10:12:38 Homeserver kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Jan 15 10:12:38 Homeserver kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Jan 15 10:12:39 Homeserver kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Jan 15 10:12:39 Homeserver kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
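The "resource sanity check" message compares two inclusive address ranges. The nvidia module asked to map 0xc0000-0xfffff, but the PCI bus window it falls in only reaches 0xdffff, so the request runs past the end of the window. The sketch below illustrates that arithmetic. The helper names are hypothetical, and the ranges are copied from the log lines above.

```python
def range_size(start: int, end: int) -> int:
    """Size in bytes of an inclusive [start, end] range, as the kernel prints it."""
    return end - start + 1


def bytes_outside(window: tuple, request: tuple) -> int:
    """Bytes of `request` lying below the start or past the end of `window`."""
    below = max(0, window[0] - request[0])
    above = max(0, request[1] - window[1])
    return below + above


request = (0x000C0000, 0x000FFFFF)  # range requested by the nvidia module
window = (0x000C0000, 0x000DFFFF)   # PCI Bus 0000:00 window from the message

print(f"request {range_size(*request) // 1024} KiB, "
      f"window {range_size(*window) // 1024} KiB, "
      f"outside window {bytes_outside(window, request) // 1024} KiB")
# prints: request 256 KiB, window 128 KiB, outside window 128 KiB
```

This explains only what "spans more than" means. It does not show whether these messages are related to the crashes.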
JorgeB Posted January 15, 2021

IIRC that was caused by the GPU Statistics plugin, or a related one.
gdeyoung Posted January 15, 2021

Yes, I have the GPU Statistics plugin installed. Any insight on the kernel panic traces above?
JorgeB Posted January 15, 2021

Not really, but it looks LAN-related, so try to simplify the network config as much as possible.
gdeyoung Posted January 15, 2021

Here is my setup:
The motherboard has an integrated 1G Ethernet port, assigned to Eth0.
An Intel dual-port 10G SFP+ card is assigned to Eth1 and Eth2.
A single DAC cable is plugged into Eth1 on the 10G card.
Eth0, Eth1 and Eth2 are configured as an active bridge, br0.
This is all the default config.

I went into the BIOS and disabled the built-in 1G port (Eth0), because I wanted port 0 of the 10G card to become Eth0. On boot, the server reports an error that Eth0 can't be found. How do I make the server forget the disabled 1G port and assign 10G port 0 to Eth0?
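Before reassigning ports, it helps to know which MAC address belongs to which physical port. Linux exposes each interface's current MAC in `/sys/class/net/<iface>/address`, and the sketch below simply reads those files. `interface_macs` is a hypothetical helper. One caveat: when ports are enslaved to a bond, `address` may show the bond's MAC rather than the port's own. In that case, the "Permanent HW addr" lines in `/proc/net/bonding/bond0` are the more reliable source.

```python
import os
from typing import Dict


def interface_macs(sys_net: str = "/sys/class/net") -> Dict[str, str]:
    """Map each network interface name to the MAC address sysfs reports."""
    macs: Dict[str, str] = {}
    for name in sorted(os.listdir(sys_net)):
        try:
            with open(os.path.join(sys_net, name, "address")) as f:
                macs[name] = f.read().strip()
        except OSError:
            # Not an interface directory (e.g. bonding_masters) or it vanished.
            continue
    return macs
```

On the server, `for name, mac in interface_macs().items(): print(name, mac)` lists each interface with its MAC. Those MACs can then be matched against the labels on the card or the switch's port table.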
gdeyoung Posted January 15, 2021

OK, I figured it out; I was going about it backwards. In Network Settings you can rearrange the NICs' MAC addresses to assign them to whichever Eth port you want. I just moved the MAC address of 10G port 0 to the Eth0 configuration.

To simplify the networking, I also turned off the bond across Eth0-2, which was set to active-passive (that was the Unraid default, BTW). I'm betting it was bouncing, since I only had 10G port 1 (Eth1) plugged in.

I will report back on the stability.

Edited January 15, 2021 by gdeyoung
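Rather than guessing whether the active-backup bond was bouncing, you can read the bonding driver's status file, `/proc/net/bonding/bond0`. It reports fields such as "Bonding Mode", "Currently Active Slave" and, for each slave, "MII Status", "Link Failure Count" and "Permanent HW addr". The sketch below parses that text, assuming the standard layout. The function names and the threshold are hypothetical. A port with no cable will simply stay down. What would suggest flapping is a Link Failure Count that keeps rising on the port that *is* cabled, so compare two readings taken some minutes apart.

```python
from typing import Dict, List, Tuple


def parse_bonding(text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Split /proc/net/bonding/<bond> into bond-level and per-slave fields."""
    bond: Dict[str, str] = {}
    slaves: List[Dict[str, str]] = []
    current = bond
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = {}  # every following field belongs to this slave
            slaves.append(current)
        current[key] = value
    return bond, slaves


def flapping_slaves(slaves: List[Dict[str, str]], threshold: int = 1) -> List[str]:
    """Names of slaves whose Link Failure Count is at least `threshold`."""
    return [s["Slave Interface"] for s in slaves
            if int(s.get("Link Failure Count", "0")) >= threshold]
```

Usage: `bond, slaves = parse_bonding(open("/proc/net/bonding/bond0").read())`, then inspect `bond["Currently Active Slave"]` and `flapping_slaves(slaves)`.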
gdeyoung Posted January 16, 2021

My second server just crashed with a kernel panic. All three servers are having panics, and they are all different hardware. Any ideas from this trace?

Jan 15 22:36:42 Mediaserver kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Jan 15 22:36:42 Mediaserver kernel: rcu: #0110-....: (59999 ticks this GP) idle=e7a/1/0x4000000000000000 softirq=11770626/11770626 fqs=14993
Jan 15 22:36:42 Mediaserver kernel: #011(t=60000 jiffies g=13660245 q=3404623)
Jan 15 22:36:42 Mediaserver kernel: NMI backtrace for cpu 0
Jan 15 22:36:42 Mediaserver kernel: CPU: 0 PID: 28592 Comm: kworker/u24:0 Tainted: P O 5.10.1-Unraid #1
Jan 15 22:36:42 Mediaserver kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z390 Extreme4, BIOS P2.30 12/25/2018
Jan 15 22:36:42 Mediaserver kernel: Workqueue: events_power_efficient gc_worker
Jan 15 22:36:42 Mediaserver kernel: Call Trace:
Jan 15 22:36:42 Mediaserver kernel: <IRQ>
Jan 15 22:36:42 Mediaserver kernel: dump_stack+0x6b/0x83
Jan 15 22:36:42 Mediaserver kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Jan 15 22:36:42 Mediaserver kernel: nmi_cpu_backtrace+0x7d/0x8f
Jan 15 22:36:42 Mediaserver kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
Jan 15 22:36:42 Mediaserver kernel: rcu_dump_cpu_stacks+0x9f/0xc6
Jan 15 22:36:42 Mediaserver kernel: rcu_sched_clock_irq+0x1ec/0x543
Jan 15 22:36:42 Mediaserver kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
Jan 15 22:36:42 Mediaserver kernel: update_process_times+0x50/0x6e
Jan 15 22:36:42 Mediaserver kernel: tick_sched_timer+0x36/0x64
Jan 15 22:36:42 Mediaserver kernel: __hrtimer_run_queues+0xb7/0x10b
Jan 15 22:36:42 Mediaserver kernel: ? tick_sched_do_timer+0x39/0x39
Jan 15 22:36:42 Mediaserver kernel: hrtimer_interrupt+0x8d/0x160
Jan 15 22:36:42 Mediaserver kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Jan 15 22:36:42 Mediaserver kernel: asm_call_irq_on_stack+0x12/0x20
Jan 15 22:36:42 Mediaserver kernel: </IRQ>
Jan 15 22:36:42 Mediaserver kernel: sysvec_apic_timer_interrupt+0x71/0x95
Jan 15 22:36:42 Mediaserver kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Jan 15 22:36:42 Mediaserver kernel: RIP: 0010:gc_worker+0xf4/0x240
Jan 15 22:36:42 Mediaserver kernel: Code: 5c 26 05 41 89 47 08 e9 bc 00 00 00 48 8b 15 ec 05 a4 00 29 d0 85 c0 7f 11 4c 89 ff e8 10 f0 ff ff ff 44 24 08 e9 9e 00 00 00 <85> db 0f 84 96 00 00 00 49 8b 87 80 00 00 00 a8 08 0f 84 87 00 00
Jan 15 22:36:42 Mediaserver kernel: RSP: 0018:ffffc9000525fe48 EFLAGS: 00000206
Jan 15 22:36:42 Mediaserver kernel: RAX: 0000000001447690 RBX: 0000000000000000 RCX: ffff888103000000
Jan 15 22:36:42 Mediaserver kernel: RDX: 00000001014602fe RSI: ffffc9000525fe5c RDI: ffff88840da28548
Jan 15 22:36:42 Mediaserver kernel: RBP: 000000000000c386 R08: 0000000000000000 R09: ffffffff815c56ac
Jan 15 22:36:42 Mediaserver kernel: R10: 8080808080808080 R11: ffff88830e1fa780 R12: ffffffff82547ec0
Jan 15 22:36:42 Mediaserver kernel: R13: 000000009fd57c44 R14: ffff88840da28548 R15: ffff88840da28500
Jan 15 22:36:42 Mediaserver kernel: ? nf_conntrack_free+0x2b/0x35
Jan 15 22:36:42 Mediaserver kernel: ? gc_worker+0x9a/0x240
Jan 15 22:36:42 Mediaserver kernel: process_one_work+0x13c/0x1d5
Jan 15 22:36:42 Mediaserver kernel: worker_thread+0x18b/0x22f
Jan 15 22:36:42 Mediaserver kernel: ? process_scheduled_works+0x27/0x27
Jan 15 22:36:42 Mediaserver kernel: kthread+0xe5/0xea
Jan 15 22:36:42 Mediaserver kernel: ? kthread_unpark+0x52/0x52
Jan 15 22:36:42 Mediaserver kernel: ret_from_fork+0x22/0x30
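This stall was detected while the CPU was inside `gc_worker` (see the `Workqueue: events_power_efficient gc_worker` line and `? nf_conntrack_free` in the trace), which is netfilter's connection-tracking garbage collector. So it may be worth watching how full the conntrack table gets during a large copy. The sketch below reads the standard `nf_conntrack_count` and `nf_conntrack_max` sysctls, which exist only while `nf_conntrack` is loaded. The helper names and sampling defaults are hypothetical, and a high ratio on its own would not prove anything about the cause.

```python
import os
import time


def read_int(path: str) -> int:
    """Read a single integer from a procfs/sysfs file."""
    with open(path) as f:
        return int(f.read())


def conntrack_usage(base: str = "/proc/sys/net/netfilter") -> float:
    """Fraction of the connection-tracking table in use (count / max)."""
    count = read_int(os.path.join(base, "nf_conntrack_count"))
    maximum = read_int(os.path.join(base, "nf_conntrack_max"))
    return count / maximum if maximum else 0.0


def watch_conntrack(interval: float = 5.0, samples: int = 60) -> None:
    """Print the table usage every `interval` seconds, `samples` times."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), f"conntrack {conntrack_usage():.1%} full")
        time.sleep(interval)
```

Running `watch_conntrack()` during a copy prints a timestamped line every five seconds for five minutes. Those lines can be lined up against the syslog afterwards.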
JorgeB Posted January 16, 2021

3 hours ago, gdeyoung said:
Any idea from this trace?

No, but look for a BIOS update and/or try a different release (with a different kernel).
gdeyoung Posted January 20, 2021

OK, an update on this thread. I went back to 6.8.3 on the 2nd and 3rd of my 4 servers that are kernel panicking, and they are still panicking and crashing daily. The only server not experiencing any issues is my 4th one, which is connected at 1G. All of my 10G servers are panicking, and I have replaced their NICs with Intel server-class 10G NICs. I have now moved my 2nd server back to a 1G connection to see if it stays stable.

I have more log snippets from the 10G servers. It looks like they are also hitting native_queued_spin_lock_slowpath in the panic.

Call Trace:
Jan 19 12:52:28 Mediaserver kernel: <IRQ>
Jan 19 12:52:28 Mediaserver kernel: dump_stack+0x67/0x83
Jan 19 12:52:28 Mediaserver kernel: nmi_cpu_backtrace+0x71/0x83
Jan 19 12:52:28 Mediaserver kernel: ? lapic_can_unplug_cpu+0x97/0x97
Jan 19 12:52:28 Mediaserver kernel: nmi_trigger_cpumask_backtrace+0x57/0xd4
Jan 19 12:52:28 Mediaserver kernel: rcu_dump_cpu_stacks+0x8b/0xb4
Jan 19 12:52:28 Mediaserver kernel: rcu_check_callbacks+0x296/0x5a0
Jan 19 12:52:28 Mediaserver kernel: update_process_times+0x24/0x47
Jan 19 12:52:28 Mediaserver kernel: tick_sched_timer+0x36/0x64
Jan 19 12:52:28 Mediaserver kernel: __hrtimer_run_queues+0xb7/0x10b
Jan 19 12:52:28 Mediaserver kernel: ? tick_sched_handle.isra.0+0x2f/0x2f
Jan 19 12:52:28 Mediaserver kernel: hrtimer_interrupt+0xf4/0x20e
Jan 19 12:52:28 Mediaserver kernel: smp_apic_timer_interrupt+0x7b/0x93
Jan 19 12:52:28 Mediaserver kernel: apic_timer_interrupt+0xf/0x20
Jan 19 12:52:28 Mediaserver kernel: </IRQ>
RIP: 0010:native_queued_spin_lock_slowpath+0x6b/0x171
Jan 19 12:52:28 Mediaserver kernel: Code: 42 f0 8b 07 30 e4 09 c6 f7 c6 00 ff ff ff 74 0e 81 e6 00 ff 00 00 75 1a c6 47 01 00 eb 14 85 f6 74 0a 8b 07 84 c0 74 04 f3 90 <eb> f6 66 c7 07 01 00 c3 48 c7 c2 40 07 02 00 65 48 03 15 80 6a f8
Jan 19 12:52:28 Mediaserver kernel: RSP: 0018:ffffc90003ce3b88 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Jan 19 12:52:28 Mediaserver kernel: RAX: 00000000001c0101 RBX: ffffc90003ce3c10 RCX: 000ffffffffff000
Jan 19 12:52:28 Mediaserver kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffea002085d368
Jan 19 12:52:28 Mediaserver kernel: RBP: ffffea0004d77200 R08: ffff888000000000 R09: ffffea0004d77240
Jan 19 12:52:28 Mediaserver kernel: R10: 0000000000000008 R11: 0000000000023eb8 R12: ffffea0004d77200
Jan 19 12:52:28 Mediaserver kernel: R13: ffff8882ed6dc400 R14: ffffea0004d77200 R15: ffff888114684600

Edited January 20, 2021 by gdeyoung
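`native_queued_spin_lock_slowpath` is the path a CPU takes when it has to wait on a contended spinlock, so the CPU in this trace was spinning on a lock held elsewhere. Separately, the first trace in this thread shows 10G receive processing running in softirq context (`ixgbe_poll` under `net_rx_action`). One cheap check under load is whether NET_RX softirq work piles up on a single CPU. The sketch below diffs two snapshots of `/proc/softirqs`, assuming its standard layout (a `CPU0 CPU1 ...` header row followed by one row per softirq type). The helper names are hypothetical. The result only shows where receive softirqs ran, not which lock was contended.

```python
from typing import Dict, List


def parse_softirqs(text: str) -> Dict[str, List[int]]:
    """Parse /proc/softirqs into {softirq name: [count per CPU]}."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table: Dict[str, List[int]] = {}
    for line in lines[1:]:
        name, sep, rest = line.partition(":")
        if sep:
            table[name.strip()] = [int(x) for x in rest.split()[:ncpu]]
    return table


def net_rx_delta(before: str, after: str) -> List[int]:
    """NET_RX softirqs handled by each CPU between two snapshots."""
    old = parse_softirqs(before)["NET_RX"]
    new = parse_softirqs(after)["NET_RX"]
    return [n - o for n, o in zip(new, old)]


def read_softirqs(path: str = "/proc/softirqs") -> str:
    with open(path) as f:
        return f.read()
```

Take `read_softirqs()` once, wait about ten seconds during a large copy, take it again, and pass both snapshots to `net_rx_delta`. If one CPU accounts for nearly all of the delta, receive processing is concentrated on that core.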
gdeyoung Posted January 21, 2021 (edited)
Server 3 just panicked again. Again, this is a 10G server, also on its second 10G Intel NIC. It appears the panics happen more under large file copy loads on the 10G connection. I will move it back to 1G to see if it makes a difference.
Jan 19 16:27:05 Homeserver kernel: Call Trace:
Jan 19 16:27:05 Homeserver kernel: <IRQ>
Jan 19 16:27:05 Homeserver kernel: dump_stack+0x67/0x83
Jan 19 16:27:05 Homeserver kernel: nmi_cpu_backtrace+0x71/0x83
Jan 19 16:27:05 Homeserver kernel: ? lapic_can_unplug_cpu+0x97/0x97
Jan 19 16:27:05 Homeserver kernel: nmi_trigger_cpumask_backtrace+0x57/0xd4
Jan 19 16:27:05 Homeserver kernel: rcu_dump_cpu_stacks+0x8b/0xb4
Jan 19 16:27:05 Homeserver kernel: rcu_check_callbacks+0x296/0x5a0
Jan 19 16:27:05 Homeserver kernel: update_process_times+0x24/0x47
Jan 19 16:27:05 Homeserver kernel: tick_sched_timer+0x36/0x64
Jan 19 16:27:05 Homeserver kernel: __hrtimer_run_queues+0xb7/0x10b
Jan 19 16:27:05 Homeserver kernel: ? tick_sched_handle.isra.0+0x2f/0x2f
Jan 19 16:27:05 Homeserver kernel: hrtimer_interrupt+0xf4/0x20e
Jan 19 16:27:05 Homeserver kernel: smp_apic_timer_interrupt+0x7b/0x93
Jan 19 16:27:05 Homeserver kernel: apic_timer_interrupt+0xf/0x20
Jan 19 16:27:05 Homeserver kernel: </IRQ>
Jan 19 16:27:05 Homeserver kernel: RIP: 0010:gc_worker+0xad/0x270
Jan 19 16:27:05 Homeserver kernel: Code: f6 c6 01 0f 85 4a 01 00 00 41 0f b6 46 37 49 c7 c0 f0 ff ff ff 41 ff c5 48 6b c0 38 49 29 c0 4f 8d 3c 06 49 8b 97 80 00 00 00 <41> 8b 87 88 00 00 00 0f ba e2 0e 73 2c 48 8b 15 ce dc 88 00 29 d0
Jan 19 16:27:05 Homeserver kernel: RSP: 0018:ffffc9001683fe60 EFLAGS: 00000296 ORIG_RAX: ffffffffffffff13
Jan 19 16:27:05 Homeserver kernel: RAX: 0000000000000038 RBX: 0000000000000000 RCX: 0000000000010000
Jan 19 16:27:05 Homeserver kernel: RDX: 0000000000000188 RSI: 00000000000000ad RDI: ffff8887f610d500
Jan 19 16:27:05 Homeserver kernel: RBP: 0000000000005aae R08: ffffffffffffffb8 R09: ffffffff81574c00
Jan 19 16:27:05 Homeserver kernel: R10: ffffea000edc5700 R11: ffff8887f610d501 R12: ffffffff822aa760
Jan 19 16:27:05 Homeserver kernel: R13: 00000000dba74d6c R14: ffff8887abf8ca48 R15: ffff8887abf8ca00
Jan 19 16:27:05 Homeserver kernel: ? nf_ct_get_id+0x80/0xb7
Jan 19 16:27:05 Homeserver kernel: process_one_work+0x16e/0x24f
Jan 19 16:27:05 Homeserver kernel: worker_thread+0x1e2/0x2b8
Jan 19 16:27:05 Homeserver kernel: ? rescuer_thread+0x2a7/0x2a7
Jan 19 16:27:05 Homeserver kernel: kthread+0x10c/0x114
Jan 19 16:27:05 Homeserver kernel: ? kthread_park+0x89/0x89
Jan 19 16:27:05 Homeserver kernel: ret_from_fork+0x22/0x40
Edited January 21, 2021 by gdeyoung
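If it helps to check how closely the panics line up with sustained 10G load, one option is to log per-interface throughput continuously and compare the last samples with the panic timestamps. This is a minimal sketch, assuming the standard Linux /proc/net/dev layout (two header lines, then one line per interface with eight receive counters followed by eight transmit counters). The function names parse_net_dev, rates_mbps and monitor are hypothetical, not part of Unraid or any existing tool.

```python
import time
from pathlib import Path


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from the contents of /proc/net/dev."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Receive columns come first: field 0 is RX bytes, field 8 is TX bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates_mbps(before, after, seconds):
    """Turn two counter snapshots taken `seconds` apart into (rx, tx) Mbit/s."""
    rates = {}
    for iface, (rx1, tx1) in after.items():
        if iface in before:
            rx0, tx0 = before[iface]
            rates[iface] = ((rx1 - rx0) * 8 / seconds / 1e6,
                            (tx1 - tx0) * 8 / seconds / 1e6)
    return rates


def monitor(interval=5.0, path="/proc/net/dev"):
    """Print per-interface throughput roughly every `interval` seconds until interrupted."""
    previous, t0 = parse_net_dev(Path(path).read_text()), time.monotonic()
    while True:
        time.sleep(interval)
        current, t1 = parse_net_dev(Path(path).read_text()), time.monotonic()
        stamp = time.strftime("%b %d %H:%M:%S")
        for iface, (rx, tx) in sorted(rates_mbps(previous, current, t1 - t0).items()):
            print(f"{stamp} {iface}: rx {rx:.0f} Mbit/s, tx {tx:.0f} Mbit/s", flush=True)
        previous, t0 = current, t1
```

The elapsed time is measured with a monotonic clock rather than assumed from the sleep, so the rates stay sensible if a sample is delayed. Since Unraid keeps /var/log in RAM, redirecting the output to a file on a share or the array is what would let the last samples survive a panic and reboot.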
gdeyoung Posted January 22, 2021
So 2 days ago I switched my 2nd server from 10G to 1G. 1 day ago I switched my 3rd server from 10G to 1G. These are all different hardware machines, Intel and Ryzen, running a combination of 6.8.3 and 6.9-rc2. All of my servers on 10G (each on its swapped-out, 2nd 10G NIC) kernel panic under heavy/sustained file copy within 24 hours. Without heavy file load they panic within 72 hours. I have reworked and simplified the network configs. I have up-to-date BIOS on the motherboards. It all comes down to sustained load on the 10G Intel and Aquantia NICs. I have even tried three different 10G switches, new 10G DAC cables, and 10GBASE-T transceivers with Cat-7. It all comes back to something in the kernel that isn't right with heavy 10G network loads and causes panics. One thing I'm seeing is native_queued_spin_lock_slowpath errors before the full panic, but I'm not seeing high CPU loads. I found these two articles/posts that might have some relevance:
High CPU load by native_queued_spin_lock_slowpath (linuxquestions.org)
The need for speed and the kernel datapath - recent improvements in UDP packets processing - Red Hat Developer
What can be done to get 10G working in a stable fashion with sustained file copy loads? That's the whole reason for 10G... @limetech @JorgeB
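To see whether the native_queued_spin_lock_slowpath reports really precede the full panic, and which functions keep showing up across the different servers, the RIP entries can be pulled out of a saved syslog in order. This is a minimal sketch that assumes the classic "Mon DD HH:MM:SS host kernel: ..." syslog prefix seen in the traces in this thread; the names RIP_RE, rip_events and summarize are hypothetical helpers, not an existing tool.

```python
import re
from collections import Counter

# Matches entries such as
#   "Jan 19 16:27:05 Homeserver kernel: RIP: 0010:gc_worker+0xad/0x270"
#   "Jan 19 12:52:28 Mediaserver kernel: </IRQ> RIP: 0010:native_queued_spin_lock_slowpath+0x6b/0x171"
# Using \s+ between tokens lets it work on one-entry-per-line logs and on
# logs that were pasted as a single flattened line.
RIP_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?:</IRQ>\s+)?RIP:\s+[0-9a-f]{4}:(?P<func>[\w.]+)\+0x"
)


def rip_events(log_text):
    """Return (timestamp, host, function) for each RIP entry, in log order."""
    return [(m["ts"], m["host"], m["func"]) for m in RIP_RE.finditer(log_text)]


def summarize(log_text):
    """Count how often each (host, function) pair appears as the reported RIP."""
    return Counter((host, func) for _, host, func in rip_events(log_text))
```

Printing rip_events() for a syslog read from disk gives a timeline per host, and summarize(...).most_common() shows whether the same functions (for example the conntrack gc_worker or the spin-lock slow path) recur. It only reads what the kernel already logged; it does not diagnose the cause.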
JorgeB Posted January 22, 2021
3 hours ago, gdeyoung said: the kernel that isn't right with heavy 10G network loads and causes panics.
Seems to me like you're possibly generalizing and jumping to conclusions. I've been using 10GbE with various workloads for years in all my servers without issues, as have many other users. Since you never posted diagnostics, what NICs are you using?
gdeyoung Posted January 22, 2021
@JorgeB Thanks for continuing to engage, I really appreciate it. I have gone through troubleshooting to try to narrow down the issue. I have completely rebuilt two of the four servers with new components; the only remaining original parts are the drives, and I still get the panics. I have swapped out all of the network gear: three different 10G switches and new cables. I have removed all external items, or replaced them several times with new ones, and still get the panics. In the last couple of days I switched two of the servers back to 1G, and they are rock solid with no issues and are not chatty in the logs, whereas on 10G I was getting a variety of messages popping up in the logs every hour. This is not my first post on this. In my previous post I did post full diagnostics and got NO replies. I DM'd @limetech for help and still silence. So I am trying; I really would like to get this working. All I have to go on now are the panic traces, and I don't have the knowledge to troubleshoot at that level. Here are the two types of NICs I have used that are supposed to be fully supported:
TRENDnet TEG-10GECSFP: SFP+, Aquantia chipset
Supermicro AOC-STGF-i2S: dual SFP+, Intel chipset
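Since the two cards use different chipsets, it may help to confirm which kernel driver each interface is actually bound to before comparing results between servers (typically ixgbe for Intel 10G SFP+ cards and atlantic for Aquantia chips). A minimal sketch, assuming the standard Linux sysfs layout where each physical interface under /sys/class/net has a device/driver symlink named after its driver; nic_drivers is a hypothetical helper name. `ethtool -i <iface>` reports the same driver name plus the firmware version.

```python
import os
from pathlib import Path


def nic_drivers(sys_class_net="/sys/class/net"):
    """Map each network interface to the name of the kernel driver bound to it."""
    drivers = {}
    for iface in sorted(os.listdir(sys_class_net)):
        driver_link = Path(sys_class_net, iface, "device", "driver")
        # Virtual interfaces (lo, bridges, bonds, veth) normally have no
        # "device" link, so they are skipped.
        if driver_link.exists():
            # The symlink target ends in the driver's name, e.g. .../drivers/ixgbe
            drivers[iface] = os.path.basename(os.readlink(driver_link))
    return drivers
```

On a server with the Supermicro card, for example, the physical ports would be expected to report ixgbe, while bond0 and br0 would not appear at all.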
JorgeB Posted January 22, 2021
There have been some issues with Aquantia NICs; Intel should be OK. I use Mellanox myself.
JorgeB Posted January 22, 2021
To expand a little, I was just trying to point out that IMHO it won't be a general kernel issue; it could be a driver issue. Just recently, while re-organizing multiple servers, I transferred over 100TB using 6 different servers, all using 10GbE (though all Mellanox), at an average speed of around 400MB/s without any issues, and it's not the first time I have done similar large transfers with Unraid.
JorgeB Posted January 22, 2021
Also, did you ever try v6.8.3? Is it the same?
gdeyoung Posted January 22, 2021 (edited)
Yes, the Intel NICs seem to be more stable. I was also having transfer issues with some of my Windows PCs with Aquantia 10G NICs, so I switched to Intel 10G across the board. Yes, I rolled back to 6.8.3 and had the same issues with both NICs. My other observation is that the panics happen more often on the ingest servers, the ones I copy files to.
Edited January 22, 2021 by gdeyoung
Vr2Io Posted January 22, 2021 (edited)
I seldom see reports of call traces caused only by a 10G NIC that go away after switching to 1G, so I suspect a settings-related issue. Could you try safe mode (no plugins/Docker)? Does any network endpoint use jumbo frames? I use Intel, Emulex, and Mellanox NICs and haven't had issues on Unraid or Windows.
Edited January 22, 2021 by Vr2Io
gdeyoung Posted January 22, 2021
They have the default 1500 MTU on the 10G NICs. Should I be using 9000 for jumbo frames?
Vr2Io Posted January 22, 2021
Just now, gdeyoung said: They have the default 1500 MTU on the 10G NICs. Should I be using 9000 for jumbo frames?
I don't suggest it; 1500 MTU is fine.
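To put the MTU question in numbers, the arithmetic below compares how many full-size frames per second a saturated 10 Gbit/s link carries at MTU 1500 versus 9000. It assumes untagged Ethernet with 38 bytes of per-frame wire overhead (8 preamble/SFD, 14 header, 4 FCS, 12 inter-frame gap) and ignores IP/TCP headers. It only shows how per-frame work in the network stack scales with MTU; it says nothing about whether MTU is related to these panics, and 1500 remains the value suggested here. The names WIRE_OVERHEAD, frames_per_second and payload_efficiency are illustrative.

```python
# Per-frame overhead on the wire for untagged Ethernet, in bytes:
# 8 preamble/SFD + 14 MAC header + 4 FCS + 12 inter-frame gap.
WIRE_OVERHEAD = 8 + 14 + 4 + 12


def frames_per_second(mtu, link_bps=10e9):
    """Maximum full-size frames per second at line rate for a given MTU."""
    return link_bps / ((mtu + WIRE_OVERHEAD) * 8)


def payload_efficiency(mtu):
    """Fraction of wire time spent carrying IP packet bytes (IP/TCP headers not subtracted)."""
    return mtu / (mtu + WIRE_OVERHEAD)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {frames_per_second(mtu):,.0f} frames/s, "
          f"{payload_efficiency(mtu):.1%} of wire time")
```

This prints "MTU 1500: 812,744 frames/s, 97.5% of wire time" and "MTU 9000: 138,305 frames/s, 99.6% of wire time": roughly six times more frames at 1500 for only a few percent less payload, which is why jumbo frames mainly reduce per-packet CPU work rather than raise raw throughput.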
gdeyoung Posted January 22, 2021
OK, I put one of the servers in safe mode on 10G and am doing some file copies.
gdeyoung Posted January 22, 2021
The 3rd server is in safe mode and still going. I switched the 2nd server back to 10G in normal mode with no file copies, and it panicked in 20 minutes. Diagnostics attached:
mediaserver-diagnostics-20210122-1520.zip
Vr2Io Posted January 23, 2021 (edited)
8 hours ago, gdeyoung said: The 3rd server is in safe mode and still going. I switched the 2nd server back to 10G in normal mode with no file copies, and it panicked in 20 minutes. Diagnostics attached: mediaserver-diagnostics-20210122-1520.zip
That's a good milestone on the 3rd server. For the 2nd server (ASRock Z390), please try updating the BIOS (2.3 is quite old; I previously used an ASRock Z390 Taichi with BIOS 4.3 without issues). Then try safe mode there too, to make sure the hardware works with minimal software first.
Edited January 23, 2021 by Vr2Io