[SOLVED] 6.9-RC2 Kernel panic and trace



Can anyone take a look and tell me what is causing this kernel panic? Here is the trace from the syslog. I'm getting these fairly regularly on three different 6.9-RC2 Unraid servers.


Jan 12 07:15:08 Homeserver kernel: ------------[ cut here ]------------
Jan 12 07:15:08 Homeserver kernel: WARNING: CPU: 0 PID: 0 at net/netfilter/nf_conntrack_core.c:1120 __nf_conntrack_confirm+0x99/0x1e1
Jan 12 07:15:08 Homeserver kernel: Modules linked in: xt_CHECKSUM ipt_REJECT macvlan ip6table_mangle ip6table_nat iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_MASQUERADE iptable_filter iptable_nat nf_nat ip_tables xfs nfsd lockd grace sunrpc md_mod nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper drm backlight agpgart syscopyarea sysfillrect nvidia_uvm(PO) sysimgblt fb_sys_fops nvidia(PO) bonding ixgbe mdio igb i2c_algo_bit edac_mce_amd kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mpt3sas aesni_intel crypto_simd mxm_wmi raid_class wmi_bmof cryptd i2c_piix4 scsi_transport_sas wmi glue_helper i2c_core k10temp ccp ahci rapl libahci button acpi_cpufreq [last unloaded: mdio]
Jan 12 07:15:08 Homeserver kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: P S         O      5.10.1-Unraid #1
Jan 12 07:15:08 Homeserver kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B78/X470 GAMING PRO CARBON (MS-7B78), BIOS 2.20 04/24/2018
Jan 12 07:15:08 Homeserver kernel: RIP: 0010:__nf_conntrack_confirm+0x99/0x1e1
Jan 12 07:15:08 Homeserver kernel: Code: e4 e3 ff ff 8b 54 24 14 89 c6 41 89 c4 48 c1 eb 20 89 df 41 89 de e8 54 e1 ff ff 84 c0 75 b8 48 8b 85 80 00 00 00 a8 08 74 18 <0f> 0b 89 df 44 89 e6 31 db e8 89 de ff ff e8 af e0 ff ff e9 1f 01
Jan 12 07:15:08 Homeserver kernel: RSP: 0018:ffffc90000003898 EFLAGS: 00010202
Jan 12 07:15:08 Homeserver kernel: RAX: 0000000000000188 RBX: 00000000000034f1 RCX: 000000003317c1ec
Jan 12 07:15:08 Homeserver kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8200a078
Jan 12 07:15:08 Homeserver kernel: RBP: ffff8882471a2f00 R08: 000000005cf7db60 R09: ffff888182b03440
Jan 12 07:15:08 Homeserver kernel: R10: 0000000000000158 R11: ffff888104bed100 R12: 000000000000d20e
Jan 12 07:15:08 Homeserver kernel: R13: ffffffff8210da40 R14: 00000000000034f1 R15: ffff8882471a2f0c
Jan 12 07:15:08 Homeserver kernel: FS:  0000000000000000(0000) GS:ffff8887fec00000(0000) knlGS:0000000000000000
Jan 12 07:15:08 Homeserver kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 12 07:15:08 Homeserver kernel: CR2: 0000145dc4003428 CR3: 00000001aceda000 CR4: 00000000003506f0
Jan 12 07:15:08 Homeserver kernel: Call Trace:
Jan 12 07:15:08 Homeserver kernel: <IRQ>
Jan 12 07:15:08 Homeserver kernel: nf_conntrack_confirm+0x2f/0x36
Jan 12 07:15:08 Homeserver kernel: nf_hook_slow+0x39/0x8e
Jan 12 07:15:08 Homeserver kernel: nf_hook.constprop.0+0xb1/0xd8
Jan 12 07:15:08 Homeserver kernel: ? ip_protocol_deliver_rcu+0xfe/0xfe
Jan 12 07:15:08 Homeserver kernel: ip_local_deliver+0x49/0x75
Jan 12 07:15:08 Homeserver kernel: ip_sabotage_in+0x43/0x4d
Jan 12 07:15:08 Homeserver kernel: nf_hook_slow+0x39/0x8e
Jan 12 07:15:08 Homeserver kernel: nf_hook.constprop.0+0xb1/0xd8
Jan 12 07:15:08 Homeserver kernel: ? l3mdev_l3_rcv.constprop.0+0x50/0x50
Jan 12 07:15:08 Homeserver kernel: ip_rcv+0x41/0x61
Jan 12 07:15:08 Homeserver kernel: __netif_receive_skb_one_core+0x74/0x95
Jan 12 07:15:08 Homeserver kernel: netif_receive_skb+0x79/0xa1
Jan 12 07:15:08 Homeserver kernel: br_handle_frame_finish+0x30d/0x351
Jan 12 07:15:08 Homeserver kernel: ? ipt_do_table+0x570/0x5c0 [ip_tables]
Jan 12 07:15:08 Homeserver kernel: ? br_pass_frame_up+0xda/0xda
Jan 12 07:15:08 Homeserver kernel: br_nf_hook_thresh+0xa3/0xc3
Jan 12 07:15:08 Homeserver kernel: ? br_pass_frame_up+0xda/0xda
Jan 12 07:15:08 Homeserver kernel: br_nf_pre_routing_finish+0x23d/0x264
Jan 12 07:15:08 Homeserver kernel: ? br_pass_frame_up+0xda/0xda
Jan 12 07:15:08 Homeserver kernel: ? br_handle_frame_finish+0x351/0x351
Jan 12 07:15:08 Homeserver kernel: ? nf_nat_ipv4_in+0x1e/0x4a [nf_nat]
Jan 12 07:15:08 Homeserver kernel: ? br_nf_forward_finish+0xd0/0xd0
Jan 12 07:15:08 Homeserver kernel: ? br_handle_frame_finish+0x351/0x351
Jan 12 07:15:08 Homeserver kernel: NF_HOOK+0xd7/0xf7
Jan 12 07:15:08 Homeserver kernel: ? br_nf_forward_finish+0xd0/0xd0
Jan 12 07:15:08 Homeserver kernel: br_nf_pre_routing+0x229/0x239
Jan 12 07:15:08 Homeserver kernel: ? br_nf_forward_finish+0xd0/0xd0
Jan 12 07:15:08 Homeserver kernel: br_handle_frame+0x25e/0x2a6
Jan 12 07:15:08 Homeserver kernel: ? br_pass_frame_up+0xda/0xda
Jan 12 07:15:08 Homeserver kernel: __netif_receive_skb_core+0x335/0x4e7
Jan 12 07:15:08 Homeserver kernel: ? find_busiest_group+0x3b/0x2bc
Jan 12 07:15:08 Homeserver kernel: __netif_receive_skb_list_core+0x78/0x104
Jan 12 07:15:08 Homeserver kernel: netif_receive_skb_list_internal+0x1bf/0x1f2
Jan 12 07:15:08 Homeserver kernel: ? dev_gro_receive+0x55d/0x578
Jan 12 07:15:08 Homeserver kernel: gro_normal_list+0x1d/0x39
Jan 12 07:15:08 Homeserver kernel: napi_complete_done+0x79/0x104
Jan 12 07:15:08 Homeserver kernel: ixgbe_poll+0xc95/0xd4f [ixgbe]
Jan 12 07:15:08 Homeserver kernel: ? update_cfs_rq_load_avg+0x14b/0x154
Jan 12 07:15:08 Homeserver kernel: net_rx_action+0xf4/0x29d
Jan 12 07:15:08 Homeserver kernel: __do_softirq+0xc4/0x1c2
Jan 12 07:15:08 Homeserver kernel: asm_call_irq_on_stack+0x12/0x20
Jan 12 07:15:08 Homeserver kernel: </IRQ>
Jan 12 07:15:08 Homeserver kernel: do_softirq_own_stack+0x2c/0x39
Jan 12 07:15:08 Homeserver kernel: __irq_exit_rcu+0x45/0x80
Jan 12 07:15:08 Homeserver kernel: common_interrupt+0x119/0x12e
Jan 12 07:15:08 Homeserver kernel: asm_common_interrupt+0x1e/0x40
Jan 12 07:15:08 Homeserver kernel: RIP: 0010:arch_local_irq_enable+0x7/0x8
Jan 12 07:15:08 Homeserver kernel: Code: 00 48 83 c4 28 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 9c 58 0f 1f 44 00 00 c3 fa 66 0f 1f 44 00 00 c3 fb 66 0f 1f 44 00 00 <c3> 55 8b af 28 04 00 00 b8 01 00 00 00 45 31 c9 53 45 31 d2 39 c5
Jan 12 07:15:08 Homeserver kernel: RSP: 0018:ffffffff82003e60 EFLAGS: 00000246
Jan 12 07:15:08 Homeserver kernel: RAX: ffff8887fec22300 RBX: 0000000000000002 RCX: 000000000000001f
Jan 12 07:15:08 Homeserver kernel: RDX: 0000000000000000 RSI: 00000000229839cd RDI: 0000000000000000
Jan 12 07:15:08 Homeserver kernel: RBP: ffff888101ae5400 R08: 000016ef236af812 R09: 0000000000000000
Jan 12 07:15:08 Homeserver kernel: R10: 0000000000004653 R11: 071c71c71c71c71c R12: 000016ef236af812
Jan 12 07:15:08 Homeserver kernel: R13: ffffffff820cad00 R14: 0000000000000002 R15: 0000000000000000
Jan 12 07:15:08 Homeserver kernel: cpuidle_enter_state+0x101/0x1c4
Jan 12 07:15:08 Homeserver kernel: cpuidle_enter+0x25/0x31
Jan 12 07:15:08 Homeserver kernel: do_idle+0x1a1/0x20f
Jan 12 07:15:08 Homeserver kernel: cpu_startup_entry+0x18/0x1a
Jan 12 07:15:08 Homeserver kernel: start_kernel+0x4c0/0x4e3
Jan 12 07:15:08 Homeserver kernel: secondary_startup_64_no_verify+0xb0/0xbb
Jan 12 07:15:08 Homeserver kernel: ---[ end trace 20ff384d3480aa64 ]---
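When several servers throw traces like this, it can help to tally which kernel function shows up on each `RIP:` line. Below is a minimal Python sketch; the regex is modeled on the lines in this thread and is an assumption, not a kernel-defined format. It also picks up context lines such as `arch_local_irq_enable` (the interrupted idle loop, not the fault), and it counts an oops twice when the kernel reprints its registers, as happens at 08:30 further down.

```python
import re
from collections import Counter

# Matches kernel-mode RIP lines such as
#   "RIP: 0010:__nf_conntrack_confirm+0x99/0x1e1"
#   "RIP: 0010:nf_nat_setup_info+0x5f3/0x652 [nf_nat]"
# Group 1 is the function name, group 2 the optional module in brackets.
# User-space RIPs like "RIP: 0033:0x14d90dbcddc0" have no "+off/len" and are skipped.
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def tally_rip_functions(log_text: str) -> Counter:
    """Count how often each kernel function appears as the RIP of a trace."""
    counts = Counter()
    for line in log_text.splitlines():
        m = RIP_RE.search(line)
        if m:
            func, module = m.group(1), m.group(2)
            counts[f"{func} [{module}]" if module else func] += 1
    return counts
```

Feeding it a whole syslog and printing `tally_rip_functions(text).most_common(10)` gives a quick ranking of where the traces land.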


It happened again 24 hours later, this time with a page fault:

 

Jan 11 08:30:00 Homeserver kernel: BUG: unable to handle page fault for address: 00000000000053d8

 

Whole trace:

 

Jan 11 04:07:30 Homeserver kernel: br0: port 1(bond0) entered forwarding state
Jan 11 04:08:25 Homeserver flash_backup: adding task: php /usr/local/emhttp/plugins/dynamix.unraid.net/include/UpdateFlashBackup.php update
Jan 11 04:15:05 Homeserver kernel: br0: received packet on bond0 with own address as source address (addr:30:9c:23:af:51:e0, vlan:0)
Jan 11 04:15:05 Homeserver kernel: br0: received packet on bond0 with own address as source address (addr:30:9c:23:af:51:e0, vlan:0)
Jan 11 04:15:06 Homeserver kernel: br0: received packet on bond0 with own address as source address (addr:30:9c:23:af:51:e0, vlan:0)
Jan 11 04:15:07 Homeserver kernel: br0: received packet on bond0 with own address as source address (addr:30:9c:23:af:51:e0, vlan:0)
Jan 11 04:15:07 Homeserver kernel: br0: received packet on bond0 with own address as source address (addr:30:9c:23:af:51:e0, vlan:0)
Jan 11 04:15:06 Homeserver kernel: br0: received packet on bond0 with own address as source address (addr:30:9c:23:af:51:e0, vlan:0)
Jan 11 05:08:32 Homeserver flash_backup: adding task: php /usr/local/emhttp/plugins/dynamix.unraid.net/include/UpdateFlashBackup.php update
Jan 11 06:08:38 Homeserver flash_backup: adding task: php /usr/local/emhttp/plugins/dynamix.unraid.net/include/UpdateFlashBackup.php update
Jan 11 07:04:35 Homeserver vsftpd[1787]: connect from 192.168.1.1 (192.168.1.1)
Jan 11 07:04:35 Homeserver sshd[1788]: Connection from 192.168.1.1 port 50482 on 192.168.1.251 port 22 rdomain ""
Jan 11 07:04:35 Homeserver sshd[1788]: error: kex_exchange_identification: Connection closed by remote host
Jan 11 07:04:35 Homeserver sshd[1788]: Connection closed by 192.168.1.1 port 50482
Jan 11 07:04:35 Homeserver vsftpd[1794]: connect from 192.168.1.1 (192.168.1.1)
Jan 11 07:04:35 Homeserver vsftpd[1796]: connect from 192.168.1.1 (192.168.1.1)
Jan 11 07:04:35 Homeserver vsftpd[1798]: connect from 192.168.1.1 (192.168.1.1)
Jan 11 07:04:35 Homeserver vsftpd[1800]: connect from 192.168.1.1 (192.168.1.1)
Jan 11 07:04:35 Homeserver vsftpd[1802]: connect from 192.168.1.1 (192.168.1.1)
Jan 11 07:04:46 Homeserver smbd[1790]: [2021/01/11 07:04:46.002732,  0] ../../source3/smbd/process.c:341(read_packet_remainder)
Jan 11 07:04:46 Homeserver smbd[1790]:   read_fd_with_timeout failed for client 192.168.1.1 read error = NT_STATUS_END_OF_FILE.
Jan 11 07:08:45 Homeserver flash_backup: adding task: php /usr/local/emhttp/plugins/dynamix.unraid.net/include/UpdateFlashBackup.php update
Jan 11 08:08:51 Homeserver flash_backup: adding task: php /usr/local/emhttp/plugins/dynamix.unraid.net/include/UpdateFlashBackup.php update
Jan 11 08:30:00 Homeserver kernel: BUG: unable to handle page fault for address: 00000000000053d8
Jan 11 08:30:00 Homeserver kernel: #PF: supervisor write access in kernel mode
Jan 11 08:30:00 Homeserver kernel: #PF: error_code(0x0002) - not-present page
Jan 11 08:30:00 Homeserver kernel: PGD 0 P4D 0 
Jan 11 08:30:00 Homeserver kernel: Oops: 0002 [#1] SMP NOPTI
Jan 11 08:30:00 Homeserver kernel: CPU: 6 PID: 11516 Comm: php7 Tainted: P S      W  O      5.10.1-Unraid #1
Jan 11 08:30:00 Homeserver kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B78/X470 GAMING PRO CARBON (MS-7B78), BIOS 2.20 04/24/2018
Jan 11 08:30:00 Homeserver kernel: RIP: 0010:slab_post_alloc_hook+0xe1/0x14a
Jan 11 08:30:00 Homeserver kernel: Code: 0b 48 8b 55 08 f0 48 83 02 01 eb 04 65 48 ff 02 48 89 44 24 08 e8 da 4b f6 ff 49 8b 57 38 48 89 ef 48 8b 44 24 08 48 83 e2 fe <48> 89 2c c2 49 8b 07 8b 53 08 8b 4b 18 48 c1 e8 3a 48 8b 34 c5 a0
Jan 11 08:30:00 Homeserver kernel: RSP: 0000:ffffc90008cbfd90 EFLAGS: 00010202
Jan 11 08:30:00 Homeserver kernel: RAX: 0000000000000a79 RBX: ffff888100045500 RCX: 0000000000000005
Jan 11 08:30:00 Homeserver kernel: RDX: 0000000000000010 RSI: ffff888187c80000 RDI: ffff888164310f40
Jan 11 08:30:00 Homeserver kernel: RBP: ffff888164310f40 R08: ffffc90008cbfdd8 R09: 0000000080000000
Jan 11 08:30:00 Homeserver kernel: R10: ffffea00061f2000 R11: 000000000000ffff R12: 0000000000000000
Jan 11 08:30:00 Homeserver kernel: R13: 0000000000000cc0 R14: ffffc90008cbfdd8 R15: ffffea00061f2000
Jan 11 08:30:00 Homeserver kernel: FS:  000014d90dc12d48(0000) GS:ffff8887fed80000(0000) knlGS:0000000000000000
Jan 11 08:30:00 Homeserver kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 11 08:30:00 Homeserver kernel: CR2: 00000000000053d8 CR3: 0000000272046000 CR4: 00000000003506e0
Jan 11 08:30:00 Homeserver kernel: Call Trace:
Jan 11 08:30:00 Homeserver kernel: ? __anon_vma_prepare+0x2b/0x110
Jan 11 08:30:00 Homeserver kernel: kmem_cache_alloc+0x108/0x130
Jan 11 08:30:00 Homeserver kernel: __anon_vma_prepare+0x2b/0x110
Jan 11 08:30:00 Homeserver kernel: handle_mm_fault+0xd5d/0xec3
Jan 11 08:30:00 Homeserver kernel: exc_page_fault+0x253/0x36d
Jan 11 08:30:00 Homeserver kernel: ? asm_exc_page_fault+0x8/0x30
Jan 11 08:30:00 Homeserver kernel: asm_exc_page_fault+0x1e/0x30
Jan 11 08:30:00 Homeserver kernel: RIP: 0033:0x14d90dbcddc0
Jan 11 08:30:00 Homeserver kernel: Code: 27 48 89 47 2f 48 89 47 37 48 89 44 17 c1 48 89 44 17 c9 48 89 44 17 d1 48 89 44 17 d9 48 89 f8 c3 f7 c7 0f 00 00 00 49 89 f8 <48> 89 44 17 f8 48 89 d1 75 0b 48 c1 e9 03 f3 48 ab 4c 89 c0 c3 31
Jan 11 08:30:00 Homeserver kernel: RSP: 002b:00007ffd4698f118 EFLAGS: 00010206
Jan 11 08:30:00 Homeserver kernel: RAX: 0000000000000000 RBX: 00007ffd4698f258 RCX: 000014d90dbb2c32
Jan 11 08:30:00 Homeserver kernel: RDX: 0000000000000f14 RSI: 0000000000000000 RDI: 000014d90c9910ec
Jan 11 08:30:00 Homeserver kernel: RBP: 00007ffd4698f630 R08: 000014d90c9910ec R09: 0000000000012000
Jan 11 08:30:00 Homeserver kernel: R10: 0000000000000012 R11: 000014d90c992000 R12: 0000000000016000
Jan 11 08:30:00 Homeserver kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000003
Jan 11 08:30:00 Homeserver kernel: Modules linked in: xt_CHECKSUM ipt_REJECT macvlan ip6table_mangle ip6table_nat iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_MASQUERADE iptable_filter iptable_nat nf_nat ip_tables xfs nfsd lockd grace sunrpc md_mod nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper drm backlight agpgart syscopyarea sysfillrect nvidia_uvm(PO) sysimgblt fb_sys_fops nvidia(PO) bonding ixgbe mdio igb i2c_algo_bit edac_mce_amd kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel i2c_piix4 crypto_simd i2c_core mxm_wmi mpt3sas wmi_bmof cryptd ahci ccp raid_class scsi_transport_sas glue_helper wmi k10temp libahci rapl button acpi_cpufreq [last unloaded: mdio]
Jan 11 08:30:00 Homeserver kernel: CR2: 00000000000053d8
Jan 11 08:30:00 Homeserver kernel: ---[ end trace 5ed146f2988e06a2 ]---
Jan 11 08:30:00 Homeserver kernel: RIP: 0010:slab_post_alloc_hook+0xe1/0x14a
Jan 11 08:30:00 Homeserver kernel: Code: 0b 48 8b 55 08 f0 48 83 02 01 eb 04 65 48 ff 02 48 89 44 24 08 e8 da 4b f6 ff 49 8b 57 38 48 89 ef 48 8b 44 24 08 48 83 e2 fe <48> 89 2c c2 49 8b 07 8b 53 08 8b 4b 18 48 c1 e8 3a 48 8b 34 c5 a0
Jan 11 08:30:00 Homeserver kernel: RSP: 0000:ffffc90008cbfd90 EFLAGS: 00010202
Jan 11 08:30:00 Homeserver kernel: RAX: 0000000000000a79 RBX: ffff888100045500 RCX: 0000000000000005
Jan 11 08:30:00 Homeserver kernel: RDX: 0000000000000010 RSI: ffff888187c80000 RDI: ffff888164310f40
Jan 11 08:30:00 Homeserver kernel: RBP: ffff888164310f40 R08: ffffc90008cbfdd8 R09: 0000000080000000
Jan 11 08:30:00 Homeserver kernel: R10: ffffea00061f2000 R11: 000000000000ffff R12: 0000000000000000
Jan 11 08:30:00 Homeserver kernel: R13: 0000000000000cc0 R14: ffffc90008cbfdd8 R15: ffffea00061f2000
Jan 11 08:30:00 Homeserver kernel: FS:  000014d90dc12d48(0000) GS:ffff8887fed80000(0000) knlGS:0000000000000000
Jan 11 08:30:00 Homeserver kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 11 08:30:00 Homeserver kernel: CR2: 00000000000053d8 CR3: 0000000272046000 CR4: 00000000003506e0
Jan 11 08:44:18 Homeserver kernel: general protection fault, probably for non-canonical address 0xdead000000000122: 0000 [#2] SMP NOPTI
Jan 11 08:44:18 Homeserver kernel: CPU: 12 PID: 0 Comm: swapper/12 Tainted: P S    D W  O      5.10.1-Unraid #1
Jan 11 08:44:18 Homeserver kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B78/X470 GAMING PRO CARBON (MS-7B78), BIOS 2.20 04/24/2018
Jan 11 08:44:18 Homeserver kernel: RIP: 0010:nf_nat_setup_info+0x5f3/0x652 [nf_nat]
Jan 11 08:44:18 Homeserver kernel: Code: 8b 05 01 56 00 00 48 8d 8b 98 00 00 00 4a 8d 14 e0 48 8b 02 48 89 93 a0 00 00 00 48 89 83 98 00 00 00 48 85 c0 48 89 0a 74 04 <48> 89 48 08 4c 89 ef e8 bf 34 5b e1 eb 13 41 83 fc 01 75 0d 48 81
Jan 11 08:44:18 Homeserver kernel: RSP: 0018:ffffc900003a0838 EFLAGS: 00010286
Jan 11 08:44:18 Homeserver kernel: RAX: dead000000000122 RBX: ffff88826a62c3c0 RCX: ffff88826a62c458
Jan 11 08:44:18 Homeserver kernel: RDX: ffff888187ca9e60 RSI: 00000000bdba70db RDI: ffffffffa0146550
Jan 11 08:44:18 Homeserver kernel: RBP: ffffc900003a0900 R08: 000000003221cea3 R09: ffff888235dfdf80
Jan 11 08:44:18 Homeserver kernel: R10: 0000000000000098 R11: ffff88826ca1df00 R12: 00000000000053cc
Jan 11 08:44:18 Homeserver kernel: R13: ffffffffa0146550 R14: ffffc900003a0914 R15: 0000000000000001
Jan 11 08:44:18 Homeserver kernel: FS:  0000000000000000(0000) GS:ffff8887fef00000(0000) knlGS:0000000000000000
Jan 11 08:44:18 Homeserver kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 11 08:44:18 Homeserver kernel: CR2: 000034f46d1b1000 CR3: 00000001e1848000 CR4: 00000000003506e0
Jan 11 08:44:18 Homeserver kernel: Call Trace:
Jan 11 08:44:18 Homeserver kernel: <IRQ>
Jan 11 08:44:18 Homeserver kernel: ? ipt_do_table+0x4a2/0x5c0 [ip_tables]
Jan 11 08:44:18 Homeserver kernel: nf_nat_alloc_null_binding+0x71/0x88 [nf_nat]
Jan 11 08:44:18 Homeserver kernel: nf_nat_inet_fn+0x91/0x182 [nf_nat]
Jan 11 08:44:18 Homeserver kernel: nf_hook_slow+0x39/0x8e
Jan 11 08:44:18 Homeserver kernel: nf_hook.constprop.0+0xb1/0xd8
Jan 11 08:44:18 Homeserver kernel: ? ip_protocol_deliver_rcu+0xfe/0xfe
Jan 11 08:44:18 Homeserver kernel: ip_local_deliver+0x49/0x75
Jan 11 08:44:18 Homeserver kernel: __netif_receive_skb_one_core+0x74/0x95
Jan 11 08:44:18 Homeserver kernel: netif_receive_skb+0x79/0xa1
Jan 11 08:44:18 Homeserver kernel: br_handle_frame_finish+0x30d/0x351
Jan 11 08:44:18 Homeserver kernel: ? ipt_do_table+0x570/0x5c0 [ip_tables]
Jan 11 08:44:18 Homeserver kernel: ? br_pass_frame_up+0xda/0xda
Jan 11 08:44:18 Homeserver kernel: br_nf_hook_thresh+0xa3/0xc3
Jan 11 08:44:18 Homeserver kernel: ? br_pass_frame_up+0xda/0xda
Jan 11 08:44:18 Homeserver kernel: br_nf_pre_routing_finish+0x23d/0x264
Jan 11 08:44:18 Homeserver kernel: ? br_pass_frame_up+0xda/0xda
Jan 11 08:44:18 Homeserver kernel: ? br_handle_frame_finish+0x351/0x351
Jan 11 08:44:18 Homeserver kernel: ? nf_nat_ipv4_in+0x1e/0x4a [nf_nat]
Jan 11 08:44:18 Homeserver kernel: ? br_nf_forward_finish+0xd0/0xd0
Jan 11 08:44:18 Homeserver kernel: ? br_handle_frame_finish+0x351/0x351
Jan 11 08:44:18 Homeserver kernel: NF_HOOK+0xd7/0xf7
Jan 11 08:44:18 Homeserver kernel: ? br_nf_forward_finish+0xd0/0xd0
Jan 11 08:44:18 Homeserver kernel: br_nf_pre_routing+0x229/0x239
Jan 11 08:44:18 Homeserver kernel: ? br_nf_forward_finish+0xd0/0xd0
Jan 11 08:44:18 Homeserver kernel: br_handle_frame+0x25e/0x2a6
Jan 11 08:44:18 Homeserver kernel: ? br_pass_frame_up+0xda/0xda
Jan 11 08:44:18 Homeserver kernel: __netif_receive_skb_core+0x335/0x4e7
Jan 11 08:44:18 Homeserver kernel: __netif_receive_skb_list_core+0x78/0x104
Jan 11 08:44:18 Homeserver kernel: netif_receive_skb_list_internal+0x1bf/0x1f2
Jan 11 08:44:18 Homeserver kernel: ? dev_gro_receive+0x55d/0x578
Jan 11 08:44:18 Homeserver kernel: gro_normal_list+0x1d/0x39
Jan 11 08:44:18 Homeserver kernel: napi_complete_done+0x79/0x104
Jan 11 08:44:18 Homeserver kernel: ixgbe_poll+0xc95/0xd4f [ixgbe]
Jan 11 08:44:18 Homeserver kernel: net_rx_action+0xf4/0x29d
Jan 11 08:44:18 Homeserver kernel: __do_softirq+0xc4/0x1c2
Jan 11 08:44:18 Homeserver kernel: asm_call_irq_on_stack+0x12/0x20
Jan 11 08:44:18 Homeserver kernel: </IRQ>
Jan 11 08:44:18 Homeserver kernel: do_softirq_own_stack+0x2c/0x39
Jan 11 08:44:18 Homeserver kernel: __irq_exit_rcu+0x45/0x80
Jan 11 08:44:18 Homeserver kernel: common_interrupt+0x119/0x12e
Jan 11 08:44:18 Homeserver kernel: asm_common_interrupt+0x1e/0x40
Jan 11 08:44:18 Homeserver kernel: RIP: 0010:arch_local_irq_enable+0x7/0x8
Jan 11 08:44:18 Homeserver kernel: Code: 00 48 83 c4 28 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 9c 58 0f 1f 44 00 00 c3 fa 66 0f 1f 44 00 00 c3 fb 66 0f 1f 44 00 00 <c3> 55 8b af 28 04 00 00 b8 01 00 00 00 45 31 c9 53 45 31 d2 39 c5
Jan 11 08:44:18 Homeserver kernel: RSP: 0018:ffffc9000016fea0 EFLAGS: 00000246
Jan 11 08:44:18 Homeserver kernel: RAX: ffff8887fef22300 RBX: 0000000000000002 RCX: 000000000000001f
Jan 11 08:44:18 Homeserver kernel: RDX: 0000000000000000 RSI: 0000000022983893 RDI: 0000000000000000
Jan 11 08:44:18 Homeserver kernel: RBP: ffff8881059e0c00 R08: 0000b5548bebb872 R09: 0000b55487f71fc0
Jan 11 08:44:18 Homeserver kernel: R10: 00000000000012f5 R11: 071c71c71c71c71c R12: 0000b5548bebb872
Jan 11 08:44:18 Homeserver kernel: R13: ffffffff820cad00 R14: 0000000000000002 R15: 0000000000000000
Jan 11 08:44:18 Homeserver kernel: cpuidle_enter_state+0x101/0x1c4
Jan 11 08:44:18 Homeserver kernel: cpuidle_enter+0x25/0x31
Jan 11 08:44:18 Homeserver kernel: do_idle+0x1a1/0x20f
Jan 11 08:44:18 Homeserver kernel: cpu_startup_entry+0x18/0x1a
Jan 11 08:44:18 Homeserver kernel: secondary_startup_64_no_verify+0xb0/0xbb
Jan 11 08:44:18 Homeserver kernel: Modules linked in: xt_CHECKSUM ipt_REJECT macvlan ip6table_mangle ip6table_nat iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_MASQUERADE iptable_filter iptable_nat nf_nat ip_tables xfs nfsd lockd grace sunrpc md_mod nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper drm backlight agpgart syscopyarea sysfillrect nvidia_uvm(PO) sysimgblt fb_sys_fops nvidia(PO) bonding ixgbe mdio igb i2c_algo_bit edac_mce_amd kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel i2c_piix4 crypto_simd i2c_core mxm_wmi mpt3sas wmi_bmof cryptd ahci ccp raid_class scsi_transport_sas glue_helper wmi k10temp libahci rapl button acpi_cpufreq [last unloaded: mdio]
Jan 11 08:44:18 Homeserver kernel: ---[ end trace 5ed146f2988e06a3 ]---
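The 04:15 bridge messages in this log say that br0 saw a frame arriving on bond0 with br0's own MAC as the source. That pattern is commonly associated with frames being reflected back (a switching loop, or a bond mode the switch side does not agree with), though the thread itself does not confirm the cause. A minimal Python sketch for counting these warnings per bridge, port, MAC, and VLAN follows; the regex is copied from the shape of the lines shown here and is an assumption, not a stable kernel format.

```python
import re
from collections import Counter

# Matches bridge warnings such as:
# "br0: received packet on bond0 with own address as source address (addr:30:9c:23:af:51:e0, vlan:0)"
OWN_ADDR_RE = re.compile(
    r"(\w+): received packet on (\w+) with own address as source address "
    r"\(addr:([0-9a-f:]{17}), vlan:(\d+)\)"
)


def count_own_address_warnings(log_text: str) -> Counter:
    """Count 'own address as source' warnings keyed by (bridge, port, mac, vlan)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = OWN_ADDR_RE.search(line)
        if m:
            bridge, port, mac, vlan = m.groups()
            counts[(bridge, port, mac, int(vlan))] += 1
    return counts
```

A burst of these right before a trace would be worth noting; a steady trickle on an otherwise quiet box suggests a persistent topology issue rather than a one-off.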


Are the traces above connected to the Nvidia driver at all? After a reboot this morning, I'm seeing the following repeated many times in the log:

 

Jan 15 10:12:30 Homeserver kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Jan 15 10:12:30 Homeserver kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Jan 15 10:12:32 Homeserver kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Jan 15 10:12:32 Homeserver kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Jan 15 10:12:34 Homeserver kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Jan 15 10:12:34 Homeserver kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Jan 15 10:12:35 Homeserver kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Jan 15 10:12:35 Homeserver kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Jan 15 10:12:36 Homeserver kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Jan 15 10:12:36 Homeserver kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Jan 15 10:12:38 Homeserver kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Jan 15 10:12:38 Homeserver kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Jan 15 10:12:39 Homeserver kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Jan 15 10:12:39 Homeserver kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
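One concrete link to Nvidia in the traces is the `Tainted:` field: the `P` and `O` flags are set because `nvidia(PO)` and friends appear in "Modules linked in". A taint flag only records that such a module is loaded; it does not show that the module caused the crash. The sketch below decodes the flag letters using the table from the kernel's `Documentation/admin-guide/tainted-kernels.rst` (5.10-era wording, paraphrased); the parsing regex is an assumption based on the lines in this thread.

```python
import re

# Single-letter taint flags, paraphrased from the kernel's tainted-kernels.rst.
TAINT_FLAGS = {
    "G": "only GPL-compatible modules loaded",
    "P": "proprietary module was loaded",
    "F": "module was force loaded",
    "S": "kernel running on an out-of-specification system",
    "R": "module was force unloaded",
    "M": "processor reported a Machine Check Exception (MCE)",
    "B": "bad page referenced or unexpected page flags",
    "U": "taint requested by userspace application",
    "D": "kernel died recently (OOPS or BUG)",
    "A": "ACPI table overridden by user",
    "W": "kernel issued warning",
    "C": "staging driver was loaded",
    "I": "workaround for bug in platform firmware applied",
    "O": "externally-built (out-of-tree) module was loaded",
    "E": "unsigned module was loaded",
    "L": "soft lockup occurred",
    "K": "kernel has been live patched",
    "X": "auxiliary taint (distro-defined)",
    "T": "kernel built with the struct randomization plugin",
}

# The flags sit between "Tainted:" and the kernel version, padded with spaces.
TAINT_RE = re.compile(r"Tainted: (.*?)\s+\d+\.\d+")


def decode_taint(line: str) -> list:
    """Return (letter, meaning) pairs for the taint flags on a CPU/Comm line."""
    m = TAINT_RE.search(line)
    if not m:
        return []  # e.g. "Not tainted"
    return [(c, TAINT_FLAGS.get(c, "unknown flag")) for c in m.group(1) if c != " "]
```

Applied to the 08:44 line, it reports `D` and `W` in addition to `P S O`, which matches the earlier 08:30 oops and the earlier conntrack warning.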


So I have a motherboard with an integrated 1G Ethernet port that is assigned as Eth0.

I have an Intel 10G 2-port SFP+ card whose ports are assigned as Eth1 and Eth2.

I have a single DAC cable plugged into Eth1 on the 10G card.

Eth0, Eth1, and Eth2 are configured as an active-backup bond (bond0) bridged as br0.

This is all the default configuration.

I went into the BIOS and disabled the built-in 1G NIC that was assigned as Eth0.

I wanted the system to use port 0 of the 10G card as Eth0 instead.

On boot, it reports an error that Eth0 can't be found.

How do I make the server forget the disabled 1G port and assign 10G port 0 as Eth0?


OK, I figured it out. I was going about it backwards.

In Network Settings you can assign each NIC's MAC address to whichever Eth port you want. I just moved the MAC address of 10G port 0 to the Eth0 assignment.
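Before reassigning ports in the GUI, it helps to confirm which MAC belongs to which kernel interface. Linux exposes this as `/sys/class/net/<iface>/address`; the minimal Python sketch below just reads that mapping (it does not change anything, and the function name is hypothetical). Bond and bridge interfaces (bond0, br0) also show up and usually carry one of their member ports' MACs.

```python
from pathlib import Path


def interface_macs(sysfs_net: str = "/sys/class/net") -> dict:
    """Map each network interface name to its MAC address via sysfs."""
    base = Path(sysfs_net)
    if not base.is_dir():
        return {}  # not Linux, or sysfs not mounted
    macs = {}
    for iface in sorted(base.iterdir()):
        addr_file = iface / "address"  # follows the sysfs symlink to the device
        if addr_file.is_file():
            macs[iface.name] = addr_file.read_text().strip()
    return macs
```

Printing `interface_macs()` on the server lists pairs like `eth0 -> xx:xx:...`, which you can match against the MACs shown in Network Settings.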

 

To simplify the networking, I turned off the bond across Eth0-Eth2, which was set to active-backup mode (the Unraid default, BTW). I'm betting it was bouncing, since only 10G port 1 (Eth1) was plugged in.
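If you want evidence for the "bouncing" theory before removing a bond, the Linux bonding driver reports per-slave state in `/proc/net/bonding/bond0`. A climbing `Link Failure Count` or a changing `Currently Active Slave` would support it. Below is a minimal Python sketch that parses that file's text into a summary; it handles only the few fields named here, and the sample values in the test are made up for illustration.

```python
def parse_bonding_status(text: str) -> dict:
    """Summarize the text of /proc/net/bonding/<bond>: mode, active slave, per-slave state."""
    status = {"mode": None, "active": None, "slaves": {}}
    current = None  # slave interface whose section we are currently in, if any
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank or separator line
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            status["mode"] = value
        elif key == "Currently Active Slave":
            status["active"] = value
        elif key == "Slave Interface":
            current = value
            status["slaves"][current] = {"mii": None, "link_failures": 0}
        elif current is not None and key == "MII Status":
            # Bond-level "MII Status" lines appear before any slave section and are skipped.
            status["slaves"][current]["mii"] = value
        elif current is not None and key == "Link Failure Count":
            status["slaves"][current]["link_failures"] = int(value)
    return status
```

Reading the file periodically (for example with `open("/proc/net/bonding/bond0").read()`) and comparing the failure counts over time is enough to tell a stable bond from a flapping one.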

 

I will report back on the stability

 

Edited by gdeyoung

So my second server just crashed with a kernel panic. All three servers are having panics, and they are all on different hardware. Any ideas from this trace?

 

Jan 15 22:36:42 Mediaserver kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Jan 15 22:36:42 Mediaserver kernel: rcu:     0-....: (59999 ticks this GP) idle=e7a/1/0x4000000000000000 softirq=11770626/11770626 fqs=14993
Jan 15 22:36:42 Mediaserver kernel:     (t=60000 jiffies g=13660245 q=3404623)
Jan 15 22:36:42 Mediaserver kernel: NMI backtrace for cpu 0
Jan 15 22:36:42 Mediaserver kernel: CPU: 0 PID: 28592 Comm: kworker/u24:0 Tainted: P           O      5.10.1-Unraid #1
Jan 15 22:36:42 Mediaserver kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z390 Extreme4, BIOS P2.30 12/25/2018
Jan 15 22:36:42 Mediaserver kernel: Workqueue: events_power_efficient gc_worker
Jan 15 22:36:42 Mediaserver kernel: Call Trace:
Jan 15 22:36:42 Mediaserver kernel: <IRQ>
Jan 15 22:36:42 Mediaserver kernel: dump_stack+0x6b/0x83
Jan 15 22:36:42 Mediaserver kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Jan 15 22:36:42 Mediaserver kernel: nmi_cpu_backtrace+0x7d/0x8f
Jan 15 22:36:42 Mediaserver kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
Jan 15 22:36:42 Mediaserver kernel: rcu_dump_cpu_stacks+0x9f/0xc6
Jan 15 22:36:42 Mediaserver kernel: rcu_sched_clock_irq+0x1ec/0x543
Jan 15 22:36:42 Mediaserver kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
Jan 15 22:36:42 Mediaserver kernel: update_process_times+0x50/0x6e
Jan 15 22:36:42 Mediaserver kernel: tick_sched_timer+0x36/0x64
Jan 15 22:36:42 Mediaserver kernel: __hrtimer_run_queues+0xb7/0x10b
Jan 15 22:36:42 Mediaserver kernel: ? tick_sched_do_timer+0x39/0x39
Jan 15 22:36:42 Mediaserver kernel: hrtimer_interrupt+0x8d/0x160
Jan 15 22:36:42 Mediaserver kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Jan 15 22:36:42 Mediaserver kernel: asm_call_irq_on_stack+0x12/0x20
Jan 15 22:36:42 Mediaserver kernel: </IRQ>
Jan 15 22:36:42 Mediaserver kernel: sysvec_apic_timer_interrupt+0x71/0x95
Jan 15 22:36:42 Mediaserver kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Jan 15 22:36:42 Mediaserver kernel: RIP: 0010:gc_worker+0xf4/0x240
Jan 15 22:36:42 Mediaserver kernel: Code: 5c 26 05 41 89 47 08 e9 bc 00 00 00 48 8b 15 ec 05 a4 00 29 d0 85 c0 7f 11 4c 89 ff e8 10 f0 ff ff ff 44 24 08 e9 9e 00 00 00 <85> db 0f 84 96 00 00 00 49 8b 87 80 00 00 00 a8 08 0f 84 87 00 00
Jan 15 22:36:42 Mediaserver kernel: RSP: 0018:ffffc9000525fe48 EFLAGS: 00000206
Jan 15 22:36:42 Mediaserver kernel: RAX: 0000000001447690 RBX: 0000000000000000 RCX: ffff888103000000
Jan 15 22:36:42 Mediaserver kernel: RDX: 00000001014602fe RSI: ffffc9000525fe5c RDI: ffff88840da28548
Jan 15 22:36:42 Mediaserver kernel: RBP: 000000000000c386 R08: 0000000000000000 R09: ffffffff815c56ac
Jan 15 22:36:42 Mediaserver kernel: R10: 8080808080808080 R11: ffff88830e1fa780 R12: ffffffff82547ec0
Jan 15 22:36:42 Mediaserver kernel: R13: 000000009fd57c44 R14: ffff88840da28548 R15: ffff88840da28500
Jan 15 22:36:42 Mediaserver kernel: ? nf_conntrack_free+0x2b/0x35
Jan 15 22:36:42 Mediaserver kernel: ? gc_worker+0x9a/0x240
Jan 15 22:36:42 Mediaserver kernel: process_one_work+0x13c/0x1d5
Jan 15 22:36:42 Mediaserver kernel: worker_thread+0x18b/0x22f
Jan 15 22:36:42 Mediaserver kernel: ? process_scheduled_works+0x27/0x27
Jan 15 22:36:42 Mediaserver kernel: kthread+0xe5/0xea
Jan 15 22:36:42 Mediaserver kernel: ? kthread_unpark+0x52/0x52
Jan 15 22:36:42 Mediaserver kernel: ret_from_fork+0x22/0x30
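The `#011` sequences in the raw RCU stall lines of this trace are most likely syslog's escaping of control characters: `#` followed by three octal digits, so `#011` is a tab. A minimal Python sketch to undo that escaping before pasting or grepping logs follows; the caveat is that a genuine `#` followed by three octal digits in a message would also be converted.

```python
import re

# '#' plus exactly three octal digits, e.g. "#011" for a tab character.
_OCTAL_ESCAPE = re.compile(r"#([0-7]{3})")


def unescape_syslog(line: str) -> str:
    """Turn '#ooo' octal escapes back into the original control characters."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), line)
```

Markers like `[#1]` in "Oops: 0002 [#1]" are left alone, because they do not have three octal digits after the `#`.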


OK, to update this thread: I rolled back to 6.8.3 on the 2nd and 3rd of my 4 servers, and they are still panicking and crashing daily. The only server not experiencing any issues is my 4th one, which is connected at 1G. All of my 10G servers are panicking, even after I replaced the NICs with Intel server-class 10G NICs. I have now moved my 2nd server back to a 1G connection to see whether it stays stable.

 

I have more log snippets from the 10G servers. The panics also appear to involve native_queued_spin_lock_slowpath.

 

Call Trace:
Jan 19 12:52:28 Mediaserver kernel: <IRQ>
Jan 19 12:52:28 Mediaserver kernel: dump_stack+0x67/0x83
Jan 19 12:52:28 Mediaserver kernel: nmi_cpu_backtrace+0x71/0x83
Jan 19 12:52:28 Mediaserver kernel: ? lapic_can_unplug_cpu+0x97/0x97
Jan 19 12:52:28 Mediaserver kernel: nmi_trigger_cpumask_backtrace+0x57/0xd4
Jan 19 12:52:28 Mediaserver kernel: rcu_dump_cpu_stacks+0x8b/0xb4
Jan 19 12:52:28 Mediaserver kernel: rcu_check_callbacks+0x296/0x5a0
Jan 19 12:52:28 Mediaserver kernel: update_process_times+0x24/0x47
Jan 19 12:52:28 Mediaserver kernel: tick_sched_timer+0x36/0x64
Jan 19 12:52:28 Mediaserver kernel: __hrtimer_run_queues+0xb7/0x10b
Jan 19 12:52:28 Mediaserver kernel: ? tick_sched_handle.isra.0+0x2f/0x2f
Jan 19 12:52:28 Mediaserver kernel: hrtimer_interrupt+0xf4/0x20e
Jan 19 12:52:28 Mediaserver kernel: smp_apic_timer_interrupt+0x7b/0x93
Jan 19 12:52:28 Mediaserver kernel: apic_timer_interrupt+0xf/0x20
Jan 19 12:52:28 Mediaserver kernel: </IRQ>

RIP: 0010:native_queued_spin_lock_slowpath+0x6b/0x171
Jan 19 12:52:28 Mediaserver kernel: Code: 42 f0 8b 07 30 e4 09 c6 f7 c6 00 ff ff ff 74 0e 81 e6 00 ff 00 00 75 1a c6 47 01 00 eb 14 85 f6 74 0a 8b 07 84 c0 74 04 f3 90 <eb> f6 66 c7 07 01 00 c3 48 c7 c2 40 07 02 00 65 48 03 15 80 6a f8
Jan 19 12:52:28 Mediaserver kernel: RSP: 0018:ffffc90003ce3b88 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Jan 19 12:52:28 Mediaserver kernel: RAX: 00000000001c0101 RBX: ffffc90003ce3c10 RCX: 000ffffffffff000
Jan 19 12:52:28 Mediaserver kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffea002085d368
Jan 19 12:52:28 Mediaserver kernel: RBP: ffffea0004d77200 R08: ffff888000000000 R09: ffffea0004d77240
Jan 19 12:52:28 Mediaserver kernel: R10: 0000000000000008 R11: 0000000000023eb8 R12: ffffea0004d77200
Jan 19 12:52:28 Mediaserver kernel: R13: ffff8882ed6dc400 R14: ffffea0004d77200 R15: ffff888114684600

Edited by gdeyoung

Server 3 just panicked again. Again, this is a 10G server, also on its second 10G Intel NIC. The panics appear to happen more often under large file-copy loads on the 10G connection. I will move it back to 1G to see whether that makes a difference.

 

Jan 19 16:27:05 Homeserver kernel: Call Trace:
Jan 19 16:27:05 Homeserver kernel: <IRQ>
Jan 19 16:27:05 Homeserver kernel: dump_stack+0x67/0x83
Jan 19 16:27:05 Homeserver kernel: nmi_cpu_backtrace+0x71/0x83
Jan 19 16:27:05 Homeserver kernel: ? lapic_can_unplug_cpu+0x97/0x97
Jan 19 16:27:05 Homeserver kernel: nmi_trigger_cpumask_backtrace+0x57/0xd4
Jan 19 16:27:05 Homeserver kernel: rcu_dump_cpu_stacks+0x8b/0xb4
Jan 19 16:27:05 Homeserver kernel: rcu_check_callbacks+0x296/0x5a0
Jan 19 16:27:05 Homeserver kernel: update_process_times+0x24/0x47
Jan 19 16:27:05 Homeserver kernel: tick_sched_timer+0x36/0x64
Jan 19 16:27:05 Homeserver kernel: __hrtimer_run_queues+0xb7/0x10b
Jan 19 16:27:05 Homeserver kernel: ? tick_sched_handle.isra.0+0x2f/0x2f
Jan 19 16:27:05 Homeserver kernel: hrtimer_interrupt+0xf4/0x20e
Jan 19 16:27:05 Homeserver kernel: smp_apic_timer_interrupt+0x7b/0x93
Jan 19 16:27:05 Homeserver kernel: apic_timer_interrupt+0xf/0x20
Jan 19 16:27:05 Homeserver kernel: </IRQ>
Jan 19 16:27:05 Homeserver kernel: RIP: 0010:gc_worker+0xad/0x270
Jan 19 16:27:05 Homeserver kernel: Code: f6 c6 01 0f 85 4a 01 00 00 41 0f b6 46 37 49 c7 c0 f0 ff ff ff 41 ff c5 48 6b c0 38 49 29 c0 4f 8d 3c 06 49 8b 97 80 00 00 00 <41> 8b 87 88 00 00 00 0f ba e2 0e 73 2c 48 8b 15 ce dc 88 00 29 d0
Jan 19 16:27:05 Homeserver kernel: RSP: 0018:ffffc9001683fe60 EFLAGS: 00000296 ORIG_RAX: ffffffffffffff13
Jan 19 16:27:05 Homeserver kernel: RAX: 0000000000000038 RBX: 0000000000000000 RCX: 0000000000010000
Jan 19 16:27:05 Homeserver kernel: RDX: 0000000000000188 RSI: 00000000000000ad RDI: ffff8887f610d500
Jan 19 16:27:05 Homeserver kernel: RBP: 0000000000005aae R08: ffffffffffffffb8 R09: ffffffff81574c00
Jan 19 16:27:05 Homeserver kernel: R10: ffffea000edc5700 R11: ffff8887f610d501 R12: ffffffff822aa760
Jan 19 16:27:05 Homeserver kernel: R13: 00000000dba74d6c R14: ffff8887abf8ca48 R15: ffff8887abf8ca00
Jan 19 16:27:05 Homeserver kernel: ? nf_ct_get_id+0x80/0xb7
Jan 19 16:27:05 Homeserver kernel: process_one_work+0x16e/0x24f
Jan 19 16:27:05 Homeserver kernel: worker_thread+0x1e2/0x2b8
Jan 19 16:27:05 Homeserver kernel: ? rescuer_thread+0x2a7/0x2a7
Jan 19 16:27:05 Homeserver kernel: kthread+0x10c/0x114
Jan 19 16:27:05 Homeserver kernel: ? kthread_park+0x89/0x89
Jan 19 16:27:05 Homeserver kernel: ret_from_fork+0x22/0x40


So, 2 days ago I switched my 2nd server from 10G to 1G, and 1 day ago I switched my 3rd server from 10G to 1G. These are all different hardware machines (Intel and Ryzen) running a mix of 6.8.3 and 6.9-rc2. All of my servers on 10G (each on its swapped-out, 2nd 10G NIC) kernel-panic within 24 hours under heavy, sustained file copies. Without a heavy file load they panic within 72 hours. I have reworked and simplified the network configs. I have up-to-date BIOSes on the motherboards. It all comes down to sustained load on the 10G Intel and Aquantia NICs. I have even tried three different 10G switches, new 10G DAC cables, and 10GBASE-T transceivers with Cat-7.

 

It all comes back to this: there is something in the kernel that isn't right with heavy 10G network loads and causes panics.

 

One thing I'm seeing is native_queued_spin_lock_slowpath errors before the full panic, but I'm not seeing high CPU loads.
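To check whether the native_queued_spin_lock_slowpath messages really show up before the call trace, one option is to scan the syslog for the symbols mentioned in this thread and note when each first appears. This is a minimal sketch, not an Unraid tool: `first_seen` and its marker list are hypothetical, and it assumes the classic `Mon DD HH:MM:SS host kernel: ...` syslog format shown above, with lines in chronological order.

```python
import re
from typing import Dict, Iterable, Optional, Sequence

# Symbols taken from the traces and messages in this thread. Treating them as
# useful markers is an assumption, not an established root cause.
MARKERS = (
    "native_queued_spin_lock_slowpath",
    "rcu_check_callbacks",
    "gc_worker",
    "nf_conntrack",
    "Call Trace:",
)

# Classic syslog prefix, e.g. "Jan 19 16:27:05 Homeserver kernel: ..."
PREFIX = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: ")


def first_seen(lines: Iterable[str],
               markers: Sequence[str] = MARKERS) -> Dict[str, Optional[str]]:
    """Return the timestamp of the first kernel line containing each marker."""
    seen: Dict[str, Optional[str]] = {m: None for m in markers}
    for line in lines:
        match = PREFIX.match(line)
        if not match:
            continue  # skip non-kernel lines or lines in another format
        for marker in markers:
            if seen[marker] is None and marker in line:
                seen[marker] = match.group(1)
    return seen
```

Usage, assuming the live log is at /var/log/syslog (adjust the path if yours differs): `with open("/var/log/syslog") as f: print(first_seen(f))`. A spin-lock timestamp earlier than the `Call Trace:` timestamp would support the observation above; it would not by itself identify the cause.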

 

I found these two articles/posts that might be relevant:

High CPU load by native_queued_spin_lock_slowpath (linuxquestions.org)

The need for speed and the kernel datapath - recent improvements in UDP packets processing - Red Hat Developer

 

What can be done to get 10G working stably under sustained file-copy loads? That's the whole reason for 10G.

 

@limetech @JorgeB

3 hours ago, gdeyoung said:

the kernel that isn't right with heavy 10G network loads and causes panics.

It seems to me that you may be generalizing and jumping to conclusions. I've been using 10GbE with various workloads for years in all my servers without issues, as have many other users. Since you never posted diagnostics, what NICs are you using?


@JorgeB Thanks for continuing to engage, I really appreciate it.

 

I have done extensive troubleshooting to try to isolate the issue. I have completely rebuilt two of the four servers with new components (the only original parts remaining are the drives) and still get the panics. I have swapped out all of the network gear: three different 10G switches and new cables. I have removed all external devices or replaced them several times with new ones, and I still get the panics.

 

In the last couple of days I switched two of the servers back to 1G, and they are rock solid with no issues and nothing chatty in the logs, whereas on 10G I was getting a variety of messages in the logs every hour.

 

This is not my first post on this. In my previous post I did post full diagnostics and got NO replies. I DM'd @limetech for help and still heard nothing. So I am trying; I really would like to get this working.

 

All I have to go on now are the panic traces, and I don't have the knowledge to troubleshoot at that level.

 

Here are the two types of NICs I have used, both of which are supposed to be fully supported:

TRENDnet - TRENDnet TEG-10GECSFP - SFP+ Aquantia chipset

Supermicro AOC-STGF-i2S - Dual SFP+ Intel chipset

 

 


To expand a little: I was just trying to point out that, IMHO, it won't be a general kernel issue; it could be a driver issue. Just recently, while reorganizing multiple servers, I transferred over 100TB between 6 different servers, all using 10GbE (though all Mellanox), at an average speed of around 400MB/s without any issues, and it's not the first time I've done similarly large transfers with Unraid.
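For scale, a rough estimate of how long that transfer kept the links busy, assuming decimal units (1 TB = 10^12 bytes) and treating 400MB/s as the average rate of a single sequential transfer (if several servers copied in parallel, the wall-clock time per link would differ):

```latex
t = \frac{100\ \text{TB}}{400\ \text{MB/s}}
  = \frac{100 \times 10^{12}\ \text{B}}{400 \times 10^{6}\ \text{B/s}}
  = 2.5 \times 10^{5}\ \text{s}
  \approx 69.4\ \text{h}
  \approx 2.9\ \text{days}
```

Under those assumptions, that is roughly three days of sustained load, well past the 24-hour window in which the 10G servers above were panicking.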


Yes, the Intel NICs seem to be more stable. I was also having transfer issues with some of my Windows PCs with Aquantia 10G NICs, so I switched to Intel 10G across the board.

 

Yes, I rolled back to 6.8.3 and had the same issues with both NICs.

 

My other observation is that the panics happen more often on the ingest servers, the ones I copy files to.

 


It's rare to see reports of call traces caused only by a 10G NIC that go away after switching to 1G, so I suspect a settings-related issue. Could you try safe mode (no plugins / Docker)? Does any network endpoint use jumbo frames?
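One quick way to answer the jumbo-frame question on the Unraid side is to read each interface's MTU from sysfs, where Linux exposes it as `/sys/class/net/<iface>/mtu`. This is a minimal sketch assuming that anything above the standard 1500-byte Ethernet MTU counts as jumbo frames; the helper names are hypothetical, and it only inspects this host, not the switch or the client machines.

```python
from pathlib import Path
from typing import Dict

STANDARD_MTU = 1500  # default Ethernet MTU; larger values indicate jumbo frames


def interface_mtus(sys_net: str = "/sys/class/net") -> Dict[str, int]:
    """Read each interface's MTU from sysfs (<sys_net>/<iface>/mtu)."""
    mtus: Dict[str, int] = {}
    for iface in sorted(Path(sys_net).iterdir()):
        mtu_file = iface / "mtu"
        # Plain files such as bonding_masters have no mtu entry; is_file() is False.
        if mtu_file.is_file():
            mtus[iface.name] = int(mtu_file.read_text().strip())
    return mtus


def jumbo_interfaces(mtus: Dict[str, int]) -> Dict[str, int]:
    """Keep interfaces above the standard MTU, ignoring loopback."""
    # Loopback normally has a very large MTU, so it is not a useful signal here.
    return {name: mtu for name, mtu in mtus.items()
            if name != "lo" and mtu > STANDARD_MTU}
```

Usage: `print(jumbo_interfaces(interface_mtus()))`. An empty result means no interface on this server is set above 1500; the other endpoints still need to be checked separately.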

 

I use Intel / Emulex / Mellanox NICs and haven't had issues on Unraid or Windows.

8 hours ago, gdeyoung said:

The 3rd server is in safe mode and still going

 

I switched the 2nd server back to 10G in normal mode with no file copies, and it panicked within 20 minutes. Diagnostics attached.

mediaserver-diagnostics-20210122-1520.zip (138.69 kB)

 

A good milestone on the 3rd server. For the 2nd server (ASRock Z390), please try updating the BIOS (2.3 is quite old; I previously used an ASRock Z390 Taichi with BIOS 4.3 without issues). Then try safe mode too, to first make sure the hardware works with minimal software.

  • JorgeB changed the title to [SOLVED] 6.9-RC2 Kernal panic and trace
