bdowden Posted March 12, 2022

Hi everyone. I moved my Unraid server from a custom-built Norco case with a Mellanox 10G card to a Dell R720xd with the 57800S 2-port SFP+ daughter card. At the same time I installed a Quadro P2000 for Plex transcoding. I also have a 9102-16E HBA card that is connected to a 45-bay Supermicro chassis (with the built-in port expander backplane) and a 24-bay SFF Supermicro chassis with a standard backplane (6x SAS connectors) that is connected via an SFF-8088 to SFF-8087 pass-through bracket.

I'm now getting kernel panics at random times of day, at least twice a week, and each time I need to perform a hard reboot. I finally captured diagnostics, and I THINK the cause is the network driver, but I can't be 100% sure. Can anyone verify? If you need more logs, let me know. I'm new to storage chassis and SFF-8088 cards/cables, but I think everything is OK there.

Mar 11 21:24:15 Tower2 kernel: WARNING: CPU: 24 PID: 24271 at lib/vsprintf.c:2556 vsnprintf+0x30/0x4ef
Mar 11 21:24:15 Tower2 kernel: Modules linked in: nfsd lockd grace sunrpc md_mod nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops nvidia(PO) drm backlight agpgart ip6table_filter ip6_tables iptable_filter ip_tables x_tables bnx2x mdio sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd ipmi_ssif i2c_core glue_helper rapl intel_cstate input_leds mpt3sas intel_uncore nvme acpi_power_meter led_class raid_class scsi_transport_sas nvme_core wmi ipmi_si button [last unloaded: mdio]
Mar 11 21:24:15 Tower2 kernel: CPU: 24 PID: 24271 Comm: ethtool Tainted: P O 5.10.28-Unraid #1
Mar 11 21:24:15 Tower2 kernel: Hardware name: Dell Inc. PowerEdge R720xd/0020HJ, BIOS 2.9.0 12/06/2019
Mar 11 21:24:15 Tower2 kernel: RIP: 0010:vsnprintf+0x30/0x4ef
Mar 11 21:24:15 Tower2 kernel: Code: 41 54 55 53 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 48 81 fe ff ff ff 7f 48 c7 44 24 08 00 00 00 00 76 07 <0f> 0b e9 94 04 00 00 48 89 fd 49 89 fc 49 89 f5 48 01 f5 49 89 d0
Mar 11 21:24:15 Tower2 kernel: RSP: 0018:ffffc900285e7a90 EFLAGS: 00010296
Mar 11 21:24:15 Tower2 kernel: RAX: 0000000000000000 RBX: 0000000000070a0b RCX: ffffc900285e7ae0
Mar 11 21:24:15 Tower2 kernel: RDX: ffffffffa00cba0f RSI: fffffffffffffff8 RDI: ffffc900285e7bc8
Mar 11 21:24:15 Tower2 kernel: RBP: ffffc900285e7b30 R08: 000000000000000a R09: 000000000000000b
Mar 11 21:24:15 Tower2 kernel: R10: 0000000000000007 R11: 0000000000000000 R12: 0000000000000020
Mar 11 21:24:15 Tower2 kernel: R13: ffffc900285e7b54 R14: ffff888124299db8 R15: 00000000ffffffed
Mar 11 21:24:15 Tower2 kernel: FS: 0000149bb1295740(0000) GS:ffff889fffb00000(0000) knlGS:0000000000000000
Mar 11 21:24:15 Tower2 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 11 21:24:15 Tower2 kernel: CR2: 0000000000428670 CR3: 0000000144506005 CR4: 00000000000606e0
Mar 11 21:24:15 Tower2 kernel: Call Trace:
Mar 11 21:24:15 Tower2 kernel: snprintf+0x49/0x60
Mar 11 21:24:15 Tower2 kernel: ? prep_new_page+0x25/0x71
Mar 11 21:24:15 Tower2 kernel: bnx2x_fill_fw_str+0xc7/0xf9 [bnx2x]
Mar 11 21:24:15 Tower2 kernel: ? get_page_from_freelist+0x8e0/0xbd4
Mar 11 21:24:15 Tower2 kernel: bnx2x_get_drvinfo+0xf1/0x14c [bnx2x]
Mar 11 21:24:15 Tower2 kernel: ethtool_get_drvinfo+0x6e/0x1b5
Mar 11 21:24:15 Tower2 kernel: dev_ethtool+0x59a/0x2126
Mar 11 21:24:15 Tower2 kernel: ? ___slab_alloc+0x23a/0x4aa
Mar 11 21:24:15 Tower2 kernel: ? sk_prot_alloc.isra.0+0x26/0xad
Mar 11 21:24:15 Tower2 kernel: ? inet_ioctl+0x17d/0x1a6
Mar 11 21:24:15 Tower2 kernel: ? page_add_file_rmap+0xc9/0xd4
Mar 11 21:24:15 Tower2 kernel: ? set_pte+0x5/0x8
Mar 11 21:24:15 Tower2 kernel: ? alloc_set_pte+0x2f0/0x301
Mar 11 21:24:15 Tower2 kernel: ? full_name_hash+0x12/0x6c
Mar 11 21:24:15 Tower2 kernel: ? dev_name_hash+0x23/0x3a
Mar 11 21:24:15 Tower2 kernel: dev_ioctl+0x2d7/0x3d5
Mar 11 21:24:15 Tower2 kernel: sock_do_ioctl+0xd9/0x12a
Mar 11 21:24:15 Tower2 kernel: ? __do_sys_copy_file_range+0x178/0x18f
Mar 11 21:24:15 Tower2 kernel: sock_ioctl+0x314/0x33b
Mar 11 21:24:15 Tower2 kernel: ? alloc_file_pseudo+0xba/0xfd
Mar 11 21:24:15 Tower2 kernel: vfs_ioctl+0x19/0x26
Mar 11 21:24:15 Tower2 kernel: __do_sys_ioctl+0x51/0x74
Mar 11 21:24:15 Tower2 kernel: do_syscall_64+0x5d/0x6a
Mar 11 21:24:15 Tower2 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 11 21:24:15 Tower2 kernel: RIP: 0033:0x149bb13a4417
Mar 11 21:24:15 Tower2 kernel: Code: 00 00 90 48 8b 05 79 2a 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 49 2a 0d 00 f7 d8 64 89 01 48
Mar 11 21:24:15 Tower2 kernel: RSP: 002b:00007fff8c5cfd28 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Mar 11 21:24:15 Tower2 kernel: RAX: ffffffffffffffda RBX: 00007fff8c5cffa8 RCX: 0000149bb13a4417
Mar 11 21:24:15 Tower2 kernel: RDX: 00007fff8c5cfe30 RSI: 0000000000008946 RDI: 0000000000000003
Mar 11 21:24:15 Tower2 kernel: RBP: 00007fff8c5cfe20 R08: 00007fff8c5cfe30 R09: 0000000000000003
Mar 11 21:24:15 Tower2 kernel: R10: 0000000000401387 R11: 0000000000000206 R12: 000000000040abe0
Mar 11 21:24:15 Tower2 kernel: R13: 0000000000000001 R14: 0000000000435043 R15: 000000000043504b
Mar 11 21:24:15 Tower2 kernel: ---[ end trace 033002dfefbd6b3e ]---

JorgeB Posted March 12, 2022

8 minutes ago, bdowden said:
it's caused by the network driver

Correct, but not the Mellanox; likely the onboard NICs.
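
For anyone reading a similar trace: frames that belong to a loadable module carry the module name in square brackets, which is how the bnx2x driver shows up in the log above. Below is a minimal Python sketch, not an Unraid tool, that pulls those frames out of pasted log text. The function name `module_frames` and the regular expression are my own assumptions for illustration. Frames prefixed with `?` are ones the kernel's unwinder is not sure about, so the sketch skips them.

```python
import re

# Matches stack frames such as "bnx2x_fill_fw_str+0xc7/0xf9 [bnx2x]".
# Group 1: optional "? " marker (unreliable frame), group 2: function name,
# group 3: optional module name in square brackets.
FRAME_RE = re.compile(
    r"(\?\s+)?(\w[\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def module_frames(log_text):
    """Return (function, module) pairs for reliable frames inside a module."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        unreliable, function, module = match.groups()
        if module and not unreliable:
            frames.append((function, module))
    return frames


# A few lines copied from the trace in the first post.
sample = """\
Mar 11 21:24:15 Tower2 kernel: snprintf+0x49/0x60
Mar 11 21:24:15 Tower2 kernel: ? prep_new_page+0x25/0x71
Mar 11 21:24:15 Tower2 kernel: bnx2x_fill_fw_str+0xc7/0xf9 [bnx2x]
Mar 11 21:24:15 Tower2 kernel: bnx2x_get_drvinfo+0xf1/0x14c [bnx2x]
Mar 11 21:24:15 Tower2 kernel: ethtool_get_drvinfo+0x6e/0x1b5
"""

print(module_frames(sample))
```

On the sample lines this reports only the two bnx2x frames, which matches the reading that the warning originates in the NIC driver while `ethtool` is querying driver info.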

bdowden Posted March 12, 2022 (Author)

OK, cool. I had removed the Mellanox when I installed the daughter card. Do you have any suggestions on next steps? I'm still a Linux noob (even though I've been using Unraid for almost a decade).

JorgeB Posted March 13, 2022

20 hours ago, bdowden said:
Do you have any suggestions on next steps?

Use different NICs, or wait for a newer release with a newer driver; if you're on v6.9.2, try v6.10-rc3.