Pera78 Posted October 25, 2023

Hi, after switching hardware from Intel to an AMD Threadripper, everything works except that after a while I can no longer reach Unraid from the LAN. The PC stays on and seems to be working. Does anyone have any ideas?
JorgeB Posted October 25, 2023

See here: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173
Pera78 Posted October 26, 2023

I followed the guide, but nothing changed. After 7-8 hours the server is again unreachable from the LAN, and the network card is off. Any suggestions?
JorgeB Posted October 26, 2023

Enable the syslog server and post the log after a crash.
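Once the syslog server has captured a crash, the useful lines are usually a handful of kernel messages buried among routine entries. A minimal Python sketch for pulling them out, assuming a plain-text syslog file such as syslog-192.168.0.150.log; the pattern list is a hypothetical choice based on the messages seen in this thread, not an official or exhaustive list.

```python
import re
import sys
from typing import Iterable, Iterator

# Hypothetical set of substrings that often mark a PCIe/NIC failure.
# Chosen from the messages in this thread; extend as needed.
PATTERNS = [
    "AER:",
    "PCIe Bus Error",
    "Internal error detected",
    "Fail to reset",
    "cut here",
    "kernel BUG at",
]
# Escape each pattern so they are matched as literal text.
_RE = re.compile("|".join(map(re.escape, PATTERNS)))


def crash_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield kernel log lines that match one of PATTERNS, without trailing newlines."""
    for line in lines:
        if " kernel: " in line and _RE.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python crash_lines.py syslog-192.168.0.150.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for hit in crash_lines(fh):
            print(hit)
```

The script name crash_lines.py is hypothetical; matching on " kernel: " keeps webGUI, Docker and plugin chatter out of the result.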
Pera78 Posted October 29, 2023

Hi, attached is the Unraid syslog. You don't need to read all of it: just jump to October 29 at 01:31, where the error occurs. Here is an excerpt from the log:

Oct 28 20:02:46 Serverone webGUI: Successful login user root from 192.168.0.102
Oct 28 20:03:06 Serverone root: Fix Common Problems Version 2023.10.08a
Oct 28 20:03:56 Serverone root: Fix Common Problems Version 2023.10.08a
Oct 28 20:06:52 Serverone kernel: docker0: port 4(veth5d9e892) entered disabled state
Oct 28 20:06:52 Serverone kernel: veth610c3aa: renamed from eth0
Oct 28 20:06:52 Serverone kernel: docker0: port 4(veth5d9e892) entered disabled state
Oct 28 20:06:52 Serverone kernel: device veth5d9e892 left promiscuous mode
Oct 28 20:06:52 Serverone kernel: docker0: port 4(veth5d9e892) entered disabled state
Oct 28 20:12:09 Serverone monitor: Stop running nchan processes
Oct 28 20:18:38 Serverone webGUI: Successful login user root from 192.168.0.102
Oct 28 20:21:25 Serverone root: Fix Common Problems Version 2023.10.08a
Oct 28 20:22:42 Serverone monitor: Stop running nchan processes
Oct 28 21:06:46 Serverone monitor: Stop running nchan processes
Oct 28 21:53:49 Serverone monitor: Stop running nchan processes
Oct 29 01:31:25 Serverone kernel: pcieport 0000:00:01.1: AER: Multiple Corrected error received: 0000:02:04.0
Oct 29 01:31:26 Serverone kernel: pcieport 0000:02:04.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 29 01:31:26 Serverone kernel: pcieport 0000:02:04.0: device [1022:43b4] error status/mask=00000040/00002000
Oct 29 01:31:26 Serverone kernel: pcieport 0000:02:04.0: [ 6] BadTLP
Oct 29 01:31:26 Serverone kernel: pcieport 0000:02:04.0: AER: Error of this Agent is reported first
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: Internal error detected:
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: buf[00]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: buf[01]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: buf[02]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: buf[03]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: buf[04]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: buf[05]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: buf[06]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: buf[07]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: buf[08]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: buf[09]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: buf[0a]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: buf[0b]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: buf[0c]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: buf[0d]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: buf[0e]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: buf[0f]: ffffffff
Oct 29 01:31:36 Serverone kernel: mlx4_core 0000:07:00.0: mlx4_cmd_post:cmd_pending failed
Oct 29 01:31:36 Serverone kernel: mlx4_core 0000:07:00.0: Could not post command 0x49: ret=-5, in_param=0x0, in_mod=0x1, op_mod=0x0
Oct 29 01:31:36 Serverone kernel: mlx4_core 0000:07:00.0: device is going to be reset
Oct 29 01:31:36 Serverone kernel: mlx4_core 0000:07:00.0: crdump: FW doesn't support health buffer access, skipping
Oct 29 01:31:46 Serverone kernel: mlx4_core 0000:07:00.0: Failed to obtain HW semaphore, aborting
Oct 29 01:31:46 Serverone kernel: mlx4_core 0000:07:00.0: Fail to reset HCA
Oct 29 01:31:46 Serverone kernel: ------------[ cut here ]------------
Oct 29 01:31:46 Serverone kernel: kernel BUG at drivers/net/ethernet/mellanox/mlx4/catas.c:191!
Oct 29 01:31:46 Serverone kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Oct 29 01:31:46 Serverone kernel: CPU: 31 PID: 60879 Comm: kworker/u256:0 Tainted: P O 6.1.49-Unraid #1
Oct 29 01:31:46 Serverone kernel: Hardware name: System manufacturer System Product Name/PRIME X399-A, BIOS 1203 10/09/2019
Oct 29 01:31:46 Serverone kernel: Workqueue: mlx4_en mlx4_en_do_get_stats [mlx4_en]
Oct 29 01:31:46 Serverone kernel: RIP: 0010:mlx4_enter_error_state+0x24b/0x2c6 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: Code: 74 2a 48 8b 03 48 c7 c6 15 30 37 a0 48 8b 38 48 81 c7 d0 00 00 00 e8 b7 df 51 e1 48 8b 03 48 8b 00 83 b8 c8 00 00 00 01 75 1b <0f> 0b 48 8b 03 48 c7 c6 28 30 37 a0 48 8b 38 48 81 c7 d0 00 00 00
Oct 29 01:31:46 Serverone kernel: RSP: 0018:ffffc90022297cc8 EFLAGS: 00010246
Oct 29 01:31:46 Serverone kernel: RAX: ffff888101ed0000 RBX: ffff88812d1701e0 RCX: 0000000000000027
Oct 29 01:31:46 Serverone kernel: RDX: 0000000000000000 RSI: ffffffff820ed4af RDI: 00000000ffffffff
Oct 29 01:31:46 Serverone kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff8294e3f0
Oct 29 01:31:46 Serverone kernel: R10: 00000fffffffffff R11: fefefefefefefeff R12: ffff88810909c960
Oct 29 01:31:46 Serverone kernel: R13: 00000000fffffffb R14: 0000000000000000 R15: 000000000000ea60
Oct 29 01:31:46 Serverone kernel: FS: 0000000000000000(0000) GS:ffff88903d7c0000(0000) knlGS:0000000000000000
Oct 29 01:31:46 Serverone kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 29 01:31:46 Serverone kernel: CR2: 000014c3f0bf511c CR3: 00000001462be000 CR4: 00000000003506e0
Oct 29 01:31:46 Serverone kernel: Call Trace:
Oct 29 01:31:46 Serverone kernel: <TASK>
Oct 29 01:31:46 Serverone kernel: ? __die_body+0x1a/0x5c
Oct 29 01:31:46 Serverone kernel: ? die+0x30/0x49
Oct 29 01:31:46 Serverone kernel: ? do_trap+0x7b/0xfe
Oct 29 01:31:46 Serverone kernel: ? mlx4_enter_error_state+0x24b/0x2c6 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: ? mlx4_enter_error_state+0x24b/0x2c6 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: ? do_error_trap+0x6e/0x98
Oct 29 01:31:46 Serverone kernel: ? mlx4_enter_error_state+0x24b/0x2c6 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: ? exc_invalid_op+0x4c/0x60
Oct 29 01:31:46 Serverone kernel: ? mlx4_enter_error_state+0x24b/0x2c6 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: ? asm_exc_invalid_op+0x16/0x20
Oct 29 01:31:46 Serverone kernel: ? mlx4_enter_error_state+0x24b/0x2c6 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: ? mlx4_enter_error_state+0x23c/0x2c6 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: mlx4_cmd_reset_flow+0x1c/0x31 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: __mlx4_cmd+0x336/0x6ce [mlx4_core]
Oct 29 01:31:46 Serverone kernel: mlx4_en_DUMP_ETH_STATS+0xc2/0x94a [mlx4_en]
Oct 29 01:31:46 Serverone kernel: ? get_nohz_timer_target+0x2e/0xdd
Oct 29 01:31:46 Serverone kernel: ? _raw_spin_unlock_irqrestore+0x24/0x3a
Oct 29 01:31:46 Serverone kernel: mlx4_en_do_get_stats+0x63/0x284 [mlx4_en]
Oct 29 01:31:46 Serverone kernel: process_one_work+0x1ab/0x295
Oct 29 01:31:46 Serverone kernel: worker_thread+0x18b/0x244
Oct 29 01:31:46 Serverone kernel: ? rescuer_thread+0x281/0x281
Oct 29 01:31:46 Serverone kernel: kthread+0xe7/0xef
Oct 29 01:31:46 Serverone kernel: ? kthread_complete_and_exit+0x1b/0x1b
Oct 29 01:31:46 Serverone kernel: ret_from_fork+0x22/0x30
Oct 29 01:31:46 Serverone kernel: </TASK>
Oct 29 01:31:46 Serverone kernel: Modules linked in: vhost_net tun vhost tap kvm_amd ccp kvm md_mod xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc mlx4_en zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) edac_mce_amd edac_core intel_rapl_msr intel_rapl_common iosf_mbi zcommon(PO) znvpair(PO) spl(O) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd mxm_wmi wmi_bmof asus_wmi_sensors mpt3sas rapl nvme mlx4_core raid_class i2c_piix4 ahci scsi_transport_sas i2c_core nvme_core k10temp libahci wmi button acpi_cpufreq unix [last unloaded: md_mod]
Oct 29 01:31:46 Serverone kernel: ---[ end trace 0000000000000000 ]---
Oct 29 01:31:46 Serverone kernel: RIP: 0010:mlx4_enter_error_state+0x24b/0x2c6 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: Code: 74 2a 48 8b 03 48 c7 c6 15 30 37 a0 48 8b 38 48 81 c7 d0 00 00 00 e8 b7 df 51 e1 48 8b 03 48 8b 00 83 b8 c8 00 00 00 01 75 1b <0f> 0b 48 8b 03 48 c7 c6 28 30 37 a0 48 8b 38 48 81 c7 d0 00 00 00

As you can see, the cause is the Mellanox network card.

Attachment: syslog-192.168.0.150.log
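The `error status/mask=00000040/00002000` field in the excerpt is a pair of bit masks taken from the PCIe AER Correctable Error Status and Mask registers. A minimal Python sketch for decoding it, assuming the standard bit positions of those registers; the short names mirror the strings the Linux AER driver prints (such as `BadTLP` above), but treat the exact table as an assumption rather than a copy of the kernel source. For this log, status 0x40 sets only bit 6 (BadTLP), and mask 0x2000 sets only bit 13 (advisory non-fatal error, masked).

```python
import re

# Assumed bit positions in the PCIe Correctable Error Status/Mask registers,
# with short names like those the Linux AER driver logs.
CORRECTABLE_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}

_STATUS_MASK = re.compile(r"error status/mask=([0-9a-fA-F]+)/([0-9a-fA-F]+)")


def parse_status_mask(line: str) -> tuple:
    """Return (status, mask) as ints from an 'error status/mask=XXXXXXXX/YYYYYYYY' line."""
    m = _STATUS_MASK.search(line)
    if m is None:
        raise ValueError("no 'error status/mask=' field in line")
    return int(m.group(1), 16), int(m.group(2), 16)


def decode(value: int) -> list:
    """Return the names of the known correctable-error bits set in value, lowest bit first."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    line = ("Oct 29 01:31:26 Serverone kernel: pcieport 0000:02:04.0: "
            "device [1022:43b4] error status/mask=00000040/00002000")
    status, mask = parse_status_mask(line)
    print("status:", decode(status))
    print("mask:  ", decode(mask))
```

This only covers the correctable register; uncorrectable errors use a different register with a different bit layout.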
JorgeB Posted October 29, 2023

We don't usually see issues with the Mellanox driver. Try moving the NIC to a different PCIe slot if possible.
Pera78 Posted October 29, 2023 (marked as solution)

I moved the NIC to another PC and it doesn't work there either. The NIC shows as powered off.
JorgeB Posted October 29, 2023

That suggests some compatibility issue between the NIC and the board.
Pera78 Posted October 29, 2023

The NIC no longer works on either the AMD or the Intel motherboard. It's dead.
JorgeB Posted October 30, 2023

Easy fix then: just get a new one.