
Access problem after hardware change from Intel to threadripper


Pera78

Hi, attached is the Unraid syslog. You don't need to read the whole file: jump to October 29th at 01:31, which is where the error occurs. Here is an excerpt from the log:

 

Quote

Oct 28 20:02:46 Serverone webGUI: Successful login user root from 192.168.0.102
Oct 28 20:03:06 Serverone root: Fix Common Problems Version 2023.10.08a
Oct 28 20:03:56 Serverone root: Fix Common Problems Version 2023.10.08a
Oct 28 20:06:52 Serverone kernel: docker0: port 4(veth5d9e892) entered disabled state
Oct 28 20:06:52 Serverone kernel: veth610c3aa: renamed from eth0
Oct 28 20:06:52 Serverone kernel: docker0: port 4(veth5d9e892) entered disabled state
Oct 28 20:06:52 Serverone kernel: device veth5d9e892 left promiscuous mode
Oct 28 20:06:52 Serverone kernel: docker0: port 4(veth5d9e892) entered disabled state
Oct 28 20:12:09 Serverone monitor: Stop running nchan processes
Oct 28 20:18:38 Serverone webGUI: Successful login user root from 192.168.0.102
Oct 28 20:21:25 Serverone root: Fix Common Problems Version 2023.10.08a
Oct 28 20:22:42 Serverone monitor: Stop running nchan processes
Oct 28 21:06:46 Serverone monitor: Stop running nchan processes
Oct 28 21:53:49 Serverone monitor: Stop running nchan processes
Oct 29 01:31:25 Serverone kernel: pcieport 0000:00:01.1: AER: Multiple Corrected error received: 0000:02:04.0
Oct 29 01:31:26 Serverone kernel: pcieport 0000:02:04.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 29 01:31:26 Serverone kernel: pcieport 0000:02:04.0:   device [1022:43b4] error status/mask=00000040/00002000
Oct 29 01:31:26 Serverone kernel: pcieport 0000:02:04.0:    [ 6] BadTLP                
Oct 29 01:31:26 Serverone kernel: pcieport 0000:02:04.0: AER:   Error of this Agent is reported first
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0: Internal error detected:
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0:   buf[00]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0:   buf[01]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0:   buf[02]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0:   buf[03]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0:   buf[04]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0:   buf[05]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0:   buf[06]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0:   buf[07]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0:   buf[08]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0:   buf[09]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0:   buf[0a]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0:   buf[0b]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0:   buf[0c]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0:   buf[0d]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0:   buf[0e]: ffffffff
Oct 29 01:31:28 Serverone kernel: mlx4_core 0000:07:00.0:   buf[0f]: ffffffff
Oct 29 01:31:36 Serverone kernel: mlx4_core 0000:07:00.0: mlx4_cmd_post:cmd_pending failed
Oct 29 01:31:36 Serverone kernel: mlx4_core 0000:07:00.0: Could not post command 0x49: ret=-5, in_param=0x0, in_mod=0x1, op_mod=0x0
Oct 29 01:31:36 Serverone kernel: mlx4_core 0000:07:00.0: device is going to be reset
Oct 29 01:31:36 Serverone kernel: mlx4_core 0000:07:00.0: crdump: FW doesn't support health buffer access, skipping
Oct 29 01:31:46 Serverone kernel: mlx4_core 0000:07:00.0: Failed to obtain HW semaphore, aborting
Oct 29 01:31:46 Serverone kernel: mlx4_core 0000:07:00.0: Fail to reset HCA
Oct 29 01:31:46 Serverone kernel: ------------[ cut here ]------------
Oct 29 01:31:46 Serverone kernel: kernel BUG at drivers/net/ethernet/mellanox/mlx4/catas.c:191!
Oct 29 01:31:46 Serverone kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Oct 29 01:31:46 Serverone kernel: CPU: 31 PID: 60879 Comm: kworker/u256:0 Tainted: P           O       6.1.49-Unraid #1
Oct 29 01:31:46 Serverone kernel: Hardware name: System manufacturer System Product Name/PRIME X399-A, BIOS 1203 10/09/2019
Oct 29 01:31:46 Serverone kernel: Workqueue: mlx4_en mlx4_en_do_get_stats [mlx4_en]
Oct 29 01:31:46 Serverone kernel: RIP: 0010:mlx4_enter_error_state+0x24b/0x2c6 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: Code: 74 2a 48 8b 03 48 c7 c6 15 30 37 a0 48 8b 38 48 81 c7 d0 00 00 00 e8 b7 df 51 e1 48 8b 03 48 8b 00 83 b8 c8 00 00 00 01 75 1b <0f> 0b 48 8b 03 48 c7 c6 28 30 37 a0 48 8b 38 48 81 c7 d0 00 00 00
Oct 29 01:31:46 Serverone kernel: RSP: 0018:ffffc90022297cc8 EFLAGS: 00010246
Oct 29 01:31:46 Serverone kernel: RAX: ffff888101ed0000 RBX: ffff88812d1701e0 RCX: 0000000000000027
Oct 29 01:31:46 Serverone kernel: RDX: 0000000000000000 RSI: ffffffff820ed4af RDI: 00000000ffffffff
Oct 29 01:31:46 Serverone kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff8294e3f0
Oct 29 01:31:46 Serverone kernel: R10: 00000fffffffffff R11: fefefefefefefeff R12: ffff88810909c960
Oct 29 01:31:46 Serverone kernel: R13: 00000000fffffffb R14: 0000000000000000 R15: 000000000000ea60
Oct 29 01:31:46 Serverone kernel: FS:  0000000000000000(0000) GS:ffff88903d7c0000(0000) knlGS:0000000000000000
Oct 29 01:31:46 Serverone kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 29 01:31:46 Serverone kernel: CR2: 000014c3f0bf511c CR3: 00000001462be000 CR4: 00000000003506e0
Oct 29 01:31:46 Serverone kernel: Call Trace:
Oct 29 01:31:46 Serverone kernel: <TASK>
Oct 29 01:31:46 Serverone kernel: ? __die_body+0x1a/0x5c
Oct 29 01:31:46 Serverone kernel: ? die+0x30/0x49
Oct 29 01:31:46 Serverone kernel: ? do_trap+0x7b/0xfe
Oct 29 01:31:46 Serverone kernel: ? mlx4_enter_error_state+0x24b/0x2c6 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: ? mlx4_enter_error_state+0x24b/0x2c6 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: ? do_error_trap+0x6e/0x98
Oct 29 01:31:46 Serverone kernel: ? mlx4_enter_error_state+0x24b/0x2c6 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: ? exc_invalid_op+0x4c/0x60
Oct 29 01:31:46 Serverone kernel: ? mlx4_enter_error_state+0x24b/0x2c6 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: ? asm_exc_invalid_op+0x16/0x20
Oct 29 01:31:46 Serverone kernel: ? mlx4_enter_error_state+0x24b/0x2c6 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: ? mlx4_enter_error_state+0x23c/0x2c6 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: mlx4_cmd_reset_flow+0x1c/0x31 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: __mlx4_cmd+0x336/0x6ce [mlx4_core]
Oct 29 01:31:46 Serverone kernel: mlx4_en_DUMP_ETH_STATS+0xc2/0x94a [mlx4_en]
Oct 29 01:31:46 Serverone kernel: ? get_nohz_timer_target+0x2e/0xdd
Oct 29 01:31:46 Serverone kernel: ? _raw_spin_unlock_irqrestore+0x24/0x3a
Oct 29 01:31:46 Serverone kernel: mlx4_en_do_get_stats+0x63/0x284 [mlx4_en]
Oct 29 01:31:46 Serverone kernel: process_one_work+0x1ab/0x295
Oct 29 01:31:46 Serverone kernel: worker_thread+0x18b/0x244
Oct 29 01:31:46 Serverone kernel: ? rescuer_thread+0x281/0x281
Oct 29 01:31:46 Serverone kernel: kthread+0xe7/0xef
Oct 29 01:31:46 Serverone kernel: ? kthread_complete_and_exit+0x1b/0x1b
Oct 29 01:31:46 Serverone kernel: ret_from_fork+0x22/0x30
Oct 29 01:31:46 Serverone kernel: </TASK>
Oct 29 01:31:46 Serverone kernel: Modules linked in: vhost_net tun vhost tap kvm_amd ccp kvm md_mod xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc mlx4_en zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) edac_mce_amd edac_core intel_rapl_msr intel_rapl_common iosf_mbi zcommon(PO) znvpair(PO) spl(O) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd mxm_wmi wmi_bmof asus_wmi_sensors mpt3sas rapl nvme mlx4_core raid_class i2c_piix4 ahci scsi_transport_sas i2c_core nvme_core k10temp libahci wmi button acpi_cpufreq unix [last unloaded: md_mod]
Oct 29 01:31:46 Serverone kernel: ---[ end trace 0000000000000000 ]---
Oct 29 01:31:46 Serverone kernel: RIP: 0010:mlx4_enter_error_state+0x24b/0x2c6 [mlx4_core]
Oct 29 01:31:46 Serverone kernel: Code: 74 2a 48 8b 03 48 c7 c6 15 30 37 a0 48 8b 38 48 81 c7 d0 00 00 00 e8 b7 df 51 e1 48 8b 03 48 8b 00 83 b8 c8 00 00 00 01 75 1b <0f> 0b 48 8b 03 48 c7 c6 28 30 37 a0 48 8b 38 48 81 c7 d0 00 00 00

 

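As you can see, the crash comes from the Mellanox network card: right after a corrected PCIe error (BadTLP), the mlx4_core driver reports an internal error, fails to reset the HCA, and then hits a kernel BUG in catas.c.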

syslog-192.168.0.150.log
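
If it helps anyone skip straight to the relevant lines in the attached file, here is a minimal Python sketch that filters a syslog for the kinds of messages shown in the excerpt. The pattern list and the function name `find_error_lines` are my own assumptions based on this one log, not an Unraid tool and not a complete list of fatal kernel messages.

```python
import re

# Patterns copied from the messages in the excerpt above.
# Assumption: these are enough to spot this particular failure;
# other hardware problems will log different text.
PATTERNS = [
    re.compile(r"AER:"),
    re.compile(r"PCIe Bus Error"),
    re.compile(
        r"mlx4_core .*(Internal error detected"
        r"|Failed to obtain HW semaphore"
        r"|Fail to reset HCA)"
    ),
    re.compile(r"kernel BUG at"),
]


def find_error_lines(lines):
    """Return (line_number, text) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path):
    """Scan a syslog file on disk; undecodable bytes are replaced."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_error_lines(handle)
```

Calling `scan_file("syslog-192.168.0.150.log")` should return the line numbers of the AER, mlx4_core and kernel BUG entries, so you can open the file at that point instead of scrolling through it.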
