[Support] ich777 - AMD Vendor Reset, CoralTPU, hpsahba,...



Heya,

 

I'm having trouble installing the RadeonTOP plugin with my AMD APU.

 

[1002:1636]05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c9)

 

Whenever I try to install it, this comes up:
 

plugin: downloading: radeontop-2023.02.22.txz ... done


+==============================================================================
| Installing new package /boot/config/plugins/radeontop/radeontop-2023.02.22.txz
+==============================================================================

Verifying package radeontop-2023.02.22.txz.
Installing package radeontop-2023.02.22.txz:
PACKAGE DESCRIPTION:
Package radeontop-2023.02.22.txz installed.

---------Enabling AMDGPU Kernel Module---------

------Something went wrong! Can't enable-------
----AMDGPU Kernel Module, removing package!----
Removing package: radeontop-2023.02.22
Removing files:
--> Deleting /usr/bin/radeontop
--> Deleting /usr/local/emhttp/plugins/radeontop/bin/radeontop
--> Deleting /usr/local/emhttp/plugins/radeontop/images/radeontop.png
--> Deleting /usr/local/emhttp/plugins/radeontop/lib/libdrm.so
--> Deleting /usr/local/emhttp/plugins/radeontop/lib/libdrm.so.2
--> Deleting /usr/local/emhttp/plugins/radeontop/lib/libdrm.so.2.4.0
--> Deleting /usr/local/emhttp/plugins/radeontop/lib/libdrm_amdgpu.so
--> Deleting /usr/local/emhttp/plugins/radeontop/lib/libdrm_amdgpu.so.1
--> Deleting /usr/local/emhttp/plugins/radeontop/lib/libdrm_amdgpu.so.1.0.0
--> Deleting /usr/share/libdrm/amdgpu.ids
--> Deleting empty directory /usr/share/libdrm/
--> Deleting empty directory /usr/local/emhttp/plugins/radeontop/lib/
--> Deleting empty directory /usr/local/emhttp/plugins/radeontop/images/
--> Deleting empty directory /usr/local/emhttp/plugins/radeontop/bin/
WARNING: Unique directory /usr/local/emhttp/plugins/radeontop/ contains new files
plugin: run failed: '/bin/bash' returned 1
Executing hook script: post_plugin_checks


I tried to enable the amdgpu kernel module and got this:

 

modprobe -v amdgpu
insmod /lib/modules/6.1.49-Unraid/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko.xz 
modprobe: ERROR: could not insert 'amdgpu': Invalid argument

 

Any idea how I can solve this? My diagnostics are attached as well. Thanks!

brotherman-diagnostics-20231126-2127.zip

14 minutes ago, courteous-ox7459 said:

Any idea how I can solve this? My diagnostics are attached as well. Thanks!

Your kernel module crashed while the plugin tried to enable it. However, this seems to be related to a MACVLAN issue; please solve that first:

Nov 26 20:49:30 Brotherman kernel: ------------[ cut here ]------------
Nov 26 20:49:30 Brotherman kernel: WARNING: CPU: 4 PID: 19190 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 26 20:49:30 Brotherman kernel: Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha tun bluetooth ecdh_generic ecc veth macvlan xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag ipmi_devintf nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs af_packet 8021q garp mrp bridge stp llc bonding tls edac_mce_amd edac_core intel_rapl_msr intel_rapl_common iosf_mbi gpu_sched drm_buddy kvm_amd i2c_algo_bit drm_ttm_helper ttm drm_display_helper drm_kms_helper kvm drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd wmi_bmof cryptd agpgart nvme rapl i2c_piix4 syscopyarea
Nov 26 20:49:30 Brotherman kernel: i2c_core r8169 nvme_core sysfillrect ccp k10temp joydev sysimgblt ahci fb_sys_fops realtek libahci tpm_crb tpm_tis video tpm_tis_core wmi tpm backlight acpi_cpufreq button unix
Nov 26 20:49:30 Brotherman kernel: CPU: 4 PID: 19190 Comm: kworker/u64:2 Tainted: P           O       6.1.49-Unraid #1
Nov 26 20:49:30 Brotherman kernel: Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P2.60 02/07/2023
Nov 26 20:49:30 Brotherman kernel: Workqueue: events_unbound macvlan_process_broadcast [macvlan]
Nov 26 20:49:30 Brotherman kernel: RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 26 20:49:30 Brotherman kernel: Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01
Nov 26 20:49:30 Brotherman kernel: RSP: 0018:ffffc900002acd98 EFLAGS: 00010202
Nov 26 20:49:30 Brotherman kernel: RAX: 0000000000000001 RBX: ffff88817b7fcf00 RCX: 614dcb50d5e59f96
Nov 26 20:49:30 Brotherman kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88817b7fcf00
Nov 26 20:49:30 Brotherman kernel: RBP: 0000000000000001 R08: 3807721c7df55c84 R09: 26b204f7336a83ad
Nov 26 20:49:30 Brotherman kernel: R10: 808473b5991f4ea0 R11: ffffc900002acd60 R12: ffffffff82a11d00
Nov 26 20:49:30 Brotherman kernel: R13: 000000000003f226 R14: ffff888015853500 R15: 0000000000000000
Nov 26 20:49:30 Brotherman kernel: FS:  0000000000000000(0000) GS:ffff88842e100000(0000) knlGS:0000000000000000
Nov 26 20:49:30 Brotherman kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 26 20:49:30 Brotherman kernel: CR2: 00007fffb1435084 CR3: 000000018251e000 CR4: 0000000000350ee0
Nov 26 20:49:30 Brotherman kernel: Call Trace:
Nov 26 20:49:30 Brotherman kernel: <IRQ>
Nov 26 20:49:30 Brotherman kernel: ? __warn+0xab/0x122
Nov 26 20:49:30 Brotherman kernel: ? report_bug+0x109/0x17e
Nov 26 20:49:30 Brotherman kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 26 20:49:30 Brotherman kernel: ? handle_bug+0x41/0x6f
Nov 26 20:49:30 Brotherman kernel: ? exc_invalid_op+0x13/0x60
Nov 26 20:49:30 Brotherman kernel: ? asm_exc_invalid_op+0x16/0x20
Nov 26 20:49:30 Brotherman kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 26 20:49:30 Brotherman kernel: ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]
Nov 26 20:49:30 Brotherman kernel: ? nf_nat_inet_fn+0x126/0x1a8 [nf_nat]
Nov 26 20:49:30 Brotherman kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
Nov 26 20:49:30 Brotherman kernel: nf_hook_slow+0x3d/0x96
Nov 26 20:49:30 Brotherman kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Nov 26 20:49:30 Brotherman kernel: NF_HOOK.constprop.0+0x79/0xd9
Nov 26 20:49:30 Brotherman kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Nov 26 20:49:30 Brotherman kernel: __netif_receive_skb_one_core+0x77/0x9c
Nov 26 20:49:30 Brotherman kernel: process_backlog+0x8c/0x116
Nov 26 20:49:30 Brotherman kernel: __napi_poll.constprop.0+0x2b/0x124
Nov 26 20:49:30 Brotherman kernel: net_rx_action+0x159/0x24f
Nov 26 20:49:30 Brotherman kernel: __do_softirq+0x129/0x288
Nov 26 20:49:30 Brotherman kernel: do_softirq+0x7f/0xab
Nov 26 20:49:30 Brotherman kernel: </IRQ>
Nov 26 20:49:30 Brotherman kernel: <TASK>
Nov 26 20:49:30 Brotherman kernel: __local_bh_enable_ip+0x4c/0x6b
Nov 26 20:49:30 Brotherman kernel: netif_rx+0x52/0x5a
Nov 26 20:49:30 Brotherman kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Nov 26 20:49:30 Brotherman kernel: ? _raw_spin_unlock+0x14/0x29
Nov 26 20:49:30 Brotherman kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Nov 26 20:49:30 Brotherman kernel: process_one_work+0x1ab/0x295
Nov 26 20:49:30 Brotherman kernel: worker_thread+0x18b/0x244
Nov 26 20:49:30 Brotherman kernel: ? rescuer_thread+0x281/0x281
Nov 26 20:49:30 Brotherman kernel: kthread+0xe7/0xef
Nov 26 20:49:30 Brotherman kernel: ? kthread_complete_and_exit+0x1b/0x1b
Nov 26 20:49:30 Brotherman kernel: ret_from_fork+0x22/0x30
Nov 26 20:49:30 Brotherman kernel: </TASK>
Nov 26 20:49:30 Brotherman kernel: ---[ end trace 0000000000000000 ]---

 

Please either disable bridging in your Network Settings (and make sure MACVLAN is selected in the Docker Settings if you need MACVLAN), or switch to IPVLAN in your Docker Settings.
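In the trace above, the `__nf_conntrack_confirm` warning is raised from the `macvlan_process_broadcast` workqueue. A minimal sketch for checking a syslog for that signature, assuming the live log is `/var/log/syslog` (or that you pass the syslog file from your Diagnostics instead). The function name is hypothetical, and it only checks that both strings appear somewhere in the file, so treat a hit as a hint rather than proof.

```shell
#!/bin/bash
# Sketch: look for the MACVLAN-related call trace signature in a syslog file.
# Assumption: the live syslog is /var/log/syslog; pass another file as $1.
# has_macvlan_trace is a hypothetical helper name.

has_macvlan_trace() {
  local log="${1:-/var/log/syslog}"
  # Both markers from the trace: the nf_conntrack warning and the macvlan broadcast worker
  grep -qF '__nf_conntrack_confirm' "$log" &&
    grep -qF 'macvlan_process_broadcast' "$log"
}

# Example:
# has_macvlan_trace && echo "MACVLAN trace found: consider IPVLAN or disabling bridging"
```

A heuristic like this cannot tell whether the two lines belong to the same trace; reading the surrounding log lines is still the reliable check.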

 

 

Please also remove this from your syslinux.cfg (click on the blue "Flash" link on your Main page):

nomodeset

since Unraid itself should enable the GPU first, not the plugin.
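To verify after a reboot that `nomodeset` is really gone, or to remove it from the terminal instead of the Flash page, here is a small sketch. It assumes the boot config edited via the Flash page is `/boot/syslinux/syslinux.cfg`; the function names are hypothetical, and `strip_nomodeset` keeps a `.bak` copy of the original file.

```shell
#!/bin/bash
# Sketch: detect and remove the "nomodeset" boot parameter.
# Assumption: the boot config is /boot/syslinux/syslinux.cfg (override with $1).
# has_nomodeset / strip_nomodeset are hypothetical helper names.

# Returns 0 if nomodeset appears as a standalone token
# (defaults to the running kernel's command line)
has_nomodeset() {
  grep -qE '(^|[[:space:]])nomodeset([[:space:]]|$)' "${1:-/proc/cmdline}"
}

# Removes standalone " nomodeset" tokens in place, keeping <file>.bak as a backup.
# Parameters such as "amdgpu.nomodeset=0" are not touched.
strip_nomodeset() {
  local cfg="${1:-/boot/syslinux/syslinux.cfg}"
  sed -i.bak -E 's/[[:space:]]+nomodeset([[:space:]]|$)/\1/g' "$cfg"
}
```

After rebooting, running `has_nomodeset` without arguments checks `/proc/cmdline`, i.e. the parameters the kernel actually booted with.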

On 11/26/2023 at 9:50 PM, ich777 said:

Your kernel module crashed while the plugin tried to enable it. However, this seems to be related to a MACVLAN issue; please solve that first:

Please either disable bridging in your Network Settings (and make sure MACVLAN is selected in the Docker Settings if you need MACVLAN), or switch to IPVLAN in your Docker Settings.

Please also remove this from your syslinux.cfg (click on the blue "Flash" link on your Main page):

since Unraid itself should enable the GPU first, not the plugin.

 


Thanks a lot, it has been working flawlessly since I made these changes.

  • 2 weeks later...
On 9/18/2022 at 3:28 PM, ericswpark said:

Finally managed to get the temperature to show up in the plugin. Turns out the "detect" button is broken and does not scan available drivers properly.

 

Following this comment: 

 

I had to create a `drivers.conf` file in `/boot/config/plugins/dynamix.system.temp` and add the following two lines:

 

it87
k10temp
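The same file can also be written from the Unraid terminal. A sketch assuming, as the two lines above suggest, that the plugin reads one module name per line from this file; the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: write drivers.conf for the Dynamix System Temp plugin.
# Directory and module names are taken from the post; pass another
# directory as $1 to write elsewhere. write_temp_drivers is hypothetical.

write_temp_drivers() {
  local dir="${1:-/boot/config/plugins/dynamix.system.temp}"
  mkdir -p "$dir"
  # One kernel module name per line
  printf '%s\n' it87 k10temp > "$dir/drivers.conf"
}
```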

 

Then once I went back to the temperature plugin settings I was able to select the CPU/MB temperature from the dropdown.

 

One thing to note – already mentioned in the linked comment but just to make sure – don't click on "Detect" or else it will wipe out your changes and you'll have to start over.

 

The commenter in the link had to do the `modprobe force_id` thing, but I didn't have to thanks to this plugin. You probably shouldn't need it if you have this it87 plugin installed.

This helped me a lot. Final result: done, after about 4 hours of searching :) Thanks again!

8 minutes ago, mikeyosm said:

Any chance we can get the Mellanox temperature into the main UNRAID dashboard instead of having to go to the plugin each time?

I'm not convinced this is necessary, since the NIC temperature on Mellanox cards is usually stable, and I currently have no plans to implement it.

23 minutes ago, Wizard_ said:

Does a 13th-gen Intel CPU work properly with this plugin?

Yes, but in your case it can't because your iGPU is blacklisted.

Please remove the file /boot/config/modprobe.d/i915.conf and reboot your system.

 

The plugin simply provides the intel_gpu_top application and makes sure that your iGPU is enabled (of course only when the iGPU is not blacklisted).
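A sketch for checking whether i915 is blacklisted through that file and moving it aside before the reboot. The function names are hypothetical. Renaming instead of deleting is a choice made here so the change is easy to undo; it relies on modprobe only reading files ending in `.conf` from its configuration directories. If in doubt, simply delete the file as suggested.

```shell
#!/bin/bash
# Sketch: detect an i915 blacklist in the Unraid modprobe.d folder and disable it.
# Default directory from the post; pass another directory as $1.
# i915_blacklisted / unblacklist_i915 are hypothetical helper names.

i915_blacklisted() {
  local dir="${1:-/boot/config/modprobe.d}"
  [ -f "$dir/i915.conf" ] &&
    grep -qE '^[[:space:]]*blacklist[[:space:]]+i915([[:space:]]|$)' "$dir/i915.conf"
}

unblacklist_i915() {
  local dir="${1:-/boot/config/modprobe.d}"
  # Rename rather than delete so the original can be restored later
  if [ -f "$dir/i915.conf" ]; then
    mv "$dir/i915.conf" "$dir/i915.conf.disabled"
  fi
}
```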

18 hours ago, ich777 said:

Yes, but in your case it can't because your iGPU is blacklisted.

Please remove the file /boot/config/modprobe.d/i915.conf and reboot your system.

 

The plugin simply provides the intel_gpu_top application and makes sure that your iGPU is enabled (of course only when the iGPU is not blacklisted).

Thanks for answering my question!

I removed i915.conf and rebooted the system, but nothing seems to have changed.

[screenshot]

By the way, I can't shut down the server normally (using the "Shutdown" button in the WebUI). It gets stuck somewhere and I have to press the power button manually.

wizard-server-diagnostics-20231216-1204.zip

2 hours ago, Wizard_ said:

I removed i915.conf and rebooted the system, but nothing seems to have changed.

You have a call trace in your syslog, which is most likely the cause of the issue:

Dec 16 11:57:06 Wizard-Server kernel: i915 0000:00:02.0: [drm] VT-d active for gfx access
Dec 16 11:57:06 Wizard-Server kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020
Dec 16 11:57:06 Wizard-Server kernel: #PF: supervisor read access in kernel mode
Dec 16 11:57:06 Wizard-Server kernel: #PF: error_code(0x0000) - not-present page
Dec 16 11:57:06 Wizard-Server kernel: PGD 105f4f067 P4D 105f4f067 PUD 108c01067 PMD 0 
Dec 16 11:57:06 Wizard-Server kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Dec 16 11:57:06 Wizard-Server kernel: CPU: 0 PID: 1148 Comm: udevd Tainted: P           O       6.1.64-Unraid #1
Dec 16 11:57:06 Wizard-Server kernel: Hardware name: Default string Default string/MS-WS W680 D4, BIOS H4.2G 11/26/2022
Dec 16 11:57:06 Wizard-Server kernel: RIP: 0010:klist_put+0x16/0x74
Dec 16 11:57:06 Wizard-Server kernel: Code: 03 00 31 c0 48 89 03 5b 89 e8 5d 41 5c 41 5d c3 cc cc cc cc 41 55 41 54 41 89 f4 55 53 48 8b 2f 48 89 fb 48 83 e5 fe 48 89 ef <4c> 8b 6d 20 e8 d2 9b 03 00 45 84 e4 74 10 48 8b 03 a8 01 74 02 0f
Dec 16 11:57:06 Wizard-Server kernel: RSP: 0018:ffffc9000103bab8 EFLAGS: 00010246
Dec 16 11:57:06 Wizard-Server kernel: RAX: ffff888135074b80 RBX: ffff888135074ba8 RCX: ffff888135074b80
Dec 16 11:57:06 Wizard-Server kernel: RDX: ffff888103c4b410 RSI: 0000000000000001 RDI: 0000000000000000
Dec 16 11:57:06 Wizard-Server kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff829513f0
Dec 16 11:57:06 Wizard-Server kernel: R10: 00003fffffffffff R11: fefefefefefefeff R12: 0000000000000001
Dec 16 11:57:06 Wizard-Server kernel: R13: ffff8881010cc000 R14: ffff888105d19b50 R15: ffff8881010cc0d0
Dec 16 11:57:06 Wizard-Server kernel: FS:  000014e9445d8240(0000) GS:ffff88903f400000(0000) knlGS:0000000000000000
Dec 16 11:57:06 Wizard-Server kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 16 11:57:06 Wizard-Server kernel: CR2: 0000000000000020 CR3: 000000010368e000 CR4: 0000000000750ef0
Dec 16 11:57:06 Wizard-Server kernel: PKRU: 55555554
Dec 16 11:57:06 Wizard-Server kernel: Call Trace:
Dec 16 11:57:06 Wizard-Server kernel: <TASK>
Dec 16 11:57:06 Wizard-Server kernel: ? __die_body+0x1a/0x5c
Dec 16 11:57:06 Wizard-Server kernel: ? page_fault_oops+0x329/0x376
Dec 16 11:57:06 Wizard-Server kernel: ? do_user_addr_fault+0x12e/0x48d
Dec 16 11:57:06 Wizard-Server kernel: ? exc_page_fault+0xfb/0x11d
Dec 16 11:57:06 Wizard-Server kernel: ? asm_exc_page_fault+0x22/0x30
Dec 16 11:57:06 Wizard-Server kernel: ? klist_put+0x16/0x74
Dec 16 11:57:06 Wizard-Server kernel: device_del+0xb6/0x31d
Dec 16 11:57:06 Wizard-Server kernel: ? i915_ggtt_probe_hw+0x593/0x5be [i915]
Dec 16 11:57:06 Wizard-Server kernel: platform_device_del+0x21/0x70
Dec 16 11:57:06 Wizard-Server kernel: platform_device_unregister+0xf/0x19
Dec 16 11:57:06 Wizard-Server kernel: sysfb_disable+0x2b/0x54
Dec 16 11:57:06 Wizard-Server kernel: aperture_remove_conflicting_pci_devices+0x1e/0x82
Dec 16 11:57:06 Wizard-Server kernel: i915_driver_probe+0x83f/0xc19 [i915]
Dec 16 11:57:06 Wizard-Server kernel: ? slab_free_freelist_hook.constprop.0+0x3b/0xaf
Dec 16 11:57:06 Wizard-Server kernel: local_pci_probe+0x3d/0x81
Dec 16 11:57:06 Wizard-Server kernel: pci_device_probe+0x197/0x1eb
Dec 16 11:57:06 Wizard-Server kernel: ? sysfs_do_create_link_sd+0x71/0xb7
Dec 16 11:57:06 Wizard-Server kernel: really_probe+0x115/0x282
Dec 16 11:57:06 Wizard-Server kernel: __driver_probe_device+0xc0/0xf2
Dec 16 11:57:06 Wizard-Server kernel: driver_probe_device+0x1f/0x77
Dec 16 11:57:06 Wizard-Server kernel: ? __device_attach_driver+0x97/0x97
Dec 16 11:57:06 Wizard-Server kernel: __driver_attach+0xd7/0xee
Dec 16 11:57:06 Wizard-Server kernel: ? __device_attach_driver+0x97/0x97
Dec 16 11:57:06 Wizard-Server kernel: bus_for_each_dev+0x6e/0xa7
Dec 16 11:57:06 Wizard-Server kernel: bus_add_driver+0xd8/0x1d0
Dec 16 11:57:06 Wizard-Server kernel: driver_register+0x99/0xd7
Dec 16 11:57:06 Wizard-Server kernel: i915_init+0x1f/0x7f [i915]
Dec 16 11:57:06 Wizard-Server kernel: ? 0xffffffffa2257000
Dec 16 11:57:06 Wizard-Server kernel: do_one_initcall+0x82/0x19f
Dec 16 11:57:06 Wizard-Server kernel: ? kmalloc_trace+0x43/0x52
Dec 16 11:57:06 Wizard-Server kernel: do_init_module+0x4b/0x1d4
Dec 16 11:57:06 Wizard-Server kernel: __do_sys_init_module+0xb6/0xf9
Dec 16 11:57:06 Wizard-Server kernel: do_syscall_64+0x68/0x81
Dec 16 11:57:06 Wizard-Server kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
Dec 16 11:57:06 Wizard-Server kernel: RIP: 0033:0x14e944aeadfa
Dec 16 11:57:06 Wizard-Server kernel: Code: 48 8b 0d 21 20 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ee 1f 0d 00 f7 d8 64 89 01 48
Dec 16 11:57:06 Wizard-Server kernel: RSP: 002b:00007ffe72d55f08 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
Dec 16 11:57:06 Wizard-Server kernel: RAX: ffffffffffffffda RBX: 0000000000468c70 RCX: 000014e944aeadfa
Dec 16 11:57:06 Wizard-Server kernel: RDX: 000014e944bdfaad RSI: 00000000004b1868 RDI: 000014e943cc0010
Dec 16 11:57:06 Wizard-Server kernel: RBP: 000014e944bdfaad R08: 0000000000000007 R09: 0000000000464e80
Dec 16 11:57:06 Wizard-Server kernel: R10: 0000000000000005 R11: 0000000000000246 R12: 000014e943cc0010
Dec 16 11:57:06 Wizard-Server kernel: R13: 0000000000000000 R14: 0000000000459c30 R15: 0000000000000000
Dec 16 11:57:06 Wizard-Server kernel: </TASK>
Dec 16 11:57:06 Wizard-Server kernel: Modules linked in: kvm_intel(+) znvpair(PO) i915(+) spl(O) kvm iosf_mbi drm_buddy i2c_algo_bit ttm crct10dif_pclmul crc32_pclmul drm_display_helper crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel drm_kms_helper mei_hdcp mei_pxp crypto_simd cryptd rapl intel_cstate wmi_bmof drm mpt3sas intel_uncore ahci mei_me intel_gtt i2c_i801 nvme agpgart raid_class i2c_smbus hid_apple input_leds syscopyarea r8125(O) i2c_core nvme_core scsi_transport_sas joydev mei libahci led_class sysfillrect thermal sysimgblt fb_sys_fops fan tpm_crb video tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_pad acpi_tad button unix
Dec 16 11:57:06 Wizard-Server kernel: CR2: 0000000000000020
Dec 16 11:57:06 Wizard-Server kernel: ---[ end trace 0000000000000000 ]---
Dec 16 11:57:06 Wizard-Server kernel: sdg: sdg1
Dec 16 11:57:06 Wizard-Server kernel: sd 2:0:4:0: [sdg] Attached SCSI disk
Dec 16 11:57:06 Wizard-Server kernel: RIP: 0010:klist_put+0x16/0x74
Dec 16 11:57:06 Wizard-Server kernel: Code: 03 00 31 c0 48 89 03 5b 89 e8 5d 41 5c 41 5d c3 cc cc cc cc 41 55 41 54 41 89 f4 55 53 48 8b 2f 48 89 fb 48 83 e5 fe 48 89 ef <4c> 8b 6d 20 e8 d2 9b 03 00 45 84 e4 74 10 48 8b 03 a8 01 74 02 0f
Dec 16 11:57:06 Wizard-Server kernel: RSP: 0018:ffffc9000103bab8 EFLAGS: 00010246
Dec 16 11:57:06 Wizard-Server kernel: RAX: ffff888135074b80 RBX: ffff888135074ba8 RCX: ffff888135074b80
Dec 16 11:57:06 Wizard-Server kernel: RDX: ffff888103c4b410 RSI: 0000000000000001 RDI: 0000000000000000
Dec 16 11:57:06 Wizard-Server kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff829513f0
Dec 16 11:57:06 Wizard-Server kernel: R10: 00003fffffffffff R11: fefefefefefefeff R12: 0000000000000001
Dec 16 11:57:06 Wizard-Server kernel: R13: ffff8881010cc000 R14: ffff888105d19b50 R15: ffff8881010cc0d0
Dec 16 11:57:06 Wizard-Server kernel: FS:  000014e9445d8240(0000) GS:ffff88903f400000(0000) knlGS:0000000000000000
Dec 16 11:57:06 Wizard-Server kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 16 11:57:06 Wizard-Server kernel: CR2: 0000000000000020 CR3: 000000010368e000 CR4: 0000000000750ef0
Dec 16 11:57:06 Wizard-Server kernel: PKRU: 55555554

 

Do you have a monitor, or at least an HDMI dummy plug, connected to your iGPU?

 

Please note that this is not related to my plugin.
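To check from the terminal whether i915 actually bound to the iGPU after a reboot, one option is to list the driver behind each DRM render node. A sketch assuming the standard sysfs layout (`/sys/class/drm/renderD*/device/driver`); the function name is hypothetical. Given the NULL pointer dereference during `i915_driver_probe` above, `i915` would most likely be missing from the output.

```shell
#!/bin/bash
# Sketch: print "<render node> <driver>" for every DRM render node.
# Assumption: standard sysfs layout; pass another root as $1 for testing.
# list_render_drivers is a hypothetical helper name.

list_render_drivers() {
  local drm="${1:-/sys/class/drm}"
  local node
  for node in "$drm"/renderD*; do
    # Skip the unexpanded glob and nodes without a bound driver
    [ -e "$node/device/driver" ] || continue
    # device/driver is a symlink to the driver directory, e.g. .../drivers/i915
    printf '%s %s\n' "${node##*/}" "$(basename "$(readlink -f "$node/device/driver")")"
  done
}
```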

4 hours ago, ich777 said:

Do you have a monitor, or at least an HDMI dummy plug, connected to your iGPU?

 

Please note that this is not related to my plugin.

Errr... no, but I didn't need anything like that when I was still on 6.11.5 with a 12400.

That's kind of weird.


Hi!

I'm trying to get my Mellanox ConnectX-4 Lx (MCX4121A-ACAT) working with Unraid. I started in the 'German' and 'General Support' sub-forums before I remembered that I had used your plugin to get my ConnectX-3 working.

Right now it seems that my ConnectX-4 NIC isn't supported:

 

Quote

mstconfig q

Device type:        ConnectX4LX         
Name:               MCX4121A-ACA_Ax     
Description:        ConnectX-4 Lx EN network interface card; 25GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
Device:             /sys/bus/pci/devices/0000:06:00.0/config

...

-E- Unsupported device

 

Is there a chance to get this card working with Unraid as well? What do I have to do?

Kind regards,

Tom

21 hours ago, ich777 said:

ConnectX-4 cards are known to work well with Unraid.

Without Diagnostics I can't say anything.

Hi ich777!

Thank you for your prompt reply! That's what I thought too, after reading about the ConnectX-4 cards and Unraid.

One thing that comes to mind: I just installed the new card and reassigned the first Ethernet port to the new NIC in Network Settings. Do I have to 'reset' the network configuration before assigning the NIC?

I've attached the diagnostics file.

 

 

1 hour ago, DerTom said:

Thank you for your prompt reply! That's what I thought too, after reading about the ConnectX-4 cards and Unraid.

I don't see why your card should not work:

04:00.0 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:1003]
	Subsystem: Mellanox Technologies ConnectX-3 10 GbE Single Port SFP+ Adapter [15b3:0055]
	Kernel driver in use: mlx4_core
	Kernel modules: mlx4_core
06:00.0 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
	Subsystem: Mellanox Technologies Stand-up ConnectX-4 Lx EN, 25GbE dual-port SFP28, PCIe3.0 x8, MCX4121A-ACAT [15b3:0003]
	Kernel driver in use: mlx5_core
	Kernel modules: mlx5_core
06:00.1 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
	Subsystem: Mellanox Technologies Stand-up ConnectX-4 Lx EN, 25GbE dual-port SFP28, PCIe3.0 x8, MCX4121A-ACAT [15b3:0003]
	Kernel driver in use: mlx5_core
	Kernel modules: mlx5_core

 

Both your ConnectX-3 and ConnectX-4 are detected and running.
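This can be read from the `Kernel driver in use:` lines of the `lspci` output. A small sketch that reduces `lspci -nnk`-style output to slot/driver pairs; it reads standard input, and the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: turn "lspci -nnk" style output into "<slot> <driver in use>" lines.
# Reads stdin, so it works on live output or a saved lspci file.
# drivers_in_use is a hypothetical helper name.

drivers_in_use() {
  awk '
    /^[0-9a-f]/             { slot = $1 }        # device header line, e.g. "06:00.0 Ethernet controller ..."
    /Kernel driver in use:/ { print slot, $NF }  # indented line belonging to the last header
  '
}

# Example: lspci -nnk | drivers_in_use
```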

 

1 hour ago, DerTom said:

Do I have to 'reset' the network configuration before assigning the NIC?

You can delete network.cfg and network-rules.cfg from /boot/config, reboot, and see if that changes anything (keep in mind that your server may get a different IP address and may no longer be reachable at its old one).
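A cautious way to do this from the terminal is to move the two files aside instead of deleting them, so they can be restored if the server becomes unreachable. A sketch; the backup location `/boot/network-backup-<timestamp>` is an arbitrary choice made here, not an Unraid convention, and the function name is hypothetical. Reboot afterwards as described above.

```shell
#!/bin/bash
# Sketch: move network.cfg and network-rules.cfg out of the config folder.
# $1 = config dir (default /boot/config), $2 = backup dir (hypothetical default).
# reset_network_config is a hypothetical helper name.

reset_network_config() {
  local cfg="${1:-/boot/config}"
  local backup="${2:-/boot/network-backup-$(date +%Y%m%d-%H%M%S)}"
  local f
  mkdir -p "$backup"
  for f in network.cfg network-rules.cfg; do
    # Move only files that exist; other config files stay untouched
    if [ -f "$cfg/$f" ]; then
      mv "$cfg/$f" "$backup/"
    fi
  done
  # Print where the originals went so they can be copied back
  echo "$backup"
}
```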

19 hours ago, ich777 said:

You can delete network.cfg and network-rules.cfg from /boot/config, reboot, and see if that changes anything (keep in mind that your server may get a different IP address and may no longer be reachable at its old one).

I had to delete the two network config files. Right now it seems to work.

Quote

 

Step 3: Verify Fan Reporting

Check to see if the fans are being correctly reported on your UnRAID dashboard. If it's correctly set up, you should see the fan speed under the hardware status. You might see two other fans that are unrelated. I'd just ignore them as they seem to be harmless.

 

 

There is no "hardware status" on my dashboard.

 

This is on a QNAP TS-EC1079 Pro.
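Before looking at the dashboard, it may help to check whether the kernel exposes any fan readings at all. A sketch that lists the `fan*_input` values from the standard hwmon sysfs interface (`/sys/class/hwmon`); the function name is hypothetical. If it prints nothing, most likely no loaded sensor driver is reporting fans on this board.

```shell
#!/bin/bash
# Sketch: list fan readings exposed through hwmon as "<chip> <fan> <value> RPM".
# Pass another sysfs root as $1 for testing. list_fans is a hypothetical name.

list_fans() {
  local root="${1:-/sys/class/hwmon}"
  local f chip
  for f in "$root"/hwmon*/fan*_input; do
    # Skip the unexpanded glob when no fan inputs exist
    [ -r "$f" ] || continue
    # The hwmon "name" file identifies the sensor driver, e.g. it87 or nct6775
    chip=$(cat "${f%/*}/name" 2>/dev/null || echo unknown)
    printf '%s %s %s RPM\n' "$chip" "$(basename "$f" _input)" "$(cat "$f")"
  done
}
```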

