[Plugin] Linuxserver.io - Unraid Nvidia



4 hours ago, cjjermont said:

According to their driver compatibility list, it shows the M2000.

[2 screenshots attached]

No idea how I came to that conclusion then.

Post a screenshot of the Nvidia plugin page and the output of lspci -k.

Edited by saarg
2 hours ago, saarg said:

No idea how I came to that conclusion then.

Post a screenshot of the Nvidia plugin page and the output of lspci -k.

[3 screenshots attached]

I get two different warnings. If I put it in vfio I get a different warning. The last screenshot is just the system stats, so you know.


I tried installing this yesterday, and after the reboot my OS does not start all the way. I am able to log in with ssh, but not through the GUI. I see these logs in syslog, and if I reboot per the message the result is the same. It does start all the way in safe mode.

Jun 30 20:13:20 Tower root: plugin: installing: /boot/config/plugins/Unraid-Nvidia.plg
Jun 30 20:13:20 Tower root: plugin: running: anonymous
Jun 30 20:13:20 Tower root:
Jun 30 20:13:20 Tower root:
Jun 30 20:13:20 Tower root:
Jun 30 20:13:20 Tower root:
Jun 30 20:13:20 Tower root: plugin: running: anonymous
Jun 30 20:13:20 Tower root: plugin: skipping: /boot/config/plugins/Unraid-Nvidia/Unraid-Nvidia-2019.06.23.txz already exists
Jun 30 20:13:20 Tower root: plugin: running: /boot/config/plugins/Unraid-Nvidia/Unraid-Nvidia-2019.06.23.txz
Jun 30 20:13:20 Tower root:
Jun 30 20:13:20 Tower root: +==============================================================================
Jun 30 20:13:20 Tower root: | Installing new package /boot/config/plugins/Unraid-Nvidia/Unraid-Nvidia-2019.06.23.txz
Jun 30 20:13:20 Tower root: +==============================================================================
Jun 30 20:13:20 Tower root:
Jun 30 20:13:20 Tower root: Verifying package Unraid-Nvidia-2019.06.23.txz.
Jun 30 20:13:20 Tower root: Installing package Unraid-Nvidia-2019.06.23.txz:
Jun 30 20:13:20 Tower root: PACKAGE DESCRIPTION:
Jun 30 20:13:20 Tower root: Package Unraid-Nvidia-2019.06.23.txz installed.
Jun 30 20:13:20 Tower root: plugin: running: anonymous
Jun 30 20:13:21 Tower kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Jun 30 20:13:21 Tower kernel: #PF: supervisor write access in kernel mode
Jun 30 20:13:21 Tower kernel: #PF: error_code(0x0002) - not-present page
Jun 30 20:13:21 Tower kernel: PGD 80000013f5291067 P4D 80000013f5291067 PUD 1426ff2067 PMD 0
Jun 30 20:13:21 Tower kernel: Oops: 0002 [#1] SMP PTI
Jun 30 20:13:21 Tower kernel: CPU: 20 PID: 5398 Comm: nvidia-smi Tainted: P           O      5.7.2-Unraid #1
Jun 30 20:13:21 Tower kernel: Hardware name: Cisco Systems Inc UCSC-C220-M4S/UCSC-C220-M4S, BIOS C220M4.3.0.4g.0.1113190807 11/13/2019
Jun 30 20:13:21 Tower kernel: RIP: 0010:_nv025250rm+0x8/0x40 [nvidia]
Jun 30 20:13:21 Tower kernel: Code: 1f 00 41 8b 4d 08 41 39 0a 4c 89 d6 0f 82 5b fe ff ff e9 69 fe ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 08 48 8b 42 48 <83> 00 01 c6 42 10 01 e8 5c f4 ff ff 85 c0 74 13 89 c2 be 00 10 57
Jun 30 20:13:21 Tower kernel: RSP: 0018:ffffc90020e97a28 EFLAGS: 00010296
Jun 30 20:13:21 Tower kernel: RAX: 0000000000000000 RBX: 000000000000001c RCX: ffff8893eed22e48
Jun 30 20:13:21 Tower kernel: RDX: ffff8893f6963408 RSI: ffff8893e809e008 RDI: ffff8893eed64008
Jun 30 20:13:21 Tower kernel: RBP: ffff8893eed22e40 R08: ffffffffa0a60930 R09: ffff8893eed229ec
Jun 30 20:13:21 Tower kernel: R10: 0000000000001516 R11: 0000000000000000 R12: ffff8893eed64008
Jun 30 20:13:21 Tower kernel: R13: ffff8893f68d4008 R14: ffff8893eed64008 R15: ffff8893f33e0008
Jun 30 20:13:21 Tower kernel: FS:  000014bad9eb3b80(0000) GS:ffff88942fd00000(0000) knlGS:0000000000000000
Jun 30 20:13:21 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 30 20:13:21 Tower kernel: CR2: 0000000000000000 CR3: 00000013f536c001 CR4: 00000000003606e0
Jun 30 20:13:21 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 30 20:13:21 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 30 20:13:21 Tower kernel: Call Trace:
Jun 30 20:13:21 Tower kernel: ? _nv025251rm+0x260/0x260 [nvidia]
Jun 30 20:13:21 Tower kernel: ? _nv031459rm+0x7a/0xb0 [nvidia]
Jun 30 20:13:21 Tower kernel: ? _nv031799rm+0x6ec/0x2440 [nvidia]
Jun 30 20:13:21 Tower kernel: ? _nv021294rm+0xbb/0x1a0 [nvidia]
Jun 30 20:13:21 Tower kernel: ? _nv021542rm+0x27/0x50 [nvidia]
Jun 30 20:13:21 Tower kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
Jun 30 20:13:21 Tower kernel: ? _nv000901rm+0x1200/0x1cc0 [nvidia]
Jun 30 20:13:21 Tower kernel: ? rm_init_adapter+0xd5/0xe0 [nvidia]
Jun 30 20:13:21 Tower kernel: ? nv_open_device+0x434/0x648 [nvidia]
Jun 30 20:13:21 Tower kernel: ? nvidia_open+0x2a1/0x41a [nvidia]
Jun 30 20:13:21 Tower kernel: ? nvidia_frontend_open+0x62/0x8d [nvidia]
Jun 30 20:13:21 Tower kernel: ? chrdev_open+0x150/0x187
Jun 30 20:13:21 Tower kernel: ? cdev_put+0x19/0x19
Jun 30 20:13:21 Tower kernel: ? do_dentry_open+0x181/0x296
Jun 30 20:13:21 Tower kernel: ? path_openat+0x85a/0x933
Jun 30 20:13:21 Tower kernel: ? do_filp_open+0x4c/0xa9
Jun 30 20:13:21 Tower kernel: ? up_write+0x17/0x24
Jun 30 20:13:21 Tower kernel: ? chown_common.isra.0+0xec/0x14d
Jun 30 20:13:21 Tower kernel: ? _cond_resched+0x1b/0x1e
Jun 30 20:13:21 Tower kernel: ? slab_pre_alloc_hook+0x2c/0x53
Jun 30 20:13:21 Tower kernel: ? do_sys_openat2+0x6d/0xd9
Jun 30 20:13:21 Tower kernel: ? do_sys_open+0x35/0x4f
Jun 30 20:13:21 Tower kernel: ? do_syscall_64+0x7a/0x87
Jun 30 20:13:21 Tower kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 30 20:13:21 Tower kernel: Modules linked in: iptable_nat xt_MASQUERADE nf_nat ip_tables wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic bonding ixgbe mdio igb i2c_algo_bit nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) crc32_pclmul intel_rapl_perf intel_uncore aesni_intel glue_helper crypto_simd ghash_clmulni_intel cryptd kvm_intel kvm drm_kms_helper intel_cstate coretemp mxm_wmi drm crct10dif_pclmul intel_powerclamp crc32c_intel sb_edac backlight syscopyarea sysfillrect sysimgblt fb_sys_fops x86_pkg_temp_thermal agpgart ipmi_si ahci input_leds megaraid_sas libahci ipmi_ssif i2c_core led_class wmi acpi_power_meter button acpi_pad [last unloaded: mdio]
Jun 30 20:13:21 Tower kernel: CR2: 0000000000000000
Jun 30 20:13:21 Tower kernel: ---[ end trace 2ead729f5369cb81 ]---
Jun 30 20:13:21 Tower kernel: RIP: 0010:_nv025250rm+0x8/0x40 [nvidia]
Jun 30 20:13:21 Tower kernel: Code: 1f 00 41 8b 4d 08 41 39 0a 4c 89 d6 0f 82 5b fe ff ff e9 69 fe ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 08 48 8b 42 48 <83> 00 01 c6 42 10 01 e8 5c f4 ff ff 85 c0 74 13 89 c2 be 00 10 57
Jun 30 20:13:21 Tower kernel: RSP: 0018:ffffc90020e97a28 EFLAGS: 00010296
Jun 30 20:13:21 Tower kernel: RAX: 0000000000000000 RBX: 000000000000001c RCX: ffff8893eed22e48
Jun 30 20:13:21 Tower kernel: RDX: ffff8893f6963408 RSI: ffff8893e809e008 RDI: ffff8893eed64008
Jun 30 20:13:21 Tower kernel: RBP: ffff8893eed22e40 R08: ffffffffa0a60930 R09: ffff8893eed229ec
Jun 30 20:13:21 Tower kernel: R10: 0000000000001516 R11: 0000000000000000 R12: ffff8893eed64008
Jun 30 20:13:21 Tower kernel: R13: ffff8893f68d4008 R14: ffff8893eed64008 R15: ffff8893f33e0008
Jun 30 20:13:21 Tower kernel: FS:  000014bad9eb3b80(0000) GS:ffff88942fd00000(0000) knlGS:0000000000000000
Jun 30 20:13:21 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 30 20:13:21 Tower kernel: CR2: 0000000000000000 CR3: 00000013f536c001 CR4: 00000000003606e0
Jun 30 20:13:21 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 30 20:13:21 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 30 20:13:21 Tower kernel: general protection fault, probably for non-canonical address 0x61ad8614c5ac1b00: 0000 [#2] SMP PTI
Jun 30 20:13:21 Tower kernel: CPU: 20 PID: 5398 Comm: nvidia-smi Tainted: P      D    O      5.7.2-Unraid #1
Jun 30 20:13:21 Tower kernel: Hardware name: Cisco Systems Inc UCSC-C220-M4S/UCSC-C220-M4S, BIOS C220M4.3.0.4g.0.1113190807 11/13/2019
Jun 30 20:13:21 Tower kernel: RIP: 0010:_nv007414rm+0x2c/0x330 [nvidia]
Jun 30 20:13:21 Tower kernel: Code: 48 85 d2 74 07 48 63 47 08 48 01 d0 48 8b 17 48 85 d2 75 16 e9 9d 02 00 00 0f 1f 44 00 00 48 8b 4a 10 48 85 c9 74 17 48 89 ca <48> 39 32 77 ef 0f 83 29 02 00 00 48 8b 4a 18 48 85 c9 75 e9 48 89
Jun 30 20:13:21 Tower kernel: RSP: 0018:ffffc90020e97d40 EFLAGS: 00010006
Jun 30 20:13:21 Tower kernel: RAX: ffffc90020e97dc8 RBX: ffffc90020e97d70 RCX: 61ad8614c5ac1b00
Jun 30 20:13:21 Tower kernel: RDX: 61ad8614c5ac1b00 RSI: 0000000000001516 RDI: ffffffffa177a3d8
Jun 30 20:13:21 Tower kernel: RBP: ffff8893f2c22ff0 R08: 0000000000000001 R09: ffffffffa0588903
Jun 30 20:13:21 Tower kernel: R10: ffff889428430a00 R11: ffff889428430a00 R12: 675f65736e6f7073
Jun 30 20:13:21 Tower kernel: R13: ffff889428433000 R14: ffffffffa1778c20 R15: ffff889428433000
Jun 30 20:13:21 Tower kernel: FS:  0000000000000000(0000) GS:ffff88942fd00000(0000) knlGS:0000000000000000
Jun 30 20:13:21 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 30 20:13:21 Tower kernel: CR2: 0000000000000000 CR3: 000000000200a002 CR4: 00000000003606e0
Jun 30 20:13:21 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 30 20:13:21 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 30 20:13:21 Tower kernel: Call Trace:
Jun 30 20:13:21 Tower kernel: ? _nv036791rm+0xf1/0x1d0 [nvidia]
Jun 30 20:13:21 Tower kernel: ? rm_free_unused_clients+0x41/0xe0 [nvidia]
Jun 30 20:13:21 Tower kernel: ? _raw_spin_lock_irqsave+0x3a/0x66
Jun 30 20:13:21 Tower kernel: ? nvidia_close+0xf3/0x25b [nvidia]
Jun 30 20:13:21 Tower kernel: ? nvidia_frontend_close+0x2c/0x3e [nvidia]
Jun 30 20:13:21 Tower kernel: ? __fput+0x107/0x1d0
Jun 30 20:13:21 Tower kernel: ? task_work_run+0x70/0x81
Jun 30 20:13:21 Tower kernel: ? do_exit+0x3f8/0x8f3
Jun 30 20:13:21 Tower kernel: ? rewind_stack_do_exit+0x17/0x20
Jun 30 20:13:21 Tower kernel: Modules linked in: iptable_nat xt_MASQUERADE nf_nat ip_tables wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic bonding ixgbe mdio igb i2c_algo_bit nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) crc32_pclmul intel_rapl_perf intel_uncore aesni_intel glue_helper crypto_simd ghash_clmulni_intel cryptd kvm_intel kvm drm_kms_helper intel_cstate coretemp mxm_wmi drm crct10dif_pclmul intel_powerclamp crc32c_intel sb_edac backlight syscopyarea sysfillrect sysimgblt fb_sys_fops x86_pkg_temp_thermal agpgart ipmi_si ahci input_leds megaraid_sas libahci ipmi_ssif i2c_core led_class wmi acpi_power_meter button acpi_pad [last unloaded: mdio]
Jun 30 20:13:21 Tower kernel: ---[ end trace 2ead729f5369cb82 ]---
Jun 30 20:13:21 Tower kernel: RIP: 0010:_nv025250rm+0x8/0x40 [nvidia]
Jun 30 20:13:21 Tower kernel: Code: 1f 00 41 8b 4d 08 41 39 0a 4c 89 d6 0f 82 5b fe ff ff e9 69 fe ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 08 48 8b 42 48 <83> 00 01 c6 42 10 01 e8 5c f4 ff ff 85 c0 74 13 89 c2 be 00 10 57
Jun 30 20:13:21 Tower kernel: RSP: 0018:ffffc90020e97a28 EFLAGS: 00010296
Jun 30 20:13:21 Tower kernel: RAX: 0000000000000000 RBX: 000000000000001c RCX: ffff8893eed22e48
Jun 30 20:13:21 Tower kernel: RDX: ffff8893f6963408 RSI: ffff8893e809e008 RDI: ffff8893eed64008
Jun 30 20:13:21 Tower kernel: RBP: ffff8893eed22e40 R08: ffffffffa0a60930 R09: ffff8893eed229ec
Jun 30 20:13:21 Tower kernel: R10: 0000000000001516 R11: 0000000000000000 R12: ffff8893eed64008
Jun 30 20:13:21 Tower kernel: R13: ffff8893f68d4008 R14: ffff8893eed64008 R15: ffff8893f33e0008
Jun 30 20:13:21 Tower kernel: FS:  0000000000000000(0000) GS:ffff88942fd00000(0000) knlGS:0000000000000000
Jun 30 20:13:21 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 30 20:13:21 Tower kernel: CR2: 0000000000000000 CR3: 000000000200a002 CR4: 00000000003606e0
Jun 30 20:13:21 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 30 20:13:21 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 30 20:13:21 Tower kernel: Fixing recursive fault but reboot is needed!
Jun 30 20:13:42 Tower rsyslogd: action 'action-3-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2002.0 try https://www.rsyslog.com/e/2359 ]

root@Tower:~# lspci -v | grep VGA
07:00.0 VGA compatible controller: NVIDIA Corporation GK106GL [Quadro K4000] (rev a1) (prog-if 00 [VGA controller])

 

Wondering if anyone has any suggestions on where to start looking for an answer? I have tried my best google-fu and failed before posting here.
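A long trace like the one above is easier to discuss once it is boiled down to the fault headlines, the faulting symbol and module, and the process that triggered it. Here is a minimal sketch of that, assuming the syslog text is available as a string; the function name and the regular expressions are my own and only cover the line shapes that appear in this log.

```python
import re

# Fault headlines as they appear in the log above.
FAULT_RE = re.compile(r"kernel: (BUG: .*|general protection fault.*|Oops: .*)")
# e.g. "RIP: 0010:_nv025250rm+0x8/0x40 [nvidia]" -> symbol, module
RIP_RE = re.compile(r"kernel: RIP: \w+:(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")
# e.g. "Comm: nvidia-smi"
COMM_RE = re.compile(r"Comm: (\S+)")


def summarize_faults(syslog_text):
    """Collect unique fault headlines, RIP locations and process names."""
    faults, rips, processes = [], [], []
    for line in syslog_text.splitlines():
        m = FAULT_RE.search(line)
        if m and m.group(1) not in faults:
            faults.append(m.group(1))
        m = RIP_RE.search(line)
        if m and (m.group(1), m.group(2)) not in rips:
            rips.append((m.group(1), m.group(2)))
        m = COMM_RE.search(line)
        if m and m.group(1) not in processes:
            processes.append(m.group(1))
    return {"faults": faults, "rip": rips, "processes": processes}


# Four lines copied from the log above.
SAMPLE = """\
Jun 30 20:13:21 Tower kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Jun 30 20:13:21 Tower kernel: Oops: 0002 [#1] SMP PTI
Jun 30 20:13:21 Tower kernel: CPU: 20 PID: 5398 Comm: nvidia-smi Tainted: P           O      5.7.2-Unraid #1
Jun 30 20:13:21 Tower kernel: RIP: 0010:_nv025250rm+0x8/0x40 [nvidia]
"""

summary = summarize_faults(SAMPLE)
print(summary)
```

On the sample lines this reports that nvidia-smi hit the fault inside the nvidia module, which is the detail worth putting at the top of a support post.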

Edited by jwiener3
11 minutes ago, jwiener3 said:

unable to log in with ssh, but not through the GUI.

Unable for one way BUT not for the other way. I suspect you meant one of these was working since you said BUT. 

 

Did you mean to say you are able to use ssh?


Question: I have an HP Gen8 DL380e. I need the kernel patch from around here, compiled without the RMRR check, to be able to pass through devices.

Does this Nvidia Unraid build have the RMRR check disabled? Or is it incompatible with the RMRR check workaround?

Thanks!

38 minutes ago, ChoKoBo said:

Question: I have an HP Gen8 DL380e. I need the kernel patch from around here, compiled without the RMRR check, to be able to pass through devices.

Does this Nvidia Unraid build have the RMRR check disabled? Or is it incompatible with the RMRR check workaround?

Thanks!

There aren't any extra patches in this build, only the Nvidia drivers.

12 minutes ago, cjjermont said:


How would I do that?



The same way you added it.

If you have chosen the GPU in a VM you have to choose another GPU to remove the stubbing. Then reboot.

The same way you added it.
If you have chosen the GPU in a VM you have to choose another GPU to remove the stubbing. Then reboot.

I have no VMs. I've never used it as anything other than the only GPU. It's a used card; do I need to boot a Windows system and remove all drivers from it? I only want it for Plex transcoding.


13 hours ago, jwiener3 said:

Yes, sorry, typo there. I AM able to log in with ssh, not the GUI.

Then there might be something wrong with the bzroot...

Try to go back to the stock one and try it again.

 

EDIT: Sorry, I should be a bit more specific. Put your USB thumb drive into your computer, download Unraid version 6.8.3 from the download page, and replace the bzroot/bzmodules/bzfirmware/bzimage files. After that, put it back in your server and reboot. Another method is to log in to your server via SFTP (this should still be possible if you can connect through SSH); a tool for that would be WinSCP if you are on Windows. Then go to your boot directory and replace the files mentioned above.
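Before or after swapping the files, it can help to confirm which of them actually differ from the stock release. This is only a rough sketch, assuming the stock 6.8.3 download has been extracted to one directory and the flash drive is mounted at another; both paths are placeholders, and the kernel image is assumed to be named bzimage.

```python
import hashlib
from pathlib import Path

# Boot files named in the post above (kernel image assumed to be "bzimage").
BOOT_FILES = ("bzimage", "bzroot", "bzmodules", "bzfirmware")


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large images are not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_boot_files(stock_dir, flash_dir, names=BOOT_FILES):
    """Return 'match', 'differs' or 'missing' for each boot file."""
    result = {}
    for name in names:
        stock = Path(stock_dir) / name
        flash = Path(flash_dir) / name
        if not stock.is_file() or not flash.is_file():
            result[name] = "missing"
        elif sha256_of(stock) == sha256_of(flash):
            result[name] = "match"
        else:
            result[name] = "differs"
    return result


# Example call (hypothetical paths, adjust to your own system):
# print(compare_boot_files("/tmp/unraid-6.8.3-stock", "/boot"))
```

Any file reported as "differs" is one that the Nvidia build replaced and that still needs to be copied back from the stock download.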

7 hours ago, cjjermont said:


I have no VMs. I've never used it as anything other than the only GPU. It's a used card; do I need to boot a Windows system and remove all drivers from it? I only want it for Plex transcoding.



You have done something to stub the card, either by editing the syslinux.cfg or by using the vfio plugin. You said in an earlier post that you put it in vfio, so reverse what you did.
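One quick way to see whether the syslinux.cfg is doing the stubbing is to look for the vfio-pci.ids= or pci-stub.ids= kernel parameters on its append lines. Below is a small sketch of that check. It takes the file contents as a string; the function name is my own, and the device IDs in the sample are made up for illustration (10de is the Nvidia vendor ID).

```python
import re

# Kernel parameters that bind devices to a stub driver at boot.
STUB_PARAM_RE = re.compile(r"\b(vfio-pci\.ids|pci-stub\.ids)=(\S+)")


def find_stub_params(syslinux_text):
    """Return (parameter, [vendor:device, ...]) pairs found on append lines."""
    found = []
    for line in syslinux_text.splitlines():
        stripped = line.strip()
        if not stripped.lower().startswith("append"):
            continue
        for param, ids in STUB_PARAM_RE.findall(stripped):
            found.append((param, ids.split(",")))
    return found


# Hypothetical syslinux.cfg fragment; the device IDs are invented.
SAMPLE = """\
label Unraid OS
  menu default
  kernel /bzimage
  append vfio-pci.ids=10de:1430,10de:0fba initrd=/bzroot
"""

print(find_stub_params(SAMPLE))
```

If this finds the GPU's ID on an append line, removing that parameter and rebooting is the "reverse what you did" step. An empty result points at the vfio plugin's own config file instead.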

9 hours ago, ich777 said:

Then there might be something wrong with the bzroot...

Try to go back to the stock one and try it again.

 

EDIT: Sorry, I should be a bit more specific. Put your USB thumb drive into your computer, download Unraid version 6.8.3 from the download page, and replace the bzroot/bzmodules/bzfirmware/bzimage files. After that, put it back in your server and reboot. Another method is to log in to your server via SFTP (this should still be possible if you can connect through SSH); a tool for that would be WinSCP if you are on Windows. Then go to your boot directory and replace the files mentioned above.

Thank you, I did recover by going back to the standard files. I tried again with the NVIDIA drivers and had the same results. I put it back to the standard Unraid version again and then tried a third time with the latest NVIDIA 6.9.0(22), with the same results. I have reverted to standard 6.8.3, but does anyone have any suggestions on getting this to work so I can use my NVIDIA card? I am hoping to use it for transcoding in my Plex Docker container.

33 minutes ago, jwiener3 said:

Thank you, I did recover by going back to the standard files. I tried again with the NVIDIA drivers and had the same results. I put it back to the standard Unraid version again and then tried a third time with the latest NVIDIA 6.9.0(22), with the same results. I have reverted to standard 6.8.3, but does anyone have any suggestions on getting this to work so I can use my NVIDIA card? I am hoping to use it for transcoding in my Plex Docker container.

Just out of curiosity, try doing a fresh install of Unraid on a spare USB drive. Unplug your original Unraid USB and put in the spare one.

(Don't start the array or assign any drives; if you assign them incorrectly you could lose data.)

Then install Unraid Nvidia.

 

Now see if that boots okay. If it does, you know that something on your original Unraid install is causing the issue.

That would also help rule out anything hardware related.

Edited by Solverz
6 hours ago, saarg said:

You have done something to stub the card, either by editing the syslinux.cfg or by using the vfio plugin. You said in an earlier post that you put it in vfio, so reverse what you did.

[screenshot attached]

Nothing is set; I only downloaded it to try it, but it still didn't work. The 660 shows up just fine and I can set it up for Plex, but the M2000 is what I want to use.

1 hour ago, cjjermont said:

[screenshot attached]

Nothing is set; I only downloaded it to try it, but it still didn't work. The 660 shows up just fine and I can set it up for Plex, but the M2000 is what I want to use.

What did you download?

 

Kernel driver in use: vfio

The above means that the device has loaded the dummy driver. This prevents the Nvidia driver from being loaded. If you don't use any VMs, the vfio module is not loaded, so there is something you have done.

Post the syslinux.cfg file (you can find it when you click Flash on the Main page) and vfio-pci.cfg (in /boot/config/).
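The "Kernel driver in use" check can also be done mechanically on saved lspci -k output. What follows is a sketch only: it assumes the usual layout, where each device line starts in column one and its detail lines are indented. The function names and the sample devices are invented for illustration.

```python
def drivers_in_use(lspci_k_text):
    """Map each device line of `lspci -k` output to its driver (or None)."""
    devices = {}
    current = None
    for line in lspci_k_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # A new device entry starts in the first column.
            current = line.strip()
            devices[current] = None
        elif current is not None and "Kernel driver in use:" in line:
            devices[current] = line.split(":", 1)[1].strip()
    return devices


def stubbed_nvidia_devices(lspci_k_text):
    """List Nvidia devices currently bound to a vfio driver."""
    return [
        device
        for device, driver in drivers_in_use(lspci_k_text).items()
        if "NVIDIA" in device and driver is not None and driver.startswith("vfio")
    ]


# Hypothetical output; the device names are placeholders, not real cards.
SAMPLE = (
    "01:00.0 VGA compatible controller: NVIDIA Corporation Example GPU A\n"
    "\tKernel driver in use: vfio-pci\n"
    "\tKernel modules: nvidia_drm, nvidia\n"
    "02:00.0 VGA compatible controller: NVIDIA Corporation Example GPU B\n"
    "\tKernel driver in use: nvidia\n"
)

print(stubbed_nvidia_devices(SAMPLE))
```

Any card listed by stubbed_nvidia_devices is still held by the dummy driver, so the Nvidia driver cannot claim it until the stubbing is removed and the server is rebooted.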

What did you download?
 
Kernel driver in use: vfio
The above means that the device has loaded the dummy driver. This prevents the Nvidia driver from being loaded. If you don't use any VMs, the vfio module is not loaded, so there is something you have done.
Post the syslinux.cfg file (you can find it when you click Flash on the Main page) and vfio-pci.cfg (in /boot/config/).

[image attached]

