[Plugin] Nvidia-Driver


ich777

Recommended Posts





Assuming that the thermals aren't an issue, should this be able to run multiple Docker containers pointing to the same GPU?


No problem at all if you want to use the card across multiple containers at the same time; I also use one card in Emby, Jellyfin & Plex because I have to test the drivers when a new one is released.

I can only think of a hardware issue or something similar. Have you tried to assign another card to the Plex container to see if it is working?
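For reference, sharing one card just means pointing every container at the same GPU UUID; a rough sketch using the standard NVIDIA container variables (the image names and the UUID below are placeholders — get the real UUID from `nvidia-smi -L`):

```shell
# List the installed GPUs and their UUIDs:
nvidia-smi -L

# Point several containers at the SAME UUID; the driver shares the card:
docker run -d --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES='GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  plexinc/pms-docker

docker run -d --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES='GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  jellyfin/jellyfin
```

On Unraid these two variables are what you add to each container's template rather than a raw `docker run`.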

Sent from my C64



Link to comment
3 minutes ago, ich777 said:

No problem at all if you want to use the card across multiple containers at the same time; I also use one card in Emby, Jellyfin & Plex because I have to test the drivers when a new one is released.

I can only think of a hardware issue or something similar. Have you tried to assign another card to the Plex container to see if it is working?

Sent from my C64

 

 

I just assigned Plex to the 3rd GPU (Titan) and I'll give that a shot right now.

  • Like 1
Link to comment
13 hours ago, VladoPortos said:

So I did not get a crash as before, maybe because I caught it first, but it seems network related; at least this was in the log:

 

Oct  9 17:25:13 PlexServer kernel: ------------[ cut here ]------------
Oct  9 17:25:13 PlexServer kernel: WARNING: CPU: 4 PID: 78054 at net/netfilter/nf_conntrack_core.c:1120 __nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]
Oct  9 17:25:13 PlexServer kernel: Modules linked in: xt_mark nvidia_uvm(PO) xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_nat xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle nf_tables vhost_net tun vhost vhost_iotlb tap macvlan xt_conntrack nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter xfs nfsd lockd grace sunrpc md_mod i915 video iosf_mbi i2c_algo_bit intel_gtt nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops nvidia(PO) drm backlight agpgart it87 hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables wmi_bmof mxm_wmi edac_mce_amd amd_energy kvm_amd btusb btrtl btbcm kvm btintel bluetooth crct10dif_pclmul crc32_pclmul crc32c_intel mpt3sas ghash_clmulni_intel aesni_intel crypto_simd
Oct  9 17:25:13 PlexServer kernel: ecdh_generic ecc cryptd nvme i2c_piix4 glue_helper atlantic ahci nvme_core libahci raid_class scsi_transport_sas i2c_core rapl ccp wmi k10temp thermal button acpi_cpufreq
Oct  9 17:25:13 PlexServer kernel: CPU: 4 PID: 78054 Comm: kworker/4:0 Tainted: P           O      5.10.28-Unraid #1
Oct  9 17:25:13 PlexServer kernel: Hardware name: Gigabyte Technology Co., Ltd. TRX40 AORUS MASTER/TRX40 AORUS MASTER, BIOS F5q 04/12/2021
Oct  9 17:25:13 PlexServer kernel: Workqueue: events macvlan_process_broadcast [macvlan]
Oct  9 17:25:13 PlexServer kernel: RIP: 0010:__nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]
Oct  9 17:25:13 PlexServer kernel: Code: e8 dc f8 ff ff 44 89 fa 89 c6 41 89 c4 48 c1 eb 20 89 df 41 89 de e8 36 f6 ff ff 84 c0 75 bb 48 8b 85 80 00 00 00 a8 08 74 18 <0f> 0b 89 df 44 89 e6 31 db e8 6d f3 ff ff e8 35 f5 ff ff e9 22 01
Oct  9 17:25:13 PlexServer kernel: RSP: 0018:ffffc9000037cdd8 EFLAGS: 00010202
Oct  9 17:25:13 PlexServer kernel: RAX: 0000000000000188 RBX: 000000000000d3ca RCX: 00000000fe806454
Oct  9 17:25:13 PlexServer kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa00dc128
Oct  9 17:25:13 PlexServer kernel: RBP: ffff888de1169900 R08: 00000000c01ab324 R09: ffff8883ef80aba0
Oct  9 17:25:13 PlexServer kernel: R10: 0000000000000098 R11: ffff8882e0cea100 R12: 0000000000001cca
Oct  9 17:25:13 PlexServer kernel: R13: ffffffff8210b440 R14: 000000000000d3ca R15: 0000000000000000
Oct  9 17:25:13 PlexServer kernel: FS:  0000000000000000(0000) GS:ffff88903d100000(0000) knlGS:0000000000000000
Oct  9 17:25:13 PlexServer kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct  9 17:25:13 PlexServer kernel: CR2: 0000000000527fc8 CR3: 00000001645d2000 CR4: 0000000000350ee0
Oct  9 17:25:13 PlexServer kernel: Call Trace:
Oct  9 17:25:13 PlexServer kernel: <IRQ>
Oct  9 17:25:13 PlexServer kernel: nf_conntrack_confirm+0x2f/0x36 [nf_conntrack]
Oct  9 17:25:13 PlexServer kernel: nf_hook_slow+0x39/0x8e
Oct  9 17:25:13 PlexServer kernel: nf_hook.constprop.0+0xb1/0xd8
Oct  9 17:25:13 PlexServer kernel: ? ip_protocol_deliver_rcu+0xfe/0xfe
Oct  9 17:25:13 PlexServer kernel: ip_local_deliver+0x49/0x75
Oct  9 17:25:13 PlexServer kernel: __netif_receive_skb_one_core+0x74/0x95
Oct  9 17:25:13 PlexServer kernel: process_backlog+0xa3/0x13b
Oct  9 17:25:13 PlexServer kernel: net_rx_action+0xf4/0x29d
Oct  9 17:25:13 PlexServer kernel: __do_softirq+0xc4/0x1c2
Oct  9 17:25:13 PlexServer kernel: asm_call_irq_on_stack+0x12/0x20
Oct  9 17:25:13 PlexServer kernel: </IRQ>
Oct  9 17:25:13 PlexServer kernel: do_softirq_own_stack+0x2c/0x39
Oct  9 17:25:13 PlexServer kernel: do_softirq+0x3a/0x44
Oct  9 17:25:13 PlexServer kernel: netif_rx_ni+0x1c/0x22
Oct  9 17:25:13 PlexServer kernel: macvlan_broadcast+0x10e/0x13c [macvlan]
Oct  9 17:25:13 PlexServer kernel: macvlan_process_broadcast+0xf8/0x143 [macvlan]
Oct  9 17:25:13 PlexServer kernel: process_one_work+0x13c/0x1d5
Oct  9 17:25:13 PlexServer kernel: worker_thread+0x18b/0x22f
Oct  9 17:25:13 PlexServer kernel: ? process_scheduled_works+0x27/0x27
Oct  9 17:25:13 PlexServer kernel: kthread+0xe5/0xea
Oct  9 17:25:13 PlexServer kernel: ? __kthread_bind_mask+0x57/0x57
Oct  9 17:25:13 PlexServer kernel: ret_from_fork+0x22/0x30
Oct  9 17:25:13 PlexServer kernel: ---[ end trace 33448d9f3f916301 ]---

 

So I disabled both NW interfaces in the BIOS, added an Intel Pro/1000 dual-port PCI card that had been working for me in another server (not Unraid) for 2 years without a hiccup... and another round of testing is a go... 

 

 

 

Probably my final update: it seems it was a macvlan driver issue for me and not Nvidia. I'm 14 hours in with two cards encoding videos, and apart from these messages I haven't had a crash or anything else strange in the log. 

 

Oct 10 08:54:32 PlexServer kernel: caller _nv000723rm+0x1ad/0x200 [nvidia] mapping multiple BARs
Oct 10 09:01:02 PlexServer kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Oct 10 09:01:02 PlexServer kernel: caller _nv000723rm+0x1ad/0x200 [nvidia] mapping multiple BARs
Oct 10 09:04:52 PlexServer kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Oct 10 09:04:52 PlexServer kernel: caller _nv000723rm+0x1ad/0x200 [nvidia] mapping multiple BARs
Oct 10 09:14:10 PlexServer kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Oct 10 09:14:10 PlexServer kernel: caller _nv000723rm+0x1ad/0x200 [nvidia] mapping multiple BARs

 

And as far as I know, there is nothing to be done about these... as they seem to be related to the BIOS...(?)
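For anyone wanting to check whether their own crashes carry the same signature, grepping the syslog for the two tell-tale symbols from the trace above is enough. A self-contained sketch against an inline excerpt (on a real server you would point it at /var/log/syslog instead):

```shell
# Stand-in for /var/log/syslog: a two-line excerpt carrying the crash signature.
log=/tmp/syslog_excerpt
cat > "$log" <<'EOF'
Oct  9 17:25:13 PlexServer kernel: WARNING: ... __nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]
Oct  9 17:25:13 PlexServer kernel: macvlan_broadcast+0x10e/0x13c [macvlan]
EOF

# Count the lines showing the macvlan/conntrack signature:
grep -cE 'nf_conntrack_confirm|macvlan_broadcast' "$log"   # prints 2
```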

Link to comment
On 10/9/2021 at 1:13 PM, ich777 said:

No problem at all if you want to use the card across multiple containers at the same time; I also use one card in Emby, Jellyfin & Plex because I have to test the drivers when a new one is released.

I can only think of a hardware issue or something similar. Have you tried to assign another card to the Plex container to see if it is working?

Sent from my C64


 

 

 

After some testing, I think it's the Phoenix Miner docker. No matter what config I use, whether it's 1, 2, or 3 GPUs, they will eventually crash. If I don't run the docker, ZFS still crashes the system, but not nearly as often.

Link to comment
1 hour ago, jonmerc said:

I have the driver installed and the settings in Plex set up, but it looks like it's still using the CPU to transcode. I even tried Tdarr and it's also still using the CPU. What could be my issue?

Have you followed the instructions on the first page for Plex (second post) and the instructions from the first post in this thread?

 

You haven't provided any further information: what card are you using, and did you reboot unRAID after the installation?

Please also post your Diagnostics.

Link to comment
7 minutes ago, TheSkaz said:

After some testing, I think it's the Phoenix Miner docker. No matter what config I use, whether it's 1, 2, or 3 GPUs, they will eventually crash. If I don't run the docker, ZFS still crashes the system, but not nearly as often.

Is Plex working just fine with the other card?

 

Does the server only crash if you have PhoenixMiner running? I would recommend that you create a post in the PhoenixMiner support thread; maybe @lnxd is able to help.

Link to comment

Hello! 


My Nvidia driver doesn't seem to stick around after a reboot anymore. I go to the Community Applications store, install the plugin, confirm it shows up in Settings > Nvidia Driver, then reboot to complete the install, and then it's gone from Settings > Nvidia Driver. Rinse and repeat. Any suggestions?

 

System Devices:

IOMMU group 28:

[10de:1c03] 42:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1)

[10de:10f1] 42:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)

 

Syslinux Config:

unRAID OS:

kernel /bzimage
append initrd=/bzroot

 

Settings.cfg

first_installation=true
driver_version=latest
local_version=465.31
disable_xconfig=false
update_check=false
 

Install Plugin results:

plugin: installing: https://github.com/ich777/unraid-nvidia-driver/raw/master/nvidia-driver.plg
plugin: downloading https://github.com/ich777/unraid-nvidia-driver/raw/master/nvidia-driver.plg
plugin: downloading: https://github.com/ich777/unraid-nvidia-driver/raw/master/nvidia-driver.plg ... done

+==============================================================================
| Installing new package /boot/config/plugins/nvidia-driver/nvidia-driver-2021.09.17.txz
+==============================================================================

Verifying package nvidia-driver-2021.09.17.txz.
Installing package nvidia-driver-2021.09.17.txz:
PACKAGE DESCRIPTION:
Package nvidia-driver-2021.09.17.txz installed.

 

Nvidia Driver:

nVidia Info:

Nvidia Driver Version:

Installed GPU(s):

 

System Info

Unraid Version: 6.9.2

Kernel: 5.10.28-Unraid

Compile Date: Wed Apr 7 08:23:18 PDT 2021

 

root@Tower:~# nvidia-smi
-bash: nvidia-smi: command not found
 

Link to comment
On 11/15/2020 at 9:22 PM, ich777 said:

If you only want to use your Nvidia graphics card for a VM then don't install this Plugin

Thanks for the great plugin and support thread.

 

So, a GPU that is bound to vfio and also used in a VM (shut down) won't appear in the plugin?

 

My idea here is to check the GPU power status, usage, fan speeds etc. while idle and before turning on the Windows VM.

Edited by Kydonakis
Link to comment
9 hours ago, Kydonakis said:

So, a GPU that is bound to vfio and also used in a VM (shut down) won't appear in the plugin?

 

Yes, this plugin is meant to use the Nvidia GPU for Unraid (Dockers, ...); vfio bind = out of sight for Unraid ...

 

9 hours ago, Kydonakis said:

My idea here is to check the GPU power status, usage, fan speeds etc. while idle and before turning on the Windows VM.

I have no idea why you would do so ... but I would actually rather leave a VM running instead of keeping it off (it uses less power than turning the VM off, even if some small CPU usage is needed). When you want some info while the VM is idle, take a look at the MSI Afterburner Prometheus plugin here; it pulls some nice info out for Grafana.

 

sample from here (my idle gaming VM)

[screenshot: Grafana dashboard of the idle gaming VM]

Link to comment
On 10/13/2021 at 12:02 AM, cyberd said:

My nvidia driver doesn't seem to stick around after a reboot anymore.

Please go to the Plugins tab and see if you got a tab that says "Plugin Errors".

Can you try to delete the plugin and reinstall it from the CA App and see if it is working again? Keep in mind that you have to restart the Docker service after doing that.

Link to comment
1 minute ago, Kydonakis said:

I see that the GPU fans are spinning, so there should be some load on the GPU. This is what I am investigating here!

 

There is no load on the GPU; there is just no fan control, so that's why the fans are running (wrong mode while off).

 

Keep a VM running and you're good to go.

Link to comment
3 minutes ago, Kydonakis said:

While the VM is shut down, I see that the GPU fans are spinning, so there should be some load on the GPU.

As @alturismo said, there is no load on the GPU. When you stop a VM the card is, in technical terms, reset (as if you rebooted or, better said, just started a computer), and depending on how the manufacturer has implemented the card's behaviour in its BIOS when no driver is loaded, the fans spin at 0%, 30%, 50%,... on some cards they even spin at 100%, as you can read a few posts above.

Link to comment

OK, I see. It is only the "reset status" of the GPU BIOS that causes the fans to rotate at some speed, and this comes with no extra load (power consumption). Thanks for the information.

 

In my case it is then a choice between a 24/7 half-speed fan spin (VM shut down) and the zero fan speed that comes with an extra 50-60W on the idle VM (CPU consumption and random Windows disk reads/writes).

 

 

Link to comment
17 minutes ago, Kydonakis said:

this comes with no extra load (power consumption).

This is also something that depends on the manufacturer's implementation: some cards go into a deep sleep (P8 and higher) and some stay in P0 (the highest power state, although that says nothing about how much power the card actually consumes).

You have to test this with a power meter connected to your server; it is completely dependent on the card.
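If you'd rather not guess, nvidia-smi can report the current P-state, power draw and fan speed directly once the driver is loaded (so this only works for cards that are not vfio-bound):

```shell
# Report the power state, draw, fan speed and temperature of all visible GPUs:
nvidia-smi --query-gpu=name,pstate,power.draw,fan.speed,temperature.gpu \
           --format=csv
```

Compare the reported P-state with what the power meter shows to see whether your card actually sleeps.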

Link to comment
2 hours ago, Kydonakis said:

and the zero fan speed that comes with an extra 50-60W on the idle VM (CPU consumption and random Windows disk reads/writes).

If you need an extra 50-60W for an idle VM, I assume something is wrong there.

 

In my case, my whole system needs ~10W less when the VM is idle than when it is turned off and the GPU is not in a low-power mode ...

 

And I'm at ~60-65W in total with 3 running VMs (1 VM x GT1030, 1 VM x RTX3070, 1 VM gvt-g iGPU) and some Docker(s) etc ...

 

[screenshot: total power consumption readout]

 

When I now turn my media VM (RTX 3070) off, my power consumption increases to 70-75W in total, so it is senseless to me to keep it off; I just have some small idle CPU consumption ... so for cooling it would be a little better, but in sum I lose by turning it off.

  • Like 1
Link to comment
18 hours ago, ich777 said:

Please go to the Plugins tab and see if you got a tab that says "Plugin Errors".

Can you try to delete the plugin and reinstall it from the CA App and see if it is working again? Keep in mind that you have to restart the Docker service after doing that.

No errors showing in the Plugins tab. The Nvidia plugin also doesn't show up on that tab, just under Settings.

 

There was no option to uninstall, so I deleted the folder from boot and tried that way. Same results... any other thoughts?

 

Link to comment
No errors showing in the Plugins tab. The Nvidia plugin also doesn't show up on that tab, just under Settings.
 
There was no option to uninstall, so I deleted the folder from boot and tried that way. Same results... any other thoughts?
 
The plugin only showed up under Settings and not on the Plugins page?
That shouldn't be possible.
What version of unRAID are you running?
Please post your Diagnostics.

Sent from my C64

Link to comment
15 minutes ago, cyberd said:

Yeah. It doesn't display in the plugins tab at all.

Please issue:

rm -rf /boot/config/plugins/nvidia-driver
rm -rf /boot/config/plugins/nvidia-driver.plg

 

After you did that please restart the server and try to pull the plugin again from the CA App.

 

Are you sure that you didn't close the plugin download window with the red 'X' in the upper corner instead of waiting for the "Done" button to appear?

 

One thing that I've noticed is that something seems to be wrong with your USB flash device from what I see here...

In the /boot/config folder there are files and folders that are usually located in the root of your USB flash device. What I've also noticed is that you've got database files in /boot/config; I don't think it's a good idea to put a database file on the USB flash device, since this can/will cause extensive writes and maybe even a failure of the device.

Link to comment
9 hours ago, ich777 said:

Please issue:

rm -rf /boot/config/plugins/nvidia-driver
rm -rf /boot/config/plugins/nvidia-driver.plg

 

After you did that please restart the server and try to pull the plugin again from the CA App.

 

Are you sure that you didn't close the plugin download window with the red 'X' in the upper corner instead of waiting for the "Done" button to appear?

 

One thing that I've noticed is that something seems to be wrong with your USB flash device from what I see here...

In the /boot/config folder there are files and folders that are usually located in the root of your USB flash device. What I've also noticed is that you've got database files in /boot/config; I don't think it's a good idea to put a database file on the USB flash device, since this can/will cause extensive writes and maybe even a failure of the device.

Good catch on the USB flash. I cleaned it up, rebooted, and re-installed the plugin, and all is well. It seems there was a duplicate of everything in the config from March, which was the last time I made big changes, so probably a bad copy command.

 

Thanks for the assistance!

  • Like 1
Link to comment
