[Plugin] Nvidia-Driver


ich777


6 hours ago, ConnerVT said:

Though not your fault, I do get spammed by the nVidia driver bug that seems to have been around for quite some time:

Yes, I know. There is a workaround for this, but I have to search for it. This happens because the kernel watchdog is now enabled in 6.9.1, so it displays this "Warning"; it's nothing to worry about. You can try uninstalling the GPU Statistics Plugin, which should reduce the "Warnings" to just a few...

 

EDIT: Here is the post from @b3rs3rk on how to suppress this message (I've never tried it, but I think it should work):

 

6 hours ago, ConnerVT said:

My biggest complaint is that I needed to buy a Quadro P400 to put in the server, to get the fullest out of hw transcoding.  ich777 - you owe me $120.  *grin*

Where can I send you the money... :D

 

6 hours ago, Mr_Jay84 said:

So from what I can gather, the Nvidia plugin captures the GPUs and won't allow them to be used by VMs.

Not sure which post to answer first... :D

Have you turned persistence mode on?

That shouldn't be a problem, because the Nvidia plugin doesn't capture the cards; it only does so if you turn on persistence mode.

 

Theoretically you can use a single card for both Docker containers and a VM, but not at the same time, and if you are not careful this can lead to hard system lockups like the ones you described in your post. I never recommend using a single card for both Docker and VMs.

 

6 hours ago, Mr_Jay84 said:

Or is there a way to keep the spare cards on VFIO and run a script to put them in P8 mode?

I think you are asking because your cards don't go idle when you aren't using them for anything, or am I wrong?

Some cards do go idle even when they aren't used for anything, but that depends on the card model and how the manufacturer implemented it in the card's BIOS.

 

5 hours ago, Mr_Jay84 said:

Will that work if the cards are using VFIO though?

If I haven't got them enabled with VFIO, the drivers will set the cards to the correct power state; however, this means I can't use them with the VMs, as I get this error...

I think what you can try is this: bind the card for the VM to VFIO, assign it to the VM and enable autostart for the VM. Then write a script for the User Scripts plugin that runs on every reboot, waits about 2 or 3 minutes (so that the VM is fully started) and then shuts the VM down again with the command 'virsh shutdown YOURVMNAME' (something like this should work just fine). If that is what you want to achieve, the card should then be in P8 (this also depends on how the manufacturer implemented it in the card's BIOS).
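A minimal sketch of such a User Scripts script, assuming it is scheduled "At Startup of Array" and that the VM is set to autostart. The function name `shutdown_vm_after_delay` and the `DRY_RUN` switch are hypothetical, added only so the logic can be tried without a real VM; `virsh shutdown` is the command from the post above. Whether the card actually ends up in P8 afterwards still depends on its BIOS.

```shell
#!/bin/bash
# Sketch for the User Scripts plugin (assumed schedule: "At Startup of Array").
# Idea from the post: let the autostarted VM boot with the VFIO-bound card,
# then shut the VM down again, which (depending on the card) may leave it in P8.

# Hypothetical helper: wait DELAY seconds, then ask libvirt to shut down VM.
# With DRY_RUN=1 it only prints the command it would run.
shutdown_vm_after_delay() {
  local vm="$1"
  local delay="${2:-180}"   # 2-3 minutes, as suggested above
  if [ -z "$vm" ]; then
    echo "usage: shutdown_vm_after_delay VMNAME [SECONDS]" >&2
    return 2
  fi
  sleep "$delay"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "virsh shutdown $vm"
  else
    virsh shutdown "$vm"
  fi
}

# In the real script, call it with your VM's name, e.g.:
# shutdown_vm_after_delay "YOURVMNAME" 180
```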

 

Hope this helps and is what you wanted to know...

 


Ah right I see, so if persistence mode is on the card will be unavailable to use with a VM?

 

If persistence mode is off will the cards throttle back to P8 power state once the docker/VM is finished with it?

 

 

When the cards are in VFIO mode I'm not sure which power state they're in; I don't think there's any way to check via the CLI. The only way I can think of to measure it is the power draw from the PSU. I have a Corsair PSU that lets me read the power draw. They don't appear to be using any more power even when the VMs are turned off. The system itself is in the attic, so I'd need to go up and listen to the fans to double-check.

28 minutes ago, Mr_Jay84 said:

Ah right I see, so if persistence mode is on the card will be unavailable to use with a VM?

I think so, because from what I know the driver "captures" the card, so it isn't available for the VM. I also think it may be possible to set persistence mode for only one or two cards, but I may be wrong about that (I personally use persistence mode; I will look this up).

 

EDIT: @Mr_Jay84 I think it is possible to set it only for specific cards: Click

nvidia-smi -i <target gpu> -pm ENABLED
    Enabled persistence mode for GPU <target gpu>.
    All done.

The <target gpu> should be the HW ID I think (something like 0000:01:00.0).
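A hedged sketch of the per-card approach (the helper names are made up for illustration): `--query-gpu`, `--format=csv` and `-i ... -pm 1` are standard nvidia-smi options, and `-i` accepts a GPU index, UUID or PCI bus ID. Cards that are bound to VFIO are not visible to the Nvidia driver, so they will not appear in the list at all.

```shell
#!/bin/bash
# Sketch: enable persistence mode only on selected GPUs, leaving the others
# free to be bound to VFIO for VMs.

# Print index, PCI bus ID, name and persistence mode of every GPU the Nvidia
# driver currently sees (cards bound to VFIO will not show up here).
list_gpus() {
  nvidia-smi --query-gpu=index,pci.bus_id,name,persistence_mode --format=csv,noheader
}

# Enable persistence mode for each GPU given as an argument
# (index, UUID or PCI bus ID, e.g. the bus ID exactly as list_gpus prints it).
enable_persistence() {
  local gpu
  for gpu in "$@"; do
    nvidia-smi -i "$gpu" -pm 1 || return 1
  done
}

# Example (bus ID is a placeholder, adjust to your transcoding card):
# enable_persistence 00000000:01:00.0
```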

 

28 minutes ago, Mr_Jay84 said:

If persistence mode is off will the cards throttle back to P8 power state once the docker/VM is finished with it?

As said above, this depends on the card itself and how the manufacturer implemented it in the card's BIOS.

 

28 minutes ago, Mr_Jay84 said:

When the cards are in VFIO mode I'm not sure which power state they're in; I don't think there's any way to check via the CLI. The only way I can think of to measure it is the power draw from the PSU. I have a Corsair PSU that lets me read the power draw. They don't appear to be using any more power even when the VMs are turned off. The system itself is in the attic, so I'd need to go up and listen to the fans to double-check.

Exactly, that's the only way to tell whether they are in P8, since you can't easily get the power state when they are bound to VFIO.

Which power state they end up in after you shut down the VM also depends on the card.


Hi Guys,

 

If I have vfio-pci.ids=10de:1c03,10de:10f1 in my syslinux config, will this plugin work? Am I able to have Dockers using the gfx as well as a VM using it? I presume not, but I wanted to check whether this is why nvidia-smi wasn't loading.

6 minutes ago, Draco said:

vfio-pci.ids=10de:1c03,10de:10f1

What are these device IDs?

EDIT: If these are the card and its audio controller, then no; see the answer below for why. Also, there is a new way of adding devices to VFIO.

 

6 minutes ago, Draco said:

Am I able to have Dockers using the gfx as well as a VM using it? I presume not, but I wanted to check whether this is why nvidia-smi wasn't loading.

Yes, this is possible but not recommended (I never recommend doing it like this), since you can hard lock up the server.

If you want to use it for both containers and VMs, don't bind it to VFIO, since Unraid can't see the GPU when it's bound to VFIO (also, there is a new way to bind devices to VFIO: Tools -> System Devices).
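To see which driver a card is actually bound to (and therefore whether Unraid and nvidia-smi can see it), `lspci -nnk -d 10de:` lists every Nvidia device with its vendor:device ID and the "Kernel driver in use". The function below is a small sketch that reads the same information from sysfs for a single device; the name `pci_driver` and the `SYSFS_ROOT` override (there only so it can be tried against a fake directory) are assumptions, and the PCI addresses are examples.

```shell
#!/bin/bash
# Print the kernel driver a PCI device is bound to, e.g. "vfio-pci" when it is
# reserved for a VM or "nvidia" when the Nvidia driver owns it.
# Prints "none" if no driver is bound. SYSFS_ROOT defaults to /sys and only
# exists so the function can be exercised against a fake directory tree.
pci_driver() {
  local dev="$1"   # full PCI address, e.g. 0000:01:00.0
  local link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$dev/driver"
  if [ -L "$link" ]; then
    # The "driver" entry is a symlink into .../drivers/<name>.
    basename "$(readlink "$link")"
  else
    echo none
  fi
}

# Example: check the card and its audio function before using them in Docker.
# for dev in 0000:01:00.0 0000:01:00.1; do echo "$dev $(pci_driver "$dev")"; done
```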

Thanks for the speedy reply. I thought this was why it wasn't showing up; I'll give it a whirl (remove this config) and report back.


Is a Quadro 600 okay?

VGA compatible controller: NVIDIA Corporation GF108GL [Quadro 600] (rev a1)
I'll only be doing 1 or 2 streams at most.
If so, is there any way to load its driver? It requires the NVIDIA 390.xx legacy drivers.

Thanks for the reply, and for directing me to the GPU Statistics plug-in post. I had seen that (along with dozens of others) while researching this. I know it's harmless, but it's just annoying to see hundreds of lines of spam when looking through the logs.

 

I did add nvidia-smi --persistence-mode=1 to my array startup script. So far it has suppressed any more of those log entries. Plex has been working fine: it ramps up to P0 on a transcode and drops back to P8 when it ends (on the Quadro P400). On the very first launch the HW transcode didn't start, and the Plex client (my phone) just sat waiting for it. After that, it has worked 100%.
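To confirm the behaviour described above (P0 while transcoding, back to P8 afterwards), `nvidia-smi --query-gpu=index,pstate --format=csv,noheader` prints one "index, pstate" line per GPU. The helper below is a hypothetical sketch that filters that output and lists the GPUs that are not currently in P8; treating P8 as "idle" is an assumption that holds for cards like the P400 here but, as noted earlier in the thread, depends on the card.

```shell
#!/bin/bash
# Read "index, pstate" lines, as printed by
#   nvidia-smi --query-gpu=index,pstate --format=csv,noheader
# and print the index of every GPU that is not in P8 (assumed idle state).
gpus_not_idle() {
  local idx state
  while IFS=', ' read -r idx state || [ -n "$idx" ]; do
    [ -n "$idx" ] || continue          # skip blank lines
    if [ "$state" != "P8" ]; then
      echo "$idx"
    fi
  done
}

# Usage on the server:
# nvidia-smi --query-gpu=index,pstate --format=csv,noheader | gpus_not_idle
```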

 

If things start acting up, I'll get more aggressive with my changes, and try some things from the post you quoted.  Baby steps has always been my server strategy.

 

15 hours ago, ich777 said:
21 hours ago, ConnerVT said:

My biggest complaint is that I needed to buy a Quadro P400 to put in the server, to get the fullest out of hw transcoding.  ich777 - you owe me $120.  *grin*

Where can I send you the money...

 

Don't worry.  I'd probably just put it towards another drive.  :)

4 hours ago, winterkid310 said:

Is Quadro 600 okay?

VGA compatible controller: NVIDIA Corporation GF108GL [Quadro 600] (rev a1)
I'll only be doing 1 or 2 streams at most.
If so, is there any way to load its driver? It requires the NVIDIA 390.xx legacy drivers.

No, that card is too old and won't work within a container anyway.

ich777, right, I think the best route for me is to leave the single transcoding card on the driver in persistence mode, so that it throttles back to P8 when not in use by Emby, and to leave the spare cards on VFIO with the VMs. At least this way I know they throttle their power properly.


Do you guys happen to know if I can have two Nvidia Dockers, one running Shinobi Face detection and another NVENC transcoding on the same P2000 GPU at the same time?

 

Edit:

I found this online:

Separate from the CUDA cores, NVENC/NVDEC run encoding or decoding workloads without slowing the execution of graphics or CUDA workloads running at the same time.

 

But when I start one process (Docker container), it kicks the other out.

4 hours ago, Jagadguru said:

Do you guys happen to know if I can have two Nvidia Dockers, one running Shinobi Face detection and another NVENC transcoding on the same P2000 GPU at the same time?

Theoretically it should work just fine, since I do the same in my Nvidia-Debian-Buster container, although that's a little different because I'm doing 3D rendering and HW encoding in a single container with Steam Link.

 

I also tested this with Plex and Jellyfin, starting one transcode in the Plex container and one in the Jellyfin container at the same time, and both worked flawlessly.
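One way to check whether two containers really hold the encoder at the same time is `nvidia-smi pmon -s u -c 1` (where the card supports it), which prints one sample of per-process utilisation including an "enc" column. The awk-based helper below is a sketch with a hypothetical name; it finds the "enc" column by its header name, since the set of columns can differ between driver versions, and prints the PID and the last word of the process name for every process with non-zero encoder usage.

```shell
#!/bin/bash
# Filter "nvidia-smi pmon -s u -c 1" output and print "PID COMMAND" for each
# process with non-zero encoder (enc) utilisation. The enc column is located
# from the "# gpu ..." header line; "-" means no usage and is skipped.
encoder_users() {
  awk '
    /^# *gpu/ {
      # Header fields are shifted by one because of the leading "#".
      for (i = 2; i <= NF; i++) if ($i == "enc") col = i - 1
      next
    }
    /^#/ { next }                       # skip the units line
    col && $col != "-" && $col + 0 > 0 { print $2, $NF }
  '
}

# Usage while both containers are transcoding:
# nvidia-smi pmon -s u -c 1 | encoder_users
```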

 

4 hours ago, Jagadguru said:

I found this online:

Separate from the CUDA cores, NVENC/NVDEC run encoding or decoding workloads without slowing the execution of graphics or CUDA workloads running at the same time.

Good information, can you give me the source for this?

I am using your Debian-Nvidia-buster container to run OBS, streaming and encoding 24/7. It works great and is much lighter than a virtual machine (I have tried both). The quote I posted comes from https://developer.nvidia.com/blog/nvidia-ffmpeg-transcoding-guide/


New Nvidia drivers v460.67 were already built automatically!

 

To install the newer drivers, reboot your server.

 

Make sure that you have an active Internet connection at boot and that the version on the plugin page is set to 'latest', so the plugin can detect that a newer version is available and grab the drivers on boot (keep in mind that the boot process may take a little longer, depending on your Internet connection, because the plugin has to download the new driver).

 

 

If you have no active Internet connection on boot (because you run PiHole or your firewall on unRAID), go to the plugin page, make sure that the driver is set to 'latest' and click the 'Download' button (this opens a window and downloads the driver immediately; DON'T CLOSE THIS WINDOW UNTIL THE 'DONE' BUTTON IS DISPLAYED). After the download has finished, reboot your server to install the new driver.

 


Morning @ich777, happy Friday :-). Here's another minor UI change I would like you to consider for this plugin:-

 

Issue:- 

Red notification shown even though no reboot is required.

 

Steps to replicate issue:-

1. Switch the 'Select Preferred Driver' radio button from 'latest' to the 'Nvidia Driver Version' you are running (as shown on the left), click Apply, and note that you get the red notification popup suggesting a reboot is required, even though there are no changes to be made.

 

Expected behaviour:-

No red notification message is shown, as 'Nvidia Driver Version' matches 'Select Preferred Driver'.

 

Suggested fix:-

Compare the loaded version with the selected version; if they are equal, do not show the notification on apply.
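The suggested fix boils down to a single comparison. A hypothetical sketch (not the plugin's actual code; the function name and argument order are made up): when 'latest' is selected, the target is the newest available version, otherwise it is the explicitly selected one, and the reboot notice is needed only when the target differs from the installed driver.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) only if a reboot notice should be shown.
#   $1 installed driver version, e.g. 460.56
#   $2 selected preferred driver: "latest" or a version number
#   $3 newest version available, used when "latest" is selected
needs_reboot_notice() {
  local installed="$1" selected="$2" latest="$3" target
  if [ "$selected" = "latest" ]; then
    target="$latest"
  else
    target="$selected"
  fi
  [ "$installed" != "$target" ]
}

# The case from the report: selecting the running version shows no notice.
if needs_reboot_notice 460.56 460.56 460.67; then
  echo "reboot required"
else
  echo "no notice"
fi
```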

2 minutes ago, binhex said:

Steps to replicate issue:-

1. Switch the 'Select Preferred Driver' radio button from 'latest' to the 'Nvidia Driver Version' you are running (as shown on the left), click Apply, and note that you get the red notification popup suggesting a reboot is required, even though there are no changes to be made.

I will look into this next week. :)

2 hours ago, binhex said:

No red notification message is shown, as 'Nvidia Driver Version' matches 'Select Preferred Driver'.

Please update the plugin to version 2021.03.19 and this should be fixed (found some spare time lying around... :D).

 

No extra message was added, because it should be obvious that the selected driver matches the currently installed one (and also so as not to bother users with messages). The restart message is only displayed if the selected driver doesn't match the currently installed driver; I hope that's fine for you too.

Nice! Works for me.


I had been using Unraid 6.8.3 with the old Unraid-Nvidia plugin until a few hours ago; this morning I upgraded to Unraid 6.9.1 and this plugin (thanks for doing it!).

I'm having a weird problem; hopefully you can see what I'm doing wrong. Plex detects my GPU and seems to do some transcoding with it, but the CPU is heavily loaded while the GPU is under a low workload. This wasn't the case before: CPU usage rarely went above 10% while transcoding. Here are a few screenshots that I took:

[Five screenshots, taken 2021-03-19]


I have tried both the official and the LinuxServer.io Plex Docker images, and both show the same problem, so I'm coming here in case you have some ideas about what could be happening.

8 minutes ago, s0b said:

I have tried both the official and the LinuxServer.io Plex Docker images, and both show the same problem, so I'm coming here in case you have some ideas about what could be happening.

Can you try a file without subtitles, or turn them off? From what I know, subtitles and Plex can be a little bit weird...

 

Also, from what I see, transcoding is working perfectly fine with your card; please try a few other files and report back.

18 minutes ago, ich777 said:

Also, from what I see, transcoding is working perfectly fine with your card; please try a few other files and report back.


I've been playing a bit with files and configs, and it seems that disabling the "Enable HDR tone mapping" option in the Plex transcoder settings reduces the CPU load a lot. I'll keep chasing this issue on the Docker side, as the drivers seem to be working as expected.

Thanks again for this plugin and your support!


Hi all and @ich777,

I upgraded a few days ago to 6.9.1 and also to the latest version of this great plugin without any issue. On first reboot it upgraded the nvidia drivers to 460.56, and the download at boot was reasonably fast afair.

I'm currently rebooting the NAS for another reason (the rsyslogd fix in this version), and as I've set the plugin to the "latest" driver, it's currently downloading the 460.67 drivers (118 MB), which is the expected behavior. My issue is that the download on boot runs at a ridiculously slow speed, around 100 KB/s on average, which means around 17 minutes for the full download! As a consequence, the boot and outage time is significantly longer than expected.

Just so I understand, where is the package downloaded from? I think @limetech, who are now supporting this plugin as part of the official Unraid setup, should provide a decent download repo with download speeds similar to what we get when updating the OS (AWS iirc).

Thanks in advance for your thoughts and support.

1 hour ago, Gnomuz said:

As a consequence, the boot and outage time is significantly longer than expected.

Then your connection to GitHub is really slow at the moment (I have also sometimes had problems with Deutsche Telekom when downloading from GitHub and anything hosted on AWS).

EDIT: Btw, GitHub is also hosted on AWS.

 

You can always check the plugin page and download it from there with the Download button.

 

1 hour ago, Gnomuz said:

Thanks in advance for your thoughts and support.

You always have the option to choose a specific driver instead of 'latest' to prevent updates on boot.

 

EDIT2: I get around 9MB/s downloading the file.

Thanks for the reply, as usual 😉

Strange that I got such a slow DL speed from France (ISP Free). As I've said, I didn't have this issue when downloading v460.56 a few days ago.

If I understand correctly, your advice for avoiding this possible problem during the reboot, when a new Nvidia Linux driver is published, would be to:

- download the latest driver package from the plugin page

- set the driver version to this downloaded version rather than "latest" on the plugin page

- reboot (without downloading, as the latest driver is already there)

If that's correct, I'm fine with that method; it gives more control over the driver version. Anyway, Nvidia doesn't publish drivers every other day, and I'm not always willing to install them on day one!

Thanks again.

9 hours ago, Gnomuz said:

when a new Nvidia linux driver is published

Exactly. Set the driver to your current version instead of 'latest'. When a new driver is released, go to the plugin page, set the driver to the newer version and click the Download button. This opens nearly the same window as during the plugin installation and downloads the driver (you can also click the Download button to verify the MD5 sum, but that is also done once the download has finished). After that you can reboot, and the newer driver will be installed.
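The plugin already verifies the MD5 sum for you, as described above. If you ever want to check a downloaded package by hand, a minimal sketch with `md5sum` follows; the function name is hypothetical, and where the package is stored and where the expected checksum comes from are not stated in this thread, so the path in the usage line is a placeholder.

```shell
#!/bin/bash
# Succeed (exit 0) if FILE's MD5 sum equals EXPECTED (case-insensitive).
verify_md5() {
  local file="$1" expected="$2" actual
  actual="$(md5sum "$file" 2>/dev/null | cut -d' ' -f1)"
  # Empty output means md5sum could not read the file.
  [ -n "$actual" ] && [ "$actual" = "${expected,,}" ]
}

# Usage (placeholder path and checksum):
# verify_md5 /path/to/driver-package EXPECTEDMD5 && echo "MD5 OK"
```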

 

I sometimes have problems with a few ISPs here in Europe when downloading from AWS and GitHub, since my Docker containers also download their files from GitHub. I think that has something to do with the traffic volume the ISP has bought from overseas (at least I read about that somewhere; I'm not too familiar with how this works exactly).
