[Plugin] Nvidia-Driver


ich777

Recommended Posts

19 minutes ago, raf802 said:

When I was getting the issue, their power state was P8.

You don't use the GPU in VMs, or am I wrong?

 

If you are only using the GPU for Docker containers, simply put this line in your go file and everything should work as it would with the script:

nvidia-persistenced

and of course reboot after that.
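
For reference, the go file would then end up looking roughly like this (just a sketch, assuming the stock Unraid layout of /boot/config/go):

#!/bin/bash
# Start the Management Utility (already present in the stock go file)
/usr/local/sbin/emhttp &
# Keep the Nvidia driver initialised so the idle card can drop to P8
nvidia-persistenced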

Link to comment
19 minutes ago, ich777 said:

You don't use the GPU in VMs, or am I wrong?

I have a second GPU set to be used in the VM. Nothing is bound to vfio on boot and these VMs are off. 

In short, no it is not used by VMs.

 

22 minutes ago, ich777 said:

If you are only using the GPU for Docker containers, simply put this line in your go file and everything should work as it would with the script:

nvidia-persistenced

and of course reboot after that.

Ok, I'll do that, thank you. 

  • Like 1
Link to comment
2 minutes ago, raf802 said:

I have a second GPU set to be used in the VM. Nothing is bound to vfio on boot and these VMs are off. 

Keep in mind that if you have nvidia-persistenced running when you try to start the VM, it is possible that the VM won't start and/or it could even crash your entire server.

I would recommend that you bind the card that you plan to use in a VM to VFIO, because then nvidia-persistenced will only act on the card that is not bound to VFIO.

 

This is then basically the same as what the script from SpaceInvader One does, since it also pulls the cards into P8, and they can of course still ramp up to whatever power state they need, but keep the VM caveat in mind.

 

As a workaround if you don't want to bind the GPU to VFIO you can also do something like this in your go file:

nvidia-persistenced
sleep 5
kill $(pidof nvidia-persistenced)

 

This will basically start nvidia-persistenced, wait 5 seconds and then kill it, so that the cards drop to P8 when you boot the server.

Of course there are more advanced ways to also pull the card into P8 again after you've ended a VM.
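
For example (not something covered here, just a hypothetical sketch assuming libvirt hooks are usable on your system), a qemu hook could re-run the same start/sleep/kill sequence whenever a VM releases the GPU:

#!/bin/bash
# /etc/libvirt/hooks/qemu - hypothetical example, adapt to your setup
# libvirt calls this with: $1 = VM name, $2 = operation (prepare/start/started/stopped/release)
if [ "$2" = "release" ]; then
    nvidia-persistenced                 # re-initialise the remaining card(s)
    sleep 5                             # give them a moment to settle into P8
    kill $(pidof nvidia-persistenced)   # end the daemon again so it can't block a later VM start
fi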

Link to comment
47 minutes ago, ich777 said:

Keep in mind that if you have nvidia-persistenced running when you try to start the VM, it is possible that the VM won't start and/or it could even crash your entire server.

I would recommend that you bind the card that you plan to use in a VM to VFIO, because then nvidia-persistenced will only act on the card that is not bound to VFIO.

 

This is then basically the same as what the script from SpaceInvader One does, since it also pulls the cards into P8, and they can of course still ramp up to whatever power state they need, but keep the VM caveat in mind.

 

As a workaround if you don't want to bind the GPU to VFIO you can also do something like this in your go file:

nvidia-persistenced
sleep 5
kill $(pidof nvidia-persistenced)

 

This will basically start nvidia-persistenced, wait 5 seconds and then kill it, so that the cards drop to P8 when you boot the server.

Of course there are more advanced ways to also pull the card into P8 again after you've ended a VM.

Thanks. 

I have bound the second card to vfio. Can't remember why it wasn't to begin with tbh.

 

Is there a way of putting the VM GPU into the low-power P8 state both when it's not in use (VM running) and when the VM is off?

 

1 hour ago, raf802 said:

I will upgrade the OS and drivers to see if the GPUs keep working.

I have also updated the OS to 6.10.3, which upgraded the GPU drivers to v515 automatically. They are working fine still and my issue is resolved. 

 

Thank you ich777!

  • Like 1
Link to comment
10 minutes ago, raf802 said:

I have bound the second card to vfio. Can't remember why it wasn't to begin with tbh.

I think it was because you were trying to save power when the VM wasn't running, maybe?

 

11 minutes ago, raf802 said:

I have also updated the OS to 6.10.3, which upgraded the GPU drivers to v515 automatically. They are working fine still and my issue is resolved. 

Exactly: because there is a newer driver available for the legacy cards and the plugin can't find your specified driver version, it falls back to the latest available one.

 

Glad to hear that everything is up and running again.

Link to comment
23 hours ago, ich777 said:

I think it was because you were trying to save power when the VM wasn't running, maybe?

 

Exactly: because there is a newer driver available for the legacy cards and the plugin can't find your specified driver version, it falls back to the latest available one.

 

Glad to hear that everything is up and running again.

 

I think I spoke too soon....

I noticed today that the issue has come back. Again, Plex was working fine and using the GPU for transcoding. I only noticed the issue because another Docker container that was supposed to use the GPU failed to build/run.

 

In the attached setup, I have "nvidia-persistenced" in the go file and no custom scripts running. 

 

Here is the diagnostics file. 

 

dailynas-diagnostics-20220616-1206.zip

Edited by raf802
extra info.
Link to comment
1 hour ago, raf802 said:

 

I think I spoke too soon....

I noticed today that the issue has come back. Again, Plex was working fine and using the GPU for transcoding. I only noticed the issue because another Docker container that was supposed to use the GPU failed to build/run.

 

In the attached setup, I have "nvidia-persistenced" in the go file and no custom scripts running. 

 

As a "start from scratch" approach. I removed the

nvidia-persistenced

command from the go file.

The Docker containers all start up correctly with the GPU. Of course, the GPU is in power state P0, and I am trying to save power.

 

Do you mind elaborating on the following code?

nvidia-persistenced
sleep 5
kill $(pidof nvidia-persistenced)

Would this also put the container GPU into the P8 power state? Can it also go into P0 when needed and back to P8?

 

Some observations I have noticed with power save off (no commands run):

  • GPU is in P0 state; both nvidia-smi and the GPU Statistics plugin report the same.
  • nvidia-smi reports a power usage of 1W, while GPU Statistics reports 17W.

gpu.JPG

 

Is this discrepancy normal? Could nvidia-smi be reporting the incorrect power usage for P0, and hence P8 is <1W and causes an error?

For reference, when in P8, the GPU uses 7W according to GPU statistics.

 

When I run the nvidia-persistenced command from the CLI, nvidia-smi reports the power usage correctly for P0 and then reports the correct P8 state too.

gpu2.JPG

 

dailynas-diagnostics-20220616-1206.zip

Edited by raf802
Link to comment

I see now the available legacy driver version was updated from v470.94 to v470.129.06... is there a way we can lock in that legacy driver branch?

The double update reboot is a pain, although I'm still thankful the functionality is there at all! :)

 

note: now up and running again with driver support after a reboot

Edited by tjb_altf4
Link to comment
17 hours ago, raf802 said:

Do you mind elaborating on the following code?

This will simply start nvidia-persistenced, wait, and then end nvidia-persistenced again.
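
In other words, line by line (comments added here for clarity, it's the same snippet as above):

nvidia-persistenced                 # start the persistence daemon, which initialises the card(s)
sleep 5                             # wait 5 seconds so the card(s) can settle into P8
kill $(pidof nvidia-persistenced)   # end the daemon again so it can't interfere with starting a VM later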

 

17 hours ago, raf802 said:

Would this also put the container GPU into the P8 power state? Can it also go into P0 when needed and back to P8?

You'll have to try it; the card should go back to P8 again after a transcode is finished.

 

17 hours ago, raf802 said:

Some observations I have noticed with power save off (no commands run):

I would first of all recommend that you use the command:

watch nvidia-smi

This will keep nvidia-smi on screen and update it every 2 seconds; you can stop it by pressing CTRL + C.
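
If you only care about the power numbers, you can also let nvidia-smi print just those fields (standard query options, shown here as a suggestion, not something you have to use):

nvidia-smi --query-gpu=name,pstate,power.draw --format=csv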

 

17 hours ago, raf802 said:

Is this discrepancy normal? Could nvidia-smi be reporting the incorrect power usage for P0, and hence P8 is <1W and causes an error?

For reference, when in P8, the GPU uses 7W according to GPU statistics.

I really can't tell what's going on on your system but this is completely out of my control since this is caused by nvidia-smi and how it outputs everything.

Link to comment
7 hours ago, saz369 said:

After upgrading from 6.10.2 to 6.10.3, I lost my Nvidia P400 video card. Please advise. I have already tried restarting and reinstalling the Nvidia Driver plugin.

Does it work now? You've posted in another thread that it is working again.

 

Have you waited until the "Everything done" message appeared after you upgraded?

Link to comment
4 hours ago, tjb_altf4 said:

I see now the available legacy driver version was updated from v470.94 to v470.129.06...

Well, I had to bump the version to the latest available driver because v470.94 won't compile for the SOON™ Unraid release, since it only supports kernel versions up to 5.17.

 

4 hours ago, tjb_altf4 said:

The double update reboot is a pain, although I'm still thankful the functionality is there at all! :)

I know that this causes a lot of pain sometimes, and I've already thought about creating an option for the now "legacy" driver version, but I really don't want to, because if Nvidia decides to drop support for the legacy driver for whatever reason, that option would be deprecated and the plugin would need to be changed again...

  • Thanks 2
Link to comment
13 hours ago, ich777 said:

but I really don't want to....

 

Aw, c'mon.  It isn't like you have anything else to do.  /end sarcasm

 

Between how Nvidia changes their releases and the rapid releases Unraid 6.10.x has been through, I wish to thank you for staying on top of all of this as well as you have!

  • Like 1
Link to comment

Hey hey! Been trying for hours (lol) to get my PLEX HW transcoding to run, can't find my error. 

 

UnRaid 6.10.3

3600X
1070 Ti
Nvidia driver + GPU statistics installed
IOMMU group VGA hook is unchecked
Got Plex Pass
Tried different movies/codecs
--runtime=nvidia is in (see the sketch after this list)
GPUID has no spaces > I set it to all now
Used HW encoding in Plex
linuxserver/plex:latest

no VMs installed
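
For reference, this is roughly how those two settings map onto the container template (a sketch from memory, not a copy of my exact template):

Extra Parameters:        --runtime=nvidia
NVIDIA_VISIBLE_DEVICES:  all    # the GPU UUID also works, as long as it contains no spaces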

 

Please help and thanks to this awesome community!  :)

drivers.png

gpu.png

nohw.png

nvidia-smi.png

vars.png

Link to comment
10 hours ago, sk3tn said:

Used HW encoding in Plex

On the first two pages some users reported that deleting and reinstalling the container solved the issue.

Have you also added your Plex token to the container?

 

From my perspective it can only be a Plex issue, and this question would be better suited to the support thread for the Docker container itself.

 

You can also try Jellyfin if you want, since it's free, and see if transcoding works there...

  • Like 1
Link to comment
6 hours ago, ich777 said:

On the first two pages some users reported that deleting and reinstalling the container solved the issue.

Have you also added your Plex token to the container?

 

I reinstalled it many times yesterday, but today is another day and it works fine now! Unfortunately I don't know what made it fail, too bad! Thank you!

  • Like 1
Link to comment

Sorry if this has already been addressed somewhere in here; if so, I couldn't find it.

 

I posted in General Support, but it was suggested to post here, as best I can tell the issue likely comes from the Nvidia plugin.

 

I installed a P600 and the Nvidia plugin, and changed my BIOS to use the iGPU instead of the PCI GPU. Everything is fine through the whole boot process, except at the very end where you would normally get a login screen, it's blank. The server is up and running fine and can be accessed via the web GUI, but the console monitor shows no login page. The BIOS settings are correct, otherwise I would get NO video at all from the onboard graphics, but like I said it's fine until the login page. I booted in GUI safe mode, and the login screen reappeared, which makes me more confident the issue stems from the Nvidia plugin. I have no idea what to try next.

Link to comment

To confirm the plugin was the problem, I removed it and rebooted, and the login screen was back. I then reinstalled the plugin, turned Docker off/on, rebooted, and no more login page :(

 

So it's definitely the plugin causing something. I also found this thread where someone else appears to have the exact same problem, but it looks like he didn't find a solution.

Link to comment
On 6/18/2022 at 3:58 PM, sk3tn said:

Hey hey! Been trying for hours (lol) to get my PLEX HW transcoding to run, can't find my error. 

 

UnRaid 6.10.3

3600X
1070 Ti
Nvidia driver + GPU statistics installed
IOMMU group VGA hook is unchecked
Got Plex Pass
Tried different movies/codecs
--runtime=nvidia is in
GPUID has no spaces > I set it to all now
Used HW encoding in Plex
linuxserver/plex:latest

no VMs installed

 

Please help and thanks to this awesome community!  :)

drivers.png

gpu.png

nohw.png

nvidia-smi.png

vars.png

I am having the same issue as you, it's driving me crazy!!

Link to comment

Hello all once again... I have just built a new system to try hardware transcoding once again... before I was not able to because of IOMMU groupings... 

 

So this build is a GeForce GT 710 2GB running on an HP ProLiant DL360 G7 server... I am able to get video out via the graphics card and everything works well... Just not Nvidia graphics...

 

When running nvidia-smi I am getting "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." even when trying different drivers... also have rebooted, downloaded drivers now, and restarted docker... 

 

I have good IOMMU groupings, including:

IOMMU group 34: [10de:128b] 09:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)
[10de:0e0f] 09:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)

 

Suggestions?
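
In case it helps whoever answers, I can also run some basic checks, e.g. something like this to see whether the kernel module is loaded at all (just commands I would try, I haven't captured their output here):

lsmod | grep nvidia        # is the nvidia kernel module loaded at all?
dmesg | grep -i nvidia     # any NVRM / driver errors during boot?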

 

Also ich777 I want to tip my hat to you for your continued support for everyone here... I would be happy to buy you a beer should we ever cross paths :)

 

Link to comment
2 hours ago, ZigbeeZombie said:

I am having the same issue as you, it's driving me crazy!!

This is resolved now:

 

If the card shows up fine on the plugin page, then it is more likely a Docker container issue, and I would recommend that you post in the support thread for the container that you are using.

 

You can also drop your diagnostics here and I will go through them to see if there is anything suspicious.

Link to comment
