[Plugin] Nvidia-Driver


ich777

Things have become very odd since installing that driver. Now my AMD GPU can't be seen by PhoenixMiner, and T-Rex says it can't find CUDA, but the Nvidia plugin sees the Nvidia GPU without issues. I wonder if that specific driver version is bad? I wish there was a way for me to test this on my own.

 

edit - I also want to mention that when I turned off docker, as instructed by the plugin, and turned it back on, docker refused to start and I had to restart the entire rig to get it back

 

edit - Also nvidia-smi works

 

edit - managed to fix the AMD GPU issue, but I'm still not sure what's going on with the Nvidia driver

Edited by ButterMeWaffle
Link to comment
1 hour ago, ButterMeWaffle said:

things have become very odd since installing that driver.

From what I've read so far, all of this is unusual and not related to this plugin.

 

1 hour ago, ButterMeWaffle said:

Now my AMD GPU can't be seen by PhoenixMiner

Diagnostics? It would be very unusual for this driver to cause that.

 

1 hour ago, ButterMeWaffle said:

T-Rex says it can't find CUDA

Seems this is an issue with the container and has nothing to do with the driver on the host (this plugin).

 

1 hour ago, ButterMeWaffle said:

I wonder if that specific driver version is bad?

Sure thing: install Jellyfin, add some files to it, and let it transcode. You will see that this driver is working. I have tested it, because I test all custom drivers, and this one is working just fine... ;)

 

1 hour ago, ButterMeWaffle said:

I also want to mention that when I turned off docker, as instructed by the plugin, and turned it back on, docker refused to start and I had to restart the entire rig to get it back

You have to reboot, not just turn Docker off and on again. Turning Docker off and on is only for the first installation.

Link to comment
gotcha, I will keep looking into the docker container and see if I can figure it out

Link to comment
5 minutes ago, ButterMeWaffle said:

gotcha, I will keep looking into the docker container and see if I can figure it out

Solved it! The GitHub page mentions that if you have CUDA 10 you need to specify it, but it only mentions it for 10. Figured I would try using ptrfrll/nv-docker-trex:cuda11 and it worked!
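For anyone hitting the same "can't find CUDA" message, it may help to check which CUDA major version the host driver reports before picking an image tag. The following is only a rough sketch: `cuda_major` is a made-up helper name, the sample header line is invented for illustration, and the only tag confirmed in this thread is `cuda11`.

```shell
#!/bin/bash
# Hypothetical helper: read nvidia-smi output on stdin and print the CUDA major version.
cuda_major() {
    sed -n 's/.*CUDA Version: \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Invented sample line, shaped like the header that nvidia-smi prints.
sample='| NVIDIA-SMI 515.57       Driver Version: 515.57       CUDA Version: 11.7     |'
printf '%s\n' "$sample" | cuda_major

# On a real host (assumption: the plugin is installed and nvidia-smi works):
#   nvidia-smi | cuda_major
```

If the number printed does not match what the container image expects, trying a tag built for that CUDA version (as done above with `cuda11`) is a reasonable next step.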

  • Like 1
Link to comment

I just upgraded to 6.10.3 and lost my GPU plugin support in the process. I have tried to reinstall and reboot multiple times without any luck. The GPU won't show up in the plugin. The GPU is offered as an option for VMs if I try to create one, but it is not passed through to any VM.

Here is message when tried to install Nvidia-Plugin:

Quote

+==============================================================================
| Installing new package /boot/config/plugins/nvidia-driver/nvidia-driver-2022.05.06.txz
+==============================================================================

Verifying package nvidia-driver-2022.05.06.txz.
Installing package nvidia-driver-2022.05.06.txz:
PACKAGE DESCRIPTION:
Package nvidia-driver-2022.05.06.txz installed.

-------Main download URL not reachable, using Fallback URL-------

-----ERROR - ERROR - ERROR - ERROR - ERROR - ERROR - ERROR - ERROR - ERROR------
---Can't get latest Nvidia driver version and found no installed local driver---
plugin: run failed: /bin/bash retval: 1

 

Plugin will also disappear from settings during reboot.

 

Any help?

Edited by VoNpo
Link to comment
52 minutes ago, VoNpo said:

I just upgraded to 6.10.3 and lost my GPU plugin support in the process. I have tried to reinstall and reboot multiple times without any luck. The GPU won't show up in the plugin. The GPU is offered as an option for VMs if I try to create one, but it is not passed through to any VM.

Please post your Diagnostics.

 

Do you have any ad-blocking software running on your network?

Can you reach GitHub from your local computer?

 

It seems that your server can't communicate with GitHub (where the driver package is located).

 

54 minutes ago, VoNpo said:

Plugin will also disappear from settings during reboot.

Please remove the plugin entirely and run this command from an Unraid Terminal:

rm -rf /boot/config/plugins/nvidia-driver

and reboot afterwards.

 

After that try to pull the Nvidia-Driver plugin again from the CA App and see if it is the same (don't close the window with the red 'X' and wait for the Done button to appear).

If it's the same it looks like your server can't communicate with GitHub.
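To check the GitHub connection directly from the server rather than from another computer, something along these lines could be run in the Unraid Terminal. This is a minimal sketch, assuming `curl` is available on the server; `can_reach` is a hypothetical helper name, and the 10 second default timeout is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if a request to the given URL completes in time.
# $1 = URL, $2 = timeout in seconds (optional, defaults to 10)
can_reach() {
    curl --silent --show-error --fail --location \
         --max-time "${2:-10}" --output /dev/null "$1"
}

# Usage, run manually from the Unraid Terminal:
#   can_reach https://github.com && echo "GitHub reachable" || echo "GitHub NOT reachable"
```

If this fails on the server but GitHub opens fine from a desktop browser, DNS filtering or ad-blocking on the network is a likely place to look.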

Link to comment
46 minutes ago, elcapitano said:

Was working, but from one day to the other it just disappeared from unraid.

How should the Quadro be recognized by the plugin if it's bound to VFIO?

BIND=0000:0f:00.0|10de:1c31 0000:0f:00.1|10de:10f1

This is something that can't work and you can also see from your screenshot that the driver in use is "vfio-pci" and not "nvidia".

 

0f:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106GL [Quadro P2200] [10de:1c31] (rev a1)
	Subsystem: NVIDIA Corporation Device [10de:131b]
	Kernel driver in use: vfio-pci
0f:00.1 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)
	Subsystem: NVIDIA Corporation GP106 High Definition Audio Controller [10de:131b]
	Kernel driver in use: vfio-pci
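
To see at a glance which kernel driver each device is using, the `lspci -k` output can be filtered. The snippet below is just a sketch: `drivers_in_use` is a hypothetical helper name, and the sample input is shortened from the output quoted above.

```shell
#!/bin/bash
# Hypothetical helper: read `lspci -k` style text on stdin and print "<slot> <driver>".
drivers_in_use() {
    awk '
        /^[0-9a-f]/             { slot = $1 }        # device header line begins with the PCI slot
        /Kernel driver in use:/ { print slot, $NF }  # last field is the driver name
    '
}

# Sample shortened from the output quoted above.
drivers_in_use <<'EOF'
0f:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106GL [Quadro P2200] [10de:1c31] (rev a1)
    Kernel driver in use: vfio-pci
EOF

# On a real host:
#   lspci -nnk | drivers_in_use
```

For the plugin to see the card, the line for the GPU should show `nvidia` and not `vfio-pci`.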

 

 

Also why are you on 6.9.2? Unraid 6.10.3 is the most recent version.

 

What also looks suspicious to me is that you seem to have tried to install the plugin while it was already installed:

Jul  8 14:14:33 MASTER root: plugin: skipping: /boot/config/plugins/nvidia-driver/nvidia-driver-2022.05.06.txz already exists
Jul  8 14:14:33 MASTER root: plugin: running: /boot/config/plugins/nvidia-driver/nvidia-driver-2022.05.06.txz

 

 

Please also remove this entry from your go file; the issue it works around should be fixed by now:

# Fix Docker - Case Insensitive
sed -i 's#@Docker-Content-Digest:\\s*\(.*\)@#\@Docker-Content-Digest:\\s*\(.*\)@i#g' /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php 

 

 

Please also try the same thing and run this command from an Unraid Terminal:

rm -rf /boot/config/plugins/nvidia-driver

and reboot your server afterwards.

After the reboot pull the Nvidia-Driver plugin again from the CA App.

Link to comment

@VoNpo & @elcapitano just to let you know, and because I was also curious: I've now tried it on my test server and it works without a hitch. I uninstalled the Nvidia-Driver plugin, rebooted my server and pulled a fresh copy from the CA App:

grafik.png.8df48607ae5a0a67dc45d5f6782f8858.png 

(the screenshot shows the installation stage because this takes a little while on my test server, it's not really the fastest machine)

 

This is after it was finished installing:

grafik.png.3ecd6eb700c7e3f6c8e0534c52c86943.png

 

Here you can see my system devices:

grafik.thumb.png.7fb8c4a6c08e40e881b73de52656961f.png

 

And here is also a screenshot from the plugin itself:

grafik.thumb.png.2a68d4edeb51a5af9d3824fccf053d89.png

Link to comment
Thanks for this - big help.

I un-bound the device - reboot - removed the driver - reboot - installed new driver - reboot -> All good 🙂 

  • Like 1
Link to comment

Hello,

 

I am having an issue with this plugin. I rebooted my server and it was no longer showing up. I tried to install it but am receiving this:

 

-----ERROR - ERROR - ERROR - ERROR - ERROR - ERROR - ERROR - ERROR - ERROR------
----Can't get Production Branch version and found no installed local driver-----
-----Please wait for an hour and try it again, if it then also fails please-----
------go to the Support Thread on the unRAID forums and make a post there!------
plugin: run failed: /bin/bash retval: 1

 

I have attached my logs/diagnostics. Hopefully someone can help :)

server-unraid-diagnostics-20220709-0837.zip

Link to comment
Update: I followed a previous post's advice and deleted the plugin using the terminal. Reinstalled, rebooted, and it's running fine now.

Link to comment
2 hours ago, ultimz said:

Update: I followed a previous post's advice and deleted the plugin using the terminal. Reinstalled, rebooted, and it's running fine now.

I think in some parts of the world GitHub has some issues or something isn't working quite right since it can't pull the latest driver version...

 

Really not sure what to do about that, sorry for the inconvenience...

Link to comment
No worries at all - thanks for all the work you put into this plugin... really appreciate it along with the support you provide!

  • Like 2
Link to comment

Man, I really struggled getting the nvidia driver plugin reinstalled after upgrading to 6.10.3. The problem was that I'm an idiot. Also, maybe a slight problem with the plugin installer user interface?

 

I followed all the instructions here -- removed the plugin using 'rm' at the command line, rebooted about 40 times, and kept trying to reinstall the plugin from community apps. The problem arose because the plugin installer status window would get to the line that says

 

"Package nvidia-driver-2022.05.06.txz installed."

 

And then the window/text would pause and do nothing for 30-60 seconds. So at that point I thought the driver was installed and rebooted. I never waited for the plugin to actually finish installing. I did this about 10 times.

 

The problem was I never saw the *next* part that says:

 

"WARNING - WARNING - WARNING... Don't close this window ... until the Done button is displayed!"

 

But it took soooo loooong for that text to be displayed, I never saw it. I didn't figure out that the process wasn't finished until I saw other people's screenshots here on the forum.

 

I wonder what's happening between the "Package... installed" and "WARNING - WARNING" that caused such a long delay? If there's some call there that can be moved/removed so the WARNING WARNING comes up immediately, it sure would have saved me a couple hours of going around in circles and rebooting too soon.

 

Of course, once I actually waited until the whole thing was *actually* finished and the Done button finally appeared, it all worked great.

 

Edited by grigsby
Link to comment
3 hours ago, grigsby said:

I wonder what's happening between the "Package... installed" and "WARNING - WARNING" that caused such a long delay? If there's some call there that can be moved/removed so the WARNING WARNING comes up immediately

This text should be shown immediately.

The first steps are basically that the plugin itself (the GUI part) is downloaded and installed, which is not even 1 MB, and then the WARNING message is displayed.

I really can't imagine what's going on there. Maybe pulling the version numbers takes that long, but I doubt that's the case, since that file is only a few KB.

Also, everything is hosted on GitHub.

 

May I ask where you are located in the world?

Most of the time I see this with users located in China, where the downloads are really slow and everything takes ages to complete.

Link to comment

Hello,

I have a 1060 GTX connected to unraid and a Windows 10 VM that I use from time to time to play some games.

 

The thing is, after a fresh Unraid boot the GPU is correctly detected and I can run a script to force the P0 state (power consumption 6 W and fan off), but after I start and stop the VM, the graphics card is no longer available, so the fan stays on and, I guess, power consumption is much higher.

 

Is there any way to fix this in 6.10.3? Do I have to restart Unraid every time I turn off the VM to get back to low power consumption, or is it possible to somehow restart the Nvidia driver or similar through a script? I only turn on that VM once a month or less, so saving some electricity is important to me.

 

Thanks in advance!

Link to comment
3 hours ago, Yeyo53 said:

the graphic card is no longer available

Where?

 

3 hours ago, Yeyo53 said:

and I can run a script to force P0 State

I think you mean P8; P0 is the highest power state.

Also, where is the script?
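
For reference, the current performance state can be read with `nvidia-smi --query-gpu`, which makes the P0/P8 question easy to settle. What follows is only a sketch: `report_pstate` is a hypothetical helper name, and the sample values in the usage note are invented, not taken from anyone's system.

```shell
#!/bin/bash
# Hypothetical helper: read "pstate, power.draw, fan.speed" CSV lines on stdin
# and report whether each GPU is in the low-power P8 state.
report_pstate() {
    while IFS=, read -r pstate power fan; do
        pstate=$(printf '%s' "$pstate" | tr -d ' ')
        if [ "$pstate" = "P8" ]; then
            echo "idle (P8):$power"
        else
            echo "not idle ($pstate):$power"
        fi
    done
}

# On a real host:
#   nvidia-smi --query-gpu=pstate,power.draw,fan.speed --format=csv,noheader | report_pstate
```

Feeding it an invented line such as `P8, 6.00 W, 0 %` reports the card as idle, while a line starting with `P0` is reported as not idle. Running it once after a fresh boot and once after the VM has been stopped would show whether the card really fails to drop back down.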

 

3 hours ago, Yeyo53 said:

Is there any way to fix this in 6.10.3?

I can't think of a reason why it should behave any differently on versions prior to 6.10.3.

 

3 hours ago, Yeyo53 said:

Do I have to restart Unraid every time I turn off the VM to get back to low power consumption, or is it possible to somehow restart the Nvidia driver or similar through a script?

As said above, without any details I can't help. The Diagnostics would also be helpful.

 

BTW this is not an issue with this plugin and I think this would be better suited in the VM subforums since this is a VM related question.

Link to comment
3 hours ago, Yeyo53 said:

Is there any way to fix this in 6.10.3?

Working fine here with a GTX 1060 and an RTX 3080 Ti, both starting in P8 mode (Nvidia persistence daemon). Before a VM starts I exit persistence mode, and after the VM stops I enable it again.

 

To start with, what script are you using? If you only have one GPU, a simple go file entry would be enough:

 

nvidia-persistenced &

 

image.png.3c437c856f40756eaa34219c154ff447.png

 

Now your GPU will be sleeping once the system has booted.

 

To start and stop this mode I use QEMU hooks. These are scripts that are executed depending on the VM state: starting, started, stopping, stopped, and so on.

 

As an idea, I stripped it down to a basic start/stop scenario that only handles Nvidia persistence mode for a VM named AlsPC_Media. There are three scripts: the main hook script, which is executed on any VM activity, and the two scripts it calls depending on the state. Maybe that helps.

 

nano /etc/libvirt/hooks/qemu.d/hook_scripts.sh
...

#!/bin/bash
# libvirt calls this hook with: $1 = VM name, $2 = operation, $3 = sub-operation

if [ "$1" = "AlsPC_Media" ] && [ "$2" = "prepare" ] && [ "$3" = "begin" ]; then
        # VM is about to start: stop nvidia-persistenced first
        /mnt/cache/system/hook_scripts/alspc_media_start.sh &
        /usr/local/emhttp/webGui/scripts/notify -e "Unraid Server Notice" -s "VM Info, $1, $2, $3" -i "normal" &
elif [ "$1" = "AlsPC_Media" ] && [ "$2" = "release" ] && [ "$3" = "end" ]; then
        # VM has been stopped: start nvidia-persistenced again
        /mnt/cache/system/hook_scripts/alspc_media_stop.sh &
        /usr/local/emhttp/webGui/scripts/notify -e "Unraid Server Notice" -s "VM Info, $1, $2, $3" -i "normal" &
fi
exit 0
---

nano /mnt/cache/system/hook_scripts/alspc_media_start.sh
...

#!/bin/bash
# Stop the persistence daemon before the VM starts (does nothing if it is not running)

pid=$(pidof nvidia-persistenced) && kill $pid
sleep 1

exit 0
---

nano /mnt/cache/system/hook_scripts/alspc_media_stop.sh
...

#!/bin/bash

nvidia-persistenced &
sleep 1

exit 0;
---
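
Before wiring the hook into a live system, the matching logic can be tried out on its own. The sketch below restates the same conditions as a function that only prints what it would do. `hook_action` and the action labels are hypothetical names chosen for illustration; the VM name and the operation/sub-operation pairs are the ones used in the scripts above.

```shell
#!/bin/bash
# Hypothetical dry-run helper: map the hook arguments to the action the hook would take.
# $1 = VM name, $2 = operation, $3 = sub-operation
hook_action() {
    if [ "$1" != "AlsPC_Media" ]; then
        echo "ignore"
        return 0
    fi
    case "$2/$3" in
        prepare/begin) echo "stop-persistenced" ;;   # VM about to start
        release/end)   echo "start-persistenced" ;;  # VM fully stopped
        *)             echo "ignore" ;;
    esac
}

hook_action AlsPC_Media prepare begin
hook_action AlsPC_Media release end
hook_action SomeOtherVM prepare begin
```

Keeping the decision separate from the side effects makes it easier to add a second VM or another state later without testing against a running VM every time.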

 

  • Like 1
Link to comment

I'm setting up a new unraid instance with this plugin and can't seem to fix this issue: on server reboot, the GPU doesn't show up in the plugin (but it does show up in system devices).

Screen Shot 2022-07-15 at 6.01.31 PM.png

 

Screen Shot 2022-07-15 at 6.01.56 PM.png

 

Attached diagnostics.

 

On boot, I see the following in the logs on screen (but not in syslog -- so this may be a red-herring):

 

```

modprobe: FATAL: Module nvidia not found in directory /lib/modules/5.15.46-Unraid

```
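
A quick way to tell whether that modprobe message is real is to look for the module file under the running kernel's module directory. This is only a sketch: `module_present` is a hypothetical helper name, and it assumes the module file is called `nvidia.ko`, possibly with a compression suffix.

```shell
#!/bin/bash
# Hypothetical helper: succeed if <name>.ko (optionally compressed) exists below <dir>.
# $1 = module directory, $2 = module name
module_present() {
    [ -n "$(find "$1" -type f -name "$2.ko*" 2>/dev/null | head -n 1)" ]
}

# Usage, run manually from the Unraid Terminal:
#   module_present "/lib/modules/$(uname -r)" nvidia && echo "module found" || echo "module missing"
```

If the module is missing after a reboot, the driver package most likely was not downloaded or installed during boot, which would fit the plugin not listing the GPU.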

 

I have uninstalled and re-installed the plugin three times now because it keeps breaking on reboot.

 

While I do use pihole for DNS, it's not blocking github:

 

1107052083_ScreenShot2022-07-15at6_13_56PM.thumb.png.c1441ab6a47d87c7032fdfec8af268cd.png

 

In fact, no DNS queries from `testtower` are blocked.

 

For my BIOS, I have CSM disabled, Secure Boot disabled, fTPM disabled, TPM 2.0 disabled. I've tried with all combinations of enabling these (except secure boot of course) with no luck.

 

Any help would be appreciated.

 

UPDATE:

 

I tried once more after deleting the plugin, restarting and ensuring the following: CSM disabled, Secure Boot disabled, fTPM disabled, TPM 2.0 disabled. It seems to have worked now. Do I really need to keep TPM and fTPM disabled for this plugin?

 

testtower-diagnostics-20220715-1756.zip

Edited by Howboys
Link to comment
5 hours ago, Howboys said:

For my BIOS, CSM disabled, Secure Boot disabled, fTPM disabled, TPM 2.0 disabled

5 hours ago, Howboys said:

I tried once more after deleting the plugin, CSM disabled, Secure Boot disabled, fTPM disabled, TPM 2.0 disabled

You've written the same thing twice…

 

This is the first time I've heard of such an issue being related to TPM at all.

I know that disabling CSM and booting with UEFI can cause issues with the Nvidia driver, but I've never heard of TPM causing such issues.

 

EDIT: From what I saw in your syslog, it seems the module was never loaded, but I think the Diagnostics are from a boot with UEFI instead of a Legacy boot (CSM)?

Link to comment
