[Plugin] Nvidia-Driver


ich777


6 hours ago, Eddie Seelke said:

I am missing the dashboard widget. How do I get it back?

Please check on your Plugins page whether any plugins are listed in the Plugins Error tab.

 

If so, remove them, reboot, install them again from the CA App, and reboot again if required.

Link to comment
6 hours ago, colev14 said:

It seems like the problem is with the plex web client.

TBH, I have never heard of the web app crashing the whole server or making it unusable… only that it stays black and the transcode never starts.

Link to comment
4 hours ago, ich777 said:

TBH, I have never heard of the web app crashing the whole server or making it unusable… only that it stays black and the transcode never starts.

 

These are generally the complaints I've heard as well, though I have never had it happen to me.  But I have seen other issues with Plex on the web client, such as not being able to restart a video after pausing for several minutes.  Those are things I've never seen with a "real" client.

 

It could also be an issue with the release of Plex he's running.  I had the GUI on my server crash the other day while trying to transcode an OTA TV stream.  I needed to SSH in and couldn't shut down cleanly (it wouldn't unmount my cache drive).  Maybe unrelated to anything, but who knows?  It was the first time in several years of running this server and configuration, which has always been rock solid.

 

Anyway, the Plex web client is the least reliable test bed for making any assumptions when troubleshooting an issue.

Link to comment
41 minutes ago, fespinoza831 said:

Hello I keep getting this error every few days.

Sorry, I don't see an Nvidia GPU in your system.

Are you sure that the card is plugged all the way in and/or not defective?

Did you change anything in terms of hardware?

Link to comment
1 hour ago, fespinoza831 said:

It will be fine, then I get notified that there's a new driver update and the GPU will disappear. But there were no hardware changes or anything.

Then maybe take another look at it; the card really isn't there as far as the host OS is concerned...

 

Check that it is seated correctly and that the PSU is fine... Maybe try another PCIe slot if possible...

Link to comment
On 3/6/2024 at 1:38 AM, ich777 said:

Please post your Diagnostics before doing all of that and also the file: /boot/config/plugins/nvidia-driver/settings.cfg

 

I assume you are referring to Docker containers?

Yes, the containers won't be able to start when the driver is removed after the first reboot; after the second restart you should be up and running again just fine.

tower-diagnostics-20240311-0258.zip

 

May I ask why you clicked the Download button in the first place? Was something not working, or did you simply try to change the driver?

Hi there

settings.cfg

Thanks for this. Here are the config files and also the Nvidia driver config.

 

Link to comment
34 minutes ago, alitech said:

Thanks for this. Here are the config files and also the Nvidia driver config.

Have you tried what I recommended earlier yet?

  1. Uninstall the plugin
  2. Reboot
  3. Install it again from the CA App
  4. Reboot again

 

After that everything should work fine.

 

Thank you for the config. I think I now know what the cause of the issue is, but it will take me a few days to fix it. However, if you do what I recommended above, it will work again.

Link to comment

I want to use CUDA Toolkit and cuDNN in a Docker container but it appears that has to be installed on the host. Is there a way to add these as part of the driver installation?

 

Classifier process output: 2024-03-11 22:45:37.214569: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

Classifier process output: 2024-03-11 22:45:37.219343: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

Classifier process output: 2024-03-11 22:45:37.219716: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory

Classifier process output: 2024-03-11 22:45:37.219987: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory

Classifier process output: 2024-03-11 22:45:37.220253: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory

Classifier process output: 2024-03-11 22:45:37.220501: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory

Classifier process output: 2024-03-11 22:45:37.220731: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory

Classifier process output: 2024-03-11 22:45:37.220954: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory

Classifier process output: 2024-03-11 22:45:37.221190: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory

Classifier process output: 2024-03-11 22:45:37.221443: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory

Classifier process output: 2024-03-11 22:45:37.221448: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
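
The warnings above name the exact shared libraries that TensorFlow failed to open. As a quick check from inside the container, a minimal Python sketch along these lines tries to load each one and reports which are still missing. The library names are copied from the log; the helper name `find_missing_libraries` and its injectable `loader` parameter are only illustrative and not part of TensorFlow or the plugin.

```python
import ctypes

# Library names copied from the "Could not load dynamic library" warnings above.
CUDA_LIBRARIES = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.11",
    "libcusparse.so.11",
    "libcudnn.so.8",
]


def find_missing_libraries(names, loader=ctypes.CDLL):
    """Return the names that the dynamic loader cannot open.

    `loader` defaults to ctypes.CDLL, which raises OSError when the
    shared object cannot be found or loaded.
    """
    missing = []
    for name in names:
        try:
            loader(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Print one line per library that is still missing in this environment.
    for name in find_missing_libraries(CUDA_LIBRARIES):
        print(f"missing: {name}")
```

Running it again after adding or mapping libraries into the container should show the list shrinking; an empty output means all eight libraries could be opened.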

 

Link to comment

I had the same issue with missing libraries when trying to use CUDA for speech-to-text with Whisper.

The solution I used was to collect the missing library files from a machine with them installed (at least some of them come from the `nvidia-cudnn` package) and then map them into the container (section out of my docker-compose file):

      - /mnt/docker/appdata/CUDA/libcudnn_ops_infer.so.8.5.0:/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8:ro
      - /mnt/docker/appdata/CUDA/libcudnn_cnn_infer.so.8.5.0:/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8:ro
      - /mnt/docker/appdata/CUDA/libcublas.so.11.11.3.6:/usr/lib/x86_64-linux-gnu/libcublas.so.11:ro
      - /mnt/docker/appdata/CUDA/libcublasLt.so.11.11.3.6:/usr/lib/x86_64-linux-gnu/libcublasLt.so.11:ro
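
Each mapping above follows the same pattern: the fully versioned file on the host is mounted read-only under the shorter name that ends at the major version. As a rough sketch of that pattern (not how the poster generated the lines), the Python helper below builds such an entry from a host path. The function name `volume_entry` is hypothetical, and the default container directory is simply the one used in the mappings above.

```python
import re
from pathlib import PurePosixPath

# Matches e.g. "libcublas.so.11.11.3.6": stem "libcublas.so", major version "11".
_VERSIONED_SO = re.compile(r"^(?P<stem>.+\.so)\.(?P<major>\d+)(?:\.\d+)*$")


def volume_entry(host_file, container_dir="/usr/lib/x86_64-linux-gnu"):
    """Build a read-only docker-compose volume entry for one library file.

    The container-side name keeps only the major version, mirroring the
    hand-written mappings above.
    """
    name = PurePosixPath(host_file).name
    match = _VERSIONED_SO.match(name)
    if match is None:
        raise ValueError(f"not a versioned shared library: {name}")
    target = f"{match['stem']}.{match['major']}"
    return f"{host_file}:{container_dir}/{target}:ro"


if __name__ == "__main__":
    print("      - " + volume_entry("/mnt/docker/appdata/CUDA/libcublas.so.11.11.3.6"))
```

Whether the major-version name is the one a given application looks for is an assumption; the loader warnings in the container log are the authoritative list of names to provide.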


Cheers

Link to comment
1 hour ago, realies said:

I want to use CUDA Toolkit and cuDNN in a Docker container but it appears that has to be installed on the host. Is there a way to add these as part of the driver installation?

Sorry, but cuDNN is not part of the default Nvidia driver and is basically not supported by the plugin, since the plugin only supports CUDA and cuDNN is, at least to my knowledge, a deep neural network library.

 

However, you could install it yourself if I'm not mistaken, but I'm not sure whether it needs to be installed in the container itself, since, as @Fraddles pointed out, he had to mount it into the container.

 

27 minutes ago, Fraddles said:

The solution I used was

I would rather recommend downloading the libraries from here as a tarball and installing them either on the host or directly in the Docker container (as far as I can tell you don't have to compile anything).

Please be aware that this is a pretty big download (about 900MB).

 

However, I think that cuDNN should be part of the container and not be installed on the host, but I could be wrong about that. Have both of you checked whether there is a version of the container available that includes the cuDNN library?

 

Please always mention which containers you are using, and if there is a support thread for the container here on the forums, ask there first.

Link to comment

Sorry, some additional detail...
I am using CUDA with the 'rhasspy/wyoming-whisper' container to do STT, part of HA's voice assistant setup.

 

I did not install anything on the Unraid host or in the container itself... I copied the required files from a Debian install I had been playing with before trying out Unraid.  Details on that install can be found here: https://github.com/Fraddles/Home-Automation/tree/main/Voice-Assistant

Most of the setup on that page is not required with Unraid and this driver, but as previously noted the cuDNN libraries are still needed.  Not all of them, though: I just copied the four listed files into a subfolder in my Docker appdata and mapped them into the container.

 

There do seem to be a few options for 3rd party containers with GPU support baked in, or I could build my own, but I prefer to use the 'official' containers where possible.  Mapping a handful of static files in is not a difficult task :)

 

Cheers.

 

 

Link to comment
8 hours ago, Fraddles said:

rhasspy/wyoming-whisper

4 hours ago, Fraddles said:

the newly released version of the wyoming-whisper container broke my config...

Why not create an issue on his GitHub over here and request that he bundle the files with the container? From what I can tell, there is even an open issue requesting that.

 

@realies & @Fraddles TBH, I haven't looked into self-hosted AI stuff much (since this can get expensive quickly over here in Europe), but I assume that, since AI is currently hyped as hell, some Docker images might be missing dependencies needed to fully support all hardware configurations <- but I could be wrong about that.

 

EDIT: It seems like CUDA isn't fully supported by the image you are using (click).

Link to comment

Oh, I am well aware that I am playing with edge cases... :)

 

Your driver works like a dream, far less hassle to set up than a plain Debian install... Kudos to you!
 

Thanks!

 

EDIT:  Have resolved my issues with the new container... all working nicely. :)

 

Edited by Fraddles
Link to comment

I am running Unraid 6.12.4 and trying to use the Nvidia Driver plugin so that an Nvidia Quadro P400 can transcode in Jellyfin. I have successfully configured this in the past; recently, however, the driver seems to be missing. I tried to reinstall the Nvidia Driver plugin, but now it stops with the following error:

--------------Can't download Nvidia Driver Package v550.40.07-----------------
plugin: run failed: '/bin/bash' returned 1

 

The Quadro P400 shows up in the supported driver list, so I'm not sure why it can't download or install the driver anymore. I am able to download and install other plugins successfully, and my GPU is correctly recognized by the OS.

 

root@Tower:~# lspci | grep -i nvidia
06:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P400] (rev a1)
06:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
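
For anyone who wants to script the same check, here is a minimal Python sketch that filters `lspci` output for NVIDIA devices, equivalent to the `grep -i nvidia` above. The function names `nvidia_devices` and `scan_host` are made up for illustration; `scan_host` assumes `lspci` is available on the machine where it runs.

```python
import subprocess


def nvidia_devices(lspci_output):
    """Return (slot, description) pairs for lspci lines mentioning NVIDIA."""
    devices = []
    for line in lspci_output.splitlines():
        if "nvidia" not in line.lower():
            continue
        # The PCI slot (e.g. "06:00.0") is everything before the first space.
        slot, _, description = line.partition(" ")
        devices.append((slot, description.strip()))
    return devices


def scan_host():
    """Run lspci on the host and return the NVIDIA devices it reports."""
    result = subprocess.run(
        ["lspci"], capture_output=True, text=True, check=True
    )
    return nvidia_devices(result.stdout)
```

An empty list from `scan_host()` would correspond to the earlier case in this thread where the card is not visible to the host OS at all, which points at hardware (seating, slot, PSU) rather than the driver package.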

 

Logs only show the following:

 

Mar 12 08:29:06 Tower root: plugin: creating: /boot/config/plugins/nvidia-driver/nvidia-driver-2024.01.19.txz - downloading from URL https://github.com/ich777/unraid-nvidia-driver/raw/master/packages/nvidia-driver-2024.01.19.txz
Mar 12 08:29:08 Tower root: plugin: checking: /boot/config/plugins/nvidia-driver/nvidia-driver-2024.01.19.txz - MD5
Mar 12 08:29:08 Tower root: plugin: running: upgradepkg --install-new /boot/config/plugins/nvidia-driver/nvidia-driver-2024.01.19.txz
Mar 12 08:29:08 Tower root: plugin: creating: /usr/local/emhttp/plugins/nvidia-driver/README.md - from INLINE content
Mar 12 08:29:08 Tower root: plugin: running: anonymous

 

and that's where they end. What do I do next? Is driver v550.40.07 not supported anymore? Is the Nvidia Driver plugin pulling the wrong driver version?

 

One thing I noticed is that Unraid doesn't seem to be creating the nvidia-driver directory under /boot/config/plugins, or under plugins-removed or plugins-error. It did initially during my first time troubleshooting, but I deleted all of those files and now nothing seems to be created. I've checked the permissions on all of these directories and root has rwx on all folders/files.

 

I'm not sure what to try next. Any help would be appreciated

tower-diagnostics-20240312-0947.zip

Edited by captainkegs
Link to comment
1 hour ago, captainkegs said:

I am running unraid 6.12.4

Please upgrade to 6.12.8 and then post new Diagnostics. Before upgrading, go to the Plugins page in Unraid and check whether you have a Plugins Error tab; if so, delete the plugin there and then upgrade. Before rebooting to complete the upgrade, please also check your notifications for a message that the new version has been downloaded.

After the upgrade, see if the Nvidia driver got installed; if not, reinstall it from the CA App.

 

1 hour ago, captainkegs said:

It did initially during my first time troubleshooting, but I deleted all of those files and now nothing seems to be created. I've checked the permissions on all of these directories and root has rwx on all folders/files.

Please don't do anything manually; this will make troubleshooting much more complicated.

 

Do the steps from above and see if it installs correctly after the upgrade.

Link to comment
