[Plugin] Nvidia-Driver


ich777


Hi,

I just upgraded Unraid to 6.12 and the Nvidia driver to 535.54.03, but it seems to have stopped using the GPU.

The Nvidia plugin still shows my graphics card, and so does the nvidia-smi command.

The card is used for Plex transcoding only. I tested nvidia-smi both inside and outside the Docker container's terminal and it shows the card, but when I play media that needs to be transcoded it shows 0% GPU usage and 0 processes.

Any pointers would be great, many thanks.

 

 

 

Mon Jun 19 15:14:51 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P400                    Off | 00000000:07:00.0 Off |                  N/A |
| 31%   44C    P0              N/A /  N/A |      0MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
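While a transcode is running, sampling the card a few times makes it easy to tell whether Plex is handing any work to the GPU at all (the 0% / 0MiB reading above suggests it is not). A minimal Python sketch, assuming `nvidia-smi` is on the PATH; the `--query-gpu` fields are standard `nvidia-smi` query fields, while `GpuSample`, `parse_gpu_csv` and `sample_gpus` are hypothetical helper names.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GpuSample:
    name: str
    util_percent: Optional[int]  # None when the driver reports "[N/A]"
    mem_used_mib: Optional[int]


def _to_int(value: str) -> Optional[int]:
    value = value.strip()
    return int(value) if value.isdigit() else None


def parse_gpu_csv(text: str) -> List[GpuSample]:
    """Parse the output of
    nvidia-smi --query-gpu=name,utilization.gpu,memory.used --format=csv,noheader,nounits
    (one line per GPU)."""
    samples = []
    for line in text.strip().splitlines():
        # rsplit keeps any commas inside the GPU name intact
        name, util, mem = line.rsplit(",", 2)
        samples.append(GpuSample(name.strip(), _to_int(util), _to_int(mem)))
    return samples


def sample_gpus() -> List[GpuSample]:
    """Run nvidia-smi once; requires it on PATH (on the host, or in a
    container started with the NVIDIA runtime)."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)
```

Calling `sample_gpus()` every second or two during playback should show memory in use (and usually some utilization) if hardware transcoding is active; a steady 0 on both matches the symptom described in this post.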

 

Nvidia Info:

Nvidia Driver Version: 535.54.03

Open Source Kernel Module: No

Installed GPU(s): 0: Quadro P400, 07:00.0, GPU-1eead256-a395-17c4-331a-109bdf1313bf

System Info:

Unraid Version: 6.12.0

Kernel: 6.1.33-Unraid

Edited by thelostnorthman
59 minutes ago, Omen said:

Found at: ./var/lib/pkgtools/packages/nvidia-driver-2023.05.16

Deleted it, the error is almost the same:

Because that is not the actual issue that you are facing here…

 

What's most likely the case here is that the plugin couldn't communicate with GitHub for some reason (PiHole, AdGuard, UniFi switches, too many GitHub API requests, …).
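To rule out the GitHub side (DNS filtering, firewall rules, or an exhausted API quota), you can query GitHub's public `/rate_limit` endpoint from the server. A minimal Python sketch; it only tests general reachability of api.github.com, which is an assumption about what the plugin needs, not the plugin's own check, and the function names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

# Public GitHub REST endpoint; requests to it do not count against the quota.
RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def remaining_core_requests(payload: dict) -> int:
    """Pull the remaining REST ('core') quota out of a /rate_limit response."""
    return int(payload["resources"]["core"]["remaining"])


def check_github(url: str = RATE_LIMIT_URL, timeout: float = 5.0) -> Optional[int]:
    """Return the remaining request count, or None if the request failed
    (DNS block, firewall, TLS interception, timeout, unexpected body, ...)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return remaining_core_requests(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError, KeyError):
        return None
```

`None` points at something blocking the connection; `0` means the unauthenticated quota for your public IP is used up until it resets.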

 

1 hour ago, Omen said:

Seems like deletion on the last step helped, but after reboot. Solved.

After the second reboot, the plugin was apparently able to communicate with GitHub and the package was downloaded successfully.

 

Another possible reason is that you didn't wait until the Plugin Update Helper had finished and told you that it is now safe to reboot.

Just now, thelostnorthman said:

What I'm finding is that the GPU isn't being used by the Plex container anymore.

But this then seems more like a Plex issue than a driver issue.

 

If the card is recognized on the plugin page it should be working.

 

What input material are you trying to transcode? Are you using the Plex web client to force a transcode? There is a known issue where transcoding doesn't work there.

Also see this post and a few above:

 

7 minutes ago, ich777 said:

But this then seems more like a Plex issue than a driver issue.

I've been trying to transcode 4K to 1080p on the Android app, the Samsung TV app and the web client. Even when playing the original 4K file on my home network, it used to pass that to the GPU, but everything is bottlenecked at the CPU now, with no GPU usage.

26 minutes ago, thelostnorthman said:

no GPU usage.

What does the Plex transcoding log say, and which variables have you passed through in the Plex template? Did you set it up as described in the second post, or did you use another variable or Extra Parameter?
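As a quick self-check of a container template, here is a minimal Python sketch. It assumes the setup commonly used with this plugin: `--runtime=nvidia` in Extra Parameters plus the `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` variables, which are read by the NVIDIA container runtime. The function name and the exact checks are illustrative, not the plugin's own validation.

```python
from typing import List, Mapping, Sequence


def check_nvidia_template(env: Mapping[str, str], extra_params: Sequence[str]) -> List[str]:
    """Return a list of problems found in a container's GPU-related settings."""
    problems = []
    params = list(extra_params)
    # Accept both "--runtime=nvidia" and "--runtime nvidia"
    has_runtime = "--runtime=nvidia" in params or any(
        a == "--runtime" and b == "nvidia" for a, b in zip(params, params[1:])
    )
    if not has_runtime:
        problems.append("Extra Parameters lack --runtime=nvidia")
    if not env.get("NVIDIA_VISIBLE_DEVICES", "").strip():
        problems.append("NVIDIA_VISIBLE_DEVICES is missing or empty")
    caps = {c.strip().lower() for c in env.get("NVIDIA_DRIVER_CAPABILITIES", "").split(",")}
    # NVENC/NVDEC libraries are exposed through the 'video' capability; 'all' includes it
    if not caps & {"all", "video"}:
        problems.append("NVIDIA_DRIVER_CAPABILITIES does not include 'video' or 'all'")
    return problems
```

`shlex.split()` on the template's Extra Parameters string produces the list this function expects; an empty result means none of these common misconfigurations are present, not that transcoding is guaranteed to work.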


I have the Nvidia Quadro K4000 graphics card.

When I try to install the latest from the plugin, I get this error:

 

[Screenshot]

 

The Nvidia drivers page points me to this driver for Linux 64-bit, which is fairly recent (March 2023):

[Screenshots]

 

However, when I use the plugin, the only 470.xxx version it offers is the one at the bottom: v470.141.03

[Screenshot]

I want to be able to use this card with Docker apps that use the "--runtime=nvidia" parameter, such as stable-diffusion, which detect the NVIDIA GPU and ask for its GPU ID, e.g. GPU-xxxxxxx-xxxx-xxxx-xxxx-xxxxxxx.
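The GPU ID such containers ask for is the card's UUID, the same value the plugin page lists under Installed GPU(s). A small Python sketch that validates the format and pulls UUIDs out of `nvidia-smi -L` output (lines such as `GPU 0: Quadro P400 (UUID: GPU-...)`); the 8-4-4-4-12 hex layout is inferred from the UUID shown earlier in this thread, and the helper names are hypothetical.

```python
import re
from typing import Dict

# "GPU-" followed by a standard 8-4-4-4-12 hex UUID
GPU_UUID_RE = re.compile(
    r"GPU-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def is_gpu_uuid(value: str) -> bool:
    """True if value looks like a complete GPU UUID."""
    return GPU_UUID_RE.fullmatch(value.strip()) is not None


def uuids_from_listing(listing: str) -> Dict[int, str]:
    """Map GPU index -> UUID from `nvidia-smi -L` output."""
    line_re = re.compile(
        r"^GPU (\d+):.*?\(UUID: (" + GPU_UUID_RE.pattern + r")\)",
        re.MULTILINE | re.IGNORECASE,
    )
    return {int(m.group(1)): m.group(2) for m in line_re.finditer(listing)}
```

Copying the UUID from this output (rather than typing it) avoids the most common template mistake, a truncated or mistyped ID.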

 

Is this something that can be supported?

 

 

 

 

 

Edited by frakman1
1 hour ago, frakman1 said:

When I try to install the latest from the plugin, I get this error:

Without the Diagnostics I can't say much. Most likely, something else is preventing the card from showing up.

 

1 hour ago, frakman1 said:

However, when I use the plugin, the only 470.xxx one that it provides is the one at the bottom: v470.141.03

This is still the same driver...


I just posted here: 

 

 

At bootup it's telling me that the nvidia package seems to be corrupt. How can I redownload the driver?

 

Ah, I think I have to wait until the redownload is finished 🙂 I didn't read the whole message after Checksum error 🙂

 

Should there be a "progress bar" while downloading? --> No, it's finished now...

 

Rebooted and it's working again.

 

Thank you 😄

Edited by enJOyIT
55 minutes ago, enJOyIT said:

I just posted here: 

This is a plugin, so please always use its support thread; strictly speaking, Limetech has nothing to do with it.

 

55 minutes ago, enJOyIT said:

At bootup it's telling me that the nvidia package seems to be corrupt. How can I redownload the driver?

Did you wait to reboot until the Plugin-Update-Helper told you it was safe to do so?

 

55 minutes ago, enJOyIT said:

Ah, I think I have to wait until the redownload is finished 🙂 I didn't read the whole message after Checksum error 🙂

After the download finished it should have told you just to reboot.

 

55 minutes ago, enJOyIT said:

Should there be a "progress bar" while downloading? --> No, it's finished now...

No. It tells you to wait until the Done button appears and not to close the window.

 

 

Glad that it is now working again, please always read the dialogues. :)


Hi there!

 

After the latest plugin update, together with updating Unraid to 6.12.1 and the latest Nvidia driver, my graphics card no longer seems to show up.

 

The boot screen keeps reporting the driver as corrupt, even after multiple reinstalls, re-downloads of various driver versions and reboots, without any success.

 

I don't have AdGuard or anything similar on my home network.

 

Worth noting I have also reinstalled Unraid on my USB flash drive, only keeping the config folder.

 

Hope someone can help me narrow down my problem :)

tower-diagnostics-20230621-1834.zip

5 minutes ago, Olitrolli said:

Keeps displaying the driver as being corrupt

Where does it say this, do you have a screenshot please?

The driver size seems correct to me and I don't think the checksum is wrong.

 

Have you tried simply rebooting after the checksums were verified? From what I can tell from your syslog, the driver wasn't downloaded and installed later in the boot process.
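The "corrupt" message at boot follows a checksum check of the downloaded package (compare the "Checksum error" mentioned earlier in the thread). To check a package file by hand, its hash can be compared with the expected value. A minimal Python sketch; which hash algorithm and checksum file the plugin actually uses are assumptions here, so adjust `algorithm` and the expected value to match, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large driver package need not fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_package(package: Path, expected: str, algorithm: str = "md5") -> bool:
    """Compare against an expected hex digest. Anything after the first
    whitespace (e.g. the filename in `md5sum`-style output) is ignored."""
    expected_hash = expected.split()[0].lower()
    return file_digest(package, algorithm) == expected_hash
```

A mismatch means the file on the flash drive differs from what was expected (a truncated download, for example); a match while the boot screen still complains points away from the download itself.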

9 minutes ago, ich777 said:

Where does it say this, do you have a screenshot please?

This is the response I get:

FATAL: Module nvidia not found in directory /lib/modules.....
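`FATAL: Module nvidia not found in directory ...` is modprobe reporting that no nvidia module file exists under /lib/modules for the running kernel, which fits the syslog reading above that the driver package was never installed. A minimal Python sketch to check both sides, whether the module is loaded and whether a module file is present; the function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Set


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Module names from /proc/modules (first field of each line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def module_file_present(name: str, release: str,
                        modules_root: Path = Path("/lib/modules")) -> bool:
    """True if e.g. nvidia.ko or nvidia.ko.xz exists under
    /lib/modules/<release>, the tree modprobe searches."""
    base = modules_root / release
    if not base.is_dir():
        return False
    return any(base.rglob(f"{name}.ko*"))


def current_release() -> str:
    """Running kernel release, e.g. '6.1.33-Unraid' (Unix only)."""
    return os.uname().release
```

On the server, `"nvidia" in loaded_modules(Path("/proc/modules").read_text())` and `module_file_present("nvidia", current_release())` both returning False matches the error above.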

 

I deleted the plugin. Went to download it again and now it won't let me download it.

 

Just rebooted the system as well and same response :-((

 

Screenshot 2023-06-21 at 19.14.48.png


Okay. Through the power of rebooting again and again, I managed to download the plugin and get confirmation that the driver had been installed. I pressed the "Done" button and rebooted once more to finalize the install.

 

This is what the plugin displays once booted into Unraid.

 

 

Screenshot 2023-06-21 at 19.32.55.png

40 minutes ago, Olitrolli said:

I deleted the plugin. Went to download it again and now it won't let me download it.

 

Just rebooted the system as well and same response :-((

The usual way is to remove the plugin, reboot, reinstall the plugin and then reboot again.

 

30 minutes ago, Olitrolli said:

This is what the plugin displays once booted into Unraid.

Please post your Diagnostics, this is completely different from before.

1 hour ago, Olitrolli said:

Again, really appreciate the help here:)

These Diagnostics are the same as before: for whatever reason, the driver did not install on your system, even though the syslog says it did.

The message on boot just tells you that the driver package did not install.

 

I need the Diagnostics when the plugin reports that "nvidia-smi is not able..."

 

Do the following:

  1. Uninstall the plugin
  2. Reboot
  3. Pull a fresh copy of the plugin from the CA App
  4. Reboot
  5. The plugin should report that nvidia-smi failed -> pull the Diagnostics immediately and post them here

 

Did you change anything in the BIOS, e.g. a BIOS update or something similar? Have you always booted in UEFI mode rather than Legacy mode?

Please also check that C-States are disabled in the BIOS; leaving them enabled is a common cause of issues with Nvidia cards.

36 minutes ago, ich777 said:

I need the Diagnostics when the plugin reports that "nvidia-smi is not able..."

 

Okay! So I replicated the nvidia-smi fail scenario and attached the diagnostics below.

 

No, I haven't changed any BIOS settings recently and have always booted in UEFI mode. Would you recommend booting in Legacy mode instead?

 

I will check C-States in the BIOS when I get home, but I haven't touched them or had any problems with them before :-/

Screenshot 2023-06-21 at 22.58.50.png

tower-diagnostics-20230621-2259.zip

8 hours ago, Olitrolli said:

Okay! So I replicated the nvidia-smi fail scenario and attached the diagnostics below.

I'll have to look into this issue when I get home from work.

 

This is really strange to me, since another user on 6.12.1 is using the driver just fine:

 

8 hours ago, Olitrolli said:

No I haven't changed any BIOS settings recently and have always booted into UEFI. Would you recommend booting in Legacy mode instead?

With the Nvidia Driver, yes. However, if you haven't experienced any issues so far, it should work just fine on 6.12.1 too.

 

8 hours ago, Olitrolli said:

Will check C-states in BIOS when I get home but haven't touched or had any problems with it before 😕

Most AMD users have to disable C-States so that the card can work properly.

 

I think something else is going on with your system, but I can't tell for sure what.

 

BTW, since I noticed it: don't switch between different driver versions, as this won't magically solve your issue. Stick to the latest one; switching makes diagnosing the issue much harder for me.

 

What is device 0000:07:00.0 in your system, and why did you bind it to VFIO? If none of that helps, can you try downgrading to 6.12.0 to see if that fixes the issue?
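To see which kernel driver currently owns a PCI device (for example whether 0000:07:00.0 is held by vfio-pci rather than nvidia), sysfs can be read directly; `lspci -k` shows the same information as "Kernel driver in use". A minimal Python sketch; the function name is hypothetical. A device bound to vfio-pci is reserved for VM passthrough, so the host's nvidia driver cannot claim it.

```python
import os
from pathlib import Path
from typing import Optional


def bound_driver(pci_address: str, sysfs_root: Path = Path("/sys")) -> Optional[str]:
    """Return the kernel driver bound to a PCI device (e.g. 'nvidia' or
    'vfio-pci'), or None if no driver is bound.

    sysfs exposes the binding as the symlink
    /sys/bus/pci/devices/<address>/driver -> .../drivers/<name>
    """
    link = sysfs_root / "bus" / "pci" / "devices" / pci_address / "driver"
    if not link.is_symlink():
        return None
    # Only the last path component (the driver's name) matters
    return Path(os.readlink(link)).name
```

On the server, `bound_driver("0000:07:00.0")` returning `'vfio-pci'` would confirm the VFIO binding mentioned here.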

 

I've also taken a quick look at how often driver version 535.54.03 has been downloaded: at the time of writing, about 2000 times. So I'm assuming it works fine, since otherwise I would have many reports that the driver isn't working.

13 hours ago, ich777 said:

Most AMD users have to disable C-States so that the card can work properly.

 

So I re-seated the graphics card and disabled C-states in the BIOS and boom! No more "corrupted" driver.

 

Before that, I had even tried reinstalling Unraid on an entirely new, freshly formatted USB flash drive, with the same results.

 

I had never touched the C-State BIOS option before, so it's weird that it suddenly turned into a problem.

 

Thank you so much @ich777!:D

 

I really want to support you for the help you do for this community. Is sponsoring you on Github the right place to go? 
