[Plugin] Nvidia-Driver


ich777


Hi, 

 

I just installed an Nvidia P4000 and the Unmanic Docker container. Everything seems to be working; however, when I look at my GPU and CPU stats, my GPU seems to be throttling and my CPU is doing the hard work.

 

For example, I'm looking at Load - Memory (which drops back to 0 every time), Encoder - Decoder and PCI Bus Rx/Tx (MB/s).

When I have 4 workers running, shouldn't my GPU work harder?
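One way to check whether the video engine (rather than the general-purpose cores) is doing any work is to read the encoder and decoder columns separately from the overall load. Below is a minimal sketch, assuming the usual two-header-line layout printed by `nvidia-smi dmon -s u -c 1`; the sample text and the names `parse_dmon` and `read_live` are illustrative, and the numbers are not taken from this server.

```python
import subprocess

# Illustrative sample of `nvidia-smi dmon -s u -c 1` output (assumed layout,
# not captured from the system discussed in this thread).
SAMPLE = """\
# gpu     sm    mem    enc    dec
# Idx      %      %      %      %
    0      3      1     42     17
"""


def parse_dmon(text):
    """Return one dict per GPU row, keyed by the column names in the header."""
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Only the first comment line carries the column names;
            # the second one holds the units and is skipped.
            if columns is None:
                columns = line.lstrip("#").split()
            continue
        if columns is None:
            continue
        # Values are kept as strings because unsupported fields show up as "-".
        rows.append(dict(zip(columns, line.split())))
    return rows


def read_live():
    """Run nvidia-smi once; needs a working driver on the host."""
    result = subprocess.run(
        ["nvidia-smi", "dmon", "-s", "u", "-c", "1"],
        capture_output=True, text=True, check=True,
    )
    return parse_dmon(result.stdout)


if __name__ == "__main__":
    for row in parse_dmon(SAMPLE):
        print(f"GPU {row['gpu']}: sm={row['sm']}% enc={row['enc']}% dec={row['dec']}%")
```

If `enc` and `dec` stay at 0 while workers are running, the containers are probably not using the hardware encoder at all, which would explain the CPU doing the work.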

Animatie.thumb.gif.0bc9a7b826feb64b02964b591756d494.gif

unmanic.thumb.PNG.163643f650b37dfda5f0e39fe14fdb75.PNG

tower-diagnostics-20230406-1900.zip

Thanks, 

Link to comment

I just noticed my plugin hasn't notified me of an update in a while ... and a manual update fails.

 

Card: Gigabyte GTX 1080

Installed driver:  520.56.06

Unraid version: 6.11.5

 

Trying to manually update to the production branch (v525.105.17) fails with this:

 

----------------Downloading Nvidia Driver Package v470.141.03-----------------
---------This could take some time, please don't close this window!------------

---------------Can't download Nvidia Driver Package v470.141.03----------------

 

It looks like it keeps trying to download that ancient version no matter what I do, even if I manually choose that old version to download and install. Any ideas on how to overcome this?

Edited by CaptainShipoopi
clarifying "no matter what I do"
Link to comment
2 hours ago, joykingdom said:

Except for the persistent error log, the Nvidia T400 driver installed with this plugin works fine in Emby and Plex.

As said above, this is most likely a bug in your BIOS.

I also really can't tell whether this is caused by the mobile T400 chip that your card uses, which, as said, is not officially supported by Nvidia's driver.

Link to comment
25 minutes ago, ich777 said:

Remove the plugin, reboot, pull a fresh copy from the CA App, reboot again and see if it is working after you did that.

Thank you my friend -- this seems to have done the trick. It just installed the latest version post-reboot; now I'm downgrading to the production branch and looks successful so far. No earthly idea what caused this to jam up ... I didn't see anything related whatsoever in the logs.

  • Like 1
Link to comment
1 hour ago, forsaken1 said:

Nvidia GeForce GTS450

I would say it's "not supported" anymore. Also consider that this card has no video encoder capabilities, so it is pretty useless with the drivers on the host ... and for use in a VM this plugin (and the drivers) are not needed, see page 1.

 

So if your use case was for Docker containers, it wouldn't work anyway, as there is no NVENC encoder ...

 

supported cards (and features) https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new

  • Like 2
Link to comment

Which affordable GPU would work best with the Nvidia-Driver plugin and Plex? I'm currently using an Nvidia GTX 970 on an Asus Maximus V Formula motherboard. The GPU was working okay with Plex until about a month and a half ago, but all of a sudden Plex stopped using hardware encoding with my GPU. I'm currently running the Production Branch: v525.105.17.

Link to comment

Folks don't seem to realize that Plex only uses the video encoder/decoder circuitry of the GPU's processor.  All of those thousands of CUDA cores on a high end card sit idle when transcoding.  The newer the card, the more (and more recent) formats it can handle.

 

The Quadro T400 and its predecessor, the P400, can both transcode H.265, in which 4K content is typically encoded. Low cost and low power draw (30W, all straight from the PCIe slot, no additional cable needed) make it perfect for home media servers.

 

There are times where a more powerful card is needed.  If you are running your own private Netflix and have a half dozen (or more) folks remotely transcoding at the same time, you probably need to move up to one of the Quadro's bigger brothers.  Someone running facial recognition or some other AI application could possibly put the CUDA cores to use.  Users who wish to do this would likely mention it in their posts, and already have an idea what they need for hardware.  So it makes the Quadro T400 an easy recommendation.

Edited by ConnerVT
speeling
  • Thanks 1
Link to comment

Hello!

 

I have an RTX 3080 passed through directly to a VM, and everything works well. But I came up with the idea of buying an RTX A2000 and using it for Unraid Docker containers. And, in classic fashion, I ran into an issue: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."

Which direction should I dig in? The card is unbound from VFIO. Diagnostics attached.

 

Thank you in advance!

diagnostics-20230411-2153.zip

Link to comment
10 hours ago, brainreplaced said:

RTX A2000

From what I see, this is an HP-branded card, correct?

02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA106 [RTX A2000] [10de:2531] (rev a1)
    Subsystem: Hewlett-Packard Company Device [103c:151d]
    Kernel modules: nvidia_drm, nvidia
02:00.1 Audio device [0403]: NVIDIA Corporation GA106 High Definition Audio Controller [10de:228e] (rev a1)
    Subsystem: Hewlett-Packard Company Device [103c:151d]
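The branding can be read off mechanically from the bracketed `[vendor:device]` pairs in a listing like the one above. Here is a minimal sketch that parses that text; the helper name `parse_lspci` is made up for illustration, and the sample is simply the listing copied from this post.

```python
import re

# Listing copied from the post above.
SAMPLE = """\
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA106 [RTX A2000] [10de:2531] (rev a1)
    Subsystem: Hewlett-Packard Company Device [103c:151d]
    Kernel modules: nvidia_drm, nvidia
02:00.1 Audio device [0403]: NVIDIA Corporation GA106 High Definition Audio Controller [10de:228e] (rev a1)
    Subsystem: Hewlett-Packard Company Device [103c:151d]
"""

# Matches "[xxxx:yyyy]" ID pairs; class codes such as "[0300]" have no colon
# and are therefore ignored.
ID_PAIR = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def parse_lspci(text):
    """Return a list of dicts with slot, (vendor, device) and subsystem IDs."""
    devices = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = ID_PAIR.search(line)
        if not line[0].isspace():
            # Unindented lines start a new device entry.
            devices.append({
                "slot": line.split()[0],
                "id": match.groups() if match else None,
                "subsystem": None,
            })
        elif line.strip().startswith("Subsystem:") and devices and match:
            devices[-1]["subsystem"] = match.groups()
    return devices


if __name__ == "__main__":
    for dev in parse_lspci(SAMPLE):
        print(dev["slot"], dev["id"], dev["subsystem"])
```

Here the chip vendor ID is `10de` (NVIDIA) while the subsystem vendor ID is `103c` (Hewlett-Packard), which is what marks this as an HP-branded board.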

 

From what I see in your syslog, your card doesn't respond and falls off the bus afterwards:

Apr 11 18:30:15 brainserver kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 243
Apr 11 18:30:15 brainserver kernel: nvidia 0000:02:00.0: Unable to change power state from D3cold to D0, device inaccessible
Apr 11 18:30:15 brainserver kernel: nvidia 0000:02:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
Apr 11 18:30:15 brainserver kernel: NVRM: The NVIDIA GPU 0000:02:00.0
Apr 11 18:30:15 brainserver kernel: NVRM: (PCI ID: 10de:2531) installed in this system has
Apr 11 18:30:15 brainserver kernel: NVRM: fallen off the bus and is not responding to commands.
Apr 11 18:30:15 brainserver kernel: nvidia: probe of 0000:02:00.0 failed with error -1
Apr 11 18:30:15 brainserver kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
Apr 11 18:30:15 brainserver kernel: NVRM: This can occur when a driver such as:
Apr 11 18:30:15 brainserver kernel: NVRM: nouveau, rivafb, nvidiafb or rivatv
Apr 11 18:30:15 brainserver kernel: NVRM: was loaded and obtained ownership of the NVIDIA device(s).
Apr 11 18:30:15 brainserver kernel: NVRM: Try unloading the conflicting kernel module (and/or
Apr 11 18:30:15 brainserver kernel: NVRM: reconfigure your kernel without the conflicting
Apr 11 18:30:15 brainserver kernel: NVRM: driver(s)), then try loading the NVIDIA kernel module
Apr 11 18:30:15 brainserver kernel: NVRM: again.
Apr 11 18:30:15 brainserver kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
Apr 11 18:30:15 brainserver kernel: NVRM: None of the NVIDIA devices were initialized.
Apr 11 18:30:15 brainserver kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 243
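Messages like these are easy to miss in a long syslog. The following is a rough sketch of a scanner that counts a few of the failure strings seen in this thread; the signature list is hand-picked from the excerpts here and is certainly not exhaustive, and the `/var/log/syslog` path mentioned in the comment is an assumption about where the log lives.

```python
import re
from collections import Counter

# Failure strings taken from the log excerpts in this thread (not a complete list).
SIGNATURES = {
    "power_state": re.compile(r"Unable to change power state from D3cold to D0"),
    "off_bus": re.compile(r"fallen off the bus"),
    "probe_failed": re.compile(r"probe of \S+ failed with error"),
    "init_failed": re.compile(r"RmInitAdapter failed"),
}

# Four lines copied from the syslog excerpt above.
SAMPLE = [
    "Apr 11 18:30:15 brainserver kernel: nvidia 0000:02:00.0: Unable to change power state from D3cold to D0, device inaccessible",
    "Apr 11 18:30:15 brainserver kernel: NVRM: (PCI ID: 10de:2531) installed in this system has",
    "Apr 11 18:30:15 brainserver kernel: NVRM: fallen off the bus and is not responding to commands.",
    "Apr 11 18:30:15 brainserver kernel: nvidia: probe of 0000:02:00.0 failed with error -1",
]


def scan(lines):
    """Count how often each known failure signature appears in kernel lines."""
    hits = Counter()
    for line in lines:
        if "kernel:" not in line:
            continue
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name] += 1
    return hits


if __name__ == "__main__":
    # To scan a real log instead, pass open("/var/log/syslog") (assumed path).
    for name, count in scan(SAMPLE).items():
        print(f"{name}: {count}")
```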

 

This could be because it's an HP-branded card and HP uses proprietary calls to initialize it, but I don't think that is the case here.

 

Please try/check these things:

  • Enable Above 4G Decoding in your BIOS
  • Enable Resizable BAR Support in your BIOS
  • Re-seat the card in the slot
  • Check that power is connected properly to the card <- if there is an external power connector
  • Make sure that your motherboard supports delivering 70W of power through the PCIe slot if the card has no external power connector
  • Update your BIOS
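Independently of the BIOS items above, it can be worth confirming which kernel driver actually owns the card, since a device still bound to `vfio-pci` cannot be used by the Nvidia driver. A minimal sketch, assuming the standard Linux sysfs layout where each PCI device directory has a `driver` symlink while a driver is bound; the function name `bound_driver` and the `sysfs_root` parameter are illustrative.

```python
import os


def bound_driver(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the driver bound to a PCI device, or None if unbound.

    pci_address is the full address including the domain, e.g. "0000:02:00.0".
    """
    link = os.path.join(sysfs_root, pci_address, "driver")
    if not os.path.islink(link):
        # No "driver" symlink means no kernel driver currently owns the device.
        return None
    # The symlink points at .../drivers/<name>; the last component is the name.
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # Address taken from the listing earlier in this thread.
    print(bound_driver("0000:02:00.0"))
```

A result of `nvidia` means the driver claimed the device, `vfio-pci` means it is still reserved for passthrough, and `None` means nothing is bound, which matches a failed probe like the one in the syslog.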
Link to comment
18 minutes ago, ich777 said:

Please try/check these things:

  • Enable Above 4G Decoding in your BIOS
  • Enable Resizable BAR Support in your BIOS
  • Re-seat the card in the slot
  • Check that power is connected properly to the card <- if there is an external power connector
  • Make sure that your motherboard supports delivering 70W of power through the PCIe slot if the card has no external power connector
  • Update your BIOS

Options like "Above 4GB MMIO BIOS Assignment" and "Re-Size BAR Support" are already enabled. The BIOS is on the latest version. I've tried swapping cards, but no change. And I'm sure that PCIe slot can deliver about 70W.

 

And one more thing, I cannot change EFI to Legacy, there are only two options: "EFI" and "Disabled"

image.thumb.png.1703b95c34061e60e18ceffbe9db2161.png

Link to comment
18 minutes ago, brainreplaced said:

I've tried swapping cards, but no change.

Do you have another PC where you can try the card? I'm assuming the card is working properly, correct?

 

I can only think of a hardware compatibility issue.

 

If you are able to try the card in another PC, please also create a dedicated Unraid USB boot device, boot from it on the PC where you are testing the card (DON'T START THE ARRAY OR ASSIGN DISKS), register a Trial, install the CA App, install the Nvidia Driver plugin and see if the card is recognized <- but as said, don't start the Array or assign any disks.

If the above works, it is a hardware incompatibility issue; maybe also contact Supermicro support to ask whether this is a known issue.

  • Like 1
Link to comment
39 minutes ago, Draco1544 said:

Any update on this ?

As said, it's not possible.

 

What do you want to do? You can't set the fan speed on Unraid. When the driver is loaded, the manufacturer's default fan curve, which is stored in the vBIOS, is applied.

  • Like 1
  • Thanks 1
Link to comment

Hi

 

I've been trying to get my Quadro P600 recognised by the driver plugin, without any success. I just get the message "No devices were found". The card appears in the hardware profile. I've used the latest and the production drivers and neither recognises the card. I thought it might be my ancient BIOS, so I updated that to the latest version; no dice.

I've attached some screenshots and a diagnostics file. Any assistance is gratefully received.

 

Screenshot 2023-04-21 at 17.50.49.png

Screenshot 2023-04-21 at 17.49.30.png

Screenshot 2023-04-21 at 17.47.27.png

thompson-diagnostics-20230421-1802.zip

Link to comment
28 minutes ago, gergtreble said:

Any assistance is gratefully received.

Please check if a BIOS update is available for your Motherboard. I know that Supermicro boards are a little bit picky when it comes to GPUs.

Are you sure that the PCIe slot is able to deliver 30W of power?

Please try enabling Above 4G Support (or Large Address Space) and Resizable BAR support if you have those options in your BIOS.

 

Are you also sure that the card is working properly?

 

The card is failing to initialize:

Apr 21 17:44:52 THOMPSON kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x31:0xffff:2461)
Apr 21 17:44:52 THOMPSON kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

This is most often caused by a hardware incompatibility or a faulty card.

Link to comment
10 minutes ago, ich777 said:

Please check if a BIOS update is available for your Motherboard. I know that Supermicro boards are a little bit picky when it comes to GPUs.

 

I'm now running the most up-to-date BIOS. It hasn't helped.

 

10 minutes ago, ich777 said:

Are you sure that the PCIe slot is able to deliver 30W of power?

 

I really don't know. But at least one other person has got it working (I'm going to reach out to him). The fan on it is spinning, if that's any indication?

 

10 minutes ago, ich777 said:

Please try enabling Above 4G Support (or Large Address Space) and Resizable BAR support if you have those options in your BIOS.

 

Enabled the Above 4G support. Didn't change the outcome. Could not find any way to enable Resizable BAR in the BIOS.

 

10 minutes ago, ich777 said:

Are you also sure that the card is working properly?

 

The card is failing to initialize:

Apr 21 17:44:52 THOMPSON kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x31:0xffff:2461)
Apr 21 17:44:52 THOMPSON kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

This is most often caused by a hardware incompatibility or a faulty card.

 

Oh dear, I don't know and don't have any way to tell! I picked this card up cheap on eBay. Maybe it's time to return it. :(

 

Any other ideas? Thanks for your help. 

Link to comment
47 minutes ago, gergtreble said:

Maybe it's time to return it. :(

Do you maybe have another computer where you can test this card?

If so, please try to install the drivers and put some 3D load on the card like Furmark.

 

49 minutes ago, gergtreble said:

But at least one other person has got it working

Okay, then maybe this is a faulty card, but I really can't tell for sure. Have you tried the card in another PCIe slot, if you have one?

Link to comment
2 minutes ago, ich777 said:

Do you maybe have another computer where you can test this card?

If so, please try to install the drivers and put some 3D load on the card like Furmark.

 

I don't have any other way to test it TBH. 

 

2 minutes ago, ich777 said:

 

Okay, then maybe this is a faulty card, but I really can't tell for sure. Have you tried the card in another PCIe slot, if you have one?

 

I only have the one full sized PCI slot sadly. 

One thing I have noticed is that I get the following in my syslog:
 


Apr 21 19:50:19 THOMPSON kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x31:0xffff:2465)
Apr 21 19:50:19 THOMPSON kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Apr 21 19:50:19 THOMPSON kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x31:0xffff:2465)
Apr 21 19:50:19 THOMPSON kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Apr 21 19:50:19 THOMPSON kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x31:0xffff:2465)
Apr 21 19:50:19 THOMPSON kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Apr 21 19:50:19 THOMPSON kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x31:0xffff:2465)
Apr 21 19:50:19 THOMPSON kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

 

Whenever I reload the plugin page or run nvidia-smi. 
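To see how these failures cluster, the repeated lines can be grouped by timestamp and by the code in parentheses. A small sketch under the assumption that every line follows the exact format shown above; the function name `count_init_failures` is invented for illustration, and the code triple is treated as an opaque string because its meaning is not documented in this thread.

```python
import re
from collections import Counter

PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: NVRM: GPU (?P<gpu>\S+): "
    r"RmInitAdapter failed! \((?P<code>[^)]+)\)"
)

# Lines copied from the syslog excerpt above (first two failure pairs).
SAMPLE = [
    "Apr 21 19:50:19 THOMPSON kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x31:0xffff:2465)",
    "Apr 21 19:50:19 THOMPSON kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0",
    "Apr 21 19:50:19 THOMPSON kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x31:0xffff:2465)",
    "Apr 21 19:50:19 THOMPSON kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0",
]


def count_init_failures(lines):
    """Count RmInitAdapter failures per (timestamp, GPU address, code)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            counts[(match["stamp"], match["gpu"], match["code"])] += 1
    return counts


if __name__ == "__main__":
    for (stamp, gpu, code), n in count_init_failures(SAMPLE).items():
        print(f"{stamp}  {gpu}  {code}  x{n}")
```

A burst of identical entries sharing one timestamp fits the observation that each page reload or `nvidia-smi` call triggers another round of initialization attempts.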

So yeah, I think I have enough evidence here that I have a dud card. Bummer! Thanks for the help, I'll return it and try and find another. 

Link to comment
