[Plugin] Nvidia-Driver


ich777

Recommended Posts

4 hours ago, Jacon said:

I would recommend that you request that on the Docker Engine sub-forum since this is not really related to my plugin. That said, GPU-BURN is just one single line that a user needs to execute, and afterwards he has to delete the container from the Docker page with Advanced View turned on.
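For reference, such a one-liner would look roughly like this (a sketch only; the image name is an assumption, and NVIDIA_VISIBLE_DEVICES can also take a specific GPU UUID instead of all):

docker run --name=GPU-Burn --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all ich777/gpu_burn
# when the test finishes, remove the GPU-Burn container from the Docker page (Advanced View enabled)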

 

BTW, if you want some more metrics from your card, install nvtop from the CA App. It is command line only, but a very useful tool if you want to see detailed utilization metrics.

 

Anyways, glad that everything is now working for you!

Link to comment
On 1/5/2023 at 12:43 AM, ich777 said:

Tesla and Quadro cards are already supported by the plugin.

Hi, I'm kind of confused.

 

I have read multiple times that Tesla cards are supported, and also multiple times that they are not supported.

 

I own a Tesla M40 24GB card, and that one does not seem to be working with your plugin. Am I doing something wrong, or is this one not supported? As far as I could see, this card is not in the list of supported graphics cards, but no Tesla card seems to be in that list.

 

Is there a way I could manually get the drivers running?

 

 

Thanks a lot for your efforts.

Link to comment
1 hour ago, LittelD said:

Hi, I'm kind of confused.

Me too when it comes to the Tesla cards, because they are datacenter cards in general, but the K series is confirmed to be working fine.

 

1 hour ago, LittelD said:

Is there a way I could manually get the drivers running?

From what I can see, the Tesla M40 24GB uses the same driver as the consumer cards when you select the CUDA 12.0 Toolkit, so in theory it should work just fine.

 

Can you please post your Diagnostics with the driver installed so that I can see how your system is set up and what might be preventing the card from being recognized?
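As a quick check once the driver is installed, something like this from the Unraid terminal should show whether the card is picked up (standard nvidia-smi query fields):

nvidia-smi --query-gpu=name,driver_version --format=csv
# if the M40 is recognized, it is listed here together with the installed driver version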

Link to comment
2 hours ago, ich777 said:

Me too when it comes to the Tesla cards, because they are datacenter cards in general, but the K series is confirmed to be working fine.

 

From what I can see, the Tesla M40 24GB uses the same driver as the consumer cards when you select the CUDA 12.0 Toolkit, so in theory it should work just fine.

 

Can you please post your Diagnostics with the driver installed so that I can see how your system is set up and what might be preventing the card from being recognized?

Thank God!!!! And here I was thinking I'm too stupid to read and understand basic stuff :D

 

Here are my diag files:

unraidtower-diagnostics-20230110-1211.zip

Link to comment
8 minutes ago, LittelD said:

I'm really sorry, I forgot that... now after the install.

Here is the error:

Jan 10 12:28:22 UnraidTower kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Jan 10 12:28:22 UnraidTower kernel: NVRM: BAR1 is 0M @ 0x0 (PCI:0000:01:00.0)

 

This seems pretty much BIOS related. Please double check that you've enabled Above 4G Decoding (this is surely called something very different in your Dell BIOS) and also check for an option about Resizable BAR support and enable it.
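You can also check what the BIOS actually assigned to the card directly from the host, for example (the PCI address is the one from the error above):

lspci -vv -s 01:00.0 | grep -i region
# BAR1 should show a large prefetchable memory region with a real size, not 0M like in the error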

 

The driver should work fine; the only thing that is currently preventing it from working is your BIOS. Hope that helps. :)

 

Please let me know what that option is called on the Dell servers if you find it.

Link to comment
33 minutes ago, ich777 said:

Here is the error:

Jan 10 12:28:22 UnraidTower kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Jan 10 12:28:22 UnraidTower kernel: NVRM: BAR1 is 0M @ 0x0 (PCI:0000:01:00.0)

 

This seems pretty much BIOS related. Please double check that you've enabled Above 4G Decoding (this is surely called something very different in your Dell BIOS) and also check for an option about Resizable BAR support and enable it.

 

The driver should work fine; the only thing that is currently preventing it from working is your BIOS. Hope that helps. :)

 

Please let me know what that option is called on the Dell servers if you find it.

Well, sadly there is no option in the BIOS for this...

 

Guess the journey ends here for now.

 

Thanks a lot for your support.

Edited by LittelD
Link to comment
2 minutes ago, ich777 said:

There is an option for that for sure because another user here enabled it successfully on his Dell server.

 

Something with large PCI address space or something like that.

Well, this is not a Dell server, it's just an OptiPlex 7020, and there is nothing in the BIOS that sounded even remotely like that, or anything I couldn't understand or didn't know exactly what it did.

Link to comment
9 minutes ago, LittelD said:

Well, this is not a Dell server

Oh, I thought this was some kind of Dell server because I saw lots of Dell devices in your devices overview.

 

Anyways, maybe check if there is a newer BIOS available which enables this function; technically speaking, this should be possible on nearly every x86_64-capable motherboard.

Link to comment
On 1/10/2023 at 6:31 AM, LittelD said:

Well, this is not a Dell server, it's just an OptiPlex 7020, and there is nothing in the BIOS that sounded even remotely like that, or anything I couldn't understand or didn't know exactly what it did.

 

Any chance a BIOS setting was hardcoded that prevents the BIOS from reading the card? It could be worth a reset to defaults. I'd also check whether there are any PCI settings that look odd.

Link to comment

Hi, I'm having trouble with this plugin. When I click on the nvidia-driver page in the settings I just get a blank screen (that seems to be trying to load, then times out). I've disabled and enabled Docker. I've rebooted. I've uninstalled the plugin, reinstalled the plugin, then disabled and re-enabled Docker. No joy.


Also, when I first tried to get it working, I tried to apply a GPU to a Frigate Docker container, only to have the entire Docker image corrupt itself.

Logs show: 276 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.0.121, server: , request: "GET /Settings/nvidia-driver HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "LOCAL-IP", referrer: "http://LOCAL-IP/Plugins"

 

 

Link to comment
45 minutes ago, CallMeBeachy said:

When I click on the nvidia-driver page in the settings I just get a blank screen (that seems to be trying to load, then times out).

With that little information I really can't do much; please post your Diagnostics.

 

45 minutes ago, CallMeBeachy said:

Also, when I first tried to get it working, I tried to apply a GPU to a Frigate Docker container, only to have the entire Docker image corrupt itself.

This is most likely not caused by the plugin; maybe it was some kind of weird coincidence. I've never heard of anything like that, because the plugin doesn't touch the Docker image or anything else on the system, except for the directory on the boot drive where the plugin is stored.

 

How much RAM do you have installed in your system? Are you sure that you are not running out of memory?

What card are you using? Is it a new card or did you buy it used? Have you tried to re-seat the card in its slot yet? Did you check that external power (if required) is connected properly and sufficient?
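(A quick way to check the memory situation from the Unraid terminal, e.g.:)

free -h
# shows total, used and free RAM in a human-readable format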

Link to comment
2 hours ago, ich777 said:

With that little information I really can't do much; please post your Diagnostics.

 

This is most likely not caused by the plugin; maybe it was some kind of weird coincidence. I've never heard of anything like that, because the plugin doesn't touch the Docker image or anything else on the system, except for the directory on the boot drive where the plugin is stored.

 

How much RAM do you have installed in your system? Are you sure that you are not running out of memory?

What card are you using? Is it a new card or did you buy it used? Have you tried to re-seat the card in its slot yet? Did you check that external power (if required) is connected properly and sufficient?

Well... it turns out I needed to reseat the GPU... I can't believe that. I nearly didn't do it, thinking: the bloody thing can't move, and I can see it in the system's interfaces...

The Nvidia panel is opening now. Will attempt to attach it to the Docker containers now. Appreciate your help and your patience.

Link to comment
1 hour ago, dopeytree said:

How's the situation with using this to run a second Nvidia card? Would one just install a second instance?

I don't fully understand...

Just install a second card in your system and it should show up in the plugin; there is no need to install anything extra for a second card.

 

I know a few users who are running 3+ cards in their system with the plugin.
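You can verify that all cards are detected by listing them from the terminal, e.g.:

nvidia-smi -L
# prints one line per detected GPU, including the UUID you can pass to a container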

Link to comment

Is there a way to control the idle fan speed? My RTX A2000 seems to default to 30%, which is whiny.

I'm sure that if I could tweak the idle fan speed to either 20% or 40%, it would change the whine.

It's a blower, so there is no simple way to swap it for a silent Noctua fan.

 

The card is idling in the P8 power state.
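(For reference, these values can also be read from the Unraid terminal with standard nvidia-smi query fields, e.g.:)

nvidia-smi --query-gpu=pstate,fan.speed,temperature.gpu --format=csv
# shows the current power state, fan speed in percent and GPU temperature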

Edited by dopeytree
Link to comment
26 minutes ago, dopeytree said:

Is there a way to control the idle fan speed? My RTX A2000 seems to default to 30%, which is whiny.

Sadly no, at least not without a desktop environment and some libraries which are not available on Unraid.

The fan curve is set in the card's own BIOS.

 

28 minutes ago, dopeytree said:

The card is idling in the P7 power state.

I assume that's with nvidia-persistenced enabled, correct?

Link to comment

Yeah, nvidia-persistenced is enabled on boot in the go file. Sorry, it's P8, not P7.
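For reference, the go-file entry is just the persistence daemon started at boot, roughly like this (a sketch; the exact line may differ):

# added to /boot/config/go so the persistence daemon starts on every boot
/usr/bin/nvidia-persistenced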

It's only 30°C at idle, so it shouldn't need 30% fan speed.

It rises to 50% fan speed under full load, playing games like Cyberpunk 2077 with ray tracing on, at around 65°-75°C.

 

I found this, so I will have a tinker: https://www.techticity.com/howto/how-to-control-nvidia-graphics-card-fan-speed-in-linux/

 

Edited by dopeytree
Link to comment

Hello all. I have two Nvidia cards installed and followed the directions in the plugin for the Open Source driver. I had tried the others, but I still get "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." I rebooted as well. I am using a Tesla P4 and an NVS 310. The 310 is the display card in the primary slot and the P4 is secondary. Both show up in devices. Thanks so much in advance!

 

 

diags.zip

Link to comment
4 hours ago, Trackpads said:

I have two Nvidia cards installed and followed the directions in the plugin for the Open Source driver.

Are you sure that you've followed the directions from the Nvidia Driver plugin page for the Open Source Kernel Module?

I don't see the folder /boot/config/modprobe.d and the nvidia.config file is also missing.

 

If you really want to use the open source driver, you have to follow the steps as mentioned on the Nvidia Driver plugin page for the Open Source Kernel Module.
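Roughly speaking, that boils down to something like this from the Unraid terminal (the exact option line is the one shown on the plugin page, it is not reproduced here):

mkdir -p /boot/config/modprobe.d
nano /boot/config/modprobe.d/nvidia.config   # paste the exact line from the plugin page, save, then reboot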

 

The next thing is that you've bound your P4 to VFIO, and that won't work either; you have to unbind it from VFIO so that it can show up on the host and therefore in the plugin:

0b:00.0 3D controller [0302]: NVIDIA Corporation GP104GL [Tesla P4] [10de:1bb3] (rev a1)
    Subsystem: NVIDIA Corporation GP104GL [Tesla P4] [10de:11d8]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidia_drm, nvidia
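After you've removed the VFIO binding and rebooted, you can verify from the terminal that the card is back on the host, e.g.:

lspci -nnk -s 0b:00.0
# "Kernel driver in use:" should now read nvidia instead of vfio-pci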

 

And last, the NVS 310 isn't supported by the Open Source Kernel Module, or by any other driver package, because it is simply too old and needs the legacy 390 driver series. That series isn't available through the plugin because you wouldn't have much benefit from it; you simply can't use it in Docker containers.

As you can see here, the driver also told you that your card is not supported by the Open Source Kernel Module:

Jan 18 12:40:06 Executor kernel: NVRM: The NVIDIA GPU 0000:0a:00.0 (PCI ID: 10de:107d)
Jan 18 12:40:06 Executor kernel: NVRM: installed in this system is not supported by open
Jan 18 12:40:06 Executor kernel: NVRM: nvidia.ko because it does not include the required GPU
Jan 18 12:40:06 Executor kernel: NVRM: System Processor (GSP).

This is the card the driver is referring to:

0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF119 [NVS 310] [10de:107d] (rev a1)
	Subsystem: Hewlett-Packard Company GF119 [NVS 310] [103c:1154]
	Kernel modules: nvidia_drm, nvidia

 

 

Please don't forget: if you want to use the Open Source Kernel Module, it will only work if you create the file with the content described on the Nvidia Driver plugin page and reboot afterwards. However, I would recommend that you stick to the latest available closed source driver.

Link to comment

Hi Ich777,

After my Plex Docker container failed to stop and an unclean shutdown of my array, nvidia-smi and the plugin cannot find my GPU (GTX 1050 Ti). It still appears under system hardware. Is there anything I can do to troubleshoot/reset? I have tried uninstalling/reinstalling different driver versions. Thanks!

 

Edited by wkipling
Link to comment
