[Plugin] Nvidia-Driver


ich777


I have a GTX 645 that I want to use for Jellyfin transcoding.

I am able to set it up using driver version v470.129.06, but when trying to monitor GPU usage, no processes ever appear to be running. `nvidia-smi` does recognize the GPU, and I can see its temperature, but no usage.

On Nvidia's website, it appears that the latest supported driver for the GTX 645 is v470.141.03, so I was thinking that updating to that driver version might fix the monitoring issue.

Is there any way to update to v470.141.03? It's not an option on the plugin's settings page.
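For reference, a couple of generic nvidia-smi views that make transcode activity easier to spot than the plain process list (a sketch only; nothing here is specific to this plugin or to Jellyfin, and support for these options varies by GPU generation):

# Overall utilisation over time, including the enc/dec columns used while transcoding
nvidia-smi dmon -s u

# The same, broken down per process
nvidia-smi pmon -s u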

1 hour ago, bmyonatan said:

I have a GTX 645 that I want to use for Jellyfin transcoding.

Sorry, but this isn't the best choice for transcoding...

This card is only capable of transcoding H.264 (your card is 1st gen NVENC, code name: GK106).

 

Maybe look for an NVIDIA T400 instead: it is Turing based, low power (draws only up to 35W), doesn't need an external power connector, and you can get it brand new for about $130.


Hello.

 

I have a GTX 1660 Super in my server, and last night it disappeared from your plugin and my Unraid system. I took the GPU out to a test rig and it seems to be working fine; at least I got video output from it, so I'm assuming it's fine.

 

After reinstalling the GPU in my server, Unraid still doesn't recognize or find it.

 

 

 

tower-diagnostics-20220814-2012.zip

29 minutes ago, gevsan said:

After reinstalling the GPU in my server, Unraid still doesn't recognize or find it.

No support as long as this is on your server:

Aug 14 19:59:50 Tower emhttpd: /usr/local/emhttp/plugins/user.scripts/backgroundScript.sh "/tmp/user.scripts/tmpScripts/unlock nvidia/script" >/dev/null 2>&1

 

See the first post of this thread, the red text at the top.

On 8/3/2022 at 8:23 PM, ich777 said:

 

On 8/3/2022 at 7:43 PM, Scootter said:

First, thanks for all the work you do to provide this plugin! I have suddenly started having issues with any Docker container that uses --runtime=nvidia. I first noticed after a system reboot, when I saw that Plex had not started up. When I tried to start it, I immediately got "Execution Error, Bad parameter".

This usually indicates that the runtime is not working properly, and it is also logged in your syslog.

 

What packages have you installed from the Nerd Pack? I can only imagine that you have something installed that is interfering with the Nvidia Driver.

Have you changed anything in your system recently, be it hardware or software (Docker, plugins, ...)?

So I appear to be having this issue.

Fresh install though, so it has never worked before.

I just uninstalled the Nerd Pack and rebooted.

 

I'm getting:

docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: device error: false: unknown device: unknown.
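For reference, one generic sanity check (plain Docker tooling, not specific to Unraid or this plugin) is whether the nvidia runtime is registered with the daemon at all:

# The nvidia runtime should show up in the daemon's runtime list
docker info | grep -i runtimes

If 'nvidia' shows up there, the runtime itself is registered and the problem is more likely in the container's variables.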

I'm sure I'm missing something.

 

I've done the usual reboot/reinstall etc. (initially I was having VFIO problems; the system used to pass the card through, now it doesn't).

The driver shows the below and appears to be OK:

[screenshot]

 

GPU stats are getting pulled correctly too:

[screenshot]

 

[screenshot]

 

I'm like 99% sure I'm missing something dumb. What logs would you need?

 

Edit: Also confirmed these are ok too.

[screenshot]

6 hours ago, BomB191 said:

Edit: Also confirmed these are ok too.

A user already had the same issue, and I was able to reproduce it on my test server while up- and downgrading the Unraid version.

 

I was able to solve it here:

 

Please also look at the following post, since that user had many packages installed from the NerdPack, and in his case it seems that this caused the issue.

 

...please report back if you got it working again.

3 hours ago, ich777 said:

A user already had the same issue, and I was able to reproduce it on my test server while up- and downgrading the Unraid version.

 

I was able to solve it here:

 

Please also look at the following post, since that user had many packages installed from the NerdPack, and in his case it seems that this caused the issue.

 

...please report back if you got it working again.

Unfortunately I attempted those fixes before posting.

 

The only Nerd Pack item I had installed was perl (can't even remember what I installed it for, to be fair).

But it has all been removed completely and I rebooted; I also tried reinstalling the driver after this as well - same result.

 

Running nvidia-persistenced on the command line is accepted, but there is no change.

 

NVIDIA_VISIBLE_DEVICES is where I think my issue might be.

 

I'm copying the information from:

[screenshot]

Confirmed no spaces; tried re-copy-pasting.

 

With the correct value "GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a":

Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: device error: false: unknown device: unknown.

 

With the incorrect value "asfa" (I tried 'all' also, as I saw that somewhere when I was searching):

Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: device error: false: unknown device: unknown.

 

The item is set as per the instructions in the first post:

[screenshot]

 

NVIDIA_DRIVER_CAPABILITIES, however, spits out a different error when I set it to 'some':

 

Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
unsupported capabilities found in 'some' (allowed ''): unknown.

 

With the correct 'all' I get:

Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: device error: false: unknown device: unknown.

 

 

My final attempt was:

put '--runtime=nvidia' in Extra Parameters

let the 'save/compile' fail

go back in, edit the template and re-paste 'GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a'

It failed with the same NVIDIA_VISIBLE_DEVICES error as above.

 

35 minutes ago, BomB191 said:

Running nvidia-persistenced on the command line is accepted, but there is no change.

Do you have nvidia-persistenced enabled? If yes, please disable it with:

kill $(pidof nvidia-persistenced)

 

and try it again after you've disabled it.
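For reference, you can check afterwards whether it is really gone (plain shell, nothing plugin-specific assumed):

# Prints the PID if the daemon is still running, otherwise reports that it is stopped
pidof nvidia-persistenced || echo "nvidia-persistenced is not running"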

 

Do you have all the variables in the Docker template as described in the second post of this thread?

48 minutes ago, ich777 said:

Do you have nvidia-persistenced enabled? If yes, please disable it with:

kill $(pidof nvidia-persistenced)

 

and try it again after you've disabled it.

 

Do you have all the variables in the Docker template as described in the second post of this thread?

After running 'kill $(pidof nvidia-persistenced)'

I get the same error:

docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: device error: false: unknown device: unknown.

 

I can also confirm both required variables are in the Docker template:

Key : NVIDIA_VISIBLE_DEVICES

Value : GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a

 

Key : NVIDIA_DRIVER_CAPABILITIES

Value : all

 

This is in the Unmanic container. I assume I'm not yet at the point where the container itself is having issues with it.
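For reference, a generic way to cross-check the UUID and the runtime outside of any template (a sketch only; the nvidia/cuda image tag here is just an example and not something the plugin requires):

# List the GPUs the driver sees, including their UUIDs, to compare against the template value
nvidia-smi -L

# Throwaway test container using the same runtime and variables as the template
docker run --rm --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

If that throwaway container prints the usual nvidia-smi table, the runtime and both variables are fine and the problem is somewhere in the template itself.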

 

I am on version 6.10.3; should I hop onto 6.11.0-rc3?

5 minutes ago, ich777 said:

No. Have you tried changing a value in the container yet, so that it basically gets recreated on your server, to see if this fixes your issue?

 

Can you maybe post your Diagnostics?

I require a dunce hat for tonight.

 

I went to make a new container and noticed these 2 parameters hiding under 'Show more settings ...'.

[screenshot]

 

Figures it would be something extremely stupid. I didn't even contemplate checking in there.

 

The disappointment in myself is immeasurable.

 

TIL: check 'Show more settings ...' :( Sorry for wasting your time, and thank you immensely for the assistance.

10 minutes ago, ich777 said:

What did you do exactly?

Did you remove the entry with 'all' or did you remove your UUID?

Yes, when I went to create a fresh container I noticed it under 'Show more settings'.

 

So on my container I had 2x NVIDIA_VISIBLE_DEVICES: one with my GPU and one with 'all' in the field.

 

I deleted the variable I created and used the one already in the container, setting its value to my GPU.

So the container now has the below in the settings regarding the GPU:

[screenshot]

[screenshot]

2 hours ago, HopkinsAG said:

I'm not sure what I am doing wrong.

Why are you not using the latest available driver?

 

What happens when you issue:

nvidia-smi

from an Unraid terminal (screenshot)?

 

The only thing that I can see is that you have enabled this in the VM settings:

pcie_acs_override=downstream,multifunction

Do you need those settings? If not, try disabling them and reboot.
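For reference, on Unraid that override sits on the append line of the boot configuration (typically /boot/syslinux/syslinux.cfg, also editable from the flash device page; a hedged sketch, not this user's actual file):

append pcie_acs_override=downstream,multifunction initrd=/bzroot

Removing the pcie_acs_override=... part so that only the remaining options are left (e.g. append initrd=/bzroot) and rebooting disables it.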

 

I can see nothing in your Diagnostics that would prevent the card from working. The only thing I can imagine is that your PCIe slot delivers too little power, but then the card would certainly fall off the bus, which I don't see here.

 

Please make sure that you've enabled Above 4G Decoding in the BIOS.

I had another user here with a Dell server where an HP Nvidia card wouldn't work; maybe it's the same in your system too, since HP is notorious for locking down their servers so that you can only use their add-on cards and such.


I have tried almost all of the drivers and the oldest driver was my last test. 
 

I think you're on to something about the 5GB card. I am unable to do anything in the BIOS to allow greater-than-4GB cards. I have ordered a Quadro M2000 with 4GB to test out that theory. I should just upgrade the server to hardware that isn't 8+ years old…

1 hour ago, ich777 said:

This (Above 4G Decoding) has nothing to do with the memory size of add-on cards; it is only about the address space that an add-on card can use in RAM.

Sorry, I misunderstood what you said. I have looked in the BIOS and searched online. I don't seem to have the option in the Gen7 server. It is in the Gen8, though. I might just be hosed.


At boot, it takes nearly a minute for the driver to install:

 

Aug 18 16:08:45 Tower root: --------------------Nvidia driver v515.57 found locally---------------------
Aug 18 16:08:45 Tower root: 
Aug 18 16:08:45 Tower root: -----------------Installing Nvidia Driver Package v515.57-------------------
Aug 18 16:09:36 Tower kernel: nvidia: loading out-of-tree module taints kernel.
Aug 18 16:09:36 Tower kernel: nvidia: module license 'NVIDIA' taints kernel.
Aug 18 16:09:36 Tower kernel: Disabling lock debugging due to kernel taint
Aug 18 16:09:36 Tower kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 243
Aug 18 16:09:36 Tower kernel: 
Aug 18 16:09:36 Tower kernel: nvidia 0000:06:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
Aug 18 16:09:36 Tower kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  515.57  Wed Jun 22 22:44:07 UTC 2022
Aug 18 16:09:36 Tower root: 

 

Is there any way to speed this up? 

3 hours ago, Howboys said:

At boot, it takes nearly a minute for the driver to install

The driver is about 300MB in size…

I think one minute for a driver installation is not bad, especially on Linux, where you usually have to reboot twice to install the proprietary driver.

 

3 hours ago, Howboys said:

Is there any way to speed this up? 

Not really, the only way is to not install it. 😅

 

May I ask why?

10 minutes ago, ich777 said:

The driver is about 300MB in size…

I think one minute for a driver installation is not bad, especially on Linux, where you usually have to reboot twice to install the proprietary driver.

 

Not really, the only way is to not install it. 😅

 

May I ask why?

 

It's not a huge deal, but without the plugin my boot takes less than a minute. So a 2-3x longer boot-up seems like... something not ideal.

 

I know that with normal Linux I could create a preloaded image, but I don't know if Unraid has that option.

36 minutes ago, Howboys said:

It's not a huge deal, but without the plugin my boot takes less than a minute.

Good boot time…

 

36 minutes ago, Howboys said:

So a 2-3x longer boot-up seems like... something not ideal.

May I ask how often you reboot your server? I reboot mine very rarely, and even when I do it is not a huge deal because I know exactly how long it will take.

 

38 minutes ago, Howboys said:

I know that with normal Linux I could create a preloaded image, but I don't know if Unraid has that option.

No, not by default.

 

Also please keep in mind that if you are creating such an image with the drivers preloaded, you should be on the Unraid version that you want to create the custom image for; then you have to pull down the drivers and maybe some build dependencies, compile everything, pack the image again, move it to your USB boot device (and hope everything went well so that your system is able to boot), and reboot afterwards.

 

This process would take about 15 to 20 minutes (with recent hardware), depending on the speed of your server.

 

Also keep in mind that the boot will also take longer if you create a custom image - not that much longer, but I think it will be around 20 to 30 seconds more, because the boot image is bigger and takes longer to load.

 

Hope that all makes sense to you.

