[Plugin] Nvidia-Driver


ich777


I have a GTX 645 that I want to use for Jellyfin transcoding.

I am able to set it up using driver version v470.129.06, but when trying to monitor GPU usage, no processes ever appear to be running. `nvidia-smi` does recognize the GPU, and I can see its temperature, but no usage.

On Nvidia's website, it appears that the latest supported driver for the GTX 645 is v470.141.03, so I was thinking that updating to that driver version might fix the monitoring issue.

Is there any way to update to v470.141.03? It's not an option on the plugin's settings page.
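For reference, a couple of generic nvidia-smi views that make transcode activity easier to spot than the plain process list (a sketch only; nothing here is specific to this plugin or to Jellyfin, and support for these options varies by GPU generation):

# Overall utilisation over time, including the enc/dec columns used while transcoding
nvidia-smi dmon -s u

# The same, broken down per process
nvidia-smi pmon -s u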

1 hour ago, bmyonatan said:

I have a GTX 645 that I want to use for Jellyfin transcoding.

Sorry, but this isn't the best choice for transcoding...

This card is only capable of transcoding H.264 (your card is 1st gen NVENC, code name: GK106).

 

Maybe look for an NVIDIA T400 instead: it is Turing based, low power (draws only up to 35W), doesn't need an external power connector, and you can get it brand new for about $130.


Hello.

 

I have a GTX 1660 Super in my server, and last night it disappeared from your plugin and my Unraid system. I took the GPU out to a test rig and it seems to be working fine; at least I got video output from it, so I'm assuming it's fine.

 

After reinstalling the GPU in my server, Unraid still doesn't recognize or find it.

 

 

 

tower-diagnostics-20220814-2012.zip

29 minutes ago, gevsan said:

After reinstalling the GPU in my server, Unraid still doesn't recognize or find it.

No support as long as this is on your server:

Aug 14 19:59:50 Tower emhttpd: /usr/local/emhttp/plugins/user.scripts/backgroundScript.sh "/tmp/user.scripts/tmpScripts/unlock nvidia/script" >/dev/null 2>&1

 

See the first post of this thread, the red text at the top.

On 8/3/2022 at 8:23 PM, ich777 said:

 

On 8/3/2022 at 7:43 PM, Scootter said:

First, thanks for all the work you do to provide this plugin! I have suddenly started having issues with any Docker container that uses --runtime=nvidia. I first noticed after a system reboot, when I saw that Plex had not started up. When I tried to start it, I immediately got "Execution Error, Bad parameter".

This usually indicates that the runtime is not working properly, and it is also logged in your syslog.

 

What packages have you installed from the Nerd Pack? I can only imagine that you have something installed that is interfering with the Nvidia Driver.

Have you changed anything in your system recently, be it hardware or software (Docker, plugins, ...)?

So I appear to be having this issue.

Fresh install though, so it has never worked before.

I just uninstalled the Nerd Pack and rebooted.

 

I'm getting:

docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: device error: false: unknown device: unknown.
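For reference, one generic sanity check (plain Docker tooling, not specific to Unraid or this plugin) is whether the nvidia runtime is registered with the daemon at all:

# The nvidia runtime should show up in the daemon's runtime list
docker info | grep -i runtimes

If 'nvidia' shows up there, the runtime itself is registered and the problem is more likely in the container's variables.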

I'm sure I'm missing something.

 

I've done the usual reboot/reinstall etc. (initially I was having VFIO problems; the system used to pass the card through, now it doesn't).

The driver shows the below and appears to be OK:

[screenshot]

 

GPU stats are getting pulled correctly too:

[screenshot]

 

[screenshot]

 

I'm like 99% sure I'm missing something dumb. What logs would you need?

 

Edit: Also confirmed these are ok too.

[screenshot]

6 hours ago, BomB191 said:

Edit: Also confirmed these are ok too.

A user already had the same issue, and I was able to reproduce it on my test server while up- and downgrading the Unraid version.

 

I was able to solve it here:

 

Please also look at the following post, since that user had many packages installed from the NerdPack, and in his case it seems that this caused the issue.

 

...please report back if you got it working again.

3 hours ago, ich777 said:

A user already had the same issue, and I was able to reproduce it on my test server while up- and downgrading the Unraid version.

 

I was able to solve it here:

 

Please also look at the following post, since that user had many packages installed from the NerdPack, and in his case it seems that this caused the issue.

 

...please report back if you got it working again.

Unfortunately I attempted those fixes before posting.

 

The only Nerd Pack item I had installed was perl (can't even remember what I installed it for, to be fair).

But it has all been removed completely and I rebooted; I also tried reinstalling the driver after this as well - same result.

 

Running nvidia-persistenced on the command line is accepted, but there is no change.

 

NVIDIA_VISIBLE_DEVICES is where I think my issue might be.

 

I'm copying the information from:

[screenshot]

Confirmed no spaces; tried re-copy-pasting.

 

With the correct value "GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a":

Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: device error: false: unknown device: unknown.

 

With the incorrect value "asfa" (I tried 'all' also, as I saw that somewhere when I was searching):

Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: device error: false: unknown device: unknown.

 

The item is set as per the instructions in the first post:

[screenshot]

 

NVIDIA_DRIVER_CAPABILITIES, however, spits out a different error when I set it to 'some':

 

Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
unsupported capabilities found in 'some' (allowed ''): unknown.

 

With the correct 'all' I get:

Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: device error: false: unknown device: unknown.

 

 

My final attempt was:

put '--runtime=nvidia' in Extra Parameters

let the 'save/compile' fail

go back in, edit the template and re-paste 'GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a'

It failed with the same NVIDIA_VISIBLE_DEVICES error as above.

 

35 minutes ago, BomB191 said:

Running nvidia-persistenced on the command line is accepted, but there is no change.

Do you have nvidia-persistenced enabled? If yes, please disable it with:

kill $(pidof nvidia-persistenced)

 

and try it again after you've disabled it.
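For reference, you can check afterwards whether it is really gone (plain shell, nothing plugin-specific assumed):

# Prints the PID if the daemon is still running, otherwise reports that it is stopped
pidof nvidia-persistenced || echo "nvidia-persistenced is not running"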

 

Do you have all the variables in the Docker template as described in the second post of this thread?

48 minutes ago, ich777 said:

Do you have nvidia-persistenced enabled? If yes, please disable it with:

kill $(pidof nvidia-persistenced)

 

and try it again after you've disabled it.

 

Do you have all the variables in the Docker template as described in the second post of this thread?

After running 'kill $(pidof nvidia-persistenced)'

I get the same error:

docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: device error: false: unknown device: unknown.

 

I can also confirm both required variables are in the Docker template:

Key : NVIDIA_VISIBLE_DEVICES

Value : GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a

 

Key : NVIDIA_DRIVER_CAPABILITIES

Value : all

 

This is in the Unmanic container. I assume I'm not yet at the point where the container itself is having issues with it.
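For reference, a generic way to cross-check the UUID and the runtime outside of any template (a sketch only; the nvidia/cuda image tag here is just an example and not something the plugin requires):

# List the GPUs the driver sees, including their UUIDs, to compare against the template value
nvidia-smi -L

# Throwaway test container using the same runtime and variables as the template
docker run --rm --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

If that throwaway container prints the usual nvidia-smi table, the runtime and both variables are fine and the problem is somewhere in the template itself.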

 

I am on version 6.10.3; should I hop onto 6.11.0-rc3?

5 minutes ago, ich777 said:

No. Have you tried changing a value in the container yet, so that it basically gets recreated on your server, to see if this fixes your issue?

 

Can you maybe post your Diagnostics?

I require a dunce hat for tonight.

 

I went to make a new container and noticed these 2 parameters hiding under 'Show more settings ...'.

[screenshot]

 

Figures it would be something extremely stupid. I didn't even contemplate checking in there.

 

The disappointment in myself is immeasurable.

 

TIL: check 'Show more settings ...' :( Sorry for wasting your time, and thank you immensely for the assistance.

10 minutes ago, ich777 said:

What did you do exactly?

Did you remove the entry with 'all' or did you remove your UUID?

Yes, when I went to create a fresh container I noticed it under 'Show more settings'.

 

So on my container I had 2x NVIDIA_VISIBLE_DEVICES: one with my GPU and one with 'all' in the field.

 

I deleted the variable I created and used the one already in the container, setting its value to my GPU.

So the container now has the below in the settings regarding the GPU:

[screenshot]

[screenshot]

2 hours ago, HopkinsAG said:

I'm not sure what I am doing wrong.

Why are you not using the latest available driver?

 

What happens when you issue:

nvidia-smi

from an Unraid terminal (screenshot)?

 

The only thing that I can see is that you have enabled this in the VM settings:

pcie_acs_override=downstream,multifunction

Do you need those settings? If not, try disabling them and reboot.
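For reference, on Unraid that override sits on the append line of the boot configuration (typically /boot/syslinux/syslinux.cfg, also editable from the flash device page; a hedged sketch, not this user's actual file):

append pcie_acs_override=downstream,multifunction initrd=/bzroot

Removing the pcie_acs_override=... part so that only the remaining options are left (e.g. append initrd=/bzroot) and rebooting disables it.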

 

I can see nothing in your Diagnostics that would prevent the card from working. The only thing I can imagine is that your PCIe slot delivers too little power, but then the card would certainly fall off the bus, which I don't see here.

 

Please make sure that you've enabled Above 4G Decoding in the BIOS.

I had another user here with a Dell server where an HP Nvidia card wouldn't work; maybe it's the same in your system too, since HP is notorious for locking down their servers so that you can only use their add-on cards and such.


I have tried almost all of the drivers and the oldest driver was my last test. 
 

I think you're on to something about the 5GB card. I am unable to do anything in the BIOS to allow greater-than-4GB cards. I have ordered a Quadro M2000 with 4GB to test out that theory. I should just upgrade the server to hardware that isn't 8+ years old…

1 hour ago, ich777 said:

This (Above 4G Decoding) has nothing to do with the memory size of add-on cards; it is only about the address space that an add-on card can use in RAM.

Sorry, I misunderstood what you said. I have looked in the BIOS and searched online. I don't seem to have the option in the Gen7 server. It is in the Gen8, though. I might just be hosed.


At boot, it takes nearly a minute for the driver to install:

 

Aug 18 16:08:45 Tower root: --------------------Nvidia driver v515.57 found locally---------------------
Aug 18 16:08:45 Tower root: 
Aug 18 16:08:45 Tower root: -----------------Installing Nvidia Driver Package v515.57-------------------
Aug 18 16:09:36 Tower kernel: nvidia: loading out-of-tree module taints kernel.
Aug 18 16:09:36 Tower kernel: nvidia: module license 'NVIDIA' taints kernel.
Aug 18 16:09:36 Tower kernel: Disabling lock debugging due to kernel taint
Aug 18 16:09:36 Tower kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 243
Aug 18 16:09:36 Tower kernel: 
Aug 18 16:09:36 Tower kernel: nvidia 0000:06:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
Aug 18 16:09:36 Tower kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  515.57  Wed Jun 22 22:44:07 UTC 2022
Aug 18 16:09:36 Tower root: 

 

Is there any way to speed this up? 

3 hours ago, Howboys said:

At boot, it takes nearly a minute for the driver to install

The driver is about 300MB in size…

I think one minute for a driver installation is not bad, especially on Linux, where you usually have to reboot twice to install the proprietary driver.

 

3 hours ago, Howboys said:

Is there any way to speed this up? 

Not really, the only way is to not install it. 😅

 

May I ask why?

10 minutes ago, ich777 said:

The driver is about 300MB in size…

I think one minute for a driver installation is not bad, especially on Linux, where you usually have to reboot twice to install the proprietary driver.

 

Not really, the only way is to not install it. 😅

 

May I ask why?

 

It's not a huge deal, but without the plugin my boot takes less than a minute. So a 2-3x longer boot-up seems like... something not ideal.

 

I know that with normal Linux I could create a preloaded image, but I don't know if Unraid has that option.

36 minutes ago, Howboys said:

It's not a huge deal, but without the plugin my boot takes less than a minute.

Good boot time…

 

36 minutes ago, Howboys said:

So a 2-3x longer boot-up seems like... something not ideal.

May I ask how often you reboot your server? I reboot mine very rarely, and even when I do it is not a huge deal because I know exactly how long it will take.

 

38 minutes ago, Howboys said:

I know that with normal Linux I could create a preloaded image, but I don't know if Unraid has that option.

No, not by default.

 

Also please keep in mind that if you are creating such an image with the drivers preloaded, you should be on the Unraid version that you want to create the custom image for; then you have to pull down the drivers and maybe some build dependencies, compile everything, pack the image again, move it to your USB boot device (and hope everything went well so that your system is able to boot), and reboot afterwards.

 

This process would take about 15 to 20 minutes (with recent hardware), depending on the speed of your server.

 

Also keep in mind that the boot will also take longer if you create a custom image - not that much longer, but I think it will be around 20 to 30 seconds more, because the boot image is bigger and takes longer to load.

 

Hope that all makes sense to you.

