[Plugin] Linuxserver.io - Unraid Nvidia



4 hours ago, Solverz said:

Sorry for the late response. I have just run the command "nvidia-smi" and the result is "No devices were found".

 

If the command below is the one you meant from chbmb, then it gave the same result, "No devices were found".

 

nvidia-smi --query-gpu=gpu_name,gpu_bus_id,gpu_uuid --format=csv,noheader | sed -e s/00000000://g | sed 's/\,\ /\n/g'

Then I don't know what is happening. I have a P400 also, but no issues at all.

Do you have anything in /dev/dri?

Have you tried using an older version of Unraid to see if it works there?
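For anyone following along, the transformation in chbmb's one-liner can be sketched against a sample line (the values below are illustrative; on a real system the input comes from nvidia-smi):

```shell
# A sample line in the shape produced by:
#   nvidia-smi --query-gpu=gpu_name,gpu_bus_id,gpu_uuid --format=csv,noheader
sample='Quadro P400, 00000000:23:00.0, GPU-de8ab77e-8fff-db12-f93d-ebe991944a85'

# Same transformation as the one-liner: strip the PCI domain prefix,
# then split the comma-separated fields onto separate lines
result=$(printf '%s\n' "$sample" | sed -e 's/00000000://g' | sed 's/, /\n/g')
printf '%s\n' "$result"
```

If nvidia-smi itself reports "No devices were found", the pipeline has nothing to work on, so an empty result points at the driver or hardware level rather than the command.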

4 hours ago, saarg said:

Then I don't know what is happening. I have a P400 also, but no issues at all.

Do you have anything in /dev/dri?

Have you tried using an older version of Unraid to see if it works there?

Yes, I tried the previous two versions of Unraid Nvidia and got the same result.

 

The below are in the /dev/dri/ directory.

 

by-path/  card0  renderD128

6 hours ago, david279 said:

Try plugging in the HDMI on the GPU to a monitor or something. It may need to have some type of output for it to work. Try a Dummy HDMI adapter.

I don't have a dummy or monitor plugged in, so I'm not sure if that is the case, but it's worth a try.

10 hours ago, Solverz said:

I am not booting to GUI mode, no; it is running headless, so I just let it boot to Unraid without the GUI :)

 

 

Can you post your diagnostics and I'll see if I find anything there.


I have some weird behaviour: I have a Quadro P400 and a 1050 Ti in the system.

 

Nvidia Driver Version:	440.59
GPU 0 Model & Bus:	Quadro P400      23:00.0
GPU 0 UUID:	GPU-de8ab77e-8fff-db12-f93d-ebe991944a85
GPU 1 Model & Bus:	GeForce GTX 1050 Ti      2D:00.0
GPU 1 UUID:	GPU-e60379bf-191f-14ec-3841-d4dfd8e82ab8

All is working fine when passing through the 1050 Ti to the Plex container, but when trying to pass the Quadro, I get this:

 

/usr/bin/docker: Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: device error: unknown device id: GPU-de8ab77e-8fff-db12-f93d-ebe991944a85\\n\""": unknown.

The command failed.

 

11 minutes ago, chocorem said:

I have some weird behaviour: I have a Quadro P400 and a 1050 Ti in the system.

 


Nvidia Driver Version:	440.59
GPU 0 Model & Bus:	Quadro P400      23:00.0
GPU 0 UUID:	GPU-de8ab77e-8fff-db12-f93d-ebe991944a85
GPU 1 Model & Bus:	GeForce GTX 1050 Ti      2D:00.0
GPU 1 UUID:	GPU-e60379bf-191f-14ec-3841-d4dfd8e82ab8

All is working fine when passing through the 1050 Ti to the Plex container, but when trying to pass the Quadro, I get this:

 


/usr/bin/docker: Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: device error: unknown device id: GPU-de8ab77e-8fff-db12-f93d-ebe991944a85\\n\""": unknown.

The command failed.

 

It looks like you have some stray spaces at the end of the UUID.

1 hour ago, scottc said:

If you look at the end of the docker error line:

 


device error: unknown device id: GPU-de8ab77e-8fff-db12-f93d-ebe991944a85\\n\

The trailing \\n\ is what is causing your issue.

 

 

I'm using the edit field to put the data in, and when entering the other UUID from the 1050 I get no errors. How do I remove this \\n\?

6 hours ago, chocorem said:

I double checked in the config field; no, there are no spaces, neither at the start nor at the end.

Did you try to delete it and type it in manually? You might not see that there is a space. There is a bug that causes the same issue when copy-pasting from the forum, while typing it manually works.
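A minimal sketch of scrubbing a pasted UUID before it goes into the container template, assuming invisible whitespace is the culprit (the UUID value is the one from the posts above; the variable names are just for illustration):

```shell
# A UUID as it might arrive via copy-paste, with an invisible trailing space
pasted='GPU-de8ab77e-8fff-db12-f93d-ebe991944a85 '

# Remove all surrounding whitespace (spaces, tabs, stray newlines)
# before using the value in the container's GPU field
clean=$(printf '%s' "$pasted" | tr -d '[:space:]')
printf '%s\n' "$clean"
```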

15 hours ago, Solverz said:

 

1. Your BIOS is very old. You have 1.8.2 (from 2011) and the newest is 1.13.0 (2018).

2. You have this error in the syslog about the P400.

pci 0000:04:00.0: can't claim BAR 3 [mem 0xbe000000-0xbfffffff 64bit pref]: no compatible bridge window

kernel: pci 0000:04:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
kernel: pci 0000:04:00.0: BAR 3: trying firmware assignment [mem 0xbe000000-0xbfffffff 64bit pref]
kernel: pci 0000:04:00.0: BAR 3: [mem 0xbe000000-0xbfffffff 64bit pref] conflicts with System RAM [mem 0x00100000-0xbf698fff]
kernel: pci 0000:04:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]

3. You got the below lines in your syslog, which I don't have. It might be a different revision of the card, or that yours is a Dell card?

 

kernel: nvidia: loading out-of-tree module taints kernel.
kernel: nvidia: module license 'NVIDIA' taints kernel.

4. It seems it's a different GPU ID also. Yours is

[drm] [nvidia-drm] [GPU ID 0x00000400] Loading driver

And mine is

[drm] [nvidia-drm] [GPU ID 0x00008200] Loading driver

 

5. You have the below when the plugin is installed, so there is something going on. A hardware/BIOS issue is my guess.

 

kernel: resource sanity check: requesting [mem 0xdd700000-0xde6fffff], which spans more than PCI Bus 0000:04 [mem 0xdc000000-0xddffffff]
kernel: caller _nv030928rm+0x5d/0xd0 [nvidia] mapping multiple BARs
kernel: NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x26:0xffff:1227)
kernel: NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 0

 

 

So try to update your BIOS and see if that works.

24 minutes ago, saarg said:

 

1. Your BIOS is very old. You have 1.8.2 (from 2011) and the newest is 1.13.0 (2018).

2. You have this error in the syslog about the P400.


pci 0000:04:00.0: can't claim BAR 3 [mem 0xbe000000-0xbfffffff 64bit pref]: no compatible bridge window

kernel: pci 0000:04:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
kernel: pci 0000:04:00.0: BAR 3: trying firmware assignment [mem 0xbe000000-0xbfffffff 64bit pref]
kernel: pci 0000:04:00.0: BAR 3: [mem 0xbe000000-0xbfffffff 64bit pref] conflicts with System RAM [mem 0x00100000-0xbf698fff]
kernel: pci 0000:04:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]

3. You got the below lines in your syslog, which I don't have. It might be a different revision of the card, or that yours is a Dell card?

 


kernel: nvidia: loading out-of-tree module taints kernel.
kernel: nvidia: module license 'NVIDIA' taints kernel.

4. It seems it's a different GPU ID also. Yours is


[drm] [nvidia-drm] [GPU ID 0x00000400] Loading driver

And mine is


[drm] [nvidia-drm] [GPU ID 0x00008200] Loading driver

 

5. You have the below when the plugin is installed, so there is something going on. A hardware/BIOS issue is my guess.

 


kernel: resource sanity check: requesting [mem 0xdd700000-0xde6fffff], which spans more than PCI Bus 0000:04 [mem 0xdc000000-0xddffffff]
kernel: caller _nv030928rm+0x5d/0xd0 [nvidia] mapping multiple BARs
kernel: NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x26:0xffff:1227)
kernel: NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 0

 

 

So try to update your BIOS and see if that works.

Thank you so much for finding these issues.

 

I am unsure whether it is a Dell P400 or not, as I just got it off eBay. Would a Dell P400 cause any issues that you know of?

 

I also just found out that the PCIe x16 slot on the motherboard is PCIe Gen2 and runs in x8 mode, not x16. Could this relate to any of the above issues, in your experience?

 

I will also get the BIOS updated and report back :)

 

Thanks again!!!

57 minutes ago, Solverz said:

Thank you so much for finding these issues.

 

I am unsure whether it is a Dell P400 or not, as I just got it off eBay. Would a Dell P400 cause any issues that you know of?

 

I also just found out that the PCIe x16 slot on the motherboard is PCIe Gen2 and runs in x8 mode, not x16. Could this relate to any of the above issues, in your experience?

 

I will also get the BIOS updated and report back :)

 

Thanks again!!!

 

I don't think there should be any issues if it's a Dell P400.

 

Running the card in x8 Gen2 shouldn't be an issue as far as I know.


Just checking after reading the last 10 pages to verify.

 

Is a dummy plug required for an Nvidia card being used exclusively for transcoding? Right now I have it set to work with the Plex docker from ls.io, and I may also look at passing it over to a HandBrake docker for re-transcoding some media where quality isn't a huge concern (cartoons, etc.).

 

Right now I don't have one installed, but I only just switched over to Nvidia transcoding yesterday. I don't want to hit any snags as we go, and I can order a dummy plug ASAP if I should use one.

 

Thanks!

 

Love the work on the Nvidia support, LS.io; greatly appreciated.

Just now, david279 said:

I don't think the card will initialize if it doesn't have some type of output detected. 

I have it installed and transcoding without any output currently. I just wasn't sure if I'd run into some sort of low-power-state issues or anything. I never even removed the plastic covers that come installed in the HDMI/DisplayPort connectors on the GPU when I installed it in my server.

5 minutes ago, DaClownie said:

I have it installed and transcoding without any output currently. I just wasn't sure if I'd run into some sort of low-power-state issues or anything. I never even removed the plastic covers that come installed in the HDMI/DisplayPort connectors on the GPU when I installed it in my server.

OK, that's good to know. I know VMs with a passed-through GPU that don't have anything connected will sometimes freak out.


Just a quick update on my 1660 Super getting 'lost'...

 

I've now ruled out the HW (I think)

- Booting directly to a native Windows 10 install (I just removed the HBA controller and the unRAID USB stick and used a spare 120 GB SSD with a fresh Win 10 install), I have run GPU tools for 3 days solid with no issues seen.

- I've then booted into unRAID (non-Nvidia) and passed it through as the primary GPU to a Win10 VM, and that ran diagnostics for 2.5 days with no issues.

 

All I can think is that the 440.59 Linux drivers don't sit nicely with my Asus 1660 Super OC Phoenix GPU / Ryzen 3600 / Asus B450F motherboard.

 

I guess in the spirit of this plugin all I can do now is not use the GPU until the next Unraid build is released, and hopefully the plugin can be updated to the latest drivers. I appreciate LinuxServer are busy, so I can't expect anything more than they've stated. I have popped a +1 post in the feature request for native unRAID support for Nvidia drivers.

 

I'll focus on fixing the one or two niggles I've not yet resolved and sit patiently with fingers crossed!

On 5/2/2020 at 1:08 PM, saarg said:

 

I don't think there should be any issues if it's a Dell P400.

 

Running the card in x8 Gen2 shouldn't be an issue as far as I know.

I put the P400 in another system and it is now working fine, so I think I am going to use this new system instead.

 

However, I noticed that after rebooting, the GPU UUID is not shown in the Unraid Nvidia settings, and to get it to show I have to uninstall and reinstall the Unraid Nvidia plugin.

 

Any ideas?

1 hour ago, Solverz said:

I put the P400 in another system and it is now working fine, so I think I am going to use this new system instead.

 

However, I noticed that after rebooting, the GPU UUID is not shown in the Unraid Nvidia settings, and to get it to show I have to uninstall and reinstall the Unraid Nvidia plugin.

 

Any ideas?

Probably a race condition. You could use the command chbmb posted. I don't think the UUID will change, so you only need to get it once.
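One possible workaround for such a race, sketched under the assumption that the UUID is stable across reboots: query it once and cache the result. The cache path is hypothetical (not something the plugin actually uses), and the placeholder fallback only exists so the sketch runs on a machine without an Nvidia GPU or driver:

```shell
# Hypothetical cache location; on Unraid a path under /boot would persist
UUID_FILE=/tmp/gpu_uuid.txt

if [ ! -s "$UUID_FILE" ]; then
    # Query the UUID once; fall back to a placeholder if nvidia-smi
    # is unavailable, purely so this sketch is runnable anywhere
    nvidia-smi --query-gpu=gpu_uuid --format=csv,noheader > "$UUID_FILE" 2>/dev/null \
        || printf 'GPU-00000000-0000-0000-0000-000000000000\n' > "$UUID_FILE"
fi

GPU_UUID=$(cat "$UUID_FILE")
printf '%s\n' "$GPU_UUID"
```

On subsequent boots the cached value is reused, so the settings page racing the driver no longer matters for the container template.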

  • trurl locked this topic
This topic is now closed to further replies.