[Plugin] Linuxserver.io - Unraid Nvidia


Recommended Posts

2 minutes ago, saarg said:

Probably a race condition. You could use the command chbmb posted. I don't think the UUID will change, so you only need to get it once.

Yes, that command works too, thank you.

Do you mind me asking what you mean by a race condition?

Link to comment

I hope you don't mind me asking...

 

Is it possible to get the two kernel modules 'joydev' and 'uinput' included with this plugin/these images?

I recently released a container for playing Steam games with In-Home Streaming (even over the internet), but the only problem is that these two kernel modules are needed to enable 'real' controller support. Right now I have to map the controller buttons to keyboard or even mouse inputs, and that's really frustrating for some games.
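
For anyone curious whether their kernel already ships these modules, a quick console check (standard Linux commands, nothing specific to this plugin):

# Dry run: modprobe prints what it would load, or errors out if the module isn't shipped
modprobe -n -v joydev
modprobe -n -v uinput

# Show whether they are currently loaded
lsmod | grep -E 'joydev|uinput'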

 

EDIT: Not needed anymore, got it solved. ;)

Edited by ich777
  • Like 1
Link to comment
21 hours ago, Solverz said:

Yes, that command works too, thank you.

Do you mind me asking what you mean by a race condition?

@saarg

 

I looked into race conditions and found out what the term means.

 

I also found out why the UUID was not showing: Jellyfin was set to autostart, and when I disabled this, the GPU UUID showed up again after a reboot :)

Thanks for all your help!!!

Link to comment
2 hours ago, Solverz said:

@saarg

 

I looked into race conditions and found out what the term means.

 

I also found out why the UUID was not showing: Jellyfin was set to autostart, and when I disabled this, the GPU UUID showed up again after a reboot :)

Thanks for all your help!!!

I didn't see your post until now.

 

It's weird that Jellyfin would make the UUID disappear.

Link to comment

Hi all, I just recently installed this (the 6.8.3 Nvidia build) for the first time and I'm getting the "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." error. I have two cards: the first is a GT 240 for console access, which I wouldn't expect to show up, but the second card is a GTX 1660. I previously used that card for GPU passthrough, but it's rarely used, so I'd rather repurpose it. I removed the stub on the card and have disabled the VM manager. After rebooting I'm still not seeing the card show up in the Nvidia build settings. Is there anything else I need to disable besides the stub and the VM manager? I believe it's still in the XML for the VM, but I would assume that with the VM manager disabled it shouldn't be holding on to the card, no?

 

Any help is appreciated!

 

Thanks,

~chiefo

Edited by chiefo
Link to comment
32 minutes ago, chiefo said:

Hi all, I just recently installed this (the 6.8.3 Nvidia build) for the first time and I'm getting the "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." error. I have two cards: the first is a GT 240 for console access, which I wouldn't expect to show up, but the second card is a GTX 1660. I previously used that card for GPU passthrough, but it's rarely used, so I'd rather repurpose it. I removed the stub on the card and have disabled the VM manager. After rebooting I'm still not seeing the card show up in the Nvidia build settings. Is there anything else I need to disable besides the stub and the VM manager? I believe it's still in the XML for the VM, but I would assume that with the VM manager disabled it shouldn't be holding on to the card, no?

 

Any help is appreciated!

 

Thanks,

~chiefo

You can check which module is loaded for the 1660 with lspci -k.

There have been others with a 1660 that have had issues getting the card recognised. Might be a driver version issue.
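
For reference, the output to look for is something along these lines (illustrative, not from this user's system):

# List the card's devices along with the kernel driver bound to each
lspci -k | grep -A 3 -i nvidia

# A card still held by the stub will show:
#   Kernel driver in use: vfio-pci
# while a card available to the Nvidia build will show:
#   Kernel driver in use: nvidia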

Link to comment
5 minutes ago, saarg said:

You can check which module is loaded for the 1660 with lspci -k.

There have been others with a 1660 that have had issues getting the card recognised. Might be a driver version issue.

I'll check that out in a minute. I ran lspci -v and saw it was still being stubbed. Turns out that while most of the card's component IDs ended in 10d<letter>, there was another device still stubbed that I thought belonged to the wireless card I had stubbed. Server's rebooting now.

 

 

 

*EDIT* Yeah, I jumped the gun. It's showing up now that I removed the 4th stubbed item that was hanging around.

Edited by chiefo
Link to comment
18 hours ago, saarg said:

I didn't see your post until now.

 

It's weird that Jellyfin would make the UUID disappear.

I know. Jellyfin only makes it disappear when it is set to autostart and the server reboots with this setting on. The GPU is assigned to the Jellyfin container, so maybe something is going on when the server boots with Jellyfin autostarting? 😅

Edited by Solverz
Link to comment
3 hours ago, Solverz said:

I know. Jellyfin only makes it disappear when it is set to autostart and the server reboots with this setting on. The GPU is assigned to the Jellyfin container, so maybe something is going on when the server boots with Jellyfin autostarting? 😅

Hmm... You could try setting a startup delay for Jellyfin and see if that helps. Not sure if you need a plugin for that or if it's a built-in feature.
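
If there is no built-in delay, a small boot script along these lines could approximate one (a sketch only; the container name comes from this thread and the timings are arbitrary — e.g. run it via the User Scripts plugin instead of autostarting the container):

#!/bin/bash
# Wait up to ~2 minutes for the Nvidia driver to enumerate the GPU
for i in $(seq 1 24); do
    nvidia-smi -L >/dev/null 2>&1 && break
    sleep 5
done
# Then start the container that needs the card
docker start jellyfin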

  • Thanks 1
Link to comment

Just to update my own 'issue' and say I think I've tentatively found a solution after a bit more problem-solving.

 

Issues

1. Multiple log entries stating "kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]" and "kernel: caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs". These correlated with nvidia-smi being called (either manually or by the GPU Stats plugin).

Solution - It seems moving the card to a different PCIe slot (I have two x16 slots, although they don't both run at x16 if you use both) has stopped this error from appearing.

 

2. The GPU dropping off the bus randomly and then being held in reset (fans at full speed), with nvidia-smi reporting a lost GPU.

Solution - After trying many things:

- Moving slots (this fixed issue 1, but had no effect on this issue)

- Passing through to a Win 10 VM - could not get it to work (error code 43) despite the correct VBIOS and stubbing the additional 2 devices (the 1660 Super appears as 4 devices). However, I believe this also had issues with the GPU getting 'lost', as the VM randomly could not start and I'd need to reboot.

- Native booting to Win 10 worked fine, no issues (I had to remove the HBA card to ensure no changes to my array could occur)

- New BIOS revision - Made no difference

- Changing power supplies - Made no difference

- Finally (and why I didn't try this earlier), I tried to run Memtest from the unRAID boot menu, which would just reset the PC; it never loaded Memtest. I found out that you can't run Memtest when booting via UEFI, so I disabled that in the BIOS. The memory passed 24 hours of testing. I then remembered reports that VM passthrough could be problematic with UEFI enabled, so I kept UEFI disabled and booted unRAID in legacy mode, and it's now been 3+ days without the GPU falling over (a quick way to verify the boot mode is shown just after this list).
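
For reference, confirming which mode the server actually booted in is a standard Linux check, not Unraid-specific:

# The efi directory only exists in sysfs when the kernel was booted via UEFI
[ -d /sys/firmware/efi ] && echo "booted UEFI" || echo "booted legacy"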

 

I am still a bit tentative; however, UEFI being flaky with GPUs for VMs and also affecting the Linux drivers is plausible, so I'll update again if it makes it to a week. I'm also running GPU Stats again (still no multiple-BAR reports).

 

TL;DR: I think/hope I've fixed my problem and just want to share in case it helps anyone else; disabling UEFI seems to have given me a nice, stable system.

 

There's just the Plex issue of not managing the power modes correctly, which is definitely not an issue with this plugin!

 

 

 

Link to comment
On 12/31/2019 at 6:38 PM, Timmy said:

Hey guys, I'm trying to get my P2000 working for Plex transcoding. I just installed the Unraid Nvidia plugin and it gets stuck on the "Updating available builds" screen. Any ideas?

 

I'm running 6.8.0.

 

I'm quoting this old comment because that's how I found this thread from a Google search. I'm just posting to help searchers in the future.

 

After installing the unraid-nvidia plugin, visiting the settings page, and then clicking the Unraid Nvidia link, I would get stuck at "Updating available builds". To resolve this, I did nothing other than change my MTU back to 1500. I had previously changed it to 9214 to accomplish something different. Changing it back to 1500 let the page update complete after about a minute.
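
For anyone who wants to verify this from the console first, something like the following should do it (br0 is an assumption; substitute your actual interface, and make the change permanent under Settings > Network Settings):

# Show the current MTU of the interface (assumed here to be br0)
ip link show br0

# Temporarily set it back to the standard 1500
ip link set br0 mtu 1500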

 

Hope that helps someone!

Link to comment

Hello!

 

I'm facing an issue with nvidia-unraid.

With standard Unraid, I was passing through my 2080 Ti to a Win10 gaming VM.

To do this, I had to add "vfio-pci.ids=10de:1e07,10de:10f7,10de:1ad6,10de:1ad7" to syslinux.
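
(For context, that entry sits on the append line of /boot/syslinux/syslinux.cfg; a sketch of roughly what the boot stanza looks like, based on a stock Unraid config:)

label Unraid OS
  menu default
  kernel /bzimage
  append vfio-pci.ids=10de:1e07,10de:10f7,10de:1ad6,10de:1ad7 initrd=/bzroot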

 

I switched to nvidia-unraid for a Plex container, and nvidia-smi failed, so I had to remove the "vfio-pci.ids" entry from syslinux.

After a reboot, nvidia-smi works, but my VM can't boot because it can't initialize the GPU.

 

Is there a special way to pass through a GPU with nvidia-unraid? Or are the two use cases not compatible?

 

Thanks !

Link to comment

@Valiran

 

As described on the first page (and I guess several times here), you can't use the Nvidia card in Docker and pass it through to a VM at the same time: either it's in use by a Docker container (then the VM won't boot), or it's in use by the VM (containers can't use it).

 

You may want to read the first page in full; it's explained there. I don't use it that way myself, as I use my Intel iGPU for Docker and the Nvidia card for the VM only (so no Unraid Nvidia build is needed).

Link to comment

So to do what I want, I need two separate GPUs.

One dedicated to Docker and the other to the VM, and in this configuration it will work with nvidia-unraid, right?

 

If I understand correctly, I will only have nvidia-smi errors for the passed-through GPU.

Link to comment
1 hour ago, Valiran said:

So to do what I want, I need two separate GPUs.

One dedicated to Docker and the other to the VM, and in this configuration it will work with nvidia-unraid, right?

 

If I understand correctly, I will only have nvidia-smi errors for the passed-through GPU.

If you want to run them simultaneously, yes: one card dedicated to Unraid (the Unraid Nvidia build is needed) for Docker, and one dedicated to each VM that runs at the same time.

 

Sample:

Intel iGPU, GT1030, GTX1070

 

The Intel iGPU is for Unraid (no special build needed, just modprobe i915; see the go-file sketch below) and for Docker.

GT1030: VMs (AlsPC OR Ubuntu, not simultaneously...). See the pic; I couldn't start the Ubuntu VM because the GT1030 was in use by the AlsPC VM.

GTX1070: VM (AlsMedia)

[Screenshot: VM tab showing the Ubuntu VM failing to start while AlsPC holds the GT1030]
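
For reference, the usual place for that modprobe is the go file on the flash drive. A minimal sketch (the chmod line is common forum advice to let unprivileged containers open the device nodes; adjust to taste):

#!/bin/bash
# /boot/config/go - runs at every boot
# Load the Intel iGPU driver so containers can use /dev/dri
modprobe i915
# Loosen device node permissions for container access
chmod -R 777 /dev/dri
# Start the Unraid web GUI (stock go-file line)
/usr/local/sbin/emhttp &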

  • Like 1
Link to comment

I just installed an Nvidia Quadro 4000 in my UnRAID server. When I try to configure it in the settings, I get:

"NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."

Nvidia Driver Version:  440.59

UnRAID 6.8.3

 

But the card is detected by the OS.

lspci -v

01:00.0 VGA compatible controller: NVIDIA Corporation GF100GL [Quadro 4000] (rev a3) (prog-if 00 [VGA controller])
        Subsystem: Hewlett-Packard Company GF100GL [Quadro 4000]
        Flags: bus master, fast devsel, latency 0, IRQ 10, NUMA node 0
        Memory at f8000000 (32-bit, non-prefetchable)
        Memory at d0000000 (64-bit, prefetchable)
        Memory at dc000000 (64-bit, prefetchable)
        I/O ports at cf00
        [virtual] Expansion ROM at 000c0000 [disabled]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel modules: nvidia_drm, nvidia
 

Any idea? Thanks

Link to comment
4 hours ago, Janus said:

I just installed an Nvidia Quadro 4000 in my UnRAID server. When I try to configure it in the settings, I get:

"NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."

Nvidia Driver Version:  440.59

UnRAID 6.8.3

 

But the card is detected by the OS.

lspci -v

01:00.0 VGA compatible controller: NVIDIA Corporation GF100GL [Quadro 4000] (rev a3) (prog-if 00 [VGA controller])
        Subsystem: Hewlett-Packard Company GF100GL [Quadro 4000]
        Flags: bus master, fast devsel, latency 0, IRQ 10, NUMA node 0
        Memory at f8000000 (32-bit, non-prefetchable)
        Memory at d0000000 (64-bit, prefetchable)
        Memory at dc000000 (64-bit, prefetchable)
        I/O ports at cf00
        [virtual] Expansion ROM at 000c0000 [disabled]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel modules: nvidia_drm, nvidia
 

Any idea? Thanks

Have you tried running the command posted on the previous page to get the UUID?

Link to comment
3 hours ago, saarg said:

Have you tried running the command posted on the previous page to get the UUID?

You mean this command?

nvidia-smi --query-gpu=gpu_name,gpu_bus_id,gpu_uuid --format=csv,noheader | sed -e s/00000000://g | sed 's/\,\ /\n/g'

 

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Link to comment
56 minutes ago, Janus said:

You mean this command?

nvidia-smi --query-gpu=gpu_name,gpu_bus_id,gpu_uuid --format=csv,noheader | sed -e s/00000000://g | sed 's/\,\ /\n/g'

 

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Are you booting in UEFI mode? You might try legacy boot and see if it works then.

It looks like the modules are loaded.

Have you ever stubbed the card before and maybe forgotten to remove the audio part from the stubbing?
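
A quick way to check for leftover stubbing from the console (the bus address 01:00 is taken from the lspci output above; the vfio-pci.cfg path is only relevant if the VFIO-PCI Config plugin was used):

# Any vfio-pci.ids entry still on the boot line?
grep -i vfio /boot/syslinux/syslinux.cfg

# Any devices stubbed via the VFIO-PCI Config plugin?
cat /boot/config/vfio-pci.cfg 2>/dev/null

# Confirm which driver each function of the card is bound to
lspci -k -s 01:00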

Link to comment

Hey all! I am hoping that someone may have an explanation for the issue I am facing.

 

I recently purchased a Lenovo Nvidia P400 and was attempting to install it, but I have run into a couple of issues. Here is what I did:

 

1. Upon boot, it appeared as though the system was defaulting to the PEG (PCI Express graphics), which made it impossible to view the motherboard startup sequence or the Unraid startup sequence where I log in to the system. I just waited long enough that I knew I was at the password prompt, typed my password blind, and success! It booted into Unraid.

 

2. I went to install the Nvidia plugin, but when I went to install the Unraid version, the plugin interface just sat on "Updating Available Builds". It never loaded past that point, no matter what settings I tried or however many restarts I did.

 

3. I checked to see if there were any additional motherboard settings required for this card, but I couldn't seem to find anything. The card is seated correctly, and the PCI-E power from the PSU is connected to the motherboard (the card doesn't have a direct power connection).

 

4. Without the card installed, the system boots normally into Unraid, including showing the motherboard and Unraid startup sequences.

 

Here is some info about my set-up for ease of diagnosing:

 

Motherboard: MSI B360 Gaming Plus

CPU: Intel Celeron G4900

GPU: Lenovo Nvidia P400***

 

-PCI Express Graphics set as primary boot device

-Integrated graphics multi-monitor is enabled

 

The only thing I can think of as a possible issue is that it is a Lenovo card instead of a PNY. Not sure how different the cards are in terms of compatibility, but I might also be missing something else.

 

Any ideas?

 

Thanks!

Link to comment
2 hours ago, saarg said:

Are you booting in UEFI mode? You might try legacy boot and see if it works then.

It looks like the modules are loaded.

Have you ever stubbed the card before and maybe forgotten to remove the audio part from the stubbing?

Hi

I'm booting in legacy, and the card is newly inserted.

Thanks

Link to comment
  • trurl locked this topic
This topic is now closed to further replies.