[Plugin] Linuxserver.io - Unraid Nvidia


Recommended Posts

Would it be possible to get a version of this for the Tesla cards? I have a K20 I'd love to use for transcoding on my Plex server for my remote users (I don't use H.265, so the Tesla card would work great for my needs), but it won't work with the stock Nvidia drivers. Nvidia apparently has a separate driver series for the Tesla cards, the latest of which is available here: https://www.nvidia.com/Download/driverResults.aspx/158193/en-us

 

Link to comment
On 3/22/2020 at 2:51 PM, JesterEE said:

No problem.

 

I just want to be clear: this seems to be an underlying problem in the build ... the GPU Stats plugin just makes it more apparent.

 

-JesterEE

Same here, even with the GPU Stats plugin removed; logs slammed with the message below.

I ran > /var/log/syslog to truncate it for now, since the log was 422,000 bytes.

Quote

Tower kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]

There's something going on here at the kernel level, I think.

nvidia-container-runtime-hook.log

Edited by Fiservedpi
Nvidia container log
  • Like 1
Link to comment
On 4/2/2020 at 11:37 PM, soloandy said:

I'm trying to install the Unraid Nvidia build 6.8.3.

But I got this message after the installation: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.   

[screenshot: Unraid Nvidia plugin page showing the error]

 

I have an Nvidia GT710 installed (it should be supported), and it has been recognized by the system.

[screenshot: terminal session on MediaServer]

Does anyone have the same issue, and how can I solve this problem? Thanks!

I'm having the same issue with nVidia Unraid 6.8.3, driver version 440.59. I'm using a gtx 1050 super, however.

 

It worked OK immediately after installing the build; however, I have since passed it through to a VM. Upon stopping the VM, nvidia-smi failed. Even after rebooting Unraid, nvidia-smi still fails before the card is passed through to the VM.

Link to comment
2 hours ago, CaptainSandwich said:

I'm having the same issue with nVidia Unraid 6.8.3, driver version 440.59. I'm using a gtx 1050 super, however.

 

It worked OK immediately after installing the build; however, I have since passed it through to a VM. Upon stopping the VM, nvidia-smi failed. Even after rebooting Unraid, nvidia-smi still fails before the card is passed through to the VM.

If you are passing the card through to a VM, why do you need the Nvidia build in the first place? It is the drivers in the VM that control the card. I thought the main use case for the Nvidia build was to be able to use hardware transcoding in Docker containers? Your description of your symptoms suggests that the card is not properly resetting when the VM shuts down and that it takes a power cycle to achieve that.

  • Thanks 1
Link to comment
44 minutes ago, itimpi said:

If you are passing the card through to a VM, why do you need the Nvidia build in the first place? It is the drivers in the VM that control the card. I thought the main use case for the Nvidia build was to be able to use hardware transcoding in Docker containers? Your description of your symptoms suggests that the card is not properly resetting when the VM shuts down and that it takes a power cycle to achieve that.

Hi itimpi,

 

The intent is to use a secondary GPU for transcoding in Emby; the 1050 Super is just for game streaming. While a VM is not running, the Nvidia drivers allow the card to enter a low-power state. Is this not correct?

 

I have power cycled the machine and still received the nvidia-smi failure message prior to starting the VM.

Link to comment
2 hours ago, CaptainSandwich said:

Hi itimpi,

 

The intent is to use a secondary GPU for transcoding in Emby; the 1050 Super is just for game streaming. While a VM is not running, the Nvidia drivers allow the card to enter a low-power state. Is this not correct?

 

I have power cycled the machine and still received the nvidia-smi failure message prior to starting the VM.

Just had a look at VFIO-PCI Config; it looked like the GPU was being bound to the vfio-pci driver. Unbound it, restarted, and everything seems to be working fine again.
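
A quick way to confirm which kernel driver currently owns the card is to read its driver symlink under sysfs. Rough sketch in PHP below; the PCI address 0000:01:00.0 is just a placeholder, so substitute your own card's address (visible under Tools -> System Devices or in lspci output):

#!/usr/bin/env php
<?php
// Hypothetical helper: prints the driver currently bound to the GPU,
// e.g. "vfio-pci", "nvidia", or "no driver bound".
// 0000:01:00.0 is a placeholder PCI address - change it to match your card.
$strDriverLink = '/sys/bus/pci/devices/0000:01:00.0/driver';

if (is_link($strDriverLink)) {
        echo basename(readlink($strDriverLink)) . "\n";
} else {
        echo "no driver bound\n";
}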

Link to comment
1 hour ago, CaptainSandwich said:

Just had a look at VFIO-PCI Config; it looked like the GPU was being bound to the vfio-pci driver. Unbound it, restarted, and everything seems to be working fine again.

That makes sense, as the binding would hide the card from the Unraid Linux level. It does sound as if it might affect your ability to use the card in a VM :(

 

Link to comment

Good day,

 

I recently set up my Unraid server without any issues and have been trying to get the Unraid Nvidia plugin to work. I've installed it twice from the Community Applications plugin, with my Unraid at 6.8.3 and choosing the Unraid Nvidia 6.8.3 build. Everything runs fine until the install finishes and it wants me to reboot; when I reboot, my Unraid just gets to a certain point and hangs there. I've attached 3 videos of what happens.

 

I'm going to try again but not use the Community Applications plugin this time; it just sucks because I have to set my entire Unraid server back up each time. If anyone is able to assist me, that would be great.

 

Here are my specs:

 

Supermicro SuperServer 6047R-E1R24N 24-Bay LFF 4U Rackmount Server
Processor 1: Intel Xeon E5-2630L v2 2.4GHz 6 Core 15MB Cache Processor
Processor 2: Intel Xeon E5-2630L v2 2.4GHz 6 Core 15MB Cache Processor

LSI SAS 9211-8i

Nvidia Quadro p400

Samsung 16GB 2X 8GB PC3-10600R DDR3 1333MHz 240Pin ECC Reg RDIMM RAM Memory  x4

Unraid Pro v6.8.3

 

 

Link to comment
On 4/1/2020 at 12:08 AM, elcapitano said:

I have multiple entries like this before the system becomes unresponsive:

 

Apr 1 08:06:30 MASTER kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Apr 1 08:06:30 MASTER kernel: caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs

 

Removed the GPU Statistics Plugin, and the log entries reduced.

Will see if it crashes again.

 

Did anyone figure out if there is a resolution to this? 

Link to comment
18 hours ago, itimpi said:

That makes sense, as the binding would hide the card from the Unraid Linux level. It does sound as if it might affect your ability to use the card in a VM :(

 

Seems to be working OK for passthrough; not sure if that's because the GPU BIOS has been passed through as well?

 

Additionally, I've added the below to /etc/libvirt/hooks/qemu to re-enable persistence mode when the VM releases resources (see https://libvirt.org/hooks.html). This allows the card to enter the P8 state, where it draws ~4W while idle in Unraid, compared to ~9W in the default P0 state. I'm not sure how robust this is, or how it'll work once the second card is added - I may need to specify the address of the target GPU as well - but it appears to work for now.

 

Added snippet:

if ($argv[2] == 'release' && $argv[3] == 'end'){
        shell_exec('date +"%b %d %H:%M:%S libvirt hook: Setting nVidia persistence mode to 1" >> /var/log/syslog');
        shell_exec('nvidia-smi --persistence-mode=1');
}

 

Full script at /etc/libvirt/hooks/qemu:

#!/usr/bin/env php

<?php
if ($argv[2] == 'release' && $argv[3] == 'end'){
        shell_exec('date +"%b %d %H:%M:%S libvirt hook: Setting nVidia persistence mode to 1" >> /var/log/syslog');
        shell_exec('nvidia-smi --persistence-mode=1');
}

if (!isset($argv[2]) || $argv[2] != 'start') {
        exit(0);
}

$strXML = file_get_contents('php://stdin');

$doc = new DOMDocument();
$doc->loadXML($strXML);

$xpath = new DOMXpath($doc);

$args = $xpath->evaluate("//domain/*[name()='qemu:commandline']/*[name()='qemu:arg']/@value");

for ($i = 0; $i < $args->length; $i++){
        $arg_list = explode(',', $args->item($i)->nodeValue);

        if ($arg_list[0] !== 'vfio-pci') {
                continue;
        }

        foreach ($arg_list as $arg) {
                $keypair = explode('=', $arg);

                if ($keypair[0] == 'host' && !empty($keypair[1])) {
                        vfio_bind($keypair[1]);
                        break;
                }

 

  • Like 3
Link to comment
1 hour ago, cbc02009 said:

Sorry if this has already been asked, but I can't seem to find an answer. Is there a version of this plugin for the 6.9 beta? I need the 5.4 kernel for temp monitoring for my 3700x, but I'd also like to use my gtx 970 for folding@home.

This has been answered already, not so long ago, in this thread.

The only available builds are the ones you see in the plugin. So no beta.

  • Thanks 1
Link to comment
On 4/19/2020 at 4:12 AM, cbc02009 said:

Sorry if this has already been asked, but I can't seem to find an answer. Is there a version of this plugin for the 6.9 beta? I need the 5.4 kernel for temp monitoring for my 3700x, but I'd also like to use my gtx 970 for folding@home.

The Nvidia drivers for the new 5.6 kernel JUST got posted on SlackBuilds.

 

So it wasn't even possible before :)

  • Thanks 1
Link to comment

My mobo is:

ASUS P8Z68-V PRO, Socket-1155

ATX, Z68, DDR3, 3xPCIe(2.0)x16, CFX&SLI, SATA 6Gb/s,USB3.0,FW, VGA,DVI,HDMI, EFI

 

Will PCIe 2.0 slow down a 1660 card? Should I get the Ti for transcoding Plex 4K streams? Or will it be slowed down even more by my motherboard?

Link to comment
8 minutes ago, SkyHead said:

My mobo is:

ASUS P8Z68-V PRO, Socket-1155

ATX, Z68, DDR3, 3xPCIe(2.0)x16, CFX&SLI, SATA 6Gb/s,USB3.0,FW, VGA,DVI,HDMI, EFI

 

Will PCIe 2.0 slow down a 1660 card? Should I get the Ti for transcoding Plex 4K streams? Or will it be slowed down even more by my motherboard?

I doubt you will notice any difference from not having PCIe 3.0. You will not notice any difference between the Ti and the non-Ti either, as it's the same number of NVENC/NVDEC chips on both cards.

  • Thanks 1
Link to comment

I am rather new to unraid still testing the waters with everything.

I have a 1070 G1 Gaming; I was able to activate it one time with this plugin.

I did a clean Windows install (to get my clean ROM) on one of the cache drives yesterday, and since then I cannot boot into Unraid with the "Unraid Nvidia" plugin OS...

I get the "Failed to Allocate memory for Kernel command line, bailing out booting kernel failed: bad file number"

Even making a new USB and formatting all the drives clean doesn't help.

I literally create the USB, set up the registration key and drives, install the "Community Applications" plugin, install "Unraid Nvidia" and choose the 6.8.3 version.

It copies to my drive and then when I boot I get that message...

I even did a new config and, as I said, formatted all the drives (including parity and cache).

 

This is the second time it has happened to me; the first time I was able to run everything again after creating a new configuration, but this time it won't work.

 

Any idea what is going on? What am I doing wrong?

Edited by sand372
Found a solution: seems it was UEFI; in legacy mode it boots OK
Link to comment

I had no issues with my server until today, when I installed this Unraid Nvidia image and used it for Plex. Since then my log has been filled with this error:

 

kernel: caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs

kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]  

 

To the point where my Docker containers no longer work correctly and I have to restart my server. Is there going to be a fix? Going through this thread, I believe this has been an ongoing issue...

Link to comment
2 hours ago, PickleRick said:

I had no issues with my server until today, when I installed this Unraid Nvidia image and used it for Plex. Since then my log has been filled with this error:

 

kernel: caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs

kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]  

 

To the point where my Docker containers no longer work correctly and I have to restart my server. Is there going to be a fix? Going through this thread, I believe this has been an ongoing issue...

Not unless we can reproduce it.

Link to comment
  • trurl locked this topic
This topic is now closed to further replies.