[Plugin] Linuxserver.io - Unraid Nvidia



Hi all,

 

Pretty new to Unraid and I'm still figuring out a lot of things.

So I had a second Dell XPS 9560 lying around with a broken LCD screen. As my old Synology was getting slow, I installed Unraid on my XPS.

I got most of it to work, but now I can't get the Nvidia GTX 1050 to show up in the Nvidia Unraid Build v6.8.2.

I got this error: "440.44 No devices were found"

2118589340_2020-02-0914_35_03-UnraidXPSServer_Unraid-Nvidia.png.21abe04ff65fbe964d58e2635aca24a8.png

 

It does show up when trying to create a VM:

1419246690_2020-02-0914_42_41-UnraidXPSServer_AddVM_GraphicsCard.png.0f948ad602aef9d89ee13224a719a001.png

 

But for example I can't use it to run Plex with HW transcoding. (only HW transcoding with the Intel 630 works)

 

Does anyone know how to solve this issue?

 

>> Diagnostics file: unraidxpsserver-diagnostics-20200209-1436.zip

 

 

Thanks a lot!

 

J-J

Edited by J-J
Link to comment
Try running this command and post the output.

 

nvidia-smi --query-gpu=gpu_name,gpu_bus_id,gpu_uuid --format=csv,noheader | sed -e s/00000000://g | sed 's/\,\ /\n/g'
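For anyone wondering what that pipeline does: it asks nvidia-smi for each GPU's name, bus ID and UUID as one CSV line, strips the leading 00000000: PCI domain from the bus ID, and puts each field on its own line. Below is a minimal sketch of just the text handling, assuming GNU sed. The function name split_gpu_fields and the sample line (including the placeholder UUID) are made up for illustration and are not output from a real system.

```bash
#!/bin/bash
# Hypothetical helper: reformat one CSV line as produced by
#   nvidia-smi --query-gpu=gpu_name,gpu_bus_id,gpu_uuid --format=csv,noheader
# Assumes GNU sed, which accepts \n in the replacement text.
split_gpu_fields() {
    # 1st expression: drop the PCI domain prefix from the bus ID
    # 2nd expression: turn every ", " separator into a line break
    sed -e 's/00000000://g' -e 's/, /\n/g'
}

# Made-up sample line (placeholder UUID), only to show the reshaping:
# echo "GeForce GTX 1050, 00000000:01:00.0, GPU-xxxxxxxx" | split_gpu_fields
```

If nvidia-smi answers "No devices were found" instead of a CSV line, there is nothing to reformat: the driver is loaded but does not see a usable GPU.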

 

Link to comment

  

Just now, CHBMB said:
Quote

root@UnraidXPSServer:~# nvidia-smi --query-gpu=gpu_name,gpu_bus_id,gpu_uuid --format=csv,noheader | sed -e s/00000000://g | sed 's/\,\ /\n/g'
No devices were found
root@UnraidXPSServer:~#

 

 

Edited by J-J
Link to comment
On 2/5/2020 at 12:32 AM, 08deanr said:

Here are the diagnostics from the running system. I would rather not mess with it now, but if needed I can later; I just have a lot going on right now.

media-diagnostics-20200204-1929.zip 213.45 kB · 1 download

Just took a look at this and I'm confused, on v6.8.2 those modules are present as far as I can tell.

 

On 2/5/2020 at 2:13 AM, 08deanr said:

Correct, I have the onboard Realtek and the PCI Broadcom, and NEITHER was working. Both are now working with stock Unraid.

 

 

 

I'm a bit confused as both of these are present in the v6.8.2 Nvidia build....  they may have been missing on v6.8.1 (I can't remember)

root@server:/# ls -la /lib/modules/4.19.98-Unraid/kernel/drivers/net/ethernet/realtek | grep r8169
-rw-r--r--  1 root root 33728 Feb  2 20:52 r8169.ko.xz
root@server:/# ls -la /lib/modules/4.19.98-Unraid/kernel/drivers/net/ethernet/broadcom | grep tg3
-rw-rw-rw-  1 root root 66572 Feb  2 20:51 tg3.ko.xz
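The same check can be scripted so it works on whatever kernel is running. This is only a sketch: the function name module_file_exists is made up, and the default directory /lib/modules/$(uname -r) is an assumption about where the build keeps its modules (it matches the paths listed above).

```bash
#!/bin/bash
# Hypothetical helper: succeed if <module>.ko or <module>.ko.xz exists
# anywhere under the given modules directory.
# The default directory (modules of the running kernel) is an assumption.
module_file_exists() {
    local module=$1
    local moddir=${2:-/lib/modules/$(uname -r)}
    # find prints matching files; grep -q . succeeds if at least one was printed
    find "$moddir" -type f \( -name "${module}.ko" -o -name "${module}.ko.xz" \) 2>/dev/null | grep -q .
}

# Example for the two network drivers discussed here:
# for m in r8169 tg3; do
#     if module_file_exists "$m"; then echo "$m present"; else echo "$m missing"; fi
# done
```

A module file being present only shows it shipped with the build. Whether it is actually loaded is a separate question (lsmod lists the loaded modules).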

 

Link to comment
Just now, CHBMB said:

Just realised you're using a mobile GPU on a laptop.  Probably going to be the issue

Strange... When I tried it out for the first time 2 weeks ago, it used to pop up on the plugin screen, but then it disappeared after a couple of minutes (most of the time after Plex crashed).

A hard reboot of the system (sometimes twice) made it pop-up again, but disappear afterwards. 

So if it's a mobile GPU issue, I guess there's not a lot to do about it.

Link to comment

Did a search and couldn't find this exact issue for weeks on this topic, but if it's already been mentioned and I missed it, I apologize.

 

When trying to load the latest available builds, I get this in the logs (and the builds never load):

 

nginx: 2020/02/09 11:10:13 [error] 7421#7421: *52 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.1.201, server: , request: "POST /plugins/Unraid-Nvidia/include/exec.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "192.168.1.102:81", referrer: "http://192.168.1.102:81/Settings/Unraid-Nvidia"

 

Any ideas?

 

Edited by Schwiing
Link to comment
3 hours ago, CHBMB said:

Just took a look at this and I'm confused, on v6.8.2 those modules are present as far as I can tell.

 

 

 

 

I'm a bit confused as both of these are present in the v6.8.2 Nvidia build....  they may have been missing on v6.8.1 (I can't remember)


root@server:/# ls -la /lib/modules/4.19.98-Unraid/kernel/drivers/net/ethernet/realtek | grep r8169
-rw-r--r--  1 root root 33728 Feb  2 20:52 r8169.ko.xz
root@server:/# ls -la /lib/modules/4.19.98-Unraid/kernel/drivers/net/ethernet/broadcom | grep tg3
-rw-rw-rw-  1 root root 66572 Feb  2 20:51 tg3.ko.xz

 

Thanks for confirming they are in there. Maybe something was corrupted in the download the first time. I will try again soon and let you know what happens. Hopefully it was nothing.

Link to comment
8 hours ago, J-J said:

Strange... When I tried it out for the first time 2 weeks ago, it used to pop up on the plugin screen, but then it disappeared after a couple of minutes (most of the time after Plex crashed).

A hard reboot of the system (sometimes twice) made it pop-up again, but disappear afterwards. 

So if it's a mobile GPU issue, I guess there's not a lot to do about it.

 

It is called NVIDIA Optimus Technology and @CHBMB is right, it is NOT the same as a standard desktop/workstation card. This comes with its own baggage, which is NOT supported by the standard Nvidia driver.

 

https://www.if-not-true-then-false.com/2015/fedora-nvidia-guide/#nvidia-optimus

 

This is a massive overhead on its own to be managed and maintained, and I highly doubt you will be able to leverage this custom Nvidia build as is.

Edited by ezhik
Link to comment
On 2/7/2020 at 7:29 PM, aptalca said:

Unraid nvidia has nothing to do with letsencrypt. You probably broke it when you changed the networks around.

 

To update, you just need to install the custom version from within the nvidia plugin

I didn't change anything, just installed the Unraid Nvidia build. It got fixed when I recreated the docker network. So I really didn't do anything.

Link to comment
8 hours ago, karlpox said:

I didn't change anything, just installed the Unraid Nvidia build. It got fixed when I recreated the docker network. So I really didn't do anything.

Educated guess. You didn't set the option in docker settings for "not deleting custom networks" so on server reboot, unraid is deleting your custom network.

Link to comment
On 7/4/2019 at 9:23 PM, Xaero said:

Edit:
This is not a problem with an easy solution at all. 

I can monitor the transcode processes and make sure that everything is killed - but the only solution is to kill Plex:
https://forums.plex.tv/t/stuck-in-p-state-p0-after-transcode-finished-on-nvidia/387685/24
I can use fuser -vk /dev/nvidia* and it will immediately switch to a P8 state. The only process using the card when this is run is "Plex Media Server".

It's not hard to write a script that will only do this if:
There are no processes using the card and the card is in a P0 state. I just don't know if there are any undesirable side-effects of doing it this way.

Here is such a script:


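#!/bin/bash

while true; do
    # Current performance state of the GPU, e.g. "P0" or "P8"
    cur_pstate=$(nvidia-smi --query-gpu=pstate --format=csv,noheader)
    # Count the PIDs from the nvidia-smi process table that are still alive.
    # The 2>/dev/null has to sit inside the substitution, next to ps, so that
    # ps errors (e.g. when no PID can be parsed from the table) are discarded.
    # Note: tail -n +16 assumes the table layout of a single-GPU system.
    running_processes=$(ps --no-headers "$(nvidia-smi | tail -n +16 | head -n -1 | sed 's/\s\s*/ /g' | cut -d' ' -f3)" 2>/dev/null | wc -l)

    if [[ $cur_pstate = "P0" && $running_processes -eq 0 ]]; then
        # If we got here, no listed process is alive but the card is still in P0, let's fix that.
        fuser -kv /dev/nvidia*
        echo "Reset Power State"
    fi

    # Sleep so we aren't blocking a thread constantly.
    sleep 1
done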

Starting the X server on Unraid does allow one to open nvidia settings; to do this you can use a script like this to start the X server (note, that since chvt and fgconsole aren't available, you will have to switch back to VT7 by pressing Ctrl+Alt+F7):


#!/bin/bash

##This will only work on single GPU systems:
GPUID=$(nvidia-xconfig --query-gpu-info | grep BusID | sed 's/^[^:]*: //')

#Now that we know the PCI BusID of the card we can create the X server with a fake display:
nvidia-xconfig -s -a --allow-empty-initial-configuration --use-display-device=None --virtual=640x480 --busid "$GPUID" -o /dev/stdout | X :99 -config /dev/stdin&

Once you have that server running, you can return to the default unraid GUI and run:
nvidia-settings -c :99
To open nvidia-settings on the card. You could also store an xorg configuration file and use that for the virtual X display, and to set persistent nvidia settings.

 

The only way I can think of to fix this properly is to figure out why the Plex process is claiming the card and prevent that from happening. I'll look into it some more, but this needs to be fixed properly by Plex/nVidia. The linked thread at the Plex forums has more information.

I may be able to detach the Plex Transcoder process with the wrapper script, making it its own entity, and then trapping the SIGINT/SIGKILL in the wrapper and using it to kill the transcoder, effectively using the wrapper script to separate the Plex Media Server process from the Plex Transcoder process. It's pretty kludgy, but might work.
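To make the wrapper idea a bit more concrete, here is a rough sketch of the general pattern only: it is not the wrapper actually used with Plex, and it is untested against the transcoder. The function name run_wrapped is made up. One thing to keep in mind is that SIGKILL cannot be trapped by any process, so the sketch handles SIGINT and SIGTERM instead.

```bash
#!/bin/bash
# Hypothetical wrapper sketch: run a command as a child process and pass
# SIGINT/SIGTERM on to it. SIGKILL cannot be trapped, so it is not handled.
run_wrapped() {
    "$@" &
    local child=$!
    local status=0
    # When the wrapper is told to stop, tell the child to stop as well.
    trap 'kill -TERM "$child" 2>/dev/null' INT TERM
    wait "$child" || status=$?
    # If wait was cut short by a trapped signal, the child has not been
    # reaped yet: wait again to collect its real exit status.
    if kill -0 "$child" 2>/dev/null; then
        status=0
        wait "$child" || status=$?
    fi
    trap - INT TERM
    return "$status"
}

# Example (the path is a placeholder, not a real Plex path):
# run_wrapped /path/to/real-transcoder "$@"
```

The wrapper returns the child's exit status, so whatever launched it still sees whether the wrapped command succeeded.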



Oh Boy:
image.png.ab3233c6afeac47311c043bd84d1a226.png

 

We're in idle P-State while transcoding territory!

Hi, thanks for your instructions!

If possible, can you please provide more help on running the scripts? I have an RTX 2070 and have this problem with the P0 state getting stuck. I use the latest versions of both Unraid Nvidia and the Plex docker. Transcoding works. fuser -kv /dev/nvidia* helps to change to the P8 state, but only temporarily. I tried to run the scripts using the CA User Scripts plugin but get errors. Here are the screens. What am I doing wrong?

Capture1.PNG

Capture2.PNG

Capture3.PNG

Link to comment
7 hours ago, Schwiing said:

Is there a way to download the latest build manually? I keep getting upstream timeout errors.

#!/bin/bash

#Set your Unraid version here in the form 6-7-3
UNRAID_VERSION="6-8-2"

# Set the type of build you want here - nvidia or stock
BUILD_TYPE="nvidia"

#Set the download location here
DOWNLOAD_LOCATION="/mnt/cache/downloads/nvidia"

echo Downloading v$UNRAID_VERSION of the $BUILD_TYPE build to the $DOWNLOAD_LOCATION folder

#Make target directory
[[ ! -d ${DOWNLOAD_LOCATION} ]] && \
mkdir -p ${DOWNLOAD_LOCATION}

#download files
wget https://lsio.ams3.digitaloceanspaces.com/unraid-nvidia/${UNRAID_VERSION}/${BUILD_TYPE}/bzimage -O ${DOWNLOAD_LOCATION}/bzimage
wget https://lsio.ams3.digitaloceanspaces.com/unraid-nvidia/${UNRAID_VERSION}/${BUILD_TYPE}/bzroot -O ${DOWNLOAD_LOCATION}/bzroot
wget https://lsio.ams3.digitaloceanspaces.com/unraid-nvidia/${UNRAID_VERSION}/${BUILD_TYPE}/bzroot-gui -O ${DOWNLOAD_LOCATION}/bzroot-gui
wget https://lsio.ams3.digitaloceanspaces.com/unraid-nvidia/${UNRAID_VERSION}/${BUILD_TYPE}/bzfirmware -O ${DOWNLOAD_LOCATION}/bzfirmware
wget https://lsio.ams3.digitaloceanspaces.com/unraid-nvidia/${UNRAID_VERSION}/${BUILD_TYPE}/bzmodules -O ${DOWNLOAD_LOCATION}/bzmodules

#download sha256 files
wget https://lsio.ams3.digitaloceanspaces.com/unraid-nvidia/${UNRAID_VERSION}/${BUILD_TYPE}/bzimage.sha256 -O ${DOWNLOAD_LOCATION}/bzimage.sha256
wget https://lsio.ams3.digitaloceanspaces.com/unraid-nvidia/${UNRAID_VERSION}/${BUILD_TYPE}/bzroot.sha256 -O ${DOWNLOAD_LOCATION}/bzroot.sha256
wget https://lsio.ams3.digitaloceanspaces.com/unraid-nvidia/${UNRAID_VERSION}/${BUILD_TYPE}/bzroot-gui.sha256 -O ${DOWNLOAD_LOCATION}/bzroot-gui.sha256
wget https://lsio.ams3.digitaloceanspaces.com/unraid-nvidia/${UNRAID_VERSION}/${BUILD_TYPE}/bzfirmware.sha256 -O ${DOWNLOAD_LOCATION}/bzfirmware.sha256
wget https://lsio.ams3.digitaloceanspaces.com/unraid-nvidia/${UNRAID_VERSION}/${BUILD_TYPE}/bzmodules.sha256 -O ${DOWNLOAD_LOCATION}/bzmodules.sha256

#check sha256 files
BZIMAGESHA256=$(cat ${DOWNLOAD_LOCATION}/bzimage.sha256 | cut -c1-64)
BZROOTSHA256=$(cat ${DOWNLOAD_LOCATION}/bzroot.sha256 | cut -c1-64)
BZROOTGUISHA256=$(cat ${DOWNLOAD_LOCATION}/bzroot-gui.sha256 | cut -c1-64)
BZFIRMWARESHA256=$(cat ${DOWNLOAD_LOCATION}/bzfirmware.sha256 | cut -c1-64)
BZMODULESSHA256=$(cat ${DOWNLOAD_LOCATION}/bzmodules.sha256 | cut -c1-64)

#calculate sha256 on downloaded files
BZIMAGE=$(sha256sum $DOWNLOAD_LOCATION/bzimage | cut -c1-64)
BZROOT=$(sha256sum $DOWNLOAD_LOCATION/bzroot | cut -c1-64)
BZROOTGUI=$(sha256sum $DOWNLOAD_LOCATION/bzroot-gui | cut -c1-64)
BZFIRMWARE=$(sha256sum $DOWNLOAD_LOCATION/bzfirmware | cut -c1-64)
BZMODULES=$(sha256sum $DOWNLOAD_LOCATION/bzmodules | cut -c1-64)

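#Compare expected with actual downloaded files and report a pass or a failure for each
[[ $BZIMAGESHA256 == "$BZIMAGE" ]] && echo "bzimage passed sha256 verification" || echo "bzimage FAILED sha256 verification"
[[ $BZROOTSHA256 == "$BZROOT" ]] && echo "bzroot passed sha256 verification" || echo "bzroot FAILED sha256 verification"
[[ $BZROOTGUISHA256 == "$BZROOTGUI" ]] && echo "bzroot-gui passed sha256 verification" || echo "bzroot-gui FAILED sha256 verification"
[[ $BZFIRMWARESHA256 == "$BZFIRMWARE" ]] && echo "bzfirmware passed sha256 verification" || echo "bzfirmware FAILED sha256 verification"
[[ $BZMODULESSHA256 == "$BZMODULES" ]] && echo "bzmodules passed sha256 verification" || echo "bzmodules FAILED sha256 verification"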

That script will do it.  Need to change the 3 parameters to suit.

 

chmod +x the script to make it executable. If all the SHA256 sums match, copy the downloaded files across to your flash disk.
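If you would rather check one file at a time, the comparison can be wrapped in a small function. This is a sketch under the same assumption the script makes, namely that the first 64 characters of each .sha256 file are the expected hash. The function name verify_sha256 is made up.

```bash
#!/bin/bash
# Hypothetical helper: compare a downloaded file against its .sha256 companion.
# Assumes <file>.sha256 starts with the 64-character hex digest.
# Returns 0 on match, 1 on mismatch, 2 if a file could not be read.
verify_sha256() {
    local file=$1
    local expected actual
    expected=$(cut -c1-64 "${file}.sha256") || return 2
    actual=$(sha256sum "$file" | cut -c1-64) || return 2
    if [[ -n "$actual" && "$expected" == "$actual" ]]; then
        echo "$(basename "$file") passed sha256 verification"
    else
        echo "$(basename "$file") FAILED sha256 verification" >&2
        return 1
    fi
}

# Example, using the download folder from the script above:
# for f in bzimage bzroot bzroot-gui bzfirmware bzmodules; do
#     verify_sha256 "/mnt/cache/downloads/nvidia/$f" || echo "do not copy $f"
# done
```

Because the function returns non-zero on a mismatch, it can gate the copy to the flash disk instead of relying on reading the messages.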

Link to comment

Welp, I figured it out (likely no one else with this issue but just in case)....

 

So, my unraid box is connected via a 10G NIC. Way back when, I was told to set my NIC's MTU to 9000 with jumbo frames on in my switch upstream to utilize full speed. Turns out, this is what messed up my (well my instance of) unraid-nvidia plugin. With MTU set back to 1500 on the NIC, it popped up the builds within seconds and now I'm slowly downloading it as CHBMB intended it to be :)
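In case it helps anyone hitting the same timeouts, this is roughly how to see what MTU an interface is currently using. It is a sketch only: the function name get_mtu and the interface name eth0 are assumptions, and the second argument exists purely so the function can be pointed at a different directory for testing.

```bash
#!/bin/bash
# Hypothetical helper: print the MTU of a network interface by reading sysfs.
# The optional second argument overrides the sysfs directory (for testing).
get_mtu() {
    local iface=$1
    local root=${2:-/sys/class/net}
    cat "${root}/${iface}/mtu"
}

# Example (eth0 is an assumed interface name):
#   get_mtu eth0
# To drop back from jumbo frames for the running session only:
#   ip link set dev eth0 mtu 1500
```

The ip link change does not survive a reboot, so it is a quick way to test whether the MTU is the culprit before changing the saved network settings.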

 

What a headache...all self inflicted.

Link to comment

Yeah, there's quite some tech behind what mtu to set, I believe the best performance comes at lower than 1500 - depending on which type of internet connection you have - 1492 rings a bell.  I posted something about it over at the ipfire forums years ago after doing some research.  Quite complex and quite common that 9000 messes things up.

Link to comment

Found the post here: https://forum.ipfire.org/viewtopic.php?t=20924

 

The relevant parts pasted below for convenience.  This was taken from a guy that has deep knowledge of networking (referenced above) and installs the kit for a living.

 

Point to Point Protocol (PPP) is, as its name suggests, a protocol for establishing a link between two points. In the case of ADSL it's between your modem or PC and our BRAS (Broadband Remote Access Server, the PPP server). PPP isn't technically required for an ADSL connection; the internet could run straight over the ADSL's ATM network. PPP is used on most ADSL connections because of the legacy of dialup; an authorisation/accounting system is needed and PPP is the de facto standard.

PPP can be run in two different ways and the preferred style is dependent on your ISP's setup. PPPoE is best if they run an Ethernet link between the DSLAM and BRAS; PPPoA is better if they are running an ATM network between the DSLAM and BRAS.

Why the difference? Maximum Transmission Unit (MTU) is the largest individual data packet that can be sent over a network; in the case of Ethernet it's 1500 bytes, in the case of ATM it's not limited. What this means is that as the packet passes from ADSL to Ethernet it can exceed Ethernet's MTU and be dropped. By forcing PPPoE your modem forces the packets to stay within Ethernet's 1500 MTU limit. That said, PPPoA is marginally more efficient with overheads and processing by both ends.

Many people are asking:
– should I use PPPoE or PPPoA
– what MTU is better

Some people are reporting troubles with PPPoA and some with PPPoE. So use whatever works for you, however if you are really in a position to choose, then you will be slightly better off by using PPPoA. As for MTU, set it to 1462 in case you have settled on PPPoA.

So the short answer is: use PPPoA with MTU 1462 bytes.

The long (and much more involved) answer. Part one: PPPoE vs PPPoA.
PPPoE uses one extra eight-byte header, which eats into the payload. PPPoA does not have this header, so it has less overhead and each packet can carry more useful data (8 bytes more), which results in a slight (around one percent) speed improvement.

The long (and much more involved) answer. Part two: MTU issues.
The default MTU for PPPoA is 1500 bytes. The same default for PPPoE is 1492 bytes (8 bytes less due to increased overhead because of one extra 8 bytes header mentioned above).

Your ADSL modem always talks to the DSLAM using ATM with either PPPoE or PPPoA (whatever you have chosen) on top of ATM. The DSLAM is in turn connected to a server called BRAS/LNS using either ATM (in the case of Telstra Wholesale DSLAMs) or Gigabit Ethernet, GE (in the case of iiNet DSLAMs).

If backhaul is ATM based, then DSLAM can process both PPPoE and PPPoA and it can digest both MTU of 1500 bytes for PPPoA and MTU of 1492 bytes for PPPoE. These are the default MTU values so no probs here.

If backhaul is GE based, then DSLAM can still process both PPPoE and PPPoA, however it can digest only MTU of no more than 1492 bytes. If you have chosen PPPoE then 1492 is the PPPoE's default MTU and everything is fine. If you have chosen PPPoA and kept its default MTU equal to 1500 bytes, then you are in trouble unless you lowered MTU to at least 1492 bytes. This is the reason why iiNet recommends PPPoE – less potential issues with MTU exceeding 1492 bytes because iiNet DSLAMs are set to max MTU equal to 1492 bytes for both PPPoE and PPPoA.

The long (and much more involved) answer. Part three: What MTU to choose.

When choosing MTU you should aim at increased speed. So don't think in terms of choosing between 1500 and 1492 bytes MTUs as these values are only relevant when you consider how to avoid the potential trouble when a DSLAM (with GE backhaul) drops your packets with 1500 bytes MTU.

But we are after the increased speed aren’t we? It will be achieved by getting 53 bytes long ATM cells filled better. Which happens when the packet has 1454 bytes MTU for PPPoE and 1462 bytes MTU for PPPoA. PPPoA is itself more efficient for the reason described above. Hence the answer: PPPoA with 1462 bytes MTU.

Link to comment
On 2/11/2020 at 10:58 AM, andrey_kk said:

Hi, thanks for your instructions!

If possible, can you please provide more help on running the scripts? I have an RTX 2070 and have this problem with the P0 state getting stuck. I use the latest versions of both Unraid Nvidia and the Plex docker. Transcoding works. fuser -kv /dev/nvidia* helps to change to the P8 state, but only temporarily. I tried to run the scripts using the CA User Scripts plugin but get errors. Here are the screens. What am I doing wrong?

 

Nothing - I never got this fully working.
I haven't had much time to look at it lately and I was kind of stuck, since I couldn't justify using fuser -kv as a solution because it would kill other processes besides just Plex. Ultimately, the permanent solution needs to come from Plex.

EDIT:

This script should work to kill the processes that need to be killed. You can probably add a call to it in a wrapper for the Plex Transcoder, and it would kill any offending processes as long as no transcodes were already keeping the priority locked.
You could also just run it on a schedule and it would probably curb the issue a little bit:
 

https://forums.plex.tv/t/stuck-in-p-state-p0-after-transcode-finished-on-nvidia/387685/43

 

Edited by Xaero
Link to comment
12 hours ago, Xaero said:

Nothing - I never got this fully working.
I haven't had much time to look at it lately and I was kind of stuck, since I couldn't justify using fuser -kv as a solution because it would kill other processes besides just Plex. Ultimately, the permanent solution needs to come from Plex.

EDIT:

This script should work to kill the processes that need to be killed. You can probably add a call to it in a wrapper for the Plex Transcoder, and it would kill any offending processes as long as no transcodes were already keeping the priority locked.
You could also just run it on a schedule and it would probably curb the issue a little bit:
 

https://forums.plex.tv/t/stuck-in-p-state-p0-after-transcode-finished-on-nvidia/387685/43

 

Thanks for the explanations! I will look at the Plex forum then. As for the script: I get some BSD syntax error and cannot find any solution for it on Google. Maybe I lack some module or library, which stops the script from working?

Link to comment

Hello,

I've noticed since adding a P2000 GPU to my Unraid server (using it for Plex and HandBrake) that I have a near 50/50 chance of my server booting if it's headless. If the WebGUI hasn't responded some time after boot, I can plug a monitor into the server and see where it is stuck, and it always looks to be stuck here. Not sure if there is a log or something else I can produce that gives better insight. I didn't have any issue until enabling this Nvidia plugin and Plex's use of it. Would anyone have any thoughts? I'm using Unraid 6.8.2.

 

20200215_154731.jpg

Edited by Clayton
Link to comment
14 minutes ago, Clayton said:

Hello,

I've noticed since adding a P2000 GPU to my Unraid server (using it for Plex and HandBrake) that I have a near 50/50 chance of my server booting if it's headless. If the WebGUI hasn't responded some time after boot, I can plug a monitor into the server and see where it is stuck, and it always looks to be stuck here. Not sure if there is a log or something else I can produce that gives better insight. I didn't have any issue until enabling this Nvidia plugin and Plex's use of it. Would anyone have any thoughts? I'm using Unraid 6.8.2.

 

 

Have you got any VM's using the GPU?

Link to comment
  • trurl locked this topic
This topic is now closed to further replies.