[Plugin] Linuxserver.io - Unraid Nvidia

8 minutes ago, hawihoney said:

* Unraid NVIDIA 6.7.2 release is delayed.

To be fair, I don't think it's delayed.  We all run it on the understanding that updates will be ready when they're ready. They're not easy to do, may hit problems that need ironing out, and real life will get in the way. Some have taken much longer.

 

If anything, the release of 6.7.1 seemed quick to me.

 

Could you revert to stock 6.7.2 for now and tweak the systems that use Nvidia to revert back to how they were 4 months ago?  Then revert back to the Nvidia version when it's out?

34 minutes ago, Pducharme said:

 

Sorry, missed that part.  Do you run the MariaDB docker from an NVMe cache disk?  It might help with the slowness (?).

Yes, two NVMe drives form the cache, each on its own PCIe x4 card.

 

25 minutes ago, Cessquill said:

 

Could you revert to stock 6.7.2 for now and tweak the systems that use Nvidia to revert back to how they were 4 months ago?  Then revert back to the Nvidia version when it's out?

All three machines are back on 6.7.0 now. Everything's good.

 

Sorry for my frustrated post.

 

 

As a user of Unraid NVIDIA _and_ tools using SQLite, the delayed 6.7.2 Unraid NVIDIA release is a real problem here. We can't change back to stock Unraid. On the other hand, some important SQLite-based tools no longer work. In fact it has bitten us: after applying 6.7.1, SQLite tools and SQLite dumps overwrote backups with empty files. We simply did not expect that somebody would remove a tool like SQLite from Unraid.
 
Now Unraid 6.7.2 is out and SQLite is back - but not for us. We have to wait for the Unraid NVIDIA 6.7.2 release. Going back to 6.7.0 without these additional security patches is no option either.
 
So now we have a lot of time to change our own SQLite tools to check for SQLite in Unraid before dumping data or whatever. New data is not coming into the house, so everything is cool, no?
 
Just some other 0.02 USD.
 
It might be a real problem for you, it isn't for me, I've just got back from the hospital after seeing a very close relative for what may be the last time.

That's a problem, with no solution.

You could however downgrade to v6.6.7

Sent from my Mi A1 using Tapatalk

1 hour ago, CHBMB said:

It might be a real problem for you, it isn't for me, I've just got back from the hospital after seeing a very close relative for what may be the last time.

Oh dear. My prayers for you and your family, mate.

1 hour ago, CHBMB said:

It might be a real problem for you, it isn't for me, I've just got back from the hospital after seeing a very close relative for what may be the last time.

 

Family first my friend! Prayers with you and your family! 


Please forgive me for this somewhat off topic response.

3 hours ago, hawihoney said:

We have tons of self written scripts that create, manipulate and extract databases (MariaDB and SQLite). Many of them running automatically from within Unraid User Scripts. Some PHP, some Perl, some bash, ...

 

There's for example a 30GB SQLite database that simply holds personal names and their relations.

My suggestion would be to get away from using scripts like this on bare metal unRAID as fast as possible. Since the advent of unRAID v6 the suggestion has been that plugins and programs running directly on the unRAID OS should be limited to those that directly modify/extend the core system (and web UI). Everything else should be Dockerized or in a VM. 

 

Without knowing much about what scripts you are running, I would assume that most if not all could be easily dockerized. Find a base image with some or all of your dependencies pre-installed. Add a couple of RUN directives to the Dockerfile to install anything else you need, a COPY directive to pull the script into the root directory of the image, and a couple of VOLUME directives to allow you to bind mount the required input and output directories, then call the script in the ENTRYPOINT/CMD. Build locally, and then all of your User Scripts plugin scripts can change to a docker run command that launches an ephemeral container. 
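
As a rough sketch of those steps, and nothing more than that: the base image (alpine), the installed packages, the script name and the host paths below are all hypothetical placeholders, not anything taken from the scripts discussed here. The helper only writes a Dockerfile and prints the build/run commands so you can inspect them before running anything.

```shell
#!/bin/bash
# Sketch only: image name, script name and host paths are hypothetical.

# Write a small Dockerfile into the given build directory.
write_dockerfile() {
    local build_dir="$1" script_name="$2"
    mkdir -p "$build_dir"
    cat > "$build_dir/Dockerfile" <<EOF
FROM alpine:3
RUN apk add --no-cache bash sqlite
COPY $script_name /$script_name
VOLUME ["/input", "/output"]
ENTRYPOINT ["/bin/bash", "/$script_name"]
EOF
}

# Print (do not run) the commands a User Scripts entry could use afterwards.
print_commands() {
    local build_dir="$1" image="$2"
    echo "docker build -t $image $build_dir"
    echo "docker run --rm -v /mnt/user/db:/input -v /mnt/user/backups:/output $image"
}
```

Something like `write_dockerfile /tmp/build export.sh` followed by `print_commands /tmp/build my-export` shows what would be built; `--rm` is what makes the container ephemeral.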

 

 

Posted (edited)

It's sad that people have now made this awesome hard work become a post worthy of:

 

https://old.reddit.com/r/ChoosingBeggars/

 

Please remember before this came out, we had to buy expensive large core processors AND use more power to do what we are doing today.

 

Thank you again for all your hard work. Without it we are back to what we had before which is nothing.

 

*I hope your family member is doing well. No amount of software/Computer work is worth not having those memories of said family members.*

Edited by Dazog

13 hours ago, IamSpartacus said:

 

If I'm reading this post correctly, you are running Unraid in some type of production environment?  If that's the case, why in the world would you be running a 3rd party non-officially supported OS version?  That just can't happen in a production environment IMO.

Just a suggestion, but a couple of conditionals and a mail daemon could automatically detect backup failures and notify you of them. 
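
A minimal sketch of what such conditionals might look like, assuming the backup is a plain `sqlite3 ... .dump` into a file. The function names and paths are made up for illustration, and the notification step is left to whatever mail or notify command you already use.

```shell
#!/bin/bash
# Sketch only: function names and paths are hypothetical.

# Replace the old backup only if the new file exists and is non-empty.
replace_if_nonempty() {
    local new="$1" old="$2"
    if [ -s "$new" ]; then
        mv "$new" "$old"
    else
        rm -f "$new"
        return 1
    fi
}

# Dump to a temporary file first, so a missing sqlite3 binary or a failed
# dump can never overwrite the previous backup with an empty file.
safe_dump() {
    local db="$1" backup="$2" tmp="$2.tmp"
    command -v sqlite3 >/dev/null 2>&1 || { echo "sqlite3 missing" >&2; return 1; }
    sqlite3 "$db" .dump > "$tmp" || { rm -f "$tmp"; return 1; }
    replace_if_nonempty "$tmp" "$backup"
}
```

Usage would be along the lines of `safe_dump /path/to/names.db /path/to/names.sql || echo "backup failed"`, with the `echo` swapped for your notification of choice.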


I did look for this but didn't find it.  If you have a single GPU, can you still use it for transcoding, or will that cause an issue in unRAID?  I don't use the GUI mode at all and my unRAID box's onboard video died.  I only have room for one video card unless I pick up a PCI card, but I'd rather not do that if I can avoid it.

 


v6.7.2 uploaded

1 hour ago, CHBMB said:

v6.7.2 uploaded

You are a gentleman and a scholar. 


Hello,

Is there a possibility to downgrade drivers? I'm running an older card which is not supported by the current driver (418), so I need to "downgrade" drivers. I know which driver it is, and it does have a Linux 64-bit version.

2 hours ago, cztrollolcz said:

Hello,

Is there a possibility to downgrade drivers? I'm running an older card which is not supported by the current driver (418), so I need to "downgrade" drivers. I know which driver it is, and it does have a Linux 64-bit version.

No, we only use the current version at the time of building.

4 hours ago, cztrollolcz said:

Hello,

Is there a possibility to downgrade drivers? I'm running an older card which is not supported by the current driver (418), so I need to "downgrade" drivers. I know which driver it is, and it does have a Linux 64-bit version.

Most likely your card doesn't have nvenc support if it's not supported anymore.

On 6/29/2019 at 7:37 AM, CHBMB said:

v6.7.2 uploaded

 

Thank you, upgraded with no issues. Love this plugin and appreciate all the hard work put into it.

On 5/27/2019 at 6:59 PM, Xaero said:

@CHBMB I too see this high power consumption. I know why it's happening, too. 

Basically, the nvidia driver doesn't initialize power management until an Xorg server is running. The only way to force a power profile on Linux currently is to use nvidia-settings like so:
nvidia-settings --ctrl-display :0 -a "[gpu:0]/GPUPowerMizerMode=2"

Which requires a running Xorg display. I've been trying to dig around in sysfs to see if there is another place that this value is stored, but there doesn't seem to be. It looks like the cards are locked into performance mode... Perhaps this is worth bringing up to nvidia?

In the meantime, I'm going to continue digging to see if I can find a way (perhaps an nvidia-settings docker?) to force the power state.

Did you get any further with that? I would love to see the card going back to the P8 state after transcoding with Plex. I don't really understand what you mean with the Xorg server. I know it's a display server, but I don't get how these two things are connected to each other. Could you explain that? Thanks.

Posted (edited)
1 hour ago, pappaq said:

Did you get any further with that? I would love to see the card going back to the P8 state after transcoding with Plex. I don't really understand what you mean with the Xorg server. I know it's a display server, but I don't get how these two things are connected to each other. Could you explain that? Thanks.

Basically, the Nvidia Linux drivers are designed such that they aren't fully independent of the display server. Parts of the driver aren't active until a display server like X11 or Wayland hooks the driver resources.

 

Because there is no active display in the Unraid environment, nvidia-settings can't be called to change driver settings. In the example above ":0" is shorthand for "localhost:0.0", which is the first screen, on the first display server, on the local machine. That display doesn't exist, so the nvidia-settings application just tells you it can't do what you asked.

 

Normally, in Linux land, we have "sysfs" nodes for driver and hardware settings. "Sysfs" is the /sys/ folder on Linux systems. All of the driver flags, power states, temperature sensors, etc. live in this folder as files. Nvidia, for whatever reason, has avoided embracing both KMS (kernel mode setting, which lets the kernel make decisions about what the display should be doing during boot) and sysfs nodes.
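
To illustrate what "settings as files" means for drivers that do expose sysfs nodes, here is a small sketch that reads temperature sensors from the kernel's hwmon tree, where `temp*_input` files hold millidegrees Celsius. This is a generic example and an assumption on my part about what your box exposes; it has nothing to do with the Nvidia driver, which is exactly the point.

```shell
#!/bin/bash
# Sketch only: prints every temp1_input sensor below a hwmon-style tree
# in whole degrees Celsius. The default root is the usual kernel location.
read_temps() {
    local root="${1:-/sys/class/hwmon}" f
    for f in "$root"/hwmon*/temp1_input; do
        [ -r "$f" ] || continue
        echo "$f: $(( $(cat "$f") / 1000 )) C"
    done
}
```

Running `read_temps` with no argument walks /sys/class/hwmon; there is no equivalent file to read or write for the Nvidia power state.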

 

This honestly is something Nvidia should be fixing. We might be able to band-aid it by running a display server in a docker with the Nvidia settings application and using it to manage the power state. It also might fail horribly. 

 

It might be better to create a fake second X server on unraid itself using a userscript as the temporary solution.

 

 

 

 

EDIT2:

Just tested my above theory with a fake X server on unraid and it works perfectly. Let me write a userscript to create one of these fake environments on the fly.

[Attached image]

Edited by Xaero

That sounds promising! I look forward to that! Would it be too much to ask if you could explain how to implement it and run it when you are done writing the script? Thank you so much in advance!


Give me a bit here, there are some problems with this - for one, I think persistencemode may need to be forced on, it seems like the driver is releasing the card when the rendering finishes, which sets it back to a P0 state, restarting the X server isn't enough to get the p-state back. I'll work on it a bit, and make sure it's ready for prime time before I push anything out. It is the first time my card has dropped back from the P0 state since it was installed in the server though, so it's definitely progress in the right direction
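
For anyone who wants to try the persistence mode part in the meantime, a minimal sketch: `nvidia-smi -pm 1` turns persistence mode on, and the query below reports it alongside the current P-state. The `NVIDIA_SMI` variable and the function names are only there for illustration (and so the binary can be swapped out); whether this alone is enough to get back to P8 is not something the sketch can promise.

```shell
#!/bin/bash
# Sketch only: NVIDIA_SMI and the function names are illustrative.
NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"

# Force persistence mode on for all GPUs (needs root).
enable_persistence() {
    "$NVIDIA_SMI" -pm 1
}

# Print persistence mode and P-state for each GPU, one GPU per line.
show_state() {
    "$NVIDIA_SMI" --query-gpu=persistence_mode,pstate --format=csv,noheader
}
```

Call `enable_persistence` once after boot, then `show_state` before and after a transcode to see whether the card drops out of P0.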

Posted (edited)
10 minutes ago, Xaero said:

Give me a bit here, there are some problems with this - for one, I think persistencemode may need to be forced on, it seems like the driver is releasing the card when the rendering finishes, which sets it back to a P0 state, restarting the X server isn't enough to get the p-state back. I'll work on it a bit, and make sure it's ready for prime time before I push anything out. It is the first time my card has dropped back from the P0 state since it was installed in the server though, so it's definitely progress in the right direction

Hey, no worries! I discovered the persistence mode thing too. It's the only way to get my two cards into the P8 state. I'm just happy that someone has the same problem as me and is willing to do the work I can't do for now! Thanks!

Edited by pappaq

Posted (edited)

Edit:
This is not a problem with an easy solution at all. 

I can monitor the transcode processes and make sure that everything is killed - but the only solution is to kill Plex:
https://forums.plex.tv/t/stuck-in-p-state-p0-after-transcode-finished-on-nvidia/387685/24
I can use fuser -vk /dev/nvidia* and it will immediately switch to a P8 state. The only process using the card when this is run is "Plex Media Server". 

It's not hard to write a script that will only do this if:
There are no processes using the card and the card is in a P0 state. I just don't know if there are any undesirable side-effects of doing it this way.

Here is such a script:


#!/bin/bash
# Poll the GPU once a second; if it is stuck in P0 with no live GPU processes, release it.

while true; do
    cur_pstate=$(nvidia-smi --query-gpu=pstate --format=csv,noheader)
    # Count the PIDs listed in the nvidia-smi process table that are still alive.
    running_processes=$(ps --no-headers "$(nvidia-smi | tail -n +16 | head -n -1 | sed 's/\s\s*/ /g' | cut -d' ' -f3)" 2>/dev/null | wc -l)

    if [[ $cur_pstate = "P0" && $running_processes -eq 0 ]]; then
        # If we got here, the card is in the P0 state with no listed process still running, so let's fix that.
        fuser -kv /dev/nvidia*
        echo "Reset Power State"
    fi

    # Sleep so we aren't blocking a thread constantly.
    sleep 1
done
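
A possible variation, sketched here rather than tested on real hardware: instead of cutting fields out of the human-readable nvidia-smi table, ask nvidia-smi for the PIDs directly with `--query-compute-apps`. This assumes the transcoder shows up as a compute process on your driver version; the function names and the `NVIDIA_SMI` variable are made up for the example.

```shell
#!/bin/bash
# Sketch only: NVIDIA_SMI and the function names are illustrative.
NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"

# Count processes the driver reports as using the GPU.
gpu_process_count() {
    "$NVIDIA_SMI" --query-compute-apps=pid --format=csv,noheader | grep -c '[0-9]' || true
}

# Succeed only when the card sits in P0 with nothing running on it.
should_reset() {
    local pstate="$1" count="$2"
    [ "$pstate" = "P0" ] && [ "$count" -eq 0 ]
}
```

Inside the loop this would read something like `should_reset "$cur_pstate" "$(gpu_process_count)" && fuser -kv /dev/nvidia*`, with the same caveat about side effects as the script above.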

Starting the X server on Unraid does allow one to open nvidia settings; to do this you can use a script like this to start the X server (note, that since chvt and fgconsole aren't available, you will have to switch back to VT7 by pressing Ctrl+Alt+F7):

#!/bin/bash

##This will only work on single GPU systems:
GPUID=$(nvidia-xconfig --query-gpu-info | grep BusID | sed 's/^[^:]*: //')

#Now that we know the PCI BusID of the card we can create the X server with a fake display:
nvidia-xconfig -s -a --allow-empty-initial-configuration --use-display-device=None --virtual=640x480 --busid "$GPUID" -o /dev/stdout | X :99 -config /dev/stdin&

Once you have that server running, you can return to the default unraid GUI and run:
nvidia-settings -c :99
To open nvidia-settings on the card. You could also store an xorg configuration file and use that for the virtual X display, and to set persistent nvidia settings.
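
With the fake display up, the PowerMizer assignment quoted earlier in the thread can presumably be pointed at :99 instead of :0. A small sketch, where the function name and the `NVIDIA_SETTINGS` variable are hypothetical and the mode value is simply the one from that earlier command; check the nvidia-settings documentation for what each value means on your driver.

```shell
#!/bin/bash
# Sketch only: NVIDIA_SETTINGS and set_powermizer are illustrative names.
NVIDIA_SETTINGS="${NVIDIA_SETTINGS:-nvidia-settings}"

# Assign a PowerMizer mode on GPU 0 through the given X display.
set_powermizer() {
    local display="$1" mode="$2"
    "$NVIDIA_SETTINGS" -c "$display" -a "[gpu:0]/GPUPowerMizerMode=$mode"
}
```

For example, `set_powermizer :99 2` mirrors the earlier command against the virtual display.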

 

The only way I can think of to fix this properly is to figure out why the Plex process is claiming the card and prevent that from happening. I'll look into it some more, but this needs to be fixed properly by Plex/nVidia. The linked thread at the Plex forums has more information.

I may be able to detach the Plex Transcoder process with the wrapper script, making it its own entity, and then trap SIGINT/SIGTERM in the wrapper (SIGKILL itself can't be trapped) and use that to kill the transcoder, effectively using the wrapper script to separate the Plex Media Server process from the Plex Transcoder process. It's pretty kludgy, but might work.
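
The general shape of such a wrapper, as a sketch only: start the real command in the background, forward termination signals to it, and pass its exit status back. `run_detached` is a made-up name and `"$@"` stands in for the real transcoder command line; how Plex would actually be made to call this is not covered here.

```shell
#!/bin/bash
# Sketch only: "$@" stands in for the real transcoder command line.
run_detached() {
    # Start the child in the background and remember its PID.
    "$@" &
    child=$!
    # Forward INT/TERM to the child; SIGKILL cannot be trapped.
    trap 'kill "$child" 2>/dev/null' INT TERM
    wait "$child"
    local status=$?
    trap - INT TERM
    return "$status"
}
```

The wrapper returns whatever the child returned, so the caller still sees a failed transcode as a failure.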



Oh Boy:
[Attached image]

 

We're in idle P-State while transcoding territory!

Edited by Xaero


Could anyone explain the difference between stock Unraid builds and Nvidia Unraid builds?

Sorry for asking stupid questions😅

