linuxserver.io

[Plugin] Linuxserver.io - Unraid Nvidia

1359 posts in this topic


2 hours ago, Xaero said:

That is something that would need support added for the netdata docker. There's a plugin here:
https://github.com/coraxx/netdata_nv_plugin

That would allow you to monitor various GPU statistics via netdata. Note that it doesn't monitor the NVDEC and NVENC pipelines, so you wouldn't be able to see the transcoding usage, just the memory usage.

Edit:
Updated my script on gist:
https://git.io/fhhe3
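For the transcoding side that the netdata plugin doesn't cover, nvidia-smi itself can sample encoder and decoder utilization with `dmon -s u`. A minimal sketch, assuming the usual `gpu sm mem enc dec` column layout of that output; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: take one sample of `nvidia-smi dmon -s u` and print the
# NVENC/NVDEC utilization for each GPU. Header lines start with '#'; data rows
# are assumed to be: gpu sm mem enc dec (newer drivers may append columns).
nvenc_nvdec_usage() {
    nvidia-smi dmon -s u -c 1 |
        awk '!/^#/ && NF >= 5 { print "gpu" $1 ": enc " $4 "% dec " $5 "%" }'
}

# Usage (on the Unraid host, or in a container started with --runtime=nvidia):
#   nvenc_nvdec_usage
```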

I found someone who compiles netdata with nvidia-smi enabled, and it doesn't work.

I presume that's because nvidia-smi isn't installed in the default location. I'll have to look at where it's installed.

My Telegraf says the same thing:

2019-03-10T04:12:10Z E! [inputs.nvidia_smi]: Error in plugin: nvidia-smi binary not at path /usr/bin/nvidia-smi, cannot gather GPU data

In reply to Dazog:

You'll need to change the docker to use:

--runtime=nvidia

for each application that needs access to the card. For your containers, it's the Docker runtime that provides nvidia-smi. With unraid-nvidia, nvidia-smi is also located at /usr/bin/nvidia-smi, but the dockers don't run on the base Unraid system; they run inside the Docker runtime. The stock Docker runtime doesn't have nvidia-smi, while the nvidia runtime does.

Edit your docker like you did for Plex, change it to use --runtime=nvidia, and it should work.
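A quick way to confirm the runtime is providing nvidia-smi is to run it in a throwaway container with the same settings. A minimal sketch, assuming the nvidia runtime is already registered with Docker; the function name and the image/UUID placeholders are hypothetical:

```shell
#!/bin/sh
# Hypothetical smoke test: run nvidia-smi in a throwaway container of IMAGE
# using the nvidia runtime. If the runtime is working, this prints the usual
# GPU table; if not, Docker reports that nvidia-smi cannot be found.
gpu_smoke_test() {
    image="$1"
    devices="${2:-all}"   # "all" or a specific GPU UUID
    docker run --rm \
        --runtime=nvidia \
        -e NVIDIA_VISIBLE_DEVICES="$devices" \
        --entrypoint nvidia-smi \
        "$image"
}

# Usage:
#   gpu_smoke_test some/image            # all GPUs
#   gpu_smoke_test some/image GPU-xxxx   # one GPU by UUID
```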

In reply to Xaero:

I did, and it doesn't work.

I will manually change the config files for both.

I can see nvidia-smi working in the netdata docker, so it is being passed through properly. It just isn't providing stats in netdata itself.

In reply to Dazog:

Open a console for the docker and check the output of:

which nvidia-smi

That will tell you where the nvidia-smi binary is located.
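To check every container at once instead of opening consoles one by one, something along these lines should work, assuming Docker CLI access on the Unraid host. It uses `command -v` rather than `which`, since minimal images don't always ship `which`; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: for every running container, print where nvidia-smi is
# on that container's PATH, or "not found" if it isn't there at all.
find_nvidia_smi() {
    docker ps --format '{{.Names}}' | while read -r con; do
        # `command -v` is POSIX sh, so it works even without `which` installed
        path=$(docker exec "$con" sh -c 'command -v nvidia-smi' 2>/dev/null)
        echo "$con: ${path:-not found}"
    done
}

# Usage (on the Unraid host):
#   find_nvidia_smi
```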

In reply to Xaero:

Telegraf doesn't find it.

In netdata, the location inside the docker matches what Unraid reports.

Argh. Frustrating.

In reply to Xaero:

OK, I edited telegraf.conf and there are no more errors. Now it's time to build a dashboard and see if it works...
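Dazog doesn't say which setting he changed. One likely candidate, assuming Telegraf's stock `nvidia_smi` input, is its `bin_path` option, which defaults to /usr/bin/nvidia-smi (the path in the error earlier in the thread). A small sketch that prints a stanza for a given path; the helper name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print an [[inputs.nvidia_smi]] stanza for telegraf.conf
# pointing at the nvidia-smi binary inside the Telegraf container.
telegraf_nvidia_stanza() {
    bin="${1:-/usr/bin/nvidia-smi}"   # plugin default if no path given
    cat <<EOF
[[inputs.nvidia_smi]]
  ## Path to the nvidia-smi binary
  bin_path = "$bin"
EOF
}

# Usage: pass the path reported by `which nvidia-smi` in the Telegraf console,
# then paste the output into telegraf.conf in place of the existing stanza.
#   telegraf_nvidia_stanza /usr/bin/nvidia-smi
```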

In reply to Dazog:

So I just tried using the netdata docker, pointing it to this one for the repo:
d34dc3n73r/netdata-glibc

And while the docker does start, netdata works, and I can use nvidia-smi, there doesn't seem to be any netdata-nv plugin included. I tried installing it manually, but haven't had any luck. I'll keep you posted on whether or not I figure it out.

In reply to Xaero:

Yeah, now I can't get nvidia-smi to work in Telegraf :(

I do have everything set properly.

Ignore me. You can't use the Alpine Linux version :0
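Dazog doesn't explain the Alpine problem. A likely cause is that nvidia-smi and the driver libraries the nvidia runtime mounts into the container are built against glibc, while Alpine images use musl. A quick way to spot an Alpine-based container, as a sketch with hypothetical function names:

```shell
#!/bin/sh
# Hypothetical helpers: read the distro ID from a running container's
# /etc/os-release and flag Alpine (musl) images, where the glibc-built
# nvidia-smi is not expected to run.
container_distro() {
    docker exec "$1" sh -c '. /etc/os-release && echo "$ID"'
}

is_alpine() {
    [ "$(container_distro "$1" 2>/dev/null)" = "alpine" ]
}

# Usage:
#   is_alpine telegraf && echo "telegraf is Alpine-based; try a glibc-based tag"
```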

Edited by Dazog

In reply to Dazog:

Got this working in netdata!
So don't worry about grabbing a special version; the version from Community Apps is fine.


Steps to reproduce:

1. Grab the docker from Community Apps.
2. During the initial container install, switch to advanced view and add --runtime=nvidia to the end of the list.
3. Add a new variable "NVIDIA_VISIBLE_DEVICES" with the value set to "all".
4. Click Done and let the docker install.
5. Open a console for the docker and run:
   echo "nvidia_smi: yes" >> /etc/netdata/python.d.conf
6. Restart the docker.
7. Enjoy.
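After the restart, one way to confirm the module loaded without opening the dashboard is to ask netdata's REST API for its chart list. A minimal sketch, assuming netdata's default port 19999 and that the module's chart IDs begin with `nvidia_smi.`; the function name and host are assumptions:

```shell
#!/bin/sh
# Hypothetical check: succeed if netdata's /api/v1/charts lists any chart
# whose ID starts with "nvidia_smi." (assumed naming for the module's charts).
netdata_has_gpu_charts() {
    url="${1:-http://localhost:19999}"   # assumed default netdata address
    curl -fsS "$url/api/v1/charts" | grep -q '"nvidia_smi\.'
}

# Usage (replace the host with your Unraid server's name or IP):
#   netdata_has_gpu_charts http://tower:19999 && echo "GPU charts present"
```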

 

In reply to Xaero:

 

Works :P. I had to specify my GPU.

I wonder if I can ask the person who makes this docker to add nvidia_smi to it, so we don't have to do this after every update :)

I'm still working on Telegraf; then I'm going to post the dashboard for people.
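Dazog doesn't say how he specified the GPU. One plausible approach, assuming it was done through NVIDIA_VISIBLE_DEVICES, is to use the card's UUID instead of "all"; the UUID can be read on the host with nvidia-smi's query mode. A sketch with a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print one NVIDIA_VISIBLE_DEVICES=<uuid> line per GPU,
# ready to copy into the container template instead of "all".
gpu_visible_devices() {
    nvidia-smi --query-gpu=uuid --format=csv,noheader |
        while read -r uuid; do
            echo "NVIDIA_VISIBLE_DEVICES=$uuid"
        done
}

# Usage (on the Unraid host):
#   gpu_visible_devices
```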

Edited by Dazog

In reply to Dazog:

It's probably still best to ask whether they can add support to their docker, since the module should simply do nothing if nvidia-smi isn't available. But if they're opposed, here's a user script you can run on a schedule:


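#!/bin/bash
# Re-enables netdata's nvidia_smi module after a container update.

# First running container whose name contains "netdata" (case-insensitive)
con="$(docker ps --format "{{.Names}}" | grep -i netdata | head -n 1)"

if [ -z "$con" ]; then
    echo '<font color="red"><b>No running netdata container found!</b></font>'
    exit 1
fi

# grep exits 1 only when the line is missing (-x matches the whole line, so a
# commented-out "# nvidia_smi: yes" does not count as applied)
docker exec "$con" grep -qix "nvidia_smi: yes" /etc/netdata/python.d.conf >/dev/null 2>&1
exists=$?

if [ "$exists" -eq 1 ]; then
    docker exec "$con" /bin/sh -c 'echo "nvidia_smi: yes" >> /etc/netdata/python.d.conf'
    docker restart "$con" >/dev/null 2>&1
    echo '<font color="green"><b>Done.</b></font>'
else
    echo '<font color="red"><b>Already Applied!</b></font>'
fi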


Installed this today. It worked the first time with no problems. I did notice the drives won't spin down again (a problem in RC4 that is fixed in RC5). Hopefully an update to something else comes soon.

 

Edited by Marshalleq
Quote box erroneously added


We're still working on this and the Unraid updates. Everything is compiling fine, with the drivers and kernel modules on the host, but when starting containers, stuff just isn't being carried through.

On the plus side, when we do figure it out, we'll have a much better understanding of how everything works under the hood, I guess...

In reply to CHBMB:
 

You guys rock. This has already increased the value of my system substantially: I can now pass through 12 of the 16 threads in my CPU without bottlenecking Plex. This has improved VM performance significantly!

I can't wait for your updates, but I gladly will.

Let me know if you need testers at any point.

1 minute ago, rix said:

Have you not read the big bold red letters telling you not to post about this on unraid.net? 😱

I did, right after I posted that, hence it has been removed.

 


v6.6.7 and v6.7.0rc5 uploaded.  If anyone pings me or @bass_rock and mentions the word Nvidia in the next week, we'll probably murder you and dispose of your body so well you'll never be discovered.

 

It's been a slog for both of us.  To say between us we've compiled this at least 50 times would be a conservative estimate, and the theories and conversations we've had have been numerous.

 

Bottom line, we're not really sure how we got it to work for so many successive versions before hitting this wall.

In reply to CHBMB:

Perhaps look into the DKMS version of the nvidia driver. It's been pretty rock solid through kernel versions for me.
Thanks for the update!

In reply to CHBMB:

How do I donate to the beer fund?

Seriously, let me know.

13 minutes ago, huntastikus said:

https://www.linuxserver.io/donate/   

 

Doing the same right now! Thank you for your time and efforts!

Done. Sent money for beers.

Too bad my CDN dollar is so weak that it's halved by the exchange, ugh.

In reply to the DKMS suggestion:
It's never really been an issue with getting the drivers installed; rather, it's an issue with runc and the modifications required to get the Nvidia runtime working.

The drivers have always been relatively straightforward.
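For context, `--runtime=nvidia` only works if the Docker daemon has a runtime registered under that name, which is the runtime side CHBMB points to as the hard part. A quick check, as a sketch with a hypothetical function name, using the `Runtimes` field of `docker info`:

```shell
#!/bin/sh
# Hypothetical check: succeed if the Docker daemon lists a runtime named
# "nvidia" (registered via daemon.json or dockerd --add-runtime).
nvidia_runtime_registered() {
    docker info --format '{{json .Runtimes}}' | grep -q '"nvidia"'
}

# Usage (on the Unraid host):
#   nvidia_runtime_registered && echo "nvidia runtime available" || echo "missing"
```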


Do you guys know if this works with the PNY Quadro 4000 (VCQ4000-PB) 2GB?

It wasn't recognizing the card on rc-4, so I figured I would wait for rc-5. I installed rc-5 last night, but it still doesn't work.

Before I go crazy troubleshooting, I just want to know whether it should even be working with this card.

In reply to suprjet44:

You'll have to check the nvidia support list for the driver installed. We only have a handful of cards we can test in the group.
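To know which support list to check, you need the installed driver version. A sketch with a hypothetical function name that prints each GPU's name and driver version, and falls back to `lspci` when nvidia-smi can't report anything (for example, when the driver hasn't claimed the card):

```shell
#!/bin/sh
# Hypothetical helper: print "name, driver_version" for each GPU the driver
# sees; if nvidia-smi is missing or exits with an error, show what the PCI bus
# reports instead so you can at least confirm the card is present.
gpu_driver_info() {
    if command -v nvidia-smi >/dev/null 2>&1 &&
        nvidia-smi --query-gpu=name,driver_version --format=csv,noheader 2>/dev/null; then
        return 0
    fi
    lspci | grep -i nvidia
}

# Usage (on the Unraid host):
#   gpu_driver_info
```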


Thanks so much for all that the LinuxServer.io team does! I just got this working on 6.7.0-rc5 with my new 1660 Ti, and also tested it with the decode patch/script; everything appears to be working as expected!

Sent a couple of chits as a donation; I'll add some more when I have more in my PayPal account!

-Sw2
