[Plugin] Linuxserver.io - Unraid Nvidia



2 hours ago, Xaero said:

That is something that would need support added for the netdata docker. There's a plugin here:
https://github.com/coraxx/netdata_nv_plugin

That would allow you to monitor various GPU statistics via netdata. Note that it doesn't monitor the nvdec and nvenc pipelines, so you wouldn't be able to see the transcoding usage - just the memory usage.

Edit:
Updated my script on gist;
https://git.io/fhhe3

I found someone who compiles netdata with nvidia-smi enabled, and it doesn't work.

I presume that's because nvidia-smi isn't installed in the default location. I'll have to look for where it's installed.

 

My telegraf says the same thing:

2019-03-10T04:12:10Z E! [inputs.nvidia_smi]: Error in plugin: nvidia-smi binary not at path /usr/bin/nvidia-smi, cannot gather GPU data

Link to comment
2 minutes ago, Dazog said:

I found someone who compiles netdata with nvidia-smi enabled, and it doesn't work.

I presume that's because nvidia-smi isn't installed in the default location. I'll have to look for where it's installed.

 

My telegraf says the same thing:

2019-03-10T04:12:10Z E! [inputs.nvidia_smi]: Error in plugin: nvidia-smi binary not at path /usr/bin/nvidia-smi, cannot gather GPU data

You'll need to change the docker to use:

--runtime=nvidia


for each application that needs access to the card. For your containers, it's the docker runtime that provides nvidia-smi. On unraid-nvidia itself the binary is also located at /usr/bin/nvidia-smi, but the dockers don't run on the base Unraid system; they run inside the docker runtime. The stock docker runtime doesn't have nvidia-smi; the nvidia runtime does.

Edit your docker like you did Plex and change it to use --runtime=nvidia and it should work.
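
If you want to confirm which runtime a container actually ended up with, you can ask Docker directly. A minimal sketch, assuming a container named "netdata" (the name and both helper functions are placeholders, not part of the plugin):

```bash
# Print the runtime a container was created with, as recorded by Docker
# ("nvidia" when --runtime=nvidia was applied).
container_runtime() {
    docker inspect --format '{{.HostConfig.Runtime}}' "$1"
}

# Succeed only if the container uses the nvidia runtime.
uses_nvidia_runtime() {
    [ "$(container_runtime "$1")" = "nvidia" ]
}

# Example (container name is an assumption):
#   uses_nvidia_runtime netdata && echo "nvidia runtime in use"
```

If this reports anything other than nvidia, the extra parameter didn't make it into the container and nvidia-smi won't be there.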
Link to comment
3 minutes ago, Xaero said:

You'll need to change the docker to use:


--runtime=nvidia

 


for each application that needs access to the card. For your containers, it's the docker runtime that provides nvidia-smi. On unraid-nvidia itself the binary is also located at /usr/bin/nvidia-smi, but the dockers don't run on the base Unraid system; they run inside the docker runtime. The stock docker runtime doesn't have nvidia-smi; the nvidia runtime does.

Edit your docker like you did Plex and change it to use --runtime=nvidia and it should work.

I did and it doesn't.

 

I will manually change the config files for both.

 

I can see the SMI working in the netdata docker, so it is being passed through properly. It just isn't providing stats in netdata itself.

Link to comment
2 minutes ago, Dazog said:

I did and it doesn't.

 

I will manually change the config files for both.

 

I can see the SMI working in the netdata docker, so it is being passed through properly. It just isn't providing stats in netdata itself.

Open a console for the docker and see what the output of

which nvidia-smi


is. That will tell you where the nvidia-smi binary is located.
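
The same check can be run from the Unraid host without opening a console. A sketch, assuming the container is called "netdata" (a placeholder) and has a POSIX shell available:

```bash
# Print the path of nvidia-smi inside a container, or a note if the
# lookup fails (binary missing, or the container isn't running).
find_nvidia_smi() {
    # command -v is the POSIX equivalent of `which`.
    docker exec "$1" sh -c 'command -v nvidia-smi' \
        || echo "nvidia-smi not found in $1"
}

# Example (container name is an assumption):
#   find_nvidia_smi netdata
```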

Link to comment
6 minutes ago, Xaero said:

Open a console for the docker and see what the output of


which nvidia-smi


is. That will tell you where the nvidia-smi binary is located.

Telegraf doesn't find the directory.

In the netdata docker, the path matches what Unraid reports.

Argh, frustrating.

Link to comment
23 minutes ago, Xaero said:

Open a console for the docker and see what the output of


which nvidia-smi


is. That will tell you where the nvidia-smi binary is located.

OK, I edited the Telegraf.conf and there are no more errors. Now it's time to code a dashboard and see if it works...
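
For anyone following along: the exact change isn't spelled out here, but Telegraf's nvidia_smi input has a bin_path setting, and that is the likely candidate. A sketch that prints such a stanza for whatever path `which nvidia-smi` reported inside the container (the helper name and the example path are assumptions):

```bash
# Print an nvidia_smi input stanza for a Telegraf config.
# $1 = path to nvidia-smi as seen inside the Telegraf container.
nvidia_smi_stanza() {
    cat <<EOF
[[inputs.nvidia_smi]]
  bin_path = "$1"
  timeout = "5s"
EOF
}

# Example (path is an assumption; use what the container reports):
#   nvidia_smi_stanza /usr/bin/nvidia-smi
```

If the config already has an [[inputs.nvidia_smi]] section, change its bin_path rather than adding a second section.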

Link to comment
47 minutes ago, Dazog said:

OK, I edited the Telegraf.conf and there are no more errors. Now it's time to code a dashboard and see if it works...

So I just tried the netdata docker, pointing it at this repo:
d34dc3n73r/netdata-glibc

While the docker does start, netdata works, and I can use nvidia-smi, there doesn't seem to be any netdata-nv plugin included. I tried installing it manually but haven't had any luck. I'll keep you posted on whether or not I figure it out.

Link to comment
11 minutes ago, Xaero said:

So I just tried the netdata docker, pointing it at this repo:
d34dc3n73r/netdata-glibc

While the docker does start, netdata works, and I can use nvidia-smi, there doesn't seem to be any netdata-nv plugin included. I tried installing it manually but haven't had any luck. I'll keep you posted on whether or not I figure it out.

Yeah, now I can't get nvidia-smi to work in Telegraf :(

I do have everything set properly.

Ignore me. I can't use the Alpine Linux version :0

Edited by Dazog
Link to comment
18 minutes ago, Dazog said:

Yeah, now I can't get nvidia-smi to work in Telegraf :(

I do have everything set properly.

Ignore me. I can't use the Alpine Linux version :0

Got this working in netdata!
So, don't worry about grabbing a special version, the version from Community Apps is fine.


Steps to reproduce:

1. Grab the docker from Community Apps.
2. During the initial container install, switch to advanced view and add --runtime=nvidia to the end of the list.
3. Add a new variable "NVIDIA_VISIBLE_DEVICES" with the value set to "all".
4. Click done, and let the docker install.
5. Open a console for the docker.
6. Run: echo "nvidia_smi: yes" >> /etc/netdata/python.d.conf
7. Restart the docker.
8. Enjoy.
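
For anyone not using the Unraid template, the steps above map roughly onto plain docker commands. This is a sketch only: the image and container names are placeholders, and the ports and volume mappings that the template normally fills in are left out.

```bash
# Start a container with the Nvidia runtime and all GPUs visible.
# $1 = image (placeholder), $2 = container name (placeholder).
start_with_gpu() {
    docker run -d --name "$2" \
        --runtime=nvidia \
        -e NVIDIA_VISIBLE_DEVICES=all \
        "$1"
}

# Enable netdata's nvidia_smi python module, then restart the container.
enable_nvidia_smi() {
    docker exec "$1" sh -c 'echo "nvidia_smi: yes" >> /etc/netdata/python.d.conf' \
        && docker restart "$1"
}

# Example (both names are assumptions):
#   start_with_gpu some/netdata-image netdata
#   enable_nvidia_smi netdata
```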

 

Link to comment
9 minutes ago, Xaero said:

Got this working in netdata!
So, don't worry about grabbing a special version, the version from Community Apps is fine.


Steps to reproduce:

1. Grab the docker from Community Apps.
2. During the initial container install, switch to advanced view and add --runtime=nvidia to the end of the list.
3. Add a new variable "NVIDIA_VISIBLE_DEVICES" with the value set to "all".
4. Click done, and let the docker install.
5. Open a console for the docker.
6. Run: echo "nvidia_smi: yes" >> /etc/netdata/python.d.conf
7. Restart the docker.
8. Enjoy.

 

Works :P. I had to specify my GPU.

I wonder if I can ask the person making this to add nvidia_smi to his docker, so we don't have to do this every update :)

I'm still working on Telegraf; then I'm going to post the dashboard for people.

Edited by Dazog
Link to comment
13 minutes ago, Dazog said:

Works :P. I had to specify my GPU.

I wonder if I can ask the person making this to add nvidia_smi to his docker, so we don't have to do this every update :)

I'm still working on Telegraf; then I'm going to post the dashboard for people.

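It's probably still best to ask whether they can add support to their docker, since it should just "not work" if nvidia-smi isn't available. But if they're opposed, here's a user script you can run on a schedule:


#!/bin/bash
# Enable netdata's nvidia_smi module if it isn't enabled yet.

# First running container whose name contains "netdata".
con="$(docker ps --format '{{.Names}}' | grep -i -m1 netdata)"
if [ -z "$con" ]; then
    echo '<font color="red"><b>No running netdata container found.</b></font>'
    exit 1
fi

# Match only an uncommented "nvidia_smi: yes" line, so a commented-out
# default in python.d.conf doesn't count as already applied.
if docker exec "$con" grep -qiE '^[[:space:]]*nvidia_smi:[[:space:]]*yes' /etc/netdata/python.d.conf >/dev/null 2>&1; then
    echo '<font color="red"><b>Already Applied!</b></font>'
else
    docker exec "$con" /bin/sh -c 'echo "nvidia_smi: yes" >> /etc/netdata/python.d.conf'
    docker restart "$con" >/dev/null 2>&1
    echo '<font color="green"><b>Done.</b></font>'
fi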

Link to comment

Installed this today. Worked first time, no problems. I did notice the drives won't spin down again (which is a problem in RC4 that is fixed in RC5). Hopefully an update to something else comes soon.

 

Edited by Marshalleq
Quote box erroneously added
Link to comment

We're still working on this and the Unraid updates. Everything is compiling fine, with drivers and kernel modules on the host, but when starting containers, stuff just isn't being carried through.

On the plus side, when we do figure it out, we'll have a much better understanding of how everything works under the hood, I guess...




Link to comment
1 hour ago, CHBMB said:

We're still working on this and the Unraid updates. Everything is compiling fine, with drivers and kernel modules on the host, but when starting containers, stuff just isn't being carried through.

On the plus side, when we do figure it out, we'll have a much better understanding of how everything works under the hood, I guess...



 

You guys rock. This has increased the value of my system substantially already - I can now pass through 12 of the 16 threads in my cpu without bottlenecking Plex. This has improved VM performance significantly!

 

Can't wait for your updates - but I gladly will.

 

Let me know if you need testers at any point.

Link to comment

v6.6.7 and v6.7.0rc5 uploaded.  If anyone pings me or @bass_rock and mentions the word Nvidia in the next week, we'll probably murder you and dispose of your body so well you'll never be discovered.

 

It's been a slog for both of us.  To say between us we've compiled this at least 50 times would be a conservative estimate, and the theories and conversations we've had have been numerous.

 

Bottom line, we're not really sure how we got it to work for so many successive versions before hitting this wall.

Link to comment
1 hour ago, CHBMB said:

v6.6.7 and v6.7.0rc5 uploaded.  If anyone pings me or @bass_rock and mentions the word Nvidia in the next week, we'll probably murder you and dispose of your body so well you'll never be discovered.

 

It's been a slog for both of us.  To say between us we've compiled this at least 50 times would be a conservative estimate, and the theories and conversations we've had have been numerous.

 

Bottom line, we're not really sure how we got it to work for so many successive versions before hitting this wall.

Perhaps look into the DKMS version of the nvidia driver. It's been pretty rock solid through kernel versions for me.
Thanks for the update!

Link to comment
2 hours ago, CHBMB said:

v6.6.7 and v6.7.0rc5 uploaded.  If anyone pings me or @bass_rock and mentions the word Nvidia in the next week, we'll probably murder you and dispose of your body so well you'll never be discovered.

 

It's been a slog for both of us.  To say between us we've compiled this at least 50 times would be a conservative estimate, and the theories and conversations we've had have been numerous.

 

Bottom line, we're not really sure how we got it to work for so many successive versions before hitting this wall.

How do I donate to the beer fund.

 

Seriously. let me know.

Link to comment
Perhaps look into the DKMS version of the nvidia driver. It's been pretty rock solid through kernel versions for me.
Thanks for the update!

It's never really been an issue with getting the drivers installed; rather, it's an issue with runc and the modifications required to get the Nvidia runtime working.

The drivers have always been relatively straightforward.


Link to comment

Do you guys know if this works with the PNY Quadro 4000 (VCQ4000-PB) 2GB?

It wasn't recognizing the card on rc-4, so I figured I would wait for rc-5. I installed rc-5 last night, but it still doesn't work.

Before I go crazy troubleshooting, I just want to know whether it should even work with this card.

Link to comment
10 minutes ago, suprjet44 said:

Do you guys know if this works with the PNY Quadro 4000 (VCQ4000-PB) 2GB?

It wasn't recognizing the card on rc-4, so I figured I would wait for rc-5. I installed rc-5 last night, but it still doesn't work.

Before I go crazy troubleshooting, I just want to know whether it should even work with this card.

You'll have to check Nvidia's list of supported cards for the driver version that's installed. We only have a handful of cards we can test in the group.
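
To gather what you need for that lookup, something like the sketch below may help. It assumes lspci is available on the host, and the function names are just placeholders:

```bash
# Show Nvidia devices on the PCI bus, whether or not a driver claimed them.
pci_nvidia_devices() {
    lspci -nn | grep -i nvidia
}

# Show the GPUs the installed driver recognizes, with the driver version.
driver_gpus() {
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}

# Example:
#   pci_nvidia_devices
#   driver_gpus
```

If the card shows up in the first list but not the second, compare the card against the supported list for that driver version.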

Link to comment

Thanks so much for all that the LinuxServer.Io team does... just got this working on 6.7.0-rc5 with my new 1660 ti and also tested with the Decode patch/script... all appear to be working as expected!

 

Sent a couple chits for donation...will add some more when I get more in my PayPal account!

 

-Sw2

Link to comment
  • trurl locked this topic