[PLUGIN] GPU Statistics



6 minutes ago, b3rs3rk said:


Just to check, the parent PID wasn't the same as the PID in nvidia-smi right?  I may have something wrong in my thinking here.  Ugh, makes it so much harder when I can't test this stuff on my own.  I'll see if I can run a deepstack instance and get it to spawn these processes.  I'm guessing you need a camera to do it?

I'm fairly certain, yeah. You can test DeepStack by just installing its Docker container (and following the GPU instructions, just 3 steps iirc) and "DeepStack UI". Both are in the Unraid app store. DeepStack UI lets you easily send your own image to the AI and test it.

 

Thanks for all your help

Edited by gowg
6 hours ago, b3rs3rk said:


I updated to 6.9.2 today and updated to the latest NVIDIA driver (465.19.01) and I'm still not having any issues with a Quadro P4000.  I've already pushed a fix for the double NVIDIA in the product name which will go in the next release.  Other than that, the plugin is displaying data just as I expected.

image.png.05da8436e6940779ed9b5a1aa2e50759.png

I upgraded to 6.9.2 and it's working now.  Yeah.

 

There are differences in drivers. Not sure whether that warrants separate Quadro, GeForce and RTX versions of the plugin, or maybe a card selection area that downloads the correct Linux driver. Just a thought.

Edited by 5STAR
4 minutes ago, 5STAR said:

I upgraded to 6.9.2 and it's working now.  Yeah.

 

There are differences in drivers. Not sure whether that warrants separate Quadro, GeForce and RTX versions of the plugin, or maybe a card selection area that downloads the correct Linux driver. Just a thought.

 

Maybe in some cases on Windows there are separate driver packages, but for Linux they are not separate.

https://www.nvidia.com/Download/driverResults.aspx/171392/en-us

 

If you click Supported Products you'll see pretty much everything is supported by the one Linux driver package.

1 hour ago, b3rs3rk said:

 

Maybe in some cases on Windows there are separate driver packages, but for Linux they are not separate.

https://www.nvidia.com/Download/driverResults.aspx/171392/en-us

 

If you click Supported Products you'll see pretty much everything is supported by the one Linux driver package.

Ahh ok, great. If I go to the NVIDIA driver section and select Linux there are 3 separate sections, or maybe they are just filters of supported drivers. This would make it easier for sure :)

Edited by 5STAR
9 hours ago, 5STAR said:

Anyway, it's working again :)  Thanks for all the help and consideration of the issue.

 

Mike

I saw that you run Unraid through ESXi, and that can also cause problems. Please keep that in mind...

 

Go to the Nvidia thread and search for ESXi. I remember one user who also ran it through ESXi: he could see the card but couldn't use it.

13 hours ago, ich777 said:

I saw that you run Unraid through ESXi, and that can also cause problems. Please keep that in mind...

 

Go to the Nvidia thread and search for ESXi. I remember one user who also ran it through ESXi: he could see the card but couldn't use it.

It's all working now as it has since day one. There have never been any updates to the ESXi platform. The only things that changed were on the Unraid side, and the last change fixed it again. I am happy it's back :) Unraid runs perfectly on ESXi. The only thing that doesn't work is running VMs inside Unraid; for that I use ESXi. All the Docker containers I have ever loaded work perfectly, although I only use the Plex container now. Running Unraid in ESXi, I do lose the ability to see low-level sensors such as motherboard temperature sensors and chipset identification, because ESXi controls them. I don't need those anyway, as they are covered in other areas of ESXi.

 

Just for fun I also booted Unraid natively, bypassing the ESXi boot process. Unraid worked perfectly and set itself up for my hardware. I was able to see all the low-level stuff, including the chipset etc. Unraid is just great stuff.

 

Thanks again guys for working through this one. 

Edited by 5STAR
On 5/13/2020 at 8:47 AM, Unixsystem said:

I too am getting spammed with:


May 13 09:44:28 Tank kernel: caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs
May 13 09:44:31 Tank kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]

I tried to uncheck some of the polling options to see if a particular one may be causing it, but no matter how many I uncheck or which ones, it automatically goes back to all of them being checked.

 

UPDATE: OK, I figured it out. It has to do with power states and this query waking up the display.

Running nvidia-smi --persistence-mode=1 fixes the issue by keeping the GPU initialized.

Looking at the state, I noticed it was always in P0; with persistence mode on, it was able to drop to P8.

(This is for a quadro p2000, btw)
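A minimal sketch of how the P0-to-P8 change described above could be checked, assuming nvidia-smi is on the PATH and the script runs as root. The helper name describe_pstate is hypothetical; pstate and persistence_mode are nvidia-smi query fields, and -pm 1 is the short form of --persistence-mode=1.

```shell
#!/bin/bash
# Hypothetical helper: turn one CSV line such as "P8, Enabled"
# (performance state, persistence mode) into a readable sentence.
describe_pstate() {
    pstate=$(printf '%s' "${1%%,*}" | tr -d ' ')   # text before the first comma
    mode=$(printf '%s' "${1#*,}" | tr -d ' ')      # text after the first comma
    echo "GPU is in ${pstate} with persistence mode ${mode}"
}

# Only touch the GPU when the NVIDIA tools are actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -pm 1    # enable persistence mode
    nvidia-smi --query-gpu=pstate,persistence_mode --format=csv,noheader |
        while IFS= read -r line; do
            describe_pstate "$line"
        done
fi
```

Running it once right after enabling persistence mode and again after the card has been idle for a while should show whether the state drops from P0.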

Thanks for this command suggestion!  Worked like a charm for me.  My question, however, is how do I get this to run on every boot?

3 hours ago, Johnny Utah said:

Thanks for this command suggestion!  Worked like a charm for me.  My question, however, is how do I get this to run on every boot?

Please look in the Nvidia-Driver thread, there is a description in this post:

 

It should also be possible to put the two commands (without the first line) in the 'go' file.

18 hours ago, Johnny Utah said:

Thanks for this command suggestion!  Worked like a charm for me.  My question, however, is how do I get this to run on every boot?

 

It does indeed seem to stop the logging, but I also get this in the log:

 

Apr 15 12:18:12 Tower kernel: NVRM: Persistence mode is deprecated and will be removed in a future release. Please use nvidia-persistenced instead.
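The kernel message above points at nvidia-persistenced as the replacement. A small sketch of how a boot script could prefer the daemon and fall back to the older switch, assuming the nvidia-persistenced binary is shipped with the driver and is on the PATH. The function name persistence_command is hypothetical, and the script only prints the command it would run rather than executing it.

```shell
#!/bin/bash
# Hypothetical helper: pick the persistence method available on this system.
# Prefers the nvidia-persistenced daemon, as the kernel message recommends.
persistence_command() {
    if command -v nvidia-persistenced >/dev/null 2>&1; then
        echo "nvidia-persistenced"
    else
        echo "nvidia-smi -pm 1"
    fi
}

# Dry run: show the choice without changing anything on the GPU.
echo "Would run: $(persistence_command)"
```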

 

 

 

7 minutes ago, daan_SVK said:

 

It does indeed seem to stop the logging, but I also get this in the log:

 


Apr 15 12:18:12 Tower kernel: NVRM: Persistence mode is deprecated and will be removed in a future release. Please use nvidia-persistenced instead.

 

 

 

Please read the comment above your post!

23 hours ago, Johnny Utah said:

Thanks for this command suggestion!  Worked like a charm for me.  My question, however, is how do I get this to run on every boot?

 

If you have "User Scripts" installed, create a new script with the following and set it to run at the first start of the array.

 

#!/bin/bash
#set persistence mode
nvidia-smi -pm 1

 

If you were running any other patches on your card, you could insert them in the same script. :)

On 4/15/2021 at 7:10 PM, mrMTB said:

 

If you have "User Scripts" installed, create a new script with the following and set it to run at the first start of the array.

 


#!/bin/bash
#set persistence mode
nvidia-smi -pm 1

 

If you were running any other patches on your card, you could insert them in the same script. :)

That did it!  Thank you very much for the assistance.


I just wanted to stop in and say thanks for making this plugin. My transcoding GPU has been acting up for a while, but I wasn't sure what the problem was. This plugin helped me see that the temp was high and the fan wasn't working. Turns out I had a stray wire blocking the fan. So thanks again, I appreciate it.

3 hours ago, Zotarios said:

Is this info accurate?

image.png.6cbb3e7c98384a28e082248e68562023.png

How can I be sure if it's using version 3 or 1?

 

The first figure is usually dynamic, based on the power state.  To reduce power consumption, the card uses Gen 1 when idle and ramps up to the maximum as determined by the power state.  The value in parentheses is the maximum that your card and your PCI Express bus both support, and is static.  My maximum, for example, displays as Generation 2 because I'm using an older motherboard/chipset, even though the Quadro P4000 I'm using can use 3.0.

image.png.3a866d527107b2807d17af4f95c31bb5.png
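The two figures can also be read straight from the command line. This is a rough sketch, assuming nvidia-smi is on the PATH; the helper name describe_pcie_gen is hypothetical, while pcie.link.gen.current and pcie.link.gen.max are nvidia-smi query fields. It is not necessarily how the plugin itself obtains the values.

```shell
#!/bin/bash
# Hypothetical helper: explain one CSV line of "current, max" PCIe generation.
describe_pcie_gen() {
    current=$(printf '%s' "${1%%,*}" | tr -d ' ')
    max=$(printf '%s' "${1#*,}" | tr -d ' ')
    # Some cards report a non-numeric value; do not compare in that case.
    case "$current$max" in
        ''|*[!0-9]*) echo "PCIe generation not reported"; return 0 ;;
    esac
    if [ "$current" -lt "$max" ]; then
        echo "PCIe link: Gen ${current} now, Gen ${max} maximum (downclocked, likely idle)"
    else
        echo "PCIe link: Gen ${current} now, Gen ${max} maximum (running at full speed)"
    fi
}

# Only query the GPU when the NVIDIA tools are actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=pcie.link.gen.current,pcie.link.gen.max --format=csv,noheader |
        while IFS= read -r line; do
            describe_pcie_gen "$line"
        done
fi
```

Running it while the card is idle and again during a transcode should show the first number change while the second stays fixed.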

  • 2 weeks later...

Can someone help me out? I have a Vega 64 Frontier Edition that I'm trying to set up on my server for mining until Plex encoding starts to work on AMD, but when I try to install GPU Statistics from CA it throws the following error and I am not sure how to proceed.

image.png.21641f6773559eccc6c4f624a6cac3e8.png

3 hours ago, Kvo1087 said:

I have the CA plugin installed and updated, but it does not return any results when searching for radeon top. Sorry for all the screenshots.

grafik.png.890032f6b9f54afcd3f7fd7bf7c03d79.png

 

You can also search for part of the name, in this case 'Radeon', and you will find it too. :)

Please also keep in mind that the description of GPU Statistics says you need a vendor utility plugin such as Nvidia-Driver, Radeon-TOP or Intel-GPU-TOP.

grafik.png.450e91dbc805afbadb58b18e8b4f236f.png


Hello, I logged in to my Unraid box today and noticed a lot of log spam:
 

May 12 18:15:38 ***** kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
May 12 18:15:38 ***** kernel: caller _nv000712rm+0x1af/0x200 [nvidia] mapping multiple BARs
May 12 18:15:39 ***** kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
May 12 18:15:39 ***** kernel: caller _nv000712rm+0x1af/0x200 [nvidia] mapping multiple BARs
May 12 18:15:40 ***** kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
May 12 18:15:40 ***** kernel: caller _nv000712rm+0x1af/0x200 [nvidia] mapping multiple BARs
May 12 18:15:40 ***** kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
May 12 18:15:40 ***** kernel: caller _nv000712rm+0x1af/0x200 [nvidia] mapping multiple BARs
May 12 18:15:41 ***** kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
May 12 18:15:41 ***** kernel: caller _nv000712rm+0x1af/0x200 [nvidia] mapping multiple BARs
May 12 18:15:42 ***** kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]

 

Removing GPU Stats Plugin stopped it. Is there a fix for this? Thanks :)

13 minutes ago, RedSpider said:

Removing GPU Stats Plugin stopped it. Is there a fix for this? Thanks :)

Just ignore the message. It is just information from the driver itself, logged when nvidia-smi is called, and it should only appear while you are on the Dashboard page. (You also get at least one such message in your syslog when you open the Nvidia-Driver plugin page, because I also have to call nvidia-smi to get the driver details and the UUID of the card/s.)
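To see for yourself whether the messages only arrive while the Dashboard is open, you could count them before and after leaving the page. A minimal sketch, assuming the syslog is at /var/log/syslog; the function name count_bar_messages is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count the nvidia "mapping multiple BARs" lines in a log file.
count_bar_messages() {
    grep -c 'mapping multiple BARs' "$1"
}

# Assumed log location; adjust if your syslog lives elsewhere.
LOG=/var/log/syslog
if [ -r "$LOG" ]; then
    echo "BAR messages so far: $(count_bar_messages "$LOG")"
fi
```

If the count stops growing once the Dashboard tab is closed, the messages are coming from the plugin's polling and not from something else calling nvidia-smi.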
