[PLUGIN] GPU Statistics



1 minute ago, jungle said:

I’m here to help - tell me what you need me to do. 

 

Thanks. At your convenience, please open a terminal window or ssh session into your server and type

 

sensors

 

and in the output look for the lines that look like mine:

 

amdgpu-pci-0900
Adapter: PCI adapter
vddgfx:           N/A  
vddnb:            N/A  
edge:         +37.0°C  

 

and post here. I suspect they will look very similar (different PCI address though - 0a00, if I remember your earlier posts correctly).

 

Link to comment
29 minutes ago, John_M said:

 

Thanks. At your convenience, please open a terminal window or ssh session into your server and type

 


sensors

 

and in the output look for the lines that look like mine:

 


amdgpu-pci-0900
Adapter: PCI adapter
vddgfx:           N/A  
vddnb:            N/A  
edge:         +37.0°C  

 

and post here. I suspect they will look very similar (different PCI address though - 0a00, if I remember your earlier posts correctly).

 

 

image.png.d988137229e07dbd2165936019fef780.png

Link to comment

Changing the 0a00 to match my 0900, I get

 

root@Pusok:~# sensors amdgpu-pci-0900 -j
{
   "amdgpu-pci-0900":{
      "Adapter": "PCI adapter",
      "vddgfx":{
ERROR: Can't get value of subfeature in0_input: Can't read

      },
      "vddnb":{
ERROR: Can't get value of subfeature in1_input: Can't read

      },
      "edge":{
         "temp1_input": 38.000
      }
   }
}

 

Link to comment
8 hours ago, John_M said:

Changing the 0a00 to match my 0900, I get

 


root@Pusok:~# sensors amdgpu-pci-0900 -j
{
   "amdgpu-pci-0900":{
      "Adapter": "PCI adapter",
      "vddgfx":{
ERROR: Can't get value of subfeature in0_input: Can't read

      },
      "vddnb":{
ERROR: Can't get value of subfeature in1_input: Can't read

      },
      "edge":{
         "temp1_input": 38.000
      }
   }
}

 

 

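Those ERROR messages are written to stderr and end up interleaved with the JSON on stdout, so the result is no longer valid JSON. That makes it difficult to parse on my end without doing some dumb string manipulation. Try this, which redirects stderr to a file named errors: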

 

sensors amdgpu-pci-0900 -j 2>errors
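
To illustrate why the interleaved output is a problem, here is a minimal Python sketch. It is not the plugin's actual code, and `parse_or_none` and `drop_error_lines` are hypothetical helper names. It feeds the output posted above to a JSON parser, first as-is and then with the `ERROR:` lines filtered out (the kind of string manipulation the redirect avoids):

```python
import json

# Output copied from earlier in this thread, with the ERROR lines interleaved.
RAW = """{
   "amdgpu-pci-0900":{
      "Adapter": "PCI adapter",
      "vddgfx":{
ERROR: Can't get value of subfeature in0_input: Can't read

      },
      "vddnb":{
ERROR: Can't get value of subfeature in1_input: Can't read

      },
      "edge":{
         "temp1_input": 38.000
      }
   }
}"""


def parse_or_none(text):
    """Return the decoded JSON document, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def drop_error_lines(text):
    """Hypothetical fallback: discard every line that starts with 'ERROR:'."""
    return "\n".join(
        line for line in text.splitlines() if not line.startswith("ERROR:")
    )


if __name__ == "__main__":
    print(parse_or_none(RAW))                    # interleaved output does not parse
    print(parse_or_none(drop_error_lines(RAW)))  # filtered output parses
```

Filtering works for this sample, but it relies on guessing what an error line looks like, which is why keeping stderr out of the stream in the first place is the cleaner option.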

 

Link to comment
10 hours ago, b3rs3rk said:

 

Can you confirm the output of:

 


sensors amdgpu-pci-0a00 -j

 

 

{
   "amdgpu-pci-0a00":{
      "Adapter": "PCI adapter",
      "vddgfx":{
ERROR: Can't get value of subfeature in0_input: Can't read

      },
      "vddnb":{
ERROR: Can't get value of subfeature in1_input: Can't read

      },
      "edge":{
         "temp1_input": 72.000
      }
   }
}

Link to comment
2 minutes ago, jungle said:

 

{
   "amdgpu-pci-0a00":{
      "Adapter": "PCI adapter",
      "vddgfx":{
ERROR: Can't get value of subfeature in0_input: Can't read

      },
      "vddnb":{
ERROR: Can't get value of subfeature in1_input: Can't read

      },
      "edge":{
         "temp1_input": 72.000
      }
   }
}

 

sensors amdgpu-pci-0a00 -j 2>errors

 

Link to comment
22 minutes ago, jungle said:

{
   "amdgpu-pci-0a00":{
      "Adapter": "PCI adapter",
      "vddgfx":{

      },
      "vddnb":{

      },
      "edge":{
         "temp1_input": 72.000
      }
   }
}

 

Much better.  I think I have everything worked out.
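
For anyone scripting this themselves, the same separation can be done without a shell redirect. The following is a rough Python sketch, not the plugin's code: `run_json_command` is a hypothetical helper, and the stand-in child process only imitates a command that writes JSON to stdout and errors to stderr, so that the example runs on machines without lm-sensors.

```python
import json
import subprocess
import sys


def run_json_command(cmd):
    """Run cmd, decode its stdout as JSON, and return (data, stderr_text).

    stdout and stderr are captured separately, so diagnostics written to
    stderr cannot corrupt the JSON document on stdout.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return json.loads(result.stdout), result.stderr


# Stand-in for the real command (an assumption for illustration only):
# it prints a JSON document to stdout with an error line sent to stderr
# in the middle of it.
FAKE_SENSORS_SCRIPT = r'''
import sys
print('{"edge": {', flush=True)
sys.stderr.write("ERROR: Can't read\n")
print('"temp1_input": 72.000}}')
'''
FAKE_SENSORS = [sys.executable, "-c", FAKE_SENSORS_SCRIPT]

if __name__ == "__main__":
    data, errors = run_json_command(FAKE_SENSORS)
    print(data)
    print(errors.strip())
```

On a real server the call would presumably be `run_json_command(["sensors", "amdgpu-pci-0a00", "-j"])`, using the chip name from your own `sensors` output.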

 

@John_M if possible I'd like to test the code against your machine again.  Probably @ich777's loaned RX480 too.

Link to comment
4 hours ago, b3rs3rk said:

Probably @ich777's loaned RX480 too.

There you go:

 

root@Development:~# sensors amdgpu-pci-0400 -j 2>errors
{
   "amdgpu-pci-0400":{
      "Adapter": "PCI adapter",
      "vddgfx":{
         "in0_input": 0.750
      },
      "fan1":{
         "fan1_input": 676.000,
         "fan1_min": 0.000,
         "fan1_max": 3700.000
      },
      "edge":{
         "temp1_input": 27.000,
         "temp1_crit": 94.000,
         "temp1_crit_hyst": -273.150
      },
      "power1":{
         "power1_average": 8.206,
         "power1_cap": 127.000
      }
   }
}

 

Link to comment
1 minute ago, ich777 said:

There you go:

 


root@Development:~# sensors amdgpu-pci-0400 -j 2>errors
{
   "amdgpu-pci-0400":{
      "Adapter": "PCI adapter",
      "vddgfx":{
         "in0_input": 0.750
      },
      "fan1":{
         "fan1_input": 676.000,
         "fan1_min": 0.000,
         "fan1_max": 3700.000
      },
      "edge":{
         "temp1_input": 27.000,
         "temp1_crit": 94.000,
         "temp1_crit_hyst": -273.150
      },
      "power1":{
         "power1_average": 8.206,
         "power1_cap": 127.000
      }
   }
}

 

 

Interesting.  Looks like (at least some of) the dGPUs have fan and power metering as well that can be added.
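
Since not every card reports every reading, whatever consumes this JSON has to treat fan, power and voltage as optional. A minimal Python sketch of that idea, using the two outputs posted in this thread (the `summarize` helper and its key names are made up for illustration and are not the plugin's implementation):

```python
import json

# RX480 output posted above (chip amdgpu-pci-0400).
RX480 = json.loads("""{
   "amdgpu-pci-0400":{
      "Adapter": "PCI adapter",
      "vddgfx":{ "in0_input": 0.750 },
      "fan1":{ "fan1_input": 676.000, "fan1_min": 0.000, "fan1_max": 3700.000 },
      "edge":{ "temp1_input": 27.000, "temp1_crit": 94.000, "temp1_crit_hyst": -273.150 },
      "power1":{ "power1_average": 8.206, "power1_cap": 127.000 }
   }
}""")

# Earlier output for chip amdgpu-pci-0a00, captured with stderr redirected.
CHIP_0A00 = json.loads("""{
   "amdgpu-pci-0a00":{
      "Adapter": "PCI adapter",
      "vddgfx":{ },
      "vddnb":{ },
      "edge":{ "temp1_input": 72.000 }
   }
}""")


def summarize(chip):
    """Collect the readings a chip reports; anything missing becomes None."""

    def reading(feature, subfeature):
        # A feature may be absent entirely (no "fan1" key) or present but
        # empty ("vddgfx": {}); both cases yield None here.
        return chip.get(feature, {}).get(subfeature)

    return {
        "temp_c": reading("edge", "temp1_input"),
        "fan_rpm": reading("fan1", "fan1_input"),
        "power_w": reading("power1", "power1_average"),
        "voltage_v": reading("vddgfx", "in0_input"),
    }


if __name__ == "__main__":
    print(summarize(RX480["amdgpu-pci-0400"]))
    print(summarize(CHIP_0A00["amdgpu-pci-0a00"]))
```

Returning None for an absent reading lets the display side decide whether to hide the field or show "N/A".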

Link to comment

Hey Berserk,

 

I have an interesting error coming up since I updated to 6.9.

 

First I was getting spammed with this

Mar 31 20:40:34 unRAID kernel: NVRM: GPU 0000:01:00.0: Failed to enable MSI; falling back to PCIe virtual-wire interrupts.
Mar 31 20:40:34 unRAID kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000d3fff window]

Once I figured out it was the plugin, I removed it but now I'm getting 

Mar 31 20:41:23 unRAID nginx: 2021/03/31 20:41:23 [error] 9939#9939: *10099 FastCGI sent in stderr: "Unable to open primary script: /usr/local/emhttp/plugins/gpustat/gpustatus.php (No such file or directory)" while reading response header from upstream, client: 192.168.2.110, server: , request: "GET /plugins/gpustat/gpustatus.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.2.155", referrer: "http://192.168.2.155/Dashboard"
Mar 31 20:42:23 unRAID nginx: 2021/03/31 20:42:23 [error] 9939#9939: *10605 FastCGI sent in stderr: "Primary script unknown" while reading response header from upstream, client: 192.168.2.110, server: , request: "GET /plugins/gpustat/gpustatus.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.2.155", referrer: "http://192.168.2.155/Dashboard"

 

Any idea what's going on?

Link to comment
50 minutes ago, Addy said:

First I was getting spammed with this

This comes from the Nvidia driver and is pretty normal while you are on the Unraid Dashboard. I think the GPU Statistics plugin calls 'nvidia-smi' every second, and the kernel config 'CONFIG_WATCHDOG' is enabled in releases 6.9.0+ to support more hardware monitor chips (NCT).

That's why you see these warnings. They are nothing to worry about and only appear while you are on the Unraid Dashboard page.

 

Actually, the Nvidia-Driver plugin itself produces the same message if you go to its configuration page, because I also have to pull some things from 'nvidia-smi'.

 

1 hour ago, Addy said:

Once I figured out it was the plugin, I removed it but now I'm getting 

Which plugin did you remove? The GPU Statistics plugin?

A reboot should clear this message.

Link to comment

Just wanted to say I am seeing no data in my GPU stats plugin. The last time this happened, after one of the updates you sent out, I just uninstalled the plugin and reinstalled it, and it started working after I selected Nvidia. I run an Nvidia Quadro P2200 and it worked perfectly before the flurry of daily updates.

 

I rebooted the Unraid server. I know the Nvidia GPU is working because Plex is smoking right along with a large buffer of video. The plugin isn't updating most of the stats; sometimes I see the load wiggle 1% or so, or PCI gen max go from 1 to 3, but those are the only things that seem to update during use.

 

Thanks for the great work.  Makes owning unraid enjoyable :)

Edited by 5STAR
Link to comment

@b3rs3rk Sorry, I was very busy; I just came by to say a huge thank you! My Nano shows the temp (GPU die temp?) as well.

 

On a side note, I noticed the following in the Unraid logs roughly every 10 minutes (since the first release with AMD GPU support):

 

Quote

kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400000000).
kernel: [drm] UVD initialized successfully.
kernel: [drm] VCE initialized successfully.
kernel: amdgpu 0000:01:00.0: [drm] Cannot find any crtc or sizes

 

Let me know if you need further help.

Edited by soulskill
Link to comment
On 3/22/2021 at 12:00 AM, John_M said:

 

Thanks. At your convenience, please open a terminal window or ssh session into your server and type

 



sensors

 

and in the output look for the lines that look like mine:

 



amdgpu-pci-0900
Adapter: PCI adapter
vddgfx:           N/A  
vddnb:            N/A  
edge:         +37.0°C  

 

and post here. I suspect they will look very similar (different PCI address though - 0a00, if I remember your earlier posts correctly).

 

 


Here’s mine from a Sapphire R9 Nano:

 

Quote

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:        1.24 V  
fan1:             N/A  (min = 1000 RPM, max = 4200 RPM)
edge:         +30.0°C  (crit = +89.0°C, hyst = -273.1°C)
power1:       11.18 W  (cap = 150.00 W)


Custom loop (liquid cooled) hence the ‘fan1: N/A’.

 

I wonder whether we can also fetch the VMem temps.

Edited by soulskill
Link to comment
On 4/1/2021 at 12:56 AM, 5STAR said:

Just wanted to say I am seeing no data in my GPU stats plugin. The last time this happened, after one of the updates you sent out, I just uninstalled the plugin and reinstalled it, and it started working after I selected Nvidia. I run an Nvidia Quadro P2200 and it worked perfectly before the flurry of daily updates.

 

I rebooted the Unraid server. I know the Nvidia GPU is working because Plex is smoking right along with a large buffer of video. The plugin isn't updating most of the stats; sometimes I see the load wiggle 1% or so, or PCI gen max go from 1 to 3, but those are the only things that seem to update during use.

 

Thanks for the great work.  Makes owning unraid enjoyable :)


Look in the original post of this thread and provide the troubleshooting info at the bottom of it.

Link to comment
