[Support]: Intel iGPU Utilization Stats into InfluxDB for use with Grafana - intel-gpu-telegraf



Hi all,

 

I figured I would share the container I put together, maybe someone else will find it useful.

 

The goal: See the utilization of the Intel iGPU in my Grafana dashboard.

The how: Create a container running a tiny script to manipulate the output of intel_gpu_top and have Telegraf send it to an InfluxDB instance.

The result:

[Screenshot: Grafana dashboard (grafana_screenshot.png)]

Docker template repo to add to your Unraid repo list: https://github.com/brianmiller/docker-templates (the container is called 'intel-gpu-telegraf')

Docker Hub repo: https://hub.docker.com/r/theoriginalbrian/intel-gpu-telegraf

Docker GitHub: https://github.com/brianmiller/docker-intel-gpu-telegraf
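For anyone curious about the "tiny script" step, here is a minimal sketch of the idea. It is not the container's actual script (that lives in the GitHub repo above). It assumes intel_gpu_top's JSON mode (`-J`) yields samples with an `"engines"` object whose entries carry a `"busy"` percentage, and converts one sample into InfluxDB line protocol, which Telegraf's exec input can read with `data_format = "influx"`. The measurement name `intel_gpu` and the `engine` tag are made up for illustration.

```python
import json


def escape_tag(value: str) -> str:
    # Line protocol requires commas, equals signs and spaces in tag values
    # to be backslash-escaped.
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def sample_to_line_protocol(sample: dict, measurement: str = "intel_gpu") -> list[str]:
    """Turn one (assumed) intel_gpu_top JSON sample into line-protocol lines.

    'intel_gpu' and the 'engine' tag are hypothetical names for this sketch.
    """
    lines = []
    # Sort engine names so the output order is stable between runs.
    for engine, stats in sorted(sample.get("engines", {}).items()):
        busy = stats.get("busy")
        if busy is None:
            continue  # skip engines that report no utilization value
        lines.append(f"{measurement},engine={escape_tag(engine)} busy={float(busy)}")
    return lines


if __name__ == "__main__":
    # Hand-written sample in the assumed shape, not captured from real hardware.
    raw = '{"engines": {"Render/3D/0": {"busy": 12.5, "unit": "%"}, "Video/0": {"busy": 40.0, "unit": "%"}}}'
    for line in sample_to_line_protocol(json.loads(raw)):
        print(line)
```

Telegraf's exec input runs the command once per collection interval and parses whatever it prints to stdout, so the script only has to emit one line per engine.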

 

Currently, the container looks for the "Video/0" engine within the iGPU. If there's a desire to pull this into the Unraid template UI and make it editable, let me know. It shouldn't be difficult to do.
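If the engine were made editable, one hedged way to wire it up is to read the name from an environment variable set by the Unraid template and fall back to "Video/0". `GPU_ENGINE` is a hypothetical variable name, not an existing template setting, and the sample is assumed to be intel_gpu_top JSON with an `"engines"` object keyed by engine name, each entry holding a `"busy"` value.

```python
import os

# The engine the container currently watches, per the post above.
DEFAULT_ENGINE = "Video/0"


def selected_engine(env=None) -> str:
    """Return the engine name to report, defaulting to Video/0.

    GPU_ENGINE is a hypothetical variable name used only for this sketch.
    """
    env = os.environ if env is None else env
    value = env.get("GPU_ENGINE", "").strip()
    return value or DEFAULT_ENGINE


def engine_busy(sample: dict, engine: str) -> float:
    """Look up the busy percentage for one engine in an (assumed) JSON sample."""
    engines = sample.get("engines", {})
    if engine not in engines:
        # List what the iGPU actually exposes so users can pick a valid name.
        available = ", ".join(sorted(engines)) or "none"
        raise ValueError(f"engine {engine!r} not found; available: {available}")
    return float(engines[engine]["busy"])
```

Reporting the available engine names in the error matters because different iGPU generations may name their engines differently.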

 

-Brian

Edited by TheBrian
add screenshot
  • 3 weeks later...
  • 2 weeks later...

Very cool! It's great seeing someone else can use it. :)

 

Thanks for the Grafana additions. If you don't mind, I may add some of these details to the main post and Docker page.

 

You discovered I'm using the "exec" function of Telegraf.

 

-Brian

  • 4 months later...

I may be an edge case, but in beta35 this (very handy) docker fills up my syslog with the following error until the system is overloaded.

Nov 23 10:00:10 NAS kernel: bad: scheduling from the idle thread!
Nov 23 10:00:10 NAS kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.18-Unraid #1
Nov 23 10:00:10 NAS kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J5005-ITX, BIOS P1.40 08/06/2018
Nov 23 10:00:10 NAS kernel: Call Trace:
Nov 23 10:00:10 NAS kernel: dump_stack+0x6b/0x83
Nov 23 10:00:10 NAS kernel: dequeue_task_idle+0x21/0x2a
Nov 23 10:00:10 NAS kernel: __schedule+0x135/0x49e
Nov 23 10:00:10 NAS kernel: ? __mod_timer+0x215/0x23c
Nov 23 10:00:10 NAS kernel: schedule+0x77/0xa0
Nov 23 10:00:10 NAS kernel: schedule_timeout+0xa7/0xe0
Nov 23 10:00:10 NAS kernel: ? __next_timer_interrupt+0xaf/0xaf
Nov 23 10:00:10 NAS kernel: msleep+0x13/0x19
Nov 23 10:00:10 NAS kernel: pci_raw_set_power_state+0x185/0x257
Nov 23 10:00:10 NAS kernel: pci_restore_standard_config+0x35/0x3b
Nov 23 10:00:10 NAS kernel: pci_pm_runtime_resume+0x29/0x7b
Nov 23 10:00:10 NAS kernel: ? pci_pm_default_resume+0x1e/0x1e
Nov 23 10:00:10 NAS kernel: ? pci_pm_default_resume+0x1e/0x1e
Nov 23 10:00:10 NAS kernel: __rpm_callback+0x6b/0xcf
Nov 23 10:00:10 NAS kernel: ? pci_pm_default_resume+0x1e/0x1e
Nov 23 10:00:10 NAS kernel: rpm_callback+0x50/0x66
Nov 23 10:00:10 NAS kernel: ? pci_pm_default_resume+0x1e/0x1e
Nov 23 10:00:10 NAS kernel: rpm_resume+0x2e2/0x3d6
Nov 23 10:00:10 NAS kernel: ? __schedule+0x47d/0x49e
Nov 23 10:00:10 NAS kernel: __pm_runtime_resume+0x55/0x71
Nov 23 10:00:10 NAS kernel: __intel_runtime_pm_get+0x15/0x4a [i915]
Nov 23 10:00:10 NAS kernel: i915_pmu_enable+0x53/0x147 [i915]
Nov 23 10:00:10 NAS kernel: i915_pmu_event_add+0xf/0x20 [i915]
Nov 23 10:00:10 NAS kernel: event_sched_in+0xd3/0x18f
Nov 23 10:00:10 NAS kernel: merge_sched_in+0xb4/0x1de
Nov 23 10:00:10 NAS kernel: visit_groups_merge.constprop.0+0x174/0x3ad
Nov 23 10:00:10 NAS kernel: ctx_sched_in+0x11e/0x13e
Nov 23 10:00:10 NAS kernel: perf_event_sched_in+0x49/0x6c
Nov 23 10:00:10 NAS kernel: ctx_resched+0x6d/0x7c
Nov 23 10:00:10 NAS kernel: __perf_install_in_context+0x117/0x14b
Nov 23 10:00:10 NAS kernel: remote_function+0x19/0x43
Nov 23 10:00:10 NAS kernel: flush_smp_call_function_queue+0x103/0x1a4
Nov 23 10:00:10 NAS kernel: flush_smp_call_function_from_idle+0x2f/0x3a
Nov 23 10:00:10 NAS kernel: do_idle+0x20f/0x236
Nov 23 10:00:10 NAS kernel: cpu_startup_entry+0x18/0x1a
Nov 23 10:00:10 NAS kernel: start_kernel+0x4af/0x4d1
Nov 23 10:00:10 NAS kernel: secondary_startup_64+0xa4/0xb0

 

  • 2 weeks later...

That was exactly the plugin I was looking for, many thanks!

 

Just a note: maybe add an env variable for influx_username (for auth), as my existing DB required both influx_password AND a username.

I modified telegraf.conf directly, but it would be a great addition for other users.
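As a sketch of what such an addition could look like, a container entrypoint might render the `[[outputs.influxdb]]` section from environment variables and emit `username`/`password` only when they are set. `urls`, `database`, `username` and `password` are standard options of Telegraf's InfluxDB output; the variable names `INFLUX_URL`, `INFLUX_DB`, `INFLUX_USERNAME` and `INFLUX_PASSWORD` and the defaults are assumptions and may not match the actual template.

```python
import json
import os


def toml_str(value: str) -> str:
    # json.dumps produces a double-quoted string whose escapes
    # (\", \\, \n, \uXXXX, ...) are also valid in TOML basic strings.
    return json.dumps(value)


def render_influxdb_output(env=None) -> str:
    """Render a Telegraf [[outputs.influxdb]] block from (hypothetical) env vars."""
    env = os.environ if env is None else env
    lines = [
        "[[outputs.influxdb]]",
        f"  urls = [{toml_str(env.get('INFLUX_URL', 'http://localhost:8086'))}]",
        f"  database = {toml_str(env.get('INFLUX_DB', 'telegraf'))}",
    ]
    username = env.get("INFLUX_USERNAME", "")
    password = env.get("INFLUX_PASSWORD", "")
    # Only write credentials when provided, so unauthenticated setups keep working.
    if username:
        lines.append(f"  username = {toml_str(username)}")
    if password:
        lines.append(f"  password = {toml_str(password)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_influxdb_output(), end="")
```

Telegraf can also substitute `${VAR}` environment variables inside telegraf.conf, so writing `username = "${INFLUX_USERNAME}"` directly in the config is a simpler alternative to generating the file.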

 

Thanks again!

 

On 11/17/2020 at 4:06 AM, tronyx said:

This is awesome! Thanks so much for putting it together. I don't suppose you'd be willing to share your dashboard for this for easy replication of the above panels?

Certainly!  I'll see if I can update the GitHub with the dashboards I use. I'll link them here when I get them uploaded.

On 12/1/2020 at 12:46 PM, Lunz_ said:

That was exactly the plugin I was looking for, many thanks!

Just a note: maybe add an env variable for influx_username (for auth), as my existing DB required both influx_password AND a username.

I modified telegraf.conf directly, but it would be a great addition for other users.

Thanks again!

 

This should be easy enough.  I'll take a look.

On 11/23/2020 at 8:38 AM, CS01-HS said:

I may be an edge case, but in beta35 this (very handy) docker fills up my syslog with the following error until the system is overloaded.



I'm glad it's useful. I haven't seen these errors before. Did they start after you installed the intel-gpu-telegraf container or after the Unraid upgrade?

12 minutes ago, TheBrian said:

I'm glad it's useful. I haven't seen these errors before. Did they start after you installed the intel-gpu-telegraf container or after the Unraid upgrade?

I upgraded to beta35, then installed intel-gpu-telegraf for the first time. I'll try again and report back.

  • 1 month later...
On 12/6/2020 at 8:47 AM, CS01-HS said:

I managed to cause the same error (and freeze my server) while playing around with intel_gpu_top in the intel-gpu-tools container, so my problem is at a lower level; your container's fine.

After several freezes (which caused unclean shutdowns), both in Handbrake using the hardware encoder and while monitoring, plus randomly corrupted encodes, I got my machine stable with the following changes:

  1. Added intel_iommu=on,igfx_off to the syslinux config (this may be optional)
  2. Added a dummy HDMI plug to my headless server (j5005)

It's been stable now for several days despite continuous hardware encoding in Handbrake with no corruption.

  • 5 weeks later...

Hey

 

This is exactly what I've been looking for, but for some reason I'm unable to get it working. I'm clearly being thick (it's been a long few weeks). I've installed the docker, but I'm getting "[agent] Error writing to outputs.influxdb: could not write any address". Any help would be much appreciated.
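That message generally means Telegraf could not write to any of the URLs listed in its `[[outputs.influxdb]]` section, so a first check is whether InfluxDB is reachable at the configured address. The sketch below requests InfluxDB's `/ping` endpoint, which answers with a 2xx status (204 on InfluxDB 1.x) when the server is up. It assumes Python is available on the machine you test from (the container image may not include it), and the example URL is a placeholder. A URL missing its `http://` scheme also fails this check, which is worth ruling out.

```python
import urllib.request


def influx_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/ping answers with a 2xx status."""
    url = base_url.rstrip("/") + "/ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError, refused connections and timeouts;
        # ValueError covers malformed URLs such as a bare hostname.
        return False


if __name__ == "__main__":
    # Placeholder address: replace with the URL configured for the container.
    print(influx_reachable("http://192.168.1.10:8086"))
```

If this returns False from the Unraid host, the problem is the address, port, or network setup rather than the container's GPU collection.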

 

Cheers Tinni  

 
