[Support]: Intel iGPU Utilization Stats into InfluxDB for use with Grafana - intel-gpu-telegraf



Hi all,

 

I figured I would share the container I put together; maybe someone else will find it useful.

 

The goal: See the utilization of the Intel iGPU in my Grafana dashboard.

The how: Create a container running a tiny script that reshapes the output of intel_gpu_top, and have Telegraf send the result to an InfluxDB instance.
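A rough illustration of the "reshape the output of intel_gpu_top" step, as a minimal Python sketch. It assumes the script reads one sample from intel_gpu_top's JSON mode (-J), where an "engines" object maps names such as "Video/0" to a "busy" percentage. The function name engine_busy and the sample values are made up for illustration; the container's actual script may do this differently.

```python
from __future__ import annotations

import json


def engine_busy(sample_json: str) -> dict[str, float]:
    """Return {engine name: busy %} from one intel_gpu_top JSON sample.

    Assumed sample shape:
      {"engines": {"Video/0": {"busy": 12.5, "unit": "%"}, ...}}
    Engines without a numeric "busy" field are skipped.
    """
    sample = json.loads(sample_json)
    result = {}
    for name, stats in sample.get("engines", {}).items():
        busy = stats.get("busy")
        if isinstance(busy, (int, float)):
            result[name] = float(busy)
    return result


# Hypothetical sample, trimmed to two engines for illustration.
SAMPLE = """
{"engines": {"Render/3D/0": {"busy": 3.0, "unit": "%"},
             "Video/0": {"busy": 12.5, "unit": "%"}}}
"""

if __name__ == "__main__":
    print(engine_busy(SAMPLE))
```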

The result:

[Image: grafana_screenshot.png]

Docker template repo to add to your Unraid repo list: https://github.com/brianmiller/docker-templates (the container is called 'intel-gpu-telegraf')

Docker Hub repo: https://hub.docker.com/r/theoriginalbrian/intel-gpu-telegraf

Docker GitHub: https://github.com/brianmiller/docker-intel-gpu-telegraf

 

Currently, the container looks for the "Video/0" engine within the iGPU. If there's interest in exposing this in the Unraid template UI and making it editable, let me know. It shouldn't be difficult to do.
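If the engine does become editable, one simple shape for it is an environment variable set from the template, with "Video/0" as the fallback. A minimal Python sketch, assuming a hypothetical variable GPU_ENGINE and an already-parsed mapping of engine name to busy %; neither the variable nor the helper pick_engine exists in the current container.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional

DEFAULT_ENGINE = "Video/0"  # the engine the container currently reads


def pick_engine(busy_by_engine: Mapping[str, float],
                env: Optional[Mapping[str, str]] = None) -> tuple[str, float]:
    """Return (engine, busy %) for the engine named in GPU_ENGINE.

    GPU_ENGINE is a hypothetical template variable; an unset or empty
    value falls back to DEFAULT_ENGINE. Raises KeyError with a readable
    message if the requested engine is not in the sample.
    """
    env = os.environ if env is None else env
    engine = env.get("GPU_ENGINE") or DEFAULT_ENGINE
    if engine not in busy_by_engine:
        available = ", ".join(sorted(busy_by_engine))
        raise KeyError(f"engine {engine!r} not found; available: {available}")
    return engine, busy_by_engine[engine]


if __name__ == "__main__":
    busy = {"Render/3D/0": 3.0, "Video/0": 12.5, "VideoEnhance/0": 0.0}
    print(pick_engine(busy, env={}))
    print(pick_engine(busy, env={"GPU_ENGINE": "Render/3D/0"}))
```

Failing loudly on an unknown engine name is a deliberate choice here: a typo in the template then shows up in the container log instead of as a silently flat graph.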

 

-Brian

  • 3 weeks later...
  • 2 weeks later...

Very cool! It's great seeing someone else can use it. :)

 

Thanks for the Grafana additions. If you don't mind, I may add some of these details to the main post and the Docker page.

 

You discovered that I'm using Telegraf's "exec" input plugin.
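For anyone unfamiliar with it: Telegraf's exec input plugin runs a command on each collection interval and parses what the command prints to stdout; with data_format = "influx" that output is read as InfluxDB line protocol. Below is a minimal Python sketch of the printing side. The measurement name intel_gpu, the tag key engine, the field name busy, and the script path in the comment are hypothetical, not necessarily what this container emits.

```python
from __future__ import annotations

from typing import Mapping

# Hypothetical telegraf.conf entry that would run this script:
#   [[inputs.exec]]
#     commands = ["python3 /scripts/gpu_stats.py"]
#     data_format = "influx"


def escape_tag(value: str) -> str:
    """Backslash-escape commas, equals signs and spaces in a tag key/value."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def to_line_protocol(busy_by_engine: Mapping[str, float],
                     measurement: str = "intel_gpu") -> list[str]:
    """Build one line per engine: <measurement>,engine=<name> busy=<float>.

    No timestamp is written, so Telegraf stamps each metric itself.
    """
    return [
        f"{measurement},engine={escape_tag(name)} busy={float(busy)}"
        for name, busy in sorted(busy_by_engine.items())
    ]


if __name__ == "__main__":
    # The exec plugin parses whatever the command prints to stdout.
    for line in to_line_protocol({"Video/0": 12.5, "Render/3D/0": 3.0}):
        print(line)
```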

 

-Brian

  • 4 months later...

I may be an edge case, but on beta35 this (very handy) container fills my syslog with the following error until the system is overloaded:

Nov 23 10:00:10 NAS kernel: bad: scheduling from the idle thread!
Nov 23 10:00:10 NAS kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.18-Unraid #1
Nov 23 10:00:10 NAS kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J5005-ITX, BIOS P1.40 08/06/2018
Nov 23 10:00:10 NAS kernel: Call Trace:
Nov 23 10:00:10 NAS kernel: dump_stack+0x6b/0x83
Nov 23 10:00:10 NAS kernel: dequeue_task_idle+0x21/0x2a
Nov 23 10:00:10 NAS kernel: __schedule+0x135/0x49e
Nov 23 10:00:10 NAS kernel: ? __mod_timer+0x215/0x23c
Nov 23 10:00:10 NAS kernel: schedule+0x77/0xa0
Nov 23 10:00:10 NAS kernel: schedule_timeout+0xa7/0xe0
Nov 23 10:00:10 NAS kernel: ? __next_timer_interrupt+0xaf/0xaf
Nov 23 10:00:10 NAS kernel: msleep+0x13/0x19
Nov 23 10:00:10 NAS kernel: pci_raw_set_power_state+0x185/0x257
Nov 23 10:00:10 NAS kernel: pci_restore_standard_config+0x35/0x3b
Nov 23 10:00:10 NAS kernel: pci_pm_runtime_resume+0x29/0x7b
Nov 23 10:00:10 NAS kernel: ? pci_pm_default_resume+0x1e/0x1e
Nov 23 10:00:10 NAS kernel: ? pci_pm_default_resume+0x1e/0x1e
Nov 23 10:00:10 NAS kernel: __rpm_callback+0x6b/0xcf
Nov 23 10:00:10 NAS kernel: ? pci_pm_default_resume+0x1e/0x1e
Nov 23 10:00:10 NAS kernel: rpm_callback+0x50/0x66
Nov 23 10:00:10 NAS kernel: ? pci_pm_default_resume+0x1e/0x1e
Nov 23 10:00:10 NAS kernel: rpm_resume+0x2e2/0x3d6
Nov 23 10:00:10 NAS kernel: ? __schedule+0x47d/0x49e
Nov 23 10:00:10 NAS kernel: __pm_runtime_resume+0x55/0x71
Nov 23 10:00:10 NAS kernel: __intel_runtime_pm_get+0x15/0x4a [i915]
Nov 23 10:00:10 NAS kernel: i915_pmu_enable+0x53/0x147 [i915]
Nov 23 10:00:10 NAS kernel: i915_pmu_event_add+0xf/0x20 [i915]
Nov 23 10:00:10 NAS kernel: event_sched_in+0xd3/0x18f
Nov 23 10:00:10 NAS kernel: merge_sched_in+0xb4/0x1de
Nov 23 10:00:10 NAS kernel: visit_groups_merge.constprop.0+0x174/0x3ad
Nov 23 10:00:10 NAS kernel: ctx_sched_in+0x11e/0x13e
Nov 23 10:00:10 NAS kernel: perf_event_sched_in+0x49/0x6c
Nov 23 10:00:10 NAS kernel: ctx_resched+0x6d/0x7c
Nov 23 10:00:10 NAS kernel: __perf_install_in_context+0x117/0x14b
Nov 23 10:00:10 NAS kernel: remote_function+0x19/0x43
Nov 23 10:00:10 NAS kernel: flush_smp_call_function_queue+0x103/0x1a4
Nov 23 10:00:10 NAS kernel: flush_smp_call_function_from_idle+0x2f/0x3a
Nov 23 10:00:10 NAS kernel: do_idle+0x20f/0x236
Nov 23 10:00:10 NAS kernel: cpu_startup_entry+0x18/0x1a
Nov 23 10:00:10 NAS kernel: start_kernel+0x4af/0x4d1
Nov 23 10:00:10 NAS kernel: secondary_startup_64+0xa4/0xb0

 

  • 2 weeks later...

That was exactly the plugin I was looking for, many thanks!

Just a note: it would be worth adding an env variable for influx_username (for auth), since my existing DB requires both influx_password AND a username.

I modified telegraf.conf directly, but this would be a great addition for other users.

 

Thanks again!

 

On 11/17/2020 at 4:06 AM, tronyx said:

This is awesome! Thanks so much for putting it together. I don't suppose you'd be willing to share your dashboard for this for easy replication of the above panels?

Certainly! I'll see if I can add the dashboards I use to the GitHub repo, and I'll link them here once they're uploaded.

On 12/1/2020 at 12:46 PM, Lunz_ said:

Just a note to maybe add an env for influx_username (for Auth) as my existing DB required influx_password AND username.

This should be easy enough.  I'll take a look.

On 11/23/2020 at 8:38 AM, CS01-HS said:

I may be an edge case but in beta35 this (very handy) docker fills up my syslog with the following error until the system's overloaded.



I'm glad it's useful. I haven't seen these errors before. Did they start after you installed the intel-gpu-telegraf container, or after the Unraid upgrade?

12 minutes ago, TheBrian said:

I'm glad it's useful.  I haven't seen these errors before.  Did they start after you installed the intel-gpu-telegraf container or after the upgrade of unraid?

I upgraded to beta35, then installed intel-gpu-telegraf for the first time. I'll try again and report back.

  • 1 month later...
On 12/6/2020 at 8:47 AM, CS01-HS said:

I managed to cause the same error (and freeze my server) by playing around with intel_gpu_top in the intel-gpu-tools container, so my problem is at a lower level; your container is fine.

After several freezes (which caused unclean shutdowns), both in Handbrake with the hardware encoder and while monitoring, plus some randomly corrupted encodes, I got my machine stable with the following changes:

  1. Added intel_iommu=on,igfx_off to the syslinux config (this may be optional)
  2. Added a dummy HDMI plug to my headless server (J5005)
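To check after a reboot that the first change actually reached the kernel, you can look at /proc/cmdline, which lists the parameters the running kernel was booted with. A minimal Python sketch, assuming the option was added exactly as intel_iommu=on,igfx_off; the function names are made up for illustration, and the check only confirms the parameter is present, not that it fixes anything.

```python
from __future__ import annotations


def intel_iommu_options(cmdline: str) -> set[str]:
    """Collect the options of every intel_iommu= parameter in a kernel cmdline.

    Simplification: all occurrences are merged; this does not model how the
    kernel resolves conflicting options such as "on" followed by "off".
    """
    options: set[str] = set()
    for token in cmdline.split():
        if token.startswith("intel_iommu="):
            options.update(token.split("=", 1)[1].split(","))
    return options


def igfx_off_active(cmdline: str) -> bool:
    """True if the cmdline requests both intel_iommu=on and igfx_off."""
    return {"on", "igfx_off"} <= intel_iommu_options(cmdline)


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    with open("/proc/cmdline") as f:
        print(igfx_off_active(f.read()))
```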

It has now been stable for several days despite continuous hardware encoding in Handbrake, with no corrupted encodes.
