jedimstr

Telegraf Agent for InfluxDB/Grafana Dashboard?

Has anyone either set up a Telegraf agent directly or made a plugin/Docker container for collecting their unRAID server stats for a separate InfluxDB/Grafana server?

I'm most interested in monitoring the network access of unRAID and CPU/memory status as I run various containers and/or VMs.


Never mind. Found the right Docker image: https://hub.docker.com/r/jjungnickel/telegraf/

 

hey jedimstr -- did you get telegraf up and running on your box?

 

I got mine all set up and working. CPU/memory/uptime are working in Grafana. I'm now at the point where I need to get network and disk stats. Got any tips for that?

 

Primarily I need to know how to add the network metrics, and somehow pass my array disks to the docker container so I can grab stats from them. Not sure how to proceed. Any info would be great!

 

Thanks!


Yup, working great for me.

For the drives, just make sure /mnt/ is assigned on Host Path 4 and for Network set Network Type to Host.

 

For additional SNMP network stats from my network switches, and to set the InfluxDB host, I modified the telegraf.conf in appdata/telegraf (derived from telegraf.conf.tpl). By default, though, all drives and networks are monitored as long as the assignments in the container config are correct.

 

Once you have Telegraf feeding InfluxDB correctly, you can experiment with the data you get.

 

Here's an example for the network stats:

 

IHV2U6A.jpg

 

And here's my dashboard for my unRAID server:

0HZAxqe.jpg


Can you share a screenshot of your volume mappings and everything else? Or maybe the xml template file you used? I've tried with the official telegraf docker image and I'm not getting any data into influxdb. I now have the host 'unraid' but no data associated with it.

 


For the drives, just make sure /mnt/ is assigned on Host Path 4

 

I'm slightly confused by this statement. I understand how to assign paths, but what does "Host Path 4" mean? Here's the docker command I'm preparing:

 

docker run -t -v /var/run/docker.sock:/var/run/docker.sock -v "/mnt":"??" --name="telegraf" --net="host" -e INFLUXDB_URL=http://192.168.1.127:8086 -e HOSTNAME=tower jjungnickel/telegraf

 

What would I need to map /mnt to?

 

Thanks jedimstr!


This is my docker run command:

docker run -d --name="telegraf" --net="bridge" --privileged="true" -e HOST_PROC="/rootfs/proc" -e HOST_SYS="/rootfs/sys" -e HOST_MOUNT_PREFIX="/rootfs" -e HOST_ETC="/rootfs/etc" -e TZ="America/Denver" -v "/mnt/user/appdata/telegraf/telegraf.conf":"/etc/telegraf/telegraf.conf":ro -v "/proc":"/rootfs/proc":ro -v "/":"/rootfs":ro -v "/var/run/docker.sock":"/var/run/docker.sock":ro -v "/sys":"/rootfs/sys":ro -v "/etc":"/rootfs/etc":ro telegraf

 

This requires you to put your own telegraf.conf file at /mnt/user/appdata/telegraf/telegraf.conf

 

This has been working for me. The only thing that doesn't look right is my network sent and received. Haven't figured that out yet. But everything else is accurate.

unraid-dashboard.png.87c4a2dcab7a6d0b9807312810de1e94.png



Two things about the way Network sent and received are set in Telegraf.

- Data is stored as bytes
- Data is a progressive (cumulative) total

 

To get your sent and received rates, wrap the values in non_negative_derivative, which tracks the change in value per second rather than just spouting out the growing total, and then multiply by 8 to convert bytes per second into bits per second.
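For anyone who wants to see the arithmetic, here is a minimal Python sketch of what that conversion amounts to. It is not Telegraf or InfluxDB code; the function name and the sample numbers are made up for illustration. The query string in the comment assumes the default Telegraf "net" measurement with a "bytes_recv" field plus Grafana's $timeFilter and $interval variables, so adjust it to whatever your setup actually writes.

```python
from typing import List, Tuple

# Assumed Grafana/InfluxQL equivalent (default Telegraf "net" plugin field names):
#   SELECT non_negative_derivative(mean("bytes_recv"), 1s) * 8
#   FROM "net" WHERE $timeFilter GROUP BY time($interval)


def bits_per_second(samples: List[Tuple[float, int]]) -> List[Tuple[float, float]]:
    """Turn (unix_seconds, cumulative_bytes) samples into (unix_seconds, bits/s).

    Mimics the idea behind non_negative_derivative: take the change between
    consecutive samples, divide by the elapsed seconds, and drop negative
    results (for example when the counter resets after a reboot).
    """
    rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # ignore duplicate or out-of-order timestamps
        delta_bytes = b1 - b0
        if delta_bytes < 0:
            continue  # counter went backwards, skip this interval
        rates.append((t1, delta_bytes * 8 / elapsed))  # bytes -> bits
    return rates


# Hypothetical counter readings taken every 10 seconds, with one reset.
samples = [(0, 1000), (10, 2250), (20, 500), (30, 3000)]
print(bits_per_second(samples))
```

The interval where the counter drops (2250 to 500) is simply skipped, which is why the graph stays clean instead of showing a huge negative spike.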

 

Example:

IHV2U6A.jpg



I meant "Host Path 4" in the unRAID interface for the Docker container. The reason you want to map /mnt in the container is to have recognizable /disk1 - /diskn and /user available for any storage-based stats in InfluxDB. I also mapped the customized config file, as "Host Path 2": /mnt/cache/appdata/telegraf/telegraf.conf.tpl to Container Path /etc/telegraf/telegraf.conf.tpl.

 

Here are my mappings in my Telegraf Container settings:

 

http://i.imgur.com/kI9SeTu.png


Thanks to all the information in this thread, I was finally able to get my Grafana set up and working. However, I'm having some difficulty getting my graphs to display correctly, and I was hoping you guys wouldn't mind sharing some of the queries/config for your graphs. Specifically, I'm looking for help getting a singlestat to display my entire array and cache pool. I can see that @jedimstr and @atribe both have this on their dashboards and it looks great. Here's what I have now: one singlestat for each disk, but I can't find the right query to combine all the disks into one singlestat.

 

Also, I get a strange error when I view my dashboard on the first load (Invalid dimensions for plot, width = 600, height = 0). If I hit the refresh button on the top-right it will go away and display correctly, but that is kinda annoying. Any ideas you all have for a fix for this issue would also be appreciated. I'm just getting started with grafana and just trying to gain some knowledge.

 

Thanks.

-majestic

 

image001.png.01ae82dd874d002fa48fae3a6f601bbb.png


@majestic I've attached a screenshot of my grafana query. In short it is simply plotting the usage of /mnt/user.
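Since the screenshot may not be readable for everyone, here is a rough Python sketch of how such a query string could be put together. This is an assumption on my part, not a copy of the attached query: it relies on the default Telegraf "disk" measurement with a "used_percent" field and a "path" tag, and on Grafana's $timeFilter variable. Whether the path tag actually reads /mnt/user depends on how the container mounts are set up, and the function name is only for illustration.

```python
# Hypothetical helper that builds an InfluxQL query for a Grafana singlestat.
# Measurement, field, and tag names are the Telegraf "disk" plugin defaults
# as far as I know; check them against what is actually in your database.
QUERY_TEMPLATE = (
    'SELECT last("{field}") FROM "disk" '
    "WHERE \"path\" = '{path}' AND $timeFilter"
)


def array_usage_query(path: str = "/mnt/user", field: str = "used_percent") -> str:
    """Return a query that reads the latest usage value for one mount path."""
    return QUERY_TEMPLATE.format(field=field, path=path)


print(array_usage_query())
print(array_usage_query(path="/mnt/cache"))
```

Because /mnt/user is the combined user share view, one query against that path covers the whole array instead of needing one singlestat per disk.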

 

I also get the error you've seen, but it's only there until the page finishes loading all of the metrics. I've seen some of the gauge panels not work after I've edited them until I save, go to a different dashboard, and then go back.

 

Also, I fixed my network output by changing this docker container from --net="bridge" to --net="host".

 

unRAID-Array-Usage.PNG.b1cdfd9d06495bc90791bf3e344a4a4e.PNG


The single stat error that goes away on refresh or duration change is fixed in the latest Grafana update.

Update your Container and it should fix this issue.



I'm struggling to get telegraf working. Here is my docker run command:

docker run -d --name="telegraf" --net="host" --privileged="true" -e INFLUXDB_URL="http://10.0.1.66:8086" -e HOSTNAME="tower" -e HOST_PROC="/rootfs/proc" -e HOST_SYS="/rootfs/sys" -e HOST_MOUNT_PREFIX="/rootfs" -e HOST_ETC="/rootfs/etc" -e TZ="Europe/London" -v "/mnt/cache/appdata/telegraf":"/etc/telegraf":ro -v "/mnt":"/mnt":ro -v "/var/run/docker.sock":"/var/run/docker.sock":ro -v "/mnt/cache/appdata/telegraf/telegraf.conf.tpl":"/etc/telegraf/telegraf.conf.tpl":ro -v "/proc":"/rootfs/proc":ro -v "/":"/rootfs":ro -v "/sys":"/rootfs/sys":ro -v "/etc":"/rootfs/etc":ro jjungnickel/telegraf

 

Keeps exiting without doing anything.

I have the standard telegraf.conf file in as telegraf.conf.tpl

 

Is there anything special I need to add into the telegraf.conf?

 

Any idea what's wrong? I can get the admin screen of influxdb OK, but nil gets into it.


Viaduct, I just added telegraf to the community apps. It uses environment variables so that you don't need a custom telegraf.conf file. It's working great for me; if you are still having problems with your setup, you should give it a try.


Great. Thank you. I'll delete my instance and config and start afresh with the CA one later.



Can we start a dashboard sharing repository also? :) I would like to produce this for myself also...



Do you mean a repo for the JSON Grafana uses? I'm not sure what that would look like at the moment. But I did add influxdb and grafana to community apps, so it's pretty easy to set this all up now.


Is there any chance we can get untelegraf updated to the latest version? I'm running into an issue with it outputting metrics to InfluxDB. I found the following post, which pretty much describes the change that was made:

 

https://github.com/influxdata/influxdb/issues/7242

 

Thanks for the heads up, the latest tag for that repo was still on 0.13.2. I switched it to use tag telegraf-1.0. You can refresh community apps or you can manually change the repo (use the advanced view) to appcelerator/telegraf:telegraf-1.0.


Thanks a bunch! Unfortunately it doesn't look like this fixed the issue. I'm still getting the "retention policy not found" error in the telegraf log. Does InfluxDB have to be updated as well? I'm not sure what version the InfluxDB docker is on compared to what is out now. Thanks for looking into this!



Did you check to make sure you updated it properly? From the command line you can type "docker exec -it untelegraf bash" and then "telegraf --version". It should say "Telegraf - version 1.0.0".

You could try nuking influxdb and recreating it. Maybe the create statement from telegraf was botched and needs to be redone (I guess you could just delete the telegraf database in influxdb and then recreate your telegraf container.)

 

As far as the influxdb I have in the community apps it uses the alpine tag, which is at version 1.0.0.

 


I've confirmed that I did pull the latest update. Telegraf says it's version 1.0. Good call on the telegraf database though. I'll drop it and see what it does when it gets recreated.


Just an update on things....

 

I nuked influxDB and unTelegraf, and the issue still persists. Made sure appdata was deleted for influx as well. I'm not really sure what is going on here now. I thought it would have been as simple as updating telegraf.

 

I'd be interested to know if anyone else can replicate this from a clean install of Influx and Telegraf.



I've tracked down the problem, and there is an easy workaround.

The workaround: in the influxdb admin interface, run the following command: CREATE RETENTION POLICY "default" ON "telegraf" DURATION 30d REPLICATION 1 DEFAULT

 

The issue: Influxdb changed the name of the default retention policy to autogen. The image I'm using hasn't updated the telegraf.conf file it uses to reflect that. I've submitted a pull request, so they could fix it soon. Until then you can just create a retention policy named default.
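If you would rather script this than paste it into the admin interface, here is a minimal Python sketch that builds the same statement and prepares a POST to the InfluxDB 1.x /query HTTP endpoint. The helper names and the http://localhost:8086 address are placeholders I made up for illustration; point it at your own InfluxDB host. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def retention_policy_statement(name: str, database: str, duration: str = "30d",
                               replication: int = 1, default: bool = True) -> str:
    """Build a CREATE RETENTION POLICY statement like the one quoted above."""
    stmt = (f'CREATE RETENTION POLICY "{name}" ON "{database}" '
            f"DURATION {duration} REPLICATION {replication}")
    if default:
        stmt += " DEFAULT"
    return stmt


def build_query_request(base_url: str, statement: str) -> Request:
    """Prepare (but do not send) a POST to InfluxDB's /query endpoint."""
    body = urlencode({"q": statement}).encode("ascii")
    return Request(f"{base_url}/query", data=body, method="POST")


stmt = retention_policy_statement("default", "telegraf")
req = build_query_request("http://localhost:8086", stmt)  # placeholder address
print(stmt)
print(req.get_method(), req.full_url)
# To actually run it against your server (assumption: no auth enabled):
#   from urllib.request import urlopen
#   urlopen(req)
```

Once the image's telegraf.conf is updated to use autogen, this extra policy should no longer be needed.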

I'm also planning on adding vanilla telegraf to community apps, however, this will require you to create your own telegraf.conf file. But this does let you be much more flexible, compared to using environment variables.

