
Scripts for Server Monitoring using Influx DB and Grafana without Telegraf agent


6 hours ago, zandrsn said:

Strangely enough, all of a sudden your script seems to be working for me, and the examples from the manual work as they did for you. I didn't change anything else in the script, so I'm not sure what was going on, but I appreciate the help!

 

 

My pleasure :D

On 9/24/2016 at 6:39 AM, Viaduct said:

ipmi.sh

 

Script for monitoring IPMI parameters.

 

** A working influxdb setup is required - see opening post **

 

**Disclaimer: I am not a programmer, so please use the scripts at your own risk. Read the script to understand what it is trying to achieve. There is no error trapping in it and no check to see whether it is already running, so in theory, if the script's running time is longer than the crontab interval, you could end up with an ever-increasing system load and potentially crash the server.**

 

This requires ipmi-sensors to be installed. I use the excellent dmacias IPMI plugin which installs this.

 

You need to edit the script to add the devices you want measured in the $tagsArray statement. Different motherboards will have different values; run ipmi-sensors from the command line to see what yours are. I'm not clever enough to write a script that does it automatically. You will also need to change the parameters mentioned in the first post in the curl statement. Use a Unix-style editor (e.g. Notepad++) so that the correct line endings end up in the file, and save the scripts somewhere crontab can access them (mine are in /mnt/cache/appdata/myscripts/).
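To see what names your board exposes, something like this lists just the sensor-name column (a sketch; it assumes ipmi-sensors' usual pipe-delimited output with a header row, which may vary by version):

```shell
# Print only the sensor-name column from ipmi-sensors' pipe-delimited output,
# e.g. from lines like: 4 | CPU_FAN1 | Fan | 1000.00 | RPM | 'OK'
list_sensor_names() {
  ipmi-sensors | awk -F'|' 'NR > 1 { gsub(/^ +| +$/, "", $2); print $2 }'
}
list_sensor_names
```

The printed names are exactly what goes into $tagsArray.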

 

Note that the tagName cannot contain spaces, so I have had to add an extra line to remove the spaces from the string.

 

 


#!/usr/bin/php
<?php

$tagsArray = array(
"CPU_FAN1", 
"REAR_FAN1", 
"REAR_FAN2",
"MB Temp",
"CPU Temp"
);



//do system call

$call = "ipmi-sensors";
$output = shell_exec($call);

//parse output for tag and value

foreach ($tagsArray as $tag) {
    preg_match("/".$tag.".*(\b\d+\b)\..*$/mi", $output, $match);

    //send measurement, tag and value to influx
    sendDB($match[1], $tag);
}
//end system call


//send to influxdb

function sendDB($val, $tagname) {
    $tagname2 = str_replace(' ', '', $tagname);
    $curl = "curl -i -XPOST 'http://influxDBIP:8086/write?db=telegraf' --data-binary 'IPMI,host=Tower,region=us-west "
        .$tagname2."=".$val."'";
    $execsr = exec($curl);
}

?>
 

 

 

I run this script every 5 min in crontab:

 

 


*/5 * * * * /mnt/cache/appdata/myscripts/ipmi.sh &>/dev/null 2>&1
 

 

 

Grafana does not need any specific setup to show this metric since it is a simple value. The measurement IPMI will appear in the list of measurements to select.

To show fan speed and temp on the same graph, move the temps to the right axis and the fans to the left axis.

@Viaduct - Thanks for the script! I tweaked the regex a bit to include the decimal places, as my current draw was below 1. Seems to be working well.

 

Will the crontab settings survive reboots, or does it need to be added to the go script?

 

preg_match("/".$tag.".*(\b\d+\b\.\d\d).*$/mi", $output, $match);
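For anyone wanting to sanity-check that decimal match outside PHP, here is a rough shell equivalent, run against a made-up ipmi-sensors-style line (the real column layout depends on your board):

```shell
# Hypothetical sensor line with a sub-1 reading, like the current-draw case above
line="21 | PS1 Current Out | Current | 0.48 | A | 'OK'"
# Grab the first number with two decimal places, roughly what the tweaked pattern captures
reading=$(printf '%s\n' "$line" | grep -oE '[0-9]+\.[0-9]{2}' | head -n 1)
echo "$reading"   # prints 0.48
```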

 

 

g8Bfq1v.png

 

 


You need to add them to the go script ... something like:

 

 


(crontab -l; echo "*/15 * * * * /appdata/scripts/drivespace.sh &>/dev/null 2>&1") | crontab -
(crontab -l; echo "*/5 * * * * /appdata/scripts/hdtemp.sh &>/dev/null 2>&1") | crontab -
(crontab -l; echo "* * * * * /appdata/scripts/iostats.sh &>/dev/null 2>&1") | crontab -
-----*snip*-----

 


A better way is to create a file on your flash drive under /config/plugins with the extension .cron. This will be loaded into crontab when unRAID boots. You can test it and import the cron files by typing update_cron.
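A sketch of that approach (assumptions: the flash drive is mounted at /boot, and the target folder is wherever your unRAID version loads *.cron files from; the filename is made up):

```shell
# Write a .cron file to the flash drive so the jobs survive reboots
install_cron_file() {
  # $1 = destination .cron file on the flash drive
  mkdir -p "$(dirname "$1")"
  cat > "$1" <<'EOF'
# InfluxDB monitoring scripts
*/5 * * * * /mnt/cache/appdata/myscripts/ipmi.sh &>/dev/null 2>&1
EOF
  # Import into crontab right away (unRAID helper); at boot this happens automatically
  command -v update_cron >/dev/null 2>&1 && update_cron || true
}
```

Usage would be something like `install_cron_file /boot/config/plugins/dynamix/influx-monitoring.cron`.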

 

 

13 minutes ago, RAINMAN said:

/config/plugins

should be /config/plugins/dynamix/

On 6/2/2017 at 8:02 AM, Lynxphp said:

Hi!

 

First of all, a huge thanks Viaduct for your scripts! It motivated me to start scripting again in PHP (I made a script to control my case fans depending on the drive, mobo and CPU temps, with various thresholds), and the result in Grafana looks awesome.

 

I recently bought a WD Red 8TB drive. Unfortunately, it always reports "active/idle" with "hdparm -C" (even though the unRAID GUI indicates the drive is spun down), which renders the spinup script useless for identifying whether the drive is spun up or down:

 

 

I've been looking for other means to get the spin status, but didn't find anything useful.

 

Any ideas?

 

In attachment, a preview of part of my Grafana dashboard. It doesn't illustrate my problem; I just post it as eye candy.

2017-06-01.png

 

 


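On the hdparm question quoted above: a possible workaround (an assumption; I can't verify it on that particular WD Red) is to lean on smartctl instead. With -n standby, smartctl refuses to touch a sleeping drive and exits with status 2, so the exit code itself can serve as a spin check without waking the disk:

```shell
# Sketch of an hdparm-free spin check: with -n standby, smartctl skips a sleeping
# drive and exits with status 2; anything else is treated as spun up here.
disk_active() {
  smartctl -n standby -i "$1" >/dev/null 2>&1
  [ $? -ne 2 ]
}
```

`disk_active /dev/sdb && echo 1 || echo 0` would slot into the spinup script's Active=0/1 curl calls; note that a missing smartctl binary also reads as "active" in this crude check.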

 

I have been successfully collecting data from my entire ESXi host, all the various VMs and the hardware components, and have been able to display it in my Grafana dashboard. However, I have not been able to find a sweet dashboard for unRAID; yours is truly one of the best. I love the way you have used tables to show the disk temps for each of the HDDs.

Do you or anyone else here mind sharing the json file of this dashboard?

 

Also, what is the ideal script I can use to collect data from unRAID (and the Plex install on it) into InfluxDB? I have seen multiple scripts online, but which is the most comprehensive and recommended one here?


Thought y'all might be interested in a new docker app I added to Community Apps under the name apcupsd-influxdb-exporter.

https://hub.docker.com/r/atribe/apcupsd-influxdb-exporter 

This is a docker wrapper around a Python script that gets metrics out of apcupsd and puts them into InfluxDB.

I just barely uploaded the XML to GitHub, so I'm not sure how long that process takes for it to show up, but I've been using it for a few days and it works great.

36 minutes ago, atribe said:

Thought y'all might be interested in a new docker app I added to Community Apps under the name apcupsd-influxdb-exporter.

https://hub.docker.com/r/atribe/apcupsd-influxdb-exporter 

This is a docker wrapper around a Python script that gets metrics out of apcupsd and puts them into InfluxDB.

I just barely uploaded the XML to GitHub, so I'm not sure how long that process takes for it to show up, but I've been using it for a few days and it works great.

    "atribe's Repository": {
        "atribe/apcupsd-influxdb-exporter": [
            "Fatal: No valid Overview Or Description present - Application dropped from CA automatically - Possibly far too many formatting tags present",
            "No Icon specified within the application template"
        ],

 

14 hours ago, dockerPolice said:

    "atribe's Repository": {
        "atribe/apcupsd-influxdb-exporter": [
            "Fatal: No valid Overview Or Description present - Application dropped from CA automatically - Possibly far too many formatting tags present",
            "No Icon specified within the application template"
        ],

 

Aw shucks. OK, added a description and it is now appearing in Community Apps.

On 11/2/2016 at 3:29 PM, RAINMAN said:

I got some motivation from the scripts posted here to add some monitoring to my unRAID installation as well. I figured it was a bit less resource-intensive to do it directly from bash, but I'm totally guessing on that.

 

I also wrote scripts for my DD-WRT router and Windows PCs (PowerShell), but for now I'll share the unRAID scripts I wrote in case they are useful to anyone. I'm not that experienced with bash scripting, so if there is anything I could do better I'd appreciate the corrections. All I ask is that if you make improvements, please share them back with me and the community.

 

I actually created 3 scripts for different intervals: 1, 5 and 30 minutes.

 

Cron Jobs

 


#
# InfluxDB Stats 1 Minute (Delay from reading CPU when all the other PCs in my network report in)
# * * * * * sleep 10; /boot/custom/influxdb/influxStats_1m.sh > /dev/null 2>&1
#
# InfluxDB Stats 5 Minute
# 0,10 * * * * /boot/custom/influxdb/influxStats_5m.sh > /dev/null 2>&1
#
# InfluxDB Stats 30 Minute
# 0,30 * * * * /boot/custom/influxdb/influxStats_30m.sh > /dev/null 2>&1
 

 

 

 

Basic variables I use in all 3 scripts.

 


#
# Set Vars
#
DBURL=http://192.168.254.3:8086
DBNAME=statistics
DEVICE="UNRAID"
CURDATE=`date +%s`
 

 

 

 

CPU

Records CPU metrics - Load averages and CPU time

 


# Had to increase to 10 samples because I was getting a spike each time I read it.  This seems to smooth it out more
top -b -n 10 -d.2 | grep "Cpu" |  tail -n 1 | awk '{print $2,$4,$6,$8,$10,$12,$14,$16}' | while read CPUusr CPUsys CPUnic CPUidle CPUio CPUirq CPUsirq CPUst
do
top -bn1 | head -3 | awk '/load average/ {print $12,$13,$14}' | sed 's/,//g' | while read LAVG1 LAVG5 LAVG15
do
	curl -is -XPOST "$DBURL/write?db=$DBNAME" --data-binary "cpuStats,Device=${DEVICE} CPUusr=${CPUusr},CPUsys=${CPUsys},CPUnic=${CPUnic},CPUidle=${CPUidle},CPUio=${CPUio},CPUirq=${CPUirq},CPUsirq=${CPUsirq},CPUst=${CPUst},CPULoadAvg1m=${LAVG1},CPULoadAvg5m=${LAVG5},CPULoadAvg15m=${LAVG15} ${CURDATE}000000000" >/dev/null 2>&1
done
done
 

 

 

Memory Usage

 


top -bn1 | head -4 | awk '/Mem/ {print $6,$8,$10}' | while read USED FREE CACHE
do
curl -is -XPOST "$DBURL/write?db=$DBNAME" --data-binary "memoryStats,Device=${DEVICE} memUsed=${USED},memFree=${FREE},memCache=${CACHE} ${CURDATE}000000000" >/dev/null 2>&1	
done
 

 

 

Network

 


if [[ -f byteCount.tmp ]] ; then

# Read the last values from the tmpfile - Line "eth0"
grep "eth0" byteCount.tmp | while read dev lastBytesIn lastBytesOut
do
	cat /proc/net/dev | grep "eth0" | grep -v "veth" | awk '{print $2, $10}' | while read currentBytesIn currentBytesOut 
	do			
		# Write out the current stats to the temp file for the next read
		echo "eth0" ${currentBytesIn} ${currentBytesOut} > byteCount.tmp

		totalBytesIn=`expr ${currentBytesIn} - ${lastBytesIn}`
		totalBytesOut=`expr ${currentBytesOut} - ${lastBytesOut}`

		# Prevent negative numbers when the counters reset.  Could miss data but it should be a marginal amount.
		if [ ${totalBytesIn} -le 0 ] ; then
			totalBytesIn=0
		fi

		if [ ${totalBytesOut} -le 0 ] ; then
			totalBytesOut=0
		fi
  				
		curl -is -XPOST "$DBURL/write?db=$DBNAME&u=$USER&p=$PASSWORD" --data-binary "interfaceStats,Interface=eth0,Device=${DEVICE} bytesIn=${totalBytesIn},bytesOut=${totalBytesOut} ${CURDATE}000000000" >/dev/null 2>&1

	done
done 

else
    # Write out blank file
echo "eth0 0 0" > byteCount.tmp
fi
 

 

 

Hard Disk IO

 


# Gets the stats for disk#
#
# The /proc/diskstats file displays the I/O statistics
# of block devices. Each line contains the following 14
# fields:
#  1 - major number
#  2 - minor number
#  3 - device name
#  4 - reads completed successfully
#  5 - reads merged
#  6 - sectors read <---
#  7 - time spent reading (ms)
#  8 - writes completed
#  9 - writes merged
# 10 - sectors written <---
# 11 - time spent writing (ms)
# 12 - I/Os currently in progress
# 13 - time spent doing I/Os (ms)
# 14 - weighted time spent doing I/Os (ms)
#

# Special Cases
# sda = Flash/boot
# sdf = Cache
# sdd = Parity

if [[ -f diskByteCountTest.tmp ]] ; then
cat /proc/diskstats | grep -E 'md|sdd|sda|sdf|loop0' | grep -E -v 'sd[a-z]1' |sed 's/md//g' | awk '{print "disk" $3, $6, $10}' | while read DISK currentSectorsRead currentSectorsWrite
do
	# Check if the disk is in the temp file.
	if grep ${DISK} diskByteCountTest.tmp 
	then
		grep ${DISK} diskByteCountTest.tmp | while read lDISK lastSectorsRead lastSectorsWrite
		do
			# Replace current disk stats with new stats for the next read
			sed -i "s/^${DISK}.*/${DISK} ${currentSectorsRead} ${currentSectorsWrite}/" diskByteCountTest.tmp
	  
			# Need to multiply by 512 to convert from sectors to bytes
			(( totalBytesRead = 512 * (${currentSectorsRead} - ${lastSectorsRead}) ))
			(( totalBytesWrite = 512 * (${currentSectorsWrite} - ${lastSectorsWrite}) ))
			(( totalBytes = totalBytesRead + totalBytesWrite))

			# Cases
			case ${DISK} in
			"disksda" )
				curl -is -XPOST "$DBURL/write?db=$DBNAME" --data-binary "diskStats,Disk=boot,Device=${DEVICE} BytesPersec=${totalBytes},ReadBytesPersec=${totalBytesRead},WriteBytesPersec=${totalBytesWrite} ${CURDATE}000000000" >/dev/null 2>&1 ;;
			"disksdd" )
				curl -is -XPOST "$DBURL/write?db=$DBNAME" --data-binary "diskStats,Disk=parity,Device=${DEVICE} BytesPersec=${totalBytes},ReadBytesPersec=${totalBytesRead},WriteBytesPersec=${totalBytesWrite} ${CURDATE}000000000" >/dev/null 2>&1 ;;
			"disksdf" )
				curl -is -XPOST "$DBURL/write?db=$DBNAME" --data-binary "diskStats,Disk=cache,Device=${DEVICE} BytesPersec=${totalBytes},ReadBytesPersec=${totalBytesRead},WriteBytesPersec=${totalBytesWrite} ${CURDATE}000000000" >/dev/null 2>&1 ;;
			"diskloop0" )
				curl -is -XPOST "$DBURL/write?db=$DBNAME" --data-binary "diskStats,Disk=docker,Device=${DEVICE} BytesPersec=${totalBytes},ReadBytesPersec=${totalBytesRead},WriteBytesPersec=${totalBytesWrite} ${CURDATE}000000000" >/dev/null 2>&1 ;;
			*)
				curl -is -XPOST "$DBURL/write?db=$DBNAME" --data-binary "diskStats,Disk=${DISK},Device=${DEVICE} BytesPersec=${totalBytes},ReadBytesPersec=${totalBytesRead},WriteBytesPersec=${totalBytesWrite} ${CURDATE}000000000" >/dev/null 2>&1
				;;
			esac
		done
	else
		# If the disk wasn't in the temp file then add it to the end
		echo ${DISK} ${currentSectorsRead} ${currentSectorsWrite} >> diskByteCountTest.tmp
	fi
done
else
    # Write out a new file
cat /proc/diskstats | grep -E 'md|sdd|sda|sdf|loop0' | grep -E -v 'sd[a-z]1' |sed 's/md//g' | awk '{print "disk" $3, $6, $10}' | while read DISK currentSectorsRead currentSectorsWrite
do
	echo ${DISK} ${currentSectorsRead} ${currentSectorsWrite} >> diskByteCountTest.tmp
done
fi
 

 

 

Number of Dockers Running

 


docker info | grep "Running" | awk '{print $2}' | while read NUM
do
curl -is -XPOST "$DBURL/write?db=$DBNAME" --data-binary "dockersRunning,Device=${DEVICE} Dockers=${NUM} ${CURDATE}000000000" >/dev/null 2>&1
done
 

 

 

 

Hard Disk Temperatures

 


# Current array assignment.
# I could pull these automatically from /var/local/emhttp/disks.ini,
# but parsing it wouldn't be that easy.
DISK_ARRAY=( sdd sdg sde sdi sdc sdb sdh sdf )
DESCRIPTION=( parity disk1 disk2 disk3 disk4 disk5 disk6 cache )
#
# Added -n standby to the check so smartctl is not spinning up my drives
#
i=0
for DISK in "${DISK_ARRAY[@]}"
do
smartctl -n standby -A /dev/$DISK | grep "Temperature_Celsius" | awk '{print $10}' | while read TEMP 
do
	curl -is -XPOST "$DBURL/write?db=$DBNAME" --data-binary "DiskTempStats,DEVICE=${DEVICE},DISK=${DESCRIPTION[$i]} Temperature=${TEMP} ${CURDATE}000000000" >/dev/null 2>&1
done
((i++))
done
 

 

 

 

Hard Disk Spinup Status

 


# Current array assignment.
# I could pull these automatically from /var/local/emhttp/disks.ini,
# but parsing it wouldn't be that easy.
DISK_ARRAY=( sdd sdg sde sdi sdc sdb sdh sdf )
DESCRIPTION=( parity disk1 disk2 disk3 disk4 disk5 disk6 cache )
i=0
for DISK in "${DISK_ARRAY[@]}"
do
hdparm -C /dev/$DISK | grep 'state' | awk '{print $4}' | while read STATUS
do
	#echo ${DISK} : ${STATUS} : ${DESCRIPTION[$i]}
	if [ ${STATUS} = "standby" ]
	then
		curl -is -XPOST "$DBURL/write?db=$DBNAME" --data-binary "diskStatus,DEVICE=${DEVICE},DISK=${DESCRIPTION[$i]} Active=0 ${CURDATE}000000000" >/dev/null 2>&1
	else
		curl -is -XPOST "$DBURL/write?db=$DBNAME" --data-binary "diskStatus,DEVICE=${DEVICE},DISK=${DESCRIPTION[$i]} Active=1 ${CURDATE}000000000" >/dev/null 2>&1
	fi
done
((i++))
done
 

 

 

Hard Disk Space

 


# Gets the stats for boot, disk#, cache, user
#
df | grep "mnt/\|/boot\|docker" | grep -v "user0\|containers" | sed 's/\/mnt\///g' | sed 's/%//g' | sed 's/\/var\/lib\///g'| sed 's/\///g' | while read MOUNT TOTAL USED FREE UTILIZATION DISK
do
if [ "${DISK}" = "user" ]; then
	DISK="array_total"
fi
curl -is -XPOST "$DBURL/write?db=$DBNAME" --data-binary "drive_spaceStats,Device=${DEVICE},Drive=${DISK} Free=${FREE},Used=${USED},Utilization=${UTILIZATION} ${CURDATE}000000000" >/dev/null 2>&1	
done
 

 

 

 

Uptime

 


UPTIME=`cat /proc/uptime | awk '{print $1}'`
curl -is -XPOST "$DBURL/write?db=$DBNAME" --data-binary "uptime,Device=${DEVICE} Uptime=${UPTIME} ${CURDATE}000000000" >/dev/null 2>&1
 

 

 

 

Would you mind sharing your windows script? I've been struggling to get info imported from my Windows desktop.

Posted (edited)
On 4/11/2019 at 10:43 AM, jerseyknoll said:

Would you mind sharing your windows script? I've been struggling to get info imported from my Windows desktop.

I posted them on GitHub. Browse into the Windows directory: https://github.com/Scott-St/InfluxDB-Stats-Collection

 

I have most likely updated and changed them since I posted that, but I haven't updated GitHub. If I get some time I should go through and update the scripts.

 

Edit: I'm also not a programmer so this may or may not be the best way to do any of these scripts but the hackery works for me lol

Edited by RAINMAN


Would someone mind sharing their Grafana panel setup for drive temps? I've managed to figure out everything but the drive temps.

Thanks

On 9/24/2016 at 9:39 AM, Viaduct said:

apcupsd.sh

 

Script to monitor values from the UPS.

 

** A working influxdb setup is required - see opening post **

 

**Disclaimer: I am not a programmer, so please use the scripts at your own risk. Read the script to understand what it is trying to achieve. There is no error trapping in it and no check to see whether it is already running, so in theory, if the script's running time is longer than the crontab interval, you could end up with an ever-increasing system load and potentially crash the server.**

 

This uses apcaccess to get values from the UPS such as 'time on battery', 'battery %charge', 'load in %', 'time left on battery', 'unit temperature'.

 

 


#!/usr/bin/php
<?php

$command = "apcaccess";
$args = "status";
$tagsArray = array(
"LOADPCT", 
"ITEMP", 
"TIMELEFT", 
"TONBATT", 
"BCHARGE"
);

//do system call

$call = $command." ".$args;
$output = shell_exec($call);

//parse output for tag and value

foreach ($tagsArray as $tag) {
    preg_match("/".$tag."\s*:\s([\d|\.]+)/si", $output, $match);

    //send measurement, tag and value to influx
    sendDB($match[1], $tag);
}
//end system call


//send to influxdb

function sendDB($val, $tagname) {
    $curl = "curl -i -XPOST 'http://influxDBIP:8086/write?db=telegraf' --data-binary 'APC,host=Tower,region=us-west "
        .$tagname."=".$val."'";
    $execsr = exec($curl);
}

?>
 

 

 

Nothing special is needed in Grafana for this, since simple values are returned. I modified the LOADPCT by multiplying it with the rated capacity of my unit to get the power in watts.

 

I run this every minute in crontab:

 

 


* * * * * /mnt/cache/appdata/myscripts/apcupsd.sh &>/dev/null 2>&1
 

 

 

I'm by no means an experienced programmer; I'm hoping to get some help adapting this script to instead export the values from NUT settings to InfluxDB using the same fields. The goal is to have my Eaton UPS monitored in a Grafana dashboard from GilbN: https://technicalramblings.com/blog/setting-grafana-influxdb-telegraf-ups-monitoring-unraid/ but the built-in unRAID UPS monitor only works with APC-brand UPSes.

 

 

Here is the normal output from "apcaccess", which the script was designed for:
 

root@CACHE:~# apcaccess status
LOADPCT  : 3.0 Percent
BCHARGE  : 100.0 Percent
TIMELEFT : 130.6 Minutes

 

 

Here is the normal output from "upsc" from NUT-Settings of the same UPS attached:
 

root@CACHE:~# upsc eaton5px3000@127.0.0.1
ups.load: 3
battery.charge: 100
battery.runtime: 7833

 

 

The output and fields from the UPS are different, and "upsc" reports the runtime of the UPS in seconds instead of minutes. Most of the code makes perfect sense to me and I can handle most of the PHP; just the "preg_match" line is beyond me.

 

So ideally I would like, for example, "ups.load: 3" from "upsc" to get written to InfluxDB as "LOADPCT", the same way as the original apcaccess value; same for the other two variables.

 

 

Thanks in advance if anyone can help here, even with just an explanation of the "preg_match" line so that I can try to figure it out myself. I figured I would post here in case a modified user script for use with NUT could be archived here for others to use in the future.
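On the preg_match question: the pattern "/".$tag."\s*:\s([\d|\.]+)/si" matches the tag name, any whitespace, a colon and one whitespace character, then captures the following run of digits and dots (the numeric value) into $match[1]; the | inside the character class is redundant. For the upsc side, here is a rough shell sketch (untested; the UPS name, InfluxDB URL, and host/region tags are placeholders taken from the posts above) that writes the NUT values under the same field names the apcaccess script used:

```shell
# Hedged sketch: map NUT's upsc output onto the field names the apcaccess script wrote.
# Assumptions: upsc is on PATH; UPSNAME and the InfluxDB URL below must be edited.
UPSNAME="${UPSNAME:-eaton5px3000@127.0.0.1}"
INFLUX="${INFLUX:-http://influxDBIP:8086/write?db=telegraf}"

# Write one field to InfluxDB, reusing the original measurement/tag layout
send_field() {
  curl -is -XPOST "$INFLUX" --data-binary "APC,host=Tower,region=us-west $1=$2"
}

export_ups_metrics() {
  # upsc prints "name: value" lines; translate the NUT names to the old APC ones
  upsc "$UPSNAME" 2>/dev/null | while IFS=': ' read -r key value; do
    case "$key" in
      ups.load)        send_field LOADPCT "$value" ;;
      battery.charge)  send_field BCHARGE "$value" ;;
      battery.runtime) # NUT reports seconds; convert to minutes like TIMELEFT
                       send_field TIMELEFT "$(awk "BEGIN{printf \"%.1f\", $value/60}")" ;;
    esac
  done
}

export_ups_metrics
```

Dropped into a script run from crontab like the original, this should populate the same LOADPCT/BCHARGE/TIMELEFT series the dashboard expects; treat it as a starting point, not a tested drop-in.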

