Syslog notify - create notifications if specific words occur in the logs

mgutt · March 12, 2022

A simple script which creates Unraid notifications if specific words are found in the syslog. It can be executed, for example, hourly by the User Scripts plugin.

#!/bin/bash
# #####################################
# Name:        Syslog notify v1.3
# Description: Creates notification if log contains errors or its size exceeds 90% of the available space
# Author:      Marc Gutt
# #####################################

# #####################################
# Settings
# #####################################

# get most recent syslog file
syslog_file=$(ls -t /var/log/syslog{.[0-9],} 2>/dev/null | head -n 1)

# words that should cause a notification
words="corrupt|error|fail|tainted"

# store line number of last found error in this file
log_file="/tmp/syslog-notify-last-error-line-number.log"

# ignore these phrases (each phrase is split at "*" into 4 parts, so at most 3 wildcards per phrase work!)
ignore_lines=(
  'kernel: CIFS: VFS: \\*\* error -9 on ioctl to get interface list' # unsolvable message from UD plugin
  'sshd[*]: Read error from remote host * port *: Connection reset by peer' # interrupted ssh connection
  'sshd[*]: Read error from remote host * port *: Connection timed out' # interrupted ssh connection
)

# #####################################
# Script
# #####################################

# make script race condition safe
if [[ -d "/tmp/${0//\//_}" ]] || ! mkdir "/tmp/${0//\//_}"; then echo "Script is already running!" && exit 1; fi; trap 'rmdir "/tmp/${0//\//_}"' EXIT;

# obtain line number of last check
if [[ -f "$log_file" ]]; then
  line_number_start=$(cat "$log_file")
  # syslog has been truncated
  if [[ $line_number_start -gt $(grep -c ^ "$syslog_file") ]]; then
    line_number_start=0
  fi
# store last line number on first execution
else
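  line_number_start=$(grep -c ^ "$syslog_file")
  # include the last 100 lines, but never start before the first line
  line_number_start=$(( line_number_start > 100 ? line_number_start - 100 : 0 ))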
  echo "$line_number_start" > "$log_file"
fi

# parse logs
EOL=$'\n'
errors=""
while read -r line; do
  # ignore specific lines
  for ignore_line in "${ignore_lines[@]}"; do
    IFS=\* read -r one two three four <<< "$ignore_line"
    if [[ $line == *"$one"*"$two"*"$three"*"$four" ]]; then
      continue 2
    fi
  done
  # remember last line
  last_line="$line"
  # combine multiple error messages
  errors="$errors$EOL$line"
done < <(tail -n +"$((line_number_start+1))" "$syslog_file" | grep -iP "($words)")

# create notification for new errors
if [[ $errors ]]; then
  # remember line number of last error (identical lines can occur several times, so use the last match)
  line_number_start=$(grep -nFx -- "$last_line" "$syslog_file" | tail -n 1 | cut -f 1 -d ":")
  echo "$line_number_start" > "$log_file"
  # send notification
  /usr/local/emhttp/webGui/scripts/notify -i "alert" -s "syslog $(echo "$errors" | grep -ioP "($words)" | tr '[:upper:]' '[:lower:]' | sort -u | xargs)" -d "${errors:1}"
  exit
else
  # store last line number if no error has been found
  line_number_start=$(grep -c ^ "$syslog_file")
  echo "$line_number_start" > "$log_file"
fi

# create notification if log exceeds usage of 90%
log_size=$(df | grep -oP "[0-9]+(?=% /var/log)")
if [[ ! -f /tmp/syslog-notify.size ]] && [[ $log_size -gt 90 ]]; then
  touch /tmp/syslog-notify.size
  /usr/local/emhttp/webGui/scripts/notify -i "alert" -s "log utilizes more than 90%!" -d "$(du -h /var/log/* | sort -h | tail)"
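# reset the flag only after usage has dropped again, so the alert is not repeated every second run
elif [[ -f /tmp/syslog-notify.size ]] && [[ $log_size -le 90 ]]; then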
  rm /tmp/syslog-notify.size
fi

[image: e-mail notification example sent by Unraid]

You can test it by creating a custom error message in your syslog:

logger Errortest

boomam · May 26, 2022

Great script!

I'll be modifying it at the weekend to alert me to BTRFS scrub errors/corrupt files, and it was exactly what I was looking for!

wolfNZ · May 31, 2022

Hi, thanks for creating this. It's really useful in combination with a Discord webhook. Currently I am only getting notifications which say "Failed sending notification". The syslog reflects this through the following entry:

Quote

May 31 19:06:07 Tower emhttpd: cmd: /usr/local/emhttp/plugins/user.scripts/startScript.sh /tmp/user.scripts/tmpScripts/SyslogNotify/script
May 31 19:06:12 Tower Discord.sh: Failed sending notification

Any idea what would be causing the script to fail? If I run a test notification from the notification settings panel it works fine.

Edited May 31, 2022 by wolfNZ

mgutt · August 7, 2022

On 5/31/2022 at 9:15 AM, wolfNZ said:

Currently I am only getting notifications which say "Failed sending notification".

I don't know what your target is, but this script finds the word "fail" in the syslog line "May 31 19:06:12 Tower Discord.sh: Failed sending notification" and sends this line as a notification. So it works as expected.
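
The match described above can be reproduced in a terminal. This is only a small sketch: it uses the word list from the script and the syslog line from the quoted post, and it uses grep -E instead of the script's grep -P, which is an assumption that makes no difference for this simple word list.

```shell
#!/bin/bash
# Same word list as in the script
words="corrupt|error|fail|tainted"

# The syslog line from the quoted post
line='May 31 19:06:12 Tower Discord.sh: Failed sending notification'

# -i matches case-insensitively, so "Failed" contains the word "fail".
# -o prints only the part of the line that matched.
matched=$(echo "$line" | grep -ioE "($words)")
echo "matched: $matched"
```

If such a line should not trigger a notification, the phrase 'Discord.sh: Failed sending notification' could probably be added to the ignore_lines array of the v1.3 script in the first post.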

mgutt · August 7, 2022

Released v1.2

- now it is much faster as it does not parse the complete log file on every execution. Instead it uses the line number of the last found error as the starting point.

- in addition it sends a notification if the syslog file exceeds 90% of the available space
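
The idea behind the speed-up can be sketched as follows. This is a simplified illustration, not the script itself: the demo log, the state file and the variable names are made up, and the sketch always stores the end of the file, while the script stores the line number of the last found error when there is one.

```shell
#!/bin/bash
# Hypothetical demo files; the real script reads the syslog and keeps its state in /tmp
demo_log=$(mktemp)
state_file=$(mktemp)

printf '%s\n' "line 1 ok" "line 2 error A" "line 3 ok" > "$demo_log"

# First run: everything up to the current end of the file counts as checked
grep -c ^ "$demo_log" > "$state_file"

# New entries arrive between two runs
printf '%s\n' "line 4 ok" "line 5 error B" >> "$demo_log"

# Second run: only read the lines after the stored line number
start=$(cat "$state_file")
new_errors=$(tail -n +"$((start + 1))" "$demo_log" | grep -i "error")
echo "$new_errors"

# Remember the new end of the file for the next run
grep -c ^ "$demo_log" > "$state_file"

rm -f "$demo_log" "$state_file"
```

Only "line 5 error B" is reported in the second run, because "line 2 error A" lies before the stored line number.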

mgutt · November 19, 2022

At the moment I'm experimenting with log entries which occur extremely often and maybe should be added as a new feature to the above script:

log_last=3
log_path="/var/log/syslog"
while read -r count word; do
  echo -e "\nLast $log_last log entries which end to the word '$word' and appeared $count times:"
  grep "[ ']$word$" $(ls -tr "$log_path"*) | tail -n "$log_last"
done < <(grep -hoP "[^ ']+$" "$log_path"* | sort | uniq -c | sort -nr | grep -P '^[ ]+[0-9]{4,}')

For me it returns the following. It tells me that I should add further filtering by time frame, as the "worker process" entries do not occur anymore, and maybe some whitelisting, as the "disabled state" entries (they happen every night because I stop the containers to create backups) and the "RAM-Disk synced" entries (a custom entry I create myself) are expected:

Last 3 log entries which end to the word '6' and appeared 3966 times:
/var/log/syslog.1:Oct 30 22:36:31 thoth nginx: 2022/10/30 22:36:31 [alert] 11544#11544: worker process 19012 exited on signal 6
/var/log/syslog.1:Oct 30 22:36:33 thoth nginx: 2022/10/30 22:36:33 [alert] 11544#11544: worker process 19167 exited on signal 6
/var/log/syslog.1:Oct 30 22:36:35 thoth nginx: 2022/10/30 22:36:35 [alert] 11544#11544: worker process 19301 exited on signal 6

Last 3 log entries which end to the word 'state' and appeared 2538 times:
/var/log/syslog:Nov 19 02:30:28 thoth kernel: docker0: port 5(veth9338c0f) entered disabled state
/var/log/syslog:Nov 19 02:30:28 thoth kernel: docker0: port 5(veth9338c0f) entered blocking state
/var/log/syslog:Nov 19 02:30:28 thoth kernel: docker0: port 5(veth9338c0f) entered forwarding state

Last 3 log entries which end to the word 'synced' and appeared 2377 times:
/var/log/syslog:Nov 19 11:30:01 thoth docker: RAM-Disk synced
/var/log/syslog:Nov 19 12:00:01 thoth docker: RAM-Disk synced
/var/log/syslog:Nov 19 12:30:01 thoth docker: RAM-Disk synced

I executed the same code on a log of a broken server and it looks like this:

Last 3 log entries which end to the word 'write-back' and appeared 8550 times:
Nov 11 04:35:29 Tower kernel: x86/PAT: ipmiseld:4588 map pfn expected mapping type uncached-minus for [mem 0xbcdcb000-0xbcdcbfff], got write-back
Nov 11 04:35:29 Tower kernel: x86/PAT: ipmiseld:4588 map pfn expected mapping type uncached-minus for [mem 0xbcdca000-0xbcdcafff], got write-back
Nov 11 04:35:29 Tower kernel: x86/PAT: ipmiseld:4588 map pfn expected mapping type uncached-minus for [mem 0xbcdc9000-0xbcdc9fff], got write-back

Last 3 log entries which end to the word 'failed' and appeared 4227 times:
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: shpool alloc failed
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: shpool alloc failed
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: shpool alloc failed

Last 3 log entries which end to the word 'nchan_max_reserved_memory.' and appeared 4226 times:
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: nchan: Out of shared memory while allocating message of size 3623. Increase nchan_max_reserved_memory.
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: nchan: Out of shared memory while allocating message of size 4506. Increase nchan_max_reserved_memory.
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: nchan: Out of shared memory while allocating message of size 233. Increase nchan_max_reserved_memory.

Last 3 log entries which end to the word 'memory' and appeared 4226 times:
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [crit] 6364#6364: ngx_slab_alloc() failed: no memory
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [crit] 6364#6364: ngx_slab_alloc() failed: no memory
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [crit] 6364#6364: ngx_slab_alloc() failed: no memory

Last 3 log entries which end to the word '"localhost"' and appeared 4226 times:
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: *195093 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/var?buffer_length=1 HTTP/1.1", host: "localhost"
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: *195094 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: *195095 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/wireguard?buffer_length=1 HTTP/1.1", host: "localhost"

Last 3 log entries which end to the word '/devices' and appeared 2013 times:
Nov 10 05:23:18 Tower nginx: 2022/11/10 05:23:18 [error] 6364#6364: MEMSTORE:00: can't create shared message for channel /devices
Nov 10 05:23:19 Tower nginx: 2022/11/10 05:23:19 [error] 6364#6364: MEMSTORE:00: can't create shared message for channel /devices
Nov 10 05:23:20 Tower nginx: 2022/11/10 05:23:20 [error] 6364#6364: MEMSTORE:00: can't create shared message for channel /devices

Last 3 log entries which end to the word '/disks' and appeared 2005 times:
Nov 10 05:23:19 Tower nginx: 2022/11/10 05:23:19 [error] 6364#6364: MEMSTORE:00: can't create shared message for channel /disks
Nov 10 05:23:20 Tower nginx: 2022/11/10 05:23:20 [error] 6364#6364: MEMSTORE:00: can't create shared message for channel /disks
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: MEMSTORE:00: can't create shared message for channel /disks

My dev server, which has been running for 6 days now, returns nothing, which probably means it's healthy. But I only check log entries which appear at least 1000 times. Maybe the threshold should instead depend on the total size of the log file: if the log file has 1000 lines, it should return log entries which appear more than 100 times, if it has 5000 lines, more than 500 times, and so on...
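
A relative threshold like this could look roughly as follows. It is an untested idea rather than a finished feature: the 10% ratio is taken from the numbers above, the demo log and the names total_lines, threshold and frequent are made up, and awk replaces the fixed {4,} regex of the snippet.

```shell
#!/bin/bash
# Hypothetical demo log; set log_path to /var/log/syslog for a real test
log_path=$(mktemp)
for i in {1..8}; do
  echo "Nov 19 12:00:0$i host docker: RAM-Disk synced"
done > "$log_path"
echo "Nov 19 12:01:00 host kernel: something happened once" >> "$log_path"
echo "Nov 19 12:02:00 host kernel: something else" >> "$log_path"

# The threshold is 10% of all lines (1000 lines -> 100, 5000 lines -> 500), but at least 1
total_lines=$(grep -c ^ "$log_path")
threshold=$((total_lines / 10))
if (( threshold < 1 )); then threshold=1; fi

# Count the last word of every line and keep those which appear more often than the threshold
frequent=$(grep -oE "[^ ']+$" "$log_path" | sort | uniq -c | sort -nr |
  awk -v min="$threshold" '$1 > min { print $1, $2 }')
echo "$frequent"

rm -f "$log_path"
```

With the 10 demo lines the threshold is 1, so only the word "synced" (8 times) is returned.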

Maybe some other users can test the code snippet and return some feedback? 😎

kizer · November 21, 2022

I tried to get your 3 entry part to work. Is there a specific placement for the code to work? I'm not seeing it in the output of the script nor in the log file. So I'm guessing it's in the wrong place. I just placed it at the very bottom of the code. Lol

mgutt · November 21, 2022

2 hours ago, kizer said:

Is there a specific placement for the code to work?

Copy & Paste it into the Terminal. If you don't get any output, then your logs are in good condition.

You could replace {4,} with {3,}. Then it also finds lines which appear at least 100 times. But those should be normal.
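
The effect of the two quantifiers can be checked with simulated `uniq -c` output. The counts and words below are sample values, and the printf format only imitates the right-aligned count column which GNU uniq prints.

```shell
#!/bin/bash
# Simulated "uniq -c" output: a right-aligned count followed by the word
counts=$(printf '%7d %s\n' 2538 state 412 synced 37 failed)

# {4,} needs at least four digits, which means a count of at least 1000
four=$(echo "$counts" | grep -E '^[ ]+[0-9]{4,}')

# {3,} needs at least three digits, which means a count of at least 100
three=$(echo "$counts" | grep -E '^[ ]+[0-9]{3,}')

echo "at least 1000 times:"
echo "$four"
echo "at least 100 times:"
echo "$three"
```

The first filter keeps only "state", the second one keeps "state" and "synced", and "failed" with its 37 occurrences is dropped by both.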

Revan335 · February 7, 2023

On 5/27/2022 at 12:22 AM, boomam said:

Great script!

I'll be modifying it at the weekend to alert me to BTRFS scrub errors/corrupt files, and it was exactly what I was looking for!

Can you post your modifications, @boomam?

Edited February 8, 2023 by Revan335

Revan335 · February 12, 2023

My script, with a 70% log threshold and BTRFS included.

I hope this works for BTRFS notifications/problems.

#!/bin/bash
# #####################################
# Name:        Syslog notify v1.2
# Description: Creates notification if log contains errors or its size exceeds 70% of the available space
# Author:      Marc Gutt
# #####################################

# #####################################
# Settings
# #####################################

# get most recent syslog file
syslog_file=$(ls -t /var/log/syslog{.[0-9],} 2>/dev/null | head -n 1)

# words that should cause a notification
words="corrupt|error|fail|tainted|BTRFS"

# store line number of last found error in this file
log_file="/tmp/syslog-notify-last-error-line-number.log"

# log what is being done
verbose=false

# #####################################
# Script
# #####################################

# make script race condition safe
if [[ -d "/tmp/${0//\//_}" ]] || ! mkdir "/tmp/${0//\//_}"; then echo "Script is already running!" && exit 1; fi; trap 'rmdir "/tmp/${0//\//_}"' EXIT;

# check user settings
[[ $verbose == 0 ]] || [[ $verbose == false ]] && unset verbose

# obtain line number of last check
if [[ -f "$log_file" ]]; then
  line_number_start=$(cat "$log_file")
  # syslog has been truncated
  if [[ $line_number_start -gt $(grep -c ^ "$syslog_file") ]]; then
    [[ $verbose ]] && echo "Monitoring of a new syslog file begins"
    line_number_start=0
  fi
# store last line number on first execution
else
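  line_number_start=$(grep -c ^ "$syslog_file")
  # include the last 100 lines, but never start before the first line
  line_number_start=$(( line_number_start > 100 ? line_number_start - 100 : 0 ))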
  [[ $verbose ]] && echo "Monitoring syslog starts from line $line_number_start"
  echo "$line_number_start" > "$log_file"
fi

# parse logs
EOL=$'\n'
errors=""
while read -r line; do
  # remember last line
  last_line="$line"
  # combine multiple error messages
  errors="$errors$EOL$line"
done < <(tail -n +"$((line_number_start+1))" "$syslog_file" | grep -iP "($words)")

# create notification for new errors
if [[ $errors ]]; then
  # remember line number of last error (identical lines can occur several times, so use the last match)
  line_number_start=$(grep -nFx -- "$last_line" "$syslog_file" | tail -n 1 | cut -f 1 -d ":")
  [[ $verbose ]] && echo "Monitoring syslog continues from line $line_number_start"
  echo "$line_number_start" > "$log_file"
  # send notification
  /usr/local/emhttp/webGui/scripts/notify -i "alert" -s "syslog $(echo "$errors" | grep -ioP "($words)" | tr '[:upper:]' '[:lower:]' | sort -u | xargs)" -d "${errors:1}"
  exit
else
  # store last line number if no error has been found
  line_number_start=$(grep -c ^ "$syslog_file")
  [[ $verbose ]] && echo "Monitoring syslog continues from last line $line_number_start"
  echo "$line_number_start" > "$log_file"
fi

# create notification if log exceeds usage of 70%
log_size=$(df | grep -oP "[0-9]+(?=% /var/log)")
if [[ ! -f /tmp/syslog-notify.size ]] && [[ $log_size -gt 70 ]]; then
  touch /tmp/syslog-notify.size
  /usr/local/emhttp/webGui/scripts/notify -i "alert" -s "log utilizes more than 70%!" -d "$(du -h /var/log/* | sort -h | tail)"
elif [[ -f /tmp/syslog-notify.size ]]; then
  rm /tmp/syslog-notify.size
fi

Revan335 · March 10, 2023

@mgutt Does the script have an ignore option, for example for these entries?

mgutt · March 11, 2023

12 hours ago, Revan335 said:

Does the script have an ignore option, for example for these entries?

You can now add ignore strings. I already added your case and a different one which annoyed me.
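
How an ignore phrase with wildcards is compared against a syslog line can be tried out on its own. The function name is_ignored and the sample log lines are made up for this sketch; the splitting and the comparison follow the loop in the script from the first post.

```shell
#!/bin/bash
# Returns 0 if the line matches the ignore phrase; "*" in the phrase is a wildcard
is_ignored() {
  local line=$1 phrase=$2 one two three four
  # split the phrase at "*" into at most four parts
  IFS=\* read -r one two three four <<< "$phrase"
  # the parts must appear in this order, with anything in between
  [[ $line == *"$one"*"$two"*"$three"*"$four" ]]
}

phrase='sshd[*]: Read error from remote host * port *: Connection reset by peer'

# Made-up sample lines
ssh_line='Mar 11 10:00:00 Tower sshd[1234]: Read error from remote host 192.168.178.9 port 51234: Connection reset by peer'
other_line='Mar 11 10:00:01 Tower kernel: I/O error'

for line in "$ssh_line" "$other_line"; do
  if is_ignored "$line" "$phrase"; then
    echo "ignored:  $line"
  else
    echo "reported: $line"
  fi
done
```

The sshd line is ignored and the kernel line is still reported. As the phrase is split into four parts, a fourth "*" would end up as a literal character in the last part and would not work as a wildcard.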

jordanchin · July 20, 2023

Is this script still working properly in Unraid 6.12? I've noticed that the functionality that stores previous errors isn't working properly anymore. It emails me every time it's executed and finds old errors. I've also noticed that the /tmp/syslog-notify-last-error-line-number.log file is always blank.
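
One possible explanation for a blank file, which is not confirmed for Unraid 6.12, is that the whole-line lookup with grep -nFx finds nothing. The sketch below shows how this can happen when a log line ends with whitespace: read -r without IFS= strips it, so the stored line no longer equals the line in the file. The demo log and the variable names are made up.

```shell
#!/bin/bash
# Hypothetical demo log; the last line ends with a space
demo_log=$(mktemp)
printf '%s\n' \
  "Nov 10 05:23:21 Tower nginx: shpool alloc failed" \
  "Nov 10 05:23:22 Tower kernel: error with trailing space " > "$demo_log"

# read -r without IFS= strips the trailing space ...
read -r trimmed < <(tail -n 1 "$demo_log")
# ... so the whole-line match (-x) finds nothing and the result is empty
none=$(grep -nFx -- "$trimmed" "$demo_log" | cut -f 1 -d ":")
echo "line number with read -r:      '$none'"

# IFS= keeps the whitespace, so the lookup returns the line number again
IFS= read -r exact < <(tail -n 1 "$demo_log")
found=$(grep -nFx -- "$exact" "$demo_log" | cut -f 1 -d ":")
echo "line number with IFS= read -r: '$found'"

rm -f "$demo_log"
```

To check whether this applies, the output of `cat -A /var/log/syslog | grep -iE "corrupt|error|fail|tainted" | tail` shows trailing whitespace in front of the "$" which marks the end of each line.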

Thanks!
