Syslog notify - create notifications if specific words occur in the logs



A simple script which produces Unraid notifications if specific words are found in the syslog. It can be run by the User Scripts plugin, for example hourly.

 

 

#!/bin/bash
# #####################################
# Name:        Syslog notify v1.3
# Description: Creates notification if log contains errors or its size exceeds 90% of the available space
# Author:      Marc Gutt
# #####################################

# #####################################
# Settings
# #####################################

# get most recent syslog file
syslog_file=$(ls -t /var/log/syslog{.[0-9],} 2>/dev/null | head -n 1)

# words that should cause a notification
words="corrupt|error|fail|tainted"

# store line number of last found error in this file
log_file="/tmp/syslog-notify-last-error-line-number.log"

# ignore these phrases (you can't use more than 3 wildcards per line!)
ignore_lines=(
  'kernel: CIFS: VFS: \\*\* error -9 on ioctl to get interface list' # unsolvable message from UD plugin
  'sshd[*]: Read error from remote host * port *: Connection reset by peer' # interrupted ssh connection
  'sshd[*]: Read error from remote host * port *: Connection timed out' # interrupted ssh connection
)

# #####################################
# Script
# #####################################

# make script race condition safe
if [[ -d "/tmp/${0//\//_}" ]] || ! mkdir "/tmp/${0//\//_}"; then echo "Script is already running!" && exit 1; fi; trap 'rmdir "/tmp/${0//\//_}"' EXIT;

# obtain line number of last check
if [[ -f "$log_file" ]]; then
  line_number_start=$(cat "$log_file")
  # syslog has been truncated
  if [[ $line_number_start -gt $(grep -c ^ "$syslog_file") ]]; then
    line_number_start=0
  fi
# store last line number on first execution
else
  line_number_start=$(grep -c ^ "$syslog_file")
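  line_number_start=$((line_number_start-100))
  # never start before the first line if the syslog has fewer than 100 lines
  (( line_number_start < 0 )) && line_number_start=0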
  echo "$line_number_start" > "$log_file"
fi

# parse logs
EOL=$'\n'
errors=""
# IFS= keeps leading/trailing whitespace so the exact-line lookup below still matches
while IFS= read -r line; do
  # ignore specific lines
  for ignore_line in "${ignore_lines[@]}"; do
    IFS=\* read -r one two three four <<< "$ignore_line"
    if [[ $line == *"$one"*"$two"*"$three"*"$four" ]]; then
      continue 2
    fi
  done
  # remember last line
  last_line="$line"
  # combine multiple error messages
  errors="$errors$EOL$line"
done < <(tail -n +"$((line_number_start+1))" "$syslog_file" | grep -iP "($words)")

# create notification for new errors
if [[ $errors ]]; then
  # remember line number of last error (tail -n 1: identical lines may occur more than once)
  line_number_start=$(grep -nFx -- "$last_line" "$syslog_file" | tail -n 1 | cut -f 1 -d ":")
  echo "$line_number_start" > "$log_file"
  # send notification
  /usr/local/emhttp/webGui/scripts/notify -i "alert" -s "syslog $(echo "$errors" | grep -ioP "($words)" | tr '[:upper:]' '[:lower:]' | sort -u | xargs)" -d "${errors:1}"
  exit
else
  # store last line number if no error has been found
  line_number_start=$(grep -c ^ "$syslog_file")
  echo "$line_number_start" > "$log_file"
fi

# create notification if log exceeds usage of 90%
log_size=$(df | grep -oP "[0-9]+(?=% /var/log)")
if [[ ! -f /tmp/syslog-notify.size ]] && [[ $log_size -gt 90 ]]; then
  touch /tmp/syslog-notify.size
  /usr/local/emhttp/webGui/scripts/notify -i "alert" -s "log utilizes more than 90%!" -d "$(du -h /var/log/* | sort -h | tail)"
elif [[ -f /tmp/syslog-notify.size ]] && [[ $log_size -le 90 ]]; then # usage dropped again, allow a new notification
  rm /tmp/syslog-notify.size
fi
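The size check above only works if /var/log shows up as its own mount point in the df output (as it normally does on Unraid); otherwise the grep returns nothing and no size notification is ever sent. A minimal sketch, assuming GNU coreutils df with its --output option, that asks df for the usage of the path directly; log_usage is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: print the use% of the filesystem that holds the given path.
# Assumes GNU coreutils df (--output=pcent). Works even if the path is
# not a mount point itself, because df resolves the containing filesystem.
log_usage() {
  # output looks like "Use%" followed by e.g. " 23%"; keep only the digits
  df --output=pcent "${1:-/var/log}" | tail -n 1 | tr -dc '0-9'
}

usage=$(log_usage /var/log)
echo "/var/log is ${usage}% full"
```

Where /var/log is a separate tmpfs, both approaches should report the same number.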

 

E-mail notification example sent by Unraid:

[Screenshot: example e-mail notification]

 

You can test it by creating a custom error message in your syslog:

logger Errortest
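To check which subject line such a test entry would produce without actually sending a notification, the keyword extraction used in the notify call can be run on its own. A sketch; the sample log line is made up:

```shell
#!/bin/bash
# Sketch: build the notification subject the same way the script does,
# without calling notify. The sample line is made up for illustration.
words="corrupt|error|fail|tainted"
errors=$'\nNov 19 12:00:01 Tower root: Errortest'

# extract matched words, lowercase them, de-duplicate, join with spaces
subject="syslog $(echo "$errors" | grep -ioP "($words)" | tr '[:upper:]' '[:lower:]' | sort -u | xargs)"
echo "$subject"
```

For the sample line this prints `syslog error`.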

 

  • 2 months later...

Hi, thanks for creating this; it's really useful in combination with a Discord webhook. Currently I am only getting notifications which say "Failed sending notification". The syslog reflects this with the following entry.

 

Quote

May 31 19:06:07 Tower emhttpd: cmd: /usr/local/emhttp/plugins/user.scripts/startScript.sh /tmp/user.scripts/tmpScripts/SyslogNotify/script 
May 31 19:06:12 Tower Discord.sh: Failed sending notification

 

Any idea what would be causing the script to fail? If I run a test notification from the notification settings panel it works fine. 

Edited by wolfNZ
On 5/31/2022 at 9:15 AM, wolfNZ said:

Currently I am only getting notifications which say "Failed sending notification".

I don't know what your goal is, but this script finds the word "fail" in the syslog line "May 31 19:06:12 Tower Discord.sh: Failed sending notification" and sends this line as a notification. So it works as expected.
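If the notification agent's own failure message should not trigger yet another notification, that line can be added to the ignore_lines setting of the script. A sketch that reuses the script's matching logic inside a hypothetical is_ignored function; the ignore entry is an example:

```shell
#!/bin/bash
# Sketch: the script's ignore_lines matching, wrapped in a hypothetical
# helper. The entry below is an example for the Discord agent message.
ignore_lines=(
  'Discord.sh: Failed sending notification'
)

is_ignored() {
  local line=$1 ignore_line one two three four
  for ignore_line in "${ignore_lines[@]}"; do
    # split the entry at '*' into at most four literal parts
    IFS=\* read -r one two three four <<< "$ignore_line"
    if [[ $line == *"$one"*"$two"*"$three"*"$four" ]]; then
      return 0
    fi
  done
  return 1
}

is_ignored 'May 31 19:06:12 Tower Discord.sh: Failed sending notification' && echo "ignored"
```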


Released v1.2

 

- now it is much faster as it does not parse the complete log file on every execution. Instead it uses the line number of the last found error as the starting point.

- in addition, it sends a notification if the log partition (/var/log) is more than 90% full
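The speed-up comes from skipping the lines that were already checked. A simplified sketch of that idea, using temp files in place of /var/log/syslog and the state file; unlike the real script, it always stores the total line count instead of the line number of the last error, and scan is a hypothetical helper:

```shell
#!/bin/bash
# Simplified sketch: only lines after the stored position are scanned.
# Temp files stand in for the syslog and for the state file.
log=$(mktemp)
state=$(mktemp)
echo 0 > "$state"

scan() {
  local start
  start=$(cat "$state")
  # tail -n +K prints from line K onwards, so already seen lines are skipped
  tail -n +"$((start + 1))" "$log" | grep -i 'error'
  # store how many lines exist now as the next starting point
  grep -c ^ "$log" > "$state"
}

printf '%s\n' 'a ok' 'b error' 'c ok' > "$log"
first=$(scan)
echo 'd error again' >> "$log"
second=$(scan)
printf '%s\n%s\n' "$first" "$second"
rm -f "$log" "$state"
```

The second call only reports the newly appended line, because the first call stored 3 as the starting point.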

 

 

 

 

 

 

 

  • 3 months later...

At the moment I'm experimenting with log entries that occur extremely often; detecting them could become a new feature of the script above:

 

log_last=3
log_path="/var/log/syslog"
while read -r count word; do
  echo -e "\nLast $log_last log entries which end to the word '$word' and appeared $count times:"
  grep "[ ']$word$" $(ls -tr "$log_path"*) | tail -n "$log_last"
done < <(grep -hoP "[^ ']+$" "$log_path"* | sort | uniq -c | sort -nr | grep -P '^[ ]+[0-9]{4,}')

 

For me it returns the following. This tells me that I should add filtering by time frame, as the "worker process" entries no longer occur, and maybe some whitelisting, as the "disabled state" entries (which happen every night because I stop containers to create backups) and the "RAM-Disk synced" entries (a custom entry I create myself) are expected:

 

Last 3 log entries which end to the word '6' and appeared 3966 times:
/var/log/syslog.1:Oct 30 22:36:31 thoth nginx: 2022/10/30 22:36:31 [alert] 11544#11544: worker process 19012 exited on signal 6
/var/log/syslog.1:Oct 30 22:36:33 thoth nginx: 2022/10/30 22:36:33 [alert] 11544#11544: worker process 19167 exited on signal 6
/var/log/syslog.1:Oct 30 22:36:35 thoth nginx: 2022/10/30 22:36:35 [alert] 11544#11544: worker process 19301 exited on signal 6

Last 3 log entries which end to the word 'state' and appeared 2538 times:
/var/log/syslog:Nov 19 02:30:28 thoth kernel: docker0: port 5(veth9338c0f) entered disabled state
/var/log/syslog:Nov 19 02:30:28 thoth kernel: docker0: port 5(veth9338c0f) entered blocking state
/var/log/syslog:Nov 19 02:30:28 thoth kernel: docker0: port 5(veth9338c0f) entered forwarding state

Last 3 log entries which end to the word 'synced' and appeared 2377 times:
/var/log/syslog:Nov 19 11:30:01 thoth docker: RAM-Disk synced
/var/log/syslog:Nov 19 12:00:01 thoth docker: RAM-Disk synced
/var/log/syslog:Nov 19 12:30:01 thoth docker: RAM-Disk synced
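The whitelisting idea could be sketched as an extra grep -v filter in front of the counting pipeline. The patterns are examples based on the entries above and use grep -E syntax; frequent_endings is a hypothetical function name, and the demo data is generated:

```shell
#!/bin/bash
# Sketch: drop expected entries before they are counted.
# Example patterns (ERE) based on the output above.
whitelist='entered (disabled|blocking|forwarding) state$|RAM-Disk synced$'

frequent_endings() {
  # same counting pipeline as the snippet above, with a grep -v filter in front
  grep -hvE "$whitelist" "$@" |
    grep -oP "[^ ']+$" | sort | uniq -c | sort -nr | grep -P '^[ ]+[0-9]{4,}'
}

# demo data: 1000 expected and 1000 unexpected entries (made up)
log=$(mktemp)
for i in $(seq 1000); do
  echo "Nov 19 11:30:01 thoth docker: RAM-Disk synced"
  echo "Oct 30 22:36:31 thoth nginx: worker process $i exited on signal 6"
done > "$log"

result=$(frequent_endings "$log")
echo "$result"
rm -f "$log"
```

Only the "signal 6" entries survive the filter, so the "synced" entries no longer show up even though they also appear 1000 times.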

 

I ran the same code on the log of a broken server, and this is the result:

 

Last 3 log entries which end to the word 'write-back' and appeared 8550 times:
Nov 11 04:35:29 Tower kernel: x86/PAT: ipmiseld:4588 map pfn expected mapping type uncached-minus for [mem 0xbcdcb000-0xbcdcbfff], got write-back
Nov 11 04:35:29 Tower kernel: x86/PAT: ipmiseld:4588 map pfn expected mapping type uncached-minus for [mem 0xbcdca000-0xbcdcafff], got write-back
Nov 11 04:35:29 Tower kernel: x86/PAT: ipmiseld:4588 map pfn expected mapping type uncached-minus for [mem 0xbcdc9000-0xbcdc9fff], got write-back

Last 3 log entries which end to the word 'failed' and appeared 4227 times:
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: shpool alloc failed
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: shpool alloc failed
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: shpool alloc failed

Last 3 log entries which end to the word 'nchan_max_reserved_memory.' and appeared 4226 times:
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: nchan: Out of shared memory while allocating message of size 3623. Increase nchan_max_reserved_memory.
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: nchan: Out of shared memory while allocating message of size 4506. Increase nchan_max_reserved_memory.
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: nchan: Out of shared memory while allocating message of size 233. Increase nchan_max_reserved_memory.

Last 3 log entries which end to the word 'memory' and appeared 4226 times:
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [crit] 6364#6364: ngx_slab_alloc() failed: no memory
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [crit] 6364#6364: ngx_slab_alloc() failed: no memory
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [crit] 6364#6364: ngx_slab_alloc() failed: no memory

Last 3 log entries which end to the word '"localhost"' and appeared 4226 times:
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: *195093 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/var?buffer_length=1 HTTP/1.1", host: "localhost"
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: *195094 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: *195095 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/wireguard?buffer_length=1 HTTP/1.1", host: "localhost"

Last 3 log entries which end to the word '/devices' and appeared 2013 times:
Nov 10 05:23:18 Tower nginx: 2022/11/10 05:23:18 [error] 6364#6364: MEMSTORE:00: can't create shared message for channel /devices
Nov 10 05:23:19 Tower nginx: 2022/11/10 05:23:19 [error] 6364#6364: MEMSTORE:00: can't create shared message for channel /devices
Nov 10 05:23:20 Tower nginx: 2022/11/10 05:23:20 [error] 6364#6364: MEMSTORE:00: can't create shared message for channel /devices

Last 3 log entries which end to the word '/disks' and appeared 2005 times:
Nov 10 05:23:19 Tower nginx: 2022/11/10 05:23:19 [error] 6364#6364: MEMSTORE:00: can't create shared message for channel /disks
Nov 10 05:23:20 Tower nginx: 2022/11/10 05:23:20 [error] 6364#6364: MEMSTORE:00: can't create shared message for channel /disks
Nov 10 05:23:21 Tower nginx: 2022/11/10 05:23:21 [error] 6364#6364: MEMSTORE:00: can't create shared message for channel /disks

 

My dev server, which has been running for 6 days now, returns nothing, which probably means it's healthy. But I only check log entries that appear 1000 times or more ({4,} requires a count with at least four digits), so maybe the threshold should instead depend on the total size of the log file. For example, if the log file has 1000 lines, it should return log entries that appear more than 100 times; if it has 5000 lines, more than 500 times; and so on...
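The variable threshold could be sketched like this, assuming a fixed ratio of 10% as in the example (1000 lines give 100, 5000 lines give 500). awk compares the counts numerically in place of the {4,} digit-count regex; the function name and demo data are hypothetical:

```shell
#!/bin/bash
# Sketch: report line endings that occur in more than 10% of all lines.
frequent_endings_relative() {
  local total min
  # total number of lines across all given files
  total=$(cat "$@" | grep -c ^)
  min=$(( total / 10 ))
  grep -hoP "[^ ']+$" "$@" | sort | uniq -c | sort -nr |
    awk -v min="$min" '$1 > min'
}

# demo data: 20 unique entries plus 3 identical endings (made up)
log=$(mktemp)
for i in $(seq 20); do echo "Nov 19 02:30:28 thoth kernel: entry-$i"; done > "$log"
for i in 1 2 3; do echo "Nov 19 02:30:29 thoth nginx: worker process $i exited on signal 6"; done >> "$log"

result=$(frequent_endings_relative "$log")
echo "$result"
rm -f "$log"
```

With 23 lines the threshold is 2, so only the ending "6" (3 occurrences) is reported.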

 

Maybe some other users can test the code snippet and return some feedback? 😎

2 hours ago, kizer said:

Is there a specific placement for the code to work?

Copy & paste it into the terminal. If you don't get any output, then your logs are in good condition.

 

You could replace {4,} with {3,}. It then also finds entries that appear 100 to 999 times, but those should be normal.

 

 

  • 2 months later...

My script, with a 70% log threshold and BTRFS included.

I hope this works for BTRFS notifications/problems.

 

#!/bin/bash
# #####################################
# Name:        Syslog notify v1.2
# Description: Creates notification if log contains errors or its size exceeds 70% of the available space
# Author:      Marc Gutt
# #####################################

# #####################################
# Settings
# #####################################

# get most recent syslog file
syslog_file=$(ls -t /var/log/syslog{.[0-9],} 2>/dev/null | head -n 1)

# words that should cause a notification
words="corrupt|error|fail|tainted|BTRFS"

# store line number of last found error in this file
log_file="/tmp/syslog-notify-last-error-line-number.log"

# log what is being done
verbose=false

# #####################################
# Script
# #####################################

# make script race condition safe
if [[ -d "/tmp/${0//\//_}" ]] || ! mkdir "/tmp/${0//\//_}"; then echo "Script is already running!" && exit 1; fi; trap 'rmdir "/tmp/${0//\//_}"' EXIT;

# check user settings
[[ $verbose == 0 ]] || [[ $verbose == false ]] && unset verbose

# obtain line number of last check
if [[ -f "$log_file" ]]; then
  line_number_start=$(cat "$log_file")
  # syslog has been truncated
  if [[ $line_number_start -gt $(grep -c ^ "$syslog_file") ]]; then
    [[ $verbose ]] && echo "Monitoring of a new syslog file begins"
    line_number_start=0
  fi
# store last line number on first execution
else
  line_number_start=$(grep -c ^ "$syslog_file")
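  line_number_start=$((line_number_start-100))
  # never start before the first line if the syslog has fewer than 100 lines
  (( line_number_start < 0 )) && line_number_start=0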
  [[ $verbose ]] && echo "Monitoring syslog starts from line $line_number_start"
  echo "$line_number_start" > "$log_file"
fi

# parse logs
EOL=$'\n'
errors=""
# IFS= keeps leading/trailing whitespace so the exact-line lookup below still matches
while IFS= read -r line; do
  # remember last line
  last_line="$line"
  # combine multiple error messages
  errors="$errors$EOL$line"
done < <(tail -n +"$((line_number_start+1))" "$syslog_file" | grep -iP "($words)")

# create notification for new errors
if [[ $errors ]]; then
  # remember line number of last error (tail -n 1: identical lines may occur more than once)
  line_number_start=$(grep -nFx -- "$last_line" "$syslog_file" | tail -n 1 | cut -f 1 -d ":")
  [[ $verbose ]] && echo "Monitoring syslog continues from line $line_number_start"
  echo "$line_number_start" > "$log_file"
  # send notification
  /usr/local/emhttp/webGui/scripts/notify -i "alert" -s "syslog $(echo "$errors" | grep -ioP "($words)" | tr '[:upper:]' '[:lower:]' | sort -u | xargs)" -d "${errors:1}"
  exit
else
  # store last line number if no error has been found
  line_number_start=$(grep -c ^ "$syslog_file")
  [[ $verbose ]] && echo "Monitoring syslog continues from last line $line_number_start"
  echo "$line_number_start" > "$log_file"
fi

# create notification if log exceeds usage of 70%
log_size=$(df | grep -oP "[0-9]+(?=% /var/log)")
if [[ ! -f /tmp/syslog-notify.size ]] && [[ $log_size -gt 70 ]]; then
  touch /tmp/syslog-notify.size
  /usr/local/emhttp/webGui/scripts/notify -i "alert" -s "log utilizes more than 70%!" -d "$(du -h /var/log/* | sort -h | tail)"
elif [[ -f /tmp/syslog-notify.size ]] && [[ $log_size -le 70 ]]; then # usage dropped again, allow a new notification
  rm /tmp/syslog-notify.size
fi
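Because the words are matched with grep -i, the plain word BTRFS also matches routine kernel messages such as "BTRFS info (device ...)". A narrower pattern, assuming only BTRFS warnings, errors and critical messages should trigger a notification, could look like this; the sample lines are made up:

```shell
#!/bin/bash
# Sketch: only BTRFS messages at warning/error/critical level match.
# The level list is an assumption; the sample lines are made up.
words="corrupt|error|fail|tainted|BTRFS (warning|error|critical)"

matches=$(printf '%s\n' \
  'Jan 10 10:00:00 Tower kernel: BTRFS info (device sdb1): example info message' \
  'Jan 10 10:00:01 Tower kernel: BTRFS warning (device sdb1): example warning message' |
  grep -iP "($words)")
echo "$matches"
```

Only the warning line is printed; the info line no longer causes a notification.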

 

  • 4 weeks later...
  • 3 months later...

Is this script still working properly in Unraid 6.12? I've noticed that the functionality that stores previous errors isn't working properly anymore. It emails me every time it's executed and finds old errors. I've also noticed that the /tmp/syslog-notify-last-error-line-number.log file is always blank.

 

Thanks!
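One possible explanation, offered as an assumption rather than a confirmed diagnosis: the script reads matching lines with `read -r line`, which strips trailing whitespace, and then looks the last one up again with `grep -nFx`. If that line ends with a space (as the emhttpd line quoted earlier in this thread appears to), the exact match fails, an empty value is written to the state file, and the next run scans the whole syslog again. A small demonstration with a made-up line:

```shell
#!/bin/bash
# Sketch: a trailing space defeats the exact-line lookup.
# The sample line is made up and ends with a space on purpose.
log=$(mktemp)
printf '%s\n' 'Jan 10 10:00:00 Tower emhttpd: error in script ' > "$log"

read -r stripped < "$log"        # default IFS trims the trailing space
IFS= read -r kept < "$log"       # empty IFS keeps the line unchanged

found_stripped=$(grep -nFx -- "$stripped" "$log" | cut -f 1 -d ":")
found_kept=$(grep -nFx -- "$kept" "$log" | cut -f 1 -d ":")
echo "without IFS=: '${found_stripped}'  with IFS=: '${found_kept}'"
rm -f "$log"
```

Without `IFS=` the lookup returns nothing, which would leave the state file blank.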
