script for periodic notification of unRaid status over LAN via YAC client



Here is the script I wrote to send YAC alerts to PCs on my LAN from my unRaid server. 

 

Before you do anything... you will need to edit an existing file on the unRaid server.  It is possible for you to mess up the existing contents and make the unRaid server unbootable.  If you are not comfortable editing files and adding lines, do not try this.  You have been warned... I am not responsible if you make your flash drive unusable.

 

With the warnings out of the way: this is actually not too hard. As long as you can add a line to an existing text file using WordPad without changing the other lines in the same file, you will be OK.

 

Before you do anything on the unRaid server, download and install a YAC listener on your PC.  You can get it from http://sunflowerhead.com/software/yac/

 

An alternative YAC listener is "Whoisit.zip", available here: http://forums.snapstream.com/vb/showthread.php?p=137721

It can be used if you are running WinXP or have installed the .NET runtime package listed in the thread describing Whoisit.

 

I like the Whoisit appearance better than stock YAC, but it seems to cut the last character off the sent message (you will not see the trailing period of the error messages). Hey... if it bothers you, use the stock YAC listener. Add a shortcut to whichever YAC listener you use to the "Startup" folder on your PCs. (You cannot run both on the same PC at the same time.)

 

To create the unRaid alert in the case of a disk failure, I had to find out the status of the unRaid disks. I found that the /proc/mdcmd "file" contains the current status. (The /proc folder does not contain real files; it exposes system information in the form of files to make it easier to access.) To see all of its contents, type:

 

cat /proc/mdcmd
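If you only want one value out of that output, a small helper can pull it out. This is a sketch, not part of my script: the function name md_value is hypothetical, and it assumes /proc/mdcmd prints one KEY=VALUE pair per line, as the mdState=STARTED line discussed below suggests.

```shell
#!/bin/bash
# md_value KEY [FILE]
# Hypothetical helper: print the value of the first KEY=VALUE line in FILE.
# FILE defaults to /proc/mdcmd; assumes one KEY=VALUE pair per line.
md_value() {
    local key="$1"
    local file="${2:-/proc/mdcmd}"
    # Split on '=', match the key exactly, then strip "KEY=" so values
    # that themselves contain '=' are printed whole.
    awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$file"
}
```

For example, md_value mdState should print STARTED while the array is running. Because it takes an optional file argument, you can also try it on a saved copy of the output.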

 

Once I found where the status was available, I had to figure out how to send a TCP/IP message from the unRaid server to the YAC clients running on my PCs. To send the message I use a copy of "netcat", a general-purpose Unix utility that can send to (or listen on) any TCP or UDP port.

 

Now, netcat is not supplied by Tom, at least not yet (I've asked him to add it), but you can download a copy from this link

Update: Tom attached a zipped copy of netcat to a post in this thread.  Look for it there (a few posts down from this one)

 

Copy the "netcat" program to your /boot share drive on the unRaid server.

 

Next, download the check_unraid.txt script I've attached to this post. Copy it to the /boot share drive on the unRaid server.

Note: Tom's forum only allows certain file extensions on attachments, so I had to rename the file from check_unraid.sh to check_unraid.txt in order to attach it to this post.  After you download it you will need to rename it as check_unraid.sh

 

Open the check_unraid.sh script in WordPad and edit the notify_machines= line near the top to change the PC names to those on your LAN. If you only have one PC, its name can be the only one listed. If you have more than one machine, separate the names with spaces. Note that you specify machine names, not IP addresses; the actual IP address is looked up dynamically as needed.

 

Now log onto the unRaid server using telnet and type the following Unix commands:

 

cd /boot

chmod +x check_unraid.sh

chmod +x netcat

 

Once you have completed the commands above, you can test your installation.

First, install and start one of the YAC clients on one of the PCs on your LAN; otherwise the alert will be sent, but no program will be listening.

Then, type the following command. An alert will be sent to the YAC client. (If you modified the script to NOT get hourly alerts when all is OK, you will need to stop your unRaid array and then perform your test, that way it will have something to report.)

To test, type (while still in the /boot folder):

./check_unraid.sh

 

At this point, all that is left is to copy the check_unraid.sh script to a folder where it will be executed hourly by the Linux scheduler (cron), and to copy the netcat program to a folder searched for executables. To do this, type the following commands:

 

cp /boot/check_unraid.sh /etc/cron.hourly/

cp /boot/netcat /usr/bin/netcat

 

Now, make a backup copy of the "go" script using the following commands:

 

If you are using the original release of unRaid, the "go" script was in the /boot folder.

cd /boot

cp go go.original

 

If you are using a newer version of the unRaid software the "go" script is in the /boot/system/current/ folder.

cd /boot/system/current/

cp go go.original

 

One last step is to add two lines to the end of the "go" script to copy "netcat" and the "check_unraid.sh" script to the appropriate folders each time you restart the unRaid server.

 

You can use WordPad in Windows to do this. The last two lines shown below (the two "cp" lines after "emhttp &") are the lines you will be adding to the "go" script.

 

The last few lines in the "go" script will then be:

 

# Start the management utility

emhttp &

cp /boot/check_unraid.sh /etc/cron.hourly/

cp /boot/netcat /usr/bin/netcat

 

This will automatically install (copy) the check_unraid.sh script to the /etc/cron.hourly folder every time the unRaid server is restarted, and will copy the netcat program to a folder searched for executables.

 

Do not change any of the other lines in the "go" script... you have been warned.
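If you would rather add the two lines from the telnet prompt than in WordPad (which also avoids introducing Windows CR/LF line endings), something like the following appends each line only when it is not already present. add_line_once is a hypothetical helper, not part of unRaid; the "go" path depends on your version (/boot/go on the original release, /boot/system/current/go on newer ones), and you should make the backup copy described above first.

```shell
#!/bin/bash
# add_line_once FILE LINE
# Hypothetical helper: append LINE to FILE unless an identical line exists.
add_line_once() {
    local file="$1" line="$2"
    # -x: whole-line match, -F: treat LINE as a fixed string, not a regex
    grep -qxF -- "$line" "$file" 2>/dev/null && return 0
    # If the file does not end with a newline, add one first so LINE
    # does not get glued onto the existing last line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}
```

Usage (original-release path shown):

add_line_once /boot/go "cp /boot/check_unraid.sh /etc/cron.hourly/"
add_line_once /boot/go "cp /boot/netcat /usr/bin/netcat"

Running it twice is harmless, since a line that is already there is left alone.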

 

I'll attempt to describe its operation. 

Basically, the "grep" and "egrep" programs scan a file looking for text matching a given pattern.  They set a status variable ($?) to 0 if the pattern was matched and to 1 if not matched.
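To see those exit codes for yourself, this small sketch (the name grep_status is hypothetical) feeds one line of text to grep -E, which behaves like egrep, and prints $?. A status of 2 would indicate an error, such as an unreadable file.

```shell
#!/bin/bash
# grep_status PATTERN TEXT
# Hypothetical demo: print the exit status grep -E sets for PATTERN against TEXT.
# 0 = pattern matched, 1 = no match (2 would mean an error, e.g. a bad file).
grep_status() {
    # -q: print nothing, just set the exit status
    printf '%s\n' "$2" | grep -Eq "$1"
    echo "$?"
}
```

For example, grep_status "mdState=STARTED" "mdState=STARTED" prints 0, while grep_status "=DISK_INVALID|=DISK_DSBL" "mdState=STARTED" prints 1.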

 

I first scan /proc/mdcmd for the mdState=STARTED line.  If it is not found I set an error message stating that the unRaid array is not Started.

 

I then scan for either =DISK_INVALID or =DISK_DSBL

 

If either is present I set an error message that the unRaid array needs attention.
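The two checks above can be read as a single function. This restatement is a sketch: status_message is a hypothetical name, and it reads whatever file you pass so it can be tried on a saved copy of /proc/mdcmd. As in the script, a disk problem overrides the "not started" message because it is checked last.

```shell
#!/bin/bash
# status_message [FILE] [OK_MSG]
# Hypothetical restatement of check_unraid.sh's checks. Prints the message
# the script would send; pass "" as OK_MSG to suppress the hourly OK message.
status_message() {
    local file="${1:-/proc/mdcmd}"
    local msg="${2-unRaid is OK.}"
    # Array not running?
    if ! grep -q "mdState=STARTED" "$file" 2>/dev/null; then
        msg="unRaid array not started."
    fi
    # Any disk invalid or disabled? (checked last, so it wins)
    if grep -Eq "=DISK_INVALID|=DISK_DSBL" "$file" 2>/dev/null; then
        msg="The unRaid array needs attention. One or more disks are disabled or invalid."
    fi
    printf '%s' "$msg"
}
```

An empty result means there is nothing to report, which is the same test the script makes before it starts sending.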

 

Lastly, if I have set an error message, then for each machine name in the notify list I use the net lookup command to find the IP address to use with the netcat command. netcat is invoked with arguments telling it to close the connection once it has finished sending the message (-c) and to wait at most 1 second to make any given connection (-w 1).
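The send loop can be sketched as a function too. This is a hypothetical restatement (notify_all is not in the script); it relies on Samba's net lookup and on the netcat binary from this thread, with -c used as described above, and it skips any machine whose name cannot be resolved.

```shell
#!/bin/bash
# notify_all MESSAGE PORT MACHINE...
# Hypothetical restatement of the script's send loop. Prints how many
# machines the message was handed to.
notify_all() {
    local msg="$1" port="$2"
    shift 2
    local host ip sent=0
    for host in "$@"; do
        # Resolve the machine name; skip machines that cannot be found
        if ip=$(net lookup "$host" 2>/dev/null) && [ -n "$ip" ]; then
            # -w 1: give up on a connection after 1 second
            # -c:   close once the message has been sent (per the text above)
            if printf '%s : %s\n' "$HOSTNAME" "$msg" | netcat -w 1 -c "$ip" "$port"; then
                sent=$((sent + 1))
            fi
        fi
    done
    echo "$sent"
}
```

For example, notify_all "unRaid is OK." 10629 htpc htpc2 would try both machines on the YAC port used in the script.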

 

If this is over your level of understanding, get a Unix/Linux friend to assist you. Do not call Tom at LimeTechnology. If you make your server unbootable, then call him, but you had better be real polite to him as you plead for assistance :D Odds are he can always sell you another flash drive with a new copy of unRaid :D

 

If you do NOT want an hourly notification that the unRaid server is alive and well, you can change one line near the top of the check_unraid.sh script

from:

emsg="unRaid is OK."

to

emsg=""

 

 

Joe L.

 

 
#!/bin/bash
# Define the machines you wish to broadcast an alert to here
# separate machine names with spaces, if there is only one machine just put one name
notify_machines="htpc htpc2 dellcpx"

yac_port=10629

#PATH=$PATH:/boot/

# initialize the error message to an empty string if no hourly OK message is desired
# like this:
# emsg=""
emsg="unRaid is OK."

# request that status be updated
echo "status" >/proc/mdcmd
# now check the status and report as needed
grep "mdState=STARTED" /proc/mdcmd >/dev/null 2>&1
if [ $? != 0 ]
then
	emsg="unRaid array not started."
fi

egrep "=DISK_INVALID|=DISK_DSBL" /proc/mdcmd >/dev/null 2>&1
if [ $? = 0 ]
then
	emsg="The unRaid array needs attention. One or more disks are disabled or invalid."
fi

# if an error message was set, broadcast it to all the machines
# in the notify list in turn
if [ -n "$emsg" ]
then
	# notify each machine on the notify list in turn
	for i in $notify_machines
	do
		# look up the ip address given the machine name
		ip_addr=$(net lookup "$i" 2>/dev/null)
		if [ $? = 0 ] && [ -n "$ip_addr" ]
		then
			echo "$HOSTNAME : $emsg" | netcat -w 1 -c "$ip_addr" "$yac_port"
		fi
	done
fi


Is there a way to test this without waiting an hour for the cron job to fire?  I think I have everything installed correctly, but when I try to execute the script I get the following:

 

: command not foundline 3:

: command not foundline 5:

: command not foundline 7:

: command not foundline 10:

: ambiguous redirectine 11: 1

./check_unraid.sh: line 15: syntax error near unexpected token `fi'

'/check_unraid.sh: line 15: `fi


Quote: "Is there a way to test this without waiting an hour for the cron job to fire?"

You can test by typing ./check_unraid.sh at the command prompt (from the /boot folder) after you have performed the chmod +x steps.

(I added instructions in the post above on how to test.)

 

Glad you got things figured out with carriage returns.
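For anyone else who sees those garbled errors: ": command not found" printed over the start of the line is what a script saved with Windows CR/LF line endings produces. Each line ends in an invisible carriage return that bash treats as part of the command, and printing that carriage return moves the cursor back to the start of the line. One way to repair the file on the server is sketched below; strip_cr is a hypothetical helper name, and it relies only on tr and mktemp.

```shell
#!/bin/bash
# strip_cr FILE
# Hypothetical helper: remove DOS carriage returns (\r) from FILE in place.
strip_cr() {
    local tmp
    tmp=$(mktemp) || return 1
    # tr -d '\r' deletes every carriage return character
    if tr -d '\r' < "$1" > "$tmp"; then
        # Copy back with cat so FILE keeps its permissions (e.g. chmod +x)
        cat "$tmp" > "$1"
    fi
    rm -f "$tmp"
}
```

Usage: strip_cr /boot/check_unraid.sh, then run the script again.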

 

Joe L.


Here's a suggestion for people with multiple unraid boxes.

 

Change

 

echo "$emsg" | netcat -w 1 -c $ip_addr $yac_port

 

to

 

echo $HOSTNAME ":" "$emsg" | netcat -w 1 -c $ip_addr $yac_port

 

and you'll get the machine name as part of the message, which is quite useful.


Some comments:

 

1. There's no risk of making your Flash un-bootable. If you somehow mess up the "go" script, the array might not start, but that can be fixed by restoring the "go" file.

2. The best way to move files over to the Flash is just to drag them to the "flash" share. Similarly, you can navigate to the "go" script directly and edit it using WordPad.

3. The check_unraid.sh script itself needs this line just before the first grep command in order to get fresh status:

echo status >/proc/mdcmd

 


Thanks for the feedback Tom.

 

I did not know the status was updated only on demand.  I would have figured the array would have marked a disk as INVALID or DISABLED whenever that event occurred, but I've never had one fail, so it is good to know we will get an accurate status.

I've added the "echo" to the script listing above and will upload an updated version as an attachment.

 

Any chance of including a compiled copy of netcat in a bin directory in future releases?  It is not too big and certainly will be smaller than the statically linked version I've pointed folks to on the web.

 

Oh yes, do you have the ability to modify the "permitted suffixes" list for attachments? It would be a lot easier if I could attach a ".zip" file instead of having to rename a shell script to a .txt file. It might also reduce the issues with Unix LF vs. Windows CR/LF line endings in the script.

 

Joe L.


Quote: "1. There's no risk of making your Flash un-bootable. If you somehow mess up the go script, the array might not start, but that can be fixed by restoring the go file."

I will agree that there is minimal risk if all that is modified is the "go" script, and you have copied it to a backup file before starting so it could be restored to its original contents. 

 

But... if somehow a person messes with the "system" folder or its contents, it could make it difficult to boot. (You would be surprised at how creative we "users" can be ;D) Hey, I warned them... ;D

 

Quote: "2. The best way to move files over to the Flash is just to drag them to the flash share. Similarly, you can navigate to the go script directly and edit using Wordpad."

Good advice.

 


Quote: "I did not know the status was updated only on demand. I would have figured the array would have marked a disk as INVALID or DISABLED whenever that event occurred..."

 

/proc/mdcmd is a pseudo-file which is used to send commands to the driver and receive responses; it is much more flexible than ioctl. Anyway, writing the string "status" to /proc/mdcmd tells the driver to dump its current status to a buffer, which can be retrieved by reading /proc/mdcmd. When a disk is marked disabled because of errors, the driver of course immediately updates the config data stored on the Flash, independent of the /proc/mdcmd mechanism.

 

Quote: "Any chance of including a compiled copy of netcat in a bin directory in future releases? ... Oh yes, do you have the ability to modify the 'permitted suffixes' list for attachments?"

 

I enabled .zip files to be attached.  I'll test it by attaching the netcat program compiled on a development system.

  • 1 year later...

Correct me if I am wrong, but if this script/hack/add-on is not installed on UnRAID, is there no way of knowing that a drive has failed or is malfunctioning (other than logging in to UnRAID and manually checking the status)?

 

If this is how the error reports are working (or not working, since they do not exist), I have to say it is the biggest drawback of the UnRAID system. :(

I have never done any programming, I only use my computers with programs that others have written. I probably would have to try to get this hack done to my UnRAID, or try to get someone who can perform it to come here and do it to my server.

 

I cannot live with a fileserver that doesn't report errors. Crazy! :o

Quote: "I cannot live with a fileserver that doesn't report errors. Crazy! :o"

 

Unraid was not designed as a general-purpose professional file server; it could be used that way if you are willing to deal with its weaknesses.

 

It was designed for home theater purposes where such advanced features as notifications are less (though not "un") important.

 

Of course, Tom continues to advance the feature set and I would also like to have more robust reporting.

 

 

Bill

  • 2 months later...

Joe L.,

 

I put this on my machine and it works great - I'm a big fan of your email program too.  I may switch over to using YAC exclusively since it suits my needs fine.   One question though, could you tell me how to add the used/free disk space info?  I assume it's similar to the email program but I don't feel comfortable enough with my programming skills to do it.

 

BTW - I don't know if the YAC window has a limit on the amount of info and this would be a lot of info (if it's like the email program).  For me it would be fine if it just said something like "unRaid is OK.  Space Used 497G Free 566G."  Or something like that (% would be cool too).  I set YAC up to write to the log file and I'm just running it once daily, so I really don't even see the actual YAC message.  So I'll just check the log every day to make sure it looks fine.

 

Thanks for another great program!

 

electr0n

  • 2 weeks later...

electr0n ,

 

If you are using user-shares you can get a quick space summary by replacing one line in this script.

 

Replace this line:

emsg="unRaid is OK."

 

With these two lines:

stats=`df -h | awk '/^shfs/ { printf "Total space %s, Used %s (%s), Free %s (%d%%)\n",  $2, $3, $5, $4, 100-$5 }'`

emsg="unRaid is OK.  $stats"

 

It might be best to cut and paste from the lines above, since the double quote marks, back-quotes, and single quotes might be hard to see on your browser.

 

The new output when all is OK will be:

unRaid is OK.  Total space 3.0T, Used 1.4T (49%), Free 1.6T (51%)

 

Hope this helps... "awk" is a really powerful programming language. I am using it to parse and reformat the output of the df -h command; it only acts on the line df prints for the user-share file system, "shfs".
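Here is the same awk program with its fields labeled, wrapped in a function so it can be tried on saved df output. The name space_summary is hypothetical, and the column order assumed is the usual df -h layout (Filesystem, Size, Used, Avail, Use%, Mounted on).

```shell
#!/bin/bash
# space_summary
# Hypothetical wrapper around the awk one-liner above: reads "df -h"
# output on stdin and summarizes the user-share (shfs) line.
space_summary() {
    # df -h columns: $1 Filesystem, $2 Size, $3 Used, $4 Avail, $5 Use%
    # 100-$5: awk turns "49%" into the number 49, giving the free percentage
    awk '/^shfs/ { printf "Total space %s, Used %s (%s), Free %s (%d%%)\n", $2, $3, $5, $4, 100-$5 }'
}
```

In the script, the equivalent of the two-line change would be stats=$(df -h | space_summary) followed by emsg="unRaid is OK.  $stats".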

 

Joe L.

  • 3 weeks later...

Joe L.

 

Thanks for the info - I'll give it a try.  It's funny but I just heard about awk recently at work so I may have a chance to check it out.  I have been playing with TCL and wrote something that can get the info off of the html //tower page to do what I want.  It's not elegant but seems to work ok for me - and it's a good learning experience for getting to know what TCL can do.  I haven't programmed in many years but it's fun to get back into it a bit.

 

Thanks again and I appreciate your advice!

 

electr0n

  • 8 months later...
  • 1 year later...

Nice script...

 

I use this script and call it from s3_notHrHdTcpIp.sh after the tower wakes up from S3, and hourly through cron. Everything works fine, but...

I analyzed check_unraid.sh and saw the hdparm command used to check HDD activity.

I tested the command hdparm -C /dev/sda /dev/sdb /dev/sdc .... /dev/sdf on the console/telnet, and for HDD /dev/sdf the command returned this status:

/dev/sdf:
SG_IO: bad/missing ATA_16 sense data::  70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SG_IO: bad/missing ATA_16 sense data::  70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
drive state is:  unknown

 

/dev/sdf is a Samsung F1 HD103UJ (or an F3 HD103SJ; I don't know right now).

I have AHCI turned on in the motherboard BIOS.

Why do all drives except /dev/sdf return the status correctly, but this drive does not?

Many thanks for any answer.


Quote: "Why do all drives except /dev/sdf return the status correctly, but this drive does not?"

1. different disk model

2. different disk firmware

3. status reporting not enabled

4. different disk controller

5. defective disk firmware

6. defective disk controller

7. interrupt conflict

8. defective memory

9. program bug

10. bad/missing ATA_16 sense data

11. drive state is:  unknown

12. spiral decommutator failed in "turbo encabulator"  (unlikely)

13. It is your flash drive and a command to determine if it is spinning is inappropriate (most likely)

 

 

 


Joe L., thanks for the answer. I understand these are possible problems, but this is not pleasant for me. I could try disabling AHCI mode in the BIOS. Might disabling AHCI help me get the status information?


I looked in an older post you made to find a syslog.

 

It appears as if /dev/sdf is a different model & firmware version than /dev/sde (a similar make/model disk on your server)

 

The odds are very high that the firmware on /dev/sdf   (1AA0) does not respond properly when queried by the hdparm command.

 

Feb 14 16:27:25 Tower dmesg[1308]: scsi 2:0:0:0: Direct-Access     ATA      SAMSUNG HD103SJ  1AJ1 PQ: 0 ANSI: 5

Feb 14 16:27:25 Tower dmesg[1308]: sd 2:0:0:0: [sde] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)

Feb 14 16:27:25 Tower dmesg[1308]: sd 2:0:0:0: [sde] Write Protect is off

Feb 14 16:27:25 Tower dmesg[1308]: sd 2:0:0:0: [sde] Mode Sense: 00 3a 00 00

Feb 14 16:27:25 Tower dmesg[1308]: sd 2:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Feb 14 16:27:25 Tower dmesg[1308]:  sde:

Feb 14 16:27:25 Tower dmesg[1308]: scsi 3:0:0:0: Direct-Access     ATA      SAMSUNG HD103UJ  1AA0 PQ: 0 ANSI: 5

Feb 14 16:27:25 Tower dmesg[1308]: sd 3:0:0:0: [sdf] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)

Feb 14 16:27:25 Tower dmesg[1308]: sd 3:0:0:0: [sdf] Write Protect is off

Feb 14 16:27:25 Tower dmesg[1308]: sd 3:0:0:0: [sdf] Mode Sense: 00 3a 00 00

Feb 14 16:27:25 Tower dmesg[1308]: sd 3:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

 

You might check to see if the firmware can be upgraded. (Most times it cannot, but you can ask SAMSUNG.)
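If a script uses hdparm -C to check drive activity, it may help to treat a drive that reports "unknown" as its own case rather than as spun up or spun down. A minimal sketch, assuming the output layout shown earlier in this thread (a "/dev/sdX:" line followed by a "drive state is:" line); drive_states is a hypothetical name.

```shell
#!/bin/bash
# drive_states
# Hypothetical parser: reads "hdparm -C" output on stdin and prints one
# "DEVICE STATE" line per drive, e.g. "/dev/sdf unknown".
drive_states() {
    awk '
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }   # "/dev/sdf:" -> "/dev/sdf"
        /drive state is:/ { print dev, $NF }          # last field is the state
    '
}
```

For example, hdparm -C /dev/sde /dev/sdf | drive_states would print one line per drive, and a caller could then skip any drive whose state is "unknown".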
