unraid_notify 2.55 [01-01-2010]: Email notifications for unRAID status



checking temp. at this point, using:

smartctl -d ata -A /dev/x

and grabbing the 4th field from the "Temperature_Celsius" record.

 

I use the 10th field of the same line for my unmenu.awk script. You could use something like this:

smartctl -d ata -A theDisk | grep -i temperature | awk '{ print $10 }'

 

Joe L.


smartctl -d ata -A /dev/X  output

root@Tower:~# smartctl -d ata -A /dev/sdd
smartctl version 5.36 [i486-slackware-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   100   100   015    Pre-fail  Always       -       7232
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1332
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       2
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2543
 10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   253   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       721
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       79115055
187 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       29
190 Unknown_Attribute       0x0022   071   049   000    Old_age   Always       -       29
194 Temperature_Celsius     0x0022   151   085   000    Old_age   Always       -       29
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       79115055
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   253   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0
202 TA_Increase_Count       0x0032   253   253   000    Old_age   Always       -       0

 

The email output (without mdcmd):

This message is a status update for unRAID Tower
-----------------------------------------------------------------
Server Name: Tower
Status: Disk 2 Overheat! 151°C (DiskId: ata-SAMSUNG_HD501LJ)
Date: Sun Sep 21 14:31:23 GMT 2008

Disk Temperature Status
-----------------------------------------------------------------
Parity Disk [sdc]: Spun-Down (DiskId: ata-WDC_WD10EACS-00D6B0)
Disk 1 [sdb]: Spun-Down (DiskId: ata-WDC_WD10EACS-00D6B0)
Disk 2 [sdd]: 151°C (DiskId: ata-SAMSUNG_HD501LJ)
Disk 3 [sde]: Spun-Down (DiskId: ata-WDC_WD4000YS-01MPB1)
Disk 4 [sdf]: Spun-Down (DiskId: ata-WDC_WD4000YS-01MPB1)

Disk SMART Health Status
-----------------------------------------------------------------
Parity Disk Spun-Down (DiskId: ata-WDC_WD10EACS-00D6B0)
Disk 1 Spun-Down (DiskId: ata-WDC_WD10EACS-00D6B0)
Disk 2 PASSED (DiskId: ata-SAMSUNG_HD501LJ)
Disk 3 Spun-Down (DiskId: ata-WDC_WD4000YS-01MPB1)
Disk 4 Spun-Down (DiskId: ata-WDC_WD4000YS-01MPB1)


 

I use the 10th field of the same line for my unmenu.awk script. You could use something like this:

smartctl -d ata -A theDisk | grep -i temperature | awk '{ print $10 }'

 

Joe L.

 

I think I can modify the script as Joe L. suggested, but maybe the 10th field should be used in the script by default.

Do I remember correctly that WordPad can be used for editing, as long as the file keeps proper Unix line endings?


You probably only need to change one line (line # 132 in the version I downloaded).

from:

# Disk is not spun down, try to get the disk temp

                diskTemp=`smartctl -d ata -A /dev/$1|awk '{if($2~"Temperature_Celsius"){print $4}}' 2>/dev/null`

to:

# Disk is not spun down, try to get the disk temp

                diskTemp=`smartctl -d ata -A /dev/$1|awk '{if($2~"Temperature_Celsius"){print $10}}' 2>/dev/null`
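For anyone wondering why the email reported 151°C: in the smartctl attribute table the 4th field is the normalized VALUE column, while the 10th field is RAW_VALUE, which on this Samsung drive holds the actual temperature (29). Here is a minimal sketch that runs both field numbers against the Temperature_Celsius line from the output posted earlier in this thread, so no disk is needed. The `sample` variable and the `field` helper are just illustrative names, not part of unraid_notify.

```shell
#!/bin/sh
# Attribute line copied from the smartctl output posted earlier in the thread.
sample='194 Temperature_Celsius     0x0022   151   085   000    Old_age   Always       -       29'

# Hypothetical helper: print field number $1 of the Temperature_Celsius
# record read from stdin.
field() {
    awk -v n="$1" '$2 ~ "Temperature_Celsius" { print $n }'
}

normalized=$(printf '%s\n' "$sample" | field 4)    # VALUE column
raw=$(printf '%s\n' "$sample" | field 10)          # RAW_VALUE column

echo "field 4 (VALUE):      $normalized"
echo "field 10 (RAW_VALUE): $raw"
```

On a live system you would pipe `smartctl -d ata -A /dev/sdX` into `field 10` instead of the sample line.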

 


Thanks Joe, I guessed that!

 

And brainbone, thank you very much for this mod!

This seems to solve the spin-down issues I have always had. Until now, spin-down was controlled only by the disks' firmware, and I was never able to fully debug it.

Just one question: can this be considered a safe solution? For example, what happens to data that is still in cache and not yet written to disk when a forced spin-down occurs? Is that data flushed to the disk somehow? Or am I misunderstanding something, and this kind of situation cannot happen?

 


- Doesn't the frequent (1 minute by default) smartctl check generate extra load? Does it affect performance during heavy file transfers?

It didn't appear to affect performance on my system (Celeron 430, 1.8 GHz), even when I set it to check every 10 seconds.  If you notice any deterioration in performance, please let me know.

 

- If I don't want to receive status messages, only alerts, then do I need to leave NotifyDelay empty?

No.

 

Leave "RcptTo" empty, or just comment it out (# RcptTo)

 

Use only the "ErrorRcptTo" line instead.

 

  • 2 weeks later...

I just got around to trying unraid_notify. Unfortunately I don't get emails. Here is the debug dialog.

 

root@Tower:/boot/config# unraid_notify -d

>

< 220 sccmmhc92.asp.att.net - Maillennium ESMTP/MULTIBOX sccmmhc92 #40

> HELO Tower

< 250 sccmmhc92.asp.att.net

> MAIL FROM: <[email protected]>

< 250 ok

> RCPT TO: <[email protected]>

< 250 ok; [simple] forward to <[email protected]>

> DATA

< 354 ok

> .

 

THEN AFTER SEVERAL MINUTES

 

< 221 sccmmhc91.asp.att.net

> QUIT

 

Any ideas?


Looking again at the debug output, it looks like your domain (vincit.com) is hosted at ProHostOne (I'm guessing you're using "mail.vincit.com" as your SMTP server), and your ISP (ATT or SBC?) is getting in the way and filtering your connection to mail.vincit.com on port 25.

 

Unfortunately, it looks like ProHostOne does not provide an alternate port to 25 for SMTP connection, and I believe ATT/SBC only allow Secure (SSL) SMTP connections.

 

I would contact your host to check whether they have an alternate port for non-SSL SMTP connections (usually port 2525 or 587).  If not, I would need to add Secure SMTP support, but this would require two additional packages, OpenSSL and Stunnel, which would increase the RAM and flash footprint of unraid_notify, at least for those needing SSL/Secure SMTP.
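While waiting on the host, a quick way to see which of those ports even accepts a connection from your network is a small check like the one below. This is only a rough sketch: `check_port` is a made-up helper name, `mail.example.com` is a placeholder, and it assumes bash (with /dev/tcp support) and the coreutils `timeout` command are available. Keep in mind that an open port does not tell you which server answered. As in the debug output above, the 220 banner is what shows whether your ISP's mail server picked up instead of your host's.

```shell
#!/bin/sh
# Hypothetical helper: report whether a TCP connection to host $1, port $2 succeeds.
# Relies on bash's built-in /dev/tcp redirection and on coreutils "timeout".
check_port() {
    if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null; then
        echo "$1:$2 open"
    else
        echo "$1:$2 closed or filtered"
    fi
}

# Example (placeholder host name, substitute your own SMTP server):
# for port in 25 587 2525; do check_port mail.example.com "$port"; done
```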

 


Worked like a charm - thanks again!

 

Good to hear!

 

The spin-down changes are only in a beta, and many users won't upgrade until it is out of beta.  Perhaps it could be optional, or conditional on a version check?

 

I'll make it optional in the next version, but I'd like to wait to get more feedback on how 2.20 is working before pushing out a new one.

 

Can we remove the spindown stuff now that Tom has reverted 4.4 back to the "good" logic?  Also, we have times below 1 minute now with 4.4.  Regarding the spindown time, is a blank entry a "disable"?

 

I'll put in an option to disable unraid_notify from doing a spindown.  At the moment, a blank spindown time will default to 60min spindown.  If unRAID spins down the drives before unraid_notify has a chance to, there shouldn't be any real problem or conflicts.  unraid_notify will see that the drive is not spinning, and leave it alone.

 

Below 1 min?  I'm having trouble seeing where that is useful for anything but increased drive wear?
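The "see that the drive is not spinning, and leave it alone" behavior mentioned above could be checked roughly like this. I am not claiming this is how unraid_notify does it internally. The sketch assumes `hdparm -C` as the source of the power state, `is_spinning` is a made-up helper name, and the sample text just imitates the format hdparm prints so the snippet runs without touching a disk.

```shell
#!/bin/sh
# Hypothetical helper: read `hdparm -C` output on stdin and succeed (exit 0)
# unless the drive reports a standby or sleeping state.
is_spinning() {
    ! grep -Eq 'drive state is:[[:space:]]*(standby|sleeping)'
}

# Sample text in the format hdparm prints; on a real system you would run
#   hdparm -C /dev/sdX | is_spinning
standby_output='/dev/sdc:
 drive state is:  standby'
active_output='/dev/sdd:
 drive state is:  active/idle'

for out in "$standby_output" "$active_output"; do
    if printf '%s\n' "$out" | is_spinning; then
        echo "spinning: OK to read the temperature or issue a spin-down"
    else
        echo "not spinning: leave it alone"
    fi
done
```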

 
