Monitoring SMART reports, the lazy way



I've always wanted an easy (well, lazier) way to monitor SMART reports, so I've put together a script that runs every morning, scheduled with at from the go file, and reports the overall health of all my attached disks:

 

# Daily log file, named smartctl_YYYYMMDD.log
LOG=/mnt/user/share/_Logs/_smartctl/smartctl_$(date +%Y%m%d).log

# One entry per disk; add the rest of your disks to this list
for disk in sdb sdc sdd; do
  echo "Current status for /dev/$disk"
  smartctl -H /dev/$disk | grep "SMART overall-health self-assessment test result"
  echo ""
done > "$LOG"

 

which produces the following output

 

Current status for /dev/sdb
SMART overall-health self-assessment test result: PASSED

Current status for /dev/sdc
SMART overall-health self-assessment test result: PASSED

Current status for /dev/sdd
SMART overall-health self-assessment test result: PASSED

...

 

You can write this log to the path where unRAID keeps the syslogs (/boot/logs), but I put it in a directory on my main share so it's easy to read from a client machine.
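The grep only picks up the text verdict. smartctl also reports its findings in its exit status, which is a bit mask documented in the smartctl man page: for example, bit 3 (value 8) means the health check returned DISK FAILING, and bits 6 and 7 flag entries in the device error log and self-test log, which can be set even while the overall result is PASSED. Below is a minimal sketch that decodes the status; the function name decode_smartctl_status is hypothetical and the messages are paraphrased from the man page.

```shell
#!/bin/bash
# Hypothetical helper: decode smartctl's exit status, which is a bit mask
# (see the EXIT STATUS section of the smartctl man page).
decode_smartctl_status() {
  status=$1
  bit=0
  while [ "$bit" -le 7 ]; do
    # Test whether this bit is set in the exit status
    if [ $(( status & (1 << bit) )) -ne 0 ]; then
      case $bit in
        0) msg="command line did not parse" ;;
        1) msg="device open failed" ;;
        2) msg="a SMART or ATA command failed, or SMART data checksum error" ;;
        3) msg="SMART status check returned DISK FAILING" ;;
        4) msg="prefail attributes at or below threshold" ;;
        5) msg="attributes were at or below threshold in the past" ;;
        6) msg="device error log contains errors" ;;
        7) msg="self-test log contains errors" ;;
      esac
      echo "bit $bit: $msg"
    fi
    bit=$((bit + 1))
  done
}

# Usage (-a also reads the error and self-test logs, so bits 6 and 7 can be set):
#   smartctl -a /dev/sdb > /dev/null
#   decode_smartctl_status $?
```

A status of 0 prints nothing, so any output at all is worth a closer look.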

 

I have no doubt there are many other, arguably better, ways to do this via an add-on or something similar (I haven't been keeping up to date with v5.0), but IMO this is a convenient way to know whether any disks are on the blink without opening a terminal session to your box and examining the syslogs, using an add-on, logging onto the web interface, etc. Of course, if a disk fails, you'll probably find out when you look for something and notice it's missing :)
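If you'd rather not read the log at all, a short awk filter can pull out just the disks that need attention. This is a minimal sketch, assuming the log format shown above ("Current status for /dev/sdX" followed by the smartctl result line); the function name smart_log_problems is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print only the disks in a daily log whose health
# result is not PASSED, or that have no result line at all (for example
# because smartctl could not read the device).
smart_log_problems() {
  awk '
    /^Current status for / {
      if (disk != "" && !seen) print disk ": no health result"
      disk = $4; seen = 0
    }
    /self-assessment test result:/ {
      seen = 1
      if ($NF != "PASSED") print disk ": " $NF
    }
    END { if (disk != "" && !seen) print disk ": no health result" }
  ' "$1"
}

# Usage: prints nothing when every disk passed
#   smart_log_problems /mnt/user/share/_Logs/_smartctl/smartctl_$(date +%Y%m%d).log
```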

 

Cheers


I'm not sure I'd put much faith in the "PASSED" overall status. I've had a number of drives with issues where the overall status was PASSED but some of the individual counters were showing problems, and when I tested them with the WD Diagnostic they were reported as failed by the diagnostic utility, yet the SMART report still said PASSED. I generally watch the following counters, which track the appearance of bad sectors; if I see changes in any of these, I get ready to replace the drive (or at least make sure my backups are current and then test it further):

 

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0

 

and this counter, which tracks SATA cable issues (so it might hint that a cable is working loose):

 

199 UDMA_CRC_Error_Count    0x0032  200  200  000    Old_age  Always      -      0

 

and this one, which seems to be associated with errors reading the surface on Western Digital drives (though this is from my own observations; I don't think much has been published about it):

 

200 Multi_Zone_Error_Rate  0x0008  100  253  000    Old_age  Offline      -      0
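Watching these counters for changes, rather than trusting the overall PASSED result, is easy to automate: save the raw values of the watched attributes each day and compare them with the previous snapshot. A minimal sketch follows, assuming the usual smartctl -A attribute table layout shown above (attribute ID in the first column, RAW_VALUE in the tenth); the function names and snapshot paths are hypothetical.

```shell
#!/bin/bash
# Attribute IDs to watch: bad-sector counters (5, 196, 197, 198),
# the SATA CRC error counter (199) and Multi_Zone_Error_Rate (200).
WATCHED="5 196 197 198 199 200"

# Hypothetical helper: read `smartctl -A` output on stdin and print
# "ID NAME RAW_VALUE" for each watched attribute (RAW_VALUE is field 10).
extract_watched() {
  awk -v ids="$WATCHED" '
    BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
    ($1 in want) { print $1, $2, $10 }
  '
}

# Hypothetical helper: compare two snapshots written by extract_watched
# (previous first, current second) and print each attribute whose raw
# value changed.
changed_watched() {
  awk '
    FILENAME == ARGV[1] { old[$1] = $3; next }
    ($1 in old) && old[$1] != $3 { print $2 ": " old[$1] " -> " $3 }
  ' "$1" "$2"
}

# Usage (snapshot paths are hypothetical):
#   smartctl -A /dev/sdb | extract_watched > /tmp/sdb.today
#   changed_watched /mnt/user/share/_Logs/_smartctl/sdb.prev /tmp/sdb.today
#   mv /tmp/sdb.today /mnt/user/share/_Logs/_smartctl/sdb.prev
```

Comparing raw values rather than the normalized VALUE column catches the first few reallocated or pending sectors, which usually don't move the normalized value at all.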

 

 

Plus the drive temperature, as an indicator that a filter might need cleaning or a fan may have died:

 

194 Temperature_Celsius    0x0022  113  110  000    Old_age  Always      -      37
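A temperature check fits into the same daily run. This is a minimal sketch that reads smartctl -A output and compares the raw Temperature_Celsius value with a limit; the function name check_temp and the default limit of 45 C are assumptions, not recommendations from this thread.

```shell
#!/bin/bash
# Hypothetical helper: read `smartctl -A` output on stdin and compare the
# raw Temperature_Celsius value (attribute 194) with a limit in degrees C.
# The default of 45 is an assumption; choose a limit that suits your drives.
check_temp() {
  awk -v limit="${1:-45}" '
    $1 == "194" {
      t = $10 + 0   # first token of RAW_VALUE, e.g. 37 in "37 (Min/Max 20/45)"
      if (t > limit) print "WARNING: drive at " t " C (limit " limit " C)"
      else print "OK: drive at " t " C"
    }
  '
}

# Usage: smartctl -A /dev/sdb | check_temp 45
```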

 

Regards,

 

Stephen

 


"at" will only execute a process once.  What are you doing so it executes again the following morning?

 

In my case the server powers on in the morning, just before 6am, on selected days to run an rsync to another server just after 6am, which is why I use at: the go file schedules the script on each boot. You could use cron to schedule it better, I guess.
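For reference, a crontab entry would run the report every morning on its own, with one caveat for this setup: cron only fires if the server is already up at the scheduled time, so on a box that powers on just before 6am the time has to fall after boot. A minimal sketch, assuming the script has been saved as /boot/custom/smart_report.sh (a hypothetical path) and should run at 06:15 (also an assumption):

```
# min hour day-of-month month day-of-week  command
15 6 * * * /boot/custom/smart_report.sh
```

Since unRAID's root filesystem lives in RAM, an entry added with crontab -e would most likely need to be re-added from the go file after each boot.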

