SMART parameter tracking database



Smarthistory version 0.1.06 is now posted.

 

http://www.tcpatools.com/bubba/smarthistory.zip

 

Installation:

====================

 

- Unzip to a directory on your flash (I use /boot/smarthistory/)

- cd to the directory where you have it installed, and run it with this command line:

 

   smarthistory

 

//***********************************************************************

//* smarthistory -  a program for saving and reporting historical

//*                S.M.A.R.T. hard drive data

//*

//* Version history:

//* -------------------------------------------

//* 0.0.01  - initial release by BubbaQ

//* 0.0.02  - added datadir option and html/graph options

//* 0.0.03  - added HTML output features, and report= option

//* 0.0.04  - added wake option;  fixed bug in google chart params when y=0

//* 0.0.05  - added info about sleeping/unresponsive devices and fixed spinup bug

//*            (spinup will now read a random block so unRAID knows the drive is spun up)

//* 0.0.06  - added hdparm check and random read to spin up drives with -wake option

//* 0.0.07  - added -days option to compare current SMART values to older data

//* 0.0.08  - added -translate option to translate serial number to Array Disk number

//* 0.1.02  - Fixed bug in saving ATA_Error_Count when it was 0

//* 0.1.03  - Added (no data) for periods when -days option pushes back before first datapoint

//*          Changed timestamp calculations to use a custom function

//* 0.1.04  - Added -help for command-line help

//* 0.1.05  - Fixed bug in -days option calculations for which prior value to use

//* 0.1.06  - Cleaned up code to suppress warnings from PHP when PHP warnings are enabled

//*          Changed default size for graphs

//*          Substitute static image when graphs would show data has never changed

//*          Changed logic to give alerts on all static thresholds when report=ALL

//*            regardless of whether a delta-threshold was met

//*

//*

//* ToDo List:

//* -------------------------------------------

//* Implement more options in program config

//* User-config to exclude drives

//* Smarts to know which models of drive can give smart data w/o spinning up

//* Limits on length of time to keep data

//* Let users define colors

//* Add ANSI codes for color terminal output

//* Combine the token and program config files

//*

//***********************************************************************

 

Several configuration options are documented in the config files.

 

Description:

====================

It saves one value per day for each smart parameter, and by default, retains the FIRST values read each day.

 

Data is stored in flat-files as text, and the file name is the same as the drive serial number. 

 

It tracks all data by drive serial number, so if you move drives around, or take one out, leave it out for months, and put it back, it will still have its historical data.  Or if you move a drive to a new system, you can copy the smarthistory file to the new system too.
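
As a rough illustration of the storage scheme described above (one text file per drive, named after the serial number, one record per day), here is a minimal Python sketch. It is not smarthistory's actual code, which is PHP. The function name `record_reading`, the record layout (date followed by name=value fields), and the example serial are assumptions made for the sketch; only the tab-delimited, one-file-per-serial, FIRST/LAST ideas come from the thread.

```python
import os
import tempfile


def record_reading(datadir, serial, date, values, dailydata="FIRST"):
    """Keep one tab-delimited record per day in a file named after the serial.

    The record layout (date, then name=value fields) is an assumption made
    for this sketch, not smarthistory's actual format.
    Returns True if the file was modified.
    """
    path = os.path.join(datadir, serial)
    lines = []
    if os.path.exists(path):
        with open(path) as fh:
            lines = [ln.rstrip("\n") for ln in fh if ln.strip()]

    fields = [date] + [f"{k}={v}" for k, v in sorted(values.items())]
    new_line = "\t".join(fields)

    if lines and lines[-1].split("\t")[0] == date:
        if dailydata == "FIRST":
            return False        # today's first reading is already stored
        lines[-1] = new_line    # LAST: overwrite today's record
    else:
        lines.append(new_line)  # first reading for this date

    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    return True


if __name__ == "__main__":
    datadir = tempfile.mkdtemp()  # stand-in for the real data directory
    record_reading(datadir, "EXAMPLE-SERIAL", "2009-01-10",
                   {"Reallocated_Sector_Ct": 2})
    # A second run on the same day leaves the stored record untouched.
    record_reading(datadir, "EXAMPLE-SERIAL", "2009-01-10",
                   {"Reallocated_Sector_Ct": 5})
    with open(os.path.join(datadir, "EXAMPLE-SERIAL")) as fh:
        print(fh.read(), end="")
```

Because the file name is the serial number, moving a drive to another system only requires copying that one file along with it.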

 

It will generate text output and HTML output.

 

For each SMART parameter, you have a section in the config file like this:

 

[Reallocated_Sector_Ct]
warn=10
error=30
deltawarn=2
deltaerror=5

 

In this example:

 

- if Reallocated_Sector_Ct is 10 or more, you get a warning message...

- if Reallocated_Sector_Ct is 30 or more, you get an error message...

- if Reallocated_Sector_Ct today is 2 or more than yesterday (or whatever is the most recent prior day's measurement) you get a warning message

- if Reallocated_Sector_Ct today is 5 or more than yesterday (or whatever is the most recent prior day's measurement) you get an error message
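
The four rules above can be sketched in a few lines of Python. This is only an illustration of the threshold logic as described, not the program's PHP code; the function names, the message wording, and the choice to report only the more severe level of each pair are assumptions.

```python
import configparser

CONFIG = """
[Reallocated_Sector_Ct]
warn=10
error=30
deltawarn=2
deltaerror=5
"""


def _level(value, limits, warn_key, error_key):
    """Return "ERROR", "WARN" or None; error wins over warn (an assumption)."""
    if error_key in limits and value >= int(limits[error_key]):
        return "ERROR"
    if warn_key in limits and value >= int(limits[warn_key]):
        return "WARN"
    return None


def check_parameter(name, current, previous, limits):
    """Return (level, message) alerts for one SMART parameter.

    `previous` is the most recent prior day's value, or None if no
    earlier measurement exists.
    """
    alerts = []
    level = _level(current, limits, "warn", "error")
    if level:
        alerts.append((level, f"{name} is now {current}"))
    if previous is not None:
        level = _level(current - previous, limits, "deltawarn", "deltaerror")
        if level:
            alerts.append(
                (level, f"{name} has increased from {previous} to {current}"))
    return alerts


parser = configparser.ConfigParser()
parser.read_string(CONFIG)
limits = parser["Reallocated_Sector_Ct"]

if __name__ == "__main__":
    for level, message in check_parameter("Reallocated_Sector_Ct", 12, 9, limits):
        print(f"*{level}* - {message}")
```

With a current value of 12 and a prior value of 9, both the static warn threshold (10) and the deltawarn threshold (2) are met, so two warnings are produced.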

 

You can also track parameters that only some drives report.  If a drive doesn't report it, it will be ignored for that drive. 

 

I know some drives (Seagate, <spit>) report some bogus numbers for some parameters, so I'll need to add a way to ignore some models.

 

One issue is that if a drive stays spun down for several days, you won't have SMART data for several days.  There is an option to spin up drives to get the data, but then again, if the drive is spun down, you don't really need data since nothing is really going to change on a spun-down drive!

 

I don't have a large sample dataset yet, but it will give you simple graphs like this:

 

   http://tinyurl.com/993xg3

 

Any other suggestions?

 

I'm thinking about adding 30-day delta warn/error thresholds, and e-mail notification.

 

 

Link to comment

I am about 75% done with a program that saves a daily history of SMART parameters, lets you set user-defined alerts, and graphs the history via google chart.

 

To make it simple, it saves one value per day for each SMART parameter.  If you run the probe more than once on the same day, it will replace the old values for today's date in the database with the new ones.   It would typically be run as a cron job, once a day.

Any way to convince you to save the first reading on a given day and ignore subsequent runs unless a special -overwrite option is given when invoked?  That way, if you burn in a drive on a single day, you will see the true delta from that day to the next, not just the change from the last invocation on that day to the next.

Data is stored in a flat-file as text.  But it may get big if you track a lot of parameters and have a lot of drives.  I may split each drive's data into a separate file, named with the drive's serial number.  Comments on that?

One file per drive is easier if you need to print the history, or if you remove a drive forever.

It tracks all data by drive serial number, so if you move drives around, or take one out, leave it out for months, and put it back, it will still have its historical data.

 

It will generate text output and HTML output.

Very Nice

For each SMART parameter, you have a section in the config file like this:

 

[Reallocated_Sector_Ct]
warn=10
error=30
deltawarn=2
deltaerror=5

 

In this example:

 

- if Reallocated_Sector_Ct is 10 or more, you get a warning message...

- if Reallocated_Sector_Ct is 30 or more, you get an error message...

- if Reallocated_Sector_Ct today is 2 or more than yesterday (or whatever is the most recent prior day's measurement) you get a warning message

- if Reallocated_Sector_Ct today is 5 or more than yesterday (or whatever is the most recent prior day's measurement) you get an error message

 

You can also track parameters that only some drives report.  If a drive doesn't report it, it will be ignored for that drive. 

 

I know some drives (Seagate, <spit>) report some bogus numbers for some parameters, so I'll need to add a way to ignore some models.

 

One issue is that if a drive stays spun down for several days, you won't have SMART data for several days.  I can add an option to spin up drives to get the data, but then again, if the drive is spun down, you don't really need data since nothing is really going to change on a spun-down drive!

 

I don't have a large sample dataset yet, but it will give you simple graphs like this:

 

   http://tinyurl.com/993xg3

 

Any other suggestions?

 

I'm thinking about adding 30-day delta warn/error thresholds, and e-mail notification.

You might need a parameter per counter, per drive, for an initial error count to ignore.  That way, after being alerted to an error you can set the ignore-count and not be alerted again unless the errordelta occurs again.

 

For example, I have an old 250 Gig drive that has 100 reallocated sectors.  It has been this way since I first ran a SMART test. I've probably run 20 or 30 preclear_disk cycles on it and it has not changed.  I would want to know if the number increased, but not to show an error every report.

 

Obviously, this ability to set the delta to ignore would need to be drive specific.

 

Joe L.

Link to comment
One file per drive is easier if you need to print the history, or if you remove a drive forever.

 

I'll probably  do it that way... I ran projections on file/record size and it was getting ugly for a 20-drive system.

 

For example, I have an old 250 Gig drive that has 100 reallocated sectors.  It has been this way since I first ran a SMART test. I've probably run 20 or 30 preclear_disk cycles on it and it has not changed.  I would want to know if the number increased, but not to show an error every report.

 

How about if the current value read from disk is unchanged from the previous value in the database, then no error is generated even if the value is above the threshold?  That would be MUCH easier.  You would get the error when any change happened that INCREASED the value from 100 to 101, but only once.
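
The suppression rule proposed here could look something like the following Python sketch. The function name and the decision to alert on the very first reading (when there is no stored value yet) are assumptions made for illustration.

```python
def static_alert_due(current, previous, threshold):
    """Decide whether a static-threshold alert should fire.

    Fires only when the value meets the threshold AND differs from the
    value already stored in the database, so an old, stable count (such
    as a drive that has sat at 100 reallocated sectors for months) stays
    quiet until it moves.
    """
    if current < threshold:
        return False
    if previous is None:
        return True  # no history yet: alert once (an assumption)
    return current != previous


if __name__ == "__main__":
    print(static_alert_due(100, 100, 30))  # unchanged, stays quiet
    print(static_alert_due(101, 100, 30))  # changed, alerts
    print(static_alert_due(101, 101, 30))  # next day, quiet again
```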

 

Any way to convince you to save the first reading on a given day and ignore subsequent runs unless you add a special -overwrite option when invoked?

 

That would be tricky.  For simplicity, the reporting is done strictly on the data in the database.... the writing of the current values is done before the reporting is called.  Let me sleep on that.

Link to comment

Looking forward to trying this out.

 

Some people may not want to spin up each drive for hours each day in order to get the SMART report.  We may be able to put some hooks into unmenu / myMain to trigger your data collection on a drive when it is spinning.  

 

Another thought is that you may want your data collection to only run smart reports when the drive is spun up (except for WD/other drives that don't spinup to take a smart report).

 

Link to comment

"Reporting" is independent from data "collection."  Reporting only uses data in the database.  This is specifically so you can get reports at any time w/o spinning up drives.

 

Currently, the flow is this:

 

- read data from all drives that are spun up or that will give SMART data when spun down (with option to force spinup).

- write data to database

- run report.

 

So you can also just do step 3, which will duplicate the prior report.

 

However, this flow means the latest (current) data has to be written to the database before reporting, which is why multiple calls on the same day will update today's data in the database.  That doesn't satisfy Joe's situation, where only the first data collection for each day should be written to the DB, and subsequent calls compare the current data (without writing it to the DB) to the historical data.
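
One way to picture the first-of-day behavior Joe is asking for, purely as a sketch: the first run of the day writes its reading and compares it with the most recent earlier day, while later runs on the same day compare against today's stored reading without overwriting it. The dict-based `db` (ISO date string to value, for one parameter of one drive) and the function name are assumptions for illustration, not the program's data model.

```python
def collect_and_compare(db, today, current):
    """Return (baseline, delta) for one parameter of one drive.

    db maps ISO date strings ("YYYY-MM-DD") to stored values; ISO dates
    sort correctly as plain strings.
    """
    if today in db:
        # Later run on the same day: compare, but do not write.
        baseline = db[today]
    else:
        # First run of the day: compare with the most recent prior day,
        # then store today's reading.
        earlier = sorted(d for d in db if d < today)
        baseline = db[earlier[-1]] if earlier else None
        db[today] = current
    delta = None if baseline is None else current - baseline
    return baseline, delta


if __name__ == "__main__":
    db = {"2009-01-09": 2}
    print(collect_and_compare(db, "2009-01-10", 3))  # first run today
    print(collect_and_compare(db, "2009-01-10", 7))  # second run today
    print(db)
```

Keeping the comparison separate from the write is what lets a same-day burn-in show up as a delta without disturbing the daily history.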

 

I never intended this app to report changes that occurred in the same day, only from day to day.  So I am cogitating on it right now.

 

I am writing it to return text that can be easily parsed, as well as html, so you can capture it, and echo it back as is to a browser.

Link to comment

I kind of like Joe's idea of reporting changes that happen within the same day.

 

It could take a baseline reading at, say, 4am and then take a new reading every hour after that.  If anything changes, it writes the change to the file (with a timestamp) and reports it; if nothing changes, it does nothing.

 

I look forward to this.  Working some of this into unraid_notify would also be very cool.

Link to comment

Here ya go:

 

   http://www.tcpatools.com/bubba/smarthistory.zip

 

It is written in php.  If you don't have php, you can install the package from here:

 

   http://mirrors.easynews.com/linux/slackware/slackware-current/slackware/n/php-5.2.8-i486-1.tgz

 

This is just an initial release to demonstrate feasibility, and so people can start populating their history databases.

 

Run it with:

 

php smarthistory.php

 

It will create the data files in the directory where you are when you run smarthistory.php ... so do a cd to the data directory.  (I'll have a config file directive for that soon).

 

With the defaults, it will only record the data for the first time it is run each day.  After that, when you run it again, it compares the new data with the first data for today.

 

The data files are plain text, tab delimited.  You can edit them, to create some more sample data, or change older data so the new data will meet delta thresholds and generate output.

 

Use the command line switch  -D ON  to turn on debugging, and see (much) more output.

 

 

 

 

Link to comment

PHP package manager .conf for your enjoyment... (tested & working)... see: http://lime-technology.com/forum/index.php?topic=3157.msg26360#msg26360

 

Cheers,

Matt

Link to comment

I think it might depend on a library (perhaps 8 libraries) that most people do not have installed.

 

I get this:

root@Tower:/boot/# php smarthistory.php

php: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory

 

I then downloaded the missing xml library, installed it, and tried once more...

root@Tower:/boot/packages# installpkg libxml*
Installing package libxml2-2.6.32-i486-2...
PACKAGE DESCRIPTION:
libxml2: libxml2 (XML parser library)
libxml2:
libxml2: Libxml2 is the XML C parser library and toolkit.  XML itself is a
libxml2: metalanguage to design markup languages  -- i.e. a text language where
libxml2: structures are added to the content using extra "markup" information
libxml2: enclosed between angle brackets.  HTML is the most well-known markup
libxml2: language.  Though the library is written in C, a variety of language
libxml2: bindings make it available in other environments.
libxml2:
Executing install script for libxml2-2.6.32-i486-2...

 

root@Tower:/boot/packages/smarthist# php smarthistory.php

PHP Warning:  PHP Startup: Unable to load dynamic library '/usr/lib/php/extensions/curl.so' - libcurl.so.4: cannot open shared object file: No such file or directory in Unknown on line 0

PHP Warning:  PHP Startup: Unable to load dynamic library '/usr/lib/php/extensions/gd.so' - libt1.so.5: cannot open shared object file: No such file or directory in Unknown on line 0

PHP Warning:  PHP Startup: Unable to load dynamic library '/usr/lib/php/extensions/mhash.so' - libmhash.so.2: cannot open shared object file: No such file or directory in Unknown on line 0

PHP Warning:  PHP Startup: Unable to load dynamic library '/usr/lib/php/extensions/mysql.so' - libmysqlclient.so.15: cannot open shared object file: No such file or directory in Unknown on line 0

PHP Warning:  PHP Startup: Unable to load dynamic library '/usr/lib/php/extensions/mysqli.so' - libmysqlclient.so.15: cannot open shared object file: No such file or directory in Unknown on line 0

PHP Warning:  PHP Startup: Unable to load dynamic library '/usr/lib/php/extensions/pdo_mysql.so' - libmysqlclient.so.15: cannot open shared object file: No such file or directory in Unknown on line 0

PHP Warning:  PHP Startup: Unable to load dynamic library '/usr/lib/php/extensions/pspell.so' - libaspell.so.15: cannot open shared object file: No such file or directory in Unknown on line 0

PHP Warning:  PHP Startup: Unable to load dynamic library '/usr/lib/php/extensions/snmp.so' - libnetsnmp.so.15: cannot open shared object file: No such file or directory in Unknown on line 0

PHP Warning:  PHP Startup: Unable to load dynamic library '/usr/lib/php/extensions/xsl.so' - libexslt.so.0: cannot open shared object file: No such file or directory in Unknown on line 0

 

Now it appears as if it needs libexslt.so.0, libnetsnmp.so.15, libaspell.so.15, libmysqlclient.so.15, libmhash.so.2, libt1.so.5, and libcurl.so.4.

 

Somehow, I think the "Dependencies: none" line in the .conf file Matt created for BubbaQ is inaccurate.

 

Joe L.

Link to comment

I got my first "real" error (i.e. one from my drive's data changing, and not from test data I created) using smarthistory today:

 

WD-WMAEH1062605: *ERROR* - Current_Pending_Sector has increased from 2 to 3 since yesterday

 

So since yesterday, a new sector has been flagged as pending reallocation because of an error.

 

I also installed it on my main unRAID server, and got an error there too:

 

3NF00S42: *ERROR* - Power_On_Hours is over 44000 (it is now 54569)

 

So this drive has been running for over 6 years..... time to schedule a replacement.
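
For reference, the hours-to-years arithmetic behind that estimate, assuming 365-day years and continuous power-on time (the function name is just for illustration):

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap days


def power_on_years(hours):
    """Convert a Power_On_Hours reading to years of powered-on time."""
    return hours / HOURS_PER_YEAR


if __name__ == "__main__":
    print(round(power_on_years(54569), 1))  # prints 6.2
```

The 44000-hour threshold in the message above works out to roughly five years on the same basis.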

Link to comment

Smarthistory version 0.0.02 is now posted.  See OP for link

 

This version adds:

 

- datadir directive for location of data files

- HTML output is now enabled

- Graphing options for HTML output are now enabled

- dailydata directive to save the FIRST measurement of the day, or LAST (i.e. most recent) measurement of the day.

 

BTW, that 6-year old Seagate is giving up the ghost.  This morning, I was greeted with this from smarthistory:

 

3NF00S42: *ERROR* - Reallocated_Sector_Ct has increased from 39 to 53 since yesterday
3NF00S42: *ERROR* - Reallocated_Sector_Ct is over 30 (it is now 53)
3NF00S42: *ERROR* - Current_Pending_Sector has increased from 2 to 38 since yesterday
3NF00S42: *ERROR* - Current_Pending_Sector is over 5 (it is now 38)
3NF00S42: *ERROR* - Power_On_Hours is over 44000 (it is now 54592)
3NF00S42: *ERROR* - Offline_Uncorrectable has increased from 2 to 38 since yesterday
3NF00S42: *ERROR* - Offline_Uncorrectable is over 5 (it is now 38)

Link to comment

I am able to run the program with no problem, and the data files are created. How/where do I see the HTML graphs? I changed the config file to HTML, yet it still outputs the text files. I left the graphs as linked. I only have one set of measurements, as I ran it this morning. Do I need at least two data points?

 

Also, just so I am clear: if I run this on day two, does it create another set of smarthistory files? I just do not want to overwrite the data...

 

Sorry, last question. What would be a good way to set this to run automatically every morning, assuming I leave my tower on 24x7?

 

Sorry for the newbie questions.

Link to comment
I am able to run the program with no problem, and the data files are created. How/where do I see the HTML graphs? I changed the config file to HTML, yet it still outputs the text files. I left the graphs as linked. I only have one set of measurements, as I ran it this morning. Do I need at least two data points?

 

Yes, you need two datapoints, the "output" parameter has to be set to HTML, and the "graph" parameter has to be set to either "LINK" to create links to graphs, or "IMAGE" to use image tags and embed the graphs in the HTML.  The graph size is controlled by the gs and igs parameters.

 

Also, you will have no output unless there is a warning or an error.  If you want to see all the output regardless of errors or not, use the option "-report ALL"

 

You can either put options in the config file, or use them on the command line like this:

 

   php smarthistory.php -output HTML -graph LINK -report ALL

 

Also, just so i am clear, if I run this on day two, it creates another set of smarthistory files?

 

No.  It will add another record to the same files.  Only one data file per hard drive serial number.

 

how would be a good way to set this to run automatically every morning assuming i leave my tower on 24x7.

 

I have mine set to run twice a day via cron, as well as at shutdown and startup.

 

Link to comment

I have mine set to run twice a day via cron, as well as at shutdown and startup.

 

Parity checks are as important when using smarthistory as before.

 

Make sure you run collection BEFORE and AFTER parity checks.

 

The amount of activity during a parity check will dwarf daily activity, so the chances of SMART finding something are tremendously higher.  If it finds a problem, you will want to know immediately afterwards.

 

In fact, running the data collection a number of times DURING the parity check might be informative.  Even a few minutes of parity check activity is A LOT of activity!

 

The daily collections are of no benefit for drives that have been spun down the entire day!

 

Remember, taking SMART statistics is NOT a test of the drive.  For many drive models it does not even require a spin up.  Even if a drive were "primed to fail", SMART will NOT tell you - until the drive actually has issues reading and/or writing data.

 

Don't be lulled into a false sense of security!

 

Link to comment

Guys

 

I'm embarrassed to ask this, but I have spent the last 2 hours trying to figure it out so I wouldn't bother you all.

 

Given my LIMITED knowledge of Linux, I do not know how to edit the crontab so I can schedule certain tasks/scripts to run. It took me an hour to realize that I could not access the /etc folder via Windows Explorer  ;D

 

So if I cannot directly edit all the files in that folder (like i have been with conf and other files using notepad2) I imagine i have to do it via the command line?

 

I guess the same question would apply to editing files like the host file, etc...

 

If someone wouldn't mind pointing me to some detailed instructions, i would be happy to read and figure it out on my own...

 

Thanks for all the tips.

 

 

 

 

 

Link to comment

In order to keep this thread on track, I would suggest you start a new thread for your message above (copy and paste it to a new thread, e.g. "Linux newbie help").

In the meantime, I would suggest you telnet into your machine, and also begin reading some tutorials on vi.

There are many more suggestions that could be given, but they should go in that thread.

Here are some links to get you started.

 

http://www.tech-geeks.org/contrib/mdrone/LinuxWorkshop/newbie-linux-manual/sections/vi.html

http://www.gentoo.org/doc/en/vi-guide.xml

http://fosswire.com/2007/08/02/unixlinux-command-cheat-sheet/

http://www.unixguide.net/linux/linuxshortcuts.shtml

 

Link to comment

Thanks Rob. Those helped a lot.

 

BubbaQ, my drives seem unable to report when spun down. Is there anything in this script that will allow the drives to be spun up for reading? I tend to spin down my drives often as I am not accessing them daily.

 

Thanks for all the hard work.

Link to comment

The default is to not spin up a sleeping drive. I'll have an option to spin up the drives in a future version.  I'll also have some e-mail notification options.

 

But remember, a sleeping drive will not be experiencing errors.

 

If you set smarthistory to run via cron at an interval shorter than your spindown timeout, you will get at least one reading every time a drive is spun up.

Link to comment
