Hard drive possibly dying? Spin Retry Count


erikatcuse


I have a drive that I added to my array a few weeks ago.  I did a preclear on it and it seems to be working fine.  However, the myMain SMART report gives me:

10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       2

 

On the SMART page it's yellow. Should I be concerned? I've run the SMART test on the drive and nothing has changed.

 

Erik

Link to comment

All it indicates is that twice in the lifetime of the drive it did not spin up to speed in the time allotted by the manufacturer.  It could have been when you powered up and then immediately powered down the server, or it could have been something real affecting the friction of the bearings of the motor. (perhaps the drive was especially cold)

 

If the number stays the same, you'll probably be perfectly fine.  If you see the number increasing over time (especially if it is increasing rapidly), then it is an indication of a real physical problem.

 

I have a 1TB Seagate with a spin_retry_count=11.  It is my parity drive, but I know I had a lot of power on/off issues that might have been partly the reason.  I'll keep an eye on it.

 

The myMain plug-in makes it easy to keep an eye on the SMART attributes.  The SMART firmware on the drive will consider the drive as FAILED once the number gets past a manufacturer threshold (it is hard to interpret their threshold numbers, but it might be 100 for your drive).
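
For anyone who wants to read a row like the one Erik posted, here is a minimal Python sketch. It assumes the usual smartctl column order (ID, name, flag, VALUE, WORST, THRESH, type, updated, when failed, raw value) and that the raw value is a plain integer, which is not true for every attribute. The function names are made up for illustration and are not part of myMain. In that layout the pass/fail decision is made on the normalized VALUE column compared against THRESH, while the last column is the raw count of retries.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int      # normalized current value
    worst: int      # lowest normalized value recorded so far
    thresh: int     # manufacturer's failure threshold
    attr_type: str  # "Pre-fail" or "Old_age"
    raw: int        # raw counter, here the number of spin retries


def parse_attribute_row(row: str) -> SmartAttribute:
    """Parse one row of a smartctl-style attribute table (assumed layout)."""
    f = row.split()
    # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
    return SmartAttribute(int(f[0]), f[1], int(f[3]), int(f[4]),
                          int(f[5]), f[6], int(f[9]))


def is_failing(attr: SmartAttribute) -> bool:
    """An attribute has failed when VALUE is at or below THRESH."""
    return attr.value <= attr.thresh


row = "10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       2"
attr = parse_attribute_row(row)
print(attr.name, "raw =", attr.raw, "failing =", is_failing(attr))
```

For the row above, the normalized value (100) is still above the threshold (97), so the drive itself does not report a failure even though the raw count is 2.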

 

Just think of how many drives there are in the world with no ability to see the SMART health.  Is it any wonder there are only two kinds of hard-disks in the world...

Those hard-disks that have already failed,

and those hard-disks that have not yet failed, but eventually will... just give them some more time.

 

Joe L.

Link to comment

Seems to me, that there is a market for an application that would:

 

Daily, retrieve SMART parameters, retain history for some period of time, with variable granularity, so you can not only evaluate current SMART parameters, but also identify delta from prior values, and trends.

 

IIRC, the general "drive health" from SMART implements something like this, but I'm not sure what parameters it uses, or what the history time frame is.

Link to comment

I agree.

 

What we can do for now, is see if Brian would be willing, in his spare time  ;D, to enhance his MyMain SMART data collection.  It just needs a scheduling process, to produce a collection of SMART reports with drive serial number and timestamp in their file names.  He could then perhaps add columns to the SMART view for Month Ago, Year Ago etc, with the numbers and/or the deltas.  Obviously it will be a little more complicated than that, but ...
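
A rough Python sketch of that idea, just to show the shape of it. The file name pattern, the function names, and the sample numbers are all assumptions for illustration and have nothing to do with how MyMain actually stores its reports. Each snapshot is treated as a simple mapping from attribute name to raw value.

```python
from datetime import datetime


def snapshot_filename(serial: str, when: datetime) -> str:
    """Build a report file name carrying the drive serial and a timestamp."""
    return f"smart_{serial}_{when:%Y%m%d_%H%M%S}.txt"


def raw_deltas(old: dict[str, int], new: dict[str, int]) -> dict[str, int]:
    """Return only the attributes whose raw value changed between snapshots."""
    return {name: new[name] - old[name]
            for name in new
            if name in old and new[name] != old[name]}


# Hypothetical snapshots of the same drive, a month apart.
month_ago = {"Spin_Retry_Count": 11, "Reallocated_Sector_Ct": 0}
today = {"Spin_Retry_Count": 13, "Reallocated_Sector_Ct": 0}

print(snapshot_filename("9QJ09111", datetime(2009, 6, 1, 3, 0, 0)))
print(raw_deltas(month_ago, today))
```

A "Month Ago" column would then only need to find the snapshot file closest to the wanted date for that serial number and show the delta next to the current value.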

Link to comment

I have 2 Seagate 1TB drives where the spin_retry_count keeps marching upward.  I think they are up to 11 and 13.  No other drives in the array are having this issue.  Both Seagates were bought together, so it may be a firmware or production issue.

 

This is indicated as an important attribute, but (IMHO) more for a case where a drive stops spinning up reliably.  Kind of like a lawnmower that starts great and then suddenly it is very hard to start.  Definitely indicates a problem.

 

But the issue with the Seagates is that they never spin up quite quickly enough.  Like a lawn mower that's always required a couple of extra pulls.  I've seen a lot of reports of this particular drive having spin_retry_counts.  I'm just going to ignore it for now.  At some point I guess the SMART system will fail the drive for this.  At that point maybe I'll invoke the warranty and replace the drives.

 

As for a tool to allow an historic view of prior smart reports, I don't think it would be very hard.  CRON could be set up to run the reports as often as desired.  But is there a real need for this?

 

Through configuration there is an easy way to set a value on a particular SMART attribute for a drive that is reporting a problem.  For example, my spin retry attributes.  I set them and my smart view showed clean.  But when the values increased, they reappeared.  In this way I don't have to look at a sea of yellow, orange, or red attribute values every time I go to the SMART view and have to try and remember if they were the same as last time I looked.  If anything shows up I know it is something new.  The configuration value is treated as a threshold, so I could set the value to 20 on these and they won't show up again until it hits 21.
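
The rule described above can be sketched in a few lines of Python. This only illustrates the behaviour as described (highlight only when the raw value goes above the configured value); the function name is hypothetical and this is not the plug-in's actual code.

```python
def needs_attention(raw_value: int, acknowledged: int = 0) -> bool:
    """Highlight an attribute only when its raw value exceeds the acknowledged level."""
    return raw_value > acknowledged


# With the configured value set to 20, nothing shows up until the count hits 21.
for raw in (13, 20, 21):
    print(raw, needs_attention(raw, acknowledged=20))
```

With no configured value the default of 0 applies, so any non-zero count is highlighted.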

 

The ability to look in the past and see the trends on my drives' attributes is not that interesting to me since I check the SMART view periodically.  The only reason I like to have the old SMART reports (and I do save them on each boot and on an ad hoc basis) is to be able to research when a problem started.

 

RobJ - please define "spare time".  ???  I think I once heard of it, but can't really remember what it means ... ;)

Link to comment

I need to ask...how do you configure the threshold value?

 

Thanks

 

Erik

Link to comment

I need to ask...how do you configure the threshold value?

 

You can't... they are set by the manufacturer.

The disk manufacturer sets the thresholds to determine when to mark a disk as "FAILED".

However, myMain's SMART screen has per-disk thresholds for color-coding specific attributes.  That way, if your disk has 20 Spin-retries, you can set that disk's attribute threshold to 20 and not have it color-coded as yellow or red until it next changes.

 

I'll let bjp999 describe how to set the myMain "reporting" thresholds in its configuration file.... He has a lot of "spare" time... (I think he even mentioned "spare time" in a previous post.  ;))

 

Joe L.

Link to comment
As for a tool to allow an historic view of prior smart reports, I don't think it would be very hard.  CRON could be set up to run the reports as often as desired.  But is there a real need for this?

 

I'm not sure I'd call it a 'real need', but I would call it really useful.  Most users, especially the new ones, don't know much about SMART attributes, or editing config files.  And even those of us who do to some extent, but are lazy, probably won't bother very often.  It seems to me that it would be far more useful to most users if there were a tool that automatically tracked and flagged attribute deltas, and automatically color-coded the most important ones.  Perhaps even better, tie this into the various user notification systems, such as emailing a report of an increase in reallocated sectors, etc.  Also very useful (in your spare free time of course!  ;D ), add a graph next to a flagged item, showing the increase over time with color coding of the rises!  I have no idea how it could be done, but wouldn't that be a great feature of unRAID!  A hard drive might start to go bad, and immediately MyMain would email a report, and red flag the changes, numerically and graphically.

Link to comment

add a graph next to a flagged item, showing the increase over time with color coding of the rises!  I have no idea how it could be done,

 

Google Chart API:

 

   http://code.google.com/apis/chart/types.html

 

I am working on CPU/I/O graphs for BubbaRaid, and I started with rrd tools, but once I found Google Chart, it simplified a lot of stuff.

Oh boy... that chart API looks like it might be very cool... (bookmarked)
Link to comment

I need to ask...how do you configure the threshold value?

 

Here is an example.  You need to put lines like this in your myMain_local.conf:

 

SetDriveValue(WD-WCAPT0066999, ata_error_count_ok, "14")

 

The first parameter is the drive serial number.  The second parameter is the name of the attribute with "_ok" appended.  The third parameter is the threshold value.

 

Here are a couple of other examples:

 

SetDriveValue(9QJ09111, reallocated_sector_ct_ok, "1")

SetDriveValue(9QJ09111, spin_retry_count_ok, "12")

 

While you're at it, there are a bunch of other attributes you can set on a per drive basis ...

 

SetDriveValue(5QD2C222, slot,          "C")              # I label drive slots starting at A at the top of my case, and going down.  You can do it any way that makes sense for you.

SetDriveValue(5QD2C222, id,            "222")          # last three digits of serial number - much easier to remember than the long serial number.

SetDriveValue(5QD2C222, share,        "dvd1")        # samba share name, e.g., disk1.  I create custom samba shares and put the names here.

SetDriveValue(5QD2C222, interface,    "Motherboard") # name of the controller the drive is plugged into.

SetDriveValue(5QD2C222, modelname,  "Barracuda 7200.10")

SetDriveValue(5QD2C222, cache,          "16")

SetDriveValue(5QD2C222, borndate,      "10/10/2007")

SetDriveValue(5QD2C222, borndate_raw,  "20071010")

SetDriveValue(5QD2C222, purchdate,    "2/12/2008")

SetDriveValue(5QD2C222, purchdate_raw, "20080212")

SetDriveValue(5QD2C222, usage,        "Movies") # usage notes

 

If you put these in you'll see them on various myMain views.
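
If you want to sanity-check your lines before dropping them into the file, a small Python sketch like this can read them back. It assumes the line shape shown in the examples above (unquoted serial, unquoted name, quoted value, optional trailing # comment). The regular expression and function name are my own and say nothing about how myMain really parses its config.

```python
import re
from collections import defaultdict

# Matches: SetDriveValue(<serial>, <name>, "<value>")  with optional trailing comment
LINE_RE = re.compile(
    r'^\s*SetDriveValue\(\s*([^,\s]+)\s*,\s*([^,\s]+)\s*,\s*"([^"]*)"\s*\)'
)


def parse_drive_values(text: str) -> dict[str, dict[str, str]]:
    """Collect SetDriveValue lines into {serial: {name: value}}."""
    settings: dict[str, dict[str, str]] = defaultdict(dict)
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:  # lines that do not match (blank, malformed) are skipped
            serial, name, value = m.groups()
            settings[serial][name] = value
    return dict(settings)


sample = '''
SetDriveValue(9QJ09111, reallocated_sector_ct_ok, "1")
SetDriveValue(9QJ09111, spin_retry_count_ok, "12")
SetDriveValue(5QD2C222, slot,          "C")   # drive slot label
'''

conf = parse_drive_values(sample)
print(conf["9QJ09111"]["spin_retry_count_ok"])
print(conf["5QD2C222"]["slot"])
```

Any serial number or attribute name that fails to show up in the result is a hint that the line has a typo, such as a missing quote or comma.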

Link to comment
