SMART shows "read failure" - time to replace?


In my unRAID array I have a 1.5TB data drive that showed some read errors after, I think, a parity check. I finally got round to running a SMART self-test, but don't know how to interpret the results.

 

Near the start of the report it says:

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

 

Near the end it says:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       40%      2639         2129605537
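
For anyone who wants to pull those fields out of a self-test log entry programmatically, here is a minimal Python sketch. The function name `parse_selftest_entry` and the dictionary keys are made up for illustration, and the byte offset assumes 512-byte logical sectors, which may not hold for every drive.

```python
import re

# One entry of the "SMART Self-test log" table, in the layout shown above.
# Columns are assumed to be separated by two or more spaces.
ENTRY = re.compile(
    r"#\s*(?P<num>\d+)\s+"
    r"(?P<test>.+?)\s{2,}"
    r"(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\d+|-)\s*$"
)


def parse_selftest_entry(line, sector_bytes=512):
    """Return the fields of one self-test log entry, or None if it doesn't match.

    sector_bytes is an assumption (512-byte logical sectors).
    """
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    # A "-" in the last column means the test hit no error.
    lba = None if m["lba"] == "-" else int(m["lba"])
    return {
        "test": m["test"],
        "status": m["status"],
        "remaining_pct": int(m["remaining"]),
        "lifetime_hours": int(m["hours"]),
        "first_error_lba": lba,
        "byte_offset": None if lba is None else lba * sector_bytes,
    }


if __name__ == "__main__":
    entry = parse_selftest_entry(
        "# 1  Extended offline    Completed: read failure"
        "       40%      2639         2129605537"
    )
    print(entry)
```

Under the 512-byte assumption, the first unreadable sector sits roughly 1.09 TB into the 1.5TB drive, which fits with the scan stopping at 40% remaining.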

 

Should the disk be replaced? It doesn't have a "red ball"; it's still showing green.

Link to comment

I hate to ever disagree with johnnie.black, but in this case, I think he may have skimmed the post too quickly!    ;)

 

The drive did not fail the overall SMART health assessment; only the read scan failed, and that is expected whenever the drive has any "197 Current_Pending_Sector" or "198 Offline_Uncorrectable" sectors.  The short and long SMART tests both do a drive system test first, then either a short targeted read scan or the long comprehensive scan.  They stop at the first sector they can't read and report the "Completed: read failure" that you saw.
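
To check those attributes without reading the whole report by eye, something like the following Python sketch could be used. It assumes the usual ten-column attribute table that `smartctl -A` prints. The `SAMPLE` text is invented for illustration and is not this drive's data, and the helper names `raw_values` and `needs_attention` are hypothetical.

```python
# Attributes discussed in this thread, keyed by SMART ID.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# Invented sample in the layout of `smartctl -A` output (NOT real data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""


def raw_values(attr_text):
    """Map watched attribute IDs to their raw values from an attribute table."""
    found = {}
    for line in attr_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least ten columns.
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id = int(fields[0])
            if attr_id in WATCHED and fields[9].isdigit():
                found[attr_id] = int(fields[9])
    return found


def needs_attention(found):
    """Names of watched attributes whose raw value is non-zero."""
    return [WATCHED[i] for i in sorted(found) if found[i] > 0]


if __name__ == "__main__":
    print(needs_attention(raw_values(SAMPLE)))
```

A non-zero raw value for 197 or 198 would be consistent with a self-test that ends in "Completed: read failure".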

 

This is where you would generally check the SMART attributes, then prepare to Preclear the drive (perhaps twice), or run a badblocks destructive write test on it.  Now if the Preclear fails, you would then consider replacing the drive.

Link to comment

It's just my opinion, but I replace any disk that fails a SMART read test, even if the sectors can be remapped. I might re-use it in a backup server, but never in a main server. The same goes for any disk with reallocated sectors: in my experience, once a disk gets some bad sectors, there's a very high likelihood of getting more in the near future.

 

 

Link to comment

Unfortunately the drive is in the array, so preclearing or anything destructive is going to erase the data on it. I think I might replace the drive anyway with a larger one that I have, and then use it somewhere else (after some testing).

Link to comment

Your plan is certainly the right course of action, particularly since you already have a replacement disk available.  Then test the one you take out.  If the errors are fixed on the first pass (or test cycle), do a couple more passes.  If you don't get any additional errors, you could consider it a one-time event and use the disk.  If you get more errors on any of the later passes, I would pitch the disk!  (Is this disk still in warranty, at 2639 lifetime hours? You might want to consider sending it back.)
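
The pass-by-pass rule of thumb above can be written down as a small Python sketch. The function name `verdict`, the three-pass minimum (first pass plus "a couple more"), and the returned strings are assumptions made for illustration, not a standard procedure.

```python
def verdict(errors_per_pass, min_passes=3):
    """Apply the rule of thumb to a list of error counts, one per test pass.

    errors_per_pass[0] is the first pass, where errors are tolerated as long
    as they get fixed. Any error on a later pass means the disk is not trusted.
    min_passes=3 is an assumption: the first pass plus a couple more.
    """
    if not errors_per_pass:
        raise ValueError("need at least one completed pass")
    if any(count > 0 for count in errors_per_pass[1:]):
        return "discard"
    if len(errors_per_pass) < min_passes:
        return "run more passes"
    return "reuse"


if __name__ == "__main__":
    print(verdict([4]))        # only the first pass done so far
    print(verdict([4, 0, 0]))  # errors on pass one, clean afterwards
    print(verdict([4, 0, 2]))  # new errors on a later pass
```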

Link to comment

(Is this disk still in warranty--- 2639 lifetime hours? You might want to consider sending it back.)

 

The disk is at least a few years old and has been running in the array, so I'm not sure where that number comes from.

 

What might be of interest is that I stuck that same disk into a Windows 7 box (native SATA) and ran the WD Data Lifeguard Diagnostics on it, and here's what I got (in the order in which I ran the tests):

Quick Test: fail (twice)

Extended Test: pass

Quick Test: fail

Write Zeros (short): pass

Write Zeros (full): pass

Quick Test: pass (twice)

 

Now I'm not sure what to make of this! Is it possible that the writing fixed the Quick Test error?

 

 

Link to comment

My thought is that writing to the disk clears the pending sectors: the drive either rewrites each one successfully or takes it out of use and maps in a replacement from its pool of spare sectors.  (This is normal behavior, by the way!)

 

I would suggest searching online for your results to see what answers turn up.  WD might also have a manual for their utility that explains what each test does.

Link to comment
