SMART Report errors


aidan8181

Hi I've got an old hard drive that I wish to use as a data drive for my VMs, (not part of the array). I ran a preclear on the drive using the preclear plugin which failed during the preread process. I then ran an extended S.M.A.R.T self test. I'm trying my best to understand the report but I'm getting confused by it. I have uploaded the smart report to pastebin here:

 

https://pastebin.com/raw/4wTfBUui

 

If someone could help me understand the report and let me know if the drive is worth trying to use, that would be great. Should I just give up on this drive? Is it coming to the end of life?

Thanks for any advice

Link to comment

That drive is not in a very healthy state and not worth using.

 

In simplistic terms, the values you want to see in the SMART report to consider a drive worth using with unRAID are:

  • 0 in the value for Pending sectors. A non-zero value indicates sectors that cannot be read reliably and will cause data corruption.
  • 0 (or at least a very low value) in the reallocated sectors. In theory non-zero values are OK as long as they are static, but experience shows that high values indicate drives that are likely to fail.
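
As a rough sketch of that rule of thumb (the function name and the cut-off for "a very low value" are assumptions made up for this example, not anything unRAID itself enforces), using the raw values from the report in this thread:

```python
def drive_verdict(pending: int, reallocated: int, max_reallocated: int = 2) -> str:
    """Apply the two rules of thumb to raw SMART values.

    max_reallocated is an assumed cut-off for "a very low value".
    """
    if pending > 0:
        return "reject: pending sectors"
    if reallocated > max_reallocated:
        return "reject: too many reallocated sectors"
    if reallocated > 0:
        return "watch: reallocated count must stay static"
    return "ok"


# Raw values from the report discussed in this thread
print(drive_verdict(pending=1465, reallocated=252))
# A drive with one static reallocated sector and nothing pending
print(drive_verdict(pending=0, reallocated=1))
```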
Link to comment
1 hour ago, johnnie.black said:

Extended SMART test result tells you all you need to know:

 


SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       80%     46088         343013536

So this means it completed 80% of the read test, the drive has 46088 hours of life, is this correct? What does LBA_of_first_error mean? 343013536 what does this measure?

 

Thanks so much for the replies

Link to comment

It means that this specific test was run when the drive had been in use for 46088 hours.

 

And the Remaining column shows that 80% of the test was still left to run, so it had scanned only about 20% of the disk surface before it hit the error that made the extended SMART test fail.

 

LBA_of_first_error is the logical block address (LBA) where the test first failed: the SMART test had reached addressable block 343013536 and could not read it. So with knowledge of which file system is used and suitable tools, it's possible to look up whether any file is stored at this specific address. If the file system stores a file at this address, then that file is broken. If the file system stores internal data structures at this address, then these structures are broken.

 

There may be more errors at later disk blocks, but the SMART test exits on the first error it finds. It's possible to manually start partial scans, but it isn't that meaningful. Writing new data to this specific sector could potentially clear this error and let a new SMART test reach further. But it all comes down to trust - disks that have already failed once are more likely to keep failing.
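
To put that block address into perspective, here is a minimal sketch that converts it to a byte offset and a position on the disk. The 512-byte logical sector size and the total sector count are assumptions (typical figures for a 1 TB drive); the real values are listed in the full smartctl report.

```python
LBA_OF_FIRST_ERROR = 343013536   # from the self-test log quoted above
SECTOR_SIZE = 512                # assumed logical sector size in bytes
TOTAL_SECTORS = 1_953_525_168    # assumed: typical for a 1 TB drive


def lba_position(lba, sector_size=SECTOR_SIZE, total_sectors=TOTAL_SECTORS):
    """Return (byte offset, fraction of the disk) for a logical block address."""
    return lba * sector_size, lba / total_sectors


offset, fraction = lba_position(LBA_OF_FIRST_ERROR)
print(f"byte offset: {offset} ({offset / 1e9:.1f} GB)")
print(f"position:    {fraction:.1%} of the way into the disk")
```

Under these assumptions the unreadable block sits roughly 176 GB into the disk.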

Link to comment

I would not trust using this drive for anything important:

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   187   187   140    -    252
197 Current_Pending_Sector  -O--CK   192   192   000    -    1465
198 Offline_Uncorrectable   ----CK   198   194   000    -    342
200 Multi_Zone_Error_Rate   ---R--   200   001   000    -    26

These attributes look particularly concerning. The drive has reallocated 252 sectors. Normally this should be zero. I have seen some drives with very small numbers (1-2) of reallocated sectors stabilize, but 252 is a big number, and I expect it to get worse over time.

 

Worse, the drive has detected 1465 sectors that appear to be failing, but there has not yet been a write to those sectors. Only a write will trigger the actual reallocation. You might say that pending sectors are possible read errors waiting to happen. Should be zero.

 

Offline uncorrectable and multi-zone errors are, in my experience, indications of hardware issues in the drive. When I start to see them increment beyond low single digits, even in the absence of pending and reallocated sectors, I start to get nervous. But combined with the sector issues - it is a very bad sign.
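
For anyone who wants to pull these raw values out of a report automatically, a minimal sketch follows. It assumes the eight-column attribute layout quoted above; other smartctl output layouts have different columns, so treat the column index as an assumption. The function name is made up for this example.

```python
REPORT = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   187   187   140    -    252
197 Current_Pending_Sector  -O--CK   192   192   000    -    1465
198 Offline_Uncorrectable   ----CK   198   194   000    -    342
200 Multi_Zone_Error_Rate   ---R--   200   001   000    -    26
"""


def parse_raw_values(report):
    """Map attribute name -> raw value for each attribute row in the table."""
    raw = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; the header row does not.
        if len(fields) >= 8 and fields[0].isdigit():
            raw[fields[1]] = int(fields[7])  # RAW_VALUE is the 8th column here
    return raw


raw = parse_raw_values(REPORT)
for name, value in raw.items():
    if value > 0:
        print(f"{name}: {value} (should be 0)")
```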

 

And, as @johnnie.black pointed out, the extended test hit a read error. This is a clear sign of a bad drive.

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  9 Power_On_Hours          -O--CK   037   037   000    -    46093

One more thing to point out: this disk has been powered on for about 5.26 YEARS (46093 hours / 8766 hours per year). It is old, and assuming these issues are rather recent, it represents a long-lived drive that is finally ready to retire.

 

It is a WD Green 1T. These, in my experience, were awesome drives. I have a number of these drives that are still functional (although some have failed). Due to their small size they are not installed even in my backup server. But I would give them a gold star for reliability, and can't fault the drive for failing at this age.

 

I often recommend a drive with some SMART issues as a backup drive, but I'd say this drive's issues would not make it the best candidate - although it would be better than no backup.

Link to comment
35 minutes ago, SSD said:

And, as @johnnie.black pointed out, the extended test hit a read error. This is a clear sign of a bad drive.

 

Note that all the 342 offline uncorrectable are sectors that can't be read - and since the drive can't read the old content, it can't move the content to a spare sector.

 

38 minutes ago, SSD said:

I often recommend a drive with some SMART issues as a backup drive, but I'd say this drive's issues would not make it the best candidate - although it would be better than no backup.

 

In this case, it isn't impossible that one of the heads is no longer working well enough - so like a partially deaf old human it has lots of problems hearing the contents on the relevant platter. So the current pending sector count could very well tick up 1000 more within days or weeks. This is most definitely a deteriorating disk. It is not likely to stop finding more problematic sectors - it's more a question of how many more failing sectors per month or week or day.

Link to comment
7 minutes ago, pwm said:

Note that all the 342 offline uncorrectable are sectors that can't be read - and since the drive can't read the old content, it can't move the content to a spare sector.

 

The SMART system tends to put failures in the context of media issues on the drive. Reallocated sectors. Pending sectors. Uncorrectable errors (sectors). But it is interesting that once a drive develops even a single one of these types of conditions, the number continues to grow and does not stabilize. I used to advise those with these types of issues to run 3 parity checks, and if the counts stayed consistent, that the drive was probably ok and that SMART had correctly detected a real weak spot on the media. But no one ever found the numbers to stay consistent for more than 1 parity check. And this meshes closely with my own experience. So long as the SMART attributes are zero, all is good. But the first increment is reason to worry, and to expect the attributes to continue to increment. The only drives I've seen where the attributes stabilize above zero are new drives where the issue is reported very very early. This is rare, but I have had a couple of drives over the years with 1-2 reallocated sectors, or even one with 7 pending sectors, that never got worse after multiple parity checks and even preclear cycles.
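
The "3 parity checks" comparison I used to suggest amounts to something like the sketch below. The readings are hypothetical numbers invented for illustration, and the function name is made up.

```python
def counts_stable(snapshots):
    """True if every snapshot after the first matches the first one."""
    first = snapshots[0]
    return all(snap == first for snap in snapshots[1:])


# Hypothetical raw values noted down after each of three parity checks
steady = [
    {"reallocated": 2, "pending": 0},
    {"reallocated": 2, "pending": 0},
    {"reallocated": 2, "pending": 0},
]
growing = [
    {"reallocated": 2, "pending": 0},
    {"reallocated": 2, "pending": 7},
    {"reallocated": 5, "pending": 7},
]

print("steady drive stable: ", counts_stable(steady))
print("growing drive stable:", counts_stable(growing))
```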

 

So while I might agree with your definition, I am skeptical that this drive is actually reporting errors that are due to media issues. I believe it is more likely the failure of some hardware component in the drive and that the uncorrectables are symptoms of a more systemic hardware issue, preventing the drive from functioning normally after the condition is detected and dealt with (which I believe was the true intent of SMART).

Link to comment

 

3 minutes ago, SSD said:

So while I might agree with your definition, I am skeptical that this drive is actually reporting errors that are due to media issues. I believe it is more likely the failure of some hardware component in the drive and that the uncorrectables are symptoms of a more systemic hardware issue, preventing the drive from functioning normally after the condition is detected and dealt with (which I believe was the true intent of SMART).

 

Which is what I did say, if you notice the second paragraph in my answer. I do not think this drive has a large patch of media that is bad because of wear. I suspect that one of the heads has problems reading the data, which means it has lost much of the safety margin. So there will be more and more sectors it will fail to read. And if the drive can't read the sectors, then it can't refresh the content and it can't move the content.

 


Link to comment

Archived

This topic is now archived and is closed to further replies.
