Brand new hard drives with tons of errors? NORMAL?


pyrater


The most important SMART attributes that are set up for automatic notification are #5, 187, 188, 197, 198, and 199.  Occasionally, you may get an attribute that says FAILING NOW.  Obviously, you should attend to that as soon as possible.  Google and/or a post to the forum would be your friend.  The others can pretty much be ignored...
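For anyone who wants to check those particular attributes from the command line, here is a rough sketch that filters them out of `smartctl -A` output (assumes smartmontools' classic attribute-table layout; the parsing is illustrative, not exhaustive):

```python
# Sketch: pull only the critical SMART attributes out of `smartctl -A` output.
# Assumes the classic 10-column attribute table; adjust for your drive/firmware.

CRITICAL_IDS = {5, 187, 188, 197, 198, 199}

def critical_attributes(smartctl_output):
    """Map attribute ID -> raw value for the attributes worth alerting on."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id = int(fields[0])
            if attr_id in CRITICAL_IDS:
                found[attr_id] = int(fields[9])
    return found

sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""
print(critical_attributes(sample))  # {5: 0, 197: 8}
```

A nonzero raw value on any of these is exactly the kind of thing the notification system is there to catch early.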

Edited by Frank1940
On 2/28/2018 at 1:09 PM, pyrater said:

...

 

I would draw your attention to the 3 columns named "Value", "Worst", "Threshold" in your screenshots above.

 

These are called "normalized values".

 

The Value is the current normalized value.

 

The Worst is the lowest (worst) value the attribute has ever been at.

 

The Threshold has nothing to do with your specific disk. It is the level at which the drive itself will start reporting it is failing.
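In code terms, the rule those three columns encode looks like this (a toy illustration of the logic, not a real smartctl call):

```python
# Toy illustration: how a drive reads the normalized Value/Worst/Threshold triple.
def attribute_status(value, worst, threshold):
    """The drive only flags itself when the normalized value hits the threshold."""
    if value <= threshold:
        return "FAILING NOW"          # drive currently reports failure
    if worst <= threshold:
        return "FAILED IN THE PAST"   # value dipped to/below threshold before
    return "OK"                       # may still be degrading; watch the trend

print(attribute_status(79, 65, 6))  # OK
print(attribute_status(5, 5, 6))    # FAILING NOW
```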

 

If you look at the smart reports, the raw read error rates show

 

        Value  Worst Threshold

D1:     79      65        6

D2:     83      68        6

 

So this means both disks have gone down into the 60s, but it isn't until they get down to 6 that the drive would consider itself failed.

 

Generally the nominal "normal" is 100. These are below that. But every manufacturer and even model has different rules, and your values of 79/83 for value and 65/68 for worst might be perfectly normal for these disks.

 

My experience is that the smart values themselves are not that useful. It is really the delta in smart values that is interesting. For example, if you told me that 2 days ago the value was 100 and worst was 98, and now the value is 79 and worst is 65, I'd be concerned by such a rapid drop and would monitor closely. If you kept this as a baseline and looked at the attributes from time to time for sudden drops, you would be able to more closely track the health of the drive across the various attributes.
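A rough sketch of that baseline idea: snapshot the normalized values periodically and flag any attribute that drops faster than some tolerance. The 10-point tolerance here is an arbitrary example, not a figure from any manufacturer:

```python
# Sketch: compare a current snapshot of normalized SMART values against a
# saved baseline and flag attributes that fell sharply. The drop tolerance
# is an arbitrary example value, not anything the manufacturers specify.

DROP_TOLERANCE = 10  # normalized points of drop that warrant a closer look

def sudden_drops(baseline, current, tolerance=DROP_TOLERANCE):
    """Return {attr_id: (old, new)} for attributes that fell by > tolerance."""
    drops = {}
    for attr_id, old_value in baseline.items():
        new_value = current.get(attr_id, old_value)
        if old_value - new_value > tolerance:
            drops[attr_id] = (old_value, new_value)
    return drops

# Two days ago vs. today, per the example above (attr 1 = Raw_Read_Error_Rate):
baseline = {1: 100, 5: 100}
current  = {1: 79,  5: 100}
print(sudden_drops(baseline, current))  # {1: (100, 79)} -> worth monitoring
```

Run something like this from a scheduled script and you get the "sudden drop" alert long before the drive itself admits anything is wrong.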

 

The attributes Frank mentions above are the ones we typically associate with failing drives. And we look more closely at the Raw values than the normalized ones for those specific attributes. And we are pickier than the drive manufacturers when some of the Raw attributes start to increment. 

 

Hope this helps.

 

Enjoy your array!

 

(#ssdindex - Smart attributes - value, worst, threshold)


I would add one more comment to what @SSD has said.  The manufacturers are the ones who set up this SMART system, and they NEVER want to get a disk back on a warranty claim while a SMART report suggests it might still be usable!  So they will not flag a disk as failing until (1) it is out of warranty or (2) you looked at the SMART values because the disk problem was already obvious to the casual user.  As @SSD mentioned, we unRAID users can't use a disk where 5% of the disk area is unreadable, but many users can and do without even realizing it.  Unless every single sector can be read on all but one of the disks in our arrays, our parity system cannot recover the data on that one remaining disk!  (Obviously, two disks for a dual-parity system.)

 

That is why you should turn on the Notification system and look at the reports faithfully.  You will get a message that something is wrong before it becomes a full-blown crisis where data loss is very likely!

 

Having said that about the manufacturers, I will say that I have never heard of them not honoring the warranty agreement on any disk which was returned to them as defective.  Every time I have returned a disk under an RMA, the replacement has been shipped on the same day as receipt.  (I have no doubt that they maintain a database of claims, and you might be 'flagged' if you are working the system in some way.)


I'm in the process of migrating in some 2TB Seagate drives (cheap ones, not NAS specific). Each of them has so far flagged up between 4 and 8 'UDMA CRC error count' as they go through the array rebuild process (I'm swapping out some older 500GB drives).

 

That type of error being flagged is a little unnerving, but they are recoverable errors with no data loss. If I see those numbers increasing over time, I'll be concerned.

I'm also seeing high 'raw read error rate' numbers. I'm ignoring those as spurious.

10 minutes ago, DigitalStefan said:

I'm in the process of migrating in some 2TB Seagate drives (cheap ones, not NAS specific). Each of them has so far flagged up between 4 and 8 'UDMA CRC error count' as they go through the array rebuild process (I'm swapping out some older 500GB drives).

 

That type of error being flagged is a little unnerving, but they are recoverable errors with no data loss. If I see those numbers increasing over time, I'll be concerned.

I'm also seeing high 'raw read error rate' numbers. I'm ignoring those as spurious.

 

Those are usually fixed by replacing the SATA cable.
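One wrinkle worth knowing: the UDMA CRC error count (attribute 199) raw value is cumulative and never resets, so after swapping a cable the question is whether the count is still climbing, not whether it is zero. A trivial sketch of the check:

```python
# Sketch: attribute 199 (UDMA CRC error count) is cumulative and never
# resets, so after a cable swap you compare against the count at swap time
# rather than expecting it to return to zero.

def cable_fix_worked(count_at_swap, count_now):
    """True if no new CRC errors have been logged since the cable swap."""
    return count_now == count_at_swap

print(cable_fix_worked(8, 8))   # True  -> errors stopped; cable was the culprit
print(cable_fix_worked(8, 23))  # False -> still climbing; check controller/port
```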


I think the cables are OK. It might be the SATA controller on my original ASUS Sabertooth 990FX motherboard.

The drives coming out of the server are SATA 3Gbps drives and the new ones are 6Gbps.

 

I will change the cables though, to be sure. I may just knock everything back to 3Gbps speeds. It's not like these drives are going to go beyond 200MB/s during transfers.

1 hour ago, DigitalStefan said:

The drives coming out of the server are SATA 3Gbps drives and the new ones are 6Gbps.

Remember that older SATA cables weren't designed for 6Gbps transfer rates.

The first generation cables were intended for max 1.5 Gbps and the second generation cables for 3 Gbps.

7 hours ago, pwm said:

Remember that older SATA cables weren't designed for 6Gbps transfer rates.

The first generation cables were intended for max 1.5 Gbps and the second generation cables for 3 Gbps.

 

This hadn't actually crossed my mind. Thanks.

 

On checking, only my parity drive is actually running at 6Gbps speeds. All 3 array drives are running at 3Gbps.

 

They are all connected to the 6Gbps on-board SATA, so I think you've given me some very good advice. I'll be ordering some new cables.

19 minutes ago, Fireball3 said:

Buy a new WD drive, please replace the cabling also ...

Good to know which drives I don't want to buy.

 

Now, if you read the release a couple of times, it seems that WD didn't do this completely on a whim.  There was some problem behind it... (But the exact nature of the problem is not at all clear to me!)  This article was written by WD and addresses only their drives.  But what I don't know is whether any other drive manufacturers have also recognized the same problem and addressed it in the same manner as WD!  (As I have said many times, the whole SATA connector system is a poster child for how NOT to design a connector!)

 

My observation (on a very limited number of cables from a very few manufacturers) is that the ones with metal locking tabs do not have the 'internal bumps', whereas the cables without the metal locks do have them.  (You can tell by gently pulling on the cable.  If it takes a little bit of force to remove, the cable has the internal bumps.  If it slides out with no discernible force, it does not.)

