Faulty disk again


Kaltar

Hi All

 

I have a faulty disk again :( This time it is my parity disk, which is showing some multi-zone errors.

I noticed it while I was moving disks around: I added a cache drive and removed another faulty disk from my array (it had some Raw_Read_Error_Rate faults).

 

Is there any way to recover from this, such as a preclear, a low-level format or something else, or is it just another RMA of the disk?

I've attached the SMART report.

tower-smart-20180212-0803.zip

Link to comment
There is nothing to recover from: the drive is still working. But it is worth keeping an eye on the drive, because it is indicating that it is having some difficulty reading some data.

 

1 hour ago, Kaltar said:

Does Unraid do something funky with the drives?

Nothing funky happening. unRAID is just better than your normal computer at showing drive issues, with the intention to catch issues before you get a big data loss.

Link to comment
I beg to differ :(

Sure, from the SMART report it "may seem fine", but yesterday when I assigned it as the parity drive (which it has been for a long time) and it needed to rebuild parity, it failed completely, to the point where Unraid simply disabled the drive.

I was able to enable it again by doing a "Tools -> New Config", then assigning the drives to the right slots and starting the array, but again the disk failed completely.

 

So right now my array is running without parity :( That causes me some worries.

Link to comment
The drive hasn't flagged any sector as uncorrectable.

 

But I hadn't noticed that you also had a failed extended test. It is a bit interesting that the extended test failed but did not flag any broken sector.

 

Multi-zone errors mean the drive has had problems when writing, but the drive normally retries multiple times.

 

Do you have lots of vibrations in the case? What cooling do you use - I see your drive is below room temperature.

 

Link to comment
No real vibrations. The drive is mounted in my Inter-Tech IPC 4U 4424, which is sitting on some rubber pads.

It is located in my attic, which is pretty cool at the moment.

I have 2 other WD Red disks, which are over 3 years old now and so far have had no issues.

 

Could it be because the drives are running cold?

Even when in use it does not get warm, and right now it is running some tests at around 17 C.

According to WD, they should operate fine at temperatures from 0 to 65 C.

WD RED temp.JPG

Edited by Kaltar
Link to comment

These attributes should be 0 on a healthy WD drive:

 

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    4
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    3

A non-zero Raw_Read_Error_Rate, especially if it's only in the low single digits, doesn't necessarily mean the disk is failing. But errors on both attributes, together with a failing extended SMART test, mean that the disk is failing and should be replaced.

 

 

Link to comment
37 minutes ago, Kaltar said:

Could it be cause the drives are running cold ?

No. The temperature is fine. I just wanted to make sure it really was below room temperature. I have seen drives with corrupted SMART data, which makes the SMART reports read out noise.

 

With just the read error rate + multi-zone errors I would have recommended keeping a very close eye on the drive. But with the failed extended test, I would recommend replacement.
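
For anyone who wants to script that rule of thumb, here is a minimal Python sketch. It assumes you paste in the attribute table from the SMART report as text; the function names `parse_raw_values` and `triage` are made up for this example and are not part of Unraid or smartmontools. The thread does not cover a failed extended test with both attributes at 0, so treating that case as "replace" is an assumption of the sketch.

```python
# Attribute IDs named in this thread; both should have a raw value of 0
# on a healthy WD drive.
WATCHED_IDS = {1: "Raw_Read_Error_Rate", 200: "Multi_Zone_Error_Rate"}

# The two rows quoted above from the SMART report.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    4
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    3
"""


def parse_raw_values(attribute_table: str) -> dict:
    """Return {attribute_id: raw_value} for the watched attributes."""
    raw = {}
    for line in attribute_table.splitlines():
        fields = line.split()
        # A data row starts with a numeric ID and ends with the raw value.
        if len(fields) >= 3 and fields[0].isdigit() and fields[-1].isdigit():
            attr_id = int(fields[0])
            if attr_id in WATCHED_IDS:
                raw[attr_id] = int(fields[-1])
    return raw


def triage(raw: dict, extended_test_failed: bool) -> str:
    """Apply the rule of thumb from this thread: 'ok', 'watch' or 'replace'."""
    nonzero = [WATCHED_IDS[i] for i, value in raw.items() if value != 0]
    if extended_test_failed:
        # With non-zero attributes this is the case discussed in the thread.
        # With all-zero attributes it is an assumption of this sketch.
        return "replace"
    if nonzero:
        return "watch"
    return "ok"


raw = parse_raw_values(SAMPLE)
print(raw)                                      # {1: 4, 200: 3}
print(triage(raw, extended_test_failed=True))   # replace
print(triage(raw, extended_test_failed=False))  # watch
```

You have to supply the extended test result yourself (from the self-test log in the report); the sketch does not try to parse it.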

Link to comment

Damn, I have a disk that is REALLY getting on my nerves..

A few days ago it was bugging me as hell.

I couldn't use it as the parity drive, because Unraid then disabled it.

I could run a SMART quick test without errors, but the extended test failed..

 

Now, without my doing anything, it works again.

Extended smart test ran without errors.

It is the same disk I wrote about here a week ago. :(

 

If I ship it back to Western Digital, wouldn't they just say that the drive is fine?

tower-smart-20180219-0719.zip

Link to comment
No. It just means they would send it to some poor slob as a refurb, just as you risk getting some other user's problem drive when they send you a replacement.

 

Not hugely surprising to see a drive that intermittently works.

 

I have read that drives that operate in a relatively consistent temperature range fare better than those with wide variance. So a drive operating consistently, even at a relatively high temperature, might fare better than another drive with high temperature variance that never gets as hot.

 

Given the relatively small numbers involved, it is incredibly difficult to assess the root cause of one user's problems. It could be latent shipping damage to drives that were not well secured, it could be temps that are too hot, too cold or too variable, it could be high vibration, or it could be just bad luck. It could also be that the drive model is not a good one (although we have people here that hate and have nothing but problems with brand x and love brand y, and then others just the reverse).

 

So it's hard to know what's going on in your case, but with a server in an attic I tend to think of environmental issues like heat and condensation.

 

The nature of unRAID drive usage is different from usage in a workstation. The drives tend to get used less frequently, but when they are used, they are involved in extended sequential use, with much less random reading and writing.

Link to comment