Can I trust this disk?

May 2, 2009

Hi,

After I had errors with one of my disks (http://lime-technology.com/forum/index.php?topic=3715.0), I read a lot about SMART, badblocks, pending sectors and so on, but none of it helped me get the disk error-free.

Also, every long SMART test stopped with a "read failure".

Because of this, I bought a new disk and replaced the bad one.

Just out of interest, I tried the preclear_disk script on the bad disk...

To my surprise, it ran through all steps without an error. With smartctl I saw that Current_Pending_Sector went to zero (without any reallocated sectors).

So I started a long SMART test. This time the test ran to the end without any error (for the first time)!

To see whether this was a random result, I ran the preclear script again, this time with count=2.

Again the script finished without any error, and a second full SMART test also completed without an error.

So my question is:

Can I trust this disk and put it into my array?

Are there any other tests I can do?

(I'm a Linux rookie.)

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EADS-00L5B1
Serial Number:    WD-WCAU46045241
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat May  2 16:48:05 2009 GMT-1
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x05) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (24000) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   161   158   021    Pre-fail  Always       -       6933
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       362
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1940
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       44
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       7
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       360
194 Temperature_Celsius     0x0022   116   107   000    Old_age   Always       -       34
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1939         -
# 2  Extended offline    Completed without error       00%      1924         -
# 3  Short offline       Completed: read failure       90%      1890         351130716
# 4  Extended offline    Completed: read failure       80%      1875         351127623
# 5  Short offline       Completed without error       00%      1874         -
# 6  Extended offline    Completed: read failure       80%      1862         351130716
# 7  Extended offline    Completed: read failure       90%      1861         351130716
# 8  Extended offline    Completed: read failure       90%      1861         351130716
# 9  Extended offline    Completed: read failure       80%      1860         351155054
#10  Extended offline    Completed: read failure       90%      1860         351130716
#11  Extended offline    Completed: read failure       90%      1838         351130716
#12  Extended offline    Aborted by host               90%      1838         -
#13  Extended offline    Aborted by host               60%      1838         -
#14  Short offline       Completed without error       00%      1836         -
#15  Short offline       Completed without error       00%      1836         -
#16  Short offline       Completed: read failure       90%      1829         351130716
#17  Short offline       Completed: read failure       90%      1829         351130716
#18  Short offline       Completed: read failure       90%      1829         351130716
#19  Short offline       Completed without error       00%      1829         -
#20  Extended offline    Completed: read failure       90%      1824         351125639
#21  Short offline       Completed: read failure       90%      1824         351125639

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

May 2, 2009

You can probably trust the disk at this point. The preclear_disk procedure does a good job of stressing the disk and showing whether it truly is bad. I would probably run another preclear on the disk, get another long SMART test, and compare the results. If they look the same, I'd say use the disk, but keep an eye on it to be on the safe side.
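
If it helps with the comparison, here is a rough Python sketch that pulls the raw values out of two saved smartctl reports and lists the attributes that changed. It is not part of preclear_disk or smartctl; the function names are made up for illustration, and the "before" pending count of 3 in the sample is a placeholder, not a value from this thread.

```python
import re

# One row of the "Vendor Specific SMART Attributes" table:
# ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def parse_attributes(report):
    """Return {attribute_name: raw_value} for every attribute row in the report text."""
    attrs = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            attrs[m.group(2)] = int(m.group(3))
    return attrs

def diff_reports(before, after):
    """Return {name: (old_raw, new_raw)} for attributes whose raw value changed."""
    a, b = parse_attributes(before), parse_attributes(after)
    return {name: (a[name], b[name]) for name in a if name in b and a[name] != b[name]}

# Sample rows in smartctl's layout; the value 3 is a made-up placeholder.
BEFORE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
"""
AFTER = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
"""

print(diff_reports(BEFORE, AFTER))
```

The attributes most worth watching between runs are probably Reallocated_Sector_Ct, Reallocated_Event_Count, Current_Pending_Sector and Offline_Uncorrectable.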

May 2, 2009

This seems to be a fairly new drive.

There do not seem to be a lot of power-on hours, and if you get more errors, then it's a sign of impending failure.

You can probably trust this for a short time at this point. Next time it gets a read failure, do an RMA.

Usually when I get a read failure, I RMA the drive unless there is data on it I want to get off.

May 2, 2009

To my surprise, it ran through all steps without an error. With smartctl I saw that Current_Pending_Sector went to zero (without any reallocated sectors).

I've read of what you described, but this is the first time I've heard of somebody specifically noticing it on an unRAID disk.

When a sector "read" fails, the sector is marked for possible "re-allocation" the next time it is written to. The counter for "current_pending_sectors" is incremented showing you how many sectors are waiting for a subsequent write.

Now, when you eventually "write" to a sector marked for possible "re-allocation", the data is FIRST re-written to the original sector, then re-read and compared. If the "read" is successful, there is no need to re-allocate at all. It was the original "write" that had left the sector un-readable, and the subsequent "write" was successful. (Who knows why the original write failed, but it really does not matter.)

So, it is possible for the current_pending_sectors to go to zero, and not have any re-allocated sectors...
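
To make the sequence concrete, here is a toy Python model of the behaviour described above. It is only a sketch of the idea, not how drive firmware is actually implemented; the class and function names and the media_ok flag are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Sector:
    readable: bool = True    # can the data currently on the sector be read back?
    media_ok: bool = True    # hypothetical flag: does a fresh write to this spot verify?
    pending: bool = False

@dataclass
class Counters:
    current_pending: int = 0
    reallocated: int = 0

def read(sector, counters):
    """A failed read flags the sector as pending; nothing is remapped yet."""
    if not sector.readable and not sector.pending:
        sector.pending = True
        counters.current_pending += 1
    return sector.readable

def write(sector, counters):
    """A write to a pending sector is verified; only a failed verify causes a remap."""
    if sector.pending:
        sector.pending = False
        counters.current_pending -= 1
        if not sector.media_ok:
            counters.reallocated += 1   # data goes to a spare sector instead
    sector.readable = True              # the original spot or the spare now holds good data

counters = Counters()
weak_write = Sector(readable=False, media_ok=True)   # bad earlier write, healthy media
read(weak_write, counters)    # the read fails, so the sector becomes pending
write(weak_write, counters)   # the rewrite verifies, so nothing is reallocated
print(counters)
```

With media_ok=False the same write would instead bump the reallocated count, which is the case where Reallocated_Sector_Ct goes up.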

I'd do as suggested, use the preclear script, it is easiest and will do its best to keep you from clearing a drive you did not intend to clear.

May 2, 2009

Author

Thanks for the responses.

I added the disk to my array and started copying data to it. I'll keep an eye on the SMART reports...

I'd do as suggested, use the preclear script, it is easiest and will do its best to keep you from clearing a drive you did not intend to clear.

I used your preclear script three times on this "bad" disk and also on the new one (thanks for this).

Bye

May 3, 2009

Couldn't the error be related not to the disk but to the SATA cable or the PSU?

That could explain why the preclear script and the SMART tests didn't report any errors.

A bad cable works fine almost all the time and fails only from time to time.

May 3, 2009

Author

I don't think it was the cable, because that was the first thing I changed.

The only thing I ask myself at this point:

From what I have read, unRAID writes the reconstructed data back to a sector if it gets a read error.

So why did I see pending sectors (Current_Pending_Sector)? If the write was successful, the sector is no longer pending, and if the write failed, I should see a reallocated sector.

I also assumed that for each error during a parity check, I would see the write count for this disk increase (one write per error), but I had one run with 53 errors and only one write.

Just for my understanding, what is wrong with this expectation?

(Sorry for my bad English, I hope you can understand what I mean.)

May 3, 2009

1. A reallocated sector (or pending reallocated sector) cannot be caused by a bad cable. Bad cables can cause all sorts of other problems, but not this one.

2. A pending reallocated sector, according to the troubleshooting wiki, gets tested once more before being marked bad. If that final test is successful, the sector is put back in service. (I think RobJ added this section. RobJ, if you have a source of this info I'd be interested in reading more about how this works). That would explain how your pending reallocations just went away.

3. The read and write counts on the disks are stats kept by the OS, and I'm not sure exactly what they are counting. I've had some peculiar and inconsistent results.

4. The error column on a disk shows READ errors too. A write error would cause the disk to be taken out of service.

5. Given what I've read in this thread, it seems like you have a finicky drive that some of the time is fine and some of the time is not. Personally, I'd do my best to get it replaced. You will never fully trust this drive.

May 4, 2009

I don't think it was the cable, because that was the first thing I changed.

It certainly was not the cable. There was an internal SMART Read test performed and there were read failures.

This had nothing to do with the cable or the controller. It was all internal to the drive.
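
For what it's worth, the self-test log posted above can be summarised with a few lines of Python. This is a rough sketch: the function name is made up, the sample rows are copied from the log in the first post, and the 512-byte sector size is an assumption.

```python
def failing_lbas(log_text):
    """Collect LBA_of_first_error from every 'read failure' row of a smartctl self-test log."""
    lbas = []
    for line in log_text.splitlines():
        if line.startswith("#") and "read failure" in line:
            last = line.split()[-1]          # last column is LBA_of_first_error
            if last.isdigit():               # rows without an error show "-"
                lbas.append(int(last))
    return lbas

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors

# A few rows copied from the self-test log in the first post.
SAMPLE = """\
# 1  Extended offline    Completed without error       00%      1939         -
# 3  Short offline       Completed: read failure       90%      1890         351130716
# 4  Extended offline    Completed: read failure       80%      1875         351127623
# 9  Extended offline    Completed: read failure       80%      1860         351155054
#20  Extended offline    Completed: read failure       90%      1824         351125639
"""

lbas = failing_lbas(SAMPLE)
spread = max(lbas) - min(lbas)
print("distinct failing LBAs:", sorted(set(lbas)))
print("spread in sectors:", spread)
print("spread in bytes:", spread * SECTOR_BYTES)
```

In the posted log every failing LBA lies between 351125639 and 351155054, a span of 29,415 sectors (about 15 MB under the 512-byte assumption), which looks like one small weak area on the drive.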

From what I have read, unRAID writes the reconstructed data back to a sector if it gets a read error.

Was the drive full? Was data actually read from the sector?

These are things you may not know unless you review the syslog history over time.

There could be bad sectors that have not been accessed before.

So why did I see pending sectors (Current_Pending_Sector)? If the write was successful, the sector is no longer pending, and if the write failed, I should see a reallocated sector.

When you did the pre-clear you wrote to every sector.

This may have refreshed every sector header and then the read verify was probably successful.
