May 2, 2009

Hi,

After I had errors with one of my disks (http://lime-technology.com/forum/index.php?topic=3715.0), I read a lot about SMART, badblocks, pending sectors and so on, but none of it helped me get the disk error-free. Every long SMART test also stopped with a "read failure". Because of this, I bought a new disk and replaced the bad one.

Just out of interest, I tried the preclear_disk script on the bad disk, and to my surprise it ran through all steps without an error. With smartctl I saw that current_pending_sectors went to zero (without any reallocated sectors). So I started a long SMART test. This time the test ran to the end without any error, for the first time!

To see whether this was a random result, I ran the preclear script again, this time with count=2. Again the script finished without any error, and a second full SMART test also completed without an error.

So my question is: can I trust this disk and put it into my array? Are there any other tests I can do? (I'm a Linux rookie.)

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EADS-00L5B1
Serial Number:    WD-WCAU46045241
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat May 2 16:48:05 2009 GMT-1
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x05) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline data collection: (24000) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine recommended polling time:      (   2) minutes.
Extended self-test routine recommended polling time:   ( 255) minutes.
Conveyance self-test routine recommended polling time: (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   161   158   021    Pre-fail  Always       -       6933
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       362
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1940
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       44
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       7
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       360
194 Temperature_Celsius     0x0022   116   107   000    Old_age   Always       -       34
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1939         -
# 2  Extended offline    Completed without error       00%      1924         -
# 3  Short offline       Completed: read failure       90%      1890         351130716
# 4  Extended offline    Completed: read failure       80%      1875         351127623
# 5  Short offline       Completed without error       00%      1874         -
# 6  Extended offline    Completed: read failure       80%      1862         351130716
# 7  Extended offline    Completed: read failure       90%      1861         351130716
# 8  Extended offline    Completed: read failure       90%      1861         351130716
# 9  Extended offline    Completed: read failure       80%      1860         351155054
#10  Extended offline    Completed: read failure       90%      1860         351130716
#11  Extended offline    Completed: read failure       90%      1838         351130716
#12  Extended offline    Aborted by host               90%      1838         -
#13  Extended offline    Aborted by host               60%      1838         -
#14  Short offline       Completed without error       00%      1836         -
#15  Short offline       Completed without error       00%      1836         -
#16  Short offline       Completed: read failure       90%      1829         351130716
#17  Short offline       Completed: read failure       90%      1829         351130716
#18  Short offline       Completed: read failure       90%      1829         351130716
#19  Short offline       Completed without error       00%      1829         -
#20  Extended offline    Completed: read failure       90%      1824         351125639
#21  Short offline       Completed: read failure       90%      1824         351125639

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
May 2, 2009

You can probably trust the disk at this point. The preclear_disk procedure does a good job of stressing the disk and showing whether it truly is bad. I would probably run another preclear on the disk, then run another long SMART test and compare the results. If they look the same, I'd say use the disk, but keep an eye on it to be on the safe side.
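For anyone who wants to do the "compare the results" step without eyeballing two reports, here is a minimal Python sketch. It assumes you have saved the attribute table from two smartctl runs as text. The function names, the `WATCHED` list, and the "before" sample (with a made-up pending count of 3) are illustrative assumptions, not values from this thread; the "after" rows are the ones posted above.

```python
import re

# Attributes most relevant to this thread, named as smartctl prints them.
WATCHED = ("Reallocated_Sector_Ct", "Reallocated_Event_Count",
           "Current_Pending_Sector", "Offline_Uncorrectable")

# One attribute row: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED,
# WHEN_FAILED, RAW_VALUE (only the leading number of the raw value is kept).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def parse_attributes(report: str) -> dict:
    """Map attribute name -> raw value for every attribute row found."""
    raw = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            raw[m.group(2)] = int(m.group(3))
    return raw

def changed(before: str, after: str) -> dict:
    """Return {name: (old_raw, new_raw)} for watched attributes that differ."""
    old, new = parse_attributes(before), parse_attributes(after)
    return {name: (old.get(name), new.get(name))
            for name in WATCHED if old.get(name) != new.get(name)}

# Hypothetical "before" report: the pending count of 3 is invented for the demo.
BEFORE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
"""
# "After" rows as posted in this thread.
AFTER = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
"""

print(changed(BEFORE, AFTER))
```

An empty result would mean none of the watched raw values moved between the two runs, which is the "they look the same" case.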
May 2, 2009

This seems to be a fairly new drive. It does not have a lot of power-on hours, so if you get more errors, it's a sign of impending failure. You can probably trust it for a short time at this point. The next time it gets a read failure, do an RMA. Usually when I get a read failure, I RMA the drive unless there is data on it that I want to get off.
May 2, 2009

"and to my surprise it ran through all steps without an error. With smartctl I saw that current_pending_sectors went to zero (without any reallocated sectors)."

I've read of what you described, but this is the first time I've heard of somebody specifically noticing it on an unRAID disk.

When a sector "read" fails, the sector is marked for possible "re-allocation" the next time it is written to. The counter for "current_pending_sectors" is incremented, showing you how many sectors are waiting for a subsequent write.

Now, when you eventually "write" to a sector marked for possible "re-allocation", the data is FIRST re-written to the original sector, then re-read and compared. If the "read" is successful, there is no need to re-allocate at all. It was the original "write" that had written a track that was unreadable, and the subsequent "write" was successful. (Who knows why the original write failed, but it really does not matter.)

So it is possible for current_pending_sectors to go to zero without any re-allocated sectors.

I'd do as suggested and use the preclear script. It is easiest and will do its best to keep you from clearing a drive you did not intend to clear.
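The pending/re-allocate behaviour described above can be sketched as a toy model in Python. This is only an illustration of the logic in this post, not real drive firmware: the `ToyDrive` class, its fields, and the split between "weak" sectors (badly written contents) and "damaged" sectors (bad media) are assumptions made up for the example.

```python
from dataclasses import dataclass, field

@dataclass
class ToyDrive:
    """Toy model of pending vs. re-allocated sectors; not real firmware logic."""
    weak: set = field(default_factory=set)         # contents unreadable, media fine
    damaged: set = field(default_factory=set)      # media itself is bad (hypothetical)
    pending: set = field(default_factory=set)      # what Current_Pending_Sector counts
    reallocated: set = field(default_factory=set)  # what Reallocated_Sector_Ct counts

    def read(self, lba: int) -> bool:
        """A failed read only marks the sector as pending; nothing is remapped yet."""
        if lba in self.reallocated:
            return True  # served from a spare sector
        if lba in self.weak or lba in self.damaged:
            self.pending.add(lba)
            return False
        return True

    def write(self, lba: int) -> None:
        """A write to a pending sector is re-read; remap only if that still fails."""
        self.weak.discard(lba)  # fresh data replaces the unreadable contents
        if lba in self.pending:
            self.pending.discard(lba)
            if lba in self.damaged:  # verify after write failed
                self.reallocated.add(lba)

drive = ToyDrive(weak={351130716}, damaged={42})
drive.read(351130716)
drive.read(42)
print(len(drive.pending), len(drive.reallocated))   # two pending, none re-allocated

# A full-surface write pass (what a preclear does) touches every pending sector.
for lba in (351130716, 42):
    drive.write(lba)
print(len(drive.pending), len(drive.reallocated))
```

In this model the weak sector simply leaves the pending list after being rewritten, while the damaged one ends up re-allocated, which matches the two outcomes described in the post.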
May 2, 2009 Author

Thanks for the responses. I added the disk to my array and started copying data to it. I'll keep an eye on the SMART reports.

"I'd do as suggested, use the preclear script, it is easiest and will do its best to keep you from clearing a drive you did not intend to clear."

I used your preclear script three times on this "bad" disk and also on the new one (thanks for this).

by
May 3, 2009

Couldn't the error be related not to the disk but to the SATA cable or the PSU? That could explain why the preclear script and the SMART tests didn't report any errors. A bad cable works fine almost all the time and fails only from time to time.
May 3, 2009 Author

I don't think it was the cable, because that was the first thing I changed.

The only thing I ask myself at this point: from what I've read, unRAID writes the reconstructed data to a sector if it gets a read error. So why did I see current_pending_sectors? If the write was successful, it is not a pending sector, and if the write fails, I should see a reallocated sector.

I also assumed that for each error in a parity check I would see the write count increase for this disk (one write per error), but I had one run with 53 errors and only one write.

Just for my understanding, what is wrong with this expectation?

(Sorry for my bad English, I hope you can understand what I mean.)
May 3, 2009

1. A reallocated sector (or pending reallocated sector) cannot be caused by a bad cable. Bad cables can cause all sorts of other problems, but not this one.
2. A pending reallocated sector, according to the troubleshooting wiki, gets tested once more before being marked bad. If that final test is successful, the sector is put back in service. (I think RobJ added this section. RobJ, if you have a source for this info, I'd be interested in reading more about how this works.) That would explain how your pending reallocations just went away.
3. The read and write counts on the disks are stats kept by the OS, and I'm not sure exactly what they are counting. I've had some peculiar and inconsistent results.
4. The error column on a disk shows READ errors too. A write error would cause the disk to be taken out of service.
5. Given what I've read in this thread, it seems like you have a finicky drive that is fine some of the time and not at other times. Personally, I'd do my best to get it replaced. You will never fully trust this drive.
May 4, 2009

"I don't think it was the cable, because that was the first thing I changed."

It certainly was not the cable. An internal SMART read test was performed and there were read failures. That had nothing to do with the cable or controller. It was all internal to the drive.

"From what I've read, unRAID writes the reconstructed data to a sector if it gets a read error."

Was the drive full? Was data actually read from the sector? These are things you may not know unless you review the syslog history over time. There could be bad sectors that had not been accessed before.

"So why did I see current_pending_sectors? If the write was successful, it is not a pending sector, and if the write fails, I should see a reallocated sector."

When you did the preclear, you wrote to every sector. This may have refreshed every sector header, and then the read verify was probably successful.
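To back up the "internal to the drive" point, it can help to see how tightly the failing LBAs in the self-test log cluster. Below is a minimal Python sketch that pulls the LBA_of_first_error values out of a self-test log. The rows are an excerpt of the log posted at the top of this thread; the regular expression and the function name `failed_lbas` are assumptions made for the example, and the log layout is assumed to match what is shown here.

```python
import re

# Excerpt of the self-test log posted at the top of this thread.
SELFTEST_LOG = """\
# 1  Extended offline    Completed without error       00%      1939         -
# 2  Extended offline    Completed without error       00%      1924         -
# 3  Short offline       Completed: read failure       90%      1890         351130716
# 4  Extended offline    Completed: read failure       80%      1875         351127623
# 9  Extended offline    Completed: read failure       80%      1860         351155054
#20  Extended offline    Completed: read failure       90%      1824         351125639
"""

# Num, test type, status, remaining %, lifetime hours, LBA of first error.
ROW = re.compile(
    r"^#\s*(\d+)\s+(Short|Extended|Conveyance) offline\s+(.+?)"
    r"\s+(\d+)%\s+(\d+)\s+(\d+|-)\s*$"
)

def failed_lbas(log: str) -> list:
    """Return the first-error LBA of every self-test that hit a read failure."""
    lbas = []
    for line in log.splitlines():
        m = ROW.match(line)
        if m and "read failure" in m.group(3) and m.group(6) != "-":
            lbas.append(int(m.group(6)))
    return lbas

lbas = failed_lbas(SELFTEST_LOG)
span = max(lbas) - min(lbas) + 1   # sectors between lowest and highest failure
print(f"{len(lbas)} failures between LBA {min(lbas)} and {max(lbas)} ({span} sectors)")
```

In this excerpt every failure falls inside one narrow band of sectors, which points at one small area of the platter rather than at a cable or controller.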
Archived
This topic is now archived and is closed to further replies.