peter_sm Posted July 25, 2010

Hi,

I'm concerned about one disk: the main page says it has errors :-( It looks like I got these errors yesterday when I did a parity check. I'm running a new parity check right now to see whether those errors can be corrected.

This is what I can see in the report:

196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       4

Below is the full SMART result. Could someone tell me whether this disk is bad, or whether it is possible to fix those errors?

// Peter

smartctl -a -d ata /dev/sde
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EAVS-00D7B0
Serial Number:    WD-WCAU40258662
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Jul 25 11:42:26 2010 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x85)
    Offline data collection activity was aborted by an interrupting command from host.
    Auto Offline Data Collection: Enabled.
Self-test execution status: ( 114)
    The previous self-test completed having the read element of the test failed.
Total time to complete Offline data collection: (23400) seconds.
Offline data collection capabilities: (0x7b)
    SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003)
    Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01)
    Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x303f)
    SCT Status supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   148   144   021    Pre-fail  Always       -       7558
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3922
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       12329
 10 Spin_Retry_Count        0x0032   100   100   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       820
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       20
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3921
194 Temperature_Celsius     0x0022   126   102   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure   20%        12329            4921505

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
    After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
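For anyone who wants to watch these counters over time instead of reading the whole report by eye, here is a minimal sketch in Python. It assumes the attribute table has been saved as text with one attribute per line; the function names and the list of "watched" attributes are my own choices for illustration, not anything smartctl or unRAID defines.

```python
import re

# Attribute rows copied from the smartctl report above (one attribute per line).
REPORT = """\
  5 Reallocated_Sector_Ct   0x0033 200 200 140 Pre-fail Always  - 0
196 Reallocated_Event_Count 0x0032 199 199 000 Old_age  Always  - 1
197 Current_Pending_Sector  0x0032 200 200 000 Old_age  Always  - 4
198 Offline_Uncorrectable   0x0030 200 200 000 Old_age  Offline - 0
"""

# ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

# Assumption: these are the attributes worth flagging when their raw count
# is non-zero. The list is illustrative, not an official one.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_attributes(text):
    """Return {attribute name: {"value", "worst", "thresh", "raw"}}."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            _id, name, value, worst, thresh, raw = match.groups()
            attrs[name] = {
                "value": int(value),
                "worst": int(worst),
                "thresh": int(thresh),
                "raw": int(raw),
            }
    return attrs


def nonzero_watched(attrs):
    """Return the watched attributes whose raw count is above zero."""
    return {
        name: attrs[name]["raw"]
        for name in WATCHED
        if name in attrs and attrs[name]["raw"] > 0
    }


if __name__ == "__main__":
    print(nonzero_watched(parse_attributes(REPORT)))
```

Run against the rows above, this flags only Reallocated_Event_Count and Current_Pending_Sector, the same two lines singled out at the top of the post.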
Joe L. Posted July 25, 2010

A disk error reported on the unRAID main page is a "read" error. Basically, the disk reported to the OS that it could not read a sector. When unRAID's driver gets this, it re-constructs what could not be read by reading all the other disks in the array, and presents the re-constructed sector to you (the one that could not be read from the physical disk).

unRAID also writes the sector back to the disk that could not read it. This gives the firmware on the disk a chance to re-allocate the un-readable sector to one of its spare sectors. It can re-allocate the sector because the "write" to it supplies the contents. On a typical disk these days there are several thousand spare sectors.

You apparently have several other sectors that have been detected as un-readable at some point. When they are next read (or written) they too will be re-allocated (or not, since the firmware first tries to write to the original location, just in case it was not written originally in a way that could be read back).

The "normalized" value for re-allocated sectors probably started at 200. An attribute is considered "failed" when its normalized value drops to the value in the "threshold" column (zero for Reallocated_Event_Count in your report).

You cannot do anything to change the errors. When you next "write" to those sectors they will be re-allocated. The only way to "force" a write of the entire disk would be to:

1. Run a full parity check (which you've just done) to make sure no other problems exist.
2. Stop the array.
3. Un-assign the disk.
4. Start the array with the disk un-assigned. This will simulate its failure and cause the array to forget its model/serial number.
5. Stop the array a second time.
6. Re-assign the disk.
7. Start the array once more by pressing "Start".

The array will then re-construct the disk by reading the others in the array, and when it gets to the sectors needing re-allocation they will be re-allocated if the original sector is un-writable.

You will be without parity protection while the disk is being re-constructed. The re-construction of a 1TB drive will take at least 8 hours. If your array is not stable enough to be without parity protection for that long, you should make copies of any critical files elsewhere or on multiple disks.

Joe L.
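To make the "re-construct by reading all the other disks" idea concrete, here is a small sketch in Python. It assumes simple single-parity XOR across the disks, which is my assumption about how the parity works in principle and not a description of unRAID's actual driver code; the function names and the tiny two-byte "sectors" are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the one missing data block from the survivors plus parity.

    Assumption: parity is the XOR of all data blocks, so XOR-ing the
    parity with every readable block leaves exactly the missing block.
    """
    return xor_blocks(list(surviving_blocks) + [parity_block])


if __name__ == "__main__":
    # Three hypothetical data disks, each holding one tiny "sector".
    disk1, disk2, disk3 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
    parity = xor_blocks([disk1, disk2, disk3])

    # Pretend disk2 returned a read error for this sector.
    rebuilt = reconstruct([disk1, disk3], parity)
    print(rebuilt == disk2)
```

The rebuilt bytes are what would be handed to the reader and, per the explanation above, written back to the failing disk so its firmware gets the chance to re-allocate the sector.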
peter_sm Posted July 25, 2010

Thanks Joe! Great answer. So I should not be worried :-)

I only write to the disk through the disk shares from my Windows PC, never through the user shares. So perhaps it's OK to do the procedure that you suggest.

EDIT

This is the new SMART report. There is a change in Current_Pending_Sector: it is now 3, before it was 4.

196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3

smartctl -a -d ata /dev/sde
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EAVS-00D7B0
Serial Number:    WD-WCAU40258662
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Jul 25 18:12:49 2010 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84)
    Offline data collection activity was suspended by an interrupting command from host.
    Auto Offline Data Collection: Enabled.
Self-test execution status: ( 114)
    The previous self-test completed having the read element of the test failed.
Total time to complete Offline data collection: (23400) seconds.
Offline data collection capabilities: (0x7b)
    SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003)
    Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01)
    Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x303f)
    SCT Status supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   148   144   021    Pre-fail  Always       -       7558
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3923
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       12335
 10 Spin_Retry_Count        0x0032   100   100   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       820
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       20
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3922
194 Temperature_Celsius     0x0022   129   102   000    Old_age   Always       -       21
196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure   20%        12329            4921505

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
    After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Info from the syslog:

Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923248/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923256/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923264/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923272/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923280/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923288/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923296/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923304/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923312/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923320/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923328/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923336/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923344/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923352/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923360/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923368/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923376/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923384/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923392/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923400/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923408/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923416/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923424/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923432/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923440/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923448/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923456/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923464/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923472/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923480/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923488/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923496/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923504/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923512/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923520/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923528/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923536/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923544/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923552/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923560/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923568/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923576/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923584/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923592/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923600/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923608/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923616/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923624/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923632/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923640/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923648/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923656/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923664/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923672/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923680/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923688/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923696/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923704/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923712/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923720/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923728/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923736/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923744/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923752/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923760/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923768/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923776/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923784/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923792/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923800/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923808/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923816/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923824/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923832/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923840/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923848/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923856/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923864/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923872/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923880/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923888/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923896/4, count: 1
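A long run of near-identical syslog lines like this is easier to judge once it is boiled down to a range. Below is a rough Python sketch that does so. The regular expression is written against the lines shown in this post only; treating the number after the slash as the disk index is my guess from the "disk4" messages, not something documented here, and the function name is made up.

```python
import re

# A few lines copied from the syslog above; the full log can be fed in as-is.
SAMPLE = """\
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923248/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923256/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923264/4, count: 1
"""

# Groups: sector number, the value after the slash (assumed to be the disk
# index), and the count field.
PATTERN = re.compile(r"handle_stripe read error: (\d+)/(\d+), count: (\d+)")


def summarize_read_errors(log_text):
    """Summarize handle_stripe read errors as first/last sector, count, steps."""
    sectors = sorted({int(m.group(1)) for m in PATTERN.finditer(log_text)})
    if not sectors:
        return None
    # Distinct gaps between consecutive reported sectors; a single value
    # means the errors form one evenly spaced run.
    steps = sorted({b - a for a, b in zip(sectors, sectors[1:])})
    return {
        "first": sectors[0],
        "last": sectors[-1],
        "count": len(sectors),
        "steps": steps,
    }


if __name__ == "__main__":
    print(summarize_read_errors(SAMPLE))
```

Because it uses `finditer` over the whole text, it does not matter whether the log is pasted one entry per line or as one long run.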
Joe L. Posted July 25, 2010

peter_sm wrote: "Thanks Joe! Great answer, So I should not be worried :-) What I do is only write to the disk from the disk shares from my windows PC ,never to the user shares."

User-shares or disk-shares... Makes absolutely no difference how you are writing to the disks. You could be doing a Linux "cp" (copy) command and you would be writing to the disks, again no difference.

peter_sm wrote: "So perhaps it's OK to do the procedure that you suggest. EDIT This is the new smart report, there is a change in the Current_Pending_Sector it's now 3, before it's was 4."

196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3

Info from the syslog
Jul 25 
11:48:19 Tower kernel: md: disk4 read error Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923848/4, count: 1 Jul 25 11:48:19 Tower kernel: md: disk4 read error Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923856/4, count: 1 Jul 25 11:48:19 Tower kernel: md: disk4 read error Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923864/4, count: 1 Jul 25 11:48:19 Tower kernel: md: disk4 read error Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923872/4, count: 1 Jul 25 11:48:19 Tower kernel: md: disk4 read error Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923880/4, count: 1 Jul 25 11:48:19 Tower kernel: md: disk4 read error Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923888/4, count: 1 Jul 25 11:48:19 Tower kernel: md: disk4 read error Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923896/4, count: 1 I see you have no re-allocated sectors... That indicates that so far each time a read error occurred, that unRAID re-wrote the sector based on parity and the other data disks and that the actual sector was not re-allocated since the "write" was successful I'm guessing you did not pre-clear this disk with the preclear_disk.sh script?? Or did you? Joe L. Link to comment
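A long run of read errors like the one in this syslog is easier to judge once it is condensed. The following is a minimal Python sketch, not part of unRAID: the function name `summarize_read_errors` is made up for illustration, and reading the `N/4` field as "sector N on disk 4" is an assumption based on the accompanying `md: disk4 read error` lines.

```python
import re

# Matches lines such as:
#   "... kernel: handle_stripe read error: 4923592/4, count: 1"
# Assumption: the number before "/" is the sector, the number after is the disk slot.
STRIPE_ERROR = re.compile(r"handle_stripe read error: (\d+)/(\d+), count: (\d+)")


def summarize_read_errors(log_text):
    """Return {disk: (first_sector, last_sector, n_errors)} for handle_stripe errors."""
    sectors = {}
    for sector, disk, _count in STRIPE_ERROR.findall(log_text):
        sectors.setdefault(int(disk), []).append(int(sector))
    return {disk: (min(s), max(s), len(s)) for disk, s in sectors.items()}


# A few lines copied from the syslog excerpt in this thread.
sample = """\
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923592/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923600/4, count: 1
Jul 25 11:48:19 Tower kernel: md: disk4 read error
Jul 25 11:48:19 Tower kernel: handle_stripe read error: 4923608/4, count: 1
"""

summary = summarize_read_errors(sample)
for disk, (first, last, n) in summary.items():
    print(f"disk{disk}: {n} read errors, sectors {first}..{last}")
```

A single contiguous range on one disk, as here, points at one damaged area rather than errors scattered across the drive.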
peter_sm (Author) Posted July 25, 2010

Hi Joe, I did not clear the disk. Should I move/copy all data to a new disk and then clear the disk? If so, I will move the data from the console.

//Peter
Joe L. Posted July 25, 2010

It is not enough to move the data... you must also remove the disk from the array (un-assign it).

I would not worry about it. Just run a SMART report every now and again.

The pre-clear script would have identified the un-readable sectors in the pre-read phase, then written zeros to them (re-allocating them if appropriate), and then re-read them in the post-read phase to ensure that what was written as zeros is read back as zeros and that no un-readable sectors remain.

Since you did not do that as a pre-clearing process, you'll find your un-readable sectors only when attempting to read your files (or during a parity check, which reads all the sectors). That is why we stress doing a parity check immediately after doing the initial parity calc.

Joe L.
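For the "run a SMART report every now and again" advice, it can help to pull just the sector-health counters out of the `smartctl -a` text. This is a minimal Python sketch under stated assumptions: `parse_smart_attributes` and the `WATCHED` set are illustrative names, not part of smartmontools, and the parser assumes the ten-column attribute table layout shown in the reports posted in this thread.

```python
# Attributes worth watching for bad-sector trouble; an illustrative choice,
# not an official list.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_smart_attributes(report):
    """Map attribute name -> raw value for rows of the smartctl attribute table."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Expected columns: ID, name, flag, value, worst, thresh, type,
        # updated, when_failed, raw_value.
        if (len(fields) >= 10 and fields[0].isdigit()
                and fields[2].startswith("0x") and fields[9].isdigit()):
            values[fields[1]] = int(fields[9])
    return values


# Rows copied from the first SMART report in this thread.
report = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
9 Power_On_Hours 0x0032 084 084 000 Old_age Always - 12329
196 Reallocated_Event_Count 0x0032 199 199 000 Old_age Always - 1
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 4
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
"""

attributes = parse_smart_attributes(report)
watched = {name: attributes[name] for name in WATCHED if name in attributes}
for name, raw in watched.items():
    print(f"{name}: {raw}")
```

Saving the watched values from each report makes it easy to see whether the pending-sector count is stable or growing.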
peter_sm (Author) Posted July 25, 2010

Hi Joe, I have some questions. What would the steps be if I want to clear the disk? I need to be sure I back up all data to a new disk first; must that disk be outside the array?

You say.....

1. Stop the array
2. Un-assign the disk
3. Start the array with it un-assigned. This will simulate its failure and cause the array to forget its model/serial number
4. Stop the array a second time
5. Re-assign the disk
6. "Start" the array once more by pressing "Start"

.....but when should I do the pre-clear? And will the above procedure rebuild my data?
Joe L. Posted July 25, 2010

You would perform the pre-clear after you un-assign the disk and re-start the array, but before you re-assign the disk to the array.

Yes, it will rebuild the data back onto the existing (but pre-cleared) disk.

Before you do anything, you'll want to copy any critical files off your array, or at least copy them to multiple disks, since you'll be without parity protection for more than a day. Figure 20 hours or so to run the preclear script on the drive, and another 8 or so to rebuild it.

Joe L.
peter_sm (Author) Posted July 26, 2010

Thanks Joe, everything is clear now.

//Peter
peter_sm (Author) Posted July 28, 2010

Joe! I did a pre-clear of the above disk, and I'm going to rebuild the disk, but I am very concerned about all the messages I got. Should I trust the disk?

//Peter

= unRAID server Pre-Clear disk /dev/sdf
= cycle 1 of 1
= Disk Pre-Clear-Read completed DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward. DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE
= Step 5 of 10 - Clearing MBR code area DONE
= Step 6 of 10 - Setting MBR signature bytes DONE
= Step 7 of 10 - Setting partition 1 to precleared state DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries DONE
= Step 10 of 10 - Testing if the clear has been successful. DONE
= Disk Post-Clear-Read completed DONE
Disk Temperature: 26C, Elapsed Time: 18:07:09
============================================================================
==
== Disk /dev/sdf has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
64,65c64,65
< 196 Reallocated_Event_Count 0x0032 199 199 000 Old_age Always - 1
< 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 3
---
> 196 Reallocated_Event_Count 0x0032 198 198 000 Old_age Always - 2
> 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 73
71c71,166
< No Errors Logged
---
> Warning: ATA error count 713 inconsistent with error log pointer 1
>
> ATA Error Count: 713 (device log contains only the most recent five errors)
> CR = Command Register [HEX]
> FR = Features Register [HEX]
> SC = Sector Count Register [HEX]
> SN = Sector Number Register [HEX]
> CL = Cylinder Low Register [HEX]
> CH = Cylinder High Register [HEX]
> DH = Device/Head Register [HEX]
> DC = Device Command Register [HEX]
> ER = Error register [HEX]
> ST = Status register [HEX]
> Powered_Up_Time is measured from power on, and printed as
> DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
> SS=sec, and sss=millisec. It "wraps" after 49.710 days.
>
> Error 713 occurred at disk power-on lifetime: 12384 hours (516 days + 0 hours)
> When the command that caused the error occurred, the device was active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 69 9e 06 e0 Error: UNC at LBA = 0x00069e69 = 433769
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> c8 00 08 68 9e 06 00 08 1d+12:42:59.687 READ DMA
> ef 10 02 00 00 00 00 08 1d+12:42:59.686 SET FEATURES [Reserved for Serial ATA]
> ec 00 00 00 00 00 00 08 1d+12:42:59.686 IDENTIFY DEVICE
> ef 03 46 00 00 00 00 08 1d+12:42:59.686 SET FEATURES [set transfer mode]
>
> Error 712 occurred at disk power-on lifetime: 12384 hours (516 days + 0 hours)
> When the command that caused the error occurred, the device was active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 69 9e 06 e0 Error: UNC at LBA = 0x00069e69 = 433769
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> c8 00 08 68 9e 06 00 08 1d+12:42:55.713 READ DMA
> ef 10 02 00 00 00 00 08 1d+12:42:55.712 SET FEATURES [Reserved for Serial ATA]
> ec 00 00 00 00 00 00 08 1d+12:42:55.712 IDENTIFY DEVICE
> ef 03 46 00 00 00 00 08 1d+12:42:55.712 SET FEATURES [set transfer mode]
>
> Error 711 occurred at disk power-on lifetime: 12384 hours (516 days + 0 hours)
> When the command that caused the error occurred, the device was active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 69 9e 06 e0 Error: UNC at LBA = 0x00069e69 = 433769
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> c8 00 08 68 9e 06 00 08 1d+12:42:51.606 READ DMA
> ef 10 02 00 00 00 00 08 1d+12:42:51.605 SET FEATURES [Reserved for Serial ATA]
> ec 00 00 00 00 00 00 08 1d+12:42:51.605 IDENTIFY DEVICE
> ef 03 46 00 00 00 00 08 1d+12:42:51.605 SET FEATURES [set transfer mode]
>
> Error 710 occurred at disk power-on lifetime: 12384 hours (516 days + 0 hours)
> When the command that caused the error occurred, the device was active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 69 9e 06 e0 Error: UNC at LBA = 0x00069e69 = 433769
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> c8 00 08 68 9e 06 00 08 1d+12:42:47.333 READ DMA
> ef 10 02 00 00 00 00 08 1d+12:42:47.332 SET FEATURES [Reserved for Serial ATA]
> ec 00 00 00 00 00 00 08 1d+12:42:47.332 IDENTIFY DEVICE
> ef 03 46 00 00 00 00 08 1d+12:42:47.332 SET FEATURES [set transfer mode]
>
> Error 709 occurred at disk power-on lifetime: 12384 hours (516 days + 0 hours)
> When the command that caused the error occurred, the device was active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 69 9e 06 e0 Error: UNC at LBA = 0x00069e69 = 433769
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> c8 00 08 68 9e 06 00 08 1d+12:42:43.359 READ DMA
> ef 10 02 00 00 00 00 08 1d+12:42:43.358 SET FEATURES [Reserved for Serial ATA]
> ec 00 00 00 00 00 00 08 1d+12:42:43.358 IDENTIFY DEVICE
> ef 03 46 00 00 00 00 08 1d+12:42:43.358 SET FEATURES [set transfer mode]
============================================================================
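The "UNC at LBA = 0x00069e69 = 433769" figure in the error log can be reproduced from the register bytes printed on the same row. The sketch below, in Python, uses the 28-bit LBA layout of the ATA task-file registers (SN = bits 0-7, CL = bits 8-15, CH = bits 16-23, low nibble of DH = bits 24-27); the function name `lba28_from_registers` is made up for illustration.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA.

    SN holds bits 0-7, CL bits 8-15, CH bits 16-23, and the low
    nibble of DH holds bits 24-27.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register bytes from the "After command completion" row: ER ST SC SN CL CH DH
row = "40 51 00 69 9e 06 e0"
er, st, sc, sn, cl, ch, dh = (int(byte, 16) for byte in row.split())

lba = lba28_from_registers(sn, cl, ch, dh)
uncorrectable = bool(er & 0x40)  # bit 6 of the error register is UNC

print(f"LBA = 0x{lba:08x} = {lba}, UNC = {uncorrectable}")
```

All five logged errors decode to the same LBA, which means the log shows one sector being retried, not five different bad sectors.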
prostuff1 Posted July 28, 2010

You need to run another preclear on this drive before you put data back on it and trust it. The pending-sector count should not keep going up; it should stabilize.

For new disks I always run at least 2 passes, with a third to be determined by the outcome of the first 2.
Joe L. Posted July 28, 2010

I agree with the previous post. You have 73 sectors pending re-allocation. That is not good.

The disk might be usable if a subsequent preclear_disk.sh pass results in no additional un-readable sectors. For the errors to be as currently reported, the sectors had to be un-readable in the post-read phase, since the "writing" of zeros to the entire drive should have performed any possible re-allocations of sectors that were un-readable in the pre-read phase.

I would perform another preclear_disk.sh on the disk. Unless the sectors pending re-allocation go away (get re-allocated, or get re-written in place) and the number of sectors pending re-allocation drops to 0, I would not trust this drive.

Joe L.
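The before/after comparison that the preclear report prints can be expressed as a small check. This is a hedged Python sketch, not the logic of preclear_disk.sh itself: `raw_value_changes` and `pending_cleared` are hypothetical helper names, and the "pending count must reach 0" rule is simply the rule of thumb stated in this thread.

```python
def raw_value_changes(before, after):
    """Return {attribute: (old, new)} for raw values that differ between two reports."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


def pending_cleared(report):
    """Rule of thumb from this thread: no sectors may remain pending re-allocation."""
    return report.get("Current_Pending_Sector", 0) == 0


# Raw values taken from the preclear diff posted above.
before = {"Reallocated_Event_Count": 1, "Current_Pending_Sector": 3}
after = {"Reallocated_Event_Count": 2, "Current_Pending_Sector": 73}

changes = raw_value_changes(before, after)
for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new}")
print("pending sectors cleared:", pending_cleared(after))
```

Running the same comparison after each additional preclear pass shows whether the counts have stabilized.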
peter_sm (Author) Posted July 28, 2010

Hi, the data rebuild was OK, but when I ran a parity check I got new errors on the main page. Right now I'm copying all the data to my new disk; then I'll take this disk off the array and test it more.

EDIT: I'm going to run preclear_disk.sh on this disk a few times and see what happens.

My new SMART results look like this:

smartctl -a -d ata /dev/sdf
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: WDC WD10EAVS-00D7B0
Serial Number: WD-WCAU40258662
Firmware Version: 01.01A01
User Capacity: 1,000,204,886,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Wed Jul 28 15:32:11 2010 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (23400) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x303f) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 152 144 021 Pre-fail Always - 7375
4 Start_Stop_Count 0x0032 097 097 000 Old_age Always - 3941
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 051 Old_age Always - 0
9 Power_On_Hours 0x0032 084 084 000 Old_age Always - 12399
10 Spin_Retry_Count 0x0032 100 100 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 051 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 826
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 20
193 Load_Cycle_Count 0x0032 199 199 000 Old_age Always - 3940
194 Temperature_Celsius 0x0022 124 102 000 Old_age Always - 26
196 Reallocated_Event_Count 0x0032 198 198 000 Old_age Always - 2
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 10
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 051 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 20% 12329 4921505

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
peter_sm (Author) Posted July 28, 2010

What is the best way to remove a disk from the array?
Joe L. Posted July 28, 2010

Please clarify your question.

Do you wish to remove a drive and not install a replacement?
Do you wish to remove a drive temporarily, so you can run a preclear on it, and then re-install it?
Do you wish to remove it and replace the drive with a different drive?
peter_sm (Author) Posted July 28, 2010

I was down in the basement and heard some "clicking", so it looks like the disk is not OK!

Joe: I want to take the disk off the array and not replace it. If it tests good afterwards, I will need to figure out whether I want the disk in the array or not (I would then add it as a "new one"). I'm copying the data to my new disk right now.

EDIT: Joe asked: Do you wish to remove a drive and not install a replacement? YES
Joe L. Posted July 28, 2010

OK, that is easy. After you finish copying the data to a new disk:

1. Stop the array.
2. Un-assign the drive you wish to remove.
3. Then do one of the following two, depending on your version of unRAID:
3a. If on a version of unRAID with a "restore" button on the main page, press it after checking the checkbox under it.
3b. If on one of the recent versions of unRAID where the "restore" button has been replaced by a command-line equivalent, log in via telnet or on the system console and type: initconfig
4. Press "refresh" on your web browser; all disks should show as "blue".
5. Press "Start". (A new initial parity calculation will begin. You'll be without parity protection until it is complete.)

Joe L.
peter_sm (Author) Posted July 28, 2010

Thanks Joe, I have version 4.5.6, and I didn't know about the new initconfig command.

I hope I can copy all my data (so far so good). There is some "funny" noise from the disk, and I don't know if I can trust my parity; if so, then I should remove my new disk and replace it with my bad disk.
Joe L. Posted July 28, 2010

Too many users of unRAID mistakenly pressed the button labeled "restore", thinking it would restore their data. It was actually an "Initialize Disk Configuration and Immediately Invalidate Parity" button. Parity is invalidated because it was based on the prior disk configuration. It does not affect data on existing data disks, but pressing the "Restore" button at the wrong time (when a disk has failed) would eliminate any ability to re-construct a failed drive.

For that reason the button was removed and, for now, replaced by the "initconfig" command on the command line. It might get put back on the web interface at some time in the future, but if it does, I hope it is re-named "Initialize Disk Configuration" to confuse fewer users.

Joe L.
Archived
This topic is now archived and is closed to further replies.