disco Posted August 25, 2010
Wow, wish I had seen that wiki post earlier. It's going to take a long time to add jumpers to all of my drives one by one and rebuild the data. Guess it's worth it though? Hmm.
disco Posted August 25, 2010
So it's only the WD20EARS, right? I have some WD20EADS drives as well... I'm assuming these are not affected?
BRiT Posted August 25, 2010
Older EADS drives do not use 4K sectors (Advanced Format, new format, etc.). However, I have heard mention that some newer EADS drives may be using 4K sectors as well.
jazzysmooth Posted August 25, 2010
(quoting disco) "I have some WD20EADS drives as well... I'm assuming these are not affected?"
You need to look at the label and see if it states Advanced Format Technology. I recently purchased two 1TB EADS drives and they have it.
DingHo Posted September 8, 2010
I just finished the preclear script on one drive and had some questions. I was preclearing this drive and another simultaneously. I get the message about "suspended by an interrupting command from host" and was just making sure everything is ok. I also attached the smartctl report below, taken just after preclear finished.

===========================================================================
= unRAID server Pre-Clear disk /dev/sdd
=                cycle 1 of 1
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Testing if the clear has been successful.     DONE
= Disk Post-Clear-Read completed                                DONE
Disk Temperature: 42C, Elapsed Time: 16:54:58
============================================================================
==
== Disk /dev/sdd has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
19,20c19,20
< Offline data collection status:  (0x82) Offline data collection activity
<                                         was completed without error.
---
> Offline data collection status:  (0x84) Offline data collection activity
>                                         was suspended by an interrupting command from host.
============================================================================
root@Scour:/boot#
Device Model:     WDC WD1001FALS-00K1B0
Serial Number:    WD-WMATV1553669
Firmware Version: 05.00K05
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Sep  8 16:03:21 2010 SGT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever been run.
Total time to complete Offline
data collection:                 (19200) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 221) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   229   226   021    Pre-fail  Always       -       8516
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2172
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       5172
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       773
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       123
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2172
194 Temperature_Celsius     0x0022   108   102   000    Old_age   Always       -       42
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Orbi Posted September 8, 2010
Looks good, I don't see any reallocated sectors or pending reallocations. Those are really the two lines (196 & 197) you want to keep an eye on.
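Orbi's watch-list is easy to pull straight out of a smartctl report. A minimal sketch (the function name and awk filter are mine, not part of the preclear script; it assumes smartctl's usual `-A` table layout, and adds attribute 5 and 198, which the thread also watches):

```shell
# Filter a `smartctl -A` report, read on stdin, down to the attributes
# worth watching on a precleared drive: 5 (Reallocated_Sector_Ct),
# 196 (Reallocated_Event_Count), 197 (Current_Pending_Sector),
# 198 (Offline_Uncorrectable). Prints ID, name, and raw value.
smart_watchlist() {
  awk '$1 == 5 || $1 == 196 || $1 == 197 || $1 == 198 { print $1, $2, $NF }'
}

# Typical use against a live drive (device path is an example):
#   smartctl -A /dev/sdd | smart_watchlist
```

The header row and all other attributes fall through the filter, so the output is just the four counters you compare between runs.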
DingHo Posted September 8, 2010
Thanks Orbi. What I'm also concerned about is "Offline data collection activity was suspended by an interrupting command". What is this about?
Joe L. Posted September 8, 2010
(quoting DingHo) "What I'm also concerned about is 'Offline data collection activity was suspended by an interrupting command'. What is this about?"
Many disks do their own tests when not being accessed. The "short" and "long" tests often mentioned are two types of those tests. Some drives also recalibrate themselves every so often. These are all considered off-line tests. The message simply indicates the off-line test was terminated when the drive was accessed. You can ignore it. You would get the same type of message if a long test was requested and you asked the disk to spin down.

Joe L.
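The status Joe describes is the parenthesized byte in the `smartctl -c` output (0x82 vs 0x84 in the diff above). A sketch that decodes just the two values seen in this thread, with the meanings taken from the smartctl text itself (the helper name and the "harmless" note are mine):

```shell
# Decode the "Offline data collection status" byte from `smartctl -c`.
# In the reports in this thread, bit 7 (0x80) is set when automatic
# offline collection is enabled; the low bits give the outcome of the
# last offline pass: 0x02 completed, 0x04 suspended by a host command.
offline_status() {
  code=$(( $1 ))
  if [ $(( code & 0x80 )) -ne 0 ]; then auto="auto-offline enabled"
  else auto="auto-offline disabled"; fi
  case $(( code & 0x7f )) in
    2) what="last pass completed without error" ;;
    4) what="last pass suspended by an interrupting command from host (harmless)" ;;
    *) what="other status" ;;
  esac
  echo "$auto; $what"
}

offline_status 0x84   # the value DingHo saw after preclear
offline_status 0x82   # the value before preclear
```

So 0x84 is just 0x82 with the "suspended" outcome instead of "completed", which is exactly what you'd expect after the preclear script interrupted the drive's own background test.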
CrashnBrn Posted September 9, 2010
Is this anything to be concerned about?

195 Hardware_ECC_Recovered  0x001a   060   048   000    Old_age   Always

Thanks.
CrashnBrn Posted September 9, 2010
No. 060 > 000 (the normalized value is above the threshold)
Thanks
barbapapa Posted September 13, 2010
First off, thanks again to Joe for a great script! I just precleared 2 WD20EARS drives (with pins 7 and 8 jumpered). I precleared them simultaneously using separate screen sessions. They both successfully precleared, but gave me the "Offline data collection activity was suspended by an interrupting command from host" warning. I gather from reading posts in the forum about it that I shouldn't need to worry about this. I am a little worried about the Raw_Read_Error_Rate and Current_Pending_Sector values for one of the drives though:

Sep 12 14:26:23 Tower preclear_disk-finish[30093]: ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
Sep 12 14:26:23 Tower preclear_disk-finish[30093]:   1 Raw_Read_Error_Rate     0x002f   197   195   051    Pre-fail  Always       -       291
Sep 12 14:26:23 Tower preclear_disk-finish[30093]:   3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]:   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       4
Sep 12 14:26:23 Tower preclear_disk-finish[30093]:   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]:   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]:   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       32
Sep 12 14:26:23 Tower preclear_disk-finish[30093]:  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]:  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]:  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       2
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       19
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       17
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

Are my worries about this drive warranted?
Orbi Posted September 13, 2010
(quoting barbapapa's SMART results above) "Are my worries about this drive warranted?"
Well, that depends. Any sectors pending reallocation are not good, but it's OK as long as there aren't thousands of them and they do not increase over time. So what you should do is run the preclear script a few more times and check whether "5 Reallocated_Sector_Ct" and the pending sectors increase. After you run the next preclear, the 17 pending sectors should have been reallocated and appear in "5 Reallocated_Sector_Ct". If there is no increase in pending sectors, you'll probably be fine. Just keep an eye on it from time to time by running some SMART tests while it's in use in your array.
Joe L. Posted September 13, 2010
(quoting Orbi's reply above) "...run the preclear script a few more times and check whether '5 Reallocated_Sector_Ct' and the pending sectors increase..."
The statement is true... but remember this: those pending re-allocations must have been detected during the post-read phase. If they were detected in the pre-read phase, they should have already been re-allocated when the zeros were written to the drive. Therefore, the advice given is sound. Run the pre-clear script several more times on that drive. I realize it will take a while, but it is a LOT easier to get the drive replaced now, if it continues to have errors, than after it is in your array and holds your data.

As far as "raw read error rate" goes, the raw values for that attribute are usable only by the manufacturer. We can only look at the "normalized" values:

1 Raw_Read_Error_Rate 0x002f 197 195 051 Pre-fail Always - 291

The "worst" value was 195; the failure threshold is 51. You are nowhere close to "failing". All drives have raw read errors; they all re-try. Some report it in the SMART report, others do not. A raw read error does not mean the "read" failed, just that it had to re-try.

Joe L.
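The run-it-again-and-compare advice is easy to mechanize: save a `smartctl -A` report before and after each extra cycle, then compare the raw counts of attributes 5 and 197. A sketch (the helper and the file names are illustrative, not part of the preclear script):

```shell
# Extract the raw value of one SMART attribute (by ID) from a saved
# `smartctl -A` report, so the counts can be compared across cycles.
attr_raw() {  # usage: attr_raw <attribute-id> <report-file>
  awk -v id="$1" '$1 == id { print $NF }' "$2"
}

# Typical use between preclear cycles (illustrative paths):
#   smartctl -A /dev/sdd > before.txt
#   preclear_disk.sh /dev/sdd
#   smartctl -A /dev/sdd > after.txt
#   echo "pending (197): $(attr_raw 197 before.txt) -> $(attr_raw 197 after.txt)"
#   echo "realloc (5):   $(attr_raw 5 before.txt) -> $(attr_raw 5 after.txt)"
```

If pending sectors keep climbing cycle after cycle, that is the trend Orbi and Joe say should send the drive back for replacement.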
barbapapa Posted September 15, 2010
Thanks guys! Running 3 more cycles on that drive now.
wsume99 Posted September 15, 2010
I don't think I've seen this question answered yet in this thread... How many times should preclear be run on a new drive? Should I run it once and, if nothing unusual pops up in the SMART reports, start using it? If a drive is going to be problematic, does it usually show signs from the very beginning, or are more cycles needed to weed out the bad drives before putting them into service?
Joe L. Posted September 15, 2010
(quoting wsume99) "How many times should preclear be run on a new drive?"
Good question... Knowing how long to run a drive before "early" problems are discovered would be a whole research topic in itself. I'd guess the term "bathtub curve" is the way it is described. If you can get a full pre-clear cycle on a modern disk and it shows no problems, then you've run it fairly hard for 20 to 30 hours or so. That is a longer burn-in period than most manufacturers will provide. If you have the time and do not need the disk in the array immediately, go ahead and do another cycle. If no problems showed and you need the disk in the array now, assign it and start using it. I usually run several cycles on my disks if I'm not pressed for time. I would guess many do not do multiple cycles, but it was a requested feature, so I even added a command line option for it:

preclear_disk.sh -c N /dev/sdX

where N is a number from 1 through 20.

Joe L.
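Since -c accepts 1 through 20, a tiny guard can catch a bad count before committing to a multi-day run. A sketch (the wrapper is mine; it only validates the argument and echoes the command it would launch, rather than invoking the script):

```shell
# Check that a requested preclear cycle count is a whole number in
# the 1..20 range accepted by `preclear_disk.sh -c`.
valid_cycles() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 20 ]
}

# Demonstration with a few candidate counts:
for n in 0 3 21; do
  if valid_cycles "$n"; then
    echo "would run: preclear_disk.sh -c $n /dev/sdX"
  else
    echo "rejecting cycle count: $n"
  fi
done
```

In practice you would launch the validated command inside a screen session, as the posters in this thread do, so the multi-cycle run survives a dropped telnet connection.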
prostuff1 Posted September 15, 2010
(quoting Joe L.) "I'd guess the term 'bathtub curve' is the way it is described."
What do we always say... It's not a matter of if the drive will fail, it is a matter of when the drive will fail. I am a perfect example of this at the very moment. I had a 1.5TB Seagate I ran 3 preclear cycles on. The first changed some stuff, so I ran one more. Nothing changed with that one, but I wanted to make sure that nothing else funky was going to happen, so I ran one more preclear cycle. Nothing changed between cycles 2 and 3, so I added the drive to the array. Yesterday something happened with that drive and now it is going back for RMA.
dgaschk Posted September 15, 2010
HITACHI Deskstar 7K2000 HDS722020ALA330 (0F10311) 2TB 7200 RPM SATA 3.0Gb/s

===========================================================================
= unRAID server Pre-Clear disk /dev/sdc
=                cycle 1 of 1
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Testing if the clear has been successful.     DONE
= Disk Post-Clear-Read completed                                DONE
Disk Temperature: 35C, Elapsed Time: 27:12:55
============================================================================
==
== Disk /dev/sdc has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
61,62c61,62
< 192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       481
< 193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       481
---
> 192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       482
> 193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       482
============================================================================
root@192:/boot#

I'm curious about the differences at the end. What does this indicate?
dgaschk Posted September 15, 2010
Model WDC WD2500JB-32FUA0, Series Caviar SE, Interface IDE Ultra ATA100, Capacity 250GB, 7200 RPM, Cache 8MB

===========================================================================
= unRAID server Pre-Clear disk /dev/hdc
=                cycle 1 of 1
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Testing if the clear has been successful.     DONE
= Disk Post-Clear-Read completed                                DONE
Disk Temperature: 38C, Elapsed Time: 7:44:38
============================================================================
==
== Disk /dev/hdc has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
20,21c20,21
< Offline data collection status:  (0x82) Offline data collection activity
<                                         was completed without error.
---
> Offline data collection status:  (0x84) Offline data collection activity
>                                         was suspended by an interrupting command from host.
56c56
<   7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
---
>   7 Seek_Error_Rate         0x000b   100   253   051    Pre-fail  Always       -       0
============================================================================
root@192:/boot#

What does the change in SMART data indicate?
Joe L. Posted September 15, 2010
Power-Off_Retract_Count is usually the count of how many times the disk heads were retracted on a power loss. Load_Cycle_Count is how many times the disk heads are loaded from their parked position. Both counts incremented by 1 (as if the disk lost power, it retracted the heads, then power was restored and it re-loaded them). I'd look for a loose power connection to the drive. Or it could be that your power supply is not up to the task, or you have a bad splitter, back-plane, card-rack, etc. It is not normal to see those counts increment during a pre-clear cycle.
dgaschk Posted September 15, 2010
(quoting Joe L.'s reply above) "I'd look for a loose power connection to the drive. Or it could be that your power supply is not up to the task..."
I will check the cables. I'm running a new US budget box build. Is there any reason the 400W power supply is not enough?
Joe L. Posted September 16, 2010
(quoting dgaschk) "Is there any reason the 400W power supply is not enough?"
We have no idea... How many disks do you have attached? How many are "green"? What specific 400 Watt supply? Is it a single 12 Volt rail supply?

Joe L.
wsume99 Posted September 16, 2010
(quoting Joe L.) "I'd guess the term 'bathtub curve' is the way it is described."
My background is in reliability, so I do know a thing or two about the bathtub curve. Many manufacturers will do ESS (Environmental Stress Screening) or HASS (Highly Accelerated Stress Screening) on their products before shipping them. However, since hard drives are a commodity, I doubt that any of them do it, except maybe on their enterprise products, which carry a higher price. The intent is to stress the products so that latent manufacturing defects will be found prior to operational use. Essentially they truncate the front part off the bathtub curve, so that all the customer is exposed to is a constant, very low (hopefully) failure rate for many years, before climbing up the other side of the bathtub curve, which is associated with failures due to wearout.

Your preclear process is essentially a form of HASS. We're stressing the drive to determine if there are any defects, the thought being that if it lasts X number of hours without failure (i.e. SMART errors) then it should last a long time. The only problem is that most companies tailor the length of their ESS or HASS tests based upon their knowledge of the bathtub curve specific to their device. Since we lack that knowledge, all we can do is guess. I'd say that something like 50-100 hours would be reasonable, so the number of recommended cycles would vary depending upon the size and speed of your drive.

(quoting prostuff1) "It's not a matter of if the drive will fail, it is a matter of when the drive will fail... Yesterday something happened with that drive and now it is going back for RMA."
You are right, and like I said above, stressing the drive will get you into the constant, very-low-failure-rate portion of the bathtub curve. There is no such thing as a failure rate of zero. But you can always just skip the several cycles of preclear and install your drives fresh out of the box. Let me know how that works out for you.
Joe L. Posted September 16, 2010
(quoting wsume99's reliability post above) "...Your preclear process is essentially a form of HASS..."
Wow, thanks for the interesting description and insight. I know the basics, but obviously you are much more familiar with the concept of stress testing.

Remember... there are only two types of hard disks. No, not IDE and SATA. The two types are:
1. Those disks that have already crashed/failed.
2. Those disks that have not yet crashed/failed... but will... just wait a bit longer.