Preclear.sh results - Questions about your results? Post them here.




I just finished the preclear script on one drive and have some questions.  I was preclearing this drive and another one simultaneously.  I got the message about "suspended by an interrupting command from host" and just want to make sure everything is OK.  I have also attached the smartctl report below, taken just after the preclear finished.
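For reference, the report below was captured with a plain smartctl call along these lines (the output file name is just an example):

smartctl -a /dev/sdd                        # full SMART report for the drive
smartctl -a /dev/sdd > /boot/smart_sdd.txt  # or save a copy to the flash drive for posting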

 

 

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdd

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Disk Post-Clear-Read completed                                DONE

Disk Temperature: 42C, Elapsed Time:  16:54:58

============================================================================

==

== Disk /dev/sdd has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

19,20c19,20

< Offline data collection status:  (0x82)      Offline data collection activity

<                                      was completed without error.

---

> Offline data collection status:  (0x84)      Offline data collection activity

>                                      was suspended by an interrupting command from host.

============================================================================

root@Scour:/boot#

 

Device Model:    WDC WD1001FALS-00K1B0

Serial Number:    WD-WMATV1553669

Firmware Version: 05.00K05

User Capacity:    1,000,204,886,016 bytes

Device is:        Not in smartctl database [for details use: -P showall]

ATA Version is:  8

ATA Standard is:  Exact ATA specification draft version not indicated

Local Time is:    Wed Sep  8 16:03:21 2010 SGT

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

 

General SMART Values:

Offline data collection status:  (0x84) Offline data collection activity

was suspended by an interrupting command from host.

Auto Offline Data Collection: Enabled.

Self-test execution status:      (  0) The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: (19200) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: (  2) minutes.

Extended self-test routine

recommended polling time: ( 221) minutes.

Conveyance self-test routine

recommended polling time: (  5) minutes.

SCT capabilities:       (0x303f) SCT Status supported.

SCT Feature Control supported.

SCT Data Table supported.

 

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate    0x002f  200  200  051    Pre-fail  Always      -      0

  3 Spin_Up_Time            0x0027  229  226  021    Pre-fail  Always      -      8516

  4 Start_Stop_Count        0x0032  098  098  000    Old_age  Always      -      2172

  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0

  7 Seek_Error_Rate        0x002e  200  200  000    Old_age  Always      -      0

  9 Power_On_Hours          0x0032  093  093  000    Old_age  Always      -      5172

10 Spin_Retry_Count        0x0032  100  100  000    Old_age  Always      -      0

11 Calibration_Retry_Count 0x0032  100  100  000    Old_age  Always      -      0

12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      773

192 Power-Off_Retract_Count 0x0032  200  200  000    Old_age  Always      -      123

193 Load_Cycle_Count        0x0032  200  200  000    Old_age  Always      -      2172

194 Temperature_Celsius    0x0022  108  102  000    Old_age  Always      -      42

196 Reallocated_Event_Count 0x0032  200  200  000    Old_age  Always      -      0

197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      0

198 Offline_Uncorrectable  0x0030  200  200  000    Old_age  Offline      -      0

199 UDMA_CRC_Error_Count    0x0032  200  200  000    Old_age  Always      -      0

200 Multi_Zone_Error_Rate  0x0008  200  200  000    Old_age  Offline      -      0

 

SMART Error Log Version: 1

No Errors Logged

 

SMART Self-test log structure revision number 1

No self-tests have been logged.  [To run self-tests, use: smartctl -t]

 

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.


Thanks Orbi,

 

What I'm also concerned about is, "Offline data collection activity was suspended by an interrupting command".

 

What is this about?

Many disks do their own tests when not being accessed.  The "short" and "long" tests often mentioned are two types of those tests.  Some drives also recalibrate themselves every so often.  These are all considered off-line tests.  The message simply indicates the off-line test was terminated when the drive was accessed.  You can ignore it.  You would get the same type of message if a long test was requested and you then asked the disk to spin down.
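If you want to trigger those tests yourself, smartctl can start them; a minimal sketch, substituting your own device for /dev/sdX:

smartctl -t short /dev/sdX      # short self-test, a few minutes on most drives
smartctl -t long /dev/sdX       # extended self-test, can take several hours
smartctl -l selftest /dev/sdX   # view the self-test log once a test completes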

 

Joe L.


First off, thanks again to Joe for a great script!

 

I just precleared 2 WD20EARS drives (with pins 7 and 8 jumpered).  I precleared them simultaneously using separate screen sessions.  They both successfully precleared, but gave me the "Offline data collection activity was suspended by an interrupting command from host" warning.  I gather from reading posts in the forum that I shouldn't need to worry about this.
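For anyone wondering how to run two preclears in parallel, this is roughly the pattern I followed; the session and device names here are just examples:

screen -S preclear_sdb          # start a named screen session
preclear_disk.sh /dev/sdb       # run the first preclear inside it, then detach with Ctrl-a d
screen -S preclear_sdc          # a second session for the second drive
preclear_disk.sh /dev/sdc       # detach again with Ctrl-a d
screen -r preclear_sdb          # reattach later to check progress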

 

I am a little worried about the Raw_Read_Error_Rate and Current_Pending_Sector values for one of the drives though:

Sep 12 14:26:23 Tower preclear_disk-finish[30093]: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 1 Raw_Read_Error_Rate 0x002f 197 195 051 Pre-fail Always - 291
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 3 Spin_Up_Time 0x0027 100 253 021 Pre-fail Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 4
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 32
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 3
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 2
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 19
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 17
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0

 

Are my worries about this drive warranted?


First off, thanks again to Joe for a great script!

 

I just precleared 2 WD20EARS drives (with pins 7 and 8 jumpered).  I precleared them simultaneously using separate screen sessions.  They both successfully precleared, but gave me the "Offline data collection activity was suspended by an interrupting command from host" warning.  I gather from reading posts in the forum that I shouldn't need to worry about this.

 

I am a little worried about the Raw_Read_Error_Rate and Current_Pending_Sector values for one of the drives though:

Sep 12 14:26:23 Tower preclear_disk-finish[30093]: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 1 Raw_Read_Error_Rate 0x002f 197 195 051 Pre-fail Always - 291
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 3 Spin_Up_Time 0x0027 100 253 021 Pre-fail Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 4
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 32
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 3
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 2
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 19
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 17
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0

 

Are my worries about this drive warranted?

 

Well, that depends.  Any sectors pending reallocation are not good, but it's OK as long as there are not thousands of them and they do not increase over time.  So what you should do is run the preclear script a few more times and check whether there is an increase in "5 Reallocated_Sector_Ct" and in the pending sectors.  After you run the next preclear, the 17 pending sectors should have been reallocated and should show up in "5 Reallocated_Sector_Ct".

 

If there is no increase in pending sectors, you'll probably be fine. Just keep an eye on it from time to time by running some SMART tests while it's in use in your array.
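A quick way to watch just those two attributes between cycles is something like this (substitute your device for /dev/sdX and compare the RAW_VALUE column from run to run):

smartctl -A /dev/sdX | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector'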


First off, thanks again to Joe for a great script!

 

I just precleared 2 WD20EARS drives (with pins 7 and 8 jumpered).  I precleared them simultaneously using separate screen sessions.  They both successfully precleared, but gave me the "Offline data collection activity was suspended by an interrupting command from host" warning.  I gather from reading posts in the forum that I shouldn't need to worry about this.

 

I am a little worried about the Raw_Read_Error_Rate and Current_Pending_Sector values for one of the drives though:

Sep 12 14:26:23 Tower preclear_disk-finish[30093]: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 1 Raw_Read_Error_Rate 0x002f 197 195 051 Pre-fail Always - 291
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 3 Spin_Up_Time 0x0027 100 253 021 Pre-fail Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 4
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 32
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 3
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 2
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 19
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 17
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
Sep 12 14:26:23 Tower preclear_disk-finish[30093]: 200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0

 

Are my worries about this drive warranted?

 

Well, that depends.  Any sectors pending reallocation are not good, but it's OK as long as there are not thousands of them and they do not increase over time.  So what you should do is run the preclear script a few more times and check whether there is an increase in "5 Reallocated_Sector_Ct" and in the pending sectors.  After you run the next preclear, the 17 pending sectors should have been reallocated and should show up in "5 Reallocated_Sector_Ct".

 

If there is no increase in pending sectors, you'll probably be fine. Just keep an eye on it from time to time by running some SMART tests while it's in use in your array.

The statement is true... but remember this... those pending re-allocations must have been detected during the post-read phase.

 

If they were detected in the pre-read phase, they should have already been re-allocated when the zeros were written to the drive.

 

Therefore, the advice given is sound.  Run the pre-clear script several more times on that drive.  I realize it will take a while, but it is a LOT easier to get the drive replaced now, if it continues to have errors, than after it is in your array holding your data.

 

As far as "raw read error rate" the raw values for that attribute are usable only by the manufacturer.  We can only look at the "normalized" values.

Raw_Read_Error_Rate 0x002f 197 195 051 Pre-fail Always - 291

The "worst" value was 195, the failure threshold is 51.  You are nowhere close to "failing"  All drives have raw-read-errors, they all re-try.  Some report it on the smart report, others do not.  The raw-read-error does not mean the "read" failed, just that it had to re-try.

 

Joe L.


I don't think I've seen this question answered yet in this thread...

 

How many times should preclear be run on a new drive?

 

Should I run it once and, if nothing unusual pops up in the SMART reports, start using it?  If a drive is going to be problematic, does it usually show signs from the very beginning, or are more cycles needed to weed out the bad drives before putting them into service?


I don't think I've seen this question answered yet in this thread...

 

How many times should preclear be run on a new drive?

 

Should I run it once and, if nothing unusual pops up in the SMART reports, start using it?  If a drive is going to be problematic, does it usually show signs from the very beginning, or are more cycles needed to weed out the bad drives before putting them into service?

Good question...  To know how long to run a drive before "early" problems are discovered would be a whole research topic in itself.  I'd guess the term "bathtub curve" is the way it is described. 

 

If you can get a full pre-clear cycle on a modern disk and it shows no problems, then you've run it fairly hard for 20 to 30 hours or so.  It is a longer burn-in period than most manufacturers will provide.   

 

If you have the time and do not need the disk in the array immediately, go ahead and do another cycle.  If no problems showed, and you need the disk in the array now, assign it and start using it.    I usually run several cycles on my disks if I'm not pressed for time.

 

I would guess many do not do multiple cycles, but it was a requested feature, so I even added a command-line option for it:

preclear_disk.sh -c N /dev/sdX

where N is a number from 1 through 20.
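For example, to run three back-to-back cycles on a drive, it would be invoked along these lines (the device name is just an example, and running it inside a screen session lets it keep going if your terminal session disconnects):

preclear_disk.sh -c 3 /dev/sdc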

 

Joe L.


Good question...  To know how long to run a drive before "early" problems are discovered would be a whole research topic in itself.  I'd guess the term "bathtub curve" is the way it is described. 

What do we always say... It's not a matter of if the drive will fail, it is a matter of when the drive will fail.  I am a perfect example of this at the very moment.

 

I had a 1.5TB Seagate that I ran 3 preclear cycles on.  The first cycle changed some stuff, so I ran one more.  Nothing changed with that one, but I wanted to make sure nothing else funky was going to happen, so I ran one more preclear cycle.  Nothing changed between cycles 2 and 3, so I added the drive to the array.  Yesterday something happened with that drive, and now it is going back for RMA.


HITACHI Deskstar 7K2000 HDS722020ALA330 (0F10311) 2TB 7200 RPM SATA 3.0Gb/s

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdc

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Disk Post-Clear-Read completed                                DONE

Disk Temperature: 35C, Elapsed Time:  27:12:55

============================================================================

==

== Disk /dev/sdc has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

61,62c61,62

< 192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      481

< 193 Load_Cycle_Count        0x0012  100  100  000    Old_age  Always      -      481

---

> 192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      482

> 193 Load_Cycle_Count        0x0012  100  100  000    Old_age  Always      -      482

============================================================================

root@192:/boot#

 

I'm curious about the differences at the end. What do they indicate?

 


Model  WDC WD2500JB-32FUA0

Series  Caviar SE

Interface  IDE Ultra ATA100

Capacity  250GB

RPM  7200 RPM

Cache  8MB

 

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/hdc

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Disk Post-Clear-Read completed                                DONE

Disk Temperature: 38C, Elapsed Time:  7:44:38

============================================================================

==

== Disk /dev/hdc has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

20,21c20,21

< Offline data collection status:  (0x82)      Offline data collection activity

<                                      was completed without error.

---

> Offline data collection status:  (0x84)      Offline data collection activity

>                                      was suspended by an interrupting command from host.

56c56

<  7 Seek_Error_Rate        0x000b  200  200  051    Pre-fail  Always      -      0

---

>  7 Seek_Error_Rate        0x000b  100  253  051    Pre-fail  Always      -      0

============================================================================

root@192:/boot#

 

 

 

What does the change in SMART data indicate?


Power-Off_Retract_Count is usually the count of how many times the disk heads were retracted in a power loss.

Load_Cycle_Count is how many times the disk heads are loaded from their parked position.

 

Both counts incremented by 1.  (as if the disk lost power, it retracted the disk heads, then power was restored, and it re-loaded them)

I'd look for a loose power connection to the drive.  Or, it could be your power supply is not up to the task, or you have a bad splitter, back-plane, card-rack, etc...

 

It is not normal to see those counts increment in a pre-clear cycle.
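If you want to keep an eye on just those two counters while you check the power path, a quick way is something like this (device name as in the report above):

smartctl -A /dev/sdc | grep -E 'Power-Off_Retract_Count|Load_Cycle_Count'

If the RAW_VALUE for either one keeps climbing across preclear cycles, suspect the power connection, splitter, or supply rather than the drive itself.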

 

 

 

 

 


Power-Off_Retract_Count is usually the count of how many times the disk heads were retracted in a power loss.

Load_Cycle_Count is how many times the disk heads are loaded from their parked position.

 

Both counts incremented by 1.  (as if the disk lost power, it retracted the disk heads, then power was restored, and it re-loaded them)

I'd look for a loose power connection to the drive.  Or, it could be your power supply is not up to the task, or you have a bad splitter, back-plane, card-rack, etc...

 

It is not normal to see those counts increment in a pre-clear cycle.

 

 

 

I will check the cables.

I'm running a new US budget box build. Is there any reason the 400W power supply is not enough?


Power-Off_Retract_Count is usually the count of how many times the disk heads were retracted in a power loss.

Load_Cycle_Count is how many times the disk heads are loaded from their parked position.

 

Both counts incremented by 1.  (as if the disk lost power, it retracted the disk heads, then power was restored, and it re-loaded them)

I'd look for a loose power connection to the drive.  Or, it could be your power supply is not up to the task, or you have a bad splitter, back-plane, card-rack, etc...

 

It is not normal to see those counts increment in a pre-clear cycle.

 

 

 

I will check the cables.

I'm running a new US budget box build. Is there any reason the 400W power supply is not enough?

We have no idea...  How many disks do you have attached, how many are "green"? What specific 400 Watt supply?  Is it a single 12 Volt rail supply?

 

Joe L.

I'd guess the term "bathtub curve" is the way it is described.

My background is in reliability, so I do know a thing or two about the bathtub curve.  Many manufacturers will do ESS (Environmental Stress Screening) or HASS (Highly Accelerated Stress Screening) on their products before shipping them.  However, since hard drives are a commodity, I doubt that any of them do it.  Maybe on their enterprise products, which carry a higher price.  The intent is to stress the products so that latent manufacturing defects will be found prior to operational use.  Essentially they truncate the front part off the bathtub curve, so that all the customer is exposed to is a constant, very low (hopefully) failure rate for many years, prior to climbing up the other side of the bathtub curve, which would be associated with failures due to wearout.  Your preclear process is essentially a form of HASS.  We're stressing the drive to determine if there are any defects - the thought being that if it lasts X number of hours without failure (i.e. SMART errors), then it should last a long time.  The only problem is that most companies tailor the length of their ESS or HASS tests based upon their knowledge of the bathtub curve specific to their device.  Since we lack that knowledge, all we can do is guess.  I'd say that something like 50-100 hours would be reasonable, so the number of recommended cycles would vary depending upon the size and speed of your drive.
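As a rough worked example (assuming the roughly 27-hour single-cycle time reported for a 2TB drive earlier in this thread), the number of cycles needed to reach a 50-100 hour screen works out to only a handful:

cycle_hours=27       # one preclear cycle on a 2TB drive, per the report above
target_hours=100     # upper end of the 50-100 hour guess
echo $(( (target_hours + cycle_hours - 1) / cycle_hours ))   # prints 4

So somewhere between two and four full cycles for a drive of that size; a smaller or faster drive finishes a cycle sooner, so it would need more cycles to reach the same total hours.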

 

What do we always say... It's not a matter of if the drive will fail, it is a matter of when the drive will fail.  I am a perfect example of this at the very moment.

 

I had a 1.5TB Seagate that I ran 3 preclear cycles on.  The first cycle changed some stuff, so I ran one more.  Nothing changed with that one, but I wanted to make sure nothing else funky was going to happen, so I ran one more preclear cycle.  Nothing changed between cycles 2 and 3, so I added the drive to the array.  Yesterday something happened with that drive, and now it is going back for RMA.

You are right and like I said above, stressing the drive will get you into the constant, very low failure rate portion of the bathtub curve.  There is no such thing as a failure rate of zero.  But you can always just skip the several cycles of preclear and install your drives fresh out of the box.  Let me know how that works out for you. ;)


I'd guess the term "bathtub curve" is the way it is described.

My background is in reliability, so I do know a thing or two about the bathtub curve.  Many manufacturers will do ESS (Environmental Stress Screening) or HASS (Highly Accelerated Stress Screening) on their products before shipping them.  However, since hard drives are a commodity, I doubt that any of them do it.  Maybe on their enterprise products, which carry a higher price.  The intent is to stress the products so that latent manufacturing defects will be found prior to operational use.  Essentially they truncate the front part off the bathtub curve, so that all the customer is exposed to is a constant, very low (hopefully) failure rate for many years, prior to climbing up the other side of the bathtub curve, which would be associated with failures due to wearout.  Your preclear process is essentially a form of HASS.  We're stressing the drive to determine if there are any defects - the thought being that if it lasts X number of hours without failure (i.e. SMART errors), then it should last a long time.  The only problem is that most companies tailor the length of their ESS or HASS tests based upon their knowledge of the bathtub curve specific to their device.  Since we lack that knowledge, all we can do is guess.  I'd say that something like 50-100 hours would be reasonable, so the number of recommended cycles would vary depending upon the size and speed of your drive.

 

What do we always say... It's not a matter of if the drive will fail, it is a matter of when the drive will fail.  I am a perfect example of this at the very moment.

 

I had a 1.5TB Seagate that I ran 3 preclear cycles on.  The first cycle changed some stuff, so I ran one more.  Nothing changed with that one, but I wanted to make sure nothing else funky was going to happen, so I ran one more preclear cycle.  Nothing changed between cycles 2 and 3, so I added the drive to the array.  Yesterday something happened with that drive, and now it is going back for RMA.

You are right and like I said above, stressing the drive will get you into the constant, very low failure rate portion of the bathtub curve.  There is no such thing as a failure rate of zero.  But you can always just skip the several cycles of preclear and install your drives fresh out of the box.  Let me know how that works out for you. ;)

Wow, thanks for the interesting description and insight.  I know the basics, but obviously you are much more familiar with the concept of stress testing.

 

Remember ... There are only two types of hard disks.

 

No, not IDE and SATA

 

The two types are....

      1. Those disks that have already crashed/failed.

      2. Those disks that have not yet crashed/failed.... but will... just wait a bit longer.

