Re: preclear_disk.sh - a new utility to burn-in and pre-clear disks for quick add



Here is my hdparm info and SMART test info.  Interestingly, after the power cycle my disk 2 was "missing".  I power cycled again and it came back.

Now I just have to see if disk 2 is on the same controller as my new disk.

 

/dev/sdd:

ATA device, with non-removable media
        Model Number:       WDC WD15EADS-00H7B0
        Serial Number:      WD-WCAUP0018631
        Firmware Revision:  05.00K05
        Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5
Standards:
        Supported: 8 7 6 5
        Likely used: 8
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors: 2930277168
        device size with M = 1024*1024:     1430799 MBytes
        device size with M = 1000*1000:     1500301 MBytes (1500 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        Recommended acoustic management value: 128, current value: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                Power-Up In Standby feature set
           *    SET_FEATURES required to spinup after power up
                SET_MAX security extension
                Automatic Acoustic Management feature set
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    64-bit World wide name
           *    {READ,WRITE}_DMA_EXT_GPL commands
           *    Segmented DOWNLOAD_MICROCODE
           *    SATA-I signaling speed (1.5Gb/s)
           *    SATA-II signaling speed (3.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Host-initiated interface power management
           *    Phy event counters
                DMA Setup Auto-Activate optimization
           *    Software settings preservation
           *    SMART Command Transport (SCT) feature set
           *    SCT Long Sector Access (AC1)
           *    SCT LBA Segment Access (AC2)
           *    SCT Error Recovery Control (AC3)
           *    SCT Features Control (AC4)
           *    SCT Data Tables (AC5)
                unknown 206[12] (vendor specific)
                unknown 206[13] (vendor specific)
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
                supported: enhanced erase
        412min for SECURITY ERASE UNIT. 412min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50014ee2ad0035dd
        NAA             : 5
        IEEE OUI        : 14ee
        Unique ID       : 2ad0035dd
Checksum: correct
root@Tower:~#
root@Tower:~#
root@Tower:~# smartctl -a -d ata /dev/sdd
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD15EADS-00H7B0
Serial Number:    WD-WCAUP0018631
Firmware Version: 05.00K05
User Capacity:    1,500,301,910,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu May 14 13:33:22 2009 GMT+5
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (40500) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   139   139   051    Pre-fail  Always       -       14844
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       9
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       7
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       9
194 Temperature_Celsius     0x0022   127   121   000    Old_age   Always       -       25
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   195   195   000    Old_age   Always       -       1311
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Now, do I have to be concerned about the "Current_Pending_Sector" number?  Seems like that should be 0 for a new, good drive.

Could a bad controller have any effect on that number?

Looks like a bit over 1300 sectors were unreadable and are waiting for a subsequent "write" so they can be reallocated (or rewritten in place, if the write succeeds).

Not too good for a "new" drive.  If you could ever get past the pre-read cycle, you would see that count drop as the blocks were written.

In any case... I'd probably RMA the drive.  It is not looking that good for a start.  (Kind of makes you wonder what is going on under MS-Windows that they don't tell you about.)

 

Joe L.
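
For anyone who wants to keep an eye on that number between cycles, here is a minimal sketch (not part of preclear_disk.sh) that pulls the raw Current_Pending_Sector value out of smartctl attribute output. The function name pending_sectors is made up for illustration, and it assumes the attribute table layout shown in the report above, where the raw value is the last column.

```shell
#!/bin/sh
# Hypothetical helper, not part of preclear_disk.sh.
# Reads "smartctl -A" style output on stdin and prints the raw
# Current_Pending_Sector value (the last column of that row).
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $NF }'
}

# Example use (the device name is a placeholder):
#   smartctl -A -d ata /dev/sdX | pending_sectors
```

On the report posted above this would print 1311. A healthy new drive would normally be expected to show 0.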

Now do I have to be concerned about the "Current_Pending_Sector" number?  Seems like that should be 0 for a new good drive..

As Joe said, that is not good, especially for a drive that reports zero Power_On_Hours.  The one test you might try is a SMART long test; it is supposed to resolve those 'pending' sectors, either remapping them or recovering them.  See the bottom of the Troubleshooting page, Obtaining a SMART report section, for info on running the long SMART test.
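
As a rough sketch of what running the long test looks like from the console (the wiki page remains the reference), the two helpers below wrap the smartctl calls. The function names and the DRY_RUN switch are made up for illustration, the device is a placeholder, and "-d ata" simply mirrors the smartctl invocation used in the report above. By default the commands are only printed, not run.

```shell
#!/bin/sh
# Hypothetical wrappers around smartctl; the names and DRY_RUN are illustrative only.
# With DRY_RUN=1 (the default) the command is printed instead of executed.

start_long_test() {
    device="$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then runner="echo"; else runner=""; fi
    # -t long starts the extended self-test; the drive runs it in the background.
    $runner smartctl -d ata -t long "$device"
}

show_long_test_result() {
    device="$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then runner="echo"; else runner=""; fi
    # -l selftest prints the self-test log once the test has finished.
    $runner smartctl -d ata -l selftest "$device"
}

# Example use (placeholder device):
#   DRY_RUN=0 start_long_test /dev/sdX
#   ...wait for the test to finish...
#   DRY_RUN=0 show_long_test_result /dev/sdX
```

The report above lists a recommended polling time of 255 minutes for the extended self-test on this drive, so expect to wait several hours before checking the log.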

 

Just a clarification, this is a Western Digital green drive, not a Maxtor green?


Where do we download preclear?

 

I was getting ready to reply, then discovered your original message had completely vanished!  A little disconcerting!

 

The script itself is at the very bottom of the very first post in this thread.  An intro and summary of it is at the Preclear Disk section of the UnRAID Add Ons wiki page.

 

For Telnet, check the Telnet page, and the FAQ, unRAID Console and Addon Questions section.  After checking these myself, I can tell I need to reorganize and 'raise' the visibility of the Telnet and PuTTY help.


I just added a drive to my array.  First one since this script came out, and finally got a chance to give it a try.

 

I ran 2 cycles (which took nearly 24 hours on a 1TB drive).  (Next time I think I'll just go with 1 cycle.)  It caused zero disruption in normal use of the array, as it ran in the background.

Worked flawlessly.

 

Drive passed all tests (no reallocated sectors or other nasties in the smartctl reports).

 

Adding drive to the array afterwards was instantaneous.  Formatting took 30 seconds or so and the drive was ready for use.  Sure beats waiting hours for unRAID to clear the disk.

 

I manually started my monthly parity check just after adding it.  It is about 10% complete with no sync errors.

 

Thanks for a very useful tool Joe L.!


Dumb question here..  I got my replacement drive and it's on its second pass. (18 hours for the first pass!)

 

Since this is my new parity drive, do I really gain anything by running this script? (other than testing the drive)  It's going to have to write all the parity anyway when I assign it, correct?

 

oh..  And how hard would it be to modify this script to e-mail me updates as the test is progressing?

 

Jim

 

 

The script does nothing to help the "replacement" process other than to test the drive and exercise it to induce any early failures.

When any existing drive is replaced in an array, the old drive's contents are written to the new drive.  The zeros written in the pre-clearing are all overwritten.

You are correct, the entire drive will be written to when parity is generated.  Pre-clearing does not save any array down-time here, as there is no down-time when initially calculating parity, or when replacing the parity drive, or when replacing an existing data drive.

 

To add e-mail notifications is easy... step-by-step instructions follow  ;):

1.  Perform a functional decomposition of the source code of the preclear_disk.sh script.

2.  Identify the temporal positions where an e-mail notification would be desired.

3.  Design, develop, and code an appropriate e-mail function that will allow you to capture the screen output at the appropriate intervals... (This is the tricky part)

4.  Add the e-mail function at the desired notification points, sending it the desired content to be mailed.

5.  Install an e-mail program on your unRAID server.  (As delivered, unRAID has no e-mail capability)

6.  Configure the e-mail program to interface with your service provider and test it is able to send status e-mail.

7.  Run a series of tests, to ensure the e-mail is being routed correctly with appropriate subject lines to identify itself in your mailbox.

8.  Clear your disk.

 

Joe L.

Nice! :D 

Let me rephrase.. 

how hard would it be for you to modify this script to e-mail updates as the test is progressing :-)

 

I've already got unraid notify working..  Maybe I will dig into that..  someday..  For now I'm having to connect back into the session every couple of hours.. :-)

hmm..  redirect the output to tee and send the contents to bashmail..  hhmm...  all with variables to disable it...

 

Jim

 

I'm afraid it is not as simple as that, since it would probably send one HUGE mail message at the end, and not incremental progress as it goes.

Seriously, the script is reasonably well commented.  It invokes "display_progress" to display the progress. (Or something like that name... I don't have it in front of me)

You just need to add your e-mail call there... but be careful, it might send an e-mail every few seconds as it updates the display.  AND you have no easy way to capture the progress as it zeros the drive.  That is done in the background, and inter-process signals are used to get it to send its progress to its error output (which conveniently ends up on the screen).  As I already said, that is the tricky part.  It would be a fine learning experience for a linux newbie...  Remember, all the answers to any programming questions are available at google.com  ;D
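
To illustrate the "every few seconds" problem, here is a minimal sketch of a throttled notifier that could be called from wherever the script updates its display. Nothing here comes from preclear_disk.sh itself: notify_throttled, send_progress_mail, NOTIFY_INTERVAL, NOTIFY_STAMP and NOTIFY_TO are all made-up names, the address is a placeholder, and it assumes a working "mail" command has already been installed and configured (unRAID does not ship one). It does not solve the harder problem of capturing the progress of the background zeroing step.

```shell
#!/bin/sh
# Hypothetical throttled notifier; every name below is illustrative.
NOTIFY_INTERVAL="${NOTIFY_INTERVAL:-3600}"               # minimum seconds between e-mails
NOTIFY_STAMP="${NOTIFY_STAMP:-/tmp/preclear_last_mail}"  # stores the time of the last e-mail
NOTIFY_TO="${NOTIFY_TO:-you@example.com}"                # placeholder address

send_progress_mail() {
    # Assumes a "mail" program is installed and configured.
    echo "$1" | mail -s "preclear progress" "$NOTIFY_TO"
}

notify_throttled() {
    message="$1"
    now=$(date +%s)
    last=0
    # Read the time of the previous e-mail, if one was ever sent.
    [ -f "$NOTIFY_STAMP" ] && last=$(cat "$NOTIFY_STAMP")
    # Only send when at least NOTIFY_INTERVAL seconds have passed.
    if [ $((now - last)) -ge "$NOTIFY_INTERVAL" ]; then
        send_progress_mail "$message" && echo "$now" > "$NOTIFY_STAMP"
    fi
}

# Example use inside a progress routine:
#   notify_throttled "Post-Read in progress: 88% complete"
```

Keeping the timestamp in a file rather than a shell variable means the throttle still works if the progress routine runs in a subshell.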

 

I have no interest in adding any e-mail alert at this time myself.  I can use the progress rate it shows and take a pretty good guess how long it will take before it will end.  A simple exercise on your calculator should let you do the same.
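
For those without a calculator handy, the estimate is just elapsed time multiplied by the ratio of remaining work to finished work. The sketch below shows that arithmetic. The function name eta_seconds is made up, and it assumes the elapsed time covers only the current phase (the "Elapsed Time" on the preclear screen covers the whole run, so it would need adjusting).

```shell
#!/bin/sh
# Hypothetical helper: eta_seconds BYTES_DONE BYTES_TOTAL ELAPSED_SECONDS
# Prints the estimated seconds remaining in the current phase,
# assuming the transfer rate stays roughly constant.
eta_seconds() {
    bytes_done="$1"
    bytes_total="$2"
    elapsed="$3"
    # remaining = elapsed * (total - done) / done, using integer arithmetic
    echo $(( elapsed * (bytes_total - bytes_done) / bytes_done ))
}

# Example: half of a phase finished after one hour leaves about one more hour.
#   eta_seconds 500000000000 1000000000000 3600
```

The rate usually drops toward the inner tracks of a disk, so an estimate like this tends to be a little optimistic early in a pass.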

 

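If all you want is an e-mail at the end stating it is done, you might try invoking it like this (substitute your own address for the placeholder you@example.com, and note that a mail program must already be installed):

preclear_disk.sh /dev/xxx && ( echo "finished" | mail -s "clear finished" you@example.com )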

 

Joe L.


Remember, all the answers to any programming questions are available at google.com  ;D

So you are saying you are also known as google.com? :D

I have no interest in adding any e-mail alert at this time myself.  I can use the progress rate it shows and take a pretty good guess how long it will take before it will end.  A simple exercise on your calculator should let you do the same.

Awww..  come on!  Where's your thirst for adventure!  Think of it as a challenge!  I dare you to try! :)  just kidding!  Yeah..  Adding something to display_progress() function might be easy enough..  I'll tinker once I have some time..  but my guess is that I'll be done with the parity upgrade and forget about this until the next time!  ::)

 

Oh..  and it is well commented..   good job! 8)


Whew!  56 hours of testing later, my replacement 1.5T disk passed 3 cycles!

 

Thanks again for a great script!  It helped catch the first disk being bad!!

 

 

===========================================================================
=                unRAID server Pre-Clear disk /dev/sdd
=                       cycle 3 of 3
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Testing if the clear has been successful.     DONE
= Disk Post-Clear-Read completed                                DONE
Elapsed Time:  56:18:58
============================================================================
==
== Disk /dev/sdd has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
19,20c19,20
< Offline data collection status:  (0x80)       Offline data collection activity
<                                       was never started.
---
> Offline data collection status:  (0x84)       Offline data collection activity
>                                       was suspended by an interrupting command from host.
54c54
<   1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
---
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
============================================================================

Aw... you should have tried 20 cycles...  as you said "come on!  Where's your thirst for adventure!  Think of it as a challenge!"

 

Really, I'm happy the replacement drive is working much better than the original...  I think you are ready to add it to your array.

 

Joe L.

The thirst for adventure is outweighed by my thirst for more disk space! :)  My cache drive is filling up because there is no room left on the array! :D


Thanks for this very useful script.

 

I just finished preclearing a suspect 1.5TB Seagate drive (SD37 -- firmware is not one of the bad ones). I had problems with this drive and was going to RMA it, but I now believe I had two bad USB enclosures, as other drives put in these enclosures had errors, but outside the enclosure were fine.

 

So I ran the SeaTools Long Test with the drive on an internal SATA connection, which passed, then I figured I'd preclear it to see if it would be OK to become a new parity disk.

 

Here are the results: http://pastebin.com/f12c86c35

 

From reading this forum, I think the fact that it has a 0 reallocated sector count is a good thing?  The High Fly Writes count is increasing, but the drive has had the entire 1.5TB written to it twice, and I've seen higher numbers than this reported here.

 

Is there anything else here you think I should be concerned about?

 

Thanks for your help,

 

Neil.

Looks pretty good to me...  I'd use it in the array.  Zero reallocated sectors is a good thing...

 

When you do add it as the parity drive, the array will read and write to it once more, as it calculates parity on the array and writes it to the new parity drive.

 

Joe L.

  • 2 weeks later...

First off... I love this utility.  I've used it while adding the last three 1TB drives to my server, and it's great not to have the hours of downtime.

 

I just added another disk to my system yesterday and the preclear utility finished with the following message.  I don't remember ever seeing this before.  Can someone tell me if this looks like anything to worry about:

 

============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
19,20c19,20
< Offline data collection status:  (0x80)       Offline data collection activity
<                                       was never started.
---
> Offline data collection status:  (0x84)       Offline data collection activity
>                                       was suspended by an interrupting command from host.
54c54
<   1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
---
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
============================================================================

 

I've attached a copy of the preclear syslog entries.


Short answer - nothing at all is wrong.

 

It is a good illustration of why I think that the value of 253 represents 'Un-initialized' or 'Unused yet'.  I have not seen anything in the SMART docs or literature yet about this, but there does seem to be a quiet convention that if a variable has not yet been used, it is given the 'factory installed' value of 253.  I think most or all of the drive manufacturers do this, although none of them do it in a standard or completely consistent way.

 

You will notice in your example that the WORST value is 253, and I have come to see that as the originally installed value, which is then set to a true initial value by whatever the particular programmer of that part of the firmware decides to use, on the first actual usage of that value.  In your case, he (or his internal docs) had decided to initialize Raw_Read_Error_Rate as 200, on its actual first internal usage.  These are incredibly inconsistent between manufacturers.  Some of these line items are only set when offline testing occurs, while others may be partly initialized, but never actually set until a relevant event occurs, and others are updated all of the time.
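
One way to sanity-check a report like this, regardless of what a vendor uses as its starting value, is to compare each attribute's normalized VALUE against its THRESH column: smartctl treats an attribute as failing when VALUE is at or below a non-zero THRESH. The sketch below does that comparison. The function name failing_attributes is made up, and it assumes the ten-column attribute table that "smartctl -A" prints, as in the reports earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads "smartctl -A" style output on stdin and prints
# the name of every attribute whose normalized VALUE is at or below THRESH.
# Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
failing_attributes() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && ($6 + 0) > 0 && ($4 + 0) <= ($6 + 0) { print $2 }'
}

# Example use (placeholder device):
#   smartctl -A -d ata /dev/sdX | failing_attributes
```

For the Raw_Read_Error_Rate lines in the diff above, both 100 and 200 are well above the threshold of 051, so nothing would be printed for either report.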

  • 2 weeks later...

Hello!  I just got my new LC server from Tom!  Unfortunately, I am having problems with the preclear script.  I have 4 x 1TB Samsung F1 HD103UJ drives.  The parity disk (sdb) and Disk2 (sdc) both froze at 88% (Disk2 was done after a reboot of the server).  Disk3 (sde) reported the following SMART error differences:

============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
77,78c77,78
< 200 Multi_Zone_Error_Rate   0x000a   253   253   000    Old_age   Always       -       0
< 201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0
---
> 200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
> 201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0
============================================================================

 

Are all of these drives bad?

 

I've also attached my syslog.

 

Thanks.


The script has frozen twice at 88% at the post-read progress.  The sdc drive has been at 88% for a few hours now and sdb drive did the same thing when I ran the preclear script on it 1-2 days ago.  Here is what I see in the telnet window right now:

===========================================================================
=                unRAID server Pre-Clear disk /dev/sdc
=                       cycle 1 of 1
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Testing if the clear has been successful.     DONE
= Post-Read in progress: 88% complete.
(  888,330,240,000  of  1,000,204,886,016  bytes read )
Elapsed Time:  8:31:39

 

 

______________

 

Unfortunately, it doesn't proceed after this point.

 

BTW, I am using version .9.1 of the preclear script.

This happened to me several times with WD10EADS drives.  I don't know why.  I have probably run preclear_disk against 5 different WD10EADS drives maybe 50 times (I used it as a burn-in to test all slots on my Norco 2020).  I would say about 10% of the time it would hang at the exact same place.  This happened on different drives and appeared random.  If I ran it again, it would typically complete successfully.
