Re: preclear_disk.sh - a new utility to burn-in and pre-clear disks for quick add


Recommended Posts

It sure might be a candidate for return, but they might not take it if their utility does not indicate it is over their failure "threshold".

I've returned several drives to Seagate with the advance replacement option and never had a return questioned, some in this same category, where Seatools passed the drive.

 

Link to comment

Joe,

 

Thanks for taking the time to respond.  I ran the original script once more and it seems to have gotten farther:

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sda

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Post-Read in progress: 99% complete. 

(  1,497,000,960,000  of  1,500,301,910,016  bytes read )

Elapsed Time:  9:33:36

 

 

 

 

 

 

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sda

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Disk Post-Clear-Read completed                                DONE

Elapsed Time:  9:34:27

============================================================================

==

== Disk /dev/sda has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

54c54

<  1 Raw_Read_Error_Rate    0x000f  103  099  006    Pre-fail  Always      -      42345792

---

>  1 Raw_Read_Error_Rate    0x000f  111  099  006    Pre-fail  Always      -      37597902

57,58c57,58

<  5 Reallocated_Sector_Ct  0x0033  096  096  036    Pre-fail  Always      -      187

<  7 Seek_Error_Rate        0x000f  100  253  030    Pre-fail  Always      -      320916

---

>  5 Reallocated_Sector_Ct  0x0033  096  096  036    Pre-fail  Always      -      191

>  7 Seek_Error_Rate        0x000f  100  253  030    Pre-fail  Always      -      333592

62c62

< 187 Reported_Uncorrect      0x0032  076  076  000    Old_age  Always      -      24

---

> 187 Reported_Uncorrect      0x0032  070  070  000    Old_age  Always      -      30

64,68c64,68

< 189 High_Fly_Writes        0x003a  075  075  000    Old_age  Always      -      25

< 190 Airflow_Temperature_Cel 0x0022  081  067  045    Old_age  Always      -      19 (Lifetime Min/Max 18/25)

< 195 Hardware_ECC_Recovered  0x001a  037  037  000    Old_age  Always     

< 197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      4

< 198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      4

---

> 189 High_Fly_Writes        0x003a  048  048  000    Old_age  Always      -      52

> 190 Airflow_Temperature_Cel 0x0022  078  067  045    Old_age  Always      -      22 (Lifetime Min/Max 18/25)

> 195 Hardware_ECC_Recovered  0x001a  051  037  000    Old_age  Always     

> 197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0

> 198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      0

72c72

< ATA Error Count: 24 (device log contains only the most recent five errors)

---

> ATA Error Count: 30 (device log contains only the most recent five errors)

87c87

< Error 24 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)

---

> Error 30 occurred at disk power-on lifetime: 19 hours (0 days + 19 hours)

98,102c98,102

<  60 00 00 ff ff ff 4f 00      06:55:09.851  READ FPDMA QUEUED

<  27 00 00 00 00 00 e0 02      06:55:09.831  READ NATIVE MAX ADDRESS EXT

<  ec 00 00 00 00 00 a0 02      06:55:09.811  IDENTIFY DEVICE

<  ef 03 46 00 00 00 a0 02      06:55:09.791  SET FEATURES [set transfer mode]

<  27 00 00 00 00 00 e0 02      06:55:09.771  READ NATIVE MAX ADDRESS EXT

---

>  60 00 00 ff ff ff 4f 00      18:39:43.313  READ FPDMA QUEUED

>  27 00 00 00 00 00 e0 02      18:39:43.293  READ NATIVE MAX ADDRESS EXT

>  ec 00 00 00 00 00 a0 02      18:39:43.273  IDENTIFY DEVICE

>  ef 03 46 00 00 00 a0 02      18:39:43.253  SET FEATURES [set transfer mode]

>  27 00 00 00 00 00 e0 02      18:39:43.233  READ NATIVE MAX ADDRESS EXT

104c104

< Error 23 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)

---

> Error 29 occurred at disk power-on lifetime: 19 hours (0 days + 19 hours)

115,119c115,119

<  60 00 00 ff ff ff 4f 00      06:55:06.474  READ FPDMA QUEUED

<  27 00 00 00 00 00 e0 02      06:55:06.454  READ NATIVE MAX ADDRESS EXT

<  ec 00 00 00 00 00 a0 02      06:55:06.434  IDENTIFY DEVICE

<  ef 03 46 00 00 00 a0 02      06:55:06.414  SET FEATURES [set transfer mode]

<  27 00 00 00 00 00 e0 02      06:55:06.394  READ NATIVE MAX ADDRESS EXT

---

>  60 00 00 ff ff ff 4f 00      18:39:39.826  READ FPDMA QUEUED

>  27 00 00 00 00 00 e0 02      18:39:39.806  READ NATIVE MAX ADDRESS EXT

>  ec 00 00 00 00 00 a0 02      18:39:39.786  IDENTIFY DEVICE

>  ef 03 46 00 00 00 a0 02      18:39:39.766  SET FEATURES [set transfer mode]

>  27 00 00 00 00 00 e0 02      18:39:39.746  READ NATIVE MAX ADDRESS EXT

121c121

< Error 22 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)

---

> Error 28 occurred at disk power-on lifetime: 19 hours (0 days + 19 hours)

132,136c132,136

<  60 00 00 ff ff ff 4f 00      06:55:02.987  READ FPDMA QUEUED

<  27 00 00 00 00 00 e0 02      06:55:02.967  READ NATIVE MAX ADDRESS EXT

<  ec 00 00 00 00 00 a0 02      06:55:02.947  IDENTIFY DEVICE

<  ef 03 46 00 00 00 a0 02      06:55:02.927  SET FEATURES [set transfer mode]

<  27 00 00 00 00 00 e0 02      06:55:02.907  READ NATIVE MAX ADDRESS EXT

---

>  60 00 00 ff ff ff 4f 00      18:39:36.419  READ FPDMA QUEUED

>  27 00 00 00 00 00 e0 02      18:39:36.399  READ NATIVE MAX ADDRESS EXT

>  ec 00 00 00 00 00 a0 02      18:39:36.379  IDENTIFY DEVICE

>  ef 03 46 00 00 00 a0 02      18:39:36.359  SET FEATURES [set transfer mode]

>  27 00 00 00 00 00 e0 02      18:39:36.339  READ NATIVE MAX ADDRESS EXT

138c138

< Error 21 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)

---

> Error 27 occurred at disk power-on lifetime: 19 hours (0 days + 19 hours)

149,153c149,153

<  60 00 00 ff ff ff 4f 00      06:54:59.692  READ FPDMA QUEUED

<  60 00 00 ff ff ff 4f 00      06:54:59.690  READ FPDMA QUEUED

<  27 00 00 00 00 00 e0 02      06:54:59.670  READ NATIVE MAX ADDRESS EXT

<  ec 00 00 00 00 00 a0 02      06:54:59.650  IDENTIFY DEVICE

<  ef 03 46 00 00 00 a0 02      06:54:59.630  SET FEATURES [set transfer mode]

---

>  60 00 00 ff ff ff 4f 00      18:39:33.033  READ FPDMA QUEUED

>  60 00 00 ff ff ff 4f 00      18:39:33.032  READ FPDMA QUEUED

>  27 00 00 00 00 00 e0 02      18:39:33.012  READ NATIVE MAX ADDRESS EXT

>  ec 00 00 00 00 00 a0 02      18:39:32.992  IDENTIFY DEVICE

>  ef 03 46 00 00 00 a0 02      18:39:32.972  SET FEATURES [set transfer mode]

155c155

< Error 20 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)

---

> Error 26 occurred at disk power-on lifetime: 19 hours (0 days + 19 hours)

166,170c166,170

<  60 00 00 ff ff ff 4f 00      06:54:56.314  READ FPDMA QUEUED

<  60 00 00 ff ff ff 4f 00      06:54:56.313  READ FPDMA QUEUED

<  27 00 00 00 00 00 e0 02      06:54:56.293  READ NATIVE MAX ADDRESS EXT

<  ec 00 00 00 00 00 a0 02      06:54:56.273  IDENTIFY DEVICE

<  ef 03 46 00 00 00 a0 02      06:54:56.253  SET FEATURES [set transfer mode]

---

>  60 00 00 ff ff ff 4f 00      18:39:29.676  READ FPDMA QUEUED

>  60 00 00 ff ff ff 4f 00      18:39:29.675  READ FPDMA QUEUED

>  27 00 00 00 00 00 e0 02      18:39:29.655  READ NATIVE MAX ADDRESS EXT

>  ec 00 00 00 00 00 a0 02      18:39:29.635  IDENTIFY DEVICE

>  ef 03 46 00 00 00 a0 02      18:39:29.615  SET FEATURES [set transfer mode]

============================================================================

root@Tower:/boot#

 

 

 

But with the increasing number of S.M.A.R.T. errors, I will go ahead and send this back.

 

 

Jimwhite,

 

I purposely got this from Amazon for their no-hassle return policy.  I will request an RMA and a replacement drive (new).  In my experience, replacement drives have been refurbished, and I have not had such good luck with them.  Has your experience differed?

 

Regards,  Peter

Link to comment

Joe,

 

Thanks for taking the time to respond.  I ran the original script once more and it seems to have gotten farther:

 

<snip>

 

But with the increasing number of S.M.A.R.T. errors, I will go ahead and send this back.

I noticed an increasing number of reallocated sectors, and still more pending reallocation. 

I would RMA the drive...  It is marginal at best, and a prime candidate for problems as it gets older.  Since errors continue to occur there is no way to know if they will ever stop.

 

If you have not removed the disk from your array yet, please download and use the newer version of the preclear_disk.sh script on it.

Your disk is a perfect test case for me.  (you will help me, and others who follow)

 

Joe L.

Link to comment

Joe,

 

No problem. I was just in the process of shipping it back but will take it out of the box and run it through again overnight with the new script and report back.  On a side note, dealing with Amazon is very gratifying.  They issued a free return label via UPS and are sending me a new drive via FedEx overnight.

 

Regards,  Peter

Link to comment

Joe,

 

Here are the latest results:

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sda

=                       cycle 1 of 1

= Disk Pre-Clear-Read completed                                 DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE

= Step 5 of 10 - Clearing MBR code area                         DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE

= Step 10 of 10 - Testing if the clear has been successful.     DONE

= Post-Read in progress: 99% complete. 

(  1,500,299,297,280  of  1,500,301,910,016  bytes read )

Elapsed Time:  12:26:13

 

 

 

 

 

 

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sda

=                       cycle 1 of 1

= Disk Pre-Clear-Read completed                                 DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE

= Step 5 of 10 - Clearing MBR code area                         DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE

= Step 10 of 10 - Testing if the clear has been successful.     DONE

= Disk Post-Clear-Read completed                                DONE

Elapsed Time:  12:26:13

============================================================================

==

== Disk /dev/sda has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

54c54

<   1 Raw_Read_Error_Rate     0x000f   111   099   006    Pre-fail  Always       -       37608068

---

>   1 Raw_Read_Error_Rate     0x000f   111   099   006    Pre-fail  Always       -       35710728

58c58

<   7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       387419

---

>   7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       404283

64,66c64,66

< 189 High_Fly_Writes         0x003a   048   048   000    Old_age   Always       -       52

< 190 Airflow_Temperature_Cel 0x0022   079   067   045    Old_age   Always       -       21 (Lifetime Min/Max 17/21)

< 195 Hardware_ECC_Recovered  0x001a   037   037   000    Old_age   Always       

---

> 189 High_Fly_Writes         0x003a   006   006   000    Old_age   Always       -       94

> 190 Airflow_Temperature_Cel 0x0022   079   067   045    Old_age   Always       -       21 (Lifetime Min/Max 17/23)

> 195 Hardware_ECC_Recovered  0x001a   051   037   000    Old_age   Always       

============================================================================

 

I'm not sure, but the results seem to be settling down a bit?  The individual errors no longer appear.  Does that mean they no longer exist, or that they are just not being displayed?  Should I run anything else while I still have the drive?  I will ship it off on Monday morning...

 

Regards,  Peter

Link to comment

I'm not sure, but the results seem to be settling down a bit?  The individual errors no longer appear.  Does that mean they no longer exist, or that they are just not being displayed?  Should I run anything else while I still have the drive?  I will ship it off on Monday morning...

 

Regards,  Peter

You are correct... the number of reallocated sectors did not change from the last pre-clear cycle.  You can be sure the bad sectors are still there; the preclear script just does not show them when it displays the "differences" between the SMART report it takes at the start of the cycle and the SMART report it takes at the end of the pre-clear.

 

In your previous test, at the beginning there were 187 reallocated sectors and 4 sectors pending reallocation.  At the end, there were 191 reallocated sectors and 0 pending reallocation.  It re-affirms what the script was designed to do: identify the un-readable sectors during its "read" phases and allow the drive's SMART firmware to re-map them while zeros are written to the drive.  Although it might have found all the currently un-readable sectors, the High_Fly_Writes are still incrementing.  It is my understanding that they are not a sign of a healthy drive.  If you want to see the full SMART reports, they are also saved in the syslog...
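
For anyone curious how that "differences" display is produced, the underlying idea is simply a diff of two smartctl reports.  Here is a minimal sketch of the concept only (not the script's actual code); /dev/sdX and the /tmp file names are placeholders:

DISK=/dev/sdX
# snapshot the SMART data before the cycle starts
smartctl -a -d ata $DISK > /tmp/smart_before.txt
# ... pre-read, zeroing, and post-read would run here ...
# snapshot again after the cycle finishes
smartctl -a -d ata $DISK > /tmp/smart_after.txt
# show only the attribute lines that changed between the two snapshots
diff /tmp/smart_before.txt /tmp/smart_after.txt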

 

I feel pretty good in that the script has proven its worth, both in allowing you to add a drive to your array with minimal down-time and, as in this case, in letting you burn in a drive before you add it to your array and get it replaced if it shows signs of early failure.  In my array I've got a very old 250GB drive, way out of warranty, that has 100 sectors reallocated.  That number has not changed since I started running it through pre-clear cycles, so I'm guessing it will last until I eventually replace it with a larger disk.  I don't have the luxury of returning it for replacement.

 

Send the drive in for replacement... It is not one you want to start out with in your array.  Thanks for running it through another test cycle for me.

 

Joe L.

Link to comment

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdd

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Post-Read in progress: 19% complete. 

(  49,351,680,000  of  250,059,350,016  bytes read )

Elapsed Time:  2:52:47

 

This is for a 3-year-old WD 250GB SATA2 drive.  A 500GB is currently running as well.

Link to comment

 

Send the drive in for replacement... It is not one you want to start out with in your array.   Thanks for running it through another test cycle for me.

 

Joe L.

 

Drive went back today.  Replacement should be here Monday.  Hopefully I will fare better with the new one.  I have a few worries though.  I wonder if this is typical for this new series of drives.  As you can see from reviews and other user reports, there have been quite a few problems encountered.  I am currently using one as my parity drive.  Though its labeling indicates that it is not in the affected batch with firmware problems, I need to check it out anyway.  I hope the new drive coming on Monday will test fine.  Assuming that it does, I will replace the parity drive with the new one and then run your tool on the older one to verify its health.

 

On a different note, what resources does this tool consume the most?  I ask because while I was running the test, I noticed that the performance of the array was visibly slower.  Copying files to the cache drive, as well as moving from cache to the array, seemed to drop in performance.  I think reading from the array was affected as well.  I am running a Celeron 2.0GHz (single core) with 3GB of RAM.  All drives are on either the onboard SATA connections or Adaptec PCI Express SATA cards.  What might I do to restore the performance?

 

Thanks again for developing this script.  I sleep better knowing that drives that pass the test are indeed more reliable.

 

Regards,  Peter

Link to comment

 

Send the drive in for replacement... It is not one you want to start out with in your array.   Thanks for running it through another test cycle for me.

 

Joe L.

 

Drive went back today.  Replacement should be here Monday.  Hopefully I will fare better with the new one.  I have a few worries though.  I wonder if this is typical for this new series of drives.  As you can see from reviews and other user reports, there have been quite a few problems encountered.  I am currently using one as my parity drive.  Though its labeling indicates that it is not in the affected batch with firmware problems, I need to check it out anyway.  I hope the new drive coming on Monday will test fine.  Assuming that it does, I will replace the parity drive with the new one and then run your tool on the older one to verify its health.

That sounds like a very good plan.

On a different note, what resources does this tool consume the most?  I ask because while I was running the test, I noticed that the performance of the array was visibly slower.  Copying files to the cache drive, as well as moving from cache to the array, seemed to drop in performance.  I think reading from the array was affected as well.  I am running a Celeron 2.0GHz (single core) with 3GB of RAM.  All drives are on either the onboard SATA connections or Adaptec PCI Express SATA cards.  What might I do to restore the performance?

While it is running it uses bandwidth to and from the disks on whatever bus they are on; most of the time it is basically sitting and waiting for the disk to respond.  Other than that, because it is reading and writing so much, it fills memory in the buffer cache, which leaves less memory available to other processes.  As soon as it stops running, those blocks of memory in the buffer cache will be re-used by Linux once they become the "least-recently-used" blocks (and therefore the first to be re-used).

 

To restore performance, stop running the script... it is as simple as that.  It uses no resources when not running.  Think of it this way: a high bit-rate HD movie might have a bit rate of 35MB/s.  This script is reading and writing to the disk at roughly twice that speed (75MB/s or so).  It really puts a strain on the disk... probably even more than a parity check does on a single disk.  In fact, part of the "read" routine is to issue 5 read requests in parallel, moving the disk head randomly all over the disk.  That is much harder on the disk hardware than simply playing back a linear set of blocks of a movie (but then I am trying to expose marginal hardware issues... before you add the disk to the array).
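
As a rough illustration of what "5 read requests in parallel" means, here is a sketch of the idea only, not the script's actual read routine; /dev/sdX, the block size, and the offsets are all placeholders:

DISK=/dev/sdX
# start five reads at widely separated offsets so the heads must seek
# back and forth between them; each dd runs in the background
for offset in 0 500000 1000000 1500000 2000000; do
    dd if=$DISK of=/dev/null bs=64k skip=$offset count=100000 &
done
# wait for all five background reads to finish
wait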

Thanks again for developing this script.  I sleep better knowing that drives that pass the test are indeed more reliable.

 

Regards,  Peter

You are welcome.  Passing this test does not guarantee the disk will not crash the following week; it could happen.  It is less likely to crash, at least in my mind, if it does pass.  I initially wrote this routine to add drives more quickly to the array and minimize downtime.  (Amazing how dependent upon the server my wife has become... it is *way* more convenient for watching all the holiday movies.)  If not pre-cleared, I'd be facing 4 or more hours of downtime as a disk is cleared.  The ability to use the same routine to burn in a drive was a natural addition.

 

Joe L.

Link to comment

There's a command called ionice which might help in controlling the priority of the dd's.

 

http://linux.die.net/man/1/ionice

 

In my rsyncmv scripts I do

 

/usr/bin/nice -19 /usr/bin/ionice -c3 rsync -avP --bwlimit=${BWLIMIT:=6400} --remove-sent-files "$@" ${DIR}

 

This has the effect of lowering the priority of the rsync to a level where it does not affect my rtorrent or any streaming from the system.
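
In principle the same trick could be wrapped around a pre-clear run.  A hedged example, assuming ionice is present on your unRAID build, the script sits on your flash drive at /boot, and /dev/sdX is a placeholder for the disk being cleared:

# run the whole pre-clear at the lowest CPU and "idle" I/O priority so
# normal array traffic gets serviced first
/usr/bin/nice -n 19 /usr/bin/ionice -c3 /boot/preclear_disk.sh /dev/sdX

I have not measured how much this helps during a pre-clear, so treat it as an experiment rather than a guaranteed fix.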

 

Link to comment
  • 2 weeks later...

Hi Joel,

 

Is there any procedure for running the script on a brand-new drive?  For instance, do I need to mount the drive first, or do anything else from the unRAID management web interface?

 

I have just received my new 1TB drive and want to use your script to break in the drive before I put it into my system as the parity drive.  Currently I have one 1TB drive and a few 500GB drives in my system.  Since the new drive is supposed to be faster, I want it to be my parity drive.  For now I have put the hard drive in an enclosure connected to the e-SATA port on my unRAID server.  Here is my current drive status:

 

root@Tower:/boot/download# mount

fusectl on /sys/fs/fuse/connections type fusectl (rw)

usbfs on /proc/bus/usb type usbfs (rw)

/dev/sdg1 on /boot type vfat (rw,umask=066,shortname=mixed)

/dev/md3 on /mnt/disk3 type reiserfs (rw,noatime,nodiratime)

/dev/md1 on /mnt/disk1 type reiserfs (rw,noatime,nodiratime)

/dev/md5 on /mnt/disk5 type reiserfs (rw,noatime,nodiratime)

/dev/md6 on /mnt/disk6 type reiserfs (rw,noatime,nodiratime)

/dev/md4 on /mnt/disk4 type reiserfs (rw,noatime,nodiratime)

/dev/md2 on /mnt/disk2 type reiserfs (rw,noatime,nodiratime)

shfs on /mnt/user type fuse.shfs (rw,nosuid,nodev)

 

Thanks,

--Tom

Link to comment

Hi Joel,

 

Is there any procedure for running the script on a brand-new drive?  For instance, do I need to mount the drive first, or do anything else from the unRAID management web interface?

Thanks,

--Tom

Nope, no special procedure... in fact, if the drive is mounted in any way the pre-clear script will refuse to run on it.

 

You do need to know its proper device name, and that is it. 

 

Before you pre-clear it you just need to physically connect it to your server, but DO NOT assign it to your array.

 

The pre-clear will allow you to burn in a drive.  I'd try it for 1 cycle first, then for a few more.  Before you do, make sure the "smartctl" program is functional on your version of unRAID.  (A library needed for it to work is missing on the 4.4 and 4.5beta versions, but can easily be added once it is downloaded.)  A 1TB drive might take between 6 and 10 hours to pre-read/clear/post-read for 1 cycle, depending on the speed of your array.

 

The smartctl program is not needed to burn-in a drive, but it is the only way to know if the burn-in turns up any issues.
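
For reference, a typical invocation looks like the following; it assumes you copied the script to the flash drive (/boot), and /dev/sdX stands for the drive's device name:

cd /boot
# one full pre-read / clear / post-read cycle on the new, un-assigned disk
./preclear_disk.sh /dev/sdX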

 

Joe L.

Link to comment

Hi Joe,

 

Thanks for your prompt reply. How do I know the correct device name?

 

--Tom

At the command prompt type:

ls -l /dev/disk/by-id

 

A listing of all your drives will appear, looking like the listing below.  The "device" is at the very end of the line.  A brand-new device is likely to have only one entry matching its model/serial number.  A disk that has been partitioned will have an additional entry per partition in the listing, with the partition number (e.g. a trailing "1") on its device name.  You want to use the base device name, not the name of the first or any subsequent partition.

 

If the last field on the line matching your model/serial number is ../../sdb then your device is /dev/sdb
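
(If the listing is long, a quick way to hide the per-partition entries and show only the base devices is the one-liner below, assuming nothing more exotic than grep, which unRAID has.)

ls -l /dev/disk/by-id | grep -v -- "-part"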

 

(partial listing of my array follows)

root@Tower:~# ls -l /dev/disk/by-id

total 0

lrwxrwxrwx 1 root root  9 Dec 31 12:10 ata-HDS725050KLAT80_KRVA03ZAG3V5LD -> ../../hdb

lrwxrwxrwx 1 root root 10 Dec 31 12:10 ata-HDS725050KLAT80_KRVA03ZAG3V5LD-part1 -> ../../hdb1

lrwxrwxrwx 1 root root  9 Dec 31 12:10 ata-HDS725050KLAT80_KRVA03ZAG4V99D -> ../../hdd

lrwxrwxrwx 1 root root 10 Dec 31 12:10 ata-HDS725050KLAT80_KRVA03ZAG4V99D-part1 -> ../../hdd1

lrwxrwxrwx 1 root root  9 Dec 31 12:10 scsi-SATA_WDC_WD10EACS-00_WD-WCAU44206983 -> ../../sde

lrwxrwxrwx 1 root root 10 Dec 31 12:10 scsi-SATA_WDC_WD10EACS-00_WD-WCAU44206983-part1 -> ../../sde1

lrwxrwxrwx 1 root root  9 Dec 31 12:10 usb-SanDisk_Corporation_MobileMate_200445269218B56190D7-0:0 -> ../../sda

lrwxrwxrwx 1 root root 10 Dec 31 12:10 usb-SanDisk_Corporation_MobileMate_200445269218B56190D7-0:0-part1 -> ../../sda1

 

To confirm, type:

smartctl -i -d ata /dev/sdb

It should print the drive size, model, and serial number that match your new drive, as below...

root@Tower:~# smartctl -i -d ata /dev/sdb

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

 

=== START OF INFORMATION SECTION ===

Model Family:    Seagate Barracuda 7200.10 family

Device Model:    ST3750640AS

Serial Number:    5QD2ZR29

Firmware Version: 3.AAE

User Capacity:    750,156,374,016 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:  7

ATA Standard is:  Exact ATA specification draft version not indicated

Local Time is:    Fri Jan  2 15:00:51 2009 EST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

Joe L.

 

Link to comment

Hi Joe,

 

Those few commands are exactly what I needed.  Thanks so much for the clear and prompt reply.  My pre-clear is now a few percent into the process.  Once it is done, hopefully with no errors (it may take more than 10 hours), I will replace the current parity drive with this drive.

 

Happy New Year!

 

--Tom

Link to comment

The script just halted at 88% post-read on my 2 WD10EADS-00L5B1 drives.

 

What should I do / where should I look?

 

Greets,

 

Deva

 

EDIT:

 

smartctl shows this:

Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.

 

was:

Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.

 

Also, nothing funky returned:

root@tower:/tmp# smartctl -a /dev/sda
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EADS-00L5B1
Serial Number:    WD-WCAU42804491
Firmware Version: 01.01A01
User Capacity:    1,000,203,804,160 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Jan  3 23:08:00 2009 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (22200) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   166   165   021    Pre-fail  Always       -       6666
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       21
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       37
10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       18
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       5
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       21
194 Temperature_Celsius     0x0022   126   117   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Link to comment

This cannot be a coincidence.  I just received a batch of WD10EACS drives (you have the EADS version with 32MB cache).  The first drive I pre-cleared finished, but there was no SMART report.  Rather than enjoy the 13-hour ordeal again, I just skipped it and tried another drive.  This one stopped at the 88% mark and hung...  I am running v4.4 on my trusty Asus P5B-VM DO.

 

Regards,  Peter

Link to comment

Hi Joe,

 

Thanks for your prompt reply. How do I know the correct device name?

 

--Tom

 

Another way to do it is to stop your array.  Go to the devices page and pretend you are going to add a new drive to the array.  Your new drive's name is visible in the drop-down box.  Just don't actually add the drive; go back to the main page and start the array again.

 

Regards,  Peter

Link to comment

Are you both using the newer version of the pre-clear script, or the older original one?

 

The older script exited the pre/post read when the "dd" command failed to read the expected number of blocks. I originally thought that would only happen when you reached the end of the disk and it tried to read disk blocks past the end.

 

Apparently, a "read" error is occurring before the end of the disk is reached.  With the original script this exits the read phase early.

 

When I was told this was happening, I re-wrote that logic and had the loop continue reading until the end, even if errors occurred.

(We really want the errors to identify bad blocks, and bad hardware)

 

The newer version of preclear_disk.sh should never exit the read phase early.

 

Joe L.

Link to comment

Joe,

 

I am using the last one we tested.  It worked fine on the Seagate 1.5TB units.  But I am testing WD GP 1TB drives and seeing a bit of weirdness.  I used the -n switch on the last run to get the drive cleared without all the additional testing, and that worked well.  I've two more drives that I can try again, although I am not sure if I can get to it in the next couple of days (off to CES!).

 

By the way, do you happen to know if erasing the MBR will make the drive appear fresh to unRAID?  If I take a drive that was used as a data drive in one unRAID system, wipe the MBR and install it in another unRAID system, that should cause the new system to start a clearing process, right?

 

Finally, do drives need to be pre-cleared in the system that they will be used in or can they be cleared in one machine and used in another?

 

Thanks and regards,  Peter

Link to comment

Joe,

 

I am using the last one we tested.  It worked fine on the Seagate 1.5TB units.  But I am testing WD GP 1TB drives and seeing a bit of weirdness.  I used the -n switch on the last run to get the drive cleared without all the additional testing, and that worked well.  I've two more drives that I can try again, although I am not sure if I can get to it in the next couple of days (off to CES!).

Cool... have fun.

By the way, do you happen to know if erasing the MBR will make the drive appear fresh to unRAID?  If I take a drive that was used as a data drive in one unRAID system, wipe the MBR and install it in another unRAID system, that should cause the new system to start a clearing process, right?

It sure will make it start the clearing process. There are only three situations where the second array would not clear the drive.

1. No parity drive is defined

2. The disk has a valid reiserfs that starts at cylinder 63 and extends to the end of the disk as partition 1 and no other partitions. (The normal way unRAID creates a data disk)

3. The disk has a special "pre-clear" signature in the first 512 bytes (the first sector) of the disk.  (This differs based on disk size and geometry, and it is the whole purpose of the preclear script.)

 

If you want to erase just the MBR you can use the -n option and type "Control-C" once the zeroing of the bulk of the drive starts.  That should be enough to make it look unformatted to a new unRAID server.
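
A manual alternative, for what it is worth, is to zero just the first sector yourself with dd.  This is a sketch only; /dev/sdX is a placeholder, and one wrong device name here destroys real data, so triple-check it before pressing enter:

# overwrite only the 512-byte MBR sector of the disk
dd if=/dev/zero of=/dev/sdX bs=512 count=1

The -n/Control-C method above accomplishes much the same end using the script itself.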

 

Finally, do drives need to be pre-cleared in the system that they will be used in or can they be cleared in one machine and used in another?

 

Thanks and regards,  Peter

They can be pre-cleared in any system.  The script does look for the unRAID-specific files and folders to ensure you do not shoot yourself in the foot, but it will probably even work on a non-unRAID Linux box.  It might give an error or two to stdout, since it will not be able to open /boot/config/disks.cfg or run the "mdcmd status" command, but it should work.  It will still make sure the disk is not mounted.

 

Joe L.

Link to comment

Here's a really dumb question:

 

If I have a headless unRAID box, I can use PuTTY to connect to my server and start the preclear script.  But do I have to leave the PuTTY session open for 10+ hours in order to view the status?  Is there any way to start preclear, disconnect from my box, then connect again and view the preclear status?

 

I bought myself 2 WD Green drives that I am going to preclear.

 

Thanks for any info.

 

Edit: Never mind - Answer found in original post

 

"You will either need to kick this preclear_disk.sh script off from the system console, or from a telnet session.  You must leave the session open as it runs. (and it will typically run for hours)"

 
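One workaround, if you later want to be able to walk away from the session: run the script inside a terminal multiplexer such as GNU screen.  screen is not part of the stock unRAID 4.x distribution, so this assumes you have added the package yourself; /dev/sdX is a placeholder:

screen -S preclear             # start a named screen session
cd /boot
./preclear_disk.sh /dev/sdX    # run the script inside that session
# press Ctrl-a then d to detach; the script keeps running on the server
screen -r preclear             # re-attach later from any new telnet session

Detaching leaves the script running even if the PuTTY window is closed.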

Link to comment
