Re: preclear_disk.sh - a new utility to burn-in and pre-clear disks for quick add


Is there any way to run preclear_disk.sh in the background and have the results mailed?

 

 

Would this work?

 

preclear_disk.sh -m [email protected] /dev/hdk > /dev/null &

 

Then I can disconnect my terminal and wait for an email.

 

What if output were redirected to a file? Could I then log in and check the tail of the file for a status update?

That will not work.

 

Install "screen" and use it.  It will allow you to disconnect the terminal and re-connect as you desire.

 

Joe L.

Link to comment

Here's a really dumb question:

 

If I have a headless unRAID box, I can use Putty to connect to my server and start the preclear script.  But, do I have to leave the Putty session open for 10+ hours in order to view the status?  Is there any way to start preclear, disconnect from my box, connect again and view the preclear status?

 

I bought myself 2 WD Green drives that I am going to preclear.

 

Thanks for any info.

You can install "screen" as a supplemental package.  When you invoke it and then start a command, you can disconnect and later re-connect to the running process.   Otherwise, there is no other way I know of if you don't have a system console.

 

Both these packages are needed.   Use "installpkg package_name.tgz" to install each in turn.

 

http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/ap/screen-4.0.3-i486-1.tgz

and

http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/a/utempter-1.1.4-i486-1.tgz

 

Then type

screen

Then start up the preclear_disk.sh process.

 

To detach, leaving it running, type

Control-A d

 

Then, 10 hours later you can re-attach to the running process by logging in and typing

screen -r

 

A good article on "screen" can be found here:

http://www.linuxjournal.com/article/6340

 

It can do a lot more. You can "name" the screen sessions, and list the windows within a session with

Control-A "

(Control-A followed by a double-quote character)
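The steps above can be sketched as a few small shell helpers. This is only an illustration: the function names and the "preclear_" session label are made up for this example, while the screen options used (-S to name a session, -ls to list sessions, -r to re-attach) are standard GNU screen options.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative, not part of preclear_disk.sh.

# Derive a session label from a device path, e.g. /dev/sdf -> preclear_sdf.
session_name_for() {
    printf 'preclear_%s\n' "$(basename "$1")"
}

# Start a named, interactive screen session for the given device.
# Inside it, run preclear_disk.sh by hand, then detach with Control-A d.
start_session() {
    screen -S "$(session_name_for "$1")"
}

# List all screen sessions owned by the current user.
list_sessions() {
    screen -ls
}

# Re-attach to the session created for the given device.
reattach_session() {
    screen -r "$(session_name_for "$1")"
}

# Show the label that would be used for /dev/sdf.
session_name_for /dev/sdf
```

Naming the session per device makes it easier to find the right one again when several preclears are running in separate sessions.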

 

Edit: updated links to screen packages

 

Joe L.

 

 

Are these instructions still current?

 

TIA.

Link to comment

The "screen" package has not changed in probably 20 years... Yes, they will still work.
Link to comment

My server is rather old, a P4 2.4GHz, and my drives are connected to a Promise TX4 PCI SATA controller, with my parity drive on the onboard SATA port. When I preclear a drive I can't stream HD movies anymore as the system is saturated. For instance, it takes me 30 hours to preclear a 1.5TB drive.

 

I would like to know if setting the write and read block sizes would allow me to watch movies while preclearing, and what size would be optimal for my system.

Link to comment

I have no way of knowing if smaller block sizes will help, but you can sure try.  I'd try something like 8192 for both.

Not having your system, it is impossible to know if it will help.  It will take longer for the pre-clear with the smaller block sizes, but it might just let you watch the movies.

Link to comment

You need to ditch the PCI Bus, it's over saturated. All devices on the PCI Bus share the same bandwidth. It's limited to 133 MB/sec for all devices combined. Your preclear is most likely dominating that limited bandwidth. If possible move your drives off the PCI Bus and onto the onboard SATA ports, if you have any available.
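As a rough check on the 133 MB/sec figure, the peak rate of a conventional PCI bus follows from its clock and width. The sketch below assumes a standard 32-bit, 33.33 MHz PCI slot; the function name is made up for this example, and real-world throughput is lower than this theoretical peak.

```shell
#!/bin/sh
# Theoretical peak of a parallel PCI bus: clock (MHz) * width (bits) / 8 = MB/s.
pci_peak_mb_per_s() {
    # $1 = bus clock in MHz, $2 = bus width in bits
    awk -v mhz="$1" -v bits="$2" 'BEGIN { printf "%.0f\n", mhz * bits / 8 }'
}

# Conventional PCI: 33.33 MHz, 32 bits wide, shared by every device on the bus.
pci_peak_mb_per_s 33.33 32
```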

Link to comment

Joe, I'll try the block size you recommend. The length of time it will take is of secondary importance. My wife does not understand why we can't watch a movie while I'm expanding the capacity of the server. She just wants it to work without any fuss.

 

I know I'm limited by the PCI bus, but I don't feel like spending 350€ on new hardware when all I do is usenet and watch movies from the server. I'd prefer to spend that money in buying new disks, that's 6TB I can buy for that amount. I have 2 onboard sata ports and they are being used by the parity and the cache drive. Thanks for the advice, I already considered it.

Link to comment

Just got a 2TB EARS drive and dropped it into my machine with Pin 7&8 Jumpered.

 

I started up the script and I get a

Clearing will NOT be performed

 

I ran the following

preclear_disk.sh /dev/sdf

 

This is what ls -l /dev/disk/by-id generated:

 

lrwxrwxrwx 1 root root  9 Sep 23 23:02 ata-WDC_WD15EADS-00S2B0_WD-WCAVY1371380 -> ../../sdc

lrwxrwxrwx 1 root root 10 Sep 23 23:02 ata-WDC_WD15EADS-00S2B0_WD-WCAVY1371380-part1 -> ../../sdc1

lrwxrwxrwx 1 root root  9 Sep 23 23:02 ata-WDC_WD15EADS-00S2B0_WD-WCAVY1447030 -> ../../sdb

lrwxrwxrwx 1 root root 10 Sep 23 23:02 ata-WDC_WD15EADS-00S2B0_WD-WCAVY1447030-part1 -> ../../sdb1

lrwxrwxrwx 1 root root  9 Sep 23 23:02 ata-WDC_WD20EARS-00MVWB0_WD-WMAZ20201794 -> ../../sda

lrwxrwxrwx 1 root root 10 Sep 23 23:02 ata-WDC_WD20EARS-00MVWB0_WD-WMAZ20201794-part1 -> ../../sda1

lrwxrwxrwx 1 root root  9 Sep 23 23:02 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0138511 -> ../../sdd

lrwxrwxrwx 1 root root 10 Sep 23 23:02 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0138511-part1 -> ../../sdd1

lrwxrwxrwx 1 root root  9 Sep 23 23:02 ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0241582 -> ../../sdf

lrwxrwxrwx 1 root root  9 Sep 23 23:02 ata-WDC_WD5000AAVS-00ZTB0_WD-WCASU0114518 -> ../../sde

lrwxrwxrwx 1 root root 10 Sep 23 23:02 ata-WDC_WD5000AAVS-00ZTB0_WD-WCASU0114518-part1 -> ../../sde1

lrwxrwxrwx 1 root root  9 Sep 23 23:02 scsi-SATA_WDC_WD15EADS-00_WD-WCAVY1371380 -> ../../sdc

lrwxrwxrwx 1 root root 10 Sep 23 23:02 scsi-SATA_WDC_WD15EADS-00_WD-WCAVY1371380-part1 -> ../../sdc1

lrwxrwxrwx 1 root root  9 Sep 23 23:02 scsi-SATA_WDC_WD15EADS-00_WD-WCAVY1447030 -> ../../sdb

lrwxrwxrwx 1 root root 10 Sep 23 23:02 scsi-SATA_WDC_WD15EADS-00_WD-WCAVY1447030-part1 -> ../../sdb1

lrwxrwxrwx 1 root root  9 Sep 23 23:02 scsi-SATA_WDC_WD20EARS-00_WD-WMAZ20201794 -> ../../sda

lrwxrwxrwx 1 root root 10 Sep 23 23:02 scsi-SATA_WDC_WD20EARS-00_WD-WMAZ20201794-part1 -> ../../sda1

lrwxrwxrwx 1 root root  9 Sep 23 23:02 scsi-SATA_WDC_WD20EARS-00_WD-WMAZA0138511 -> ../../sdd

lrwxrwxrwx 1 root root 10 Sep 23 23:02 scsi-SATA_WDC_WD20EARS-00_WD-WMAZA0138511-part1 -> ../../sdd1

lrwxrwxrwx 1 root root  9 Sep 23 23:02 scsi-SATA_WDC_WD20EARS-00_WD-WMAZA0241582 -> ../../sdf

lrwxrwxrwx 1 root root  9 Sep 23 23:02 scsi-SATA_WDC_WD5000AAVS-_WD-WCASU0114518 -> ../../sde

lrwxrwxrwx 1 root root 10 Sep 23 23:02 scsi-SATA_WDC_WD5000AAVS-_WD-WCASU0114518-part1 -> ../../sde1

lrwxrwxrwx 1 root root  9 Sep 23 23:02 usb-SanDisk_U3_Cruzer_Micro_000016A078713815-0:0 -> ../../sdg

lrwxrwxrwx 1 root root 10 Sep 23 23:02 usb-SanDisk_U3_Cruzer_Micro_000016A078713815-0:0-part1 -> ../../sdg1

 

So I'm pretty sure I have the correct drive, but so far it's not agreeing with me.

 

Link to comment

Did you respond to the initial prompt to continue with a "Yes"  (with a capital "Y" and lower case "es")?  If not, it will not be cleared.

 

Joe L.

 

:(

Uh, I guess when it has the Capital Y and the lower case es it means it wants you to type it.

 

Now I really feel stupid. LOL Thanks Joe L

Link to comment
  • 3 weeks later...

I have some suggested enhancements to preclear:

 

1 - Allow an option to skip the preread, without skipping the post read.  I had a problem running preclear on a disk, that appears to have been connection related.  I wanted to run a preclear cycle but after preclearing the disk once (unsuccessfully) and then preclearing it again (successfully), I wanted to skip the preread cycle.

 

2 - Allow an option to JUST preread a disk, without writing binary zeros to the beginning of the disk. This would be a pure read test.

 

3 - Allow an option to verify that the disk is full of binary zeros, ignoring the partitioning info.

Link to comment

First I should say I am new to unraid and have no linux experience, but I am slowly learning.  Very Slowly!

 

My computer locked up while running preclear after about 12 hours.  I guess the first question I should ask is: is there a limit to the number of drives that can be precleared at the same time?  Maybe that's my problem.  I was using PuTTY and screen to run the preclears on 11 drives (all 2TB Western Digital WD20EADS) at the same time.  Is that too many?  What is the limit?

 

The preclear on the 11 drives worked fine for about the first 12 hours, then the computer locked up completely.  The screen was blank and I couldn't get to the unRAID web menu or unMENU.  The activity lights on 9 of the 11 drives were off, but 2 of the 11 were on solid, though it didn't sound like there was any disk activity.

 

If there isn't a limit on the number of preclears that can be done at once, what should I do now to figure out what the problem might be?  I've attached my syslog and smart report for all of the drives.

 

Computer Specs:

Motherboard:  Supermicro X8SIA-F

CPU: Xeon L3406

Memory: 8GB ECC

SATA Controller: 6 port on MB and two Supermicro AOC-SASLP-MV8

Hard Drives - 11 Western Digital WD20EADS 2 TB Green

 

Thanks,

Kerry

syslog-2010-10-17.txt

All.txt

Link to comment

Each process will use memory...  The disk buffer cache will try to grab as much memory as possible too.  So yes, you probably ran out of memory.  I have absolutely no idea what the "limit" is, even with as large an amount of RAM as you have.

 

I'd only do a few at a time, or, use the options in the preclear script to limit the buffer size.

 

      -w size  = write block size in bytes

 

      -r size  = read block size in bytes

 

      -b count = number of blocks to read at a time

 

they are described in this post: http://lime-technology.com/forum/index.php?topic=2817.msg39972#msg39972
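As an illustration of how the -w and -r options listed above might be combined, here is a small sketch that only prints a candidate command line so it can be reviewed before being run. The helper name and the /dev/sdX placeholder are made up for this example, and 8192 is simply the value suggested earlier in this thread for a different system, not a tested recommendation.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command instead of executing it.
build_preclear_cmd() {
    # $1 = device, $2 = write block size in bytes, $3 = read block size in bytes
    printf 'preclear_disk.sh -w %s -r %s %s\n' "$2" "$3" "$1"
}

# /dev/sdX is a placeholder; substitute the real device only after
# double-checking it against ls -l /dev/disk/by-id.
build_preclear_cmd /dev/sdX 8192 8192
```

Printing the command first is a deliberate choice here, since a mistyped device name on a clearing tool is costly.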

 

Joe L.

 

 

 

Link to comment

Joe,

 

Thanks a lot for your preclear script. You have a lot of clever things in there, eg modifying the partition table using echo and dd ! It took the script 105 hours to complete 3 passes on four 2TB Seagate ST32000542AS (I cleared all 4 disks in parallel on an Acer h342, an Atom dual-core machine).

 

I modified your preclear script so the SMART differences will be printed a bit more clearly. In particular, the table of attributes is separated from the main report and the rows are better interleaved 1-by-1 to better highlight differences. I've attached my new script... but please note I also shortened the short_test lengths because I'm impatient.

 

Do you mind if I suggest a few other improvements?

 

1) On passes after the first pass, there is no need to do a pre-read, because the post-read of the previous pass already read all the sectors. This would allow for more write-test passes in the same period of time.

 

2) On each pass, write a different value to the sectors rather than all 0's. Starting with data=0, maybe write the value "data = data ^ 255 ^ (pass-1) ^ (pass)" for each pass. Starting with pass=1, this should write the sequence 254, 2, 252, 4, ... (255-pass), (pass), ...    On the final pass, you can write 0's. This will make it harder to use 'sum' to verify, but it will better test the underlying medium and detect when data is not being written correctly on successive passes.
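To make the proposed formula in suggestion 2 concrete, the following sketch evaluates "data = data ^ 255 ^ (pass-1) ^ (pass)" for the first few passes using shell integer arithmetic. It only computes the byte values; it does not write anything to a disk, and the function name is made up for this example.

```shell
#!/bin/sh
# Print the byte value that would be written on each of the first N passes,
# starting from data=0 and pass=1, per the formula in the post.
pattern_sequence() {
    data=0
    pass=1
    out=""
    while [ "$pass" -le "$1" ]; do
        data=$(( data ^ 255 ^ (pass - 1) ^ pass ))
        out="$out${out:+ }$data"   # append, space-separated
        pass=$(( pass + 1 ))
    done
    printf '%s\n' "$out"
}

pattern_sequence 4   # prints: 254 2 252 4
```

Odd passes give 255-pass and even passes give pass, so every bit position gets both a 0 and a 1 written to it within any two consecutive passes.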

 

Thanks,

Guy

preclear_disk_guy.zip

Link to comment

Here are the results from a drive I had in my WHS.  I was thinking I could use it as a cache disk.

 

============================================================================

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: S.M.A.R.T. error count differences detected after pre-clear

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: note, some 'raw' values may change, but not be an indication of a problem

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: 54c54

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: < 1 Raw_Read_Error_Rate 0x000f 101 099 006 Pre-fail Always - 3525934

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: ---

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: > 1 Raw_Read_Error_Rate 0x000f 118 099 006 Pre-fail Always - 171199195

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: 58c58

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: < 7 Seek_Error_Rate 0x000f 078 060 030 Pre-fail Always - 76815635

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: ---

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: > 7 Seek_Error_Rate 0x000f 078 060 030 Pre-fail Always - 76912546

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: 64,66c64,66

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: < 189 High_Fly_Writes 0x003a 039 039 000 Old_age Always - 61

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: < 190 Airflow_Temperature_Cel 0x0022 072 055 045 Old_age Always - 28 (Lifetime Min/Max 23/28)

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: < 195 Hardware_ECC_Recovered 0x001a 029 025 000 Old_age Always

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: ---

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: > 189 High_Fly_Writes 0x003a 031 031 000 Old_age Always - 69

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: > 190 Airflow_Temperature_Cel 0x0022 069 055 045 Old_age Always - 31 (Lifetime Min/Max 23/34)

Oct 24 02:36:35 Tower preclear_disk-diff[14738]: > 195 Hardware_ECC_Recovered 0x001a 055 025 000 Old_age Always

Oct 24 02:36:35 Tower preclear_disk-diff[14738]:

Link to comment

When I/O errors like this:

 

Oct 24 04:48:14 Beanstalk kernel: end_request: I/O error, dev sdd, sector 3907025712

Oct 24 04:48:14 Beanstalk kernel: sd 3:0:0:0: [sdd] Unhandled error code

Oct 24 04:48:14 Beanstalk kernel: sd 3:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00

Oct 24 04:48:14 Beanstalk kernel: sd 3:0:0:0: [sdd] CDB: cdb[0]=0x28: 28 00 e8 e0 7b 30 00 00 08 00

Oct 24 04:48:14 Beanstalk kernel: end_request: I/O error, dev sdd, sector 3907025712

 

occur during preclear, shouldn't preclear_disk.sh exit with a failure?  Believe it or not, after filling up the syslog with these errors, preclear ended with a success on this drive.

Link to comment

It has no idea the errors are occurring if the operating system retries and does not let it know anything was occurring.

 

Sorry, but it is just issuing a "dd" command (actually a large series of commands) and when verifying it expects specific values.  If your drive is experiencing those kinds of errors it could be almost anything...
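Since the script cannot see errors that the kernel retries silently, one way to catch them is to check the syslog for the device after a run. The sketch below is not part of preclear_disk.sh; the function name is made up, and /var/log/syslog is assumed to be where the kernel messages land on your system.

```shell
#!/bin/sh
# Count kernel "I/O error" lines for one device in a log file.
count_io_errors() {
    # $1 = device name without /dev/ (e.g. sdd), $2 = log file to scan
    # grep -c exits non-zero when the count is 0, so ignore its exit status.
    grep -cF -- "I/O error, dev $1," "$2" || true
}

LOG=/var/log/syslog   # assumed log location; adjust if yours differs
if [ -r "$LOG" ]; then
    echo "sdd: $(count_io_errors sdd "$LOG") I/O error line(s) in $LOG"
fi
```

A non-zero count after a "successful" preclear would be a reason to look at the cabling, controller, and SMART report before trusting the drive.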

Link to comment

My 2 TB Seagate (5900 RPM drive) is pre-clearing now.  I am 26 hours into it and at 1.7 TB per drive (doing 4 at a time) in the post-read process now.  Does that seem excessively long?

 

Thanks!

 

Neil

 

No, it does not seem excessively long.

 

It usually takes about  30 to 35 hours for a single 2TB drive.  With 4 being cleared concurrently I'd guess you could easily add 20% to the time it takes depending on the bus bandwidth and cpu(s) involved.

 

Joe L.

Link to comment

Wow, that is a long time. I am a newbie. Can you let me know how this works in regard to stopping the raid array and adding drives? Is it quick, after I pre-clear, to stop the array and add a new drive? Does the new drive get formatted at that time? If so, how long does that take? I am trying to understand the penalty for adding space after the fact. Thanks!

Link to comment

Wow that is a long time. I am a newbie.

Actually, it is limited mainly by the disk drive itself.  Most drives can be read/written to with a sustained average speed of somewhere between 50 MB/s and 100 MB/s.  That translates into between 10 and 20 seconds per gigabyte.    For a 2TB drive (2000GB) it could take between 20,000 and 40,000 seconds just to read or write the drive once, without any additional overhead or tasks occurring (roughly 5.6 to 11.1 hours).
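The arithmetic behind that estimate can be reproduced with a short sketch. It assumes decimal units (1 GB = 1000 MB) and a constant sustained transfer rate, which real drives do not hold from the outer to the inner tracks; the function name is made up for this example.

```shell
#!/bin/sh
# Time for one full sequential pass over a drive at a constant transfer rate.
estimate_pass_time() {
    # $1 = drive size in GB, $2 = sustained speed in MB/s
    awk -v gb="$1" -v mbps="$2" 'BEGIN {
        secs = gb * 1000 / mbps
        printf "%d seconds (%.1f hours)\n", secs, secs / 3600
    }'
}

estimate_pass_time 2000 100   # fast end of the range
estimate_pass_time 2000 50    # slow end of the range
```

A full preclear cycle makes several passes over the disk (pre-read, write, post-read), which is why the total runs to several times the single-pass figure.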

 

The preclear-script does not just read the disk.  It reads all the sectors on the disk in addition to constantly moving the disk heads to read additional random sectors and across the entire surface on the disk from the first to last tracks.  This makes it take longer, since the disk heads are constantly being re-positioned and to read the next linear sectors it must wait for the platters to rotate to the correct sector to be accessed next. 

 

This constant movement of the disk heads is to identify those disks that will be prone to an early mechanical failure.    In the same way, the post-read also verifies that what is being read is all zeros.  This also takes time but has been proven necessary as there have been drives that return random values on occasion and exhibit no other errors.  These can cause hair loss, as files read back from them are "sometimes" corrupted, and subsequent parity checks "sometimes" have errors, and at that point it is impossible to know which drive is returning the random bits.  (you will pull your hair out trying to find the cause of the parity errors)

 

You can think of the preclear-script as performing a burn-in of the disk to exercise it far more than usual and to assist in identification of bad disks before you assign them to your array and store your precious data on them.  It is not a guarantee the disk will not fail soon after being put into service, but it is a good initial test of the drive.

Can you let me know how this works in regard to stopping the raid array and adding drives? Is it quick, after I pre-clear, to stop the array and add a new drive?
To add a pre-cleared drive to an array you simply stop the array, assign the drive and then press "Start" to re-start the array.  The down-time is the minute or so needed to perform those three steps. 
Does the new drive get formatted at that time?
A pre-cleared drive is formatted after the array is started, and after you are back on-line with the rest of your data.  There is a "Format" button present if there is an "unformatted" drive present.  You can press it at any time once the array is started.  A drive can be formatted while reading or writing to other shares in the array.
If so, how long does that take?
Typically this takes just a few minutes for larger drives, less for smaller drives.
I am trying to understand the penalty for adding space after the fact. Thanks!

The penalty of adding a drive that has NOT been pre-cleared is that the unRAID software will clear the drive for you, but while it does it the array will be off-line.  As already described, this will keep the array off-line for 6 to 12 hours for a 2TB drive.  (not the type of thing you want to do if you expected to watch a movie shortly after adding the new disk.)  Once it finishes clearing the drive the array will come back online and you will again be presented with a "Format" button.

 

The built-in clearing step also does none of the additional pre-read and post-read steps the preclear_disk.sh script performs.  It only writes zeros to the disk.  It makes no attempt to detect un-writable sectors or verify the disk is capable of reading what was written.  The disk will not detect the un-readable sectors until it is eventually read after being written to.  The built-in clearing step does not "burn-in" a drive to attempt to detect an early failure. 

 

Note: If you have not yet assigned a parity drive the built-in clearing step is not performed at all.  Its purpose is to allow parity to be correct without having a full parity calculation again, so if you have not yet performed the initial parity calculation, clearing is not necessary.
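Why an all-zero disk leaves parity correct can be shown with a one-byte example. The sketch assumes parity is a bytewise XOR across the data disks; the byte values and the function name are arbitrary and only for illustration.

```shell
#!/bin/sh
# XOR parity of a list of byte values (one byte per data disk).
parity_of() {
    p=0
    for b in "$@"; do
        p=$(( p ^ b ))
    done
    printf '%s\n' "$p"
}

before=$(parity_of 165 60 255)     # three existing data disks
after=$(parity_of 165 60 255 0)    # same disks plus a zeroed new disk
echo "parity before: $before, after adding a cleared disk: $after"
```

Because XOR with zero changes nothing, the existing parity byte is already correct for the enlarged array, so no parity recalculation is needed when the new disk is known to be all zeros.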

 

Joe L.

Link to comment
