Re: preclear_disk.sh - a new utility to burn-in and pre-clear disks for quick add



Thanks Joe, guess I was expecting a Cat 5 given that the drive is brand new but sticking it through 21 hours of straight use has probably given it a little wear :D

I don't have a category 5 in my measurement scale...  I only go to category 4.  Sorry.

 

Just remember... there are really only two broad categories:

 

  • Those disks that have already failed.
       
  • Those disks that have not yet failed... (but if given enough time, will)

 

Those two reasons are why we have unRAID servers.

 

Joe L.

Link to comment

I have a quick question about the mail function in this new version.

 

In your directions you link to the unRAID_notify script as an example of what needs to be installed for this to work.  I do have unRAID_notify installed as part of my BubbaRAID install, and that might be what is causing the problem.  I just started a preclear on a 1TB drive and was going to use the new -m and -M options to send myself updates while it ran.  The first try came back with a "mail command not found" type of error.  It was already late, and I needed to get this new 1TB drive into the server to replace a failing one as soon as possible, so I did not mess with it for the rest of the night.

 

If there is a way you (or I) can modify the script to look for the BubbaRAID install of unRAID_notify that would be great.  If you need any information I will provide it as soon as I get a chance.

 

Thanks

Link to comment

I don't use bubbaRAID, so I cannot help you with anything in it.  It uses a very old version of unRAID, so I can't even test it if I wanted to.

 

The "mail" command must be able to take as an argument the subject AND expect an addressee.  If your "mail" command is not in the search path, I cannot help you.

The mail is invoked as

mail -s "subject line for mail" your_address@your_mail_server.com

 

One version of the "mail" command out there, shipped with an early version of "unraid_notify", did not comply with the normal mail syntax and will not work.

 

You can test your own mail command by typing:

echo "This is mail for me" | mail -s "Test Mail Subject" your_email@your_domain.com

 

If it can send the mail, then so can preclear_disk.sh

If it does not work, then nothing will ever get through, since you have a broken mail command.
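As a rough sketch of the check described above (not something preclear_disk.sh does itself), the test can be wrapped in a couple of shell functions. The names have_mail and send_test_mail are made up for illustration, and the address you pass in is a placeholder.

```shell
# have_mail: succeeds only if a "mail" command is on the search path.
# (Hypothetical helper name, not part of preclear_disk.sh.)
have_mail() {
    command -v mail >/dev/null 2>&1
}

# send_test_mail ADDRESS: pipe a one-line body into mail using the
# standard "-s subject addressee" syntax shown above.
send_test_mail() {
    addr="$1"
    if ! have_mail; then
        echo "mail command not found in PATH" >&2
        return 1
    fi
    echo "This is mail for me" | mail -s "Test Mail Subject" "$addr"
}
```

Calling send_test_mail your_email@your_domain.com either hands the test message to mail or tells you that mail is not on the search path. Whether the message actually arrives still depends on the mail command itself being set up correctly.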

 

Joe L.

Link to comment

Thanks for the directions.  I will try them out when I get home from work today and see what happens.  I have a feeling that the version of unRAID_notify is old and therefore does not comply with this new syntax.  I have not bothered to update it, as it still works for me and sends me the notifications I need.  I probably won't bother trying to update it since it is "built into" the BubbaRAID install.

Link to comment

  I have a feeling that the version of unRAID_notify is old and therefore does not comply to this new syntax. 

It is NOT new syntax.  It is the exact syntax used by Unix and Linux mail since the early 1970s.

 

The author of unraid_notify wrote a version that did not comply and could not accept an addressee.

 

Joe L.

 

Actually, what I said is slightly incorrect... In the early 70's we used a slightly different method to route mail.

An email address might have been something like

mail -s "this is a test message" att!belllabs!joe

 

The "!" notation was used to forward mail from one machine to another via "uucp" until it finally arrived at the machine where I had a login.

Link to comment

Right, right, that is what I meant.  I worded that a little funny.  I remember the thread discussion about making it conform to the standard syntax and how unRAID_notify did not.  I know that was changed in a newer version, but I don't think the version included with BubbaRAID is the newer one.

Link to comment
  • 3 weeks later...

Hi Joe,

 

Just had a chance to try the latest version with a few new hdds.

 

The new version does the verification of zeros as you described as its default behavior in the post-read phase.  It will take about 10-15% more time, as it has to evaluate what it is reading.  It only does it in the post-read phase, as it is there where the zeros are expected.

 

You can elect to NOT do the post-read verification of all zeros with a new -N option...  (if you are in a hurry)

 

If I do the verification of zeros, post-read speed drops to 28-30 MB/s.

If I use the -N option, post-read speed is 60-65 MB/s (same as pre-read).

So essentially, the post-read is taking 100% longer for me if I do the verification of zeros.  Am I doing anything wrong?  :(

 

Link to comment

No, I don't think you are doing anything wrong... it takes time to read the disk an additional time, perform a "sum" of the bytes read, and do the verification.  The 10-15% I was referring to was the increase in the overall time of the whole preclear process, not in the post-read phase alone.

The step was added because at least one person had a disk drive that otherwise acted normally, but the data written to it could not be read back as written.  (We wrote all zeros, and occasionally it returned something non-zero.  This was really bad, since it only showed up later, when repeated parity checks reported errors each time they ran.)
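To illustrate the idea (this is only a sketch, not the code preclear_disk.sh actually uses), one simple way to check that data reads back as all zeros is to delete every zero byte from the stream and count what is left. The function name count_nonzero_bytes is hypothetical, and in practice you would feed it a chunk read from the disk with dd rather than a regular file.

```shell
# count_nonzero_bytes FILE: print how many bytes in FILE are not 0x00.
# (Hypothetical helper for illustration only.)
count_nonzero_bytes() {
    # tr deletes every NUL byte; wc counts whatever survives.
    # The arithmetic expansion strips the padding some wc versions add.
    echo $(( $(tr -d '\000' < "$1" | wc -c) ))
}
```

A result of 0 means the region read back as written. Anything else is the kind of drive described above, the one that later shows up as recurring parity errors.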

Link to comment
  • 2 weeks later...

Hi, thanks for a great script, Joe!  I used it when I was setting up my unRaid server in January.  I'm adding some drives now so I thought I'd download the most recent version... and I couldn't find the download link.  I felt like an idiot, I kept searching the post for the link and couldn't find it... until finally I tried logging in to the forum, and then the link appeared at the bottom of the post!

 

It would be nice if the link showed up for guests, even if it just tells you you need to log in when you click on it, but as it is, there is no indication there is any link at all.

 

Not your fault, obviously!  Just a usability problem with the forum.

 

Could you maybe make mention of this?  Maybe at the top of the first post, say something like "Download link at the bottom of the post.  Make sure you are logged in to see it"  I'm sure most people here are logged in anyways, but for those who aren't, it could save a lot of time and frustration looking for the link.

Link to comment

Good point... I did not know the download link would not be visible unless you logged in.  I'll add a note as suggested.

 

Joe L.

Link to comment

I must confirm that same problem.  The post kept saying "I've attached..." and I kept thinking "No you didn't".  I was logged into my other OS that didn't keep me logged in.

 

Duh.

 

For me the reason was I'd recently wiped my system and installed Windows 7.  So there are definitely reasons why even longtime unraid users might be using the forums while not being logged in.

 

Anyways thanks for adding the note Joe, and for creating such a useful script!

Link to comment

Have been burning in (2) ST32000542AS 5900RPM 2TB drives, by running 2 copies of Preclear simultaneously.  With version .9.8, each pass takes 30 hours!

 

I manually spin down the array nightly from the standard web interface.  During the 4th pass, pressing the Spin Down button caused the web interface to become completely unresponsive.  Couldn't even launch the web interface from another PC.  UnMenu worked fine, and I could access files via the disk shares or user shares.

 

Gave up for the night.  In the morning, all was well.  The syslog showed a 24 minute delay between pressing Spin Down and Unraid trying to spin down the drives.

 

Dec  3 20:51:09 Tower emhttp: shcmd (54): sync

Dec  3 21:15:12 Tower emhttp: shcmd (55): /usr/sbin/hdparm -y /dev/sdg >/dev/null

Dec  3 21:15:12 Tower emhttp: shcmd (56): /usr/sbin/hdparm -y /dev/sdc >/dev/null

Dec  3 21:15:12 Tower emhttp: shcmd (57): /usr/sbin/hdparm -y /dev/sda >/dev/null

Dec  3 21:15:13 Tower emhttp: shcmd (58): /usr/sbin/hdparm -y /dev/sdb >/dev/null

Dec  3 21:15:13 Tower emhttp: shcmd (59): /usr/sbin/hdparm -y /dev/hdg >/dev/null
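For reference, the gap between the sync and the first hdparm call can be worked out from the two timestamps in the log above. This small helper (the name to_seconds is made up for the example) converts HH:MM:SS to seconds so the two values can be subtracted.

```shell
# to_seconds HH:MM:SS: print the time of day as a number of seconds.
# (Hypothetical helper; expects two-digit fields as in syslog.)
to_seconds() {
    # Split the argument on ":" into three fields.
    IFS=: read -r h m s <<EOF
$1
EOF
    # Drop one leading zero so "09" is not parsed as an invalid octal number.
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

delay=$(( $(to_seconds 21:15:12) - $(to_seconds 20:51:09) ))
echo "$(( delay / 60 )) min $(( delay % 60 )) s"   # prints "24 min 3 s"
```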

 

There are 3 other examples in the log, when the simultaneous Preclear's were running, where pressing Spin Down immediately spun down the drives.

 

4.5 Beta 11, md_num_stripes=5120, C2SEE, 4GB RAM, disks connected to the integrated ICH10.

 

Link to comment

You basically experienced "resource contention" (too much going on, too little memory for it all to happen at once).  One of the pre-clear processes was using some resource the others needed, so they waited until it was free.

 

You will benefit from the three parameters I added most recently that allow you to specify smaller block sizes when reading and writing and a smaller number of blocks as well.   Those parameters are:

 

      -w size  = write block size in bytes
      -r size  = read block size in bytes
      -b count = number of blocks to read at a time


They are described in more detail in this post: http://lime-technology.com/forum/index.php?topic=2817.msg39972#msg39972
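As an illustration only, a run with smaller blocks might be put together like this. The -w, -r and -b options are the ones listed above; the specific sizes, the device name /dev/sdX and the helper name preclear_cmd are placeholders chosen for the example, not recommendations, and the sketch assumes the options go before the device name. The helper only prints the command, so you can check it before running it yourself.

```shell
# preclear_cmd DEVICE WRITE_SIZE READ_SIZE BLOCK_COUNT
# Prints (does not run) a preclear_disk.sh command line.
# Hypothetical helper; the values passed below are examples only.
preclear_cmd() {
    dev="$1"; wsize="$2"; rsize="$3"; count="$4"
    echo "preclear_disk.sh -w $wsize -r $rsize -b $count $dev"
}

preclear_cmd /dev/sdX 65536 65536 200
```

See the linked post for guidance on choosing sensible values for your amount of memory.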

 

Joe L.

Link to comment

Thanks

Link to comment

Am I wrong in thinking that I have to modify the script to work with my 3ware (9800S) card?

Running the "smartctl  -a  -d  ata  /dev/sda " cmd returns

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

 

Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)

 

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

 

While if I run "smartctl  -a  -d  3ware,0 /dev/twa0 " I get the expected output.

I guess I'm not 100% sure how to modify the script, as the disks are numbered [0-4] in my case, so I would run

smartctl  -a  -d  3ware,N /dev/twa0

where N is the disk number, while the script appears to go through the devices alphabetically.

 

Aaron

 

Link to comment

Am I wrong in thinking that I have to modify the script to work with my 3ware (9800S) card?

Perhaps.... but if you prefer, perhaps not.  I'll explain below.

Running the "smartctl  -a  -d  ata  /dev/sda " cmd returns

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

 

Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)

 

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

 

While if I run "smartctl  -a  -d  3ware,0 /dev/twa0 " I get the expect return.

First, I have absolutely no idea if your card will work under unRAID, you'll have to explore that in a different thread.

 

The "smart" reports in the pre-clear script are not part of the clearing process itself.  If you are willing to run the smart report before the pre-clear, and then again after it, and then run "diff" on the two resulting reports, you can run preclear_disk.sh exactly as it is.  (It will not show the drive temperature, as that uses the "smartctl" output, but that will not stop the clearing process from working)

 

You would do something like this:

smartctl  -a  -d  3ware,0 /dev/twa0  >/tmp/preclear-twa0-smart.txt

preclear_disk.sh /dev/twa0

smartctl  -a  -d  3ware,0 /dev/twa0  >/tmp/postclear-twa0-smart.txt

diff  /tmp/preclear-twa0-smart.txt  /tmp/postclear-twa0-smart.txt

 

No change would be needed to the preclear_disk.sh script AS LONG AS THE FOLLOWING COMMANDS CAN WORK

fdisk -l /dev/twa0

fdisk needs to be able to list the geometry

sfdisk -g /dev/twa0

sfdisk needs to be able to list the geometry too
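A minimal sketch of that pre-flight check, assuming fdisk and sfdisk are on the search path. The function name geometry_ok is hypothetical; it simply runs the two commands above and reports whether both exited successfully. Older versions of these tools may not set their exit status reliably, so looking at the actual output is still worthwhile.

```shell
# geometry_ok DEVICE: succeed only if both fdisk and sfdisk can read the
# device geometry. (Hypothetical helper, not part of preclear_disk.sh.)
geometry_ok() {
    dev="$1"
    fdisk -l "$dev" >/dev/null 2>&1 || { echo "fdisk cannot list $dev" >&2; return 1; }
    sfdisk -g "$dev" >/dev/null 2>&1 || { echo "sfdisk cannot list $dev" >&2; return 1; }
    echo "geometry readable for $dev"
}
```

For example, geometry_ok /dev/twa0 could be run before starting the preclear on that device.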

 

I guess I'm not 100% sure how to modify as the disks are numbered [0-4] in my case so I would run it

smartctl  -a  -d  3ware,N /dev/twa0 where N is the disk number and the script appears to run alphabetical.

 

Aaron

 

Again, can't help you there... it would seem to be more of a general Linux question than an unRAID one.  Others with the same 3ware card must have had to work out how to address the individual drives under Linux.  Whatever you do, I'm pretty sure the disks need to be configured as individual disks, and not as a hardware RAID array on the card, to be used under unRAID.

 

Joe L.

Link to comment
  • 4 weeks later...

Hi Joe, I'm about to preclear three new 2TB Seagate drives and was wondering if I should invoke any of the new switches (or any switches for that matter)?  I seem to remember just invoking the script without any switches previously, but it looks like this is a new version since the last time I precleared any drives.  Anyway, just wanted to check before I started since it will take over a day!  Thanks again for such a great utility and your help in this forum in general...this place would be lost without you!  ;)

Link to comment

The only ones that might benefit you are these new parameters as described in this post:

http://lime-technology.com/forum/index.php?topic=2817.msg39972#msg39972

 

If you have a reasonable amount of memory, I'd try it with no parameters at all.    If you specify any of the new block size parameters to use smaller block sizes, or to read fewer blocks each time it performs a read, it will take longer, since it will make more reads/writes to the disks. 

 

If you are really adventurous, you can try the "-c 20" option.  If you do, you can be certain you will not have to wait a "day" for completion.  ;D

(but you might have to wait a few weeks ;))

 

Joe L.

Link to comment

What should I see as a 'reasonable' expectation of clearing time?  I just cleared a 250GB SATA drive in about 4 hours, and it's now online.  Then I tried a 500GB WD SATA drive, and it's been running for 2 1/2 hours and is only 2% done in the zeroing phase.  At this rate, it will take about a week to complete what I would consider to be a pretty small hard drive by today's standards.

 

I can wait for it, I guess, but I have two 1.5TB drives arriving in a couple of days and I don't want to be waiting for months to put them online.

 

I think I read that someone had originally put together a way to clear these drives on a separate computer first?  I have a spare machine I can configure for this purpose if it would speed things up.  But I have a feeling I'm doing something wrong here.  Why would one drive clear in a reasonable amount of time, and the 2nd one take 10-20x as long to do the same thing?

 

The computer I'm using for this is a decent ASUS A8N machine with 2GB of RAM.  It's got 4 drives in it currently, including a 1.5TB parity drive.

 

Any hints, suggestions, or directions to help speed this process up would be greatly appreciated.

 

Myles

Link to comment

What should I see as a 'reasonable' expectation of clearing time?  I just cleared a 250GB SATA drive in about 4 hours, and it's now online.  Then I tried a 500GB WD SATA drive, and it's been running for 2 1/2 hours and is only 2% done in the zeroing phase.  At this rate, it will take about a week to complete what I would consider to be a pretty small hard drive by today's standards.

That might indicate it has already performed the pre-read (about one third of the time) and has just started on the second third (the zeroing itself).  If the pre-read took 2 1/2 hours, then the zeroing should take about as long, and the final post-read would be 2 1/2 hours more.  That would sound about right: nearly 8 hours total.

I can wait for it, I guess, but I have two 1.5TB drives arriving in a couple of days and I don't want to be waiting for months to put them online.

It should not take months, although with larger 2TB drives it can take nearly a day.
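As a back-of-the-envelope check, assuming the three phases (pre-read, zeroing, post-read) each take about the same time and the disk sustains a constant average throughput, the total can be estimated from the disk size. The function name and the 62.5 MB/s figure are illustrative only; real drives slow down toward the inner tracks, and the post-read with zero verification runs slower than the other two phases.

```shell
# estimate_preclear_hours SIZE_GB MB_PER_SEC
# Rough total for three full passes over the disk, in hours.
# (Hypothetical helper; assumes equal-length phases and constant speed.)
estimate_preclear_hours() {
    LC_ALL=C awk -v gb="$1" -v mbps="$2" \
        'BEGIN { printf "%.1f\n", 3 * gb * 1000 / mbps / 3600 }'
}

estimate_preclear_hours 2000 62.5   # 2TB at 62.5 MB/s -> 26.7
```

That lands in the same range as the "nearly a day" figure for 2TB drives.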

 

I think I read that someone had originally put together a way to clear these drives on a separate computer first?  I have a spare machine I can configure for this purpose if it would speed things up.  But I have a feeling I'm doing something wrong here.  Why would one drive clear in a reasonable amount of time, and the 2nd one take 10-20x as long to do the same thing?

Probably because there is something wrong with the disk, or the cabling, or how it is configured.  Or you are misinterpreting the status display of the preclear_disk script (and I would certainly accept feedback on how to make the wording better).

The computer I'm using for this is a decent ASUS A8N machine with 2GB of RAM.  It's got 4 drives in it currently, including a 1.5TB parity drive.

 

Any hints, suggestions, or directions to help speed this process up would be greatly appreciated.

 

Myles

Post a syslog.  It is the only way to see the errors, if there are any.  Attach it to your next post.

 

Joe L.

Link to comment
Post a syslog.  It is the only way to see the errors, if there are any.  Attach it to your next post.

 

Just checked syslog and there's nothing there other than my Telnet logins to the box and the odd spinup/spindown from the kernels due to inactivity on the box. 

 

I'm going to try and transfer the SATA interface from the Mobo to a separate SuperMicro 8 SATA card I have in that box and see if that makes any difference.  Not seeing any errors, but there is definitely something weird about the speed with this drive.  The other drives didn't have anywhere near this sort of problem.

 

Myles

Link to comment

Quick followup... Looks like this was a dodgy drive.  Replaced it with a 1TB Hitachi SATA drive and it's flying: about 97 MB/s.

 

However with that taken care of, do you know of any 'Live CD' implementation of preclear that will allow drives to be prep'd on a 2nd computer?  Although Preclear does a bang up job of prep'ing a disk, if I'm adding 2 or 3 drives at once, it would really help to be able to prep them on a 2nd computer and run this process in parallel.

 

Myles

Link to comment
