Re: preclear_disk.sh - a new utility to burn-in and pre-clear disks for quick add



Post a syslog.  It is the only way to see the errors, if there are any.  Attach it to your next post.

 

Just checked the syslog and there's nothing there other than my Telnet logins to the box and the odd spinup/spindown from the kernel due to inactivity on the box.

 

I'm going to try and transfer the SATA interface from the Mobo to a separate SuperMicro 8 SATA card I have in that box and see if that makes any difference.  Not seeing any errors, but there is definitely something weird about the speed with this drive.  The other drives didn't have anywhere near this sort of problem.

 

Myles

There may not be specific errors, but there will be lines showing how the disk itself was initialized by the disk controller.  If it was initialized in PIO mode rather than a DMA mode it will be vastly slower.  If you know what to look for, look for all entries specific to that drive.  If you are not an expert in syslog analysis, post a full syslog.
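For anyone who wants to look before posting the whole file, here is a rough sketch of pulling the drive-specific lines out of a syslog. The function names `drive_init_lines` and `drive_modes` are made up for illustration. It assumes the kernel's libata driver logs the negotiated mode in a line such as "ata3.00: configured for UDMA/133" (or "configured for PIO4" in the slow case); the ata port number for a given drive has to be read off that drive's own lines.

```shell
# Hypothetical helper: show the kernel lines in a syslog file that mention
# one ata port or one device name.
# Usage: drive_init_lines <syslog-file> <pattern>    (pattern e.g. ata3 or sdb)
drive_init_lines() {
    grep -E "kernel: .*$2" "$1"
}

# Hypothetical helper: print only the "configured for ..." lines, which
# (assuming libata-style messages) name the transfer mode that was chosen,
# for example UDMA/133 (fast) or PIO4 (very slow).
drive_modes() {
    grep -E 'ata[0-9]+\.[0-9]+: configured for ' "$1"
}
```

On the server this would be run as something like `drive_modes /var/log/syslog`, assuming the syslog lives at that path.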

 

Joe L.

  • 3 weeks later...

Joe, as you know, I've been running 10 preclear cycles on two 2TB Hitachi drives.  They are currently at 83% done on cycle 6 post-read @ 170 hrs.  The problem is, my telnet windows seem to have stuck or something, because I am no longer getting refreshed on the progress.  I checked the read / write columns in myMain under unMenu, and the numbers aren't changing.  When I touch the drives it feels like they're still spinning, and unMenu seems to confirm this.

 

This is the past few days in the syslog...

Jan 14 02:21:08 Tower kernel: sdb: sdb1
Jan 14 02:21:19 Tower kernel: udev: starting version 130
Jan 14 02:48:36 Tower kernel: sda: sda1
Jan 14 02:48:47 Tower kernel: udev: starting version 130
Jan 15 07:17:50 Tower kernel: sdb: sdb1
Jan 15 07:18:00 Tower kernel: udev: starting version 130
Jan 15 07:49:58 Tower kernel: sda: sda1
Jan 15 07:50:09 Tower kernel: udev: starting version 130
Jan 16 09:15:13 Tower unmenu[1256]: gawk: ./08-unmenu-array_mgmt.awk:115: warning: escape sequence `\'' treated as plain `''

No way to know without seeing a process list

ps -ef

 

The script ran into one bug in the "bash" shell where it could not deal with more than 4096 forks/waits of a sub-shell.

Are you using the fixed version?  (It has been fixed for a while, but I have no idea how long ago you got your version.)

It had the same effect, a freeze at a certain point in the clear process as a child process never was waited for properly.
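A quick way to tell whether the script is still alive is to filter the process list for it. The sketch below is only an illustration; `preclear_procs` is a made-up name, and it assumes the script was started under its usual file name, preclear_disk.sh.

```shell
# Hypothetical helper: read "ps -ef" output on stdin and print only the
# lines that belong to a preclear_disk.sh run. The [p] bracket keeps the
# grep command from matching its own entry in the process list.
# Exit status is non-zero when no such process is found.
preclear_procs() {
    grep '[p]reclear_disk\.sh'
}

# Typical use on the server:
#   ps -ef | preclear_procs || echo "no preclear process is running"
```

If nothing is printed, the script has exited (or been killed) and the telnet window is just showing its last screen.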


I'm using the latest download from here.  The modified date is 10/6/2009 9:15.

 

Process list:

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Jan08 ?        00:00:02 init
root         2     0  0 Jan08 ?        00:00:00 [kthreadd]
root         3     2  0 Jan08 ?        00:00:00 [migration/0]
root         4     2  0 Jan08 ?        00:00:00 [ksoftirqd/0]
root         5     2  0 Jan08 ?        00:00:00 [migration/1]
root         6     2  0 Jan08 ?        00:00:00 [ksoftirqd/1]
root         7     2  0 Jan08 ?        00:00:00 [events/0]
root         8     2  0 Jan08 ?        00:00:00 [events/1]
root         9     2  0 Jan08 ?        00:00:00 [khelper]
root        14     2  0 Jan08 ?        00:00:00 [async/mgr]
root       107     2  0 Jan08 ?        00:00:00 [kblockd/0]
root       108     2  0 Jan08 ?        00:00:00 [kblockd/1]
root       109     2  0 Jan08 ?        00:00:00 [kacpid]
root       110     2  0 Jan08 ?        00:00:00 [kacpi_notify]
root       111     2  0 Jan08 ?        00:00:00 [kacpi_hotplug]
root       187     2  0 Jan08 ?        00:00:00 [ata/0]
root       188     2  0 Jan08 ?        00:00:00 [ata/1]
root       189     2  0 Jan08 ?        00:00:00 [ata_aux]
root       193     2  0 Jan08 ?        00:00:00 [ksuspend_usbd]
root       198     2  0 Jan08 ?        00:00:00 [khubd]
root       201     2  0 Jan08 ?        00:00:00 [kseriod]
root       262     2  0 Jan08 ?        01:42:23 [pdflush]
root       263     2  0 Jan08 ?        01:39:06 [pdflush]
root       264     2  3 Jan08 ?        07:13:20 [kswapd0]
root       307     2  0 Jan08 ?        00:00:00 [aio/0]
root       308     2  0 Jan08 ?        00:00:00 [aio/1]
root       314     2  0 Jan08 ?        00:00:00 [nfsiod]
root       319     2  0 Jan08 ?        00:00:00 [cifsoplockd]
root       547     2  0 Jan08 ?        00:00:00 [usbhid_resumer]
root       553     2  0 Jan08 ?        00:00:00 [rpciod/0]
root       554     2  0 Jan08 ?        00:00:00 [rpciod/1]
root       725     2  0 Jan08 ?        00:00:00 [scsi_eh_0]
root       726     2  0 Jan08 ?        00:00:00 [usb-storage]
root       728     2  0 Jan08 ?        00:00:00 [scsi_eh_1]
root       729     2  0 Jan08 ?        00:00:00 [scsi_eh_2]
root      1051     1  0 Jan08 ?        00:00:00 /usr/sbin/syslogd -m0
root      1055     1  0 Jan08 ?        00:00:00 /usr/sbin/klogd -c 3 -x
root      1094     1  0 Jan08 ?        00:00:00 /usr/sbin/ifplugd -i eth0 -fwI -
bin       1102     1  0 Jan08 ?        00:00:00 /sbin/rpc.portmap
nobody    1106     1  0 Jan08 ?        00:00:00 /sbin/rpc.statd
root      1116     1  0 Jan08 ?        00:00:00 /usr/sbin/inetd
root      1126     1  0 Jan08 ?        00:00:00 /usr/sbin/acpid
root      1133     1  0 Jan08 ?        00:00:00 /usr/sbin/crond -l10
daemon    1135     1  0 Jan08 ?        00:00:00 /usr/sbin/atd -b 15 -l 1
root      1140     1  0 Jan08 ?        00:00:12 /usr/sbin/nmbd -D
root      1142     1  0 Jan08 ?        00:00:00 /usr/sbin/smbd -D
root      1144  1142  0 Jan08 ?        00:00:00 /usr/sbin/smbd -D
root      1150     1  0 Jan08 ?        00:00:00 /usr/local/sbin/emhttp
root      1155     1  0 Jan08 tty1     00:00:00 /sbin/agetty 38400 tty1 linux
root      1156     1  0 Jan08 tty2     00:00:00 /sbin/agetty 38400 tty2 linux
root      1158     1  0 Jan08 tty3     00:00:00 /sbin/agetty 38400 tty3 linux
root      1160     1  0 Jan08 tty4     00:00:00 /sbin/agetty 38400 tty4 linux
root      1167     2  0 Jan08 ?        00:00:00 [mdrecoveryd]
root      1174     1  0 Jan08 tty5     00:00:00 /sbin/agetty 38400 tty5 linux
root      1176     1  0 Jan08 tty6     00:00:00 /sbin/agetty 38400 tty6 linux
root      1235     1  0 Jan08 ?        00:00:00 /usr/sbin/ntpd -g -p /var/run/nt
root      1255     1  0 Jan08 ?        00:00:00 /bin/bash ./uu
root      1256     1  0 Jan08 ?        00:00:00 logger -tunmenu -plocal7.info -i
root      1257  1255  0 Jan08 ?        00:00:01 awk -W re-interval -f ./unmenu.a
root      6576     1  0 Jan15 ?        00:00:00 udevd --daemon
root     25154  1116  0 10:35 ?        00:00:00 in.telnetd: 192.168.10.199
root     25155 25154  0 10:35 pts/1    00:00:00 -bash
root     25166 25155  0 10:35 pts/1    00:00:00 ps -ef


Me either.  It completely stopped running, on both drives, simultaneously.  ???  Is there another log that I can look at to see what happened?  Should I just restart with the remaining number of cycles?

You can look in your syslog to see if the kernel killed the processes for any reason (for example, it needed the memory and the "bash shell" was using it all).
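As a rough sketch of that check, the syslog can be searched for the kernel's out-of-memory messages. The helper name `oom_lines` is hypothetical, and the search patterns are an assumption about the wording, which differs between kernel versions, so an empty result is a hint rather than proof.

```shell
# Hypothetical helper: list syslog lines that look like the kernel's
# out-of-memory killer at work. The patterns are an assumption about the
# message wording, which is not identical across kernel versions.
oom_lines() {
    grep -i -E 'out of memory|oom-killer|killed process' "$1"
}
```

It would be called as `oom_lines /var/log/syslog`, assuming that is where the syslog is kept.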

 

The starting and ending "smart" reports are in the /tmp directory, named after their process IDs.

The end report from the process is just a "diff /tmp/smart_startNNNN /tmp/smart_finishNNNN" of the two files.
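To compare whatever reports are still sitting in /tmp, something along these lines should work. It is a sketch only: `smart_diffs` is a made-up name, and it assumes the files are named exactly smart_start and smart_finish followed by the process ID, as described above.

```shell
# Hypothetical helper: for every start report in a directory (default
# /tmp), diff it against the finish report with the same process-ID
# suffix, or say so when the finish report is missing.
smart_diffs() {
    dir=${1:-/tmp}
    for start in "$dir"/smart_start*; do
        [ -e "$start" ] || continue           # no start reports at all
        pid=${start##*/smart_start}           # keep only the PID suffix
        finish=$dir/smart_finish$pid
        if [ -e "$finish" ]; then
            echo "== reports from PID $pid =="
            # diff exits non-zero when the files differ; that is expected
            diff "$start" "$finish" || true
        else
            echo "== PID $pid: no finish report found =="
        fi
    done
}
```

A run that froze partway through would show up here as a start report with no matching finish report.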

 

I'd just start it again, for the remaining cycles.


I'm running the cycles via 2 telnet sessions from a laptop that I leave on 24/7.  Never logged off the server, never shut it down, and the telnet windows have been moved, minimized, maximized, etc without any issues thus far.  It's not a big issue to me, I can easily restart the remaining cycles.  I was more curious if it was something more fundamental, like the system just got bored.  :P


In the interim, at least you will have given them a good initial burn in.

Do you think it's enough burn in?  5 cycles and 170 hours?

Yes, I think most people here only do a single cycle, which can take almost a day with the newer drives.  I usually run 3 cycles on mine.  I run the first, get the diff, then start another pass to see if anything has changed, then I do one more cycle to make sure that nothing is going to change.

  • 1 month later...

Okay, this looks like a nifty utility, but I want to be CERTAIN I'm doing this right so I don't destroy data already on the array.

(If I understand it correctly, the script won't let me do that -- but just to be sure ...)

 

So to use it, I do this:

 

(1)  Copy the script to the Flash drive (from Windows Explorer)

(2)  Run a Telnet client and type "o tower", then "root"  to get a prompt from the UnRAID box

(3)  cd /boot    to get to the flash drive

(4)  preclear_disk.sh /dev/sdX    to start the preclear process

 

Assuming that's all correct, I have a few questions ...

 

(a)  How do I determine what "X" is for line 4?    Is there a Linux command that will list my drives with their serial numbers?

 

(b)  Do I have to leave the Telnet window open for the entire PreClear process?

 

(c)  Is there any analog to this process to test drives already in the array?    In particular, what does the "Long SMART Test" do in UnMenu on the Disk Management section?  I've installed UnMenu to look around, but am not sure I understand all of the various options.    Is there a good "manual" to read through to help with this?

 


(a)  How do I determine what "X" is for line 4?    Is there a Linux command that will list my drives with their serial numbers?

ls -la /dev/disk/by-id/
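The entries in that directory are symbolic links whose names normally contain the drive's model and serial number, and each one points at the matching /dev/sdX node. If the raw listing is hard to read, a small loop like the one below prints one line per disk. The function name `list_disks_by_id` is made up for illustration, and skipping names that end in -partN is an assumption about how partitions are labelled there.

```shell
# Hypothetical helper: print "by-id name -> device node" for each symlink
# in a by-id style directory (default /dev/disk/by-id), skipping the
# per-partition entries whose names end in -partN.
list_disks_by_id() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue                  # only symlinks
        case $link in *-part[0-9]*) continue ;; esac
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}
```

Match the serial number printed on the drive's label against the left-hand side, and the right-hand side is the device name to hand to preclear_disk.sh.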

 

(b)  Do I have to leave the Telnet window open for the entire PreClear process?

Yes. 

If you want to be able to disconnect from the session, then run the preclear script in "screen". See this:

http://lime-technology.com/forum/index.php?topic=2817.msg24827#msg24827

 


Thanks -- although I realized after I'd asked the question that I can simply look in UnMenu on the Disk Management page and see the Linux designations for each of my disks  :)

 

r.e. my question #3  ==>  are the tests shown in UnMenu (i.e. the Short & Long SMART tests) "data safe" ??    i.e. can they be run without any concern for the data on the array?    And do they impact the availability of the array (i.e. can you be streaming a movie while the test is in progress) ??

 


r.e. my question #3  ==>  are the tests shown in UnMenu (i.e. the Short & Long SMART tests) "data safe" ??

Yes.  They are read-only tests of the drives.

i.e. can they be run without any concern for the data on the array?

Yes, they can be run at any time.

And do they impact the availability of the array (i.e. can you be streaming a movie while the test is in progress) ??

 

You can watch a movie at the same time.  You should disable the spin-down timer, since if unRAID forces a drive to spin down it will abort a long test.

 

You will probably want to use the newest version of the "Disk-management" plug-in.  I think 1.4 is the newest.  The versions are attached here: http://lime-technology.com/forum/index.php?topic=4993.msg46057#msg46057


I've experimented with a few disks and have noted that the post-read takes appreciably longer than the pre-read.    Is this normal?    For example, with a 500GB disk it took ~ 2 hrs for the pre-read;  1.5 hrs for zeroing; and over 3 hrs for the post-read.    The SMART data was fine, and no errors were reported.    I had similar results on an 80GB drive I tried (not going to actually use that -- but wanted something to compare with).

 

 


Yes, it is expected, since during the post read we are verifying that the bytes read back are all zero.  That verification step is not needed (or performed) on the pre-read since the contents could be anything.

 

The verification step was added after one user found a drive that, when read, gave occasional "1" bits set where zeros were written.  Not frequently, but enough to drive him crazy with parity errors since each time a check was done an occasional bit here and there would not be correct and it would "fix" parity to match.  Of course, it was actually reading back bad data and making parity match the bad data.
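To illustrate the idea of the verification (this is not the code preclear_disk.sh uses, just a minimal sketch of the same kind of check), the function below succeeds only when everything read back is zero. The name `is_all_zero` is made up, and reading a whole disk this way would be slow.

```shell
# Hypothetical helper: succeed only if every byte of the given file (or
# block device) is zero. All NUL bytes are deleted from the stream; if
# anything is left over, at least one byte was not zero.
is_all_zero() {
    [ "$(tr -d '\000' < "$1" | wc -c)" -eq 0 ]
}

# Example use:
#   is_all_zero some_file && echo "all zeros" || echo "non-zero data found"
```

The extra comparison work on every block read is the kind of overhead that makes a verifying post-read slower than a plain pre-read.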

 

Joe L.


Hey everyone! At the moment I have 4 disks in my array, and last week I added 4 more (Samsung green 1.5TB disks) to grow the array. After the disks were installed, I ran preclear_disk on all of them simultaneously and got errors on every disk. I thought that was very strange, so I re-ran preclear on the 1st disk to see what would come up. I took 2 pictures of the errors I got on the 2nd run (I don't have any of the first run, sorry, but there were more errors :/ does preclear keep a log file somewhere?)

 

img0413zq.th.jpg

img0415m.th.jpg

 

Are these errors that I should be concerned about or can I add the drives to the array?

What I thought was strange was that each and every one of the new disks had errors... This didn't happen to me with the first 4 disks (those are WD Green).


Ok, so I did what you suggested, prostuff1. I had already re-run preclear on the drive that I took the screenshots of, so I did it again with the other 3 drives. 2 of them also gave those errors, but I just ignored them.

The last one, however, isn't "pre-clearing"... I've already repeated the process 3 times but it just gets to 11% of the 1st step and then it doesn't do anything else, it just stays there... Time increases, but it just goes on and on without reading the drive... Any ideas? :/

