Re: preclear_disk.sh - a new utility to burn-in and pre-clear disks for quick add

BRiT · December 29, 2009

You can run multiple instances of preclear_disk.sh on the server at once, either from multiple telnet/ssh terminals or from the physical console.

Joe L. · December 29, 2009

Post a syslog. It is the only way to see the errors, if there are any. Attach it to your next post.

Just checked syslog and there's nothing there other than my Telnet logins to the box and the odd spinup/spindown from the kernels due to inactivity on the box.

I'm going to try and transfer the SATA interface from the Mobo to a separate SuperMicro 8 SATA card I have in that box and see if that makes any difference. Not seeing any errors, but there is definitely something weird about the speed with this drive. The other drives didn't have anywhere near this sort of problem.

Myles

There may not be specific errors, but there will be lines showing how the disk itself was initialized by the disk controller. If it was initialized in PIO mode rather than a DMA mode it will be vastly slower. If you know what to look for, look for all entries specific to that drive. If you are not an expert in syslog analysis, post a full syslog.

Joe L.
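For anyone who wants to check the transfer mode before posting a whole log, here is a minimal sketch. It assumes the syslog is at /var/log/syslog and that the kernel's libata driver logged its usual "configured for ..." line for each drive. The function name link_modes is made up for illustration.

```shell
# Print the transfer mode the kernel negotiated for each ATA device.
# Assumption: libata logged lines such as "ata3.00: configured for UDMA/133".
# A PIO mode on the slow drive's port would explain the poor speed.
link_modes() {
    # $1: syslog file to search (assumed default: /var/log/syslog)
    grep -E 'ata[0-9]+(\.[0-9]+)?: configured for ' "${1:-/var/log/syslog}"
}
```

Run it as "link_modes" on the server, or pass the path of a saved syslog as the first argument. If the drive's controller uses a different driver, the lines may look different, and posting the full syslog is still the safer route.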

aiden · January 16, 2010

Joe, as you know, I've been running 10 preclear cycles on two 2TB Hitachi drives. They are currently at 83% done on cycle 6 post-read @ 170 hrs. The problem is, my telnet windows seem to have stuck or something, because I am no longer getting refreshed on the progress. I checked the read / write columns in myMain under unMenu, and the numbers aren't changing. When I touch the drives it feels like they're still spinning, and unMenu seems to confirm this.

This is the past few days in the syslog...

Jan 14 02:21:08 Tower kernel: sdb: sdb1
Jan 14 02:21:19 Tower kernel: udev: starting version 130
Jan 14 02:48:36 Tower kernel: sda: sda1
Jan 14 02:48:47 Tower kernel: udev: starting version 130
Jan 15 07:17:50 Tower kernel: sdb: sdb1
Jan 15 07:18:00 Tower kernel: udev: starting version 130
Jan 15 07:49:58 Tower kernel: sda: sda1
Jan 15 07:50:09 Tower kernel: udev: starting version 130
Jan 16 09:15:13 Tower unmenu[1256]: gawk: ./08-unmenu-array_mgmt.awk:115: warning: escape sequence `\'' treated as plain `''

Joe L. · January 16, 2010

No way to know without seeing a process list

ps -ef

The script once ran into a bug in the "bash" shell, where bash could not deal with more than 4096 forks/waits of a sub-shell.

Are you using the fixed version? (It has been fixed for a while, but I have no idea how long ago you got your version.)

That bug had the same effect: a freeze at a certain point in the clear process, because a child process was never waited for properly.
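A full "ps -ef" listing is long, so a small filter can make the answer easier to spot. This is only a sketch: the helper name find_preclear is hypothetical, and it assumes the running process has "preclear_disk" somewhere in its command line.

```shell
# Filter a process listing (read from standard input) for preclear runs.
# The second grep drops any "grep" command that shows up in the listing.
find_preclear() {
    grep 'preclear_disk' | grep -v 'grep'
}

# Typical use on the server:
#   ps -ef | find_preclear || echo "no preclear process is running"
```

No output means the script is no longer running, which is different from a script that is still running but frozen.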

aiden · January 16, 2010

I'm using the latest download from here. The modified date is 10/6/2009 9:15.

Process list:

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Jan08 ?        00:00:02 init
root         2     0  0 Jan08 ?        00:00:00 [kthreadd]
root         3     2  0 Jan08 ?        00:00:00 [migration/0]
root         4     2  0 Jan08 ?        00:00:00 [ksoftirqd/0]
root         5     2  0 Jan08 ?        00:00:00 [migration/1]
root         6     2  0 Jan08 ?        00:00:00 [ksoftirqd/1]
root         7     2  0 Jan08 ?        00:00:00 [events/0]
root         8     2  0 Jan08 ?        00:00:00 [events/1]
root         9     2  0 Jan08 ?        00:00:00 [khelper]
root        14     2  0 Jan08 ?        00:00:00 [async/mgr]
root       107     2  0 Jan08 ?        00:00:00 [kblockd/0]
root       108     2  0 Jan08 ?        00:00:00 [kblockd/1]
root       109     2  0 Jan08 ?        00:00:00 [kacpid]
root       110     2  0 Jan08 ?        00:00:00 [kacpi_notify]
root       111     2  0 Jan08 ?        00:00:00 [kacpi_hotplug]
root       187     2  0 Jan08 ?        00:00:00 [ata/0]
root       188     2  0 Jan08 ?        00:00:00 [ata/1]
root       189     2  0 Jan08 ?        00:00:00 [ata_aux]
root       193     2  0 Jan08 ?        00:00:00 [ksuspend_usbd]
root       198     2  0 Jan08 ?        00:00:00 [khubd]
root       201     2  0 Jan08 ?        00:00:00 [kseriod]
root       262     2  0 Jan08 ?        01:42:23 [pdflush]
root       263     2  0 Jan08 ?        01:39:06 [pdflush]
root       264     2  3 Jan08 ?        07:13:20 [kswapd0]
root       307     2  0 Jan08 ?        00:00:00 [aio/0]
root       308     2  0 Jan08 ?        00:00:00 [aio/1]
root       314     2  0 Jan08 ?        00:00:00 [nfsiod]
root       319     2  0 Jan08 ?        00:00:00 [cifsoplockd]
root       547     2  0 Jan08 ?        00:00:00 [usbhid_resumer]
root       553     2  0 Jan08 ?        00:00:00 [rpciod/0]
root       554     2  0 Jan08 ?        00:00:00 [rpciod/1]
root       725     2  0 Jan08 ?        00:00:00 [scsi_eh_0]
root       726     2  0 Jan08 ?        00:00:00 [usb-storage]
root       728     2  0 Jan08 ?        00:00:00 [scsi_eh_1]
root       729     2  0 Jan08 ?        00:00:00 [scsi_eh_2]
root      1051     1  0 Jan08 ?        00:00:00 /usr/sbin/syslogd -m0
root      1055     1  0 Jan08 ?        00:00:00 /usr/sbin/klogd -c 3 -x
root      1094     1  0 Jan08 ?        00:00:00 /usr/sbin/ifplugd -i eth0 -fwI -
bin       1102     1  0 Jan08 ?        00:00:00 /sbin/rpc.portmap
nobody    1106     1  0 Jan08 ?        00:00:00 /sbin/rpc.statd
root      1116     1  0 Jan08 ?        00:00:00 /usr/sbin/inetd
root      1126     1  0 Jan08 ?        00:00:00 /usr/sbin/acpid
root      1133     1  0 Jan08 ?        00:00:00 /usr/sbin/crond -l10
daemon    1135     1  0 Jan08 ?        00:00:00 /usr/sbin/atd -b 15 -l 1
root      1140     1  0 Jan08 ?        00:00:12 /usr/sbin/nmbd -D
root      1142     1  0 Jan08 ?        00:00:00 /usr/sbin/smbd -D
root      1144  1142  0 Jan08 ?        00:00:00 /usr/sbin/smbd -D
root      1150     1  0 Jan08 ?        00:00:00 /usr/local/sbin/emhttp
root      1155     1  0 Jan08 tty1     00:00:00 /sbin/agetty 38400 tty1 linux
root      1156     1  0 Jan08 tty2     00:00:00 /sbin/agetty 38400 tty2 linux
root      1158     1  0 Jan08 tty3     00:00:00 /sbin/agetty 38400 tty3 linux
root      1160     1  0 Jan08 tty4     00:00:00 /sbin/agetty 38400 tty4 linux
root      1167     2  0 Jan08 ?        00:00:00 [mdrecoveryd]
root      1174     1  0 Jan08 tty5     00:00:00 /sbin/agetty 38400 tty5 linux
root      1176     1  0 Jan08 tty6     00:00:00 /sbin/agetty 38400 tty6 linux
root      1235     1  0 Jan08 ?        00:00:00 /usr/sbin/ntpd -g -p /var/run/nt
root      1255     1  0 Jan08 ?        00:00:00 /bin/bash ./uu
root      1256     1  0 Jan08 ?        00:00:00 logger -tunmenu -plocal7.info -i
root      1257  1255  0 Jan08 ?        00:00:01 awk -W re-interval -f ./unmenu.a
root      6576     1  0 Jan15 ?        00:00:00 udevd --daemon
root     25154  1116  0 10:35 ?        00:00:00 in.telnetd: 192.168.10.199
root     25155 25154  0 10:35 pts/1    00:00:00 -bash
root     25166 25155  0 10:35 pts/1    00:00:00 ps -ef

Joe L. · January 16, 2010

I don't see it running at all in the process list.

aiden · January 16, 2010

Me either. It completely stopped running, on both drives, simultaneously. Is there another log that I can look at to see what happened? Should I just restart with the remaining number of cycles?

Joe L. · January 16, 2010

You can look in your syslog to see if the kernel killed the processes for any reason (for example, it needed the memory and the "bash" shell was using it all).
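One way to search for that, as a rough sketch: it assumes the syslog is at /var/log/syslog and that the kernel used its usual "Out of memory" or "oom-killer" wording. The function name oom_events is made up for illustration.

```shell
# Search a syslog for signs that the kernel's out-of-memory killer fired.
oom_events() {
    # $1: syslog file to search (assumed default: /var/log/syslog)
    grep -i -E 'out of memory|oom-killer' "${1:-/var/log/syslog}"
}
```

No output only means no such lines were logged; it does not rule out other reasons for the processes ending.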

The starting and ending "smart" reports are in the /tmp directory, named after their process IDs.

The end report from the process is just a "diff /tmp/smart_startNNNN /tmp/smart_finishNNNN" of the two files.

I'd just start it again, for the remaining cycles.
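Since the reports are named after the process ID, the comparison can be repeated by hand for every run that left files behind. The loop below is a sketch, not part of preclear_disk.sh: it assumes the file names follow the smart_startNNNN / smart_finishNNNN pattern described above, and the function name smart_diffs is hypothetical.

```shell
# Diff every start/finish SMART report pair found in a directory.
smart_diffs() {
    dir="${1:-/tmp}"                       # where the reports are kept
    for start in "$dir"/smart_start*; do
        [ -e "$start" ] || continue        # no reports at all
        pid="${start##*/smart_start}"      # the NNNN part of the name
        finish="$dir/smart_finish$pid"
        if [ -e "$finish" ]; then
            echo "== run $pid =="
            diff "$start" "$finish" || true   # diff exits 1 when files differ
        else
            echo "== run $pid: no finish report =="
        fi
    done
}
```

A run with a start report but no finish report is one that was interrupted before it could write its final report.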

aiden · January 16, 2010

Syslog didn't reveal anything. I'll just start the final 4 cycles. Thanks.

Joe L. · January 16, 2010

Only other thing I can think of is if you logged off the terminal running the pre-clear sessions. They would terminate themselves.

Joe L.

purko · January 16, 2010

Aiden, are you running the preclear_disk.sh from the console or from telnet?

If it's telnet, have you considered running it on top of screen?

aiden · January 16, 2010

I'm running the cycles via 2 telnet sessions from a laptop that I leave on 24/7. Never logged off the server, never shut it down, and the telnet windows have been moved, minimized, maximized, etc without any issues thus far. It's not a big issue to me, I can easily restart the remaining cycles. I was more curious if it was something more fundamental, like the system just got bored.

Joe L. · January 16, 2010

I was more curious if it was something more fundamental, like the system just got bored.

Probably not bored... perhaps tired, and may be eager to start using the new disks... but not bored...

In the interim, at least you will have given them a good initial burn in.

aiden · January 16, 2010

In the interim, at least you will have given them a good initial burn in.

Do you think it's enough burn in? 5 cycles and 170 hours?

prostuff1 · January 16, 2010

In the interim, at least you will have given them a good initial burn in.

Do you think it's enough burn in? 5 cycles and 170 hours?

Yes, I think most people here only do a single cycle, which can take almost a day with the newer drives. I usually run 3 cycles on mine. I run the first, get the diff, then start another pass to see if anything has changed, then I do one more cycle to make sure that nothing is going to change.

aiden · January 16, 2010

Thanks prostuff.

garycase · February 20, 2010

Okay, this looks like a nifty utility, but I want to be CERTAIN I'm doing this right so I don't destroy data already on the array.

(If I understand it correctly, the script won't let me do that -- but just to be sure ...)

So to use it, I do this:

(1) Copy the script to the Flash drive (from Windows Explorer)

(2) Run a Telnet client and type "o tower", then "root" to get a prompt from the UnRAID box

(3) cd /boot to get to the flash drive

(4) preclear_disk.sh /dev/sdX to start the preclear process

Assuming that's all correct, I have a few questions ...

(a) How do I determine what "X" is for line 4? Is there a Linux command that will list my drives with their serial numbers?

(b) Do I have to leave the Telnet window open for the entire PreClear process?

(c) Is there any analog to this process to test drives already in the array? In particular, what does the "Long SMART Test" do in UnMenu on the Disk Management section? I've installed UnMenu to look around, but am not sure I understand all of the various options. Is there a good "manual" to read through to help with this?

purko · February 20, 2010

(a) How do I determine what "X" is for line 4? Is there a Linux command that will list my drives with their serial numbers?

ls -la /dev/disk/by-id/
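The listing also includes one entry per partition, which can be noisy. As a sketch, the loop below prints only whole-disk entries together with the device each one points to. The function name list_disks is made up, and it assumes a readlink that supports -f.

```shell
# Print "by-id name -> device" for whole disks, skipping partition links.
list_disks() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        [ -L "$link" ] || continue                    # only symlinks
        case "$link" in *-part[0-9]*) continue ;; esac # skip partitions
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}
```

The by-id names normally contain the drive model and serial number, so the matching /dev/sdX can be read straight off the right-hand side.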

(b) Do I have to leave the Telnet window open for the entire PreClear process?

Yes.

If you want to be able to disconnect from the session, then run the preclear script in "screen". See this:

http://lime-technology.com/forum/index.php?topic=2817.msg24827#msg24827
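In outline, running under screen could look like the sketch below. It assumes screen is already installed on the server and that the script was copied to /boot; the function name preclear_in_screen and the DRY_RUN switch are hypothetical, added only so the command can be previewed before anything touches a disk.

```shell
# Start one preclear inside a named screen session.
# With DRY_RUN=1 the command is only printed, not run.
preclear_in_screen() {
    dev="$1"                        # e.g. /dev/sdX
    name="preclear_${dev##*/}"      # session name, e.g. preclear_sdX
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "screen -S $name /boot/preclear_disk.sh $dev"
    else
        screen -S "$name" /boot/preclear_disk.sh "$dev"
    fi
}
```

Once the session is running, Ctrl-A followed by d detaches from it and leaves the preclear going; "screen -r preclear_sdX" reattaches later, even from a new telnet login.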

garycase · February 21, 2010

Thanks -- although I realized after I'd asked the question that I can simply look in UnMenu on the Disk Management page and see the Linux designations for each of my disks

r.e. my question #3 ==> are the tests shown in UnMenu (i.e. the Short & Long SMART tests) "data safe" ?? i.e. can they be run without any concern for the data on the array? And do they impact the availability of the array (i.e. can you be streaming a movie while the test is in progress) ??

Joe L. · February 21, 2010

Thanks -- although I realized after I'd asked the question that I can simply look in UnMenu on the Disk Management page and see the Linux designations for each of my disks

r.e. my question #3 ==> are the tests shown in UnMenu (i.e. the Short & Long SMART tests) "data safe" ??

Yes. They are read-only tests of the drives.

i.e. can they be run without any concern for the data on the array?

Yes, they can be run at any time.

And do they impact the availability of the array (i.e. can you be streaming a movie while the test is in progress) ??

You can watch a movie at the same time. You should disable the spin-down timer, since if unRAID forces a drive to spin down it will abort a long test.

You will probably want to use the newest version of the Disk-management plug-in; I think 1.4 is the newest. They are attached here: http://lime-technology.com/forum/index.php?topic=4993.msg46057#msg46057

garycase · February 26, 2010

I've experimented with a few disks and have noted that the post-read takes appreciably longer than the pre-read. Is this normal? For example, with a 500GB disk it took ~ 2 hrs for the pre-read; 1.5 hrs for zeroing; and over 3 hrs for the post-read. The SMART data was fine, and no errors were reported. I had similar results on an 80GB drive I tried (not going to actually use that -- but wanted something to compare with).

Joe L. · February 26, 2010

Yes, it is expected, since during the post read we are verifying that the bytes read back are all zero. That verification step is not needed (or performed) on the pre-read since the contents could be anything.

The verification step was added after one user found a drive that, when read, gave occasional "1" bits set where zeros were written. Not frequently, but enough to drive him crazy with parity errors since each time a check was done an occasional bit here and there would not be correct and it would "fix" parity to match. Of course, it was actually reading back bad data and making parity match the bad data.

Joe L.
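To illustrate the idea of the verification step (this is not the code preclear_disk.sh itself uses), the sketch below counts how many bytes in a file or device are not zero. The function name count_nonzero_bytes is made up for illustration.

```shell
# Count the bytes in a file (or device) that are NOT zero.
# tr deletes every NUL byte, so whatever is left over is non-zero data.
count_nonzero_bytes() {
    tr -d '\000' < "$1" | wc -c | tr -d ' '
}
```

A correctly zeroed region reports 0. Every byte has to be read and examined, which fits with the post-read taking longer than a plain read; on a whole disk this would run for hours.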

samukas · March 6, 2010

Hey everyone! At the moment I have 4 disks in my array, and last week I added 4 more (Samsung green 1.5TB disks) to grow the array. After the disks were installed, I ran preclear_disk on all of them simultaneously and got errors on every disk. I thought that was very strange, so I re-ran pre-clear on the 1st disk to see what would come up. I took 2 pictures of the errors I got on the 2nd run. (I don't have any of the first run, sorry, but there were more errors. Does preclear keep a log file somewhere?)

Are these errors that I should be concerned about or can I add the drives to the array?

What I thought was strange was that each and every one of the new disks had errors... This didn't happen to me with the first 4 disks (those are WD Green).

prostuff1 · March 6, 2010

No, those look fine. The ones you need to be concerned about are the reports that come back and have a lot of reallocated sectors and current pending sectors.
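As a sketch of how to pull just those two values out of a SMART report: the filter below reads "smartctl -A" output and prints the raw counts. It assumes the usual ten-column attribute table with the names Reallocated_Sector_Ct and Current_Pending_Sector; the function name sector_counts is hypothetical.

```shell
# Read "smartctl -A" output on standard input and print the raw values
# (last column) of the reallocated and pending sector attributes.
sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" { print $2, $10 }'
}

# Typical use (assumes smartctl is installed):
#   smartctl -A /dev/sdX | sector_counts
```

Raw values that are large, or that keep growing from one cycle to the next, are the ones worth worrying about.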

samukas · March 11, 2010

Ok, so I did what you suggested, prostuff1. I had already re-run preclear on the drive I took the screenshots of, so I did it again with the other 3 drives. 2 of them also gave those errors, but I just ignored them.

The last one, however, isn't "pre-clearing"... I've already repeated the process 3 times but it just gets to 11% of the 1st step and then it doesn't do anything else, it just stays there... Time increases, but it just goes on and on without reading the drive... Any ideas?
