Re: preclear_disk.sh - a new utility to burn-in and pre-clear disks for quick add


Recommended Posts

I just re-ran the preclear script on all 4 drives, and sdb and sdc were successfully precleared.  But sdd and sde froze at 88% this time.  This is interesting because when I ran the test on sde earlier, it completed successfully.  However, I will rerun the tests on sdd and sde as JimmyJoe suggested and see what my results are.

 

Can someone help me interpret the SMART results from the two disks that successfully precleared?  Here are my results:

============================================================================

==

== Disk /dev/sdb has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

71c71
< 190 Airflow_Temperature_Cel 0x0022   071   067   000    Old_age   Always       -       29 (Lifetime Min/Max 29/29)
---
> 190 Airflow_Temperature_Cel 0x0022   069   067   000    Old_age   Always       -       31 (Lifetime Min/Max 29/31)
78c78
< 201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0
---
> 201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

============================================================================

 

============================================================================

==

== Disk /dev/sdc has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

71c71
< 190 Airflow_Temperature_Cel 0x0022   071   066   000    Old_age   Always       -       29 (Lifetime Min/Max 29/29)
---
> 190 Airflow_Temperature_Cel 0x0022   068   066   000    Old_age   Always       -       32 (Lifetime Min/Max 29/32)
78c78
< 201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0
---
> 201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

============================================================================

 

I've attached my syslog in case anyone needs it.

 

Thanks.

Link to comment

Hi all,

 

I'm brand new to unraid.

 

I have just installed it, and did a 1-pass preclear on 4 drives.

 

I don't know what to make of it.

 

Are these things indicative of a bad drive?

11 Calibration_Retry_Count

200 Multi_Zone_Error_Rate 

 

Can someone have a look at it?

 

I have start and stop reports; where are the diffs saved?

 

I am now starting a 20 pass preclear.

 

 

I found this in my syslog; what do DPO and FUA mean?

 

Jun 18 07:23:27 Tower kernel: sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

 

Should/can I enable it somewhere?

 

Thanks,

Wim

 

Link to comment

Hi all,

 

I'm brand new to unraid.

 

I have just installed it, and did a 1-pass preclear on 4 drives.

 

I don't know what to make of it.

 

Are these things indicative of a bad drive?

11 Calibration_Retry_Count

200 Multi_Zone_Error_Rate 

Those are just names of internal counters tracked by the drive itself.  Both have "thresholds" set by the manufacturer; unless a threshold is exceeded and the SMART report says the drive failed for that reason, neither is an issue.

 

Most important are relocated sectors and sectors pending relocation.  Those are sectors that could not be read from the disk.

Even then, there is a fair reservoir of spare sectors on each drive (the exact amount is known only to the manufacturer).  If all you have is a small number of relocated sectors, and the number remains unchanged over time, the odds are the disk will last a long time.  If the number keeps increasing, then it is time to RMA the drive.  I personally have a 250 GB drive that has 100 relocated sectors, but the number has never changed from when I first put it in the array.
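If you want to check those two counters directly, here is a quick sketch (assuming smartctl is available; the preclear script itself uses it, and you may need to add -d ata on some controllers):

smartctl -A /dev/sdb | egrep -i "Reallocated_Sector|Current_Pending|Offline_Uncorrectable"

Attribute 5 (Reallocated_Sector_Ct) is the count of sectors already remapped, and 197 (Current_Pending_Sector) is the count still waiting to be remapped.  Zero, or a small and stable number, is what you want to see.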

Can someone have a look at it?

Looks pretty decent

I have start and stop reports; where are the diffs saved?

They are not... However, the start and stop files are in /tmp (as long as you did not reboot)

You can see the diff by typing

diff  /tmp/smart_startNNNN  /tmp/smart_finishNNNN

Where NNNN is the process ID of the preclear script run earlier...  You will want to run the diff on the start and finish of the same drive obviously...

Each pair of files will have their own unique number... You'll need to look in them to see the drive model/serial they were from.
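If you are not sure which pair of files belongs to which drive, one quick way (a sketch, assuming the saved reports include the standard smartctl "Device Model" / "Serial Number" header lines) is:

grep -H "Serial Number" /tmp/smart_start* /tmp/smart_finish*

That prints each file name next to the serial number of the drive it came from, so you can match up the start/finish pairs before running diff.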

I am now starting a 20 pass preclear.

Wow, I'm impressed... I think you are the first to report doing a full 20 cycles on a drive... It will take a while if you have large drives, but you will be pretty confident of them at the end.

 

I found this in my syslog; what do DPO and FUA mean?

 

Jun 18 07:23:27 Tower kernel: sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

 

Should/can I enable it somewhere?

I think it is hardware related...  You might need to enable AHCI mode in your BIOS instead of legacy mode (according to one link I found on Google).  I don't think it is a huge issue; it is just an indication that your disk controller does not support some of the SCSI-emulation commands related to caching of data.
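If you just want to confirm that the drive's own write cache is enabled (a sketch, not specific to DPO/FUA; hdparm is present on unRAID, it is what spin-down uses), you can query it with:

hdparm -W /dev/sda

which reports "write-caching = 1 (on)" when the cache is enabled.  DPO/FUA support itself is a property of the drive and controller, not something you can switch on from the OS.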

 

Joe L.

Link to comment
I found this in my syslog; what do DPO and FUA mean?

 

Jun 18 07:23:27 Tower kernel: sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

I don't think I have ever seen a drive with support for those, so I too think it may be only in SCSI drives, or perhaps enterprise class drives.

 

I am now starting a 20 pass preclear.

To be honest, that seems like overkill to me.  In my own view, 2 or 3 passes is enough to thoroughly test a drive.  All you want to do is test it hard enough to force weak or fragile drives to fail now, rather than later when online with your data stored on them.  Beating them to death does not seem necessary.
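If you do want a few passes in one go, check the usage output of your copy of the script; if it supports a cycle-count option (I believe newer versions accept -c), something like this runs three passes back to back:

./preclear_disk.sh -c 3 /dev/sdX

Substitute the device you are clearing for /dev/sdX.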

Link to comment

Joe L., RobJ,

 

Thanks both of you for your help.

 

I'm doing 20 passes because I am not in a hurry to put my unRAID server into commission. It could be that I am testing these drives 'to death', but better now than when they are in use in the server  ;D.

 

I have saved the smart reports, so I can do a diff against the result of the 20-pass test, and the smart report before the first test.

 

There is one thing that seems a little strange to me. I am testing 4 drives, all the same make and model. They are all connected to motherboard SATA channels. All tests started within 2 minutes of each other, and now 1 drive is lagging behind the other 3. Three drives are around step 10 of pass 4; the lagging drive is still at around 80% of step 2 of pass 4. It will be interesting to see how much more time this drive needs to complete the 20-pass test compared to the other 3.

It takes almost 11 hours for one pass, so in about 9 to 10 days I will know whether they passed the test.

 

Wim

Link to comment
  • 2 weeks later...

The script is running currently on my setup, pre-clearing a WD 1TB disk.

 

My question: is it normal that the other disks are not spinning down in the meantime? All the disks are spinning except parity, and the script is still only in the pre-read phase.

Your other disks are spinning because they are being accessed by something...  But it is not the preclear script.

 

It does perform a test when first invoked to ensure the disk being cleared is not part of your array (not assigned to the array, and not mounted), but that is it.

 

If you really want to see what is accessed on your other disks, download and install inotifywait.

http://lime-technology.com/forum/index.php?topic=3759.msg33489#msg33489
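Once it is installed, a minimal invocation (adjust the event list to taste) looks like:

inotifywait -m -r -e access /mnt/disk*

-m keeps it monitoring instead of exiting after the first event, -r recurses into subdirectories, and -e access limits the output to read accesses.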

 

Joe L.

Link to comment

The script is running currently on my setup, pre-clearing a WD 1TB disk.

 

My question: is it normal that the other disks are not spinning down in the meantime? All the disks are spinning except parity, and the script is still only in the pre-read phase.

Your other disks are spinning because they are being accessed by something...  But it is not the preclear script.

 

It does perform a test when first invoked to ensure the disk being cleared is not part of your array (not assigned to the array, and not mounted), but that is it.

 

If you really want to see what is accessed on your other disks, download and install inotifywait.

http://lime-technology.com/forum/index.php?topic=3759.msg33489#msg33489

 

Joe L.

 

Hmmm... Very strange. All I know is running at the moment is one desktop PC with one PuTTY session (to run the preclear script on the unRAID machine). But something is _continuously_ reading from ALL disks; it can be seen in the unRAID read statistics.

 

I've installed inotifywait and ran it with "inotifywait -m -r -e access /mnt/disk*". Nothing detected so far.

Link to comment

No, there is not any scheduled task on my setup. And as I mentioned, parity is spun down. :)

 

What is interesting is that there is only a small amount of reads from each disk. Maybe cache_dirs? But if I check the open files in unMENU, there is no active find command...

Link to comment

No, there is not any scheduled task on my setup. And as I mentioned, parity is spun down. :)

 

What is interesting is that there is only a small amount of reads from each disk. Maybe cache_dirs? But if I check the open files in unMENU, there is no active find command...

If you are running cache_dirs, then that is it.

 

The preclear script reads/writes the entire disk being pre-cleared in a way that is pretty much guaranteed to make the directory entries attempted to be cached by cache_dirs to eventually end up as the least recently accessed blocks and subsequently returned to the pool of blocks available to use as disk cache.

 

The cache_dirs script can only work if the rate at which it can access the directory entry "blocks" is more frequent than the rate at which you use other disk blocks in your array.

 

The preclear script is accessing the disk being cleared far faster than normal use when playing a movie, or scanning directories.  I can see how it can easily end up with the blocks on its disk as being cached and more recently accessed than any from the directory scans.  (remember, oldest/least-recently-used in the buffer cache are those that are re-used for current access needs)

 

The solution... cancel the cache_dirs... or live with the fact that it is doing its job, trying to keep your directory listings of the shares on your server as responsive as possible.

If you kill cache_dirs, in an hour or so, the other disks will spin down, and you will have to wait for directory listings until they spin up.
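If you decide to stop it just for the duration of the preclear, a generic way (your copy of cache_dirs may also have its own quit option, so check its usage first) is:

ps -ef | grep cache_dirs     # note the PID of the running copy
kill <PID>

then restart cache_dirs once the preclear has finished.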

 

Joe L.

Link to comment

No, there is not any scheduled task on my setup. And as I mentioned, parity is spun down. :)

 

What is interesting is that there is only a small amount of reads from each disk. Maybe cache_dirs? But if I check the open files in unMENU, there is no active find command...

If you are running cache_dirs, then that is it.

 

The preclear script reads/writes the entire disk being pre-cleared in a way that is pretty much guaranteed to make the directory entries attempted to be cached by cache_dirs to eventually end up as the least recently accessed blocks and subsequently returned to the pool of blocks available to use as disk cache.

 

The cache_dirs script can only work if the rate at which it can access the directory entry "blocks" is more frequent than the rate at which you use other disk blocks in your array.

 

The preclear script is accessing the disk being cleared far faster than normal use when playing a movie, or scanning directories.  I can see how it can easily end up with the blocks on its disk as being cached and more recently accessed than any from the directory scans.  (remember, oldest/least-recently-used in the buffer cache are those that are re-used for current access needs)

 

The solution... cancel the cache_dirs... or live with the fact that it is doing its job, trying to keep your directory listings of the shares on your server as responsive as possible.

If you kill cache_dirs, in an hour or so, the other disks will spin down, and you will have to wait for directory listings until they spin up.

 

Joe L.

 

Thank you for the comprehensive explanation, Joe L. That's fully logical, except I thought cache_dirs was only doing its job on the array and the cache drive. That's why I thought a drive that sits outside the array could not affect it. But now I know I was wrong.

 

Thank you again for both great scripts!

Link to comment

No, there is not any scheduled task on my setup. And as I mentioned, parity is spun down. :)

 

What is interesting is that there is only a small amount of reads from each disk. Maybe cache_dirs? But if I check the open files in unMENU, there is no active find command...

If you are running cache_dirs, then that is it.

 

The preclear script reads/writes the entire disk being pre-cleared in a way that is pretty much guaranteed to make the directory entries attempted to be cached by cache_dirs to eventually end up as the least recently accessed blocks and subsequently returned to the pool of blocks available to use as disk cache.

 

The cache_dirs script can only work if the rate at which it can access the directory entry "blocks" is more frequent than the rate at which you use other disk blocks in your array.

 

The preclear script is accessing the disk being cleared far faster than normal use when playing a movie, or scanning directories.  I can see how it can easily end up with the blocks on its disk as being cached and more recently accessed than any from the directory scans.  (remember, oldest/least-recently-used in the buffer cache are those that are re-used for current access needs)

 

The solution... cancel the cache_dirs... or live with the fact that it is doing its job, trying to keep your directory listings of the shares on your server as responsive as possible.

If you kill cache_dirs, in an hour or so, the other disks will spin down, and you will have to wait for directory listings until they spin up.

 

Joe L.

 

Thank you for the comprehensive explanation, Joe L. That's fully logical, except I thought cache_dirs was only doing its job on the array and the cache drive. That's why I thought a drive that sits outside the array could not affect it. But now I know I was wrong.

 

Thank you again for both great scripts!

cache_dirs only scans the disks in the array and the cache drive... your thinking was correct. However, the disk buffer cache is shared by all disks, regardless of whether they are assigned to the array or not.  That same buffer cache is used by anything accessing the disks.

If the preclear script uses disk buffer blocks faster than the cache_dirs script can re-scan a given directory, cache_dirs must re-read the directory from the physical disk.

 

There is only one set of buffer memory for disk I/O, it is shared by everything.
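You can see this happening if you are curious; the buffer cache is reported by the stock free command:

free -m

The "buffers" and "cached" figures stay large while the preclear runs, but those blocks keep being recycled for the disk being cleared, which is why the cached directory blocks eventually fall out.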

 

Joe L.

Link to comment

No, there is not any scheduled task on my setup. And as I mentioned, parity is spun down. :)

 

What is interesting is that there is only a small amount of reads from each disk. Maybe cache_dirs? But if I check the open files in unMENU, there is no active find command...

If you are running cache_dirs, then that is it.

 

The preclear script reads/writes the entire disk being pre-cleared in a way that is pretty much guaranteed to make the directory entries attempted to be cached by cache_dirs to eventually end up as the least recently accessed blocks and subsequently returned to the pool of blocks available to use as disk cache.

 

The cache_dirs script can only work if the rate at which it can access the directory entry "blocks" is more frequent than the rate at which you use other disk blocks in your array.

 

The preclear script is accessing the disk being cleared far faster than normal use when playing a movie, or scanning directories.  I can see how it can easily end up with the blocks on its disk as being cached and more recently accessed than any from the directory scans.  (remember, oldest/least-recently-used in the buffer cache are those that are re-used for current access needs)

 

The solution... cancel the cache_dirs... or live with the fact that it is doing its job, trying to keep your directory listings of the shares on your server as responsive as possible.

If you kill cache_dirs, in an hour or so, the other disks will spin down, and you will have to wait for directory listings until they spin up.

 

Joe L.

 

Thank you for the comprehensive explanation, Joe L. That's fully logical, except I thought cache_dirs was only doing its job on the array and the cache drive. That's why I thought a drive that sits outside the array could not affect it. But now I know I was wrong.

 

Thank you again for both great scripts!

cache_dirs only scans the disks in the array and the cache drive... your thinking was correct. However, the disk buffer cache is shared by all disks, regardless of whether they are assigned to the array or not.  That same buffer cache is used by anything accessing the disks.

If the preclear script uses disk buffer blocks faster than the cache_dirs script can re-scan a given directory, cache_dirs must re-read the directory from the physical disk.

 

There is only one set of buffer memory for disk I/O, it is shared by everything.

 

Joe L.

 

Now I fully understand. Thank you very much Joe L.!

Link to comment
  • 2 weeks later...

I have a question: I started preclear_disk on a drive I wanted to add to my array.

Came back tonight expecting it to be finished, but it seems stuck.

Telnet screen shows:

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdc

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Post-Read in progress: 88% complete.

(  888,330,240,000  of  1,000,204,886,016  bytes read )

Elapsed Time:  25:16:35

 

ps shows:

root@XMS-GMI-01:~# ps -ef | grep preclear

root    20752 27552 11 14:19 pts/0    00:44:03 /bin/bash ./preclear_disk.sh /dev/sdc

root    21116 21101  0 20:40 pts/1    00:00:00 grep preclear

root    27552 27244  0 Jul13 pts/0    00:01:06 /bin/bash ./preclear_disk.sh /dev/sdc

root@XMS-GMI-01:~#

 

Anything I can do except restarting the whole thing from the beginning?

The unRAID server is alive; I can read and write to it.

Thanks, Guzzi

 

Link to comment

I have a question: I started preclear_disk on a drive I wanted to add to my array.

Came back tonight expecting it to be finished, but it seems stuck.

Telnet screen shows:

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdc

=                       cycle 1 of 1

= Disk Pre-Clear-Read completed                                 DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE

= Step 5 of 10 - Clearing MBR code area                         DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE

= Step 10 of 10 - Testing if the clear has been successful.     DONE

= Post-Read in progress: 88% complete.

(  888,330,240,000  of  1,000,204,886,016  bytes read )

Elapsed Time:  25:16:35

 

ps shows:

root@XMS-GMI-01:~# ps -ef | grep preclear

root     20752 27552 11 14:19 pts/0    00:44:03 /bin/bash ./preclear_disk.sh /dev/sdc

root     21116 21101  0 20:40 pts/1    00:00:00 grep preclear

root     27552 27244  0 Jul13 pts/0    00:01:06 /bin/bash ./preclear_disk.sh /dev/sdc

root@XMS-GMI-01:~#

 

Anything I can do except restarting the whole thing from the beginning?

The unRAID server is alive; I can read and write to it.

Thanks, Guzzi

 

 

This seems to happen once in a while.  Most of the time if you start another pass on the drive it will finish as it should.

Link to comment

I have a question: I started preclear_disk on a drive I wanted to add to my array.

Came back tonight expecting it to be finished, but it seems stuck.

Telnet screen shows:

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdc

=                       cycle 1 of 1

= Disk Pre-Clear-Read completed                                 DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE

= Step 5 of 10 - Clearing MBR code area                         DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE

= Step 10 of 10 - Testing if the clear has been successful.     DONE

= Post-Read in progress: 88% complete.

(  888,330,240,000  of  1,000,204,886,016  bytes read )

Elapsed Time:  25:16:35

 

ps shows:

root@XMS-GMI-01:~# ps -ef | grep preclear

root     20752 27552 11 14:19 pts/0    00:44:03 /bin/bash ./preclear_disk.sh /dev/sdc

root     21116 21101  0 20:40 pts/1    00:00:00 grep preclear

root     27552 27244  0 Jul13 pts/0    00:01:06 /bin/bash ./preclear_disk.sh /dev/sdc

root@XMS-GMI-01:~#

 

Anything I can do except restarting the whole thing from the beginning?

The unRAID server is alive; I can read and write to it.

Thanks, Guzzi

 

 

This seems to happen once in a while.  Most of the time if you start another pass on the drive it will finish as it should.

 

Hmmm, well OK, I cancelled the process and started it on another drive of the same size (1 TB WD Green), and exactly the same thing happens: it hangs at 88% complete of the post-read, at the same position (888,330,240,000 of .... bytes read).

Is this a problem with the WD drives? The first read is OK, and all steps of preclearing including writing zeros are OK; only the last pass ("post-read") always hangs at the same position. Any ideas?

The only thing I saw in the log was some errors at the very beginning, while preclear didn't give me any messages or errors.

Besides the preclear hanging: should I be worried about the log entries even though preclear did not report any errors?

 

Log:

Jul 15 03:44:27 XMS-GMI-01 kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0

Jul 15 03:44:27 XMS-GMI-01 kernel: ata10.00: BMDMA2 stat 0x6d0009

Jul 15 03:44:27 XMS-GMI-01 kernel: ata10: SError: { 10B8B BadCRC }

Jul 15 03:44:27 XMS-GMI-01 kernel: ata10.00: cmd 25/00:00:4f:3b:4f/00:04:4c:00:00/e0 tag 0 dma 524288 in

Jul 15 03:44:27 XMS-GMI-01 kernel: res 51/04:3f:10:3e:4f/00:01:4c:00:00/f0 Emask 0x1 (device error)

Jul 15 03:44:27 XMS-GMI-01 kernel: ata10.00: status: { DRDY ERR }

Jul 15 03:44:27 XMS-GMI-01 kernel: ata10.00: error: { ABRT }

Jul 15 03:44:27 XMS-GMI-01 kernel: ata10.00: configured for UDMA/100

Jul 15 03:44:27 XMS-GMI-01 kernel: ata10: EH complete

Jul 15 03:44:27 XMS-GMI-01 kernel: sd 10:0:0:0: [sdl] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)

Jul 15 03:44:27 XMS-GMI-01 kernel: sd 10:0:0:0: [sdl] Write Protect is off

Jul 15 03:44:27 XMS-GMI-01 kernel: sd 10:0:0:0: [sdl] Mode Sense: 00 3a 00 00

Jul 15 03:44:27 XMS-GMI-01 kernel: sd 10:0:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

[...]

Jul 15 04:24:55 XMS-GMI-01 kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0

Jul 15 04:24:55 XMS-GMI-01 kernel: ata10.00: BMDMA2 stat 0x6d0009

Jul 15 04:24:55 XMS-GMI-01 kernel: ata10: SError: { 10B8B BadCRC }

Jul 15 04:24:55 XMS-GMI-01 kernel: ata10.00: cmd 25/00:00:cf:86:33/00:04:1d:00:00/e0 tag 0 dma 524288 in

Jul 15 04:24:55 XMS-GMI-01 kernel: res 51/04:2f:a0:87:33/00:03:1d:00:00/f0 Emask 0x1 (device error)

Jul 15 04:24:55 XMS-GMI-01 kernel: ata10.00: status: { DRDY ERR }

Jul 15 04:24:55 XMS-GMI-01 kernel: ata10.00: error: { ABRT }

Jul 15 04:24:55 XMS-GMI-01 kernel: ata10.00: configured for UDMA/100

Jul 15 04:24:55 XMS-GMI-01 kernel: ata10: EH complete

Jul 15 04:24:55 XMS-GMI-01 kernel: sd 10:0:0:0: [sdl] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)

Jul 15 04:24:55 XMS-GMI-01 kernel: sd 10:0:0:0: [sdl] Write Protect is off

Jul 15 04:24:55 XMS-GMI-01 kernel: sd 10:0:0:0: [sdl] Mode Sense: 00 3a 00 00

Jul 15 04:24:55 XMS-GMI-01 kernel: sd 10:0:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Jul 15 05:21:20 XMS-GMI-01 emhttp: shcmd (103): /usr/sbin/hdparm -y /dev/sdm >/dev/null

Link to comment

...exactly the same thing happens: it hangs at 88% complete of the post-read...

 

Exactly the same happens to me: the script hangs at 88% in the post-read.

 

So I went back and read ALL the posts in this thread.

 

A curious thing I found: at least 10 different people at different times reported a hang at 88% in the post-read!

 

Apparently these coincidences with the 'magic' number 88% have escaped Joe L.'s attention, as he gave different advice to those different people.

 

Now, if my case can be of any help...

I started the whole process again, and this time I monitored htop until the hang happened: all the way up to 88% there was a dd process in htop, and my CPU usage was nice and low. When it reached 88%, the dd process was gone from htop, and the preclear script itself started using 99% CPU and hung there forever.

 

Maybe Joe L. can look into this.

 

Yours,

Purko

 

 

 

 

Link to comment

The BadCRC error flag is usually associated with a poor cable, not the drive.  Try replacing/upgrading the cable to sdl on ata10.00.  The Devices tab or your syslog should help you determine which drive that is.
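If you want to confirm it, cable trouble usually also shows up in SMART attribute 199 (UDMA_CRC_Error_Count); assuming smartctl is installed (add -d ata if your controller needs it), you can watch it with:

smartctl -A /dev/sdl | grep -i crc

A raw value that keeps climbing points at the cable or its seating rather than at the platters.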

 

Thanks for the hint - argh, I hate those cables. I replaced all SATA cables some time ago because of problems; maybe I reused some of the old ones, since this is my 2nd unRAID server...

Thanks anyway, will have a look at this.

Link to comment

...exactly the same thing happens: it hangs at 88% complete of the post-read...

 

Exactly the same happens to me: the script hangs at 88% in the post-read.

 

So I went back and read ALL the posts in this thread.

 

A curious thing I found: at least 10 different people at different times reported a hang at 88% in the post-read!

 

Apparently these coincidences with the 'magic' number 88% have escaped Joe L.'s attention, as he gave different advice to those different people.

<snip>

Maybe Joe L. can look into this.

 

Yours,

Purko

Time for me to re-read all the posts too.  From what is described, either the disk stops responding and then the "dd" does not proceed (I've never had that happen here), or it might be some kind of "race" condition that would only occur on some servers and some disks, where the "dd" reads finish faster than expected and their exit signals get missed...  It almost looks like you end up in a tight "while" loop using all CPU cycles, but I don't see how it would not be incrementing the progress display.

 

I'm working on one possible change, but I'm only guessing at what might be happening.  I'm currently testing it with a 250 GB drive, and I have two 1.5 TB drives I want to run through it too.  They take quite a while to clear on my server.

 

As a bonus, the new version incorporates code added by "jbuszkie" to send periodic e-mail notifications of the progress of the clearing. (only works if you have a valid "mail" command installed)

 

In the interim, I have nothing to suggest other than to re-run the preclear.  (I've got a lot of reading to do) ;D

Edit: To fix the 88% problem, download and use the new version of preclear_disk.sh (attached to the first post in this thread).

The 88% freeze issue turned out to be a bug in "bash": once it had invoked more than 4096 background processes, it would end up in an infinite loop.

Depending on the drive geometry, some people would hit this bug.  Others, with smaller disks or different geometry, would not.
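For the curious, the pattern involved is roughly the following (an illustration only, not the actual preclear_disk.sh code): a loop that keeps launching a background dd for each chunk and waiting on it, so every iteration adds another background job to bash's bookkeeping:

# illustrative sketch only -- not the real script
i=0
while [ $i -lt 5000 ]; do
  dd if=/dev/sdc of=/dev/null bs=2048k count=1 skip=$i 2>/dev/null &
  wait $!          # each iteration is another background job;
  i=$((i+1))       # past roughly 4096 of them, the affected bash builds got stuck
done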

Joe L.

Link to comment
(only works if you have a valid "mail" command installed)

Aaaaaaaa!!!!!!!

 

HOW do I do that??

 

I've been searching these boards for the mail command in unraid...

 

(It's a newbie speaking... It will probably turn out to be something embarrassingly simple.)

And how about a POP3 and an SMTP server in the unRAID box?

 

---

Purko

Link to comment

I did 2 "preclears" on WD 1 TB drives - on two different servers. Both did hang at 88% - took approx. 25 hours (that's what makes it difficult to just "retest" ;-))

One board was 780G chipset, the other 690 - not sure if the drives were connected to onboard sata (there is some workaround in the kernel for those chipsets, ist't it?) or to sil3114.

Maybe this info helps?

cheers, Guzzi

PS: I did run it in telnet session .... and yes, it stopped updating the screen. Using latest Unraid beta.

Link to comment
