Joe L. Posted December 18, 2008

I uploaded a new version of the script, slightly modified so it no longer aborts the pre/post-read loop early on a read failure. Please download it again if you are using an earlier version. Other than that small difference, it is exactly the same.

Joe L.
jimwhite Posted December 19, 2008

> It sure might be a candidate for return, but they might not take it if their utility does not indicate it is over their failure "threshold"

I've returned several drives to Seagate with the advance-replacement option and never had a return questioned, some in this same category, where Seatools passed the drive.
abq-pete Posted December 19, 2008

Joe, Thanks for taking the time to respond. I ran the original script once more and it seems to have gotten farther:

===========================================================================
=  unRAID server Pre-Clear disk /dev/sda
=  cycle 1 of 1
= Disk Pre-Clear-Read completed                                  DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes              DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it  DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.            DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4        DONE
= Step 5 of 10 - Clearing MBR code area                          DONE
= Step 6 of 10 - Setting MBR signature bytes                     DONE
= Step 7 of 10 - Setting partition 1 to precleared state         DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning    DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries              DONE
= Step 10 of 10 - Testing if the clear has been successful.      DONE
= Post-Read in progress: 99% complete.
  ( 1,497,000,960,000 of 1,500,301,910,016 bytes read )
Elapsed Time:  9:33:36
===========================================================================
=  unRAID server Pre-Clear disk /dev/sda
=  cycle 1 of 1
= Disk Pre-Clear-Read completed                                  DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes              DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it  DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.            DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4        DONE
= Step 5 of 10 - Clearing MBR code area                          DONE
= Step 6 of 10 - Setting MBR signature bytes                     DONE
= Step 7 of 10 - Setting partition 1 to precleared state         DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning    DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries              DONE
= Step 10 of 10 - Testing if the clear has been successful.      DONE
= Disk Post-Clear-Read completed                                 DONE
Elapsed Time:  9:34:27
============================================================================
==
== Disk /dev/sda has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
54c54
<   1 Raw_Read_Error_Rate     0x000f   103   099   006    Pre-fail  Always       -       42345792
---
>   1 Raw_Read_Error_Rate     0x000f   111   099   006    Pre-fail  Always       -       37597902
57,58c57,58
<   5 Reallocated_Sector_Ct   0x0033   096   096   036    Pre-fail  Always       -       187
<   7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       320916
---
>   5 Reallocated_Sector_Ct   0x0033   096   096   036    Pre-fail  Always       -       191
>   7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       333592
62c62
< 187 Reported_Uncorrect      0x0032   076   076   000    Old_age   Always       -       24
---
> 187 Reported_Uncorrect      0x0032   070   070   000    Old_age   Always       -       30
64,68c64,68
< 189 High_Fly_Writes         0x003a   075   075   000    Old_age   Always       -       25
< 190 Airflow_Temperature_Cel 0x0022   081   067   045    Old_age   Always       -       19 (Lifetime Min/Max 18/25)
< 195 Hardware_ECC_Recovered  0x001a   037   037   000    Old_age   Always
< 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       4
< 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       4
---
> 189 High_Fly_Writes         0x003a   048   048   000    Old_age   Always       -       52
> 190 Airflow_Temperature_Cel 0x0022   078   067   045    Old_age   Always       -       22 (Lifetime Min/Max 18/25)
> 195 Hardware_ECC_Recovered  0x001a   051   037   000    Old_age   Always
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
72c72
< ATA Error Count: 24 (device log contains only the most recent five errors)
---
> ATA Error Count: 30 (device log contains only the most recent five errors)
87c87
< Error 24 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)
---
> Error 30 occurred at disk power-on lifetime: 19 hours (0 days + 19 hours)
98,102c98,102
<   60 00 00 ff ff ff 4f 00   06:55:09.851  READ FPDMA QUEUED
<   27 00 00 00 00 00 e0 02   06:55:09.831  READ NATIVE MAX ADDRESS EXT
<   ec 00 00 00 00 00 a0 02   06:55:09.811  IDENTIFY DEVICE
<   ef 03 46 00 00 00 a0 02   06:55:09.791  SET FEATURES [set transfer mode]
<   27 00 00 00 00 00 e0 02   06:55:09.771  READ NATIVE MAX ADDRESS EXT
---
>   60 00 00 ff ff ff 4f 00   18:39:43.313  READ FPDMA QUEUED
>   27 00 00 00 00 00 e0 02   18:39:43.293  READ NATIVE MAX ADDRESS EXT
>   ec 00 00 00 00 00 a0 02   18:39:43.273  IDENTIFY DEVICE
>   ef 03 46 00 00 00 a0 02   18:39:43.253  SET FEATURES [set transfer mode]
>   27 00 00 00 00 00 e0 02   18:39:43.233  READ NATIVE MAX ADDRESS EXT
104c104
< Error 23 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)
---
> Error 29 occurred at disk power-on lifetime: 19 hours (0 days + 19 hours)
115,119c115,119
<   60 00 00 ff ff ff 4f 00   06:55:06.474  READ FPDMA QUEUED
<   27 00 00 00 00 00 e0 02   06:55:06.454  READ NATIVE MAX ADDRESS EXT
<   ec 00 00 00 00 00 a0 02   06:55:06.434  IDENTIFY DEVICE
<   ef 03 46 00 00 00 a0 02   06:55:06.414  SET FEATURES [set transfer mode]
<   27 00 00 00 00 00 e0 02   06:55:06.394  READ NATIVE MAX ADDRESS EXT
---
>   60 00 00 ff ff ff 4f 00   18:39:39.826  READ FPDMA QUEUED
>   27 00 00 00 00 00 e0 02   18:39:39.806  READ NATIVE MAX ADDRESS EXT
>   ec 00 00 00 00 00 a0 02   18:39:39.786  IDENTIFY DEVICE
>   ef 03 46 00 00 00 a0 02   18:39:39.766  SET FEATURES [set transfer mode]
>   27 00 00 00 00 00 e0 02   18:39:39.746  READ NATIVE MAX ADDRESS EXT
121c121
< Error 22 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)
---
> Error 28 occurred at disk power-on lifetime: 19 hours (0 days + 19 hours)
132,136c132,136
<   60 00 00 ff ff ff 4f 00   06:55:02.987  READ FPDMA QUEUED
<   27 00 00 00 00 00 e0 02   06:55:02.967  READ NATIVE MAX ADDRESS EXT
<   ec 00 00 00 00 00 a0 02   06:55:02.947  IDENTIFY DEVICE
<   ef 03 46 00 00 00 a0 02   06:55:02.927  SET FEATURES [set transfer mode]
<   27 00 00 00 00 00 e0 02   06:55:02.907  READ NATIVE MAX ADDRESS EXT
---
>   60 00 00 ff ff ff 4f 00   18:39:36.419  READ FPDMA QUEUED
>   27 00 00 00 00 00 e0 02   18:39:36.399  READ NATIVE MAX ADDRESS EXT
>   ec 00 00 00 00 00 a0 02   18:39:36.379  IDENTIFY DEVICE
>   ef 03 46 00 00 00 a0 02   18:39:36.359  SET FEATURES [set transfer mode]
>   27 00 00 00 00 00 e0 02   18:39:36.339  READ NATIVE MAX ADDRESS EXT
138c138
< Error 21 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)
---
> Error 27 occurred at disk power-on lifetime: 19 hours (0 days + 19 hours)
149,153c149,153
<   60 00 00 ff ff ff 4f 00   06:54:59.692  READ FPDMA QUEUED
<   60 00 00 ff ff ff 4f 00   06:54:59.690  READ FPDMA QUEUED
<   27 00 00 00 00 00 e0 02   06:54:59.670  READ NATIVE MAX ADDRESS EXT
<   ec 00 00 00 00 00 a0 02   06:54:59.650  IDENTIFY DEVICE
<   ef 03 46 00 00 00 a0 02   06:54:59.630  SET FEATURES [set transfer mode]
---
>   60 00 00 ff ff ff 4f 00   18:39:33.033  READ FPDMA QUEUED
>   60 00 00 ff ff ff 4f 00   18:39:33.032  READ FPDMA QUEUED
>   27 00 00 00 00 00 e0 02   18:39:33.012  READ NATIVE MAX ADDRESS EXT
>   ec 00 00 00 00 00 a0 02   18:39:32.992  IDENTIFY DEVICE
>   ef 03 46 00 00 00 a0 02   18:39:32.972  SET FEATURES [set transfer mode]
155c155
< Error 20 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)
---
> Error 26 occurred at disk power-on lifetime: 19 hours (0 days + 19 hours)
166,170c166,170
<   60 00 00 ff ff ff 4f 00   06:54:56.314  READ FPDMA QUEUED
<   60 00 00 ff ff ff 4f 00   06:54:56.313  READ FPDMA QUEUED
<   27 00 00 00 00 00 e0 02   06:54:56.293  READ NATIVE MAX ADDRESS EXT
<   ec 00 00 00 00 00 a0 02   06:54:56.273  IDENTIFY DEVICE
<   ef 03 46 00 00 00 a0 02   06:54:56.253  SET FEATURES [set transfer mode]
---
>   60 00 00 ff ff ff 4f 00   18:39:29.676  READ FPDMA QUEUED
>   60 00 00 ff ff ff 4f 00   18:39:29.675  READ FPDMA QUEUED
>   27 00 00 00 00 00 e0 02   18:39:29.655  READ NATIVE MAX ADDRESS EXT
>   ec 00 00 00 00 00 a0 02   18:39:29.635  IDENTIFY DEVICE
>   ef 03 46 00 00 00 a0 02   18:39:29.615  SET FEATURES [set transfer mode]
============================================================================
root@Tower:/boot#

But with the increasing number of S.M.A.R.T. errors, I will go ahead and send this back. Jimwhite, I purposely got this from Amazon for their no-hassle return policy. I will request an RMA and a new replacement drive. In my experience, replacement drives have been refurbished and I have not had such good luck with them. Has your experience differed?

Regards, Peter
Joe L. Posted December 19, 2008

> Joe, Thanks for taking the time to respond. I ran the original script once more and it seems to have gotten farther: <snip> But with the increasing number of S.M.A.R.T. errors, I will go ahead and send this back.

I noticed an increasing number of reallocated sectors, and still more pending reallocation. I would RMA the drive... It is marginal at best, and a prime candidate for problems as it gets older. Since errors continue to occur, there is no way to know if they will ever stop. If you have not yet removed the disk from your array, please download and use the newer version of the preclear_disk.sh script on it. Your disk is a perfect test case for me. (You will help me, and others who follow.)

Joe L.
abq-pete Posted December 19, 2008

Joe, No problem. I was just in the process of shipping it back, but I will take it out of the box, run it through again overnight with the new script, and report back. On a side note, dealing with Amazon is very gratifying. They issued a free return label via UPS and are sending me a new drive via FedEx overnight.

Regards, Peter
jimwhite Posted December 20, 2008

> In my experience, replacement drives have been refurbished and I have not had such good luck with them. Has your experience differed?

Well, I've only had to request a second RMA due to a recon drive once, and it was DOA!
abq-pete Posted December 20, 2008

Joe, Here are the latest results:

===========================================================================
=  unRAID server Pre-Clear disk /dev/sda
=  cycle 1 of 1
= Disk Pre-Clear-Read completed                                  DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes              DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it  DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.            DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4        DONE
= Step 5 of 10 - Clearing MBR code area                          DONE
= Step 6 of 10 - Setting MBR signature bytes                     DONE
= Step 7 of 10 - Setting partition 1 to precleared state         DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning    DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries              DONE
= Step 10 of 10 - Testing if the clear has been successful.      DONE
= Post-Read in progress: 99% complete.
  ( 1,500,299,297,280 of 1,500,301,910,016 bytes read )
Elapsed Time: 12:26:13
===========================================================================
=  unRAID server Pre-Clear disk /dev/sda
=  cycle 1 of 1
= Disk Pre-Clear-Read completed                                  DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes              DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it  DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.            DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4        DONE
= Step 5 of 10 - Clearing MBR code area                          DONE
= Step 6 of 10 - Setting MBR signature bytes                     DONE
= Step 7 of 10 - Setting partition 1 to precleared state         DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning    DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries              DONE
= Step 10 of 10 - Testing if the clear has been successful.      DONE
= Disk Post-Clear-Read completed                                 DONE
Elapsed Time: 12:26:13
============================================================================
==
== Disk /dev/sda has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
54c54
<   1 Raw_Read_Error_Rate     0x000f   111   099   006    Pre-fail  Always       -       37608068
---
>   1 Raw_Read_Error_Rate     0x000f   111   099   006    Pre-fail  Always       -       35710728
58c58
<   7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       387419
---
>   7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       404283
64,66c64,66
< 189 High_Fly_Writes         0x003a   048   048   000    Old_age   Always       -       52
< 190 Airflow_Temperature_Cel 0x0022   079   067   045    Old_age   Always       -       21 (Lifetime Min/Max 17/21)
< 195 Hardware_ECC_Recovered  0x001a   037   037   000    Old_age   Always
---
> 189 High_Fly_Writes         0x003a   006   006   000    Old_age   Always       -       94
> 190 Airflow_Temperature_Cel 0x0022   079   067   045    Old_age   Always       -       21 (Lifetime Min/Max 17/23)
> 195 Hardware_ECC_Recovered  0x001a   051   037   000    Old_age   Always
============================================================================

I'm not sure, but the results seem to be settling down a bit? The individual errors no longer appear. Does that mean they no longer exist, or that they are not being displayed? Should I run anything else while I still have the drive? I will ship it off on Monday morning...

Regards, Peter
Joe L. Posted December 20, 2008

> I'm not sure but the results seem to be settling down a bit? The individual errors no longer appear. Does that mean they no longer exist or that they are not being displayed? Should I run anything else while I still have the drive? I will ship it off on monday morning... Regards, Peter

You are correct... the number of relocated sectors did not change from the last pre-clear cycle. You can be sure the bad sectors are still there; the preclear script just does not show them when it displays the "differences" between the SMART report it takes at the start of the cycle and the SMART report it takes at the end of the pre-clear. In your previous test, at the beginning there were 187 reallocated sectors and 4 sectors pending reallocation. At the end, there were 191 reallocated sectors and 0 pending re-allocation. It re-affirms what the script was designed to do: identify the un-readable sectors during its "read" phases, and allow the drive to re-map them during the writing of zeros. Although it might have found all the currently un-readable sectors, the High_Fly_Writes are still incrementing. It is my understanding they are not a sign of a healthy drive. If you want to see the full SMART reports, they are also saved in the syslog...

I feel pretty good in that the script has proven its worth, both in allowing you to add a drive to your array with minimal down-time and, as in this case, in allowing you to burn in a drive before you add it to your array and get it replaced if it shows signs of an early failure. In my array I've got a very old 250 GB drive, way out of warranty, that has 100 sectors reallocated. That number has not changed since I started running it through pre-clear cycles, so I'm guessing it will last until I eventually replace it with a larger disk. I don't have the luxury of returning it for replacement. Send the drive in for replacement... It is not one you want to start out with in your array. Thanks for running it through another test cycle for me.

Joe L.
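The before/after comparison Joe describes (187 -> 191 reallocated, 4 -> 0 pending) can be sketched as a tiny helper. The `smart_delta` function and the /tmp file names here are hypothetical, not part of preclear_disk.sh; the two attribute lines are copied from Pete's reports earlier in the thread.

```shell
#!/bin/sh
# Illustration only: compare the raw Reallocated_Sector_Ct and
# Current_Pending_Sector values between a SMART report saved before a
# pre-clear cycle and one saved after it.

# Representative attribute lines, taken from the reports in this thread:
cat > /tmp/smart_before.txt <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   096   096   036    Pre-fail  Always       -       187
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       4
EOF
cat > /tmp/smart_after.txt <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   096   096   036    Pre-fail  Always       -       191
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
EOF

smart_delta() {
    # $1 = report taken at start of cycle, $2 = report taken at end
    for attr in Reallocated_Sector_Ct Current_Pending_Sector; do
        b=$(awk -v name="$attr" '$2 == name { print $NF }' "$1")
        a=$(awk -v name="$attr" '$2 == name { print $NF }' "$2")
        echo "$attr: $b -> $a"
    done
}

smart_delta /tmp/smart_before.txt /tmp/smart_after.txt
# prints:
# Reallocated_Sector_Ct: 187 -> 191
# Current_Pending_Sector: 4 -> 0
```

On a live system the two input files would come from `smartctl -a` runs before and after the cycle; the script itself saves its full reports to the syslog, as Joe notes.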
vdtruong Posted December 21, 2008

===========================================================================
=  unRAID server Pre-Clear disk /dev/sdd
=  cycle 1 of 1
= Disk Pre-Clear-Read completed                                  DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes              DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it  DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.            DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4        DONE
= Step 5 of 10 - Clearing MBR code area                          DONE
= Step 6 of 10 - Setting MBR signature bytes                     DONE
= Step 7 of 10 - Setting partition 1 to precleared state         DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning    DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries              DONE
= Step 10 of 10 - Testing if the clear has been successful.      DONE
= Post-Read in progress: 19% complete.
  ( 49,351,680,000 of 250,059,350,016 bytes read )
Elapsed Time: 2:52:47

This is for a 3-year-old WD 250 GB SATA2 drive. A 500 GB drive is currently running as well.
abq-pete Posted December 21, 2008

> Send the drive in for replacement... It is not one you want to start out with in your array. Thanks for running it through another test cycle for me. Joe L.

The drive went back today. The replacement should be here Monday. Hopefully I will fare better with the new one. I have a few worries, though. I wonder if this is typical for this new series of drives; as you can see from reviews and other user reports, quite a few problems have been encountered. I am currently using one as my parity drive. Though its labeling indicates that it is not in the affected batch with firmware problems, I need to check it out anyway. I hope the new drive coming on Monday will test fine. Assuming that it does, I will replace the parity drive with the new one and then run your tool on the older one to verify its health.

On a different note, what resources does this tool consume the most? I ask because while I was running the test, I noticed that the performance of the array was visibly slower. Copying files to the cache drive, as well as moving from cache to the array, seemed to drop in performance. I think reading from the array was affected as well. I am running a Celeron 2.0 GHz (single core) with 3 GB of RAM. All drives are on either the on-board SATA connections or Adaptec PCI Express SATA cards. What might I do to restore the performance?

Thanks again for developing this script. I sleep better knowing that drives that pass the test are indeed more reliable.

Regards, Peter
Joe L. Posted December 21, 2008

> Drive went back today. Replacement should be here Monday. <snip> Assuming that it does, I will replace the parity drive with the new one and then run your tool on the older one to verify its health.

That sounds like a very good plan.

> On a different note, what resources does this tool consume the most? <snip> What might I do to restore the performance?

While it is running, it uses bandwidth to and from the disks on whatever bus they are on. It basically sits waiting on the disk to respond most of the time. Other than that, because it is reading and writing so much, it uses memory from the buffer cache, making less memory available to other processes. As soon as it stops running, those blocks of memory in the buffer cache will be re-used by Linux once they become the "least-recently-used" blocks (and therefore the first to be re-used). To restore performance, stop running the script... it is as simple as that. It uses no resources when not running.

Think of it this way: a high-bit-rate HD movie might have a bit rate of 35MB/s. This script is reading and writing to disks at twice that speed (75MB/s or so). It really puts a strain on the disk... probably even more than a parity check (on a single disk). In fact, part of the "read" routine is to perform 5 read requests in parallel, moving the disk head randomly all over the disk. That situation is much more difficult on the disk hardware than simply playing a linear set of blocks of a movie. (But then, I am trying to uncover marginal hardware issues... before you add the disk to the array.)

> Thanks again for developing this script. I sleep better knowing that drives that pass the test are indeed more reliable. Regards, Peter

You are welcome. Passing this test does not guarantee the disk will not crash the following week; it could happen. It is less likely to crash, at least in my mind, if it does pass. I initially wrote this routine to add drives more quickly to the array and minimize down-time. (Amazing how dependent upon the server my wife has become... it is *way* more convenient to watch all the holiday movies.) If a disk were not pre-cleared, I'd be facing 4 or more hours of down-time as it is cleared. The ability to use the same routine to burn in a drive was a natural addition.

Joe L.
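The parallel-read stress Joe describes can be sketched roughly as below. This is an illustration only, not the actual preclear_disk.sh code: it runs against a small scratch file rather than a real /dev/sdX device, and the offsets, sizes, and file name are all made up for the demo.

```shell
#!/bin/sh
# Sketch: several dd reads issued in parallel at scattered offsets, the
# way the script's read phase keeps the head moving all over the disk.
# /tmp/fakedisk.img is a 10 MB stand-in for a real drive.
dd if=/dev/zero of=/tmp/fakedisk.img bs=1M count=10 2>/dev/null

for skip in 0 2 4 6 8; do   # scattered starting points, in MB
    dd if=/tmp/fakedisk.img of=/dev/null bs=1M skip=$skip count=2 2>/dev/null &
done
wait                         # all five reads complete before continuing
echo "parallel reads done"
```

On a rotating drive, five concurrent reads like this force long seeks between requests, which is far harder on the mechanics than one sequential stream, exactly the point of a burn-in.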
WeeboTech Posted December 22, 2008

There's a command called ionice which might help in controlling the priority of the dd's. http://linux.die.net/man/1/ionice

In my rsyncmv scripts I do:

/usr/bin/nice -19 /usr/bin/ionice -c3 rsync -avP --bwlimit=${BWLIMIT:=6400} --remove-sent-files "$@" ${DIR}

This has the effect of lowering the priority of the rsync to a level where it does not affect my rtorrent or any streaming on the system.
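WeeboTech's nice/ionice trick could be wrapped for re-use along these lines. The `bg_io` name is my own invention, not from any script in this thread; ionice ships with util-linux, so the wrapper falls back to plain nice where it is absent.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command at the lowest CPU priority and,
# where ionice exists, in the "idle" I/O class (-c3) so it only gets
# disk time when nothing else on the system wants it.
bg_io() {
    if command -v ionice >/dev/null 2>&1; then
        nice -n 19 ionice -c3 "$@"
    else
        nice -n 19 "$@"   # no ionice on this system; CPU niceness only
    fi
}

# A low-priority whole-disk read might then look like (do not run as-is,
# /dev/sda is just a placeholder):
#   bg_io dd if=/dev/sda of=/dev/null bs=1M
bg_io echo "low priority run"
```

Note that the idle I/O class only helps on schedulers that honor it (CFQ did at the time); it limits when the dd's requests are dispatched, not the buffer-cache pressure Joe mentions.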
Tom2000 Posted January 2, 2009

Hi Joe, Is there any procedure to run the script with a brand-new drive? Do I need to mount the drive first, or do anything else from the unRAID management web interface? I have just received my new 1 TB drive and want to use your script to break in the drive before I put it into my system as the parity drive. Currently I have one 1 TB drive and a few 500 GB drives in my system. Since the new one is supposed to be a faster drive, I want it to be my parity drive. For now I have put the hard drive in an enclosure connected to an e-SATA port on my unRAID server. Here is my current drive status:

root@Tower:/boot/download# mount
fusectl on /sys/fs/fuse/connections type fusectl (rw)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/sdg1 on /boot type vfat (rw,umask=066,shortname=mixed)
/dev/md3 on /mnt/disk3 type reiserfs (rw,noatime,nodiratime)
/dev/md1 on /mnt/disk1 type reiserfs (rw,noatime,nodiratime)
/dev/md5 on /mnt/disk5 type reiserfs (rw,noatime,nodiratime)
/dev/md6 on /mnt/disk6 type reiserfs (rw,noatime,nodiratime)
/dev/md4 on /mnt/disk4 type reiserfs (rw,noatime,nodiratime)
/dev/md2 on /mnt/disk2 type reiserfs (rw,noatime,nodiratime)
shfs on /mnt/user type fuse.shfs (rw,nosuid,nodev)

Thanks, --Tom
Joe L. Posted January 2, 2009

> Hi Joe, Is there any procedure to run the script with a brand new drive? As to mount the drive first or any other thing I need to do from the unRAID management WEB interface? Thanks, --Tom

Nope, no special procedure... in fact, if the drive is mounted in any way, the pre-clear script will refuse to run on it. You do need to know its proper device name, and that is it. Before you pre-clear it, you just need to physically connect it to your server, but DO NOT assign it to your array. The pre-clear will let you burn in a drive. I'd try it for 1 cycle first, then for a few more. Before you do, make sure the "smartctl" program is functional on your version of unRAID. (A library it needs is missing on the 4.4 and 4.5-beta versions, but can easily be added once it is downloaded.) A 1 TB drive might take between 6 and 10 hours to pre-read/clear/post-read for 1 cycle, depending on the speed of your array. The smartctl program is not needed to burn in a drive, but it is the only way to know if the burn-in turns up any issues.

Joe L.
Tom2000 Posted January 2, 2009

Hi Joe, Thanks for your prompt reply. How do I know the correct device name?

--Tom
Joe L. Posted January 2, 2009

> Hi Joe, Thanks for your prompt reply. How do I know the correct device name? --Tom

At the command prompt type:

ls -l /dev/disk/by-id

A listing of all your drives will appear, looking like the listing below. The "device" is at the very end of the line. A device that is brand new is likely to have only one entry matching its model/serial number. A disk that has been partitioned will have an additional entry per partition in the listing, with a trailing "1" on its device name. You want to use the base device name, not the name of the first or any subsequent partition. If the last field on the line matching your model/serial number is ../../sdb, then your device is /dev/sdb. (A partial listing of my array follows.)

root@Tower:~# ls -l /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root  9 Dec 31 12:10 ata-HDS725050KLAT80_KRVA03ZAG3V5LD -> ../../hdb
lrwxrwxrwx 1 root root 10 Dec 31 12:10 ata-HDS725050KLAT80_KRVA03ZAG3V5LD-part1 -> ../../hdb1
lrwxrwxrwx 1 root root  9 Dec 31 12:10 ata-HDS725050KLAT80_KRVA03ZAG4V99D -> ../../hdd
lrwxrwxrwx 1 root root 10 Dec 31 12:10 ata-HDS725050KLAT80_KRVA03ZAG4V99D-part1 -> ../../hdd1
lrwxrwxrwx 1 root root  9 Dec 31 12:10 scsi-SATA_WDC_WD10EACS-00_WD-WCAU44206983 -> ../../sde
lrwxrwxrwx 1 root root 10 Dec 31 12:10 scsi-SATA_WDC_WD10EACS-00_WD-WCAU44206983-part1 -> ../../sde1
lrwxrwxrwx 1 root root  9 Dec 31 12:10 usb-SanDisk_Corporation_MobileMate_200445269218B56190D7-0:0 -> ../../sda
lrwxrwxrwx 1 root root 10 Dec 31 12:10 usb-SanDisk_Corporation_MobileMate_200445269218B56190D7-0:0-part1 -> ../../sda1

To confirm, type:

smartctl -i -d ata /dev/sdb

It should print the drive size, model, and serial number matching your new drive, as below:

root@Tower:~# smartctl -i -d ata /dev/sdb
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3750640AS
Serial Number:    5QD2ZR29
Firmware Version: 3.AAE
User Capacity:    750,156,374,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jan  2 15:00:51 2009 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Joe L.
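For scripting, the base-device-versus-partition distinction Joe describes can be captured in a small helper. This `base_device` function is hypothetical (it is not part of the preclear script) and deliberately naive: it just strips trailing digits, which matches sdb1 -> sdb and hdd1 -> hdd in the listing above, but would mangle a name like /dev/md3 where the digit is part of the base name.

```shell
#!/bin/sh
# Hypothetical helper: given a partition device path, strip the trailing
# digits to recover the whole-disk device you would hand to
# preclear_disk.sh. Pure string handling; safe to run anywhere.
base_device() {
    echo "$1" | sed 's/[0-9]*$//'
}

base_device /dev/sdb1   # -> /dev/sdb
base_device /dev/hdd1   # -> /dev/hdd
base_device /dev/sde    # -> /dev/sde (already a base device)
```

In practice, confirming with `smartctl -i` as Joe shows is still the safest check that the name you derived really is the new drive.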
Tom2000 Posted January 2, 2009

Hi Joe, Those few commands are exactly what I needed. Thanks so much for the clear and prompt reply. My pre-clear is now a few percent into the process. Once it is done, hopefully with no errors (it may take more than 10 hours), I will replace the current parity drive with this drive. Happy New Year!

--Tom
devastator Posted January 3, 2009

The script just halted at 88% post-read on my 2 WD10EADS-00L5B1 drives. What should I do / where do I have to look?

Greets, Deva

EDIT: smartctl shows this:

Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.

It was:

Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.

Also, nothing funky was returned:

root@tower:/tmp# smartctl -a /dev/sda
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EADS-00L5B1
Serial Number:    WD-WCAU42804491
Firmware Version: 01.01A01
User Capacity:    1,000,203,804,160 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Jan  3 23:08:00 2009 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (22200) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   166   165   021    Pre-fail  Always       -       6666
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       21
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       37
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       18
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       5
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       21
194 Temperature_Celsius     0x0022   126   117   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
abq-pete Posted January 4, 2009

This cannot be a coincidence. I just received a batch of WD10EACS drives (you have the EADS version with the 32 MB cache). The first drive I pre-cleared finished, but there was no SMART report. Rather than endure the 13-hour ordeal again, I just skipped it and tried another drive. This one stopped at the 88% mark and hung... I am running v4.4 on my trusty Asus P5B-VM DO.

Regards, Peter
abq-pete Posted January 4, 2009 Hi Joe, Thanks for your prompt reply. How do I know the correct device name? --Tom Another way to do it is to stop your array. Go to the Devices page and pretend you are going to add a new drive to the array. Your new drive's name is visible in the drop-down box. Just don't add the drive; go back to the main page and start the array again. Regards, Peter
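As an aside, you can also identify the device name straight from the console (a sketch assuming a typical Linux/unRAID shell; both commands are read-only and touch nothing on the disk):

```shell
# Two read-only ways to map a physical drive to its /dev name before
# handing it to preclear_disk.sh. Neither command modifies anything.
cat /proc/partitions                  # block devices the kernel knows about
ls -l /dev/disk/by-id/ 2>/dev/null    # symlink names embed model + serial number
```

Matching the model/serial printed on the drive label against /dev/disk/by-id is the least error-prone way to be sure you are about to clear the right disk.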
Joe L. Posted January 4, 2009 Are you both using the newer version of the pre-clear script, or the older original one? The older script exited the pre/post-read loop when the "dd" command failed to read the expected number of blocks. I originally thought that would only happen when it reached the end of the disk and tried to read blocks past the end. Apparently, a "read" error can occur before the end of the disk is reached, and in the original script that exits the read phase early. When I was told this was happening, I re-wrote that logic and had the loop continue reading until the end of the disk, even if errors occurred. (We really want the errors, to identify bad blocks and bad hardware.) The newer version of preclear_disk.sh should never exit the read phase early. Joe L.
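For the curious, the "keep reading past errors" behavior Joe describes can be approximated with dd's own flags. This is a simplified sketch, not the actual script's loop logic, and a small scratch file stands in for the real /dev/sdX device:

```shell
# conv=noerror makes dd continue after a failed read instead of exiting;
# sync pads any short read with zeros so the offset stays block-aligned.
# A 4 MB scratch file stands in for the real disk device here.
DISK=/tmp/fake_disk.img
dd if=/dev/zero of="$DISK" bs=1M count=4 2>/dev/null
dd if="$DISK" of=/dev/null bs=1M conv=noerror,sync 2>/dev/null && echo "read pass finished"
```

On a real drive, dd logs each read failure to stderr, which is exactly the evidence you want for spotting bad blocks during a pre/post read.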
abq-pete Posted January 4, 2009 Joe, I am using the last one we tested. It worked fine on the Seagate 1.5TB units, but I am testing WD GP 1TB drives and seeing a bit of weirdness. I used the -n switch on the last run to get the drive cleared without all the additional testing, and that worked well. I have two more drives that I can try again, although I am not sure if I can get to it in the next couple of days (off to CES!). By the way, do you happen to know if erasing the MBR will make the drive appear fresh to unRAID? If I take a drive that was used as a data drive in one unRAID system, wipe the MBR, and install it in another unRAID system, that should cause the new system to start a clearing process, right? Finally, do drives need to be pre-cleared in the system they will be used in, or can they be cleared in one machine and used in another? Thanks and regards, Peter
devastator Posted January 4, 2009 I'm using the one attached to the first post in this thread.
Joe L. Posted January 4, 2009 Joe, I am using the last one we tested. It worked fine on the Seagate 1.5TB units. But I am testing WD GP 1TB and seeing a bit of weirdness. I used the -n switch on the last run to get the drive cleared without all the additional testing and that worked well. I've two more drives that I can try again. Although I am not sure if I can get to it in the next couple of days (off to CES!). Cool... have fun. By the way, do you happen to know if erasing the MBR will make the drive appear fresh to unRAID? If I take a drive that was used as a data drive in one unRAID system, wipe the MBR and install it in another unRAID system, that should cause the new system to start a clearing process, right? It sure will make it start the clearing process. There are only three situations where the second array would not clear the drive:
1. No parity drive is defined.
2. The disk has a valid reiserfs that starts at cylinder 63 and extends to the end of the disk as partition 1, with no other partitions. (The normal way unRAID creates a data disk.)
3. The disk has a special "pre-clear" signature in the first 512 bytes of the first sector on the disk. (This signature differs based on disk size and geometry, and is the whole purpose of the preclear script.)
If you want to erase just the MBR, you can use the -n option and type Control-C once the zeroing of the bulk of the drive starts. That should be enough to make it look unformatted to a new unRAID server. Finally, do drives need to be pre-cleared in the system that they will be used in or can they be cleared in one machine and used in another? Thanks and regards, Peter They can be pre-cleared in any system. The script does look for the unRAID-specific files and folders to ensure you do not shoot yourself in the foot, but it will probably even work on a non-unRAID Linux box.
It might give an error or two to stdout as it will not be able to open /boot/config/disks.cfg or run the "mdcmd status" command, but it should work. It will still make sure the disk is not mounted. Joe L.
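A minimal sketch of the "erase just the MBR" idea discussed above — assuming dd is available, and using a scratch file in place of a real /dev/sdX (triple-check the device name before running anything like this against actual hardware):

```shell
# Zero only the first 512 bytes (the MBR sector). conv=notrunc tells dd
# not to truncate the target, so on a real disk the rest of the data is
# left untouched. A scratch file stands in for the real disk here.
DISK=/tmp/fake_mbr_disk.img
printf 'pretend this is an old partition table' > "$DISK"
dd if=/dev/zero of="$DISK" bs=512 count=1 conv=notrunc 2>/dev/null
# confirm the first sector now reads back as all zero bytes
[ "$(tr -d '\0' < "$DISK" | wc -c)" -eq 0 ] && echo "MBR zeroed"
```

With the MBR gone, an unRAID server should see the drive as unformatted and start its own clearing pass, per Joe's list of conditions above.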
bill_in_socal Posted January 5, 2009 Here's a really dumb question: If I have a headless unRAID box, I can use PuTTY to connect to my server and start the preclear script. But do I have to leave the PuTTY session open for 10+ hours in order to view the status? Is there any way to start preclear, disconnect from my box, then connect again and view the preclear status? I bought myself 2 WD Green drives that I am going to preclear. Thanks for any info. Edit: Never mind - answer found in the original post: "You will either need to kick this preclear_disk.sh script off from the system console, or from a telnet session. You must leave the session open as it runs. (and it will typically run for hours)"
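One possible workaround is starting the script under nohup so it survives the session closing (a sketch, not something the preclear author endorses: nohup is standard, but the script's live status display uses screen-redraw sequences that may not render cleanly once redirected to a log; a short echo stands in for the real invocation):

```shell
# Start a long-running job so it keeps running after the telnet/PuTTY
# session closes, then read its log from any later session.
# On the real server the command would be preclear_disk.sh, not echo.
nohup sh -c 'echo "simulated preclear output"' > /tmp/preclear.log 2>&1 &
wait
tail -n 1 /tmp/preclear.log
# later, from a new session:  tail -f /tmp/preclear.log
```

If GNU screen happens to be installed, running the script inside a detachable screen session preserves the live status display and is generally the nicer option.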