Preclear.sh results - Questions about your results? Post them here.



A WDC WD30EZRS drive fails preclear 1.13 invoked with -n on both v5b12a and v5b14, on two different unRAID servers.  It completes all 10 steps fine, but at the very end it says the drive (/dev/sdf) fails preclear, and the drive drops from the list of drives that can be precleared.  No preclear report is saved, and the syslog explodes to 200 MB with all kinds of errors for sdf (the drive being precleared).  (Truncated version with the first 10000 lines attached below.)

 

I have precleared dozens of 2TB and 3TB drives on 4.7 and v5 without incident.

 

Any idea what is going wrong?  Restarting the server will bring the drive back online and allow preclear to start again.

 

root@Tower1:/boot# preclear_disk.sh -l
====================================1.13
Disks not assigned to the unRAID array
  (potential candidates for clearing)
========================================
No un-assigned disks detected

 

Restarting the server and testing for preclear status shows

 

 

Serial Number:    WD-WCAWZ2017532
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes

Disk /dev/sdg: 3000.6 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders, total 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Disk /dev/sdg doesn't contain a valid partition table
########################################################################
failed test 1
failed test 2 00000 00000 00000 00000
failed test 3 00000 00000 00000 00000
failed test 5
failed test 6
========================================================================1.13
==
== Disk /dev/sdg is NOT precleared
== 0 0 4294967295
============================================================================

Link to comment

Lots of errors in communicating with the disk in the syslog.  Many are CRC errors (bad checksums in communications with the disk)

Mar 25 22:49:45 Tower1 emhttp: get_config_idx: fopen /boot/config/shares/Pix2012.cfg: No such file or directory - assigning defaults
Mar 25 22:49:45 Tower1 emhttp: Restart SMB...
Mar 25 22:49:45 Tower1 emhttp: shcmd (46): killall -HUP smbd
Mar 25 22:49:45 Tower1 emhttp: shcmd (47): ps axc | grep -q rpc.mountd
Mar 25 22:49:45 Tower1 emhttp: _shcmd: shcmd (47): exit status: 1
Mar 25 22:49:45 Tower1 emhttp: Start NFS...
Mar 25 22:49:45 Tower1 emhttp: shcmd (48): /etc/rc.d/rc.nfsd start |& logger
Mar 25 22:49:45 Tower1 logger: Starting NFS server daemons:
Mar 25 22:49:45 Tower1 logger:   /usr/sbin/exportfs -r
Mar 25 22:49:45 Tower1 logger:   /usr/sbin/rpc.nfsd 8
Mar 25 22:49:45 Tower1 logger:   /usr/sbin/rpc.mountd
Mar 25 22:49:45 Tower1 mountd[2091]: Kernel does not have pseudo root support.
Mar 25 22:49:45 Tower1 mountd[2091]: NFS v4 mounts will be disabled unless fsid=0
Mar 25 22:49:45 Tower1 mountd[2091]: is specfied in /etc/exports file.
Mar 25 22:49:45 Tower1 emhttp: shcmd (49): /usr/local/sbin/emhttp_event svcs_restarted
Mar 25 22:49:45 Tower1 emhttp_event: svcs_restarted
Mar 25 22:49:47 Tower1 kernel: ata7.00: exception Emask 0x10 SAct 0x0 SErr 0x380100 action 0x6
Mar 25 22:49:47 Tower1 kernel: ata7.00: irq_stat 0x08000000
Mar 25 22:49:47 Tower1 kernel: ata7: SError: { UnrecovData 10B8B Dispar BadCRC }
Mar 25 22:49:47 Tower1 kernel: ata7.00: failed command: READ DMA
Mar 25 22:49:47 Tower1 kernel: ata7.00: cmd c8/00:08:47:ae:00/00:00:00:00:00/e0 tag 0 dma 4096 in
Mar 25 22:49:47 Tower1 kernel:          res 50/00:00:46:ae:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Mar 25 22:49:47 Tower1 kernel: ata7.00: status: { DRDY }
Mar 25 22:49:47 Tower1 kernel: ata7: hard resetting link
Mar 25 22:49:48 Tower1 kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 25 22:49:53 Tower1 kernel: ata7.00: qc timeout (cmd 0xec)
Mar 25 22:49:53 Tower1 kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Mar 25 22:49:53 Tower1 kernel: ata7.00: revalidation failed (errno=-5)
Mar 25 22:49:53 Tower1 kernel: ata7: hard resetting link
Mar 25 22:49:53 Tower1 kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 25 22:49:53 Tower1 kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x100)
Mar 25 22:49:53 Tower1 kernel: ata7.00: revalidation failed (errno=-5)
Mar 25 22:49:58 Tower1 kernel: ata7: hard resetting link
Mar 25 22:49:59 Tower1 kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 25 22:49:59 Tower1 kernel: ata7.00: configured for UDMA/33
Mar 25 22:49:59 Tower1 kernel: ata7: EH complete
Mar 25 22:49:59 Tower1 kernel: ata7.00: exception Emask 0x10 SAct 0x0 SErr 0x380100 action 0x6
Mar 25 22:49:59 Tower1 kernel: ata7.00: irq_stat 0x08000000
Mar 25 22:49:59 Tower1 kernel: ata7: SError: { UnrecovData 10B8B Dispar BadCRC }
Mar 25 22:49:59 Tower1 kernel: ata7.00: failed command: READ DMA
Mar 25 22:49:59 Tower1 kernel: ata7.00: cmd c8/00:08:47:ae:00/00:00:00:00:00/e0 tag 0 dma 4096 in
Mar 25 22:49:59 Tower1 kernel:          res 50/00:42:00:00:00/00:00:00:00:00/a0 Emask 0x10 (ATA bus error)
Mar 25 22:49:59 Tower1 kernel: ata7.00: status: { DRDY }
Mar 25 22:49:59 Tower1 kernel: ata7: hard resetting link
Mar 25 22:49:59 Tower1 kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 25 22:49:59 Tower1 kernel: ata7.00: configured for UDMA/33
Mar 25 22:49:59 Tower1 kernel: ata7: EH complete
Mar 25 22:50:40 Tower1 kernel: ata7.00: exception Emask 0x10 SAct 0x0 SErr 0x380100 action 0x6
Mar 25 22:50:40 Tower1 kernel: ata7.00: irq_stat 0x08000000
Mar 25 22:50:40 Tower1 kernel: ata7: SError: { UnrecovData 10B8B Dispar BadCRC }
Mar 25 22:50:40 Tower1 kernel: ata7.00: failed command: IDENTIFY DEVICE
Mar 25 22:50:40 Tower1 kernel: ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Mar 25 22:50:40 Tower1 kernel:          res 50/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)

 

Basically, the disk is ceasing to communicate with the disk controller somewhere in the process of reading and writing to the disk. 

A power cycle seems to get it to re-initialize and communicate once more.

The 10 steps in the preclear are being performed, but when the verification is performed, the expected values are not there.  (because the disk stopped responding to commands somewhere in the middle)

 

It could be a disk controller port issue, or a cable issue (noise on the cable causes CRC errors) or just a bad disk drive.  It could even be a marginal power supply for that drive.    The process of elimination is tedious, but I'd try a different PC first.  If it fails there, it was the disk.
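
To see how much of a syslog is link-level trouble before swapping cables or ports, the SError flags can be tallied per ATA port. The following is only a rough sketch: it assumes the kernel lines look like the ones quoted above, and the function name `count_serror_flags` is made up for this illustration.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: ata7: SError: { UnrecovData 10B8B Dispar BadCRC }"
SERROR_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: SError: \{([^}]*)\}")


def count_serror_flags(lines):
    """Return a Counter keyed by (port, flag) for every SError line found."""
    counts = Counter()
    for line in lines:
        match = SERROR_RE.search(line)
        if match is None:
            continue  # not an SError line
        port = match.group(1)
        for flag in match.group(2).split():
            counts[(port, flag)] += 1
    return counts


# Three lines copied from the syslog excerpt in this thread.
sample = [
    "Mar 25 22:49:47 Tower1 kernel: ata7: SError: { UnrecovData 10B8B Dispar BadCRC }",
    "Mar 25 22:49:47 Tower1 kernel: ata7.00: failed command: READ DMA",
    "Mar 25 22:49:59 Tower1 kernel: ata7: SError: { UnrecovData 10B8B Dispar BadCRC }",
]
serror_counts = count_serror_flags(sample)
for (port, flag), n in sorted(serror_counts.items()):
    print(port, flag, n)
```

A high BadCRC count on a single port only says the link to that port is noisy. It cannot tell a bad cable from a bad port or a bad drive, so the process of elimination is still needed.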

 

Joe L.

Link to comment

It failed on two different unraid servers and failed at least twice on each.  (different cables were tested)

 

DOA I guess.....

I guess that isolates the issue to the only common hardware involved.  (the disk itself) 

Not DOA, but Zombie.  (keeps coming back from the dead)

 

I would not trust my data on it.  Not unless you want to keep power cycling the server to get to a file.

Link to comment

Hello all!  I have just copied the data off three of my drives and hope to migrate them into my server (then copy more data onto them, move more drives, etc.).

 

Here are the 3 preclear reports.  Note that all the drives "passed" as per the email reports.

 

Do you see any serious problems which would prevent migrating the drives in?

 

Each drive seems to have about 11 or 12 values of old age and about 3 or 4 pre-fail.  Hoping these aren't serious problems!!!

 

 

preclear_finish_+WD-WCAU41253328_2012-03-25.txt

preclear_finish_+6XW04QZD_2012-03-27.txt

preclear_finish_+WD-WCAVY2471414_2012-03-27.txt

Link to comment

DOA I guess.....

Not DOA, but Zombie.  (keeps coming back from the dead)

 

Here is the detail of the last full test I ran (not -n).  Three strikes and it's out.  (I hate RMAing stuff)

 

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Verifying if the MBR is cleared.              DONE
=
Elapsed Time:  16:36:09
========================================================================1.13
==
== SORRY: Disk /dev/sdg MBR could NOT be precleared
==
== out4= 00000
== out5= 00000
============================================================================
0+0 records in
0+0 records out
0000000
0 bytes (0 B) copied, 2.2213e-05 s, 0.0 kB/s
root@Tower1:/boot#
root@Tower1:/boot#

Link to comment

Each drive seems to have about 11 or 12 values of old age and about 3 or 4 pre-fail.  Hoping these aren't serious problems!!!

Those are the categories those attributes belong to.  They do not indicate a failure unless they also say FAILING_NOW on the same line.

 

As an example, run-time-hours would be an old_age indicator of a disk.  Un-correctable-disk-read-errors will be in a category of pre-failure.    High run-time hours does not indicate the drive will fail, just that it is getting older.    Un-correctable errors can occur at any age.  A large number, or increasing numbers of them might indicate a pending failure (once the disk runs out of spare sectors to re-allocate in place of the un-readable ones)

 

You just need to compare the normalized value with the failure threshold for any given attribute.  That will tell you about the drive's health.
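
That comparison can be sketched in a few lines of Python. This is only an illustration: it assumes the usual `smartctl -A` column layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and the common convention that an attribute is failing when its normalized value is at or below a non-zero threshold. The sample rows are made up for the example and are not taken from any report in this thread, and the function names are hypothetical.

```python
def parse_smart_attributes(text):
    """Parse attribute rows from `smartctl -A` style output into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "name": fields[1],
            "value": int(fields[3]),    # normalized value
            "worst": int(fields[4]),
            "thresh": int(fields[5]),   # failure threshold
            "type": fields[6],          # Pre-fail or Old_age (the category)
            "when_failed": fields[8],
            "raw": " ".join(fields[9:]),
        })
    return rows


def failing_attributes(rows):
    """Names of attributes whose normalized value has reached the threshold."""
    # Assumption: a threshold of 0 means the attribute never trips.
    return [r["name"] for r in rows
            if r["thresh"] > 0 and r["value"] <= r["thresh"]]


# Made-up sample rows, for illustration only.
sample_output = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4012
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       38
"""
smart_rows = parse_smart_attributes(sample_output)
print(failing_attributes(smart_rows))
```

Note that the category (Pre-fail or Old_age) plays no part in the check. A raw value such as pending sectors still deserves a look on its own, even when nothing has reached its threshold.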

Link to comment

The third disk shows 38 sectors pending re-allocation.  There would normally be none as the writing of zeros should have re-allocated all the sectors.

 

There were no sectors re-allocated, so I'd suspect the 38 un-readable sectors were discovered in the post-read phase.  (that is not good)

 

I'd run another pre-clear on that disk.  If it continues to show sectors pending re-allocation, I'd not trust it.
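
When comparing one preclear run against the next, it can help to pull the pending and re-allocated counts out of the report text. The sketch below assumes summary lines worded like the ones preclear 1.13 prints in a report quoted later in this thread; the regular expressions and the `summarize` helper are hypothetical and may need adjusting for other versions.

```python
import re

# Only the "before the start" and "at the end" lines are captured; the
# per-cycle "after pre-read" / "after zero" lines are deliberately ignored.
PENDING_RE = re.compile(
    r"(\d+) sectors? (?:is|are|was|were) pending re-allocation "
    r"(before the start|at the end) of the preclear")
REALLOC_RE = re.compile(
    r"(\d+) sectors? (?:is|are|had been) re-allocated "
    r"(before the start|at the end) of the preclear")


def summarize(report_text):
    """Return a dict such as {'pending_start': 1, 'pending_end': 1, ...}."""
    result = {}
    for kind, pattern in (("pending", PENDING_RE), ("reallocated", REALLOC_RE)):
        for count, when in pattern.findall(report_text):
            suffix = "_start" if when == "before the start" else "_end"
            result[kind + suffix] = int(count)
    return result


# Summary text as posted in this thread for a WD20EARS disk.
report = """\
1 sector was pending re-allocation before the start of the preclear.
1 sector was pending re-allocation after pre-read in cycle 1 of 1.
1 sector was pending re-allocation after zero of disk in cycle 1 of 1.
1 sector is pending re-allocation at the end of the preclear,
    the number of sectors pending re-allocation did not change.
2 sectors had been re-allocated before the start of the preclear.
4 sectors are re-allocated at the end of the preclear,
    a change of 2 in the number of sectors re-allocated.
"""
summary = summarize(report)
print(summary)
if summary.get("pending_end", 0) > 0:
    print("Sectors still pending re-allocation: run another preclear before trusting the disk.")
```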

Link to comment

What if my syslog weighs more than 192k? Use an external host?

 

 

Just paste as quote?

Zip it (they zip really well), or use an external host as you described.  For pre-clear results I really do not need to see the entire syslog.  You can attach only the pre-clear reports as found in /boot/preclear_reports
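
As a rough illustration of how well a repetitive syslog compresses, here is a minimal Python sketch using the standard `zipfile` module. The `zip_log` helper and the file names are hypothetical, and the demo runs on a throwaway file filled with one repeated line from this thread rather than on a real syslog.

```python
import os
import tempfile
import zipfile


def zip_log(src_path, zip_path):
    """Compress one file into a .zip archive and return the archive size in bytes."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store only the file name inside the archive, not the full path.
        archive.write(src_path, arcname=os.path.basename(src_path))
    return os.path.getsize(zip_path)


# Demo: build a throwaway log made of one repeated kernel message.
workdir = tempfile.mkdtemp()
demo_log = os.path.join(workdir, "syslog.txt")
with open(demo_log, "w") as handle:
    handle.write("Mar 25 22:49:47 Tower1 kernel: ata7: hard resetting link\n" * 10000)

zipped_size = zip_log(demo_log, os.path.join(workdir, "syslog.zip"))
print(os.path.getsize(demo_log), "->", zipped_size, "bytes")
```

A real syslog is less repetitive than this demo file, so the ratio will be lower, but logs flooded with the same handful of error lines usually shrink a great deal.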

 

Joe L.

Link to comment

Hi guys,

 

What might be wrong here?

 

Mar 29 22:23:14 Tower kernel: end_request: I/O error, dev sde, sector 1301043712 (Errors)

Mar 29 22:23:14 Tower kernel: ata7: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)

Mar 29 22:23:14 Tower kernel: ata7.00: device reported invalid CHS sector 0 (Drive related)

Mar 29 22:23:14 Tower kernel: ata7: status=0x41 { DriveReady Error } (Errors)

Mar 29 22:23:14 Tower kernel: ata7: error=0x04 { DriveStatusError } (Errors)

Mar 29 22:23:14 Tower kernel: ata7: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)

Mar 29 22:23:14 Tower kernel: ata7.00: device reported invalid CHS sector 0 (Drive related)

Mar 29 22:23:14 Tower kernel: ata7: status=0x41 { DriveReady Error } (Errors)

Mar 29 22:23:14 Tower kernel: ata7: error=0x04 { DriveStatusError } (Errors)

Mar 29 22:23:14 Tower kernel: ata7: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)

Mar 29 22:23:14 Tower kernel: ata7.00: device reported invalid CHS sector 0 (Drive related)

Mar 29 22:23:14 Tower kernel: ata7: status=0x41 { DriveReady Error } (Errors)

Mar 29 22:23:14 Tower kernel: ata7: error=0x04 { DriveStatusError } (Errors)

Mar 29 22:23:14 Tower kernel: ata7: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)

Mar 29 22:23:14 Tower kernel: ata7.00: device reported invalid CHS sector 0 (Drive related)

Mar 29 22:23:14 Tower kernel: ata7: status=0x41 { DriveReady Error } (Errors)

Mar 29 22:23:14 Tower kernel: ata7: error=0x04 { DriveStatusError } (Errors)

Mar 29 22:23:14 Tower kernel: ata7: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)

Mar 29 22:23:14 Tower kernel: ata7.00: device reported invalid CHS sector 0 (Drive related)

Mar 29 22:23:14 Tower kernel: ata7: status=0x41 { DriveReady Error } (Errors)

Mar 29 22:23:14 Tower kernel: ata7: error=0x04 { DriveStatusError } (Errors)

Mar 29 22:23:14 Tower kernel: ata7: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)

Mar 29 22:23:14 Tower kernel: ata7.00: device reported invalid CHS sector 0 (Drive related)

Mar 29 22:23:14 Tower kernel: ata7: status=0x41 { DriveReady Error } (Errors)

Mar 29 22:23:14 Tower kernel: ata7: error=0x04 { DriveStatusError } (Errors)

Mar 29 22:23:14 Tower kernel: sd 7:0:0:0: [sde] Result: hostbyte=0x00 driverbyte=0x08 (System)

Mar 29 22:23:14 Tower kernel: sd 7:0:0:0: [sde] Sense Key : 0xb [current] [descriptor] (Drive related)

Mar 29 22:23:14 Tower kernel: Descriptor sense data with sense descriptors (in hex):

Mar 29 22:23:14 Tower kernel:        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00

Mar 29 22:23:14 Tower kernel:        00 00 00 00

 

I just installed a brand new AOC-SASLP-MV8 card and tried to preclear a drive outside the array that I previously had in my Windows machine.

Preclear version used 1.13.

 

Thanks in advance!

Link to comment

Ok, got the report on both drives.

 

thx in advance

preclear_rpt__WD-WCAWZ1679024_2012-03-29.txt

preclear_rpt__WD-WCAWZ1714217_2012-03-29.txt

Link to comment

both look fine.
Link to comment

Thx Joe

Link to comment

I'm not 100% sure about my results. This is the first time I've precleared, and I'm setting up my first array. Could someone give me the thumbs up?

 

I attached both the finish reports and the other reports.

 

These were brand new drives. EARX 2TB.

preclear_finish__WD-WMAZA5542794_2012-03-30.txt

preclear_finish__WD-WMAZA5710612_2012-03-30.txt

preclear_rpt__WD-WMAZA5542794_2012-03-30.txt

preclear_rpt__WD-WMAZA5710612_2012-03-30.txt

Link to comment

I'm currently in the process of setting up my first unRAID server and just finished a 1-cycle preclear of my first three drives (all of which are Seagate 2TB ST2000DL003 5900RPM drives).

I would greatly appreciate your feedback on the integrity of these drives Joe.

 

 

Information that may be relevant:

 

- preclear version 1.13

- Due to these being AF drives I invoked the preclear script as such "./preclear_disk.sh -A /dev/sdX"

- Mobo - Foxconn A88GMV (http://www.newegg.com/Product/Product.aspx?Item=N82E16813186205)

- HDDs - Seagate ST2000DL003 2TB 5900RPM (http://www.newegg.com/Product/Product.aspx?Item=N82E16822148681)

 

I will end up having an extra onboard SATA port, since my end goal is a 15-drive system and my motherboard has 6 SATA ports. I'll be using Norco SS-500 5x3 bays (3 of them). I'm a bit nervous about the Norco SS-500s now, given the posts from joelones a few pages back. Hopefully these turn out OK. I'll also have a Supermicro AOC-SASLP-MV8 8-Port SAS/SATA and SATA2 Serial ATA II PCI-Express Raid controller card SIL3132. This leaves me with 1 extra onboard SATA port. Would it be advantageous for me to always preclear on that port?  It seems like it would be for speed purposes, but from some of the posts I've read I'm not sure whether running a preclear through an expansion card would be optimal.

 

First Drive preclear reports -

preclear_start_+5YD5RMP1_2012-03-31.txt

preclear_rpt_+5YD5RMP1_2012-03-31.txt

preclear_finish_+5YD5RMP1_2012-03-31.txt

Link to comment

I ran preclear twice on the same WD20EARS disk.  The first time I got this in the summary:

Changed attributes in files: /tmp/smart_start_hda  /tmp/smart_finish_hda
                ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
      Temperature_Celsius =   115     114            0        ok          37
  Reallocated_Event_Count =   198     199            0        ok          2
No SMART attributes are FAILING_NOW

1 sector was pending re-allocation before the start of the preclear.
1 sector was pending re-allocation after pre-read in cycle 1 of 1.
1 sector was pending re-allocation after zero of disk in cycle 1 of 1.
1 sector is pending re-allocation at the end of the preclear,
    the number of sectors pending re-allocation did not change.
2 sectors had been re-allocated before the start of the preclear.
4 sectors are re-allocated at the end of the preclear,
    a change of 2 in the number of sectors re-allocated. 

The second time, for the same disk:


rt_start_sdh  /tmp/smart_finish_sdh
                ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
          Seek_Error_Rate =   100     200            0        ok          0
      Temperature_Celsius =   109     110            0        ok          43
No SMART attributes are FAILING_NOW

1 sector was pending re-allocation before the start of the preclear.
1 sector was pending re-allocation after pre-read in cycle 1 of 1.
1 sector was pending re-allocation after zero of disk in cycle 1 of 1.
1 sector is pending re-allocation at the end of the preclear,
    the number of sectors pending re-allocation did not change.
4 sectors had been re-allocated before the start of the preclear.
4 sectors are re-allocated at the end of the preclear,
    the number of sectors re-allocated did not change. 

 

Can I trust this disk in the array? 

 

Thank you

Link to comment
