joeyke87 Posted March 23, 2012
Here are my results from the new hard disk. After preclearing, I replaced a bad disk with this one. Rebuilding worked fine, I guess (everything is green now). Is there a way to check whether the rebuild was successful, or can I just trust the green lights?
Attached: syslog-2012-03-22-disk3new.txt
tr0910 Posted March 26, 2012
A WDC WD30EZRS drive fails preclear 1.13 invoked with -n on both v5b12a and v5b14, on two different unraid servers. It completes all 10 steps fine, but at the very end it says the drive (/dev/sdf) fails preclear, and the drive drops from the list of drives that can be precleared. No preclear report is saved, and the syslog explodes to 200 MB with all kinds of errors for sdf (the drive being precleared). (A truncated version with the first 10000 lines is attached below.) I have precleared dozens of 2 TB and 3 TB drives on 4.7 and v5 without incident. Any idea what is going wrong? Restarting the server will bring the drive back online and allow preclear to start again.

root@Tower1:/boot# preclear_disk.sh -l
====================================1.13
 Disks not assigned to the unRAID array
 (potential candidates for clearing)
========================================
 No un-assigned disks detected

Restarting the server and testing for preclear status shows:

Serial Number: WD-WCAWZ2017532
Firmware Version: 80.00A80
User Capacity: 3,000,592,982,016 bytes
Disk /dev/sdg: 3000.6 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders, total 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000
Disk /dev/sdg doesn't contain a valid partition table
########################################################################
failed test 1
failed test 2 00000 00000 00000 00000
failed test 3 00000 00000 00000 00000
failed test 5
failed test 6
========================================================================1.13
==
== Disk /dev/sdg is NOT precleared
== 0 0 4294967295
============================================================================
tr0910 Posted March 26, 2012
Sorry, here is the truncated syslog.
Attached: syslog-2012-03-26_truncated.zip
Joe L. Posted March 26, 2012
Lots of errors in communicating with the disk in the syslog. Many are CRC errors (bad checksums in communications with the disk).

Mar 25 22:49:45 Tower1 emhttp: get_config_idx: fopen /boot/config/shares/Pix2012.cfg: No such file or directory - assigning defaults
Mar 25 22:49:45 Tower1 emhttp: Restart SMB...
Mar 25 22:49:45 Tower1 emhttp: shcmd (46): killall -HUP smbd
Mar 25 22:49:45 Tower1 emhttp: shcmd (47): ps axc | grep -q rpc.mountd
Mar 25 22:49:45 Tower1 emhttp: _shcmd: shcmd (47): exit status: 1
Mar 25 22:49:45 Tower1 emhttp: Start NFS...
Mar 25 22:49:45 Tower1 emhttp: shcmd (48): /etc/rc.d/rc.nfsd start |& logger
Mar 25 22:49:45 Tower1 logger: Starting NFS server daemons:
Mar 25 22:49:45 Tower1 logger: /usr/sbin/exportfs -r
Mar 25 22:49:45 Tower1 logger: /usr/sbin/rpc.nfsd 8
Mar 25 22:49:45 Tower1 logger: /usr/sbin/rpc.mountd
Mar 25 22:49:45 Tower1 mountd[2091]: Kernel does not have pseudo root support.
Mar 25 22:49:45 Tower1 mountd[2091]: NFS v4 mounts will be disabled unless fsid=0
Mar 25 22:49:45 Tower1 mountd[2091]: is specfied in /etc/exports file.
Mar 25 22:49:45 Tower1 emhttp: shcmd (49): /usr/local/sbin/emhttp_event svcs_restarted
Mar 25 22:49:45 Tower1 emhttp_event: svcs_restarted
Mar 25 22:49:47 Tower1 kernel: ata7.00: exception Emask 0x10 SAct 0x0 SErr 0x380100 action 0x6
Mar 25 22:49:47 Tower1 kernel: ata7.00: irq_stat 0x08000000
Mar 25 22:49:47 Tower1 kernel: ata7: SError: { UnrecovData 10B8B Dispar BadCRC }
Mar 25 22:49:47 Tower1 kernel: ata7.00: failed command: READ DMA
Mar 25 22:49:47 Tower1 kernel: ata7.00: cmd c8/00:08:47:ae:00/00:00:00:00:00/e0 tag 0 dma 4096 in
Mar 25 22:49:47 Tower1 kernel: res 50/00:00:46:ae:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Mar 25 22:49:47 Tower1 kernel: ata7.00: status: { DRDY }
Mar 25 22:49:47 Tower1 kernel: ata7: hard resetting link
Mar 25 22:49:48 Tower1 kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 25 22:49:53 Tower1 kernel: ata7.00: qc timeout (cmd 0xec)
Mar 25 22:49:53 Tower1 kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Mar 25 22:49:53 Tower1 kernel: ata7.00: revalidation failed (errno=-5)
Mar 25 22:49:53 Tower1 kernel: ata7: hard resetting link
Mar 25 22:49:53 Tower1 kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 25 22:49:53 Tower1 kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x100)
Mar 25 22:49:53 Tower1 kernel: ata7.00: revalidation failed (errno=-5)
Mar 25 22:49:58 Tower1 kernel: ata7: hard resetting link
Mar 25 22:49:59 Tower1 kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 25 22:49:59 Tower1 kernel: ata7.00: configured for UDMA/33
Mar 25 22:49:59 Tower1 kernel: ata7: EH complete
Mar 25 22:49:59 Tower1 kernel: ata7.00: exception Emask 0x10 SAct 0x0 SErr 0x380100 action 0x6
Mar 25 22:49:59 Tower1 kernel: ata7.00: irq_stat 0x08000000
Mar 25 22:49:59 Tower1 kernel: ata7: SError: { UnrecovData 10B8B Dispar BadCRC }
Mar 25 22:49:59 Tower1 kernel: ata7.00: failed command: READ DMA
Mar 25 22:49:59 Tower1 kernel: ata7.00: cmd c8/00:08:47:ae:00/00:00:00:00:00/e0 tag 0 dma 4096 in
Mar 25 22:49:59 Tower1 kernel: res 50/00:42:00:00:00/00:00:00:00:00/a0 Emask 0x10 (ATA bus error)
Mar 25 22:49:59 Tower1 kernel: ata7.00: status: { DRDY }
Mar 25 22:49:59 Tower1 kernel: ata7: hard resetting link
Mar 25 22:49:59 Tower1 kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 25 22:49:59 Tower1 kernel: ata7.00: configured for UDMA/33
Mar 25 22:49:59 Tower1 kernel: ata7: EH complete
Mar 25 22:50:40 Tower1 kernel: ata7.00: exception Emask 0x10 SAct 0x0 SErr 0x380100 action 0x6
Mar 25 22:50:40 Tower1 kernel: ata7.00: irq_stat 0x08000000
Mar 25 22:50:40 Tower1 kernel: ata7: SError: { UnrecovData 10B8B Dispar BadCRC }
Mar 25 22:50:40 Tower1 kernel: ata7.00: failed command: IDENTIFY DEVICE
Mar 25 22:50:40 Tower1 kernel: ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Mar 25 22:50:40 Tower1 kernel: res 50/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)

Basically, the disk is ceasing to communicate with the disk controller somewhere in the process of reading and writing to the disk. A power cycle seems to get it to re-initialize and communicate once more.

The 10 steps in the preclear are being performed, but when the verification is performed, the expected values are not there (because the disk stopped responding to commands somewhere in the middle).

It could be a disk controller port issue, or a cable issue (noise on the cable causes CRC errors), or just a bad disk drive. It could even be a marginal power supply for that drive. The process of elimination is tedious, but I'd try a different PC first. If it fails there, it was the disk.

Joe L.
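For a syslog that has grown to hundreds of megabytes, it can help to tally the link errors per ATA port before reading any of it by hand. The following is only a rough sketch in Python, not part of the preclear script: the helper name `summarize_ata_errors` and the three counter labels are made up for illustration, and the pattern assumes kernel lines shaped like the ones quoted above.

```python
import re
from collections import Counter

# Matches libata kernel lines such as
# "Mar 25 22:49:47 Tower1 kernel: ata7: SError: { UnrecovData 10B8B Dispar BadCRC }"
# and captures the port ("ata7") plus the rest of the message.
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")


def summarize_ata_errors(lines):
    """Count CRC errors, link resets and failed commands per ATA port.

    Hypothetical helper for illustration; the counter names are arbitrary.
    """
    summary = {}
    for line in lines:
        match = ATA_LINE.search(line)
        if not match:
            continue  # not a libata message (or a continuation line like "res ...")
        port, message = match.groups()
        counts = summary.setdefault(port, Counter())
        if "BadCRC" in message:
            counts["bad_crc"] += 1
        if "hard resetting link" in message:
            counts["link_resets"] += 1
        if "failed command" in message:
            counts["failed_commands"] += 1
    return summary


# A few lines taken from the syslog excerpt above.
sample = [
    "Mar 25 22:49:45 Tower1 emhttp: Restart SMB...",
    "Mar 25 22:49:47 Tower1 kernel: ata7: SError: { UnrecovData 10B8B Dispar BadCRC }",
    "Mar 25 22:49:47 Tower1 kernel: ata7.00: failed command: READ DMA",
    "Mar 25 22:49:47 Tower1 kernel: ata7: hard resetting link",
    "Mar 25 22:49:53 Tower1 kernel: ata7: hard resetting link",
]

for port, counts in summarize_ata_errors(sample).items():
    print(port, dict(counts))
```

A port that dominates the tally points at the drive, cable, or controller port on that link; the counts alone cannot say which of those is at fault, so the swap-and-retest approach described above still applies.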
tr0910 Posted March 26, 2012
It failed on two different unraid servers and failed at least twice on each (different cables were tested). DOA I guess.....
Joe L. Posted March 26, 2012
tr0910 wrote: "It failed on two different unraid servers and failed at least twice on each. (different cables were tested) DOA I guess....."

I guess that isolates the issue to the only common hardware involved (the disk itself). Not DOA, but Zombie (keeps coming back from the dead). I would not trust my data on it. Not unless you want to keep power cycling the server to get to a file.
grither Posted March 27, 2012
Hello all! I have just copied data off three of my drives and hope to migrate them into my server (then copy more data onto them, move more drives, etc.). Here are the 3 preclear reports. Note that all the drives 'passed' as per the email reports. Does anyone see any serious problems which would prevent migrating the drives in? Each drive seems to have about 11 or 12 values of 'old age' and about 3 or 4 'pre-fail'. Hoping these aren't serious problems!
Attached: preclear_finish_+WD-WCAU41253328_2012-03-25.txt preclear_finish_+6XW04QZD_2012-03-27.txt preclear_finish_+WD-WCAVY2471414_2012-03-27.txt
tr0910 Posted March 27, 2012
tr0910 wrote: "DOA I guess....."
Joe L. wrote: "Not DOA, but Zombie. (keeps coming back from the dead)"

Here is the detail of the last full test I ran (not -n). Three strikes and it's out. (I hate RMAing stuff.)

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward. DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE
= Step 5 of 10 - Clearing MBR code area DONE
= Step 6 of 10 - Setting MBR signature bytes DONE
= Step 7 of 10 - Setting partition 1 to precleared state DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries DONE
= Step 10 of 10 - Verifying if the MBR is cleared. DONE
= Elapsed Time: 16:36:09
========================================================================1.13
==
== SORRY: Disk /dev/sdg MBR could NOT be precleared
==
== out4= 00000
== out5= 00000
============================================================================
0+0 records in
0+0 records out
0000000
0 bytes (0 B) copied, 2.2213e-05 s, 0.0 kB/s
root@Tower1:/boot#
root@Tower1:/boot#
Joe L. Posted March 28, 2012
grither wrote: "Each drive seems to have about 11 or 12 values of 'old age' and about 3 or 4 'pre-fail'. Hoping these aren't serious problems!"

Those are the categories those attributes belong to. They do not indicate a failure unless the same line also says FAILING_NOW.

As an example, run-time hours would be an old_age indicator of a disk. Un-correctable disk read errors will be in the pre-failure category. High run-time hours do not indicate the drive will fail, just that it is getting older. Un-correctable errors can occur at any age. A large number, or increasing numbers, of them might indicate a pending failure (once the disk runs out of spare sectors to re-allocate in place of the un-readable ones).

You just need to compare the normalized value with the failure threshold for any given attribute. That will tell you about the drive's health.
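The comparison of normalized value against failure threshold can be sketched in a few lines of Python. This is an illustration only: the `SmartAttribute` class, the two helper functions, and all attribute values below are invented for the example rather than taken from any of the attached reports. It assumes the usual SMART convention that an attribute has failed once its normalized value is at or below a nonzero threshold, and that a threshold of 0 can never trip.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    name: str
    category: str   # "Pre-fail" or "Old_age": the attribute's type, not its status
    value: int      # normalized value reported by the drive
    threshold: int  # failure threshold; 0 means the attribute can never trip


def is_failing(attr: SmartAttribute) -> bool:
    """True when the normalized value has dropped to or below a nonzero threshold."""
    return attr.threshold > 0 and attr.value <= attr.threshold


def headroom(attr: SmartAttribute) -> int:
    """How far the normalized value still sits above the threshold."""
    return attr.value - attr.threshold


# Made-up values for illustration only.
attributes = [
    SmartAttribute("Power_On_Hours", "Old_age", 95, 0),
    SmartAttribute("Reallocated_Sector_Ct", "Pre-fail", 200, 140),
    SmartAttribute("Raw_Read_Error_Rate", "Pre-fail", 45, 51),
]

failing = [attr.name for attr in attributes if is_failing(attr)]
print("failing:", failing)
for attr in attributes:
    print(f"{attr.name:24} {attr.category:9} headroom={headroom(attr)}")
```

Note that the category plays no part in the check: a "Pre-fail" attribute with plenty of headroom is healthy, which is the point made in the reply above.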
Joe L. Posted March 28, 2012
grither wrote: "Hello all! I have just copied data off three of my drives and hope to migrate them into my server (then copy more data onto them, move more drives, etc.). Here are the 3 preclear reports. Note that all the drives 'passed' as per the email reports. Does anyone see any serious problems which would prevent migrating the drives in? Each drive seems to have about 11 or 12 values of 'old age' and about 3 or 4 'pre-fail'. Hoping these aren't serious problems!"

The third disk shows 38 sectors pending re-allocation. There would normally be none, as the writing of zeros should have re-allocated all the sectors. There were no sectors re-allocated, so I'd suspect the 38 un-readable sectors were discovered in the post-read phase (that is not good).

I'd run another pre-clear on that disk. If it continues to show sectors pending re-allocation, I'd not trust it.
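That rule of thumb can be written down as a tiny decision helper. The sketch below is only a paraphrase of the advice in the post above, not logic from the preclear script; the function name `preclear_verdict` and its return strings are made up for illustration.

```python
def preclear_verdict(pending_after_each_run):
    """Suggest a next step from pending-sector counts.

    pending_after_each_run: the "sectors pending re-allocation" count at the
    end of each preclear run on the same disk, oldest run first.
    Hypothetical helper; it encodes only the rule of thumb from the post above.
    """
    if not pending_after_each_run:
        raise ValueError("need at least one preclear result")
    latest = pending_after_each_run[-1]
    if latest == 0:
        # Writing zeros normally re-allocates (or clears) every pending sector.
        return "ok"
    if len(pending_after_each_run) == 1:
        # Pending sectors after a single run: give the disk one more pass.
        return "run another preclear"
    # Still pending after a repeat run: treat the disk as untrustworthy.
    return "do not trust"


print(preclear_verdict([38]))      # first run ended with 38 pending sectors
print(preclear_verdict([38, 0]))   # second run cleared them
print(preclear_verdict([38, 12]))  # still pending after a second run
```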
slarco Posted March 29, 2012
What if my syslog weighs more than 192k? Use an external host? Just paste it as a quote?
Joe L. Posted March 29, 2012
slarco wrote: "What if my syslog weighs more than 192k? Use an external host? Just paste it as a quote?"

Zip it (they zip really well), or use an external host as you described.

For pre-clear results I really do not need to see the entire syslog. You can attach only the pre-clear reports as found in /boot/preclear_reports

Joe L.
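As a quick illustration of how well a repetitive syslog compresses, here is a small Python sketch using the standard `zipfile` module. The helper name `zip_log`, the file names, and the synthetic log content are assumptions for the example; the 192k figure is simply the attachment limit mentioned in the question.

```python
import os
import tempfile
import zipfile


def zip_log(src_path, dest_path):
    """Compress one file into a zip archive and return (original, zipped) sizes in bytes."""
    with zipfile.ZipFile(dest_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(src_path, arcname=os.path.basename(src_path))
    return os.path.getsize(src_path), os.path.getsize(dest_path)


# Demo on a synthetic log: the same kernel message repeated, much like a
# syslog flooded by one failing link.
with tempfile.TemporaryDirectory() as workdir:
    log_path = os.path.join(workdir, "syslog.txt")
    with open(log_path, "w") as handle:
        handle.write("Mar 25 22:49:47 Tower1 kernel: ata7: hard resetting link\n" * 5000)
    original_size, zipped_size = zip_log(log_path, os.path.join(workdir, "syslog.zip"))

print(f"original: {original_size} bytes, zipped: {zipped_size} bytes")
```

A real syslog is less uniform than this synthetic one, so the ratio will be lower, but logs dominated by repeated error lines usually shrink enough to fit under an attachment limit.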
Darkoverlord Posted March 29, 2012
Hi guys,

What might be wrong here?

Mar 29 22:23:14 Tower kernel: end_request: I/O error, dev sde, sector 1301043712 (Errors)
Mar 29 22:23:14 Tower kernel: ata7: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Mar 29 22:23:14 Tower kernel: ata7.00: device reported invalid CHS sector 0 (Drive related)
Mar 29 22:23:14 Tower kernel: ata7: status=0x41 { DriveReady Error } (Errors)
Mar 29 22:23:14 Tower kernel: ata7: error=0x04 { DriveStatusError } (Errors)
Mar 29 22:23:14 Tower kernel: ata7: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Mar 29 22:23:14 Tower kernel: ata7.00: device reported invalid CHS sector 0 (Drive related)
Mar 29 22:23:14 Tower kernel: ata7: status=0x41 { DriveReady Error } (Errors)
Mar 29 22:23:14 Tower kernel: ata7: error=0x04 { DriveStatusError } (Errors)
Mar 29 22:23:14 Tower kernel: ata7: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Mar 29 22:23:14 Tower kernel: ata7.00: device reported invalid CHS sector 0 (Drive related)
Mar 29 22:23:14 Tower kernel: ata7: status=0x41 { DriveReady Error } (Errors)
Mar 29 22:23:14 Tower kernel: ata7: error=0x04 { DriveStatusError } (Errors)
Mar 29 22:23:14 Tower kernel: ata7: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Mar 29 22:23:14 Tower kernel: ata7.00: device reported invalid CHS sector 0 (Drive related)
Mar 29 22:23:14 Tower kernel: ata7: status=0x41 { DriveReady Error } (Errors)
Mar 29 22:23:14 Tower kernel: ata7: error=0x04 { DriveStatusError } (Errors)
Mar 29 22:23:14 Tower kernel: ata7: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Mar 29 22:23:14 Tower kernel: ata7.00: device reported invalid CHS sector 0 (Drive related)
Mar 29 22:23:14 Tower kernel: ata7: status=0x41 { DriveReady Error } (Errors)
Mar 29 22:23:14 Tower kernel: ata7: error=0x04 { DriveStatusError } (Errors)
Mar 29 22:23:14 Tower kernel: ata7: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Mar 29 22:23:14 Tower kernel: ata7.00: device reported invalid CHS sector 0 (Drive related)
Mar 29 22:23:14 Tower kernel: ata7: status=0x41 { DriveReady Error } (Errors)
Mar 29 22:23:14 Tower kernel: ata7: error=0x04 { DriveStatusError } (Errors)
Mar 29 22:23:14 Tower kernel: sd 7:0:0:0: [sde] Result: hostbyte=0x00 driverbyte=0x08 (System)
Mar 29 22:23:14 Tower kernel: sd 7:0:0:0: [sde] Sense Key : 0xb [current] [descriptor] (Drive related)
Mar 29 22:23:14 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Mar 29 22:23:14 Tower kernel: 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Mar 29 22:23:14 Tower kernel: 00 00 00 00

I just installed a brand new AOC-SASLP-MV8 card and tried to preclear a drive outside the array that I previously had in my Windows machine. Preclear version used: 1.13.

Thanks in advance!
slarco Posted March 29, 2012
Joe L. wrote: "Zip it (they zip really well), or use an external host as you described. For pre-clear results I really do not need to see the entire syslog. You can attach only the pre-clear reports as found in /boot/preclear_reports"

OK, got the reports on both drives. Thanks in advance.
Attached: preclear_rpt__WD-WCAWZ1679024_2012-03-29.txt preclear_rpt__WD-WCAWZ1714217_2012-03-29.txt
Joe L. Posted March 30, 2012
slarco wrote: "OK, got the reports on both drives. Thanks in advance."

Both look fine.
slarco Posted March 31, 2012
Joe L. wrote: "Both look fine."

Thx Joe
khuong Posted March 31, 2012
I'm not 100% sure about my results. This is the first time I have precleared, and I'm setting up my first array. Could someone give me the thumbs up? I attached both the finish reports and the other reports. These were brand new drives, EARX 2TB.
Attached: preclear_finish__WD-WMAZA5542794_2012-03-30.txt preclear_finish__WD-WMAZA5710612_2012-03-30.txt preclear_rpt__WD-WMAZA5542794_2012-03-30.txt preclear_rpt__WD-WMAZA5710612_2012-03-30.txt
Hypknox Posted March 31, 2012
I'm currently in the process of setting up my first unRAID server and just finished a 1-cycle preclear of my first three drives (all of which are Seagate 2TB ST2000DL003 5900RPM drives). I would greatly appreciate your feedback on the integrity of these drives, Joe.

Information that may be relevant:
- preclear version 1.13
- Due to these being AF drives I invoked the preclear script as such: "./preclear_disk.sh -A /dev/sdX"
- Mobo - Foxconn A88GMV (http://www.newegg.com/Product/Product.aspx?Item=N82E16813186205)
- HDDs - Seagate ST2000DL003 2TB 5900RPM (http://www.newegg.com/Product/Product.aspx?Item=N82E16822148681)

I will end up having an extra onboard SATA port, since my end goal is a 15-drive system and my motherboard has 6 SATA ports. I'll be using Norco SS-500 5x3 bays (3 of them). I'm a bit nervous about the Norco SS-500s now, given the posts from joelones a few pages back. Hopefully these turn out OK. I'll also have a Supermicro AOC-SASLP-MV8 8-Port SAS/SATA and SATA2 Serial ATA II PCI-Express Raid controller card SIL3132. This leaves me with 1 extra onboard SATA port. Would it be advantageous for me to always preclear on that port? It seems like it would be for speed purposes, but from some of the posts I've read I'm not sure whether running a preclear through an expansion card would be optimal.

First Drive preclear reports -
Attached: preclear_start_+5YD5RMP1_2012-03-31.txt preclear_rpt_+5YD5RMP1_2012-03-31.txt preclear_finish_+5YD5RMP1_2012-03-31.txt
Hypknox Posted March 31, 2012
Second Drive preclear reports attached -
Attached: preclear_start_+5YD77WNB_2012-03-31.txt preclear_rpt_+5YD77WNB_2012-03-31.txt preclear_finish_+5YD77WNB_2012-03-31.txt
Hypknox Posted March 31, 2012
Third Drive preclear reports attached -
Attached: preclear_start_+6YD210G5_2012-03-31.txt preclear_rpt_+6YD210G5_2012-03-31.txt preclear_finish_+6YD210G5_2012-03-31.txt
Joe L. Posted March 31, 2012
Hypknox wrote: "Third Drive preclear reports attached -"

All three drives look fine.
feliksk Posted March 31, 2012
I ran preclear twice on the same WD20EARS disk. The first time I got this in the summary:

Changed attributes in files: /tmp/smart_start_hda /tmp/smart_finish_hda
ATTRIBUTE NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS RAW_VALUE
Temperature_Celsius = 115 114 0 ok 37
Reallocated_Event_Count = 198 199 0 ok 2
No SMART attributes are FAILING_NOW
1 sector was pending re-allocation before the start of the preclear.
1 sector was pending re-allocation after pre-read in cycle 1 of 1.
1 sector was pending re-allocation after zero of disk in cycle 1 of 1.
1 sector is pending re-allocation at the end of the preclear,
the number of sectors pending re-allocation did not change.
2 sectors had been re-allocated before the start of the preclear.
4 sectors are re-allocated at the end of the preclear,
a change of 2 in the number of sectors re-allocated.

The second time, for the same disk:

rt_start_sdh /tmp/smart_finish_sdh
ATTRIBUTE NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS RAW_VALUE
Seek_Error_Rate = 100 200 0 ok 0
Temperature_Celsius = 109 110 0 ok 43
No SMART attributes are FAILING_NOW
1 sector was pending re-allocation before the start of the preclear.
1 sector was pending re-allocation after pre-read in cycle 1 of 1.
1 sector was pending re-allocation after zero of disk in cycle 1 of 1.
1 sector is pending re-allocation at the end of the preclear,
the number of sectors pending re-allocation did not change.
4 sectors had been re-allocated before the start of the preclear.
4 sectors are re-allocated at the end of the preclear,
the number of sectors re-allocated did not change.

Can I trust this disk in the array?

Thank you
Hypknox Posted April 1, 2012
Thanks a lot Joe, very much appreciated!