Preclear.sh results - Questions about your results? Post them here.



WordPad seems to format a little better as well, heffneil.

 

Quick question regarding one of my drives. This used to be my parity drive, but I decided to go with a 3TB drive for parity instead so that I can use data drives larger than 2TB in the future. I followed your instructions listed here, Joe: http://lime-technology.com/forum/index.php?topic=6126.msg58998#msg58998

 

I finished the preclear of my old parity drive successfully and, out of curiosity, decided to compare the results with the original ones from some months ago, since this is an existing drive from my array.

 

Everything for the most part looked pretty similar except this time I noticed the following at the bottom of my start and end reports -

 

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline      Aborted by host              90%      3104        -

# 2  Extended offline    Aborted by host              80%      3029        -

 

 

Should I be alarmed by this? Is this drive OK, or is it on its way out already?

I've attached the preclear reports for this drive in the event any additional information is needed.

preclear_start__5YD77WNB_2012-07-21.txt

preclear_rpt__5YD77WNB_2012-07-21.txt

preclear_finish__5YD77WNB_2012-07-21.txt

Link to comment

I just finished pre-clearing an old drive that used to be in a DNS-323 (converting from that unit). I have some concerns with the data in bold. Is this drive a concern? I know it's also one that people seem to be iffy about (1.5TB Seagate).

 

========================================================================1.13

== invoked as: ./preclear_disk.sh -c 1 -M 4 -m xxxxxxxxxx /dev/hda

==  ST31500541AS    5XW05ES1

== Disk /dev/hda has been successfully precleared

== with a starting sector of 64

== Ran 1 cycle

==

== Using :Read block size = 8225280 Bytes

== Last Cycle's Pre Read Time  : 5:44:50 (72 MB/s)

== Last Cycle's Zeroing time  : 7:58:24 (52 MB/s)

== Last Cycle's Post Read Time : 11:28:37 (36 MB/s)

== Last Cycle's Total Time    : 25:12:52

==

== Total Elapsed Time 25:12:52

==

== Disk Start Temperature: 29C

==

== Current Disk Temperature: 29C,

==

============================================================================

** Changed attributes in files: /tmp/smart_start_hda  /tmp/smart_finish_hda

                ATTRIBUTE  NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE

      Raw_Read_Error_Rate =  110    113            6        ok          26687111

          Seek_Error_Rate =    44      44          30        near_thresh 790278696138

        Spin_Retry_Count =  100    100          97        near_thresh 0

        End-to-End_Error =  100    100          99        near_thresh 0

          High_Fly_Writes =    97      98            0        ok          3

No SMART attributes are FAILING_NOW

 

0 sectors were pending re-allocation before the start of the preclear.

0 sectors were pending re-allocation after pre-read in cycle 1 of 1.

0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.

0 sectors are pending re-allocation at the end of the preclear,

    the number of sectors pending re-allocation did not change.

343 sectors had been re-allocated before the start of the preclear.

343 sectors are re-allocated at the end of the preclear,

    the number of sectors re-allocated did not change.

Link to comment

I tried to preclear 6 drives at once:

 

system:

System: Supermicro - X7SPA-HF

CPU: Intel® Atom™ CPU D525 @ 1.80GHz - 1.8 GHz

Cache: 48 kB

Memory: 4 GB - 800 MHz

Network: 1000Mb/s - Full Duplex

 

5.0-rc5

 

drive on mainboard:

1) Hitachi_HDS724040ALE640_PK1311PAG4DNJS (sda) 3907018584

 

drives on ARC1300-16:

2) Hitachi_HDS724040ALE640_PK1311PAG4VJ7S (sdd) 3907018584

3) Hitachi_HDS724040ALE640_PK2311PAG4R9WM (sde) 3907018584

4) Hitachi_HDS724040ALE640_PK1311PAG4VMSS (sdf) 3907018584

5) Hitachi_HDS724040ALE640_PK2311PAG4SX0M (sdg) 3907018584

6) Hitachi_HDT725032VLA380_VFA200R2CL7TPA (sdh) 312571224

 

6) worked OK and finished OK (320 GB)

 

1) still running

3) still running

4) still running

 

2) stopped at 1:40 hours in first step

5) stopped at 1:40 hours in first step

 

 

I attached part of the syslog with the errors; it seems something crashed.

Can someone take a look and comment on what is wrong here?

 

The syslog is also filled with these lines:

 

Jul 22 06:52:52 Tower emhttp: shcmd (1274): /usr/local/sbin/emhttp_event driver_loaded
Jul 22 06:52:52 Tower emhttp_event: driver_loaded
Jul 22 06:52:56 Tower emhttp: shcmd (1275): rmmod md-mod |& logger
Jul 22 06:52:56 Tower emhttp: shcmd (1276): modprobe md-mod super=/boot/config/super.dat slots=21 |& logger
Jul 22 06:52:56 Tower kernel: md: unRAID driver removed
Jul 22 06:52:56 Tower emhttp: shcmd (1277): udevadm settle
Jul 22 06:52:56 Tower kernel: md: unRAID driver 2.1.4 installed
Jul 22 06:52:56 Tower kernel: read_file: error 2 opening /boot/config/super.dat
Jul 22 06:52:56 Tower kernel: md: could not read superblock from /boot/config/super.dat
Jul 22 06:52:56 Tower kernel: md: initializing superblock
Jul 22 06:52:56 Tower emhttp: Device inventory:
Jul 22 06:52:56 Tower emhttp: Hitachi_HDT721010SLA360_STF604MH0RR14B (sdb) 976762584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDS724040ALE640_PK1311PAG4DNJS (sda) 3907018584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDS724040ALE640_PK1311PAG4VJ7S (sdd) 3907018584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDS724040ALE640_PK2311PAG4R9WM (sde) 3907018584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDS724040ALE640_PK1311PAG4VMSS (sdf) 3907018584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDS724040ALE640_PK2311PAG4SX0M (sdg) 3907018584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDT725032VLA380_VFA200R2CL7TPA (sdh) 312571224
Jul 22 06:52:56 Tower kernel: mdcmd (1): import 0 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (2): import 1 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (3): import 2 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (4): import 3 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (5): import 4 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (6): import 5 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (7): import 6 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (8): import 7 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (9): import 8 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (10): import 9 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (11): import 10 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (12): import 11 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (13): import 12 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (14): import 13 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (15): import 14 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (16): import 15 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (17): import 16 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (18): import 17 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (19): import 18 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (20): import 19 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (21): import 20 0,0

What does this mean?

 

The system is in a cooled room. The ARC1300 is cooled with a fan.

 

The power supply is a 1000 W, 80 A single-rail Cooler Master Silent Pro.

 

Thanks, Jaco

syslog_extract_preclear.txt

Link to comment

I just finished pre-clearing an old drive that used to be in a DNS-323 (converting from that unit). I have some concerns with the data in bold. Is this drive a concern? I know it's also one that people seem to be iffy about (1.5TB Seagate).

        Seek_Error_Rate =    44      44          30        near_thresh 790278696138

I would only be concerned with this parameter, since the normalized value seems to be getting close to its failure threshold, and  odds are the starting value was 100 or 200.

343 sectors had been re-allocated before the start of the preclear.

343 sectors are re-allocated at the end of the preclear,

    the number of sectors re-allocated did not change.

The number of re-allocated sectors did not change, and that is good, but the number 343 is very high, and most people would RMA the drive based only on the number of re-allocated sectors.   

 

Since the seek error rate is iffy and the re-allocated sector count is high, I'd RMA. (The other parameters that are near their thresholds just have very high thresholds; they are not an issue.)
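To make the comparison above easier to repeat, here is a minimal Python sketch (not part of preclear) that pulls the rows out of the "Changed attributes" section and prints how far each normalized value sits above its failure threshold. The regular expression and the function names `parse_changed_attributes` and `margin` are illustrative assumptions based only on the report layout quoted in this thread. The script makes no judgement about the drive: a small margin on an attribute with a very high threshold (Spin_Retry_Count, End-to-End_Error) is not a problem by itself.

```python
import re

# Matches rows such as:
#   Seek_Error_Rate =    44      44          30        near_thresh 790278696138
ROW = re.compile(
    r"^\s*(?P<name>[\w-]+)\s*=\s*(?P<new>\d+)\s+(?P<old>\d+)\s+"
    r"(?P<thresh>\d+)\s+(?P<status>\S+)\s+(?P<raw>\d+)\s*$"
)


def parse_changed_attributes(report_text):
    """Return one dict per attribute row found in a preclear report."""
    rows = []
    for line in report_text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "name": m["name"],
                "new": int(m["new"]),
                "old": int(m["old"]),
                "thresh": int(m["thresh"]),
                "status": m["status"],
                "raw": int(m["raw"]),
            })
    return rows


def margin(row):
    """Distance between the current normalized value and the failure threshold."""
    return row["new"] - row["thresh"]


# Rows copied from the report posted earlier in this thread.
SAMPLE = """\
      Raw_Read_Error_Rate =  110    113            6        ok          26687111
          Seek_Error_Rate =    44      44          30        near_thresh 790278696138
        Spin_Retry_Count =  100    100          97        near_thresh 0
        End-to-End_Error =  100    100          99        near_thresh 0
          High_Fly_Writes =    97      98            0        ok          3
"""

if __name__ == "__main__":
    for row in parse_changed_attributes(SAMPLE):
        print(f"{row['name']:<22} value={row['new']:>3} "
              f"thresh={row['thresh']:>3} margin={margin(row):>3} raw={row['raw']}")
```

The re-allocated sector count is reported in separate sentences of the preclear output, so it still has to be read from those lines.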

Link to comment

Thanks. Surprisingly, it's still under warranty. I will see if SeaTools shows it as bad enough for a warranty claim.

Link to comment

Thoughts on this one?

 

== Last Cycle's Pre Read Time  : 6:11:42 (134 MB/s)

== Last Cycle's Zeroing time  : 6:26:03 (129 MB/s)

== Last Cycle's Post Read Time : 12:58:02 (64 MB/s)

 

** Changed attributes in files: /tmp/smart_start_sde  /tmp/smart_finish_sde

                ATTRIBUTE  NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE

      Raw_Read_Error_Rate =  118    100            6        ok          183174328

          Seek_Error_Rate =    60    100          30        ok          1107922

        Spin_Retry_Count =  100    100          97        near_thresh 0

        End-to-End_Error =  100    100          99        near_thresh 0

          High_Fly_Writes =    99    100            0        ok          1

  Airflow_Temperature_Cel =    63      71          45        near_thresh 37

      Temperature_Celsius =    37      29            0        ok          37

No SMART attributes are FAILING_NOW

 

0 sectors were pending re-allocation before the start of the preclear.

0 sectors were pending re-allocation after pre-read in cycle 1 of 1.

0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.

0 sectors are pending re-allocation at the end of the preclear,

    the number of sectors pending re-allocation did not change.

0 sectors had been re-allocated before the start of the preclear.

0 sectors are re-allocated at the end of the preclear,

    the number of sectors re-allocated did not change.

Link to comment

Looks fine.
Link to comment
  • 2 weeks later...

Can someone tell me what these errors mean during a preclear?

 

Aug  5 18:24:21 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1952:Release slot [3] tag[3], task [d6eaf900]:
Aug  5 18:24:21 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 0000000F,  slot [3].
Aug  5 18:24:21 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug  5 18:24:21 Tower kernel: sd 0:0:0:0: [sdh] command f2f76480 timed out
Aug  5 18:24:21 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1952:Release slot [1] tag[1], task [d6eaf2c0]:
Aug  5 18:24:21 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000007,  slot [1].
Aug  5 18:24:21 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug  5 18:24:21 Tower kernel: sd 0:0:0:0: [sdh] command f2f760c0 timed out
Aug  5 18:24:51 Tower kernel: sd 0:0:1:0: [sdi] command f2d75780 timed out
Aug  5 18:24:51 Tower kernel: sd 0:0:1:0: [sdi] command f2e5c240 timed out
Aug  5 18:24:51 Tower kernel: sas: Enter sas_scsi_recover_host busy: 4 failed: 4
Aug  5 18:24:51 Tower kernel: sas: trying to find task 0xd4febb80
Aug  5 18:24:51 Tower kernel: sas: sas_scsi_find_task: aborting task 0xd4febb80
Aug  5 18:24:51 Tower kernel: sas: sas_scsi_find_task: task 0xd4febb80 is aborted
Aug  5 18:24:51 Tower kernel: sas: sas_eh_handle_sas_errors: task 0xd4febb80 is aborted
Aug  5 18:24:51 Tower kernel: sas: trying to find task 0xde7f5680
Aug  5 18:24:51 Tower kernel: sas: sas_scsi_find_task: aborting task 0xde7f5680
Aug  5 18:24:51 Tower kernel: sas: sas_scsi_find_task: task 0xde7f5680 is aborted
Aug  5 18:24:51 Tower kernel: sas: sas_eh_handle_sas_errors: task 0xde7f5680 is aborted
Aug  5 18:24:51 Tower kernel: sas: ata7: end_device-0:0: cmd error handler
Aug  5 18:24:51 Tower kernel: sas: ata8: end_device-0:1: cmd error handler
Aug  5 18:24:51 Tower kernel: sas: ata7: end_device-0:0: dev error handler
Aug  5 18:24:51 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004,  slot [0].
Aug  5 18:24:51 Tower kernel: sas: ata8: end_device-0:1: dev error handler
Aug  5 18:24:51 Tower kernel: ata8.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
Aug  5 18:24:51 Tower kernel: ata8.00: failed command: READ FPDMA QUEUED
Aug  5 18:24:51 Tower kernel: ata7.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6
Aug  5 18:24:51 Tower kernel: ata8.00: cmd 60/00:00:40:16:ef/01:00:6f:00:00/40 tag 0 ncq 131072 in
Aug  5 18:24:51 Tower kernel:          res 40/00:04:78:78:d7/00:00:6f:00:00/40 Emask 0x4 (timeout)
Aug  5 18:24:51 Tower kernel: ata7.00: failed command: READ FPDMA QUEUED
Aug  5 18:24:51 Tower kernel: ata7.00: cmd 60/00:00:30:8d:8e/01:00:72:00:00/40 tag 0 ncq 131072 in
Aug  5 18:24:51 Tower kernel:          res ff/3f:3f:37:c8:10/00:00:56:88:2a/00 Emask 0x403 (HSM violation) <F>
Aug  5 18:24:51 Tower kernel: ata7.00: status: { Busy }
Aug  5 18:24:51 Tower kernel: ata7.00: error: { IDNF ABRT }
Aug  5 18:24:51 Tower kernel: ata8.00: status: { DRDY }
Aug  5 18:24:51 Tower kernel: ata7.00: failed command: READ FPDMA QUEUED
Aug  5 18:24:51 Tower kernel: ata7.00: cmd 60/00:00:30:8e:8e/01:00:72:00:00/40 tag 1 ncq 131072 in
Aug  5 18:24:51 Tower kernel:          res 01/04:04:30:8d:8e/00:00:72:00:00/40 Emask 0x2 (HSM violation)
Aug  5 18:24:51 Tower kernel: ata7.00: status: { ERR }
Aug  5 18:24:51 Tower kernel: ata7.00: error: { ABRT }
Aug  5 18:24:51 Tower kernel: ata8.00: failed command: READ FPDMA QUEUED
Aug  5 18:24:51 Tower kernel: ata8.00: cmd 60/00:00:40:17:ef/01:00:6f:00:00/40 tag 1 ncq 131072 in
Aug  5 18:24:51 Tower kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug  5 18:24:51 Tower kernel: ata7: hard resetting link
Aug  5 18:24:51 Tower kernel: ata8.00: status: { DRDY }
Aug  5 18:24:51 Tower kernel: ata8: hard resetting link
Aug  5 18:24:51 Tower kernel: sas: sas_form_port: phy3 belongs to port1 already(1)!
Aug  5 18:24:51 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004,  slot [0].
Aug  5 18:24:51 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004,  slot [0].
Aug  5 18:24:51 Tower kernel: ata7.00: both IDENTIFYs aborted, assuming NODEV
Aug  5 18:24:51 Tower kernel: ata7.00: revalidation failed (errno=-2)
Aug  5 18:24:53 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1522:mvs_I_T_nexus_reset for device[1]:rc= 0
Aug  5 18:24:54 Tower kernel: ata8.00: configured for UDMA/133
Aug  5 18:24:54 Tower kernel: ata8.00: device reported invalid CHS sector 0
Aug  5 18:24:54 Tower kernel: ata8: EH complete
Aug  5 18:24:56 Tower kernel: ata7: hard resetting link
Aug  5 18:24:56 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004,  slot [0].
Aug  5 18:24:56 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004,  slot [0].
Aug  5 18:24:56 Tower kernel: ata7.00: both IDENTIFYs aborted, assuming NODEV
Aug  5 18:24:56 Tower kernel: ata7.00: revalidation failed (errno=-2)
Aug  5 18:25:01 Tower kernel: ata7: hard resetting link
Aug  5 18:25:02 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004,  slot [0].
Aug  5 18:25:02 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004,  slot [0].
Aug  5 18:25:02 Tower kernel: ata7.00: both IDENTIFYs aborted, assuming NODEV
Aug  5 18:25:02 Tower kernel: ata7.00: revalidation failed (errno=-2)
Aug  5 18:25:02 Tower kernel: ata7.00: disabled
Aug  5 18:25:02 Tower kernel: ata7: EH complete
Aug  5 18:25:02 Tower kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh]  Result: hostbyte=0x04 driverbyte=0x00
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e 8e 30 00 01 00 00
Aug  5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921945136
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243142
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243143
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243144
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243145
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243146
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243147
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243148
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243149
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243150
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243151
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh]  Result: hostbyte=0x04 driverbyte=0x00
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e 8d 30 00 01 00 00
Aug  5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921944880
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh]  Result: hostbyte=0x04 driverbyte=0x00
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e 8d 30 00 00 08 00
Aug  5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921944880
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh]  Result: hostbyte=0x04 driverbyte=0x00
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e 8d 30 00 00 08 00
Aug  5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921944880
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh]  Result: hostbyte=0x04 driverbyte=0x00
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e aa 50 00 00 20 00
Aug  5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921952336
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh]  Result: hostbyte=0x04 driverbyte=0x00
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e aa 50 00 00 08 00
Aug  5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921952336

 

The preclear was at 65%.

I have many thousands of these errors, so I stopped the preclear.

At the beginning the errors covered multiple sectors; for the last 8000 lines it complains about only one sector.

Aug  5 18:44:28 Tower kernel: end_request: I/O error, dev sdh, sector 2930275776
Aug  5 18:44:28 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code
Aug  5 18:44:28 Tower kernel: sd 0:0:0:0: [sdh]  Result: hostbyte=0x04 driverbyte=0x00
Aug  5 18:44:28 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 ae a8 75 c0 00 00 08 00
Aug  5 18:44:28 Tower kernel: end_request: I/O error, dev sdh, sector 2930275776
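The `CDB:` lines carry the same sector numbers as the `end_request` lines. Opcode 0x28 is a SCSI READ(10), with the starting block address in bytes 2-5 and the block count in bytes 7-8, both big-endian. The following Python sketch decodes the two CDBs quoted above; `decode_read10` is a name chosen here for illustration and is not part of any tool mentioned in this thread.

```python
def decode_read10(cdb_hex):
    """Decode a 10-byte SCSI READ(10) CDB given as space-separated hex bytes.

    Returns (starting LBA, number of blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: starting logical block
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, count


if __name__ == "__main__":
    for cdb in ("28 00 72 8e 8e 30 00 01 00 00",
                "28 00 ae a8 75 c0 00 00 08 00"):
        lba, count = decode_read10(cdb)
        print(f"{cdb} -> sector {lba}, {count} blocks")
```

The first CDB decodes to sector 1921945136 with 256 blocks, which matches the first logged I/O error and, at 512 bytes per block, the 131072-byte NCQ reads. The second decodes to sector 2930275776 with 8 blocks, the single sector the later log lines keep repeating.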

 

 

What I did:

- Moved the HDD over to another SATA data and power port (SFF-8087 to 4x SATA) on which another disk precleared fine.

 

I am running 5.0 RC6.

This disk is connected to a Supermicro AOC-SAS2LP-MV8.

 

Is the HDD bad, and perhaps remapping bad sectors?

I can imagine that unRAID times out if this takes too long.

 

Link to comment

It means the disk is timing out when communications to it are attempted.

 

It could be a bad disk, or a bad disk controller, or a poor power supply, or poor quality splitter/drive cage connections.  Notice there are TWO disks involved.  They may share a common controller, or one might be causing the lock-up of the other sharing a disk controller.
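With thousands of log lines it is easy to lose track of which devices are actually involved. As a rough aid, this Python sketch tallies command timeouts and I/O errors per device and collects the distinct failing sectors. The two regular expressions are assumptions based only on the kernel messages quoted in this thread, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter, defaultdict

# Patterns modelled on the syslog lines posted above.
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
TIMEOUT = re.compile(r"\[(\w+)\] command \w+ timed out")


def summarize(syslog_lines):
    """Return (I/O error count per device, distinct sectors per device, timeouts per device)."""
    io_errors = Counter()
    sectors = defaultdict(set)
    timeouts = Counter()
    for line in syslog_lines:
        m = IO_ERROR.search(line)
        if m:
            dev, sector = m.group(1), int(m.group(2))
            io_errors[dev] += 1
            sectors[dev].add(sector)
            continue
        m = TIMEOUT.search(line)
        if m:
            timeouts[m.group(1)] += 1
    return io_errors, sectors, timeouts


# A few lines copied from the syslog excerpt in the question.
SAMPLE = [
    "Aug  5 18:24:21 Tower kernel: sd 0:0:0:0: [sdh] command f2f76480 timed out",
    "Aug  5 18:24:51 Tower kernel: sd 0:0:1:0: [sdi] command f2d75780 timed out",
    "Aug  5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921945136",
    "Aug  5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921944880",
    "Aug  5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921944880",
]

if __name__ == "__main__":
    io_errors, sectors, timeouts = summarize(SAMPLE)
    for dev in sorted(set(io_errors) | set(timeouts)):
        print(f"{dev}: {timeouts[dev]} timeouts, {io_errors[dev]} I/O errors, "
              f"{len(sectors[dev])} distinct sectors")
```

To run it against a real log, pass the lines of an opened syslog file to `summarize` instead of `SAMPLE`.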

 

Sometimes it is just the drive that is confused and a power cycle will fix it; other times it will not.

 

Get a SMART report of the drives involved:

smartctl -a /dev/sdi

 

smartctl -a /dev/sdh

 

Link to comment

Thanks, I will do that once I get back home today after work.

The sdi drive seemed to continue just fine, so I guess I will have the three preclear files for that one.

 

The drives are on one power cable with 4 SATA connectors (2 used), the original cable that came with the power supply (monster realpower M1000).

No splitters there.

 

The SATA cable is an SFF-8087 split into 4 SATA data connectors. I ran the second preclear of this sdh drive on another SATA data connector, one on which I had precleared a disk fine.

 

 

Is there a reason to suspect the quality of these SFF-8087 breakout cables?

http://cybershop.ri-vier.nl/discrete-sff8087-to-4x-sata-mini-sas-forward-brkout-cable-l50-p-103.html

 

Edit: they look a lot like these: Monoprice for $9.63 (I paid about 20 euro).

 

I had massive issues on my Areca as well; if these things can be caused by the cables, it would be wise to buy other ones to test.

 

Edit (8-8-2012): the third attempt failed as well. According to Seagate I still have warranty on this disk, so I will go for a swap of the HDD.

It seems I will get a Seagate back; I do not know if I am very happy with that.

Link to comment

I'm attaching reports generated from a preclear of a disk I'm planning to use in my unRAID server.  It looks like it would be safe to use - no "FAILING_NOW" issues.  To get things going initially I'm taking the Frankenstein approach and cobbling old hardware together.  Once things seem to be operating smoothly and more space is needed I will expand with new pieces.  Currently I'm just interested in getting the server up and running so I can test out the capabilities.  Your opinion after taking a look at the reports is appreciated.

 

Thanks.

preclear_250GB.zip

Link to comment

Your disk looks fine. There are only two items in the SMART report worth mentioning:

  9 Power_On_Hours          0x0032  061  061  000    Old_age  Always      -      34907

199 UDMA_CRC_Error_Count    0x003e  200  197  000    Old_age  Always      -      34

 

The first is the run-time hours. (It has been in operation for about 4 years.)

The UDMA CRC errors are usually noise pickup from cables. (Try NOT to be anal with cable management unless you use good quality SHIELDED cables.) Do not tie-wrap SATA cables together, and definitely not with power cables. The errors are not bad, but you should be aware of their cause.
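For anyone who wants to check the "about 4 years" figure, here is a small Python sketch that splits the two attribute rows quoted above into fields and converts the raw Power_On_Hours value into years. The helper names are illustrative assumptions; the column positions are taken from the `smartctl -a` attribute table layout (ID, name, flag, value, worst, threshold, type, updated, when failed, raw).

```python
HOURS_PER_YEAR = 24 * 365.25


def parse_attribute_row(line):
    """Split one row of the smartctl attribute table into named fields."""
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "raw": int(fields[9]),  # first token of RAW_VALUE
    }


def power_on_years(row):
    """Convert a Power_On_Hours row into years of continuous operation."""
    return row["raw"] / HOURS_PER_YEAR


# The two rows quoted in the reply above.
ROWS = [
    "  9 Power_On_Hours          0x0032  061  061  000    Old_age  Always      -      34907",
    "199 UDMA_CRC_Error_Count    0x003e  200  197  000    Old_age  Always      -      34",
]

if __name__ == "__main__":
    hours, crc = (parse_attribute_row(r) for r in ROWS)
    print(f"{hours['name']}: {hours['raw']} h = {power_on_years(hours):.2f} years")
    print(f"{crc['name']}: {crc['raw']} errors logged so far")
```

A CRC count that keeps climbing between two reports would point at cabling, while a count that stays put only reflects past errors.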

 

Lastly, I'd much rather trust an older drive such as this rather than a brand new un-tested drive.    Good luck with your test server.

 

 

Link to comment
  • 2 weeks later...

I've precleared many many disks and this is the first time I'm noticing something about the Post-Read. All my array disks are spun up and they are being read from. What's going on? Have I never noticed that before? No writes are being done, just reads. Should I be concerned?

 

putty window tells me this:

(  1,748,694,528,000  of  2,000,398,934,016  bytes read ) 109 MB/s

Disk Temperature: 32C, Elapsed Time:  20:57:25

 

unMENU tells me this:

Post-Read (1 of 3). 87% @ 41 MB/s (20:57:25)

 

Thanks

Link to comment
  • 2 weeks later...

I'm 21 hours into clearing a new WD 2TB EARX, and it's 50% through the post-read. The post-read MB/s is substantially lower than the pre-read MB/s, even at the same point in the read cycle. Why is this, and is it normal?

 

For the post-read 50% complete email (which came at 21 hours in), the email subject header says 79.2 MB/s, but the message body contains: "Calculated Read Speed: 46 MB/s"

 

Compare that to the pre-read 50% email: subject header 82.6 MB/s, email body "88 MB/s". However, if I look at the on-screen output of the currently running process, it is currently reading in the high 70s to low 80s MB/s.

Link to comment

Hi

Could someone have a look at my results and let me know if there is anything I should worry about?

 

I'm still learning loads and enjoying my unRAID experience; I've upgraded to a Plus licence now :)

 

The 2x 1 TB are my oldest drives, and the Samsung has come out of my current machine, which I'm in the process of moving all the data from.

preclear_results_2TB_WD-WMAZA9227899.txt

preclear_results_2TB_WD-WMAZA8726413.txt

preclear_results_2TB_SAMSUNG_HD204UI.txt

preclear_results_1TB_WD-WCAV5D092592.txt

Link to comment

Hi guys

nvm - Solved - I discovered after studying the script usage that the following command is possible:

preclear_disk.sh -d sat /dev/sda

This instructs preclear to use alternate commands when running smartctl.
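The same idea can be sketched outside of preclear. The Python snippet below asks smartctl for a full report and, if the attribute table is missing (as in the SAS-style output further down), retries with `-d sat`. Only the `-a` and `-d sat` options are real smartctl flags; the function names, the injectable `run` parameter and the check for the `ID# ATTRIBUTE_NAME` header are assumptions made for this example, and how preclear itself handles `-d` internally is not shown in this thread.

```python
import subprocess


def build_smartctl_cmd(device, device_type=None):
    """Build a smartctl command line, optionally forcing a device type via -d."""
    cmd = ["smartctl", "-a"]
    if device_type:
        cmd += ["-d", device_type]
    cmd.append(device)
    return cmd


def has_attribute_table(output):
    """True if the report contains the vendor attribute table header."""
    return "ID# ATTRIBUTE_NAME" in output


def _run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def read_smart(device, run=_run):
    """Return the smartctl report, retrying with '-d sat' if no attributes show up."""
    output = run(build_smartctl_cmd(device))
    if has_attribute_table(output):
        return output
    return run(build_smartctl_cmd(device, "sat"))


# Example (needs root and smartmontools installed):
#   print(read_smart("/dev/sdb"))
```

Passing a different `run` callable makes it possible to try the fallback logic on saved smartctl output without touching a disk.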

 

Hope someone can assist - I am running unRAID via an Adaptec controller (model 52445). It is rather overkill for unRAID since it is meant for high levels of RAID, but I didn't want to shell out additional $$$ to get another controller - currently it's performing admirably, with roughly 64MB/s on post-read while clearing 12 drives at the same time :D

 

 

Everything seems fine, i.e. I set all the disks up as JBOD, which seems to simulate pass-through (not sure of the correct terms).

 

Anyhow, so far so good. unRAID picks up the first set of 12 drives I connected (having power issues with connecting more :( ).

 

The problem I am having: it seems that smartctl doesn't give correct stats for the drive. See the sample output below.

 

root@Storage:~# smartctl -a /dev/sdb
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Device:          ST2000DL003-9VT1 Version: CC3C
Serial number:             6YD1RLL6
Device type: disk
Transport protocol: SAS
Local Time is: Sat Sep  1 08:46:46 2012 SAST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Error Counter logging not supported
Device does not support Self Test logging
root@Storage:~#

 

As can be expected, this messes up preclear a bit, since it is unable to read SMART results before, during and after. (Otherwise it doesn't crash or halt the preclear in any way - GREAT SCRIPT, JOE L.)

 

I managed to find a smartctl command that does give the output for the drive as required.

Thank you Google :D

root@Storage:~# smartctl -d sat --all /dev/sg1
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST2000DL003-9VT166
Serial Number:    6YD1RLL6
Firmware Version: CC3C
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sat Sep  1 08:47:26 2012 SAST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 612) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x30b7) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   109   099   006    Pre-fail  Always       -       24419656
  3 Spin_Up_Time            0x0003   090   090   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       286
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   073   060   030    Pre-fail  Always       -       4318984658
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3906
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       286
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   059   023   045    Old_age   Always   In_the_past 41 (75 200 42 25)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       285
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       286
194 Temperature_Celsius     0x0022   041   077   000    Old_age   Always       -       41 (0 14 0 0)
195 Hardware_ECC_Recovered  0x001a   036   015   000    Old_age   Always       -       24419656
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       66043712114499
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2825939409
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3158544765

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

 

 

 

So I guess my question is: how do I go about trusting a drive after a preclear?

 

Reading through some of the posts, it seems I need to look for the following:

Attributes flagged FAILING_NOW,

5 Reallocated_Sector_Ct (this should preferably be zero, or else stay a very low number)

197 Current_Pending_Sector (this should preferably be zero, or else stay a very low number)
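
The checklist above could be automated roughly as follows. This is only a sketch, assuming the report text is the attribute table exactly as `smartctl -a` prints it; the function name `check_smart_attributes` and the "raw value above zero" rule are illustrative assumptions, not part of preclear_disk.sh or smartctl.

```python
import re

# Attribute IDs singled out in the post above (names as smartctl prints them).
WATCHED_IDS = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector"}

# One row of the "Vendor Specific SMART Attributes" table:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]+\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"\S+\s+\S+\s+(?P<when_failed>\S+)\s+(?P<raw>\d+)"
)


def check_smart_attributes(report_text):
    """Return a list of warning strings for one smartctl report (hypothetical helper)."""
    warnings = []
    for line in report_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # not an attribute row (headers, blank lines, other sections)
        attr_id = int(m.group("id"))
        name = m.group("name")
        raw = int(m.group("raw"))
        # Assumed rule: any non-zero raw count on a watched attribute is worth a look.
        if attr_id in WATCHED_IDS and raw > 0:
            warnings.append(f"{name}: raw value {raw}, expected 0")
        # smartctl writes FAILING_NOW in the WHEN_FAILED column for a failing attribute.
        if m.group("when_failed") == "FAILING_NOW":
            warnings.append(f"{name}: FAILING_NOW")
    return warnings
```

Running it on both the start and finish reports and comparing the two lists shows whether either count grew during the preclear, which is the "stay a very low number" part of the checklist.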

 

 

Also, should I be worried about which "device" I am clearing? (I gather that sdb and sdg are possibly the same thing...)

The clear should be completed in about 6 hours. I will report on any results I don't understand.

 

 

Thank you

 

Neo_x

 

PS

 

Syslog attached just in case.

syslog.zip

Link to comment

Hello all,

 

I am new to unRAID and in the midst of preclearing; the first disk just finished. Is there anything in here I need to be concerned about?

 

smartctl -a -d ata /dev/sdb (--)

smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EALS-00Z8A0
Serial Number:    WD-WCATR4743930
Firmware Version: 05.01D05
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Sep  1 07:44:19 2012 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (16500) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 191) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3037) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   178   172   021    Pre-fail  Always       -       4075
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       505
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       5903
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       147
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       77
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       427
194 Temperature_Celsius     0x0022   106   102   000    Old_age   Always       -       41
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

 

 

Link to comment
