Preclear.sh results - Questions about your results? Post them here.

heffneil · July 14, 2012

I ran it through and it completed. Simple question: is there a decent program for viewing the reports that would format them properly? I am using Windows Notepad and it isn't doing it!

Thanks,

Neil

jowi · July 14, 2012

Try Notepad++

http://notepad-plus-plus.org/

Hypknox · July 22, 2012

WordPad seems to format them a little better as well, heffneil.

Quick question regarding one of my drives. This used to be my parity drive and I decided to go with a 3TB drive instead for parity so that I can use larger than 2TB drives for data in the future. I followed your instructions listed here Joe - http://lime-technology.com/forum/index.php?topic=6126.msg58998#msg58998

I finished the preclear of my old parity drive successfully and, out of curiosity, decided to compare the results with the original ones from some months ago, since this is an existing drive from my array.

Everything for the most part looked pretty similar except this time I noticed the following at the bottom of my start and end reports -

SMART Self-test log structure revision number 1

Num  Test_Description    Status           Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Aborted by host  90%        3104             -
# 2  Extended offline    Aborted by host  80%        3029             -

Should I be alarmed by this? Is this drive OK, or is it on its way out already?

I've attached the preclear reports for this drive in the event any additional information is needed.

preclear_start__5YD77WNB_2012-07-21.txt

preclear_rpt__5YD77WNB_2012-07-21.txt

preclear_finish__5YD77WNB_2012-07-21.txt

Joe L. · July 22, 2012

nothing to be alarmed about.

Both the "short" and "long" tests are automatically aborted if the drive is spun down.

RokleM · July 22, 2012

I just finished pre-clearing an old drive that used to be in a DNS-323 (converting from that unit). I have some concerns with the data in bold. Is this drive a concern? I know it's also one that people seem to be iffy about (1.5TB Seagate).

========================================================================1.13

== invoked as: ./preclear_disk.sh -c 1 -M 4 -m xxxxxxxxxx /dev/hda

== ST31500541AS 5XW05ES1

== Disk /dev/hda has been successfully precleared

== with a starting sector of 64

== Ran 1 cycle

==

== Using :Read block size = 8225280 Bytes

== Last Cycle's Pre Read Time : 5:44:50 (72 MB/s)

== Last Cycle's Zeroing time : 7:58:24 (52 MB/s)

== Last Cycle's Post Read Time : 11:28:37 (36 MB/s)

== Last Cycle's Total Time : 25:12:52

==

== Total Elapsed Time 25:12:52

==

== Disk Start Temperature: 29C

==

== Current Disk Temperature: 29C,

==

============================================================================

** Changed attributes in files: /tmp/smart_start_hda /tmp/smart_finish_hda

ATTRIBUTE                 NEW_VAL  OLD_VAL  FAILURE_THRESHOLD  STATUS       RAW_VALUE
Raw_Read_Error_Rate     = 110      113      6                  ok           26687111
Seek_Error_Rate         = 44       44       30                 near_thresh  790278696138
Spin_Retry_Count        = 100      100      97                 near_thresh  0
End-to-End_Error        = 100      100      99                 near_thresh  0
High_Fly_Writes         = 97       98       0                  ok           3

No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.

0 sectors were pending re-allocation after pre-read in cycle 1 of 1.

0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.

0 sectors are pending re-allocation at the end of the preclear,

the number of sectors pending re-allocation did not change.

343 sectors had been re-allocated before the start of the preclear.

343 sectors are re-allocated at the end of the preclear,

the number of sectors re-allocated did not change.

downloadski · July 22, 2012

I tried to preclear 6 drives at once:

system:

System: Supermicro - X7SPA-HF

CPU: Intel® Atom™ CPU D525 @ 1.80GHz - 1.8 GHz

Cache: 48 kB

Memory: 4 GB - 800 MHz

Network: 1000Mb/s - Full Duplex

5.0-rc5

drive on mainboard:

1) Hitachi_HDS724040ALE640_PK1311PAG4DNJS (sda) 3907018584

drives on ARC1300-16:

2) Hitachi_HDS724040ALE640_PK1311PAG4VJ7S (sdd) 3907018584

3) Hitachi_HDS724040ALE640_PK2311PAG4R9WM (sde) 3907018584

4) Hitachi_HDS724040ALE640_PK1311PAG4VMSS (sdf) 3907018584

5) Hitachi_HDS724040ALE640_PK2311PAG4SX0M (sdg) 3907018584

6) Hitachi_HDT725032VLA380_VFA200R2CL7TPA (sdh) 312571224

6) worked ok and finished ok (320 GB)

1) still running

3) still running

4) still running

2) stopped at 1:40 hours in first step

5) stopped at 1:40 hours in first step

I attached part of the syslog with the errors; it seems something crashed.

Can someone take a look and comment on what is wrong here?

Also, I have the syslog filled with these lines:

Jul 22 06:52:52 Tower emhttp: shcmd (1274): /usr/local/sbin/emhttp_event driver_loaded
Jul 22 06:52:52 Tower emhttp_event: driver_loaded
Jul 22 06:52:56 Tower emhttp: shcmd (1275): rmmod md-mod |& logger
Jul 22 06:52:56 Tower emhttp: shcmd (1276): modprobe md-mod super=/boot/config/super.dat slots=21 |& logger
Jul 22 06:52:56 Tower kernel: md: unRAID driver removed
Jul 22 06:52:56 Tower emhttp: shcmd (1277): udevadm settle
Jul 22 06:52:56 Tower kernel: md: unRAID driver 2.1.4 installed
Jul 22 06:52:56 Tower kernel: read_file: error 2 opening /boot/config/super.dat
Jul 22 06:52:56 Tower kernel: md: could not read superblock from /boot/config/super.dat
Jul 22 06:52:56 Tower kernel: md: initializing superblock
Jul 22 06:52:56 Tower emhttp: Device inventory:
Jul 22 06:52:56 Tower emhttp: Hitachi_HDT721010SLA360_STF604MH0RR14B (sdb) 976762584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDS724040ALE640_PK1311PAG4DNJS (sda) 3907018584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDS724040ALE640_PK1311PAG4VJ7S (sdd) 3907018584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDS724040ALE640_PK2311PAG4R9WM (sde) 3907018584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDS724040ALE640_PK1311PAG4VMSS (sdf) 3907018584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDS724040ALE640_PK2311PAG4SX0M (sdg) 3907018584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDT725032VLA380_VFA200R2CL7TPA (sdh) 312571224
Jul 22 06:52:56 Tower kernel: mdcmd (1): import 0 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (2): import 1 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (3): import 2 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (4): import 3 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (5): import 4 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (6): import 5 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (7): import 6 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (8): import 7 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (9): import 8 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (10): import 9 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (11): import 10 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (12): import 11 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (13): import 12 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (14): import 13 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (15): import 14 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (16): import 15 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (17): import 16 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (18): import 17 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (19): import 18 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (20): import 19 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (21): import 20 0,0

What does this mean?

The system is in a cooled room. The ARC-1300 is cooled with a fan.

The power supply is a 1000 W, 80 A single-rail Cooler Master Silent Pro.

Thanks, Jaco

syslog_extract_preclear.txt

Joe L. · July 22, 2012

Seek_Error_Rate = 44 44 30 near_thresh 790278696138

I would only be concerned with this parameter, since the normalized value seems to be getting close to its failure threshold, and odds are the starting value was 100 or 200.

343 sectors had been re-allocated before the start of the preclear.

343 sectors are re-allocated at the end of the preclear,

the number of sectors re-allocated did not change.

The number of re-allocated sectors did not change, and that is good, but the number 343 is very high, and most people would RMA the drive based only on the number of re-allocated sectors.

Since the seek error rate is iffy, and the re-allocated sector count high, I'd RMA. (the other parameters that are near their thresholds just have very high thresholds... they are not an issue)
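Joe's reading of the report can be approximated in a few lines. This is only a sketch in Python: it assumes each report line keeps the `NAME = NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS RAW_VALUE` layout quoted in this thread, and the names `Attr`, `parse_attr` and `margin` are made up for illustration (they are not part of preclear_disk.sh).

```python
from typing import NamedTuple


class Attr(NamedTuple):
    name: str
    new_val: int
    old_val: int
    threshold: int
    status: str
    raw_value: int


def parse_attr(line: str) -> Attr:
    """Parse one 'NAME = NEW OLD THRESH STATUS RAW' line from a preclear report."""
    name, rest = line.split("=", 1)
    new_val, old_val, threshold, status, raw = rest.split()
    return Attr(name.strip(), int(new_val), int(old_val),
                int(threshold), status, int(raw))


def margin(attr: Attr) -> int:
    """How far the normalized value still sits above its failure threshold."""
    return attr.new_val - attr.threshold


# The changed attributes from RokleM's report, copied from the post above.
report = """\
Raw_Read_Error_Rate = 110 113 6 ok 26687111
Seek_Error_Rate = 44 44 30 near_thresh 790278696138
Spin_Retry_Count = 100 100 97 near_thresh 0
End-to-End_Error = 100 100 99 near_thresh 0
High_Fly_Writes = 97 98 0 ok 3
"""

attrs = [parse_attr(line) for line in report.splitlines()]
for a in attrs:
    print(f"{a.name:22} value={a.new_val:3} threshold={a.threshold:3} margin={margin(a):3}")
```

The margin alone does not tell the whole story: Spin_Retry_Count and End-to-End_Error have margins of only 3 and 1, but their thresholds are simply set very high and their raw values are 0. Seek_Error_Rate, with a margin of 14, is the one whose normalized value has apparently fallen a long way from where it started.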

RokleM · July 22, 2012

Thanks. Surprisingly, it's still under warranty. I will see if SeaTools shows it as bad enough for a warranty claim.

RokleM · July 23, 2012

Thoughts on this one?

== Last Cycle's Pre Read Time : 6:11:42 (134 MB/s)

== Last Cycle's Zeroing time : 6:26:03 (129 MB/s)

== Last Cycle's Post Read Time : 12:58:02 (64 MB/s)

** Changed attributes in files: /tmp/smart_start_sde /tmp/smart_finish_sde

ATTRIBUTE                 NEW_VAL  OLD_VAL  FAILURE_THRESHOLD  STATUS       RAW_VALUE
Raw_Read_Error_Rate     = 118      100      6                  ok           183174328
Seek_Error_Rate         = 60       100      30                 ok           1107922
Spin_Retry_Count        = 100      100      97                 near_thresh  0
End-to-End_Error        = 100      100      99                 near_thresh  0
High_Fly_Writes         = 99       100      0                  ok           1
Airflow_Temperature_Cel = 63       71       45                 near_thresh  37
Temperature_Celsius     = 37       29       0                  ok           37

No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.

0 sectors were pending re-allocation after pre-read in cycle 1 of 1.

0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.

0 sectors are pending re-allocation at the end of the preclear,

the number of sectors pending re-allocation did not change.

0 sectors had been re-allocated before the start of the preclear.

0 sectors are re-allocated at the end of the preclear,

the number of sectors re-allocated did not change.

Joe L. · July 23, 2012

looks fine.

downloadski · August 5, 2012

Can someone tell me what these errors mean during a preclear?

Aug  5 18:24:21 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1952:Release slot [3] tag[3], task [d6eaf900]:
Aug  5 18:24:21 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 0000000F,  slot [3].
Aug  5 18:24:21 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug  5 18:24:21 Tower kernel: sd 0:0:0:0: [sdh] command f2f76480 timed out
Aug  5 18:24:21 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1952:Release slot [1] tag[1], task [d6eaf2c0]:
Aug  5 18:24:21 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000007,  slot [1].
Aug  5 18:24:21 Tower kernel: sas: sas_ata_task_done: SAS error 8a
Aug  5 18:24:21 Tower kernel: sd 0:0:0:0: [sdh] command f2f760c0 timed out
Aug  5 18:24:51 Tower kernel: sd 0:0:1:0: [sdi] command f2d75780 timed out
Aug  5 18:24:51 Tower kernel: sd 0:0:1:0: [sdi] command f2e5c240 timed out
Aug  5 18:24:51 Tower kernel: sas: Enter sas_scsi_recover_host busy: 4 failed: 4
Aug  5 18:24:51 Tower kernel: sas: trying to find task 0xd4febb80
Aug  5 18:24:51 Tower kernel: sas: sas_scsi_find_task: aborting task 0xd4febb80
Aug  5 18:24:51 Tower kernel: sas: sas_scsi_find_task: task 0xd4febb80 is aborted
Aug  5 18:24:51 Tower kernel: sas: sas_eh_handle_sas_errors: task 0xd4febb80 is aborted
Aug  5 18:24:51 Tower kernel: sas: trying to find task 0xde7f5680
Aug  5 18:24:51 Tower kernel: sas: sas_scsi_find_task: aborting task 0xde7f5680
Aug  5 18:24:51 Tower kernel: sas: sas_scsi_find_task: task 0xde7f5680 is aborted
Aug  5 18:24:51 Tower kernel: sas: sas_eh_handle_sas_errors: task 0xde7f5680 is aborted
Aug  5 18:24:51 Tower kernel: sas: ata7: end_device-0:0: cmd error handler
Aug  5 18:24:51 Tower kernel: sas: ata8: end_device-0:1: cmd error handler
Aug  5 18:24:51 Tower kernel: sas: ata7: end_device-0:0: dev error handler
Aug  5 18:24:51 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004,  slot [0].
Aug  5 18:24:51 Tower kernel: sas: ata8: end_device-0:1: dev error handler
Aug  5 18:24:51 Tower kernel: ata8.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
Aug  5 18:24:51 Tower kernel: ata8.00: failed command: READ FPDMA QUEUED
Aug  5 18:24:51 Tower kernel: ata7.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6
Aug  5 18:24:51 Tower kernel: ata8.00: cmd 60/00:00:40:16:ef/01:00:6f:00:00/40 tag 0 ncq 131072 in
Aug  5 18:24:51 Tower kernel:          res 40/00:04:78:78:d7/00:00:6f:00:00/40 Emask 0x4 (timeout)
Aug  5 18:24:51 Tower kernel: ata7.00: failed command: READ FPDMA QUEUED
Aug  5 18:24:51 Tower kernel: ata7.00: cmd 60/00:00:30:8d:8e/01:00:72:00:00/40 tag 0 ncq 131072 in
Aug  5 18:24:51 Tower kernel:          res ff/3f:3f:37:c8:10/00:00:56:88:2a/00 Emask 0x403 (HSM violation) <F>
Aug  5 18:24:51 Tower kernel: ata7.00: status: { Busy }
Aug  5 18:24:51 Tower kernel: ata7.00: error: { IDNF ABRT }
Aug  5 18:24:51 Tower kernel: ata8.00: status: { DRDY }
Aug  5 18:24:51 Tower kernel: ata7.00: failed command: READ FPDMA QUEUED
Aug  5 18:24:51 Tower kernel: ata7.00: cmd 60/00:00:30:8e:8e/01:00:72:00:00/40 tag 1 ncq 131072 in
Aug  5 18:24:51 Tower kernel:          res 01/04:04:30:8d:8e/00:00:72:00:00/40 Emask 0x2 (HSM violation)
Aug  5 18:24:51 Tower kernel: ata7.00: status: { ERR }
Aug  5 18:24:51 Tower kernel: ata7.00: error: { ABRT }
Aug  5 18:24:51 Tower kernel: ata8.00: failed command: READ FPDMA QUEUED
Aug  5 18:24:51 Tower kernel: ata8.00: cmd 60/00:00:40:17:ef/01:00:6f:00:00/40 tag 1 ncq 131072 in
Aug  5 18:24:51 Tower kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug  5 18:24:51 Tower kernel: ata7: hard resetting link
Aug  5 18:24:51 Tower kernel: ata8.00: status: { DRDY }
Aug  5 18:24:51 Tower kernel: ata8: hard resetting link
Aug  5 18:24:51 Tower kernel: sas: sas_form_port: phy3 belongs to port1 already(1)!
Aug  5 18:24:51 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004,  slot [0].
Aug  5 18:24:51 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004,  slot [0].
Aug  5 18:24:51 Tower kernel: ata7.00: both IDENTIFYs aborted, assuming NODEV
Aug  5 18:24:51 Tower kernel: ata7.00: revalidation failed (errno=-2)
Aug  5 18:24:53 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1522:mvs_I_T_nexus_reset for device[1]:rc= 0
Aug  5 18:24:54 Tower kernel: ata8.00: configured for UDMA/133
Aug  5 18:24:54 Tower kernel: ata8.00: device reported invalid CHS sector 0
Aug  5 18:24:54 Tower kernel: ata8: EH complete
Aug  5 18:24:56 Tower kernel: ata7: hard resetting link
Aug  5 18:24:56 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004,  slot [0].
Aug  5 18:24:56 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004,  slot [0].
Aug  5 18:24:56 Tower kernel: ata7.00: both IDENTIFYs aborted, assuming NODEV
Aug  5 18:24:56 Tower kernel: ata7.00: revalidation failed (errno=-2)
Aug  5 18:25:01 Tower kernel: ata7: hard resetting link
Aug  5 18:25:02 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004,  slot [0].
Aug  5 18:25:02 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004,  slot [0].
Aug  5 18:25:02 Tower kernel: ata7.00: both IDENTIFYs aborted, assuming NODEV
Aug  5 18:25:02 Tower kernel: ata7.00: revalidation failed (errno=-2)
Aug  5 18:25:02 Tower kernel: ata7.00: disabled
Aug  5 18:25:02 Tower kernel: ata7: EH complete
Aug  5 18:25:02 Tower kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh]  Result: hostbyte=0x04 driverbyte=0x00
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e 8e 30 00 01 00 00
Aug  5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921945136
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243142
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243143
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243144
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243145
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243146
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243147
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243148
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243149
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243150
Aug  5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243151
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh]  Result: hostbyte=0x04 driverbyte=0x00
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e 8d 30 00 01 00 00
Aug  5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921944880
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh]  Result: hostbyte=0x04 driverbyte=0x00
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e 8d 30 00 00 08 00
Aug  5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921944880
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh]  Result: hostbyte=0x04 driverbyte=0x00
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e 8d 30 00 00 08 00
Aug  5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921944880
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh]  Result: hostbyte=0x04 driverbyte=0x00
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e aa 50 00 00 20 00
Aug  5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921952336
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh]  Result: hostbyte=0x04 driverbyte=0x00
Aug  5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e aa 50 00 00 08 00
Aug  5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921952336

The preclear was at 65%.

I have many thousands of these errors, so I stopped the preclear.

In the beginning it was multiple sectors; for the last 8000 lines it complains about only one sector.

Aug  5 18:44:28 Tower kernel: end_request: I/O error, dev sdh, sector 2930275776
Aug  5 18:44:28 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code
Aug  5 18:44:28 Tower kernel: sd 0:0:0:0: [sdh]  Result: hostbyte=0x04 driverbyte=0x00
Aug  5 18:44:28 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 ae a8 75 c0 00 00 08 00
Aug  5 18:44:28 Tower kernel: end_request: I/O error, dev sdh, sector 2930275776
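The `CDB` lines and the `end_request` sector numbers in these logs describe the same reads. The sketch below, in Python, decodes them under the assumption that each 10-byte CDB is a standard SCSI READ(10) command (opcode 0x28, a big-endian 4-byte LBA in bytes 2-5, a 2-byte transfer length in bytes 7-8) and that sectors are 512 bytes; `decode_read10` is a name invented for this example.

```python
from typing import Tuple


def decode_read10(cdb_hex: str) -> Tuple[int, int]:
    """Return (lba, sector_count) from a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) command block")
    lba = int.from_bytes(cdb[2:6], "big")    # starting sector
    count = int.from_bytes(cdb[7:9], "big")  # number of sectors requested
    return lba, count


# First failed read in the log above.
lba, count = decode_read10("28 00 72 8e 8e 30 00 01 00 00")
print(lba, count, count * 512, lba // 8)
# prints: 1921945136 256 131072 240243142

# The single sector the last 8000 lines keep complaining about.
print(decode_read10("28 00 ae a8 75 c0 00 00 08 00"))
# prints: (2930275776, 8)
```

The decoded LBA matches the `I/O error, dev sdh, sector ...` line that follows each CDB, and 256 sectors of 512 bytes is the 131072 shown on the `ncq` lines. Dividing the sector by 8 gives the first `Buffer I/O error ... logical block` number, which is consistent with 4096-byte buffer blocks.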

What I did:

- Moved the HDD over to another SATA data and power port (SFF-8087 to 4x SATA) on which another disk precleared fine.

I am running 5.0 RC6.

This disk is connected to a Supermicro AOC-SAS2LP-MV8.

Is the HDD bad, and perhaps remapping bad sectors?

I can imagine that unRAID times out if this takes too long.

Joe L. · August 5, 2012

It means the disk is timing out when communications to it are attempted.

It could be a bad disk, or a bad disk controller, or a poor power supply, or poor quality splitter/drive cage connections. Notice there are TWO disks involved. They may share a common controller, or one might be causing the lock-up of the other sharing a disk controller.

Sometimes it is just the drive that is confused and a power cycle will fix it; other times it will not.

Get a SMART report of the drives involved:

smartctl -a /dev/sdi

smartctl -a /dev/sdh

downloadski · August 6, 2012

Thanks, I will do that once I get back home after work today.

The sdi drive seems to have continued just fine, so I guess I will have the three preclear files for that one.

The drives are on one power cable with 4 SATA connectors (2 used) - the original cable that came with the power supply (monster realpower M1000).

No splitters there.

The SATA cable is an SFF-8087 split into 4 SATA data connectors. I ran the 2nd preclear of this sdh drive on another SATA data connector, on which I had precleared a disk fine.

Is there a reason to suspect the quality of these SFF-8087 breakout cables?

http://cybershop.ri-vier.nl/discrete-sff8087-to-4x-sata-mini-sas-forward-brkout-cable-l50-p-103.html

Edit: they look a lot like the ones Monoprice sells for $9.63 (I paid about 20 euro).

I had massive issues on my Areca as well; if these things could be caused by the cables, it would be wise to buy other ones to test, perhaps.

Edit (8-8-2012): the 3rd go failed as well. According to Seagate I still have warranty on this disk, so I will go for a swap of the HDD.

It seems I will get a Seagate back; I do not know if I am very happy with that.

downloadski · August 6, 2012

Other SATA data cable, other power cable, same drive, same errors at 65%.

The other drive of the same type continues with its preclear.

So I assume it is a drive issue; I will put it aside.

bcpratt · August 7, 2012

I'm attaching reports generated from a preclear of a disk I'm planning to use in my unRAID server. It looks like it would be safe to use - no "FAILING_NOW" issues. To get things going initially I'm taking the Frankenstein approach and cobbling old hardware together. Once things seem to be operating smoothly and more space is needed I will expand with new pieces. Currently I'm just interested in getting the server up and running so I can test out the capabilities. Your opinion after taking a look at the reports is appreciated.

Thanks.

preclear_250GB.zip

Joe L. · August 7, 2012

Your disk looks fine. There are only two items in the SMART report worth mentioning:

9 Power_On_Hours 0x0032 061 061 000 Old_age Always - 34907

199 UDMA_CRC_Error_Count 0x003e 200 197 000 Old_age Always - 34

The first is the run-time-hours. (it has been in operation for about 4 years)

The UDMA CRC errors are usually noise pickup from cables. (Try NOT to be anal with cable management unless you use good-quality SHIELDED cables.) Do not tie-wrap SATA cables together, and definitely not with power cables. The errors are not bad, but you should be aware of their cause.
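To make the two quoted attribute lines easier to read, here is a small sketch in Python that splits a `smartctl` attribute row into fields and converts the Power_On_Hours raw value to years. It assumes the usual ten-column row layout (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw) and a raw value counted in hours; `parse_smart_line` is a hypothetical helper, not part of smartmontools.

```python
def parse_smart_line(line: str) -> dict:
    """Split one row of smartctl's attribute table into named fields."""
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),   # current normalized value
        "worst": int(fields[4]),   # lowest normalized value ever recorded
        "thresh": int(fields[5]),  # failure threshold
        "raw": int(fields[9]),     # first token of the raw value
    }


hours = parse_smart_line("9 Power_On_Hours 0x0032 061 061 000 Old_age Always - 34907")
crc = parse_smart_line("199 UDMA_CRC_Error_Count 0x003e 200 197 000 Old_age Always - 34")

years = hours["raw"] / (24 * 365)
print(f"{hours['raw']} hours is about {years:.1f} years of run time")
print(f"{crc['raw']} CRC errors logged; normalized value {crc['value']}, worst {crc['worst']}")
```

34907 hours divided by 8760 hours per year comes to just under 4 years, which is where the "about 4 years" figure comes from.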

Lastly, I'd much rather trust an older drive such as this rather than a brand new un-tested drive. Good luck with your test server.

petsheep · August 17, 2012

I've precleared many many disks and this is the first time I'm noticing something about the Post-Read. All my array disks are spun up and they are being read from. What's going on? Have I never noticed that before? No writes are being done, just reads. Should I be concerned?

The PuTTY window tells me this:

( 1,748,694,528,000 of 2,000,398,934,016 bytes read ) 109 MB/s

Disk Temperature: 32C, Elapsed Time: 20:57:25

unMENU tells me this:

Post-Read (1 of 3). 87% @ 41 MB/s (20:57:25)

Thanks

servion · August 27, 2012

I'm 21 hours into clearing a new WD 2TB EARX, and it's 50% through the post-read. The post-read MB/s is substantially lower than the pre-read MB/s, even at the same point in the read cycle. Why is this? Is this normal?

For the post-read 50% complete email (which came at 21 hours in), the email subject header says 79.2 MB/s, but the message body contains: "Calculated Read Speed: 46 MB/s"

Compare that to the pre-read 50% email: subject header 82.6 MB/s, message body "88 MB/s". However, if I look at the actual screen output of the running process, it's currently reading in the high 70s to low 80s MB/s.

mr-hexen · August 27, 2012

This is normal.

The pre-read just reads the data and discards it (reads to null); the post-read reads and compares, so it takes longer.
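The difference can be sketched as follows. This is an illustration in Python, not code taken from preclear_disk.sh: it assumes the post-read comparison amounts to checking that every block is still all zeros, it uses an arbitrary 1 MiB read size, and it reads from an in-memory buffer standing in for a disk.

```python
import io

BLOCK = 1024 * 1024  # read size in bytes; an arbitrary choice for this sketch


def pre_read(dev) -> int:
    """Read every block and discard it; only the read itself matters."""
    total = 0
    while True:
        chunk = dev.read(BLOCK)
        if not chunk:
            return total
        total += len(chunk)


def post_read(dev) -> int:
    """Read every block and verify it is all zeros; return bytes verified."""
    total = 0
    while True:
        chunk = dev.read(BLOCK)
        if not chunk:
            return total
        if chunk.count(0) != len(chunk):  # the extra work the pre-read skips
            raise ValueError(f"non-zero data found in block at byte offset {total}")
        total += len(chunk)


disk = io.BytesIO(bytes(4 * BLOCK))  # stand-in for a freshly zeroed disk
print(pre_read(disk))
disk.seek(0)
print(post_read(disk))
```

Both functions pull the same bytes off the device; the second one also has to inspect every byte, which is one plausible reason the post-read phase reports a lower speed than the pre-read.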

Harpz · August 27, 2012

Hi

Could someone have a look at my results and let me know if there is anything I should worry about?

I'm still learning loads and enjoying my unRAID experience; I've upgraded to a Plus licence now.

The 2x 1 TB are my oldest drives, and the Samsung has come out of my current machine, which I'm in the process of moving all the data from.

preclear_results_2TB_WD-WMAZA9227899.txt

preclear_results_2TB_WD-WMAZA8726413.txt

preclear_results_2TB_SAMSUNG_HD204UI.txt

preclear_results_1TB_WD-WCAV5D092592.txt

Harpz · August 27, 2012

Final result for above

preclear_results_1TB_WD-WCAU44987157.txt

drumstyk1 · August 30, 2012

Hey guys, I just finished preclearing a brand new drive that will be used for my parity, and it doesn't seem to have passed. Would someone mind looking at this and letting me know if I should be concerned or if an RMA is in order? Thanks!

preclear_finish__WD-WCAZAF085957_2012-08-31.txt

preclear_results.txt

Neo_x · September 1, 2012

Hi guys

nvm - Solved. After studying the script's usage text I discovered that the following command is possible:

preclear_disk.sh -d sat /dev/sda

This instructs preclear to use alternate commands when running smartctl.
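In smartctl terms the change is just the extra `-d sat` device-type option, as in the two commands quoted in this post. A minimal sketch in Python of building such a command line; `smartctl_cmd` is a hypothetical helper, and how preclear_disk.sh passes the option along internally is not shown here.

```python
from typing import List, Optional


def smartctl_cmd(device: str, device_type: Optional[str] = None) -> List[str]:
    """Build a 'smartctl --all' command line, optionally forcing a device type."""
    cmd = ["smartctl"]
    if device_type:
        cmd += ["-d", device_type]  # e.g. "sat" for SATA disks behind a SAS/SCSI layer
    cmd += ["--all", device]
    return cmd


print(" ".join(smartctl_cmd("/dev/sdb")))
print(" ".join(smartctl_cmd("/dev/sg1", "sat")))
```

The resulting list could be handed to `subprocess.run` on a machine that has smartctl installed and sufficient privileges; the sketch only builds and prints the command.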

Hope someone can assist. I am running unRAID via an Adaptec controller (model 52445). It is rather overkill for unRAID, since it is meant for high levels of RAID, but I didn't want to shell out additional $$$ for another controller. Currently it's performing admirably, with roughly 64 MB/s on post-read while clearing 12 drives at the same time.

Everything seems fine, i.e. I set all the disks up as JBOD, which seems to simulate pass-through (not sure of the correct terms).

Anyhow, so far so good. unRAID picks up the first set of 12 drives I connected (I'm having power issues connecting more).

The problem I am having: it seems that smartctl doesn't give correct stats for the drive. See the sample output below.

root@Storage:~# smartctl -a /dev/sdb
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Device:          ST2000DL003-9VT1 Version: CC3C
Serial number:             6YD1RLL6
Device type: disk
Transport protocol: SAS
Local Time is: Sat Sep  1 08:46:46 2012 SAST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Error Counter logging not supported
Device does not support Self Test logging
root@Storage:~#

As can be expected, this messes up preclear a bit, since it is unable to read SMART results before, during and after (although otherwise it doesn't crash or halt the preclear in any way - GREAT SCRIPT, JOE L.).

I managed to find a smartctl command that does give the required output for the drive.

Thank you Google

root@Storage:~# smartctl -d sat --all /dev/sg1
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST2000DL003-9VT166
Serial Number:    6YD1RLL6
Firmware Version: CC3C
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sat Sep  1 08:47:26 2012 SAST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 612) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x30b7) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   109   099   006    Pre-fail  Always       -       24419656
  3 Spin_Up_Time            0x0003   090   090   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       286
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   073   060   030    Pre-fail  Always       -       4318984658
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3906
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       286
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   059   023   045    Old_age   Always   In_the_past 41 (75 200 42 25)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       285
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       286
194 Temperature_Celsius     0x0022   041   077   000    Old_age   Always       -       41 (0 14 0 0)
195 Hardware_ECC_Recovered  0x001a   036   015   000    Old_age   Always       -       24419656
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       66043712114499
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2825939409
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3158544765

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

So I guess my question is: how do I go about trusting a drive after a preclear?

Reading through some of the posts, it seems I need to look for the following:

FAILING_NOW attributes,

5 Reallocated_Sector_Ct (this should preferably be zero, or else stay at a very low number.)

197 Current_Pending_Sector (this should preferably be zero, or else stay at a very low number.)
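
The checks listed above can be sketched as a small script. This is only a rough illustration, not part of preclear.sh: it assumes the report text has the same attribute table layout as the smartctl output pasted above, and the names parse_attributes, find_concerns and WATCHED_IDS are made up for this example. It flags any attribute whose WHEN_FAILED column is not "-" and any watched attribute (5 and 197, as named above) whose raw value is above zero.

```python
import re

# Attribute IDs singled out in the post above (assumption: extend as you see fit).
WATCHED_IDS = {5, 197}

# One row of the "Vendor Specific SMART Attributes with Thresholds" table:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"\S+\s+\S+\s+(\S+)\s+(\S.*)$"
)


def parse_attributes(report):
    """Return one dict per attribute row found in a smartctl report."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "when_failed": m.group(6),
                "raw": m.group(7).strip(),
            })
    return rows


def find_concerns(rows):
    """List human-readable notes for rows that deserve a closer look."""
    notes = []
    for r in rows:
        # "-" means the attribute has never dropped to its threshold.
        if r["when_failed"] != "-":
            notes.append("%d %s: WHEN_FAILED is %s"
                         % (r["id"], r["name"], r["when_failed"]))
        if r["id"] in WATCHED_IDS:
            # Some raw values carry extra text; only the leading number is used.
            first = r["raw"].split()[0]
            if first.isdigit() and int(first) > 0:
                notes.append("%d %s: raw value is %s"
                             % (r["id"], r["name"], first))
    return notes


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python check_smart.py preclear_finish.txt
    with open(sys.argv[1]) as fh:
        for note in find_concerns(parse_attributes(fh.read())):
            print(note)
```

Run against the report above, a check like this would report attribute 190, since its WHEN_FAILED column reads In_the_past, while 5 and 197 would pass with raw values of 0. It does not compare the start and finish reports, which is still worth doing by eye.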

Also, should I be worried about which "device" I am clearing? (I gather that SDB and SDG are possibly the same thing...)

The clear should complete in about 6 hours; I will report any results I don't understand.

Thank you

Neo_x

PS

syslog attached just in case

syslog.zip

bender1 · September 1, 2012

Just posting a result

I'm preclearing 3 drives and one of them just finished, in what seems like decent time. The WD10EALS 1TB finished in 11 hours 32 minutes. This particular drive wasn't listed in the preclear times; should I add it to the wiki?

bender1 · September 1, 2012

Hello all,

I am new to unraid, just in the midst of preclearing and the first disk finished. Anything in here I need to be concerned with?

smartctl -a -d ata /dev/sdb (--)

smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EALS-00Z8A0
Serial Number:    WD-WCATR4743930
Firmware Version: 05.01D05
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Sep 1 07:44:19 2012 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (16500) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 191) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3037) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   178   172   021    Pre-fail  Always       -       4075
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       505
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       5903
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       147
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       77
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       427
194 Temperature_Celsius     0x0022   106   102   000    Old_age   Always       -       41
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
