heffneil Posted July 14, 2012

I ran it through and it completed. Simple question: is there a decent program for viewing the reports that would format them properly? I am using Windows Notepad and it isn't doing it!

Thanks,
Neil

jowi Posted July 14, 2012

Try Notepad++: http://notepad-plus-plus.org/

Hypknox Posted July 22, 2012

WordPad seems to format a little better as well, heffneil.

Quick question regarding one of my drives. This used to be my parity drive, and I decided to go with a 3TB drive for parity instead so that I can use data drives larger than 2TB in the future. I followed your instructions listed here, Joe: http://lime-technology.com/forum/index.php?topic=6126.msg58998#msg58998

I finished the preclear of my old parity drive successfully and, out of curiosity, decided to compare the results with the original ones from some months ago, since this is an existing drive from my array. Everything looked pretty similar for the most part, except this time I noticed the following at the bottom of my start and end reports:

SMART Self-test log structure revision number 1
Num  Test_Description  Status           Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Aborted by host  90%        3104             -
# 2  Extended offline  Aborted by host  80%        3029             -

Should I be alarmed by this? Is this drive OK, or is it on its way out already? I've attached the preclear reports for this drive in the event any additional information is needed.

preclear_start__5YD77WNB_2012-07-21.txt
preclear_rpt__5YD77WNB_2012-07-21.txt
preclear_finish__5YD77WNB_2012-07-21.txt

Joe L. Posted July 22, 2012

Nothing to be alarmed about. Both the "short" and "long" tests are automatically aborted if the drive is spun down.

RokleM Posted July 22, 2012

I just finished pre-clearing an old drive that used to be in a DNS-323 (converting from that unit). I have some concerns with the data in bold. Is this drive a concern? I know it's also one that people seem to be iffy about (1.5t seagate).

========================================================================1.13
== invoked as: ./preclear_disk.sh -c 1 -M 4 -m xxxxxxxxxx /dev/hda
== ST31500541AS 5XW05ES1
== Disk /dev/hda has been successfully precleared
== with a starting sector of 64
== Ran 1 cycle
==
== Using :Read block size = 8225280 Bytes
== Last Cycle's Pre Read Time : 5:44:50 (72 MB/s)
== Last Cycle's Zeroing time : 7:58:24 (52 MB/s)
== Last Cycle's Post Read Time : 11:28:37 (36 MB/s)
== Last Cycle's Total Time : 25:12:52
==
== Total Elapsed Time 25:12:52
==
== Disk Start Temperature: 29C
==
== Current Disk Temperature: 29C,
==
============================================================================
** Changed attributes in files: /tmp/smart_start_hda /tmp/smart_finish_hda
ATTRIBUTE NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS RAW_VALUE
Raw_Read_Error_Rate = 110 113 6 ok 26687111
Seek_Error_Rate = 44 44 30 near_thresh 790278696138
Spin_Retry_Count = 100 100 97 near_thresh 0
End-to-End_Error = 100 100 99 near_thresh 0
High_Fly_Writes = 97 98 0 ok 3
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 1.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
0 sectors are pending re-allocation at the end of the preclear, the number of sectors pending re-allocation did not change.
343 sectors had been re-allocated before the start of the preclear.
343 sectors are re-allocated at the end of the preclear, the number of sectors re-allocated did not change.

downloadski Posted July 22, 2012

I tried to preclear 6 drives at once.

System: Supermicro - X7SPA-HF
CPU: Intel® Atom™ CPU D525 @ 1.80GHz - 1.8 GHz
Cache: 48 kB
Memory: 4 GB - 800 MHz
Network: 1000Mb/s - Full Duplex
5.0-rc5

Drive on mainboard:
1) Hitachi_HDS724040ALE640_PK1311PAG4DNJS (sda) 3907018584

Drives on ARC1300-16:
2) Hitachi_HDS724040ALE640_PK1311PAG4VJ7S (sdd) 3907018584
3) Hitachi_HDS724040ALE640_PK2311PAG4R9WM (sde) 3907018584
4) Hitachi_HDS724040ALE640_PK1311PAG4VMSS (sdf) 3907018584
5) Hitachi_HDS724040ALE640_PK2311PAG4SX0M (sdg) 3907018584
6) Hitachi_HDT725032VLA380_VFA200R2CL7TPA (sdh) 312571224

Results so far:
6) worked OK and finished OK (320 GB)
1) still running
3) still running
4) still running
2) stopped at 1:40 hours in first step
5) stopped at 1:40 hours in first step

I added a part of the syslog with the errors; it seems something crashed. Can someone look and comment on what is wrong here?

Also, I have the syslog filled with these lines:

Jul 22 06:52:52 Tower emhttp: shcmd (1274): /usr/local/sbin/emhttp_event driver_loaded
Jul 22 06:52:52 Tower emhttp_event: driver_loaded
Jul 22 06:52:56 Tower emhttp: shcmd (1275): rmmod md-mod |& logger
Jul 22 06:52:56 Tower emhttp: shcmd (1276): modprobe md-mod super=/boot/config/super.dat slots=21 |& logger
Jul 22 06:52:56 Tower kernel: md: unRAID driver removed
Jul 22 06:52:56 Tower emhttp: shcmd (1277): udevadm settle
Jul 22 06:52:56 Tower kernel: md: unRAID driver 2.1.4 installed
Jul 22 06:52:56 Tower kernel: read_file: error 2 opening /boot/config/super.dat
Jul 22 06:52:56 Tower kernel: md: could not read superblock from /boot/config/super.dat
Jul 22 06:52:56 Tower kernel: md: initializing superblock
Jul 22 06:52:56 Tower emhttp: Device inventory:
Jul 22 06:52:56 Tower emhttp: Hitachi_HDT721010SLA360_STF604MH0RR14B (sdb) 976762584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDS724040ALE640_PK1311PAG4DNJS (sda) 3907018584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDS724040ALE640_PK1311PAG4VJ7S (sdd) 3907018584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDS724040ALE640_PK2311PAG4R9WM (sde) 3907018584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDS724040ALE640_PK1311PAG4VMSS (sdf) 3907018584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDS724040ALE640_PK2311PAG4SX0M (sdg) 3907018584
Jul 22 06:52:56 Tower emhttp: Hitachi_HDT725032VLA380_VFA200R2CL7TPA (sdh) 312571224
Jul 22 06:52:56 Tower kernel: mdcmd (1): import 0 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (2): import 1 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (3): import 2 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (4): import 3 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (5): import 4 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (6): import 5 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (7): import 6 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (8): import 7 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (9): import 8 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (10): import 9 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (11): import 10 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (12): import 11 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (13): import 12 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (14): import 13 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (15): import 14 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (16): import 15 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (17): import 16 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (18): import 17 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (19): import 18 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (20): import 19 0,0
Jul 22 06:52:56 Tower kernel: mdcmd (21): import 20 0,0

What does this mean?

The system is in a cooled room. The ARC 1300 is cooled with a fan. The power supply is a 1000 watt, 80 amp, single-rail Cooler Master Silent Pro.

Thanks,
Jaco

syslog_extract_preclear.txt

Joe L. Posted July 22, 2012

"I just finished pre-clearing an old drive that used to be in a DNS-323 (converting from that unit). I have some concerns with the data in bold. Is this drive a concern? I know it's also one that people seem to be iffy about (1.5t seagate)."

Seek_Error_Rate = 44 44 30 near_thresh 790278696138

I would only be concerned with this parameter, since the normalized value seems to be getting close to its failure threshold, and odds are the starting value was 100 or 200.

343 sectors had been re-allocated before the start of the preclear.
343 sectors are re-allocated at the end of the preclear, the number of sectors re-allocated did not change.

The number of re-allocated sectors did not change, and that is good, but the number 343 is very high, and most people would RMA the drive based only on the number of re-allocated sectors.

Since the seek error rate is iffy and the re-allocated sector count is high, I'd RMA.

(The other parameters that are near their thresholds just have very high thresholds... they are not an issue.)

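As a rough illustration of the reasoning in the reply above, the distance between an attribute's normalized value and its failure threshold can be computed from the "Changed attributes" rows that preclear prints. This is only a sketch: `smart_margin` is a made-up helper name, not part of preclear_disk.sh, and it assumes rows laid out as `NAME = NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS RAW_VALUE`, as in the report quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of preclear_disk.sh): reads rows of the form
#   NAME = NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS RAW_VALUE
# on stdin and prints "NAME margin", where margin = NEW_VAL - FAILURE_THRESHOLD.
# Rows that do not match that shape (such as the header row) are skipped.
smart_margin() {
    awk '$2 == "=" && $3 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ { print $1, $3 - $5 }'
}

# Two rows taken from the report quoted above.
smart_margin <<'EOF'
Seek_Error_Rate = 44 44 30 near_thresh 790278696138
Spin_Retry_Count = 100 100 97 near_thresh 0
EOF
```

For these two rows the margins come out as 14 and 3. A small margin on its own is not conclusive: Spin_Retry_Count sits 3 above its threshold only because the threshold is 97, which is why it is dismissed above, while Seek_Error_Rate at 44 has likely fallen a long way from its starting value.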
RokleM Posted July 22, 2012

Thanks. Surprisingly, it's still under warranty. I will see if SeaTools shows it as bad enough for a warranty claim.

RokleM Posted July 23, 2012

Thoughts on this one?

== Last Cycle's Pre Read Time : 6:11:42 (134 MB/s)
== Last Cycle's Zeroing time : 6:26:03 (129 MB/s)
== Last Cycle's Post Read Time : 12:58:02 (64 MB/s)
** Changed attributes in files: /tmp/smart_start_sde /tmp/smart_finish_sde
ATTRIBUTE NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS RAW_VALUE
Raw_Read_Error_Rate = 118 100 6 ok 183174328
Seek_Error_Rate = 60 100 30 ok 1107922
Spin_Retry_Count = 100 100 97 near_thresh 0
End-to-End_Error = 100 100 99 near_thresh 0
High_Fly_Writes = 99 100 0 ok 1
Airflow_Temperature_Cel = 63 71 45 near_thresh 37
Temperature_Celsius = 37 29 0 ok 37
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 1.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
0 sectors are pending re-allocation at the end of the preclear, the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear, the number of sectors re-allocated did not change.

Joe L. Posted July 23, 2012

Looks fine.

downloadski Posted August 5, 2012 Share Posted August 5, 2012 Can someone tell me what these errors mean during a preclear:" Aug 5 18:24:21 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1952:Release slot [3] tag[3], task [d6eaf900]: Aug 5 18:24:21 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 0000000F, slot [3]. Aug 5 18:24:21 Tower kernel: sas: sas_ata_task_done: SAS error 8a Aug 5 18:24:21 Tower kernel: sd 0:0:0:0: [sdh] command f2f76480 timed out Aug 5 18:24:21 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1952:Release slot [1] tag[1], task [d6eaf2c0]: Aug 5 18:24:21 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000007, slot [1]. Aug 5 18:24:21 Tower kernel: sas: sas_ata_task_done: SAS error 8a Aug 5 18:24:21 Tower kernel: sd 0:0:0:0: [sdh] command f2f760c0 timed out Aug 5 18:24:51 Tower kernel: sd 0:0:1:0: [sdi] command f2d75780 timed out Aug 5 18:24:51 Tower kernel: sd 0:0:1:0: [sdi] command f2e5c240 timed out Aug 5 18:24:51 Tower kernel: sas: Enter sas_scsi_recover_host busy: 4 failed: 4 Aug 5 18:24:51 Tower kernel: sas: trying to find task 0xd4febb80 Aug 5 18:24:51 Tower kernel: sas: sas_scsi_find_task: aborting task 0xd4febb80 Aug 5 18:24:51 Tower kernel: sas: sas_scsi_find_task: task 0xd4febb80 is aborted Aug 5 18:24:51 Tower kernel: sas: sas_eh_handle_sas_errors: task 0xd4febb80 is aborted Aug 5 18:24:51 Tower kernel: sas: trying to find task 0xde7f5680 Aug 5 18:24:51 Tower kernel: sas: sas_scsi_find_task: aborting task 0xde7f5680 Aug 5 18:24:51 Tower kernel: sas: sas_scsi_find_task: task 0xde7f5680 is aborted Aug 5 18:24:51 Tower kernel: sas: sas_eh_handle_sas_errors: task 0xde7f5680 is aborted Aug 5 18:24:51 Tower kernel: sas: ata7: end_device-0:0: cmd error handler Aug 5 18:24:51 Tower kernel: sas: ata8: end_device-0:1: cmd error handler Aug 5 18:24:51 Tower kernel: sas: ata7: end_device-0:0: dev error handler Aug 5 18:24:51 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004, slot [0]. 
Aug 5 18:24:51 Tower kernel: sas: ata8: end_device-0:1: dev error handler Aug 5 18:24:51 Tower kernel: ata8.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen Aug 5 18:24:51 Tower kernel: ata8.00: failed command: READ FPDMA QUEUED Aug 5 18:24:51 Tower kernel: ata7.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 Aug 5 18:24:51 Tower kernel: ata8.00: cmd 60/00:00:40:16:ef/01:00:6f:00:00/40 tag 0 ncq 131072 in Aug 5 18:24:51 Tower kernel: res 40/00:04:78:78:d7/00:00:6f:00:00/40 Emask 0x4 (timeout) Aug 5 18:24:51 Tower kernel: ata7.00: failed command: READ FPDMA QUEUED Aug 5 18:24:51 Tower kernel: ata7.00: cmd 60/00:00:30:8d:8e/01:00:72:00:00/40 tag 0 ncq 131072 in Aug 5 18:24:51 Tower kernel: res ff/3f:3f:37:c8:10/00:00:56:88:2a/00 Emask 0x403 (HSM violation) <F> Aug 5 18:24:51 Tower kernel: ata7.00: status: { Busy } Aug 5 18:24:51 Tower kernel: ata7.00: error: { IDNF ABRT } Aug 5 18:24:51 Tower kernel: ata8.00: status: { DRDY } Aug 5 18:24:51 Tower kernel: ata7.00: failed command: READ FPDMA QUEUED Aug 5 18:24:51 Tower kernel: ata7.00: cmd 60/00:00:30:8e:8e/01:00:72:00:00/40 tag 1 ncq 131072 in Aug 5 18:24:51 Tower kernel: res 01/04:04:30:8d:8e/00:00:72:00:00/40 Emask 0x2 (HSM violation) Aug 5 18:24:51 Tower kernel: ata7.00: status: { ERR } Aug 5 18:24:51 Tower kernel: ata7.00: error: { ABRT } Aug 5 18:24:51 Tower kernel: ata8.00: failed command: READ FPDMA QUEUED Aug 5 18:24:51 Tower kernel: ata8.00: cmd 60/00:00:40:17:ef/01:00:6f:00:00/40 tag 1 ncq 131072 in Aug 5 18:24:51 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Aug 5 18:24:51 Tower kernel: ata7: hard resetting link Aug 5 18:24:51 Tower kernel: ata8.00: status: { DRDY } Aug 5 18:24:51 Tower kernel: ata8: hard resetting link Aug 5 18:24:51 Tower kernel: sas: sas_form_port: phy3 belongs to port1 already(1)! Aug 5 18:24:51 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004, slot [0]. 
Aug 5 18:24:51 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004, slot [0]. Aug 5 18:24:51 Tower kernel: ata7.00: both IDENTIFYs aborted, assuming NODEV Aug 5 18:24:51 Tower kernel: ata7.00: revalidation failed (errno=-2) Aug 5 18:24:53 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1522:mvs_I_T_nexus_reset for device[1]:rc= 0 Aug 5 18:24:54 Tower kernel: ata8.00: configured for UDMA/133 Aug 5 18:24:54 Tower kernel: ata8.00: device reported invalid CHS sector 0 Aug 5 18:24:54 Tower kernel: ata8: EH complete Aug 5 18:24:56 Tower kernel: ata7: hard resetting link Aug 5 18:24:56 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004, slot [0]. Aug 5 18:24:56 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004, slot [0]. Aug 5 18:24:56 Tower kernel: ata7.00: both IDENTIFYs aborted, assuming NODEV Aug 5 18:24:56 Tower kernel: ata7.00: revalidation failed (errno=-2) Aug 5 18:25:01 Tower kernel: ata7: hard resetting link Aug 5 18:25:02 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004, slot [0]. Aug 5 18:25:02 Tower kernel: drivers/scsi/mvsas/mv_94xx.c 626:command active 00000004, slot [0]. 
Aug 5 18:25:02 Tower kernel: ata7.00: both IDENTIFYs aborted, assuming NODEV Aug 5 18:25:02 Tower kernel: ata7.00: revalidation failed (errno=-2) Aug 5 18:25:02 Tower kernel: ata7.00: disabled Aug 5 18:25:02 Tower kernel: ata7: EH complete Aug 5 18:25:02 Tower kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Result: hostbyte=0x04 driverbyte=0x00 Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e 8e 30 00 01 00 00 Aug 5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921945136 Aug 5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243142 Aug 5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243143 Aug 5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243144 Aug 5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243145 Aug 5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243146 Aug 5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243147 Aug 5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243148 Aug 5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243149 Aug 5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243150 Aug 5 18:25:02 Tower kernel: Buffer I/O error on device sdh, logical block 240243151 Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Result: hostbyte=0x04 driverbyte=0x00 Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e 8d 30 00 01 00 00 Aug 5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921944880 Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Result: hostbyte=0x04 driverbyte=0x00 Aug 5 18:25:02 
Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e 8d 30 00 00 08 00 Aug 5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921944880 Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Result: hostbyte=0x04 driverbyte=0x00 Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e 8d 30 00 00 08 00 Aug 5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921944880 Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Result: hostbyte=0x04 driverbyte=0x00 Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e aa 50 00 00 20 00 Aug 5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921952336 Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] Result: hostbyte=0x04 driverbyte=0x00 Aug 5 18:25:02 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 72 8e aa 50 00 00 08 00 Aug 5 18:25:02 Tower kernel: end_request: I/O error, dev sdh, sector 1921952336

The preclear was at 65%. I have many thousands of these errors, and I stopped the preclear. In the beginning it was multiple sectors; in the last 8000 lines it complains about only one sector:

Aug 5 18:44:28 Tower kernel: end_request: I/O error, dev sdh, sector 2930275776
Aug 5 18:44:28 Tower kernel: sd 0:0:0:0: [sdh] Unhandled error code
Aug 5 18:44:28 Tower kernel: sd 0:0:0:0: [sdh] Result: hostbyte=0x04 driverbyte=0x00
Aug 5 18:44:28 Tower kernel: sd 0:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 ae a8 75 c0 00 00 08 00
Aug 5 18:44:28 Tower kernel: end_request: I/O error, dev sdh, sector 2930275776

What I did:
- moved the HDD over to another SATA data and power port (SFF-8087 to 4 SATA) on which another disk precleared fine

I am running 5.0 RC6. This disk is connected to a Supermicro AOC-SAS2LP-MV8.

Is the HDD bad, and perhaps remapping bad sectors? I can imagine that if this takes too long, unRAID times out.

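The sector numbers in the log above can be cross-checked against the CDB bytes: in a SCSI READ(10) command (opcode 0x28), bytes 2 through 5 carry the starting logical block address, most significant byte first. The sketch below decodes that field; `lba_from_cdb` is a hypothetical helper name chosen for this example, and it handles only READ(10).

```shell
#!/bin/sh
# Hypothetical helper: decode the starting LBA from a READ(10) CDB given as
# space-separated hex bytes, e.g. "28 00 72 8e 8e 30 00 01 00 00".
lba_from_cdb() {
    # Split the argument into positional parameters: $1 is byte 0, $3..$6 are bytes 2..5.
    set -- $1
    # Only READ(10), opcode 0x28, is handled in this sketch.
    [ "$1" = "28" ] || return 1
    # Bytes 2..5 form the big-endian logical block address.
    printf '%d\n' "0x$3$4$5$6"
}

lba_from_cdb "28 00 72 8e 8e 30 00 01 00 00"   # prints 1921945136
```

That value matches the "I/O error, dev sdh, sector 1921945136" line that follows the same CDB in the log. Dividing it by 8 gives 240243142, the first "Buffer I/O error ... logical block" number, which is consistent with 4096-byte buffer blocks over 512-byte sectors (an assumption about this system, not something the log states).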
Joe L. Posted August 5, 2012

It means the disk is timing out when communications to it are attempted. It could be a bad disk, a bad disk controller, a poor power supply, or poor-quality splitter/drive cage connections.

Notice there are TWO disks involved. They may share a common controller, or one might be causing the lock-up of the other sharing a disk controller.

Sometimes it is just the drive that is confused and a power cycle will fix it; other times it will not.

Get a SMART report of the drives involved:

smartctl -a /dev/sdi
smartctl -a /dev/sdh

downloadski Posted August 6, 2012

Thanks, I will do that once I get back home today after work. The sdi drive seemed to continue just fine, so I guess I will have the 3 preclear files for that one.

The drives are on one power cable with 4 SATA connectors (2 used). It is the original cable that came with the power supply (monster realpower M1000). No splitters there.

The SATA cable is an SFF-8087 split into 4 SATA data connectors. I did the 2nd preclear of this sdh drive on another SATA data connector, one on which I had precleared a disk fine.

Is there a reason to suspect the quality of these SFF-8087 breakout cables? http://cybershop.ri-vier.nl/discrete-sff8087-to-4x-sata-mini-sas-forward-brkout-cable-l50-p-103.html

edit: they look a lot like these: mono price for $9.63 (paid like 20 euro)

I had massive issues on my Areca also. If these things could be caused by the cables, it would perhaps be wise to buy other ones to test.

edit (8-8-2012): the 3rd go failed as well. According to Seagate I still have warranty on this disk, so I will go for a swap of the HDD. It seems I will get a Seagate back; I do not know if I am very happy with that.

downloadski Posted August 6, 2012

Other SATA data cable, other power cable, same drive, same errors at 65%. The other drive of the same type continues with its preclear. So it is a drive issue, I assume; I will put it aside.

bcpratt Posted August 7, 2012

I'm attaching reports generated from a preclear of a disk I'm planning to use in my unRAID server. It looks like it would be safe to use - no "FAILING_NOW" issues. To get things going initially I'm taking the Frankenstein approach and cobbling old hardware together. Once things seem to be operating smoothly and more space is needed I will expand with new pieces. Currently I'm just interested in getting the server up and running so I can test out the capabilities.

Your opinion after taking a look at the reports is appreciated. Thanks.

preclear_250GB.zip

Joe L. Posted August 7, 2012

Your disk looks fine. There are only two items in the SMART report worth mentioning:

9 Power_On_Hours 0x0032 061 061 000 Old_age Always - 34907
199 UDMA_CRC_Error_Count 0x003e 200 197 000 Old_age Always - 34

The first is the run-time hours (it has been in operation for about 4 years).

The UDMA CRC errors are usually noise pickup from cables. (Try NOT to be anal with cable management unless you use good quality SHIELDED cables.) Do not tie-wrap SATA cables together, and definitely not with power cables. The errors are not bad, but you should be aware of their cause.

Lastly, I'd much rather trust an older drive such as this than a brand new untested drive.

Good luck with your test server.

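The "about 4 years" figure in the reply above follows directly from the Power_On_Hours raw value. A quick sketch of the arithmetic, assuming the drive ran around the clock (24 hours a day, 365 days a year); `hours_to_years` is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: convert a Power_On_Hours raw value to years of
# continuous operation, assuming 24 hours a day and 365 days a year.
hours_to_years() {
    awk -v h="$1" 'BEGIN { printf "%.1f\n", h / (24 * 365) }'
}

hours_to_years 34907   # prints 4.0
```

The exact quotient is 34907 / 8760, a little under 3.99, which rounds to the 4 years mentioned. A drive that was spun down or powered off part of the time would of course be older in calendar terms.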
petsheep Posted August 17, 2012

I've precleared many, many disks, and this is the first time I'm noticing something about the Post-Read. All my array disks are spun up and they are being read from. What's going on? Have I never noticed that before? No writes are being done, just reads. Should I be concerned?

The PuTTY window tells me this:
( 1,748,694,528,000 of 2,000,398,934,016 bytes read ) 109 MB/s
Disk Temperature: 32C, Elapsed Time: 20:57:25

unMENU tells me this:
Post-Read (1 of 3). 87% @ 41 MB/s (20:57:25)

Thanks

servion Posted August 27, 2012

I'm 21 hours into clearing a new WD 2TB EARX, and it's 50% through the post-read. The post-read MB/s is substantially lower than the pre-read MB/s, even at the same point in the read cycle. Why is this? Is this normal?

For the post-read 50% complete email (which came at 21 hours in), the email subject header says 79.2 MB/s, but the message body contains "Calculated Read Speed: 46 MB/s". Compare that to the pre-read 50% email: subject header 82.6 MB/s, email body "88 MB/s".

However, if I look at the screen's actual output of the currently running process, it is currently reading in the high 70s to low 80s MB/s.

mr-hexen Posted August 27, 2012

This is normal. Pre-read just reads to null; post-read reads and compares.

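To illustrate the difference described above: a read to null only has to pull the data off the disk, while a verifying read also has to compare every byte against what is expected. The sketch below is not taken from preclear_disk.sh; it only mimics the two kinds of pass on an ordinary scratch file. The function names `read_only_pass` and `read_and_compare_pass` are invented for this example, and it assumes a `cmp` that supports the `-n` byte-limit option (GNU diffutils does).

```shell
#!/bin/sh
# Pass 1: read the whole file and discard the data (a "read to null").
read_only_pass() {
    dd if="$1" of=/dev/null bs=1048576 2>/dev/null
}

# Pass 2: read the whole file and compare it against zeros.
# Succeeds only if every byte of the file is zero.
read_and_compare_pass() {
    # Arithmetic expansion strips any padding that wc may print.
    size=$(( $(wc -c < "$1") ))
    cmp -s -n "$size" "$1" /dev/zero
}

# Demo on a small zero-filled scratch file instead of a real disk.
scratch=$(mktemp)
dd if=/dev/zero of="$scratch" bs=1024 count=64 2>/dev/null
read_only_pass "$scratch" && echo "read pass done"
read_and_compare_pass "$scratch" && echo "all zeros"
rm -f "$scratch"
```

The extra comparison work is one plausible reason a post-read reports a lower rate than a pre-read on the same disk; the actual script may verify the data differently.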
Harpz Posted August 27, 2012

Hi,

Could someone have a look at my results and let me know if there is anything I should worry about? I'm still learning loads and enjoying my unRAID experience, and I have upgraded to a Plus licence now.

The 2x 1 TB are my oldest drives, and the Samsung has come out of my current machine, which I'm in the process of moving all the data from.

preclear_results_2TB_WD-WMAZA9227899.txt
preclear_results_2TB_WD-WMAZA8726413.txt
preclear_results_2TB_SAMSUNG_HD204UI.txt
preclear_results_1TB_WD-WCAV5D092592.txt

Harpz Posted August 27, 2012

Final result for the above:

preclear_results_1TB_WD-WCAU44987157.txt

drumstyk1 Posted August 30, 2012

Hey guys, I just finished preclearing a brand new drive that will be used for my parity, and it doesn't seem to have passed. Would someone mind looking at this and letting me know if I should be concerned or if an RMA is in order? Thanks!

preclear_finish__WD-WCAZAF085957_2012-08-31.txt
preclear_results.txt

Neo_x Posted September 1, 2012

Hi guys

nvm - Solved - I discovered after studying the script's usage text that the following command is possible:

preclear_disk.sh -d sat /dev/sda

This instructs preclear to use an alternate device type when running smartctl.

Hope someone can assist - I am running unRAID via an Adaptec controller (Model 52445). It is rather overkill for unRAID since it is meant for high levels of RAID, but I didn't want to shell out additional $$$'s to get another controller. Currently it is performing admirably, with roughly 64MB/s on post-read while clearing 12 drives at the same time.

Everything seems fine, i.e. I set all the disks up as JBOD, which seems to simulate pass-through (not sure of the correct terms). Anyhow, so far so good. unRAID picks up the first set of 12 drives I connected (I am having power issues with connecting more).

The problem I am having: it seems that smartctl doesn't give correct stats on the drive. See sample output below.

root@Storage:~# smartctl -a /dev/sdb
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Device: ST2000DL003-9VT1 Version: CC3C
Serial number: 6YD1RLL6
Device type: disk
Transport protocol: SAS
Local Time is: Sat Sep 1 08:46:46 2012 SAST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Error Counter logging not supported
Device does not support Self Test logging
root@Storage:~#

As can be expected, this messes up preclear a bit, since it is unable to read SMART results before, during and after.
(although otherwise - it doesn't crash or halt the preclear in any way - GREAT SCRIPT JOE L ) I managed to find a smartctl command that does give the output for the drive as required Thank you Google root@Storage:~# smartctl -d sat --all /dev/sg1 smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build) Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net === START OF INFORMATION SECTION === Device Model: ST2000DL003-9VT166 Serial Number: 6YD1RLL6 Firmware Version: CC3C User Capacity: 2,000,398,934,016 bytes Device is: Not in smartctl database [for details use: -P showall] ATA Version is: 8 ATA Standard is: ATA-8-ACS revision 4 Local Time is: Sat Sep 1 08:47:26 2012 SAST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED See vendor-specific Attribute list for marginal Attributes. General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 612) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 255) minutes. Conveyance self-test routine recommended polling time: ( 2) minutes. 
SCT capabilities: (0x30b7) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 109 099 006 Pre-fail Always - 24419656
3 Spin_Up_Time 0x0003 090 090 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 286
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 073 060 030 Pre-fail Always - 4318984658
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 3906
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 286
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 099 099 000 Old_age Always - 1
190 Airflow_Temperature_Cel 0x0022 059 023 045 Old_age Always In_the_past 41 (75 200 42 25)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 285
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 286
194 Temperature_Celsius 0x0022 041 077 000 Old_age Always - 41 (0 14 0 0)
195 Hardware_ECC_Recovered 0x001a 036 015 000 Old_age Always - 24419656
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 66043712114499
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 2825939409
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 3158544765

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

So I guess my question is then: how do I go about trusting a drive after a preclear? Reading through some of the posts, it seems I need to look for the following:
FAILING_NOW attributes
5 Reallocated_Sector_Ct (this should preferably be zero, or else stay a very low number)
197 Current_Pending_Sector (this should preferably be zero, or else stay a very low number)

Also, should I be worried about which "device" I am clearing? (I gather that SDB and SDG are possibly the same thing...)

The clear should be completed in about 6 hours. I will report any results I don't understand.

Thank you,
Neo_x

PS: syslog attached just in case.
syslog.zip
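The checks listed in the post above (FAILING_NOW attributes, plus non-zero raw values for 5 Reallocated_Sector_Ct and 197 Current_Pending_Sector) can be sketched as a small script. This is only an illustration and not part of preclear_disk.sh: the function name `check_smart_report` is hypothetical, and it assumes the ten-column attribute table layout that smartctl prints in the reports quoted in this thread. A "very low number" is a judgment call, so the sketch simply flags anything above zero for a human to look at.

```python
import re

# Attribute IDs singled out in the post; names as printed by smartctl.
WATCHED_IDS = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector"}

# One row of the "Vendor Specific SMART Attributes with Thresholds" table:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>\d+)"
)


def check_smart_report(text):
    """Return a list of warning strings for a smartctl attribute table.

    Hypothetical helper: it flags rows marked FAILING_NOW and watched
    attributes whose raw value is above zero. Lines that are not
    attribute rows are ignored.
    """
    warnings = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, blank line, or other smartctl output
        attr_id = int(match["id"])
        raw = int(match["raw"])
        if match["when_failed"] == "FAILING_NOW":
            warnings.append(f"{match['name']} is FAILING_NOW")
        if attr_id in WATCHED_IDS and raw > 0:
            warnings.append(f"{match['name']} raw value is {raw}, expected 0")
    return warnings


if __name__ == "__main__":
    # Two rows copied from the report above.
    sample = (
        "5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0\n"
        "197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0\n"
    )
    for warning in check_smart_report(sample) or ["nothing flagged"]:
        print(warning)
```

An empty result only means these particular checks found nothing. Comparing the start and finish reports, as the preclear script already does, is still the better guide to whether the counts are stable.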
bender1 Posted September 1, 2012
Just posting a result. I'm preclearing 3 drives, and one of them just finished in what seems like decent time: the WD10EALS 1 TB finished in 11 hours 32 minutes. This particular drive wasn't listed in the preclear times. Should I add it to the wiki?
bender1 Posted September 1, 2012
Hello all, I am new to unraid, just in the midst of preclearing and the first disk finished. Anything in here I need to be concerned with?

smartctl -a -d ata /dev/sdb (--)
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model: WDC WD10EALS-00Z8A0
Serial Number: WD-WCATR4743930
Firmware Version: 05.01D05
User Capacity: 1,000,204,886,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sat Sep 1 07:44:19 2012 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (16500) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 191) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x3037) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 178 172 021 Pre-fail Always - 4075
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 505
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 092 092 000 Old_age Always - 5903
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 147
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 77
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 427
194 Temperature_Celsius 0x0022 106 102 000 Old_age Always - 41
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.