mifronte Posted February 21, 2013
Today I noticed red error messages in my log regarding disk 18. Is my disk failing? Is there a recommended command to run for further diagnosis?
Here is the SMART report:
smartctl -a -d ata /dev/sdg (disk18)
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: Hitachi HDS5C3020ALA632
Serial Number: **************
Firmware Version: ML6OA580
User Capacity: 2,000,398,934,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Thu Feb 21 13:07:17 2013 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (23645) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported. SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b 099   099   016    Pre-fail Always  -           131072
  2 Throughput_Performance  0x0005 132   132   054    Pre-fail Offline -           111
  3 Spin_Up_Time            0x0007 135   135   024    Pre-fail Always  -           405 (Average 405)
  4 Start_Stop_Count        0x0012 100   100   000    Old_age  Always  -           813
  5 Reallocated_Sector_Ct   0x0033 100   100   005    Pre-fail Always  -           1
  7 Seek_Error_Rate         0x000b 100   100   067    Pre-fail Always  -           0
  8 Seek_Time_Performance   0x0005 148   148   020    Pre-fail Offline -           28
  9 Power_On_Hours          0x0012 099   099   000    Old_age  Always  -           13606
 10 Spin_Retry_Count        0x0013 100   100   060    Pre-fail Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           4
192 Power-Off_Retract_Count 0x0032 100   100   000    Old_age  Always  -           814
193 Load_Cycle_Count        0x0012 100   100   000    Old_age  Always  -           814
194 Temperature_Celsius     0x0002 214   214   000    Old_age  Always  -           28 (Lifetime Min/Max 17/33)
196 Reallocated_Event_Count 0x0032 100   100   000    Old_age  Always  -           1
197 Current_Pending_Sector  0x0022 100   100   000    Old_age  Always  -           16
198 Offline_Uncorrectable   0x0008 100   100   000    Old_age  Offline -           11
199 UDMA_CRC_Error_Count    0x000a 200   200   000    Old_age  Always  -           0
SMART Error Log Version: 1
ATA Error Count: 14 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 14 occurred at disk power-on lifetime: 13604 hours (566 days + 20 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 ff 28 a6 2a 03 Error: UNC 255 sectors at LBA = 0x032aa628 = 53126696 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 25 00 00 27 a6 2a e0 08 5d+23:54:53.200 READ DMA EXT 27 00 00 00 00 00 e0 08 5d+23:54:53.180 READ NATIVE MAX ADDRESS EXT ec 00 00 00 00 00 a0 08 5d+23:54:53.160 IDENTIFY DEVICE ef 03 46 00 00 00 a0 08 5d+23:54:53.140 SET FEATURES [set transfer mode] 27 00 00 00 00 00 e0 08 5d+23:54:53.121 READ NATIVE MAX ADDRESS EXT Error 13 occurred at disk power-on lifetime: 13604 hours (566 days + 20 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 ff 28 a6 2a 03 Error: UNC 255 sectors at LBA = 0x032aa628 = 53126696 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 25 00 00 27 a6 2a e0 08 5d+23:54:49.294 READ DMA EXT 27 00 00 00 00 00 e0 08 5d+23:54:49.275 READ NATIVE MAX ADDRESS EXT ec 00 00 00 00 00 a0 08 5d+23:54:49.254 IDENTIFY DEVICE ef 03 46 00 00 00 a0 08 5d+23:54:49.235 SET FEATURES [set transfer mode] 27 00 00 00 00 00 e0 08 5d+23:54:49.215 READ NATIVE MAX ADDRESS EXT Error 12 occurred at disk power-on lifetime: 13604 hours (566 days + 20 hours) When the command that caused the error occurred, the device was active or idle. 
After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 ff 28 a6 2a 03 Error: UNC 255 sectors at LBA = 0x032aa628 = 53126696 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 25 00 00 27 a6 2a e0 08 5d+23:54:45.388 READ DMA EXT 27 00 00 00 00 00 e0 08 5d+23:54:45.369 READ NATIVE MAX ADDRESS EXT ec 00 00 00 00 00 a0 08 5d+23:54:45.349 IDENTIFY DEVICE ef 03 46 00 00 00 a0 08 5d+23:54:45.329 SET FEATURES [set transfer mode] 27 00 00 00 00 00 e0 08 5d+23:54:45.309 READ NATIVE MAX ADDRESS EXT Error 11 occurred at disk power-on lifetime: 13604 hours (566 days + 20 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 ff 28 a6 2a 03 Error: UNC 255 sectors at LBA = 0x032aa628 = 53126696 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 25 00 00 27 a6 2a e0 08 5d+23:54:41.483 READ DMA EXT 27 00 00 00 00 00 e0 08 5d+23:54:41.463 READ NATIVE MAX ADDRESS EXT ec 00 00 00 00 00 a0 08 5d+23:54:41.443 IDENTIFY DEVICE ef 03 46 00 00 00 a0 08 5d+23:54:41.423 SET FEATURES [set transfer mode] 27 00 00 00 00 00 e0 08 5d+23:54:41.403 READ NATIVE MAX ADDRESS EXT Error 10 occurred at disk power-on lifetime: 13604 hours (566 days + 20 hours) When the command that caused the error occurred, the device was active or idle. 
After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 ff 28 a6 2a 03 Error: UNC 255 sectors at LBA = 0x032aa628 = 53126696 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 25 00 00 27 a6 2a e0 08 5d+23:54:37.577 READ DMA EXT 27 00 00 00 00 00 e0 08 5d+23:54:37.557 READ NATIVE MAX ADDRESS EXT ec 00 00 00 00 00 a0 08 5d+23:54:37.537 IDENTIFY DEVICE ef 03 46 00 00 00 a0 08 5d+23:54:37.517 SET FEATURES [set transfer mode] 27 00 00 00 00 00 e0 08 5d+23:54:37.498 READ NATIVE MAX ADDRESS EXT SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Extended offline Completed without error 00% 1539 - # 2 Short offline Completed without error 00% 1533 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. 
Here is a sample of error messages in the syslog: Feb 21 10:30:26 Beanstalk shfs: shfs_setxattr: setxattr: /mnt/disk1/Data/Investing/E0400CF3.tmp (95) Operation not supported (Drive related) Feb 21 10:30:26 Beanstalk shfs: shfs_setxattr: setxattr: /mnt/disk1/Data/Investing/E0400CF3.tmp (95) Operation not supported (Drive related) Feb 21 10:30:26 Beanstalk shfs: shfs_setxattr: setxattr: /mnt/disk1/Data/Investing/StockInvestor.xls (95) Operation not supported (Drive related) Feb 21 11:11:15 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors) Feb 21 11:11:15 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors) Feb 21 11:11:15 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues) Feb 21 11:11:15 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related) Feb 21 11:11:15 Beanstalk kernel: res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors) Feb 21 11:11:15 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related) Feb 21 11:11:15 Beanstalk kernel: ata7.00: error: { UNC } (Errors) Feb 21 11:11:15 Beanstalk kernel: ata7: hard resetting link (Minor Issues) Feb 21 11:11:16 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related) Feb 21 11:11:16 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related) Feb 21 11:11:16 Beanstalk kernel: ata7: EH complete (Drive related) Feb 21 11:11:19 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors) Feb 21 11:11:19 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors) Feb 21 11:11:19 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues) Feb 21 11:11:19 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related) Feb 21 11:11:19 Beanstalk kernel: res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 
Emask 0x9 (media error) (Errors) Feb 21 11:11:19 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related) Feb 21 11:11:19 Beanstalk kernel: ata7.00: error: { UNC } (Errors) Feb 21 11:11:19 Beanstalk kernel: ata7: hard resetting link (Minor Issues) Feb 21 11:11:20 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related) Feb 21 11:11:20 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related) Feb 21 11:11:20 Beanstalk kernel: ata7: EH complete (Drive related) Feb 21 11:11:23 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors) Feb 21 11:11:23 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors) Feb 21 11:11:23 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues) Feb 21 11:11:23 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related) Feb 21 11:11:23 Beanstalk kernel: res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors) Feb 21 11:11:23 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related) Feb 21 11:11:23 Beanstalk kernel: ata7.00: error: { UNC } (Errors) Feb 21 11:11:23 Beanstalk kernel: ata7: hard resetting link (Minor Issues) Feb 21 11:11:24 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related) Feb 21 11:11:24 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related) Feb 21 11:11:24 Beanstalk kernel: ata7: EH complete (Drive related) Feb 21 11:11:27 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors) Feb 21 11:11:27 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors) Feb 21 11:11:27 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues) Feb 21 11:11:27 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related) Feb 21 11:11:27 Beanstalk kernel: res 
51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors) Feb 21 11:11:27 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related) Feb 21 11:11:27 Beanstalk kernel: ata7.00: error: { UNC } (Errors) Feb 21 11:11:27 Beanstalk kernel: ata7: hard resetting link (Minor Issues) Feb 21 11:11:28 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related) Feb 21 11:11:28 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related) Feb 21 11:11:28 Beanstalk kernel: ata7: EH complete (Drive related) Feb 21 11:11:31 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors) Feb 21 11:11:31 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors) Feb 21 11:11:31 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues) Feb 21 11:11:31 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related) Feb 21 11:11:31 Beanstalk kernel: res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors) Feb 21 11:11:31 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related) Feb 21 11:11:31 Beanstalk kernel: ata7.00: error: { UNC } (Errors) Feb 21 11:11:31 Beanstalk kernel: ata7: hard resetting link (Minor Issues) Feb 21 11:11:31 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related) Feb 21 11:11:32 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related) Feb 21 11:11:32 Beanstalk kernel: ata7: EH complete (Drive related) Feb 21 11:11:35 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors) Feb 21 11:11:35 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors) Feb 21 11:11:35 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues) Feb 21 11:11:35 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related) Feb 
21 11:11:35 Beanstalk kernel: res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors) Feb 21 11:11:35 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related) Feb 21 11:11:35 Beanstalk kernel: ata7.00: error: { UNC } (Errors) Feb 21 11:11:35 Beanstalk kernel: ata7: hard resetting link (Minor Issues) Feb 21 11:11:35 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related) Feb 21 11:11:35 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related) Feb 21 11:11:35 Beanstalk kernel: sd 6:0:0:0: [sdg] Unhandled sense code (Drive related) Feb 21 11:11:35 Beanstalk kernel: sd 6:0:0:0: [sdg] Result: hostbyte=0x00 driverbyte=0x08 (System) Feb 21 11:11:35 Beanstalk kernel: sd 6:0:0:0: [sdg] Sense Key : 0x3 [current] [descriptor] (Drive related) Feb 21 11:11:35 Beanstalk kernel: Descriptor sense data with sense descriptors (in hex): Feb 21 11:11:35 Beanstalk kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 Feb 21 11:11:35 Beanstalk kernel: c3 2a a6 28 Feb 21 11:11:35 Beanstalk kernel: sd 6:0:0:0: [sdg] ASC=0x11 ASCQ=0x4 (Drive related) Feb 21 11:11:35 Beanstalk kernel: sd 6:0:0:0: [sdg] CDB: cdb[0]=0x28: 28 00 c3 2a a6 27 00 04 00 00 (Drive related) Feb 21 11:11:35 Beanstalk kernel: end_request: I/O error, dev sdg, sector 3274352168 (Errors) Feb 21 11:11:35 Beanstalk kernel: ata7: EH complete (Drive related) Feb 21 11:11:35 Beanstalk kernel: md: disk18 read error (Errors) Feb 21 11:11:35 Beanstalk kernel: handle_stripe read error: 3274352104/17, count: 1 (Errors) Feb 21 11:11:35 Beanstalk kernel: md: disk18 read error (Errors) Feb 21 11:11:35 Beanstalk kernel: handle_stripe read error: 3274352112/17, count: 1 (Errors) Feb 21 11:11:35 Beanstalk kernel: md: disk18 read error (Errors) ... Link to comment
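The SMART error log above reports the failing read at LBA 53126696, while the kernel reports an I/O error at sector 3274352168. These appear to be the same location: the SMART summary log only carries a 28-bit address, whereas the kernel's `res` line carries the full 48-bit taskfile. The following is a minimal Python sketch of that decoding. The field order of the `res` line (status/error:nsect:lbal:lbam:lbah/hob bytes/device) is an assumption based on how the Linux libata driver prints it, and the helper name `decode_res_lba` is made up for this illustration.

```python
import re

# Assumed layout of the libata "res" report:
#   res STATUS/ERROR:NSECT:LBAL:LBAM:LBAH/HOB_FEAT:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEVICE
RES_RE = re.compile(
    r"res\s+[0-9a-f]{2}/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})",
    re.IGNORECASE,
)


def decode_res_lba(line):
    """Return (lba28, lba48) decoded from a kernel 'res' taskfile line."""
    match = RES_RE.search(line)
    if match is None:
        raise ValueError("no 'res' taskfile found in line")
    _err, _nsect, lbal, lbam, lbah = (int(b, 16) for b in match.group(1).split(":"))
    _hfeat, _hnsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in match.group(2).split(":")
    )
    device = int(match.group(3), 16)
    low24 = (lbah << 16) | (lbam << 8) | lbal
    # 28-bit view: the top four bits come from the low nibble of the device register.
    lba28 = ((device & 0x0F) << 24) | low24
    # 48-bit view: the top three bytes come from the "high order byte" registers.
    lba48 = (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) | low24
    return lba28, lba48


SAMPLE = ("Feb 21 11:11:15 Beanstalk kernel: res "
          "51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error)")

lba28, lba48 = decode_res_lba(SAMPLE)
print(lba28)  # 53126696, the LBA shown in the SMART error log
print(lba48)  # 3274352168, the sector in the kernel's end_request message
```

Both numbers share the low 24 bits (0x2aa628), which is why the two logs look different but most likely point at the same bad sector.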
mifronte Posted February 22, 2013
I forgot to mention that I am running version 4.7. Current_Pending_Sector was at 16 and is now at 3 after a parity check. Should I:
1. Rebuild this drive by unassigning it and then reassigning it in the array?
2. Replace it with a backup (let unRAID rebuild the new disk) and then run preclear on the suspect disk?
3. Replace it and just RMA the drive because it is dying?
SMART information after the parity check:
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: Hitachi HDS5C3020ALA632
Serial Number: **********
Firmware Version: ML6OA580
User Capacity: 2,000,398,934,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Thu Feb 21 23:09:39 2013 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (23645) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
SCT capabilities: (0x003d) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b 100   100   016    Pre-fail Always  -           0
  2 Throughput_Performance  0x0005 132   132   054    Pre-fail Offline -           111
  3 Spin_Up_Time            0x0007 149   149   024    Pre-fail Always  -           333 (Average 405)
  4 Start_Stop_Count        0x0012 100   100   000    Old_age  Always  -           818
  5 Reallocated_Sector_Ct   0x0033 100   100   005    Pre-fail Always  -           1
  7 Seek_Error_Rate         0x000b 100   100   067    Pre-fail Always  -           0
  8 Seek_Time_Performance   0x0005 148   148   020    Pre-fail Offline -           28
  9 Power_On_Hours          0x0012 099   099   000    Old_age  Always  -           13616
 10 Spin_Retry_Count        0x0013 100   100   060    Pre-fail Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           4
192 Power-Off_Retract_Count 0x0032 100   100   000    Old_age  Always  -           819
193 Load_Cycle_Count        0x0012 100   100   000    Old_age  Always  -           819
194 Temperature_Celsius     0x0002 206   206   000    Old_age  Always  -           29 (Lifetime Min/Max 17/33)
196 Reallocated_Event_Count 0x0032 100   100   000    Old_age  Always  -           1
197 Current_Pending_Sector  0x0022 100   100   000    Old_age  Always  -           3
198 Offline_Uncorrectable   0x0008 100   100   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x000a 200   200   000    Old_age  Always  -           0
SMART Error Log Version: 1
ATA Error Count: 44 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from
power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. Error 44 occurred at disk power-on lifetime: 13615 hours (567 days + 7 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 7d d2 51 a8 0a Error: UNC 125 sectors at LBA = 0x0aa851d2 = 178803154 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 25 00 00 4f 50 a8 e0 08 07:50:14.534 READ DMA EXT 27 00 00 00 00 00 e0 08 07:50:14.514 READ NATIVE MAX ADDRESS EXT ec 00 00 00 00 00 a0 08 07:50:14.494 IDENTIFY DEVICE ef 03 46 00 00 00 a0 08 07:50:14.474 SET FEATURES [set transfer mode] 27 00 00 00 00 00 e0 08 07:50:14.454 READ NATIVE MAX ADDRESS EXT Error 43 occurred at disk power-on lifetime: 13615 hours (567 days + 7 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 7d d2 51 a8 0a Error: UNC 125 sectors at LBA = 0x0aa851d2 = 178803154 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 25 00 00 4f 50 a8 e0 08 07:50:10.588 READ DMA EXT 27 00 00 00 00 00 e0 08 07:50:10.569 READ NATIVE MAX ADDRESS EXT ec 00 00 00 00 00 a0 08 07:50:10.549 IDENTIFY DEVICE ef 03 46 00 00 00 a0 08 07:50:10.529 SET FEATURES [set transfer mode] 27 00 00 00 00 00 e0 08 07:50:10.509 READ NATIVE MAX ADDRESS EXT Error 42 occurred at disk power-on lifetime: 13615 hours (567 days + 7 hours) When the command that caused the error occurred, the device was active or idle. 
After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 7d d2 51 a8 0a Error: UNC 125 sectors at LBA = 0x0aa851d2 = 178803154 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 25 00 00 4f 50 a8 e0 08 07:50:06.643 READ DMA EXT 27 00 00 00 00 00 e0 08 07:50:06.623 READ NATIVE MAX ADDRESS EXT ec 00 00 00 00 00 a0 08 07:50:06.603 IDENTIFY DEVICE ef 03 46 00 00 00 a0 08 07:50:06.583 SET FEATURES [set transfer mode] 27 00 00 00 00 00 e0 08 07:50:06.563 READ NATIVE MAX ADDRESS EXT Error 41 occurred at disk power-on lifetime: 13615 hours (567 days + 7 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 7d d2 51 a8 0a Error: UNC 125 sectors at LBA = 0x0aa851d2 = 178803154 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 25 00 00 4f 50 a8 e0 08 07:50:02.658 READ DMA EXT 27 00 00 00 00 00 e0 08 07:50:02.638 READ NATIVE MAX ADDRESS EXT ec 00 00 00 00 00 a0 08 07:50:02.618 IDENTIFY DEVICE ef 03 46 00 00 00 a0 08 07:50:02.598 SET FEATURES [set transfer mode] 27 00 00 00 00 00 e0 08 07:50:02.578 READ NATIVE MAX ADDRESS EXT Error 40 occurred at disk power-on lifetime: 13615 hours (567 days + 7 hours) When the command that caused the error occurred, the device was active or idle. 
After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 7d d2 51 a8 0a Error: UNC 125 sectors at LBA = 0x0aa851d2 = 178803154 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 25 00 00 4f 50 a8 e0 08 07:49:58.723 READ DMA EXT 27 00 00 00 00 00 e0 08 07:49:58.703 READ NATIVE MAX ADDRESS EXT ec 00 00 00 00 00 a0 08 07:49:58.683 IDENTIFY DEVICE ef 03 46 00 00 00 a0 08 07:49:58.663 SET FEATURES [set transfer mode] 27 00 00 00 00 00 e0 08 07:49:58.644 READ NATIVE MAX ADDRESS EXT SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Extended offline Completed without error 00% 1539 - # 2 Short offline Completed without error 00% 1533 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. Link to comment
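Comparing two long SMART reports by eye is error-prone, so the sketch below pulls the raw values of the sector-health attributes out of two `smartctl -a` attribute tables and lists them side by side. It is an illustrative Python sketch only: the regular expression assumes the ten-column attribute layout shown in the reports above, the helper names are invented here, and the sample rows are copied from the two reports in this thread.

```python
import re

# One attribute row: ID, name, flag, VALUE, WORST, THRESH, type, updated, when_failed, raw.
ATTR_RE = re.compile(
    r"(\d+)\s+([A-Za-z][\w-]*)\s+0x[0-9a-f]{4}\s+\d{3}\s+\d{3}\s+\d{3}\s+"
    r"(?:Pre-fail|Old_age)\s+(?:Always|Offline)\s+\S+\s+(\d+)"
)

# Attributes that describe reallocated, pending and uncorrectable sectors.
WATCHED_IDS = (5, 196, 197, 198)


def raw_values(report):
    """Map attribute ID -> (name, raw value) for every row found in the report."""
    return {
        int(m.group(1)): (m.group(2), int(m.group(3)))
        for m in ATTR_RE.finditer(report)
    }


def compare(before, after):
    """Return {attribute name: (old raw, new raw)} for the watched attributes."""
    old, new = raw_values(before), raw_values(after)
    changes = {}
    for attr_id in WATCHED_IDS:
        if attr_id in old and attr_id in new:
            name, new_raw = new[attr_id]
            changes[name] = (old[attr_id][1], new_raw)
    return changes


BEFORE = """\
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 1
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 1
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 16
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 11
"""

AFTER = """\
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 1
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 1
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 3
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
"""

for name, (old_raw, new_raw) in compare(BEFORE, AFTER).items():
    print(f"{name}: {old_raw} -> {new_raw}")
```

Because the pattern matches on whitespace rather than line breaks, it should also work on a report that has been pasted as one long line.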
Helmonder Posted February 22, 2013
The pending sectors are continuing. I would replace the disk and then give it a few preclear cycles to see if it fails further, so that you can RMA it more easily (if that is still possible).
Joe L. Posted February 22, 2013
UNC errors (media errors) are uncorrectable read errors of specific sectors, where the contents of the sector do not agree with the associated checksum at the end of that sector on the disk. Occasionally the sector was just poorly written and can be re-written in place. Other times (more frequently) the cause is a defect in the platter surface. The disk will usually re-allocate the sector from its pool of spare sectors when it is next written. (It will first try to re-write it in place, then use one from the pool of spare sectors. There are typically several thousand spare sectors on a modern large disk.)
If the disk is assigned to the array, it cannot be "precleared".
Joe L.
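The behaviour described above (a pending sector is resolved on the next write, first by re-writing in place and otherwise by remapping to a spare) can be sketched as a toy model. This is only an illustration in Python: the class `ToyDisk` and its fields are invented for this example, and real drive firmware is proprietary and certainly more involved.

```python
class ToyDisk:
    """Toy model of pending-sector handling; not real firmware behaviour."""

    def __init__(self, spare_sectors, damaged=()):
        self.spare_sectors = spare_sectors  # remaining spares in the pool
        self.damaged = set(damaged)         # LBAs whose surface cannot hold data
        self.pending = set()                # unreadable LBAs waiting for a write
        self.reallocated = set()            # LBAs remapped to a spare sector

    def mark_unreadable(self, lba):
        # A failed read (UNC) puts the sector on the pending list.
        self.pending.add(lba)

    def write(self, lba):
        if lba not in self.pending:
            return "ok"
        self.pending.discard(lba)
        if lba not in self.damaged:
            # The sector was only poorly written: re-writing in place fixes it.
            return "rewritten in place"
        if self.spare_sectors == 0:
            raise IOError("spare pool exhausted")
        # The surface is defective: remap the LBA to a spare sector.
        self.spare_sectors -= 1
        self.damaged.discard(lba)
        self.reallocated.add(lba)
        return "reallocated"


disk = ToyDisk(spare_sectors=2, damaged={10})
disk.mark_unreadable(10)
disk.mark_unreadable(11)
print(disk.write(11))  # rewritten in place
print(disk.write(10))  # reallocated
print(len(disk.pending), len(disk.reallocated), disk.spare_sectors)  # 0 1 1
```

In this model, a pending count that drops without the reallocated count rising corresponds to sectors that were successfully re-written in place.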
Helmonder Posted February 22, 2013
I said to replace the disk and then give it a preclear. Once it is replaced, it is no longer in the array.
mifronte Posted February 22, 2013
Thank you for the replies. I am in the process of running another parity check before replacing the disk with a spare. Then I will run preclear on the suspect disk to see what the SMART data looks like.
On a parity check, if there is a parity error, how does unRAID determine whether the error is on the data disk or the parity disk in order to make the correction?
Helmonder Posted February 22, 2013
It doesn't. A parity check verifies the drives against the parity information. A correcting check will correct the parity information on the parity drive, and a drive rebuild will rebuild the drive according to parity.
This means that if you have a disk with bad data and you run a correcting parity check, unRAID will, after the parity check has completed, be perfectly able to rebuild your disk (including the bad data) should the disk fail.
Parity is not sacred and does not help with file corruption. It makes sure that a drive that has completely failed can be restored to a new drive (bringing the new drive to the same state as the old drive before the failure or removal).
mifronte Posted February 22, 2013
So what you are saying is that I should not run a correcting parity check when a data drive encounters read errors? Because the good parity would then be "corrected" with the bad data that may have occurred after the last good parity check?
Helmonder Posted February 22, 2013
Yes, that is what I mean. The webpage reads that a correcting parity check will "write corrections to parity disk". So parity will be updated to reflect the current state of the drive. Should that drive contain file corruption, then parity will recreate a new disk with the same file corruption.
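The point made in the posts above can be illustrated with a small worked example in Python. It assumes simple XOR parity across the data disks, which is the usual single-parity scheme but is not spelled out in this thread, and it models each disk as a single integer. All function names are made up for the illustration.

```python
from functools import reduce
from operator import xor


def xor_parity(blocks):
    """Parity is the XOR of the corresponding block on every data disk."""
    return reduce(xor, blocks)


def parity_matches(data_disks, parity):
    """A check can detect a mismatch, but not which disk is wrong."""
    return xor_parity(data_disks) == parity


def rebuild(data_disks, parity, missing):
    """Reconstruct one disk from parity plus all the other data disks."""
    others = [block for i, block in enumerate(data_disks) if i != missing]
    return xor_parity(others + [parity])


good = [0b1010, 0b0110, 0b1111]
parity = xor_parity(good)               # parity computed while all data was good

# Disk 2 silently returns bad data.
current = [0b1010, 0b0110, 0b1101]
print(parity_matches(current, parity))  # False: a mismatch is detected

# Rebuilding from the ORIGINAL parity recovers the good data.
print(bin(rebuild(current, parity, missing=2)))            # 0b1111

# A correcting check rewrites parity to agree with the current (bad) data ...
corrected_parity = xor_parity(current)
# ... so a later rebuild faithfully reproduces the corruption.
print(bin(rebuild(current, corrected_parity, missing=2)))  # 0b1101
```

The mismatch in the first check says only that the disks and parity disagree. Nothing in the XOR identifies which disk holds the wrong bits, which is why a correcting check simply updates parity.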
mifronte Posted February 22, 2013
That is good to know for future reference, but it is too late for now. At least there was only one parity correction, and chances are that it is in the movie I was watching when I encountered the read error.
I will stop the current parity check, delete the problematic movie, replace the problematic disk, and rebuild the new disk. Thanks.
Frank1940 Posted February 22, 2013
I would NOT delete the problematic movie until AFTER I had replaced the problem disk and rebuilt it using the current parity information! Then you can delete that movie. You really want to prevent any possibility of causing additional errors in the parity information, and the way to achieve that is to not write to that problematic disk.
mifronte Posted February 22, 2013
That makes sense. Thanks for the quick response.
mifronte Posted March 6, 2013
I just completed my 8th preclear cycle. So far the reallocated sector count has only increased by 1. However, I am still getting a value for the current pending sector count. Is this something to be concerned about?
smartctl -a -d ata /dev/sdr (--)
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: Hitachi HDS5C3020ALA632
Serial Number: **********
Firmware Version: ML6OA580
User Capacity: 2,000,398,934,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Tue Mar 5 22:15:07 2013 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (23645) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
SCT capabilities: (0x003d) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       105
  3 Spin_Up_Time            0x0007   227   227   024    Pre-fail  Always       -       219 (Average 265)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       830
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       2
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   146   146   020    Pre-fail  Offline      -       29
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       13903
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       5
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       831
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       831
194 Temperature_Celsius     0x0002   193   193   000    Old_age   Always       -       31 (Lifetime Min/Max 17/33)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 629 (device log contains only the most recent five errors)
  CR = Command Register [HEX]
  FR = Features Register [HEX]
  SC = Sector Count Register [HEX]
  SN = Sector Number Register [HEX]
  CL = Cylinder Low Register [HEX]
  CH = Cylinder High Register [HEX]
  DH = Device/Head Register [HEX]
  DC = Device Command Register [HEX]
  ER = Error register [HEX]
  ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 629 occurred at disk power-on lifetime: 13897 hours (579 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 49 ff 77 42 01  Error: UNC 73 sectors at LBA = 0x014277ff = 21133311

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 48 77 42 e0 00   4d+22:13:41.243  READ DMA EXT
  25 00 00 48 76 42 e0 00   4d+22:13:41.242  READ DMA EXT
  25 00 00 48 75 42 e0 00   4d+22:13:41.240  READ DMA EXT
  25 00 00 48 74 42 e0 00   4d+22:13:41.238  READ DMA EXT
  25 00 00 48 73 42 e0 00   4d+22:13:41.236  READ DMA EXT

Error 628 occurred at disk power-on lifetime: 13897 hours (579 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 49 ff 77 42 01  Error: UNC 73 sectors at LBA = 0x014277ff = 21133311

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 48 77 42 e0 00   4d+22:13:13.517  READ DMA EXT
  27 00 00 00 00 00 e0 00   4d+22:13:13.517  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   4d+22:13:13.516  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00   4d+22:13:13.516  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   4d+22:13:13.516  READ NATIVE MAX ADDRESS EXT

Error 627 occurred at disk power-on lifetime: 13897 hours (579 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 49 ff 77 42 01  Error: UNC 73 sectors at LBA = 0x014277ff = 21133311

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 48 77 42 e0 00   4d+22:13:10.217  READ DMA EXT
  25 00 00 48 76 42 e0 00   4d+22:13:10.215  READ DMA EXT
  25 00 00 48 75 42 e0 00   4d+22:13:10.214  READ DMA EXT
  25 00 00 48 74 42 e0 00   4d+22:13:10.212  READ DMA EXT
  25 00 00 48 73 42 e0 00   4d+22:13:10.210  READ DMA EXT

Error 626 occurred at disk power-on lifetime: 13897 hours (579 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 39 ef 7a 79 00  Error: UNC 57 sectors at LBA = 0x00797aef = 7961327

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 28 7a 79 e0 00   4d+22:09:41.023  READ DMA EXT
  27 00 00 00 00 00 e0 00   4d+22:09:41.023  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   4d+22:09:41.022  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00   4d+22:09:41.022  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   4d+22:09:41.022  READ NATIVE MAX ADDRESS EXT

Error 625 occurred at disk power-on lifetime: 13897 hours (579 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 39 ef 7a 79 00  Error: UNC 57 sectors at LBA = 0x00797aef = 7961327

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 28 7a 79 e0 00   4d+22:09:37.720  READ DMA EXT
  27 00 00 00 00 00 e0 00   4d+22:09:37.720  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   4d+22:09:37.719  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00   4d+22:09:37.718  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   4d+22:09:37.718  READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error  00%        1539             -
# 2  Short offline       Completed without error  00%        1533             -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
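For anyone wondering how the "UNC 73 sectors at LBA = 0x014277ff" text relates to the register bytes printed next to it, here is a small sketch. The 28-bit layout used here (low nibble of DH, then CH, CL, SN) is inferred from the two distinct error entries in this log rather than taken from the smartctl source, and `decode_error_registers` is a hypothetical helper name.

```python
def decode_error_registers(regs):
    """Decode an 'ER ST SC SN CL CH DH' register line from the SMART error log.

    Assumes a 28-bit LBA: DH low nibble is the top 4 bits, followed by
    CH, CL and SN. This layout is inferred from the log in this thread.
    """
    er, st, sc, sn, cl, ch, dh = (int(byte, 16) for byte in regs.split())
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return {
        "uncorrectable": bool(er & 0x40),  # bit 6 of the ATA error register is UNC
        "sectors": sc,                      # matches the sector count smartctl prints
        "lba": lba,
    }

newest = decode_error_registers("40 51 49 ff 77 42 01")  # errors 627 to 629
older = decode_error_registers("40 51 39 ef 7a 79 00")   # errors 625 and 626

print(newest["sectors"], hex(newest["lba"]), newest["lba"])
print(older["sectors"], hex(older["lba"]), older["lba"])
```

Decoded this way, the five logged errors cover only two distinct locations, one near LBA 21133311 and one near LBA 7961327, each hit repeatedly by retried reads.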
Harpz Posted March 6, 2013

I'm no expert, but you want to get the current pending sector count down to 0:

197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 4
itimpi Posted March 6, 2013

If you run several pre-clear cycles and the pending sector count does not get down to zero, I would be tempted to RMA the drive. You do not want to use a drive that has pending sectors outstanding in unRAID.
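Checking for outstanding pending sectors after each pre-clear cycle can be scripted. The following is a rough sketch, not an official tool: it assumes the attribute rows look like the `smartctl -a` output posted in this thread (ten whitespace-separated columns with the raw value in the tenth), and the names `WATCHED`, `parse_attributes` and `nonzero_watched` are made up for illustration.

```python
# Attributes whose raw value should ideally be zero on a healthy drive.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def parse_attributes(report):
    """Map attribute name -> raw value for every attribute row in the report."""
    attributes = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            attributes[fields[1]] = int(fields[9])
    return attributes

def nonzero_watched(attributes):
    """Return the watched attributes that currently have a nonzero raw value."""
    return {name: attributes[name] for name in WATCHED if attributes.get(name, 0) != 0}

# Rows copied from the March 5 report above.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       2
194 Temperature_Celsius     0x0002   193   193   000    Old_age   Always       -       31 (Lifetime Min/Max 17/33)
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

flagged = nonzero_watched(parse_attributes(sample))
print(flagged)
```

An empty result after a cycle would mean none of the three watched counters is outstanding; whether a nonzero reallocated count alone is acceptable remains a judgment call.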
mifronte Posted March 8, 2013 Author

After the 10th pre_clear cycle, my pending sector count is zero. However, I noticed that every pre_clear cycle causes the count to increase. So I have decided to RMA the drive back to HGST, since the drive has logged over 500 errors.
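The trend behind that decision is easier to see when the two posted reports are put side by side. This sketch uses only values that appear in the February 21 and March 5 reports in this thread and assumes both reports describe the same physical drive (the device name differs, but the model and power-on hours are consistent); `deltas` is a hypothetical helper.

```python
# Raw values copied from the two smartctl reports posted in this thread.
feb_21 = {
    "Reallocated_Sector_Ct": 1,
    "Current_Pending_Sector": 16,
    "Offline_Uncorrectable": 11,
    "Power_On_Hours": 13606,
    "ATA_Error_Count": 14,
}
mar_05 = {
    "Reallocated_Sector_Ct": 2,
    "Current_Pending_Sector": 4,
    "Offline_Uncorrectable": 0,
    "Power_On_Hours": 13903,
    "ATA_Error_Count": 629,
}

def deltas(before, after):
    """Change in each counter present in both snapshots."""
    return {name: after[name] - before[name] for name in before if name in after}

change = deltas(feb_21, mar_05)
for name, diff in change.items():
    print(f"{name:24} {feb_21[name]:>6} -> {mar_05[name]:>6}  ({diff:+d})")
```

The pending and uncorrectable counters dropped between the two reports, but the logged ATA error count grew by several hundred in under 300 power-on hours, which fits the decision to return the drive.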