[Solved] Is my disk failing? - Disk 18 - General Support (V5 and Older)

February 21, 2013

Today I noticed red error messages in my log regarding disk 18. Is my disk failing? Any recommended command to run for further diagnosis?

Here is the SMART report:

smartctl -a -d ata /dev/sdg (disk18)
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     Hitachi HDS5C3020ALA632
Serial Number:    **************
Firmware Version: ML6OA580
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Feb 21 13:07:17 2013 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
				was suspended by an interrupting command from host.
				Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
				without error or no self-test has ever 
				been run.
Total time to complete Offline 
data collection: 		 (23645) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				No Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 255) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
				SCT Feature Control supported.
				SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   099   099   016    Pre-fail  Always       -       131072
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       111
  3 Spin_Up_Time            0x0007   135   135   024    Pre-fail  Always       -       405 (Average 405)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       813
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   148   148   020    Pre-fail  Offline      -       28
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       13606
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       814
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       814
194 Temperature_Celsius     0x0002   214   214   000    Old_age   Always       -       28 (Lifetime Min/Max 17/33)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       11
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 14 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 14 occurred at disk power-on lifetime: 13604 hours (566 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 ff 28 a6 2a 03  Error: UNC 255 sectors at LBA = 0x032aa628 = 53126696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 27 a6 2a e0 08   5d+23:54:53.200  READ DMA EXT
  27 00 00 00 00 00 e0 08   5d+23:54:53.180  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08   5d+23:54:53.160  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08   5d+23:54:53.140  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 08   5d+23:54:53.121  READ NATIVE MAX ADDRESS EXT

Error 13 occurred at disk power-on lifetime: 13604 hours (566 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 ff 28 a6 2a 03  Error: UNC 255 sectors at LBA = 0x032aa628 = 53126696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 27 a6 2a e0 08   5d+23:54:49.294  READ DMA EXT
  27 00 00 00 00 00 e0 08   5d+23:54:49.275  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08   5d+23:54:49.254  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08   5d+23:54:49.235  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 08   5d+23:54:49.215  READ NATIVE MAX ADDRESS EXT

Error 12 occurred at disk power-on lifetime: 13604 hours (566 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 ff 28 a6 2a 03  Error: UNC 255 sectors at LBA = 0x032aa628 = 53126696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 27 a6 2a e0 08   5d+23:54:45.388  READ DMA EXT
  27 00 00 00 00 00 e0 08   5d+23:54:45.369  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08   5d+23:54:45.349  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08   5d+23:54:45.329  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 08   5d+23:54:45.309  READ NATIVE MAX ADDRESS EXT

Error 11 occurred at disk power-on lifetime: 13604 hours (566 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 ff 28 a6 2a 03  Error: UNC 255 sectors at LBA = 0x032aa628 = 53126696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 27 a6 2a e0 08   5d+23:54:41.483  READ DMA EXT
  27 00 00 00 00 00 e0 08   5d+23:54:41.463  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08   5d+23:54:41.443  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08   5d+23:54:41.423  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 08   5d+23:54:41.403  READ NATIVE MAX ADDRESS EXT

Error 10 occurred at disk power-on lifetime: 13604 hours (566 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 ff 28 a6 2a 03  Error: UNC 255 sectors at LBA = 0x032aa628 = 53126696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 27 a6 2a e0 08   5d+23:54:37.577  READ DMA EXT
  27 00 00 00 00 00 e0 08   5d+23:54:37.557  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08   5d+23:54:37.537  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08   5d+23:54:37.517  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 08   5d+23:54:37.498  READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1539         -
# 2  Short offline       Completed without error       00%      1533         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Here is a sample of error messages in the syslog:

Feb 21 10:30:26 Beanstalk shfs: shfs_setxattr: setxattr: /mnt/disk1/Data/Investing/E0400CF3.tmp (95) Operation not supported (Drive related)
Feb 21 10:30:26 Beanstalk shfs: shfs_setxattr: setxattr: /mnt/disk1/Data/Investing/E0400CF3.tmp (95) Operation not supported (Drive related)
Feb 21 10:30:26 Beanstalk shfs: shfs_setxattr: setxattr: /mnt/disk1/Data/Investing/StockInvestor.xls (95) Operation not supported (Drive related)
Feb 21 11:11:15 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors)
Feb 21 11:11:15 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors)
Feb 21 11:11:15 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues)
Feb 21 11:11:15 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related)
Feb 21 11:11:15 Beanstalk kernel:          res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors)
Feb 21 11:11:15 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related)
Feb 21 11:11:15 Beanstalk kernel: ata7.00: error: { UNC } (Errors)
Feb 21 11:11:15 Beanstalk kernel: ata7: hard resetting link (Minor Issues)
Feb 21 11:11:16 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 21 11:11:16 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related)
Feb 21 11:11:16 Beanstalk kernel: ata7: EH complete (Drive related)
Feb 21 11:11:19 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors)
Feb 21 11:11:19 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors)
Feb 21 11:11:19 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues)
Feb 21 11:11:19 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related)
Feb 21 11:11:19 Beanstalk kernel:          res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors)
Feb 21 11:11:19 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related)
Feb 21 11:11:19 Beanstalk kernel: ata7.00: error: { UNC } (Errors)
Feb 21 11:11:19 Beanstalk kernel: ata7: hard resetting link (Minor Issues)
Feb 21 11:11:20 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 21 11:11:20 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related)
Feb 21 11:11:20 Beanstalk kernel: ata7: EH complete (Drive related)
Feb 21 11:11:23 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors)
Feb 21 11:11:23 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors)
Feb 21 11:11:23 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues)
Feb 21 11:11:23 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related)
Feb 21 11:11:23 Beanstalk kernel:          res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors)
Feb 21 11:11:23 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related)
Feb 21 11:11:23 Beanstalk kernel: ata7.00: error: { UNC } (Errors)
Feb 21 11:11:23 Beanstalk kernel: ata7: hard resetting link (Minor Issues)
Feb 21 11:11:24 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 21 11:11:24 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related)
Feb 21 11:11:24 Beanstalk kernel: ata7: EH complete (Drive related)
Feb 21 11:11:27 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors)
Feb 21 11:11:27 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors)
Feb 21 11:11:27 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues)
Feb 21 11:11:27 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related)
Feb 21 11:11:27 Beanstalk kernel:          res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors)
Feb 21 11:11:27 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related)
Feb 21 11:11:27 Beanstalk kernel: ata7.00: error: { UNC } (Errors)
Feb 21 11:11:27 Beanstalk kernel: ata7: hard resetting link (Minor Issues)
Feb 21 11:11:28 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 21 11:11:28 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related)
Feb 21 11:11:28 Beanstalk kernel: ata7: EH complete (Drive related)
Feb 21 11:11:31 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors)
Feb 21 11:11:31 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors)
Feb 21 11:11:31 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues)
Feb 21 11:11:31 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related)
Feb 21 11:11:31 Beanstalk kernel:          res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors)
Feb 21 11:11:31 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related)
Feb 21 11:11:31 Beanstalk kernel: ata7.00: error: { UNC } (Errors)
Feb 21 11:11:31 Beanstalk kernel: ata7: hard resetting link (Minor Issues)
Feb 21 11:11:31 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 21 11:11:32 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related)
Feb 21 11:11:32 Beanstalk kernel: ata7: EH complete (Drive related)
Feb 21 11:11:35 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors)
Feb 21 11:11:35 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors)
Feb 21 11:11:35 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues)
Feb 21 11:11:35 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related)
Feb 21 11:11:35 Beanstalk kernel:          res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors)
Feb 21 11:11:35 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related)
Feb 21 11:11:35 Beanstalk kernel: ata7.00: error: { UNC } (Errors)
Feb 21 11:11:35 Beanstalk kernel: ata7: hard resetting link (Minor Issues)
Feb 21 11:11:35 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 21 11:11:35 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related)
Feb 21 11:11:35 Beanstalk kernel: sd 6:0:0:0: [sdg] Unhandled sense code (Drive related)
Feb 21 11:11:35 Beanstalk kernel: sd 6:0:0:0: [sdg] Result: hostbyte=0x00 driverbyte=0x08 (System)
Feb 21 11:11:35 Beanstalk kernel: sd 6:0:0:0: [sdg] Sense Key : 0x3 [current] [descriptor] (Drive related)
Feb 21 11:11:35 Beanstalk kernel: Descriptor sense data with sense descriptors (in hex):
Feb 21 11:11:35 Beanstalk kernel:         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
Feb 21 11:11:35 Beanstalk kernel:         c3 2a a6 28 
Feb 21 11:11:35 Beanstalk kernel: sd 6:0:0:0: [sdg] ASC=0x11 ASCQ=0x4 (Drive related)
Feb 21 11:11:35 Beanstalk kernel: sd 6:0:0:0: [sdg] CDB: cdb[0]=0x28: 28 00 c3 2a a6 27 00 04 00 00 (Drive related)
Feb 21 11:11:35 Beanstalk kernel: end_request: I/O error, dev sdg, sector 3274352168 (Errors)
Feb 21 11:11:35 Beanstalk kernel: ata7: EH complete (Drive related)
Feb 21 11:11:35 Beanstalk kernel: md: disk18 read error (Errors)
Feb 21 11:11:35 Beanstalk kernel: handle_stripe read error: 3274352104/17, count: 1 (Errors)
Feb 21 11:11:35 Beanstalk kernel: md: disk18 read error (Errors)
Feb 21 11:11:35 Beanstalk kernel: handle_stripe read error: 3274352112/17, count: 1 (Errors)
Feb 21 11:11:35 Beanstalk kernel: md: disk18 read error (Errors)
...
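
For anyone cross-checking the two logs: the `res` line in the syslog and the register dump in the SMART error log appear to describe the same failing read. Below is a minimal Python sketch of how the addresses line up. It assumes the usual libata field order for the `res` line (status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device). The function names are made up for illustration.

```python
def lba48_from_res(res: str) -> int:
    """Rebuild the 48-bit LBA from a libata 'res' taskfile string (assumed field order)."""
    groups = res.split("/")
    low = groups[1].split(":")    # error, nsect, lbal, lbam, lbah
    high = groups[2].split(":")   # hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah
    # Most significant byte first: hob_lbah, hob_lbam, hob_lbal, lbah, lbam, lbal
    order = [high[4], high[3], high[2], low[4], low[3], low[2]]
    return int("".join(order), 16)


def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """28-bit LBA from the SMART error log registers (low nibble of DH holds bits 24-27)."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Values copied from the syslog 'res' line and the SMART error log above.
res = "51/40:ff:28:a6:2a/40:03:c3:00:00/03"
full = lba48_from_res(res)
short = lba28_from_registers(0x28, 0xA6, 0x2A, 0x03)

print(full)    # 3274352168, the sector in the end_request line
print(short)   # 53126696, the LBA printed in the SMART error log
print(hex(full), hex(short))
```

Both numbers share the low 24 bits (0x2aa628). The SMART error log shown here only carries a 28-bit address, so the syslog sector is probably the better one to use when locating the bad spot.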

February 22, 2013

Author

I forgot to mention that I am running version 4.7.

It looks like Current_Pending_Sector was at 16 and is now at 3 after a parity check.

Should I ...

- Rebuild this drive by unassigning it and then reassigning it in the array?
- Replace it with a backup disk (let unRAID rebuild onto the new disk) and then run preclear on the suspect disk?
- Replace it and just RMA the drive because it is dying?

SMART information after the parity check:

smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model: Hitachi HDS5C3020ALA632
Serial Number: **********
Firmware Version: ML6OA580
User Capacity: 2,000,398,934,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Thu Feb 21 23:09:39 2013 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (23645) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       111
  3 Spin_Up_Time            0x0007   149   149   024    Pre-fail  Always       -       333 (Average 405)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       818
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   148   148   020    Pre-fail  Offline      -       28
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       13616
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       819
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       819
194 Temperature_Celsius     0x0002   206   206   000    Old_age   Always       -       29 (Lifetime Min/Max 17/33)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 44 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 44 occurred at disk power-on lifetime: 13615 hours (567 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 7d d2 51 a8 0a Error: UNC 125 sectors at LBA = 0x0aa851d2 = 178803154

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 4f 50 a8 e0 08 07:50:14.534 READ DMA EXT
27 00 00 00 00 00 e0 08 07:50:14.514 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 08 07:50:14.494 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 07:50:14.474 SET FEATURES [set transfer mode]
27 00 00 00 00 00 e0 08 07:50:14.454 READ NATIVE MAX ADDRESS EXT

Error 43 occurred at disk power-on lifetime: 13615 hours (567 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 7d d2 51 a8 0a Error: UNC 125 sectors at LBA = 0x0aa851d2 = 178803154

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 4f 50 a8 e0 08 07:50:10.588 READ DMA EXT
27 00 00 00 00 00 e0 08 07:50:10.569 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 08 07:50:10.549 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 07:50:10.529 SET FEATURES [set transfer mode]
27 00 00 00 00 00 e0 08 07:50:10.509 READ NATIVE MAX ADDRESS EXT

Error 42 occurred at disk power-on lifetime: 13615 hours (567 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 7d d2 51 a8 0a Error: UNC 125 sectors at LBA = 0x0aa851d2 = 178803154

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 4f 50 a8 e0 08 07:50:06.643 READ DMA EXT
27 00 00 00 00 00 e0 08 07:50:06.623 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 08 07:50:06.603 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 07:50:06.583 SET FEATURES [set transfer mode]
27 00 00 00 00 00 e0 08 07:50:06.563 READ NATIVE MAX ADDRESS EXT

Error 41 occurred at disk power-on lifetime: 13615 hours (567 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 7d d2 51 a8 0a Error: UNC 125 sectors at LBA = 0x0aa851d2 = 178803154

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 4f 50 a8 e0 08 07:50:02.658 READ DMA EXT
27 00 00 00 00 00 e0 08 07:50:02.638 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 08 07:50:02.618 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 07:50:02.598 SET FEATURES [set transfer mode]
27 00 00 00 00 00 e0 08 07:50:02.578 READ NATIVE MAX ADDRESS EXT

Error 40 occurred at disk power-on lifetime: 13615 hours (567 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 7d d2 51 a8 0a Error: UNC 125 sectors at LBA = 0x0aa851d2 = 178803154

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 4f 50 a8 e0 08 07:49:58.723 READ DMA EXT
27 00 00 00 00 00 e0 08 07:49:58.703 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 08 07:49:58.683 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 07:49:58.663 SET FEATURES [set transfer mode]
27 00 00 00 00 00 e0 08 07:49:58.644 READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1539         -
# 2  Short offline       Completed without error       00%      1533         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
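
The change between the two reports is easier to see when only the sector-health counters are compared. Here is a rough Python sketch that pulls the raw values for attributes 5, 196, 197 and 198 out of two pasted attribute tables and prints the difference. The choice of attributes and the helper name `raw_values` are my own for illustration. The rows are copied from the two reports in this thread.

```python
WATCHED = {5, 196, 197, 198}  # reallocated / pending / offline-uncorrectable counters


def raw_values(report: str) -> dict:
    """Map attribute ID -> (name, first number of RAW_VALUE) for the watched attributes."""
    found = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit() and int(fields[0]) in WATCHED:
            found[int(fields[0])] = (fields[1], int(fields[9]))
    return found


before = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       11
"""
after = """\
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 1
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 1
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 3
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
"""

old, new = raw_values(before), raw_values(after)
changes = {i: new[i][1] - old[i][1] for i in sorted(old)}
for attr_id, delta in changes.items():
    name, was = old[attr_id]
    now = new[attr_id][1]
    print(f"{attr_id:>3} {name:<24} {was:>4} -> {now:>4} ({delta:+d})")
```

Splitting on whitespace means the same function handles both the aligned table and the flattened one. The pending and offline-uncorrectable counts dropped while the reallocated count stayed at 1.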

February 22, 2013

The pending sectors are still there. I would replace the disk and then run a few preclear cycles on it to see whether it degrades further, which would make it easier to RMA (if that is still possible).

February 22, 2013

UNC errors (media errors) are uncorrectable read errors on specific sectors, where the contents of the sector do not agree with the associated checksum stored at the end of that sector on the disk.

Occasionally the sector was just poorly written and can be rewritten in place. More often, the cause is a defect in the platter surface.

The disk will usually reallocate the sector from its pool of spare sectors the next time it is written. (It first tries to rewrite the sector in place, then uses one from the pool of spare sectors. A modern large disk typically has several thousand spare sectors.)

If the disk is assigned to the array, it cannot be "precleared".

Joe L.

February 22, 2013

I said to replace the disk and then give it a preclear. Once it is replaced, it is no longer in the array.

February 22, 2013

Author

Thank you for the replies. I am in the process of running another parity check before replacing the disk with a spare. Then I will run preclear on the suspect disk to see what the SMART data looks like.

During a parity check, if there is a parity error, how does unRAID determine whether the error is on the data disk or the parity disk in order to make the correction?

February 22, 2013

It doesn't. A parity check verifies the drives against the parity information. A correcting check will correct the parity information on the parity drive, and a drive rebuild will rebuild the drive according to parity.

This means that if you have a disk with bad data and you run a correcting parity check, unRAID will, once the check has completed, be perfectly able to rebuild your disk (including the bad data) should the disk fail...

Parity is not sacred and does not help with file corruption. It only makes sure that a drive that has completely failed can be restored to a new drive (bringing the new drive to the same state the old drive was in before the failure or removal).
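
To make the point above concrete, here is a toy sketch in Python. It assumes single parity computed as a byte-wise XOR across the data disks, which is how single-parity schemes are generally described. The three tiny "disks" and the function names are invented for illustration, and none of this is actual unRAID code.

```python
from functools import reduce


def xor_parity(disks):
    """Byte-wise XOR across equally sized disks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def rebuild(missing_index, disks, parity):
    """Reconstruct one disk from the remaining disks plus parity."""
    others = [d for i, d in enumerate(disks) if i != missing_index]
    return xor_parity(others + [parity])


disks = [b"movie", b"photo", b"music"]
parity = xor_parity(disks)

# A sector on disk 0 silently goes bad.
corrupted = [b"moXie", disks[1], disks[2]]

# A check only sees that the XOR no longer matches; it cannot tell which disk is wrong.
mismatch = xor_parity(corrupted) != parity

# Rebuilding disk 0 from the ORIGINAL parity recovers the good data.
good_rebuild = rebuild(0, corrupted, parity)

# A correcting check rewrites parity to match the corrupted disk...
corrected_parity = xor_parity(corrupted)
# ...so a later rebuild faithfully reproduces the corruption.
bad_rebuild = rebuild(0, corrupted, corrected_parity)

print(mismatch, good_rebuild, bad_rebuild)
```

In this toy model, the rebuild from the untouched parity returns the original bytes, while the rebuild after the "correction" returns the damaged ones. That is the reason for holding off on a correcting check while a data disk is throwing read errors.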

February 22, 2013

Author

So what you are saying is that I should not run a correcting parity check when a data drive encounters read errors? Because the good parity would then be "corrected" with bad data that may have appeared since the last good parity check?

February 22, 2013

Yes, that is what I mean. The webpage says that a correcting parity check will "write corrections to parity disk". So parity will be updated to reflect the current state of the drive. If that drive contains file corruption, a rebuild from parity will recreate a new disk with the same file corruption.

February 22, 2013

Author

That is good to know for future reference. Too late for now. At least there was only one parity correction and chances are that it is the movie I was watching when I encountered the read error.

I will stop the current parity check, delete the problematic movie, replace the problematic disk, and rebuild onto the new disk.

Thanks.

February 22, 2013

I would NOT delete the problematic movie until AFTER I had replaced the problem disk and rebuilt it using the current parity information! Then you can delete that movie. You really want to prevent any possibility of introducing additional errors into the parity information, and the way to achieve that is to not write to the problematic disk.

February 22, 2013

Author

That makes sense. Thanks for the quick response.

March 6, 2013

Author

I just completed my 8th preclear cycle. So far the reallocated sector count only increased by 1. However, I am still getting a value for the current pending sector. Is this something to be concerned about?

smartctl -a -d ata /dev/sdr (--)
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     Hitachi HDS5C3020ALA632
Serial Number:    **********
Firmware Version: ML6OA580
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Mar  5 22:15:07 2013 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
				was suspended by an interrupting command from host.
				Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
				without error or no self-test has ever 
				been run.
Total time to complete Offline 
data collection: 		 (23645) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				No Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 255) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
				SCT Feature Control supported.
				SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       105
  3 Spin_Up_Time            0x0007   227   227   024    Pre-fail  Always       -       219 (Average 265)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       830
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       2
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   146   146   020    Pre-fail  Offline      -       29
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       13903
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       5
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       831
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       831
194 Temperature_Celsius     0x0002   193   193   000    Old_age   Always       -       31 (Lifetime Min/Max 17/33)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 629 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 629 occurred at disk power-on lifetime: 13897 hours (579 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 49 ff 77 42 01  Error: UNC 73 sectors at LBA = 0x014277ff = 21133311

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 48 77 42 e0 00   4d+22:13:41.243  READ DMA EXT
  25 00 00 48 76 42 e0 00   4d+22:13:41.242  READ DMA EXT
  25 00 00 48 75 42 e0 00   4d+22:13:41.240  READ DMA EXT
  25 00 00 48 74 42 e0 00   4d+22:13:41.238  READ DMA EXT
  25 00 00 48 73 42 e0 00   4d+22:13:41.236  READ DMA EXT

Error 628 occurred at disk power-on lifetime: 13897 hours (579 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 49 ff 77 42 01  Error: UNC 73 sectors at LBA = 0x014277ff = 21133311

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 48 77 42 e0 00   4d+22:13:13.517  READ DMA EXT
  27 00 00 00 00 00 e0 00   4d+22:13:13.517  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   4d+22:13:13.516  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00   4d+22:13:13.516  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   4d+22:13:13.516  READ NATIVE MAX ADDRESS EXT

Error 627 occurred at disk power-on lifetime: 13897 hours (579 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 49 ff 77 42 01  Error: UNC 73 sectors at LBA = 0x014277ff = 21133311

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 48 77 42 e0 00   4d+22:13:10.217  READ DMA EXT
  25 00 00 48 76 42 e0 00   4d+22:13:10.215  READ DMA EXT
  25 00 00 48 75 42 e0 00   4d+22:13:10.214  READ DMA EXT
  25 00 00 48 74 42 e0 00   4d+22:13:10.212  READ DMA EXT
  25 00 00 48 73 42 e0 00   4d+22:13:10.210  READ DMA EXT

Error 626 occurred at disk power-on lifetime: 13897 hours (579 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 39 ef 7a 79 00  Error: UNC 57 sectors at LBA = 0x00797aef = 7961327

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 28 7a 79 e0 00   4d+22:09:41.023  READ DMA EXT
  27 00 00 00 00 00 e0 00   4d+22:09:41.023  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   4d+22:09:41.022  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00   4d+22:09:41.022  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   4d+22:09:41.022  READ NATIVE MAX ADDRESS EXT

Error 625 occurred at disk power-on lifetime: 13897 hours (579 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 39 ef 7a 79 00  Error: UNC 57 sectors at LBA = 0x00797aef = 7961327

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 28 7a 79 e0 00   4d+22:09:37.720  READ DMA EXT
  27 00 00 00 00 00 e0 00   4d+22:09:37.720  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   4d+22:09:37.719  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00   4d+22:09:37.718  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   4d+22:09:37.718  READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1539         -
# 2  Short offline       Completed without error       00%      1533         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
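
For anyone reading the error log above: the LBA that smartctl prints can be reproduced from the register bytes on the same row. This is a minimal sketch, assuming the 28-bit layout (low nibble of DH, then CH, CL, SN), which matches the values shown in this report. The function name `decode_error_registers` is my own and not part of smartctl.

```python
def decode_error_registers(row):
    """Decode the 'ER ST SC SN CL CH DH' hex bytes of a smartctl error-log row."""
    er, st, sc, sn, cl, ch, dh = (int(tok, 16) for tok in row.split()[:7])
    # Assumed 28-bit LBA layout: DH low nibble is the top 4 bits, SN the lowest byte.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return {"sectors": sc, "lba": lba}

# Register rows copied from errors 629 and 626 in the report above.
print(decode_error_registers("40 51 49 ff 77 42 01"))  # {'sectors': 73, 'lba': 21133311}
print(decode_error_registers("40 51 39 ef 7a 79 00"))  # {'sectors': 57, 'lba': 7961327}
```

Both results agree with the "UNC 73 sectors at LBA = 21133311" and "UNC 57 sectors at LBA = 7961327" lines, so the five logged errors cover only two distinct bad spots on the disk.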

March 6, 2013

I'm no expert, but you want to get the current pending sector count down to 0:

197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       4

March 6, 2013

If you run several preclear cycles and the pending sector count does not get down to zero, I would be tempted to RMA the drive. You do not want to use a drive that has pending sectors outstanding in unRAID.
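
If you want to check this after each cycle without reading the whole report, something like the following could work. It is only a sketch: it assumes the 10-column attribute table layout that `smartctl -a` prints in this thread, and the choice of attributes to watch (5, 196, 197, 198) is mine. The sample rows are copied from the March 5 report.

```python
WATCHED_IDS = {5, 196, 197, 198}  # reallocation and pending-sector attributes

def nonzero_watched(report):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if int(fields[0]) in WATCHED_IDS:
            raw = int(fields[9])  # RAW_VALUE is the 10th column
            if raw > 0:
                flagged[fields[1]] = raw
    return flagged

sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       2
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""
print(nonzero_watched(sample))
```

An empty result would mean all four counters are at zero. Here it still reports the two reallocation counters and the 4 pending sectors.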

March 8, 2013

Author

After the 10th preclear cycle, my pending sector count is zero. However, I noticed that every preclear cycle causes the count to increase, so I have decided to RMA the drive back to HGST, since it has logged over 500 errors.
