
[Solved] Is my disk failing? - Disk 18


mifronte


Today I noticed red error messages in my log regarding disk 18.  Is my disk failing?  Any recommended command to run for further diagnosis?

 

Here is the SMART report:

smartctl -a -d ata /dev/sdg (disk18)
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     Hitachi HDS5C3020ALA632
Serial Number:    **************
Firmware Version: ML6OA580
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Feb 21 13:07:17 2013 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
				was suspended by an interrupting command from host.
				Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
				without error or no self-test has ever 
				been run.
Total time to complete Offline 
data collection: 		 (23645) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				No Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 255) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
				SCT Feature Control supported.
				SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   099   099   016    Pre-fail  Always       -       131072
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       111
  3 Spin_Up_Time            0x0007   135   135   024    Pre-fail  Always       -       405 (Average 405)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       813
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   148   148   020    Pre-fail  Offline      -       28
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       13606
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       814
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       814
194 Temperature_Celsius     0x0002   214   214   000    Old_age   Always       -       28 (Lifetime Min/Max 17/33)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       11
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 14 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 14 occurred at disk power-on lifetime: 13604 hours (566 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 ff 28 a6 2a 03  Error: UNC 255 sectors at LBA = 0x032aa628 = 53126696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 27 a6 2a e0 08   5d+23:54:53.200  READ DMA EXT
  27 00 00 00 00 00 e0 08   5d+23:54:53.180  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08   5d+23:54:53.160  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08   5d+23:54:53.140  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 08   5d+23:54:53.121  READ NATIVE MAX ADDRESS EXT

Error 13 occurred at disk power-on lifetime: 13604 hours (566 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 ff 28 a6 2a 03  Error: UNC 255 sectors at LBA = 0x032aa628 = 53126696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 27 a6 2a e0 08   5d+23:54:49.294  READ DMA EXT
  27 00 00 00 00 00 e0 08   5d+23:54:49.275  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08   5d+23:54:49.254  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08   5d+23:54:49.235  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 08   5d+23:54:49.215  READ NATIVE MAX ADDRESS EXT

Error 12 occurred at disk power-on lifetime: 13604 hours (566 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 ff 28 a6 2a 03  Error: UNC 255 sectors at LBA = 0x032aa628 = 53126696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 27 a6 2a e0 08   5d+23:54:45.388  READ DMA EXT
  27 00 00 00 00 00 e0 08   5d+23:54:45.369  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08   5d+23:54:45.349  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08   5d+23:54:45.329  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 08   5d+23:54:45.309  READ NATIVE MAX ADDRESS EXT

Error 11 occurred at disk power-on lifetime: 13604 hours (566 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 ff 28 a6 2a 03  Error: UNC 255 sectors at LBA = 0x032aa628 = 53126696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 27 a6 2a e0 08   5d+23:54:41.483  READ DMA EXT
  27 00 00 00 00 00 e0 08   5d+23:54:41.463  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08   5d+23:54:41.443  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08   5d+23:54:41.423  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 08   5d+23:54:41.403  READ NATIVE MAX ADDRESS EXT

Error 10 occurred at disk power-on lifetime: 13604 hours (566 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 ff 28 a6 2a 03  Error: UNC 255 sectors at LBA = 0x032aa628 = 53126696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 27 a6 2a e0 08   5d+23:54:37.577  READ DMA EXT
  27 00 00 00 00 00 e0 08   5d+23:54:37.557  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08   5d+23:54:37.537  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08   5d+23:54:37.517  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 08   5d+23:54:37.498  READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1539         -
# 2  Short offline       Completed without error       00%      1533         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

 

Here is a sample of error messages in the syslog:

Feb 21 10:30:26 Beanstalk shfs: shfs_setxattr: setxattr: /mnt/disk1/Data/Investing/E0400CF3.tmp (95) Operation not supported (Drive related)
Feb 21 10:30:26 Beanstalk shfs: shfs_setxattr: setxattr: /mnt/disk1/Data/Investing/E0400CF3.tmp (95) Operation not supported (Drive related)
Feb 21 10:30:26 Beanstalk shfs: shfs_setxattr: setxattr: /mnt/disk1/Data/Investing/StockInvestor.xls (95) Operation not supported (Drive related)
Feb 21 11:11:15 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors)
Feb 21 11:11:15 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors)
Feb 21 11:11:15 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues)
Feb 21 11:11:15 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related)
Feb 21 11:11:15 Beanstalk kernel:          res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors)
Feb 21 11:11:15 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related)
Feb 21 11:11:15 Beanstalk kernel: ata7.00: error: { UNC } (Errors)
Feb 21 11:11:15 Beanstalk kernel: ata7: hard resetting link (Minor Issues)
Feb 21 11:11:16 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 21 11:11:16 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related)
Feb 21 11:11:16 Beanstalk kernel: ata7: EH complete (Drive related)
Feb 21 11:11:19 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors)
Feb 21 11:11:19 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors)
Feb 21 11:11:19 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues)
Feb 21 11:11:19 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related)
Feb 21 11:11:19 Beanstalk kernel:          res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors)
Feb 21 11:11:19 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related)
Feb 21 11:11:19 Beanstalk kernel: ata7.00: error: { UNC } (Errors)
Feb 21 11:11:19 Beanstalk kernel: ata7: hard resetting link (Minor Issues)
Feb 21 11:11:20 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 21 11:11:20 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related)
Feb 21 11:11:20 Beanstalk kernel: ata7: EH complete (Drive related)
Feb 21 11:11:23 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors)
Feb 21 11:11:23 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors)
Feb 21 11:11:23 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues)
Feb 21 11:11:23 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related)
Feb 21 11:11:23 Beanstalk kernel:          res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors)
Feb 21 11:11:23 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related)
Feb 21 11:11:23 Beanstalk kernel: ata7.00: error: { UNC } (Errors)
Feb 21 11:11:23 Beanstalk kernel: ata7: hard resetting link (Minor Issues)
Feb 21 11:11:24 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 21 11:11:24 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related)
Feb 21 11:11:24 Beanstalk kernel: ata7: EH complete (Drive related)
Feb 21 11:11:27 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors)
Feb 21 11:11:27 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors)
Feb 21 11:11:27 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues)
Feb 21 11:11:27 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related)
Feb 21 11:11:27 Beanstalk kernel:          res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors)
Feb 21 11:11:27 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related)
Feb 21 11:11:27 Beanstalk kernel: ata7.00: error: { UNC } (Errors)
Feb 21 11:11:27 Beanstalk kernel: ata7: hard resetting link (Minor Issues)
Feb 21 11:11:28 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 21 11:11:28 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related)
Feb 21 11:11:28 Beanstalk kernel: ata7: EH complete (Drive related)
Feb 21 11:11:31 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors)
Feb 21 11:11:31 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors)
Feb 21 11:11:31 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues)
Feb 21 11:11:31 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related)
Feb 21 11:11:31 Beanstalk kernel:          res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors)
Feb 21 11:11:31 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related)
Feb 21 11:11:31 Beanstalk kernel: ata7.00: error: { UNC } (Errors)
Feb 21 11:11:31 Beanstalk kernel: ata7: hard resetting link (Minor Issues)
Feb 21 11:11:31 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 21 11:11:32 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related)
Feb 21 11:11:32 Beanstalk kernel: ata7: EH complete (Drive related)
Feb 21 11:11:35 Beanstalk kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 (Errors)
Feb 21 11:11:35 Beanstalk kernel: ata7.00: edma_err_cause=00000084 pp_flags=00000001, dev error, EDMA self-disable (Errors)
Feb 21 11:11:35 Beanstalk kernel: ata7.00: failed command: READ DMA EXT (Minor Issues)
Feb 21 11:11:35 Beanstalk kernel: ata7.00: cmd 25/00:00:27:a6:2a/00:04:c3:00:00/e0 tag 0 dma 524288 in (Drive related)
Feb 21 11:11:35 Beanstalk kernel:          res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error) (Errors)
Feb 21 11:11:35 Beanstalk kernel: ata7.00: status: { DRDY ERR } (Drive related)
Feb 21 11:11:35 Beanstalk kernel: ata7.00: error: { UNC } (Errors)
Feb 21 11:11:35 Beanstalk kernel: ata7: hard resetting link (Minor Issues)
Feb 21 11:11:35 Beanstalk kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 21 11:11:35 Beanstalk kernel: ata7.00: configured for UDMA/133 (Drive related)
Feb 21 11:11:35 Beanstalk kernel: sd 6:0:0:0: [sdg] Unhandled sense code (Drive related)
Feb 21 11:11:35 Beanstalk kernel: sd 6:0:0:0: [sdg] Result: hostbyte=0x00 driverbyte=0x08 (System)
Feb 21 11:11:35 Beanstalk kernel: sd 6:0:0:0: [sdg] Sense Key : 0x3 [current] [descriptor] (Drive related)
Feb 21 11:11:35 Beanstalk kernel: Descriptor sense data with sense descriptors (in hex):
Feb 21 11:11:35 Beanstalk kernel:         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
Feb 21 11:11:35 Beanstalk kernel:         c3 2a a6 28 
Feb 21 11:11:35 Beanstalk kernel: sd 6:0:0:0: [sdg] ASC=0x11 ASCQ=0x4 (Drive related)
Feb 21 11:11:35 Beanstalk kernel: sd 6:0:0:0: [sdg] CDB: cdb[0]=0x28: 28 00 c3 2a a6 27 00 04 00 00 (Drive related)
Feb 21 11:11:35 Beanstalk kernel: end_request: I/O error, dev sdg, sector 3274352168 (Errors)
Feb 21 11:11:35 Beanstalk kernel: ata7: EH complete (Drive related)
Feb 21 11:11:35 Beanstalk kernel: md: disk18 read error (Errors)
Feb 21 11:11:35 Beanstalk kernel: handle_stripe read error: 3274352104/17, count: 1 (Errors)
Feb 21 11:11:35 Beanstalk kernel: md: disk18 read error (Errors)
Feb 21 11:11:35 Beanstalk kernel: handle_stripe read error: 3274352112/17, count: 1 (Errors)
Feb 21 11:11:35 Beanstalk kernel: md: disk18 read error (Errors)
...
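
For anyone reading along, the failing sector can be worked out from the "res" line in the syslog above. This is only a minimal sketch: it assumes the usual Linux libata layout of that line (status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device), and the function name decode_res_lba48 is made up for illustration.

```python
import re

# Field order assumed from the Linux libata error-handler message format:
# status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
RES_PATTERN = re.compile(
    r"res\s+([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)


def decode_res_lba48(line):
    """Return the 48-bit LBA encoded in a libata 'res' taskfile line."""
    match = RES_PATTERN.search(line)
    if match is None:
        raise ValueError("no 'res' taskfile found in line")
    low = [int(x, 16) for x in match.group(2).split(":")]
    high = [int(x, 16) for x in match.group(3).split(":")]
    _error, _nsect, lbal, lbam, lbah = low
    _hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = high
    # The "hob" (high order byte) registers carry LBA bits 24..47.
    return (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )


# Taskfile copied from the syslog sample in this thread.
line = "res 51/40:ff:28:a6:2a/40:03:c3:00:00/03 Emask 0x9 (media error)"
lba = decode_res_lba48(line)
print(lba, hex(lba))  # 3274352168 0xc32aa628
```

The result matches the sector in the end_request line (3274352168). Its lower 28 bits, 0x032aa628 = 53126696, are the value smartctl prints in its error log, so that log appears to be showing a 28-bit address for the same spot on the disk.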

Link to comment

I forgot to mention that I am running version 4.7.

 

It looks like Current_Pending_Sector was at 16 and is now 3 after a parity check.

 

Should I ...

1. Rebuild this drive by unassigning it and then reassigning it in the array?

2. Replace it with a backup disk (let unRAID rebuild onto the new disk) and then run preclear on the suspect disk?

3. Replace it and just RMA the drive because it is dying?

 

SMART information after the parity check:



smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:    Hitachi HDS5C3020ALA632
Serial Number:  **********
Firmware Version: ML6OA580
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:  8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Feb 21 23:09:39 2013 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
				was suspended by an interrupting command from host.
				Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
				without error or no self-test has ever
				been run.
Total time to complete Offline
data collection: 		 (23645) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				No Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 255) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
				SCT Feature Control supported.
				SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       111
  3 Spin_Up_Time            0x0007   149   149   024    Pre-fail  Always       -       333 (Average 405)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       818
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   148   148   020    Pre-fail  Offline      -       28
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       13616
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       819
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       819
194 Temperature_Celsius     0x0002   206   206   000    Old_age   Always       -       29 (Lifetime Min/Max 17/33)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 44 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 44 occurred at disk power-on lifetime: 13615 hours (567 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 7d d2 51 a8 0a  Error: UNC 125 sectors at LBA = 0x0aa851d2 = 178803154

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 4f 50 a8 e0 08      07:50:14.534  READ DMA EXT
  27 00 00 00 00 00 e0 08      07:50:14.514  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08      07:50:14.494  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      07:50:14.474  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 08      07:50:14.454  READ NATIVE MAX ADDRESS EXT

Error 43 occurred at disk power-on lifetime: 13615 hours (567 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 7d d2 51 a8 0a  Error: UNC 125 sectors at LBA = 0x0aa851d2 = 178803154

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 4f 50 a8 e0 08      07:50:10.588  READ DMA EXT
  27 00 00 00 00 00 e0 08      07:50:10.569  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08      07:50:10.549  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      07:50:10.529  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 08      07:50:10.509  READ NATIVE MAX ADDRESS EXT

Error 42 occurred at disk power-on lifetime: 13615 hours (567 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 7d d2 51 a8 0a  Error: UNC 125 sectors at LBA = 0x0aa851d2 = 178803154

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 4f 50 a8 e0 08      07:50:06.643  READ DMA EXT
  27 00 00 00 00 00 e0 08      07:50:06.623  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08      07:50:06.603  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      07:50:06.583  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 08      07:50:06.563  READ NATIVE MAX ADDRESS EXT

Error 41 occurred at disk power-on lifetime: 13615 hours (567 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 7d d2 51 a8 0a  Error: UNC 125 sectors at LBA = 0x0aa851d2 = 178803154

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 4f 50 a8 e0 08      07:50:02.658  READ DMA EXT
  27 00 00 00 00 00 e0 08      07:50:02.638  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08      07:50:02.618  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      07:50:02.598  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 08      07:50:02.578  READ NATIVE MAX ADDRESS EXT

Error 40 occurred at disk power-on lifetime: 13615 hours (567 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 7d d2 51 a8 0a  Error: UNC 125 sectors at LBA = 0x0aa851d2 = 178803154

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 4f 50 a8 e0 08      07:49:58.723  READ DMA EXT
  27 00 00 00 00 00 e0 08      07:49:58.703  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08      07:49:58.683  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      07:49:58.663  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 08      07:49:58.644  READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error      00%      1539        -
# 2  Short offline      Completed without error      00%      1533        -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
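
To see what changed between the two reports without eyeballing the tables, the raw values can be compared with a few lines of script. This is a rough sketch rather than a proper tool: the helper names parse_attributes and changed are hypothetical, and it only understands the ten-column attribute rows that smartctl -a prints. The sample rows are copied from the two reports in this thread.

```python
def parse_attributes(report):
    """Map attribute ID -> (name, raw value) from smartctl -a attribute rows."""
    attrs = {}
    for row in report.splitlines():
        fields = row.split()
        # Attribute rows start with a numeric ID, have a hex FLAG in column 3,
        # and carry the raw value in column 10.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs


def changed(before, after):
    """Return {name: (old_raw, new_raw)} for attributes whose raw value differs."""
    b, a = parse_attributes(before), parse_attributes(after)
    return {
        b[i][0]: (b[i][1], a[i][1])
        for i in sorted(b)
        if i in a and b[i][1] != a[i][1]
    }


BEFORE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       11
"""

AFTER = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

for name, (old, new) in changed(BEFORE, AFTER).items():
    print(f"{name}: {old} -> {new}")
# Current_Pending_Sector: 16 -> 3
# Offline_Uncorrectable: 11 -> 0
```

Reallocated_Sector_Ct does not show up because it stayed at 1 across both reports.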

Link to comment

UNC errors (media errors) are uncorrectable read errors on specific sectors: the contents of the sector do not agree with the checksum stored at the end of that sector on the disk.

 

Occasionally the sector was just poorly written and can be rewritten in place.  More often, the cause is a defect in the platter surface.

 

The disk will usually reallocate the sector from its pool of spare sectors the next time it is written.  (It first tries to rewrite the sector in place and only then uses one from the spare pool.  A modern large disk typically has several thousand spare sectors.)
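
The behaviour described above can be pictured with a toy model. This is purely illustrative and not how any real firmware is written: the class ToyDisk, its fields, and the spare pool size of 2000 are all assumptions chosen to mirror the description (a failed read marks a sector pending, and the next write either fixes it in place or remaps it to a spare).

```python
from dataclasses import dataclass, field


@dataclass
class ToyDisk:
    weak: set      # poorly written sectors: unreadable, but the surface is fine
    damaged: set   # sectors sitting on a defective patch of platter
    spares: int = 2000  # hypothetical size of the spare pool
    pending: set = field(default_factory=set)
    reallocated: set = field(default_factory=set)

    def read(self, lba):
        """A failed read only marks the sector as pending; nothing is remapped yet."""
        if lba in self.weak or lba in self.damaged:
            self.pending.add(lba)
            return False
        return True

    def write(self, lba):
        """A write clears the pending state, remapping only if the surface is bad."""
        if lba in self.damaged:
            if self.spares == 0:
                raise OSError("spare pool exhausted")
            self.spares -= 1
            self.damaged.discard(lba)
            self.reallocated.add(lba)   # counted as a reallocated sector
        self.weak.discard(lba)          # rewritten in place
        self.pending.discard(lba)


disk = ToyDisk(weak={100, 101}, damaged={200})
for lba in (100, 101, 200):
    disk.read(lba)
pending_after_read = len(disk.pending)

for lba in (100, 101, 200):
    disk.write(lba)

print(pending_after_read, len(disk.pending), len(disk.reallocated))  # 3 0 1
```

In this model the pending count drops to zero after the writes while the reallocated count only rises by one, which is the general pattern to look for in the SMART attributes 197 and 5.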

 

If the disk is assigned to the array, it cannot be "precleared".

 

Joe L.

Link to comment

Thank you for the replies.  I am in the process of running another parity check before replacing the disk with a spare.  Then I will run preclear on the suspect disk to see what the SMART data looks like.

 

During a parity check, if there is a parity error, how does unRAID determine whether the error is on the data disk or the parity disk in order to make the correction?

Link to comment

It doesn't.  A parity check verifies the drives against the parity information.  A correcting check will correct the parity information on the parity drive, and a drive rebuild will rebuild the drive according to parity.

 

This means that if you have a disk with bad data and you run a correcting parity check, unRAID will, once the parity check has completed, be perfectly able to rebuild your disk (including the bad data) should the disk fail...

 

Parity is not sacred and does not help with file corruption.  It makes sure that a drive that has completely failed can be restored to a new drive (bringing the new drive to the same state as the old drive before the failure or removal).
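
A small worked example may make this concrete. It is only a sketch and assumes plain XOR parity across the data disks; the three tiny "disks" and the helper names xor_parity and rebuild are invented for illustration and have nothing to do with unRAID's actual code.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR across equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(missing_index, blocks, parity):
    """Recover one missing block from the surviving blocks plus parity."""
    survivors = [b for i, b in enumerate(blocks) if i != missing_index]
    return xor_parity(survivors + [parity])


good = [b"movie-data", b"other-disk", b"third-disk"]
parity = xor_parity(good)

# Disk 0 now returns corrupted data for one byte.
corrupted = [b"movie-dXta", good[1], good[2]]

# A parity check notices a mismatch, but cannot tell which disk is wrong.
parity_mismatch = xor_parity(corrupted) != parity

# A correcting check rewrites parity to match whatever the data disks hold now.
corrected_parity = xor_parity(corrupted)

rebuilt_after_correction = rebuild(0, corrupted, corrected_parity)
rebuilt_from_old_parity = rebuild(0, corrupted, parity)

print(parity_mismatch)            # True
print(rebuilt_after_correction)   # b'movie-dXta'
print(rebuilt_from_old_parity)    # b'movie-data'
```

Once parity has been "corrected", a rebuild of disk 0 faithfully reproduces the corrupted byte. Only a rebuild from the old, uncorrected parity brings back the original data.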

 

Link to comment

Yes, that is what I mean.  The webpage says that a correcting parity check will "write corrections to parity disk".  So parity will be updated to reflect the current state of the drive; should that drive contain file corruption, a rebuild from parity will recreate a new disk with the same file corruption.

Link to comment

That is good to know for future reference.  Too late for now.  At least there was only one parity correction and chances are that it is the movie I was watching when I encountered the read error.

 

I will stop the current parity check, delete the problematic movie, replace the problematic disk, and rebuild the new disk.

 

Thanks.

Link to comment

I would NOT delete the problematic movie until AFTER I had replaced the problem disk and rebuilt it using the current parity information!  Then you can delete that movie.  You really want to prevent any possibility of causing additional errors in the parity information, and the way to achieve that is not to write to that problematic disk.

Link to comment

That makes sense.  Thanks for the quick response.

Link to comment
  • 2 weeks later...

I just completed my 8th preclear cycle.  So far the reallocated sector count has only increased by 1.  However, I am still getting a value for Current_Pending_Sector.  Is this something to be concerned about?

 

smartctl -a -d ata /dev/sdr (--)
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     Hitachi HDS5C3020ALA632
Serial Number:    **********
Firmware Version: ML6OA580
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Mar  5 22:15:07 2013 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
				was suspended by an interrupting command from host.
				Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
				without error or no self-test has ever 
				been run.
Total time to complete Offline 
data collection: 		 (23645) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				No Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 255) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
				SCT Feature Control supported.
				SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       105
  3 Spin_Up_Time            0x0007   227   227   024    Pre-fail  Always       -       219 (Average 265)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       830
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       2
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   146   146   020    Pre-fail  Offline      -       29
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       13903
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       5
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       831
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       831
194 Temperature_Celsius     0x0002   193   193   000    Old_age   Always       -       31 (Lifetime Min/Max 17/33)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 629 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 629 occurred at disk power-on lifetime: 13897 hours (579 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 49 ff 77 42 01  Error: UNC 73 sectors at LBA = 0x014277ff = 21133311

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 48 77 42 e0 00   4d+22:13:41.243  READ DMA EXT
  25 00 00 48 76 42 e0 00   4d+22:13:41.242  READ DMA EXT
  25 00 00 48 75 42 e0 00   4d+22:13:41.240  READ DMA EXT
  25 00 00 48 74 42 e0 00   4d+22:13:41.238  READ DMA EXT
  25 00 00 48 73 42 e0 00   4d+22:13:41.236  READ DMA EXT

Error 628 occurred at disk power-on lifetime: 13897 hours (579 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 49 ff 77 42 01  Error: UNC 73 sectors at LBA = 0x014277ff = 21133311

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 48 77 42 e0 00   4d+22:13:13.517  READ DMA EXT
  27 00 00 00 00 00 e0 00   4d+22:13:13.517  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   4d+22:13:13.516  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00   4d+22:13:13.516  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   4d+22:13:13.516  READ NATIVE MAX ADDRESS EXT

Error 627 occurred at disk power-on lifetime: 13897 hours (579 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 49 ff 77 42 01  Error: UNC 73 sectors at LBA = 0x014277ff = 21133311

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 48 77 42 e0 00   4d+22:13:10.217  READ DMA EXT
  25 00 00 48 76 42 e0 00   4d+22:13:10.215  READ DMA EXT
  25 00 00 48 75 42 e0 00   4d+22:13:10.214  READ DMA EXT
  25 00 00 48 74 42 e0 00   4d+22:13:10.212  READ DMA EXT
  25 00 00 48 73 42 e0 00   4d+22:13:10.210  READ DMA EXT

Error 626 occurred at disk power-on lifetime: 13897 hours (579 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 39 ef 7a 79 00  Error: UNC 57 sectors at LBA = 0x00797aef = 7961327

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 28 7a 79 e0 00   4d+22:09:41.023  READ DMA EXT
  27 00 00 00 00 00 e0 00   4d+22:09:41.023  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   4d+22:09:41.022  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00   4d+22:09:41.022  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   4d+22:09:41.022  READ NATIVE MAX ADDRESS EXT

Error 625 occurred at disk power-on lifetime: 13897 hours (579 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 39 ef 7a 79 00  Error: UNC 57 sectors at LBA = 0x00797aef = 7961327

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 28 7a 79 e0 00   4d+22:09:37.720  READ DMA EXT
  27 00 00 00 00 00 e0 00   4d+22:09:37.720  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   4d+22:09:37.719  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00   4d+22:09:37.718  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   4d+22:09:37.718  READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1539         -
# 2  Short offline       Completed without error       00%      1533         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
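For anyone who wants to pull the sector-health numbers out of a report like the one above without reading the whole table, here is a rough Python sketch. The four rows are copied from this report. The `parse_attributes` helper and the choice of which attributes to watch are my own assumptions, not something smartctl provides.

```python
# Attribute rows copied from the report above.
REPORT = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       2
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

def parse_attributes(text):
    """Map attribute name -> first integer in the RAW_VALUE column."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs

# Hypothetical watch list: attributes that count bad or suspect sectors.
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

attrs = parse_attributes(REPORT)
nonzero = {name: attrs[name] for name in WATCH if attrs.get(name, 0) > 0}
print(nonzero)  # {'Reallocated_Sector_Ct': 2, 'Current_Pending_Sector': 4}
```

Saving the output of each preclear cycle and running the same parse on every saved report makes it easy to see whether the raw counts keep climbing from one cycle to the next.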

Link to comment

After the 10th preclear cycle, my pending sector count is zero. However, I noticed that every preclear cycle causes the count to increase. So I have decided to RMA the drive to HGST, since it has logged over 500 errors.
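As a side note on those logged errors: the five most recent entries in the error log above repeat at just two addresses. Below is a small sketch of how the register bytes map to the LBA and sector count that smartctl prints. It assumes the 28-bit layout implied by the five registers shown, and `decode_lba28` is a name I made up for illustration.

```python
def decode_lba28(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA.

    Assumes the low nibble of DH carries LBA bits 24-27, as in the
    register view printed in the error log above.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

# Register bytes (ER ST SC SN CL CH DH) from the two distinct errors in the log.
errors = {
    "errors 627-629": bytes.fromhex("40 51 49 ff 77 42 01"),
    "errors 625-626": bytes.fromhex("40 51 39 ef 7a 79 00"),
}

for label, regs in errors.items():
    er, st, sc, sn, cl, ch, dh = regs
    lba = decode_lba28(sn, cl, ch, dh)
    print(f"{label}: {sc} sectors at LBA {lba} (0x{lba:08x})")
# errors 627-629: 73 sectors at LBA 21133311 (0x014277ff)
# errors 625-626: 57 sectors at LBA 7961327 (0x00797aef)
```

Repeated UNC errors at the same addresses suggest the reads were being retried against the same unreadable spots rather than hitting damage all over the platter.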

Link to comment

Archived

This topic is now archived and is closed to further replies.
