Drives do fail... Glad to have unRAID


alphazo

Last Saturday I was watching an HD movie on a small XBMC client connected via a Gigabit Ethernet cable to my unRAID server. Several times the movie stopped in order to buffer more data. That had never happened to me before (at least over copper; Wi-Fi is another story).

Later, I received my daily unRAID email notification... but this time I discovered that I had one bad drive! Looking at its contents revealed that it held the movie I was watching, which explains the performance issue. I had been copying content to this bad drive (in reality to the parity drive, since unRAID emulates a disabled disk) without noticing it was disabled! That is why unRAID is great.

For the record, here is the smartctl output for this year-and-a-half-old drive.

smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.0-ck] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (Adv. Format)
Device Model:     WDC WD15EARS-00Z5B1
Serial Number:    WD-WMAVU2826445
LU WWN Device Id: 5 0014ee 0576e1465
Firmware Version: 80.00A80
User Capacity:    1 500 301 910 016 bytes [1,50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Sep  1 10:26:45 2011 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
				was suspended by an interrupting command from host.
				Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
				without error or no self-test has ever 
				been run.
Total time to complete Offline 
data collection: 		(33000) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 255) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x3031)	SCT Status supported.
				SCT Feature Control supported.
				SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   199   199   051    Pre-fail  Always       -       4382
  3 Spin_Up_Time            0x0027   187   182   021    Pre-fail  Always       -       5625
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1384
  5 Reallocated_Sector_Ct   0x0033   041   041   140    Pre-fail  Always   FAILING_NOW 1265
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10904
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       38
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       18
193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       73871
194 Temperature_Celsius     0x0022   121   110   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       968
197 Current_Pending_Sector  0x0032   199   199   000    Old_age   Always       -       330
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
Warning: ATA error count 664 inconsistent with error log pointer 3

ATA Error Count: 664 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 664 occurred at disk power-on lifetime: 10902 hours (454 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 45 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 45 00 00 00 a0 08  13d+01:16:56.399  SET FEATURES [set transfer mode]
  ec 00 00 00 00 00 a0 08  13d+01:16:56.379  IDENTIFY DEVICE
  ec 00 00 00 00 00 a0 08  13d+01:16:50.913  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 08  13d+01:16:50.895  SET FEATURES [set transfer mode]
  ec 00 00 00 00 00 a0 08  13d+01:16:50.874  IDENTIFY DEVICE

Error 663 occurred at disk power-on lifetime: 10902 hours (454 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 45 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 45 00 00 00 a0 08  13d+01:16:50.895  SET FEATURES [set transfer mode]
  ec 00 00 00 00 00 a0 08  13d+01:16:50.874  IDENTIFY DEVICE
  ec 00 00 00 00 00 a0 08  13d+01:16:45.409  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 08  13d+01:16:45.390  SET FEATURES [set transfer mode]
  ec 00 00 00 00 00 a0 08  13d+01:16:45.369  IDENTIFY DEVICE

Error 662 occurred at disk power-on lifetime: 10902 hours (454 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 45 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 45 00 00 00 a0 08  13d+01:16:45.390  SET FEATURES [set transfer mode]
  ec 00 00 00 00 00 a0 08  13d+01:16:45.369  IDENTIFY DEVICE
  ec 00 00 00 00 00 a0 08  13d+01:16:44.839  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08  13d+01:16:44.820  SET FEATURES [set transfer mode]

Error 661 occurred at disk power-on lifetime: 10902 hours (454 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 46 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 46 00 00 00 a0 08  13d+01:16:44.820  SET FEATURES [set transfer mode]
  ec 00 00 00 00 00 a0 08  13d+01:16:44.799  IDENTIFY DEVICE
  ec 00 00 00 00 00 a0 08  13d+01:16:44.270  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08  13d+01:16:44.251  SET FEATURES [set transfer mode]

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
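
For anyone who wants to spot this kind of failure without reading the whole report: in the attribute table above, an attribute is failing when its normalized VALUE has dropped to or below THRESH (here Reallocated_Sector_Ct, 041 against 140). Below is a minimal sketch of that check in Python. The function name `failing_attributes` and the `SAMPLE` text are my own illustration, not part of smartctl or unRAID, and the regular expression assumes the same column layout as the output pasted above.

```python
import re

# One row of the "Vendor Specific SMART Attributes with Thresholds" table.
# Assumes the column order shown in the post:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>\d+)"
)


def failing_attributes(smart_text):
    """Return (name, value, thresh, raw) for every attribute at or below its threshold."""
    failing = []
    for line in smart_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, blank, or unrelated line
        value, thresh = int(m["value"]), int(m["thresh"])
        # A threshold of 000 is treated here as "never fails", so skip those rows.
        if thresh > 0 and value <= thresh:
            failing.append((m["name"], value, thresh, int(m["raw"])))
    return failing


# Three rows copied from the report above (hypothetical variable name).
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x002f   199   199   051    Pre-fail  Always       -       4382
  5 Reallocated_Sector_Ct   0x0033   041   041   140    Pre-fail  Always   FAILING_NOW 1265
197 Current_Pending_Sector  0x0032   199   199   000    Old_age   Always       -       330
"""

if __name__ == "__main__":
    for name, value, thresh, raw in failing_attributes(SAMPLE):
        print(f"{name}: value {value} <= threshold {thresh} (raw {raw})")
```

Run against the three sample rows, this reports only Reallocated_Sector_Ct. Note that Current_Pending_Sector is not flagged even with 330 pending sectors, because its threshold is 000; the raw values are still worth a look by eye.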
