Is this drive failing or am I reading the results wrong?

Dieseldes · March 24, 2014

Hi guys. I have two HP MicroServers: one is used for backups about once a month, and the other is my live media server, which I play from with MediaPortal. I also run SABnzbd, CouchPotato and Sick Beard on this server. I have the supplied 250GB drive set up as a cache drive and scratch drive. unRAID is version 5.

I use unMENU as I know little about using the command line. Yesterday I noticed some errors in my log, so I restarted the server, but the errors are still there. I ran a SMART check on this drive with unMENU and this is what it tells me:

smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: HP 250GB SATA disk VB0250EAVER
Device Model: VB0250EAVER
Serial Number: Z3TLXGR6
LU WWN Device Id: 5 000c50 063edafb2
Firmware Version: HPG9
User Capacity: 250,059,350,016 bytes [250 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 2.6, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Mon Mar 24 20:05:41 2014 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error.
    Auto Offline Data Collection: Enabled.
Self-test execution status: ( 121) The previous self-test completed having the read element of the test failed.
Total time to complete Offline data collection: ( 625) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    No Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 45) minutes.
SCT capabilities: (0x1039) SCT Status supported.
    SCT Error Recovery Control supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 119 095 006 Pre-fail Always - 223323636
3 Spin_Up_Time 0x0023 097 097 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 104
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 2
7 Seek_Error_Rate 0x002f 075 060 030 Pre-fail Always - 36417414
9 Power_On_Hours 0x0032 097 097 000 Old_age Always - 2743
10 Spin_Retry_Count 0x0033 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 38
180 Unknown_HDD_Attribute 0x002b 100 100 000 Pre-fail Always - 2112145650
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 097 Old_age Always - 0
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 217
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 081 070 045 Old_age Always - 19 (Min/Max 14/28)
194 Temperature_Celsius 0x0022 019 040 000 Old_age Always - 19 (0 8 0 0 0)
195 Hardware_ECC_Recovered 0x003a 059 041 000 Old_age Always - 223323636
196 Reallocated_Event_Count 0x0032 100 100 036 Old_age Always - 2
197 Current_Pending_Sector 0x0032 098 098 000 Old_age Always - 112
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 99
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0

SMART Error Log Version: 1
ATA Error Count: 217 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 217 occurred at disk power-on lifetime: 2727 hours (113 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00   2d+16:17:30.152  READ DMA EXT
  35 00 00 00 19 73 e3 00   2d+16:17:30.149  WRITE DMA EXT
  35 00 00 00 15 73 e3 00   2d+16:17:30.147  WRITE DMA EXT
  25 00 00 10 25 43 e5 00   2d+16:17:30.121  READ DMA EXT
  ef 10 02 00 00 00 a0 00   2d+16:17:30.120  SET FEATURES [Enable SATA feature]

Error 216 occurred at disk power-on lifetime: 2727 hours (113 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00   2d+16:17:26.967  READ DMA EXT
  c8 00 00 10 24 43 e5 00   2d+16:17:26.966  READ DMA
  c8 00 00 10 23 43 e5 00   2d+16:17:26.965  READ DMA
  ef 10 02 00 00 00 a0 00   2d+16:17:26.965  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 00   2d+16:17:26.964  IDENTIFY DEVICE

Error 215 occurred at disk power-on lifetime: 2727 hours (113 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00   2d+16:17:23.808  READ DMA EXT
  25 00 08 ff ff ff ef 00   2d+16:17:23.807  READ DMA EXT
  25 00 08 ff ff ff ef 00   2d+16:17:23.807  READ DMA EXT
  25 00 08 ff ff ff ef 00   2d+16:17:23.806  READ DMA EXT
  25 00 08 ff ff ff ef 00   2d+16:17:23.806  READ DMA EXT

Error 214 occurred at disk power-on lifetime: 2727 hours (113 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00   2d+16:17:07.594  READ DMA EXT
  c8 00 00 00 68 3f e5 00   2d+16:17:07.593  READ DMA
  c8 00 00 00 67 3f e5 00   2d+16:17:07.567  READ DMA
  ef 10 02 00 00 00 a0 00   2d+16:17:07.566  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 00   2d+16:17:07.557  IDENTIFY DEVICE

Error 213 occurred at disk power-on lifetime: 2727 hours (113 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00   2d+16:17:04.339  READ DMA EXT
  c8 00 00 00 66 3f e5 00   2d+16:17:04.338  READ DMA
  c8 00 00 00 65 3f e5 00   2d+16:17:04.337  READ DMA
  25 00 08 ff ff ff ef 00   2d+16:17:04.222  READ DMA EXT
  25 00 08 ff ff ff ef 00   2d+16:17:04.222  READ DMA EXT

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 2742 380393034
# 2 Short offline Completed: read failure 90% 2742 380393034

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing

So what does it all mean?

Thanks des.

Dieseldes · March 24, 2014

And here is some of the system log. It looks like the faults have been happening while the mover script runs, over the last few days.

in Your Trolley_ - Channel 4 Dispatches.ts

Mar 24 03:49:42 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Mar 24 03:49:42 Tower kernel: ata1.00: irq_stat 0x40000001 (Drive related)
Mar 24 03:49:42 Tower kernel: ata1.00: failed command: READ DMA EXT (Minor Issues)
Mar 24 03:49:42 Tower kernel: ata1.00: cmd 25/00:00:48:42:ac/00:01:16:00:00/e0 tag 0 dma 131072 in (Drive related)
Mar 24 03:49:42 Tower kernel: res 51/40:00:95:42:ac/00:00:16:00:00/00 Emask 0x9 (media error) (Errors)
Mar 24 03:49:42 Tower kernel: ata1.00: status: { DRDY ERR } (Drive related)
Mar 24 03:49:42 Tower kernel: ata1.00: error: { UNC } (Errors)
Mar 24 03:49:42 Tower kernel: ata1.00: failed to enable AA (error_mask=0x1) (Errors)
Mar 24 03:49:42 Tower kernel: ata1.00: configured for UDMA/100 (Drive related)
Mar 24 03:49:42 Tower kernel: sd 1:0:0:0: [sdb] Unhandled sense code (Drive related)
Mar 24 03:49:42 Tower kernel: sd 1:0:0:0: [sdb] (Drive related)
Mar 24 03:49:42 Tower kernel: Result: hostbyte=0x00 driverbyte=0x08 (System)
Mar 24 03:49:42 Tower kernel: sd 1:0:0:0: [sdb] (Drive related)
Mar 24 03:49:42 Tower kernel: Sense Key : 0x3 [current] [descriptor]
Mar 24 03:49:42 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Mar 24 03:49:42 Tower kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Mar 24 03:49:42 Tower kernel: 16 ac 42 95
Mar 24 03:49:42 Tower kernel: sd 1:0:0:0: [sdb] (Drive related)
Mar 24 03:49:42 Tower kernel: ASC=0x11 ASCQ=0x4
Mar 24 03:49:42 Tower kernel: sd 1:0:0:0: [sdb] CDB: (Drive related)
Mar 24 03:49:42 Tower kernel: cdb[0]=0x28: 28 00 16 ac 42 48 00 01 00 00
Mar 24 03:49:42 Tower kernel: end_request: I/O error, dev sdb, sector 380387989 (Errors)
Mar 24 03:49:42 Tower kernel: ata1: EH complete (Drive related)
Mar 24 03:49:46 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Mar 24 03:49:46 Tower kernel: ata1.00: irq_stat 0x40000001 (Drive related)
Mar 24 03:49:46 Tower kernel: ata1.00: failed command: READ DMA EXT (Minor Issues)
Mar 24 03:49:46 Tower kernel: ata1.00: cmd 25/00:08:90:42:ac/00:00:16:00:00/e0 tag 0 dma 4096 in (Drive related)
Mar 24 03:49:46 Tower kernel: res 51/40:00:95:42:ac/00:00:16:00:00/00 Emask 0x9 (media error) (Errors)
Mar 24 03:49:46 Tower kernel: ata1.00: status: { DRDY ERR } (Drive related)
Mar 24 03:49:46 Tower kernel: ata1.00: error: { UNC } (Errors)
Mar 24 03:49:46 Tower kernel: ata1.00: failed to enable AA (error_mask=0x1) (Errors)
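The sector numbers in this log can be cross-checked by hand. Below is a minimal Python sketch, assuming the standard SCSI READ(10) CDB layout and descriptor-format (0x72) sense data; the bytes are copied from the log above, and the helper names `read10_range` and `failing_lba` are hypothetical, not part of unRAID or smartctl. It recovers the 256-sector (131072-byte) read request and the failing sector 380387989 that the kernel reports, and shows that this sector is larger than 0x0fffffff (268435455), the biggest value a 28-bit LBA field can hold. That is consistent with, though not proof of, the idea that every entry in the SMART error log shows "LBA = 0x0fffffff" because that log only has room for 28-bit addresses.

```python
MAX_LBA28 = (1 << 28) - 1  # largest value a 28-bit LBA field can hold (0x0fffffff)


def read10_range(cdb_hex):
    """Return (start_lba, sectors) from a SCSI READ(10) CDB written as hex bytes."""
    b = bytes.fromhex(cdb_hex)
    if b[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    start = int.from_bytes(b[2:6], "big")    # bytes 2-5: logical block address
    sectors = int.from_bytes(b[7:9], "big")  # bytes 7-8: transfer length in blocks
    return start, sectors


def failing_lba(sense_hex):
    """Return the INFORMATION field of descriptor-format (0x72) sense data."""
    b = bytes.fromhex(sense_hex)
    if (b[0] & 0x7F) != 0x72:
        raise ValueError("not descriptor-format sense data")
    end = 8 + b[7]  # 8-byte header plus 'additional sense length' bytes
    i = 8
    while i < end:
        dtype, dlen = b[i], b[i + 1]
        if dtype == 0x00:  # Information descriptor: 8-byte value starts at offset 4
            return int.from_bytes(b[i + 4:i + 12], "big")
        i += 2 + dlen  # skip to the next descriptor
    raise ValueError("no Information descriptor")


start, sectors = read10_range("28 00 16 ac 42 48 00 01 00 00")
bad = failing_lba("72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 16 ac 42 95")
print(start, sectors, sectors * 512)  # 380387912 256 131072
print(bad, bad > MAX_LBA28)           # 380387989 True
```

If the drive supports the extended error log, `smartctl -l xerror /dev/sdb` may show the full 48-bit address instead of the saturated 28-bit value.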

vca · March 24, 2014

Unscrambling the important part of the report:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 
1 Raw_Read_Error_Rate 0x002f 119 095 006 Pre-fail Always - 223323636 
3 Spin_Up_Time 0x0023 097 097 000 Pre-fail Always - 0 
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 104 
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 2 
7 Seek_Error_Rate 0x002f 075 060 030 Pre-fail Always - 36417414 
9 Power_On_Hours 0x0032 097 097 000 Old_age Always - 2743 
10 Spin_Retry_Count 0x0033 100 100 097 Pre-fail Always - 0 
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 38 
180 Unknown_HDD_Attribute 0x002b 100 100 000 Pre-fail Always - 2112145650 
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0 
184 End-to-End_Error 0x0032 100 100 097 Old_age Always - 0 
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 217 
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0 
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0 
190 Airflow_Temperature_Cel 0x0022 081 070 045 Old_age Always - 19 (Min/Max 14/28) 
194 Temperature_Celsius 0x0022 019 040 000 Old_age Always - 19 (0 8 0 0 0) 
195 Hardware_ECC_Recovered 0x003a 059 041 000 Old_age Always - 223323636 
196 Reallocated_Event_Count 0x0032 100 100 036 Old_age Always - 2 
197 Current_Pending_Sector 0x0032 098 098 000 Old_age Always - 112 
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 99 
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0

The following lines are of concern:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 2 
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 217 
196 Reallocated_Event_Count 0x0032 100 100 036 Old_age Always - 2 
197 Current_Pending_Sector 0x0032 098 098 000 Old_age Always - 112 
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 99

There are 112 sectors currently flagged as bad that have not yet been remapped (Current_Pending_Sector), which I think puts this drive into "do not trust" territory, especially as the drive is not very old (2700 hours). Seeing as it is only a 250GB drive, it's probably not worth the bother of doing an RMA.
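The same triage can be done mechanically. A minimal Python sketch, assuming the attribute rows are pasted in the single-spaced layout quoted above (ten or more whitespace-separated fields, RAW_VALUE tenth); `parse_attributes` and `WATCH` are hypothetical names, and treating any nonzero raw count on these five attributes as a warning is a rule of thumb, not a vendor-defined threshold. It also checks that every normalized VALUE is still above its THRESH, which is consistent with the empty WHEN_FAILED column and the PASSED verdict despite the bad raw counts.

```python
# The five attributes singled out above; a nonzero raw count is treated as a
# warning (rule of thumb, not a vendor-defined threshold).
WATCH = {5, 187, 196, 197, 198}

# Rows copied from the smartctl attribute table in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 2
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 217
196 Reallocated_Event_Count 0x0032 100 100 036 Old_age Always - 2
197 Current_Pending_Sector 0x0032 098 098 000 Old_age Always - 112
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 99
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
"""


def parse_attributes(text):
    """Map attribute ID -> (name, value, thresh, raw) for each table row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip the header and any blank or wrapped lines
        raw = int(fields[9])  # first token only, e.g. "19" in "19 (Min/Max 14/28)"
        attrs[int(fields[0])] = (fields[1], int(fields[3]), int(fields[5]), raw)
    return attrs


attrs = parse_attributes(SAMPLE)
for attr_id, (name, value, thresh, raw) in sorted(attrs.items()):
    if attr_id in WATCH and raw > 0:
        print(f"{attr_id:>3} {name:<24} raw={raw}")

# True here: no normalized VALUE has dropped to its THRESH yet.
print(all(value > thresh for _, value, thresh, _ in attrs.values()))
```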

Copy your data off it soon!

Regards,

Stephen

Dieseldes · March 24, 2014

Stephen, thanks for the advice. I have now copied everything off the drive to the array and deleted the files it wouldn't copy. Is it worth running preclear on this drive? I did run preclear three times on this and all my drives. These MicroServers are a great deal at £100 after cashback, complete with this Seagate 250GB drive; I just wish HP shipped a better quality drive than a Seagate. This is my first drive issue ever after over 20 years of computing. There is an identical drive in my backup server which isn't being used, so I will swap them tonight once I get everyone off to bed and can play with the servers.

Cheers des.

jumperalex · March 24, 2014

It wouldn't hurt to run a few preclears and see if it gets worse. Though, as vca alludes to, at 250GB it probably isn't even worth the effort.

Dieseldes · March 25, 2014

Thanks for the input, guys. Last night I managed to swap the other identical drive into this server and the dodgy drive into my backup server. I copied across my plugins and started the server, and it carried on as normal, which is good. In my backup server I set preclear to run five times on the dodgy drive, so this will kill it or cure it, as they say. I suppose even if it doesn't die it isn't to be trusted? Any ideas what would cause this type of failure? Cheers, des

vca · March 25, 2014

The quantity of errors is not particularly alarming right now, but if you see more appearing during a few passes of preclearing, then it's time to either RMA it (if the cost makes sense) or toss it in the bin. If it survives several preclear passes, then it's probably still safe to use, perhaps as an extra backup copy or a drive to experiment with.
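"See if more appear" comes down to comparing SMART snapshots between passes. A minimal Python sketch, assuming the raw values have been copied from `smartctl -A` (or unMENU's SMART page) into dictionaries before and after a preclear pass; the `grown` helper is hypothetical, and the "after" numbers below are invented purely to show the output format. Not every rise means the same thing: when preclear rewrites a pending sector, the drive either clears it or remaps it, so a drop in 197 alongside some growth in 5 and 196 can be expected on the first pass, while counts that keep climbing on later passes are the warning sign described above.

```python
# Raw counters to compare between preclear passes (the five listed earlier).
WATCH = (5, 187, 196, 197, 198)


def grown(before, after, watch=WATCH):
    """Return {attr_id: (old, new)} for watched raw counters that went up."""
    changes = {}
    for attr_id in watch:
        old, new = before.get(attr_id, 0), after.get(attr_id, 0)
        if new > old:
            changes[attr_id] = (old, new)
    return changes


# Raw values from the report in this thread.
before = {5: 2, 187: 217, 196: 2, 197: 112, 198: 99}
# HYPOTHETICAL values, invented only to illustrate the comparison.
after = {5: 14, 187: 217, 196: 14, 197: 100, 198: 99}

for attr_id, (old, new) in grown(before, after).items():
    print(f"attribute {attr_id}: {old} -> {new}")
```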

Regards,

Stephen

althoralthor · March 25, 2014

This is my first drive issue ever after over 20 years of computing.

Either you have incredible luck, have never dealt with Quantum Fireball hard drives (or the Maxtors from the '90s, for that matter), or you had hard drive issues that just never got bad enough for you to lose data.

RobJ · March 26, 2014

This is my first drive issue ever after over 20 years of computing.

Either you have incredible luck, have never dealt with Quantum Fireball hard drives (or the Maxtors from the '90s, for that matter), or you had hard drive issues that just never got bad enough for you to lose data.

I totally agree. It's only fair now that you take your share of drive issues, and let us have a few of those error-free years!

Dieseldes · March 29, 2014

Thanks, guys. Maybe I am due some bad luck, but I hope not! Thanks for the help.
