drawde Posted September 29, 2018 (edited)
Hey all, I have a disk reporting some read errors after a power outage. I'm not sure exactly what happened, since I do have a UPS, but it didn't seem to help. I'll investigate that separately. See the attached diagnostics. Is this something I can just keep an eye on to see whether the count increments, or is it already a problem?

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.14.49-unRAID] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD40EFRX-68WT0N0
Serial Number:    WD-WCC4E3USPU9A
LU WWN Device Id: 5 0014ee 20dcc0aef
Firmware Version: 82.00A82
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Sep 29 16:03:05 2018 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Self-test execution status: ( 17) The self-test routine was aborted by the host.
Total time to complete Offline data collection: (52980) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 530) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    68
  3 Spin_Up_Time            POS--K   182   178   021    -    7858
  4 Start_Stop_Count        -O--CK   097   097   000    -    3209
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   100   253   000    -    0
  9 Power_On_Hours          -O--CK   079   079   000    -    15596
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    35
192 Power-Off_Retract_Count -O--CK   200   200   000    -    9
193 Load_Cycle_Count        -O--CK   193   193   000    -    22750
194 Temperature_Celsius     -O---K   116   113   000    -    36
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   100   253   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address  Access  R/W  Size  Description
0x00     GPL,SL  R/O     1  Log Directory
0x01     SL      R/O     1  Summary SMART error log
0x02     SL      R/O     5  Comprehensive SMART error log
0x03     GPL     R/O     6  Ext.
Comprehensive SMART error log 0x06 SL R/O 1 SMART self-test log 0x07 GPL R/O 1 Extended self-test log 0x09 SL R/W 1 Selective self-test log 0x10 GPL R/O 1 NCQ Command Error log 0x11 GPL R/O 1 SATA Phy Event Counters log 0x21 GPL R/O 1 Write stream error log 0x22 GPL R/O 1 Read stream error log 0x80-0x9f GPL,SL R/W 16 Host vendor specific log 0xa0-0xa7 GPL,SL VS 16 Device vendor specific log 0xa8-0xb6 GPL,SL VS 1 Device vendor specific log 0xb7 GPL,SL VS 39 Device vendor specific log 0xbd GPL,SL VS 1 Device vendor specific log 0xc0 GPL,SL VS 1 Device vendor specific log 0xc1 GPL VS 93 Device vendor specific log 0xe0 GPL,SL R/W 1 SCT Command/Status 0xe1 GPL,SL R/W 1 SCT Data Transfer SMART Extended Comprehensive Error Log Version: 1 (6 sectors) Device Error Count: 6 CR = Command Register FEATR = Features Register COUNT = Count (was: Sector Count) Register LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8 LH = LBA High (was: Cylinder High) Register ] LBA LM = LBA Mid (was: Cylinder Low) Register ] Register LL = LBA Low (was: Sector Number) Register ] DV = Device (was: Device/Head) Register DC = Device Control Register ER = Error register ST = Status register Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. Error 6 [5] occurred at disk power-on lifetime: 15594 hours (649 days + 18 hours) When the command that caused the error occurred, the device was active or idle. 
After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 40 -- 51 05 40 00 00 01 63 58 98 e0 00 Error: UNC 1344 sectors at LBA = 0x01635898 = 23287960 Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- 25 00 00 05 40 00 00 01 63 58 98 e0 08 00:31:48.994 READ DMA EXT 25 00 00 01 90 00 00 01 63 57 08 e0 08 00:31:48.602 READ DMA EXT 25 00 00 05 40 00 00 01 63 51 c8 e0 08 00:31:48.599 READ DMA EXT 35 00 00 01 08 00 00 01 63 4a 98 e0 08 00:31:48.598 WRITE DMA EXT 25 00 00 04 28 00 00 01 63 4d a0 e0 08 00:31:48.595 READ DMA EXT Error 5 [4] occurred at disk power-on lifetime: 15594 hours (649 days + 18 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 40 -- 51 02 00 00 00 01 63 4a 98 e0 00 Error: UNC 512 sectors at LBA = 0x01634a98 = 23284376 Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- 25 00 00 02 00 00 00 01 63 49 a0 e0 08 00:31:44.254 READ DMA EXT c8 00 00 00 c0 00 00 01 63 48 e0 e1 08 00:31:44.230 READ DMA 25 00 00 05 40 00 00 01 63 43 a0 e0 08 00:31:44.229 READ DMA EXT 25 00 00 02 00 00 00 01 63 41 a0 e0 08 00:31:44.227 READ DMA EXT 25 00 00 02 00 00 00 01 63 3f a0 e0 08 00:31:44.226 READ DMA EXT Error 4 [3] occurred at disk power-on lifetime: 15594 hours (649 days + 18 hours) When the command that caused the error occurred, the device was active or idle. 
After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 40 -- 51 05 40 00 00 01 63 1e f0 e0 00 Error: UNC 1344 sectors at LBA = 0x01631ef0 = 23273200 Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- 25 00 00 05 40 00 00 01 63 1a 78 e0 08 00:31:40.196 READ DMA EXT 25 00 00 02 00 00 00 01 63 18 78 e0 08 00:31:40.171 READ DMA EXT 25 00 00 00 08 00 01 ba 14 17 00 e0 08 00:31:40.150 READ DMA EXT 25 00 00 01 a8 00 00 01 63 16 d0 e0 08 00:31:40.149 READ DMA EXT 25 00 00 05 40 00 00 01 63 11 90 e0 08 00:31:40.129 READ DMA EXT Error 3 [2] occurred at disk power-on lifetime: 15593 hours (649 days + 17 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 40 -- 51 05 40 00 00 01 63 58 50 e0 00 Error: UNC 1344 sectors at LBA = 0x01635850 = 23287888 Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- 25 00 00 05 40 00 00 01 63 56 08 e0 08 00:13:09.388 READ DMA EXT 35 00 00 01 b8 00 00 01 63 4a 50 e0 08 00:13:09.387 WRITE DMA EXT 25 00 00 00 08 00 00 bf fc 07 50 e0 08 00:13:09.372 READ DMA EXT 25 00 00 02 c0 00 00 01 63 53 48 e0 08 00:13:09.370 READ DMA EXT 25 00 00 05 40 00 00 01 63 4e 08 e0 08 00:13:09.367 READ DMA EXT Error 2 [1] occurred at disk power-on lifetime: 15593 hours (649 days + 17 hours) When the command that caused the error occurred, the device was active or idle. 
After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 40 -- 51 02 00 00 00 01 63 4a 50 e0 00 Error: UNC 512 sectors at LBA = 0x01634a50 = 23284304 Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- 25 00 00 02 00 00 00 01 63 4a 08 e0 08 00:13:05.412 READ DMA EXT 25 00 00 02 00 00 00 01 63 48 08 e0 08 00:13:05.411 READ DMA EXT 25 00 00 02 00 00 00 01 63 46 08 e0 08 00:13:05.410 READ DMA EXT 25 00 00 02 00 00 00 01 63 44 08 e0 08 00:13:05.409 READ DMA EXT 25 00 00 01 58 00 00 01 63 42 b0 e0 08 00:13:05.384 READ DMA EXT Error 1 [0] occurred at disk power-on lifetime: 15593 hours (649 days + 17 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 40 -- 51 02 00 00 00 01 63 1e a8 e0 00 Error: UNC 512 sectors at LBA = 0x01631ea8 = 23273128 Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- 25 00 00 02 00 00 00 01 63 1e 08 e0 08 00:13:01.826 READ DMA EXT 25 00 00 02 00 00 00 01 63 1c 08 e0 08 00:13:01.825 READ DMA EXT 25 00 00 02 c0 00 00 01 63 19 48 e0 08 00:13:01.821 READ DMA EXT 25 00 00 05 40 00 00 01 63 14 08 e0 08 00:13:01.817 READ DMA EXT 25 00 00 02 00 00 00 01 63 12 08 e0 08 00:13:01.816 READ DMA EXT SMART Extended Self-test Log Version: 1 (1 sectors) Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Aborted by host 10% 15596 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 
Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. SCT Status Version: 3 SCT Version (vendor specific): 258 (0x0102) SCT Support Level: 1 Device State: Active (0) Current Temperature: 36 Celsius Power Cycle Min/Max Temperature: 29/36 Celsius Lifetime Min/Max Temperature: 19/39 Celsius Under/Over Temperature Limit Count: 0/0 Vendor specific: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 SCT Temperature History Version: 2 Temperature Sampling Period: 1 minute Temperature Logging Interval: 1 minute Min/Max recommended Temperature: 0/60 Celsius Min/Max Temperature Limit: -41/85 Celsius Temperature History Size (Index): 478 (286) Index Estimated Time Temperature Celsius 287 2018-09-29 08:06 27 ******** ... ..( 26 skipped). .. ******** 314 2018-09-29 08:33 27 ******** 315 2018-09-29 08:34 ? - 316 2018-09-29 08:35 27 ******** 317 2018-09-29 08:36 27 ******** 318 2018-09-29 08:37 28 ********* 319 2018-09-29 08:38 ? - 320 2018-09-29 08:39 29 ********** ... ..( 4 skipped). .. ********** 325 2018-09-29 08:44 29 ********** 326 2018-09-29 08:45 30 *********** ... ..( 2 skipped). .. *********** 329 2018-09-29 08:48 30 *********** 330 2018-09-29 08:49 31 ************ 331 2018-09-29 08:50 31 ************ 332 2018-09-29 08:51 31 ************ 333 2018-09-29 08:52 32 ************* ... ..( 11 skipped). .. ************* 345 2018-09-29 09:04 32 ************* 346 2018-09-29 09:05 33 ************** ... ..( 6 skipped). .. ************** 353 2018-09-29 09:12 33 ************** 354 2018-09-29 09:13 34 *************** ... ..( 12 skipped). .. *************** 367 2018-09-29 09:26 34 *************** 368 2018-09-29 09:27 35 **************** ... ..( 30 skipped). .. **************** 399 2018-09-29 09:58 35 **************** 400 2018-09-29 09:59 36 ***************** ... 
..(107 skipped). .. ***************** 30 2018-09-29 11:47 36 ***************** 31 2018-09-29 11:48 29 ********** 32 2018-09-29 11:49 29 ********** 33 2018-09-29 11:50 29 ********** 34 2018-09-29 11:51 ? - 35 2018-09-29 11:52 29 ********** ... ..( 13 skipped). .. ********** 49 2018-09-29 12:06 29 ********** 50 2018-09-29 12:07 30 *********** ... ..( 12 skipped). .. *********** 63 2018-09-29 12:20 30 *********** 64 2018-09-29 12:21 31 ************ ... ..( 26 skipped). .. ************ 91 2018-09-29 12:48 31 ************ 92 2018-09-29 12:49 32 ************* ... ..( 5 skipped). .. ************* 98 2018-09-29 12:55 32 ************* 99 2018-09-29 12:56 31 ************ ... ..( 2 skipped). .. ************ 102 2018-09-29 12:59 31 ************ 103 2018-09-29 13:00 30 *********** ... ..( 6 skipped). .. *********** 110 2018-09-29 13:07 30 *********** 111 2018-09-29 13:08 29 ********** ... ..( 15 skipped). .. ********** 127 2018-09-29 13:24 29 ********** 128 2018-09-29 13:25 28 ********* ... ..( 22 skipped). .. ********* 151 2018-09-29 13:48 28 ********* 152 2018-09-29 13:49 27 ******** ... ..(133 skipped). .. 
********
286 2018-09-29 16:03 27 ********

SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported
Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size  Value  Description
0x0001  2         0  Command failed due to ICRC error
0x0002  2         0  R_ERR response for data FIS
0x0003  2         0  R_ERR response for device-to-host data FIS
0x0004  2         0  R_ERR response for host-to-device data FIS
0x0005  2         0  R_ERR response for non-data FIS
0x0006  2         0  R_ERR response for device-to-host non-data FIS
0x0007  2         0  R_ERR response for host-to-device non-data FIS
0x0008  2         0  Device-to-host non-data FIS retries
0x0009  2         9  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2         9  Device-to-host register FISes sent due to a COMRESET
0x000b  2         0  CRC errors within host-to-device FIS
0x000f  2         0  R_ERR response for host-to-device data FIS, CRC
0x0012  2         0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4     11497  Vendor specific

tower-diagnostics-20180929-1333 (1).zip
Edited September 29, 2018 by drawde
JorgeB Posted September 29, 2018
It's a disk problem. You can run an extended SMART test: if it fails, replace the disk now; if it passes, it might be OK to use for a while longer, but it's likely on borrowed time. Keep an eye on this attribute:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 68
Ideally it should be zero; anything above low double digits is usually a very bad sign.
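For anyone who wants to check this attribute from a script rather than by eye, here is a minimal Python sketch that parses one row of the attribute table and flags the raw value. The cutoff `RAW_READ_ERROR_WARN = 20` is a hypothetical stand-in for "low double digits", not a vendor figure, and the helper names are made up for illustration. It also assumes the raw value is a plain error count, as it appears to be on the WD Red in this thread; other vendors may encode this field differently.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    flags: str
    value: int
    worst: int
    thresh: int
    fail: str
    raw: int


def parse_attribute_line(line: str) -> SmartAttribute:
    """Parse one whitespace-separated row of the smartctl attribute table."""
    parts = line.split()
    if len(parts) < 8:
        raise ValueError(f"not an attribute row: {line!r}")
    attr_id, name, flags, value, worst, thresh, fail, raw = parts[:8]
    return SmartAttribute(
        int(attr_id), name, flags, int(value), int(worst), int(thresh), fail, int(raw)
    )


# Hypothetical cutoff standing in for "low double digits" (an assumption).
RAW_READ_ERROR_WARN = 20


def needs_attention(attr: SmartAttribute) -> bool:
    """True when the raw read error count exceeds the assumed cutoff."""
    return attr.name == "Raw_Read_Error_Rate" and attr.raw > RAW_READ_ERROR_WARN


row = "1 Raw_Read_Error_Rate POSR-K 200 200 051 - 68"
attr = parse_attribute_line(row)
print(attr.name, attr.raw, needs_attention(attr))
```

Note that the normalized VALUE (200) is still far above THRESH (51), which is why the overall health check reports PASSED even though the raw count is already worrying.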
drawde Posted September 29, 2018
6 minutes ago, johnnie.black said:
It's a disk problem, you can run an extended SMART test, if it fails replace now, if it passes it might be OK to use for a while more, but it's likely on borrowed time, and keep an eye on this attribute:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 68
Ideally it should be zero, anything above low double digits is usually a very bad sign.

I tried to run an extended SMART test and it got stuck at 90% for what felt like forever. A parity check is running at the moment; I'm not sure whether that has anything to do with it. The drive is still under warranty. Is it possible to RMA it?
JorgeB Posted September 29, 2018
2 minutes ago, drawde said:
I tried to run an extended SMART test and it got stuck at 90% for what felt like forever. A parity check is running at the moment; I'm not sure whether that has anything to do with it.

You shouldn't do both at the same time. In practice, a parity check is as good a test as the extended SMART test; just make sure it's running as a non-correcting check.

3 minutes ago, drawde said:
The drive is still under warranty. Is it possible to RMA it?

Yes, just report the problem as read errors (the "UNC ... at LBA" entries in the SMART report).
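To pull the "UNC at LBA" entries together for an RMA report, something like the following Python sketch could work. It takes the six error lines from the diagnostics in the first post, cross-checks each hex LBA against its decimal form, and reports how tightly the failing LBAs cluster. The function name and the summary are illustrative assumptions; the 512-byte logical sector size comes from the "Sector Sizes" line of the report.

```python
import re

# Matches lines such as: Error: UNC 1344 sectors at LBA = 0x01635898 = 23287960
UNC_RE = re.compile(r"Error: UNC (\d+) sectors at LBA = (0x[0-9a-fA-F]+) = (\d+)")

LOGICAL_SECTOR_BYTES = 512  # from "Sector Sizes: 512 bytes logical" in the report


def unc_errors(log_text: str) -> list[tuple[int, int]]:
    """Return (sector_count, lba) for each UNC entry found in the log text."""
    found = []
    for count, lba_hex, lba_dec in UNC_RE.findall(log_text):
        lba = int(lba_hex, 16)
        if lba != int(lba_dec):
            raise ValueError(f"hex/decimal LBA mismatch: {lba_hex} vs {lba_dec}")
        found.append((int(count), lba))
    return found


# The six UNC lines from the error log in the first post.
log = """
Error: UNC 1344 sectors at LBA = 0x01635898 = 23287960
Error: UNC 512 sectors at LBA = 0x01634a98 = 23284376
Error: UNC 1344 sectors at LBA = 0x01631ef0 = 23273200
Error: UNC 1344 sectors at LBA = 0x01635850 = 23287888
Error: UNC 512 sectors at LBA = 0x01634a50 = 23284304
Error: UNC 512 sectors at LBA = 0x01631ea8 = 23273128
"""

errors = unc_errors(log)
lbas = [lba for _, lba in errors]
span_sectors = max(lbas) - min(lbas)
span_mib = span_sectors * LOGICAL_SECTOR_BYTES / 2**20

print(f"{len(errors)} UNC errors between LBA {min(lbas)} and {max(lbas)}")
print(f"span: {span_sectors} sectors (about {span_mib:.1f} MiB)")
```

All six errors fall within a span of a few megabytes, which points to one small damaged region rather than errors scattered across the platter.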
drawde Posted September 29, 2018
Okay, so it sounds like I should run an extended SMART test after the parity check. If it passes, is the disk OK as long as the read errors don't increment? Or should I RMA regardless?
JorgeB Posted September 29, 2018
3 minutes ago, drawde said:
Okay, so it sounds like I should run an extended SMART test after the parity check.

One or the other is enough for now. If there are read errors during the parity check, the disk needs to be replaced now.

3 minutes ago, drawde said:
Or should I RMA regardless?

In my experience, once a disk starts logging raw read errors in the high double digits, it's just a matter of time until it fails again. On the other hand, refurbished RMA disks are a crapshoot, so it's your call; if you can trade it for a new one, go for it.