Cache drive failing, or is it?


For the last few days I have been getting errors from my unRAID server like the following:

Event: unRAID Cache disk SMART health [1]

Subject: Warning [HOYLAKE] - raw read error rate (failing now) is 34213

Description: ST240HM000_Z4N0013X (sdc)

Importance: warning

However, I ran an extended SMART self-test, and the result was "SMART overall-health self-assessment test result: PASSED".

How can the drive have passed when its read error rate is reported as failing now?

 

Should I replace my cache drive? It is a Seagate 240GB SSD that has been in continuous use since I built this server a little over 2.5 years ago.

 

Here is the full SMART report:

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.14.16-unRAID] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST240HM000
Serial Number:    Z4N0013X
LU WWN Device Id: 5 000c50 02ff04f93
Firmware Version: C675
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Aug 31 16:25:44 2019 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x02)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  558) seconds.
Offline data collection
capabilities: 			 (0x19) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0002)	Does not save SMART data before
					entering power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 (  38) minutes.

SMART Attributes Data Structure revision number: 0
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     -OSR--   001   001   006    NOW  35077
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
  9 Power_On_Hours          -O--CK   075   075   000    -    22890
 12 Power_Cycle_Count       -O--CK   100   100   020    -    97
171 Unknown_Attribute       -O--CK   100   100   000    -    0
172 Unknown_Attribute       -O--CK   100   100   000    -    0
181 Program_Fail_Cnt_Total  -O--CK   100   100   000    -    0
182 Erase_Fail_Count_Total  -O--CK   100   100   000    -    0
194 Temperature_Celsius     -O---K   036   000   000    -    36 (Min/Max 20/58)
201 Unknown_SSD_Attribute   -O--CK   100   100   000    -    0
204 Soft_ECC_Correction     -O--CK   091   091   000    -    25571
231 Temperature_Celsius     PO--C-   062   062   010    -    39
234 Unknown_Attribute       -O--CK   100   100   000    -    525154
241 Total_LBAs_Written      -O--CK   100   100   000    -    28220
242 Total_LBAs_Read         -O--CK   100   100   000    -    3965
250 Read_Error_Retry_Rate   -O--CK   100   100   000    -    35079
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x03       GPL,SL  R/O      5  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x07       GPL,SL  R/O      2  Extended self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa8       GPL     VS    1041  Device vendor specific log
0xa8       SL      VS     255  Device vendor specific log
0xb7       GPL,SL  VS       4  Device vendor specific log

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
Device Error Count: 114 (device log contains only the most recent 20 errors)
	CR     = Command Register
	FEATR  = Features Register
	COUNT  = Count (was: Sector Count) Register
	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
	LH     = LBA High (was: Cylinder High) Register    ]   LBA
	LM     = LBA Mid (was: Cylinder Low) Register      ] Register
	LL     = LBA Low (was: Sector Number) Register     ]
	DV     = Device (was: Device/Head) Register
	DC     = Device Control Register
	ER     = Error register
	ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 114 [13] occurred at disk power-on lifetime: 22406 hours (933 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 51 00 00 00 00 00 00 00 00 e0 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  b0 00 d2 00 f1 00 00 00 c2 4f 00 00 00 39d+21:51:06.154  SMART ENABLE/DISABLE ATTRIBUTE AUTOSAVE
  ec 00 00 00 00 00 00 00 00 00 00 00 00 39d+21:51:06.154  IDENTIFY DEVICE
  f5 00 00 00 00 00 00 00 00 00 00 00 00 39d+21:51:06.154  SECURITY FREEZE LOCK
  ec 00 00 00 00 00 00 00 00 00 00 00 00 39d+21:51:06.154  IDENTIFY DEVICE
  ef 00 03 00 45 00 00 00 00 00 00 00 00 39d+21:51:06.154  SET FEATURES [Set transfer mode]

Error 113 [12] occurred at disk power-on lifetime: 22020 hours (917 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 51 00 00 00 00 00 00 00 00 e0 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  b0 00 d2 00 f1 00 00 00 c2 4f 00 00 00 23d+20:16:40.944  SMART ENABLE/DISABLE ATTRIBUTE AUTOSAVE
  ec 00 00 00 00 00 00 00 00 00 00 00 00 23d+20:16:40.944  IDENTIFY DEVICE
  f5 00 00 00 00 00 00 00 00 00 00 00 00 23d+20:16:40.944  SECURITY FREEZE LOCK
  ec 00 00 00 00 00 00 00 00 00 00 00 00 23d+20:16:40.944  IDENTIFY DEVICE
  ef 00 03 00 45 00 00 00 00 00 00 00 00 23d+20:16:40.944  SET FEATURES [Set transfer mode]

Error 112 [11] occurred at disk power-on lifetime: 21945 hours (914 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 51 00 00 00 00 00 00 00 00 e0 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  b0 00 d2 00 f1 00 00 00 c2 4f 00 00 00 20d+17:26:27.854  SMART ENABLE/DISABLE ATTRIBUTE AUTOSAVE
  ec 00 00 00 00 00 00 00 00 00 00 00 00 20d+17:26:27.854  IDENTIFY DEVICE
  f5 00 00 00 00 00 00 00 00 00 00 00 00 20d+17:26:27.854  SECURITY FREEZE LOCK
  ec 00 00 00 00 00 00 00 00 00 00 00 00 20d+17:26:27.854  IDENTIFY DEVICE
  ef 00 03 00 45 00 00 00 00 00 00 00 00 20d+17:26:27.854  SET FEATURES [Set transfer mode]

Error 111 [10] occurred at disk power-on lifetime: 21868 hours (911 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 51 00 00 00 00 00 00 00 00 e0 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  b0 00 d2 00 f1 00 00 00 c2 4f 00 00 00 17d+12:15:43.844  SMART ENABLE/DISABLE ATTRIBUTE AUTOSAVE
  ec 00 00 00 00 00 00 00 00 00 00 00 00 17d+12:15:43.844  IDENTIFY DEVICE
  f5 00 00 00 00 00 00 00 00 00 00 00 00 17d+12:15:43.844  SECURITY FREEZE LOCK
  ec 00 00 00 00 00 00 00 00 00 00 00 00 17d+12:15:43.844  IDENTIFY DEVICE
  ef 00 03 00 45 00 00 00 00 00 00 00 00 17d+12:15:43.844  SET FEATURES [Set transfer mode]

Error 110 [9] occurred at disk power-on lifetime: 20883 hours (870 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 51 00 00 00 00 00 00 00 00 e0 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  b0 00 d2 00 f1 00 00 00 c2 4f 00 00 00 26d+03:54:38.180  SMART ENABLE/DISABLE ATTRIBUTE AUTOSAVE
  ec 00 00 00 00 00 00 00 00 00 00 00 00 26d+03:54:38.180  IDENTIFY DEVICE
  f5 00 00 00 00 00 00 00 00 00 00 00 00 26d+03:54:38.180  SECURITY FREEZE LOCK
  ec 00 00 00 00 00 00 00 00 00 00 00 00 26d+03:54:38.180  IDENTIFY DEVICE
  ef 00 03 00 45 00 00 00 00 00 00 00 00 26d+03:54:38.180  SET FEATURES [Set transfer mode]

Error 109 [8] occurred at disk power-on lifetime: 20255 hours (843 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 51 00 00 00 00 00 00 00 00 e0 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  b0 00 d2 00 f1 00 00 00 c2 4f 00 00 00     00:00:00.020  SMART ENABLE/DISABLE ATTRIBUTE AUTOSAVE
  ec 00 00 00 00 00 00 00 00 00 00 00 00     00:00:00.020  IDENTIFY DEVICE
  f5 00 00 00 00 00 00 00 00 00 00 00 00     00:00:00.010  SECURITY FREEZE LOCK
  ec 00 00 00 00 00 00 00 00 00 00 00 00     00:00:00.010  IDENTIFY DEVICE
  ef 00 03 00 45 00 00 00 00 00 00 00 00     00:00:00.010  SET FEATURES [Set transfer mode]

Error 108 [7] occurred at disk power-on lifetime: 19759 hours (823 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 51 00 00 00 00 00 00 00 00 e0 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  b0 00 d2 00 f1 00 00 00 c2 4f 00 00 00     00:00:00.010  SMART ENABLE/DISABLE ATTRIBUTE AUTOSAVE
  ec 00 00 00 00 00 00 00 00 00 00 00 00     00:00:00.010  IDENTIFY DEVICE
  f5 00 00 00 00 00 00 00 00 00 00 00 00     00:00:00.010  SECURITY FREEZE LOCK
  ec 00 00 00 00 00 00 00 00 00 00 00 00     00:00:00.010  IDENTIFY DEVICE
  ef 00 03 00 45 00 00 00 00 00 00 00 00     00:00:00.010  SET FEATURES [Set transfer mode]

Error 107 [6] occurred at disk power-on lifetime: 18993 hours (791 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 51 00 00 00 00 00 00 00 00 e0 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  b0 00 d2 00 f1 00 00 00 c2 4f 00 00 00     00:00:00.020  SMART ENABLE/DISABLE ATTRIBUTE AUTOSAVE
  ec 00 00 00 00 00 00 00 00 00 00 00 00     00:00:00.010  IDENTIFY DEVICE
  f5 00 00 00 00 00 00 00 00 00 00 00 00     00:00:00.010  SECURITY FREEZE LOCK
  ec 00 00 00 00 00 00 00 00 00 00 00 00     00:00:00.010  IDENTIFY DEVICE
  ef 00 03 00 45 00 00 00 00 00 00 00 00     00:00:00.010  SET FEATURES [Set transfer mode]

SMART Extended Self-test Log Version: 1 (2 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     22890         -
# 2  Short offline       Completed without error       00%     22889         -

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 2) ==
0x01  0x008  4              97  ---  Lifetime Power-On Resets
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            0  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           18  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC
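
To make the question concrete, here is a minimal sketch of how I understand the FAIL column in the attribute table to be derived: an attribute is flagged when its normalized VALUE is at or below THRESH, and a THRESH of 000 means the attribute can never trip. The overall-health line is reported separately by the drive, so the two can disagree. The function names and the three sample rows are my own choices for illustration, and the reading of the P flag comes only from the flag legend in the report.

```python
import re

# Three rows copied from the attribute table in the report above.
SAMPLE = """\
  1 Raw_Read_Error_Rate     -OSR--   001   001   006    NOW  35077
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
204 Soft_ECC_Correction     -O--CK   091   091   000    -    25571
"""

# ID, name, flags, VALUE, WORST, THRESH, FAIL, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.*)$"
)


def attribute_state(value, worst, thresh):
    """Classify one attribute from its normalized values (assumed rule)."""
    if thresh == 0:
        return "no threshold"        # a zero threshold can never be crossed
    if value <= thresh:
        return "failing now"
    if worst <= thresh:
        return "failed in the past"
    return "ok"


def parse_attributes(text):
    """Parse attribute rows and attach the derived state to each one."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # skip headers, legend lines, and blanks
        attr_id, name, flags, value, worst, thresh, _fail, raw = match.groups()
        rows.append({
            "id": int(attr_id),
            "name": name,
            # 'P' in the first flag position is "prefailure warning"
            "prefail": flags.startswith("P"),
            "state": attribute_state(int(value), int(worst), int(thresh)),
            "raw": raw.strip(),
        })
    return rows


if __name__ == "__main__":
    for row in parse_attributes(SAMPLE):
        kind = "prefail" if row["prefail"] else "advisory"
        print(f'{row["id"]:>3} {row["name"]:<24} {kind:<9} {row["state"]}')
```

By this rule, attribute 1 is failing now because its VALUE of 001 is below its THRESH of 006. It does not carry the P flag, which may be why smartctl describes it as "marginal" rather than failing the overall assessment, but that is my guess and not something the report states.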
