lokiraid
Posts posted by lokiraid

  1. Hi everyone,

     

    Last night a hard drive failed on me that is supposedly only two weeks old, at least according to the seller and the S.M.A.R.T. attributes (Power_On_Hours).

     

    This obviously made me suspicious. I bought 4 of these drives, so I checked the other ones as well. It seems I was not cautious enough at the beginning: even though the S.M.A.R.T. attributes look fine, I only just realized that the self-test log shows tests recorded at completely different lifetimes (around 34,000 hours, while Power_On_Hours reads 222). Here's one example:

     

    smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.1.64-Unraid] (local build)
    Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Device Model:     HUH721010ALE601
    Serial Number:    1SGE8ULZ
    LU WWN Device Id: 5 000cca 26bc609f4
    Firmware Version: LHGL0003
    User Capacity:    10,000,831,348,736 bytes [10.0 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    7200 rpm
    Form Factor:      3.5 inches
    Device is:        Not in smartctl database 7.3/5528
    ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
    SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is:    Thu Feb  1 17:44:47 2024 CET
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    AAM feature is:   Unavailable
    APM feature is:   Disabled
    Rd look-ahead is: Enabled
    Write cache is:   Enabled
    DSN feature is:   Unavailable
    ATA Security is:  Disabled, frozen [SEC2], Master PW ID: 0xfffd
    Wt Cache Reorder: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x00)	Offline data collection activity
    					was never started.
    					Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0)	The previous self-test routine completed
    					without error or no self-test has ever 
    					been run.
    Total time to complete Offline 
    data collection: 		(   93) seconds.
    Offline data collection
    capabilities: 			 (0x5b) SMART execute Offline immediate.
    					Auto Offline data collection on/off support.
    					Suspend Offline collection upon new
    					command.
    					Offline surface scan supported.
    					Self-test supported.
    					No Conveyance Self-test supported.
    					Selective Self-test supported.
    SMART capabilities:            (0x0003)	Saves SMART data before entering
    					power-saving mode.
    					Supports SMART auto save timer.
    Error logging capability:        (0x01)	Error logging supported.
    					General Purpose Logging supported.
    Short self-test routine 
    recommended polling time: 	 (   1) minutes.
    Extended self-test routine
    recommended polling time: 	 (   1) minutes.
    SCT capabilities: 	       (0x003d)	SCT Status supported.
    					SCT Error Recovery Control supported.
    					SCT Feature Control supported.
    					SCT Data Table supported.
    
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
      1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
      2 Throughput_Performance  P-S---   100   100   054    -    0
      3 Spin_Up_Time            POS---   155   155   024    -    449 (Average 397)
      4 Start_Stop_Count        -O--C-   100   100   000    -    183
      5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
      7 Seek_Error_Rate         PO-R--   100   100   067    -    0
      8 Seek_Time_Performance   P-S---   100   100   020    -    0
      9 Power_On_Hours          -O--C-   100   100   000    -    222
     10 Spin_Retry_Count        PO--C-   100   100   060    -    0
     12 Power_Cycle_Count       -O--CK   100   100   000    -    21
     22 Unknown_Attribute       PO---K   100   100   025    -    100
     45 Unknown_Attribute       PO---K   100   100   001    -    1095233372415
    192 Power-Off_Retract_Count -O--CK   100   100   000    -    185
    193 Load_Cycle_Count        -O--C-   100   100   000    -    185
    194 Temperature_Celsius     -O----   253   253   000    -    21 (Min/Max 0/37)
    196 Reallocated_Event_Count -O--CK   100   100   000    -    0
    197 Current_Pending_Sector  -O---K   100   100   000    -    0
    198 Offline_Uncorrectable   ---R--   100   100   000    -    0
    199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
    231 Temperature_Celsius     -O--CK   100   100   000    -    0
    241 Total_LBAs_Written      -O--C-   100   100   000    -    51157963136
    242 Total_LBAs_Read         -O--C-   100   100   000    -    67977101331
                                ||||||_ K auto-keep
                                |||||__ C event count
                                ||||___ R error rate
                                |||____ S speed/performance
                                ||_____ O updated online
                                |______ P prefailure warning
    
    General Purpose Log Directory Version 1
    SMART           Log Directory Version 1 [multi-sector log support]
    Address    Access  R/W   Size  Description
    0x00       GPL,SL  R/O      1  Log Directory
    0x01           SL  R/O      1  Summary SMART error log
    0x02           SL  R/O      1  Comprehensive SMART error log
    0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
    0x04       GPL     R/O    256  Device Statistics log
    0x04       SL      R/O    255  Device Statistics log
    0x06           SL  R/O      1  SMART self-test log
    0x07       GPL     R/O      1  Extended self-test log
    0x08       GPL     R/O      2  Power Conditions log
    0x09           SL  R/W      1  Selective self-test log
    0x0c       GPL     R/O   5501  Pending Defects log
    0x10       GPL     R/O      1  NCQ Command Error log
    0x11       GPL     R/O      1  SATA Phy Event Counters log
    0x12       GPL     R/O      1  SATA NCQ Non-Data log
    0x13       GPL     R/O      1  SATA NCQ Send and Receive log
    0x15       GPL     R/W      1  Rebuild Assist log
    0x21       GPL     R/O      1  Write stream error log
    0x22       GPL     R/O      1  Read stream error log
    0x24       GPL     R/O    256  Current Device Internal Status Data log
    0x25       GPL     R/O    256  Saved Device Internal Status Data log
    0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
    0x80       GPL     R/W    688  Host vendor specific log
    0x81-0x9f  GPL,SL  R/W     16  Host vendor specific log
    0xb2       GPL     VS     688  Device vendor specific log
    0xc8       GPL     VS      12  Device vendor specific log
    0xe0       GPL,SL  R/W      1  SCT Command/Status
    0xe1       GPL,SL  R/W      1  SCT Data Transfer
    
    SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
    No Errors Logged
    
    SMART Extended Self-test Log Version: 1 (1 sectors)
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline       Completed without error       00%     34098         -
    # 2  Short captive       Completed without error       00%     34097         -
    # 3  Vendor (0x70)       Completed without error       00%     33523         -
    # 4  Vendor (0x71)       Completed without error       00%     33523         -
    
    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    
    SCT Status Version:                  3
    SCT Version (vendor specific):       256 (0x0100)
    Device State:                        Active (0)
    Current Temperature:                    21 Celsius
    Power Cycle Min/Max Temperature:     19/31 Celsius
    Lifetime    Min/Max Temperature:      0/37 Celsius
    Under/Over Temperature Limit Count:   0/0
    
    SCT Temperature History Version:     2
    Temperature Sampling Period:         1 minute
    Temperature Logging Interval:        1 minute
    Min/Max recommended Temperature:      0/60 Celsius
    Min/Max Temperature Limit:           -40/70 Celsius
    Temperature History Size (Index):    128 (35)
    
    Index    Estimated Time   Temperature Celsius
      36    2024-02-01 15:37    22  ***
      37    2024-02-01 15:38    22  ***
      38    2024-02-01 15:39    21  **
     ...    ..(  9 skipped).    ..  **
    [truncated by me for readability...]
    
    
    SCT Error Recovery Control:
               Read: Disabled
              Write: Disabled
    
    Device Statistics (GP Log 0x04)
    Page  Offset Size        Value Flags Description
    0x01  =====  =               =  ===  == General Statistics (rev 1) ==
    0x01  0x008  4              21  ---  Lifetime Power-On Resets
    0x01  0x010  4             222  ---  Power-on Hours
    0x01  0x018  6     51157963136  ---  Logical Sectors Written
    0x01  0x020  6        64212485  ---  Number of Write Commands
    0x01  0x028  6     67977101331  ---  Logical Sectors Read
    0x01  0x030  6        90385616  ---  Number of Read Commands
    0x01  0x038  6       802623700  ---  Date and Time TimeStamp
    0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
    0x03  0x008  4             127  ---  Spindle Motor Power-on Hours
    0x03  0x010  4             127  ---  Head Flying Hours
    0x03  0x018  4             185  ---  Head Load Events
    0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
    0x03  0x028  4               2  ---  Read Recovery Attempts
    0x03  0x030  4               0  ---  Number of Mechanical Start Failures
    0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
    0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
    0x04  0x010  4              10  ---  Resets Between Cmd Acceptance and Completion
    0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
    0x05  0x008  1              21  ---  Current Temperature
    0x05  0x010  1              24  N--  Average Short Term Temperature
    0x05  0x018  1               -  N--  Average Long Term Temperature
    0x05  0x020  1              37  ---  Highest Temperature
    0x05  0x028  1               0  ---  Lowest Temperature
    0x05  0x030  1              35  N--  Highest Average Short Term Temperature
    0x05  0x038  1               0  N--  Lowest Average Short Term Temperature
    0x05  0x040  1               -  N--  Highest Average Long Term Temperature
    0x05  0x048  1               -  N--  Lowest Average Long Term Temperature
    0x05  0x050  4               0  ---  Time in Over-Temperature
    0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
    0x05  0x060  4               0  ---  Time in Under-Temperature
    0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
    0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
    0x06  0x008  4              45  ---  Number of Hardware Resets
    0x06  0x010  4               3  ---  Number of ASR Events
    0x06  0x018  4               0  ---  Number of Interface CRC Errors
    0xff  =====  =               =  ===  == Vendor Specific Statistics (rev 1) ==
                                    |||_ C monitored condition met
                                    ||__ D supports DSN
                                    |___ N normalized value
    
    Pending Defects log (GP Log 0x0c)
    No Defects Logged
    
    SATA Phy Event Counters (GP Log 0x11)
    ID      Size     Value  Description
    0x0001  2            0  Command failed due to ICRC error
    0x0002  2            0  R_ERR response for data FIS
    0x0003  2            0  R_ERR response for device-to-host data FIS
    0x0004  2            0  R_ERR response for host-to-device data FIS
    0x0005  2            0  R_ERR response for non-data FIS
    0x0006  2            0  R_ERR response for device-to-host non-data FIS
    0x0007  2            0  R_ERR response for host-to-device non-data FIS
    0x0008  2            0  Device-to-host non-data FIS retries
    0x0009  2            2  Transition from drive PhyRdy to drive PhyNRdy
    0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
    0x000b  2            0  CRC errors within host-to-device FIS
    0x000d  2            0  Non-CRC errors within host-to-device FIS
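
For anyone who wants to run the same check on their other drives, here is a minimal Python sketch of the comparison I did by eye. It assumes the report is in the `smartctl -x` layout shown above; the function name `check_lifetime_consistency` and the regular expressions are my own and not part of smartmontools. It flags a drive when a logged self-test lifetime is higher than the current Power_On_Hours, which should not happen if both values come from the same counter. For scale, 34098 hours is roughly 3.9 years of continuous operation.

```python
import re

# Attribute 9 line as printed by `smartctl -x` (FLAGS column instead of hex flags).
ATTR_RE = re.compile(
    r"^\s*9\s+Power_On_Hours\s+\S+\s+\d+\s+\d+\s+\d+\s+\S+\s+(\d+)"
)
# Self-test log entry: "# N  <description/status>  NN%  <hours>  <LBA or ->".
TEST_RE = re.compile(r"^\s*#\s*\d+\s+.*?\s(\d+)%\s+(\d+)\s+\S+\s*$")


def check_lifetime_consistency(report: str) -> dict:
    """Compare Power_On_Hours with the highest self-test lifetime in a report."""
    power_on_hours = None
    test_hours = []
    for line in report.splitlines():
        attr = ATTR_RE.match(line)
        if attr:
            power_on_hours = int(attr.group(1))
            continue
        test = TEST_RE.match(line)
        if test:
            test_hours.append(int(test.group(2)))

    max_test_hours = max(test_hours) if test_hours else None
    # A self-test logged "later" than the current power-on time is a red flag.
    suspicious = (
        power_on_hours is not None
        and max_test_hours is not None
        and max_test_hours > power_on_hours
    )
    return {
        "power_on_hours": power_on_hours,
        "max_test_hours": max_test_hours,
        "suspicious": suspicious,
    }


# Lines copied from the report above.
sample = """\
  9 Power_On_Hours          -O--C-   100   100   000    -    222
# 1  Short offline       Completed without error       00%     34098         -
# 2  Short captive       Completed without error       00%     34097         -
# 3  Vendor (0x70)       Completed without error       00%     33523         -
# 4  Vendor (0x71)       Completed without error       00%     33523         -
"""

result = check_lifetime_consistency(sample)
print(result)
```

To use it on a real drive, save the output of `smartctl -x` for that device to a text file and pass the file contents to the function. A clean result does not prove a drive is new, since the self-test log could also have been cleared.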

     

    It seems pretty obvious that these drives are not new, but I'd still like a second opinion: is it time to remove them from the array ASAP?

     

    Thanks!
