
(SOLVED) Help with diagnosing a drive error - contents emulated. Smart data questions



Hi unraid crew,

 

Two or three days ago my 8TB Seagate (sdc) logged 237 read errors; unRAID disabled the drive (red X) and is emulating its contents.

 

Then yesterday I attempted some investigation, but was unable to get the drive to spin up or to download SMART data for it. Once the array was stopped, sdc was no longer an option in the list of devices. I downloaded the diagnostics, but the SMART data for the drive was predictably missing. The attached SMART details are from after the reboot.

 

Powered down the box today, re-seated the drive cables (just in case, though I don't think that was the issue), and booted up again. The drive was visible in the device list once more, so I started the array and a data rebuild is in progress.

 

Hardware: Dell T110; parity = sdf WDC_WD80EMAZ 8TB; disk1 = sdc ST8000DM004 8TB; disk2 = sdd TOSHIBA_DT01ACA300 3TB; disk3 = sde WDC_WD20EARS 2TB; cache = sdb Crucial_CT120; 16GB RAM; unRAID 6.7.2.

 

Looking at some other threads on the forum, I'm a little concerned about the SMART numbers for the drive: the read, seek, timeout and ECC raw values seem really high. Should I be replacing it ASAP, or, assuming it rebuilds without error, should I be OK for a while?

 

 

 

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda Compute
Device Model:     ST8000DM004-2CX188
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   080   064   006    -    94092324
  3 Spin_Up_Time            PO----   092   091   000    -    0
  4 Start_Stop_Count        -O--CK   099   099   020    -    1539
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   083   060   045    -    221147486
  9 Power_On_Hours          -O--CK   076   076   000    -    21404 (210 195 0)
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    55
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   094   057   000    -    214751641653
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   073   042   040    -    27 (Min/Max 25/27)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    576
193 Load_Cycle_Count        -O--CK   099   099   000    -    3228
194 Temperature_Celsius     -O---K   027   058   000    -    27 (0 16 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   080   064   000    -    94092324
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    15806 (156 110 0)
241 Total_LBAs_Written      ------   100   253   000    -    61074957028
242 Total_LBAs_Read         ------   100   253   000    -    402051143430
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
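
For anyone else puzzled by the huge raw values in the table above: on many Seagate drives the raw field of attributes such as 1 (Raw_Read_Error_Rate) and 7 (Seek_Error_Rate) is commonly described as a packed 48-bit number, with an error count in the upper 16 bits and an operation count in the lower 32 bits. That reading comes from community documentation, not from anything confirmed in this thread or by Seagate, so treat the sketch below as an assumption. The function names are made up for illustration. Attribute 188 is split into three 16-bit words here; what each word means is not established in this thread.

```python
def split_seagate_raw(raw):
    """Split a 48-bit raw value into (error_count, operation_count).

    Assumption: upper 16 bits hold errors, lower 32 bits hold operations.
    """
    errors = (raw >> 32) & 0xFFFF
    operations = raw & 0xFFFFFFFF
    return errors, operations


def split_words16(raw):
    """Split a 48-bit raw value into three 16-bit words, high to low."""
    return (raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF


# Raw values copied from the SMART table in this post.
print(split_seagate_raw(94092324))    # Raw_Read_Error_Rate -> (0, 94092324)
print(split_seagate_raw(221147486))   # Seek_Error_Rate     -> (0, 221147486)
print(split_words16(214751641653))    # Command_Timeout     -> (50, 50, 53)
```

Under that assumption, the read and seek raw values would be mostly operation counts with zero errors in the upper field, which would make them look much less alarming than the plain decimal numbers suggest.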

 

tower-diagnostics-20200108-0319(anon).zip tower-smart-20200108-2119.zip

Edited by seestray
Marking topic solved
Just now, latitudehopper said:

Sorry to jump on this thread, but it's similar to something I have been wanting to know. I have a disk that is now missing and its contents emulated. When I boot to Ubuntu from a USB stick the disk is mountable and SMART tests show the disk as OK. Does anyone know what the trigger is for the missing disk errors?

You say the disk is 'missing'? That would imply it cannot even be seen at the BIOS level, probably because it has dropped offline. If instead you merely mean 'disabled', that happens whenever a write to the disk fails for any reason.

1 minute ago, latitudehopper said:

Perhaps that is what it said; I was going from memory but will check. OK, so any write failure disables it. I'm not an expert, but does that mean the disk is on its way out, or is it just a parity protection thing?

A write failure can have a wide variety of causes, only some of which indicate a problem with the disk. Probably the most common cause is a cable/connection issue.
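
As a rough way to separate the two cases, here is a minimal sketch that reads a few rows of `smartctl` attribute output and totals the raw counters usually associated with media trouble (IDs 5, 187, 197, 198) separately from the interface CRC counter (ID 199) that is usually tied to cabling. The grouping is a common rule of thumb and my own assumption, not an unRAID feature, and the function names are hypothetical. The sample rows are copied from the first post.

```python
MEDIA_IDS = {5, 187, 197, 198}   # assumed "disk surface" indicators
LINK_IDS = {199}                 # assumed "cable/link" indicator

SAMPLE = """\
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
"""


def parse_raw_values(text):
    """Map attribute ID -> first integer of RAW_VALUE for each table row."""
    raws = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected columns: ID NAME FLAGS VALUE WORST THRESH FAIL RAW...
        if len(parts) >= 8 and parts[0].isdigit() and parts[7].isdigit():
            raws[int(parts[0])] = int(parts[7])
    return raws


def summarize(text):
    """Total the media-related and link-related raw counters."""
    raws = parse_raw_values(text)
    return {
        "media": sum(raws.get(i, 0) for i in MEDIA_IDS),
        "link": sum(raws.get(i, 0) for i in LINK_IDS),
    }


print(summarize(SAMPLE))
```

Zero in both groups does not prove the drive is healthy or that a cable was at fault; a connection can drop without leaving CRC errors behind. It only means these particular counters give no evidence either way.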
