Drive died - or didn't it?



Hi there,

 

Yesterday I was greeted with the red X of despair: one of my WD Red 3TB drives had been disabled. I went and bought a replacement, but since that wasn't precleared and the SMART report looked okay-ish to me, I re-enabled the old drive and the rebuild went flawlessly. So is the drive really bad, or could it have been something else?

 

Reports attached. I run two of these HDDs, so I've labeled the reports "failed" and "other".

I know the drives are collecting age and rust, but is it reasonable to keep them running?

 

Obviously, there is a parity drive and that's pretty new, 1200+ hours. Preclear on the replacement is running right now.

 

Thanks

failed - WDC_WD30EFRX-68EUZN0_WD-WMC4N0816310-20180725-0033.txt

other - WDC_WD30EFRX-68AX9N0_WD-WMC1T0041512-20180725-0034.txt

4 minutes ago, SheepContoller said:

so is the drive really bad or could it have been something else?

 

The drives are not failing.

 

The red X is often due to bad or loose cabling. This is especially common when you open a server to add or replace a drive and touch the delicate wiring of some other drive(s), nudging a cable just enough to cause a marginal connection.

 

These are not spring chickens. The one called "failed" has been powered on for 3.7 years. The one called "other" has been powered on for 4.7 years.


Well thanks guys for taking a look - but what makes the "other" a soon-to-fail?

 

"failed" has come up a few times in unRAID checks, with corrected errors and the like, while "other" has always been the green one. I know "other" has seen more runtime, but is 4.7 years a critical value? (TBH, if it was just me I'd ditch all drives for 2+1 10TB helium drives, but there's the issue of money.)

 

Which values other than "uncorrectable" are critical enough to deserve an extra look, beyond the ones unRAID already monitors anyway?

10 hours ago, SheepContoller said:

but what makes the "other" a soon-to-fail?

 

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   198   195   051    -    27569

You want the RAW_VALUE here to be 0 or a very low number.
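For anyone who wants to check this from a saved report, here is a minimal sketch that splits one attribute row like the one above into its columns and flags a non-trivial raw value. The names `SmartAttr`, `parse_attr_line`, `raw_is_suspicious` and the `max_raw` cutoff are made up for illustration, and the column layout is assumed to match the header shown above (ID#, name, flags, VALUE, WORST, THRESH, FAIL, RAW_VALUE).

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    flags: str
    value: int    # normalized value
    worst: int    # worst normalized value seen so far
    thresh: int   # threshold for the normalized value
    fail: str     # "-" in the row above
    raw: str      # raw value, kept as text


def parse_attr_line(line: str) -> SmartAttr:
    """Split one attribute row into its eight columns."""
    parts = line.split(None, 7)
    if len(parts) != 8:
        raise ValueError(f"unexpected attribute row: {line!r}")
    return SmartAttr(
        int(parts[0]), parts[1], parts[2],
        int(parts[3]), int(parts[4]), int(parts[5]),
        parts[6], parts[7].strip(),
    )


def raw_is_suspicious(attr: SmartAttr, max_raw: int = 0) -> bool:
    """True if the leading number of the raw value exceeds max_raw.

    max_raw is a hypothetical cutoff; the advice in this thread is
    simply "0 or a very low number".
    """
    return int(attr.raw.split()[0]) > max_raw


row = "  1 Raw_Read_Error_Rate     POSR-K   198   195   051    -    27569"
attr = parse_attr_line(row)
print(attr.name, attr.raw, raw_is_suspicious(attr))
```

Note that the normalized VALUE (198) is still far above THRESH (51) for this drive, so the row does not count as failed by the drive's own threshold; the raw count is what stands out.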

 

It's also showing recent UNC errors (uncorrectable read errors), judging by the power-on hours at which they were logged.

 

Quote

 

Error 5116 [3] occurred at disk power-on lifetime: 40112 hours (1671 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 03 48 00 01 35 48 6f b0 e0 00  Error: UNC 840 sectors at LBA = 0x135486fb0 = 5188906928

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  25 00 00 03 48 00 01 35 48 6c b0 e0 08     07:17:30.379  READ DMA EXT
  25 00 00 05 40 00 01 35 48 67 70 e0 08     07:17:30.371  READ DMA EXT
  25 00 00 03 50 00 01 35 48 64 20 e0 08     07:17:30.365  READ DMA EXT
  25 00 00 05 40 00 01 35 48 5e e0 e0 08     07:17:30.355  READ DMA EXT
  25 00 00 03 40 00 01 35 48 5b a0 e0 08     07:17:30.351  READ DMA EXT
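The numbers in the quoted error entry can be cross-checked with a few lines of Python. This is only a sketch: the function names are made up, and the register layout (two COUNT bytes followed by six LBA bytes, most significant first) is read off the header row in the log above rather than taken from any specification.

```python
def hours_to_days(hours: int) -> tuple[int, int]:
    """Split power-on hours into whole days and leftover hours."""
    return divmod(hours, 24)


def decode_error_registers(row: str) -> dict[str, int]:
    """Decode sector count and LBA from the register row of the log.

    Assumed token order, following the header in the log:
    ER, (unused), ST, COUNT x2, LBA x6, DV, DC.
    """
    tokens = row.split()
    count = int("".join(tokens[3:5]), 16)   # "03" "48" -> 0x0348
    lba = int("".join(tokens[5:11]), 16)    # "00 01 35 48 6f b0"
    return {"count": count, "lba": lba}


days, rest = hours_to_days(40112)
regs = decode_error_registers("40 -- 51 03 48 00 01 35 48 6f b0 e0 00")

print(f"{days} days + {rest} hours")
print(f"UNC {regs['count']} sectors at LBA = {regs['lba']:#x} = {regs['lba']}")
print(f"about {40112 / (24 * 365.25):.1f} years powered on at the time")
```

The output agrees with the log (1671 days + 8 hours, 840 sectors, LBA 5188906928), and 40112 hours works out to roughly 4.6 years, so the error was logged not long before the report showing 4.7 years of power-on time.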

 

 


I see, and I've added that attribute to the ones unRAID should monitor closely.

Well, the replacement drive has been through two preclear runs. I guess I'll just keep watching how things go and either replace whichever drive fails first or, if possible, get a second HDD and replace them both.

 

Anyway thanks to both of you for sparing a moment to give my problem a look, most appreciated.
