Drive died - or didn't it? - Storage Devices and Controllers

July 24, 2018

Hi there,

Yesterday I was greeted with the red X of despair: one of my WD Red 3TB drives had been disabled. I went and bought a replacement, but since it wasn't precleared and the SMART report looked okay-ish to me, I re-enabled the old drive and the rebuild went flawlessly. So is the drive really bad, or could it have been something else?

Reports attached. I run two of these HDDs and have marked the reports accordingly.

I know the drives are collecting age and rust, but is it reasonable to keep them running?

Obviously, there is a parity drive and that's pretty new, 1200+ hours. Preclear on the replacement is running right now.

Thanks

failed - WDC_WD30EFRX-68EUZN0_WD-WMC4N0816310-20180725-0033.txt

other - WDC_WD30EFRX-68AX9N0_WD-WMC1T0041512-20180725-0034.txt


July 24, 2018

The drives are not failing.

The red X is often due to bad or loose cabling. This is especially common when you open a server to add or replace a drive and touch the delicate wiring of some other drive, nudging a cable just enough to cause a marginal connection.

These are not spring chickens. The one called "failed" has been powered on for 3.7 years. The one called "other" has been powered on for 4.7 years.


July 25, 2018

The "failed" one might be OK, but the "other" is definitely failing.


July 26, 2018

Author

Well, thanks guys for taking a look, but what makes the "other" a soon-to-fail?

"failed" has come up a few times in unRAID checks (corrected errors and the like), while "other" has always been the green one. I know "other" has seen more runtime, but is 4.7 years a critical value? (TBH, if it were just up to me I'd ditch all drives for 2+1 10TB helium drives, but there's the issue of money.)

Which values other than "uncorrectable" are critical enough to deserve an extra look, apart from the ones unRAID monitors anyway?

Edited July 26, 2018 by SheepContoller
typos


July 27, 2018

10 hours ago, SheepContoller said:

but what makes the "other" a soon-to-fail?

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   198   195   051    -    27569

You want this to be 0 or a very low number.
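If you'd rather check this attribute by script than by eye, here is a minimal Python sketch. It assumes the attribute row has the same eight-column layout as the line quoted above; the function name `parse_attribute_row` is made up for illustration and is not part of smartmontools or unRAID.

```python
def parse_attribute_row(row):
    """Split one SMART attribute row into named fields.

    Assumes the eight-column layout quoted above:
    ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
    """
    fields = row.split(None, 7)
    if len(fields) != 8:
        raise ValueError(f"unexpected attribute row: {row!r}")
    attr_id, name, flags, value, worst, thresh, fail, raw = fields
    return {
        "id": int(attr_id),
        "name": name,
        "flags": flags,
        "value": int(value),
        "worst": int(worst),
        "thresh": int(thresh),
        "fail": fail,
        # Some attributes carry extra text after the number; keep the leading integer only.
        "raw": int(raw.split()[0]),
    }


row = "  1 Raw_Read_Error_Rate     POSR-K   198   195   051    -    27569"
attr = parse_attribute_row(row)
print(attr["name"], "raw =", attr["raw"])
if attr["raw"] > 0:
    print("non-zero raw value, worth a closer look")
```

Note that the normalized VALUE (198) is still far above THRESH (51) and the FAIL column is empty, so the drive itself does not flag this attribute as failed; the concern here is the raw count.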

It's also showing recent UNC (uncorrectable read) errors, judging by the power-on hours:

Quote

Error 5116 [3] occurred at disk power-on lifetime: 40112 hours (1671 days + 8 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 03 48 00 01 35 48 6f b0 e0 00 Error: UNC 840 sectors at LBA = 0x135486fb0 = 5188906928

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 03 48 00 01 35 48 6c b0 e0 08     07:17:30.379 READ DMA EXT
25 00 00 05 40 00 01 35 48 67 70 e0 08     07:17:30.371 READ DMA EXT
25 00 00 03 50 00 01 35 48 64 20 e0 08     07:17:30.365 READ DMA EXT
25 00 00 05 40 00 01 35 48 5e e0 e0 08     07:17:30.355 READ DMA EXT
25 00 00 03 40 00 01 35 48 5b a0 e0 08     07:17:30.351 READ DMA EXT
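The numbers in that log entry can be re-derived from the register bytes. A small Python sketch, assuming the column order given in the header line (two COUNT bytes followed by six LBA_48 bytes, most significant first); the helper and variable names are only for illustration.

```python
# Register bytes copied from the "After command completion" row above.
count_bytes = "03 48"
lba_bytes = "00 01 35 48 6f b0"


def hex_bytes_to_int(text):
    """Join space-separated hex bytes (most significant first) into one integer."""
    return int(text.replace(" ", ""), 16)


sectors = hex_bytes_to_int(count_bytes)
lba = hex_bytes_to_int(lba_bytes)
print(f"UNC {sectors} sectors at LBA = {lba:#x} = {lba}")

# Power-on lifetime at the time of the error, as reported in the log.
hours = 40112
days, rest = divmod(hours, 24)
# 8766 is the number of hours in an average year (365.25 * 24).
print(f"{hours} hours = {days} days + {rest} hours, about {hours / 8766:.1f} years")
```

This reproduces the 840 sectors, LBA 5188906928 and 1671 days + 8 hours shown in the log.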


July 27, 2018

Author

I see. I've added that attribute to the ones unRAID should monitor closely.

Well, the replacement drive has been through 2 preclear runs. I guess I'll just keep watching how things go and either replace whichever one fails first or, if possible, get a second HDD and replace them both.

Anyway, thanks to both of you for sparing a moment to look at my problem. Much appreciated.

Edited July 27, 2018 by SheepContoller
