What do you make of this SMART report?



This is one of my oldest drives, a WD Green from ~2011 if I recall correctly. It hasn't red-balled or anything, but it does store my most important information.

 

TEST for WDC_WD30EZRX-00MMMB0_WD-WCAWZ1116457 on 201411212005
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model:     WDC WD30EZRX-00MMMB0
Serial Number:    WD-WCAWZ1116457
LU WWN Device Id: 5 0014ee 2b0c7e823
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Nov 21 20:05:49 2014 HST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
				was completed without error.
				Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 114)	The previous self-test completed having
				the read element of the test failed.
Total time to complete Offline 
data collection: 		(49800) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 478) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x3035)	SCT Status supported.
				SCT Feature Control supported.
				SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0027   146   144   021    Pre-fail  Always       -       9658
  4 Start_Stop_Count        0x0032   093   093   000    Old_age   Always       -       7024
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22266
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       202
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       47
193 Load_Cycle_Count        0x0032   149   149   000    Old_age   Always       -       155215
194 Temperature_Celsius     0x0022   120   108   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       20%     22259         273017280
# 2  Extended offline    Interrupted (host reset)      90%     22253         -
# 3  Short offline       Completed: read failure       10%     22253         273017280

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Link to comment

rsync your data off the drive or rebuild it with unRAID onto 'another' drive.

 

The pending sector or offline uncorrectable sector is deadly to a rebuild.

If you have another drive fail, this drive may prevent rebuilding the other drive.

Even the firmware can't get past the sector. I give that a higher level of urgency, because even with retries the drive cannot read it.

Especially since you clarify that "it does store my most important information."

 

The only way to re-allocate the sector is to write to it.

However, the math to work out exactly where to write eludes me, and probably a great many people here.
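For what it's worth, the arithmetic itself can be sketched roughly as below. The LBA and sector sizes come straight from the SMART report above. The partition start (sector 64) and the 4096-byte filesystem block size are assumptions for illustration only, so check the real values on your own disk before trusting any of the results. This only computes locations; it does not write anything.

```python
# Values taken from the SMART report in the first post.
LBA_FIRST_ERROR = 273017280   # LBA_of_first_error from the self-test log
LOGICAL_SECTOR = 512          # bytes, from "Sector Sizes"
PHYSICAL_SECTOR = 4096        # bytes, from "Sector Sizes"

# ASSUMPTIONS (not from the report): the data partition starts at logical
# sector 64 and the filesystem uses 4096-byte blocks.
PARTITION_START = 64
FS_BLOCK_SIZE = 4096


def locate(lba, part_start=PARTITION_START, fs_block=FS_BLOCK_SIZE):
    """Translate a failing LBA into a byte offset, the logical LBAs that
    share its physical sector, and a filesystem block number."""
    per_phys = PHYSICAL_SECTOR // LOGICAL_SECTOR      # 8 logical per physical
    phys_first = (lba // per_phys) * per_phys         # round down to 4K boundary
    return {
        "byte_offset": lba * LOGICAL_SECTOR,
        "physical_sector_lbas": (phys_first, phys_first + per_phys - 1),
        "fs_block": (lba - part_start) * LOGICAL_SECTOR // fs_block,
    }


info = locate(LBA_FIRST_ERROR)
print(info)
```

Because the physical sector is 4096 bytes, all eight logical LBAs in that range sit in the same physical sector, so they stand or fall together.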

 

So generally what people do is move the data and preclear the drive.

 

Me, I would move the data, or rebuild onto another drive.

Then run this drive through 5 passes of badblocks in write/read mode with different patterns (but not before I've validated the moved data).

I've had success with that, and it has also shown me which drives were not worthy of my data.

 

It's times like this that I'm happy I have boxes of spare drives.

Link to comment

I do not see a sector "pending re-allocation" in that SMART report.

I see one that was detected offline...  not exactly the same thing.

 

196 Reallocated_Event_Count 0x0032  200  200  000    Old_age  Always      -      0

197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      0

198 Offline_Uncorrectable  0x0030  200  200  000    Old_age  Offline      -      1
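If you want to check the same thing on a saved report without squinting at the columns, here is a minimal sketch that pulls the RAW_VALUE column out of attribute lines and lists the ones that are non-zero. It assumes the ten-column smartctl attribute layout shown in the first post; the function name is just something I made up for this example.

```python
# The three attribute lines quoted above, as they appear in the full report.
REPORT_LINES = """\
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
"""


def raw_values(text):
    """Map attribute ID -> (name, raw value) for smartctl attribute lines."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[0].isdigit():
            values[int(fields[0])] = (fields[1], int(fields[9]))
    return values


attrs = raw_values(REPORT_LINES)
nonzero = {attr_id: v for attr_id, v in attrs.items() if v[1] != 0}
print(nonzero)
```

On these three lines only attribute 198 comes back non-zero, which is the distinction being made here: one sector found uncorrectable during an offline scan, none currently pending.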

 

Advice is still valid.  If you do not have another copy of your most important files, make a copy elsewhere.

unRAID is NOT a backup, it is a way to recover from a single hard-disk failure.

 

Joe L.

Link to comment

My mistake about the columns in a quick response. However, the read failure below is deadly to a rebuild.

unRAID may or may not kick the drive out of the array, depending on the timeout and the value returned.

 

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline    Completed: read failure      20%    22259        273017280

 

Chances are you'll be able to rsync most of the data and a few files will get a read error.

This just happened to me recently as I was trying to access a file that ended up having pending sectors.

 

 

Here's something you can attempt within the drive's firmware.

I've never executed this test. So this will be new territory in recovery.

http://daemon-notes.com/articles/system/smartmontools/offline-uncorrectable

Frankly, I would try to rsync my most important data to a spare drive first. (but that's me).

Once you have that offline spare backup tucked away somewhere, you can try other recovery methods, e.g. the SMART offline test, a rebuild of the drive onto a new drive, or whatever other recovery procedure you want to attempt.

 

Link to comment

I used rsync to pull the files off and there were a ton that couldn't be copied due to errors. Instinct tells me I should image the drive before it gets worse but I don't have anything big enough to hold the image, so an rsync backup will have to do.

 

Should I start with reiserfsck to try and recover, or ddrescue?

Link to comment

A read failure from the drive (while operating as part of the array) would trigger unRAID to reconstruct the bad block using all the other disks in the array and perform a write back to the bad disk. In theory, that should force a sector remap. So even a parity check should force this correction to occur.
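To illustrate the reconstruction idea, here is a toy sketch assuming plain single XOR parity, which is my understanding of how a single-parity array works in general. The tiny 4-byte "sectors" and disk names are made up, and this is not unRAID's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical 4-byte "sectors" at the same position on three data disks.
disk1 = bytes([0x10, 0x20, 0x30, 0x40])
disk2 = bytes([0xAA, 0xBB, 0xCC, 0xDD])
disk3 = bytes([0x01, 0x02, 0x03, 0x04])

# The parity disk holds the XOR of all the data disks.
parity = xor_blocks([disk1, disk2, disk3])

# Pretend disk2 returns a read error for this sector: XOR of everything
# else (the other data disks plus parity) gives back the missing contents.
rebuilt = xor_blocks([disk1, disk3, parity])
print(rebuilt == disk2)
```

The same sketch shows why an unreadable sector is a problem during a rebuild: if two disks are missing the same position at once, there is nothing left to XOR against.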

Link to comment

I do have a disk I could use if I really need to, would you recommend ddrescue over reiserfsck?

 

ddrescue is not going to fix the problem alone.

 

ddrescue copies the disk to another disk.

It makes many attempts and logs which sectors are bad.

You can then use that log to retry just the bad areas, or to attempt another copy in reverse.
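As a rough sketch of what that log is good for: the snippet below reads a ddrescue mapfile and lists the areas marked as bad. The format handling is based on my reading of the GNU ddrescue manual ('#' comment lines, one status line, then "pos size status" rows where '-' marks a bad-sector area), so treat it as an assumption. The sample contents are entirely made up and are not from this drive.

```python
# Hypothetical mapfile contents for illustration only.
SAMPLE_MAPFILE = """\
# Mapfile. Created by GNU ddrescue
# current_pos  current_status
0x00200200     +
#      pos        size  status
0x00000000  0x00100000  +
0x00100000  0x00001000  -
0x00101000  0x000FF000  +
0x00200000  0x00000200  -
0x00200200  0x000FFE00  +
"""


def bad_areas(mapfile_text):
    """Return (byte_offset, size) for every area flagged '-' (bad sector)."""
    rows = [ln.split() for ln in mapfile_text.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")]
    areas = []
    for fields in rows[1:]:          # rows[0] is the current-position line
        pos, size, status = int(fields[0], 0), int(fields[1], 0), fields[2]
        if status == "-":
            areas.append((pos, size))
    return areas


areas = bad_areas(SAMPLE_MAPFILE)
total_bad = sum(size for _, size in areas)
print(areas, total_bad)
```

Dividing each offset by 512 gives the logical LBA, which you can compare against the LBA_of_first_error in the SMART self-test log.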

 

Once you have copied the whole disk, you can attempt the reiserfsck on the copied disk.

This leaves the original disk in the most untouched state possible so you can try and repair on the temporary copy without destroying your only copy.

 

It takes a long time and a bunch of command line work, but I was able to retrieve all but 1 sector of a failed disk.

 

If you plan to go that route, search the board and Google for how to use ddrescue.

 

I can't remember the details of how I used it.

 

It all depends on how precious the data is to you.

You can attempt the reiserfsck on the current disk and hope for the best.

 

From what we've seen, reiserfs has been quite resilient and recovers a lot.
