
DriveReady SeekComplete Error



Hi,

 

I'm in need of a bit of guidance please.

 

I logged into the unmenu interface and found a drive with a status of "DISK_DSBL". The drive is a "WD20EADS". A syslog is attached. I completed a read-only parity check successfully a few hours ago.

 

This is my first disk failure with unraid, so I'm not sure of what I should attempt next.

- Start, stop the array?

- Replace drive?

 

I found the following under the FAQ. Is it still appropriate for my situation?  I do have a spare precleared drive, that is plugged into the system, but not part of the array.

 

The procedure to replace a drive is essentially the same, whether you are upgrading the drive to a newer or bigger disk, or replacing a failed drive. Here is the procedure, but first view this post for screen shots and helpful descriptions and comments.

1. On the Devices page of unRAID Web Management, record the current drive assignments by screen capture, screen print, or old-fashioned notes by hand

2. Remove the bad drive and install the new drive

3. Boot the server

4. Check the Devices page again, and assign the new drive where the bad drive was; make sure all other assignments are still correct

5. Return to the Main page, and click the little check box under the Start button that says "I'm sure I want to do this", then click the Start button to Start the array and start the rebuild of the replaced drive

6. Drive will now be rebuilt, takes a while; the array can be used at the same time, but we recommend waiting until the rebuild is complete

 

thank you

syslog-2011-04-22.zip

Link to comment

The ball for that drive is red. The other drives are green.

 

There is also an orange ball next to the word "Started" in the "Command Area"

 

A drive is disabled when a "write" to it fails. It will not be restored on its own, since it would not have the correct contents (remember, at least one write to it failed).

 

A write could fail if you have a loose, intermittent, or defective cable, a defective disk, or even a defective port on the disk controller. Most of the time it is an intermittent cable (and it could be either the data OR the power cable).

 

First thing is to save a copy of the system log.  It might have clues to how the write to the drive failed.

Follow the steps in the "sticky" to capture the syslog before you reboot.

 

Then, stop the array, power down, and re-seat the connections to the drive. 

After you power up, get a "smartctl" report on the drive.  If it responds, great.
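Whether the drive "responds" can also be read from smartctl's exit status, which is a bit mask rather than a plain pass/fail code. The following is only a minimal sketch, assuming Python 3.7 or later is available on the server (it is not part of a stock unRAID install). The bit meanings follow the smartctl man page; the function names and the `/dev/sdX` placeholder are made up for illustration.

```python
import subprocess

# Meaning of each bit in smartctl's exit status (from the smartctl man page).
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART command failed or returned a checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}


def decode_exit_status(status):
    """Return the descriptions of every bit set in the exit status."""
    return [msg for bit, msg in EXIT_BITS.items() if status & (1 << bit)]


def drive_responds(status):
    """Bits 0 and 1 mean smartctl never managed to talk to the drive."""
    return status & 0b11 == 0


def smart_report(device):
    """Run 'smartctl -a -d ata <device>' and return (exit status, report text)."""
    proc = subprocess.run(
        ["smartctl", "-a", "-d", "ata", device],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


# Example use (hypothetical device name, replace with the real one):
#   status, report = smart_report("/dev/sdX")
#   print(drive_responds(status), decode_exit_status(status))
```

A drive can respond and still set the higher bits, so the decoded list is worth reading even when `drive_responds` is true.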

 

If it responds, you need to:

1. Stop the array

2. Un-assign the disabled/failed drive

3. Start the array with it un-assigned (this will cause the unRAID array to forget its serial number)

4. Stop the array

5. Re-assign the failed drive

6. Start the array. It will re-construct the contents onto itself (thinking it is a replacement, since it forgot the original serial number in the prior step)

 

If the drive has really failed, then just replace it and start the array. (on 5.0beta6a you'll need to assign the replacement drive to the failed slot, then start the array)

 

Joe L.

Link to comment

thanks for the help.

 

I ran smartctl after the suggested steps, and found that it did respond, and it says that it has passed.

 

But it looks like my "current pending sector" is now at 1.

 

Should I:

1) Rebuild the array with the same drive?

2) RMA the drive?  Mfr date is Jan 2011

3) Something else?

 

 

Statistics for /dev/sdf 00W_WD-WCAVY6360823

 

smartctl -a -d ata /dev/sdf

smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)

Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

 

=== START OF INFORMATION SECTION ===

Model Family:    Western Digital Caviar Green family

Device Model:    WDC WD20EADS-00W4B0

Serial Number:    WD-WCAVY6360823

Firmware Version: 01.00A01

User Capacity:    2,000,398,934,016 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:  8

ATA Standard is:  Exact ATA specification draft version not indicated

Local Time is:    Sat Apr 23 17:54:04 2011 PDT

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

 

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0)  The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (43200) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (  2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (  5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

 

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   229   228   021    Pre-fail  Always       -       10533
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       66
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1089
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       368
194 Temperature_Celsius     0x0022   126   118   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

 

SMART Error Log Version: 1

No Errors Logged

 

SMART Self-test log structure revision number 1

No self-tests have been logged.  [To run self-tests, use: smartctl -t]

 

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
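For anyone comparing reports before and after a rebuild, the attribute table in a report like the one above can be reduced to the few raw values that matter here. This is a rough sketch, assuming Python 3 and the column layout shown in this report. The `WATCH` list is an illustrative choice and not an official set, and `parse_attributes` and `nonzero_watch` are made-up helper names.

```python
import re

# One attribute row: ID, name, flag, value, worst, thresh, type,
# updated, when_failed, then the raw value (leading digits only).
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

# Attributes worth a second look when non-zero (an assumption for this sketch).
WATCH = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def parse_attributes(report):
    """Map attribute name to integer raw value for every table row found."""
    attrs = {}
    for line in report.splitlines():
        match = ATTR_ROW.match(line)
        if match:
            attrs[match.group(2)] = int(match.group(9))
    return attrs


def nonzero_watch(attrs):
    """Return only the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCH if attrs.get(name, 0) > 0}


# Four rows copied from the report in this post.
SAMPLE = """\
  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0
197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      1
198 Offline_Uncorrectable  0x0030  200  200  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x0032  200  200  000    Old_age  Always      -      0
"""

print(nonzero_watch(parse_attributes(SAMPLE)))
```

Run against the sample rows, only `Current_Pending_Sector` is reported. Saving this output after each smartctl run makes it easy to see whether the pending count clears or grows once the drive has been rewritten.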

Link to comment
