DriveReady SeekComplete Error

mklv · April 23, 2011

Hi,

I'm in need of a bit of guidance please.

I logged into the unmenu interface to discover a drive with a status of "DISK_DSBL". The drive is a "WD20EADS". Attached is a syslog. I completed a read-only parity check successfully a few hours ago.

This is my first disk failure with unraid, so I'm not sure of what I should attempt next.

- Start, stop the array?

- Replace drive?

I found the following under the FAQ. Is it still appropriate for my situation? I do have a spare precleared drive, that is plugged into the system, but not part of the array.

The procedure to replace a drive is essentially the same, whether you are upgrading the drive to a newer or bigger disk, or replacing a failed drive. Here is the procedure, but first view this post for screen shots and helpful descriptions and comments.

1. On the Devices page of unRAID Web Management, record the current drive assignments by screen capture, screen print, or old-fashioned notes by hand

2. Remove the bad drive and install the new drive

3. Boot the server

4. Check the Devices page again, and assign the new drive where the bad drive was; make sure all other assignments are still correct

5. Return to the Main page, and click the little check box under the Start button that says "I'm sure I want to do this", then click the Start button to Start the array and start the rebuild of the replaced drive

6. Drive will now be rebuilt, takes a while; the array can be used at the same time, but we recommend waiting until the rebuild is complete

thank you

syslog-2011-04-22.zip

dgaschk · April 23, 2011

What color is the ball on the drive line on the "unRAID Main" page?

EDIT: disk7

mklv · April 23, 2011

The ball for that drive is red. The other drives are green.

There is also an orange ball next to the word "Started" in the "Command Area"

Joe L. · April 23, 2011

A drive is disabled when a "write" to it fails. It will not restore on its own, since it would not have the correct contents (remember, at least one write to it failed).

A write could fail if you have a loose cable, an intermittent cable, a defective cable, a defective disk, or even a defective port on the disk controller. Most times it is an intermittent cable (and it could be either the data OR the power cable).

First thing is to save a copy of the system log. It might have clues to how the write to the drive failed.

Follow the steps in the "sticky" to capture the syslog before you reboot.
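Once the syslog is saved, a quick filter can help find the lines worth reading. The following is only a minimal sketch: the pattern list and the sample lines are assumptions made up for illustration (they are not taken from the attached syslog), and a real log may word its errors differently.

```python
import re

# Phrases that commonly accompany a failed disk command in a Linux syslog.
# This list is an assumption for illustration, not an exhaustive catalogue.
ERROR_PATTERNS = [
    r"DriveReady SeekComplete Error",
    r"\bI/O error\b",
    r"\bfailed command\b",
    r"\bhard resetting link\b",
]
ERROR_RE = re.compile("|".join(ERROR_PATTERNS), re.IGNORECASE)


def find_disk_errors(lines):
    """Return (line_number, text) pairs for lines matching any error pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ERROR_RE.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical sample lines, invented for this example.
sample = [
    "Apr 22 10:00:01 Tower kernel: md: recovery thread woken up\n",
    "Apr 22 10:00:05 Tower kernel: hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }\n",
    "Apr 22 10:00:06 Tower kernel: end_request: I/O error, dev sdf, sector 12345\n",
]

for number, text in find_disk_errors(sample):
    print(number, text)
```

To run it against a saved copy of the log, pass an open file object instead of `sample`, since iterating over a file yields its lines.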

Then, stop the array, power down, and re-seat the connections to the drive.

After you power up, get a "smartctl" report on the drive. If it responds, great. In that case you need to:

1. Stop the array

2. Un-assign the disabled/failed drive

3. Start the array with it un-assigned (this will cause the unRAID array to forget its serial number)

4. Stop the array

5. Re-assign the failed drive

6. Start the array. It will re-construct the contents onto itself, thinking it is a replacement, since it forgot the original serial number in the prior step

If the drive has really failed, then just replace it and start the array. (on 5.0beta6a you'll need to assign the replacement drive to the failed slot, then start the array)

Joe L.

mklv · April 24, 2011

thanks for the help.

I ran smartctl after the suggested steps, and found that it did respond, and it says that it has passed.

But it looks like my "current pending sector" is now at 1.

Should I:

1) Rebuild the array with the same drive?

2) RMA the drive? Mfr date is Jan 2011

3) Something else?

Statistics for /dev/sdf 00W_WD-WCAVY6360823

smartctl -a -d ata /dev/sdf

smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)

=== START OF INFORMATION SECTION ===

Model Family: Western Digital Caviar Green family

Device Model: WDC WD20EADS-00W4B0

Serial Number: WD-WCAVY6360823

Firmware Version: 01.00A01

User Capacity: 2,000,398,934,016 bytes

Device is: In smartctl database [for details use: -P show]

ATA Version is: 8

ATA Standard is: Exact ATA specification draft version not indicated

Local Time is: Sat Apr 23 17:54:04 2011 PDT

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.

Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.

Total time to complete Offline data collection: (43200) seconds.

Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.

SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.

Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.

Short self-test routine recommended polling time: ( 2) minutes.

Extended self-test routine recommended polling time: ( 255) minutes.

Conveyance self-test routine recommended polling time: ( 5) minutes.

SCT capabilities: (0x3035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0

3 Spin_Up_Time 0x0027 229 228 021 Pre-fail Always - 10533

4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 66

5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0

7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0

9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 1089

10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0

11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0

12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 10

192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 3

193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 368

194 Temperature_Celsius 0x0022 126 118 000 Old_age Always - 26

196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0

197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1

198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0

199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0

200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0

SMART Error Log Version: 1

No Errors Logged

SMART Self-test log structure revision number 1

No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1

SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS

1 0 0 Not_testing

2 0 0 Not_testing

3 0 0 Not_testing

4 0 0 Not_testing

5 0 0 Not_testing

Selective self-test flags (0x0):

After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
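The report can say PASSED while Current_Pending_Sector shows 1 because, by the usual convention, an attribute only counts as failed when its normalized VALUE drops to or below THRESH; the RAW_VALUE is a separate count. The sketch below parses three rows copied from the attribute table above and shows both numbers side by side. The function names and the regular expression are my own for illustration, and the pattern assumes the column layout shown in this report.

```python
import re

# Row layout assumed from the report above:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(report):
    """Map attribute name -> dict with id, normalized value, threshold and raw value."""
    attrs = {}
    for line in report.splitlines():
        match = ROW_RE.match(line)
        if match:
            attrs[match.group(2)] = {
                "id": int(match.group(1)),
                "value": int(match.group(4)),
                "thresh": int(match.group(6)),
                "raw": int(match.group(10)),
            }
    return attrs


def at_or_below_threshold(attrs):
    """Names of attributes whose normalized value is at or below the threshold."""
    return [name for name, a in attrs.items() if a["value"] <= a["thresh"]]


# Three rows copied from the report in this post.
report = """
  5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
"""

attrs = parse_attributes(report)
for name, a in attrs.items():
    print(name, "value", a["value"], "thresh", a["thresh"], "raw", a["raw"])
print("at or below threshold:", at_or_below_threshold(attrs))
```

Here no attribute is at or below its threshold, yet the raw pending count is non-zero, which is why the raw values are worth tracking separately from the overall result.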

dgaschk · April 24, 2011

I would run a preclear on the disk and see if it passes. The current pending sector just may go away or become a reallocated sector. If more reallocated sectors appear after the preclear then the disk needs to be replaced.
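One way to read that rule of thumb is as a before/after comparison of the raw counts. This is only a sketch of my reading of it: the function name, the three verdict strings, and the "recheck" case for a sector that stays pending are assumptions added for illustration, and the "after" numbers are hypothetical since the preclear has not been run yet.

```python
def preclear_verdict(before, after):
    """Compare raw SMART counts taken before and after a preclear.

    Both arguments are dicts holding the raw values of
    Reallocated_Sector_Ct and Current_Pending_Sector.
    """
    new_reallocated = after["Reallocated_Sector_Ct"] - before["Reallocated_Sector_Ct"]
    pending_before = before["Current_Pending_Sector"]
    pending_after = after["Current_Pending_Sector"]

    # Sectors that were already pending may legitimately turn into
    # reallocated ones; anything beyond that counts as new damage.
    if new_reallocated > pending_before:
        return "replace"
    # Assumption: a sector that is still pending deserves another look.
    if pending_after > 0:
        return "recheck"
    return "keep"


# Raw values from the smartctl report posted above.
before = {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 1}

# Hypothetical outcomes, invented for this example.
print(preclear_verdict(before, {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0}))
print(preclear_verdict(before, {"Reallocated_Sector_Ct": 1, "Current_Pending_Sector": 0}))
print(preclear_verdict(before, {"Reallocated_Sector_Ct": 5, "Current_Pending_Sector": 0}))
```

The raw counts themselves come from the RAW_VALUE column of a `smartctl -a` report taken before and after the preclear.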
