2 drive failure, [EDIT] now 3 drives showing as unformatted

October 29, 2011

Hi,

I recently had a two-drive failure in my 12-drive array, and I'm wondering if anyone has some advice on what I should do. The first drive, a 750GB Seagate, is pretty old and seems to be toast; it clicks and doesn't do much else. I have since bought a new WD20EARS to replace it.

The second drive that supposedly failed, a fairly new WD20EARS, seems fine.

Here is the smartctl output:

root@Tower:~# smartctl -a -d ata /dev/sdg
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EARS-00MVWB0
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Oct 29 14:46:43 2011 ADT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                       was completed without error.
                                       Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                       without error or no self-test has ever
                                       been run.
Total time to complete Offline
data collection:                 (36480) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                       Auto Offline data collection on/off support.
                                       Suspend Offline collection upon new
                                       command.
                                       Offline surface scan supported.
                                       Self-test supported.
                                       Conveyance Self-test supported.
                                       Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                       power-saving mode.
                                       Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                       General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                       SCT Feature Control supported.
                                       SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 1 Raw_Read_Error_Rate     0x002f   191   179   051    Pre-fail  Always       -       658
 3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       1350
 4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       66
 5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
 7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
 9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       960
10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       64
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       19
193 Load_Cycle_Count        0x0032   186   186   000    Old_age   Always       -       44120
194 Temperature_Celsius     0x0022   128   115   000    Old_age   Always       -       22
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       60
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   190   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1        0        0  Not_testing
   2        0        0  Not_testing
   3        0        0  Not_testing
   4        0        0  Not_testing
   5        0        0  Not_testing
Selective self-test flags (0x0):
 After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I'm not totally sure what this all means, but I see the word PASSED, which seems like a good sign.

Assuming that I am correct in thinking that the second drive is disabled but fine, can someone please advise me on how I should go about telling unRAID that the second drive is fine, and then rebuilding the first disk onto the replacement drive?

I am using version 4.7.

Many thanks to anyone who reads this.

October 29, 2011

http://lime-technology.com/forum/index.php?topic=9880.0

October 29, 2011

Author

I didn't think a syslog would matter for this issue, but I'll attach it if you think it will help.

syslog-2011-10-29.txt

October 30, 2011

Author

Would the method used in this topic apply to my situation? http://lime-technology.com/forum/index.php?topic=10455.0

The only difference seems to be that I have a DISK_DSBL and a DISK_NP_MISSING instead of a DISK_DSBL and a DISK_WRONG.

Would Joe L.'s advice work for me?

October 30, 2011

Author

I hate to be that guy, but can anyone help me out?

October 30, 2011

Hello,

Sorry to hear of your problems. Hope you get it sorted.

I looked at the SMART report you posted, and even though it passed, it lists 60 current pending sectors (attribute 197, Current_Pending_Sector). I think that the drive is having some issues. Maybe do a search about sector reallocation on this board. Lots of great advice on here.

Take care,

Jim S.
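
For anyone who wants to check a report for the same thing, here is a minimal Python sketch that scans smartctl attribute rows for nonzero raw sector counts. The rows in `SAMPLE` are copied from the report above; the choice of attributes to watch (5, 197, 198) and the helper name `nonzero_sector_counts` are assumptions made for illustration, not something smartctl or unRAID provides.

```python
import re

# Attributes commonly watched for failing sectors.
# This selection is an editorial assumption, not taken from the thread.
WATCHED = {5, 197, 198}

# A few rows copied from the smartctl report earlier in this thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       60
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

# Columns: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def nonzero_sector_counts(report):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # not an attribute row
        attr_id, name, raw = int(m.group(1)), m.group(2), int(m.group(6))
        if attr_id in WATCHED and raw > 0:
            flagged[name] = raw
    return flagged

print(nonzero_sector_counts(SAMPLE))
```

On the rows above, only Current_Pending_Sector is flagged, which matches what the report shows: the overall health check says PASSED, yet 60 sectors are waiting to be reallocated.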

October 30, 2011

Yes, Joe's method will work for you. If you don't feel comfortable doing this without some hand-holding, send him a PM and direct him to this thread. Joe's method is a variation on the "trust my parity" technique outlined here:

http://lime-technology.com/wiki/index.php?title=Make_unRAID_Trust_the_Parity_Drive,_Avoid_Rebuilding_Parity_Unnecessarily

Since your 2T drive was disabled due to a failed write, the data at that position on the parity drive will not be correct. If the 2T drive was fairly full, the failed write should be past the 750G point, and your failed 750G drive will rebuild fine. If that is the case, a parity sync immediately after the failed drive rebuild will correct the parity disk.
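
The reasoning above can be illustrated with a toy model of single XOR parity. This is only a sketch under stated assumptions: disks are short lists of integers standing in for blocks, a disk that is shorter than the parity disk counts as zero past its end, and the failed write is modeled as "parity was updated, the physical disk kept its old data". The function names and sample values are made up for the example and are not part of unRAID.

```python
from functools import reduce
from operator import xor

def block(disk, i):
    """Block i of a disk; positions past the end of a small disk count as 0."""
    return disk[i] if i < len(disk) else 0

def compute_parity(disks, length):
    """Parity block i is the XOR of block i across all data disks."""
    return [reduce(xor, (block(d, i) for d in disks), 0) for i in range(length)]

def rebuild(parity, surviving, length):
    """Rebuild a missing disk as parity XOR all surviving data disks."""
    return [reduce(xor, (block(d, i) for d in surviving), parity[i])
            for i in range(length)]

def parity_after_failed_write(parity, disk, pos, new_value):
    """Assumed failure model: parity reflects the new value, the disk does not."""
    updated = list(parity)
    updated[pos] ^= disk[pos] ^ new_value
    return updated

disk1 = [0x11, 0x22, 0x33]                                  # small disk (the "750G" one)
disk5 = [0xA0, 0xB0, 0xC0, 0xD0, 0xE0, 0xF0, 0x01, 0x02]    # large disk (the "2T" one)
parity = compute_parity([disk1, disk5], len(disk5))

# Failed write past the end of the small disk: its rebuild is unaffected.
late = parity_after_failed_write(parity, disk5, 6, 0x7F)
rebuilt_late = rebuild(late, [disk5], len(disk1))

# Failed write inside the small disk's range: that one block rebuilds wrong.
early = parity_after_failed_write(parity, disk5, 1, 0x7F)
rebuilt_early = rebuild(early, [disk5], len(disk1))

print(rebuilt_late == disk1)
print(rebuilt_early == disk1)
```

In this model the rebuild only reads parity positions that fall within the small disk, so a mismatch beyond that range cannot corrupt it. A mismatch within the range damages just the block at that position.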

November 1, 2011

Author

I'm still a bit iffy about what I should do. I intend to trust disk5 and rebuild disk1, so I think I should do the following:

log in and enter "initconfig"

enter "mdcmd set invalidslot 1"

refresh the web-ui and rebuild

Does this sound right?

November 1, 2011

Does this sound right?

Based on my understanding of "mdcmd set invalidslot" and the post you linked to, that sounds right. There is little risk as long as you make sure the target is correct (disk1 in your case). Are you replacing disk 1 so the rebuild goes to a known good drive?

November 1, 2011

Author

Yes, disk1 failed, and I've replaced it with a drive that has been precleared without any problems.

Thanks for your help.

November 5, 2011

Author

It seemed like I was able to rebuild disk1, but now more problems have cropped up: disk1 (the rebuilt drive), disk9, and disk10 are showing as unformatted. I have tried rebooting and stopping and restarting the array, but the problem persists.

Can anyone help? This just seems to keep getting worse.

syslog-2011-11-04.txt
