[SOLVED] Failure of 2 drives in a 15 drive array - can I recover the array?



It appears I may have a simultaneous failure of two drives in a 15-drive array. My unRAID server is a LimeTech MD-1510/LI; the unRAID version is 4.7. The system has a mix of WD, Hitachi & Seagate drives, including several ST31500341AS 1.5TB drives. I recently had an issue with one of these Seagate drives where the disk was showing as unavailable. I did an RMA advanced replacement from Seagate, got the replacement drive in two days, ran preclear on it with a clean bill of health and rebuilt the array. Everything was then fine. Now, four days later, it appears as if two of the other Seagate drives have failed. I first noticed that a directory had far fewer files in it than it should, so I checked the unMenu page and there were I/O errors listed. I rebooted the array, and now two drives are listed as missing (see unMenu screenshot) and the array won't start. Also attached is the syslog. Because the two drives are listed as missing, I haven't been able to figure out how to run a SMART test on them (if there is a way, please let me know and I'll certainly do that).

 

Is there any hope of rebuilding the array or am I out of luck when it comes to the data on the two affected drives?

unMenu.gif

syslog.txt


Thanks Joe,

 

I checked the cables--everything seemed fine. I then re-slotted the drives (my server uses the IcyDock enclosures) just to make sure that wasn't an issue. I then powered up the system and heard the clicking noise indicative of a bad drive. When I went to the unRAID web console, it was showing an odd message. Disk 3 had a red dot and was marked as unavailable, but the error message was that the parity disk was smaller than the largest disk in the array. Disk 3 was showing a total size of over 2.1TB, even though it's a 1.5TB drive (my parity is a 2TB Hitachi). I then unassigned disk 3, and was able to successfully start the array. I then tried to use preclear on disk 3, and got the message "Sorry: Device /dev/sdn is not responding to an fdisk -l /dev/sdn command", so I have to assume the drive is beyond recovery at this point. I guess it's another RMA for Seagate.

 

I was still a little concerned about disk 1, so I ran a SMART check on it.  Attached is the output; based on your wiki post about interpreting the SMART parameters, this drive looks like it may be on the road to failure, correct?

smart_sdm.txt


"this drive looks like it may be on the road to failure, correct?"

Actually, EVERY drive, regardless of its SMART report, is on the road to failure. It is not a question of "if" any given drive will fail, but "when". It is just a matter of time :(

 

Remember, there are only two kinds of hard-drives.  No, not IDE and SATA, but those that have already failed, and those that have not yet failed.  Just give it more time, and it will.

 

As mentioned, 3 re-allocated sectors is not an issue, as most modern drives have several thousand in the pool of spare sectors. But... keep an eye on the drive. If you see the number creeping upward, it is time to replace it.
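
For anyone who wants to automate the "keep an eye on it" part, here is a rough sketch, not an official unRAID tool. It assumes Python 3 and smartmontools are available on whatever machine can see the drive, and that `smartctl -A` prints the usual ten-column attribute table. The function names and the `/dev/sdm` device in the usage note are only illustrative (the device name is a guess based on the smart_sdm.txt attachment).

```python
import subprocess

def reallocated_sectors(smart_text):
    """Return the raw Reallocated_Sector_Ct value from `smartctl -A` output, or None."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            return int(fields[9])
    return None

def is_creeping(previous, current):
    """True when the count has grown since the last reading you noted down."""
    return previous is not None and current is not None and current > previous

def read_smart(device):
    """Run `smartctl -A` on a device (usually needs root) and return its text output."""
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return result.stdout
```

Usage would be something like `is_creeping(3, reallocated_sectors(read_smart("/dev/sdm")))`, where 3 is the count from the earlier report. The parsing is kept separate from the `smartctl` call so it can also be pointed at a saved report file.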

