
Failed Drives


Kirkbuilder


I had 15 drives in my array: 2 parity and 13 storage drives. I recently added another 5 old 2TB drives to the array, and that is when everything started going wrong. First, one of the 8TB drives in my array (Disk 14) failed, so I added another 8TB drive (not precleared) and it started rebuilding. Then Disk 8 reported that it had failed. I tried to replace it, but since Disk 14 hadn't finished rebuilding I had to wait. Then my parity disk (Parity 2) failed and I started getting errors on disks 2 - 5 (I think) with a count of 300k.

Then my array started frying my network. I have a dedicated Netgear wireless router that routes all video and file traffic, segmented from my Internet modem/router (Fios). The Netgear router suddenly stopped working: no traffic on the network worked until I unplugged my unRAID server from the router, and then the network returned to normal. I would shut down my unRAID server (hard reset), and when it came back up it would work connected to the network with the array not started. Once I started the array it would work, but after about 10-15 minutes the network would again stop working on all devices connected to it. I did another hard reset, didn't start the array, and had no issues.

I then went down a rabbit hole of upgrading the firmware on my Netgear router, which seems to have failed since the router didn't work reliably afterwards. So I bought a new one: https://smile.amazon.com/gp/product/B01M12RE4A/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1 . I got it configured and running reliably. During this time (about 3 days) my unRAID server was disconnected from the network while I acquired the new router, with the array started and (I believe) still rebuilding Disk 14.

Once I had the new router configured, I connected the running unRAID server back to the network and it again flooded ("stormed", I believe) the new router's network, since every device again stopped communicating. I then did a hard reboot of the unRAID server, and the network came back after about 3 minutes. Once the unRAID server was back online I didn't start the array. Instead I took the 3 drives from the Parity 2, Disk 8 and Disk 14 slots, which appeared in the Unassigned Devices area, and ran a preclear on all three over the past ~2 days, with the pre and post testing skipped, and had it erase and zero the drives. Two of the drives had no problem with the preclear process and finished successfully, but the third got to 99% of the zeroing step and then "paused". As I am writing this, the third just passed.

 

So now I have 3 drives precleared (passed). The 2 that passed first are newer drives that ran fantastically until I added the additional drives; the third is a bit old and, unlike all the others, is a Seagate rather than a WD. I definitely need some advice and direction on how to recover from all this. Here is the Main page graphic:

image.thumb.png.58e4225ad3e02de36d5fc167ae6d6dde.png

image.thumb.png.46229a125d41d94dce0c586270170ece.png

 

I am unable to add the drives to the array and start it so that I can recover it.

 

I would appreciate whatever help you guys can give.  Please HELP!!!

 

tower-diagnostics-20190414-2014.zip

tower-diagnostics-20190414-1417.zip

Link to comment
28 minutes ago, Kirkbuilder said:

I recently added another 5 drives to the array of old 2 TB drives and that is when everything started going wrong.

Since neither the diagnostics nor the screenshots have the array started, I can't tell. Do you actually have any data on any of these disks yet? Do you really need so many disks?

 

I recommend not installing any more disks than you actually currently need for the capacity, and add others later as needed. And I recommend not using such small disks or old disks either. The fewer disks the fewer opportunities for problems, and larger disks are typically more cost effective and perform better.

 

Even dual parity is not a guarantee if you put a lot of possibly unreliable disks in the array. Parity by itself, even dual parity, cannot recover anything. Parity PLUS ALL other disks must be reliably read in order to reliably rebuild a disk.
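To illustrate why parity by itself cannot recover anything, here is a minimal sketch of single-parity reconstruction using XOR. The function names and the toy two-byte "disks" are made up for illustration, and this is not Unraid's actual code; the second parity disk uses a different calculation that is not shown here.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(parity, surviving_disks):
    """Reconstruct one missing disk from parity plus ALL surviving disks."""
    return xor_blocks([parity, *surviving_disks])

# Three toy data disks and the parity computed across them.
disks = [b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]
parity = xor_blocks(disks)

# Disk index 1 is lost; every other disk reads back correctly.
rebuilt = rebuild_missing(parity, [disks[0], disks[2]])

# Same rebuild, but disk index 0 returns bad data during the rebuild.
corrupted = rebuild_missing(parity, [b"\x00\x00", disks[2]])

print(rebuilt == disks[1])    # rebuild matches the lost disk
print(corrupted == disks[1])  # rebuild is silently wrong
```

The second call shows the point made above: a read error on any surviving disk ends up in the rebuilt disk, because the rebuild has no other source of truth.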

 

Rather than go through all the trouble to go through all the diagnostics with such a large number of disks, I will wait for your responses.

Link to comment
4 minutes ago, trurl said:

I recommend not installing any more disks than you actually currently need for the capacity, and add others later as needed. And I recommend not using such small disks or old disks either. The fewer disks the fewer opportunities for problems, and larger disks are typically more cost effective and perform better.

Thank you for your quick reply!  I am good with not adding more drives than I have now, but the point was to use my older drives until they died and then replace them with 8TB drives.  I see your point about reducing points of failure.  I do have data on the drives that I would NOT like to lose if possible.  I will consider reducing.  I do need to get past this point though.  Is there any way for me to recover the current array, considering that 1 parity disk and 2 data disks have died?

 

I was able to start the array after removing the Parity 2 and Disk 8 replacements.  Here is the pic:

image.thumb.png.edb716bcab59eddc339510796003d63b.png

image.thumb.png.bbc36ab60befd4dbd7811e5f94f0fc5c.png

 

Diagnostics with the array started are attached.  It would appear that the Parity-Sync/Data-rebuild is working very quickly - I imagine that is because it has been trying to rebuild for the past week and it's now just checking.  Any suggestions would be welcome.  Sorry, I am new to this.

 

tower-diagnostics-20190414-2014 (1).zip

Link to comment

Doesn't look like the rebuild is going well. Lots of this in syslog:

Apr 14 21:37:21 Tower kernel: md: recovery thread: multiple disk errors, sector=624

Do any of those 2TB disks actually have anything on them? They look pretty empty to me. Since disk14 is unmountable I can't tell if it has anything on it or not. I am guessing 14 and 15 are new disks with little or nothing on them also. And rebuilding 14 is very unlikely to make it mountable anyway. Certainly the current rebuild is not going to be a good result.

 

It might be better to just remove all those 2TB disks. It doesn't seem like you need the capacity. If there is really anything on them you could work with them one at a time with Unassigned Devices and copy the data off.

 

Might be simpler to just New Config without those small disks, and even the new 8TB disks, rebuild parity and parity2, and then reconsider whether and how you want to expand your array.

 

I am going to ask for another opinion from @johnnie.black

Link to comment

Also, I have to ask. Do you have another copy of anything important and irreplaceable? You absolutely must have backups of important and irreplaceable data. Parity, even dual parity, is not a substitute for backups. You don't have to backup everything. You get to decide what qualifies as important and irreplaceable.

Link to comment
40 minutes ago, Kirkbuilder said:

Okay, so I was able to get Disk 8 and 14 formatted and up.  But now the sync process is EXTREMELY SLOW on Disk 8.  Disk 14 took no time to sync.  

Just in case you don't know, parity can't recover any data from a disk after formatting it in the parity array. Format is a write operation. It writes an empty filesystem to the disk. Unraid treats that write operation just as it does any other, by updating parity. So parity now agrees that those disks have an empty filesystem on them. I hope those disks didn't have anything on them you wanted to keep.
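A minimal sketch of why a format cannot be undone from parity, assuming simple XOR single parity. The helper names, the toy five-byte "disks", and the all-zero stand-in for an empty filesystem are illustrative assumptions, not Unraid internals.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def write_block(disks, parity, index, new_data):
    """Write new_data to one disk and return parity updated to match."""
    old_data = disks[index]
    disks[index] = new_data
    # Parity flips exactly the bits that changed on the data disk.
    return xor_bytes(parity, xor_bytes(old_data, new_data))

disks = [b"movie", b"photo"]
parity = xor_bytes(disks[0], disks[1])

# "Formatting" disk index 1 is just another write as far as parity is concerned.
EMPTY_FS = b"\x00\x00\x00\x00\x00"  # stand-in for an empty filesystem
parity = write_block(disks, parity, 1, EMPTY_FS)

# Rebuilding disk index 1 from parity plus the other disk now yields
# the empty filesystem, not the old contents.
rebuilt = xor_bytes(parity, disks[0])
print(rebuilt == EMPTY_FS)  # True
print(rebuilt == b"photo")  # False
```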

 

I don't know what you did or mean about disk14 taking no time to sync. Rebuilding an 8TB disk should take 16+ hours. That screenshot showing the time to rebuild disk8 is indeed too slow, but I don't see any errors. It was just getting started though so maybe it improved.
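For a rough sense of scale, the arithmetic behind that estimate can be sketched as follows. The 16 hour figure is from the post above; the 20 MB/s "slow" rate is a hypothetical number chosen for illustration, not a value read from the screenshot, and 8TB is taken as 8e12 bytes.

```python
def required_speed_mb_s(capacity_bytes, hours):
    """Average throughput (MB/s) needed to cover the disk in the given time."""
    return capacity_bytes / (hours * 3600) / 1e6

def rebuild_hours(capacity_bytes, bytes_per_second):
    """Hours needed to cover the whole disk at a constant throughput."""
    return capacity_bytes / bytes_per_second / 3600

CAPACITY = 8e12  # 8TB, decimal terabytes

speed = required_speed_mb_s(CAPACITY, 16)  # average rate for a 16 hour rebuild
slow = rebuild_hours(CAPACITY, 20e6)       # hypothetical 20 MB/s rebuild

print(f"16 h rebuild needs about {speed:.0f} MB/s on average")
print(f"at 20 MB/s the same rebuild takes about {slow:.0f} hours")
```

A rebuild runs only as fast as the slowest disk it has to read, so one struggling disk can drag the average far below the first figure.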

 

Might not be any point to rebuilding those formatted disks anyway. The only reason to rebuild a formatted disk is to avoid rebuilding parity. And if you remove those 2TB disks you are going to have to rebuild parity anyway.

 

But now I have taken the time and trouble to look at SMART for all disks:

 

Disk1, not recommended:

Serial Number:    2TG94U7D
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
197 Current_Pending_Sector  -O---K   100   100   000    -    32

 

Disk4, not recommended:

Serial Number:    7SHAMRBC
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
197 Current_Pending_Sector  -O---K   100   100   000    -    8

 

Disk9, not recommended:

Serial Number:    WD-WCAZA0362193
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
197 Current_Pending_Sector  -O--CK   200   200   000    -    5
198 Offline_Uncorrectable   ----CK   200   200   000    -    2

 

Disk10, definitely NOT:

Serial Number:    WD-WMAZA0105551
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
197 Current_Pending_Sector  -O--CK   200   001   000    -    224
198 Offline_Uncorrectable   ----CK   200   200   000    -    11

 

Disk13, trash:

Serial Number:    WD-WMAZA0398115
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   058   058   140    NOW  1132
196 Reallocated_Event_Count -O--CK   001   001   000    -    882
197 Current_Pending_Sector  -O--CK   200   001   000    -    106
198 Offline_Uncorrectable   ----CK   200   200   000    -    5

 

Possibly one or more of these disks is the reason for your slow rebuild of disk8.
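If you want to check the same attributes yourself across many disks, a small script can scan SMART attribute lines in the layout shown above. This is only a sketch: the function names are made up, the parser assumes the eight-column layout from the excerpts, and the rule "any non-zero raw value on attributes 5, 197 or 198 is worth a look" is a rule of thumb assumed here, not an official threshold.

```python
WATCHED_IDS = {5, 197, 198}  # reallocated, pending, offline uncorrectable

def parse_smart_line(line):
    """Parse one attribute line; return None for headers or other text."""
    fields = line.split()
    if len(fields) < 8 or not fields[0].isdigit() or not fields[7].isdigit():
        return None
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "fail": fields[6],      # "-" when the attribute has never failed
        "raw": int(fields[7]),
    }

def flag_problems(report_text):
    """Return names of attributes that look worrying in a SMART excerpt."""
    flagged = []
    for line in report_text.splitlines():
        attr = parse_smart_line(line)
        if attr is None:
            continue
        if attr["fail"] != "-" or (attr["id"] in WATCHED_IDS and attr["raw"] > 0):
            flagged.append(attr["name"])
    return flagged

# The Disk13 excerpt from this thread.
disk13 = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   058   058   140    NOW  1132
196 Reallocated_Event_Count -O--CK   001   001   000    -    882
197 Current_Pending_Sector  -O--CK   200   001   000    -    106
198 Offline_Uncorrectable   ----CK   200   200   000    -    5
"""

print(flag_problems(disk13))
```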

 

Might be simpler to New Config with only the good 8TB disks, resync parity and parity2, then use Unassigned Devices to try to copy the data from the other disks.

Link to comment

And maybe this should really be the first priority.

1 hour ago, trurl said:

.... Do you have another copy of anything important and irreplaceable? You absolutely must have backups of important and irreplaceable data. Parity, even dual parity, is not a substitute for backups. You don't have to backup everything. You get to decide what qualifies as important and irreplaceable.

 

Link to comment

Archived

This topic is now archived and is closed to further replies.
