
Replacing multiple failed drives faster: Method


xtrips


Hello,

 

One of my 3TB drives has many read errors and has brought my NAS to a crawl.

And my cache disk isn't fast enough, so I bought an SSD.

Now the Parity check is running but needs 330 more days!! to finish. (LOL)

 

So I thought: could I load my NAS without the Parity disk somehow, then physically move the files remaining on the failing disk and the cache to other disks, then shut down and replace both disks, and finally re-attach the Parity disk?

And then what do I do?


Nowhere near enough information here to reliably evaluate your options ... but I'll offer a few thoughts based on what you've said ...

 

=>  If you have a failed disk, then you should NOT be doing a parity check.    This is likely to "correct" parity incorrectly (due to errors on the failed disk) ... and you then won't be able to reliably rebuild the bad disk.

What you SHOULD have done was replaced the failed disk with a new 3TB drive and let the system rebuild the data from the failed disk onto the new one.    IF, however, you did not have good parity to begin with, that wouldn't have worked either.

 

=>  If by chance the parity check you're currently attempting to run is a non-correcting check, you can abort it and, if the parity was previously good, can still do the rebuild I just noted.

 

=>  If you do NOT have good parity, then what you CAN do is attempt to recover the data from the failed disk by attaching it to a PC, installing the free Linux Reader, and seeing if you can read the data on the drive (copying the recovered data to a backup disk).  Or you may be able to use one of the Linux recovery tools, depending on the file system on the disk (e.g. Reiserfsck can do wonders at recovering data from Reiser-formatted disks).

 

=>  As for loading the NAS without parity ... you can certainly do a New Config and assign only the drives you want to, with or without a parity disk.  But remember that if you do this, you instantly lose any possibility of rebuilding the data for the failed drive from the other array disks.    If you don't assign a parity disk, then write operations will be quicker;  but your array won't be fault-tolerant until you stop it and assign a parity disk, and let it do a parity sync.

 

 


I understand the concern, but that's my fault. I didn't explain myself clearly. The disk is not 'failed' yet, it is rather failing. It is the only one giving me many errors, and has been for quite a few months, so I guess it is also the one responsible for the crawling pace of my NAS, but I could be wrong. Anyway it doesn't matter, because if it gives read errors it has to be replaced, right?

 

So, in short, I can access the content of all the disks, separately as of now.

Therefore, is my method doable?

How do I rebuild the Parity afterwards?


The cache disk should really not be part of the discussion, since it has nothing to do with the parity disk. You should be thinking instead about rebuilding the problem disk if that's appropriate, but we need more information to give a firm recommendation.

 

Replacing the cache should be dealt with later after the array is working well. We can talk about that later.

 

You have posted this in v6 support. What version do you have exactly?

 

See v6 help in my sig and post a syslog and smart for the problem disk.


Yes, since the data is readable, you can do what you've asked.

 

First, do a New Config and select ONLY the data disks you want to include in the new array => do NOT assign a parity or cache drive.

 

Next, Start the array and confirm you've got it set the way you want.

 

Then I would Stop the array and assign a parity disk, so your array is fault tolerant.    Then wait for the parity sync to complete.

 

And finally, you can then copy the data from your failing disk to the array (assuming there's enough space on it).
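If you do that final copy from the command line or a script, it helps to skip and log unreadable files rather than let one bad sector abort the whole run. Here is a minimal Python sketch of that idea (the function name and paths are hypothetical, not an unRAID tool):

```python
import os
import shutil

def copy_tree_skip_errors(src_root, dst_root):
    """Copy a directory tree, skipping (and reporting) unreadable files.

    On a failing disk, some files may throw I/O errors mid-read; this
    notes the failure and keeps the copy going instead of aborting.
    """
    failed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.join(dst_root, rel)
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                shutil.copy2(src, os.path.join(dst_dir, name))
            except OSError as exc:
                failed.append((src, exc))  # note it and move on
    return failed
```

At the end you get a list of files that could not be read, so you know exactly what was lost instead of guessing.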

 


The scary thing about a failing disk is that its risk of real failure is much higher. You've not posted a SMART report or any details on what errors you are seeing so we don't know how bad it might be. But it could easily fail while doing a ginormous copy of all of its files to other disks.

 

The question you have to ask is: is losing all of the data on that disk an acceptable risk to you? The chances of loss are small, but probably several orders of magnitude higher than doing a similar operation with a disk showing no signs of failure.

 

If losing its data would leave you crying in your corn flakes, I would not recommend the course of action you laid out.

 

Instead, I would suggest rebuilding the failing disk onto a new disk, and holding the failing disk to the side as a fragile backup.
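If you want to eyeball the numbers that matter in a SMART report, the attribute table printed by `smartctl -A` is easy to parse with a few lines of Python. This is only a sketch; the sample text below is a hand-made excerpt in smartctl's table format, not real output from the poster's drive:

```python
def parse_smart_attributes(smartctl_output):
    """Parse the attribute table from `smartctl -A /dev/sdX` output.

    Returns {attribute_name: raw_value} for each table row.
    """
    attrs = {}
    in_table = False
    for line in smartctl_output.splitlines():
        if line.startswith("ID#"):        # header row marks table start
            in_table = True
            continue
        if in_table:
            parts = line.split()
            # Rows look like: ID NAME FLAG VALUE WORST THRESH TYPE
            # UPDATED WHEN_FAILED RAW_VALUE
            if len(parts) >= 10 and parts[0].isdigit():
                attrs[parts[1]] = int(parts[9])
    return attrs

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   095   095   005    Pre-fail  Always       -       412
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       28
"""

attrs = parse_smart_attributes(SAMPLE)
```

Nonzero raw values for Reallocated_Sector_Ct or Current_Pending_Sector on a data disk are the classic "replace me soon" signals people look for in these threads.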


I thought about that, but for one thing the data is not critically important, and also I didn't trust the parity enough to rebuild the data from it.

So I am running the copy right now and hoping it will go through without any major problems.

BTW, if I am not mistaken the SATA ports on my motherboard should be faster than the ones I added through a PCI daughterboard.

I have a set of smaller but faster HDDs (7200 rpm) and another set of much larger but slower HDDs (5000 rpm).

Which goes on the faster SATA ports?


There are three elements of port speed.

  • One is related to the SATA spec - either SATA 1 (1.5 Gb/sec), SATA 2 (3 Gb/sec), or SATA 3 (6 Gb/sec). See HERE for more details. Most spinning drives can, depending on what they are doing, saturate a SATA 1 port. (But for most operations, you'd not see a difference.) A SATA 2 port is more than fast enough for even the fastest spinners, but can hold back SSDs mildly. And a SATA 3 port is fast enough for SSDs. Both the port and the drive have to support the same spec. So a SATA2 drive plugged into a SATA3 port will still run at SATA2 speeds.
     
     
     
  • The second is bus speed.  PCI cards are old and slow and I really won't go into them. I will also skip PCIX cards. PCIe cards are the most common. PCIe has gone through several spec changes - PCIe 1.1 (2.5 GT/sec), PCIe 2 (5 GT/sec), and PCIe 3 (8 GT/sec). (5 GT/sec is about equal to 4 Gb/sec.) The faster speeds have mostly been for video cards, but they also apply to disk controllers. The motherboard port and the addon card controller must negotiate a spec - the slowest one wins. So plug a PCIe 1.1 controller into a PCIe 3.0 motherboard port, and you are running at PCIe 1.1 speed.
     
     
     
  • The third is bus width or "lanes". When you see PCIe cards rated, often the rating will come with an "x" number after, like PCIe x8. The "x" number tells you how many lanes. The bus speed mentioned above is PER LANE, not per the whole card. So a PCIe 2.0 x1 card is not going to allow as much traffic as a PCIe 1.1 x8 card. It is really the combination of speed and width that is important. And as with the bus speed above, the motherboard port and the addon card must negotiate a number of lanes - fewest wins. So a PCIe 1.1 x8 controller card plugged into a PCIe 2.0 x4 slot will run at PCIe 1.1 x4 speed/width.
     
     

Motherboard ports of modern motherboards are fast and wide. You can normally run all of them at full speed and not limit disk speed.  Addon cards depend on the factors listed above, and their firmware and architecture can also influence speed. It is an oversimplification, but motherboard ports are typically the fastest ports.
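To put rough numbers on those three factors, here is some back-of-the-envelope math. It assumes 8b/10b line encoding (about 80% of the raw line rate is usable payload), which applies to SATA and to PCIe 1.1/2.0 but not to PCIe 3.0:

```python
# Both SATA and PCIe 1.1/2.0 use 8b/10b line encoding, so roughly
# 80% of the raw line rate carries actual data.
ENCODING_8B10B = 0.8

def sata_usable_mb_s(line_rate_gbps):
    """Approx. usable MB/s of a SATA link (8b/10b overhead only)."""
    return line_rate_gbps * ENCODING_8B10B * 1000 / 8

def pcie_usable_mb_s(gt_per_lane, lanes):
    """Approx. usable MB/s of a PCIe 1.1/2.0 link: per-lane rate x lanes."""
    return gt_per_lane * lanes * ENCODING_8B10B * 1000 / 8

# SATA 2 (3 Gb/s) -> ~300 MB/s: plenty for a spinner that tops out
# around 150-200 MB/s, tight for a fast SSD. SATA 3 (6 Gb/s) -> ~600 MB/s.
# A PCIe 1.1 x1 controller (~250 MB/s) can be saturated by a single
# modern drive; the same card negotiating x4 gets ~1000 MB/s.
```

So the practical rule from the numbers: a lone spinner is happy almost anywhere, but several drives behind a narrow, old-spec addon card will share a pipe that one drive can nearly fill.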

 

Probably much more than you asked for - but there you go.


Archived

This topic is now archived and is closed to further replies.
