[REQUEST] Replace drive without losing protection during the process



Currently, when you replace a drive (e.g. to upgrade the drive capacity), your array is unprotected whilst the new drive is rebuilt.

 

Would it be possible to add the ability to replace a drive without the loss of protection during the process?

 

In my mind it could work something like this:

(1) Insert the new (precleared) drive.

(2) Assign the drive to the already occupied slot.

(3) Without unassigning the existing drive, UnRaid 'rebuilds' the drive's data onto the new drive (effectively cloning the existing drive).

(4) Once the 'rebuild' is complete, the old drive can be unassigned from that slot (this could even happen automatically).

 

The key point here is that during the 'rebuild' step, both drives are present and if any disk errors occur then the array stays protected.
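To see why keeping the old drive assigned preserves protection, here is a toy model. It assumes single parity is a byte-wise XOR across the data disks; the disk contents, slot names and the `clone_slot` helper are all hypothetical. Rebuilding a failed disk never touches the new drive, so it works at any point during the copy, and once the copy finishes the clone can take over the slot without changing parity.

```python
from __future__ import annotations

from functools import reduce


def xor_disks(*disks: bytes) -> bytes:
    """Byte-wise XOR of equally sized disks (assumed single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def clone_slot(data: dict[str, bytes], slot: str) -> bytes:
    """Hypothetical 'rebuild onto the new drive' that simply copies the old disk.

    The old disk stays assigned throughout, so parity is never touched.
    """
    old = data[slot]
    new = bytearray(len(old))  # precleared (all-zero) new drive
    for offset, byte in enumerate(old):
        new[offset] = byte  # copy from the still-assigned old disk
    return bytes(new)


data = {
    "disk1": b"\x11\x22\x33\x44",
    "disk2": b"\xa0\xb0\xc0\xd0",  # the slot being upgraded
    "disk3": b"\x0f\x0e\x0d\x0c",
}
parity = xor_disks(*data.values())

new_disk = clone_slot(data, "disk2")

# Suppose disk3 fails while the copy is running. Because disk2 is still
# assigned, disk3 can be rebuilt from parity and the surviving disks alone.
rebuilt_disk3 = xor_disks(parity, data["disk1"], data["disk2"])
assert rebuilt_disk3 == data["disk3"]

# After the copy, unassign the old drive and put the clone in its slot:
# the contents are identical, so the existing parity is still valid.
data["disk2"] = new_disk
assert xor_disks(*data.values()) == parity
```

With today's approach the old drive is gone during the rebuild, so a second failure leaves two unknowns and a single XOR parity cannot solve for both.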

I know that you can 'simply' put the old drive back in if there's a problem, but I think this would make the process less risky.

 


I think that's a great idea.  No idea if it's something they can implement or not.  But this gets my vote too as a requested feature.

 

I'm thinking protection could also be achieved during this process by adding dual parity.  But there are a lot of other threads discussing that, and it's a feature that I don't even think is on Limetech's radar right now.  It's a shame really.  This would accomplish what you are asking for, plus it would have other positive advantages as well. 
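For anyone curious how dual parity could cover a second failure during a rebuild, here is a generic RAID-6-style sketch: P is plain XOR and Q is a Reed-Solomon syndrome over GF(2^8) with generator 2 and reduction polynomial 0x11d. This textbook construction is assumed here purely for illustration; it is not necessarily how LimeTech would implement it, and all function names are hypothetical.

```python
from __future__ import annotations


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    """a raised to the n-th power in GF(2^8)."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Inverse of a non-zero element: a^255 == 1, so a^254 == a^-1."""
    return gf_pow(a, 254)


def compute_pq(disks: list[bytes]) -> tuple[bytes, bytes]:
    """P = XOR of all disks; Q = XOR of 2^i * disk_i (GF(2^8) arithmetic)."""
    size = len(disks[0])
    p, q = bytearray(size), bytearray(size)
    for i, disk in enumerate(disks):
        coef = gf_pow(2, i)
        for j, byte in enumerate(disk):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


def recover_two(disks: list[bytes | None], p: bytes, q: bytes) -> list[bytes]:
    """Rebuild exactly two missing data disks (given as None) from P, Q and the rest."""
    x, y = [i for i, d in enumerate(disks) if d is None]
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        pxy, qxy = p[j], q[j]
        for i, d in enumerate(disks):
            if d is not None:
                pxy ^= d[j]                        # leaves Dx ^ Dy
                qxy ^= gf_mul(gf_pow(2, i), d[j])  # leaves 2^x*Dx ^ 2^y*Dy
        # (2^x*Dx ^ 2^y*Dy) ^ 2^y*(Dx ^ Dy) == (2^x ^ 2^y)*Dx
        dx[j] = gf_mul(qxy ^ gf_mul(gy, pxy), inv)
        dy[j] = pxy ^ dx[j]
    rebuilt = list(disks)
    rebuilt[x], rebuilt[y] = bytes(dx), bytes(dy)
    return rebuilt


data = [b"\x11\x22", b"\xa0\xb0", b"\x0f\x0e", b"\x55\x66"]
p, q = compute_pq(data)

# Disk 1 is being replaced (its contents unavailable) when disk 3 also fails.
damaged = [data[0], None, data[2], None]
assert recover_two(damaged, p, q) == data
```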


Good point - dual parity achieves this as well.  Since LimeTech have said dual parity is on their roadmap (albeit with no ETA), maybe that is the better way to achieve the desired result.

There are a variety of ways to do this, but all have some risk.  Clearly the best solution is dual fault-tolerance, which will hopefully be implemented "one of these days"  :)

 

Probably the safest way is to do this:

 

(1)  Add the new, larger, pre-cleared drive to the array;

 

(2)  Copy the entire contents of the old drive to the new drive (drive -> drive copy; NOT using user shares);

 

(3)  Write zeroes to the entire old drive (once this is done, it can be removed with no impact on parity).  This needs to be done with dd to the md device, so that parity is maintained during the writes.  The process for this is outlined in several threads here.

 

and then

 

(4)  Do a New Config with the "Trust Parity" option, leaving out the old drive  [The new drive can be assigned to the old slot if desired].

 

Parity remains valid through this entire process, so the array is always protected.
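The steps above can be checked against a toy single-parity model, assuming parity is a byte-wise XOR and that every write through the md device does a read-modify-write parity update (new parity = old parity XOR old data XOR new data). The `ToyArray` class and disk names are hypothetical, all disks are the same size for simplicity, and the copy in step (2) is modeled as raw byte writes rather than a file copy. The key fact is that an all-zero disk contributes nothing to XOR parity, so adding a precleared drive and dropping a zeroed one both leave parity valid.

```python
from __future__ import annotations

from functools import reduce


def xor_disks(*disks: bytes) -> bytes:
    """Byte-wise XOR of equally sized disks (assumed single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


class ToyArray:
    """Hypothetical single-parity array; every write goes 'through the md device'."""

    def __init__(self, disks: dict[str, bytes]):
        self.disks = {name: bytearray(d) for name, d in disks.items()}
        self.parity = bytearray(xor_disks(*self.disks.values()))

    def add_precleared(self, name: str, size: int) -> None:
        """An all-zero disk adds nothing to XOR parity, so parity is left as is."""
        self.disks[name] = bytearray(size)

    def write(self, name: str, offset: int, value: int) -> None:
        """Read-modify-write: new parity = old parity ^ old data ^ new data."""
        old = self.disks[name][offset]
        self.parity[offset] ^= old ^ value
        self.disks[name][offset] = value

    def new_config_without(self, name: str) -> None:
        """Like New Config + 'Trust Parity': drop a disk, keep parity unchanged."""
        del self.disks[name]

    def parity_valid(self) -> bool:
        return bytes(self.parity) == xor_disks(*self.disks.values())


array = ToyArray({"disk1": b"\x12\x34\x56", "disk2": b"\xab\xcd\xef"})
array.add_precleared("disk3", 3)                  # (1) add the precleared drive
assert array.parity_valid()
for offset, byte in enumerate(bytes(array.disks["disk2"])):
    array.write("disk3", offset, byte)            # (2) drive-to-drive copy
assert array.parity_valid()
for offset in range(3):
    array.write("disk2", offset, 0)               # (3) zero the old drive via md
assert array.parity_valid()
array.new_config_without("disk2")                 # (4) New Config, trust parity
assert array.parity_valid()
```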

 

 

FYI: I got a kernel panic when I did step (3), zeroing the old drive with dd, while converting from ReiserFS to XFS on b14b.  According to the panic dump, it looked like a problem in the ReiserFS driver.  So that's yet another reason for me to switch to XFS when I upgrade my servers to b14b or later.

 

(Edited: see below for the reason.)


I would have thought that this step must be done with the file system on the disk unmounted (i.e. in Maintenance mode).  Otherwise you will get exactly this sort of problem: the file system driver keeps accessing the mounted file system on the disk, which you have just "corrupted" by writing zeros over it.
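As a guard against exactly this mistake, a hypothetical pre-flight check could refuse to zero a device that still appears in /proc/mounts (the Linux list of mounted file systems, one `device mountpoint fstype options dump pass` entry per line). The device and mount-point names below are only examples; treat how unRAID names its md devices as an assumption and check your own system.

```python
from __future__ import annotations

from pathlib import Path


def mounted_devices(mounts_text: str) -> dict[str, str]:
    """Map each mounted device to its mount point (fields 1 and 2 of /proc/mounts)."""
    mounts: dict[str, str] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            mounts[fields[0]] = fields[1]
    return mounts


def safe_to_zero(device: str, mounts_text: str | None = None) -> bool:
    """Return False if `device` still has a mounted file system."""
    if mounts_text is None:
        mounts_text = Path("/proc/mounts").read_text()
    return device not in mounted_devices(mounts_text)


# Example /proc/mounts content (hypothetical device and mount names).
sample = (
    "/dev/md2 /mnt/disk2 reiserfs rw,noatime 0 0\n"
    "proc /proc proc rw,nosuid,nodev,noexec 0 0\n"
)
print(safe_to_zero("/dev/md2", sample))  # False: still mounted, do not zero yet
print(safe_to_zero("/dev/md3", sample))  # True: not mounted
```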


And I'm sure you are correct.  I don't remember now whether I put it into maintenance mode or not, so ignore my post.

Why not go one step further and have the ability to assign a drive as a hot spare for automatic failover, provided the spare is at least the size of the failed drive?

 

The question discussed in this thread isn't about replacing a failed drive -- it's about replacing a good drive with a new, larger drive.

 


There is probably a pretty easy way to accomplish this: dd your old drive to a precleared new drive (outside of the array), do a new config that includes the new drive and excludes the old one, and trust parity.
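The copy step is normally done with dd. As a rough Python equivalent, the sketch below copies a source device or image onto a target in fixed-size chunks, then hashes the first N bytes of the target to confirm the clone matches. Opening the target with "r+b" writes in place without truncating, so a larger target keeps its size and anything beyond the copied region is left as it was (zeros, if it was precleared). The image files stand in for real devices; `clone` and `sha256_prefix` are hypothetical helpers, and device names should be double-checked before pointing anything like this at real disks.

```python
from __future__ import annotations

import hashlib
import os
import tempfile

CHUNK = 1 << 20  # 1 MiB per read, comparable to dd's bs= setting


def clone(src_path: str, dst_path: str, chunk: int = CHUNK) -> str:
    """Copy src onto the start of dst chunk by chunk; return SHA-256 of the copied data."""
    digest = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        while block := src.read(chunk):
            dst.write(block)
            digest.update(block)
        dst.flush()
        os.fsync(dst.fileno())  # make sure the data has reached the disk
    return digest.hexdigest()


def sha256_prefix(path: str, length: int, chunk: int = CHUNK) -> str:
    """SHA-256 of the first `length` bytes of path (the target may be larger)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        remaining = length
        while remaining > 0:
            block = f.read(min(chunk, remaining))
            if not block:
                break
            digest.update(block)
            remaining -= len(block)
    return digest.hexdigest()


# Demo with image files standing in for the old and new (larger, precleared) drives.
with tempfile.TemporaryDirectory() as workdir:
    old_img = os.path.join(workdir, "old.img")
    new_img = os.path.join(workdir, "new.img")
    with open(old_img, "wb") as f:
        f.write(os.urandom(3000))
    with open(new_img, "wb") as f:
        f.write(bytes(5000))  # all zeros, like a precleared drive

    src_hash = clone(old_img, new_img, chunk=1024)
    dst_hash = sha256_prefix(new_img, os.path.getsize(old_img))
    new_size = os.path.getsize(new_img)
    with open(new_img, "rb") as f:
        f.seek(3000)
        tail_is_zero = f.read() == bytes(2000)

print(src_hash == dst_hash, new_size, tail_is_zero)  # True 5000 True
```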

 

A seldom-remembered alternative way to remove a drive from the array without zeroing it is to add a new disk to the array and clone the old disk to the new disk with dd. You can then do a new config, omit both disks, and trust parity. Two bit-for-bit identical disks perfectly cancel each other out from a parity perspective.
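The cancellation argument is easy to check with a toy model, assuming single parity is a byte-wise XOR across the data disks and that the clone is written through the array, so parity is updated as it goes. Since x XOR x = 0, the old disk and its bit-for-bit clone contribute nothing to parity together, so dropping both leaves parity valid. Disk names and sizes are hypothetical.

```python
from __future__ import annotations

import os
from functools import reduce


def xor_disks(*disks: bytes) -> bytes:
    """Byte-wise XOR of equally sized disks (assumed single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


# Three data disks with random contents plus their parity.
disks = {name: os.urandom(8) for name in ("disk1", "disk2", "disk3")}
parity = xor_disks(*disks.values())

# Add a precleared (all-zero) disk4 and dd disk2 onto it through the array.
# Each write updates parity: new parity = parity ^ old byte (0) ^ new byte.
disks["disk4"] = disks["disk2"]
parity = xor_disks(parity, bytes(8), disks["disk4"])
assert parity == xor_disks(*disks.values())  # still valid with the clone present

# New Config omitting disk2 and disk4 with "Trust Parity": since
# disk2 ^ disk4 == 0, the parity we hold already matches the remaining disks.
remaining = [data for name, data in disks.items() if name not in ("disk2", "disk4")]
assert xor_disks(*remaining) == parity
```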


Well-timed post.

My big problem of late has been that, while I'm upgrading the parity drive, one of the other drives fails. This means I now have a dual failure and data loss. It's happened to me twice recently. I'm pining for some form of dual parity or redundancy to protect parity while upgrading.
