EMPTY and Remove a Drive Without Losing Parity


NAS


Is there any chance we can discuss something without someone trying to end it by saying something else is more important? :)

 

We all know there are other things to do; there always are. For instance, I personally think a framework for notifications is more important, but that doesn't stop me wanting or discussing this.

 

Anyway, back on topic: I think it is perfectly acceptable that, at least as a starting point, a requirement for removing a drive without losing parity would be for it to be either blank or, in the case where someone simply wants to upgrade the disk size, duplicated and then removed. Both of these, conceptually at least, can be done with zero protection loss.

 

 

Link to comment

Is there any chance we can discuss something without someone trying to end it by saying something else is more important? :)

It's all good, I appreciate the discussion.

 

We all know there are other things to do; there always are. For instance, I personally think a framework for notifications is more important, but that doesn't stop me wanting or discussing this.

Please start another thread for this (or point me where it already is).

 

Anyway, back on topic: I think it is perfectly acceptable that, at least as a starting point, a requirement for removing a drive without losing parity would be for it to be either blank or, in the case where someone simply wants to upgrade the disk size, duplicated and then removed. Both of these, conceptually at least, can be done with zero protection loss.

This is also a different function (upgrading a disk while preserving parity); maybe start another thread for it too?

Link to comment

@Tom, I will make a note to resurrect the other threads when this one starts petering out.

 

@lionelhutz

 

As long as the read-only (RO) drive doesn't fail before it is removed, your protection is kept in place. This is the least attractive option. Where it gets inelegant is if a drive fails mid-way through this removal operation. I would suggest, given the current direction of this thread, that this now be considered a separate use case.

 

i.e. The thread should be called "Remove a BLANK Drive Without Losing Parity"

Link to comment

I didn't start it; I just moved the old one and rekindled it.

Oh. I figured the author of the first post in the thread could always change the title. Learn something new every day.  :)

 

Yeah, you're right. It seems I somehow messed up moving the thread as well. Not my week, it seems. :)

 

Link to comment

i.e. The thread should be called "Remove a BLANK Drive Without Losing Parity"

 

Blank and zero'd are two very different things. You could delete everything off the drive, but it would still need to be zero'd. It doesn't matter much; checking for no files is just one of the safety checks that could be put in place.

 

Simply making the drive read-only and then virtually zero'ing it out is no different than just using initconfig and building parity. I'm not sure how this was ever considered an option.

 

You could argue that it's possible to keep track, block by block, as the drive is virtually zero'd out. But you can't keep the tracked position perfectly in sync with the actual writes to the hard drive, so a power loss or drive failure part way through could cause a loss of data right at the point that was being zero'd when the failure occurred. The rebuild routine would then have to use zeros up to the point where the virtual zero'ing stopped, and switch to using the drive data from there on.
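
To make that failure window concrete, here is a hypothetical Python sketch of the kind of checkpointed tracking being described. The checkpoint path, block size, and function names are my own illustration, not anything unRAID actually implements.

```python
# Hypothetical sketch of tracking "virtual zeroing" progress, as discussed
# above.  The checkpoint path, block size, and helpers are made up for
# illustration; this is not how unRAID's md driver actually works.
import json

BLOCK = 4096
CHECKPOINT = "/boot/zeroing-progress.json"   # hypothetical location

def save_progress(offset):
    # This must reach non-volatile storage *before* parity treats the block
    # at `offset` as zero; otherwise a crash leaves the two out of sync,
    # which is exactly the failure window described in the post.
    with open(CHECKPOINT, "w") as f:
        json.dump({"zeroed_up_to": offset}, f)

def rebuild_block(disk, offset):
    """What the rebuild routine would assume this block contains."""
    with open(CHECKPOINT) as f:
        zeroed_up_to = json.load(f)["zeroed_up_to"]
    if offset < zeroed_up_to:
        return bytes(BLOCK)       # already virtually zero'd: use zeros
    disk.seek(offset)
    return disk.read(BLOCK)       # not reached yet: use the drive's data
```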

 

I would suspect part of the reason for not implementing something like this is the long wait for the drive to be zero'd. You also have to unmount the drive and reset the shares before starting because the drive and data will not be accessible.

 

Link to comment

i.e. The thread should be called "Remove a BLANK Drive Without Losing Parity"

 

Blank and zero'd are two very different things. You could delete everything off the drive, but it would still need to be zero'd. It doesn't matter much; checking for no files is just one of the safety checks that could be put in place.

Right, but I know what is meant  ;D

 

Simply making the drive read-only and then virtually zero'ing it out is no different than just using initconfig and building parity. I'm not sure how this was ever considered an option.

A "parity sync" is a whole lot faster than writing a single drive that's part of the array with all-zeros.  But the way I'm going to implement this is by making "remove drive from array" a variation of parity-sync.  What it will do is run like a normal parity sync/check with this variation: as it marches through the array, stripe-by-stripe (a "stripe" is the same 4K block on all disks), it will read all the data disks except the one(s) being removed, and then write parity with computed new parity, and write marked-for-removal disks with zero.  At end of operation, it will finalize the array config (removing the target disk(s) from the array).  If the operation fails at any point, or is canceled, parity is still valid and a failing disk can still be reconstructed.

 

You could argue that it's possible to keep track, block by block, as the drive is virtually zero'd out. But you can't keep the tracked position perfectly in sync with the actual writes to the hard drive, so a power loss or drive failure part way through could cause a loss of data right at the point that was being zero'd when the failure occurred. The rebuild routine would then have to use zeros up to the point where the virtual zero'ing stopped, and switch to using the drive data from there on.

I would suspect part of the reason for not implementing something like this is the long wait for the drive to be zero'd. You also have to unmount the drive and reset the shares before starting because the drive and data will not be accessible.

Often thought about doing this, but it would require lots of writes to non-volatile storage, like the flash.  It's not a good idea to write the flash that many times, so it should use the cache drive if present -- but man, that's a feature well down the to-do list  :P

 

Link to comment

But the way I'm going to implement this is by making "remove drive from array" a variation of parity-sync.  What it will do is run like a normal parity sync/check with this variation: as it marches through the array, stripe by stripe (a "stripe" is the same 4K block on all disks), it will read all the data disks except the one(s) being removed, write the parity disk with the newly computed parity, and write zeros to the marked-for-removal disk(s).  At the end of the operation, it will finalize the array config (removing the target disk(s) from the array).  If the operation fails at any point, or is canceled, parity is still valid and a failing disk can still be reconstructed.

 

Perfect.  Quicker than zeroing the drive first, with what amounts to the same outcome.    On any UPS-protected server (which all should be), that will make removing a drive a safe, simple operation that never results in any "at risk" time  :)

 

In fact -- although I'm not sure you want to complicate things to allow this -- this would allow removal of as many disks as you want in the same pass !!

 

Link to comment

I'm going to implement this by making "remove drive from array" a variation of parity-sync.  What it will do is run like a normal parity sync/check with this variation: as it marches through the array, stripe by stripe (a "stripe" is the same 4K block on all disks), it will read all the data disks except the one(s) being removed ...

I think you should not do any special work to implement this specifically for the "remove disk" feature -- it should come naturally from your "write mode" tuneable.  If the tuneable is flipped to reconstruct mode, then yes, the zeroing will be done by reading from all disks.  But if the tuneable is set to the current "normal" mode, then all the other data disks can be spun down nicely during the whole process.  Personally, I'd very much prefer the latter.
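
For anyone unfamiliar with the two write modes being referred to, the difference comes down to which disks must be read to update parity. A rough sketch of the two parity formulas follows; the function names and single-block framing are my own illustration, not unRAID's API.

```python
# Rough illustration of the two parity-update modes referred to above.
# Function names and the single-block framing are my own, not unRAID's API.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def parity_read_modify_write(old_parity, old_data, new_data):
    """'Normal' mode: read only the target disk and parity; all the other
    data disks can stay spun down."""
    return xor(xor(old_parity, old_data), new_data)

def parity_reconstruct_write(other_disks_data, new_data):
    """'Reconstruct' (turbo) mode: read every other data disk instead of the
    old parity; faster writes, but every disk has to spin."""
    parity = new_data
    for block in other_disks_data:
        parity = xor(parity, block)
    return parity
```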

 

Also, please think about a "Trust my disk is zeroed" option (if you must, with a "yes I know what I'm doing" checkbox :)).  That way one can easily remove several disks at the same time.

 

 

Link to comment
  • 4 weeks later...

Hi, I'm not sure this was worth its own thread, and I thought this was the most relevant one.

 

I'm currently decommissioning my server to move to a new setup (new drives and all). I'm at the last two drives in my current setup (1 Data, 1 Parity). Once I've removed all data from disk1, is all of the data from the server removed? So at this point, I can just remove the parity drive without worrying about losing anything?

Link to comment
  • 3 months later...

A "parity sync" is a whole lot faster than writing a single drive that's part of the array with all-zeros.  But the way I'm going to implement this is by making "remove drive from array" a variation of parity-sync.  What it will do is run like a normal parity sync/check with this variation: as it marches through the array, stripe-by-stripe (a "stripe" is the same 4K block on all disks), it will read all the data disks except the one(s) being removed, and then write parity with computed new parity, and write marked-for-removal disks with zero.  At end of operation, it will finalize the array config (removing the target disk(s) from the array).  If the operation fails at any point, or is canceled, parity is still valid and a failing disk can still be reconstructed.

 

Bump. Since this thread slowed down, I have upgraded several disks (at least 5), and every time it irks me that the current way to do this is so risky.

 

Is this planned for the v6 cycle or later?

Link to comment

A "parity sync" is a whole lot faster than writing a single drive that's part of the array with all-zeros.  But the way I'm going to implement this is by making "remove drive from array" a variation of parity-sync.  What it will do is run like a normal parity sync/check with this variation: as it marches through the array, stripe-by-stripe (a "stripe" is the same 4K block on all disks), it will read all the data disks except the one(s) being removed, and then write parity with computed new parity, and write marked-for-removal disks with zero.  At end of operation, it will finalize the array config (removing the target disk(s) from the array).  If the operation fails at any point, or is canceled, parity is still valid and a failing disk can still be reconstructed.

 

Bump. Since this thread slowed down, I have upgraded several disks (at least 5), and every time it irks me that the current way to do this is so risky.

 

Is this planned for the v6 cycle or later?

 

I am confused by Tom's post. Although a parity check may be faster than zeroing a disk, a parity build is not. Also, using the "fill with zeroes" method of drive removal, followed by "set invalidslot 99" (which evidently is working again), is a safe operation. If you were to kill the command writing the zeroes, parity would still be in sync and you'd be able to do a drive reconstruction. And since you are writing to one disk, only that disk and parity are being updated; the rest are completely idle and could even be spun down.

 

While I understand what you are saying, NAS, about doing a drive upgrade -- which leaves your array unprotected if you use the array at all while the reconstruction occurs -- I don't understand how the referenced post makes that problem any easier.

 

Actually, the safe way to replace a disk would be to zero the new disk, then do a sector-by-sector clone of the old disk to the new disk while the new disk is unmounted and outside of the array (leaving the bytes above the capacity of the old drive zero). Then stop the array, take a screenshot of the configuration, do a "New Config," and reassign all of the drives to the new array, exchanging the disk you want to replace with the disk you just created. Run "set invalidslot 99", and then start the array. The array should start and parity should be perfect. My only question would be how to get the OS to expand the file system on the disk so it provides its full capacity. There must be a command to do this, as it is what Tom does behind the scenes once a rebuild is done.
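
A minimal Python sketch of the clone step in that procedure, with placeholder device paths (the real-world equivalent would be a dd sector-by-sector copy from the command line); it is only meant to illustrate why the zero-filled tail keeps parity valid:

```python
# Conceptual sketch of the clone step described above.  Device paths are
# placeholders; the real-world equivalent is a dd sector-by-sector copy.
CHUNK = 1 << 20  # copy 1 MiB at a time

def clone_onto_larger_disk(old_path, new_path, old_size):
    """Copy the old disk byte-for-byte onto a pre-zeroed larger disk.
    Everything beyond old_size on the new disk stays zero, which is what
    keeps the existing parity valid after the swap (as argued above)."""
    with open(old_path, "rb") as src, open(new_path, "r+b") as dst:
        copied = 0
        while copied < old_size:
            chunk = src.read(min(CHUNK, old_size - copied))
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
```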

 

Disclaimer - if someone actually wants to follow this procedure, post back and get more detailed instructions. Although I believe the general approach to be sound, you do so at your own risk.

Link to comment

If you read what Tom is saying, he means that each block is read and the parity is created from the disks that will be remaining. Concurrently with each block being read to build parity, the disk(s) being removed have the same block zeroed. So, since it's doing both operations at the same time, the disk being removed will be zero in exactly the spots needed to keep parity valid with it still in place if something bad happens during the process. This method of re-creating parity would be faster than the read/write cycle of zeroing a single drive, where that drive and parity are the only two disks involved.

 

However, now that there is the write method (which I can't remember the name of) that uses all disks when writing to a single drive, which greatly speeds up writes, I don't see the point of adding this extra stuff just to zero a drive. Just set the array to use that write method and the zero writes will use all drives and be much faster.  I don't recall reading of anyone testing this method to clear a drive, but it should run at 80 to 100 MB/s, or maybe even faster on new hardware. When that new parity update method appeared, I actually suspected Tom added it to the md driver to support the future addition of quickly clearing and removing a drive in emhttp.

 

As for disk replacement, zeroing a whole drive is unnecessary; there is a fully supported, safer method. Clone the new drive from the one being replaced, expand it, and write zeroes to the expanded part; then you have a new drive that can replace the current one while keeping parity correct.

 

Link to comment

Actually, the safe way to replace a disk would be to zero the new disk, then do a sector-by-sector clone of the old disk to the new disk while the new disk is unmounted and outside of the array ...

 

r.e. replacing a disk ==>  Replacing a disk isn't what this thread is about.  That is, in fact, very simple -- UnRAID is already designed to do just that, by automatically restoring the data from the old disk to the new and expanding the file system on the new disk.

 

The discussion here is about REMOVING a disk -- i.e. reducing the drive count in your array.  One way to do this, as already outlined, is to write all zeroes to the disk you want to remove; then do a New Config without that disk but with the "Trust Parity" option ... since you know that parity is correct if the removed disk was in fact all zeroes.
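
For context, a minimal sketch of that zeroing step (conceptually what the dd command referred to in this thread does); the device path is a placeholder, and on a real server the writes would need to go through the parity-protected md device so parity is updated along the way:

```python
# Conceptual sketch of zeroing a disk through the array, as described above.
# The md device path is a placeholder; the forum method uses dd for this.
CHUNK = 1 << 20  # write 1 MiB of zeros at a time

def zero_through_array(md_path, disk_size):
    """Write zeros over the whole disk via the parity-protected device, so
    parity is kept up to date block by block.  Once the disk is all zeros it
    contributes nothing to parity and can be dropped with 'Trust Parity'."""
    zeros = bytes(CHUNK)
    with open(md_path, "r+b") as dev:
        written = 0
        while written < disk_size:
            n = min(CHUNK, disk_size - written)
            dev.write(zeros[:n])
            written += n
```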

 

The process Tom outlined, and suggested he planned to implement, would work just fine.  I'm not sure it's worth the bother, however => reducing the drive count doesn't seem like a very common task; and it's simple enough to do by just doing a New Config [After, of course, doing a parity check to confirm everything is okay in the array before running at risk for the duration of the new parity sync.]    It's true, however, that Tom's method -- or any of several others outlined in this thread -- would eliminate that "at risk" time.

 

Link to comment

I'm not sure it's worth the bother, however => reducing the drive count doesn't seem like a very common task;

 

I would disagree. I would suggest that in time it is, or will be, the exact opposite of this:

 

#unRAID array license drive limit reached

#physical case drive limit reached

#drives churn i.e. that 500GB drive can be replaced by a 4TB drive

 

People do this now by breaking the array parity. This is fundamentally wrong.

 

and it's simple enough to do by just doing a New Config

 

I would suggest that whilst it is conceptually simple, it is not elegant, and it is fundamentally risky, requiring users to use dd on the command line for what should be a simple task. This is why, in the real world, it is not common for people to follow the optimal (no parity loss) approach; they fall back to the GUI method instead (again, this is fundamentally wrong).

 

You can add a new drive to unRAID without dropping parity; it seems like an omission to not be able to do the reverse.

Link to comment

#unRAID array license drive limit reached

#physical case drive limit reached

#drives churn i.e. that 500GB drive can be replaced by a 4TB drive

 

None of these strike me as reasons you'd want to reduce the # of drives.  To wit:

 

(a) UnRAID license limit reached ==> So?  Why would that cause you to reduce the # of drives in the array?

 

(b)  physical case drive limit reached ==>  So you can't add any more drives ... but, again, why would that induce you to reduce the # of drives?

 

(c)  drive churn -- replacing a 500GB drive with a 4TB drive => This can already be done natively within UnRAID, and it does NOT result in fewer drives.

 

 

 

You can add a new drive to unRAID without dropping parity; it seems like an omission to not be able to do the reverse.

 

I agree it'd be nice to be able to do it ... and Tom has indicated he plans to add that capability at some point -- I just don't think it's a particularly high priority feature.

 

 

 

Link to comment

Why would that cause you to reduce the # of drives in the array?

See (c). End result: you can expand the capacity of your array without parity downtime.

 

(b)  physical case drive limit reached ==>  So you can't add any more drives ... but, again, why would that induce you to reduce the # of drives?

See (c). End result: you can expand the capacity of your array without parity downtime.

 

(c)  drive churn -- replacing a 500GB drive with a 4TB drive => This can already be done natively within UnRAID, and it does NOT result in fewer drives.

Not without failing the array, which is a completely unacceptable solution if you really think about it.

 

I just don't think it's a particularly high priority feature.

I think it is because it is a missing fundamental.

Link to comment

So you don't really want to reduce the # of drives -- you want a way to replace a drive with a larger drive without the need for the rebuilding process.    I gather your concept is to (a) copy all of the data off the drive to be replaced;  (b) remove it from the array;  (c) add a pre-cleared larger drive in its place;  and then (d) copy all the data back to the drive.

 

I agree that to do that, you need the ability to remove a drive from the array in cases where you don't have space or SATA ports to add the new, larger drive first.

 

But it sure seems like the current process is a lot easier way to do that  :)

 

 

Link to comment

Easier yes but fundamentally wrong :)

 

You should NEVER, NEVER, NEVER have to run without parity. Not only does the current way require you to fail parity, it also really requires you to check parity first (many hours of spinning all disks); then, to add insult to injury, you need to run for many hours in a failed-parity state with all disks spinning again.

 

That's just wrong, especially when there is an elegant way to do it without any of that nonsense, and all that stops us is some code (and likely not much of it).

 

So I have taken up the torch to address this basic shortcoming amongst a sea of very cool but essentially shiny features.

Link to comment

... it also really requires you to check parity first ...

 

Is that a bad thing?    Don't you always want to KNOW you have good parity?  There's no "requirement" to check parity first -- but it's certainly a good idea to do so, so you KNOW all is well before you do something that depends on it !!

 

 

... You should NEVER, NEVER, NEVER have to run without parity. ...

... you need to run for many hours in a failed-parity state ...

 

When you do a drive upgrade you are NOT running without parity, nor is parity in a "failed state".  It IS true that if you do ANY writes to the disk you're upgrading during this time, you can't recover if another disk fails;  but as long as you DO NOT WRITE to the disk you're upgrading, you could, if necessary, restore the system to its former state (by replacing the original disk and doing a New Config with the "Trust Parity" option) ... and then recover from some other disk failure.    Note that this is a far better option than you have during the process of reconstructing a disk during an actual disk failure.

 

It's true, of course, that there are no warnings about not writing to the disk you're upgrading, nor any simple-to-follow instructions for recovering if a different disk fails during the process -- but help for this is readily available on the forum, and it's a fairly unlikely event anyway.

 

The REAL "solution" to the issue you're concerned about here -- which would also resolve the "running at risk" issue you have when reconstructing a failed disk -- is dual parity  :)    I'd MUCH rather see Tom's energies directed at implementing that than at a process of removing disks from the array -- which can already be trivially done by just doing a New Config (granted, with some "at risk" time during the subsequent parity sync).

 

 

It's interesting, by the way, that you admonish that you should "...  NEVER  NEVER NEVER have to run without parity ..."  while at the same time complaining about a process where you should "... check parity first."    I fully agree that you should NEVER run without parity -- but you should also CHECK IT with some regularity, and certainly before you do anything that depends on it if you can (clearly it's too late to check it if a disk fails and you have to simply do a reconstruction on a replacement).

 

Link to comment
