EMPTY and Remove a Drive Without Losing Parity



Yes! Replacing a drive either with a bigger one or because it is getting a lot of errors/is failing is a huge pain right now and involves the command line.  

 

Synology makes this easy, you just select a drive, tell it you want to remove it and it copies all data off to other drives, updates parity and then removes the drive from the array automatically.  You just have to wait (a very long time of course) for it to mark the drive ok to remove.  

 

I would love to have an easy drive removal feature like this that minimizes array downtime as well.  Having to zero a drive with the array stopped in order to remove the drive is kind of an insane amount of downtime.

Link to comment
3 hours ago, yippy3000 said:

Replacing a drive either with a bigger one or because it is getting a lot of errors/is failing is a huge pain right now and involves the command line.  

 

Actually that's not true.   You can simply replace the drive and UnRAID will rebuild its contents on the new drive.   It's true that if you have a single parity system you are "running at risk" during that rebuild -- but if you're running dual parity you're still protected from another drive failure during the process.

 

Link to comment

I think Yippy's point is still valid though; the array shouldn't have to be put in a degraded (or partially so, with dual parity) state, simply to replace a drive. 

 

I think a "mirror commands going to parity to new drive as well, until new drive matches parity, then swap, and mark now extra parity as unneeded" sounds like the logical solution here. I'd imagine something similar already happens when adding a second parity drive, so surely some of the work is already done.

Link to comment
16 minutes ago, -Daedalus said:

I think Yippy's point is still valid though; the array shouldn't have to be put in a degraded (or partially so, with dual parity) state, simply to replace a drive.

What RAID system doesn't run in a degraded state while it's rebuilding a drive?

 

The main concern of the OP was in actually removing a drive without running in a degraded state, which with unRAID does involve zeroing out the drive while it's still installed, followed by removal.

4 hours ago, yippy3000 said:

Yes! Replacing a drive either with a bigger one or because it is getting a lot of errors/is failing is a huge pain right now and involves the command line.

I've been running unRaid since 2012 (started with 5.0 beta 8), and for the life of me, I wouldn't know how to do this via the command line.  Replacing / upgrading a drive is simple, and only involves the webGUI.

Link to comment
11 minutes ago, Squid said:

 

I've been running unRaid since 2012 (started with 5.0 beta 8), and for the life of me, I wouldn't know how to do this via the command line.  Replacing / upgrading a drive is simple, and only involves the webGUI.

 

I think he was more referring to emptying a drive for removal, and assumed the drive was still working such that you could move the files off, then zero out the drive, and never lose parity protection.

Edited by jbuszkie
Link to comment

Yes, syrys's issue was removing a drive -- not replacing it.    And clearly he didn't understand that the only way to do this without losing parity protection was to completely zero the drive (so removing it won't impact parity); and then just do a New Config without the drive and marking "parity is already valid" => or at least he didn't realize that zeroing the drive would destroy all of its content.
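For anyone who wants to see what that looks like, here is a minimal command-line sketch of the zeroing step, assuming the drive to be removed is in slot 5 (disk5) -- the slot number is a placeholder you must verify against your own array, since this erases the target disk:

```
# Zero disk5 THROUGH the parity-protected md device, so parity is
# updated in step with every zero written (/dev/md5 = the disk5 slot).
# WARNING: destroys all data on disk5 -- move everything off it first.
dd if=/dev/zero of=/dev/md5 bs=1M status=progress

# When dd finishes: stop the array, run Tools -> New Config, assign
# every drive EXCEPT disk5, and tick "Parity is already valid".
```

Writing through /dev/md5 rather than the raw /dev/sdX device is what keeps parity in sync as the zeros land.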

 

But the note r.e. "... Replacing a drive either with a bigger one or because it is getting a lot of errors/is failing is a huge pain right now and involves the command line." is simply not true => it's a VERY simple process and does NOT involve any use of the command line.

 

Link to comment
5 hours ago, yippy3000 said:

Replacing a drive either with a bigger one or because it is getting a lot of errors/is failing is a huge pain right now and involves the command line

 

I agree with Gary that the above statement is ABSOLUTELY NOT true. Replacing a failed disk or upgrading to a bigger disk is a simple matter of assigning the new disk in the GUI and unRAID will automatically rebuild and expand if needed.

 

Edited by bonienl
Link to comment

If you just straight up replace the disk and have Unraid rebuild, aren't you losing parity protection during that operation as it is the same as a drive failure?  I don't consider that an acceptable option.  I want to be able to remove a drive completely and not have to do the zeroing and new config dance.  Then, only after it is removed safely with parity kept intact, will I add the new drive.

Link to comment
3 minutes ago, yippy3000 said:

If you just straight up replace the disk and have Unraid rebuild, aren't you losing parity protection during that operation as it is the same as a drive failure?

Not at all.  If the rebuild fails (bad replacement, etc.), nothing's changed.  Fix the problem and rebuild again.

 

EDIT:  If another drive does happen to drop dead during the rebuild, and you're only running single parity, then yes you will have exceeded the resiliency of the array.  At that point you put the original drive back in and start everything all over again.

 

 

Edited by Squid
Link to comment
5 minutes ago, Squid said:

Not at all.  If the rebuild fails (bad replacement, etc.), nothing's changed.  Fix the problem and rebuild again.

 

But if you lose another drive while you are rebuilding...  Then you are screwed...

 

Unless you have double parity....  which I don't  (yet)

 

Edited by jbuszkie
Link to comment
1 minute ago, jbuszkie said:

But if you lose another drive while you are rebuilding...  Then you are screwed...

Exactly, this is my complaint.  The way around this is to safely remove a drive completely and then add a new one, not just swap as if it failed.

 

1 minute ago, bonienl said:

 

Removal of a disk (thus changing the array) is something completely different than replacing a failed disk with a new disk.

For replacing a failed disk, yes, but I am concerned about replacing a working disk.  It is easy if you are OK with going into a reduced-parity state during the rebuild, but considering that rebuilds stress the drives and are the MOST likely time for another failure, I think that is too risky.  There should be a way to safely remove a drive (change the array) and then add a drive in to replace it.  Adding a drive is easy, so really we just need an easy way to completely remove a drive safely.  Whether the goal is to just downgrade the array size or replace the drive later shouldn't matter.

Link to comment
Just now, jbuszkie said:

But if you lose another drive while you are rebuilding...  Then you are screwed...

Run dual parity.  That's why it exists.  For the possibility of another drive dying while the array is in a degraded state.

 

The process of removing a drive (no matter the process), and then re-adding another one is going to be even more stressful on the drives involved than a simple rebuild.

 

Link to comment
3 minutes ago, Squid said:

 

 

The process of removing a drive (no matter the process), and then re-adding another one is going to be even more stressful on the drives involved than a simple rebuild.

 

 

Not true...  Zeroing out the drive will be stressful... but you're protected.  Adding a drive isn't stressful if it's been pre-cleared.

Link to comment
4 minutes ago, jbuszkie said:

Adding a drive isn't stressful if it's been pre-cleared.

Not at all.  You're just stressing it via the preclear (which is actually more stressful on the drive than adding a drive that's not been precleared).  But on the array itself, sure.  And yippy is going to wind up copying the data back onto the newly added drive anyway.

 

34 minutes ago, yippy3000 said:

Then, only after it is removed safely with parity kept intact, will I add the new drive.

My whole argument is that a rebuild / expansion is painless, not excessively stressful at all, and any process on any system to remove a drive and then re-add it is more stressful than the rebuild itself.  

 

Link to comment
3 minutes ago, Squid said:

Not at all.  You're just stressing it via the preclear (which is actually more stressful on the drive than adding a drive that's not been precleared).  But on the array itself, sure.  And yippy is going to wind up copying the data back onto the newly added drive anyway.

 

 

But it can all be done with the array single parity protected.

 

 

Quote

My whole argument is that a rebuild / expansion is painless, not excessively stressful at all, and any process on any system to remove a drive and then re-add it is more stressful than the rebuild itself.  

I agree a rebuild/expansion is painless...  but without dual parity, you are left unprotected.  The thought is there IS a way to do it without losing parity protection, and it should be supported without using command-line stuff.  We shouldn't have to be forced to go to dual parity if there is a way to do it without.  I'd love to run dual parity...  but spending the $$ on another 8TB disk just for dual parity isn't in the budget. Luckily I've been able to do the process described in the thread... but I know my way around the command line (a little bit, anyway) :D

 

 

Link to comment
1 minute ago, jbuszkie said:

but without dual parity, you are left unprotected

But you're not.  You still have the original drive.

 

IMO, you're being excessively paranoid.  Actual drive failures are rare.  Actual "supposed" drive failures caused by inadequate cabling, etc. are common, and no system can protect you from yourself.

 

I run a pair of 12 drive servers, each with a single parity disk.  I have no plans to ever upgrade to dual parity.  With unRaid's implementation, and how often drives actually fail, I'm not worried at all about the possibility of losing data.  This isn't RAID after all.  Perfectly working drives don't get dropped from the array simply because they took a hair longer to respond to a request for data than they should have.  I do, however, have the data that is irreplaceable and important backed up offsite.

 

That's all I'm going to say on this...

Link to comment
36 minutes ago, yippy3000 said:

Exactly, this is my complaint.  The way around this is to safely remove a drive completely and then add a new one, not just swap as if it failed.

 

For replacing a failed disk, yes, but I am concerned about replacing a working disk.  It is easy if you are OK with going into a reduced-parity state during the rebuild, but considering that rebuilds stress the drives and are the MOST likely time for another failure, I think that is too risky.  There should be a way to safely remove a drive (change the array) and then add a drive in to replace it.  Adding a drive is easy, so really we just need an easy way to completely remove a drive safely.  Whether the goal is to just downgrade the array size or replace the drive later shouldn't matter.

So, this would be the "less stressful" process:

 

1) Copy all the data on the drive to be removed to other disk(s) in the array. This stresses the drive(s) that are being copied to, and also parity since any write also updates parity.

 

2) Zero the drive to be removed. This stresses the drive to be removed, which as you said, is still a good drive. And it stresses parity since writing those zeros also updates parity. That is the whole point of zeroing the drive while it is in the array, so parity can be updated and will be valid when the drive is removed.

 

3) Remove the drive.

 

4) Add a new drive, which was either precleared (pre-stressed) or will be cleared (stressed) by unRAID when added.

 

5) Move the data from the other drive(s) to the new drive. This stresses the drives that the data is being moved from, and the new drive, since both are written. And it stresses parity since it is also updated when the new drive is written, and when the files are removed from the other drive(s). Any change to the contents of a disk is a write, whether copying, writing a file, deleting a file, or formatting. Any of these operations update parity.

 

So, in the end, instead of just reading all the drives in order to rebuild to a new drive, which only writes to the rebuilt drive, you have actually spent a lot of extra time and trouble writing other drives. And of course, at any point during this you could have a failure anyway and lose parity protection.
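It may help to spell out why the zeroing trick works at all. With single parity, every bit of parity is just the XOR of the corresponding bits across the data disks:

P = D1 XOR D2 XOR ... XOR Dn

A disk that is all zeros contributes nothing to that XOR, so once disk k has been zeroed, the existing P is also the correct parity for the array without disk k -- which is exactly why the drive can then be dropped from the config with "parity is already valid" checked.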

 

Link to comment
20 hours ago, Squid said:

What RAID system doesn't run in a degraded state while it's rebuilding a drive?

 

The main concern of the OP was in actually removing a drive without running in a degraded state, which with unRAID does involve zeroing out the drive while it's still installed, followed by removal.

 

My apologies; I should have been clearer. I was talking about replacing a working drive with another (bigger) working drive. Naturally, replacing a failed disk would mean the array is running degraded at some point; however, that shouldn't ever need to be the case when you haven't had a failed disk.

 

And yes, you could say "Run dual parity", and that would protect you in this case because you'd still only be partially degraded, but on principle that's stupid, because it shouldn't have to happen. It's also not always viable: some people can't (out of physical slots, for example), or it wouldn't make sense (a 4-disk array).

 

19 hours ago, Squid said:

Not at all.  If the rebuild fails (bad replacement, etc.), nothing's changed.  Fix the problem and rebuild again.

 

This is only true if you haven't written anything to the array in the meantime, because the new data wouldn't be reflected on the replacement drive, so you'd end up having to run a check afterwards (which you should anyway, but again, if this were handled more robustly, you shouldn't have to).

Edited by -Daedalus
Done goofed words
Link to comment

As trurl noted, the hoops you have to jump through to "reduce the perceived stress" actually cause a lot more stress on the drives than a simple rebuild.

AND if you're rebuilding a good drive to a larger drive there's really very little risk -- even if you only have single parity.

 

The reason is simple:

 

The basic process (as I'm sure you know) is to simply replace the drive and let UnRAID do a rebuild.   The vast majority of the time this will work fine and you're done.

 

If you have a problem, you can easily recover ...

 

(a)  If the problem is the new disk has errors; simply replace it with another drive and start over.

 

(b)  If the problem is an issue with another drive, you can restore the system to a good state by switching back to the original drive and doing a New Config with the "parity is already good" option box checked; you can then replace the other failed drive.   Note that this will work even if you've written new data to the array, as long as that new data wasn't to the drive being replaced.

 

And just to re-emphasize the obvious, if you have a dual parity system the likelihood of a rebuild failing is VERY small ... it should virtually always succeed unless there's an issue with the new disk itself.

 

 

Link to comment
On 7/25/2017 at 11:40 AM, yippy3000 said:

Synology makes this easy, you just select a drive, tell it you want to remove it and it copies all data off to other drives, updates parity and then removes the drive from the array automatically.  You just have to wait (a very long time of course) for it to mark the drive ok to remove.  

 

I would love to have an easy drive removal feature like this that minimizes array downtime as well.  Having to zero a drive with the array stopped in order to remove the drive is kind of an insane amount of downtime.

I like the feature of being able to remove a disk and having unRAID reallocate its files to other disks and then remove it from the array. I think that would be a nice addition. How it happens under the hood would be up to LimeTech, but likely one of the methods below.

 

But until we get it, here are the options as I see them for removing a disk from the array.

 

Options for removing a disk (assumes you've moved/copied any files from the disk to other array disks):

1. New config, exclude the disk, rebuild parity. GUI/L.

2. Same as #1, but perform a parity check beforehand and confirm all SMART reports are clean. GUI/VL

3. Similar to #1 or #2, but using a new parity disk. CMD/VVL. If you back up your flash and then do the rebuild with a different parity disk, and also refrain from array writes during the parity build, you can maintain recoverability from a disk failure. If all goes well, this is all GUI work (a plugin backs up the flash), but if you actually need to recover, it gets involved, with multiple manual steps and maybe a trip or two to the command line. (See the flash-backup sketch after the legend below.)

4. Zero it, New Config, exclude the disk, trust parity: CMD/VVL. Filling the disk with zeroes while parity protected affects only that disk and parity (when not using reconstructed write). If the disk fails, the zeroing can continue to completion with no problem, as the disk will be simulated and parity will continue to get updated. If parity fails, it sort of doesn't matter any more, but no harm done. The other disks are idle and unaffected. After it is zeroed, you can do a New Config, exclude the disk, and trust parity.

---

GUI - uses only unRAID GUI, few manual steps

CMD - requires command line or plugin/docker features

VVL - very very low risk - protected from any single disk failure

VL - very low risk - single disk failure risk lowered

L - low risk - normal risk of single disk failure
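Since option #3 hinges on that flash backup, here is a minimal sketch of the backup step, assuming you want the copy on disk1 (the destination path is just an example). unRAID keeps the array configuration, including super.dat with the drive assignments, under /boot/config:

```
# Back up the flash config before the New Config, so the original
# assignments (and original parity disk) can be restored if a data
# disk dies during the new parity build.
mkdir -p /mnt/disk1/backups/flash
cp -a /boot/config /mnt/disk1/backups/flash/
```

Recovery is the reverse copy back onto the flash drive, after which the array comes up with the original assignments and the untouched original parity disk.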

 

I personally almost never remove a disk. But I sometimes do a mass disk replacement, where I combine several smaller disks onto a smaller number of larger disks, and then rebuild with a larger parity, excluding the smaller and including the larger disks. I use an approach similar to #3 so I can recover if a disk happens to fail at an inopportune moment.

 

The thing I don't like about zeroing (#4) is that the disk contents are lost. I like to keep the old disks as backups. But if one or more of my array disks were questionable, and I HAD to remove this one disk that I knew was good, this is what I would do. Here is a somewhat common use case. A disk is failing and someone wants to pull a newly added disk from the array and use it for the repair. I would consider this option. The failing disk sits idle, the new disk gets zeroed and parity gets updated, and when done that new disk can be used to replace the failing disk.

 

But if I really had to remove a disk from a healthy array, I'd likely go with #2. Drive failures are beyond rare. The chances of a failure happening after I've done a recent parity check and all my disks are healthy - I can tolerate that risk once in a great while. But if my array were finicky, I'd go with option #3 or #4.

Link to comment

While it would be a "nice feature", I think there's little actual need for a "remove a disk" function.    Few folks have DECREASING storage needs ... I suspect virtually everyone on this forum has significantly INCREASED their array capacity over the years.   And replacing a disk with a larger disk is already a trivial function that's easily done via the GUI.    If I did want to remove a disk, I'd simply do a parity check to confirm all was well; then do a New Config and rebuild parity.  As bjp999 noted, you could use a new parity disk in the process, which would give you the ability to completely restore the original configuration should anything go awry during the new parity build (providing you don't do any writes to the array during that process).

 

For those that DO want to remove a disk without losing parity along the way, the process is already fairly well documented -- it's just (a) not a GUI function; and (b) DOES require that you are CAREFUL when using the command line tools needed to do this -- e.g. don't zero a disk with data you need to keep.

 

Link to comment

I think "Remove a Drive" would be a nice feature, potentially with three flavors - drop the drive without preserving parity, copy the data off the drive and drop it from the array without preserving parity, and move the data off the drive, zero it, and drop it from the array while preserving parity.  All these things can be done, but I'd argue they are basic array functions that shouldn't require a new config.  And while reducing the capacity of an array doesn't happen very often, removing a drive isn't just associated with reducing capacity.

 

Further bulletproofing (and frankly handholding) the user experience of the more complicated NAS functions could help reduce risk for the non-enthusiast users who just use the system and drop by the forums once in a while.

Link to comment
3 hours ago, tdallen said:

Further bulletproofing (and frankly handholding) the user experience of the more complicated NAS functions could help reduce risk for the non-enthusiast users who just use the system and drop by the forums once in a while.

 

This. This is what I was getting at. unRAID already does most of the stuff people here need, but if Lime Tech wants to target more of the mass market, there's a bunch of hand-holding and graceful exit stuff that has to be in place first. This topic is a prime example of one of those things.

Link to comment

A "Remove a Drive" feature is conceptually very simple for the NAS, but is complicated by active Dockers and/or VM's that might be using the drive.

 

Basically, removing an empty drive without impacting parity is very easy -- just zero the drive, then remove that drive from the config ... this could be easily implemented; and it would indeed be nice if it were a built-in function, as it would eliminate the need to use dd and the need for a new config (since UnRAID could simply remove the drive from the current config).
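In the meantime, the "completely empty" check is easy to do by hand before zeroing; a quick sketch, with disk3 standing in for whichever slot you intend to remove:

```
# Both checks should confirm the disk holds nothing at all
# (-mindepth 1 also catches hidden files and empty directories):
find /mnt/disk3 -mindepth 1 | wc -l   # expect 0
df -h /mnt/disk3                      # expect essentially nothing used
```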

 

The next step -- removing a non-empty drive -- is more complex.   It would require an initial step of moving all of the current data on the drive to another location in the array, while also flagging the drive so no further writes are allowed to it.    This could also impact the functionality of any Dockers or VM's that require that drive.   Personally, I think the simple "Remove a Drive" that's empty is all that's needed -- it just needs to start with a BIG WARNING that the drive to be removed must be completely empty or you will LOSE ALL OF THE DATA on the drive -- and then the user can simply abort and copy any data from the drive before using the function.    The system could even offer a "Remove a drive and rebuild parity" feature, which would NOT zero the drive before removing it and would simply update the config and rebuild parity [thus not requiring the user to do a "New Config"].
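That "move the data off first" step can also be done by hand today; a hedged sketch, assuming disk3 is being emptied onto disk4 (both placeholders) and nothing else is writing to disk3 at the time:

```
# Copy everything from disk3 to disk4, preserving attributes:
rsync -av /mnt/disk3/ /mnt/disk4/

# Re-run as a dry run to verify nothing is left to transfer,
# then remove the originals:
rsync -av --dry-run /mnt/disk3/ /mnt/disk4/
rm -rf /mnt/disk3/*
```

Working disk-to-disk (rather than through the user shares) sidesteps the well-known caveat about copying between a disk share and a user share that includes the same disk.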

 

Link to comment
