EMPTY and Remove a Drive Without Losing Parity


NAS


You guys type too fast.  ;)

I've been trying to get this into here for 10 minutes:


 

I'm not sure it's worth the bother, however => reducing the drive count doesn't seem like a very common task;

As a lurker, I'd like to point out:

1. I can foresee several reasons I might need to do it. I've never tried removing a drive, but I can see myself in situations where I could do that, either deliberately or out of urgent need. For example, if my unRAID failed because of the sole drive that contains my TM backups, I could see myself trying to remove just the TM drive. (No lectures, please... it's just that I may not be able to afford the downtime on the OTHER disks.)

2. What I know about unRAID I get from reading the forums. The question of 'how do I do this' has been asked at least once a month over the last six. On all those threads, I keep quiet.

3. The process scares the h*ll out of me.  I'm good with directions, but I like to know what the 'contingency' is at each step in an operation.  In the event of error, at each step, I want to know whether to at least (1) sit and wait, or (2) pull the power cord from the wall, or (3) something in between.

 

I haven't really thought through the logic of removing a drive, but from what I have read, it's Scary As Hell.  And for a lot of the folks who are even less technical than I am, it's gotta be heart-stopping. From their point of view, it's not just the 'removal of the drive', it's the 'I could screw up the entire array and lose everything--after all, I'm removing data and hardware from my array! Physical AND logical stuff could go wrong.'

 


More thoughts:

My point isn't about the 'priority' of Tom's dev efforts. I'm not in a position to comment on those.

But what I do notice is that this need to reduce drives CAN happen, DOES come up, and SHOULD have a 'best practices' procedure defined.  Users need to understand BEFOREHAND at what points they are at risk, and what the possible consequences of a misstep or a failure are, so they can plan accordingly.

Link to comment

... You should NEVER  NEVER NEVER have to run without parity. ...

... you need to run for many hours in a failed parity state ...

...

I agree with NAS and garycase, we should not run without parity protection for routine procedures. The only time we lose parity protection is if a drive really fails. In that scenario we cannot be protected until the recovery is complete.

 

But in every other circumstance, we CAN BE protected, including when removing a disk and when replacing a disk, if we know how. And it only takes a handful of steps. The key is the know-how, which it is our job as moderators and experienced unRAIDers to share.

 

TO REMOVE A DISK WITHOUT LOSING PARITY

Simply fill the disk to be removed with zeroes, and then use the trust parity trick to redefine the array without the zeroed disk. This is a safe procedure (if the machine were to crash you could resume it later). I created a step by step post to do this that you can find in the "Best of the Forums" link in my sig.
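For reference, the zeroing step looks something like the sketch below. The slot "md3" is a placeholder, not from the post above; writing through the md device rather than the raw disk device is what keeps parity updated as the zeroes land.

```shell
# Hypothetical sketch -- "md3" is a placeholder; substitute the md device
# for the slot you are removing. Writing through /dev/mdX (not /dev/sdX)
# updates parity as the zeroes are written. Triple-check the target first:
# dd gives no warnings and no second chances.
TARGET=/dev/md3
dd if=/dev/zero of="$TARGET" bs=1M
# dd ends with "No space left on device" once the disk is fully zeroed.
```

If the machine crashes partway through, re-running the same command simply zeroes from the start again, which is safe if slow.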

 

TO UPGRADE A DISK WITHOUT LOSING PARITY

You really ARE protected using the regular replace-a-disk process, so long as you don't write to the array during the rebuild process or mess with the removed disk. If a drive were to fail in the middle, you could use the trust parity procedure to reassemble the array using the removed disk instead of the partially rebuilt new disk, and then rebuild the other disk that failed. One can argue that there is no more stressful activity the system does than rebuilding a failed drive: it depends on every drive spinning and one drive being written completely. Running a parity check before starting definitely helps reduce the risk of a failure in the middle.

 

But there is another safer less stressful way. And you wouldn't necessarily need to run a parity check first. Install the new disk into the machine. Preclear it (standard with any new disk but a necessity here). Copy the old disk (the one you want to replace) to the new disk sector by sector using dd. (Can a Linux expert provide the command to do this?) The array can be freely used the whole time so long as you do not write to the old disk.  Once complete, use the trust parity procedure to reassemble the array with the new disk replacing the old disk. This will be the last step if the replacement disk is exactly the same size as the disk it replaced, but if you are upsizing there is an additional step - to expand the file system on the new disk to take up the whole drive. This must be done after the disk is added to the array. (Linux expert - please provide the syntax for this command).
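Since the post above asks for the commands, here is a hedged sketch. The device names /dev/sdX and /dev/sdY are placeholders you must verify yourself (e.g. with fdisk -l or the unRAID device list), because reversing them destroys the source disk:

```shell
# Hypothetical sketch -- sdX = OLD disk, sdY = NEW (precleared) disk.
# Verify both names before running; swapping them destroys your data.
dd if=/dev/sdX of=/dev/sdY bs=1M conv=noerror,sync

# When upsizing, expand the file system AFTER the new disk has been added
# to the array ("md3" is a placeholder for its slot). For the ReiserFS
# arrays of this era, that would be along the lines of:
#   resize_reiserfs /dev/md3
```

Treat this as an outline under those assumptions, not an official procedure; conv=noerror,sync keeps the copy going past read errors by padding bad blocks, which you should only rely on for a disk you already intend to retire.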

 

TO REPLACE TWO DISKS IN AN ARRAY WITH A SINGLE LARGER DISK

- Follow the steps above TO UPGRADE A DISK WITHOUT LOSING PARITY on the first disk to remove

- Copy the files from the second disk you want to remove to the new disk (which is now in the array)

- Follow the steps above TO REMOVE A DISK WITHOUT LOSING PARITY to remove the second disk

 

Are these procedures too hard to do? Am I missing something?

Link to comment

TO REMOVE A DISK WITHOUT LOSING PARITY

Requires users to SSH and dd. Neither is a requirement for any other official unRAID procedure. Users fundamentally should not be messing with dd, ever. Can't stress this enough. It's a recipe for disaster.

 

TO UPGRADE A DISK WITHOUT LOSING PARITY

Consistently and well presented. Valid points, but equally it is still convoluted, again requiring SSH, maybe dd, and/or very specific steps. It is the best current compromise. But it's fugly.

 

TO REPLACE TWO DISKS IN AN ARRAY WITH A SINGLE LARGER DISK

Again its fugly :0

 

 

I really don't get the push back on this; even Tom has said it's a good idea and promoted a more logical and elegant way to do it.

 

Are there more important features needed? (Sure, depending on your point of view.) But that is always the case with anything, and especially with unRAID.

 

I was just asking for a progress report on the ALREADY agreed upon approach :)

 

 

 

Link to comment

I find this thread very educational. Seriously. I hope I never need it.

And I don't mean to hijack it, but this 'variation on the theme' just popped up.

http://lime-technology.com/forum/index.php?topic=32641

 

This type of thing happens here all of the time. Sometimes user error, sometimes a loose cable, sometimes flakey drives, sometimes .. shit happens and we don't know what happened. Being educated is your best defense. I hope the parity rebuild goes well for this guy, because if the drive he wants to upsize is bad, the parity build will fail. And recovery may be negatively impacted. Using "set invalidslot 99" after the new config would have reset the current parity as valid, and allowed the disk to rebuild without having to recreate parity that was probably good. But if the parity build works, rebuilding the drive will be perfect. So the user had options - he didn't understand them and was given advice he took. The advice was very reasonable. But I would prefer that people know unRAID well enough to come up with some of the options on their own, and use the forum for other ideas, finalize the decision, and ask for syntax help if needed.

 

I have survived some of the worst disasters in unRAID history with my old array. I learned a lot in the process. Because I understand logically what unRAID does, I can think logically through solutions. I don't always know how to make my solutions work (i.e., I don't know the commands), but I do know what I want to do and Linux experts here are usually able to fill in the blanks.

 

TO REMOVE A DISK WITHOUT LOSING PARITY

Requires users to SSH and dd. Neither is a requirement for any other official unRAID procedure. Users fundamentally should not be messing with dd, ever. Can't stress this enough. It's a recipe for disaster.

 

TO UPGRADE A DISK WITHOUT LOSING PARITY

Consistently and well presented. Valid points, but equally it is still convoluted, again requiring SSH, maybe dd, and/or very specific steps. It is the best current compromise. But it's fugly.

 

TO REPLACE TWO DISKS IN AN ARRAY WITH A SINGLE LARGER DISK

Again its fugly :0

 

I really don't get the push back on this; even Tom has said it's a good idea and promoted a more logical and elegant way to do it.

 

Are there more important features needed? (Sure, depending on your point of view.) But that is always the case with anything, and especially with unRAID.

 

I was just asking for a progress report on the ALREADY agreed upon approach :)

 

I love unRAID, but have long ago given up on Tom implementing these types of idiot-proof recovery features into unRAID. If he does it, great. If he doesn't, I have my workarounds.

 

Fugly can be beautiful if it saves your ass. :o

Link to comment

Another simple way to replace a disk without impacting parity is to add the new disk to the array (pre-cleared of course);  MOVE all of the contents of the old disk to the new disk;  then write zeroes to the old disk (per bjp999's notes) ... and then simply do a New Config with the Trust Parity option (excluding the old disk).

 

I certainly agree the safest way to remove a disk is to fill it with zeroes; then do a New Config/Trust Parity.  But I really don't think the risk of simply doing the New Config and rebuilding parity is significant, as long as you've just checked parity before proceeding (which you should do in either case).    The latter is quicker; and the total "at risk" time is relatively small (just a parity sync).

 

 

 

Link to comment

Another simple way to replace a disk without impacting parity is to add the new disk to the array (pre-cleared of course);  MOVE all of the contents of the old disk to the new disk;  then write zeroes to the old disk (per bjp999's notes) ... and then simply do a New Config with the Trust Parity option (excluding the old disk).

 

I certainly agree the safest way to remove a disk is to fill it with zeroes; then do a New Config/Trust Parity.  But I really don't think the risk of simply doing the New Config and rebuilding parity is significant, as long as you've just checked parity before proceeding (which you should do in either case).    The latter is quicker; and the total "at risk" time is relatively small (just a parity sync).

 

Upsizing a disk (replacing a disk and doing a rebuild) provides a way to recover in the event another drive fails during the rebuild, so long as no writes are done to the array while the rebuild occurs. So that is what I do.

 

But removing a drive from the array (a very infrequent activity for me) and rebuilding parity afterwards is not a safe procedure. I would tend to use the zeroing method. Of course I do not have the full array backup that you have, so my assessment of risk is slightly different than yours :). Rebuilding parity is something I do very sparingly.

Link to comment

Hi guys,

I have read this thread with interest & I have basically done this already.

 

I am at a point where I only want to have parity & 13 disks; I have rethought my desire to build a 24-drive server.

Both servers are now on 14 total (1stower with a 4TB and several 2 & 1.5TB drives, while stower02 is all 2TB).

I bought numerous 2- & 4-way SATA cards, PCI & PCIe.

But I found long ago on number 2 that at the stage I added a 15th drive I had issues (6 SATA ports on the motherboard & 8 on a Supermicro card); without putting anything on that drive I wanted to scale back ASAP.

As I usually back up my USB before & after doing something,

I removed the number 15 drive & reset my USB to the before state, then did a parity check.

It was a little tricky, but it reduced the array by 1 drive in the end.

 

I have had up to 19 drives (4-5 warm spares) in stower02; I have now disconnected all spares & will only add them when I need to replace drives.

(See Tom's gallery, redlaw's build: modified from a duplicating case that I could not mount a m/b in until I mounted a m/b tray in it; now the tray extends 50mm from the rear. I can mount 5 4:3 Coolermaster drive bays in there, but on my 1stower I figured out how to mount 10 drives into 2.5 bays plus add 2 fans in front of them.)

 

 

I did a New Configuration after 2 drives had reiserfsck --rebuild-tree errors; I panicked & wanted them out in case they failed. (I would now rebuild them in the array instead; it should be better for recovery & mean fewer file errors.)

 

Just my 2 cents' worth. Steve from Australia

Link to comment

Another simple way to replace a disk without impacting parity is to add the new disk to the array (pre-cleared of course);  MOVE all of the contents of the old disk to the new disk;  then write zeroes to the old disk (per bjp999's notes) ... and then simply do a New Config with the Trust Parity option (excluding the old disk).

 

I certainly agree the safest way to remove a disk is to fill it with zeroes; then do a New Config/Trust Parity.  But I really don't think the risk of simply doing the New Config and rebuilding parity is significant, as long as you've just checked parity before proceeding (which you should do in either case).    The latter is quicker; and the total "at risk" time is relatively small (just a parity sync).

 

If a tool like preclear existed which essentially hides dd from the end user, but more importantly has checks and balances to make sure your average user is less likely to dd something they need, I would tend to agree with you. I absolutely am sure normal users should never type dd, ever. It is simply too dangerous.

 

But even if we assume this tool magically popped into existence alongside the whale and the petunias, it is still far from ideal, as it requires the entire array to be in maintenance mode (anything other than maintenance mode is too risky).

 

 

Upsizing a disk (replacing a disk and doing a rebuild) provides a way to recover in the event another drive fails during the rebuild, so long as no writes are done to the array while the rebuild occurs. So that is what I do.

 

Do you use maintenance mode or risk it?

 

But removing a drive from the array (a very infrequent activity for me) and rebuilding parity afterwards is not a safe procedure. I would tend to use the zeroing method. Of course I do not have the full array backup that you have, so my assessment of risk is slightly different than yours :). Rebuilding parity is something I do very sparingly.

Agreed. It is an EASY procedure, but it's just wrong to fail parity.

 

...

I am at a point where I only want to have parity & 13 disks; I have rethought my desire to build a 24-drive server....

The sentiment I take from your post is that you could/would have used a tool that lets you manipulate the array members a bit better whilst maintaining parity.

 

And that is the crux of this whole thing. There are several very cool things that could be done with unRAID precisely because it has such a unique RAID model, allowing array manipulation beyond even what we are discussing here. But they need to be native (emHTTP) or semi-native (addon/script) for them to be viable.

Link to comment

But they need to be native (emHTTP) or semi-native (addon/script) for them to be viable.

 

I certainly agree this is very desirable.  Clearly tools that can be safely run without requiring any knowledge of the underlying Linux commands are far preferable to requiring things to be run from the console (or via telnet or screen).

 

Link to comment

But they need to be native (emHTTP) or semi-native (addon/script) for them to be viable.

 

I certainly agree this is very desirable.  Clearly tools that can be safely run without requiring any knowledge of the underlying Linux commands are far preferable to requiring things to be run from the console (or via telnet or screen).

I agree in theory.

 

BUT - they don't exist. So in their absence what do we do? Is it better to go to the command line requiring a more educated and technically proficient user, or use a documented feature that is less secure? Each user can decide.

Link to comment
  • 2 months later...

The dd command is not so scary. If you understand that a drive full of binary zeros is invisible from a parity perspective (you can slip a drive into or out of an array if it is full of zeros), you can understand how this works. You have to copy everything off of the disk you want to remove (do not copy the data to a user share, as there is a nasty bug that can bite you in the butt - always copy from a disk share to a disk share).
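A minimal sketch of that copy step (the disk numbers are placeholders; the point is that both sides are disk shares, never /mnt/user):

```shell
# Hypothetical sketch -- disk3 is the disk being emptied, disk5 has room.
# Both paths are DISK shares; copying to /mnt/user here risks the
# user-share bug mentioned above.
rsync -av /mnt/disk3/ /mnt/disk5/
# A second run of the same command should transfer nothing, which is a
# cheap way to confirm the copy completed.
```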

 

Then you use the oh-so-scary  ::) dd command to write binary zeros to the whole drive. While you are writing the zeroes, parity is being updated to what it should be with that disk removed from the array. When the disk is full of zeroes you can stop the array, create a new configuration, put all of the disks back in their proper locations (omitting the one you want to remove), and, most importantly, tell unRAID to trust parity. Then when you start the array all is well. A quick non-correcting parity check will confirm that it worked - get a few hundred megs in with no parity errors and you are golden.

Link to comment
Then you use the oh-so-scary  ::) dd command to write binary zeros to the whole drive.
dd is scary only because people are able to apply it to the wrong drive with no warnings or second chances. If you were to write a wrapper script that checked the contents of the drive and only went through with the erase if there were no files and no directories in the target drive's filesystem, then it would cease to be scary. You could even make it a button in myMain.
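A sketch of what such a wrapper could look like. The function name and confirmation string are invented for illustration; this is only the emptiness check plus a confirmation gate, not a finished tool:

```shell
# Hypothetical "safe zero" wrapper: refuse to run dd unless the target's
# mounted file system contains no files and no directories, then demand
# an explicit confirmation string before touching the device.
safe_zero() {
    local mnt=$1 dev=$2
    # find -mindepth 1 -print -quit prints something iff the mount
    # point contains anything at all
    if [ -n "$(find "$mnt" -mindepth 1 -print -quit 2>/dev/null)" ]; then
        echo "Refusing to erase: $mnt still contains files" >&2
        return 1
    fi
    printf 'Type ERASE to zero %s: ' "$dev"
    read -r answer
    [ "$answer" = "ERASE" ] || { echo "Aborted." >&2; return 1; }
    dd if=/dev/zero of="$dev" bs=1M
}
# e.g.  safe_zero /mnt/disk3 /dev/md3
```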
Link to comment

They you use the oh so scary  ::) dd command to write binary zeros to the whole drive.
dd is scary only because people are able to apply it to the wrong drive with no warnings or second chances. If you would write a wrapper script that checked the contents of the drive and only went through with the erase if there were no files and no directories in the target drive filesystem, then it would cease to be scary. You could even make it a button in mymain.

 

myMain looks but doesn't touch. Even if you have no idea what you are doing, you could not damage your array using myMain. I have no plans to add features that can, when used improperly, result in data loss.

 

This sounds like a possibility for a user script.

Link to comment
myMain looks but doesn't touch. Even if you have no idea what you are doing, you could not damage your array using myMain. I have no plans to add features that can, when used improperly, result in data loss.

 

This sounds like a possibility for a user script.

Totally agree on the philosophy of do no harm; just wondering how zeroing a drive with a checked, clean, mounted filesystem and no files or folders could lose data. The only corner case I can think of right now would be an accidentally formatted drive where you would rather run a rebuild to salvage data instead of zeroing and removing it. Actually, now that I think that through, it's not that hard to force the accidental formatting of a drive because of unRAID's brain-dead handling of an unmountable volume. Never mind. Protecting ignorant users is hard.  ;D
Link to comment

I still maintain it is a short hop for unRAID to support this natively rather than trying to teach users how to use dd without causing havoc.

 

Most of the steps that need to be done to have this feature, unRAID does already in a different context.

 

It is a shame users likely won't have this any time soon, as it also gives you a secure-delete option (useful for selling on drives) and also a poor man's preclear.

 

 

Link to comment

While it DOES entail a bit of risk, I really think this is a rare enough occurrence that if you simply do a current parity check; ensure all is well (no sync or drive errors); and then just do a New Config, this is the easiest way for folks to remove drives from an array.

 

Yes, it's then "at risk" with no parity for the duration of the parity sync -- but all the drives were just checked with the parity check before doing this, so it's a very small risk.    And if something DOES go awry, that's what backups are for  :)

 

Those that don't want to take that small risk simply need to study enough about dd to use it safely.

 

And if you're not removing a drive ... but simply "upsizing it", then just replacing it and letting UnRAID do a rebuild doesn't cause any loss of parity protection IF you don't touch the old drive until the rebuild is completed.    [it does require knowledge of how to reconstitute the array with the "Trust Parity" option in the event of failure]

 

 

Link to comment

While it DOES entail a bit of risk, I really think this is a rare enough occurrence that if you simply do a current parity check; ensure all is well (no sync or drive errors); and then just do a New Config, this is the easiest way for folks to remove drives from an array.

 

Yes, it's then "at risk" with no parity for the duration of the parity sync -- but all the drives were just checked with the parity check before doing this, so it's a very small risk.    And if something DOES go awry, that's what backups are for  :)

 

Those that don't want to take that small risk simply need to study enough about dd to use it safely.

 

And if you're not removing a drive ... but simply "upsizing it", then just replacing it and letting UnRAID do a rebuild doesn't cause any loss of parity protection IF you don't touch the old drive until the rebuild is completed.    [it does require knowledge of how to reconstitute the array with the "Trust Parity" option in the event of failure]

 

Completely agree.

 

However, this approach is exactly what this thread is about trying to avoid.

 

Edit: Re-read that and I unintentionally was a bit curt. My point is simply that this thread is specifically dealing with disk changes that can be done with no parity downtime :)

Link to comment
  • 1 year later...

Apologies, if they are needed, for resurrecting an old thread. My reason is that I read it through from beginning to end, along with the thread that preceded it, with great interest and anticipation. But it went nowhere and petered out just like so many promising threads on this board, which is incredibly frustrating. Is there any chance of a quick update from someone who knows, please? What is the situation with version 6.0.1 in 2015? It seems Tom made a commitment to make disk removal a safe procedure (I'm in agreement with NAS about it being fundamentally wrong to break parity, btw). Did it even get implemented?

 

Link to comment

Apologies, if they are needed, for resurrecting an old thread. My reason is that I read it through from beginning to end, along with the thread that preceded it, with great interest and anticipation. But it went nowhere and petered out just like so many promising threads on this board, which is incredibly frustrating. Is there any chance of a quick update from someone who knows, please? What is the situation with version 6.0.1 in 2015? It seems Tom made a commitment to make disk removal a safe procedure (I'm in agreement with NAS about it being fundamentally wrong to break parity, btw). Did it even get implemented?

No, it has not been implemented.  I believe it is still a roadmap item, but without an ETA.

Link to comment

I see this as less of an advantage than one might expect.   

 

To be able to remove a drive successfully, all other drives and the parity drive need to be OK.  The current alternative of doing a 'New Config' and then rebuilding parity from the remaining drives does leave a little window where another failure could cause a problem, but that would have been just as likely to happen while removing the drive (which would probably take the same length of time as a parity rebuild).

 

Also, rather like a current clear, it is likely that the array would be offline while removing the drive as unRAID wrote zeroes to every sector.

Link to comment
