Shrinking my array

rbroberts · June 21, 2018

I'm staring at this page and not really sure I'm following:

https://lime-technology.com/wiki/Shrink_array

I have a failed drive. I have enough space on other drives to not need to replace the drive. I'd like to not replace it because I really want to follow up later with slowly replacing my drives with bigger drives. But first I need to shrink the array while preserving the data I have.

The failed drive has about 2TB of data on it (1/2 full).

I'm running unraid 6.4.0.

The steps confuse me because... right now my failed drive is backed by the parity drive so it can be emulated. When I hit step 8 and uncheck "Parity is already valid" with the new configuration, it seems to me that parity will be rebuilt from the remaining 4 drives in my new configuration and my emulated drive data will just be gone. Should I be copying from /mnt/disk1 (the failed drive) to some open space somewhere before starting this?

SSD · June 21, 2018

10 minutes ago, rbroberts said:

I'm staring at this page and not really sure I'm following:

https://lime-technology.com/wiki/Shrink_array

I have a failed drive. I have enough space on other drives to not need to replace the drive. I'd like to not replace it because I really want to follow up later with slowly replacing my drives with bigger drives. But first I need to shrink the array while preserving the data I have.

The failed drive has about 2TB of data on it (1/2 full).

I'm running unraid 6.4.0.

The instructions do not include shrinking an array to remove a failed disk. They are focused on removing a functional drive from an array.

10 minutes ago, rbroberts said:

The steps confuse me because... right now my failed drive is backed by the parity drive so it can be emulated. When I hit step 8 and uncheck "Parity is already valid" with the new configuration, it seems to me that parity will be rebuilt from the remaining 4 drives in my new configuration and my emulated drive data will just be gone. Should I be copying from /mnt/disk1 (the failed drive) to some open space somewhere before starting this?

The failed drive is simulated by parity + all other disks in the array working together. Parity itself is not a backup.
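A minimal sketch of why parity alone is not a backup, assuming single parity is a bytewise XOR across the data disks (the thread does not spell out the parity math, so treat that as an assumption). The tiny 4-byte "disks" and the helper name `xor_blocks` are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical 4-byte "disks"; real disks are just much longer byte sequences.
disk1 = b"\x10\x20\x30\x40"  # the disk that fails
disk2 = b"\x01\x02\x03\x04"
disk3 = b"\xaa\xbb\xcc\xdd"

# Parity is computed across ALL data disks.
parity = xor_blocks([disk1, disk2, disk3])

# disk1 fails: it is emulated from parity plus every surviving data disk.
emulated_disk1 = xor_blocks([parity, disk2, disk3])
assert emulated_disk1 == disk1

# Parity by itself does not contain disk1's data.
assert parity != disk1

# If disk2 is lost as well, what remains cannot reproduce disk1.
assert xor_blocks([parity, disk3]) != disk1
```

Under that assumption, rebuilding parity from only the remaining disks would discard the one piece of information that still lets the failed disk be emulated.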

You may have already gone too far with the steps to be able to recover the data on the failed disk. But I'm not sure exactly how far you went, so there may yet be a way to recover. And even if you went further, there may still be a way to put the array back as it was and have it simulate the failed disk.

What you should have done (and this applies to anyone finding this thread for instructions) is copy the data from the failed disk (which unRAID would be simulating, so it would appear as if it were present) to other non-failed disks in your array that have available space. Once the data is copied, you would be able to do the new config, redefine the array omitting the failed disk, and rebuild parity. The net effect is that you would keep all of your data while using fewer physical drives. The amount of available space would drop by the size of the failed disk.

Give some more details on the current state and someone may be able to assist.

JonathanM · June 21, 2018

21 minutes ago, rbroberts said:

The steps confuse me because... right now my failed drive is backed by the parity drive so it can be emulated. When I hit step 8 and uncheck "Parity is already valid" with the new configuration, it seems to me that parity will be rebuilt from the remaining 4 drives in my new configuration and my emulated drive data will just be gone. Should I be copying from /mnt/disk1 (the failed drive) to some open space somewhere before starting this?

Your instincts are correct, but you are understating the risk. The parity drive does NOT hold any data on its own. ALL the remaining drives plus the parity drive are being used to emulate the failed drive. So, if you have another drive failure in this state you will lose the data on both drives.

Copying from the emulated drive to another array drive is going to be very slow. It would be faster to copy the data across the network to a local drive, then do the procedure to remove the drive and rebuild parity from the remaining drives. Once that is complete and parity has been checked with zero errors, you can copy the data back to the array.

Directly quoted from the wiki you linked.

This method does not keep the drive's data within the array. If the drive to be removed has data you want to stay in the array, you must move it yourself to the other data drives. Parity will be built based entirely and only on the remaining drives and their contents.

FlorinB · June 21, 2018

10 hours ago, jonathanm said:

This method does not keep the drive's data within the array. If the drive to be removed has data you want to stay in the array, you must move it yourself to the other data drives. Parity will be built based entirely and only on the remaining drives and their contents.

There are 3 possibilities:

1. Copy the emulated data from the failed disk(s) to somewhere over the network.

2. If you have a cache drive that is big enough, disable the mover and copy the emulated data temporarily to the cache disk.

3. If you have hot-spare disk(s), install the Unassigned Devices plugin, enable Destructive Mode and format one of your hot-spare disks, then copy the data from the emulated disk there.

Of course, after completing one of the above steps you have to shrink your array, rebuild parity and copy your data back to the array.

Edited June 21, 2018 by FlorinB

rbroberts · June 21, 2018

I actually haven't done anything except change the share configurations to exclude the failed drive. I can't change the global share configuration without stopping the array which I didn't want to do. I do understand that the emulated data is not solely on the parity drive which is why the instructions didn't make sense to me. That and the instructions on that page start with the answer to "Why would you want to shrink your array?"

Maybe you have recently found a red ball on one of your drives, and you want to take it out of the array. You've got enough extra space, and don't need to replace the drive.

which is what drew me there in the first place. Given that I haven't actually done anything beyond step 1 there, I think I'm "safe" or at least as safe as I can be with a currently failed drive.

So, apart from the issue of moving it to a different network location vs copying from /mnt/disk1 to /mnt/diskN, it sounds like the next steps should be the same if I'm looking to shrink the array and preserve the emulated data.

SSD · June 21, 2018

On 6/21/2018 at 9:11 AM, rbroberts said:

I actually haven't done anything except change the share configurations to exclude the failed drive. I can't change the global share configuration without stopping the array which I didn't want to do. I do understand that the emulated data is not solely on the parity drive which is why the instructions didn't make sense to me. That and the instructions on that page start with the answer to "Why would you want to shrink your array?"

Maybe you have recently found a red ball on one of your drives, and you want to take it out of the array. You've got enough extra space, and don't need to replace the drive.

which is what drew me there in the first place. Given that I haven't actually done anything beyond step 1 there, I think I'm "safe" or at least as safe as I can be with a currently failed drive.

So, apart from the issue of moving it to a different network location vs copying from /mnt/disk1 to /mnt/diskN, it sounds like the next steps should be the same if I'm looking to shrink the array and preserve the emulated data.

Yes - it seems the wiki should be updated to make this clearer. Early on I did a lot of work on the wiki, but most of it has been rewritten or new content added, and I'm not sure who has the responsibility now. @jonp might be able to funnel this request to the right person to do the update.

Good you didn't do the new config yet.

One thing you need to know and understand is the user share copy bug, which I discovered a long while back, and due to technical reasons, cannot be fixed by LimeTech as I had hoped. Steps have been taken to avoid this issue, but your situation is particularly susceptible to encountering this bug.

Here is what appears to be a perfectly valid thing for you to do, but this WILL result in losing a lot of data in a hurry.

(Again, do not do this!) Remove the failed disk from the user share configuration. Then copy all the data from the disk share folder (e.g., /mnt/disk4/Movies) to the user share (/mnt/user/Movies).

The reason this will not work is that excluding disks (meaning explicitly excluded or not included [you should only use included or excluded, not both, BTW]) does NOT truly exclude them from the user share except in one very specific use case. If a file copied to a user share overwrites an existing file, that file will be overwritten on whatever disk it is present, even if that disk is excluded in the user share. And even if the file really is a new file, if the split level forces that new file onto a specific disk, it will go there even if that disk is excluded. Only if a new file is being copied that does not already exist, and split level is not impacting its placement, do the excluded/included disk configurations come into play. Also, when you browse to a user share, it is going to show content from all disks in the array that contain the root level folder for the user share, regardless of the include/exclude share configuration. The only way to stop these user share behaviors is to globally exclude a disk from the user share feature. Once you do that, the disk will be ignored for anything user share related. You would not be able to have any user shares on that disk.

So if you copy (or move) a file from the user share directory on a disk share to the user share, the user share will think you are overwriting an existing file. So you are basically trying to copy a file overtop of itself. Normally the operating system prevents you from doing something like that - you'd get a "can't copy a file to itself" error and the OS would prevent it before it tried. But in this situation, the OS does not realize what the user share is doing under the covers. And it will not prevent the operation. So it will try to copy the file, and immediately clobber the source. With the source gone, the copy fails, and the contents of that file are lost. Say you are copying (or moving) 500 files that take up 1T of source files, you might think you'd somehow realize what was happening and stop it. But in truth unRAID would wipe out those 500 files very very quickly. Only the first block of each would be attempted, the copy would fail, and then on to the next file.
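A rough analogy for the clobbering described above, not the actual user share mechanism: here a hard link stands in for "two paths that lead to the same underlying file". The file names and the `naive_copy` helper are hypothetical. In this sketch `os.path.samefile` can still spot the alias; per the explanation above, on a user share the OS does not realize the two paths are the same file, so nothing stops the copy.

```python
import os
import tempfile


def naive_copy(src, dst):
    """Copy without checking whether src and dst are the same underlying file."""
    # Opening dst with "wb" truncates it immediately. If dst is really the
    # same file as src, the source is emptied before a single byte is read.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read())


workdir = tempfile.mkdtemp()
disk_path = os.path.join(workdir, "disk_view.bin")  # stands in for the disk share path
user_path = os.path.join(workdir, "user_view.bin")  # stands in for the user share path

with open(disk_path, "wb") as f:
    f.write(b"movie data" * 1000)
os.link(disk_path, user_path)  # a second name for the very same file

size_before = os.path.getsize(disk_path)
same_file = os.path.samefile(disk_path, user_path)

naive_copy(disk_path, user_path)
size_after = os.path.getsize(disk_path)

print(size_before, same_file, size_after)
```

The source ends up empty even though the "copy" appeared to target a different path, which is the same shape of failure as copying a disk share folder onto the user share that contains it.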

A rule of thumb is to always copy disk share to disk share, or user share to user share. Do not mix.
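The rule of thumb could be turned into a small guard in a copy script. This is only a sketch: the `/mnt/diskN` form comes from this thread, while treating `/mnt/user` as the user share root is an assumption about the usual layout, and the function names are invented for the example.

```python
import re
from pathlib import PurePosixPath


def share_kind(path):
    """Classify a path as 'disk' (/mnt/diskN/...), 'user' (/mnt/user/...) or 'other'."""
    parts = PurePosixPath(path).parts
    if len(parts) >= 3 and parts[0] == "/" and parts[1] == "mnt":
        if re.fullmatch(r"disk\d+", parts[2]):
            return "disk"
        if parts[2] == "user":  # assumed user share root
            return "user"
    return "other"


def check_copy_pair(src, dst):
    """Raise ValueError if one side is a disk share and the other a user share."""
    kinds = {share_kind(src), share_kind(dst)}
    if kinds == {"disk", "user"}:
        raise ValueError(f"refusing to mix disk share and user share: {src} -> {dst}")


check_copy_pair("/mnt/disk1/Movies", "/mnt/disk2/Movies")  # disk to disk: allowed
check_copy_pair("/mnt/user/Movies", "/mnt/user/Backup")    # user to user: allowed
```

Calling `check_copy_pair("/mnt/disk4/Movies", "/mnt/user/Movies")` raises `ValueError` before any file is touched.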

But I'll give a tip that would allow you to safely copy from a disk share to a user share. Go to the disk share, and RENAME the root level folder. Say it was called "Movies". Change it to "X" or "MovieTemp" or anything that is different from "Movies" and not the name of some other user share. This will instantly separate the files on that disk from the "Movies" user share, and temporarily create a new user share with the name you gave. You also need to make sure that the user share configuration excludes (or does not include) that disk. You can then copy from that disk share to the user share. Or, copy from the user share "X" or "MovieTemp" or whatever you call it, to the "Movies" user share. This would result in all of the Movies being copied to one of the currently configured disks in that share.
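The rename tip could be scripted roughly like this. It is a sketch under stated assumptions, not a tested unRAID procedure: the function name and parameters are hypothetical, the caller supplies the disk share root and user share root, and the script cannot verify that the share configuration excludes the disk, so that still has to be checked by hand first.

```python
import os
import shutil


def detach_and_copy(disk_root, user_root, share, temp_name):
    """Rename <disk_root>/<share> to <disk_root>/<temp_name>, then copy it
    into <user_root>/<share>. Copies rather than moves, so the source stays intact."""
    src_old = os.path.join(disk_root, share)
    src_new = os.path.join(disk_root, temp_name)

    # The temporary name must not collide with an existing folder or share.
    if os.path.exists(src_new) or os.path.exists(os.path.join(user_root, temp_name)):
        raise FileExistsError(f"{temp_name!r} is already in use")

    # After the rename, the files on this disk no longer belong to <share>.
    os.rename(src_old, src_new)

    dst = os.path.join(user_root, share)
    shutil.copytree(src_new, dst, dirs_exist_ok=True)  # requires Python 3.8+
    return src_new, dst
```

A call might look like `detach_and_copy("/mnt/disk4", "/mnt/user", "Movies", "MovieTemp")`, where `/mnt/user` is assumed to be the user share root. Trying it on a scratch folder first is a cheap way to confirm it behaves as expected.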

It is not necessary to Move; Copy is fine. The disk is being simulated, and when you reconfigure your array, any files on that disk will be poofed out of existence. Moving requires deleting the file on the simulated disk, which would unnecessarily waste time, potentially a lot of it.

Post back with any questions.

(#ssdindex - User share behavior, user share copy bug)

FlorinB · June 21, 2018

4 hours ago, SSD said:

So if you copy (or move) a file from a disk share to a user share, the user share will think you are overwriting an existing file. So you are basically trying to copy a file overtop of itself[...]

A rule of thumb is to always copy disk share to disk share, or user share to user share. Do not mix.

Very useful rule. I was already doing this intuitively, but I had not yet been in a situation with one or more failed disks.

4 hours ago, SSD said:

But I'll give a tip that would allow you to safely copy from a disk share to a user share. Go to the disk share, and RENAME the root level folder. Say it was called "Movies". Change it to "X" or "MovieTemp" or anything that is different from "Movies" and not the name of some other user share. [...] You also need to make sure that the user share configuration excludes (or does not include) that disk. You can then copy from that disk share to the user share. Or, copy from the user share "X" or "MovieTemp" or whatever you call it, to the "Movies" user share.

I realize that copying data with a failed disk in the array will be slower, but if there is no network share or unused disk available, that is the only way.

Do we have to test this, or does it work 100%?

4 hours ago, SSD said:

It is not necessary to Move, Copy is fine.

Move from the broken disk is the last thing you want...It must be Copy!

If this works as you said, SSD, it should definitely be included in the unRAID manual under a section like Shrink Array with Broken Disk(s).

Edited June 21, 2018 by FlorinB

SSD · June 21, 2018

5 minutes ago, FlorinB said:

Very useful rule. I was already doing this intuitively, but I had not yet been in a situation with one or more failed disks.

I realize that copying data with a failed disk in the array will be slower, but if there is no network share or unused disk available, that is the only way.

Actually, the speed should not be much slower than a normal copy operation. If you copied to a separate network share, and then had to copy it from there back to the array, I expect it would be slower. This is a one-time thing - he can just start it and let it run overnight.

Quote

Do we have to test this, or does it work 100%?

He can copy a few files and then attempt to access/play them from their destination. Could also check MD5s (that's what I typically do).
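For the MD5 check, something along these lines would do. It is a sketch using Python's standard `hashlib`; the helper names and the example paths in the note below are made up.

```python
import hashlib


def md5sum(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(source, copy):
    """True if both files hash to the same MD5 digest."""
    return md5sum(source) == md5sum(copy)
```

For example, `same_content("/mnt/disk1/Movies/a.mkv", "/mnt/disk2/Movies/a.mkv")` should return `True` for a good copy. Reading in chunks keeps memory use flat even for very large files.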

Quote

Move from the broken disk is the last thing you want...It must be Copy!

Not sure why you say that. You're not moving from a broken disk - you are moving from a simulated disk - which basically means some updates to parity. And very minimal ones, since deleting a file just marks it as deleted. But some file systems (RFS) are very slow to delete for some reason. Still, the copy is preferred because it is faster and he won't be losing access to the files in case he wants to do the MD5 verification.

Quote

If this works as you said, SSD, it should definitely be included in the unRAID manual under a section like Shrink Array with Broken Disk(s).

Hope jonp got the notification and asks someone working on the wiki to make the update.

FlorinB · June 21, 2018

26 minutes ago, SSD said:

Not sure why you say that. You're not moving from a broken disk - you are moving from a simulated disk - which basically means some updates to parity. And very minimal since deleting a file just marks it as deleted.

I mean that Copy would be preferred over Move. In case something goes wrong, with Copy you still have the data from the broken/simulated disk unaltered.

Edited June 21, 2018 by FlorinB

rbroberts · June 24, 2018

Thanks for all the feedback. I've successfully copied my files from the emulated disk and reconfigured the array. So far it looks good and all my data appear to be there. The parity resync is in progress.

rbroberts · June 24, 2018

Interesting. I can see my data, but somehow my Docker apps and VMs all disappeared from the Docker and VMs tabs.

For the docker apps, I can see the data still in appdata, so I'm not quite sure what I lost. I'm going to have to poke, but at this point there's no going back since the array has been reconfigured.

rbroberts · June 24, 2018

And...it looks like if I reinstall, it remembers my old configurations. For example, for RDP-Calibre, I can select template my-RDP-Calibre and all the paths come back. So I think this should be pretty painless, at least for the docker apps.
