Rebalance data drives.


Recommended Posts

Whenever I add a new drive, I end up with a situation where I have something like:

 

disk1 90% full

disk2 90% full

disk3 90% full

disk4 0% full

 

I'd like to see a function (in the form of a script or a button or something) that moves files around so that in the end I end up with:

 

disk1 67.5% full

disk2 67.5% full

disk3 67.5% full

disk4 67.5% full

 

Hope this is something that can be scheduled.  Maybe it can be done entirely within userland shell scripting.
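The even target in the example above is just the mean of the current fill levels (67.5% for three disks at 90% and one at 0%). A minimal bash sketch of that calculation; avg_used is a hypothetical helper name, not anything unRAID provides:

```shell
#!/bin/bash
# Compute the fill level every disk would end up at after an even rebalance,
# i.e. the average of the current used percentages.
avg_used() {
    local sum=0 n=0 pct
    for pct in "$@"; do
        sum=$((sum + pct))
        n=$((n + 1))
    done
    echo $((sum / n))   # integer division, so 67.5 prints as 67
}

# On a live server the inputs could come from something like:
#   df --output=pcent /mnt/disk* | tail -n +2 | tr -d ' %'
avg_used 90 90 90 0   # prints 67
```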

Link to comment

Why is this a problem?

Well, why does UNRAID fill drives fairly equally?

 

It makes sure everything isn't just on one drive, which would then fail earlier through greater use. The end part of the drive is also the slowest to access, etc.

 

Any mover would need to take account of the allocation mechanism that keeps certain levels of the directory structure together, but levelling the usage would be good. Maybe could be combined with 'scrub' functionality to check for bit rot etc.

Link to comment

Why is this a problem?

It's a feature request.  It also acts towards load balancing - which is good.

 

It also would help when I have drives that are 90%+ full.  Personally, I currently have 4 drives at 99%, and one drive at 8%. 

 

Many filesystems (and I don't know the specifics for XFS, BTRFS, and ReiserFS) also hit performance issues when you get close to capacity.

 

Mostly it would just make me happy.  I should be able to work out something via bash, I'm sure.  I'll post it in here if/when I manage to build something useful.

Link to comment

There is a benefit to having disks "fill up" and then become (mostly) read-only going forward. If you wanted to create PAR blocks, for example, a full drive would be a good thing to create them for. For backup purposes, it is also useful to only have to focus on a single disk as the source of new files.

 

It is true that drives can become too full and impact performance. It can also impact the ability to run recovery tools.

 

I (personally) am not excited about an auto-rebalance feature, but neither do I object to your requesting it.

 

One comment - if you intend to do any manual rebalancing: DO NOT copy data from a DISK share to a user share, or vice versa. There is a bug in the user share system that can easily result in data loss. (If you think you have a clever way to get around this issue, confirm on the forum. Because the internals of user shares are confusing, whatever you are thinking may NOT avoid the data loss. You have been warned!) Re-balancing manually is easy to achieve by moving data from a DISK share to a DISK share. I suggest using Teracopy to do this move operation, verifying the CRCs as it goes. Not the fastest, but it leaves you with a warm feeling that all of the files were accurately moved.

Link to comment

The big benefit to me is that, given the user share allocation / split level rules, it's quite easy to run out of space on a disk for specific paths, since the split levels try to co-locate data according to those rules.

 

At which point you need to manually shuffle data around the disks to free up space.

 

You can either do that or change the split levels, but there is a middle ground. It does mean, I think, that any sane rebalancing script needs to be aware of the split level rules in effect and treat the affected data specially, which complicates the process tenfold.

Link to comment

I didn't want to go too far with this since I'm unsure how to address split level data, and I really just wanted a quick fix, so I wrote a script that can be run manually for cleanup; it stops once the disk is down to 90% full (my disks are 99% full).

 

#!/bin/bash
DISK=disk2
TARGET=disk5
# Note: df's fifth column is the *used* percentage, so FREE (despite the
# name) tracks how full the disk is; the loop runs while usage exceeds 90%.
FREE=$(df -h /mnt/$DISK | awk '{print $5}' | tail -n 1 | tr -d '%')
while [ "$FREE" -gt 90 ] ; do
        cd /mnt/$DISK/Movies/ || exit 1
        # Move the first entry (alphabetically) to the same path on the target.
        i=$(ls -1 | head -n 1)
        df -h /mnt/$DISK
        echo "$i"
        mv -v /mnt/$DISK/Movies/"$i" /mnt/$TARGET/Movies/
        FREE=$(df -h /mnt/$DISK | awk '{print $5}' | tail -n 1 | tr -d '%')
done

 

There is code cleanup to be done (I think I can get rid of the FREE variable, which actually holds the used percentage, and just inline it), and it currently only works in my Movies directory (which seems safe!).  Maybe this will be of use to others.

 

Maybe rebalance isn't the best term for what I need to happen (vs. what I would like to happen), which is pressure relief.

Link to comment

When I needed to clean up my data (due to split levels not being set after a new config), I just copied things in 1TB chunks to the cache and let the mover sort it out.

For the benefit of others just watching this thread, if you move data to the cache drive, you should always use /mnt/disk? or \\tower\disk? and /mnt/cache or \\tower\cache paths to accomplish this. Using /mnt/user/share and /mnt/user0/share or \\tower\share is VERY DANGEROUS because of the way user shares are handled right now. You could easily lose data if you use user shares to move things around.
Link to comment

The easiest way to avoid this is to simply not wait until your drives are so full before adding additional storage.  If you add another drive when your average hits, for example, 80%, then (assuming you're using high water) you'll never have to worry about getting the drives so full that you feel the need to move data around.  In other words, UnRAID already has a method for balancing the drives -- high water allocation.  You simply have to provide enough drives for this to work as designed, instead of only adding drives after your current ones are nearly full.

 

Personally, I have no issue with filling up drives -- there's no performance issue for reads ... only for writes; and none of the full drives is ever written to, so it's not an issue.  But if you prefer to see "balanced" drives, there's nothing wrong with that.

 

Incidentally, I don't agree with the rationale that balancing "... makes sure everything isn't just on one drive, which would then fail earlier through greater use ..."  ==> drives are designed to be used; and in fact once a drive is full it's most likely going to get far less use. 

 

Link to comment

It was simply a question of me not paying attention to my free space.  As I said, I've worked around the problem this way:

 

#!/bin/bash
DISK=disk2
TARGET=disk5
SHARE=Anime-Series
# df's fifth column is the *used* percentage; loop while usage exceeds 90%.
FREE=$(df -h /mnt/$DISK | awk '{print $5}' | tail -n 1 | tr -d '%')
while [ "$FREE" -gt 90 ] ; do
        cd /mnt/$DISK/$SHARE/ || exit 1
        # Move the first entry (alphabetically) to the same share on the target.
        i=$(ls -1 | head -n 1)
        df -h /mnt/$DISK
        echo "$i"
        mv -v /mnt/$DISK/$SHARE/"$i" /mnt/$TARGET/$SHARE/
        FREE=$(df -h /mnt/$DISK | awk '{print $5}' | tail -n 1 | tr -d '%')
done

 

This is updated, and it just takes a little babysitting, but otherwise seems to work well. 

 

I'd love to see something a bit more integrated in the future though.  Maybe something that honors and fixes split levels for directories that didn't originally have one set properly.

 

I'm installing a disk6 next week that should handle my free space problems for a while.

Link to comment

My suggestion would be to use rsync and not mv.

 

 

With rsync you can use --remove-source-files, which removes each source file after a successful transfer.

It will not delete empty directories, but that can be done at the end (see the mover).

 

 

There's a weird bug that crops up when you do a move to a disk share that can cause truncation of the destination file.

With rsync, the file is first copied to a temp file before being moved into place.

 

 

I don't know if this bug will rear its ugly head with that script.

 

 

User Share Copy Bug

http://lime-technology.com/forum/index.php?topic=34480.msg320517#msg320517

Link to comment

I will keep that in mind.  I may be more likely to steal some of the code from mover:

 

    find "./$Share" -depth \( \( -type f ! -exec fuser -s {} \; \) -o \( -type d -empty \) \) -print \
      -exec rsync -i -dIWRpEAXogt --numeric-ids --inplace {} /mnt/user0/ \; -delete

 

I may be kept up at night trying to grok that find statement though.
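For anyone else losing sleep over it, here is my reading of that find expression, piece by piece (the annotations are my interpretation, so check them against the man pages):

```shell
# find "./$Share" -depth \
#   \( \( -type f ! -exec fuser -s {} \; \) -o \( -type d -empty \) \) \
#   -print -exec rsync ... {} /mnt/user0/ \; -delete
#
# -depth                 process a directory's contents before the directory
#                        itself, so directories emptied by the move can match
#                        the -empty test on the same pass
# -type f ! -exec fuser -s {} \;
#                        regular files that are NOT currently open: fuser -s
#                        succeeds when some process has the file open, and the
#                        leading ! inverts that result
# -o \( -type d -empty \)
#                        ...or directories that are already empty
# -print                 log each match
# -exec rsync ... \;     copy the match into /mnt/user0/, preserving paths
#                        (-R) and attributes
# -delete                remove the source, but only if the rsync succeeded
```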

Link to comment

Figured out the find command, I think.  At least well enough to see that it's looking for files not in use, or empty directories.

 

Listing out that bear of an rsync though:

 

-i, --itemize-changes    output a change-summary for all updates
-d, --dirs               transfer directories without recursing
-I, --ignore-times       don't skip files that match in size and mod-time
-W, --whole-file         copy files whole (without delta-xfer algorithm)
-R, --relative           use relative path names
-p, --perms              preserve permissions
-E, --executability      preserve the file's executability
-A, --acls               preserve ACLs (implies --perms)
-X, --xattrs             preserve extended attributes
-o, --owner              preserve owner (super-user only)
-g, --group              preserve group
-t, --times              preserve modification times
    --numeric-ids        don't map uid/gid values by user/group name
    --inplace            update destination files in-place (SEE MAN PAGE)

 

Link to comment

I've often wanted a 'defrag' of folders kind of function- where I had a folder of smaller size at one time, and as I added files it got much bigger. Then when I access that directory all my drives are spinning up (due to configuration). The rebalance is sort of the opposite- but still in the same vein of figuring out how to manage files in a user share.

Link to comment

I think this use case is much more common.  I certainly balance manually because of this.  It is surprisingly tricky to do, though, because of the time it takes to run.

Link to comment

I suspect that there's a central methodology that could be used to do this, and then it could be used for things like rebalance and reprotect.

 

One of my problems is that I don't know where the open source parts of this project end and the closed source parts begin.

 

Is there a resource about that?

 

I may have to spend some time digging about the wiki.

 

EDIT: My assumption is that EMHTTP and SHFS are LimeTech, and everything else is considered open source or user contributed plugin.  I don't know if there's something else I'm missing though.

Link to comment
