bmfrosty Posted November 3, 2014

Whenever I add a new drive, I end up with a situation where I have something like:

disk1 90% full
disk2 90% full
disk3 90% full
disk4 0% full

I'd like to see a function (in the form of a script or a button or something) that moves files around so that in the end I end up with:

disk1 67.5% full
disk2 67.5% full
disk3 67.5% full
disk4 67.5% full

Hope this is something that can be scheduled. Maybe it can be done entirely within userland shell scripting.
SSD Posted November 3, 2014

Why is this a problem?
sane Posted November 3, 2014

> Why is this a problem?

Well, why does UNRAID fill drives fairly equally? It makes sure everything isn't just on one drive, which would then fail earlier through greater use. The end part of the drive is also the slowest to access, etc.

Any mover would need to take account of the allocation mechanism that keeps certain levels of the directory structure together, but levelling the usage would be good. Maybe it could be combined with 'scrub' functionality to check for bit rot, etc.
lionelhutz Posted November 3, 2014

> Well, why does UNRAID fill drives fairly equally?

Because you set the user share that way. You can also set it to fill each drive.
bmfrosty Posted November 4, 2014

> Why is this a problem?

It's a feature request. It also acts towards load balancing, which is good, and it would help when I have drives that are 90%+ full. Personally, I currently have four drives at 99% and one drive at 8%. Many filesystems (I don't know the specifics for XFS, BTRFS, and ReiserFS) also hit performance issues when you get close to capacity.

Mostly it would just make me happy. I should be able to work out something via bash, I'm sure. I'll post it in here if/when I manage to build something useful.
SSD Posted November 4, 2014

> It's a feature request. It also acts towards load balancing, which is good, and it would help when I have drives that are 90%+ full. Personally, I currently have four drives at 99% and one drive at 8%.

There is a benefit to having disks "fill up" and then become (mostly) read-only going forward. If you wanted to create PAR blocks, for example, a full drive would be a good thing to create them for. For backup purposes, it is also useful to only have to focus on a single disk as the source of new files.

It is true that drives can become too full and impact performance. It can also impact the ability to run recovery tools. I (personally) am not excited about an auto-rebalance feature, but neither do I object to your requesting it.

One comment if you intend to attempt manual rebalancing: DO NOT copy data from a DISK share to a user share, or vice versa. There is a bug in the user share system that can easily result in data loss. (If you think you have a clever way to get around this issue, confirm on the forum, because the internals of user shares are confusing and whatever you are thinking may NOT avoid the data loss. You have been warned!)

Rebalancing manually is easy to achieve by moving data from a DISK share to a DISK share. I suggest using Teracopy to do this move, verifying the CRCs as it goes. Not the fastest, but it leaves you with a warm feeling that all of the files were accurately moved.
boof Posted November 4, 2014

> It is true that drives can become too full and impact performance. It can also impact the ability to run recovery tools. I (personally) am not excited about an auto-rebalance feature, but neither do I object to your requesting it.

The big benefit to me is that, given the user share allocation / split level rules, it's quite easy to run out of space on a disk for specific paths of data, as the split levels try to colocate everything per the rules. At that point you need to manually shuffle data around the disks to free up space. You can either do that or change the split levels, but there is a middle ground there.

It does mean, I think, that any sane rebalancing script needs to be aware of and take into account the split level rules in effect, and treat the affected data specially. Which complicates the process tenfold.
bmfrosty Posted November 4, 2014

I didn't want to go too far with this since I'm unsure how to address split level data, and I really just want a quick fix, so I wrote one that can be used manually to do cleanup, but stops when a disk is down to 90% full (my disks are 99% full).

#!/bin/bash
DISK=disk2
TARGET=disk5
# Note: despite the name, FREE holds the *used* percentage reported by df.
FREE=`df -h /mnt/$DISK | awk '{print $5}' | tail -n 1 | sed -e 's/\%//g'`
while [ $FREE -gt '90' ] ; do
  cd /mnt/$DISK/Movies/
  for i in "`ls -1 | head -n 1`" ; do
    df -h /mnt/$DISK
    echo "$i"
    mv -v /mnt/$DISK/Movies/"$i" /mnt/$TARGET/Movies/
    FREE=`df -h /mnt/$DISK | awk '{print $5}' | tail -n 1 | sed -e 's/\%//g'`
  done
done

There is code cleanup to be done (I think I can get rid of the FREE variable and just inline it), and it currently only works in my Movies directory (which seems safe!). Maybe this will be of use to others.

Maybe rebalance isn't the best term for what I need to happen (vs. what I would like to happen), which is pressure relief.
bmfrosty Posted November 4, 2014

For anyone thinking about using this: it doesn't deal with exceptions at all. Use at your own risk.
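For anyone who wants a slightly safer starting point, here is a hedged sketch of the same pressure-relief loop with basic guards added. This is illustrative only, not a drop-in replacement: the helper name `parse_used`, the directory checks, and the 90% threshold simply mirror the script above.

```shell
#!/bin/bash
# Hardened sketch of the pressure-relief loop above (illustrative only).
set -u
DISK=disk2
TARGET=disk5
SHARE=Movies

# Extract the "Use%" column from df output on stdin (e.g. "99%" -> "99").
parse_used() { awk 'NR==2 { gsub(/%/, "", $5); print $5 }'; }

# Only run if both the source and target share directories actually exist.
if [ -d "/mnt/$DISK/$SHARE" ] && [ -d "/mnt/$TARGET/$SHARE" ]; then
  while [ "$(df -P /mnt/$DISK | parse_used)" -gt 90 ]; do
    # Move the alphabetically first entry; stop when the directory is empty
    # or a move fails, rather than looping forever.
    i=$(ls -1 "/mnt/$DISK/$SHARE/" | head -n 1)
    [ -n "$i" ] || break
    mv -v "/mnt/$DISK/$SHARE/$i" "/mnt/$TARGET/$SHARE/" || break
  done
fi
```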
StevenD Posted November 4, 2014

When I needed to clean up my data (due to split levels not being set after a new config), I just copied things in 1TB chunks to the cache and let the mover sort it out.
bmfrosty Posted November 4, 2014

Ah. I may or may not be having mover issues right now due to the disks being full. I like the idea, but I'd prefer not to copy twice. Also, I'm probably moving to a 256 GB SSD for cache and VMs soon, so that may be out of the question too.
JonathanM Posted November 4, 2014

> When I needed to clean up my data (due to split levels not being set after a new config), I just copied things in 1TB chunks to the cache and let the mover sort it out.

For the benefit of others just watching this thread: if you move data to the cache drive, you should always use /mnt/disk? or \\tower\disk? and /mnt/cache or \\tower\cache paths to accomplish this. Using /mnt/user/share and /mnt/user0/share or \\tower\share is VERY DANGEROUS because of the way user shares are handled right now. You could easily lose data if you use user shares to move things around.
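One way to make that rule hard to forget is to wrap the move in a guard that refuses any user-share path. The helper below is hypothetical (the name `safe_move` and its shape are not part of unRAID); it just encodes the advice in this thread: disk-to-disk moves only, via rsync so each file is fully copied before its source is removed.

```shell
# Hypothetical guard around a disk-to-disk move: refuse any path that goes
# through the user-share layer (/mnt/user or /mnt/user0), then move with
# rsync so sources are only deleted after a successful copy.
safe_move() {
  src=$1 dst=$2
  case "$src:$dst" in
    */mnt/user*) echo "refusing: user-share path involved" >&2; return 1 ;;
  esac
  mkdir -p "$dst" && rsync -a --remove-source-files "$src"/ "$dst"/
}
```

So `safe_move /mnt/disk2/Movies /mnt/disk5/Movies` would proceed, while anything under /mnt/user or /mnt/user0 is rejected before a single byte moves.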
garycase Posted November 5, 2014

The easiest way to avoid this is simply not to wait until your drives are so full before adding additional storage. If you add another drive when your average hits, for example, 80%, then (assuming you're using high water) you'll never have to worry about the drives getting so full that you feel the need to move data around. In other words, UnRAID already has a method for balancing the drives -- high water allocation. You simply have to provide enough drives for this to work as designed, instead of only adding drives after your current ones are nearly full.

Personally, I have no issue with filling up drives -- there's no performance issue for reads, only for writes; and none of the full drives is ever written to, so it's not an issue. But if you prefer to see "balanced" drives, there's nothing wrong with that.

Incidentally, I don't agree with the rationale that balancing "... makes sure everything isn't just on one drive, which would then fail earlier through greater use ..." ==> drives are designed to be used; and in fact, once a drive is full it's most likely going to get far less use.
bmfrosty Posted November 5, 2014

> The easiest way to avoid this is simply not to wait until your drives are so full before adding additional storage. If you add another drive when your average hits, for example, 80%, then (assuming you're using high water) you'll never have to worry about the drives getting so full that you feel the need to move data around. In other words, UnRAID already has a method for balancing the drives -- high water allocation.

It was simply a question of me not paying attention to my free space. As I said, I've worked around the problem this way:

#!/bin/bash
DISK=disk2
TARGET=disk5
SHARE=Anime-Series
FREE=`df -h /mnt/$DISK | awk '{print $5}' | tail -n 1 | sed -e 's/\%//g'`
while [ $FREE -gt '90' ] ; do
  cd /mnt/$DISK/$SHARE/
  for i in "`ls -1 | head -n 1`" ; do
    df -h /mnt/$DISK
    echo "$i"
    mv -v /mnt/$DISK/$SHARE/"$i" /mnt/$TARGET/$SHARE/
    FREE=`df -h /mnt/$DISK | awk '{print $5}' | tail -n 1 | sed -e 's/\%//g'`
  done
done

This is updated, and it just takes a little babysitting, but otherwise seems to work well. I'd love to see something a bit more integrated in the future, though. Maybe something that honors and fixes split levels for directories that didn't originally have one set properly.

I'm installing a disk6 next week that should handle my free space problems for a while.
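For anyone else who'd rather not discover the problem at 99%, a trivial way to keep an eye on fill levels is a one-liner over the standard /mnt/disk* mount points (assumed here; adjust if your mounts differ):

```shell
# Print the used percentage of each mounted array disk; silently skips
# anything in the argument list that is not an existing directory.
report_fill() {
  for d in "$@"; do
    [ -d "$d" ] || continue
    df -hP "$d" | awk -v d="$d" 'NR==2 { print d ": " $5 " used" }'
  done
}
report_fill /mnt/disk*
```

Dropped into a daily cron job (or just run by hand), this gives enough warning to add a drive before high water runs out of room.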
WeeboTech Posted November 5, 2014

My suggestion would be to use rsync and not mv. With rsync you can use --remove-source-files; it will remove each source file after a successful move. It will not delete empty directories, but that can be done at the end (see the mover).

There's a weird bug that crops up when you do a move to a disk share that can cause truncation of the destination file. With rsync, the file is first copied to a temp file before being moved into place. I don't know if this bug will rear its ugly head with that script.

User Share Copy Bug: http://lime-technology.com/forum/index.php?topic=34480.msg320517#msg320517
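As a concrete sketch of that suggestion (illustrative; the function name `rsync_move` is made up for this example): rsync with default options copies each file to a temporary name and renames it into place, --remove-source-files deletes each source file only after its copy succeeds, and a find pass cleans up the empty directories rsync leaves behind.

```shell
# Move one directory tree into another the way described above: copy-then-
# delete per file via rsync, then prune the now-empty source directories.
rsync_move() {
  src=$1 dst=$2
  mkdir -p "$dst" &&
  rsync -a --remove-source-files "$src"/ "$dst"/ &&
  find "$src" -mindepth 1 -depth -type d -empty -delete
}
```

Unlike a bare mv, an interrupted run leaves every file intact on one side or the other, so the operation can simply be re-run.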
bmfrosty Posted November 5, 2014

I will keep that in mind. I may be more likely to steal some of the code from mover:

find "./$Share" -depth \( \( -type f ! -exec fuser -s {} \; \) -o \( -type d -empty \) \) -print \
  -exec rsync -i -dIWRpEAXogt --numeric-ids --inplace {} /mnt/user0/ \; -delete

I may be kept up at night trying to grok that find statement, though.
bmfrosty Posted November 5, 2014

Figured out the find command, I think. At least well enough to see that it's looking for files not in use, or empty directories. Listing out that bear of an rsync, though:

-i, --itemize-changes    output a change-summary for all updates
-d, --dirs               transfer directories without recursing
-I, --ignore-times       don't skip files that match in size and mod-time
-W, --whole-file         copy files whole (without delta-xfer algorithm)
-R, --relative           use relative path names
-p, --perms              preserve permissions
-E, --executability      preserve the file's executability
-A, --acls               preserve ACLs (implies --perms)
-X, --xattrs             preserve extended attributes
-o, --owner              preserve owner (super-user only)
-g, --group              preserve group
-t, --times              preserve modification times
    --numeric-ids        don't map uid/gid values by user/group name
    --inplace            update destination files in-place (SEE MAN PAGE)
garycase Posted November 5, 2014

As Weebo noted, you definitely want to avoid the share copy bug => your script references the disk shares, so it should be fine ... but be sure you don't forget about avoiding share references if you modify it in the future.
redbeard Posted November 9, 2014

I've often wanted a 'defrag'-of-folders kind of function: I had a folder of smaller size at one time, and as I added files it got much bigger. Now when I access that directory, all my drives spin up (due to configuration). The rebalance is sort of the opposite, but still in the same vein of figuring out how to manage files in a user share.
NAS Posted November 10, 2014

> I've often wanted a 'defrag'-of-folders kind of function: I had a folder of smaller size at one time, and as I added files it got much bigger. Now when I access that directory, all my drives spin up (due to configuration).

I think this use case is much more common. I certainly balance manually because of this. It is surprisingly tricky to do, though, because of the time it takes to run.
switchman Posted November 10, 2014

I too balance manually; I just went through all my drives and consolidated the directories to a single drive. I built an Excel sheet to help me. It reads a top-level share, shows the drives it is located on, and lets you build an rsync command to consolidate the directories.

http://lime-technology.com/forum/index.php?topic=33689.msg328265#msg328265
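The same information the spreadsheet works from can be pulled straight from the shell. A rough sketch (the function name `share_layout`, the Movies share, and the /mnt/disk* paths are all examples):

```shell
# For a given share, show which disks hold it and how much space its
# top-level folders occupy on each disk.
share_layout() {
  share=$1; shift
  for d in "$@"; do
    [ -d "$d/$share" ] || continue
    echo "== $d/$share =="
    du -sh "$d/$share"/* 2>/dev/null
  done
}
share_layout Movies /mnt/disk*
```

From that listing, picking which folders to consolidate onto which disk (and building the corresponding disk-to-disk rsync commands) is straightforward.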
NAS Posted November 10, 2014

I still use my hacky but adequate script, which I call undist, from 5(?) years ago:

http://lime-technology.com/forum/index.php?topic=2689.msg27208#msg27208
bmfrosty Posted November 11, 2014

I suspect that there's a central methodology that could be used to do this, and then it could be used for things like rebalance and reprotect. One of my problems is that I don't know where the open source parts of this project end and the closed source parts begin. Is there a resource about that? I may have to spend some time digging around the wiki.

EDIT: My assumption is that EMHTTP and SHFS are LimeTech's, and everything else is considered open source or a user-contributed plugin. I don't know if there's something else I'm missing, though.