Balancing Disk Space Among Shares/Disks



I've just added more disks to my system.  One of the original drives (a 4TB Seagate) is over 90% full.  The disk is part of a share (Movies) with 6 other disks.  Several new drives (a 5TB Seagate and two 3TB Hitachis) are filling with data in a nearly equal way; however, I'd like to free up some space on Disk 1 so that free space stays roughly proportional across all the disks.  I could just move files... annoying and slow.  What I'd like is for the system to automatically balance files across a number of disks or within a share.  Is there a utility or best practice for automatically maintaining that equilibrium?  I looked at Unbalance, but it looks like it's only a utility to move folders, not balance free space within a share (which would do what I want...).  I know that having multiple shares on a disk complicates this issue, but I'd be happy to limit shares per disk if that would help...

Link to comment

I'm in a similar boat. I have 15 4TB drives that are about 97% full. I just added two new drives and want to move some data around. Unfortunately, as far as I'm aware, there is no way to automatically rebalance data onto those drives, so I have been logging in to my server via SSH and moving it manually in Midnight Commander. It's slow, but at least it works. A word of caution, though: if you go this route and do it manually, do not copy from a disk to a share. You must copy from disk to disk, otherwise you can corrupt your data. In my case all of my top-level shares can use any disk, so I just copy from /disk1/media/movies/hd/files-to-be-moved to /diskx/media/movies/hd/destination. Depending on your folder structure, you may have to create some folders manually if they are several layers deep.

 

I don't know of any automatic way, though; it's all manual. In my case it's a real pain when you're working with 17 disks. I would ask why you want to move them at all, though. If your disks use XFS, then I don't think there is a performance issue with very full drives like ReiserFS has.  Just my two cents.

 

 


Link to comment

IMHO, it's much better to totally fill disks with folders of content that will be mostly static, and allow your most free drives to receive all new content. That way as disks fill up, you aren't juggling files around from disk to disk. Logically separate your archival stuff from your changing and new content.

Link to comment

IMHO, it's much better to totally fill disks with folders of content that will be mostly static, and allow your most free drives to receive all new content. That way as disks fill up, you aren't juggling files around from disk to disk. Logically separate your archival stuff from your changing and new content.

 

This makes sense.  Would you separate archival stuff from dynamic stuff onto physical disks as well?

Link to comment

IMHO, it's much better to totally fill disks with folders of content that will be mostly static, and allow your most free drives to receive all new content. That way as disks fill up, you aren't juggling files around from disk to disk. Logically separate your archival stuff from your changing and new content.

 

This makes sense.  Would you separate archival stuff from dynamic stuff onto physical disks as well?

Yes. It makes backups easier to manage. With each unRAID disk having a standalone filesystem, you only have to worry about drive-sized chunks of backups instead of trying to back up a user share that could span many drives. Some people prefer to scatter stuff evenly across all their drives, so if they lose a couple of drives only a portion of any particular share is gone. That view makes my brain itch. My user shares are confined to only as many drives as absolutely necessary to hold them. Example: disk1 has shares a, b, and c because they are all small. Disks 2, 3, and 4 only have movies; disks 5, 6, 7, and 8 only have TV shows, etc. The movie drives get filled fill-up style; the TV shows are split into ongoing vs. canceled, so the canceled shows can be packed to the brim while the ongoing ones have breathing room.
Link to comment

I'm from the "spread it out" school with moderation.

 

I do try to control what material gets written to which drive, but not to the extent that chicken eggs are always in basket 1 and duck eggs are in basket 2 and 3 etc.

  • The main consideration for me is that if I have a failed drive that somehow isn't recoverable from parity (or 2+ failed drives), recovering 4TB manually is, in my opinion, a lot less painful (and more likely to succeed) than recovering 8TB manually (despite the 50% chance of not having to recover anything because the failed drive is empty).

  • If the drive is completely unrecoverable, losing 50% of the movies and 50% of the TV shows is, in my opinion, a lot less painful than losing ALL the TV shows or ALL the movies.

  • I don't have the luxury of buying drives just to leave them empty.

 

The "moderation" part is that I don't strive to make sure each drive has exactly the same amount of free space. Close enough is fine. I basically just let unRAID decide based on a "Most Free" allocation.

 

Link to comment

I don't have the luxury of buying drives to leave them empty.

 

I could equally well counter that by saying that I don't have the luxury of buying drives just to leave them half full. I have six data drives plus two parity in my main server, with space to add more as and when I need them - a major advantage of unRAID vs other NAS solutions for me. Spinning up multiple half-empty drives is wasteful, IMO.

 

Link to comment

I could equally well counter that by saying that I don't have the luxury of buying drives just to leave them half full. I have six data drives plus two parity in my main server, with space to add more as and when I need them - a major advantage of unRAID vs other NAS solutions for me. Spinning up multiple half-empty drives is wasteful, IMO.

 

Fair enough, but that's just perspective. My perspective is that an empty drive is one that is not being used. A half-full drive is one that is in the process of getting filled up. It's my asset -> I like to see my assets work  ;D

 

Regarding the last sentence:

It doesn't matter if you have all data in 1 drive or data spread out in multiple drives with unRAID.

To access a single file, a drive must be spun up regardless of which scheme you are on.

unRAID doesn't stripe, so the entire file lives on a single drive and only that one drive gets spun up => hence no unnecessary spinning up anyway.

And if you are considering the scenario when multiple people access multiple files then it becomes a matter of performance vs electricity.

Link to comment

I'm from the "spread it out" school with moderation.

  • If the drive is completely unrecoverable, losing 50% of the movies and 50% of the TV shows is, in my opinion, a lot less painful than losing ALL the TV shows or ALL the movies.

Ahh, there is the major point of contention for me. I would rather know I lost all of something, than not know which items were gone until I tried to access them. Reripping disks or reloading from backups is much simpler when you know for sure which items you need.

 

I realize this is mostly personal preference, and what causes one person distress is totally different from what causes it for another. Borderline OCD or a compulsion to be in control of as much as possible rather than leaving things up to chance is what drives my philosophy.

Link to comment

My perspective is that an empty drive is one that is not being used. A half-full drive is one that is in the process of getting filled up.

 

But I don't have any empty drives in that server. I have full ones and one that's "in the process of getting filled up." I also have one standing by for when I need it either as the next in the array or to replace a failed drive. Naturally, it's tested and pre-cleared. Although it's true to say that it isn't actively being used, and therefore something of a "wasted asset", it isn't using any electricity either.

 

It doesn't matter if you have all data in 1 drive or data spread out in multiple drives with unRAID.

To access a single file, a drive must be spun up regardless of which scheme you are on.

unRAID doesn't stripe, so the entire file lives on a single drive and only that one drive gets spun up => hence no unnecessary spinning up anyway.

 

Actually, it isn't that simple. When you open a folder that is split over multiple disks those disks all have to spin up, just to provide the information necessary for the folder window. Also, parity checks require all (i.e. more) disks to spin.

 

And if you are considering the scenario when multiple people access multiple files then it becomes a matter of performance vs electricity.

 

If I needed more performance I would use some form of RAID, not unRAID. But I don't need more performance. unRAID works very well for me.

 

Link to comment

Actually, it isn't that simple. When you open a folder that is split over multiple disks those disks all have to spin up, just to provide the information necessary for the folder window. Also, parity checks require all (i.e. more) disks to spin.

 

If I needed more performance I would use some form of RAID, not unRAID. But I don't need more performance. unRAID works very well for me.

 

The Cache Dir plugin works most of the time, so I can browse folders without spinning up drives. I tested it and it works.

Parity checks have to spin up all drives anyway, so I don't think that point is relevant.

 

And I did not mention anything about RAID. Within the context of unRAID, if two people access two different files on the same HDD, the disk has to seek back and forth to serve both. If two people access two files on two different HDDs, each HDD serves its own file, with no extra seeking. That's the performance point.

 

I think we can agree to disagree and leave it there, as it really depends on each person's perspective.

Link to comment

I don't have the Cache Dir plugin installed. I might give it a try.

 

My point about the parity check was that since all drives need to spin, it's more efficient to have fewer of them that are full than more that are half empty.

Irrelevant.  Parity checks / rebuilds always operate at the block level, i.e. they read or write every sector on the drive.
Link to comment

I don't have the Cache Dir plugin installed. I might give it a try.

 

My point about the parity check was that since all drives need to spin, it's more efficient to have fewer of them that are full than more that are half empty.

Irrelevant.  Parity checks / rebuilds always operate at the block level, i.e. they read or write every sector on the drive.

 

What I mean is that by filling up disks one at a time and adding to the array only when a new one is needed, as opposed to having the same amount of data spread over more part-full disks, uses less electricity. That's my point. I'm sorry I didn't explain it well enough.

 

Link to comment
  • 2 weeks later...

I have six data drives plus two parity in my main server, with space to add more as and when I need them - a major advantage of unRAID vs other NAS solutions for me.

 

I didn't know you could have multiple parity drives!  Or is it a RAID 0 array (which is a cool idea now that I think about it...)?

 

unRAID 6.2 supports dual parity.

 

Link to comment

I don't have the Cache Dir plugin installed. I might give it a try.

 

My point about the parity check was that since all drives need to spin, it's more efficient to have fewer of them that are full than more that are half empty.

 

As already noted, the amount of data on your drives has NO impact on parity check speed => the parity check reads the entire disk regardless of whether it's full of data or completely empty, so it's irrelevant whether or not there is any data on the drives.

 

As for balancing data across the drives => clearly that's a matter of personal preference.    Note, however, that writes to a nearly empty drive are appreciably faster than writes to a nearly full one, since they land on the outer cylinders of the drive, which typically have data transfer speeds more than double those of the inner cylinders.  If you balance all the writes uniformly (i.e. a "most free" allocation), then overall performance will degrade uniformly as the array fills up.
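The outer-vs-inner figure follows from geometry: the platter spins at constant RPM, so the linear speed under the head, and with it the sustained transfer rate, scales roughly with track radius. A quick sanity check with assumed data-zone radii for a 3.5-inch platter (both numbers are illustrative guesses, not specs):

```shell
awk 'BEGIN {
  inner = 22; outer = 47   # mm: assumed innermost/outermost data-zone radii
  # At fixed RPM, more media passes under the head per revolution on the
  # outer tracks, so the transfer rate scales roughly with radius.
  printf "outer/inner transfer ratio: ~%.1fx\n", outer / inner
}'
```

That ratio of a bit over 2x is consistent with the "more than double" behavior seen in drive benchmarks.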

 

Link to comment

... What I mean is that by filling up disks one at a time and adding to the array only when a new one is needed, as opposed to having the same amount of data spread over more part-full disks, uses less electricity. That's my point. I'm sorry I didn't explain it well enough.

 

I don't think any particular allocation scheme has much impact on power utilization.  Regardless of the allocation scheme, only the drive being written to [plus the parity drive(s)] will be spun up for a write.    True, if you're writing a lot of data that results in multiple drive selections, a few more drives may be spun up during the overall process ... but assuming a nominal spin-down time, this isn't going to have any appreciable impact on total power consumption.

 

FWIW I use the "fill-up" method for my media server, so drives are filled before new ones are used; but it's really irrelevant which method you use.    The split level is more significant, since you don't want multiple disks spinning up to play back a single media item -- so you don't want its constituent files scattered across multiple disks.

 

Link to comment

I'm obviously still not explaining myself clearly enough.  ;)

 

What I'm saying is that when I first started using this server I installed a parity disk plus one data disk, which I then proceeded to fill. As it became close to full, I added another. When that got close to full I added a third. And so on. I have since added a second parity and will soon be adding my seventh data disk. I am therefore not spinning up empty disks during parity checks, which is more efficient because it saves electricity. That is my point - nothing more.  :)

Link to comment

I'm obviously still not explaining myself clearly enough.  ;)

 

What I'm saying is that when I first started using this server I installed a parity disk plus one data disk, which I then proceeded to fill. As it became close to full, I added another. When that got close to full I added a third. And so on. I have since added a second parity and will soon be adding my seventh data disk. I am therefore not spinning up empty disks during parity checks, which is more efficient because it saves electricity. That is my point - nothing more.  :)

I know what you mean, but it might still be misinterpreted. Empty disks do spin during parity checks; what you mean is that you don't have any empty disks to spin.
Link to comment

I'm obviously still not explaining myself clearly enough.  ;)

 

What I'm saying is that when I first started using this server I installed a parity disk plus one data disk, which I then proceeded to fill. As it became close to full, I added another. When that got close to full I added a third. And so on. I have since added a second parity and will soon be adding my seventh data disk. I am therefore not spinning up empty disks during parity checks, which is more efficient because it saves electricity. That is my point - nothing more.  :)

I know what you mean, but it might still be misinterpreted. Empty disks do spin during parity checks; what you mean is that you don't have any empty disks to spin.

 

Exactly that, and only one partially empty disk in that particular server. I'll drop the subject now. :)

 

Link to comment

True.  If you don't have any empty disks in your array then there won't be any "extra" disks spinning during a parity check.

 

This indeed saves a bit during parity checks.    A typical NAS-rated disk (e.g. a 4TB WD Red) uses about 4.5W when spun up and reading, so over a 10-hour parity check an empty drive would waste about 45Wh ... or about 540Wh/year if you do a monthly parity check.  An unused drive would also be consuming about 0.4W when spun down, which adds up to about 3.5kWh/year of additional power consumption => so including 12 parity checks you'd waste about 4kWh/year for an unused drive ... at current average US power costs ($0.12/kWh) that's about $0.48/year for an unused drive.
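Those numbers check out as a back-of-envelope estimate. Here is the same arithmetic spelled out, using the assumed inputs above (4.5W reading, 0.4W in standby, ten-hour checks, 12 checks a year, $0.12/kWh):

```shell
kwh=$(awk 'BEGIN {
  check_wh   = 4.5 * 10 * 12        # 12 ten-hour parity checks at 4.5 W (Wh)
  standby_wh = 0.4 * (8760 - 120)   # spun down the rest of the year at 0.4 W (Wh)
  printf "%.2f", (check_wh + standby_wh) / 1000
}')
echo "extra drive overhead: $kwh kWh/year"
awk -v k="$kwh" 'BEGIN { printf "cost at $0.12/kWh: $%.2f/year\n", k * 0.12 }'
```

This lands on roughly 4 kWh and about $0.48 per year, matching the figures above.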

 

Yes, that's a savings ... but hardly enough to really matter  :)

Link to comment
