"Keep together" split mode - please


NLS


This is one thing that didn't change between my last unRAID install 12 years ago and the current one.

I find that the system doesn't try hard enough to keep folders together on the same disk.

High-water really can't help.

I think some of us really need a mode that keeps folders, down to a chosen depth, together as much as possible on the disk where the share ALREADY is (i.e. one that does not decide based on disk number or on free space, as long as the data fits), and that can then overflow to another disk (so the "manually split" method does not cover it either).

 

Is there any chance for this?

 

Link to comment

What criteria would this be based on?

There has to be the possibility for simple If This Then That logic to decide when/where to split the data. This is a drawback of the merged filesystem approach to bulk data storage that Unraid takes. If a file is larger than a single volume, it can't go on the array at all. Similarly, if a directory or set of directories is larger than a single volume, it must be split across multiple volumes.

I can't think of a way any software engineer could approach this in a logical and universal fashion. Sure, we could write an implementation for one specific use case, but that's about the limit.

The only thing I can think of would be to use a "keep together" list and allow selecting specific folders to try to force onto a singular volume. If those directories ever didn't fit the free space of a disk, it would have to store data on another disk. It's just not feasible imho.

Link to comment

Although I am not active in that field any more, I do come from a programmer's background.

I don't agree at all with your analysis.

(I do agree about the limitations of merged fs but this is not a problem right now)

 

The functionality I am asking for is pretty simple:

First attempt to put files in EXISTING folders, matching up to the depth the new files have (with cache in use this can be even smarter, as the system directly knows the tree structure and sizes).

If there is not enough space when matching the full depth of the tree, step back one level and see if breaking at that level is ok. And so on.

Actually even high-water should work like that (maybe it does), just with a different destination space check.

Use the unRAID disk number (and create the structure on another disk) only if the above check cannot find enough space.

 

I can put down full analysis and examples if needed and if anybody in the dev team bothers to consider it.

 

Edited by NLS
Link to comment
On 11/27/2019 at 2:22 AM, NLS said:

The functionality I ask is pretty simple: [...]


Ah, I misunderstood the initial request.

Yes, that makes sense and is approachable from a software engineering perspective.

Start at the source tree level 0. If tree level 0 and all its subdirectories fit, just merge with the existing tree.
If tree level 0 doesn't fit, jump to tree level 1 and iterate over the subdirectories in level 1 until something doesn't fit. Once something doesn't fit, make a new tree on another drive. Simple enough.

I was thinking this request was for some arbitrary content type. If it's just based on folders it's not that hard.
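
A rough sketch of that two-level check, in Python rather than anything Unraid actually runs. The names `plan_copy`, `home` and `overflow` are made up for illustration, and so is the nested-dict tree whose leaves are file sizes in bytes. It only looks one level down, as described above, and it assumes a single overflow disk.

```python
def tree_size(node):
    """Total bytes under a node: an int is a file size, a dict is a folder."""
    if isinstance(node, int):
        return node
    return sum(tree_size(child) for child in node.values())


def plan_copy(root, tree, home, overflow, free):
    """Return {path: disk}. `home` is the disk that already holds `root`."""
    free = dict(free)  # work on a copy, the caller's numbers stay untouched
    if tree_size(tree) <= free[home]:
        return {root: home}  # level 0 fits: merge with the existing tree
    plan = {}
    for name, child in tree.items():  # level 1: place each subdirectory whole
        size = tree_size(child)
        disk = home if size <= free[home] else overflow
        if size > free[disk]:
            raise OSError(f"{root}/{name} does not fit on {home} or {overflow}")
        free[disk] -= size
        plan[f"{root}/{name}"] = disk
    return plan


incoming = {"mame": {"mame.exe": 40, "mame.ini": 1}, "winuae": {"winuae.exe": 30}}
print(plan_copy("emulators", incoming, "disk1", "disk2", {"disk1": 50, "disk2": 500}))
```

With 50 bytes free on `disk1` the whole tree does not fit, so `mame` stays on `disk1` and `winuae` starts a new tree on `disk2`.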

Link to comment

NOTE: This would also make a very nice plug-in, to run on a schedule or replace mover script or something...

 

If implementing this mode, I would make the system strongly suggest cache=yes.

This can make the process way smarter.

 

To be more complete, here it is as pseudocode:

(note that I believe many of those checks are already done by unRAID)

 

[file write to unRAID]

? Cache enabled for destination disk?
> No. Use new [smart file write to disk] routine.
> Yes. Actual write to cache as already done by unRAID, with proper checks for cache minimum disk space or if cache becomes full. If any of these occur, proceed as set in "use cache disk" setting, invoking [smart file write to disk] if needed. Wait to use new [smarter mover] when scheduled.

...(I am not sure how unRAID handles files that already exist, for cache-enabled shares. If it were me, I would either just "present" the cache (newer) version and leave the old version on disk (but not display it in any share operations) until mover overwrites it, or immediately delete the old version. It is possible that unRAID even ignores the cache setting for files that already exist and overwrites the destination directly. Does anybody know?)...

 

[smart file write to disk]:
(used only when cache is not used)

? Which disks already have the top folder that is needed (the share)?
> If none is found, all disks are marked as possible destinations.
> If at least one is found, mark it as possible destination.

(here we have a list of possible destination disks)
? Has any of the possible destination disks dropped below "minimum free space"?
> If not, keep them all.
> If yes, remove those from the list. If all of them are below the limit, remove them all and populate the list with the other disks (the ones that originally don't have the share) that are still above the limit.

- Expand every entry in the list into the full requested path (including filename).
(here we have a final list of possible destination full paths including the filename)

? Does any of the full strings in the list match an existing path+filename?
> If it does, replace it. Skip to [actual copy process].

- Remove the filename from the list entries. (remain with full possible paths)

? On the possible destination paths, does the complete requested path exist on any already?
> It does, keep them (can be more than one) as possible destination paths.
> It doesn't. Go up one parent and re-check.
If checks reach share level (top folder), then all paths (that were in the list) are marked as possible destination paths.
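
As a minimal sketch of the checks above, assuming each disk is described by a plain dict with the folders and files it already holds plus its free space. `candidate_disks` and the dict layout are hypothetical; nothing here talks to a real array.

```python
from pathlib import PurePosixPath


def candidate_disks(disks, target, min_free):
    """disks: {name: {"dirs": set, "files": set, "free": int}}; target: 'share/dir/file'."""
    path = PurePosixPath(target)
    share = path.parts[0]
    # Disks that already hold the share; if none do, every disk is possible.
    holders = [d for d, info in disks.items() if share in info["dirs"]]
    pool = holders or list(disks)
    # Drop disks below minimum free space; fall back to the remaining disks.
    usable = [d for d in pool if disks[d]["free"] >= min_free]
    if not usable:
        usable = [d for d in disks if d not in pool and disks[d]["free"] >= min_free]
    # An existing file is simply replaced where it is.
    existing = [d for d in usable if target in disks[d]["files"]]
    if existing:
        return existing[:1]
    # Deepest existing parent wins; walk up until some disk has the folder.
    for parent in path.parents:
        if str(parent) == ".":
            break
        matches = [d for d in usable if str(parent) in disks[d]["dirs"]]
        if matches:
            return matches
    return usable  # nothing matched, even at share level: all usable disks remain
```

The list it returns is what the copy step below would then order by allocation method.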

[actual copy process]
(here we have a list of possible destination paths - or single entry if file exists - make sure in above routine to keep path depth correct)
- Re-arrange the list based on allocation method.
- Attempt to copy using the top entry ...do anything that needs to be done to take care of overwrites, switch to the next entry in the list if the file is too large to fit on the disk, and so on (all the usual tasks already done by unRAID when copying to the array).
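
A small sketch of that ordering and fallback, under simplifying assumptions: `pick_disk` is a hypothetical helper, "most-free" just sorts by free space, "as-listed" keeps the order given, and the real high-water logic is not modelled at all.

```python
def pick_disk(candidates, free, file_size, method="most-free"):
    """Return the first candidate (in allocation order) that can hold the file, or None."""
    if method == "most-free":
        ordered = sorted(candidates, key=lambda d: free[d], reverse=True)
    elif method == "as-listed":
        ordered = list(candidates)
    else:
        raise ValueError(f"unknown allocation method: {method}")
    for disk in ordered:
        if file_size <= free[disk]:
            return disk  # top entry that fits
        # otherwise switch to the next entry in the list
    return None
```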

 

[smarter mover]
(used only in new split level mode)
(what is different from smart file write, is that for files in cache, system can readily know the file sizes and folder content sizes to make more educated destination selection)

- Build move list (one entry per file, regardless of path).

? Check for files that already exist in array.
> If a file exists, find actual destination disk path.
? Check if replacement file size difference, will hit any limits (minimum free space or actual disk remaining space)
> All ok, overwrite (on same disk). Remove from move list.
> Will not fit. Delete destination old version. Keep in list.

(here, move list has only "new" files)
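
The overwrite check above could look roughly like this. It is only a sketch: `handle_existing`, the `{path: size}` move list and the `{path: (disk, old_size)}` index of the array are assumptions made for the example.

```python
def handle_existing(move_list, array_files, free, min_free):
    """Split the move list into in-place overwrites and files that still need a destination."""
    free = dict(free)  # copy so the caller's numbers are not modified
    remaining, actions = {}, []
    for path, new_size in move_list.items():
        if path not in array_files:
            remaining[path] = new_size  # brand new file, decide its disk later
            continue
        disk, old_size = array_files[path]
        growth = new_size - old_size
        if free[disk] - growth >= min_free:
            free[disk] -= growth  # fits: overwrite on the same disk
            actions.append(("overwrite", disk, path))
        else:
            free[disk] += old_size  # will not fit: delete the old version, keep in list
            actions.append(("delete_old", disk, path))
            remaining[path] = new_size
    return remaining, actions, free
```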

["Map" structure process]
- The map is a list holding the folder path of every file plus all of its parent paths, each listed only once.

...for example, let's say we have this in cache:
emulation/emulators/mame/mame.exe
emulation/emulators/mame/mame.ini
emulation/emulators/winuae/configurations/newconfig.ini
emulation/emulators/winuae/winuae.exe
emulation/downloads/mame/mame.zip
emulation/downloads/winuae.zip
emulation/misc/all-emuls.xlsx

...then the map is created like this:
emulation/emulators/mame
emulation/emulators
emulation
emulation/emulators/winuae/configurations
emulation/emulators/winuae
(emulation/emulators and emulation are not re-added in the list because they exist)
emulation/downloads/mame
emulation/downloads
(emulation/downloads is not re-added in the list for winuae.zip, nor is emulation because they exist)
emulation/misc
(emulation is not re-added in the list because it exists)
(now this process can be created to either not re-add existing entries, or add them and then eliminate duplicates... depends which method is faster)

- Sort path list starting from deepest paths.
(...from example

emulation/emulators/winuae/configurations
emulation/downloads/mame
emulation/emulators/mame
emulation/emulators/winuae
emulation/downloads
emulation/emulators
emulation/misc
emulation

...it is vital we sort from deepest path, for the whole process to work properly)
(we now have a map of different paths and all their parents)
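
For what it's worth, the map and the deepest-first sort fit in a few lines of Python. `build_path_map` is a made-up name, and ties at the same depth are broken alphabetically, which is my own choice; it happens to reproduce the order in the example above.

```python
from pathlib import PurePosixPath


def build_path_map(files):
    """Collect every folder and parent folder once, sorted deepest first."""
    seen = set()
    for file_path in files:
        for parent in PurePosixPath(file_path).parents:
            if str(parent) != ".":  # skip the empty root that pathlib reports last
                seen.add(str(parent))
    # More slashes means deeper; alphabetical order breaks ties.
    return sorted(seen, key=lambda p: (-p.count("/"), p))


cache_files = [
    "emulation/emulators/mame/mame.exe",
    "emulation/emulators/mame/mame.ini",
    "emulation/emulators/winuae/configurations/newconfig.ini",
    "emulation/emulators/winuae/winuae.exe",
    "emulation/downloads/mame/mame.zip",
    "emulation/downloads/winuae.zip",
    "emulation/misc/all-emuls.xlsx",
]
for entry in build_path_map(cache_files):
    print(entry)
```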

[build initial move list]
(This will build an initial move list that doesn't "decide" on actual disk destination yet)

- Sort file list, starting from deepest paths.

(using above example:
emulation/emulators/winuae/configurations/newconfig.ini
emulation/downloads/mame/mame.zip
emulation/emulators/mame/mame.exe
emulation/emulators/mame/mame.ini
emulation/emulators/winuae/winuae.exe
emulation/downloads/winuae.zip
emulation/misc/all-emuls.xlsx

- Add dummy destinations
<dummy>/emulation/emulators/winuae/configurations/newconfig.ini
<dummy>/emulation/downloads/mame/mame.zip
<dummy>/emulation/emulators/mame/mame.exe
<dummy>/emulation/emulators/mame/mame.ini
<dummy>/emulation/emulators/winuae/winuae.exe
<dummy>/emulation/downloads/winuae.zip
<dummy>/emulation/misc/all-emuls.xlsx

...these are to be replaced with proper paths below)
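
Same idea for the initial move list, as a sketch: sort the files deepest first and prefix each with the placeholder. `build_initial_move_list` is a hypothetical name and the alphabetical tie-break is again my assumption.

```python
def build_initial_move_list(files, placeholder="<dummy>"):
    """Sort files deepest first and prefix each with a placeholder destination."""
    ordered = sorted(files, key=lambda f: (-f.count("/"), f))
    return [f"{placeholder}/{f}" for f in ordered]
```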

[choose destinations]
(This process will take the path map above and check if it exists in any of the disks already. If it does, it will consider using it - or not)
(if a deeper path is already split in other disk than its parents, or in any case sub-directories are using different disks, it means the user or a previous run of smarter mover, already decided it had to be this way - so FIRST we attempt to re-use that choice)
(Starting from top entry - the path map, not the move list)
? Does the path exist on one or more disks?
> Path does not exist. Remove the entry from the list, provided a parent entry exists further down. Its files can then go to any disk that holds the parent entry, subject to allocation method and minimum free space.
> Path exists on one disk. Replace the map entry with the full path on that disk.
> Path exists on more than one disk. Choose one of those disks based on allocation method and minimum free space, and replace the map entry with the full path on it.

(after this we have a map that dictates which cache folder will try to use which disk to move a path's contents)
(so using example above, map is now formed something like that:
/mnt/sdf/emulation/emulators/winuae/configurations
/mnt/sde/emulation/downloads/mame
/mnt/sdd/emulation/emulators/mame
/mnt/sdf/emulation/emulators/winuae
/mnt/sdg/emulation/downloads
/mnt/sdf/emulation/emulators
/mnt/sdg/emulation/misc
/mnt/sdg/emulation

...the destination paths are just examples. In this example the system decided based on the paths that already exist, while their parents are free to "roam" to other disks that already have them, based on free space and allocation method. In other words, the mame emulator is on sdd because it couldn't fit on sdf, where winuae and probably the rest of the emulators are, but the emulation folder itself prefers sdg unless, as we saw above, we have "specific instructions" for some of its contents)
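
A sketch of this step, with loud assumptions: `choose_destinations` is hypothetical, each disk is represented only by the set of folders it already holds, and "most free space" stands in for whatever allocation method is configured. The disk names and free-space numbers are invented to match the example map.

```python
def choose_destinations(path_map, dirs_on_disk, free, min_free):
    """path_map is sorted deepest first; returns {path: disk} for entries that get a disk."""
    resolved = {}
    for path in path_map:
        holders = [d for d, dirs in dirs_on_disk.items() if path in dirs]
        usable = [d for d in holders if free[d] >= min_free]
        if not usable:
            if "/" in path:
                continue  # no usable disk has it: drop the entry, its parent decides
            # Share root with no usable holder: any disk above minimum free space.
            usable = [d for d in free if free[d] >= min_free]
            if not usable:
                raise OSError(f"no disk has room for {path}")
        resolved[path] = max(usable, key=lambda d: free[d])
    return resolved


dirs_on_disk = {
    "sdd": {"emulation", "emulation/emulators", "emulation/emulators/mame"},
    "sde": {"emulation", "emulation/downloads", "emulation/downloads/mame"},
    "sdf": {"emulation", "emulation/emulators", "emulation/emulators/winuae",
            "emulation/emulators/winuae/configurations"},
    "sdg": {"emulation", "emulation/downloads", "emulation/misc"},
}
free = {"sdd": 100, "sde": 200, "sdf": 300, "sdg": 400}
path_map = sorted(set().union(*dirs_on_disk.values()), key=lambda p: (-p.count("/"), p))
for path, disk in choose_destinations(path_map, dirs_on_disk, free, 50).items():
    print(f"/mnt/{disk}/{path}")
```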

[build final move list]
(This is how the magic works...)
- Now the script will use the map above, to replace <dummy> entries in move list.
(As I said it is vital to sort from deepest paths, as these will be replaced first, so that smaller paths will not change the decision as they will not match)
? Map entry matches any <dummy> entries in move list? (starting from top, remember it checks only for entries still with <dummy>)
> Yes. Replace <dummy> in those entries with the destination the map proposes.
> No. Move to next entry.

(so using the example the move list will change to something like that...

/mnt/sdf/emulation/emulators/winuae/configurations/newconfig.ini
/mnt/sde/emulation/downloads/mame/mame.zip
/mnt/sdd/emulation/emulators/mame/mame.exe
/mnt/sdd/emulation/emulators/mame/mame.ini
/mnt/sdf/emulation/emulators/winuae/winuae.exe
/mnt/sdg/emulation/downloads/winuae.zip
/mnt/sdg/emulation/misc/all-emuls.xlsx

...and hopefully this works. :D )
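
To check the idea, here is a minimal sketch of the replacement pass. `build_final_move_list` and the `{folder: disk}` shape of the map are assumptions for the example; the `/mnt/<disk>/` prefix just mirrors the sample paths above.

```python
def build_final_move_list(move_list, destinations, placeholder="<dummy>"):
    """Replace placeholder prefixes using a {folder: disk} map, deepest folders first."""
    result = list(move_list)
    for folder in sorted(destinations, key=lambda p: -p.count("/")):
        prefix = f"{placeholder}/{folder}/"
        for i, entry in enumerate(result):
            # Entries already replaced no longer start with the placeholder,
            # so a shallower folder cannot override a deeper decision.
            if entry.startswith(prefix):
                relative = entry[len(placeholder) + 1:]
                result[i] = f"/mnt/{destinations[folder]}/{relative}"
    return result
```

Fed with the example map and the `<dummy>` list, this gives the same seven destination paths as listed above.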

 

Whoever took the time to read this... Thanks.

 

Edited by NLS