FIFO caching on Unraid



The server is used for projects of around 60 GB apiece (photos).

What I want:

The photographer copies the XQD cards from the PC to the Unraid server.

Once there, Unraid needs to do three things:

1. Copy the photos to the HDDs in the background.

2. If the cache drive is full, remove the oldest project (it will still be on the HDD of course; only the cached copy is deleted).

3. Keep the newly created photos on the cache drive until they get pushed out by newer projects (FIFO).

 

I think this is called FIFO caching: how can I achieve it?

 

Thanks in advance!


Stock Unraid can do a similar thing; just have a cache drive. Set up a share, e.g. "photos".

 

Set caching on it to "Use cache: Yes".

 

Quote

Specify whether new files and directories written on the share can be written onto the Cache disk/pool if present.


This setting also affects mover behavior.

No prohibits new files and subdirectories from being written onto the Cache disk/pool.
Mover will take no action so any existing files for this share that are on the cache are left there.

 

Yes indicates that all new files and subdirectories should be written to the Cache disk/pool, provided enough free space exists on the Cache disk/pool.
If there is insufficient space on the Cache disk/pool, then new files and directories are created on the array.
When the mover is invoked, files and subdirectories are transferred off the Cache disk/pool and onto the array.

 

Only indicates that all new files and subdirectories must be written to the Cache disk/pool.
If there is insufficient free space on the Cache disk/pool, create operations will fail with an out-of-space status.
Mover will take no action so any existing files for this share that are on the array are left there.

 

Prefer indicates that all new files and subdirectories should be written to the Cache disk/pool, provided enough free space exists on the Cache disk/pool.
If there is insufficient space on the Cache disk/pool, then new files and directories are created on the array.
When the mover is invoked, files and subdirectories are transferred off the array and onto the Cache disk/pool.

 

NOTE: Mover will never move any files that are currently in use.
This means if you want to move files associated with system services such as Docker or VMs then you need to disable these services while mover is running.

 

If you want to do exactly what you're talking about, then you need a custom script for that (you could use the User Scripts plugin), but I think that's pretty much unnecessary.


First of all: thank you both for the replies.

@Nuhll I know about the caching options, but the problem, as stated, is that the entire share will be located on the cache drive. I ONLY want the newest projects to be on there.

@jonathanm Yes, I know it doesn't work like that; but it sounds strange to me that it doesn't. This isn't such a strange use case, is it? I think FreeNAS and a couple of other solutions do offer this?

 

There are a lot of use cases where this can be beneficial, especially project-based workloads like photography or videography.

 

I love Unraid for everything else and don't want to switch, so how can I solve this problem? I tried:

- Syncing with Nextcloud (and other syncing software too), so the cache would be on the PC itself. But this is not the ideal solution, especially for video projects (not every PC has enough space for a project like that).

- Making a 'work' share on the cache drive. But this brings a lot of hassle with it; people have to remember to copy the contents to the archive directory, and if they do, it takes a lot of time, so they have to ask me for a command-line copy.

 

So the most obvious way would be to adapt the mover script so that, instead of moving the entirety of the cache disk, it moves on a per-directory basis and tries to keep a set amount of space free:

The max size of a project is 1 TB, so 1 TB has to be reserved for when a new project is copied onto the cache drive (which, for the user, is just the folder they want to put the project in):

1. 4 TB cache pool; 3 TB is taken by 'older' projects. A user copies new files.

2. Mover is invoked; it checks whether the 1 TB threshold has been hit (it has in this case).

3. Mover moves only the amount of data needed to get back under the threshold to the HDDs, FIFO style (oldest directories first).

 

Is this something that is doable? Any pointers on how I can achieve this?

 

Love to have some input!

 

 

 


OK. Found the mover script.

 

The first thing to add should be a check whether a threshold value (a disk usage percentage) has been exceeded:

# Only run the script when cache disk usage exceeds the threshold (percent)
threshold=75
diskusage=$(df -hl | grep '/mnt/cache' | awk '{ print $5 }' | cut -d '%' -f1)
if [[ "$diskusage" -lt "$threshold" ]]; then
  exit 0
fi

 

In the mover loop, this command:

find "./$Share" -depth \( \( -type f ! -exec fuser -s {} \; \) -o \( -type d -empty \) \) -print \ \( -exec rsync -i -dIWRpEAXogt --numeric-ids --inplace {} /mnt/user0/ \; -delete \) -o \( -type f -exec rm -f /mnt/user0/{} \; \)  
 

has to be changed somehow to process the output of this command:

find "/mnt/cache/$Share" -type f -print0 | xargs -0 ls -tr

which lists all files, oldest first, so that they can then be moved in that order with the rsync command.
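
One caveat: with a very large number of files, xargs may split the list into multiple ls invocations, each sorted separately. A sketch of a GNU find alternative that avoids this batching:

# List cached files oldest-first without xargs batching issues
find "/mnt/cache/$Share" -type f -printf '%T@ %p\n' | sort -n | cut -d' ' -f2-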

 

After each file there has to be a check whether the disk usage percentage has dropped back under the threshold, 75% in this case (on a per-file basis, because some files are around 80 GB); or, if that is too slow, maybe it is possible to have a counter and only check every N files.

diskusage=$(df -hl | grep '/mnt/cache' | awk '{ print $5 }' | cut -d '%' -f1)
if [ "$diskusage" -lt "$threshold" ]; then
  break
fi

 

Any input on how to change the find/rsync command? I'm quite new to bash, so...
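
Putting those pieces together, here is a rough, untested sketch of what such a FIFO pass could look like. The share name, the /mnt/user0 target, and the rsync flags are all assumptions, and filenames containing newlines would break the sort:

#!/bin/bash
# Untested FIFO-mover sketch: drain the oldest cached files until
# usage drops below the threshold. Share name and paths are assumptions.
threshold=75
Share="photos"

usage() {
  df --output=pcent /mnt/cache | tail -1 | tr -dc '0-9'
}

[ "$(usage)" -lt "$threshold" ] && exit 0   # already under the threshold

# Walk cached files oldest-first and move them one at a time.
find "/mnt/cache/$Share" -type f -printf '%T@ %p\n' | sort -n | cut -d' ' -f2- |
while IFS= read -r file; do
  rel="${file#/mnt/cache/}"
  # --relative with the /./ anchor recreates $rel under /mnt/user0
  rsync -a --relative "/mnt/cache/./$rel" /mnt/user0/ && rm -f "$file"
  [ "$(usage)" -lt "$threshold" ] && break  # stop once back under the threshold
done

This could then be scheduled via cron or the User Scripts plugin at whatever interval makes sense.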

 

The last thing to add is an rsync command to sync the cache to an HDD share, for extra duplication of the files. (Just now I had 2 SSDs fail in a 4-drive cache pool. I sync the pool every day, so nothing was lost, but it feels a bit safer this way.)
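
For that last part, a minimal sketch, assuming an array-side share named 'cachebackup' that is set not to use the cache:

# Daily one-way copy of the cached project share to an array share.
# 'photos' and 'cachebackup' are assumed names; add --delete only if the
# backup should be pruned to mirror the cache exactly.
rsync -a /mnt/cache/photos/ /mnt/user/cachebackup/photos/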

 

Am I on the right track? 

5 hours ago, robertoal said:

- Making a 'work' share on the cache drive. But this brings a lot of hassle with it; people have to remember to copy the contents to the archive directory, and if they do, it takes a lot of time, so they have to ask me for a command-line copy.

Only if the work share is on cache and the archive is on the array. If the archive share is cache=yes, then a copy (via SMB) from the work share to archive should be fairly fast. Then the slow part, the cache->array copy, happens via the mover script at whatever interval makes sense.

 

Also, have you spent much time searching the forums for mover script modifications? I ask because I vaguely recall people discussing the kind of modifications you are trying to make, but I don't remember where I saw it.

6 hours ago, robertoal said:

First of all: thank you both for the replies. [...]

There is even a mover plugin where you can change pretty much everything.

 

But I still don't get your problem.

 

"@Nuhll I now about de the caching options, but the problem as stated is that the entire share will be located on the cache drive.. I ONLY wan't the newest projects to be on there."

That's wrong. If you choose e.g. YES, then all new writes go to the cache (fast) and will be moved to the array whenever you want; e.g., I've set my mover to only move if less than 50% is free on the cache drive.

 

"Last thing to change is an rsync command to sync the cache to a HDD share for extra duplication of files. (Just now I had 2 ssd drives fail in a 4 drive cache pool.. I sync the pool every day, so nothing lost there but feels a bit safer this way.)"

You can add multiple cache drives and have it protected too. (That's how I do it.)

 

"2. Mover is invoked; it checks to see if the 1tb threshold has been hit (it has in this case).

3. Mover moves only the amount of data needed to get under the 1tb threshold to the HDD's. FIFO style (Oldest directories first)"

 

That seems overcomplicated and unnecessary, if you ask me. If you need that, you could hire someone on Fiverr to create a bash script for a few bucks. (But I don't see any reason why you'd want to do it that way; the built-in (official) Unraid way seems perfectly fine for the sort of work you describe.)

 

 


The important thing to remember is that files on cache and files on the array are both files in user shares.

 

If you want a copy of something on both cache and array then they need to be separate shares, or at least paths sufficiently different so you don't get a collision. For example, /mnt/cache/share1/file1 and /mnt/disk1/share1/file1 are both seen as file1 in the share1 user share.
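
To illustrate with hypothetical paths:

# Hypothetical collision: two physical copies behind one user-share name.
#   /mnt/cache/share1/file1   <- copy on the cache pool
#   /mnt/disk1/share1/file1   <- copy on array disk 1
ls /mnt/user/share1/file1     # the merged view exposes only one file1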


Again all of you thanks for the replies!

 

@primeval_god You are right, but again: the mover moves ALL files, not just the oldest ones, correct? I searched the forum for 'FIFO caching', 'caching', 'data-based caching' and so forth. I will try searching again with the script tag added 🙂

 

@nuhll I don't think I explained my question well enough. As you say, the mover can be invoked at a set percentage or a set time. But then it moves ->ALL<- of the files from the cache drive to the array, correct? So this means that the moment the 50% disk usage (as stated in your example) is hit, anybody working on a project has their project moved to the slow array: this is the opposite of what I want!

Right now I have the standard RAID10 mode with my cache SSDs (4 of them). Just the other day the wrong 2 went bad, which means I lost all my VMs, Docker, etc. Especially on critical projects (most of the projects are weddings, so that's pretty much always) I want to be on the safe side.

"That seems so overcomplicated and useless" -> No, I don't think it is. Or maybe I'm still missing something here? Essentially, the only extra thing I want the mover to do is move files based on date until the threshold is reached.

 

@trurl Indeed, I discovered this; it is why there is a /mnt/user0 drive. The /mnt/user/$Share folders aggregate everything from the cache and array drives, I think?

 

 

9 minutes ago, robertoal said:

there is a /mnt/user0 drive

This shows user share contents OMITTING any files belonging to those shares that are on the cache.   Note, however, that Limetech have stated that the /mnt/user0 mount point is now deprecated and likely to disappear in a future Unraid release.


Yes, which is why rsync copies the files from the cache to /mnt/user0, and subsequently removes the files from the cache drive?

Everything located in /mnt/user0 will be shown in /mnt/user, correct?

 

See this post:

  

On 4/17/2015 at 4:09 PM, gundamguy said:

 

Squid gave you a good answer, but to expand a bit:

 

User shares in general aggregate folders from multiple disks.

 

For example, you have a share called Downloads (I am going to assume this isn't cache only even though it likely is...)

 

 


#What the user share Downloads is located at
/mnt/user/Downloads

#What this user share is actually doing is aggregating the following places into one view
/mnt/disk1/Downloads
/mnt/disk2/Downloads
...
/mnt/diskX/Downloads
#(Where X is the number of disks in your array that this user share is allowed to be on)
/mnt/cache/Downloads
 

 

 

This allows you to browse one folder and get a combined view, even though your data is actually in a lot of different folders on different disks.

 

Now for User0

 

 


#What the user share Downloads is located at
/mnt/user0/Downloads

#What this user share is actually doing is aggregating the following places into one view
/mnt/disk1/Downloads
/mnt/disk2/Downloads
...
/mnt/diskX/Downloads
#(Where X is the number of disks in your array that this user share is allowed to be on)
# Key difference is that user0 does not include /mnt/cache/Downloads
 

 

 

So when you look at the directories under /mnt/user/ and /mnt/user0, they should look exactly the same (maybe not if there are cache-only shares... not sure).

 

The why was explained by Squid already: it's used by the mover script to help move files on the cache to a disk in the array.

 

But you can also use user0 if you have a share that uses the cache drive but you want to bypass writing to the cache and write directly to the array for some reason. (Or at least I think you can do that; pretty sure it works that way.)
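
For example (illustrative only, and keep in mind /mnt/user0 is deprecated in newer Unraid releases, as noted earlier in the thread):

# A write via /mnt/user0 lands directly on the array, bypassing the cache
cp /home/user/big-video.mov /mnt/user0/Downloads/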

 

 

 

 

 

 

 


You have it basically correct regarding /mnt/user0 as of the date of the post you quoted, but as itimpi noted:

2 hours ago, itimpi said:

Limetech have stated that the /mnt/user0 mount point is now deprecated and likely to disappear in a future Unraid release.

I don't think the current version of Mover uses /mnt/user0 at all.

 

48 minutes ago, robertoal said:

find /boot/ -type f -name mover doesn't seem to find a mover script... any idea where the new one is located?

You wouldn't find it in /boot. The flash drive only contains the archive of the OS, and settings from the webUI. That archive is unpacked into RAM at each boot.

 

Try this instead:

which mover
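
On current Unraid releases this should print the mover's location, most likely /usr/local/sbin/mover (the exact path may vary by release).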

 

On 1/4/2020 at 11:54 AM, robertoal said:

 

@nuhll I don't think I explained my question well enough. As you say, the mover can be invoked at a set percentage or a set time. But then it moves ->ALL<- of the files from the cache drive to the array, correct? So this means that the moment the 50% disk usage (as stated in your example) is hit, anybody working on a project has their project moved to the slow array: this is the opposite of what I want!

[...]

Yes, it will move all files.

 

What projects are we talking about? Only photos?

 

I think the "safest" way would be

 

1.) create a share CACHE ONLY /mnt/user/*insert your share name here*

2.) create a share Archiv (maybe set cache to YES) /mnt/user/*insert your share name2 here*

3.) create/find a rsync command for moving files older then X to Archiv (use user scripts plugin to run the script every X) Move from /mnt/user/*insert your share name here* to /mnt/user/*insert your share name2 here*
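
A minimal sketch of step 3, assuming the shares are named 'work' and 'Archiv' and a 30-day age cutoff (untested; adjust names, cutoff, and schedule to taste):

#!/bin/bash
# Move files untouched for 30+ days from the cache-only share to the
# archive share. Share names and the 30-day cutoff are assumptions.
src=/mnt/user/work
dst=/mnt/user/Archiv

find "$src" -type f -mtime +30 -print0 |
while IFS= read -r -d '' file; do
  rel="${file#$src/}"                 # path relative to the work share
  mkdir -p "$dst/$(dirname "$rel")"   # recreate the directory structure
  rsync -a --remove-source-files "$file" "$dst/$rel"
done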

 

Like others have already said, you must be very careful about where and how you move files on Unraid.

 

Is it really that critical if your photos get read from outside the cache? I mean, they won't be hundreds of GB?


I'm glad I found this thread. I'm in a similar situation with files on Unraid (having recently moved from QNAP).

 

@nuhll In my case I'm often working with 500-1500 GB of data that I want to be in the cache. I think the solution you mentioned above will work, but I'm still hoping to find an easier way for my pipeline.

 

UX-wise, something like a "cache this folder" menu item is the kind of thing I'd like (but it won't happen). Ultimately I want to have a single share for the top-level folder (e.g. "Photos") and then be able to select sub-folders of it to be on the cache or not, depending on my current task.

The folder tree might look like:

  • Photos (Unraid Share)
    • Import
      • Client 4
      • Client 5
    • Processing
      • Client 1
      • Client 2 (Cached)
      • Client 3
    • Archive
      • Client 0

 

Any advice on this kind of structure, or on the OP's question, is doubly appreciated.

 

On 1/6/2020 at 1:55 AM, nuhll said:

Yes, it will move all files. [...]

A little update:

I haven't been able to script the FIFO concept yet, unfortunately; I'm simply not experienced enough with bash 🙂

So I will leave this feature to be added (hopefully) in the future.

 

Thanks nuhll for bearing with me, and for the solution you mention. The thing with all these solutions is that I want full transparency for the users: they shouldn't need to think about where to move which files. So for now I use the 'move above 80% usage' scheme; although not optimal, it is workable.

 

Is it really that critical if your photos get read from outside the cache? I mean, they won't be hundreds of GB?

Yes: the .NEF files aren't that large by themselves, 40-50 MB, but especially with multiple users selecting photos for the final selection, an HDD is very noticeable. And I don't want to spin up the drives all the time; with the right caching scheme (FIFO) they should only spin up once a week or so, which is good for longevity.

 

9 minutes ago, robertoal said:

Is it really that critical if your photos get read from outside the cache? I mean, they won't be hundreds of GB?

Yes: the .NEF files aren't that large by themselves, 40-50 MB, but especially with multiple users selecting photos for the final selection, an HDD is very noticeable. And I don't want to spin up the drives all the time; with the right caching scheme (FIFO) they should only spin up once a week or so, which is good for longevity.

A few potential suggestions to consider:

  • You can use Python instead of bash. I have found Python scripts to be a lot more powerful, especially when sorting data.
  • For better access times, you can add SSDs to the array and limit your share to just the SSDs. Since you are more interested in reads than writes, even QLC can be a good budget choice.
    • Note that some SSDs may cause parity sync errors, so make sure to watch out for that.
    • Also, SSDs in the array cannot be trimmed.
  • Alternatively, you can even create a pseudo-array out of SSDs + Unassigned Devices + mergerfs.
    • You lose some Unraid functionality, e.g. parity, shares, etc., but then you can trim the SSDs, so I would consider that a wash.
    • I am running one right now with 3 external USB SSDs for my offline backup job, mainly to leverage its file distribution and not have to deal with the ramifications of USB drives dropping offline.

@testdasi Thank you for thinking along with me here.

 

1. I'm not really good at programming in general, although I have some experience with Python. I hope I have the time to make a project of this!

2. This is a really good idea, but... it is almost the same as using the cache as a temp drive, isn't it? The user still has to manually copy the files to another directory when he/she is done with the project. It is not the end of the world, but again: FIFO looks like a very logical solution for this use case (and for the Limetech team probably quite easy to implement).

3. I have a 4×800 GB SSD cache array at the moment; trimming in a cache array IS permitted, isn't it?

3 hours ago, robertoal said:

@testdasi Thank you for thinking along with me here.

 

1. I'm not really good at programming in general, although I have some experience with Python. I hope I have the time to make a project of this!

2. This is a really good idea, but... it is almost the same as using the cache as a temp drive, isn't it? The user still has to manually copy the files to another directory when he/she is done with the project. It is not the end of the world, but again: FIFO looks like a very logical solution for this use case (and for the Limetech team probably quite easy to implement).

3. I have a 4×800 GB SSD cache array at the moment; trimming in a cache array IS permitted, isn't it?

1. You can actually combine find + rsync in bash for a one-liner that rsyncs stuff older than a certain number of days. If you run the script daily, it would sort of achieve a FIFO-like thing; see the sketch below.

2. It's not entirely the same. In the array, the SSD is parity-protected. In the cache pool, the SSD is RAID-protected (or not, depending on the RAID type). It all depends on your use case.

3. Cache pool is trim enabled.
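
A hedged sketch of that one-liner from point 1 (untested; the share name and 14-day cutoff are assumptions, and /mnt/user0 is deprecated in newer releases):

# Move files not modified for 14+ days from the cache share to the array.
# Empty directories are left behind on the cache.
cd /mnt/cache && find ./photos -type f -mtime +14 \
  -exec rsync -a --relative --remove-source-files {} /mnt/user0/ \;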
