blumpy

Intelligent File Cache


For media workflows it would be nice to keep all of your projects, clips, samples, etc. in their own user share, but have the recently used files intelligently mirrored to the SSD cache for acceleration. When a file is written to the user share, it would land on the cache first; in the background the cache would be written to the array so the data has parity protection. Recently read files would likewise be mirrored onto the cache for accelerated access while still being stored on the array.

 

The reason for this is not having to deal with local storage copies and conflicts. Projects would remain in their respective user share, and an automated backup system would not pick up redundant copies, because users would no longer need to manually copy data to and from local SSDs in order to work.

 

All that said, I'm very new to UNRAID, so perhaps this can already be set up manually, but thus far I've not found it in the forums.

 



With the current unRAID implementation, if you use a cache pool running in BTRFS RAID 1 mode (the default), then the contents of the cache are protected until the mover runs to transfer the files to the array.

 

Not decrying the idea of something more intelligent, but I suspect what you are asking for would require significant development and is thus not likely to appear in the near future.


I would imagine that I'm not the only person who would benefit from a read cache. Or perhaps I'm getting something wrong.

 

Being able to work on media directly from a NAS seems like a no-brainer to me. Having what would ultimately be a giant Apple Fusion-style JBOD with parity protection seems like it would be very useful to media creators.

5 hours ago, blumpy said:


 

Something like this can be done currently, if you can make each "project" a separate share. For example, maybe you want to create a project called "myproject".

 

You could create a share called "myproject" and in the Share Settings set "Use cache disk" to Prefer.  As project files get created they will be stored on the cache device/pool.

With the Prefer setting, the mover will not move anything from that share to the parity array.

 

Later when the project is completed and you want to free up cache and move the project to the parity-array, you go into the share settings and set "Use cache disk" to Yes.  Next time mover runs, either scheduled or by clicking Move Now, the files comprising myproject will get moved to the array.

 

If later you want to move myproject back to the cache, you can go to share settings and set "Use cache disk" to Prefer again and next time mover runs, files will get copied back to the cache.

 

I can see some disadvantages with this. First, it's a manual process, meaning the software does not "detect" project files being accessed and somehow automatically promote them from array to cache (or likewise determine that a project is no longer in use and move it from cache to array). Second, it would be nice if this could be done at an arbitrary level within the share directory tree (so that you could have a single share called Projects where individual subdirectories are handled as described above). Also, there is the problem of how long it might take to move projects between cache and array.

 

The second problem is easier to solve than the first. Maybe there are other problems too?


Or, if you use the find command to detect files that haven't changed over a specified time frame, they could be archived into another share or folder and moved onto the array for protection.

 

I literally use find for a lot of things on my system, e.g. to make sure a file hasn't been changed for at least 5 minutes, so that in my video workflow another script doesn't mess with a file that's still being written.
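A minimal sketch of that kind of check, assuming GNU find; the path is purely illustrative, and `-mmin +5` matches files whose contents were last modified more than 5 minutes ago:

```shell
#!/bin/sh
# Sketch of a "has this file settled?" check. FILE is a hypothetical
# path, not a real Unraid default. find prints the file only if its
# data was last modified more than 5 minutes ago, so an empty result
# means the file may still be in use.
FILE="/mnt/user/incoming/clip.mov"   # illustrative path

if [ -n "$(find "$FILE" -mmin +5 2>/dev/null)" ]; then
    echo "safe to process: $FILE"
else
    echo "skipping: $FILE was modified within the last 5 minutes"
fi
```

A cron job could run a check like this before handing a file to the next stage of the workflow.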


Thank you very much for the recommendations, but moving projects in and out of various share locations defeats the purpose; I might as well remain on local NVMe storage for the speed if manual oversight is required for projects on each workstation. Working from multiple shares also creates unplanned redundant backups of projects if someone moves old projects in for editing. It's good to have several backups, but everything becomes a lot easier if it all lives in one share/pool/space/cloud/etc., which was the purpose of this build.

Why not just have two separate folders: one that exists only on the cache and one that uses the array, then use a script to copy any older files from the cache folder to the array folder?
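A hedged sketch of that two-folder idea, with purely illustrative paths: it copies files untouched for more than a day from the cache-only folder into the array-backed one, preserving the directory layout.

```shell
#!/bin/sh
# Illustrative paths only -- adjust to your own shares.
SRC="/mnt/cache/fastwork"    # cache-only folder (assumed name)
DST="/mnt/user/protected"    # array-backed share (assumed name)

# Copy every file not modified for over a day, recreating subdirectories.
find "$SRC" -type f -mtime +1 | while IFS= read -r f; do
    rel="${f#"$SRC"/}"
    mkdir -p "$DST/$(dirname "$rel")"
    cp -p "$f" "$DST/$rel"
done
```

Note that filenames containing newlines would need `-print0` handling; for typical media filenames the simple loop above is enough.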


There are several reasons not to use separate shares. When dealing with massive sample libraries, auditioning libraries while having to re-point the location of the data is completely impractical; it would be far better to just run off local HDDs. Perhaps I should just invest in a bunch of 2 TB SSDs and leave it at that, but a read cache on NVMe would be better.

 

Other reasons: creating multiple versions of files instead of pointing to single files outside the projects pool, and having to reallocate the file locations in projects that have been moved. If I'm working on a project and want to recall a dozen files from a previous project, I need to copy those files into the new project instead of simply referencing the files that already exist.

 

It would require multiple users to know which version is current instead of everyone working from one version. It would also create dozens or more unplanned redundant backups when working with projects, leave it unclear which copy of a project is the latest, and invite user error when replacing previous projects.

 

More importantly, why have a cache that only works one way? I'd imagine a read cache would benefit most people.


A user share is just the combined top level folders on cache and array named for the share. User share settings only apply to new writes, except for the cache settings which have the additional effect of possibly moving cache to array or array to cache. Reading from user shares always includes all disks regardless of the user share settings (unless a disk is specifically excluded from participating in all user shares in Global Share Settings).

 

A cache-only share will write new files to cache, but that share's files won't be touched by mover even if they are on the array. You could write a script (and schedule it with User Scripts plugin) that would move anything not recently accessed from a cache-only user share to the same user share on the array (possibly using the mover script as a model). So, the user share would have recently accessed files on cache, with files not recently accessed on the array.
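A rough sketch of such a script, under the assumptions that atime updates are enabled on the cache filesystem and that "projects" and disk1 are placeholder names, not taken from a real config. Because Unraid merges /mnt/cache/projects and /mnt/disk1/projects into /mnt/user/projects, moving a file between those two disk paths leaves its user-share path unchanged:

```shell
#!/bin/sh
# Placeholder share and disk names -- adjust to your own system.
SHARE="projects"
CACHE="/mnt/cache/$SHARE"
DISK="/mnt/disk1/$SHARE"

# Move files not *accessed* for over 7 days from cache to the array disk,
# recreating the directory layout so the user-share path stays the same.
find "$CACHE" -type f -atime +7 | while IFS= read -r f; do
    rel="${f#"$CACHE"/}"
    mkdir -p "$DISK/$(dirname "$rel")"
    mv "$f" "$DISK/$rel"
done
```

Something along these lines could be scheduled with the User Scripts plugin; note it deliberately works only on disk paths (/mnt/cache, /mnt/disk1), never mixing them with /mnt/user paths in the same operation.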

 

You could have a similar script that went the other way, moving recently accessed array files to the same share on cache. It's not clear how you would trigger this though; cron or inotify would fire after the fact of the file's renewed access, so you might as well run it manually after accessing the files again.


@trurl I wouldn't trust myself to write such a script. That said, thank you for trying to find a workaround. I'm just suggesting that an SSD cache should work in the more traditional way, where file writes are cached and then stored, and recently read, heavily used files are kept in the cache for fast access.

 

As others have suggested, trying to create a workaround with separate folders, shares, etc. simply defeats the purpose of the server. I don't have the time or the scripting prowess to confidently write such a script and trust it.

 

I love the idea of UNRAID, but without such a feature it's going to remain a storage tank.


Worst case, store all the data you must have fast access to on your cache, and then copy that data to the array for protection.

If need be, create a simple script that copies your data from the cache to the array as you add it.

 

I know you don't want workarounds, but honestly you're asking for something that isn't currently in unRAID, and we often all have to use workarounds to achieve one-off tasks.


Thank you for the suggestion but that is not possible. 

 

I'd have to reassign directory paths for literally thousands of files every time a project is moved back and forth, or I'd have to copy the data into the project instead of referencing it, thus creating TBs of redundant files.



Historically, cache in unRAID was designed to make writes to user shares quicker, since there is a performance penalty for parity updates. SSDs weren't common at the time.

 

With the new features of VMs and dockers, and larger affordable SSDs, cache has also become the primary storage for these features for the increased performance SSDs provide.

 

Everyone has different needs and different ways they use unRAID. I recently installed a 1TB SSD in the PC my wife uses for photoshop. All photoshop work happens locally on that PC, and unRAID is only used as backup storage for that particular use. I don't even use unRAID cache when writing those backups.

 

I also have other things unRAID does well by itself using dockers, and my SSD cache pool comes into play there.


Without knowing your exact workflow it's really hard to give you any direction or advice.

 

If I were working with media, I guess I would use 7200 RPM drives and more than likely a 10Gb connection. I'd also set my default drive spin-down to an extended amount, in hours, versus the 1 hour I have set now.


@kizer thanks for your input. I'll give you the tl;dr.

 

I have two main storage needs:

1) Sample libraries

2) Project library

 

A sample library is massive and needs maxed-out RAM and ultra-fast storage. If I open a patch, let's say a piano, the front of each note is loaded into RAM. Some of these patches can be 200+ MB each. When I play a note, the rest of the sample is streamed from storage as you play. When doing 60+ layers of instrumentation/libraries you're filling up 64+ GB of RAM. Add to that multiple workstations. So if you pooled storage, you'd need lots of space that's ultra fast and cannot be moved, because each library has a fixed directory path.

 

The projects folder will be streaming video and dozens, if not a hundred, uncompressed audio files. These files can sometimes point to/reference files from prior projects, so it's best to keep all projects in the same pool; otherwise you end up redefining hundreds of paths or maintaining multiple copies of the same thing.

 

What I need is a parity-protected JBOD with an NVMe read cache.

 

40+ TB of SSD is a solution, but it's overkill since 90% of it would lie dormant, and it's not my best option either. The best solution is an expandable 40+ TB of HDD plus 4 TB of NVMe with an intelligent read cache/tier, networked over 10GbE.

 

So that's what I started building, but then I quickly discovered the cache was not what I expected. It's entirely my fault for not reading enough of the literature, but I think an intelligent read cache/tier would be a very nice feature for people working with media, which is why I was suggesting it as a feature for future versions of unRAID.

 

I'm still exploring solutions. I think I'm just going to stick with local NVMe storage and SATA SSDs and just back up to the unRAID system, but that generates tons of file redundancies I was hoping to avoid. I'm considering a 10GbE RAID 10 using PCIe cards, but it sure would be nice to have an NVMe cache for the low latency. To do an NVMe tier I'd need to run Windows Server for Storage Spaces Direct or some Linux variant. I just can't seem to figure out a way to do this with unRAID as I'd planned.

On 4/9/2018 at 1:54 PM, blumpy said:


I also would LOVE this to be a thing!

