What I'd like to see in Unraid 7: a read cache SSD



Writing to the array through an NVMe SSD cache drive is nice and serves well on LANs running at 10 Gbit/s and more.

But once the Mover has done its job, read performance drops back down to plain spinning-drive speeds.

 

What I'd like to see is another pool, designated as a read cache for all shares (or configurable, but that detail does not really matter).

 

* if a file is requested, first check whether it is already on the cache

  * if yes, check whether the cached copy is still recent (size, time of last write, and so on)

    * if recent (the last cache write is newer than the file's last modification time), reading continues from the cache drive (exit here)

    * if not, delete the file from the cache SSD (no exit; continue with the next step as if the file had never been on the cache at all)

* if no, check the free space on the cache to see whether the requested file would fit

  * if it does not fit, but the cache COULD hold the file, delete the oldest file from the cache and redo the check (loop until enough space is freed up)

  * read the file from the array, write it to the LAN, but also write it to the cache drive, storing the current time on the cache too

 

* if a file is closed:

  * if it came from the cache:

    * update the "time of last write" on the cache

(This lets the file "bubble up" and protects it from early deletion when space is needed. Frequently used files will this way stay on the cache for a longer period, whereas files that were asked for only once are preferred for cleanup.)
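The open/close flow above could be sketched roughly as follows. This is a minimal illustration in Python, not Unraid code: all function names, paths, and the directory layout of the cache are my assumptions, and a real implementation would hook into Unraid's VFS overlay instead of working on plain paths:

```python
import os
import shutil

def serve_read(array_path, cache_path, cache_dir):
    """Illustrative 'open for read' hook for the proposed read cache.

    array_path - the real file on the array
    cache_path - where its copy would live on the read-cache SSD
    cache_dir  - root of the read cache (scanned for eviction)
    Returns the path the read should actually be served from.
    """
    if os.path.exists(cache_path):
        # Recent = the cached copy is at least as new as the array file.
        if os.path.getmtime(cache_path) >= os.path.getmtime(array_path):
            return cache_path                 # hit: read from the SSD
        os.remove(cache_path)                 # stale: drop it, fall through
    size = os.path.getsize(array_path)
    free = shutil.disk_usage(cache_dir).free
    while free < size:                        # evict oldest until it fits
        victim = oldest_cached_file(cache_dir)
        if victim is None:
            return array_path                 # cache can never hold it
        free += os.path.getsize(victim)
        os.remove(victim)
    shutil.copy2(array_path, cache_path)      # the read feeds LAN *and* cache
    return cache_path

def file_closed(cache_path):
    """'File closed' hook: refresh the cached copy's timestamps so that
    often-used files bubble up and survive eviction longer."""
    if os.path.exists(cache_path):
        os.utime(cache_path)

def oldest_cached_file(cache_dir):
    """Least-recently-stamped file on the cache, or None if it is empty."""
    candidates = []
    for root, _dirs, files in os.walk(cache_dir):
        candidates += [os.path.join(root, f) for f in files]
    return min(candidates, key=os.path.getmtime, default=None)
```

Note that `shutil.copy2` preserves the array file's mtime on the cached copy, so the freshness comparison holds immediately after the copy, and `file_closed` bumping the timestamp is exactly the "bubble up" behaviour described above.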

 

Fairly straightforward and simple approach. The last part could be optimized by reading ahead and writing asynchronously, but with current LAN and SSD speeds it does not matter; the SSD is in any case faster than the LAN.

 

This would not speed up the first access to a file, but the second and later accesses would be greatly improved. And if the designated read cache SSD is large (2 TB or more), a lot of files will fit before the first deletion becomes necessary.

 

This feature could be added at the high level of Unraid's VFS file system overlay.

 

(The cache disk itself is disposable: even if its content is lost due to errors, it does not matter, since it is just a copy, and it needs no backup. So Unraid should not look for shares on, or allow creating folders on, that designated cache SSD.)

 

Update: yeah, I know, it will make file reading slightly slower (because of the additional write to the read cache), but this is barely measurable. Reading from spinning disks runs at about 290 MB/s under the best conditions; writing to SATA SSDs should be almost twice as fast, and writing to NVMe SSDs five or more times faster. So this really does not count.

 

Update 2: I would like to add two config settings for fine tuning:

a) minimum file size: files smaller than this are never put on the cache (default: 0)

b) maximum file size: files larger than this are never put on the cache (default: not set, or 100 MB or so)
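These two checks could gate caching as in the following sketch (the function name is hypothetical; the defaults of 0 and 100 MiB mirror the values suggested above):

```python
def eligible_for_cache(size_bytes, min_size=0, max_size=100 * 2**20):
    """Return True if a file of this size may be put on the read cache.

    min_size / max_size correspond to the two proposed settings;
    max_size=None stands for 'not set' (no upper limit).
    """
    if size_bytes < min_size:
        return False
    if max_size is not None and size_bytes > max_size:
        return False
    return True
```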

 

Update 3: additionally, there could be a cron-driven "garbage collection" to evict files from the cache that have not been accessed for a certain period of time (this should be a piece of cake: since the read/close hook updates the file time, it is always current, and a simple find /mnt/readcache -atime +XXX -exec ... is enough for cleanup).
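The cron job could be as small as a single GNU find invocation. The /mnt/readcache path and the 30-day threshold are assumptions taken from the post; the demo below runs against a temp directory so it is safe to try anywhere:

```shell
#!/bin/sh
# Sketch of the cron-driven garbage collection for the read cache.
# A real cron line might be: find /mnt/readcache -type f -atime +30 -delete
CACHE=$(mktemp -d)                           # stand-in for /mnt/readcache
touch "$CACHE/old.bin" "$CACHE/fresh.bin"
touch -a -d "40 days ago" "$CACHE/old.bin"   # pretend it was read long ago
# Evict anything not accessed for more than 30 days:
find "$CACHE" -type f -atime +30 -delete
REMAINING=$(ls "$CACHE")                     # only fresh.bin should remain
rm -rf "$CACHE"
```

Note the `+30` (older than 30 days); `-30` would match recently accessed files, i.e. exactly the ones worth keeping.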

 

Edited by MAM59
  • 2 weeks later...

I've had a similar thought: automatic file management similar to File Juggler, Belvidere, or Droppit. Rather than moving all cached files at once, allow users to move cached files based on user-defined rules. For example, move DVR recordings older than 7 days to the array, or move a video file that's been accessed twice in the past few days back onto the cache. It's possible to do this with user scripts, but a UI to configure rules would be amazing. Unfortunately, I've yet to find a program that runs on Linux that offers this.

8 hours ago, Tjlejeune said:

Rather than moving all cached files at once

No, sorry, this is NOT what I would like to see implemented.

I would not touch the current write cache / Mover mechanism; I'd like to add another SSD used purely as a read cache. So there is no "moving" on or off this drive and no user intervention (maybe the whole drive is not even visible within the filesystem, but if it is, then read-only and not shareable).

Just a drive whose only purpose is to be read fast.

 

  • 3 weeks later...

I didn't read the post but I fully agree.

 

As a matter of fact, a personal data storage system with VM, container, and media capabilities must anticipate the end-to-end life of data and workloads, plus the Storage QoS and power management dimensions.

 

How are the unRAID or Linux storage systems documented in terms of high-level design logic?

 

Where are the diagrams?

Edited by GRRRRRRR
  • 3 weeks later...

+1 - L2 ARC

I am new to Unraid and am surprised that 'caches' (now pools) aren't really caches; they are more like user-defined write pools with user-defined copy/move rules [hence the move from 'cache' to 'pool' terminology].

 

ZFS will change much of this and enable us to have an L2ARC, among other things.

  • 1 month later...

A read cache would be lovely; personally, I'd like to see at least the following two scenarios:

  • Frequently accessed files.
    • Speeds up repeated reads.
  • The rest of a folder, if more than X MB was read from a file in said folder (let us select how many levels "up" it can cache).
    • While not too interesting for moving files across the network, it's pretty insane what this could do in, for example, Plex or another media server.

I'm somewhat surprised that this isn't a thing already. We have the write caching mechanism to keep the array idle as much as possible and speed up transfers, but as far as I'm aware there is no way to automatically move long(er)-term storage back to the cache only when a file or folder is actually accessed. I'd much prefer to have reads condensed at the (season / movie series) folder level, so that I can actually spin the rust down knowing it is unlikely to be spun up again shortly.

 

The bottom line is that I'd imagine a lot of people would like to prevent their array from being accessed as much as possible. Active disks draw a lot more power than an SSD does and most people want to minimize how often they spin disks up and down.

Edited by iD4NG3R
1 hour ago, iD4NG3R said:

Frequently accessed files.

  • Speeds up repeated reads.

 

My request would automatically produce this behaviour. The more often you request a file, the longer it stays on the cache.

(It's a side effect of updating the "time of last access" and checking against that instead of the creation time. Or update both times on each access; it makes no difference.)

 

Also, your second request would already be covered by this (it makes no difference whether YOU request the file or your Plex does).

 

Edited by MAM59
14 minutes ago, MAM59 said:

Also your 2nd demand would be already covered by this.

That really depends on whether it would (preemptively) cache the rest of the folder (possibly even X levels above that folder) or just the one accessed file. If it's the latter, it would be useless for my own use case (but still a nice-to-have for a lot of people!).

 

The vast majority of my own files are only read once in a blue moon. However, if a file is accessed, the rest of the folder is usually also read within a relatively short period after that.

Edited by iD4NG3R
1 minute ago, iD4NG3R said:

That really depends on whether it would (preemptively) cache

I did not ask for a preemptive cache; that would make things much more complicated, because it is merely a guess (like branch prediction inside a CPU: it does not know what will happen, so it reads both paths. That is much too slow with disks instead of RAM).

 

But notice, I also did not ask for cache deletion after a timeout (unless the user asks for it). So your "beloved" files can stay on the cache as long as there is space and the file has not changed.

 

Anyway, my request is already old, very old. And since then nobody from Limetech has even commented on it... SNIFF.

 


Eh, I wouldn't say it makes things much more complicated. Simply ask the user (through settings) whether they want to cache the entire folder or just single files. As for how old this request is: three months isn't that long, right? Feature requests might make it through if enough people show interest! 😃

1 minute ago, iD4NG3R said:

Simply ask the user (through settings) if they want to cache the entire folder or just single files.

Technically it ain't that "simple". For now they would only need to hook up the "open read" (to check the cache) and the "open write" functions of the OS.

What you want would require a recursive call of this function, which could easily lead to blockades and stalls (what happens if the 23rd file of the folder has a read error? If we return it, it would flag the wrong file; if we suppress the error, it goes unseen).

Also, I see no real improvement from prefetching files that might never be requested at all later on (if they are, they will be put on the cache then anyway).

 

7 minutes ago, iD4NG3R said:

In regards to how old this request is, 3 months isn't that long right? Feature requests might make it through if enough people show interest! 😃

Yeah I know, and I am not really try to push things. But for now I do not even know if somebody of them has read the request and is willing to take it into consideration. Thats a bit frustrating 😁

32 minutes ago, MAM59 said:

For now they would only need to hook up the "open read" (to check the cache) and the "open write" functions of the OS.

What you want would require a recursive call of this function, which could easily lead to blockades and stalls (what happens if the 23rd file of the folder has a read error? If we return it, it would flag the wrong file; if we suppress the error, it goes unseen).

In the end that would only be relevant the first time a (sub)folder's content is copied over to the read cache; the system already knows which files it moved to the cache and which files have been added to or removed from the array that are (also) available on the read cache. You're going to need to deal with file/folder changes anyway, so you'll have to keep track of which files are active on the read cache regardless. At that point it makes little difference whether it's one, dozens, or even thousands of files in a particular (cached) folder. Leave it up to the user whether they want it, and at which thresholds/limits it should apply.

 

A file was added to folder X > should files from folder X be on the read cache? If so > copy it to the cache too.

 

That would also be beneficial, assuming we can use the existing write cache for read caching, since you wouldn't need to write new files to it twice; you'd just copy them over to the array while leaving the cached file where it is.

 

32 minutes ago, MAM59 said:

Also, I see no real improvement from prefetching files that might never be requested at all later on (if they are, they will be put on the cache then anyway).

Hence the user setting. For media consumption it would be extremely beneficial to have the entire show loaded in cache instead of having to fetch individual episodes or movies from the array. It leaves the disks spun down, or at the very least the heads in resting position, as much as possible.

 

Anyway, this is probably a little too in-depth for a feature request. I'd personally be mostly interested in the ability to (preemptively) cache folders, but the caching of single files would already be a nice upgrade. 👍

Edited by iD4NG3R
  • 1 month later...

I think the most basic feature for data storage regarding read-cache is to have a warm cache. It's quite simple:

  • Whenever we seek a file, first try the cache; on a hit, update the file's last access time. On a miss, if the file is being read sequentially/completely, start caching it; if there is enough space in the cache, that's fine.
  • Also cache recently written files.
  • If you're caching a new file and the cache is full, remove the least recently accessed files until there is enough space.

It's a dumb cache, but the idea is simple: recently used files will be used more often, and they will be cached.
As data gets colder, it is pulled off the cache.
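The eviction policy described here is a classic least-recently-used (LRU) scheme, sized in bytes rather than entries. A minimal sketch (the class and its names are illustrative, not Unraid internals; paths are assumed unique):

```python
from collections import OrderedDict

class WarmCache:
    """Minimal sketch of the 'dumb' warm cache described above.

    capacity is in bytes; eviction removes the least recently
    accessed entries first until the new file fits.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.entries = OrderedDict()  # path -> size, oldest access first

    def touch(self, path):
        """Cache hit: refresh last access by moving to the MRU end."""
        if path in self.entries:
            self.entries.move_to_end(path)
            return True
        return False

    def add(self, path, size):
        """Cache miss: admit the file, evicting cold entries as needed."""
        if size > self.capacity:
            return False  # can never fit, serve from the array instead
        while self.used + size > self.capacity:
            _, evicted_size = self.entries.popitem(last=False)  # coldest
            self.used -= evicted_size
        self.entries[path] = size
        self.used += size
        return True
```

A share pinned to a fast pool (as suggested below) covers the known-hot-set case; this kind of cache covers everything whose access pattern you cannot predict.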

 

And I think it is much more important than a preemptive cache, because if you know the exact set of folders you want cached, you can just make a share for them and configure it to be stored on a faster drive.

  • 4 months later...
On 4/3/2023 at 11:42 AM, iD4NG3R said:

it would be extremely beneficial to have the entire show loaded in cache instead of having to fetch individual episodes or movies from the array.

Cough! 🙂

Lemme check here... for instance, "Eisenbahn-Romantik"... 1035 episodes, all with NFOs and JPGs... 20 GB in total...

You want all of them loaded just because somebody wants to watch ONE episode?

That does not sound very healthy to me.

As I said: KEEP IT SIMPLE!

"My read cache" is just a copy: it does not matter if it is lost, and it does not need to be monitored as long as space is free. When space runs out, you run a simple "garbage collection" and delete the oldest entries until enough space is freed for the file to be newly stored.

No prefetching, no recursion.

BUT FAST!

 

  • 2 weeks later...

I have used Unraid for a year now, running basic containers and file shares.
Initially I misunderstood Unraid's cache mechanism (thinking it worked both ways) and have been disappointed that there is no read cache option.

I have intermittently encountered download issues and playback stutter, which I attribute to the slower HDDs spinning up.

 

It seems an oversight that such a basic feature as a read cache is missing.
I have an SSD installed in my system that I would switch over to a read cache immediately if that were available today.

  • 1 month later...
  • 3 weeks later...

I also thought the Unraid cache was for reads. But after all these years of using Unraid, I now see it is write-only. Any chance of a read cache in the future? I use Unraid to store retro gaming ROMs, and it would help a lot to have the most-played games always on the SSD/NVMe cache.

Edited by Bruno
  • 2 months later...

Would love to see this as well. My use case is reducing power consumption on privately run Unraid servers.

An SSD-based read cache would allow the array to spin down during phases when no client is using VMs, Docker containers, or file shares. This would reduce power consumption, especially when many drives are configured. Power consumption in my configuration: an array of 6 drives, 8 TB total, plus a write cache SSD. Expected power reduction: approximately 10-20 watts, depending on the distribution of the files within the array.

 

I already evaluated how to copy frequently read files from the array to a pool device, but gave up since the Unraid OS mounts the array drives with the noatime flag... ;-( so no chance for "find -atime"...)

 

 

  • 3 weeks later...
