jonp

Native Cache_Dirs Support


I think this should be considered for prioritization for 6.0's release.  In all my testing with 6.0 without this feature, it's very annoying to not have it, especially in larger arrays where data is spread across a variety of drives.  Not sure how much effort it would take to make this happen, but I would like us to consider it...


Ideally I would agree. There is no scenario where I would not run cache_dirs, as the perceived real-world speed improvements for my use case are nothing short of vast.

 

This is doubly true with 64bit.


Agree this would be a nice feature to have "baked in" to the core.

 

One suggestion ...

 

If this is a native feature, it'd be nice if it was automatically toggled off during parity checks.  [Or, alternatively, if it had a fixed memory buffer which, once filled, wasn't impacted by other system operations].

 

The filling of the cache has a notable impact on other system operations.    Simple way to show that:

 

=>  With cache_dirs enabled; boot your system and immediately start a parity check.  Refresh the screen once/minute and watch the speed of the check for a while.    Then stop the check.

 

=>  Disable cache_dirs from your go script; then reboot the system and, again, immediately start a parity check.  Refresh the screen once/minute and watch the speed of the check now.

 

The difference is significant.  Once the cache is filled (5-30 minutes, depending on the number of files you have), the parity check runs at normal speeds.    But if you're still using the system, this can change and the slowdown will repeat itself numerous times during the check.

 

Alternatively, a simple checkbox on the Web GUI that enabled/disabled cache_dirs would be nice ... and would let anyone disable it during other disk-intensive operations (parity checks, disk rebuilds, etc.).

 


If this is a native feature, it'd be nice if it was automatically toggled off during parity checks.  [Or, alternatively, if it had a fixed memory buffer which, once filled, wasn't impacted by other system operations].

 

During a parity sync all drives are spun up (at least in the beginning), so cache-dirs doesn't need to run.  I would probably tie it to spin-down: just prior to spinning a drive down, if the drive is marked for "cache-dir", add it to an 'active cache-dir'ing' list and fire off a scan, then spin the drive down after the scan completes.


That would work nicely ... and would also automatically resolve this for any other functions that may be keeping the disks very busy and that cache_dirs may interfere with.

 


I don't get it. All the testing in the past showed that for cache_dirs to work it needs to continually perform the recursive ls regardless of drive spin state. Tying it to spin-down, due to the kernel's natural tendency to drop caches, will just result in cache_dirs causing the drives to spin up again.

 

This is doubly true during very busy operations, which typically cause more caches to be dropped. This is not to say you shouldn't stop cache_dirs during certain operations, but you can't link this action with spin state.

 

cache_dirs is a very, very clever kludge fix for the inherent problem that the kernel considers inode entries cheap and disposable. It should be possible to tune the kernel to not drop these entries, but no amount of fettling seems to do it, and it can result in unpredictable behavior (cache pressure etc.).
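For reference, the trick cache_dirs plays can be sketched in a few lines: keep re-walking the tree so the kernel's dentry/inode entries stay hot. (cache_dirs itself is a shell script built around find; this Python sketch, with made-up names like scan_once, is only an illustration of the idea.)

```python
import os
import time

def scan_once(root):
    """Walk the whole tree so the kernel looks up (and therefore
    caches) every dentry/inode under root. Returns entries seen."""
    count = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        count += len(dirnames) + len(filenames)
    return count

def keep_warm(root, interval=10):
    """Re-scan forever: repeated lookups keep the entries near the
    front of the kernel's LRU list, so they are evicted last."""
    while True:
        scan_once(root)
        time.sleep(interval)
```

The scan itself reads only metadata, so once cached it costs almost no I/O; the cost is the RAM the kernel ends up dedicating to the entries.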

 

The perfect solution would be a semi-permanent inode cache where the directory listing, which is actually a very small amount of data, could be stored both in RAM and on disk, making it non-volatile, rather than cache_dirs tricking the kernel into keeping it in RAM.

 

Update:

 

Whilst this is getting some attention, and assuming we can't find some magic kernel fix, the current cache_dirs implementation is lacking in a couple of areas:

 

1. Visibility of RAM usage and inode count per directory. 99% of your cache might come from one folder three levels deep that you don't care about so much.

2. Setting folder depth limits. Yes, I want to cache one folder deep in all of them so that an errant click doesn't cause disk spin-up, but I don't want to go beyond two deep on folder Y. I believe this ties in with the use of "find" in cache_dirs; it just needs better control.

3. Web GUI config (obviously) and control.

 


Actually, the idea of a dedicated, permanent memory area for the directory cache would be even better ... although clearly this could only be done on systems with sufficient RAM (which most new systems will have).

 

I'd be quite happy with a simple Cache_Dirs on/off toggle on the Web GUI.  The only operations I've personally found that it notably impacts are parity checks and drive rebuilds.    I suppose some plugins could also be negatively impacted, but I don't think it'd be nearly as bad as those two functions.

 

 


I don't get it. All the testing in the past showed that for cache_dirs to work it needs to continually perform the recursive ls regardless of drive spin state. Tying it to spin-down, due to the kernel's natural tendency to drop caches, will just result in cache_dirs causing the drives to spin up again.

If drive spinning => no scanning that drive

If drive about to be spun down by emhttp => first scan (might take a while), then spin down

If drive not spinning => scan periodically

If drive spun up => quit scanning this drive
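Those four rules amount to a small per-drive state machine; a hypothetical sketch (the function name and action labels are invented for illustration):

```python
def next_action(spinning, about_to_spin_down=False):
    """One drive's cache-dirs behaviour under the four rules above:
    scan only while the drive is spun down, with one final scan
    squeezed in just before emhttp spins it down."""
    if about_to_spin_down:
        return "scan_then_spin_down"   # final scan, then spin down
    if spinning:
        return "no_scan"               # spun up: leave the drive alone
    return "scan_periodically"         # spun down: keep the cache warm
```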

 

The perfect solution would be a semi-permanent inode cache where the directory listing, which is actually a very small amount of data, could be stored both in RAM and on disk, making it non-volatile, rather than cache_dirs tricking the kernel into keeping it in RAM.

Unfortunately this is not possible without LOTS of kernel-level programming, and wouldn't stand a chance of getting merged into upstream.

 

Also, it's no good to just store directory listings... you need all the stuff in the inode too: permissions, type, etc.


I am too lazy to quote all this properly...

 

If drive spinning => no scanning that drive ... I know why this seems logical, and perhaps with SSDs it will be fine, but with traditional spinners, even when a drive is spun up, cache_dirs significantly improves the perceived responsiveness when browsing and searching drives.

 

If drive about to be spun down by emhttp => first scan (might take a while), then spin down ... sensible

 

If drive not spinning => scan periodically ... aka cache_dirs

 

If drive spun up => quit scanning this drive ... aka drop the cache and rebuild it on demand. This is the crux of this change in approach and the bit that doesn't make sense to me.

 

So the drive is spun down and you are maintaining a cache for it. Then it spins up and you allow the cache to expire. Just before the drive spins down, you recreate the cache. That seems sensible, but in reality all you gain is a reduction in RAM usage for the spun-up time. Before the spin-up and just before the spin-down you use that RAM up again, and it costs a bunch of disk I/O to get it back. In fact, net disk I/O is increased using this approach, all to save a bit of RAM for a short period of time.

 

Is this worth it?

 

 

Unfortunately this is not possible without LOTS of kernel-level programming....

 

I assume as much. But perhaps we can do a poor man's version of this with no kernel hacking. If we had a sacrificial SSD for the page file and tuned the kernel tunables relating to inodes, we might be able to achieve a semi-permanent cache.


Most pressure on RAM is going to be when the drive is spun up, because we're doing transfers with it (or else it wouldn't be spun up).  Why not let Linux go ahead and reclaim those inode pages if they age off the LRU list?

 

Is it worth it?  If memory pressure gets high enough those inode pages will get ejected anyway and now you have even more pressure because cache-dirs will be trying to bring those inodes back into memory.

 

The argument against it would be:

 

- having the inodes cached makes for a snappier experience - huh, well, if you say so, but it seems like there's "snappy" and "snappy enough" - anyway, this would be configurable, so if you really wanted to maintain caching while spun up then sure

 

- might have a further delay before the drive gets spun down - well, spin-down inactivity is measured in hours; a few more seconds at most to scan the dirs seems meaningless

 

Last time I looked (back in the 2.6 kernel), those tunables relating to inode aging didn't do anything.


 

- having the inodes cached makes for a snappier experience ...

 

...a few more seconds at most to scan the dirs seems meaningless

 

 

Here is a rather unscientific example of a folder taken out of cache_dirs, but spun up, to make a point...

 

Edit: I got bored waiting. I took a folder I know has a ridiculous number of files out of cache_dirs and then ran an ls -R on it. Normally this takes a few seconds, but out of cache_dirs it has now been going for 5+ minutes. This is neither "just a bit snappier" nor "a few meaningless seconds"; these are big, slow kludge numbers.

 

Update: it completed

 

Out of cache_dirs:

time ls -R /mnt/user/comics/

real    11m23.293s
user    0m1.200s
sys     0m2.020s

In cache_dirs:

real    0m1.920s
user    0m0.340s
sys     0m0.270s

360 times faster. Just a bit of a difference.

 

 

 

 

 

 


Huh, well I stand corrected.  Thanks for the experiment.

 

BTW that's a helluvalot of comics  :o


I think the simplest "solution" is a simple On/Off switch for Cache-Dirs in the Web GUI  :)

 

I suspect most of us would leave it on all the time except when we wanted to do a parity check or drive rebuild.

 


The question really should be: why does a parity check or drive rebuild clear out the caches, forcing cache_dirs to fall back to real disk reads again?

 

Do they have to?


parity check AND the mover. Joe's version also monitors the mover.

 

At one time I tried increasing the dentry queue size; it helped, but since I had so many files, it would cause the system to crash with an OOM.

 

Last time I looked (back in the 2.6 kernel), those tunables relating to inode aging didn't do anything.

 

When I looked I saw they did, just not very obviously.

 

In my early tests I set it so they were last to be ejected.

It helped with directory sweeps, but again, I had so many files, I had all kinds of OOM crashes.

 

I think with 64bit this will be less of a problem.

 

FWIW, here is my test on one of my mp3 disks.

I should reiterate: that's only one of them. I have many of them.

Plus I have tons of source code files I've collected over the years.

You can quickly see how I had millions and millions of files.

 

root@unRAID:/mnt/disk3# time find /mnt/disk3 -type f -print > /mnt/disk3/filelist.txt

real    32m46.917s
user    0m4.030s
sys     0m33.110s

2nd test immediately after.
root@unRAID:/mnt/disk3# time find /mnt/disk3 -type f -print > /mnt/disk3/filelist.txt

real    3m15.613s
user    0m0.890s
sys     0m5.520s

root@unRAID:/mnt/disk3# wc -l /mnt/disk3/filelist.txt
307013 /mnt/disk3/filelist.txt

 

I'm hoping the move to 64bit will be better for me.


If nothing else, these two sets of stats show that when people say cache_dirs makes a vast real-world usability difference, they really mean it.

 

Has anyone tried this on the 64bit kernel with a page file and the kernel tunables yet?


A page file will not help, the structures are memory resident kernel structures and do not get swapped out.

Instead they get dropped when there is memory pressure.


I did not know that; however, surely a swap file would reduce the risk of the entries needing to be dropped, achieving the same net result, albeit from a less direct/elegant angle?


parity check AND the mover. Joe's version also monitors the mover.

 

That's surely the best approach -- if Cache_Dirs monitors parity checks, the Mover, and drive rebuilds, then that's most of the activities that it could interfere with.

 

Note that Tom's thoughts re: automatically throttling parity checks based on other system activity could also impact this.    With Cache_Dirs "built in" as a core feature, any throttling can be coordinated between the various activities.    The reality is it's in the "no big deal" category for most things -- i.e. if a parity check takes an extra 20-30 minutes due to Cache_Dirs activity, I don't really care.  But if I'm watching a video and it starts to stutter due to other system activity (e.g. an automated parity check kicks off), THAT is much more frustrating.    [Although I no longer use scheduled parity checks -- I just start one myself at the beginning of each month, so there's never any interference with other activity.]

 

Note that Cache_Dirs activity can also cause a bit of stuttering if you happen to be streaming a movie from a particular disk when it starts buffering the directory info from that disk (it's easy to force this to show it, but not a very likely scenario).

 

Not sure just how automated this all needs to be -- I think a simple On/Off for Cache_Dirs would be fine; but I DO like the idea of automatic throttling of the parity checks and rebuilds based on other system activity.  [but that's off-topic for this thread]

 

 

 


I did not know that; however, surely a swap file would reduce the risk of the entries needing to be dropped, achieving the same net result, albeit from a less direct/elegant angle?

 

If this was an efficient way to do what you are trying to do, the kernel developers would have done it.

 

Is it faster to swap pages in and walk through them, or just go out to disk and re-read the information?

 

 

What could possibly be done is have the usershare/fuse shfs use some kind of mmap'ed file that is on a cache disk.

This cached/mmap file could contain all the stat information for all visited files.

 

 

The downside is that you would be duplicating what the kernel does.

 

The upside is that you can keep the information longer outside of real ram requirements and connect some kind of inotify so that when files are opened/closed/added/removed the mmap cache for that directory is updated.

 

If the device being reviewed is spun down, use the data in the stat cache rather then what is actually on the disk.

 

A lot more work for the FUSE layer.

 

The data can be cached in a mmap file or a .gdbm file. I'm testing how long it takes to store all the stat blocks in a gdbm file now.
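A rough feel for what that stat cache looks like, using Python's stdlib dbm in place of raw gdbm (the helper names and the JSON-serialized subset of struct stat fields are my own assumptions, not the actual test program):

```python
import dbm
import json
import os

def build_stat_cache(root, cache_path):
    """Walk root and store a few stat fields per file, keyed by
    full path. Keep cache_path outside root so the cache does not
    index itself. Returns the number of records stored."""
    stored = 0
    with dbm.open(cache_path, "c") as db:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                db[path] = json.dumps({"size": st.st_size,
                                       "mtime": st.st_mtime,
                                       "mode": st.st_mode}).encode()
                stored += 1
    return stored

def cached_stat(cache_path, path):
    """Single-key lookup: answers a stat without touching the disk
    the file lives on."""
    with dbm.open(cache_path, "r") as db:
        return json.loads(db[path])
```

The real shfs layer would need the full struct stat plus inotify-driven updates, but the shape of the store is the same: full path as key, stat data as value.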


My data must already be cached. It should have taken much much longer.

 

time /mnt/disk1/home/rcotrone/src.slacky/ftwcache/ftwstatcache /tmp/disk3.gdbm /mnt/disk3

 

 

files processed: 324600, stores: 298901, duplicates: 0, errors: 0

fetched 0 records, deleted 0 records, stored 298965 records

 

real    2m22.192s

user    0m2.830s

sys    0m8.660s

 

root@unRAID:~# ls -l /tmp/disk3.gdbm

-rw-rw-r-- 1 root root 50897023 2014-08-16 18:24 /tmp/disk3.gdbm

 

root@unRAID:~# ls -l --si /tmp/disk3.gdbm

-rw-rw-r-- 1 root root 51M 2014-08-16 18:24 /tmp/disk3.gdbm

 

 

The gdbm file uses the full path as the key and a struct stat as the data.


I did the ftw across 3 drives of data, storing all stat structures in a .gdbm file.

 

fetched 0 records, deleted 0 records, stored 1151935 records

 

real    71m26.353s

user    0m17.040s

sys    2m4.010s

 

root@unRAID:~# ls -l --si /tmp/statcache.gdbm

-rw-rw-r-- 1 root root 179M 2014-08-16 19:38 /tmp/statcache.gdbm

 

 

The issue with this approach is that scanning through all the keys to find a match can take time as the number of files increases.

 

 

Here's an example.

 

 

using a bash loadable library, I'm able to access the gdbm file at the bash level directly.

 

 

A single key look up is pretty fast.

 

root@unRAID:/mnt/disk1/home/rcotrone/src.slacky# enable -f ./bash/bash-4.1/examples/loadables/gdbm gdbm

root@unRAID:/mnt/disk1/home/rcotrone/src.slacky# gdbm

gdbm: usage: gdbm [-euikvr] [-KVW array] file [key | key value ...]

 

time gdbm /tmp/statcache.gdbm /mnt/disk3/Music/music.mp3/Jazz/Various\ Artists/The\ Art\ Of\ Electro\ Swing/01\ Tape\ Five\ -\ Madame\ Coquette\ \(Feat.\ Yuliet\ Topaz\).mp3

<binary data here>

 

real    0m0.003s

user    0m0.010s

sys    0m0.000s

 

 

Yet traversing all the keys of 1 million files takes time.

 

root@unRAID:/mnt/disk1/home/rcotrone/src.slacky# time gdbm -k /tmp/statcache.gdbm | grep 'Jazzy Lounge - The Electro Swing Session' | wc -l

56

 

real    0m11.225s

user    0m14.280s

sys    0m2.290s

 

root@unRAID:/mnt/disk1/home/rcotrone/src.slacky# time gdbm -k /tmp/statcache.gdbm | wc -l                     

1,151,935

 

real    0m7.236s

user    0m3.930s

sys    0m8.350s

 

 

This might be faster with an sqlite table or mmap'ed file.

 

Point is, there's quite a bit that would have to go into this to cache the stat data outside of the kernel.

So kernel patches may be better than doing this at the application level. I don't know for sure.

 

When the next unRAID release is available with SQLite compiled into PHP, we can build a table with filenames and some stat data for a browser-based locate function. We'll also be able to store md5s in there too.
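A sketch of what that SQLite table might look like (the schema and the locate helper are hypothetical; a real build might add an FTS index for faster substring search, and the md5 column is left unfilled here):

```python
import sqlite3

def create_filedb(db_path=":memory:"):
    """Filenames plus a little stat data, and room for an md5."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS files (
                       path  TEXT PRIMARY KEY,
                       size  INTEGER,
                       mtime REAL,
                       md5   TEXT)""")
    return con

def locate(con, pattern):
    """Substring search over paths -- the equivalent of dumping
    every gdbm key through grep, done by the database instead."""
    cur = con.execute("SELECT path FROM files WHERE path LIKE ?",
                      ("%" + pattern + "%",))
    return [row[0] for row in cur]
```

Note that a LIKE with a leading wildcard still scans the whole table; it just does so far faster than round-tripping a million keys through a shell pipeline.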


If this was an efficient way to do what you are trying to do, the kernel developers would have done it.

I think the issue is that what we want to do is inherently very inefficient and so specific a use case that the kernel developers would not do it. Note this is not the same as couldn't do it.

 

The fact that cache_dirs works at all shows it can be done to some extent. The question is just how we scale it whilst remaining reliable.

 

Is it faster to swap pages in and walk through them, or just go out to disk and re-read the information?

 

Million-dollar question, although with SSDs/RAM drives we could possibly skew the results in our favour.

 

What could possibly be done is have the usershare/fuse shfs use some kind of mmap'ed file that is on a cache disk.

 

I did the ftw across 3 drives of data, storing all stat structures in a .gdbm file.

 

fetched 0 records, deleted 0 records, stored 1151935 records

 

real    71m26.353s

user    0m17.040s

sys    2m4.010s

 

root@unRAID:~# ls -l --si /tmp/statcache.gdbm

-rw-rw-r-- 1 root root 179M 2014-08-16 19:38 /tmp/statcache.gdbm

...

 

This is a clever approach, obviously, and what is especially interesting is that the statcache file you create is relatively small, considering there could be an even more efficient storage mechanism.

 

Even as is, 7M records would be about 1GB of RAM, and I am pretty sure we could better that.

 

However, what I don't want to happen is that we pre-design this so much we end up not getting anything. Even cache_dirs added as-is, with a few tweaks to show better feedback about memory usage, folder inode usage, and potential cache_dirs-caused spin-ups, would be a big step.

 

Maybe split this into phase 1 and phase 2, and we can play about experimenting with phase 2 stuff whilst the path to phase 1 would be quite clear and straightforward.

This is a clever approach, obviously, and what is especially interesting is that the statcache file you create is relatively small, considering there could be an even more efficient storage mechanism.

 

This was pretty much an academic exercise in feasibility.

According to my numbers, when extracted the keys (full path filenames) are 121MB.

Calculating the stat struct size and number of records I get 165MB.

 

The statcache.gdbm file is 179MB. so somewhere the keys are being compressed into hashes.

Therefore it's pretty efficient.
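Those numbers roughly check out. Assuming the 144-byte struct stat of x86-64 Linux (an assumption; the stored struct may differ slightly), the raw keys plus stat payload would come to near 287MB, so the 179MB file is indeed packing things down:

```python
records = 1_151_935   # stored records, from the run above
stat_bytes = 144      # sizeof(struct stat) on x86-64 Linux (assumption)
key_mb = 121          # raw full-path key text, from the numbers above

payload_mb = records * stat_bytes / 1e6
print(round(payload_mb))           # roughly the 165MB stat figure quoted above
print(round(payload_mb) + key_mb)  # raw keys + data, vs the 179MB file on disk
```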

 

While the 179MB seems relatively small, we have to consider that if something like this is on disk, then when it is read, it is read into the buffer cache. Therefore it could feasibly take up twice the amount of RAM.

 

Would I give up 512MB to cache the stat information on my array? I sure would.

 

In comparison, an mmap'ed file accesses the filesystem as if it were real RAM, i.e. the array is stored as a file.

This means traversing a million entries in the array requires reading the file from somewhere: SSD, tmpfs/rootfs, etc.

Which brings some kind of time delay.

 

Again, since it's a file, it will need to utilize disk space and ram in buffer cache.

 

So in comparison, it may be better to review the kernel code, expand the dentry or other hash tables, and see how to preserve these entries in RAM better.

