Directory Cache

May 9, 2009

Yes, 1.4 is broken...as reported in the earlier post I made.

try 1.4.1, or edit 1.4 yourself to fix the comment at the top.

That is the one I'm using. In the comments at the top the latest version listed is 1.4.

May 9, 2009

I see... ok.

May 9, 2009

Joe, I don't seem to be able to get this working properly. I edited the 'go' script using the PATH method and after a reboot I get the message "cache_dirs is already running" when trying to execute from the prompt. Sounds good, but the drives still spin up while browsing the shares.

Using -F -v shows the same list every 5 seconds.

Executing find /mnt/disk1/Audio

Executing find /mnt/disk2/Audio

Executing find /mnt/disk3/Audio

Executing find /mnt/disk6/Backup

Executing find /mnt/disk6/Game

Executing find /mnt/disk6/Prog

Executing find /mnt/disk1/Video

Executing find /mnt/disk2/Video

Executing find /mnt/disk3/Video

Executing find /mnt/disk4/Video

Executing find /mnt/disk5/Video

Executing find /mnt/disk6/Video

Executed find in 0.224566 seconds, avg=0.228831 seconds, now sleeping 5 seconds

Drives will spin up depending on what you do while browsing..

If all the drives are spun-down, and you type on the command line

find /mnt

and everything is in cache, then nothing will spin up. If it all does not fit in cache, some might spin up.

Are you accessing through user-shares? Did you try using the new "-u" option? It might help to also cache /mnt/user

Joe L.

May 9, 2009

I haven't tried the -u option yet, but that might be part of the problem. I did a find /mnt and drives immediately started spinning up.

How do I know how much RAM is being used for the cache?

What does the 'maxdepth=9999' value refer to?

Thanks Joe.

May 9, 2009

I have a mixed report. For the most part, it is working well, as in some of my tests, I got exactly the right behavior!!! So I think the main functionality is correct, and we have moved into the 'tweaking' phase. The part that is not correct yet is the handling of the timing, the period (num_seconds), the min and max, and the running average.

There are several problems in the way it works, as well as an unexpected issue to handle. I wasn't watching the first time it happened, so I don't know how long the first extraordinarily long elapsed time was, but I saw the second time it occurred in a later test, when a scan took over 20 seconds! Mine now normally take from .020 seconds to .030 seconds when the system is under heavy traffic. That is about a thousand times too long, and completely skewed the average! Averages are usually of items that are in the same ballpark, and this one was a hundred ballparks away. Perhaps a test is needed to limit skewed values:

[ $elapsed_time -gt  $(($avg_elapsed_time*2)) ] && elapsed_time=$(($avg_elapsed_time*2))
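The same clamp can be sketched as a small function. This is only an illustration: `clamp_elapsed` is a hypothetical name, not something in cache_dirs, and it assumes the times are whole numbers (for example nanoseconds from `date +%s%N`), since shell arithmetic is integer-only.

```shell
#!/bin/bash
# Hypothetical helper, not part of cache_dirs: cap a sample at twice the
# running average so one huge scan time cannot skew the average.
# Both arguments must be integers (for example nanoseconds).
clamp_elapsed() {
  local elapsed=$1 avg=$2
  local limit=$((avg * 2))
  # Skip the clamp while no average exists yet (avg of 0).
  if [ "$avg" -gt 0 ] && [ "$elapsed" -gt "$limit" ]; then
    echo "$limit"
  else
    echo "$elapsed"
  fi
}

# A 20-second scan against a 25 ms average is clamped to 50 ms (in ns).
clamp_elapsed 20000000000 25000000
```

The guard on a zero average is an added assumption so that the very first sample is not clamped to 0.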

At the time, I was trying to get the period to stay around 4 or 5, so since min was 1, I set max to 8. When the avg went sky high, then the period rapidly increased to the max of 8, which is too long, and most of my drives spun up. At this time, I have no idea what hung my system for 20 seconds! Drive spin ups must have been a part of it, but that does not fully explain it.

When the above was not happening, I found that the period would often decrease to about 2 seconds, and stick there, which I consider too fast. After playing with it some, I decided that a fixed period works better for me, and the way to do that normally would be to set the min equal to the max, but there is a small issue with that. Currently the min and max validation uses -le, when it should probably use -lt. I now use "-m 4 -M 5", which starts it at 4 and lets it drift up to 5. It won't (currently) let me set the value to 5, with no possibility of going higher. Using fixed values, I can begin testing it to see how high I can take the value, before drives spin up, and currently I'm favoring a value of 5.

The avg_counter seems to either be a vestige of an earlier idea, now retired, or the limit test on it should compare with 10, not 100. Perhaps this would be better:

[ $avg_counter -lt 10 ] && avg_counter=$(($avg_counter+1))

and drop the later limit test.

I have to be honest and say, I really don't think that caching /mnt/user is a good idea. It is already in memory, so caching it too is redundant, will make the scan take twice as long, and take up twice the caching space, which may actually work against you. If drives are still spinning up, then something is not being cached yet, and you should work to determine what and why.

May 9, 2009

If all the drives are spun-down, and you type on the command line

find /mnt

and everything is in cache, then nothing will spin up. If it all does not fit in cache, some might spin up.

Are you accessing through user-shares? Did you try using the new "-u" option? It might help to also cache /mnt/user

I haven't tried the -u option yet, but that might be part of the problem. I did a find /mnt and drives immediately started spinning up.

We honestly don't know either. The -u option is an experiment to see if it has any effect. It might make things worse in some cases: if there is not enough memory available to cache everything under /mnt/disk*, then adding /mnt/user/* will only displace other entries.

How do I know how much RAM is being used for the cache?

The "free" command gives a summary.

What does the 'maxdepth=9999' value refer to?

It is a placeholder value... It simply is used to know that the -maxdepth argument to find is not needed.

The whole reason for maxdepth is to limit what is cached so you don't need to spin up a disk if you are only browsing a top level directory on a share. Most people only have directory hierarchies a few levels deep.
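As a rough illustration of what a depth limit does to a scan, the sketch below builds a throwaway tree and counts what find visits at two depths. The directory names and the `count_dirs` helper are made up for the demo and are not part of cache_dirs.

```shell
#!/bin/bash
# Illustration only: build a temporary tree and show how -maxdepth limits
# what find visits. All paths here are invented for the demo.
demo=$(mktemp -d)
mkdir -p "$demo/Video/Series/Season1/Extras"

count_dirs() {
  # $1 = top directory, $2 = maximum depth to descend
  find "$1" -maxdepth "$2" -type d | wc -l
}

count_dirs "$demo/Video" 1   # visits Video and Series only
count_dirs "$demo/Video" 3   # visits everything down to Extras
rm -rf "$demo"
```

With a small depth, only the upper levels are touched on each pass, so less has to stay in the cache.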

Joe L.

May 10, 2009

Thank you for the fixes and improvements Joe L.

As feedback:

In the morning, after a full night of inactivity (except the mover), all the disks were spun down.

I started browsing them as a test and entered random dirs. At some point a disk spun up.

Browsing further on the disks that were still spun down was OK, but at some point those disks also spun up.

So it seems that some dirs are cached and some are not, although I haven't tried -u yet. I will during the day.

May 10, 2009

When you were browsing, were you just getting directory listings, or opening the files? (If using Windows Explorer, was it showing you thumbnails or icons from the files? If so, it is accessing data in the files, not just the directory contents. That might help explain what is happening.)

The cache_dirs is just accessing the directory and file names ... not the file contents. (and by accessing them frequently, they end up most recently used in the buffer cache)

How much RAM do you have in your server? How many directories/files?

To get the number of directories and files, type:

find /mnt/disk* -type d -print| wc -l

find /mnt/disk* -type f -print| wc -l

I think a lot depends on what your file-explorer is doing.

Joe L.

May 10, 2009

Posting here just to say thanks, Joe!

I give you the lime-of-the-month award....

May 10, 2009

Wow! Congratulations Joe! That is obviously a huge award! (it almost filled my screen)

May 10, 2009

I am browsing in Total Commander. There is certainly no access to file contents.

RAM: 1GB

Number of directories: 5823 (never thought it is that much)

Number of files: 59039

Is that too much, so that I should try to exclude more paths?

May 10, 2009

Try excluding as a test to see if RAM is the limiting factor.

Also try without a 3rd party file browser. You never know what oddness these things do sometimes, regardless of how cool and established they are.

That's all I can think of for now; Joe will likely have more ideas.

May 10, 2009

I've done some tests with and without the "-u" option that also scans /mnt/user in addition to /mnt/disk*

Here is how it affects the timing on my server:

[pre]^Croot@Tower:/boot/custom/bin# cache_dirs -e data -F -d 3

Executed find in 0.240709 seconds, avg=0.240709 seconds, now sleeping 5 seconds

Executed find in 0.235200 seconds, avg=0.237954 seconds, now sleeping 5 seconds

Executed find in 0.233932 seconds, avg=0.236614 seconds, now sleeping 5 seconds

Executed find in 0.234060 seconds, avg=0.235975 seconds, now sleeping 5 seconds

Executed find in 0.233304 seconds, avg=0.235441 seconds, now sleeping 5 seconds

Executed find in 0.233832 seconds, avg=0.235173 seconds, now sleeping 5 seconds

Executed find in 0.232930 seconds, avg=0.234853 seconds, now sleeping 5 seconds

Executed find in 0.233634 seconds, avg=0.234700 seconds, now sleeping 5 seconds

^Croot@Tower:/boot/custom/bin# cache_dirs -e data -F -d 3 -u

Executed find in 3.973781 seconds, avg=3.973781 seconds, now sleeping 5 seconds

Executed find in 3.984020 seconds, avg=3.978901 seconds, now sleeping 5 seconds

Executed find in 3.980527 seconds, avg=3.979443 seconds, now sleeping 5 seconds

Executed find in 3.980490 seconds, avg=3.979705 seconds, now sleeping 5 seconds

Executed find in 3.988374 seconds, avg=3.981438 seconds, now sleeping 5 seconds

Executed find in 3.977725 seconds, avg=3.980820 seconds, now sleeping 5 seconds

Executed find in 3.971874 seconds, avg=3.979542 seconds, now sleeping 5 seconds

^Croot@Tower:/boot/custom/bin# cache_dirs -e data -F -d 3

Executed find in 0.240431 seconds, avg=0.240431 seconds, now sleeping 5 seconds

Executed find in 0.234446 seconds, avg=0.237438 seconds, now sleeping 5 seconds

Executed find in 0.232792 seconds, avg=0.235890 seconds, now sleeping 5 seconds

Executed find in 0.231936 seconds, avg=0.234901 seconds, now sleeping 5 seconds

Executed find in 0.232741 seconds, avg=0.234469 seconds, now sleeping 5 seconds

Executed find in 0.231724 seconds, avg=0.234012 seconds, now sleeping 5 seconds

Executed find in 0.231217 seconds, avg=0.233612 seconds, now sleeping 5 seconds

^Croot@Tower:/boot/custom/bin# cache_dirs -e data -F -d 3 -u

Executed find in 3.978377 seconds, avg=3.978377 seconds, now sleeping 5 seconds

Executed find in 3.981181 seconds, avg=3.979779 seconds, now sleeping 5 seconds

Executed find in 3.976999 seconds, avg=3.978852 seconds, now sleeping 5 seconds

Executed find in 3.982898 seconds, avg=3.979864 seconds, now sleeping 5 seconds

Executed find in 3.987037 seconds, avg=3.981298 seconds, now sleeping 5 seconds

^Croot@Tower:/boot/custom/bin# cache_dirs -e data -F -d 3

Executed find in 0.236724 seconds, avg=0.236724 seconds, now sleeping 5 seconds

Executed find in 0.232525 seconds, avg=0.234624 seconds, now sleeping 5 seconds

Executed find in 0.231816 seconds, avg=0.233688 seconds, now sleeping 5 seconds

Executed find in 0.232902 seconds, avg=0.233492 seconds, now sleeping 5 seconds

Executed find in 0.238305 seconds, avg=0.234454 seconds, now sleeping 5 seconds

root@Tower:/boot/custom/bin# cache_dirs -e data -F -d 3 -u

Executed find in 3.988509 seconds, avg=3.988509 seconds, now sleeping 5 seconds

Executed find in 3.979166 seconds, avg=3.983837 seconds, now sleeping 5 seconds

Executed find in 3.993271 seconds, avg=3.986982 seconds, now sleeping 5 seconds

Executed find in 3.975888 seconds, avg=3.984208 seconds, now sleeping 5 seconds

^CExecuted find in 2.066407 seconds, avg=3.600648 seconds, now sleeping 6 seconds

root@Tower:/boot/custom/bin# cache_dirs -e data -F -d 3

Executed find in 0.233074 seconds, avg=0.233074 seconds, now sleeping 5 seconds

Executed find in 0.233520 seconds, avg=0.233297 seconds, now sleeping 5 seconds

Executed find in 0.232736 seconds, avg=0.233110 seconds, now sleeping 5 seconds

Executed find in 0.231806 seconds, avg=0.232784 seconds, now sleeping 5 seconds

Executed find in 0.231712 seconds, avg=0.232570 seconds, now sleeping 5 seconds

^Croot@Tower:/boot/custom/bin#

[/pre]

I see a slight difference in efficiency... .23 seconds, vs. 3.9 seconds when adding in the user share file system.

Also, I'm pretty sure adding in the user file-system results in my system going out to the disk instead of getting everything from cache. (But then, I only have 512Meg of RAM, so your test results might be different.) I'm willing to bet .23 seconds is a result of everything being in the cache... and 3.9 a result of having to go to the physical disk.

Clearly more testing is necessary.

Joe L.

May 14, 2009

Hey Joe, excellent work. I too see a large difference when caching the shares. The increase is about 12 fold when caching only movie directories, and about 55 times longer when I include my directory and file intensive 85GB music collection. I do see occasional hiccups when browsing from Windows, even when never touching a file, but in general the responsiveness is greatly improved.

I'm wondering if it would be possible to output the find results to a file (but just once), instead of /dev/null. I think it would be both interesting to see exactly what is being cached, and informative. The file size might give some insight into how much memory we are consuming with the cached dirs, and if we are overflowing our available cache. Or perhaps I'm mistaken in thinking what is output by the find command is the same as what is getting stored in the server's cache.

May 14, 2009

...what is output by the find command is the same as what is getting stored in the server's cache.

The find command prints the name of the directory entry.

What is cached is the data about the directory entry & stat information.

This has information such as name, size, modification date, permissions and other information.

As far as how much data is consumed by the cache, it is small considering how much data is buffered from the read of a DVD.

If you really want to know how much is consumed by your system take a look at this thread.

http://lime-technology.com/forum/index.php?topic=3616.0

or do

echo 3 > /proc/sys/vm/drop_caches

top -bn1 | head

take note of the memory used.

Then run the script, do a

top -bn1 | head

Compare results.
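A minimal sketch of the same before/after comparison, reading the kernel's own counters instead of eyeballing top. The `meminfo_kb` helper is hypothetical, and which field best reflects directory caching is an assumption: Buffers is used here, while dentry and inode caches are reported under Slab.

```shell
#!/bin/bash
# Hypothetical helper: print one field of /proc/meminfo in kB.
# $1 = field name (e.g. Buffers, Cached, Slab)
# $2 = file to read (defaults to /proc/meminfo)
meminfo_kb() {
  awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

before=$(meminfo_kb Buffers)
# ... run one pass of the directory scan here ...
after=$(meminfo_kb Buffers)
echo "Buffers grew by $((after - before)) kB"
```

For a clean baseline, drop the caches first as described above; otherwise the difference only shows what the scan added on top of what was already cached.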

May 14, 2009

Hey Joe, excellent work. I too see a large difference when caching the shares. The increase is about 12 fold when caching only movie directories, and about 55 times longer when I include my directory and file intensive 85GB music collection. I do see occasional hiccups when browsing from Windows, even when never touching a file, but in general the responsiveness is greatly improved.

I'm wondering if it would be possible to output the find results to a file (but just once), instead of /dev/null. I think it would be both interesting to see exactly what is being cached, and informative. The file size might give some insight into how much memory we are consuming with the cached dirs, and if we are overflowing our available cache. Or perhaps I'm mistaken in thinking what is output by the find command is the same as what is getting stored in the server's cache.

You are mistaken. The two are related, but not in a way where you could use them to see the size.

A block on the disk is probably 512 bytes.. however, the block on the disk might only hold a single entry for a single file, or the 512 bytes might describe a whole set of files. I'm not sure of the block size in the disk buffer cache. It might be 1k, or 2k, or 4k.

Best way to see the effect is to reboot the server, then type

free

then run the script and type

free

once more.

Edit; or use the "top" command as described in the prior post. (You can clear the caches as he described instead of rebooting, it would have the same effect)

Joe L.

May 14, 2009

Thanks for the howto guys. I calculated my memory usage from the cache_dirs script, and I got interesting results:

9,440k mem usage without -u

6,884k buffer usage without -u

112,948k mem usage with -u

83,112k buffer usage with -u

I kinda expected a doubling of memory usage when turning on the share caching, as to me it would be caching the same data twice. Instead it went up roughly twelvefold. So not only does the -u share caching option greatly increase processing time, it also greatly increases memory usage. I understand that the memory usage here is nothing compared to reading a DVD, but when you factor in the longer processing times and the increased data set size, the -u option makes it much harder for the server to keep the directories cached than without the shares.

May 18, 2009

I've read through all 10 pages, and may have missed something here. In general if I do:

"cache_dirs -F -d 5", I can see it caching my data, with typical sleep times of 2sec

If I do:

"cache_dirs -F -d 5 -u", I can see it caching my data, with typical sleep times of 10sec

Since I have defined user shares, and access my data through Windows boxes via these shares only (disk* drives are hidden), do I need the -u in order for the caching to be successful/useful? Initially I thought (before you added the -u) that this script would cache everything including my user shares, but if what I'm reading is correct then I was mistaken, and I would have seen no benefit, correct?

May 18, 2009

The reason I put the options is so you could experiment.

It is reasonably considered that the "user-share" file system is ALREADY in memory, and probably cannot be swapped out (I'm just guessing here). All it consists of is pointers to the actual disk file-systems. In effect, symbolic links that look like the files themselves but point to the physical disk file-systems.

The whole purpose of keeping the directory entries in cache is to eliminate the need to spin up a disk just to see what is in the directory. Since the user-shares just point to the disk blocks anyway, and since user-shares are already in memory, most consider it a waste of time to attempt to cache those disk blocks as they would not need to spin up a disk to be read.

You can experiment as you please. The "sleep" time is not as important as the average directory scan time. As long as disk blocks are not displaced between directory scans, you should be fine. In other words, a directory block would only be re-used if, between scans, you managed to use every disk buffer available and the directory block was again the least frequently used one. This is very unlikely to occur with normal disk throughput as long as the directory listings are scanned frequently.

This whole thing is an experiment...in making the system more friendly when you go to list the files on your media player. (or file-browser)

A lot depends on the volume of data on your server and your pattern of use.

The cache_dirs.sh program is doing nothing different than you would do if you used your file explorer to list all the files every 10 seconds or so, ignoring the listings as they scrolled past on the screen. It just makes it a tiny bit easier.

Joe L.

May 19, 2009

So basically this application is just running an infinite loop where every X sec it lists all the directories (to keep them in memory). That way at any time from my Windows box I can open a folder and I won't get that delay while the drive spins up to pull the directories. As for the -u switch, if I'm not seeing any problems listing drive info from my Windows boxes without it, then there is no reason to use it =)

If I have any of this wrong please let me know, either way great work so far.

May 19, 2009

So basically this application is just running a loop where every X sec it lists all the directories. That way at any time from my Windows box I can open a folder and I won't get that delay while the drive spins up to pull the directories. As for the -u switch, if I'm not seeing any problems listing drive info from my Windows boxes, then there is no reason to use -u =)

Exactly correct.

Once you access a file, it is very likely you will still have a delay as the disk spins up. This whole utility came about because when browsing my server looking for a movie to watch my media-player was timing out waiting for the disks to spin up in turn. This made my wife very unhappy. Now, there is no delay in listing the movies... and a shorter delay when a specific disk is spun up to play a selected movie.

If I have any of this wrong please let me know, either way great work so far.

You've got it right...

It is a simple loop... listing all the directories in turn down to the depth you specified. The sleep time between scans of directories is decreased when the time for a directory scan increases from the average scan time.. and increased when scan time is quicker (everything is in memory, so no need to scan the directories as often)

The script itself is a tiny bit complicated by it checking that only one is run at a time, and that it detaches itself from the terminal so you can log off, and that it has so many options you can experiment with.
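The sleep adjustment described above could look roughly like this. It is a sketch under stated assumptions, not the code in cache_dirs: `next_sleep` is a hypothetical name, the step of one second is a guess, and the scan times are taken to be integers in the same unit.

```shell
#!/bin/bash
# Hypothetical sketch of the adaptive sleep: scan sooner when the last scan
# was slower than average (something may have fallen out of cache), and
# back off when it was quicker. Not the actual cache_dirs logic.
# $1 = current sleep in seconds   $2 = last scan time   $3 = average scan time
# $4 = minimum sleep              $5 = maximum sleep
next_sleep() {
  local s=$1
  if [ "$2" -gt "$3" ]; then
    s=$((s - 1))
  elif [ "$2" -lt "$3" ]; then
    s=$((s + 1))
  fi
  # Keep the result inside the configured min/max window.
  if [ "$s" -lt "$4" ]; then s=$4; fi
  if [ "$s" -gt "$5" ]; then s=$5; fi
  echo "$s"
}

# Last scan (30) slower than the average (20): sleep drops from 5 to 4.
next_sleep 5 30 20 1 10
```

Setting the minimum equal to the maximum pins the result, which gives a fixed period.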

Joe L.

May 19, 2009

Seriously Joe....

Brilliant man, this was my ONLY major complaint with unRAID. This should be incorporated as an option in the main release; the functionality is sorely needed by everyone, in my opinion.

Now once I find a good DVI switch-box, and about 10% more write speed, I will have nothing more to desire =P

Thanks again for all your work on this.

May 21, 2009

After the improvements and comments above, including the comments about Total Commander, I've been fiddling and tweaking, and testing settings. After still seeing a few drives spin up inexplicably, I've come up with another idea, which I've tested, and it seems to work, but it needs additional confirmation from others.

I first thought that Olympia was seeing Total Commander (TC) accessing files for the little file icons that precede the file names in the file listing, just the way Windows Explorer and most file management tools do. So I checked the Icons page of the TC options, and turned them off for files on the Net. But I discovered that it was still possible to browse to a cached folder, and have it spin the drive up, which was a bit frustrating. I then realized there were a few other files in the root folder of some of the data drives, which made me wonder if the Find ever saw them and caused them to be cached, because if it hadn't, then that would explain why browsing to the folder would cause the drive to have to spin up. TC needed to display the entire root folder contents, while Find could stop as soon as it verified a particular name existed there.

So that brings us to the obvious idea: (new lines in blue)

start_time=`date +%s%N`
# always cache root dirs
for i in /mnt/disk*
do
  ls $i >/dev/null 2>&1
done
echo "$dir_list" | while read share_dir

It only adds a tiny increment in scan time, and it really seems to work for me. I can now browse freely through the drives with no spinning up unexpectedly. I'm not confident my logic was correct here, so I'll be happy to hear Joe's thoughts on it, as well as his better implementation.

May 21, 2009

Very very interesting. Joe, you can't see any harm in adding this, can you?

May 22, 2009

Very frustratingly, I have discovered that I can still find folders, fully cached and their parent folders fully cached, that when I browse to them, the drive stupidly decides to spin up. I believe olympia and someone else found the same.

I am coming to the conclusion that the find command does not cache everything needed for file browsing. It works great if I never browse the folders. All drives not directly involved in a transfer stay down, and SageTV continues to scan them every 5 minutes. Browsing the root folders is now safe, never seems to cause a spin up any more. But if I decide to browse the other cached folders, constantly scanned by SageTV, and constantly cached by cache_dirs, I will invariably find several folders that will force a spin up.

I am now testing the use of "ls -R" instead, with the cache_dirs -c option, and preliminary testing seems to be showing improvement. It is not conclusive yet, will know more later tomorrow. It also is slower, takes longer to scan, takes about .030 seconds with "find", about .048 seconds with "ls -R". That seems very acceptable to me.
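For anyone who wants to repeat that comparison, here is a minimal timing sketch. `time_ms` is a hypothetical helper, and it relies on GNU date's `%N` (nanoseconds), the same feature the script's `start_time` line uses.

```shell
#!/bin/bash
# Hypothetical helper: run one command, discard its output, and print the
# elapsed wall-clock time in whole milliseconds.
# Assumes GNU date, which supports %N (nanoseconds).
time_ms() {
  local start end
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# Example usage (the path is only an example):
#   echo "find:  $(time_ms find /mnt/disk1) ms"
#   echo "ls -R: $(time_ms ls -R /mnt/disk1) ms"
```

Run each command a few times before comparing, since the first pass may include the cost of pulling directory blocks off the disk.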
