cache_dirs - an attempt to keep directory entries in RAM to prevent disk spin-up



cache_dirs is a script to attempt to keep directory entries in memory to prevent disks from spinning up just to get a directory listing.

 

The Linux kernel keeps the most recently accessed disk buffers and directories in memory, and flushes out the least recently accessed entries. If we can 'trick' the kernel into keeping our directory entries in that memory cache, then directory scans will find what they are looking for already in memory, and will not need to access the physical disk to get it. As a result, since the physical disk is not accessed, Linux will (after the defined time-out delay) let the physical drives spin down, saving on power costs and removing the drives as heat sources. This is especially useful when media files are spread across multiple drives, and a media player begins to scan for a particular media file to play. You want the scan to look at all of the relevant directories, but only spin up the one drive containing the desired media file.

 

Since the cache management decision process tries to keep the most recently accessed disk buffers and directory entries, we need to 'trick' it by constantly accessing the directories of the folders we want to keep in the cache, so that they will always appear to be the most recently accessed.  I've developed, with the help of lots of suggestions and feedback, an easily customizable script called cache_dirs to do this.
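The core idea can be sketched in a few lines of shell. This is an illustrative sketch only, not the actual cache_dirs script (which is far more elaborate); the function name and the values shown are invented for the example:

```shell
#!/bin/bash
# Sketch of the cache_dirs idea: walking a tree with find reads every
# directory entry, marking the dentries "recently used" so the kernel
# keeps them in its cache. Names and values here are illustrative.
warm_cache() {
    local dir=$1 passes=$2 delay=$3
    local i
    for ((i = 0; i < passes; i++)); do
        # Reading directory entries (not file contents) is enough
        # to keep them warm in the dentry/inode cache.
        find "$dir" -noleaf > /dev/null 2>&1
        sleep "$delay"
    done
}

# The real script loops forever over every data disk, e.g.:
# warm_cache /mnt/disk1 999999 10
```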

 

It is described in the wiki here.

 

A long series of posts describing its evolution over time can be found in this thread.

 

It has many possible tunable options, but most people can simply invoke it as

cache_dirs -w

 

The "-w" option will cause it to wait if unRAID is not yet started. 

 

If you have a folder or folders you wish to exclude, there is a -e option.  This option can be used multiple times.  To exclude a "data" and "old-stuff" directory, you would use

cache_dirs -w -e "data" -e "old-stuff"

Always use quote marks around the folders you wish to exclude or include; this is especially important if the folder name contains a space or other special character that might be interpreted by the Linux shell.

 

By default, all top-level folders on the disks and everything under them in sub-folders are scanned.  If you only want a subset of the top-level folders scanned, you can supply an "include" list using the "-i include_dir" option.  Again, it may be repeated on the command line multiple times.  If using the "include" feature, only those directories included are scanned.  There is no need to use an exclude as well, unless you use a wild-card for the include directory and the "include" wild-card matches more than you want cached.  (The include and exclude options work on top-level folders only.  They may not be used to include or exclude specific sub-folders.  You can use a different option (-a) if you wish to exclude a sub-folder, as shown in this post.)

 

For example, let's say you have folders like this:

Movies-Comedy-Bad

Movies-Comedy-Good

Movies-Chick-Flicks-Good

Movies-Chick-Flicks-Bad

Movies-Adventure-Good

Movies-Adventure-Bad

Movies-Drama-Good

Movies-Drama-Bad

Movies-Kids-Good

Movies-Kids-Bad

Movies-Junk-Good

Movies-Junk-Bad

Data1

Data2

Data3

...

You could use an include rule like this

-i "Movies*"

 

and an exclude rule like this in combination with it

-e "*Bad"

 

You would cache only those directories that start with "Movies" and do not have "Bad" at the end of their name.

 

If you added one more exclude like this:

-e "*Junk*"

You would not scan either of the folders with "Junk" in their name.  Using a combination of include and exclude directories makes it pretty flexible if you have the need.  For most people, one or two excluded folders, if any, are all that is needed.  If you have enough RAM, just let it scan and cache everything.  My "data" folder holds a directory with a backup of an old Windows system and has at least several hundred thousand files and folders under it.  I always exclude it, as it is never needed by my media players in their listing of movies.
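The way such shell-style globs combine can be sketched as a small function. This is a hypothetical illustration of the matching rules just described, not the script's actual code:

```shell
#!/bin/bash
# Illustrative sketch: a top-level folder is cached if it matches
# the include pattern and does not match any exclude pattern.
matches_rules() {
    local name=$1
    case $name in
        Movies*) ;;               # include rule: -i "Movies*"
        *) return 1 ;;            # anything else is not cached
    esac
    case $name in
        *Bad|*Junk*) return 1 ;;  # exclude rules: -e "*Bad" -e "*Junk*"
    esac
    return 0
}
```

With the example folders above, "Movies-Comedy-Good" would be cached, while "Movies-Comedy-Bad", "Movies-Junk-Good", and "Data1" would not.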

 

If you add /boot/cache_dirs -w to your "go" script, it will run each time you restart your server.
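For example, a "go" script might end like this. The paths below are assumptions for illustration; adjust them to wherever you copied cache_dirs and to your own excluded shares:

```shell
#!/bin/bash
# /boot/config/go -- runs at every server start-up (path assumed;
# on some unRAID versions the go script lives at /boot/go)
/usr/local/sbin/emhttp &

# Start cache_dirs once the array comes online (-w),
# excluding a hypothetical "data" share:
/boot/cache_dirs -w -e "data"
```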

 

To stop cache_dirs from running, type

cache_dirs -q

 

To see all the options, type

cache_dirs -h

 

To run it in the foreground, so you can see what it is doing, use the -F option.  As it loops and scans it will print statistics on how long each scan is taking.  It will adjust the scan rate based on the activity on the server.  You can set the min and max delay times of the scan rate using the -m min-time and -M max-time options.
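The adaptive scan rate can be sketched roughly like this. The whole-second arithmetic and the exact back-off rule are illustrative; the real script's algorithm may differ:

```shell
#!/bin/bash
# Sketch of the adaptive delay idea: when a scan takes longer than
# the recent average (a sign the server is busy, or entries fell out
# of cache), back off toward the maximum delay; when scans are fast,
# tighten toward the minimum. All names/values are illustrative.
adjust_delay() {
    local delay=$1 scan_secs=$2 avg_secs=$3 min=$4 max=$5
    if [ "$scan_secs" -gt "$avg_secs" ]; then
        delay=$((delay + 1))    # server busy: slow down
    else
        delay=$((delay - 1))    # scans fast: speed up
    fi
    [ "$delay" -lt "$min" ] && delay=$min   # clamp to -m value
    [ "$delay" -gt "$max" ] && delay=$max   # clamp to -M value
    echo "$delay"
}
```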

 

Usage: cache_dirs [-m min_seconds] [-M max_seconds] [-F] [-d maxdepth] [-c command] [-a args] [-e exclude_dir] [-i include_dir] [-w]
       cache_dirs -V      = print program version
       cache_dirs -q
       cache_dirs -l on   = turn on logging to /var/log/cache_dirs.log
       cache_dirs -l off  = turn off logging to /var/log/cache_dirs.log
-w       =   wait for array to come online before start of cache scan of directories
-m NN    =   minimum seconds to wait between directory scans (default=1)
-M NN    =   maximum seconds to wait between directory scans (default=10)
-F       =   do NOT run in background, run in Foreground and print statistics as it loops and scans
-v       =   when used with -F, verbose statistics are printed as directories are scanned
-s       =   shorter-log - print count of directories scanned to syslog instead of their names
-d NN    =   use "find -maxdepth NN" instead of "find -maxdepth 999"
-c command   = use command instead of "find"
              (command should be quoted if it has embedded spaces)
-a args    = append args to command
-u       =   also scan /mnt/user (scan user shares)
-e exclude_dir  (may be repeated as many times as desired)
-i include_dir  (may be repeated as many times as desired)
-p NN    =   set cache_pressure to NN (by default it is set to 10)
-B       =   do not force disks busy (to prevent unmounted disks showing as unformatted)
-S       =   do not suspend scan during 'mover' process
-z       = concise log (log run criteria on one line)
-q       = terminate any background instance of cache_dirs
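The -p option corresponds to the kernel's vm.vfs_cache_pressure tunable; lower values make the kernel more reluctant to reclaim dentry/inode cache entries. A minimal sketch of applying such a setting (the overridable file parameter exists purely so the function can be exercised without root; the script's internals may differ):

```shell
#!/bin/bash
# Sketch: set the kernel's cache-pressure knob (writing the real
# /proc file requires root). The optional second argument is only
# here so the function can be tested against a scratch file.
set_cache_pressure() {
    local value=$1
    local file=${2:-/proc/sys/vm/vfs_cache_pressure}
    echo "$value" > "$file"
}

# e.g. cache_dirs -p 10 corresponds roughly to:
# set_cache_pressure 10
```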

 

cache_dirs will force all the data disks to be "busy" to prevent them from being un-mounted.  This will prevent un-mounted disks from appearing as un-formatted in the unRAID management console.  If you are using any release prior to 4.5beta7, this will prevent you from "Stopping" the array the first time you press the "Stop" button.  Simply wait a few seconds and then press "Stop" a second time within 2 minutes of the first attempt to stop the array.  If you have no other processes keeping disks busy, it will then stop.

 

On release 4.5b7, it is no longer necessary to press stop a second time, and in fact you cannot, as the management console will show "Unmounting" until all processes holding disks busy are terminated and only the "Refresh" button is active.

 

If you are on 4.5b7 or greater, you can, if you wish, use the -B option so that the disks are not forced to be busy.

 

The 1.6.4 version of cache_dirs is attached.  It is now coded to sleep while the "mover" process moves files from your cache drive. 

# Version 1.6.4 - Modified to suspend scan during time "mover" script is running to prevent

#                DuplicateFile messages from occurring as file is being copied.

#              - Added -S option to NOT suspend scan during mover process.

#              - Added logic to re-invoke cache_dirs if array is stopped and then re-started

#                by submitting command string to "at" to re-invoke in a minute.

#              - Added entry to "usage()" function for -B

 

# Version 1.6.5 - Fixed what I broke in looking for "mover" pid to suspend during the "mover"

#                to eliminate warnings in syslog about duplicate files detected while files were

#                being copied.

# Version 1.6.6 - Fixed typo in looking for mover-pid.

# Version 1.6.7 - Added cache_pressure to "usage" statement, fixed bug where it reverted to 10 after being invoked through "at"

#                when used with the -w option.

# Version 1.6.8 - Added -U NNNNN option to set ulimit, and detection of 64 bit OS so odds are this new option will not be needed.

#                by default, ulimit is set to 5000 on 32 bit OS, and 30000 on 64 bit OS.  Either can be over-ridden with -U NNNNN on command line

# Version 1.6.9 - Removed exec of /bin/bash.  Newer bash was not setting SHELL variable causing infinite loop if invoked from "go" script.

#                Changed default ulimit on 64 bit systems to 50000.

#                by default, ulimit is now set to 5000 on 32 bit OS, and 50000 on 64 bit OS.  Either can be over-ridden with -U NNNNN on command line

#                Setting ulimit to zero ( with "-U 0" option) is now special, cache_dirs will not set any ulimit at all.  You'll inherit the system value, whatever it might be.

 

The full revision history is as follows:

####################################################################################
# cache_dirs
# A utility to attempt to keep directory entries in the linux
# buffer cache to allow disks to spin down and no need to spin-up
# simply to get a directory listing on an unRAID server.
#
# Version 1.0   Initial proof of concept using "ls -R"
# Version 1.1   Working version, using "ls -R" or "find -maxdepth"
# Version 1.2   Able to be used with or without presence of user-shares.
#               Removed "ls -R" as it was too easy to run out of ram. (ask me how I know)
#               Added -i include_dir to explicitly state cached directories
#               Added -v option, verbose statistics when run in foreground
#               Added -q option, to easily terminate a process run in the background
#               Added logging of command line parameters to syslog
# Version 1.3   Added -w option, to wait till array comes online before starting scan
#               of /mnt/disk* share folders.
#               Changed min-seconds delay between scans to 1 instead of 0.
#               Moved test of include/exclude directories to after array is on-line
#               Added logging of mis-spelled/missing include/exclude dirs to syslog
#               Added ability to have shell wildcard expansion in include/exclude names
# Version 1.4   Fix bug with argument order passed to find when using -d option
#               Fixed command submitted to "at" to use full path. Should not need to
#              set PATH variable in "go" script.
#               Added ability to also cache scan /mnt/user with -u option
# Version 1.4.1 Fixed version comment so it is actually a comment.
# Version 1.5   Added -V to print version number.
#               Added explicit cache of root directories on disks and cache drive
#               Modified "average" scan time statistic to be weighted average with a window
#               of recent samples.
#               Added -a args option to allow entry of args to commands after dir/file name
#                 example: cache_dirs -a "-ls" -d 3
#                 This will execute "find disk/share -ls -maxdepth 3"
# Version 1.6   - Fixed bug... if -q was used, and cache_dirs not currently running,
#               it started running in error. OOps... Added the missing "exit"
#               - Changed vfs_cache_pressure setting to be 1 instead of 0 by default.
#               - Added "-p cache_pressure" to allow experimentation with vfs_cache_pressure values
#                (If not specified, default value of 1 will be used)
#               - Made -noleaf the default behavior for the "find" command (use -a "" to disable).
#               - Added logic to force all disks "busy" by starting a process with each as their
#               current working directory.   This will prevent a user from seeing a frightening
#               Unformatted description if they attempt to stop the array.  A second "Stop" will
#               succeed (the scan is paused for 2 minutes, so it may be stopped cleanly)
#               - Added new -B option to revert to the old behaviour and not force disks busy if by
#               chance this new feature causes problems for some users.
#               - Allow min seconds to be equal to max seconds in loop delay range.
#               - Added run-time-logging, log name = /var/log/cache_dirs.log
# Version 1.6.1 - Fixed bug. Added missing /mnt/cache disk to scanned directories
# Version 1.6.2 - Added trap to clean up processes after kill signal when run in background
# Version 1.6.3 - Modified to deal with new un-mounting message in syslog in 4.5b7 to
#                 allow array shutdown to occur cleanly.
# Version 1.6.4 - Modified to suspend scan during time "mover" script is running to prevent
#                 DuplicateFile messages from occurring as file is being copied.
#               - Added -S option to NOT suspend scan during mover process.
#               - Added logic to re-invoke cache_dirs if array is stopped and then re-started
#                 by submitting command string to "at" to re-invoke in a minute.
#               - Added entry to "usage()" function for -B
# Version 1.6.5 - Fixed what I broke in looking for "mover" pid to suspend during the "mover"
#                 to eliminate warnings in syslog about duplicate files detected while files were
#                 being copied.
# Version 1.6.6 - Fixed mover-detection to use the exact same logic as "mover" (and fixed stupid typo I had made)
# Version 1.6.7 - Added cache_pressure to "usage" statement, fixed bug where it reverted to 10 after being invoked through "at"
#                 when used with the -w option.

 

Joe L.

cache_dirs.zip


Thanks Joe. Why do I see 8 separate processes when this is started?

 

root      2618    1  0 11:37 ?        00:00:00 /bin/bash /boot/scripts/cache_dirs.sh -d 2 -m 3 -M 5 -w

root      2622    1  0 11:37 ?        00:00:00 /bin/bash /boot/scripts/cache_dirs.sh -d 2 -m 3 -M 5 -w

root      2628    1  0 11:37 ?        00:00:00 /bin/bash /boot/scripts/cache_dirs.sh -d 2 -m 3 -M 5 -w

root      2633    1  0 11:37 ?        00:00:00 /bin/bash /boot/scripts/cache_dirs.sh -d 2 -m 3 -M 5 -w

root      2637    1  0 11:37 ?        00:00:00 /bin/bash /boot/scripts/cache_dirs.sh -d 2 -m 3 -M 5 -w

root      2643    1  0 11:37 ?        00:00:00 /bin/bash /boot/scripts/cache_dirs.sh -d 2 -m 3 -M 5 -w

root      2648    1  0 11:37 ?        00:00:00 /bin/bash /boot/scripts/cache_dirs.sh -d 2 -m 3 -M 5 -w

root      2652    1  0 11:37 ?        00:00:02 /bin/bash /boot/scripts/cache_dirs.sh -d 2 -m 3 -M 5 -w

 

My go script looks like this...

 

/boot/scripts/cache_dirs.sh  -d  2  -m  3  -M  5  -w

Easy: there is one "child" process keeping each of your data disks busy, to prevent you from seeing an "Un-formatted" description when you attempt to stop the array while a directory scan is in progress.

 

The child process basically does this (where lock_file stands for the lock-file used by cache_dirs):

  cd /mnt/diskX                # one child per data disk
  while [ -f "$lock_file" ]    # loop while the lock-file exists
  do
      sleep 2
  done

 

Every 2 seconds it wakes up from its sleep and checks whether the lock-file is still there; if it is, it sleeps another 2 seconds and then looks once more, etc.  Since each child process has one of your data disks as its "current directory", it will not be possible for unRAID to un-mount them, as they will be "busy".

 

The main cache_dirs process removes the lock-file when it notices an attempt to stop the array, allowing the child processes to stop and the disks to be un-mounted.

 

So... it is normal to see what you are seeing.  Once you upgrade to unRAID 4.5beta7, or beyond, you will not need those extra processes, as unRAID is now smart enough to not show "un-formatted" on disks it has already un-mounted. On earlier versions it showed "Unformatted" on all the disks that could be un-mounted, and also showed a "Format" button you might accidentally use on a disk with your data.

 

Once you upgrade to 4.5b7, you can use the "-B" option to cache_dirs if you want to clean up your process listing, as those child processes will no longer be needed on your server to force the disks to be "busy".

 

Joe L.

 


Hi Joe, quick question about the benefits of cache_dirs in my particular situation.  Is cache_dirs worthwhile if I'm  using my UnRAID server to feed an NMT Popcorn Hour where I use a movie jukebox and skin that is stored locally on a drive in the NMT?  I could see the benefit if I'm pointing my NMT to the user share of the UnRAID server and I'm using the standard UI of the NMT but I'm struggling to rationalize if I'm getting any benefit with my current setup?  Any insight would be greatly appreciated!


Hi Joe, quick question about the benefits of cache_dirs in my particular situation.  Is cache_dirs worthwhile if I'm  using my UnRAID server to feed an NMT Popcorn Hour where I use a movie jukebox and skin that is stored locally on a drive in the NMT?  I could see the benefit if I'm pointing my NMT to the user share of the UnRAID server and I'm using the standard UI of the NMT but I'm struggling to rationalize if I'm getting any benefit with my current setup?  Any insight would be greatly appreciated!

If the popcorn hour does not need to perform directory listings for you to choose a movie to view, then it is of less use to you.

 

It won't hurt anything, but you'll know if you need it.... (because your family will ask, why does it take so long to get a listing of our movies/music/pictures when I press a button?)

 

Joe L.


Hi Joe L.,

 

I am wondering if it would be easy for you to include an option for completely wiping the disk instead of preclearing it.

Or can the preclearing method in its current form be used for this purpose? (though at least the read-back seems unnecessary in this case)

 

That would be extremely useful on a disk replacement, when the old disk is going to be sold off.

 

Thank you in advance for your feedback.


Hi Joe L.,

 

I am wondering if it would be easy for you to include an option for completely wiping the disk instead of preclearing it.

Or can the preclearing method in its current form be used for this purpose? (though at least the read-back seems unnecessary in this case)

 

That would be extremely useful on a disk replacement, when the old disk is going to be sold off.

 

Thank you in advance for your feedback.

I think you intended this question to be posted to the thread on the preclear_disk.sh script,

so I'll post my answer there.

 

Joe L.


Joe,

 

I really want to use this script as I am a user of MediaBrowser in Windows Media Center.

The application scans my movie folder every time to refresh the metadata, and with 8 disks this causes an annoying delay, as well as spinning up my disks unnecessarily.

 

I installed the script and it seemed to work fine, but the disks showing unformatted really made me nervous.

It seems like I could accidentally do a lot of damage to my server.

 

Would upgrading to 4.5b7 change this in any way?

Is 4.5b7 good enough to trust with my data?

 

Any help you could provide would be greatly appreciated.

 

Thanks,

Kent


Joe,

 

I really want to use this script as I am a user of MediaBrowser in Windows Media Center.

The application scans my movie folder every time to refresh the metadata, and with 8 disks this causes an annoying delay, as well as spinning up my disks unnecessarily.

 

I installed the script and it seemed to work fine, but the disks showing unformatted really made me nervous.

It seems like I could accidentally do a lot of damage to my server.

 

Would upgrading to 4.5b7 change this in any way?

Is 4.5b7 good enough to trust with my data?

 

Any help you could provide would be greatly appreciated.

 

Thanks,

Kent

The current version of cache_dirs explicitly keeps all your disks "busy", so no disk should show as "unformatted".  Are you using the current version of cache_dirs?

 

The 4.5b7 version of unRAID will no longer show "Unformatted" for any disk that it is able to un-mount successfully, though it is still unable to un-mount a busy disk.  For that reason, it is safer.  You will need to use the current version of cache_dirs with it; otherwise you will not be permitted to stop the array until you stop the cache_dirs process (with cache_dirs -q).

 

The most current version of cache_dirs will work with both versions of unRAID.  Both versions of unRAID are safe for your data.  The "bugs" are in Active Directory support and in adding more than 18 data drives... (You have neither in your version, so you are not affected by those issues when you upgrade.)

 

To see the version of cache_dirs you are running, type

cache_dirs -V

 

Current version is 1.6.4 as of today.


Joe,

 

I upgraded to 4.5B7 and I have got the script running now.  Thanks for your help.

I'm not so sure it's working though.  When I browse to the folders my disks still spin up.

 

This is the command I used to start it:

/boot/cache_dirs -w -i "Movies" -i "KidsMovies" -i "TV"

 

Movies, KidsMovies and TV are the user shares I want cached.

 

Am I missing anything obvious here ?

 

Thanks again for your help,

 

Kent


Won't MediaBrowser be accessing the mymovies.xml files in each folder to verify it is current?

 

That is correct. My media frontend (SageTV) also reads the cover images of all my videos.

So the caching script that I am using copies all .jpg files it finds to /dev/null

 

Purko

 


Won't MediaBrowser be accessing the mymovies.xml files in each folder to verify it is current?

 

That is correct. My media frontend (SageTV) also reads the cover images of all my videos.

So the caching script that I am using copies all .jpg files it finds to /dev/null

 

Purko

 

 

If you are using a modified version of Joe L.'s script, it would probably benefit others to have access to it.  If you could attach it to a post that would be great.  I (or someone else) could then update the wiki to point to this "alternate" version.

 

You may also ask Joe L. if he could add another switch (or two) to include certain files or certain extensions.


I am running Cache_Dirs v1.6 from an unmenu "go script" as follows:

 

/boot/custom/bin/cache_dirs -w -B -i "Media"

 

I then have a cron job that checks for duplicate files between disks of a share once per hour.  If a dupe is found, a notification is sent to me (via SMS) and then I execute the following to stop the cache_dirs process:

 

/boot/custom/bin/cache_dirs -q >>/var/log/syslog 2>&1

 

The problem is that the process is not stopping.  If I telnet to my server and attempt to restart cache_dirs I receive an error that it is already running.  Then after stopping the process from the terminal I am able to restart.

 

Is there some special way that I should be stopping the process when in a shell script?


I tried cache_dirs -w and now none of my drives spin down at all. I see in ps one cache_dirs -w process per drive. When I manually try to spin down a drive, it immediately spins up again. I started cache_dirs -w from the terminal (as I have not rebooted the server; otherwise it would be in the go script). Any hints?


I tried cache_dirs -w and now none of my drives spin down at all. I see in ps one cache_dirs -w process per drive. When I manually try to spin down a drive, it immediately spins up again. I started cache_dirs -w from the terminal (as I have not rebooted the server; otherwise it would be in the go script). Any hints?

That would seem to tell me you have more files and directories than will fit into memory at one time.  Since they do not all fit, each time it scans, it must read from a physical disk.

 

How many files and directories do you have?

(this command will tell you)

ls -R  /mnt/user | wc -l

 

If you have too many files to cache in memory you might need to exclude some shares.  (I exclude the "data" and "Pictures" shares on my server; I only cache the "Movies" shares.)

 

One cache_dirs process per disk is correct.  They are just "sleeping" and not accessing the disk at all. They are there to keep the disk from being un-mounted.  If you are running the 4.5 version of unRAID you can use the -B option to cache_dirs, because it no longer needs to keep the disks from being un-mounted.  With that option, there will only be one cache_dirs process.

 

Joe L.

 


Just counted 119638 files. I have 4GB memory, and can add 4 more if unRAID can see all 8. If cache_dirs supports some depth limit for scanning/caching, I might add that too; I would care for depth=3 only.

cache_dirs has the "-d maxdepth" option, so you could try

cache_dirs -d 3 -w

 

There may be something else involved though, if something is looking at other than the directory listings.  If that were the case, cache_dirs would not help, as the other content would not be in the cache, and if it was, it would be displaced quickly by more recently accessed content.


I'll try again, specifying the depth explicitly. Is maxdepth counting from /mnt/disk*/ downwards only, or starting with / ? I am running 4.5. I've read somewhere that unRAID supports only 4GB main memory as it runs 32-bit; no point adding more RAM, right?

 

Edit: This is what I get when I execute the following. cache_dirs loops showing the same output and doesn't terminate at all. The Test user share is a very small one.

 

cache_dirs -B -w -F -v -e Archive -i Test

 

Executing find /mnt/disk3/Test -noleaf

Executing find /mnt/disk9/Test -noleaf

Executed find in 0.011855 seconds, weighted avg=0.011855 seconds, now sleeping 5 seconds

Executing find /mnt/disk3/Test -noleaf

Executing find /mnt/disk9/Test -noleaf

Executed find in 0.012200 seconds, weighted avg=0.012085 seconds, now sleeping 4 seconds

 

 


I'll try again, specifying the depth explicitly. Is maxdepth counting from /mnt/disk*/ downwards only, or starting with / ?

Starting with "/"

I am running 4.5. I've read somewhere that unRAID supports only 4GB main memory as it runs 32-bit; no point adding more RAM, right?

Wrong... it supports extended memory addressing, so you can add the additional memory.

Edit: This is what I get when I execute the following. cache_dirs loops showing the same output and doesn't terminate at all. The Test user share is a very small one.

 

cache_dirs -B -w -F -v -e Archive -i Test 

 

Executing find /mnt/disk3/Test -noleaf

Executing find /mnt/disk9/Test -noleaf

Executed find in 0.011855 seconds, weighted avg=0.011855 seconds, now sleeping 5 seconds

Executing find /mnt/disk3/Test -noleaf

Executing find /mnt/disk9/Test -noleaf

Executed find in 0.012200 seconds, weighted avg=0.012085 seconds, now sleeping 4 seconds

It is never supposed to terminate... it is supposed to loop forever.  That is the whole idea: to keep accessing directory listing data to keep it in memory, so it never becomes the "least-recently-accessed" data eligible for re-use.

 

If you use a "-i" option it will NEVER cache anything but directories under "Test", so do not use "-i" in combination with -e the way you illustrated.

 

For you, just invoke with

-e Archive -w

 

If you wish to see what is being cached, type

find /mnt/disk* -maxdepth 3 -noleaf -print
