Directory Cache

May 6, 2009

Suppose I put a "fuser -k" in front of each un-mount, e.g.,

fuser -km /mnt/disk1

umount /mnt/disk1

I think that will suffice for now.
root@unraid~ #cd /mnt/disk6
root@unraid /mnt/disk6 #fuser -km /mnt/disk6 
/mnt/disk6:            862c
Connection to unraid closed.

Works for me too for almost everything... (see below) If I'm stopping the array, I don't expect anything to continue working that was keeping a disk busy.

Kids... don't try this at home... it is being done by a trained profffesinall

Where it does not work is if you have mounted a file system image using one of the loop devices.

I have a slackware disk image. I mount it as follows:

mkdir /tmp/iso

mount -o loop -t iso9660 /mnt/disk11/data/slackware-12.1-install-d1.iso /tmp/iso

I can then issue the fuser -k command on /mnt/disk11 as you described and it has no effect. The file stays mounted, and the "Stop" button cannot stop the array. (This time, all but disk11 showed as Unformatted)

Only way to stop the array is to first type:

umount /tmp/iso

Interestingly, I found that if I mounted the file through the user-share the kill was applied to part of the user-share program, and I was able to stop the array, but I was still able to get to the files on /mnt/iso!!!

As a side effect, the user-shares did not work properly after I went through all this... so I had to reboot. Probably a side effect of my typing fuser -k on the disk when the file was mounted through /mnt/user instead of /mnt/disk11.

When I stopped and re-started the array, all looked OK, but when I attempted to get to a user-share, I got this:

root@Tower:~# ls /mnt/user
/bin/ls: cannot access /mnt/user: Transport endpoint is not connected

perhaps the losetup program needs to kill any loop connections before you stop the array.
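A minimal sketch of that idea, assuming loop-mounted images show up in /proc/mounts with a /dev/loopN device in the first column. The function name loop_mount_points is made up for illustration and is not part of unRAID or cache_dirs. It only lists the mount points; it does not check which disk holds the image file.

```shell
# Print the mount point of every loop-mounted file system found in a
# /proc/mounts-style file (default: /proc/mounts). The root file system
# is skipped. Mount points with spaces appear there with \040 escapes.
loop_mount_points() {
    awk '$1 ~ /^\/dev\/loop/ && $2 != "/" { print $2 }' "${1:-/proc/mounts}"
}
```

Before stopping the array, something like `for mp in $(loop_mount_points); do umount "$mp"; done` would then unmount each one. Note that this unmounts every loop mount, not just the ones backed by files on the array.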

I'm rebooting as I type this... after cleanly stopping my array... and yes, I pulled out a few hairs tracking this down once... as I found I could not shut down the array, and could not find any processes having a file open. ( I had forgotten about the mount of the ISO image I had performed, in fact, I don't even think I gave it the -o loop option, the mount command did that on its own when I specified the file-system type of iso9660)

Joe L.

May 6, 2009

The reason I brought up the loop mount is that we are seeing more of the media-server processes that re-encode media on the fly as they serve it to an http or upnp client on the LAN. It is not unreasonable to see the loop devices involved in such a scheme.

All that said, the change Tom suggested would handle 99.99% of the users of unRAID.

Joe L.

May 7, 2009

Hi guys,

The idea is brilliant and while I can't understand most of the technical discussions on this thread, I can certainly appreciate your effort and wanted to say thanks for sharing.

Do you think it would be possible to post a short "complete-linux-noob-guide" on how to install this script using telnet / putty?

Cheers and keep up the good work

EDIT:

Sorted. I was under the impression this is going to be much more complicated than it was. Please forgive my ignorance

May 7, 2009

I've updated the Improving unRAID Performance page, specifically the Keep directory entries cached section, to add cache_dirs and a few instructions and explanations. I added a simple set of instructions with very conservative values, to help new users get started with it.

Please feel free to edit or expand, or to suggest additional changes to the section.

May 7, 2009

Just in time for the new wiki entry is the latest version 1.2 of "cache_dirs"

The changes are as follows:

Able to be used with or without presence of user-shares.
/mnt/cache drive is scanned in addition to /mnt/disk*
Removed "ls -R" as it was too easy to run out of ram. (ask me how I know)
Added -i include_dir to explicitly state cached directories
Added -v option, verbose statistics when run in foreground
Added -q option, to easily terminate a process run in the background
Added logging of command line parameters to syslog

If you use the -i option, you must specify all the top level directories you wish included. (Use multiple -i dir_name pairs)

You can use -? or -h to see the correct usage:

cache_dirs -?

Usage: cache_dirs [-m min_seconds] [-M max_seconds] [-F] [-d maxdepth] [-c command] [-e exclude_dir] [-i include_dir]

cache_dirs -q

-m NN = minimum seconds to wait between directory scans (default=0)

-M NN = maximum seconds to wait between directory scans (default=10)

-F = do NOT run in background, run in Foreground and print statistics as it loops and scans

-v = when used with -F, verbose statistics are printed as directories are scanned

-d NN = use "find -maxdepth NN" instead of "find -maxdepth 999"

-c command = use command instead of "find"

(command should be quoted if it has embedded spaces)

-e exclude_dir (may be repeated as many times as desired)

-i include_dir (may be repeated as many times as desired)

-q = terminate any background instance of cache_dirs

Oh yes, the -f option was removed... it always uses "find"

To use this with multiple include dirs invoke as follows (put quotes around directory name if it has embedded spaces):

cache_dirs -d 3 -i Music -i "TV Shows" -i "Movies A-E"

To use this on all the top level shares except for a few, invoke like this:

cache_dirs -e "data" -e "backup files"

To use it on everything, just use:

cache_dirs

To run it in the foreground, and see everything it is doing, use the -F and the -v options (-v = verbose)

cache_dirs -F -v

Normally, the process detaches itself from the terminal and runs in the background. To easily terminate the background process, type:

cache_dirs -q

The new version is attached to this post... Have fun.

Newest version is here: http://lime-technology.com/forum/index.php?topic=3666.msg32934#msg32934

Edit, now replaced by version 1.5 in this post: http://lime-technology.com/forum/index.php?topic=3666.msg33498#msg33498

Edit, newest version now in this post: http://lime-technology.com/forum/index.php?topic=4500.0

Joe L

May 7, 2009

FYI, I let it try to index all of my files and let it run overnight. When I checked on it in the morning, the server was locked up. I couldn't see shares and couldn't connect to it via telnet or http... I rebooted and decided to not try that again

May 7, 2009

FYI, I let it try to index all of my files and let it run overnight. When I checked on it in the morning, the server was locked up. I couldn't see shares and couldn't connect to it via telnet or http... I rebooted and decided to not try that again

Let me guess... lots and lots and lots of files, and you used the "ls -R"

Am I right?

Joe L.

PS.

You are in very good company, I rebooted three times before I decided to remove the ability to use the "ls -R" as the default.

May 7, 2009

I thought I used the script that defaulted to "find", but I'll double check when I get home...

May 7, 2009

I thought I used the script that defaulted to "find", but I'll double check when I get home...

The old script defaulted to "ls -R" and only used "find" if you specified "-d depth" or the "-f" option.

In any case, the new script is quite a bit different in how it creates its list of folders to cache.

May 7, 2009

Nice improvements! Lots of innovation!

I do see a problem though. I could be wrong, as I haven't tested it personally yet (unforgivable, I know!), but the validation of the includes and excludes seems to occur before any testing of the array being up and fully mounted, which means it is going to abort pretty quickly if any includes or excludes are specified, at startup. This is a tough problem, makes me wonder if I should have suggested it, as it is rather hard to know when every last drive is fully mounted. I've seen cases where there were drive problems, then a crash, which resulted on the next boot with lots of transactions being replayed AND a parity check in progress, which of course slows the startup way down. It can take a drive several minutes to finally finish mounting.

I wonder if it would be better to add parallel arrays of flags or counters for the includes and excludes, and either set a flag on for any usage, or increment a counter for each use. Then when the script is quit, output a report of any folders that were never used.

Very minor, doesn't look like user_shares_flag is used. (Have to go, I'll update the Wiki much later tonight)

May 7, 2009

I thought I used the script that defaulted to "find", but I'll double check when I get home...

The old script defaulted to "ls -R" and only used "find" if you specified "-d depth" or the "-f" option.

In any case, the new script is quite a bit different in how it creates its list of folders to cache.

I just looked through it and sure enough... I'll get up to speed sooner or later

May 7, 2009

Nice improvements! Lots of innovation!

I do see a problem though. I could be wrong, as I haven't tested it personally yet (unforgivable, I know!), but the validation of the includes and excludes seems to occur before any testing of the array being up and fully mounted, which means it is going to abort pretty quickly if any includes or excludes are specified, at startup.

Yes, this script cannot be invoked before the array is started. You can either put a

sleep 30

before it in your go script, or you can invoke this using "at" like this:

echo "cache_dirs -d 3 -e data" | at now + 5 minutes

Or, once we get to version 5 of unRAID, we can invoke it in the "after-array-starts" hook.

This is a tough problem, makes me wonder if I should have suggested it, as it is rather hard to know when every last drive is fully mounted. I've seen cases where there were drive problems, then a crash, which resulted on the next boot with lots of transactions being replayed AND a parity check in progress, which of course slows the startup way down. It can take a drive several minutes to finally finish mounting.

I wonder if it would be better to add parallel arrays of flags or counters for the includes and excludes, and either set a flag on for any usage, or increment a counter for each use. Then when the script is quit, output a report of any folders that were never used.

I think that adds a lot of overhead to the overall loop, with very little added value. The delayed start, in my opinion, is much better.

In fact, that might be a good command line option... a delay time before looking for the array disks.

Very minor, doesn't look like user_shares_flag is used. (Have to go, I'll update the Wiki much later tonight)

Yes, it was not needed. (-u does not exist)

Joe L.

May 7, 2009

Or, once we get to version 5 of unRAID, we can invoke it in the "after-array-starts" hook.

A suggestion... At the beginning of the while loop there is this:

if [ ! -d /mnt/disk1 ]
  then
    # array is not started, sleep and look again in 10 seconds.
    sleep 10
    continue
fi

It's quite legitimate that there is no 'disk1'.

To solve this problem and lack of 'array start' hook, in the 'while read share_dir' loop maintain a 'count' of the number of directories you have executed 'find' on. When the loop exits, if this count is '0' then you know the array hasn't started yet. The presence of any subdirectory within the /mnt directory indicates that the array is started, and their absence indicates array is not started... ie, the disk1, disk2, ..., user, user0 dirs are created during array start process and deleted during array stop process.
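A rough sketch of that counting idea, not the actual cache_dirs code. The function name scan_shares and its argument layout are invented for illustration; the default depth of 999 simply mirrors the usage text earlier in the thread.

```shell
# Read directory names from stdin (one per line), walk each existing one
# with find, and print how many directories were actually scanned.
# A result of 0 suggests the array has not started yet.
scan_shares() {
    depth=${1:-999}
    count=0
    while IFS= read -r share_dir; do
        # Skip names that do not exist (array stopped, or a typo).
        [ -d "$share_dir" ] || continue
        # Output is discarded; the point is only to touch the directory
        # entries so the kernel keeps them in memory.
        find "$share_dir" -maxdepth "$depth" >/dev/null 2>&1
        count=$((count + 1))
    done
    echo "$count"
}
```

The caller can then sleep and retry when the printed count is 0, instead of testing for /mnt/disk1 specifically.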

May 7, 2009

You can use this code to look for loopbacks....

awk '!/^#/ && $1 ~ /^\/dev\/loop/ && $2 != "/" {print $1}' /proc/mounts

May 7, 2009

Or, once we get to version 5 of unRAID, we can invoke it in the "after-array-starts" hook.

A suggestion... At the beginning of the while loop there is this:
if [ ! -d /mnt/disk1 ]
  then
    # array is not started, sleep and look again in 10 seconds.
    sleep 10
    continue
fi
It's quite legitimate that there is no 'disk1'.

To solve this problem and lack of 'array start' hook, in the 'while read share_dir' loop maintain a 'count' of the number of directories you have executed 'find' on. When the loop exits, if this count is '0' then you know the array hasn't started yet. The presence of any subdirectory within the /mnt directory indicates that the array is started, and their absence indicates array is not started... ie, the disk1, disk2, ..., user, user0 dirs are created during array start process and deleted during array stop process.

Tom,

I like that idea a lot... and, you are absolutely correct, it is very possible to not have a disk1. (and I did not consider that possibility)

You are a tiny bit mistaken about "any directory in /mnt" though, as many people mount disks that are not part of the protected array there, and the unMENU "Disk-Management" plug-in specifically uses /mnt/disk/sdb1, hdj1, etc... as mount-points when it mounts disks that are not part of the protected array. I'll be safe if I just look for disk[1-9]* Those are from within emhttp's control.

Looks like there will soon be a version 1.3.

Joe L.

May 8, 2009

You are a tiny bit mistaken about "any directory in /mnt" though, as many people mount disks that are not part of the protected array there, and the unMENU "Disk-Management" plug-in specifically uses /mnt/disk/sdb1, hdj1, etc... as mount-points when it mounts disks that are not part of the protected array. I'll be safe if I just look for disk[1-9]* Those are from within emhttp's control.

This raises an interesting issue which must get solved for version 5.0. That is, defining how 3rd party add-ons can peacefully co-exist with the stock system. I know folks in the community have been begging for some guidance here & I take responsibility for not providing that in the past, but I do have my (good) reasons

In this particular case, it's probably not a good idea to use the /mnt directory to create 3rd party mount points, especially one named /mnt/disk. This is because I have some plans for the /mnt directory which could mess up 3rd party features, or be messed up by 3rd party features. Better to use /var directory for this... But I'm not suggesting anyone who has written any add-ons to change anything because at present it's not an issue... just a potential issue.

May 8, 2009

Yes, this script cannot be invoked before the array is started. You can either put a
sleep 30

before it in your go script, or you can invoke this using "at" like this:

echo "cache_dirs -d 3 -e data" | at now + 5 minutes

Yes, a 'pre-sleep' is the easiest way to deal with it, although not completely reliable. The "at" method would be better, as it does not hang the go script, but trying to explain to new users how to customize the cache_dirs part looks problematic, too many ways to botch it. The "sleep 30" method is the easiest.

Probably need to explain to users that the way folders are selected for caching is different now. I think this is better, more flexible, but does need to be explained. To summarize, all top level folders on all data drives and the Cache drive are selected for caching by default. You can either leave it that way, or specify some folders to be excluded, or specify the folders to include. If you specify any folders to be included, then you have to specify all of the folders you want included, because all non-specified folders will be excluded.

I tried to use -i Vid*, but it was always rejected, whether I specified it with or without quotes or apostrophes or backslashes or a period for the asterisk. I don't see why it does not work, looks like it should, but I instead had to specify 10 different includes, for folder names that all begin with Vid. Appears to be working well, will know better in the morning.

May 8, 2009

You are a tiny bit mistaken about "any directory in /mnt" though, as many people mount disks that are not part of the protected array there, and the unMENU "Disk-Management" plug-in specifically uses /mnt/disk/sdb1, hdj1, etc... as mount-points when it mounts disks that are not part of the protected array. I'll be safe if I just look for disk[1-9]* Those are from within emhttp's control.

This raises an interesting issue which must get solved for version 5.0. That is, defining how 3rd party add-ons can peacefully co-exist with the stock system. I know folks in the community have been begging for some guidance here & I take responsibility for not providing that in the past, but I do have my (good) reasons

In this particular case, it's probably not a good idea to use the /mnt directory to create 3rd party mount points, especially one named /mnt/disk. This is because I have some plans for the /mnt directory which could mess up 3rd party features, or be messed up by 3rd party features. Better to use /var directory for this... But I'm not suggesting anyone who has written any add-ons to change anything because at present it's not an issue... just a potential issue.

Oh, I absolutely agree we need to not do anything that would cause any conflict... clearly communications is in order, but I cannot envision you being involved in the approval/dis-approval of every directory/file/resource used by any add-on process. There's just too many possibilities (I'm sure of this), and way too little of your free time (Not as sure, but a good guess).

The /mnt directory has been used for mount points in UNIX for as long as I can remember... It was the logical choice when I wrote the disk-management plug-in. Thanks for understanding...

The only issue I see, is that even if we have guidance, not every package that might be added by a random user will have been coded to the same standards... We'll just need to be aware of those possibilities and do our best to not conflict with anything.

I know some guidelines for add-on packages have already been discussed. I think a dedicated thread might be in order for more discussion.

As far as the unMENU add-on, I can change it to use /var/mnt, or wherever is agreed upon, but I'll wait for more guidance. It is easy to change...

Joe L.

May 8, 2009

Yes, this script cannot be invoked before the array is started. You can either put a
sleep 30

before it in your go script, or you can invoke this using "at" like this:

echo "cache_dirs -d 3 -e data" | at now + 5 minutes

Yes, a 'pre-sleep' is the easiest way to deal with it, although not completely reliable. The "at" method would be better, as it does not hang the go script, but trying to explain to new users how to customize the cache_dirs part looks problematic, too many ways to botch it. The "sleep 30" method is the easiest.

I've added a new feature to the next version of cache_dirs... it has a "-w" (wait-till-online) option. If the array is not on-line when cache_dirs is started, it will schedule itself using the "at" program to be executed in 1 minute. When run a minute later, it will re-submit itself to "at" if the array is still off-line.

This will continue until the array comes on-line. There is no longer any need for a "sleep 30" before it. This way, the "go" script can continue to execute.

(It considers the array to be off-line if a count of the directories matching /mnt/disk[1-9]* comes up zero.)
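The wait-till-online behaviour described above could look roughly like this. It is only a sketch under assumptions, not the code shipped in cache_dirs: the function names array_online and reschedule_if_offline are hypothetical, and so is the SCHEDULER variable, which exists only so the logic can be tried on a machine without "at".

```shell
# Succeed if at least one directory matches <mnt_root>/disk[1-9]*.
# A plain "disk" directory (as used by the unMENU plug-in) does not match.
array_online() {
    for disk_dir in "${1:-/mnt}"/disk[1-9]*; do
        [ -d "$disk_dir" ] && return 0
    done
    return 1
}

# Return 0 if the array is on-line so the caller can start scanning.
# Otherwise hand the given command line to "at" for one minute from now
# and return 1. Arguments with embedded spaces lose their quoting here.
reschedule_if_offline() {
    check_root=$1; shift
    array_online "$check_root" && return 0
    echo "$*" | ${SCHEDULER:-at now + 1 minute}
    return 1
}
```

A script would call it near the top, for example `reschedule_if_offline /mnt "$0" "$@" || exit 0`, so each re-run either starts scanning or queues itself again.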

The include/exclude folders are not checked until the array is on-line. So that issue is taken care of too. Misspelled/missing directories being included/excluded are logged in the syslog.

Probably need to explain to users that the way folders are selected for caching is different now. I think this is better, more flexible, but does need to be explained. To summarize, all top level folders on all data drives and the Cache drive are selected for caching by default. You can either leave it that way, or specify some folders to be excluded, or specify the folders to include. If you specify any folders to be included, then you have to specify all of the folders you want included, because all non-specified folders will be excluded.

Exactly correct

I tried to use -i Vid*, but it was always rejected, whether I specified it with or without quotes or apostrophes or backslashes or a period for the asterisk. I don't see why it does not work, looks like it should, but I instead had to specify 10 different includes, for folder names that all begin with Vid. Appears to be working well, will know better in the morning.

Yes, wild-cards don't work in includes/excludes currently... but let me give it a tiny bit of thought to see if they might be possible.

Joe L.

May 8, 2009

Update on this non-professional's adventure:

I did a full cache with Joe's new script that defaults to find. It took about 6 minutes to complete the first run through, then just 5-10 seconds each time after that. I don't know how many files I have, but it's several million at least. Browsing is much less painful now and the doubts I was having about unRAID being the right solution for me are gone now.

This feature has such a huge impact on the user experience that it should be a part of the standard distribution in my opinion. Limetech should even seriously consider hacking the kernel to protect the directory cache from being flushed.

May 8, 2009

Couldn't agree more with your sentiment.

With one small exception: hacking the kernel ain't no small task in this regard... the documentation sucks. I would say raising a call to the real kernel devs would be worth it. I spent a while on the kernel IRC channel but could never get past the keen supporters to the real people I needed to talk to about it.

May 8, 2009

OK... based on feedback, attached is version 1.3 of cache_dirs.

Edit, now replaced by version 1.4 in this post: http://lime-technology.com/forum/index.php?topic=3666.msg32991#msg32991

version 1.3 did not invoke find -maxdepth correctly.

The changes are as follows:

Added -w option, to wait till array comes online before starting scan of /mnt/disk* share folders.
Changed default min-seconds delay between scans to 1 instead of 0.
Moved test of include/exclude directories to after array is on-line
Added logging of mis-spelled/missing include/exclude dirs to syslog
Added ability to have shell wildcard expansion in include/exclude names
Modified slightly the logic used to adjust loop delay to not adjust unless difference is > .1 seconds.

When adding this to your "go" script, use the -w option. There is no need to "sleep" before it to wait for the array to come on-line.

Example:

PATH=$PATH:/boot

cache_dirs -w -d 3 -e "data"

With the "-w" option, it will then not start to cache disks until after the array is online. It does this by scheduling itself to be re-run in a minute by using the "at" command. When invoked and re-scheduled it will print:

warning: commands will be executed using /bin/sh

job 12 at Fri May 8 09:24:00 2009

The unRAID array is not online. Directory scan will occur when array comes online.

You can ignore the warning. The "job number" can be used to see the status.

To see what is in the "at" queue, (to see what is pending at a future time) type

atq

To see the specific job details type:

at -c jobnumber

note: jobnumber is the number reported in the atq command. Also note, the job number will change each minute the array is off-line, as the cache_dirs command re-schedules itself to try again a minute later every minute if the array is still off-line. When it does, it is a new job number.

The -w will have no effect if the cache_dirs command is run with the "-F" option. Instead, you will just get the following error if the array is off-line:

PATH=$PATH:/boot

cache_dirs -F -e data -d 3 -w

The unRAID array is not online. Directory scan by cache_dirs not started.

Lastly, you can use shell wildcards in the include and exclude options as shown here (put the include/exclude names in quotes):

PATH=$PATH:/boot

cache_dirs -F -e data -d 3 -w -i "M[op][v3]*" -i "P*" -e "*Z" -e "*R"

The example will only include directories that start with the letter "M" followed by either "o" or "p" followed by "v" or "3" (matching Movies and Mp3)

It will also include the directories starting with the letter "P" (matching my "Pictures" directory)

The example will exclude the directories ending with "Z" or "R" as well as the "data" directory.

I know this is a bogus example, but it is intended to show what can be done.

The list of directories matched by the includes/excludes given are printed in the syslog.
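For anyone curious how include/exclude matching with shell wildcards can be done, here is a small sketch. It is an assumption about one way to do it, not the code inside cache_dirs; the function names matches_any and select_dirs are invented. Patterns are passed as newline-separated lists, and an empty include list means "include everything".

```shell
# matches_any NAME PATTERN_LIST
# Succeed if NAME matches any shell pattern in PATTERN_LIST (one per line).
matches_any() {
    [ -n "$2" ] || return 1
    while IFS= read -r pat; do
        # $pat is left unquoted on purpose so "case" treats it as a glob.
        case $1 in $pat) return 0 ;; esac
    done <<EOF
$2
EOF
    return 1
}

# select_dirs INCLUDES EXCLUDES
# Read top-level folder names from stdin and print the ones to cache:
# drop names not matched by INCLUDES (when given), then drop EXCLUDES.
select_dirs() {
    while IFS= read -r name; do
        if [ -n "$1" ] && ! matches_any "$name" "$1"; then continue; fi
        if matches_any "$name" "$2"; then continue; fi
        printf '%s\n' "$name"
    done
}
```

Fed the names Movies, Mp3, Pictures, data, PhotoZ and Music with the patterns from the example above, this prints Movies, Mp3 and Pictures.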

Have fun.

Edited to add the PATH=$PATH:dirname so "at" will be able to find the command

Edited once more, to point to version 1.4

Joe L.

May 8, 2009

By default.. how many levels down does this go? I see it seems to suggest 999 in the script, however on my first attempt at setting this up, it looks as though the disk spins up after just the top level.

edit:

I've run it from a telnet window and it takes only 0.5 secs to run the first time. Is that too quick to be caching all folders over 10 disks?

May 8, 2009

By default.. how many levels down does this go? I see it seems to suggest 999 in the script, however on my first attempt at setting this up, it looks as though the disk spins up after just the top level.

If you do not give any "-d" option, it will just perform a "find" command that will search as deep as you have. (Nobody has 999 directory deep hierarchies... not if you use windows-explorer to get to them, that is.)

The disks spin up based on them being accessed to read something off the physical disk... They are accessed only if the data needed for the "find" command is not already in memory. So, it is possible for all your disks to spin up, or none of them, or some part of them when you first run this. In the same way, if by chance a directory node in the buffer gets displaced by more recently accessed data, any specific disk could spin up to get it in the cache once more.

Joe L.

May 8, 2009

By default.. how many levels down does this go? I see it seems to suggest 999 in the script, however on my first attempt at setting this up, it looks as though the disk spins up after just the top level.

edit:

I've run it from a telnet window and it takes only 0.5 secs to run the first time. Is that too quick to be caching all folders over 10 disks?

Nope...sounds quite normal, either your disks are already spinning, or it is all in the cache.

Mine runs in about .16 seconds if i limit it to a depth of 5, and .23 seconds if I let it search the full hierarchy. Reading from RAM is VERY fast.

I have a mostly IDE based array, with a single PCI bus, and 10 data drives.

Joe L.
