cache_dirs - an attempt to keep directory entries in RAM to prevent disk spin-up



Hi all,

 

OK, so this is my first post in the forums, so please be gentle!

 

1st up, many thanks to Joe for the cache_dirs script; most definitely a Good Thing to have on an unraid system :-)

 

So, I've been running unraid v5 since about 5b5 (I think), and have been on the stable 5.0 release since not long after it came out. I *was* running a combination of 2TB and 3TB SATA array drives for a total of 17TB, with a non-array 300GB IDE disk providing a swap partition (not strictly necessary with 4GB of RAM, I know, but I've been using Linux since 1996, and old habits die hard!) and somewhere to keep data that didn't need to be protected because I have it burnt to DVD. When I set the machine up for unraid, I went down the "oh, that could be useful" route, so I had unmenu, powerdown and various additional unmenu packages (htop, etc.), plus a rather customised way of launching them so that I could store the majority of them on the non-array disk and kind of inject them into the live unraid filesystem, inspired by the way Tiny/Micro Core Linux can (or at least could when I last used it) be configured to do package persistence.

 

Anyway, a couple of months back, I had a combination failure of an array drive and a molex/sata adapter (with impressive/scary amounts of smoke billowing from the machine!)... the result being a completely b0rked 3TB drive and some very melted/charred wires.

 

The long and the short of it is an upgraded array: 25TB on 4TB parity, with the 300GB drive now set up as cache (although not actually *caching* any of the shares... it's doing the same job as before, even down to having a sneaky 2nd partition set up for swap; it's just that it's now formatted with BTRFS so that I have the option of playing with Docker if I fancy), plus a bigger PSU with lots of SATA power connections (eliminating what would have been 6 Molex/SATA adapters for the now 10 SATA drives). I'm running unraid 6b6, with powerdown 2.04 the only 'extra' added to the stock unraid setup (except for the minute tweak of moving emhttp onto port 8000).

 

While I was poking around in the forums to find out as much as I could about the changes in powerdown 2.x *before* I tried it, I came across the cache_dirs script and, since I've got a reasonable amount of data on my array spread across various shares, it seemed like it would be a very nice addition.

 

I don't run my server 24/7 (mainly because it's audible from where I sleep, but also to keep electricity use down), so there are only really two possibilities when it's powered up: I'm either actively doing something that uses the array or I'm not (because I'm either about to or have already done so). So I have my spindown time set to 15 minutes, which suits this usage pattern right up to the point where what I want to do is quickly check whether I've already got something on the array, or hunt for a specific thing that I know is stored on there *somewhere*... at that point it becomes a right royal PITA: 'quickly' rapidly becomes 'slowly' as multiple spinup delays have to happen.

 

My unraid system is my main storage for *everything*: it has a Software share with all *sorts* of software installers, updaters, boot images, etc. from/for all *sorts* of OSs, a TV share with DVR'd movies and TV shows, a Media share carrying music and e-books, and a final share that acts as my archive of personal *stuff*. That last one has A Level and Uni notes, software designs, photos, etc. accumulated over some twenty years of having at least one PC. It is by far the least organised and most messy of the four, and it's also the one that suffers the worst from spinup delays when I'm trying to find things, as it's distributed across pretty much every disk in the array (and yes, I know amalgamating it onto one disk would help; I just never seem to find the time to shuffle it all around!).

 

So yeah, I've only been using it for a couple of days, but cache_dirs is *brilliant* for me: it has mostly eliminated those spinup delays and made me a very happy bunny! :-)

 

However, I did come across a little niggle that I'd like to report...

 

OK FYI: I'm running cache_dirs from my /boot/config/go script as

 

        /boot/tools/cache_dirs -w -u -B 

 

I am (purposefully) not using a maxdepth value with cache_dirs, because my archive and software shares are the trees that will benefit the most from being cached: they are my "I know I've got that somewhere" go-to locations, and they are both quite deep and range from poorly organised to not organised at all...! This unrestricted depth means that the first cache_dirs scrape takes a pretty long time (i.e. ballpark 20 mins). Not an issue in and of itself, but because that first scrape takes so long, I came across something that I suspect most people won't... which is that if I attempt to use

 

        cache_dirs -q 

 

in preparation for shutting down the unraid system while the first scrape is still in progress (i.e. there is a 'find -noleaf' process running that was spawned by cache_dirs), only the cache_dirs process whose PID is stored in the LCK file ends, and I still have a cache_dirs process and a find process listed when I do:

 

        ps -ef | grep "\(find\|cache_d\)" | grep -v grep 

 

Attempting

 

        cache_dirs -q 

 

for a second time outputs the statement:

 

        cache_dirs is not currently running 

 

(presumably because the LCK file no longer exists), even though ps shows it is; indeed, the cache_dirs process that is still running shows in ps as having been re-parented to PID 1, since it was orphaned when the process referenced in the LCK file ended. Keeping an eye on the process list, the find that was running when the 'quit' was signalled runs to completion, and another find starts, so it looks to me as though the surviving cache_dirs process will continue until it completes, regardless of the 'quit' having been signalled.
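For anyone wanting to see this for themselves, something like the following shows the re-parenting and the shared process group (assuming the usual procps ps; the exact columns and name matching may vary on other versions):

        ps -o pid,ppid,pgid,etime,args -C cache_dirs,find
        # the surviving cache_dirs shows PPID 1 (re-parented to init),
        # while its PGID still matches the PID of the parent that exited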

 

But... I got bored waiting for that to happen, so I decided to kill cache_dirs and find. Just killing that orphaned cache_dirs process doesn't stop the find, so I ended up using

 

        killall -g cache_dirs 

 

which immediately stops the orphaned cache_dirs process group (i.e. cache_dirs and its child find(s)).

 

Now, admittedly this is a less-than-usual use pattern, but from my reading about powerdown I know there is at least one user who experiences frequent power outages (i.e. multiple outages in short succession); when their power goes out, the powerdown script is invoked (via UPS signalling) to gracefully shut down the array and machine, so I don't think it's too far-fetched that someone could run into this issue by being unlucky enough to suffer a power outage during cache_dirs' initial scrape...

 

I am now using the following bit of scripting to stop cache_dirs on my own system (so I don't end up with a modified cache_dirs script to maintain):

 

        #!/bin/bash
        cache_dirs -q
        sleep 2
        killall -s SIGTERM -g cache_dirs 

 

But if Joe felt it worthwhile, I think the 'quit' handler could simply be tweaked to use

 

        killall -g "$program_name" 

 

instead of

 

        kill "$lock_pid" 

 

I thought that I should probably report this, as it is an issue that someone might run into (albeit infrequently) and that could end up with an array disk being ungracefully unmounted, triggering a parity check on the next boot for no obvious or easily identified reason.

 

Edited to include the possible tweak, and again to make it use $program_name, not $lock_pid.

Link to comment

For some reason cache_dirs is keeping one of my disks busy. I've got 6 data disks, and cache_dirs appears to be keeping Disk 6 busy. inotifywait -mr /mnt/disk6 shows that a few folders keep being rescanned on the disk (every 3-5 seconds), preventing the disk from spinning down. As soon as I stop cache_dirs with -q, inotifywait stops updating (the disk is idle).
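To see exactly which folders and events are involved, the inotifywait output can be narrowed down a bit (standard inotify-tools options, as far as I know):

inotifywait -mr --timefmt '%H:%M:%S' --format '%T %w%f %e' /mnt/disk6
# prints a timestamp, the path being touched and the event type for each hit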

 

I'm currently running reiserfsck on the drive to ensure it's not a file system problem.  Outside of a file system issue, does anyone have any idea as to why this is happening, and what I can do to prevent it?  I'm currently invoking cache_dirs via the go script with the recommended command:

 

/boot/cache_dirs -w

 

I have 2GB of RAM, and am running the SSH and APCUPSD plugins on v6.0-beta6. I'm currently using the latest version of cache_dirs.

Link to comment

For some reason cache_dirs is keeping one of my disks busy. I've got 6 data disks, and cache_dirs appears to be keeping Disk 6 busy. inotifywait -mr /mnt/disk6 shows that a few folders keep being rescanned on the disk (every 3-5 seconds), preventing the disk from spinning down. As soon as I stop cache_dirs with -q, inotifywait stops updating (the disk is idle).

 

I'm currently running reiserfsck on the drive to ensure it's not a file system problem.  Outside of a file system issue, does anyone have any idea as to why this is happening, and what I can do to prevent it?  I'm currently invoking cache_dirs via the go script with the recommended command:

 

/boot/cache_dirs -w

 

I have 2GB of RAM, and am running the SSH and APCUPSD plugins on v6.0-beta6.

 

Since it's happening on the last data drive to be scanned, I suspect it is trying to start a new scan before the current one is finished, that is, timing out on the scan polling interval.  As an experiment, try increasing the minimum scan time (eg. /boot/cache_dirs -w -m 7).  If that doesn't help then increase it to 8 seconds, then 9, then 10 seconds.  If it does appear to be working correctly, then decrease it a second at a time until it doesn't, then change it back to the lowest minimum scan time that does work.

Link to comment

How many files are on all your disks?

The number of files determines how much data has to be cached.

 

2GB might be a little low for what you want to do if there are many files.

 

There is only so much low RAM available, and without high RAM to use for applications and buffer caching, some of the cached data might be getting flushed out.

 

For example, I have a 3-disk array with a million files, and cache_dirs causes issues for me.

Link to comment

For some reason cache_dirs is keeping one of my disks busy. I've got 6 data disks, and cache_dirs appears to be keeping Disk 6 busy. inotifywait -mr /mnt/disk6 shows that a few folders keep being rescanned on the disk (every 3-5 seconds), preventing the disk from spinning down. As soon as I stop cache_dirs with -q, inotifywait stops updating (the disk is idle).

 

I'm currently running reiserfsck on the drive to ensure it's not a file system problem.  Outside of a file system issue, does anyone have any idea as to why this is happening, and what I can do to prevent it?  I'm currently invoking cache_dirs via the go script with the recommended command:

 

/boot/cache_dirs -w

 

I have 2GB of RAM, and am running the SSH and APCUPSD plugins on v6.0-beta6.

 

Since it's happening on the last data drive to be scanned, I suspect it is trying to start a new scan before the current one is finished, that is, timing out on the scan polling interval.  As an experiment, try increasing the minimum scan time (eg. /boot/cache_dirs -w -m 7).  If that doesn't help then increase it to 8 seconds, then 9, then 10 seconds.  If it does appear to be working correctly, then decrease it a second at a time until it doesn't, then change it back to the lowest minimum scan time that does work.

 

Thanks for the help, increasing it to 9 seems to have fixed the issue (I'm going to keep monitoring it).

 

As a side note, using -m with a value higher than 10 causes cache_dirs to fail.

 

Low RAM shouldn't have any effect on me, as I'm running v6.0-beta6, which is a 64-bit OS and has no low RAM limitation. There are ~245,000 files on the array.

 

 

Link to comment

Low RAM shouldn't have any effect on me, as I'm running v6.0-beta6, which is a 64-bit OS and has no low RAM limitation. There are ~245,000 files on the array.

 

 

I was going to mention the 64-bit beta but got interrupted and sidetracked.

Anyway, with that number of files you should be fine.

I store mostly MP3s, which ends up being millions of files, and it causes all sorts of grief trying to traverse that.

Link to comment

Hi, OK, sooo... just in case anyone has read my last post... erm :-[ ... so, obviously I don't use the webgui very much... it turns out that just issuing

killall -g cache_dirs

can end up b0rking emhttp...  :-/ Apart from that, it seems to work perfectly!  :'(

 

I only came across this today because I was trying to add scripts to make cache_dirs go up and down with the array rather than from the go and stop scripts... and somewhere in my experimenting I pressed the 'stop array' button, and the moment I did, I ran slap bang into a completely unresponsive webgui. When the array is stopped as part of system shutdown you're expecting the webgui to stop working, and because I tend to shut down from the shell, not the webgui, it's even less apparent.

 

I think I've worked out what's going on though; what I think is that when it's being started by scripts that live in

/usr/local/emhttp/plugins/*some_plugin*/event/

, cache_dirs runs as part of the same process group as emhttp, and when you issue the

killall -g cache_dirs

to kill the process group, it takes out emhttp at the same time, what with them being in the same group....
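A quick check like the following (plain ps, nothing unraid-specific) makes the shared group obvious when cache_dirs has been launched from an event script:

ps -e -o pid,pgid,comm | grep -E "emhttp|cache_dirs|find"
# the same PGID on every line means they are all in one process group,
# so killall -g on any member signals the lot, emhttp included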

 

If the processes are stopped manually (using ps and kill to stop cache_dirs and find in sequence, lowest PID to highest), emhttp doesn't b0rk...

 

I'd been trying to figure out what to do about it for a good few hours, and even wrote the larger part of a script that would carefully kill, one by one, the main cache_dirs process, its child cache_dirs processes and the find processes (and even check the at-queue for scheduled cache_dirs jobs).

 

And then I fell across

setsid

, which I'd forgotten alllll about...!

 

Turns out that if you start cache_dirs by doing

setsid /path/to/cache_dirs -w ... 

then cache_dirs gets its own process group, and killall works as expected, even when stopping the array from array event scripts :-)
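In case it helps anyone doing the same, here is a minimal sketch of the pair of event scripts I mean (the plugin directory and event names are assumptions; check what your emhttp version actually provides):

# /usr/local/emhttp/plugins/my_plugin/event/started   (hypothetical path)
#!/bin/bash
# setsid gives cache_dirs its own session and process group,
# keeping it well away from emhttp's group
setsid /boot/tools/cache_dirs -w -u -B

# /usr/local/emhttp/plugins/my_plugin/event/stopping_svcs   (hypothetical path)
#!/bin/bash
cache_dirs -q               # ask the parent process to quit cleanly
sleep 2
killall -g cache_dirs       # then take out the whole (now separate) group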

 

So sorry if anyone has run into this.

 

-Jo

Link to comment

So I've been trying to get this script to work, and it does, for a while. The thing is, every 2-2.5 hrs the script seems to start with a new process ID (maybe it's supposed to do this?), causing first most of, then all of my drives to spin up. I started using unRAID just a few days ago, so I'm totally new to this, but I thought the script was just supposed to start once, then loop and sleep, not start a new process after a while. The script works fine for a while (I would say at least half a day) before this starts, and I don't know why it happens.

 

First I tried the cache_dirs plugin, but that didn't work at all: the script would run, but once I started browsing the directories the disks would spin up, and I think the same thing happened after a while, where all the disks would spin up after 2 hrs. Then I disabled the plugin, downloaded the script from here and just started it with cache_dirs -w, and it worked!! The disks would not spin up when I was browsing my directories, and I was happy.

 

Then I learned today that after a while all my disks would spin up again. When I quit cache_dirs, everything returns to normal. How do I fix it? Maybe it's because I have the plugin installed; would that screw it up? It's disabled, but maybe I have to delete it from the plugin folder as well? I'm running unRAID 5.0.5, btw. Syslog attached. As you can see from the syslog, everything is fine from around 07:00 to 19:19, when disks start spinning up again. I manually spin down all the disks at 22:25 and 23:53, kill cache_dirs at Aug 25 00:50 and spin down all the disks for the last time.

syslog-2014-08-25.txt

Link to comment

So I've been trying to get this script to work, and it does, for a while. The thing is, every 2-2.5 hrs the script seems to start with a new process ID (maybe it's supposed to do this?), causing first most of, then all of my drives to spin up. I started using unRAID just a few days ago, so I'm totally new to this, but I thought the script was just supposed to start once, then loop and sleep, not start a new process after a while. The script works fine for a while (I would say at least half a day) before this starts, and I don't know why it happens.

 

First I tried the cache_dirs plugin, but that didn't work at all: the script would run, but once I started browsing the directories the disks would spin up, and I think the same thing happened after a while, where all the disks would spin up after 2 hrs. Then I disabled the plugin, downloaded the script from here and just started it with cache_dirs -w, and it worked!! The disks would not spin up when I was browsing my directories, and I was happy.

 

Then I learned today that after a while all my disks would spin up again. When I quit cache_dirs, everything returns to normal. How do I fix it? Maybe it's because I have the plugin installed; would that screw it up? It's disabled, but maybe I have to delete it from the plugin folder as well? I'm running unRAID 5.0.5, btw. Syslog attached. As you can see from the syslog, everything is fine from around 07:00 to 19:19, when disks start spinning up again. I manually spin down all the disks at 22:25 and 23:53, kill cache_dirs at Aug 25 00:50 and spin down all the disks for the last time.

I'm sorry, but I have no idea what the plugin does, nor do I support it in any way.

 

There is never a need to kill cache_dirs (at least with the version I wrote). It will suspend itself when the mover is run.

Joe L.

Link to comment

Yeah, I know, but like I said, I disabled the plugin and downloaded the script from the first post in this thread and ran it, but it still keeps spinning my disks up after a while. Any idea what is wrong?

 

Edit: I should mention that when this happens I'm not doing anything: not browsing any directories or accessing any data.

Link to comment

Symptoms of disks spinning up for no reason could be

 

1.  not enough ram

2.  too many files

 

There is no measure or benchmark of one vs the other.

 

You can calculate your file count with

 

find /mnt/disk* | wc -l
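If you want the breakdown per disk, a small loop over the same command does it (assuming the usual /mnt/diskN mount points):

for d in /mnt/disk*; do
    printf '%s: %s files\n' "$d" "$(find "$d" | wc -l)"
done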

 

I have 4GB of RAM and 3 data disks, but over a million files; cache_dirs is ineffective in my situation.

 

Yours may be pushing a limit, or you may need to adjust the timing of cache_dirs for your server.

Link to comment

I have 16GB of RAM and 11 data disks; disks 7-11 are empty at the moment though.

find /mnt/disk* | wc -l gave me an output of 211745 files (looks like that's not the problem, or is it?). Maybe I should try adjusting the timing and see if that works; by timing I assume you mean the -m and -M arguments? Is there a limit, or does any number work? I really have no idea what to try, so I guess I'll just start playing with it.

Link to comment

Try setting a maximum depth and/or adjusting timing.
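As a rough sketch of what that might look like in the go file (the values here are purely illustrative, and I'm assuming -d is the depth option in your version of cache_dirs):

# cache only the top few directory levels and slow the scan loop down a bit
/boot/cache_dirs -w -d 4 -m 7 -M 15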

 

Frankly, the low-memory issue goes away with the 64-bit kernel.

So if you are brave enough to run the beta, try that.

 

Keep in mind that EACH disk that is part of the array takes up kernel memory in low RAM.

Each disk has buffers, which are part of the equation.

 

 

 

Link to comment

Hi all,

 

OK, so this is my first post in the forums, so please be gentle!

 

1st up, many thanks to Joe for the cache_dirs script; most definitely a Good Thing to have on an unraid system :-)

 

****SNIP****

 

I am now using the following bit of scripting to stop cache_dirs on my own system (so I don't end up with a modified cache_dirs script to maintain):

 

        #!/bin/bash
        cache_dirs -q
        sleep 2
        killall -s SIGTERM -g cache_dirs 

 

But if Joe felt it worthwhile, I think the 'quit' handler could simply be tweaked to use

 

        killall -g "$program_name" 

 

instead of

 

        kill "$lock_pid" 

 

I thought that I should probably report this, as it is an issue that someone might run into (albeit infrequently) and that could end up with an array disk being ungracefully unmounted, triggering a parity check on the next boot for no obvious or easily identified reason.

 

Edited to include the possible tweak, and again to make it use $program_name, not $lock_pid.

 

 

Hi, OK, sooo... just in case anyone has read my last post... erm :-[ ... so, obviously I don't use the webgui very much... it turns out that just issuing

killall -g cache_dirs

can end up b0rking emhttp...  :-/ Apart from that, it seems to work perfectly!  :'(

 

 

****SNIP****

 

Turns out that if you start cache_dirs by doing

setsid /path/to/cache_dirs -w ... 

then cache_dirs gets its own process group, and killall works as expected, even when stopping the array from array event scripts :-)

 

So sorry if anyone has run into this.

 

-Jo

 

 

Awesomeness!! Thank you for your efforts on this; I experienced exactly the same thing. I will implement the two changes and report back if anything is still going wrong.

 

Link to comment
  • 2 weeks later...

Joe L., because I'm using a Mac to access my unRAID server I always end up with a load of .TemporaryItems folders all over my disks, and cache_dirs always scans them. I don't know if it makes any difference, but is there a way to exclude these folders from being scanned, or will it make no difference at all?

See first post in thread.
Link to comment

Thanks trurl, I did read that and could see the -e to exclude, but it's written like this: -e "*Junk*", so I didn't know if you need the asterisks and/or the speech marks. I mean, should I just add -e .TemporaryItems to my go file?

Yes, you use the double-quote "speech marks". The asterisk is a wild-card. It is used to match anything (or nothing), so for example "*Junk*" would match Junk or MyJunk or YourJunkBonds or anything else that had Junk in the middle with anything (or nothing) before it and anything (or nothing) after it.

 

You can use -e ".TemporaryItems" for your case.
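So a go file line for that might end up looking something like this (the path is just wherever you keep the script):

# skip the Mac-generated temporary folders on every scan pass
/boot/cache_dirs -w -e ".TemporaryItems"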

Link to comment

I have attached a log that shows that my cache_dirs scheduled a read after a spin-down of the disks for no apparent reason. Can anyone explain what is happening, please?

 

It looks like something is configured to start cache_dirs (using the at command) at 8:40pm???  The spin-downs don't seem related.
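You can check what is sitting in the at queue with the standard at tools (nothing cache_dirs-specific here):

atq            # list pending at jobs and their job numbers
at -c 123      # dump the body of a job to see what it actually runs (replace 123 with the number atq reports)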

Link to comment
