cache_dirs - an attempt to keep directory entries in RAM to prevent disk spin-up



Oh right, two threads...

 

I'm not sure anybody is maintaining the cache_dirs script, but I am the last one who got their hands dirty in it. Honestly, I'm sick and tired of the bash script; bash is a terrible language for a script this long, and I made it a lot longer by adding a lot of adaptive logic. Anyway, it would be nice for the script to keep working, and it does seem to work quite well for me.

 

The dynamix plugin contains an updated version of the standalone cache_dirs script; the plugin itself is just a wrapper to run it and configure it via the GUI. The script in this thread has not been updated for years.

 

I think it would be best to have the cache-dirs conversation in a thread separate from all the other dynamix plugins, or the conversation will be muddled. This seems like a good place, since Joe no longer appears to maintain the original script. I'm not promising that I will maintain it, but I'll listen along.

 

If the problem with the plugin is that it does not start, then the dynamix thread is probably the best place to address the problem.

 

The cache_dirs script (also in the plugin version) can be run manually. The plugin installs the script to /usr/local/bin/cache_dirs and a service file to /etc/rc.d/rc.cachedirs.
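For example, a quick manual session looks like this (a sketch; the -l and -q flags are the ones used later in this thread):

# start cache_dirs manually with logging enabled
/usr/local/bin/cache_dirs -l on

# stop any running instance
/usr/local/bin/cache_dirs -q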

 

 

Fireball3: It sounds like the failure to start could be a plugin issue; I'll add a comment in that thread.

 

 

Best Alex

10 minutes ago, Alex R. Berg said:

I think it would be best to have the cache-dirs conversation in a thread separate from all the other dynamix plugins, or the conversation will be muddled.

I agree. I would even suggest having an extra thread for each dynamix plugin...

12 minutes ago, Alex R. Berg said:

If the problem with the plugin is that it does not start, then the dynamix thread is probably the best place to address the problem.

Starting is one issue.

The second is that it locks up and uses 100% CPU on one core after some time (when started manually).

 

I reverted to v2.1.1, which doesn't have those issues, for now.


Ah, good to know that 2.1.1 doesn't give you trouble.

 

I'll need the cache_dirs log in /var/log/cache_dirs.log. You need to enable logging in the plugin first (or touch the file) to get it written.

 

Also, it will be helpful to see what is spinning, if you want to dig into the issue:

 

ps -e x -o ppid -o pid -o pgid -o tty -o vsz -o rss -o etime -o cputime -o rgroup -o ni -o fname -o args | grep "cache_dirs\|find\|wc"

 

pstree -p | grep -2 cache_dirs

4 hours ago, Alex R. Berg said:

Ah, good to know that 2.1.1 doesn't give you trouble.

 

I'll need the cache_dirs log in /var/log/cache_dirs.log. You need to enable logging in the plugin first (or touch the file) to get it written.

OK, I'll install the new version, enable logging and give it a reboot.

Presumably the plugin won't start; then I will start it manually and dump the logs after a couple of minutes.

 

4 hours ago, Alex R. Berg said:

Also, it will be helpful to see what is spinning, if you want to dig into the issue:

 

ps -e x -o ppid -o pid -o pgid -o tty -o vsz -o rss -o etime -o cputime -o rgroup -o ni -o fname -o args | grep "cache_dirs\|find\|wc"

 

pstree -p | grep -2 cache_dirs

Not sure about this one: is it intended to find the process that's keeping the CPU load pinned?

If so, I have to wait for the "event" to happen, then I'll dump that info.


@Fireball3

 

Yeah, ps gives you the processes, and the grep filters the ones I'm interested in.

 

pstree shows processes and their subprocesses but does not give much insight into what is going on.


You'll have to wait until the 100% CPU problem happens. High CPU isn't abnormal when all files are cached and during a scan, but it shouldn't scan that often, and I doubt it should hit 100% CPU.

 

Best Alex


@Fireball3

 

Things look great.

 

The cache_dirs log in
20180927-1021\logs\cache_dirs.log
indicates each scan completes in 0 seconds, with a 10-second wait between scans.

 

Did you try the ps + pstree?

 

Regarding the crontab problem I saw in your syslog: your crontab looks fine. Maybe you have a bad script in one of the /etc/cron.* directories.
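If you want to hunt for the bad entry, a first pass is just to dump everything cron will run (standard commands, nothing cache_dirs-specific):

# print the root crontab plus every drop-in cron file
crontab -l
for f in /etc/cron.d/* /etc/cron.hourly/* /etc/cron.daily/*; do echo "== $f =="; cat "$f"; done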

 

Regarding the startup problem, it does seem like cache_dirs starts at system boot as it should. It starts within seconds after the disks are mounted. Why do you think it's not running at system boot?

 

Sep 27 10:01:53 Tuerke kernel: REISERFS (device sdi1): Using r5 hash to sort names
Sep 27 10:01:53 Tuerke emhttp: shcmd (80): set -o pipefail ; mount -t reiserfs -o remount,user_xattr,acl,noatime,nodiratime /dev/sdi1 /mnt/cache |& logger
Sep 27 10:01:53 Tuerke emhttp: shcmd (81): sync
Sep 27 10:01:56 Tuerke emhttp: shcmd (82): mkdir /mnt/user0
Sep 27 10:01:56 Tuerke emhttp: shcmd (83): /usr/local/sbin/shfs /mnt/user0 -disks 4094 -o noatime,big_writes,allow_other   |& logger
Sep 27 10:01:56 Tuerke emhttp: shcmd (84): mkdir /mnt/user
Sep 27 10:01:56 Tuerke emhttp: shcmd (85): /usr/local/sbin/shfs /mnt/user -disks 4095 1024000000 -o noatime,big_writes,allow_other  -o remember=0  |& logger
Sep 27 10:01:56 Tuerke emhttp: shcmd (86): cat - > /boot/config/plugins/dynamix/mover.cron <<< "# Generated mover schedule:#012 /usr/local/sbin/mover |& logger#012"
Sep 27 10:01:56 Tuerke emhttp: shcmd (87): /usr/local/sbin/update_cron &> /dev/null
Sep 27 10:01:56 Tuerke s3_sleep: ----------------------------------------------
Sep 27 10:01:56 Tuerke s3_sleep: command-args=-C 2 -h 00 -h 17 -h 18 -h 19 -h 20 -h 21 -h 22 -h 23 -a -m 15 -n -e eth0 -N 12500 -i 192.168.178.150 -i 192.168.178.153 -l -L -D 0
Sep 27 10:01:56 Tuerke s3_sleep: action mode=shutdown
Sep 27 10:01:56 Tuerke s3_sleep: check disks status=yes
Sep 27 10:01:56 Tuerke s3_sleep: check network activity=yes
Sep 27 10:01:56 Tuerke s3_sleep: check active devices=192.168.178.150 192.168.178.153
Sep 27 10:01:56 Tuerke s3_sleep: check local login=yes
Sep 27 10:01:56 Tuerke s3_sleep: check remote login=yes
Sep 27 10:01:56 Tuerke s3_sleep: version=3.0.6
Sep 27 10:01:56 Tuerke s3_sleep: ----------------------------------------------
Sep 27 10:01:56 Tuerke s3_sleep: included disks=sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm sdn
Sep 27 10:01:56 Tuerke s3_sleep: excluded disks=sda
Sep 27 10:01:56 Tuerke s3_sleep: ----------------------------------------------
Sep 27 10:01:56 Tuerke s3_sleep: s3_sleep process ID 4470 started, To terminate it, type: s3_sleep -q
Sep 27 10:01:56 Tuerke emhttp: Starting services...
Sep 27 10:01:56 Tuerke emhttp: nothing to sync
Sep 27 10:01:56 Tuerke sudo:     root : TTY=unknown ; PWD=/ ; USER=nobody ; COMMAND=/bin/bash -c /usr/local/emhttp/plugins/unbalance/unbalance -port 6237
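If you want to check from a shell whether cache_dirs logged anything at boot, grepping the syslog is enough:

grep cache_dirs /var/log/syslog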

 

Best Alex

1 hour ago, Alex R. Berg said:

The cache_dirs log in
20180927-1021\logs\cache_dirs.log
indicates each scan completes in 0 seconds, with a 10-second wait between scans.

That was after I started it manually.

1 hour ago, Alex R. Berg said:

Did you try the ps + pstree?

See attached.

1 hour ago, Alex R. Berg said:

Regarding the crontab problem I saw in your syslog: your crontab looks fine. Maybe you have a bad script in one of the /etc/cron.* directories.

In /etc/cron.d there is the attached file "root".

1 hour ago, Alex R. Berg said:

Regarding the startup problem, it does seem like cache_dirs starts at system boot as it should. It starts within seconds after the disks are mounted. Why do you think it's not running at system boot?

Because it says "Status: stopped".

There is nothing cached. Once the drives spin down, I have to wait for the drives to spin up when browsing the cached SMB shares.

Is there a way to check how many entries are actually cached?

I'm checking the memory chart within the stats. When using 2.1.1 almost all of my 8GB RAM is "cached".

When starting with 2.2.0j, there is nothing cached. The only difference is the plugin version. This might not be a good approach, though.

 

Not sure about the log snippet you posted. I don't see any cache_dirs entry.

 

Also attached is a diagnostics dump after 10 hours of uptime.

No abnormal CPU load registered yet.

 

root

ps.log

pstree.log

tuerke-diagnostics-20180927-2014.zip


Here's some relevant info to help you learn to debug the problem:

 

pstree

      |-cache_dirs(10139)---sleep(30575)

This shows which subprocesses cache_dirs currently has; right now it's sleeping. cache_dirs does all its work in the 'find' and 'wc' file-counting subprocesses, though wc is only executed when the script is first started, to count the files at each depth.
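In other words, the heart of the script boils down to something like this (a heavily reduced sketch; the real script layers adaptive depth limits, scan timeouts and per-disk processes on top):

# keep directory entries warm: walk the tree, sleep, repeat
# (Video is the include share used later in this thread)
while true; do
    find /mnt/disk*/Video -noleaf >/dev/null 2>&1
    sleep 10
done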

 

The ps command gives insight into how the subcommands are executed, but there were no subcommands running.

 

Why do you think it's running at 100% CPU?

 

crontab

I didn't know about the /etc/cron.d directory; good for me to know. The last line is incorrect, hence the complaints in your syslog. Something on your system must be messing it up. You could debug what it is, if you want, by removing the other plugins and re-enabling them one by one, or by doing a binary search, enabling half at a time. This is how my file looks:

# Generated cron settings for plugin autoupdates
0 0 * * * /usr/local/emhttp/plugins/ca.update.applications/scripts/updateApplications.php >/dev/null 2>&1
# Generated ssd trim schedule:
0 0 * * * /sbin/fstrim -a -v | logger &> /dev/null

# Generated docker monitoring schedule:
10 0 1 * * /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/dockerupdate.php check &> /dev/null

# Generated mover schedule:
40 3 * * * /usr/local/sbin/mover &> /dev/null

# Generated plugins version check schedule:
10 0 1 * * /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugincheck &> /dev/null

# Generated array status check schedule:
20 0 * * 1 /usr/local/emhttp/plugins/dynamix/scripts/statuscheck &> /dev/null
Quote

Because it says "Status: stopped".

There is nothing cached. Once the drives spin down, I have to wait for the drives to spin up when browsing the cached SMB shares.

Is there a way to check how many entries are actually cached?

I'm checking the memory chart within the stats. When using 2.1.1 almost all of my 8GB RAM is "cached".


The status could be lying; running the ps or pstree command tells you the truth. It is a good indication that nothing is cached, though, when drives are spun down and spin up as you access them. Linux caches lots of stuff in memory, so it might be that different stuff is cached. The best indication is disks spinning up when you access dirs that should be cached. Note that only dirs are cached, not file content, but I'm sure you know that.
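If you want a number rather than an indication, the kernel exposes its dentry-cache statistics (a hedged aside; the first two fields are the total and unused dentry counts):

cat /proc/sys/fs/dentry-state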

The cache_dirs log gives you info about what it's doing, and how many files you have cached at each level.

2018.09.27 10:26:06 Executed find in (0s) 00.10s, wavg=00.10s Idle____________  depth 9999 slept 10s Disks idle before/after 1538036766s/1538036766s suc/fail cnt=69/70/0 mode=4 scan_tmo=150s maxCur=9999 maxWeek=9999 isMaxDepthComputed=1 CPU= 4%, filecount[9999]=16460 

This line from a previous diagnostics cache_dirs log indicated:

filecount[1]=0
filecount[2]=1213
filecount[3]=7832
filecount[4]=11794
filecount[5]=14207
filecount[6]=15873
filecount[7]=16460
filecount[9999]=16460

This means 1213 files at depth 2 (/mnt/user/share/X) and 15873 files at depth 6 (/mnt/user/share/X/Y/Z/A/B). I'm probably off by one or two on the counts, but it gives you an idea. If the next scan says 'Disks idle before/after 9999s/9999s', then the disks are idle or spun down and scanning didn't spin them up.
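For illustration, a count like filecount[2] can be approximated by hand with the same find and wc tools the script uses:

find /mnt/user -maxdepth 2 | wc -l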

 

You could reduce Cache Pressure to 1 while debugging, but you might run out of memory and crash the system, especially if you reduce it to 0.
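For background: as far as I know, the Cache Pressure setting ends up in the kernel's vm.vfs_cache_pressure tunable, where 0 tells the kernel never to reclaim dentries and inodes (hence the out-of-memory risk) and 100 is the kernel default:

# inspect, then lower, the tunable directly (hedged example)
sysctl vm.vfs_cache_pressure
sysctl vm.vfs_cache_pressure=1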

 

In the last log, at 20:14, cache_dirs was also started at startup. Disks were spun down later. But there's no cache log. Maybe you forgot it? I just played with the diagnostics myself and can see it's not automatically included. It would be cool if the plugin could tell unRaid to include the log files in the diagnostics, but I don't know if that is possible.

Best Alex

 

 

 


Interesting with the 'defunct' processes. I've seen that myself once, a long time ago. 'defunct' (zombie) processes are dead processes kept around because their parent, i.e. cache_dirs, has not reaped their exit status. However, those processes are not running, so they cannot generate CPU load. I need to see what is causing the CPU load. Can you include the following, and keep doing it until you catch it at 100%:

 

top -b -n1 | head -20 > top-output.txt; cat top-output.txt
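Listing the defunct processes themselves is also a one-liner (a hedged aside; zombies show process state 'Z'):

ps -eo stat,ppid,pid,args | awk '$1 ~ /^Z/'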
 

I have attached two scripts that gather all the diagnostics I need and zip them into a zip file. Run cache_dirs_diagnostics_generate_zip.sh and keep running it until you see the 100% CPU process in the list of top processes.

 

2.2.0j added process management where each disk is scanned in its own process. Apparently there's a problem with that. Did you stop the array at any point before the defunct processes appeared? (Probably not.)

 

How many disks do you have? (not that it should matter)

 

Best Alex

 

 

 

cache_dirs_diagnostics.sh

cache_dirs_diagnostics_generate_zip.sh


I've made some changes to cache_dirs and attached the file. Download it somewhere, then run:

 

sudo chmod 755 ./cache_dirs; ./cache_dirs -q; ./cache_dirs -i Video -e Audio -e Backup -e Fotos -e Spiele -l on -T

 

The last parameter, -T, disables the multithreaded disk scan and makes it behave more like the last 2.1.0 cache_dirs. See if the problem appears both with and without -T.

 

To get diagnostics, you can now run (on the newly attached cache_dirs):

./cache_dirs -L

 

When you reboot the machine or start cache_dirs from the plugin, the installed plugin version will be running, not the attached script.

cache_dirs


OK, thanks for your effort!

Here is the plan:

1. Check if the GUI is lying: execute ps and pstree right after reboot to have evidence.

2. Kill all running instances of cache_dirs:

./cache_dirs -q

3. Run the one you posted above:

- without -T (wait until it locks up again, dump diagnostics)

./cache_dirs -i Video -e Audio -e Backup -e Fotos -e Spiele -l on

- with -T (see if it locks up again, dump diagnostics)

./cache_dirs -i Video -e Audio -e Backup -e Fotos -e Spiele -l on -T

- without -T but with memory limit -U 100000 if it still locks up

./cache_dirs -i Video -e Audio -e Backup -e Fotos -e Spiele -l on -U 100000
Quote

 

I noticed the defunct process happened at 3:42, which is right after the mover started. But I still cannot reproduce the issue.

 

I have to check the log, but my mover has no timed job. I fire up the mover by hand if necessary.

Edit:

I picked the wrong diagnostic.zip in my hurry. The file is from yesterday evening, 20:14...

 

Edit2:

I was up late yesterday.

By that time, the S3sleep plugin should have shut down the server, but it hadn't.

That was suspicious, but the CPU wasn't locked up at that time. I will post the correct diagnostics, dumped this morning, when I'm home later. Maybe there is some clue within. S3sleep logs whether a condition is preventing shutdown.


Yep, exactly, try that and we'll see the result. I'm very interested in seeing a snapshot of the 100% CPU spike and the cache log at the same time. The diagnostics should produce that; your job is to fire it when it's at 100% CPU.

 

./cache_dirs -L

 

Also, I'm very interested in seeing it at a time when there are defunct processes again, and in whether the 100% CPU problem always coincides with the defunct processes.

 

I found a better way to do the ps command and have updated the attached cache_dirs, so its log diagnostics are better.

 

If you run the attached

find_defunct.sh 

then it'll wait and do the scan when it finds defunct processes (if it works).
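For reference, the core of such a watcher can be sketched in a few lines (an assumption about how it works, not the actual contents of the attached script):

# poll until any defunct (zombie) process appears, then dump diagnostics
while ! ps -eo stat | grep -q '^Z'; do
    sleep 10
done
./cache_dirs -L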

cache_dirs

find_defunct.sh


Here I am.

I updated the diagnostics in this morning's post.

Can't see anything of use.

 

Attached to this post are the first cache_dirs_diagnostics, together with a screenshot of the cache_dirs GUI.

I'm not sure if the log generated with -L shows the running plugin or the instance that is pulling the log data!?

Therefore I attached the ps and pstree output.

cache_dirs -q says it is not running.

This means the GUI is not lying.

 

I fired up the modified version as planned.

./cache_dirs -i Video -e Audio -e Backup -e Fotos -e Spiele -l on

The second file shows the logs right after the script start.

The GUI is also showing the plugin status "running".

find_defunct.sh is also running.

 

I think I managed to stop the mover & logger from spamming the log.

The mover settings had logging enabled.

# Generated mover schedule:
0 0 1 * * /usr/local/sbin/mover |& logger

Once it is set to disabled, the file \flash\config\plugins\dynamix\mover.cron contains:

# Generated mover schedule:
0 0 1 * * /usr/local/sbin/mover &> /dev/null

It seems there is no switch to fully disable the mover.
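One hedged workaround (not an official switch): point the schedule at a date that never occurs, such as Feb 31, so the cron entry never fires:

# Generated mover schedule:
0 0 31 2 * /usr/local/sbin/mover &> /dev/null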

cache_dirs_diagnostics.zip

cache_dirs_diagnostics_arberg_mod_28.09.2018_2252.zip

Mover.jpg

8 hours ago, Alex R. Berg said:

You ran the cache_dirs -L diagnostics immediately after starting cache_dirs, it seems.

No, the dynamix plugin didn't start at start-up.

The cache_dirs -L was the only instance running.

 

Your modified script invoked with

./cache_dirs -i Video -e Audio -e Backup -e Fotos -e Spiele -l on

didn't block the shutdown.

I will confirm with other runs.

How do I install the modded version instead of the one from the dynamix suite?

And finally, what's wrong with the autostart of the plugin?


It is very surprising to me if the 2.2.0a version I sent does not give the same issues when executed without the -T switch. The only change is how I report the idle time of disks: I cap the reported value at 9999s to avoid the crazy numbers you got when all disks were spun down. I wonder if it's somehow related to the spun-down situation.

 

I've created a test plugin for you and added logging during startup.

 

Copy txz to \\tower\flash\config\plugins\dynamix.cache.dirs 

 

and replace the plugins\dynamix.cache.dirs.plg

 

(and back up the originals so you can roll back).
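From a shell on the server, the same steps look roughly like this (assuming the flash share maps to /boot, as it normally does on unRAID):

cp /boot/config/plugins/dynamix.cache.dirs.plg /boot/config/plugins/dynamix.cache.dirs.plg.bak
cp dynamix.cache.dirs.txz /boot/config/plugins/dynamix.cache.dirs/
cp dynamix.cache.dirs.plg /boot/config/plugins/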

 

The updated plugin has the multithreaded parameter on the plugin page http://tower/Settings/FolderCachingSettings

 

I've also attached a history of cache_dirs scripts, so you can test them individually for the defunct tests in case you don't have easy access to cache_dirs via the plugin.

 

Best Alex

dynamix.cache.dirs.txz

dynamix.cache.dirs.plg

cache_dirs-scripts.zip


Great to hear it's running now. I guess that means it somehow died with the old version.

 

I just noticed you are only caching your share 'Video'. The excludes have no effect, because you use include.

This is the list of directories matching your include/exclude criteria, as seen in the syslog. This is probably what you want; I just wanted to let you know, now that I saw it.

Sep 30 23:26:00 Tuerke cache_dirs: ---------- Caching Directories ---------------
Sep 30 23:26:00 Tuerke cache_dirs: Video
Sep 30 23:26:00 Tuerke cache_dirs: ---------------------------------------------- 

Best Alex

1 hour ago, Alex R. Berg said:

The excludes have no effect, because you use include.

OK, I can adjust that.

 

Is the new version available via the plugin update?

A friend of mine also had the plugin updated and then it wouldn't start.

I could tell him to update as the bug seems fixed.

 

Edit:

I suspect some content is not cached.

Is there a way to check which drives are cached and which are not?
When I'm accessing the share (just browsing), it pauses to spin up some disks.

I checked the GUI and found only 3 disks spinning.

The share definitely spans more than those 3 disks.
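A quick way to check spin state from the console, independent of the GUI (a hedged aside; hdparm is included in unRAID):

# reports 'active/idle' or 'standby' for each drive
hdparm -C /dev/sd[b-n]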

