Viewing files on a share causes all disks to spin up



Initially I created a "Movies" share with a split level of 2. In the "Movies" share I had my movies sorted by type (Blu-ray, DVD, DVD-Compressed, Videos, etc.).

 

But I'm starting to think that wasn't such a great idea. I'm now up to 9 data disks on my unRAID server, and the Movies share spans every disk in the array. So... if I open the Movies share folder on my Windows PC while all my disks are spun down, the disks don't spin up (this is good). But if I open the "DVD" folder inside the Movies share, all 9 of my data disks spin up (not something I want to happen).

 

While it would be nice if unRAID could create a cache image of all the files on the disks so no disks would need to be spun up while browsing file structures, I don't think that's currently possible or planned for future development.

 

So... I'm thinking the best solution is to remove my "Movies" share. I would then move DVDs to certain disks I choose, Blu-rays to others, and so on. I'd create a share such as "DVD" and assign it, say, disk1, disk2, and disk3. That way, when I open the DVD share, only those 3 disks spin up instead of all 9. While I like the organization of having all the movies in one "Movies" share, I'd rather not force all my hard drives to spin up whenever that share is accessed.

 

Is that the best way to get around all the disks spinning up in my situation? Has there been any mention of not requiring the disks to spin up to view the file structures on the disks?

While it would be nice if unRAID could create a cache image of all the files on the disks so no disks would need to be spun up while browsing file structures,

Actually, it does... IF you have enough RAM... WeeboTech posted about this somewhere, you might want to search his posts...

 

;)

 


I'll add...

 

I have 8GB of RAM with PAE enabled in the kernel.

 

In addition I have tuned the kernel with

 

sysctl vm.vfs_cache_pressure=0

 

I also installed the slocate package.

Upon boot up and once every night I run

/usr/bin/updatedb -c /etc/updatedb.conf

 

This scans all the disks and creates the locate database.
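
For reference, one way that nightly rescan might be scheduled is a small script in /etc/cron.daily (an assumption on my part; the stock Slackware cron setup runs everything in that directory once a day):

#!/bin/sh
# Hypothetical /etc/cron.daily/updatedb: rescan the disks once a night so the
# locate database and the kernel's dentry/inode caches stay current.
/usr/bin/updatedb -c /etc/updatedb.conf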

 

What this does is seed the kernel caches with all of the directory and file entries in the system.

With the extra RAM, I hardly ever have to spin up disks unless I run some massive rsync or copy that flushes the cache.

 

In the future I'll release a tool ftwd which will scan the /mnt tree to try and keep the directory entries in memory as much as possible.

 

The key point here is you need enough memory.

 

Not everyone needs 8GB so I would not recommend going that high.

 

If you can afford 4GB, then drop it in and you will see fewer spin-ups, as long as you do the initial scan at boot up and once a day.

 

http://packages.slackware.it/package.php?q=12.0/slocate-3.1-i486-1

 

Here is my script to install and initialize slocate

 

#!/bin/sh
# Install the slocate package if it is not already installed
PKGDIR=/boot/custom/usr/share/packages
PACKAGE=slocate-3.1-i486-1
if [ ! -f /var/log/packages/$PACKAGE ]
   then installpkg ${PKGDIR}/$PACKAGE.tgz
fi

# Remove the package's docs and man pages (not needed on the server)
rm -rf /usr/doc/slocate-3.1
rm -rf /usr/man/man1/updatedb.1.gz /usr/man/man1/slocate.1.gz

# Queue the initial scan via batch (runs when system load allows), 90s from now
batch <<-EOF
sleep 90
exec /usr/bin/updatedb -c /etc/updatedb.conf
EOF

 

You can add the following line to this script if you want:

sysctl vm.vfs_cache_pressure=0
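
As a rough sketch, the sysctl tweak and the installer script could both be wired into unRAID's startup, either inside the script above or directly in the go file (assuming the script is saved as /boot/custom/install_slocate.sh; that path is just an example):

# Hypothetical lines appended to /boot/config/go:
sysctl vm.vfs_cache_pressure=0
/boot/custom/install_slocate.sh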

 

 

- At vfs_cache_pressure=0, we don't shrink the dcache and icache at all.

- At vfs_cache_pressure=100, there is no change in behaviour.

- At vfs_cache_pressure > 100, we reclaim dentries and inodes harder.


Thanks for the replies.

 

I may try doing what you did, Weebo... but that approach still causes all my disks to be spun up once a day. Ideally, all the disks would be scanned at startup and, if anything were altered or added, the cache would be updated on the fly. That way the script wouldn't need to run every night. I'd really like something where, if I don't write to or read from a certain drive, it stays spun down. That way some of my drives might stay spun down for 2-3 weeks at a time.

 

Could your slocate setup also have problems with 48GB Blu-ray ISOs? While transferring a 4-6GB movie won't use up the 8GB of RAM, maybe the larger files I have on my server would.

 

I currently have 2GB of RAM in my server. I have another 2GB in sticks that I can add for 4GB total. But I'm not really sure running slocate once a day is something I want to do. I'd really like all spun-down disks to stay spun down until I actually need to read or write data on them.


I'm a bit confused how you are using slocate.

 

It is a package that scans the file hierarchy and creates a database you can then query for the "location" of a file.

Is this somehow tied into Samba, in such a way that when it does a directory listing it uses the slocate database instead of the file system directory structure? As far as I've been able to tell, it does not work that way.

 

Yes, the slocate process will traverse the file system reading the directory information into memory.  If nothing else uses the memory, it is there for the next time you do a directory listing.

 

On the other hand, a simple

ls -R /mnt/user/Movies 1>/dev/null 2>&1

 

will do exactly the same thing. If the directory inode data is in memory, the disk does not spin up. If it is not, it does spin up.

 

On my system, I use a script I wrote that pings my media players.  If one of them is online, it basically does a loop like this, keeping the directory listing in memory.

pseudo-code follows:

while media-players-detected-online and unraid-system-status-online
do
    ls -R /mnt/user/Movies
    sleep 30
done
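
If anyone wants to try that, here is a rough, runnable rendering of the pseudo-code (not Joe L.'s actual script; the player address and share path are placeholders, and ping options can vary slightly between versions):

#!/bin/sh
PLAYER=192.168.1.50        # placeholder media player address
SHARE=/mnt/user/Movies     # share whose directory entries should stay cached
# While the player answers a single ping, keep re-listing the share so its
# directory entries stay recently used in the cache.
while ping -c 1 -W 2 "$PLAYER" >/dev/null 2>&1; do
    ls -R "$SHARE" >/dev/null 2>&1
    sleep 30
done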

 

How are you using the slocate command, other than for building the list of files on your system?

Do you, or your media player, type "slocate xyzzy.avi" to locate a movie? (As far as I know, that is how the database of file locations you create is used.)

 

Joe L.

If no movie is being played, the disks go to sleep.  If a movie uses up the cache, then one or more disks will spin up to satisfy the ls command.


I use locate/slocate in three ways.

 

1. To seed the cache with all files in all directories upon boot up (assuming vfs_cache_pressure=0).

2. When I need to search for a file, I type locate <filename>. It will access the condensed database.
    If we had web hooks, I would write a CGI for it and supply a link to the file.

3. I rescan with slocate once a day. If I have not copied any huge files, the drives do not spin up because the directory entries are in memory.

 

Therefore I would suggest trying slocate out at startup and seeing how long it works before your drives start spinning up.

 

One of these days I'll finish the file tree walk daemon, which will accomplish the same thing.

It will even let you run an external job if it finds that directories and/or files have changed.

It's a good way to keep disks in sync automagically.

 

On the other hand, a simple

ls -R /mnt/user/Movies 1>/dev/null 2>&1

 

As simple as this is, it does not satisfy my needs, and it's a waste of resources because you throw the output away.

I have an FTP server directory with hundreds of downloads, an image directory with years of pictures, over 60,000 MP3s, and all my movies....

 

locate allows me to scan (grep) for files and find their locations very rapidly (mere seconds).

If the directory entries are in RAM, re-running the updatedb command takes less than 5 seconds without spinning up the drives.
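
For example (the search terms are just placeholders; these are standard locate invocations):

locate xyzzy.avi               # find a file by name fragment
locate -i bladerunner          # case-insensitive search
locate .iso | grep -i blu-ray  # pipe through grep for finer filtering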

 

On my system, I use a script I wrote that pings my media players.  If one of them is online, it basically does a loop like this, keeping the directory listing in memory.

 

If you had more memory, the forced drive activation could be avoided.

The best way to keep disks online is to use the hdparm command, like a forced spin-up...

 

hdparm -S242 /dev/sd?
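
For reference, per the hdparm man page, -S values from 241 to 251 mean (value minus 240) times 30 minutes, so -S242 sets roughly a one-hour standby timeout. A loop form, in case the glob needs adjusting for your controller:

# Set a ~1 hour spin-down timeout on each drive matched by the glob
for d in /dev/sd?; do
    hdparm -S242 "$d"
done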

or even do smartctl queries...

 

I think the real answer is the file tree walk tool. I have it done, I just have to package it up as an installable Slackware package.

 

(I got sidetracked with rtorrent, which has been working very well; it downloaded 30G in the last week without a hitch.)


I think the real answer is the file tree walk tool. I have it done, I just have to package it up as an installable slackware package.

 

WeeboTech, any chance of getting this package from you soon?

 

I made some progress today. There have been things baffling me for a long time.

Today I had to look at the raw ftw routines in the libraries.

I realized that the lstat call is failing because the size of the files is > 2GB.

This is also causing a problem with slocate.

 

I rewrote part of the daemon to use ftw64 and it looks good so far.

I ran the ftwd with an interval of 10 seconds on the /mnt/user tree.

I copied 2 DVDs from one disk to the other and the ftwd kept the directory entries in memory, i.e. my other drive did not spin up.

I think I'm almost there.

 

 


superb  ;D

 

Update: A question for you, WeeboTech. Could your daemon cache the ls output to disk and replay it on first run? I am thinking that caching the filesystem to RAM is great, but it takes ages on first boot-up, and that's a lot of disk thrashing to produce only a small data set which hardly changes. I don't know how these things work, so this might be a complete nonsense request.


Could your daemon cache the ls output to disk and replay it on first run?

 

I don't see the benefit of this. Let me explain.

By setting the kernel parameter

 

sysctl vm.vfs_cache_pressure=0

 

you tell the kernel to keep as much directory cache information in RAM as possible (in "kernel RAM").

 

What my daemon does is scan the file system repeatedly doing a stat64() call on each file.

This in turn puts the information about the files into the dentry "kernel RAM".

 

The program itself does not store or cache any file information.

There is nothing I could load to do so.

It's the act of statting each file that keeps that directory entry in RAM, accessed and current.

By keeping it current and accessed, the kernel chooses not to free it and use it for something else.

 

So no matter what, you are going to have to scan the filesystem upon boot up anyway.
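
For anyone curious, a very rough shell approximation of that idea (this is not the ftwd daemon itself, just a sketch that re-stats every file at a fixed interval so the cached entries stay hot):

#!/bin/sh
TREE=${1:-/mnt/user}   # tree to keep warm (defaults to /mnt/user)
INTERVAL=${2:-10}      # seconds between scans
# find -ls forces an lstat() on every entry; the output itself is discarded.
while :; do
    find "$TREE" -ls >/dev/null 2>&1
    sleep "$INTERVAL"
done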


OK, I see your point.

 

I am extremely keen to try your daemon, as even with 4GB of RAM, ls -R running every minute, and vm.vfs_cache_pressure=0, I still lose the inode cache.

 

I set the daemon to cycle every 10 seconds.

 

root@Atlas:/mnt/disk2/Videos/Compressed# ps -ef | grep ftwd

root    13132    1  5 Aug15 ?        01:16:48 ./ftwd -i 10 -d /mnt/user

 

 


I had an idea on how to log, and subsequently measure, the efficiency of this tool.

 

If we parse /proc/slabinfo for certain information and then echo the numerical diffs to syslog (not echoing if there is no diff to save syslog space) we can keep a quantitative eye on how well inodes are being cached.
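
A minimal sketch of that idea, assuming it is run from cron every minute or so (the cache names and file locations are just examples, and logger is used for the syslog part):

#!/bin/sh
CACHES="dentry reiser_inode_cache fuse_inode fuse_request radix_tree_node"
STATE=/var/tmp/slabwatch.prev   # previous "name active_objs" pairs (example path)
NEW=/var/tmp/slabwatch.new

touch "$STATE"
: > "$NEW"
for name in $CACHES; do
    # column 2 of /proc/slabinfo is <active_objs> for each cache line
    cur=$(awk -v n="$name" '$1 == n {print $2}' /proc/slabinfo)
    prev=$(awk -v n="$name" '$1 == n {print $2}' "$STATE")
    if [ -n "$cur" ] && [ "$cur" != "$prev" ]; then
        logger -t slabwatch "$name active_objs ${prev:-?} -> $cur"
    fi
    echo "$name $cur" >> "$NEW"
done
mv "$NEW" "$STATE"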

 

To prove the theory, I first cached a lot of my data using the ls -R method and dumped slabinfo > 1slabinfo_before_full_cache.txt.

Then I cached all my data and dumped slabinfo > 2slabinfo_after_full_cache.txt.

And lastly, after yet another full cache, I dumped slabinfo > 3slabinfo_after_another_full_cache.txt.

 

You can see big changes in the:

 

fuse_request

fuse_inode

reiser_inode_cache

radix_tree_node

dentry

 

between the first dump and the second and then hardly any changes between the second and the third dump.

 

I believe analysis of this data over time will highlight the areas of the kernel we need to refine to best suit our needs and it should also show any catastrophic failures in the cached data which would cause the daemon to spin up all disks.

 

What do you think?

 


I had an idea on how to log, and subsequently measure, the efficiency of this tool. [...] What do you think?

 

Actually, you have access to the number of blocks read, and the number of blocks written to the disks... and you can do it without spinning them up.

 

The statistics are available in /proc/diskstats 

 

Therefore, a process that looks at the stats every minute, compares the blocks written/read with the prior sample, and records the most recent time a read/write value changed on a disk lets you determine the most recent time a given disk was accessed. When a disk has not been accessed within the desired interval, you can spin it down.
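
A minimal sketch of what such a process might look like (this is not an existing unRAID tool; field 3 of /proc/diskstats is the device name, fields 6 and 10 are sectors read and written, and the file paths are just examples):

#!/bin/sh
STATE=/var/tmp/diskstats.prev     # previous sample
LOG=/var/tmp/disk_activity.log    # per-disk last-activity log

touch "$STATE"
awk '$3 ~ /^sd[a-z]+$/ {print $3, $6, $10}' /proc/diskstats |
while read dev rd wr; do
    # Log only when the read/write counters changed since the last sample
    if ! grep -q "^$dev $rd $wr$" "$STATE"; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') $dev was accessed" >> "$LOG"
    fi
done
awk '$3 ~ /^sd[a-z]+$/ {print $3, $6, $10}' /proc/diskstats > "$STATE"

Run once a minute from cron, the newest timestamp per disk gives its last access time, which a spin-down policy could then act on.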

 

Joe L.


So essentially we are saying we can write our own spin-down and disk usage monitoring daemon. I see a lot of advantages with this.

 

We would also need to monitor memory usage, both to measure the efficiency of the ftw daemon and to analyse the memory pruning algorithm. The reason I say this is that at this point I am pretty certain it is not going to work 100%. For instance, there is no way I fill up my RAM with Samba-type information in one minute, yet if I write a lot of data to a single-disk share I can cause the ls -R to spin up every disk. With our current understanding of the memory model and kernel tunables this should not happen... but it does, and I can reproduce these results 100% of the time.
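
A couple of standard commands that could be used to keep an eye on those numbers alongside the slabinfo diffs (nothing unRAID-specific):

free -m                                         # overall memory and cache use
grep -E '^(MemFree|Cached|Slab)' /proc/meminfo   # free memory, page cache, slab total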


I can cause the ls -R to spin up every disk.

 

Shorten the interval little by little.

This is what I am testing.

So far, if I access the disks via /mnt/user and have the ftwd scanning the same tree, the disks stay spun down.

I've switched to pushing files to /mnt/disk3, and since then disk1 has been spinning up.

So I'm still in the process of testing things out. Right now I'm at 10 seconds. I think I may drop it down to 5 seconds to see if that changes things.

 

In the meantime I had to download some torrents so I'll have to wait for the next test.

