guitarlp Posted June 25, 2008

Initially I created a "Movies" share with a split level of 2. In the "Movies" share I had my movies sorted by type (Blu-ray, DVD, DVD-Compressed, Videos, etc.). But I'm starting to think that wasn't such a great idea.

I'm now up to 9 data disks on my unRAID server, and the Movies share spans every disk. If I open the Movies share folder on my Windows PC while all my disks are spun down, the disks don't spin up (this is good). But if I open the "DVD" folder inside the Movies share, all 9 of my data disks spin up (not something I want to happen).

While it would be nice if unRAID could keep a cached image of all the files on the disks so that no disks would need to spin up while browsing file structures, I don't think that's currently possible or planned for future development.

So I'm thinking the best solution is to remove my "Movies" share, move DVDs to certain disks I choose, Blu-rays to others, and so on. I'd then create a share such as "DVD" and assign it, say, disk1, disk2, and disk3. That way, when I open the DVD share, only those 3 disks spin up instead of all 9. While I like the organization of having all the movies in one "Movies" share, I'd rather not force all my drives to spin up whenever that share is accessed.

Is that the best way to get around all the disks spinning up in my situation? Has there been any mention of viewing the file structures on the disks without requiring them to spin up?
jimwhite Posted June 25, 2008

> While it would be nice if unRAID could keep a cached image of all the files on the disks so that no disks would need to spin up while browsing file structures

Actually, it does... IF you have enough RAM. WeeboTech posted about this somewhere; you might want to search his posts.
WeeboTech Posted June 25, 2008

I'll add... I have 8GB of RAM with PAE enabled in the kernel. In addition I have tuned the kernel with:

sysctl vm.vfs_cache_pressure=0

I also installed the slocate package. At boot, and once every night, I run:

/usr/bin/updatedb -c /etc/updatedb.conf

This scans all the disks and creates the locate database. In doing so, it seeds the kernel caches with all of the directory and file entries in the system. With the extra RAM, the drives hardly ever have to spin up unless I run some massive rsync or copy that flushes the cache.

In the future I'll release a tool, ftwd, which will scan the /mnt tree to try to keep the directory entries in memory as much as possible.

The key point here is that you need enough memory. Not everyone needs 8GB, so I would not recommend going that high. If you can afford 4GB, drop it in and you will see fewer spin-ups, as long as you do the initial scan at boot and once a day.

http://packages.slackware.it/package.php?q=12.0/slocate-3.1-i486-1

Here is my script to install and initialize slocate:

```shell
#!/bin/sh
PKGDIR=/boot/custom/usr/share/packages
PACKAGE=slocate-3.1-i486-1

if [ ! -f /var/log/packages/$PACKAGE ]
then
    installpkg ${PKGDIR}/$PACKAGE.tgz
fi

rm -rf /usr/doc/slocate-3.1
rm -rf /usr/man/man1/updatedb.1.gz /usr/man/man1/slocate.1.gz

batch <<-EOF
sleep 90
exec /usr/bin/updatedb -c /etc/updatedb.conf
EOF
```

You can add the following line to this script if you want:

sysctl vm.vfs_cache_pressure=0

From the kernel documentation:
- At vfs_cache_pressure=0, the kernel does not shrink the dcache and icache at all.
- At vfs_cache_pressure=100, there is no change in behaviour.
- At vfs_cache_pressure > 100, the kernel reclaims dentries and inodes harder.
guitarlp Posted June 25, 2008 (Author)

Thanks for the replies. I may try doing what you did, Weebo... but your approach still causes all my disks to spin up once a day. Optimally, all the disks would be scanned at startup, and anything altered or added after that would be updated on the fly. That way the script wouldn't need to run every night. I'd really like something where, if I don't read from or write to a certain drive, it stays spun down. That way some of my drives might stay spun down for 2-3 weeks at a time.

Could your slocate setup also have problems with 48GB Blu-ray ISOs? While transferring a 4-6GB movie won't use up the 8GB of RAM, maybe the larger files I have on my server would.

I currently have 2GB of RAM in my server, and another 2GB of sticks I can add for 4GB total. But I'm not really sure running slocate once a day is something I want to do. I'd really like all spun-down disks to stay spun down until I actually need to read or write data on them.
Joe L. Posted June 25, 2008

I'm a bit confused about how you are using slocate. It is a package that scans the file hierarchy and creates a database you can then query for the "location" of a file. Is this somehow tied into Samba, in such a way that when it does a directory listing it uses the slocate database instead of the file system's directory structure? As far as I've been able to tell, it does not work that way.

Yes, the slocate process will traverse the file system, reading the directory information into memory. If nothing else uses the memory, it is there for the next time you do a directory listing. On the other hand, a simple

ls -R /mnt/user/Movies 1>/dev/null 2>&1

will do exactly the same thing. If the directory inode data is in memory, the disk does not spin up. If it is not, it does spin up.

On my system, I use a script I wrote that pings my media players. If one of them is online, it basically runs a loop like this, keeping the directory listing in memory. Pseudo-code follows:

while media-players-detected-online and unraid-system-status-online
do
    ls -R /mnt/user/Movies
    sleep 30
done

If no movie is being played, the disks go to sleep. If a movie uses up the cache, then one or more disks will spin up to satisfy the ls command.

How are you using the slocate command, other than for building the list of files on your system? Do you, or your media player, type "slocate xyzzy.avi" to locate a movie? (As far as I know, that is how the database of file locations you create is meant to be used.)

Joe L.
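Joe L.'s pseudo-code above can be fleshed out into a runnable shell script. This is a hedged sketch, not his actual script: the addresses in PLAYERS are placeholders (RFC 5737 documentation addresses, never routable), the function name any_player_online is invented for the example, and the online check is a plain one-shot ping.

```shell
#!/bin/sh
# Sketch of a keep-warm loop in the spirit of Joe L.'s pseudo-code.
# PLAYERS holds placeholder media-player addresses; replace with your own.
PLAYERS="192.0.2.10 192.0.2.11"
SHARE="/mnt/user/Movies"

any_player_online() {
    for ip in $PLAYERS; do
        # One ping with a 1-second timeout; success means a player is up.
        ping -c 1 -W 1 "$ip" >/dev/null 2>&1 && return 0
    done
    return 1
}

# While any player answers, re-list the share every 30 seconds so its
# directory entries stay in the kernel cache and the disks stay asleep.
while any_player_online; do
    ls -R "$SHARE" >/dev/null 2>&1
    sleep 30
done
```

The 30-second interval mirrors the pseudo-code; on a box with little free RAM it may need to be shorter, as discussed later in the thread.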
WeeboTech Posted June 25, 2008

I use locate/slocate in three ways:

1. To seed the cache with all files in all directories at boot (assuming vfs_cache_pressure=0).
2. When I need to search for a file, I type locate <filename>, which queries the condensed database. If we had web hooks, I would write a CGI for it and supply a link to the file.
3. I rescan with slocate once a day. If I have not copied any huge files, the drives do not spin up, because the directory entries are in memory.

So I would suggest trying slocate at startup and seeing how long it works before your drives start spinning up. One of these days I'll finish the file tree walk daemon, which will accomplish the same thing. It will even let you run an external job if it finds that directories and/or files have changed. A good way to keep disks in sync automagically.

> On the other hand, a simple ls -R /mnt/user/Movies 1>/dev/null 2>&1

As simple as that is, it does not satisfy my needs, and it's a waste of resources because you throw the output away. I have an FTP server directory with hundreds of downloads, an image directory with years of pictures, over 60,000 MP3s, and all my movies... locate allows me to scan (grep) for files and find their locations very rapidly (mere seconds). If the directory entries are in RAM, re-running the updatedb command takes less than 5 seconds without spinning up the drives.

> On my system, I use a script I wrote that pings my media players. If one of them is online, it basically does a loop like this, keeping the directory listing in memory.

If you had more memory, forced drive activation could be defeated. The best way to keep disks online is to use the hdparm command, like a forced spin-up... hdparm -S242 /dev/sd? ... or even to do smartctl queries.

I think the real answer is the file tree walk tool. I have it done; I just have to package it up as an installable Slackware package. (Side-tracked with rtorrent, which has been working so very well: downloaded 30GB in the last week without a hitch.)
xbit Posted July 29, 2008

> I think the real answer is the file tree walk tool. I have it done, I just have to package it up as an installable Slackware package.

WeeboTech, any chance of getting this package from you soon?
fitbrit Posted July 30, 2008

I'm in the same situation, and I think I will just move my data around so that shares spanning all disks are restricted to specific disks. It makes much more sense that way... NOW. When I set up my shares, I only had 3 disks and I didn't care so much.
NAS Posted July 30, 2008

I've started this process as well. I was finding multiple disks spinning up for the most trivial of things.
WeeboTech Posted August 16, 2008

> > I think the real answer is the file tree walk tool. I have it done, I just have to package it up as an installable Slackware package.
>
> WeeboTech, any chance of getting this package from you soon?

I made some progress today. There have been things baffling me for a long time, and today I had to look at the raw ftw routines in the libraries. I realized that the lstat call was failing because the size of some files is > 2GB. This was also causing a problem with slocate. I rewrote part of the daemon to use ftw64, and it looks good so far.

I ran ftwd with an interval of 10 seconds on the /mnt/user tree, then copied 2 DVDs from one disk to another, and ftwd kept the directory entries in memory; i.e., my other drive did not spin up. I think I'm almost there.
NAS Posted August 16, 2008

Superb.

Update: A question for you, WeeboTech. Could your daemon cache the ls output to disk and replay it on first run? I am thinking that caching the fs to RAM is great, but it takes ages on first boot, and that's a lot of disk thrashing to produce only a small data set which hardly changes. I don't know how these things work, so this might be a complete nonsense request.
WeeboTech Posted August 16, 2008

> Could your daemon cache the ls output to disk and replay it on first run?

I don't see the benefit of this. Let me explain. By setting the kernel parameter

sysctl vm.vfs_cache_pressure=0

you tell the kernel to keep as much directory cache information in RAM as possible (in "kernel RAM"). What my daemon does is scan the file system repeatedly, doing a stat64() call on each file. This in turn puts the information about the files into the dentry cache in "kernel RAM". The program itself does not store or cache any file information, so there is nothing I could load from disk to replay. It's the active act of statting each file that keeps that directory entry in RAM, accessed and current. By keeping it current and accessed, the kernel chooses not to free it and use the memory for something else. So no matter what, you are going to have to scan the filesystem at boot anyway.
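The repeated-stat behaviour WeeboTech describes can be approximated in plain shell while waiting for ftwd. This is a sketch of the idea, not his daemon: find's %s format directive forces a stat() on every entry, which is the same cache-refreshing side effect the daemon relies on. The variable names are invented for the example, and CYCLES defaults to 1 here so it terminates; a daemon would loop forever.

```shell
#!/bin/sh
# Shell approximation of ftwd's effect: walk the tree and stat every entry
# so the kernel keeps its dentry/inode caches warm. Assumes the box has
# vm.vfs_cache_pressure=0 set so the kernel holds on to the entries.
TREE=${TREE:-/mnt/user}
INTERVAL=${INTERVAL:-10}   # seconds between scans, like ftwd -i 10
CYCLES=${CYCLES:-1}        # set to 0 to loop forever, daemon-style

warm_tree() {
    # %s forces a stat() on each file; the output itself is discarded,
    # since refreshing the kernel's cached entries is the whole point.
    find "$1" -printf '%s\n' >/dev/null 2>&1
}

n=0
while [ "$CYCLES" -eq 0 ] || [ "$n" -lt "$CYCLES" ]; do
    warm_tree "$TREE"
    n=$((n + 1))
    if [ "$CYCLES" -eq 0 ] || [ "$n" -lt "$CYCLES" ]; then
        sleep "$INTERVAL"
    fi
done
```

If the entries are already cached, a pass like this touches no disk at all, which is why it can run every few seconds without spinning anything up.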
NAS Posted August 16, 2008

OK, I see your point. I am extremely keen to try your daemon, as even with 4GB of RAM, ls -R running every minute, and vm.vfs_cache_pressure=0, I still lose the inode cache.
WeeboTech Posted August 16, 2008

> OK, I see your point. I am extremely keen to try your daemon, as even with 4GB of RAM, ls -R running every minute, and vm.vfs_cache_pressure=0, I still lose the inode cache.

I set the daemon to cycle every 10 seconds:

root@Atlas:/mnt/disk2/Videos/Compressed# ps -ef | grep ftwd
root 13132 1 5 Aug15 ? 01:16:48 ./ftwd -i 10 -d /mnt/user
NAS Posted August 16, 2008

Now you are just being a tease.
NAS Posted August 17, 2008

I had an idea on how to log, and subsequently measure, the efficiency of this tool. If we parse /proc/slabinfo for certain information and then echo the numerical diffs to syslog (not echoing when there is no diff, to save syslog space), we can keep a quantitative eye on how well inodes are being cached.

To prove the theory, I first cached a lot of my data using the ls -R method and dumped slabinfo > 1slabinfo_before_full_cache.txt. Then I cached all my data and dumped slabinfo > 2slabinfo_after_full_cache.txt. And lastly, after yet another full cache, I dumped slabinfo > 3slabinfo_after_another_full_cache.txt.

You can see big changes in fuse_request, fuse_inode, reiser_inode_cache, radix_tree_node, and dentry between the first dump and the second, and then hardly any change between the second and the third.

I believe analysis of this data over time will highlight the areas of the kernel we need to refine to best suit our needs, and it should also show any catastrophic failures in the cached data which would cause the daemon to spin up all disks. What do you think?
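NAS's diff-and-log idea can be prototyped in a few lines of shell. This is a hedged sketch, not a finished tool: the function names and the "slabwatch" syslog tag are invented, and the slab cache names vary by kernel and filesystem (the list below is simply the set that moved in NAS's dumps).

```shell
#!/bin/sh
# Prototype of a slabinfo-diff logger: read the active-object counts for
# a few interesting slab caches and write a syslog line only on change.
CACHES="dentry reiser_inode_cache fuse_inode fuse_request radix_tree_node"
SLABINFO=${SLABINFO:-/proc/slabinfo}

snapshot() {
    # Emit "name active_objs" for each cache of interest.
    for c in $CACHES; do
        awk -v n="$c" '$1 == n { print $1, $2 }' "$SLABINFO" 2>/dev/null
    done
}

prev=""
poll_once() {
    cur=$(snapshot)
    # Only log when the counts changed, to keep syslog small.
    if [ "$cur" != "$prev" ]; then
        echo "$cur" | tr '\n' ' ' | logger -t slabwatch
        prev="$cur"
    fi
}

# A monitoring daemon would run something like:
#   while :; do poll_once; sleep 60; done
```

Plotting the logged dentry and inode counts over time would show exactly when (and how badly) the cache gets evicted.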
Joe L. Posted August 17, 2008

> I had an idea on how to log, and subsequently measure, the efficiency of this tool. ... What do you think?

Actually, you also have access to the number of blocks read from, and written to, the disks, and you can read those numbers without spinning the disks up. The statistics are available in /proc/diskstats. So a process that looks at the stats every minute, checks whether the blocks written/read changed from the prior sample, and records the most recent time a read/write value changed on a disk, can detect the most recent time each disk was accessed. When a disk has not been accessed within the desired interval, you can spin it down.

Joe L.
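Joe L.'s suggestion can be sketched as a small shell monitor. This is an illustration, not an existing unRAID tool: the state directory, the 30-minute idle threshold, and the function names are arbitrary choices made for the example. In /proc/diskstats, field 3 is the device name, field 4 is reads completed, and field 8 is writes completed.

```shell
#!/bin/sh
# Sketch of an idle-disk monitor: poll /proc/diskstats (reading it does
# not spin drives up), note when each disk's counters last changed, and
# put a disk into standby once it has been idle long enough.
DISKSTATS=${DISKSTATS:-/proc/diskstats}
STATE=${STATE:-/tmp/diskwatch}
IDLE_SECS=${IDLE_SECS:-1800}   # spin down after 30 idle minutes (arbitrary)
mkdir -p "$STATE"

# Print "device reads writes" for whole disks only (sda, sdb, ...).
disk_counters() {
    awk '$3 ~ /^sd[a-z]$/ { print $3, $4, $8 }' "$DISKSTATS" 2>/dev/null
}

poll() {
    now=$(date +%s)
    disk_counters | while read -r dev r w; do
        old=$(cat "$STATE/$dev.counters" 2>/dev/null)
        if [ "$old" != "$r $w" ]; then
            # Counters moved since the last poll: the disk was accessed.
            echo "$r $w" > "$STATE/$dev.counters"
            echo "$now" > "$STATE/$dev.last"
        fi
        last=$(cat "$STATE/$dev.last" 2>/dev/null)
        [ -n "$last" ] || last=$now
        if [ $((now - last)) -ge "$IDLE_SECS" ]; then
            hdparm -y "/dev/$dev" >/dev/null 2>&1   # standby the idle disk
        fi
    done
}

# A daemon would run:  while :; do poll; sleep 60; done
```

Because the loop only reads /proc files and its own state directory, the monitor itself never wakes a sleeping drive.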
NAS Posted August 17, 2008

So essentially we are saying we can write our own spin-down and disk-usage monitoring daemon. I see a lot of advantages with this. We would also need to monitor memory usage, both to measure the efficiency of the ftw daemon and to analyse the memory pruning algorithm.

The reason I say this is that at this point I am pretty certain it is not going to work 100%. For instance, there is no way I can fill up my RAM with Samba-type information in one minute, yet if I write a lot of data to a single-disk share, I can cause the ls -R to spin up every disk. With our current understanding of the memory model and kernel tunables this should not happen... but it does, and I can reproduce these results 100% of the time.
WeeboTech Posted August 17, 2008

> I can cause the ls -R to spin up every disk.

Shorten the interval little by little. This is what I am testing. So far, if I access the disks via /mnt/user and have ftwd scanning the same tree, the disks stay spun down. I've since switched to pushing files to /mnt/disk3, and since then disk1 has been spinning up, so I'm still in the process of testing things out. Right now I'm at 10 seconds; I may drop it down to 5 seconds to see if that changes things. In the meantime I had to download some torrents, so I'll have to wait for the next test.