Posts posted by WeeboTech

  1. We're used to thinking of memory in a capitalistic sense - you need memory, so you get it and now you own it and no one else can use it.  Linux seems more like a hippie sharing economy - you need memory, so you use what you need until someone else needs some of it.

    I love this line; it describes Linux memory management with the buffer cache eloquently and in very common terms!

  2. It is my understanding that the cache memory is used and released as other needs demand, i.e. it uses only free memory. It doesn't just cache directories but files as well, so if you read files on one but not the other, it would skew things.

     

    Could be wrong. *shrug*

     

     

    The cache area is transient.

    It expands up to all useful memory. As other areas of the kernel or applications need memory, cache pages are released.

     

     

    The issue between 32 bit and 64 bit is that 32 bit has a finite amount of low memory. As this gets used up, it's not released as easily as the cache, and it can also get fragmented.  On 32 bit, adding a swap file can help, as can dropping the cache. This may not be needed for 64 bit.
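
    (Purely as an illustration, not anything from the stock scripts: "dropping the cache" is a one-liner. echo 1 drops only the page cache, echo 2 drops dentries/inodes, echo 3 drops both.)

    # Flush dirty data first, then drop the page cache plus dentries/inodes
    sync
    echo 3 > /proc/sys/vm/drop_caches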

    We won't know for a while what the ramifications of long-term high cache_dirs usage will be.

    My guess would be that 64 bit handles memory management of busy dentries better.

  3. With the file count as previously defined, cache_dirs will help prevent unnecessary spin-ups. The FUSE user share does not cache everything; it caches some data, not all. There is a finite table, just like the dentry table.

     

    So where cache_dirs helps is in keeping all of the directory inodes in ram even if a large file is read.

    This helps when the user share has to wade through a higher-level directory. The user share will search all drives for the named directory at that level. Therefore, having those directory inodes available in ram allows the user share to walk through directories that much faster.

     

    Where this becomes ineffective is when there are so many files and so little low memory that it causes memory pressure in other ways.  In my system I have so many music files that cache_dirs is counterproductive. With a small count of files, cache_dirs is a good aid in alleviating unnecessary spin-ups.
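
    (For anyone wondering what cache_dirs actually does under the hood, the gist can be sketched in a few lines of shell. This is only a simplified illustration, not the real script, and the paths are examples.)

    # Simplified idea behind cache_dirs: re-walk the directory trees periodically so
    # the kernel keeps the dentries/inodes warm and the disks can stay spun down.
    while true; do
        find /mnt/disk[1-9]* -noleaf >/dev/null 2>&1
        sleep 10
    done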

  4. Scan user shares: yes

     

    This was an option left in by Joe to satisfy users who still felt it accomplished something for them, but it is difficult to see the rationale for that.  User shares are a file system maintained entirely in memory, so caching them is just making an extra copy of the file system elements in the same RAM.  That seems like a waste of RAM, with no benefit.  The idea behind Cache_dirs is to keep in memory those file system elements that are constantly being requested from disk, so that the disk does not have to keep spinning up.  I would set this to NO and free up some memory.

     

     

    Normally, it should be set to NO.

    However, if you are using NFS, it could make a difference to help alleviate the Stale NFS handle issue.

    As Tom explained to me, it keeps fuse blocks in memory as well as the dentry information.

    So there is a benefit in small use cases where NFS is used to access the user share.

  5. Do you want me to try and preclear again under 6.0 as well? Or do you want to leave it with Weebo? I have one 4TB drive I don't need immediately I can play with.

     

    I can update and build / test under 32 bit, but I do not have the ability to recompile with 64 bit. Tom did it for me last time, but Weebo was running some tests this time. I am not sure whether the Linux he is using would produce an executable compatible with Slackware 64 bit. Maybe I'll try sending it to Tom to compile again.

     

    But I definitely would like you to retest on your drive!

     

    Thanks.

     

     

    What version of Slackware is 6.0 compatible with? I'll see about making a virtual machine.

  6. My core question is: is the unRAID server (virtualized or not) capable of holding 4TB in ram?

     

    You are, of course, kidding !!  :)

    ... unless you've acquired some RAM modules from the future  8)

    (and a motherboard that supports them)

    LMAO, I just woke up and must not have been thinking or adding correctly!!!

  7. At this point I can only surmise there is data in the buffer cache allowing the program to read at a higher rate than is physically possible.

     

    Unless there's also an addressing error that's causing it to reread the same data (from the local buffer cache), this wouldn't explain it either, since it would have to read at least some new data from the disk.  And even if some addressing error was causing it to always read the same spot on the disk (thus the data would be in the disk's buffer),  it wouldn't be this fast, since the indicated speed is appreciably faster than the SATA-III interface !

     

     

    Years ago when we used to run bonnie tests on our web servers, we had to account for the buffer cache effect. We would have to select a test size that was twice as much as RAM or it would skew the results.  The cache has a real effect, and it could be the reason for the extremely fast results.

  8. O_DIRECT (since Linux 2.4.10)
                  Try to minimize cache effects of the I/O to and from this
                  file.  In general this will degrade performance, but it is
                  useful in special situations, such as when applications do
                  their own caching.  File I/O is done directly to/from user-
                  space buffers.  The O_DIRECT flag on its own makes an effort
                  to transfer data synchronously, but does not give the
                  guarantees of the O_SYNC flag that data and necessary metadata
                  are transferred.  To guarantee synchronous I/O, O_SYNC must be
                  used in addition to O_DIRECT.  See NOTES below for further
                  discussion.
    
    
                  A semantically similar (but deprecated) interface for block
                  devices is described in raw(8).
    

  9. My initial code review reveals that there is nothing out of the ordinary other than an open, read, verify buffer.

    At this point I can only surmise there is data in the buffer cache allowing the program to read at a higher rate than is physically possible.

     

    bkastner, how much ram is in your system?

     

    16GB.

     

     

    Is that 16GB just for the unRAID machine, or is the unRAID machine virtualized somehow?

    Do a free and post it for us, please.

     

    My core question is: is the unRAID server (virtualized or not) capable of holding 4TB in ram?

    If so, that might explain the speed. In that case, there should probably be a directive to drop the buffer cache or open the device in direct mode.
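
    (For reference, and just as an example command rather than anything in the current script: a direct-mode read bypasses the buffer cache entirely, so it shows the real platter speed. dd's iflag=direct uses O_DIRECT under the hood; /dev/sdX is a placeholder.)

    # Read 4GB straight off the disk with O_DIRECT so the buffer cache cannot inflate the numbers
    dd if=/dev/sdX of=/dev/null bs=1M count=4096 iflag=direct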

     

     

     

     

  10. My initial code review reveals that there is nothing out of the ordinary other than an open, read, verify buffer.

    At this point I can only surmise there is data in the buffer cache allowing the program to read at a higher rate than is physically possible.

     

    bkastner, how much ram is in your system?

     

  11. Something doesn't seem right about this.

     

    == Last Cycle's Pre Read Time  : 11:41:19 (95 MB/s)
    == Last Cycle's Zeroing time   : 10:14:10 (108 MB/s)
    == Last Cycle's Post Read Time : 1:23:07 (802 MB/s)
    

     

    Given today's technology, how does an application read a 5400 RPM 4TB drive at 800MB/s?

    Maybe something is cached somewhere.

    In all my tests the fastest I could get out of my drives was 195MB/s and approx 300-400MB/s on an SSD.

     

    While it seems feasible to read through a 4TB drive in a couple of hours, the speed difference is so drastic that I question it.

  12. Much as I am loath to admit this, the combination of "vm.vfs_cache_pressure=0", a RAM-based OS and no page file is probably a recipe for disaster, just as you have seen.

     

    unRAID in general would probably benefit from some sort of sanity-checking RAM monitor and some more thought about OOM handling, given all the addons that are about now and how cheap 4TB disks are.

     

    I also wonder about buying an SSD solely for a page file. Conventional wisdom is that SSD and page file is not a good idea, but unRAID isn't a conventional setup. A 30GB+ SSD-based page file would cost a few tens of bucks. It is a bit of a brute-force approach, but it's a cheap one that will only get cheaper.

     

    If root were on tmpfs rather than rootfs, it would be worth it.

    It's not currently worth it unless you do some other migration off rootfs and use TMPFS more.

     

    I.e. if you are going to move /var to a tmpfs by some fancy moving.

    While the swap file will help with really memory-hungry apps, you'll see that for normal unRAID usage, it will hardly come into play.  It will not serve you as you think... yet.

     

    I voted to have /var on tmpfs, Tom only wanted /var/log on tmpfs.

    If I had my way, all of the root would be on tmpfs so unused portions could be swapped out.

     

    >> Conventional wisdom is SSD and page file is not a good idea,

    and I've also read to the contrary that a page/swap file on SSD makes good sense since it's the fastest piece of static space.
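
    (If someone wants to experiment with that, here is roughly what it takes; the mount point and size are examples only.)

    # Create and enable a 30GB swap file on an SSD mounted at /mnt/cache
    dd if=/dev/zero of=/mnt/cache/swapfile bs=1M count=30720
    chmod 600 /mnt/cache/swapfile
    mkswap /mnt/cache/swapfile
    swapon /mnt/cache/swapfile

    # Verify
    swapon -s
    free -l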

  13. So as I was doing another test last night, I set the vm.vfs_cache_pressure=0 and bam. I crashed. That was no fun. I had to do a hard power cycle.

     

    While I did have something monitoring low memory for a while, at some point it was exhausted.

    I have 4GB and I was only scanning the /mnt/disk partitions on 3 disks.

     

    At the default of cache_pressure=10 I had no issues.

    So this may be the magic number to adjust per array usage to prevent crashes.

     

    I had another thought that if cache_dirs monitored low memory with free -l | grep Low it could make an emergency judgement call and either pause or drop the page cache (not the direntry/inode cache). This would free up ram rapidly and possibly defragment low memory.

     

    Another choice is a separate program that monitors low memory and does the emergency cache drop.  But at what level?
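
    (Here is a rough sketch of that idea; the threshold and interval are made-up numbers, not something I've tested. On a 32 bit kernel /proc/meminfo exposes LowFree, which is what this polls. It drops only the page cache, leaving the dentry/inode cache alone.)

    #!/bin/bash
    # Emergency low-memory watchdog (sketch only)
    THRESHOLD_KB=50000   # made-up threshold, tune per system
    while true; do
        lowfree=$(awk '/^LowFree:/ {print $2}' /proc/meminfo)
        if [ "$lowfree" -lt "$THRESHOLD_KB" ]; then
            logger "LowFree down to ${lowfree}kB, dropping page cache"
            sync
            echo 1 > /proc/sys/vm/drop_caches
        fi
        sleep 30
    done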

  14. Well, the point of these tests is to show how much the raw stat structures and names take up.

    The dentry structure is larger than a stat structure, and caching the name/stat another way is feasible, but at a cost.

     

    A variant sqlite table of the name, inode, mtime, size and space for an md5 proves to take longer and is larger.

    The price of sql access.

    sqlite table
    files processed: 2208100, sql inserts: 2208088, sql duplicates: 0, sql errors: 12
    files processed: 2208200, sql inserts: 2208188, sql duplicates: 0, sql errors: 12
    
    selects 0, deletes 0, duplicates 0, inserts 2208248, errors 12 
    
    real    138m42.533s
    user    8m3.230s
    sys     6m49.720s
    
    root@unRAID:/mnt/disk1/home/rcotrone/src.slacky/ftwcache# ls -l --si /mnt/cache/ftwstatcache_sqlite.sqlite3 
    -rw-r--r-- 1 root root 696M 2014-02-24 06:29 /mnt/cache/ftwstatcache_sqlite.sqlite3
    

     

    I'm running a second pass. The second pass will not insert any new data, just find duplicates. It's more of a timing test.

    I should probably install mlocate again and do that as a size / time test.

     

    The issue isn't Xen or VMware; it's a 64 bit flat memory model with no low memory bounds.

  15. As an academic test, I've scanned [find] (ftw64'ed) down my whole array: 3 data disks with tons of files.

    I did various tests. The first so far is caching all of the stat() blocks from a find (ftw64) down the whole /mnt tree.

     

    With my various tests I found that storing this ever-growing gdbm cache on a disk proved to slow it down immensely.

     

    The first test is an ftw, storing the inode as the key and the filename as the data.

    I only did this test because I had the code already. I use it in another scenario to find the seed inode file on a huge 4TB disk of data on an FTP server where files are hard linked. This is mostly for rsyncing data to a remote FTP server while preserving the links and thus space.

    I'm only presenting the history to explain the variance from the final intended program. It was a quick way to test something and get a benchmark for time and space.

     

    This was for my /mnt/disk1 disk only.

    fetched 159396 records, deleted 0 records, stored 499364 records
    
    files processed: 543300, stores: 499321, duplicates: 0, errors: 0
    
    real    19m23.062s
    user    0m7.680s
    sys     0m42.020s
    
    root@unRAID:/mnt/disk1/home/rcotrone/src.slacky/ftwcache# ls -l --si /tmp/ftwinocache.gdbm 
    -rw-rw-r-- 1 root root 89M 2014-02-23 09:03 /tmp/ftwinocache.gdbm
    

     

    On an ARRAY disk it went from 19m to 75m.

    There's a big benefit from doing this on the ram drive even if it takes up a couple hundred MB.

     

    I did a few other tests.

    A sweep following the first sweep usually was much faster, so the caching of dentries helps a great deal.

    I did not receive any OOM errors on a 4 disk wide array.

     

    I re-wrote the structure that is stored.

    This time the filename is the key, the stat() block is the data.

    This is something I've been intending to do for a long time.

    I.e. my own locate program using a gdbm or sqlite file to catalog the array, then insert the md5 in the record somewhere.

    files processed: 2373800, stores: 2208671, duplicates: 0, errors: 0
    files processed: 2373900, stores: 2208770, duplicates: 0, errors: 0
    fetched 0 records, deleted 0 records, stored 2208833 records
    
    real    118m42.174s
    user    0m39.050s
    sys     3m51.320s
    
    root@unRAID:/mnt/disk1/home/rcotrone/src.slacky/ftwcache# ls -l --si /mnt/cache/.ftwstatcache.gdbm 
    -rw-rw-r-- 1 root root 347M 2014-02-23 20:37 /mnt/cache/.ftwstatcache.gdbm
    

     

     

    I'm sure it would take longer if the ftwstatcache.gdbm were on an array drive. I did not test that.

    It would be faster on a cache drive and ultimately faster on an SSD cache drive.

     

    What this proves is that given 2 million files and directories, the stat structures and filenames take up about 350MB.

    A little more than my earlier math, but close.  I did not account for the size of filenames in the prior math.

    root@unRAID:/mnt/disk1/home/rcotrone/src.slacky/ftwcache# free -l
                 total       used       free     shared    buffers     cached
    Mem:       4116784    3716964     399820          0      54388    3136936
    Low:        869096     597000     272096
    High:      3247688    3119964     127724
    -/+ buffers/cache:     525640    3591144

     

    free -l shows that I still have plenty of low memory, which leads me to believe array width and md driver buffering have a big part to play in this too.

     

    I haven't explored how fast a lookup will be with gdbm.  For a locate type database it's not going to matter that much.

    But as a feasible stat cache in the user share filesystem, it could be a problem wading through 2 million records.

     

    There are other possibilities here, in that cache_dirs could be re-engineered to only cache very specific parts of the array, keeping those entries in ram on purpose via some configurable method.  I'm sure there's a way to do it now, but I don't know if you can give it an array of directories and/or decide the depth.
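
    (A sketch of what that could look like; the share names and depth are made-up examples.)

    # Warm only selected directories, limited to a fixed depth
    DIRS="/mnt/user/Movies /mnt/user/Music"
    DEPTH=3
    for d in $DIRS; do
        find "$d" -maxdepth "$DEPTH" -noleaf >/dev/null 2>&1
    done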

     

    Using sqlite is going to take more time. It's more useful, but at the cost of cpu cycles and time.  Writing to a disk with sqlite and journaling is slow.  Using the ramdisk makes it much much faster. 

     

    It's probably not much slower than the old updatedb I used to do for a locate database.  While I did not use cache_dirs, I used locate, which functioned the same way: scan once, then cache everything for a quick lookup.  This is exactly what I intend to duplicate one day and store the md5sums.  At least with sqlite, pulling data out of the DB will be quicker and more useful.

     

    There is a final choice of MySQL.  Earlier tests with my inode cache proved that MySQL with 'flat files', i.e. no DB process, functions quite fast: faster than sqlite and a little faster than gdbm.  However, that now gets into a heavier load with library dependencies.

     

    Ideally this is the best choice. If Limetech were to compile the MySQL flat-file libraries as shared libraries and link the PHP application to them, we would have some pretty rich local flat-file DB access without the need for a DB process.

     

    sqlite is simpler. You can statically compile binaries, a shell tool and even a loadable bash module to access the database with sql.

    With GDBM I have a bash loadable module that allows access to the gdbm file; however, it's not SQL access.
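
    (To illustrate, with a table layout that is just my reading of the earlier description rather than anything from the actual tests; the paths are examples.)

    # Build a small file catalog with the sqlite3 shell tool
    DB=/mnt/cache/filecat.sqlite3
    sqlite3 "$DB" "CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, inode INTEGER, mtime INTEGER, size INTEGER, md5 TEXT);"

    # Insert one entry from stat(), leaving the md5 to be filled in later
    f=/mnt/disk1/some/file
    sqlite3 "$DB" "INSERT OR REPLACE INTO files VALUES ('$f', $(stat -c %i "$f"), $(stat -c %Y "$f"), $(stat -c %s "$f"), NULL);"

    # Plain SQL to pull it back out
    sqlite3 "$DB" "SELECT name, size FROM files WHERE md5 IS NULL LIMIT 10;"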

     

    I know I'm deviating off the core topic here, but I wanted to present my findings for the following reasons.

     

    1.  Presents an idea of memory requirements to cache the stat data for an array.

    2.  Possible ideas of how the data can be cached, with a few mechanisms.

    3.  Provide food for thought.  Yes the usershare filesystem could possibly cache all of the stat blocks it visits.

        A. Should it?

        B. Think of the memory requirements.

        C. Think of the time it takes to look the data up.

        D. Think of the man hours it's going to take to program it and then test it.

        E. Is it really worth all that to save spinning up a disk?

     


  16. Regardless of what mechanism caches the inode entries, a user should be able to estimate the load each included directory brings to the process in terms of inode count and ultimately RAM.

     

    I think we would have to script something, as so far all the off-the-shelf counters don't understand the concept of inode count to only xx folders deep, and memory usage needs some thought too.

     

    Here's an interesting read.

    http://www.makelinux.net/books/lkd2/ch12lev1sec7

    http://web.cs.wpi.edu/~claypool/courses/4513-B02/samples/linux-fs.txt

    http://www.ibm.com/developerworks/library/l-virtual-filesystem-switch/

     

    Now we need to find out how many of these structures are kept in low memory, and the size of each structure, to estimate the maximum number of files whose structures can be cached.
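
    (The slab allocator already reports most of this. Treat the field positions as a rough sketch since the slabinfo layout varies a little by kernel.)

    # Object size and counts for the dentry and inode slab caches
    grep -E 'dentry|inode_cache' /proc/slabinfo | awk '{printf "%-24s objs=%-10s objsize=%s bytes\n", $1, $2, $4}'

    # Or interactively, sorted by cache size
    slabtop -s c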

     

    At that point it gets weighed against free low memory, which can become fragmented at some point in time, thus causing issues.

     

    It's not just a hard limit of how many files/directories per disk.

    It's also a question of what other activity is causing memory pressure and/or low memory fragmentation.

     

    Which is why dropping all caches before and after the mover generally alleviates a lot of issues.

  17. Thinking out loud...

     

    Fundamentally the problem is a simple one. unRAID can and does create situations where one folder can be 24 disks wide. A process shouldn't have to spin up the whole or large parts of the array to get a dir listing; that is crazy. Equally, a user shouldn't have to control data placement on specific disks, as this goes against the ease of use of unRAID.

     

    Even if we could come up with a way to keep the cache on a cache drive, that would give people options. I would happily pay $50 for an SSD for the purpose, as the real-world perceived improvement for me with cache_dirs working is nothing short of amazing.

     

    I don't think the kernel allows this kind of split in the page file, would it? I.e. a page file dedicated to inodes/dentries?

     

     

    This all may be pointless when 64 bit is mainstream and we remove the dependency on low memory.

     

     

    I doubt the kernel would page (swap) out inodes.  It's counterproductive; the kernel can just read the disk.

    I remember years ago there was the ability to put the superblock on a different drive with ext2 or 3.

    But since directories are actually files and they succumb to being buried in the file tree, I'm not sure it's worth it.

     

    Another choice would be to have the user share filesystem cache directory data on the cache drive or on tmpfs (Which CAN be swapped out).

    Instead of going to the filesystem, the data is available in ram (tmpfs) or on a known spinning disk (cache).

     

    I do something like this for another project which has nothing to do with caching a directory for this purpose.

    For my purpose it's to monitor a directory for changes and run an event on them (remote or local).

     

    The issue then becomes synchronization.

    Perhaps the usershare filesystem only caches the stat blocks once you visit them. I can say this, though: a find down a whole tree accessing the disk mount point takes a long time as it is.

     

    What does this buy you? Not spinning a drive up?

     

    Now space/size

    I have

    
       654959 /mnt/disk1/filelist.txt
       162707 /mnt/disk2/filelist.txt
       275314 /mnt/disk3/filelist.txt
      1092980 total
    

     

    on 3 data disks.

    
    root@unRAID:/etc/cron.d# ls -l --si /mnt/disk1/filelist.txt /mnt/disk2/filelist.txt /mnt/disk3/filelist.txt
    -rw-rw-rw- 1 root root 54M 2014-02-11 08:50 /mnt/disk1/filelist.txt
    -rw-rw-rw- 1 root root 21M 2014-02-11 09:42 /mnt/disk2/filelist.txt
    -rw-rw-rw- 1 root root 34M 2014-02-19 21:01 /mnt/disk3/filelist.txt
    

    with filenames at this size.

    Consider that the size of a stat structure is 144 bytes.

    That gives us 157,389,120 bytes for the stat blocks.

     

    So we need 300MB of memory to store all that stat information.
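
    (Spelling the arithmetic out, using the filelist sizes above:)

    # 1,092,980 files x 144 bytes per stat structure
    echo $(( 1092980 * 144 ))    # 157389120 bytes, ~157MB
    # plus ~109MB of filename text (54M + 21M + 34M from the filelists)
    # => roughly 270MB, call it 300MB once per-entry overhead is added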

     

    I'll have to write a program to catalog all these files into a .gdbm file, sqlite table and maybe a mysql flat file to see how much space it will take.

     

    While it's feasible to do this and have a file catalog simultaneously, you have to think of all the hours involved in building another cache mechanism to prevent a spin-up.

    At that point, is it worth it?

    I'm not sure you can have your cake and eat it too until we have a total flat memory model.

  18. Part of the issue may be the 32/64 bit architecture. Once we are 64 bit, the low memory issue may not be an issue.

     

    Interesting, as whenever anyone states memory issues (and that's why we just want a 64 bit version of unRAID that is LIKE the 32 bit version 5 of unRAID, with no other bells and whistles), a bunch of people jump all over them with "what memory issues?"

     

    But since we can't get a 64 bit counterpart, it's really hard to show a before and after.

     

    I had problems with 32 bit unRAID, cache_dirs and no plugins.

    In fact I turned off cache_dirs and still had issues unless I dropped the cache before and after massive file operations down a tree.

     

    I keep saying it depends on

     

    1. How wide your array is (how many disks).

    2. How much buffering the md driver is set for

    3. How many files you have on the whole array.

    4. What the kernel tunings are set for regarding memory management, pressure and buffering.
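
    (For item 4, the knobs in question are ordinary sysctls. The values below are purely illustrative, not recommendations.)

    # Inspect the current values
    sysctl vm.vfs_cache_pressure vm.dirty_ratio vm.dirty_background_ratio vm.min_free_kbytes

    # Example: hold on to dentries/inodes longer and reserve a bit more free memory
    sysctl -w vm.vfs_cache_pressure=10
    sysctl -w vm.min_free_kbytes=16384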

  19. I am sourcing some new disks to try and free up a server to test 64 bit and cache_dirs. It's a huge bunch of work though, so I can't see it being done shy of 2 months.

     

    I assume the kernel tunable you are referring to is cache pressure. If so, I have never seen 0 do what it is documented to do, but as you say, yet again that could be 32 bit and PAE.

     

    I just find it hard to believe that we can't fix it so that one video stream, one preclear and one cache_dirs on an i5 with 16GB of RAM cause us to see stuff like this:

     

    load average: 5.91, 5.53, 5.50

     

     

    Well, we have the ionice command, which can nice down a preclear.

    http://linux.die.net/man/1/ionice
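
    (Something along these lines runs a preclear in the idle I/O class so array reads win any contention; the script path and device are examples.)

    # Class 3 (idle) only gets the disk when nothing else wants it
    ionice -c3 nice -n 19 /boot/preclear_disk.sh /dev/sdX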

     

    However, you will always choke the array when running a preclear no matter how much memory is available.

    cache_dirs can do it also, depending on how many directories, files, depth and whatever else is going on in the array.

     

    An answer to running parallel pre-clears is to use badblocks to write the 0's with a timeout parameter that will cause a context switch.

    My thought however is to use a laptop or another machine to do the preclear and keep the array available until you are actually ready to use it.

     

    We're talking about a few things here though.

    Preclear will choke your machine, and it's not fair to even discuss that here.

    64 bit will not solve nor assist this issue.

     

    cache_dirs does what it's designed to do: find down the filesystem tree and access every file's inode.

    The memory issue with cache_dirs depends on the size of your array, width and files.

    64 bit "MAY" alleviate the out of memory by not eating up low memory. This is yet to be seen.
