Utilizing SSD as a smart read cache


jpeltoniemi

Recommended Posts

2 minutes ago, trurl said:

As you have probably already discovered, this is Slackware Linux.

Yeah. I'm mostly used to Debian-based distros but luckily Slackware is not nightmarishly different from Debian.

5 minutes ago, trurl said:

As far as I know there isn't really any "3rd party" except open source. ... I think this includes both the parity and user share implementation.

3rd party, as in not in-house code. Hopefully with documentation somewhere. I'd assume (like that's done me any good so far) that parity is Limetech's own implementation, but I'm not so sure about user shares, since software like UnionFS exists. Maybe it didn't fit unRAID's requirements, or maybe Limetech wanted to reinvent the wheel just for fun (I do this all the time). Let's hope someone has more accurate information :)

Link to comment

unRAID 'shfs' was written long before other currently available union-type filesystems.  Originally it just returned symlinks.  Anyway, the current implementation allows for certain unRAID-specific functions, such as supporting different allocation methods (like split level), hard links, and preserving COW flags when possible.

 

'shfs' operates similarly to other union file systems.  The 'top' branch is always '/mnt/cache' and the other branches are, in order, '/mnt/disk1', '/mnt/disk2', ... '/mnt/diskN'.
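As a purely illustrative sketch (not the actual shfs code), that branch order can be pictured like this in Python; the share layout and disk count are assumptions:

import os

# Branch order as described: the cache is always on top, then disk1..diskN.
BRANCHES = ["/mnt/cache", "/mnt/disk1", "/mnt/disk2", "/mnt/disk3"]

def resolve(share_relative_path):
    """Return the first branch that actually holds the given share-relative path.

    This mirrors the union idea: a path under /mnt/user is backed by whichever
    branch, searched top-down, contains the file.
    """
    for branch in BRANCHES:
        candidate = os.path.join(branch, share_relative_path)
        if os.path.exists(candidate):
            return candidate
    return None

# e.g. resolve("Movies/film.mkv") returns "/mnt/cache/Movies/film.mkv" while the
# file is still on the cache, and a /mnt/diskN path after it has been moved.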

 

I don't mind answering specific questions, but most of the info is sprinkled around in docs on the website, wiki, and forum posts.  Sorry about the 'search' capability of the forum, which fairly sucks; we're looking into that.  Probably we should get around to writing a formal doc on how 'shfs' (and other things) work... added to the todo list...

Link to comment
1 hour ago, limetech said:

unRAID 'shfs' was written long before other currently available union-type filesystems.  Originally it just returned symlinks.  Anyway, the current implementation allows for certain unRAID-specific functions, such as supporting different allocation methods (like split level), hard links, and preserving COW flags when possible.

I was just beginning to think that shfs in unRAID maybe isn't the shfs that can be found on Google. Good, now I can stop trying to make sense of how that shfs fits into this equation :D I'll forgive myself for this one, since with Greyhole you actually mount storage pools using CIFS.

1 hour ago, limetech said:

'shfs' operates similarly to other union file systems.  The 'top' branch is always '/mnt/cache' and the other branches are, in order, '/mnt/disk1', '/mnt/disk2', ... '/mnt/diskN'.

Nice! I guess this means I can put whatever I want in the cache and it'll just work, as long as I make sure the mover won't touch those files. There shouldn't be a problem even if I disable the mover and write my own.

1 hour ago, limetech said:

I don't mind answering specific questions, but most of the info is sprinkled around in docs on the website, wiki, and forum posts.  Sorry about the 'search' capability of the forum, which fairly sucks; we're looking into that.  Probably we should get around to writing a formal doc on how 'shfs' (and other things) work... added to the todo list...

TBH the docs could be better. If I may suggest a quick improvement, just a list of essential keywords, config files and scripts would be a big help in getting started with tuning unRAID. That said, I have to commend your involvement with the community. It kind of makes me want to spend my last money on a license right now instead of waiting for payday :D

Link to comment
2 hours ago, limetech said:

Probably we should get around to writing a formal doc on how 'shfs' (and other things) work... added to the todo list...

Just a quick question - does shfs keep static inode allocations until reboot, or is FUSE allowed to reuse inode allocations?

 

Inode reuse is an issue for NFS shares.

Link to comment
2 hours ago, jpeltoniemi said:

I guess this means I can put whatever I want in cache and it'll just work, as long as I make sure that mover won't touch them

 

If you set 'use cache' for the share to 'no' or 'only', the 'mover' will not operate on that share at all.
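For illustration only, a mover-side check of that rule might look like the sketch below; the /boot/config/shares path layout and the shareUseCache key name are assumptions, not confirmed here:

def mover_should_skip(cfg_path):
    """Return True if the mover should leave this share alone.

    Assumes a simple key="value" share config file with a shareUseCache entry
    set to yes/no/only/prefer; both the file layout and the key name are
    assumptions made for this sketch.
    """
    settings = {}
    with open(cfg_path) as f:
        for line in f:
            if "=" in line:
                key, _, value = line.partition("=")
                settings[key.strip()] = value.strip().strip('"')
    # 'no' and 'only' mean the mover has nothing to do for this share.
    return settings.get("shareUseCache", "yes") in ("no", "only")

# e.g. mover_should_skip("/boot/config/shares/myshare.cfg")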

 

2 hours ago, pwm said:

Just a quick question - does shfs keep static inode allocations until reboot, or is FUSE allowed to reuse inode allocations?

 

By default FUSE will free inodes after a delay (looking at the FUSE source code, the minimum is 10 seconds).  That is what the 'fuse_remember' tunable is for on the Settings/NFS page, and it is also the reason we tie that tunable to the NFS settings page.  When NFS is enabled, the default value of 330 corresponds to 5 1/2 minutes (when NFS is not enabled, a setting of 0 is passed to FUSE).  The typical client-side NFS handle cache time-to-live is 5 minutes.  You have to be careful with this setting, however, because if you are doing operations that touch a huge number of files, the FUSE memory footprint can keep growing.  This is related to an issue we're looking into right now:
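Reduced to a sketch, the rule described above (pass the tunable through when NFS is enabled, otherwise 0) looks like this; how unRAID actually assembles its FUSE mount options is an assumption here, but the 'remember' option itself is the standard libfuse low-level one:

def fuse_remember_option(nfs_enabled, fuse_remember=330):
    """Build the libfuse low-level 'remember' option as described above.

    330 seconds (5 1/2 minutes) comfortably covers the typical 5-minute
    client-side NFS handle cache; 0 lets FUSE forget inodes again quickly,
    keeping the memory footprint down when NFS is not in use.
    """
    value = fuse_remember if nfs_enabled else 0
    return f"-o remember={value}"

# fuse_remember_option(True)  -> "-o remember=330"
# fuse_remember_option(False) -> "-o remember=0"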

 

Link to comment
9 hours ago, limetech said:

 

If you set 'use cache' for the share to 'no' or 'only', the 'mover' will not operate on that share at all.

 

 

By default FUSE will free inodes after a delay (looking at the FUSE source code, the minimum is 10 seconds).  That is what the 'fuse_remember' tunable is for on the Settings/NFS page, and it is also the reason we tie that tunable to the NFS settings page.  When NFS is enabled, the default value of 330 corresponds to 5 1/2 minutes (when NFS is not enabled, a setting of 0 is passed to FUSE).  The typical client-side NFS handle cache time-to-live is 5 minutes.  You have to be careful with this setting, however, because if you are doing operations that touch a huge number of files, the FUSE memory footprint can keep growing.  This is related to an issue we're looking into right now:

 

Yes, one reason I asked was that I was specifically thinking about that memory leak thread. I have developed some FUSE applications of my own, but they supply inode values from a database (they work as a "time machine" and can present arbitrary disk snapshots based on backup times). Since my FUSE code always supplies the same inode for each presented file, I haven't seen any issue with leaked memory even if the database contains many hundreds of millions of files.

Link to comment
1 minute ago, limetech said:

 

You're using the low-level interface?

Yes.

 

For simpler things - like presenting individual streams of a BD image - I use the high-level interface.


But the VFS for the backup server solution uses the low-level interface and hands over the database record ID as the inode to FUSE. I haven't found much information about how FUSE itself handles inodes - it has its own inode field in its structures but seems to always duplicate the inode value I supplied.
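To make that mapping concrete without tying it to any particular FUSE binding, it amounts to something like the sketch below; the 'files' table and its columns are hypothetical:

import sqlite3

def stat_with_stable_inode(db, virtual_path):
    """Look up stored attributes for a virtual path and reuse the database
    record ID as st_ino, so every lookup of the same file reports the same
    inode across mounts and reboots.

    The schema is hypothetical; the point is only that st_ino comes from
    persistent storage instead of a counter assigned at mount time.
    """
    row = db.execute(
        "SELECT id, size, mtime FROM files WHERE path = ?", (virtual_path,)
    ).fetchone()
    if row is None:
        return None
    record_id, size, mtime = row
    return {"st_ino": record_id, "st_size": size, "st_mtime": mtime}

# e.g. db = sqlite3.connect("catalog.db"); stat_with_stable_inode(db, "/2018-05-01/etc/fstab")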

Link to comment
Just now, limetech said:

 

Your DB record ID is passed as the st_ino value you mean?  Do you see issues with NFS stale file handles?

Yes, I place the DB record ID into the st_ino field before handing over to the FUSE code.


I have used it too little with NFS shares - most browsing is either done using SMB (with the FUSE VFS running on the storage server), or I run a copy of the FUSE VFS on the client machine and stream the actual file data over a TLS-encrypted tunnel from the storage server.

 

I should really set up a dedicated test system stressing NFS - especially since the VFS also includes all my media files and could present suitable movie or music selections to media players. I already know my older Popcorn Hour and QNAP media players work much better with NFS than SMB for some movie titles.

 

One important difference here compared to shfs in unRAID is that my VFS only allows viewing of archived file data - i.e. read-only access. Writes to the storage server happen by having a backup client scan a directory tree and "check in" changes to the storage pool. But this doesn't involve any FUSE code. The VFS code can just get a notification that more file data has been "committed" to the storage server, so it can check if more and/or changed files should be made visible in the presented VFS.

Link to comment
34 minutes ago, pwm said:

Yes, I place the DB record ID into the st_ino field before handing over to the FUSE code.


I have used it too little with NFS shares - most browsing is either done using SMB (with the FUSE VFS running on the storage server), or I run a copy of the FUSE VFS on the client machine and stream the actual file data over a TLS-encrypted tunnel from the storage server.

 

I should really set up a dedicated test system stressing NFS - especially since the VFS also includes all my media files and could present suitable movie or music selections to media players. I already know my older Popcorn Hour and QNAP media players work much better with NFS than SMB for some movie titles.

 

One important difference here compared to shfs in unRAID is that my VFS only allows viewing of archived file data - i.e. read-only access. Writes to the storage server happen by having a backup client scan a directory tree and "check in" changes to the storage pool. But this doesn't involve any FUSE code. The VFS code can just get a notification that more file data has been "committed" to the storage server, so it can check if more and/or changed files should be made visible in the presented VFS.

 

Sounds like a nice project.  Here's what happens with NFS.  As you know, all I/O is referenced against a file handle, which is an opaque field (though in practice it's easy to see how an OS forms a file handle, and this is a major security hole IMHO and one reason I really hate NFS and would really like to rip it out of unRAID... but I digress...).  What you will find, if you ever have to support NFS, is that older clients, especially older media/DVD players, only support a 32-bit file handle field, even though the NFSv3 spec permits 64 bits.  You must keep this in mind if you want to be compatible with those devices.
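For anyone in the same situation, a quick sanity check along these lines catches values that such 32-bit-only clients cannot represent (purely illustrative):

def fits_in_32_bits(st_ino):
    """True if the inode/file-ID value can be handed to clients that only
    keep 32 bits of it."""
    return 0 < st_ino <= 0xFFFFFFFF

# Values generated far outside the 32-bit range would fail this check and
# need a separate, compact mapping when exporting to old NFSv3 clients.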

 

Edit: that bit of knowledge just saved you days of debugging... you're welcome :D

Link to comment

Yes, it's a fun project. It also makes sure my unRAID machines don't get too bored since they are part of the storage pool - committed files are normally sent to multiple storage pools for redundancy. :)

 

But you did catch me with the NFS handle size. I think the physical files still just about fit in 32-bit numbers, but virtual views are created using inode values way outside the 32-bit range.

 

A lot can be said about NFS security, especially since most equipment runs NFSv3, which is limited to host-based authentication. But it works quite well for read-only access to media files.

Link to comment
  • 4 years later...

Hello folks,

I know this topic is quite old, but I wanted to add something I've set up on my Unraid server.

 

I'm basically remounting my disks and cache pool with relatime. Then I wrote my own mover in Python, which lists all files on my share (I only have one) and sorts them by access date. From that I can decide which files should go to the cache and which should go to the array. In my script I create two lists: an array list and a cache list. I add files to the cache list until it reaches 75% of my cache pool size. After that I invoke the "move" script and move the files to the cache and the array accordingly. I run the mover once a day, which fits nicely with the 24-hour window relatime uses. A rough sketch of the idea follows below.
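Here is a self-contained sketch of that approach; the share name, disk list and cache size are placeholders, and the actual move step is intentionally left out:

import os

SHARE = "myshare"                          # hypothetical share name
BRANCHES = ["/mnt/cache", "/mnt/disk1", "/mnt/disk2"]
CACHE_BUDGET = int(0.75 * 500 * 1024**3)   # 75% of a hypothetical 500 GiB pool

def list_files():
    """Collect (path, size, atime) for every file of the share on all branches."""
    entries = []
    for root in BRANCHES:
        base = os.path.join(root, SHARE)
        for dirpath, _dirnames, filenames in os.walk(base):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                entries.append((path, st.st_size, st.st_atime))
    return entries

def split_by_atime(entries):
    """Most recently accessed files fill the cache list until the 75% budget
    is hit; everything else goes on the array list."""
    cache_list, array_list, used = [], [], 0
    for path, size, _atime in sorted(entries, key=lambda e: e[2], reverse=True):
        if used + size <= CACHE_BUDGET:
            cache_list.append(path)
            used += size
        else:
            array_list.append(path)
    return cache_list, array_list

if __name__ == "__main__":
    cache_list, array_list = split_by_atime(list_files())
    # The real script would now invoke the move step for files that are
    # not yet on the branch they belong to.
    print(f"{len(cache_list)} files belong on the cache, {len(array_list)} on the array")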

 

I don't know if relatime has major drawbacks. I know that it increases writes, which is not ideal, especially for the SSD cache pool, but I've monitored it for a while and there don't seem to be that many additional writes.
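For completeness, remounting the pools with relatime from a script boils down to the standard mount remount call; the mount points listed here are assumptions:

import subprocess

# Switch access-time updates to relatime so atime is refreshed at most once
# every 24 hours (or when it is older than mtime/ctime).
for mountpoint in ["/mnt/cache", "/mnt/disk1", "/mnt/disk2"]:
    subprocess.run(["mount", "-o", "remount,relatime", mountpoint], check=True)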

 

If you are interested, here is my custom mover: https://github.com/ericbeier/unraid-mover/blob/main/mover_custom
I would not recommend just copying and pasting it, since it's highly specialized to my server, but maybe it gives you an idea. Since I'm German, I hope you'll forgive my bad English; the comments are of course also in German.

 

So thank you for reading, guys.

Link to comment
