
[Request] True caching with cache drive


sane


At present the cache drive mechanism has several faults which mean it really can't be considered a caching solution:

[*]It can fail outright: writing a file larger than the space remaining on the cache drive results in a failed write, and the user has no warning that this is likely to happen.

[*]Files can spend far too much time on the cache drive, depending on when the mover script runs.

[*]At no point does it cache reads.

It is suggested that the cache drive mechanism be revised, such that:

[*]If an attempt is made to write a file larger than the space remaining on the cache drive, the write proceeds at block level; if the cache space fills, it seamlessly fails over to writing blocks to the actual array, and the cached blocks are added back into the file's block chain once the end of the file has been written (see the sketch after this list).

[*]Moving the cached files to the data array would begin almost as soon as free bandwidth was detected, and at block level (so it could be interrupted), with the aim that data spends the minimum time in the cache rather than in the data array. The normal state of the cache, for write purposes, should be empty.

[*]Where files are being streamed from the NAS at a slower rate than the access rate of the array, the cache should be used to buffer reads from the array, so as to avoid stalls from fragmentation etc. and to ensure that the user always gets a smooth stream of data while the cache sees some utilisation.

[*]Where it can be detected that particular small clusters of data are being accessed repeatedly, those clusters would be pre-cached by the system. Writes of these cached items back to the array might also be optimised to prevent regular waking of the array (e.g. log files).

In short, the cache works at a block level, and with the same resilience expectations as any other cache.
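To make the write-failover suggestion concrete, here's a rough sketch in Python of the spill-over behaviour I have in mind. It is purely illustrative: the CacheDev/ArrayDev classes and the 1 MiB block size are invented for the example and bear no relation to how unRAID actually handles writes.

```python
# Illustrative sketch only: CacheDev, ArrayDev and the 1 MiB block size are
# invented for the example and are not real unRAID interfaces.
BLOCK_SIZE = 1 << 20  # 1 MiB

class CacheDev:
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = {}                    # block index -> data

    def has_space(self):
        return len(self.blocks) < self.capacity

    def write(self, idx, data):
        self.blocks[idx] = data

class ArrayDev:
    def __init__(self):
        self.blocks = {}                    # block index -> data (parity update implied)

    def write(self, idx, data):
        self.blocks[idx] = data

def write_file(stream, cache, array):
    """Fill the cache while it has space, spill the remainder to the array,
    then drain the cached blocks so the whole file ends up on the array."""
    idx, cached = 0, []
    while True:
        data = stream.read(BLOCK_SIZE)
        if not data:
            break
        if cache.has_space():
            cache.write(idx, data)
            cached.append(idx)
        else:
            array.write(idx, data)          # seamless failover, never a failed write
        idx += 1
    for i in cached:                        # move cached blocks across afterwards
        array.write(i, cache.blocks.pop(i))
```

The point is simply that the write never fails: at worst some blocks land on the array directly, and the rest migrate there once the tail of the file has been written.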


At present the cache drive mechanism has several faults which mean it really can't be considered a caching solution:

[*]It can fail outright: writing a file larger than the space remaining on the cache drive results in a failed write, and the user has no warning that this is likely to happen.

[*]Files can spend far too much time on the cache drive, depending on when the mover script runs.

[*]At no point does it cache reads.

It is suggested that the cache drive mechanism be revised, such that:

[*]If an attempt is made to write a file larger than the space remaining on the cache drive, the write proceeds at block level; if the cache space fills, it seamlessly fails over to writing blocks to the actual array, and the cached blocks are added back into the file's block chain once the end of the file has been written.

[*]Moving the cached files to the data array would begin almost as soon as free bandwidth was detected, and at block level (so it could be interrupted), with the aim that data spends the minimum time in the cache rather than in the data array. The normal state of the cache, for write purposes, should be empty.

[*]Where files are being streamed from the NAS at a slower rate than the access rate of the array, the cache should be used to buffer reads from the array, so as to avoid stalls from fragmentation etc. and to ensure that the user always gets a smooth stream of data while the cache sees some utilisation.

[*]Where it can be detected that particular small clusters of data are being accessed repeatedly, those clusters would be pre-cached by the system. Writes of these cached items back to the array might also be optimised to prevent regular waking of the array (e.g. log files).

In short, the cache works at a block level, and with the same resilience expectations as any other cache.

 

Interesting post. Here are my comments:

 

Issues:

 

1. A write to a device ALWAYS has the possibility of failing if you are writing a file or set of files larger than the device has the capacity to take.

2. I believe this is incorrect - you can schedule your mover to run as and when you wish. At the moment hourly is the smallest interval. Is hourly too much time to spend on the cache device? You have to remember the reason for the cache device - it is to improve write/read speeds. I believe the time spent on the cache drive is the transaction cost for the increased speed. The fact is you CAN'T make a device write faster than it can - hence why the cache device was implemented - but there is a cost. Basic economics.

3. Why would you want cached reads? I don't understand.

 

Suggestions:

 

1) By pure definition, if you write to the Array when you have caching enabled (whereby you have some data on the cache device and some on the Array, as you suggest), you are not caching. In addition, with an array protected by parity you would have some data which is protected by parity and some which is not. You could "reserve space" on the parity-protected device, but what happens when that is FULL?

2) Please define what you mean by "free bandwidth" and how you propose detecting it. I understand the reason to minimise time on the cache drive, and that you're looking for a trigger rather than a scheduled time to move the files, but I don't understand what that trigger is or how it can be SAFELY calculated. E.g. what if you decide that there is enough "bandwidth" and you start moving files, and then the user starts a massive copy (say, for argument's sake, 1TB)? What happens - does the move stop (thus leaving your data unprotected), or does it finish, with the new copy taking a massive write penalty because the cache device is hard at work moving data to the protected array? I think your logic is flawed.

 

3 and 4 are a bit out of my depth so I'll stop there and let someone else chime in.

 

However, I don't understand your issue, to be honest. The cache drive is a solution to a problem, and it feels like you are asking for solutions to problems created by the solution to that problem (which in turn will create problems of its own). Caching could be improved, but I don't think it can be in the way you are suggesting. There are penalties/costs in this equation no matter where you look - I feel you are just shifting that penalty/cost around, BUT it's still there.


Interesting post. Here are my comments:

 

Issues:

 

1. A write to a device ALWAYS has the possibility of failing if you are writing a file or set of files larger than the device has the capacity to take.

 

Not if you are doing block-level caching: provided your cache is larger than the block size, all that happens is that the older blocks get flushed to disk. Exactly how you handle it depends on your cache write strategy (e.g. write-behind - http://docs.oracle.com/cd/E13924_01/coh.340/e13819/readthrough.htm )
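As a minimal illustration of the write-behind idea - my own sketch, not anything from the Coherence docs linked above or from unRAID itself - assuming a backing_write callable that stands in for "write this block to the parity-protected array":

```python
# Minimal write-behind sketch (my illustration, not unRAID's behaviour).
# Writes are acknowledged once they hit the cache; dirty blocks are flushed
# to the backing store later, oldest first, so the cache never fills and fails.
from collections import OrderedDict

class WriteBehindCache:
    def __init__(self, backing_write, capacity_blocks):
        self.backing_write = backing_write  # callable(idx, data): write block to the array
        self.capacity = capacity_blocks
        self.dirty = OrderedDict()          # block index -> data, in write order

    def write(self, idx, data):
        if idx in self.dirty:
            self.dirty.move_to_end(idx)     # rewritten block becomes the newest entry
        self.dirty[idx] = data
        while len(self.dirty) > self.capacity:
            old_idx, old_data = self.dirty.popitem(last=False)
            self.backing_write(old_idx, old_data)   # evict oldest dirty block to disk

    def flush(self):
        """The 'mover' equivalent: drain everything when there is idle bandwidth."""
        while self.dirty:
            idx, data = self.dirty.popitem(last=False)
            self.backing_write(idx, data)
```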

 

2. I believe this is incorrect - you can schedule your mover to run as and when you wish. At the moment hourly is the smallest interval. Is hourly too much time to spend on the cache device? You have to remember the reason for the cache device - it is to improve write/read speeds. I believe the time spent on the cache drive is the transaction cost for the increased speed. The fact is you CAN'T make a device write faster than it can - hence why the cache device was implemented - but there is a cost. Basic economics.

 

See above. There is a balance, but in general terms if you can be reasonably certain that the file isn't being used as a temporary store, it's best to block-write that file to the parity protected array quickly - protecting it and clearing space in the cache. Doing it at block level also helps you intersperse the cache clearing with other demands.

 

3. Why would you want cached reads? I don't understand.

 

It's possible to get situations where there's a stall in reading data from the array, such that if you are streaming at a high data rate with no cache buffering you get a gap. You can particularly see that when the array gets full and blocks end up scattered all over the disk (1-2 second stalls). Caching/buffering the reads can help to remove that problem - though it is something of a nice-to-have.
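Something like the following toy Python sketch is all I mean by read buffering - read_block here is a hypothetical stand-in for "fetch a block from the array", and a modest amount of read-ahead absorbs those 1-2 second stalls:

```python
# Toy read-ahead buffer: read_block is a hypothetical stand-in for "fetch a
# block from the array". A background thread keeps the queue topped up so a
# 1-2 second stall on the array doesn't interrupt the stream.
import threading
import queue

def buffered_stream(read_block, n_blocks, depth=64):
    buf = queue.Queue(maxsize=depth)        # roughly `depth` blocks of slack

    def producer():
        for i in range(n_blocks):
            buf.put(read_block(i))          # may stall; the consumer keeps draining buf
        buf.put(None)                       # end-of-stream marker

    threading.Thread(target=producer, daemon=True).start()
    while True:
        block = buf.get()
        if block is None:
            break
        yield block                         # smooth delivery while the buffer stays ahead
```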

 

However, I don't understand your issue, to be honest. The cache drive is a solution to a problem, and it feels like you are asking for solutions to problems created by the solution to that problem (which in turn will create problems of its own). Caching could be improved, but I don't think it can be in the way you are suggesting. There are penalties/costs in this equation no matter where you look - I feel you are just shifting that penalty/cost around, BUT it's still there.

Here's a paper from Synology on how they do caching - you'll see how close it is to what I suggest:

 

http://global.download.synology.com/download/Document/WhitePaper/Synology_SSD_Cache_White_Paper.pdf

 

What I suggest is much closer to the real caching solutions that are generally used in these types of scenarios.


Here's a paper from Synology on how they do caching - you'll see how close it is to what I suggest:

 

http://global.download.synology.com/download/Document/WhitePaper/Synology_SSD_Cache_White_Paper.pdf

 

What I suggest is much closer to the real caching solutions that are generally used in these types of scenarios.

 

Remember that products like Synology's are using RAID-style capabilities which can implement caching at a filesystem level.

 

unRAID's philosophy is that each drive maintains its own filesystem, which has some significant advantages over RAID. In short, the chances of a few mistimed events taking down the entire array are dramatically reduced. You might say that unRAID's cache feature is an "uncache", as it also operates as an autonomous file system. The block-level features you describe are not appropriate for this type of structure.

 

The unRAID cache was written at a time when write performance was only around 11-12 MB/sec, and many users were seeing single digits. Caching was a godsend! Allowing files to be copied at or near network bottleneck speed, with the array performing the final copy to the parity-protected array while the user wasn't waiting, was very appealing. An update a number of years back significantly improved write performance, and with tweaking you can now get writes in the 40-50 MB/sec range, some maybe as high as 60 MB/sec, so the caching advantage is much smaller. Remember that a write to cache is not protected by redundancy (although the new cache pools do give that flexibility). But IMO the day of the cache drive to speed up media writes has somewhat ended. Today cache drives are used for storing docker images, VM disk images, downloaded files in intermediate form, and non-media application data that needs high performance to support Docker and VM applications.

 

In short, I think this is an interesting idea but not consistent with the unRAID architecture. I do think you raise some interesting points about error and recovery handling in out-of-space scenarios that deserve consideration and may be implemented with far less drastic changes to the "uncache" feature.


[*]Where files are being streamed from the NAS at a slower rate than the access rate of the array, the cache should be used to buffer reads from the array, so as to avoid stalls from fragmentation etc. and to ensure that the user always gets a smooth stream of data while the cache sees some utilisation.

[*]Where it can be detected that particular small clusters of data are being accessed repeatedly, those clusters would be pre-cached by the system. Writes of these cached items back to the array might also be optimised to prevent regular waking of the array (e.g. log files).

In short, the cache works at a block level, and with the same resilience expectations as any other cache.

 

The posts previous to mine did a good job of discussing points 1 and 2.

 

I don't think that points 3 and 4 make much sense. Reading from the unRAID array is basically the equivalent of reading from an individual disk, meaning I'm not sure how the cache disk can be used to solve the "bottleneck" you've described in point 3, as the bottleneck shouldn't be the array.

Point 4 is an interesting idea, but I think proper configuration of the programs that are creating the logs / frequently accessed files can already solve this problem. If you set the programs to install to, or reference, a cache-only share, you've basically forced this behaviour without any additional overhead.

 


unRAID's philosophy is that each drive maintains its own filesystem, which has some significant advantages over RAID. In short, the chances of a few mistimed events taking down the entire array are dramatically reduced. You might say that unRAID's cache feature is an "uncache", as it also operates as an autonomous file system. The block-level features you describe are not appropriate for this type of structure.

 

Well, I obviously don't know the intricacies of unRAID's operation, but I kind of assume that since it's calculating the parity for the bytes written to the array filesystem disks, the writing has to operate at a fairly low level at some point, on the order of individual blocks - otherwise how would it know which block on the parity disk holds the parity corresponding to a given set of blocks on the data disks?
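For what it's worth, the parity arithmetic itself is simple enough to sketch - this is just the textbook XOR relationship, purely illustrative and not unRAID's actual code - which is why I assume the write path must deal in aligned blocks somewhere:

```python
# Toy illustration of per-block parity (the textbook XOR relationship, not
# unRAID's actual code). The parity block at a given offset is the XOR of
# every data disk's block at that offset.
def parity_block(data_blocks):
    out = bytearray(len(data_blocks[0]))
    for blk in data_blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

def update_parity(old_parity, old_data, new_data):
    """Single-disk write: parity' = parity XOR old_data XOR new_data."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

# Example: three data disks, 4-byte blocks at the same offset.
d = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xff\x00\xff\x00"]
p = parity_block(d)
p2 = update_parity(p, d[1], b"\x11\x22\x33\x44")
assert p2 == parity_block([d[0], b"\x11\x22\x33\x44", d[2]])
```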

 

And fairly obviously, there is no specific reason why what happens on the array has to be the same as what happens on the cache. In the end everything ends up as block-level transfers - just with the organisation of the blocks differing from one file system to another.

 

The thought occurs: is there an SSD caching codebase that could be incorporated, plug'n'play-like, into the read/write chain - somewhere before the file and parity writes handled by unRAID? It seems like something someone would have done in the Linux world.

 

eg https://wiki.debian.org/SSDOptimization

http://www.raid6.com.au/posts/SSD_caching/

https://www.varnish-software.com/blog/accelerating-your-hdd-dm-cache-or-bcache

 

The unRAID cache was written at a time when write performance was only around 11-12 MB/sec, and many users were seeing single digits. Caching was a godsend! Allowing files to be copied at or near network bottleneck speed, with the array performing the final copy to the parity-protected array while the user wasn't waiting, was very appealing. An update a number of years back significantly improved write performance, and with tweaking you can now get writes in the 40-50 MB/sec range, some maybe as high as 60 MB/sec, so the caching advantage is much smaller.

 

The reason for suggesting this enhancement is that the cache seems 'more trouble than it's worth' as things stand. If I'm backing up a disk image of a 1TB disk, it's GOING to end up blowing past the end of the cache space and failing - so it seems not worth implementing a cache for the smaller files if I lose on nugatory failed writes on the bigger ones.

 

The reason for suggesting this enhancement is that the cache seems 'more trouble than it's worth' as things stand. If I'm backing up a disk image of a 1TB disk, it's GOING to end up blowing past the end of the cache space and failing - so it seems not worth implementing a cache for the smaller files if I lose on nugatory failed writes on the bigger ones.

That's one reason I tell the image software I'm using (True Image 2011) to split the image up into 50GB chunks.  But generally I only use the cache drive for VMs and plugins myself and reads and writes go directly to the array.
  • 2 weeks later...

I spent a good amount of time studying bcache, and I think eventually it will be really useful. It suffers from a few issues. First, it doesn't support any kind of redundancy; you can use the dm layer to set up raid1, but there were reports of issues with that. Second, there are reported issues when it's backing a btrfs file system. Third, you can't configure multiple caches for multiple backing devices. Fourth, the project doesn't seem to be very active  :(


The reason for suggesting this enhancement is that the cache seems 'more trouble than it's worth' as things stand. If I'm backing up a disk image of a 1TB disk, it's GOING to end up blowing past the end of the cache space and failing - so it seems not worth implementing a cache for the smaller files if I lose on nugatory failed writes on the bigger ones.

 

I don't feel this is a valid argument against implementing a cache drive at all. You implement one that is commensurate with your requirements; in your case the cache drive should be > 1.5TB. Alternatively, you could always (as I do) define a backup share which does not use the cache drive, so the backup goes straight to the array. But to say "I don't want faster writes on the transfer of ANY of my files, because I don't want to work around the issue or buy a larger drive for the occasions when I need to transfer a larger file (which is clearly important to you)" sounds a bit like cutting off your nose to spite your face.

 

I'm starving, and the entrée and dinner look great, but there is no dessert. And I really want dessert. So I'll have nothing.

 

I personally would not use unRAID if the cache feature didn't exist as it is implemented today.


I spent a good amount of time studying bcache, and I think eventually it will be really useful. It suffers from a few issues. First, it doesn't support any kind of redundancy; you can use the dm layer to set up raid1, but there were reports of issues with that. Second, there are reported issues when it's backing a btrfs file system. Third, you can't configure multiple caches for multiple backing devices. Fourth, the project doesn't seem to be very active  :(

Thanks for actually looking at it. Obviously there are a few different cache solutions out there, but integrating with the way UNRAID does things and multiple different filesystems introduces complexities that you are much more versed in.

 

Personally I'm going to avoid the UNRAID cache drive and just use the SSD for dockers and VMs. The probability of suffering a write fail with the way it works at present just means it's not worth it (I can write at not far off the network speed limit anyway).


I spent a good amount of time studying bcache, and I think eventually it will be really useful. It suffers from a few issues. First, it doesn't support any kind of redundancy; you can use the dm layer to set up raid1, but there were reports of issues with that. Second, there are reported issues when it's backing a btrfs file system. Third, you can't configure multiple caches for multiple backing devices. Fourth, the project doesn't seem to be very active  :(

Thanks for actually looking at it. Obviously there are a few different cache solutions out there, but integrating with the way UNRAID does things and multiple different filesystems introduces complexities that you are much more versed in.

 

Personally I'm going to avoid the UNRAID cache drive and just use the SSD for dockers and VMs. The probability of suffering a write fail with the way it works at present just means it's not worth it (I can write at not far off the network speed limit anyway).

 

With a cache pool you have redundancy.


Personally I'm going to avoid the UNRAID cache drive and just use the SSD for dockers and VMs. The probability of suffering a write fail with the way it works at present just means it's not worth it (I can write at not far off the network speed limit anyway).

 

With a cache pool you have redundancy.

If you write directly to the array, you have parity.

 

If you have a true cache, then the time spent in the cache and not in the array becomes much more limited (no mover script). That limits the probability of loss.

 

Hell, following an earlier suggestion, if you make it much easier to back up from the array to, say, a collection of USB drives, the average level of protection goes up (I'm prepared to bet that many arrays aren't backed up).

