Accelerator drives


jonp


Guys, just an update to say there is no update. As it stands, you officially cannot and should not use SSDs in the array. There are several users doing it, but since this may come down to specific drives and firmware being supported, YMMV and it is at your own risk.

 

As I hear more info, you will too, because I REALLY want to stop having to use mechanical disks when an SSD would be a better/ideal fit.

 

Maybe someone with an SSD or a few SSDs in the array could test:

Add a bunch of files to the SSD. Do a parity check.

Remove some files.

Add some other files.

Do a parity check.

Wait a few days or weeks.

Do another parity check.

 

This or possibly a more advanced suite of tests.
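For anyone wanting to try it, here is a rough command-line sketch of that test. The disk path, file counts, and the mdcmd invocation and status fields are assumptions you would need to check against your unRAID version:

#!/bin/bash
# Rough sketch of the suggested SSD-in-array test. /mnt/disk5 is assumed to be the
# SSD array disk under test; mdcmd normally lives in /usr/local/sbin on unRAID, but
# the exact status fields (mdResyncPos, sync error counters) vary by release.

SSD=/mnt/disk5

# 1. Add a bunch of files to the SSD.
mkdir -p "$SSD/ssdtest"
for i in $(seq 1 50); do
    dd if=/dev/urandom of="$SSD/ssdtest/file$i.bin" bs=1M count=100 status=none
done

# 2. Run a non-correcting parity check and wait until it finishes.
/usr/local/sbin/mdcmd check NOCORRECT
while /usr/local/sbin/mdcmd status | grep -q "mdResyncPos=[1-9]"; do sleep 300; done

# 3. Remove some files, add some others, then check parity again.
rm -f "$SSD"/ssdtest/file{1..25}.bin
for i in $(seq 51 75); do
    dd if=/dev/urandom of="$SSD/ssdtest/file$i.bin" bs=1M count=100 status=none
done
/usr/local/sbin/mdcmd check NOCORRECT

# 4. Wait a few days or weeks (to give any firmware garbage collection a chance),
#    run one more check, and look at the sync error counters.
/usr/local/sbin/mdcmd status | grep -i sync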

 

I would surmise that if people are doing monthly parity checks, this situation might have reared its ugly head by now.

 

I have been doing this for several months now. None of my parity checks have come back with any errors (always 0).

 

I have a 1TB SSD as my "accelerator drive", where I move files from the cache drive weekly. Then, whenever I see that the accelerator drive is getting within about 100GB of being full, I run another script that moves files over to one of my 8TB Seagate Archive drives. So far I have run that script twice, freeing up about 250GB on the SSD accelerator drive.
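For what it's worth, here is a stripped-down sketch of that second "spill-over" script. The disk numbers, share name, 100GB threshold and 30-day cutoff are illustrative assumptions, not my exact script:

#!/bin/bash
# Move older files from the SSD accelerator disk to an archive disk once free space
# on the accelerator drops below a threshold. Paths and numbers are placeholders.

ACCEL=/mnt/disk1        # SSD accelerator disk
ARCHIVE=/mnt/disk7      # 8TB archive disk
SHARE=media
THRESHOLD_KB=$((100 * 1024 * 1024))   # roughly 100GB, in 1K blocks

free_kb=$(df --output=avail "$ACCEL" | tail -1)
if [ "$free_kb" -lt "$THRESHOLD_KB" ]; then
    cd "$ACCEL" || exit 1
    # Move anything older than 30 days, preserving the share-relative path so the
    # files stay in the same user share; --remove-source-files frees the SSD space.
    find "$SHARE" -type f -mtime +30 -print0 |
        rsync -a --files-from=- --from0 --remove-source-files ./ "$ARCHIVE/"
fi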

 

I realize that it's not supported and that I may lose data.  But so far, it's working.  (Plus I have all my files rsynced over to a second unRAID server and important files backed up to the cloud.)

Link to comment

A while ago I did a test with 9 different SSDs and there weren't any sync errors after copying and deleting many GBs of data. Naturally there are a lot more models out there and there could be issues with some, but I believe it's safe to use them in the array.

 

After those tests I built my first SSD-only server; at the moment it has 24 SSDs, soon to expand to 30. I use it mostly with temp data, so there are a lot of writes and deletes. I do frequent parity checks and have never had a single sync error, I've done various rebuilds for size upgrades, and the File Integrity plugin has never found a single checksum error.

Link to comment

OK so let me see if I can summarise where we are:

 

As of today the official unRAID manual categorically states that SSDs should not be used:

 

Do not assign an SSD as a data/parity device. While unRAID won't stop you from doing this, SSDs are only supported for use as cache devices due to TRIM/discard and how it impacts parity protection. Using SSDs as data/parity devices is unsupported and may result in data loss at this time.

 

However, this was written at a time when almost no work had been done with unRAID and SSDs.

 

The confirmed issue is solely TRIM/discard/garbage collection (confirmed internally directly with LT). The fear is that since unRAID works at the block level, any mechanism that alters disk blocks and does not report this to unRAID will invalidate parity and ultimately result in data loss should a disk fail and a recovery be necessary.

 

Two known mechanisms for this exist: manual TRIM (running a command in the OS) and automatic TRIM (garbage collection by the disk firmware itself).

 

All known instances of parity becoming invalid are linked solely to manual TRIM, which is no longer easily possible since recent versions of unRAID have been altered to disallow TRIM against array drives (and perhaps the parity drive as well?).

 

Where this leaves us: there is no known evidence that any modern SSD's automatic TRIM/garbage collection mechanism breaks unRAID, which makes sense, as it would also break all other RAID products, and no manufacturer wants that. Unfortunately, at best our evidence is anecdotal and based on logic and a small unRAID sample set, but even so, confidence is high that it is correct.

 

Given this, the community can say that as long as you are on unRAID v6 or later and choose an SSD with automatic TRIM/garbage collection, you can use it as either a parity or data drive.

 

Does this summary make sense? If so, I will amend and update the OP and link to the internal ticket.

Link to comment

It seems to me that it would be relatively easy to put together a program that can test whether there are any issues for any given SSD (and its controller).

 

You would start off by formatting the SSD with a given file system and then filling it with files of random data. The next step would be to create a block-level image of that disk and write it to a file on a HDD. The program would then mount both the SSD and the HDD file (as a loop device) and start making a mixture of file-level and block-level changes to both of them in parallel. This can be interspersed with manual TRIM commands as well if necessary. You can now do comparisons at the block level between the SSD and the loop-mounted copy to see if any discrepancies can be found. This would be a MUCH faster process than trying to put the SSD into the array and subsequently seeing if any issues can be identified with parity.

 

As this might vary with SSD and disk controller model, making this publicly available could allow the community to do formalised testing of a large variety of controller/SSD combinations to see if there really IS any problem. It would also not put any user's data at risk, so it could be safely done on any unRAID system.
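A simplified variant of that test can even be done with nothing more than dd, fstrim and cmp. The device and image paths below are placeholders, the SSD must not be an array member, and the test is destructive; the key point is that no host writes happen between the snapshot and the comparison, so any difference has to have come from the drive's own firmware:

#!/bin/bash
# Destructive test sketch: fill an SSD, trim it, snapshot the raw device to a HDD,
# leave it idle, then compare. Any block that changes without a host write is the
# sort of silent change that would invalidate unRAID parity.

SSD=/dev/sdX                      # SSD under test (placeholder, NOT in the array)
IMG=/mnt/disk2/ssd-before.img     # snapshot file on a spinner (placeholder)
MNT=/mnt/ssdtest

# 1. Format, fill with random files, delete some and TRIM so the firmware has
#    something to garbage-collect, then unmount.
mkfs.xfs -f "$SSD"
mkdir -p "$MNT" && mount "$SSD" "$MNT"
for i in $(seq 1 200); do
    dd if=/dev/urandom of="$MNT/f$i" bs=1M count=50 status=none
done
rm -f "$MNT"/f{1..50}
fstrim -v "$MNT"
umount "$MNT"

# 2. Take a block-level snapshot of the whole device onto the HDD.
dd if="$SSD" of="$IMG" bs=4M status=progress

# 3. Leave the SSD completely untouched for days or weeks, then compare.
#    Any output here means the firmware altered blocks behind the host's back.
cmp -l "$SSD" "$IMG" | head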

Link to comment

Where this leaves us: there is no known evidence that any modern SSD's automatic TRIM/garbage collection mechanism breaks unRAID, which makes sense, as it would also break all other RAID products, and no manufacturer wants that.

 

Agree. For example, I've been using a 4-SSD RAID5 in Windows with the Intel Rapid Storage driver for years without any issue. TRIM is also not supported there, but I've never seen anyone say you can't use SSDs that way, and it uses parity like unRAID.

Link to comment

It seems to me that it would be relatively easy to put together a program that can test whether there are any issues for any given SSD (and its controller).

 

You would start off by formatting the SSD with a given file system and then filling it with files of random data. The next step would be to create a block-level image of that disk and write it to a file on a HDD. The program would then mount both the SSD and the HDD file (as a loop device) and start making a mixture of file-level and block-level changes to both of them in parallel. This can be interspersed with manual TRIM commands as well if necessary. You can now do comparisons at the block level between the SSD and the loop-mounted copy to see if any discrepancies can be found. This would be a MUCH faster process than trying to put the SSD into the array and subsequently seeing if any issues can be identified with parity.

 

As this might vary with SSD and disk controller model, making this publicly available could allow the community to do formalised testing of a large variety of controller/SSD combinations to see if there really IS any problem. It would also not put any user's data at risk, so it could be safely done on any unRAID system.

 

I think this is a very clever idea, but I would be loath to suggest the community takes on such an undertaking without LT sponsorship. We need to be working hand in hand on this.

 

Where this leaves us: there is no known evidence that any modern SSD's automatic TRIM/garbage collection mechanism breaks unRAID, which makes sense, as it would also break all other RAID products, and no manufacturer wants that.

 

Agree. For example, I've been using a 4-SSD RAID5 in Windows with the Intel Rapid Storage driver for years without any issue. TRIM is also not supported there, but I've never seen anyone say you can't use SSDs that way, and it uses parity like unRAID.

 

I think I am leaning towards just accepting defeat on this as a feature, taking the smaller win that it works for me, and continuing to develop my own internal accelerator-drive scripts and processes. In a moment of clarity I realised this is at least my third year trying to get this accepted and it hasn't even made it beyond the "unscheduled pile". There is obviously no interest upstream and that is fine; reality is what it is.

Link to comment

Part of me thinks the whole internal block-level reassignment concern is partially FUD.

 

Where there is real data, an internal block move, for whatever reason in the firmware, should still satisfy an external block request with the same data no matter where the firmware puts it.

Otherwise, just think of how our high-level file systems would show corruption as blocks are moved and reassigned internally when cells decay.

 

The issue is what gets returned from a block where the data has been deleted.

If TRIM is not engaged, there shouldn't be an issue.

 

My understanding is that the firmware doesn't really know the file system. It only knows blocks, and if a TRIM command occurs on a block, it is erased and added to a free-block list.

 

How does garbage collection come into play here?

How does the firmware know a block is no longer in use?

 

Does it?

 

 

If a block is re-assigned to a new spot, I would surmise the old block is tagged for garbage collection, yet a request for the original block gets data from the re-assigned spot.

Link to comment

It seems to me that it would be relatively easy to put together a program that can test whether there are any issues for any given SSD (and its controller).

 

You would start off by formatting the SSD with a given file system and then filling it with files of random data. The next step would be to create a block-level image of that disk and write it to a file on a HDD. The program would then mount both the SSD and the HDD file (as a loop device) and start making a mixture of file-level and block-level changes to both of them in parallel. This can be interspersed with manual TRIM commands as well if necessary. You can now do comparisons at the block level between the SSD and the loop-mounted copy to see if any discrepancies can be found. This would be a MUCH faster process than trying to put the SSD into the array and subsequently seeing if any issues can be identified with parity.

 

As this might vary with SSD and disk controller model, making this publicly available could allow the community to do formalised testing of a large variety of controller/SSD combinations to see if there really IS any problem. It would also not put any user's data at risk, so it could be safely done on any unRAID system.

 

I think this is a very clever idea, but I would be loath to suggest the community takes on such an undertaking without LT sponsorship. We need to be working hand in hand on this.

 

This seems like a lot of work and may be easily prone to error.

 

Perhaps fill an SSD in the unRAID array, then remove all of the files.

Stop the array, trigger a TRIM or wait for garbage collection to occur.

Start the array and trigger a parity check.

 

Then, with some data on another array drive:

Replace that drive with a replacement candidate.

Rebuild the replacement.

Validate the new drive, either by direct comparison or via hash-sum checking (a rough sketch of this step follows below).

 

 

Those are pretty much the steps that would occur if a drive required replacement.
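The validation step at the end is easy to script. A minimal sketch, assuming disk3 is the drive being replaced and that the checksum list is kept on the flash drive:

#!/bin/bash
# Before pulling the drive: record checksums of everything on it.
cd /mnt/disk3 && find . -type f -print0 | xargs -0 md5sum > /boot/disk3-before.md5

# ...replace the drive, let unRAID rebuild it from parity, then verify.
# Any line printed here means parity did not accurately describe the disks
# (including the SSDs) at rebuild time.
cd /mnt/disk3 && md5sum -c /boot/disk3-before.md5 | grep -v ': OK$'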

Link to comment

My understanding is that the firmware doesn't really know the file system. It only knows blocks, and if a TRIM command occurs on a block, it is erased and added to a free-block list.

 

How does garbage collection come into play here?

How does the firmware know a block is no longer in use?

 

Does it?

 

If a block is re-assigned to a new spot, I would surmise the old block is tagged for garbage collection, yet a request for the original block gets data from the re-assigned spot.

 

I suspect that is why the tool is called 'fstrim', because it is the agent between the file system and the trimmable device that conveys the list of available blocks to be 'discarded', trimmed, garbage-collected, etc.  Trimming in the device HAS to have that inside info from the file system (as far as I know!).  fstrim works with any file system that supports this discard operation.  I think BTRFS must have a form of fstrim built into it.
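For reference, on an ordinary (non-array) mount that hand-off is a single command. /mnt/cache is just an example mount point, and on unRAID this is only appropriate for cache or unassigned devices, not /mnt/disk#:

# Report the filesystem's free space to the device as discard requests.
fstrim -v /mnt/cache
# Prints a line reporting how much space was trimmed.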

Link to comment

From what I gather from various reads, we can't enable TRIM on an array drive.

If it's not enabled, the SSD will operate slower over time.

Yet that should make it safe for unRAID, as long as TRIM/discard is not enabled.

 

This makes sense to me.

I thought this was a good read as far as TRIM goes: http://unix.stackexchange.com/questions/218076/ssd-how-often-should-i-do-fstrim

 

TRIM causes the device to erase blocks directly, without the OS (and therefore unRAID's parity) knowing what's going on.

That actually changes the contents of blocks, invalidating parity.

 

I think the other part is how garbage collection works. Perhaps we are conflating the two when they should not be.

While they work hand in hand to make the SSD efficient, garbage collection may be safe where TRIM is definitely not.

 

According to what I recently read:

"Garbage collection without TRIM will always be moving all invalid data during the GC process, acting as if the SSD is operating at full capacity. Only the TRIM command can identify the invalid data and improve performance."

 

I think BTRFS must have a form of fstrim built into it.

It is 'discard' aware if mounted with discard.

 

Does Btrfs support TRIM/discard?

 

There are two ways to apply the discard:

 

during normal operation on any space that's going to be freed, enabled by mount option discard

 

on demand via the command fstrim
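Concretely, the two options look like this on a non-array btrfs SSD (the device and mount point are placeholders):

# Continuous discard: btrfs issues discards as space is freed during normal use.
mount -o discard /dev/sdX1 /mnt/btrfs-ssd

# Or leave 'discard' off and trim in batches on demand (or from a cron job):
fstrim -v /mnt/btrfs-ssd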

 

"-o discard" can have some negative consequences on performance on some SSDs or at least whether it adds worthwhile performance is up for debate depending on who you ask, and makes undeletion/recovery near impossible while being a security problem if you use dm-crypt underneath (see http://asalor.blogspot.com/2011/08/trim-dm-crypt-problems.html ), therefore it is not enabled by default.

Link to comment

If I were to read this

 

http://xfs.org/index.php/FITRIM/discard

 

in isolation I would just assume it would work.

 

Where in the unRAID stack does this break down?

 

It breaks down at the first requirement:

The block device underneath the filesystem must support the FITRIM operation.

 

In unRAID, the virtual device (/dev/md#) is underneath the filesystem and the virtual device does not support FITRIM.  unRAID would need to update the parity drive when the actual data drive (/dev/sdX) is trimmed. There was a little discussion about a year ago in this very thread if you want to think more about the details:

http://lime-technology.com/forum/index.php?topic=34434.msg382654#msg382654
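You can see that requirement failing directly with lsblk, which reports a device's discard capability. The device names below are examples, and a zero DISC-GRAN/DISC-MAX means discards are not supported on that device:

# The unRAID virtual array device is expected to show zeros here, while the
# underlying SSD (e.g. /dev/sdb) shows its real discard granularity.
lsblk -D /dev/md1 /dev/sdb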

 

 

I think the other part is how garbage collection works. Perhaps we are conflating the two when they should not be.

While they work hand in hand to make the SSD efficient, garbage collection may be safe where TRIM is definitely not.

 

 

This wikipedia article helped me differentiate between TRIM and garbage collection:

https://en.wikipedia.org/wiki/Write_amplification#BG-GC

 

The only part that concerns me is the section about filesystem-aware garbage collection. From the referenced paper (pdf) and another one (pdf), it appears that some SSD devices did garbage collection after a Windows XP format operation. Windows XP did not have TRIM support, but it appears some devices were aware of the format operation anyway and proceeded to do garbage collection.

 

So the question remains, are there any SSD devices that will do garbage collection on data that is still valid in unRAID? I think it is very unlikely but impossible to prove.

 

My little piece of anecdotal evidence: I've been running for over a year with an SSD accelerator drive (Silicon Power 240GB S60). I've run monthly parity checks and once did checksums on parity reconstructed data and found no issues. Some data has been cycled off and on and it has generally been over 95% full. I have no plans to stop using the SSD but I also have full backups.

Link to comment

Thanks for the link, much clearer read.

 

In the context of unRAID

 

"When a file is permanently deleted or the drive is formatted, the OS sends the TRIM command along with the LBAs that no longer contain valid data. This informs the SSD that the LBAs in use can be erased and reused. This reduces the LBAs needing to be moved during garbage collection".

 

What is this "permanently deleted" option they refer to later, and could it be the thing we need, i.e. an unRAID-safe way of marking "LBAs that no longer contain valid data"?

Link to comment

It's sending the list of sectors that the file used to be on. This lets the SSD know which flash sectors it can erase.

 

"Permanently deleted" means the sectors (LBAs, logical block addresses) will be erased, whereas a typical delete operation on a hard drive just marks those sectors as unused while the underlying data is still there, so if the file system had an undelete operation the file contents could be restored. On a flash SSD, however, when the sectors are erased the data is really gone, and the trimmed LBAs subsequently read back as zeros.

 

In order for parity to be preserved, whenever a file is deleted and a TRIM command is sent, the device driver must treat those sectors as being set to all zeros. The trouble is that the SSD may not immediately erase the sectors, but could instead put them in a list of sectors available to be erased at some later point in time. It all depends on how much free processing time the SSD has available as to when the sectors will actually be cleared.
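Whether a given SSD at least promises deterministic zeros for trimmed sectors can be read out of its identify data. The device name is an example and the exact wording varies by drive:

hdparm -I /dev/sdb | grep -i trim
# Typically shows lines such as "Data Set Management TRIM supported" and,
# on some drives, "Deterministic read ZEROs after TRIM".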

Link to comment

Yeah, I thought I was on to something here, but it still doesn't matter, since there is no magic delete+TRIM single command that bypasses the limitation we are seeing.

 

So that leaves us out of options. We buy SSDs that have built-in garbage collection (most these days) and check parity more often. If anyone finds a gotcha, they can post in this thread, we confirm the issue and then push upstream to LT... but short of that, we are at the limit of what we can do beyond anecdotal testing.

 

It's not all doom and gloom, because it fundamentally works; it's just that no one likes using unsupported features.

 

 

Link to comment
  • 2 weeks later...

Official reply from another thread

 

Building an all-flash array is something we plan on experimenting with in the future. For now, we don't have any official support for it other than allowing you to do it in the webGui. Bottom line: your mileage may vary. It all comes down to the fact that we can't officially support what we don't test ourselves. We also think there may be a number of tuning changes we would need to make to get the most out of SSD array devices, but then again, there are some users in this forum that have an all-flash array already and report pretty decent performance.

Link to comment

This is discussed as implemented via user shares, which are significantly slower than direct disk shares.

 

So while you will accelerate some accesses, you will slow down accesses to files that are not on the "accelerator" drive versus accessing them via disk shares.

 

 

Link to comment
  • 1 month later...

Could the new 'prefer cache' mode added in 6.2-rc1 have per-share file size and/or extension options added, to achieve this goal for those with a large enough cache pool?

 

Example: the user share 'Movies' could be set to 'prefer cache' for all files <5MB and/or all files with a .jpg, .nfo or .srt extension (a rough manual approximation is sketched below).
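In the meantime, something similar can be approximated by hand. A rough sketch (not an existing unRAID feature), assuming the standard /mnt/disk* and /mnt/cache mount points, a 'Movies' share, and enough free space on the cache pool:

#!/bin/bash
# Pull small/metadata files for one share off the array disks onto the cache,
# roughly emulating a per-size/per-extension 'prefer cache'. Share name, size
# cutoff and extensions are illustrative.

SHARE="Movies"

for disk in /mnt/disk*; do
    [ -d "$disk/$SHARE" ] || continue
    cd "$disk" || continue
    # Files under 5MB, or with metadata extensions, move to the cache while
    # keeping the share-relative path; removing the source avoids duplicates.
    find "$SHARE" -type f \( -size -5M -o -iname '*.jpg' -o -iname '*.nfo' -o -iname '*.srt' \) -print0 |
        rsync -a --files-from=- --from0 --remove-source-files ./ /mnt/cache/
done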

 

 

 

Link to comment
