SSDs as array drives question


Guest


I read in the getting started guide that this isn't a good idea because of the lack of TRIM support. What do people think of this? Is it a big issue? Do others have an all-SSD array? Please forgive the newbie question  :)

Link to comment

Depends on your use case. I had an SSD-only array for some time but ended up moving the drives to a RAID10 cache pool due to the low write speed on the array; I tried various SSDs but never got sustained writes above 100-150MB/s. I still use a couple of SSDs on my main server for their low power and read speed (write speed is not important there because I use an NVMe device for cache).

Link to comment

 

I think the question is why would I want to use SSDs as array drives?

 

The array is designed to give you a large, cheap, protected storage space.

SSDs are not large (well, not the cheap ones anyway).

TRIM would mess with the already-calculated parity, so you would lose protection (maybe there is a way around this).

 

SSDs are good for speedy data access, but if you're accessing your array over gigabit Ethernet a decent HDD will saturate it (on reads at least, as there is a slight slowdown when writing due to parity being calculated/written).

 

You would probably be better off building an array to protect your archived data (data that does not change very often) and then having a protected cache made up of SSDs.

You can assign shares to live on just the cache drive, so you would have the benefit of speed and avoid the problems of having SSDs in your array.

Link to comment

TRIM would mess with the already-calculated parity, so you would lose protection (maybe there is a way around this).

 

No, TRIM doesn't work on array devices, so parity is maintained (but write performance takes a hit).

 

SSDs are good for speedy data access, but if you're accessing your array over gigabit Ethernet a decent HDD will saturate it (on reads at least, as there is a slight slowdown when writing due to parity being calculated/written).

 

True, but they can be worth using for the read speeds when using 10GbE; I get 500MB/s read speed from my array SSDs.

 

Like I said, depends on your use case, but they can be a valid choice.

Link to comment

A file system issues TRIM commands as a result of deleting a file, to tell the SSD that the set of blocks which previously made up the file are no longer being used.  The SSD can then mark those blocks as 'free'.  Later, when the SSD's internal garbage collection runs, it knows that it doesn't have to preserve the contents of those blocks.  This makes garbage collection more efficient.  There are lots of articles that explain this.
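
On a regular Linux system you can trigger this manually with the 'fstrim' utility, which asks a mounted file system to issue TRIM for all of its currently unused blocks (a minimal sketch; the mount point is just an example, and as noted above this only applies to devices outside the parity array, such as the cache):

fstrim -v /mnt/cache   # mount point is an example; -v reports how many bytes were discarded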

 

The trouble this causes for parity-based array organizations is that the data returned from a TRIM'ed data block can be indeterminate.  This paper is a bit wordy but lays it out on p. 13:

 

Since the parity information must always be consistent with the data, it has to be updated after a TRIM command was processed. A useful drive characteristic that would ease the maintenance of a consistent state in parity-based RAIDs would be that subsequent read requests to a trimmed LBA range always return the same data. However, the SATA standard [14] tolerates that subsequent reads of trimmed logical blocks may return different data. The exact behavior of a subsequent read request to a trimmed sector is reported by a particular SATA device when the IDENTIFY DEVICE command [14] is issued to it. There are three possibilities: the first is that each read of a trimmed sector may return different data, i.e., an SSD shows non-deterministic trim behavior. We denote this variant of trim behavior as "ND_TRIM" for the rest of this report. The remaining two possibilities represent a deterministic trim behavior, where all subsequent reads of a trimmed logical block return the same data, which can be either arbitrary (denoted as "DX_TRIM") or contain only zero-valued bytes (denoted as "DZ_TRIM"). The SATA standard [14] leaves open for the variant DX_TRIM whether the returned data will be the same for different sectors or not.

 

To boil this down for unRAID: it should work to use SSDs in an unRAID P or P+Q array as long as TRIM is not used.  This is the current behavior.  However, note that:

a) Write performance can degrade faster on data disks depending on how many file deletions take place.

b) The parity disk is also written for each data disk write.

c) The data disks really should be completely written first, because theoretically a block that was never written (from the point of view of the SSD) can return non-deterministic data.  We have not seen this happen, but then again we have not run too many SSD arrays (it would show up as parity sync errors).  This is a pretty undesirable thing to do, however, since it will guarantee slowing down subsequent writes.

d) If you don't want to pre-write the disks as above, then only use SSDs that support "DX_TRIM" or "DZ_TRIM", and instead of writing the disks with zeros, simply use the 'blkdiscard' command to TRIM the entire device first.
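
For reference, a minimal sketch of that approach (the device name is a placeholder, and blkdiscard TRIMs every sector, destroying all data on the device, so only run it on an empty SSD before adding it to the array):

blkdiscard /dev/sdX   # substitute X for your SSD; discards (TRIMs) the entire device and destroys all data on it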

 

You can use the 'hdparm' command to determine whether your SSDs have this support:

 

hdparm -I /dev/sdX   # substitute X for your ssd device assignment

 

You want to look near the end of the "Commands/features:" section for:

 

          *    Data Set Management TRIM supported

 

Following this you will either see this:

 

          *    Deterministic read data after TRIM

 

or you will see this:

 

          *    Deterministic read zeros after TRIM

 

or you won't see either of the above (if that is the case, do not use the drive in an unRAID P or P+Q array).
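
If you have several SSDs to check, a quick loop can save some typing (a minimal sketch; the /dev/sd? glob is an assumption and will also match any HDDs, so read the output per device):

for d in /dev/sd?; do echo "== $d =="; hdparm -I "$d" | grep -iE 'TRIM supported|after TRIM'; done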

 

In a future release we do plan to add proper TRIM support to array disks.  Here's a heads-up on that: in order to support TRIM in an unRAID P or P+Q array, we must add code to the md/unraid driver, and all SSDs in the array must support either "DX_TRIM" or "DZ_TRIM" mode as described above.  In addition, there's a really good chance we will only support SSDs that support "DZ_TRIM", since supporting "DX_TRIM" is a lot more work  ;)

 

Link to comment

Thanks for the detailed info; I hope to see TRIM support on the array in the future. The reason I stopped using my SSD-only array was the deteriorating write performance I was getting because of the lack of TRIM: never more than 100 to 150MB/s, and sometimes much less, like 50MB/s writes.

 

I did use it a lot and never got a sync error. When I get the chance I'll check the type of TRIM each different model uses.

 

I do still use a couple of SSDs on my main array; for these I only care about the read speed. Still, TRIM support would always be nice for endurance. I also never got a single sync error, and I've been using them for about 6 months. I just checked and they are DZ_TRIM.

Link to comment

Thanks for the detailed info; I hope to see TRIM support on the array in the future. The reason I stopped using my SSD-only array was the deteriorating write performance I was getting because of the lack of TRIM: never more than 100 to 150MB/s, and sometimes much less, like 50MB/s writes.

 

I did use it a lot and never got a sync error. When I get the chance I'll check the type of TRIM each different model uses.

 

I do still use a couple of SSDs on my main array; for these I only care about the read speed. Still, TRIM support would always be nice for endurance. I also never got a single sync error, and I've been using them for about 6 months. I just checked and they are DZ_TRIM.

 

That's good info, thanks.  I'm still a bit worried about Parity updates slowing down the writes even with TRIM.  This is because with those DZ_TRIM devices we can treat TRIM like a "write all zeros" and update parity accordingly, but Parity will probably not be all zeros.  A refinement would be to check whether the data to be written to Parity is all zeros and, if so, send down a TRIM instead of doing an actual write.  Not sure how this would affect performance; I think TRIM is one of those commands that causes queue draining, which may also impact performance.  ::)

Link to comment

Superb summary.

 

Can I suggest that, in the interim, an easy indicator of which of the three states ("DX_TRIM", "DZ_TRIM", or "Unsupported") each SSD is in be added to emHTTP ASAP.

 

This will raise visibility in the community and start the natural process of recommendations and, more importantly, removals.

Link to comment
  • 1 year later...

Have there been any updates on TRIM support for SSDs in the main array?

 

I would like to keep my music and documents on an SSD to keep all of my large HDDs (Movie Drives) spun down on the array. I have been trying to set up the Sync Docker to keep it in unassigned devices, but am having no luck.

 

I am curious if TRIM is coming soon to the main array, so I can store my SSD there for now. I am not concerned about write speeds since I will be using a cache drive. My main goals are focused on fast seek times, data parity, low power consumption and drive health.

Link to comment
3 minutes ago, Twisted said:

Have there been any updates on TRIM support for SSDs in the main array?

 

I would like to keep my music and documents on an SSD to keep all of my large HDDs (Movie Drives) spun down on the array. I have been trying to set up the Sync Docker to keep it in unassigned devices, but am having no luck.

 

I am curious if TRIM is coming soon to the main array, so I can store my SSD there for now. I am not concerned about write speeds since I will be using a cache drive. My main goals are focused on fast seek times, data parity, low power consumption and drive health.

Another approach for you would be to set up a cache pool and get redundancy that way. I have several cache-only shares for frequently accessed files like music.

Link to comment

@trurl Thank you for the advice. I was going to go this way initially, but I just bought a 500GB SSD and I have a 250GB SSD lying around, and I didn't want to send back the 500 for a 250. I am worried 250GB may not be enough space. I already have a 250GB M.2 drive for my VMs and Dockers, so I don't see any other use case for the 250GB SSD. If I can't get Sync to work and TRIM support is not coming, I may just send back the 500GB SSD.

Link to comment
11 minutes ago, Twisted said:

How do you handle redundancy?

The system doesn't; it's just JBOD. For Docker/apps I use the CA backup utility. For my VMs I make a manual copy of my vhds to a vmBackup folder on my array. Everything else is handled by the Mover for the shares. Which just leaves the Steam games I leave on the cache, and if I lost those I'd only lose time, as I'd just have to download them from Steam again (for me this isn't a problem).

 

So you're correct, my cache is not protected - I'm fine with that, and take steps to cover my bum. :)

Link to comment
4 minutes ago, Twisted said:

My challenge is that I need redundancy, so I need to put an SSD in the main array or figure out how to set up Sync. If TRIM support is coming, I am not going to hassle with trying to find an alternate solution.

 

I would not hold my breath for unRAID to officially support SSD arrays. I don't know how SSD parity would ever work. The whole idea of unused space has no real meaning with parity, unless we treat a block of parity as unused if the corresponding blocks on all disks are unused. unRAID would have to be very aware of the inner workings of the file systems to know. And if you filled a disk, parity would always be considered full and never able to be trimmed, and hence would slow down. I think we're looking at a paradigm shift for SSDs to be supported. You could use SSDs in the array if parity is a spinner and TRIM runs periodically (let's say every 2 months), and then, when you trim all the SSDs, you rebuild parity. Not sure it is worth it.

 

So I'd stick with spinners for the array. If you have a specific use case that requires faster speed and redundancy, go with a cache pool and you'd get both. Or go with a RAID card in the unRAID server and run it as a UD alongside the array.

Link to comment
15 minutes ago, SSD said:

You could use SSDs in the array if parity is a spinner and TRIM runs periodically (let's say every 2 months), and then, when you trim all the SSDs, you rebuild parity.

That triggered an idea. Perhaps offer the option of trimming the SSD data drives and then rebuilding parity (zeroing it first if it is an SSD as well), with the warning that until the operation has completed, your array data (ALL OF IT) is at risk. For some, that could be an acceptable risk in return for the performance gain.

 

Some SSDs have good enough spare-sector wear leveling and internal management that they don't suffer from performance drops nearly as badly as other models. Perhaps, if the incentive was there, we could figure out which specific SSDs can be used as-is right now. @johnnie.black did some testing a while back which seemed to indicate that certain models worked fine in unRAID as array drives in certain circumstances, even without TRIM enabled.

Link to comment
3 hours ago, jonathanm said:

That triggered an idea. Perhaps offer the option of trimming the SSD data drives and then rebuilding parity (zeroing it first if it is an SSD as well), with the warning that until the operation has completed, your array data (ALL OF IT) is at risk. For some, that could be an acceptable risk in return for the performance gain.

 

Some SSDs have good enough spare-sector wear leveling and internal management that they don't suffer from performance drops nearly as badly as other models. Perhaps, if the incentive was there, we could figure out which specific SSDs can be used as-is right now. @johnnie.black did some testing a while back which seemed to indicate that certain models worked fine in unRAID as array drives in certain circumstances, even without TRIM enabled.

 

Rebuilding parity is a particularly expensive operation for an SSD given its limited # of writes. 

Link to comment
8 minutes ago, Twisted said:

Has anyone tried to use a Docker Sync to pull data off of an unassigned SSD and copy it to the main array nightly? This seems to be the best option for those of us who have a random stack of SSD sizes lying around.

 

I have not. Instead of using an unassigned devices SSD, could you make the SSD the cache? Then I think the Mover would do what you want.

Link to comment
8 hours ago, SSD said:

And if you filled a disk, parity would always be considered full and never able to be trimmed, and hence would slow down.

There is no obligatory need to trim an SSD. It's just a question of which SSD you buy - an SSD with a hidden overprovisioning pool will handle block erases in the background as it rotates flash blocks in and out of the overprovisioning pool, making it possible to treat the SSD as a normal HDD.

 

So a RAID with SSDs can work very well, and there are already lots of products that run RAID on SSDs.

Link to comment
3 hours ago, pwm said:

There is no obligatory need to trim an SSD. It's just a question of which SSD you buy - an SSD with a hidden overprovisioning pool will handle block erases in the background as it rotates flash blocks in and out of the overprovisioning pool, making it possible to treat the SSD as a normal HDD.

 

So a RAID with SSDs can work very well, and there are already lots of products that run RAID on SSDs.

 

Looking at the Samsung 960 PRO, the endurance of the 512GB version is 400TB written. That basically means it can be filled roughly 800 times. That is a pretty big number.

 

But I am assuming that is looking at a usage pattern in which wear is effectively managed. If wear were to be poorly managed, let's say the hardest hit blocks get hit 10x more than other parts, that number goes from 800 to 80. (Maybe that is not realistic - I am not an expert. But without trim I am not sure.) Running frequent parity rebuilds re-writing every sector would take chunks out of its lifespan that would add up.

 

Generally I would say that the cache pool is a better place for data requiring high speed access. That or a HW RAID controller that is designed with SSD usage in mind, running as a UD. High speed access to a large media library really doesn't make a lot of sense.

Link to comment
26 minutes ago, SSD said:

But I am assuming that is looking at a usage pattern in which wear is effectively managed. If wear were to be poorly managed, let's say the hardest hit blocks get hit 10x more than other parts, that number goes from 800 to 80. (Maybe that is not realistic - I am not an expert. But without trim I am not sure.)

The SSD is responsible for wear-leveling, so it will regularly change the internal mapping of which flash blocks are used for the different LBAs written from the OS.

 

A more troubling issue is write amplification - how very small writes can count as large writes when it comes to actual drive wear. If the SSD has 128 kB flash blocks, then it isn't always possible for the drive to fit 128 kB of writes on a block before it needs to erase the block and restart. And when it's time to erase a flash block, the drive sometimes still has part of the block in use and so has to copy that data to a new block before it can append the new write - so the drive then performs additional internal writes besides the writes sent over the SATA cable.

 

Some SSDs have a specific SMART attribute that specifies the amount of write amplification.
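
You can dump the raw attribute table with 'smartctl' and look for the write-related counters (a minimal sketch; the attribute names vary by vendor and model, Total_LBAs_Written and Wear_Leveling_Count are just common examples, and relatively few drives report write amplification directly):

smartctl -A /dev/sdX   # substitute X for your SSD; look for attributes such as Total_LBAs_Written or Wear_Leveling_Count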

Link to comment
8 minutes ago, pwm said:

The SSD is responsible for wear-leveling, so it will regularly change the internal mapping of which flash blocks are used for the different LBAs written from the OS.

 

A more troubling issue is write amplification - how very small writes can count as large writes when it comes to actual drive wear. If the SSD has 128 kB flash blocks, then it isn't always possible for the drive to fit 128 kB of writes on a block before it needs to erase the block and restart. And when it's time to erase a flash block, the drive sometimes still has part of the block in use and so has to copy that data to a new block before it can append the new write - so the drive then performs additional internal writes besides the writes sent over the SATA cable.

 

Some SSDs have a specific SMART attribute that specifies the amount of write amplification.

 

Would Plex metadata create a significant write amplification issue?  Plex metadata files are tiny, but there are a zillion of them!

Link to comment
