SSD cache, HDD array, no parity, no move, only duplicate


b10011

Recommended Posts

I have found out that you cannot (or at least shouldn't) create an all-SSD array with parity. My current plan would be the following:

  • 2x 2TB SSDs as cache
  • 1x 4TB HDD in array
  • No parity (at least for now)
  • The data will never be moved to the HDD, only duplicated

This setup would be good for me because:

  • Really fast file access (going to have point-to-point 10GbE between the desktop and the unraid server; I understand it's overkill for the time being, and 2x SSD is still slower than 10GbE in practice)
  • TRIM can be kept on for the SSDs
  • Data is secured from corruption and single drive failure (whether in the cache or in the array)
  • HDDs are cheap; a 4TB WD RED costs 18% of the 2x 2TB Samsung SSD price here, so data duplication to HDDs is almost free
  • The only time the HDD array would spin up would be when the data gets copied (once a day)
  • Cache SSDs and array HDDs could be added as needed, no need to rebuild a real RAID every time
  • SSDs can differ in size; now I have 2x 2TB, and in a few years I'll most likely have 2x 2TB + 2x 4TB, and so on as prices go down

So the questions are:

Did I miss something important?

Is there a way to do this without tinkering or hacky solutions, are there plugins for this purpose?

If tinkering is required, what would be the most reasonable way?

Link to comment
8 minutes ago, gareth_iowc said:

I've been running a full SSD array for over a year now with no problems, even without trim.

 

I don't use parity as it's just a gaming pc.

 

I understand that an all-SSD array is supported, but I need parity, and that is not supported. If I added HDD parity, everything would slow down significantly, and I wouldn't want that. The only solution I came up with is the one I mentioned.

Link to comment
8 minutes ago, b10011 said:

but I need parity, and that is not supported

You can have an SSD as parity. AFAIK there's just one SSD model (the Kingston SV300) that causes a couple of sync errors after a power cycle; all the others I tested worked fine, except for the already mentioned lack of trim. I've been using a small all-SSD array for a while without issues, mostly to test how the lack of trim affects performance over time.

Link to comment
2 hours ago, b10011 said:

I understand that an all-SSD array is supported, but I need parity, and that is not supported. If I added HDD parity, everything would slow down significantly, and I wouldn't want that. The only solution I came up with is the one I mentioned.

Besides what johnnie already mentioned, you can opt for a very large HDD (compared to your SSD size).

That way, when you are writing, you would be using the fastest section of the HDD, which (in my experience) is near 200 MB/s. So your array will read at perhaps 400 MB/s and write at about half of that (assuming turbo write is on, and it should be turned on for an SSD-based array). Pretty decent, I would say.
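
To put rough numbers on that reasoning, here's a back-of-the-envelope sketch (the MB/s figures are illustrative assumptions, not measurements):

```python
# Rough throughput model for an SSD data array with an HDD as parity.
# All speeds in MB/s are illustrative assumptions.
ssd_read = 400    # realistic sequential read of one SATA SSD
hdd_outer = 200   # HDD throughput on its fastest (outer) section

# unraid reads a file from a single data disk, so reads run at SSD speed.
array_read = ssd_read

# With turbo write, every disk in the array is touched on each write,
# so the slowest member (the HDD parity) bounds write throughput.
array_write = min(ssd_read, hdd_outer)

print(f"read ~ {array_read} MB/s, write ~ {array_write} MB/s")  # 400 / 200
```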

 

The impact from the lack of trim is a tiny bit overblown. Most home use tends to be read-heavy, which isn't affected by the lack of trim.

 

Also note that having SSDs in the cache = RAID1. That's fine now with 2x 2TB, but remember RAID1 will only protect you against 1 drive failure, even with 2x 2TB + 2x 4TB (same situation with RAID10). Would you then still be happy sacrificing 6TB of available storage for exactly the same benefit?
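
The arithmetic behind that 6TB figure, as a quick sketch (a simplified model of two-copy RAID1 allocation; real-world overhead is ignored):

```python
# Usable space in a two-copy RAID1 pool with mixed drive sizes.
def raid1_usable(sizes_tb):
    total = sum(sizes_tb)
    largest = max(sizes_tb)
    # Any capacity on one device beyond the combined size of the others
    # has nowhere to mirror to and goes unused.
    return min(total / 2, total - largest)

for pool in ([2, 2], [2, 2, 4, 4]):
    usable = raid1_usable(pool)
    print(f"{pool}: {usable:.0f} TB usable, {sum(pool) - usable:.0f} TB spent on redundancy")
# [2, 2]: 2 TB usable, 2 TB spent on redundancy
# [2, 2, 4, 4]: 6 TB usable, 6 TB spent on redundancy
```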

 

TL;DR: with the plan you mention, you should put the SSDs in the array.

Link to comment

Forgot to mention: if possible, use a faster SSD for parity; more endurance is also a plus, because parity will need to handle many more writes than the array devices and can never be trimmed, even if/when LT adds trim support for the array, since there's no filesystem to trim on parity. E.g., I'm currently using a WD Black NVMe device for parity and WD Blue 3D SATA devices for data.
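
To illustrate why parity takes so many more writes (hypothetical per-disk numbers; the point is that every write to any data disk also updates parity):

```python
# Hypothetical daily write volume per data disk, in GB.
daily_writes_gb = {"disk1": 50, "disk2": 20, "disk3": 10}

# Each block written to any data disk requires updating the matching
# parity block, so parity absorbs the combined load of all data disks.
parity_writes_gb = sum(daily_writes_gb.values())

print(f"busiest data disk: {max(daily_writes_gb.values())} GB/day")  # 50
print(f"parity disk:       {parity_writes_gb} GB/day")               # 80
```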

 

 

Link to comment
40 minutes ago, testdasi said:

Besides what johnnie already mentioned, you can opt for a very large HDD (compared to your SSD size).

That way, when you are writing, you would be using the fastest section of the HDD, which (in my experience) is near 200 MB/s. So your array will read at perhaps 400 MB/s and write at about half of that (assuming turbo write is on, and it should be turned on for an SSD-based array). Pretty decent, I would say.

 

The impact from the lack of trim is a tiny bit overblown. Most home use tends to be read-heavy, which isn't affected by the lack of trim.

 

Also note that having SSDs in the cache = RAID1. That's fine now with 2x 2TB, but remember RAID1 will only protect you against 1 drive failure, even with 2x 2TB + 2x 4TB (same situation with RAID10). Would you then still be happy sacrificing 6TB of available storage for exactly the same benefit?

 

TL;DR: with the plan you mention, you should put the SSDs in the array.

200-400 MB/s is approximately 20-40% of the speed that the connection can handle. And that would only get worse over time.

 

SSDs in the cache are RAID1? Can it not be used as RAID0? Because I would want to run the cache disks as one combined drive that is fast, with zero redundancy. And every night the server would duplicate the SSD cache data to the HDDs in the array. That way I would have 1-drive redundancy, at least for the data as of last night.
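
What I have in mind would amount to something like this (a sketch only; I'm assuming the cache is a btrfs pool mounted at /mnt/cache and that unraid even allows the conversion):

```python
import subprocess

# Rebalance the cache pool so data is striped (raid0) across all members
# while metadata stays mirrored. WARNING: raid0 has zero redundancy;
# losing any one SSD loses the whole pool.
subprocess.run(
    ["btrfs", "balance", "start",
     "-dconvert=raid0",   # stripe data, no redundancy
     "-mconvert=raid1",   # keep metadata mirrored
     "/mnt/cache"],
    check=True,
)
```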

 

The only way I could see it working as an all-SSD array would be to have Samsung EVO drives in the array and a PRO M.2 drive as parity (so that it's fast enough for 10GbE). But M.2 drives aren't that easy to put into the array (it would need lots of PCIe M.2 adapters), so the parity drive would have to be large enough (say 4TB) that I could expand the array with drives of up to 4TB. Once I need to go over 4TB, I would have to throw the M.2 in the trash and get a larger M.2 plus larger array drives. That sounds far more expensive than a zero-redundancy SSD cache with HDD duplication every night.

Link to comment
3 minutes ago, b10011 said:

200-400 MB/s is approximately 20-40% of the speed that the connection can handle. And that would only get worse over time.

 

SSDs in the cache are RAID1? Can it not be used as RAID0? Because I would want to run the cache disks as one combined drive that is fast, with zero redundancy. And every night the server would duplicate the SSD cache data to the HDDs in the array. That way I would have 1-drive redundancy, at least for the data as of last night.

 

The only way I could see it working as an all-SSD array would be to have Samsung EVO drives in the array and a PRO M.2 drive as parity (so that it's fast enough for 10GbE). But M.2 drives aren't that easy to put into the array (it would need lots of PCIe M.2 adapters), so the parity drive would have to be large enough (say 4TB) that I could expand the array with drives of up to 4TB. Once I need to go over 4TB, I would have to throw the M.2 in the trash and get a larger M.2 plus larger array drives. That sounds far more expensive than a zero-redundancy SSD cache with HDD duplication every night.

A typical SATA SSD can reach 550 MB/s (or so), but only sequentially and on a bare-bones system during a benchmark. 400 MB/s is a more realistic in-real-life estimate (again, sequential).

For random IO, network latency is the main bottleneck.

 

Your read speed will not get worse over time (in the sense of with vs without trim).

What is your use case?

As I said, most home uses tend to be read-heavy.

 

You seem heavily interested in theoretically maxing out your 10GbE LAN, but that is rather misguided in my opinion, especially as you mentioned "point-to-point 10GbE between desktop and unraid server".

For maximum speed, you want your SSDs in your desktop, with the Unraid server as a pure backup of the desktop.

 

As someone who has seen and personally experienced data loss, I don't recommend RAID0 as a matter of principle, regardless of backup strategy. Period.

Link to comment
2 minutes ago, testdasi said:

A typical SATA SSD can reach 550 MB/s (or so), but only sequentially and on a bare-bones system during a benchmark. 400 MB/s is a more realistic in-real-life estimate (again, sequential).

For random IO, network latency is the main bottleneck.

 

Your read speed will not get worse over time (in the sense of with vs without trim).

What is your use case?

As I said, most home uses tend to be read-heavy.

 

You seem heavily interested in theoretically maxing out your 10GbE LAN, but that is rather misguided in my opinion, especially as you mentioned "point-to-point 10GbE between desktop and unraid server".

For maximum speed, you want your SSDs in your desktop, with the Unraid server as a pure backup of the desktop.

 

As someone who has seen and personally experienced data loss, I don't recommend RAID0 as a matter of principle, regardless of backup strategy. Period.

Fast connection: AI training from datasets on the server + fast desktop backups.

 

I can deal with losing the last day's data. After all, I'm going to have a daily backup to the array + off-site. Most of the work I do will be "backed up" in git, photos are also in Dropbox (easier for the phone), and anything else isn't so important that I couldn't risk it for 1 day.

Link to comment

There's lots of commentary on SSDs in the array, so I won't discuss that; I don't have any experience with it anyway.

 

As for your original idea, there are a couple of things that come up when trying to actually implement it: User Shares and Mover.

 

Mover doesn't copy, it only moves, so some other solution would need to be implemented to get your cache backed up to the array.
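
For example, something along these lines could be scheduled nightly, e.g. via the User Scripts plugin or cron (a sketch only; the share names and paths are assumptions):

```python
import subprocess

# Nightly *copy* (not move) of a cache-only share to the array.
# Paths are illustrative; writing to a share with a different name
# (here "backup") sidesteps the duplicate-file question below.
SRC = "/mnt/cache/data/"         # cache-only share
DST = "/mnt/disk1/backup/data/"  # target share on an array disk

subprocess.run(
    ["rsync", "-a", "--delete", SRC, DST],  # -a keeps attributes, --delete mirrors removals
    check=True,
)
```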

 

Cache is part of User Shares, so having identical files and folders on both cache and array would mean duplicated files and folders in the User Shares. I'm not sure which would win out when accessing the User Shares.

 

Link to comment
4 minutes ago, trurl said:

Cache is part of User Shares, so having identical files and folders on both cache and array would mean duplicated files and folders in the User Shares. I'm not sure which would win out when accessing the User Shares.

In my experience the cache seems to win out, but since this is not documented anywhere, I would not want to rely on that staying true in the future.

Link to comment
12 minutes ago, itimpi said:

In my experience the cache seems to win out

The cache always wins out if duplicates exist, but it's probably better to create a different share on the array and then use, for example, rsync to make a daily backup. You could even use btrfs to snapshot and send incremental backups with send/receive, assuming all the data on the cache fits on a single array disk.
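
A minimal sketch of that snapshot approach (paths are assumptions; it presumes /mnt/cache/data is a btrfs subvolume, the target array disk is btrfs, and yesterday's snapshot still exists as the parent):

```python
import subprocess
from datetime import date

SUBVOL = "/mnt/cache/data"                        # share to back up (btrfs subvolume)
PARENT = "/mnt/cache/snaps/data-last"             # snapshot already sent previously
TODAY = f"/mnt/cache/snaps/data-{date.today()}"   # today's read-only snapshot
TARGET = "/mnt/disk1/backup"                      # btrfs-formatted array disk

# 1. Take a read-only snapshot (btrfs send requires read-only sources).
subprocess.run(["btrfs", "subvolume", "snapshot", "-r", SUBVOL, TODAY], check=True)

# 2. Stream only the changes since PARENT into the array disk.
send = subprocess.Popen(["btrfs", "send", "-p", PARENT, TODAY], stdout=subprocess.PIPE)
subprocess.run(["btrfs", "receive", TARGET], stdin=send.stdout, check=True)
send.wait()
# After a successful run, TODAY becomes the parent for tomorrow's increment.
```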

Link to comment

Interesting!

 

Stupid question: would unraid support SSD TRIM if I didn't use parity for the array?

 

Anyway, I probably have to go for the following configuration:

  • unraid SSD array, no parity
  • (a real) hardware RAID0 HDD array (just so that all drives appear as one)
  • automation of the copying process

That way I have 1-drive redundancy, since any one drive breaking destroys either the SSD array or the HDD array but not both, and it leaves open the possibility of adding M.2 parity later.

Link to comment
