Multi Hundred TBs


Mat1926


I don't know much about the math behind the parity calculations, so I am curious: are there any limitations on the total number of parity disks that we can have? And what kind of storage solutions/technologies are recommended for multiple hundreds of TBs?

 

Thnx 

6 hours ago, ashman70 said:

Currently unRAID is limited to two parity disks.

 

Not sure what you mean by your second question. Currently an unRAID Pro license is limited to 30 drives, which includes one or two parity and the rest data. So if you had 29x12TB data drives and one 12TB parity drive, you could have an array of 348TB.

 

Sorry, I was not clear. What I meant by my first question was: is it possible for unRAID to support more than dual parity in the near future? Or is this a limitation of the underlying technology, so that we can't have more than dual parity? For the second question, what I was referring to was that if I am going to have several hundreds of TBs, I think I would prefer to have more than 1-2 parity disks... and since unRAID currently does not support more than dual parity, what are my options?

 

Thnx


The parity limitation is, I believe, imposed by Lime-Tech, who created unRAID. With respect to parity, it's not so much how much data you have but how many drives you have; the more drives you have, the higher the risk of a drive failure, IMO. Also, perhaps you should read up on how parity works in unRAID; it's not like traditional hardware RAID.


I would think twice about having so much data in a single machine - with any type of RAID.

 

One single PSU failure can kill all the drives, and then it's quite a large job to restore several hundred TB from backup. Not to mention that it takes quite a bit of time to back up several hundred TB in the first place, if we assume that you are talking about living data and not just a large archive of mostly static files.

 

If the files are important, then the storage system should be able to validate the consistency of the storage pool at least once a month. So if you have 300 TB, that means testing 300 TB once a month, or 100 TB every 10 days, or 10 TB every day. That works out to an average of roughly 400 GB/hour, 7 GB/minute, or 120 MB/second. If you "only" have 200 TB, then you would "only" need to validate 80 MB/s on average - year in, year out, on top of the normal disk accesses. Without very special hardware, you would either have to scale back the storage pool validation or settle for just running regular extended SMART tests and hope that you don't have issues with inconsistent parity or silent data errors.
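
For reference, here is a quick back-of-the-envelope sketch in Python of where those figures come from; the pool sizes and the once-a-month interval are just the numbers used above:

```python
# Back-of-the-envelope: sustained read rate needed to validate a whole pool
# once per interval. Decimal units (1 TB = 1,000,000 MB), as drive vendors use.

def required_scrub_rate_mb_s(pool_tb, interval_days):
    """Average MB/s needed to read the entire pool once per interval."""
    return (pool_tb * 1_000_000) / (interval_days * 24 * 3600)

for pool_tb in (300, 200):
    rate = required_scrub_rate_mb_s(pool_tb, interval_days=30)
    print(f"{pool_tb} TB once/month -> about {rate:.0f} MB/s sustained")

# Prints roughly 116 MB/s for 300 TB and 77 MB/s for 200 TB, i.e. the
# rounded 120 MB/s and 80 MB/s figures above.
```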

 

Larger storage pools run multiple arrays, with block checksums and redundant storage, to be able to scale the load while still catching storage errors.

1 hour ago, pwm said:

I would think twice about having so much data in a single machine - with any type of RAID.

 

One single PSU failure can kill all the drives, and then it's quite a large job to restore several hundred TB from backup. Not to mention that it takes quite a bit of time to back up several hundred TB in the first place, if we assume that you are talking about living data and not just a large archive of mostly static files.

 

If the files are important, then the storage system should be able to validate the consistency of the storage pool at least once a month. So if you have 300 TB, that means testing 300 TB once a month, or 100 TB every 10 days, or 10 TB every day. That works out to an average of roughly 400 GB/hour, 7 GB/minute, or 120 MB/second. If you "only" have 200 TB, then you would "only" need to validate 80 MB/s on average - year in, year out, on top of the normal disk accesses. Without very special hardware, you would either have to scale back the storage pool validation or settle for just running regular extended SMART tests and hope that you don't have issues with inconsistent parity or silent data errors.

 

Larger storage pools run multiple arrays, with block checksums and redundant storage, to be able to scale the load while still catching storage errors.

 

The parity check time depends more on the size of the parity disk than on the data size of the array. Drives are processed in parallel, so with 29 data drives running at once you might well be processing 4000+ MB/sec, and complete in a day or a little longer.

 

Parity calculation can be implemented very efficiently with blazing-fast XOR operations for single and dual parity, but when you go beyond that, the technique requires math functions which are slower. Going beyond 2 parities would further slow writes and other parity operations. Maybe with fast hardware this would not be a bottleneck, but as of Tom's last update, he was not planning to go beyond 2.
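
To make the XOR point concrete, here is a minimal, purely illustrative Python sketch (not unRAID's actual implementation): single parity is just a byte-wise XOR across the data drives, and any one missing block can be rebuilt by XOR-ing the parity with the surviving blocks.

```python
# Illustrative only, not unRAID's implementation: single parity is a
# byte-wise XOR across the data drives.

def xor_parity(blocks):
    """XOR equal-length byte blocks (one per drive) together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\x0f"]
parity = xor_parity(data)

# Rebuild a "failed" drive 0 from the parity plus the surviving drives:
rebuilt = xor_parity([parity, data[1], data[2]])
assert rebuilt == data[0]
```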

23 minutes ago, SSD said:

but as of Tom's last update, he was not planning to go beyond 2.

I feel dual parity is appropriate for a 28-data-disk max array size, especially considering that if you have additional failures you don't lose the entire array like, e.g., with ZFS. If one day that limit is increased, say to 60 or 100, then triple parity would be a good option to have.

20 minutes ago, johnnie.black said:

I feel dual parity is appropriate for a 28-data-disk max array size, especially considering that if you have additional failures you don't lose the entire array like, e.g., with ZFS. If one day that limit is increased, say to 60 or 100, then triple parity would be a good option to have.

 

@Mat1926

I tend to agree. Once you overcome issues of loose cables (I recommend hot swap bays) and eliminate Marvell controllers, drives dropping due to true failures are exceedingly rare. Taking action on SMART issues reduces the risks even further. There are certainly risks with that much data, but I'd rather have it in unRAID, where a catastrophic event like a flood that wipes out some disks leaves the data on the others intact!

 

I've operated with an array of 20 drives with single parity at times, and think dual parity with 28 is reasonable if you use best practices.

1 hour ago, johnnie.black said:

I feel dual parity is appropriate for a 28-data-disk max array size, especially considering that if you have additional failures you don't lose the entire array like, e.g., with ZFS. If one day that limit is increased, say to 60 or 100, then triple parity would be a good option to have.

Yes, unRAID doesn't need as many parity drives as a traditional RAID where you get a 100% data loss if one drive too many fails.

1 hour ago, SSD said:

The parity check time depends more on the size of the parity disk than on the data size of the array. Drives are processed in parallel, so with 29 data drives running at once you might well be processing 4000+ MB/sec, and complete in a day or a little longer.

Of course the parity test runs concurrently on multiple disks. But the transfer speed of larger disks doesn't scale with their increased size, so a 300 TB system still has quite significant work to do. If you have 30 x 12 TB drives and they each average 100 MB/s, that means 3 GB/s to stream through the controller cards and compute the two parities from, and each parity test will take about 1 day and 9 hours; any additional disk accesses to a single drive during this task will slow the parity scan drastically. A parity scan that takes more than 24 hours means it isn't possible to run a nightly backup process without interfering badly with the parity scan.
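
A quick sanity check of those numbers in Python, using just the assumed 12 TB drives at an average of 100 MB/s from above:

```python
# Sanity check: duration of a full parity check when every drive is
# streamed end to end in parallel at the same average speed.

def parity_check_hours(drive_tb, avg_mb_s):
    """Hours to read one drive completely at the given average rate."""
    return (drive_tb * 1_000_000) / avg_mb_s / 3600

print(f"{parity_check_hours(12, 100):.1f} hours")  # ~33.3 h, about 1 day 9 hours

# Aggregate load on the controllers: 30 drives * 100 MB/s = 3 GB/s.
```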

 

unRAID is just not well matched to this kind of usage. It would be way better if unRAID could support multiple arrays; having two 14+2 arrays or three 9+2 arrays would be much more manageable. And support for partial parity scans, where a user could give the scan a 3-4 hour window per night and the system continues over however many nights are needed, would also make unRAID better suited for the task.


@pwm, @Mat1926

 

I have requested a feature to allow parity checks to be split so they can be run over successive nights. We'll see, but I am optimistic this is getting serious consideration.

 

100MB/sec is very slow for newer, larger disks. My 8TB drives only drop into the 90s on the innermost cylinders. I'd estimate at least 25% faster on average.

 

Multiple arrays are possible if you run one as a VM. I prefer one big array, or 2 arrays on two separate boxes; the latter especially if the initial array is going to max out unRAID. A person definitely needs room to grow!

22 minutes ago, SSD said:

Multiple arrays are possible if you run one as a VM. I prefer one big array, or 2 arrays on two separate boxes; the latter especially if the initial array is going to max out unRAID. A person definitely needs room to grow!

I run multiple mid-sized unRAID servers because I don't think 30-disk arrays are a good solution. I desperately want support for multiple arrays and for partial, sequential parity scans.

 

24 minutes ago, SSD said:

100MB/sec is very slow for newer larger disks.

But 3GB/s of aggregate bandwidth is not exactly slow for the controller cards, and there are lots of ways to fall short of the maximum theoretical bandwidth.


What are the best parity scan speeds seen for unRAID?

35 minutes ago, pwm said:

What are the best parity scan speeds seen for unRAID?

As fast as the hardware allows. This is my 30-drive array; the limit here is the WD Green drives, which max out at 150MB/s:

 

[Screenshot: parity check statistics for the 30-drive array, 2017-04-27]

 

And this is my record, when I had the SSD only array:

 

[Screenshot: parity check statistics for the SSD-only array, 2017-04-08]

 

Note that the 26 disks shown by the stats plugin is a bug; it was a 30-disk array, and the parity check ran close to 300MB/s.

 

3 minutes ago, SSD said:

 

Looks like a backup server to me. The 4TB parity, and I am seeing some 2TB drives at the bottom. Mine is similar.

 

That screenshot is from some time ago. That server now has thirty 2TB and 3TB drives and less capacity than it had at the time, 76TB; it acts as a backup of another server with 21 x 4TB drives and the same total capacity.


How valuable is the data?

 

This is the only real question. From that truth you can work out the required durability, and it is best to get a good understanding of it before trying to build the storage system. It is very possible you will have one dataset which is of a different value than another, such as personally derived works (family albums) versus backup copies of DVDs. By definition the DVD backups are already backups and probably do not need yet another backup (depending on how much work the curator has put in) and can potentially be lost. Personally derived works should not be stored as a single copy, regardless of durability; because of locality, use an offsite backup.

 

As far as the math of RAID is concerned, basic Reed-Solomon (RS) coding can be used for an unlimited number of parity disks. But in practice the process needs optimizations, so RS is replaced with other codes, Hamming, etc.

 

A RAID stripe is the data words (n) plus the parity set (m). The current maximum for unRAID is n+m = 30, i.e. (28,2) or (29,1). Each increase in n reduces durability, and each increase in m increases durability: the fewer data disks the better, and the more parity disks the better. As mentioned above, the best/easiest way to get more than 2 parity disks' worth of protection is backup. Doing so not only increases durability, but can also be done in a way that increases availability.

 

In the possible unRAID combinations, (29,1) would be the lowest durability, and (1,2) would be the highest. I think there is a way to run unRAID without parity, but that would be worse.

 

All this durability comes at a price. Assuming the disks are fully utilized, storing the same 29 disks' worth of data takes 30 disks at (29,1) versus 87 disks at (1,2). It costs 2.9 times as much to store data at the highest durability vs the lowest.
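
A tiny Python sketch of that cost comparison; the disk counts are the ones above, the helper function is just illustrative:

```python
import math

# Disks needed to hold 29 data disks' worth of content when every array is
# laid out as (n data + m parity). Purely illustrative arithmetic.

def disks_needed(data_disks_worth, n, m):
    arrays = math.ceil(data_disks_worth / n)
    return arrays * (n + m)

lowest_durability = disks_needed(29, n=29, m=1)   # 30 disks
highest_durability = disks_needed(29, n=1, m=2)   # 87 disks
print(lowest_durability, highest_durability, highest_durability / lowest_durability)
# 30 87 2.9 -> the "2.9 times as much" figure above
```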

 

A dataset stored on RAID is a single copy, with some level of durability. It is not a backup.

 

Since dual parity is often implemented without RS, adding a third parity is complicated. Single parity can be done with simple addition (XOR). Row-Diagonal Parity allows for optimal computation of dual parity, again using only XOR. Triple parity is available (raidz3, etc.). These higher levels of durability are filed under the heading of erasure codes.
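
As an illustration of why going past plain XOR gets more expensive, here is a small Python sketch of the Reed-Solomon-style P+Q construction used by many RAID-6 implementations (a different scheme from the XOR-only Row-Diagonal Parity mentioned above; the generator and reducing polynomial are just the common choices, nothing unRAID-specific):

```python
# Illustrative sketch of Reed-Solomon-style P+Q dual parity.
# Byte arithmetic is in GF(2^8) with the common 0x11d reducing polynomial.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result

def pq_parity(blocks):
    """P = XOR of all blocks; Q = XOR of g^i * block_i with generator g = 2."""
    p = bytearray(len(blocks[0]))
    q = bytearray(len(blocks[0]))
    coeff = 1
    for block in blocks:
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
        coeff = gf_mul(coeff, 2)
    return bytes(p), bytes(q)

p, q = pq_parity([b"\x01\x02", b"\x10\x20", b"\xff\x0f"])
# Recovering two lost blocks means solving a pair of GF(2^8) equations per
# byte position - this is the extra math cost compared with single-parity XOR.
```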

 

5 hours ago, c3 said:

All this durability comes at a price. Assuming the disks are fully utilized, storing the same 29 disks' worth of data takes 30 disks at (29,1) versus 87 disks at (1,2). It costs 2.9 times as much to store data at the highest durability vs the lowest.

It costs way more than that, since unRAID can only have one array. So not just 87 disks compared to 30, but 29 unRAID systems compared to 1.


It's important to separate absolute risk from relative risk.

 

Drives are pretty reliable, and experience has borne out that even single failures are rare if one has solid cabling (hot swap bays are best), does parity checks, monitors SMART attributes, and takes action when signs of failure are present. And dual failure is incredibly rare unless there is some precipitating event (like a flood, or booting up a server that had been offline for a very long period of time). It almost seems that the scenarios causing more than 2 simultaneous failures and those causing near total loss are the same.

 

And unRAID parity (or parities) is not a replacement for a backup. Operating without one carries a non-trivial risk of loss no matter how many parities you have. If you are willing to take that risk, the additional risk you take of losing more than 2 disks within a very short time period is quite small, even with a 30-drive array.

 

And remember, if you did lose 3 disks out of 30 due to some nasty situation, you did not lose the data on the other 27. This is totally different from a traditional RAID solution, where failures beyond the redundancy result in losing ALL the data in the array or pool. The risks are the same but the impact is very different, and both the risk and the impact need to be considered.

 

Not arguing with the desire to raise the bar, or claiming that multiple arrays in the same server would not be a useful feature, just that current unRAID is a very reasonable option for the OP's use case.


Adding lots of parity drives doesn't give much additional data security, because the probability of having multiple drives fail at the same time is so low, with the exception of really major incidents where you are likely to lose every single drive. This means that you are more likely to get data loss from a bad write that is committed to the array and at the same time affects all parity drives, making the write unrecoverable. Many parity drives are pretty meaningless in a fire, burglary, lightning strike, etc.

 

It's normally better to add one more storage server or, even better, one more storage location. That's the only true redundancy you can have.

 

RAID with parity gives better availability.

But proper backup solutions always win when it comes to minimizing the probability of data loss.

 

That's also why real data centers run many storage servers and use some form of matrix allocation to keep all data mirrored on multiple servers, instead of using crazy amounts of parity drives.


Yes, there are diminishing returns for the increasing cost of increasing durability. That is why I said to get a good understanding of the value of the data. Some data needs only two or three 9s, other data maybe four or even eleven 9s. And it is true that in some cases you need to plan for exceptionally large numbers of drive failures, due perhaps to drive firmware; MarFS is configured to survive 200+ drive failures.

 

The architecture used is as close to shared-nothing as financially possible. In the extreme, this would be one drive per server per data center, which is obviously not financially possible at scale. So, yes, more than one server and more than one location. The 7+5 configuration allows a data center to be offline/unreachable/on fire while the data remains both available and durably stored, by putting 3 or 4 storage servers in each location, for a cost below mirroring.
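
To put a number on the "cost below mirroring" point, here is a small illustrative comparison of the 7+5 layout mentioned above against a plain 2-way mirror; the comparison itself is mine, not taken from MarFS documentation:

```python
# Raw-to-usable storage ratio for an erasure-coded layout versus mirroring.
# The 7+5 shape is the one mentioned above; the mirror comparison is illustrative.

def overhead(data_shards, parity_shards):
    return (data_shards + parity_shards) / data_shards

print(f"7+5 erasure code: {overhead(7, 5):.2f}x raw storage, survives any 5 lost shards")
print(f"2-way mirror:     {overhead(1, 1):.2f}x raw storage, survives 1 lost copy")
# ~1.71x versus 2.00x, while tolerating a whole site's worth of shards being offline.
```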

 

Backup should always be considered. Software issues can invalidate strategies relying on versioning and snapshots.

 

Mirroring is just too expensive (as noted above), hence "crazy amounts" of parity drives are used. http://lambdastack.io/blog/2017/02/26/erasure-coding/

Not sure if Comcast, or ATT, or Facebook qualify as "real data centers" but they all use "crazy amounts" of parity drives.
