Staggered parity of data drives

January 8, 2009

I'm by no means an unRAID guru; however, there is a feature I would like to see, assuming it is indeed valid. I'm sure the wealth of veterans here can either kibosh or nod to this.

It appears as if unRAID calculates the parity of all drives starting from the beginning, and I would like the ability to choose where the data drives align relative to the parity drive. I believe this could aid performance with mixed drive setups, mostly when doing parity checks. This would work in configurations where there is a significant I/O bottleneck (port multipliers, PATA master/slave, PCI contention, etc.. why do they all start with "P"?) with relatively fast drives, and where the sum of 2 drives is less than the size of the parity drive.

I'll use the following example: (not my configuration for the record)

Parity: ICH7 SATA 0 - SATA 1TB
Disk 1: ICH7 SATA 1 - SATA 1TB
Disk 2: ICH7 SATA 2 - SATA 1TB
Disk 3: ICH7 SATA 3 - SATA 1TB
Disk 4: SiI3114 SATA 0 - 750GB
Disk 5: SiI3114 SATA 1 - 750GB
Disk 6: SiI3114 SATA 2 - 500GB
Disk 7: SiI3114 SATA 3 - 500GB
Disk 8: ICH7 PATA Master - 250GB
Disk 9: ICH7 PATA Slave - 250GB

Drives 4-9 all share the PCI bus limit of 133MB/sec, which could result in a significant bottleneck.

I could see one of three different ways to implement this in the interface by giving the user the option to:

* specify the "starting point" on the parity drive by typing in an offset in GB, i.e. Disk 8 & 9 start at 750GB (since drive sizes aren't exact and there are the base 2/10 differences, perhaps leave the math up to the user, in bytes), where the starting point + the drive size has to be less than or equal to the parity size
* choose a preceding drive (or drives) whose sum is less than or equal to the parity drive, i.e. when selecting Drive 9, if no other drive is ordered, drives 4, 5, 6, 7, 8, 9 may be available and drive 9 would start where the selected drive ends
* choose simply from "start" or "end", where "start" means parity would be calculated from 0 bytes, while "end" would be from (parity disk size) - (chosen disk size); in the case of Disk 9, that would be 1000GB - 250GB, so 750GB
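The three options above all reduce to picking a byte offset on the parity drive and checking that the disk still fits. A minimal sketch of that check follows; the function name `parity_offset` and its parameters are hypothetical and are not part of unRAID. The "preceding drive" option is covered by passing the end of the preceding drive as `start`.

```python
GB = 10**9  # decimal gigabytes, as drive vendors count them


def parity_offset(parity_size, disk_size, placement, start=None):
    """Return the byte offset on the parity drive where a data disk would begin.

    placement is "start", "end", or "offset" (hypothetical option names).
    For "offset", start is the requested byte position, e.g. the end of a
    preceding drive (its own offset plus its size).
    """
    if placement == "start":
        offset = 0
    elif placement == "end":
        offset = parity_size - disk_size
    elif placement == "offset":
        if start is None:
            raise ValueError("placement 'offset' needs a start value")
        offset = start
    else:
        raise ValueError("unknown placement: %r" % placement)

    # The rule from the post: starting point + drive size <= parity size.
    if offset < 0 or offset + disk_size > parity_size:
        raise ValueError("disk does not fit on the parity drive at that offset")
    return offset


# Disk 9 from the example, aligned to the end of the 1TB parity drive.
print(parity_offset(1000 * GB, 250 * GB, "end") // GB)  # 750
```

Working in bytes rather than GB sidesteps the base 2/10 rounding problem mentioned in the first option.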

If it is indeed a valid request I would personally see this being low on the laundry list as I would love to see software RAID1 support for the cache drives (call me paranoid), hot spares, and hotswap (if ever possible) first.

I can think of a couple situations where a user could configure this to actually slow things down more, but it would be up to each user to figure that out of course.

January 8, 2009

Your assumptions about how parity works were interesting, but not at all how it works in unRAID. I believe you were thinking of the storage of error correction info. Please see the (just expanded) explanation of unRAID parity in the FAQ: How does parity work?

Edit: The above was written after a significant mis-read of the original post. I skimmed too quickly over it. I do apologize to praeses.

January 8, 2009

There is merit in your ideas about improving the throughput, decreasing the bottlenecks, by staggering the disk I/O. It might be possible to analyze the drives, busses, and controllers, and optimize the order of I/O requests to improve the performance. Probably safer to do it internally, rather than involve the user in the decision process.

As you say, this would probably be a lower priority addition to unRAID than many of the other requested features.

January 8, 2009

RobJ,

I don't see how that changes the question at hand... if you were to shift a 250GB drive so that it starts at, say, the 500GB point on the parity drive, you would just need to make sure the array treats the region it was shifted past as all zeroes. Basically the 250GB drive is (falsely) believed to be 750GB in size with the first 500GB ALWAYS zero. It might not be easy to implement, granted, but in theory, looking solely at the way parity is calculated, it could be possible.

Does it not already do that when a disk isn't as big as another? If the drive doesn't have, say, the 1 millionth bit, then it ignores it (assumes zero) and continues on with the parity check using the drives that still have a 1 millionth bit?
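The "assume zero" idea can be illustrated with plain XOR (even) parity. This is only a sketch of the principle, not unRAID's actual code: the `(offset, data)` layout and the function names `compute_parity` and `rebuild` are assumptions made for the example. Any position a disk does not cover, whether past its end or before its shifted start, simply contributes zero.

```python
def compute_parity(disks, parity_len):
    """XOR parity over disks given as (offset, data) pairs.

    Positions a disk does not cover contribute zero, so a short disk and
    a shifted disk are handled the same way.
    """
    parity = bytearray(parity_len)
    for offset, data in disks:
        for i, byte in enumerate(data):
            parity[offset + i] ^= byte
    return bytes(parity)


def rebuild(missing, disks, parity):
    """Reconstruct disks[missing] from parity and the surviving disks."""
    offset, data = disks[missing]
    size = len(data)
    out = bytearray(parity[offset:offset + size])
    for index, (other_offset, other_data) in enumerate(disks):
        if index == missing:
            continue
        for i, byte in enumerate(other_data):
            pos = other_offset + i
            if offset <= pos < offset + size:
                out[pos - offset] ^= byte
    return bytes(out)


# A 4-byte "big" disk at offset 0 and a 2-byte "small" disk shifted to offset 2.
disks = [(0, bytes([0x0F, 0xF0, 0xAA, 0x55])), (2, bytes([0x01, 0x02]))]
parity = compute_parity(disks, 4)
print(rebuild(1, disks, parity) == disks[1][1])  # True
```

The shifted disk is recoverable because XOR with zero leaves the other disks' bytes unchanged in the region it does not occupy.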

Cheers,

Matt

January 8, 2009

Your suggestion might speed up the parallel queries of the disks, but it is at the cost of having all the parity queries slow down.

In other words, take the set of drives you gave earlier, assuming the groupings do not exceed the parity drive size (I have no idea if this will work in practice, but for now, let's say it does):

* Parity: ICH7 SATA 0 - SATA 1TB

* Disk 1: ICH7 SATA 1 - SATA 1TB

* Disk 2: ICH7 SATA 2 - SATA 1TB

* Disk 3: ICH7 SATA 3 - SATA 1TB

* Disk 4: SiI3114 SATA 0 - 750GB <- potential group A

* Disk 5: SiI3114 SATA 1 - 750GB <- potential group B

* Disk 6: SiI3114 SATA 2 - 500GB <- potential group C

* Disk 7: SiI3114 SATA 3 - 500GB <- potential group C

* Disk 8: ICH7 PATA Master - 250GB <- potential group A

* Disk 9: ICH7 PATA Slave - 250GB <- potential group B

Now, for the entire 1TB we will always have to read 7 drives (parity and each of the 1TB data drives + one each from groups A, B, and C).

If we operate as it does today, we read

10 drives for the first 250 Gig,

8 drives for the next 250 Gig,

6 drives for the next 250 Gig,

4 drives for the last 250 Gig.

The parity check speed will greatly vary from start to finish.
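The drive counts above can be reproduced with a small sketch. It uses the nominal sizes from the example (an idealisation) and a hypothetical `(offset, size)` layout per data disk; the function name `active_drives` is made up for illustration. The staggered layout places Disk 8 after Disk 4 (group A), Disk 9 after Disk 5 (group B), and Disk 7 after Disk 6 (group C).

```python
GB = 10**9


def active_drives(layout, parity_size, step):
    """Count drives read in each step-sized region of a parity check.

    layout is a list of (offset, size) pairs for the data disks; the
    parity drive itself is always read, so every count starts at 1.
    """
    counts = []
    for region_start in range(0, parity_size, step):
        region_end = region_start + step
        count = 1  # the parity drive
        for offset, size in layout:
            if offset < region_end and offset + size > region_start:
                count += 1
        counts.append(count)
    return counts


sizes = [1000, 1000, 1000, 750, 750, 500, 500, 250, 250]  # Disks 1-9, in GB

# Today: every data disk starts at offset 0.
today = [(0, s * GB) for s in sizes]

# Staggered: groups A, B and C stacked end to end.
offsets = [0, 0, 0, 0, 0, 0, 500, 750, 750]
staggered = [(o * GB, s * GB) for o, s in zip(offsets, sizes)]

print(active_drives(today, 1000 * GB, 250 * GB))      # [10, 8, 6, 4]
print(active_drives(staggered, 1000 * GB, 250 * GB))  # [7, 7, 7, 7]
```

Both layouts read the same total amount of data; staggering only evens out how many drives are active at once.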

The disk bottleneck would be the deciding factor. On your example, the odds of 2 IDE drives saturating the PCI bus is pretty small. If the SATA drive controller is on a PCIe bus, then it will probably not have a tough time with the 8 drives. Your parity check speed will do well with the mix of disks. If the SATA controller shares the PCI bus with the IDE controller, then the PCI bus will be the limiting factor. With your proposed change, it will need to deal with 7 drives for the entire duration of the parity check.

The big question is how much of a gain you get. On my PCI based array with 12 drives (most IDE) I start out at about 12MB/s. I end up at about 75MB/s for my final 250Gig. (I only have two 1TB drives, and they are both SATA drives.) The parity check speed varies greatly as the parity check gets past the size of the smaller IDE disks.

Real life: I checked... My 1.5TB drives are 1500301910016 bytes, my 1TB drives are 1000204886016 bytes, my 500Gig drives are 500107862016 bytes.

If I have a 1.5TB parity (1500301910016 bytes), then a 1TB plus a 500Gig data drive (1000204886016 + 500107862016) = 1500312748032 (and the "stacked" total is bigger than the parity and this cannot work)

If I had a 1TB parity drive (1000204886016 bytes), then two 500Gig data drives (500107862016 + 500107862016) = 1000215724032 (again the stacked total is greater than parity, so this won't work)

Three 500 Gig drives also total more than the 1.5TB drive, so it won't work either.
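For reference, the arithmetic above can be checked with a few lines of Python. The byte counts are the ones Joe L. reports; the constant names and the helper `overshoot` are just for illustration.

```python
# Drive sizes in bytes, as reported in the post.
DISK_1_5TB = 1500301910016
DISK_1TB = 1000204886016
DISK_500GB = 500107862016


def overshoot(parity_size, stacked_disks):
    """Bytes by which the stacked data disks exceed the parity drive.

    A positive result means the stack does not fit.
    """
    return sum(stacked_disks) - parity_size


print(overshoot(DISK_1_5TB, [DISK_1TB, DISK_500GB]))  # 10838016
print(overshoot(DISK_1TB, [DISK_500GB, DISK_500GB]))  # 10838016
print(overshoot(DISK_1_5TB, [DISK_500GB] * 3))        # 21676032
```

In each case the stack misses by only about 10 to 20 MB, but any overshoot at all means the last disk would run past the end of parity.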

It looks like the combined disk sizes are not "stacking" as you might like in the ideal world. Because of this, I suspect that odds are this will not be implemented soon, if ever.

Joe L.

January 8, 2009

All very true and I agree with all of it, but how about a "partially stacked" solution that tries to minimize the overlap of the drives? This would effectively increase the AVERAGE parity speed throughout the entire process. By shifting to reduce the overlap, you will have less time, if any, where ALL drives are being read simultaneously.

I agree that this is not likely a viable feature request as suggested but might lead to future ideas that are valid.

I'd also be interested in hearing what effects all this stacking might have on the write rates to disk or user shares. I'm assuming more processing power will be needed to decide where to put the information and how to update parity properly, thus slowing them down. We are already trying to find ways to increase write speeds; no need to slow them down for something that many people only do about once a month.

Cheers,

Matt

January 9, 2009

There have been requests to be able to use multiple physical drives to create a larger "parity". Similar concept.

I believe that on a system that relies heavily on the PCI bus, "stacking" drives for parity computation purposes would be valuable for speeding up parity checks and heavy multi-process access.

For a PCIe based system, the advantage is likely nil.

The only other benefit is that, depending on how it is implemented, it could increase the max number of usable drives. Of course Tom could do this if he wanted to regardless.

All in all an interesting idea, worthy of discussion. But I agree with Joe L. that it will likely never get implemented.

January 11, 2009

I do apologize to praeses. I significantly mis-read the original post. It is the second time lately that I have skimmed too quickly over a posted message.
