How to implement BTRFS checksumming with unRAID?



Would this BTRFS loopback pool work as a substitute for using the parity data to rebuild BTRFS errors, or as an additional source for good data to be drawn from?

 

I prefer using parity data because, as stated, there is already space allocated for parity data. Any RAID 1 config will work just as well, but at the expense of doubling the size of your dataset.

Link to comment

Would this BTRFS loopback pool work as a substitute for using the parity data to rebuild BTRFS errors, or as an additional source for good data to be drawn from?

 

I prefer using parity data because, as stated, there is already space allocated for parity data. Any RAID 1 config will work just as well, but at the expense of doubling the size of your dataset.

 

No, it's for Docker and Docker only.

 

Currently, Docker containers are stored inside a single file, docker.img. Internally, this file is a single-device BTRFS pool. For scrubbing to correct errors on it rather than just detect them, this virtual BTRFS pool would need multiple devices in some sort of mirror mode.
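To picture what that single-device setup looks like, here's a minimal sketch of how a loopback BTRFS image along the lines of docker.img can be created and mounted by hand (the path, size and mount point are illustrative, not unRAID's actual commands):

# Create a sparse image file and format it as a single-device btrfs filesystem
truncate -s 20G /mnt/cache/docker.img
mkfs.btrfs /mnt/cache/docker.img

# Mount it through a loop device; everything inside is checksummed btrfs,
# but with only a single copy of the data, so errors can only be detected
mount -o loop /mnt/cache/docker.img /var/lib/docker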

Link to comment

Would this BTRFS loopback pool work as a substitute for using the parity data to rebuild BTRFS errors, or as an additional source for good data to be drawn from?

 

I prefer using parity data because, as stated, there is already space allocated for parity data. Any RAID 1 config will work just as well, but at the expense of doubling the size of your dataset.

 

No, we're talking about something totally independent of your original request.  In addition to supporting btrfs as a filesystem for array devices, we also use it for Docker.  So say you have a completely non-btrfs setup with unRAID, but want to use Docker.  Well Docker leverages btrfs snapshotting, subvolumes, COW, etc.  To support Docker without forcing folks to use btrfs for their storage devices themselves, we create a virtual disk image on top of an existing device formatted with whatever filesystem you want (that unRAID supports, that is).  Then we format the virtual disk with btrfs.  What BRiT is talking about is the ability to then create another virtual disk image for btrfs, then make those two virtual disks into a btrfs raid1, which would then enable us to do btrfs scrub and repair errors on the docker loopback image.
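For anyone who wants to experiment with that idea, a rough sketch of the loopback raid1 approach could look like this (file names, sizes and mount points are made up for illustration; this is not an official unRAID procedure):

# Two equally sized image files, attached to loop devices
truncate -s 20G /mnt/disk1/docker-a.img
truncate -s 20G /mnt/disk1/docker-b.img
LOOP_A=$(losetup --find --show /mnt/disk1/docker-a.img)
LOOP_B=$(losetup --find --show /mnt/disk1/docker-b.img)

# Format both loop devices as one btrfs with raid1 data and metadata
mkfs.btrfs -d raid1 -m raid1 "$LOOP_A" "$LOOP_B"

# Make sure btrfs knows about both members, then mount the pool
btrfs device scan
mount "$LOOP_A" /var/lib/docker

# A scrub can now repair a bad block from its mirror instead of only reporting it
btrfs scrub start /var/lib/docker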

Link to comment

Perfect. I didn't realize that Docker used BTRFS in its image file, regardless of what filesystem was in use on the disk. I think that's a great request, too, and can save some trouble for people who have corrupted docker images in the future.

 

That said, I think protecting the humongous data array should be a higher priority than protecting the docker image, but both are worthwhile features to add.

 

I like this thread.

Link to comment

@Jonp, quick question that likely does not make any sense but that's never stopped me from asking before...

 

Is there any means of creating the docker image file so that it is set up internally as a virtual BTRFS in a mirrored mode? If so, would that help to recover from data corruption automatically, since it's no longer a single-device pool?

 

Ok, I stand corrected, this thread has now turned into TWO feature requests.  That's not necessarily a bad idea.  So basically two loopback BTRFS images of the same size (and even on the same btrfs pool if you wanted).  Create those as a btrfs raid1, then scrubbing could, in theory, repair btrfs errors like this.

 

I honestly don't know how that would work, but I'm eager to find out!!  Will be trying this next week!!

So, as an extension to this, could we get a feature that software-RAID1's a single physical disk into two volumes to implement data redundancy and correction? I realize this won't be true RAID1 protection, but I think it would go a long way toward making the cache drive more resilient to corruption caused by power interruptions and crashes. Maybe not to the level of two physical disks, but if you have the extra space, it may be a good option.

 

Perhaps also allow carving up multiple cache drives into virtual volumes. Say you have two SSDs, a 250GB and a 500GB. Maybe carve up half the 500 to provide RAID1 for the 250, but still allow the other half of the 500 for non-fault-tolerant use. Or any other combination that would make sense, like perhaps two 500GB drives, each halved, with one pair in RAID1 and the other in RAID0 for speed. You would end up with 250GB fault-tolerant and still have 500GB to use for VMs that don't need real-time redundancy.

Link to comment

@Jonp, quick question that likely does not make any sense but that's never stopped me from asking before...

 

Is there any means of creating the docker image file so that it is set up internally as a virtual BTRFS in a mirrored mode? If so, would that help to recover from data corruption automatically, since it's no longer a single-device pool?

 

Ok, I stand corrected, this thread has now turned into TWO feature requests.  That's not necessarily a bad idea.  So basically two loopback BTRFS images of the same size (and even on the same btrfs pool if you wanted).  Create those as a btrfs raid1, then scrubbing could, in theory, repair btrfs errors like this.

 

I honestly don't know how that would work, but I'm eager to find out!!  Will be trying this next week!!

So, as an extension to this, could we get a feature that software-RAID1's a single physical disk into two volumes to implement data redundancy and correction? I realize this won't be true RAID1 protection, but I think it would go a long way toward making the cache drive more resilient to corruption caused by power interruptions and crashes. Maybe not to the level of two physical disks, but if you have the extra space, it may be a good option.

 

Perhaps also allow carving up multiple cache drives into virtual volumes. Say you have two SSDs, a 250GB and a 500GB. Maybe carve up half the 500 to provide RAID1 for the 250, but still allow the other half of the 500 for non-fault-tolerant use. Or any other combination that would make sense, like perhaps two 500GB drives, each halved, with one pair in RAID1 and the other in RAID0 for speed. You would end up with 250GB fault-tolerant and still have 500GB to use for VMs that don't need real-time redundancy.

 

I actually think that for the array devices, using unRAID's parity disk to correct errors found from btrfs scrub would be a far better and more efficient way to deliver what you're asking.

Link to comment

Just got a response from someone else in the mailing list with some interesting feedback.

 

... And more to the point, expanding on that, on a single-device btrfs, data is single mode by default, so scrub for it (as opposed to metadata) is error-detect-only, as mentioned.

However, while the default mode separates data and metadata, and in that mode data was historically single-only (there's a patch to change this, adding the missing option), mixed-bg mode (the mkfs.btrfs --mixed option) puts data and metadata both in the same shared block-group type, which can then be either dup or single mode.

Obviously, duplicating data as well as metadata means you can only store half as much data, since it's all stored twice, but that will let scrub correct errors in cases where only one of the two copies fails to verify against its checksum and the other one does.

And as mentioned above, there's a patch in process now that will remove the single-device restriction of data (as opposed to metadata) to single mode, allowing the choice of dup mode for data as well as metadata.

Also, in addition to the mixed-mode workaround to get dup data, it's possible, although rather inefficient in performance terms, to partition a physical device so that two equal-sized partitions are made available as logical devices, and then mkfs.btrfs -d raid1 -m raid1 the two logical devices into a single btrfs, raid1 for both data and metadata. btrfs creates two copies that way, again letting scrub correct errors when only one of the two fails to verify against its checksum.

 

Sounds like this --mixed option would let you store redundant copies of the data on a single device, enabling scrub to repair errors.  This sacrifices 50% of the capacity on that device obviously, but provides the ability to perform a repair.
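If anyone wants to try either approach on a scratch device, it would look roughly like this (the device names and mount point are placeholders, and mkfs.btrfs will wipe whatever is on them):

# Option 1: mixed block groups with dup on a single device
mkfs.btrfs --mixed -d dup -m dup /dev/sdX1

# Option 2: split one disk into two equal partitions and mirror them
mkfs.btrfs -d raid1 -m raid1 /dev/sdX1 /dev/sdX2

# Either way, a scrub can then repair a block whose checksum fails,
# as long as the second copy is still good
mount /dev/sdX1 /mnt/test
btrfs scrub start /mnt/test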

Link to comment

Yup, agreed. Parity is positively the best solution for people who are warehousing lots of data and need their capacity to go far.

 

However, because btrfs allows you to select whether you want these functions on or off, adding those options to the GUI of unRAID would give great flexibility. Then, someone could, for example, use disk 5 as a btrfs dup disk and disks 1-4 in btrfs single. Any shares that contain sensitive data that just has to be right could be stored on disk 5, with everything else that's less critical on disks 1-4.

 

I'll probably keep everything in dup mode because that's the feature of btrfs that I like the most, but others will want to be able to choose based on their risk tolerance.
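For reference, switching an already-formatted single-device btrfs disk between profiles is done with a balance; something along these lines (the mount point is illustrative, and dup for data on a single device needs the newer btrfs code mentioned in the mailing-list quote above):

# Convert both data and metadata to dup (two copies on the same disk)
btrfs balance start -dconvert=dup -mconvert=dup /mnt/disk5

# ...or convert data back to single if you'd rather keep the capacity
btrfs balance start -dconvert=single /mnt/disk5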

Link to comment

There is one other possibility as well using reflink.  So for those unaware, file-level snapshotting on btrfs is as easy as this:

 

cp /path/to/source /path/to/dest --reflink

 

Doing this will cause the destination to simply be a snapshot of the source.  However, unlike traditional snapshots that require the source to remain for the destination to work, you can modify/delete the source in btrfs.  Here's the basic test:

 

root@unJON:/mnt/cache/appdata/tmp# echo 123 > source
root@unJON:/mnt/cache/appdata/tmp# cp source dest --reflink
root@unJON:/mnt/cache/appdata/tmp# echo 456 >> dest
root@unJON:/mnt/cache/appdata/tmp# echo 789 >> source
root@unJON:/mnt/cache/appdata/tmp# cat source
123
789
root@unJON:/mnt/cache/appdata/tmp# cat dest
123
456
root@unJON:/mnt/cache/appdata/tmp# rm source
root@unJON:/mnt/cache/appdata/tmp# cat dest
123
456

 

Another article I read (see here) talks about using this to recover from corruption. Here's the example:

 

Let's say there is a file foo. The file is snapshotted nightly with the command "snap.ocfs2 foo foo.$(date)". Today is 2008.11.18. Yesterday the file was corrupted, and so you want to go back to the version from two days ago. It's simple.

 

# unlink foo
# reflink foo.2008.11.18 foo

Now foo is an exact copy of the snapshot, and it takes no extra space to boot. The CoW properties of the refcount tree will take hold when you start modifying foo.

 

So their commands are a little different, but the result is the same.  Not sure if this would protect against bitrot corruption.  Thoughts?
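On btrfs, the equivalent of that recovery would presumably just use cp again; something like this (the snapshot filename follows the article's naming convention and is only illustrative):

# Replace the corrupted file with a reflink copy of last night's snapshot
rm foo
cp foo.2008.11.18 foo --reflink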

Link to comment

There is one other possibility as well using reflink.  So for those unaware, file-level snapshotting on btrfs is as easy as this:

 

cp /path/to/source /path/to/dest --reflink

 

Doing this will cause the destination to simply be a snapshot of the source.  However, unlike traditional snapshots that require the source to remain for the destination to work, you can modify/delete the source in btrfs.  Here's the basic test:

 

root@unJON:/mnt/cache/appdata/tmp# echo 123 > source
root@unJON:/mnt/cache/appdata/tmp# cp source dest --reflink
root@unJON:/mnt/cache/appdata/tmp# echo 456 >> dest
root@unJON:/mnt/cache/appdata/tmp# echo 789 >> source
root@unJON:/mnt/cache/appdata/tmp# cat source
123
789
root@unJON:/mnt/cache/appdata/tmp# cat dest
123
456
root@unJON:/mnt/cache/appdata/tmp# rm source
root@unJON:/mnt/cache/appdata/tmp# cat dest
123
456

 

Another article I read (see here) talks about using this to recover from corruption. Here's the example:

 

Let's say there is a file foo. The file is snapshotted nightly with the command "snap.ocfs2 foo foo.$(date)". Today is 2008.11.18. Yesterday the file was corrupted, and so you want to go back to the version from two days ago. It's simple.

 

# unlink foo
# reflink foo.2008.11.18 foo

Now foo is an exact copy of the snapshot, and it takes no extra space to boot. The CoW properties of the refcount tree will take hold when you start modifying foo.

 

So their commands are a little different, but the result is the same.  Not sure if this would protect against bitrot corruption.  Thoughts?

 

Mind f-ing blown

Link to comment

Mind f-ing blown

 

;-)

 

BTW, yes, this does work with virtual disk images for VMs as well ;-).  And the time it takes to create a reflink copy is measured in seconds, even with a 35GB+ size vdisk.  What was even more surprising is that this worked, even though I disabled COW for my virtual disk image file.
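As a concrete example of that vdisk case (the paths below are hypothetical placeholders), the same one-liner applies:

# Near-instant point-in-time copy of a VM's virtual disk on btrfs
cp /mnt/cache/domains/win10/vdisk1.img /mnt/cache/domains/win10/vdisk1-backup.img --reflink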

Link to comment

;-)

 

BTW, yes, this does work with virtual disk images for VMs as well ;-).  And the time it takes to create a reflink copy is measured in seconds, even with a 35GB+ size vdisk.  What was even more surprising is that this worked, even though I disabled COW for my virtual disk image file.

 

You would have a lot of fun playing around with EMC's XtremIO products [ http://xtremio.com/ ]. It's so fast when creating snapshots of our production servers and databases. Pretty similar concepts.

Link to comment

 

 

;-)

 

BTW, yes, this does work with virtual disk images for VMs as well ;-).  And the time it takes to create a reflink copy is measured in seconds, even with a 35GB+ size vdisk.  What was even more surprising is that this worked, even though I disabled COW for my virtual disk image file.

 

You would have a lot of fun playing around with EMC's XtremIO products [ http://xtremio.com/ ]. It's so fast when creating snapshots of our production servers and databases. Pretty similar concepts.

 

Yup. I used to work for an IT integrator that sold solutions like that from both EMC and NetApp.  Pretty amazing what a couple $100k can get you, eh?  Even nicer when we can bring similar benefits to our own users for a fraction of the cost.

Link to comment

This is a cool development, but it only solves the problem if you happen to run a scrub on your device before your foo.12.18.2008 file gets rotated off of the disk, right?

 

If you don't notice that the checksum fails before the good copy goes bye-bye, then having seven versions of a corrupted file is not going to help anyone. The nice thing about RAID parity is that you always have a second copy somewhere; it's not time-limited. The odds of both files corrupting on two independent disks before you run your next bi-weekly or monthly scrub are impossibly small. I don't have enough space to keep 14 or 30 days' worth of foo.(date) hanging around, and I don't want to run scrubs much more often than bi-weekly because of the consequences for disk life. So, unless I misunderstood this, it is cool, but it is not a replacement for a conventional btrfs setup.
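For what it's worth, if scrub frequency is the worry, scheduling one is trivial; a monthly cron entry might look like this (the pool path is an example, and depending on cron's PATH the btrfs binary may need its full path):

# Start a scrub of the cache pool at 03:00 on the 1st of every month
0 3 1 * * btrfs scrub start /mnt/cache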

Link to comment
  • 2 weeks later...

Seeing as I got no answer, I went and found Jon's discussion with the btrfs devs on the listserv. They claim that doing block-level rebuilds from parity is possible, but that it is a non-trivial task to accomplish.

 

Jon P made a statement about simply using parity data to restore an entire disk when a BTRFS checksum fails. That solution doesn't sound like it would be difficult to implement because that's what the parity in unRAID is already configured to be able to do, albeit under different circumstances. Jon admitted that this is not an ideal solution, but that it would be better than nothing.

 

I have to agree; it is better than nothing. If unRAID were able to rebuild my data for me when it failed a checksum, even if that means rebuilding an entire disk, I would start deploying unRAID servers. This is the only missing feature keeping me from becoming a customer.

 

Updates from the LT team, beyond those weeks-old posts on the listserv, would be appreciated.

Link to comment

I'm going to hazard a guess that while jon might be discussing it on the listserv, and shit who knows maybe even playing a little on a test box, LT's entire effort is being put towards finalizing 6.2 and what we all believe is dual parity support. So I wouldn't expect much movement on this front for a while.

 

In the meantime, if you want data rebuild capability you might look into the Checksum Suite plug-in, which includes PAR2 functionality that can both identify corrupt files and repair them. Being a community plug-in rather than built into unRAID, it would be understandable if you weren't willing to "deploy" this as a solution, if by "deploy" you mean as a VAR rather than for your own home usage.

 

http://lime-technology.com/forum/index.php?topic=43396.0
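For context, the underlying PAR2 tooling works roughly like this from the command line (the redundancy percentage and filenames are arbitrary examples):

# Create recovery data with 10% redundancy alongside the file
par2 create -r10 important.mkv

# Later: verify the file, and repair it from the .par2 data if corruption is found
par2 verify important.mkv.par2
par2 repair important.mkv.par2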

 

EDIT: Bwahaha, I see you are already well versed in Checksum Suite. Please excuse my ignorance :)

 

 

Link to comment

Hi!

Coming from my topic

http://lime-technology.com/forum/index.php?topic=44858

with minimal experience of anything other than Windows!

 

Looking for exactly this: the BTRFS self-healing feature.

 

One easy (I believe) solution would be to add the option for a "Safe Share" with a specified amount of space (with options to shrink/expand), implemented as a virtual RAID1 of two folders, either on the same disk or on another.

That way, we would be able to double-store only the most important stuff and get bit-rot protection plus disk failure protection.

 

On the other hand, this could also be done with other tools, like par2 with 100% redundancy!

 

Just got confused again :D

I think that it's impossible to combine btrfs RAID and unRAID's RAID.

Link to comment
