How to implement BTRFS Checksumming with unRAID?


Noob


I know BTRFS includes checksumming natively. I know V6 unRAID includes BTRFS. If I select BTRFS as my file system and spread it across three data disks and a parity disk:

 

(1) do I have checksumming automatically? (i.e., is it turned on and working)

(2) how do I make unRAID verify the integrity of the data stored with BTRFS once checksumming is active? Is this done through a routine scrub command?


At present, checksumming is enabled automatically, and scrubs can be performed manually by clicking on the btrfs device or pool you wish to scrub, then clicking Scrub on its device page. There is no automated scrub routine implemented yet.

 

Also note that any share where "Enable COW" is set to "No" will not have checksumming on its data (neither will any virtual disks for VMs or the docker.img file).
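
For anyone who prefers the command line to the device page, a scrub can also be started with the `btrfs scrub start` command from btrfs-progs. Below is a minimal Python sketch that builds that command and, optionally, runs it. The mount point `/mnt/cache` and the helper names `scrub_command` and `run_scrub` are assumptions for illustration only, not something confirmed in this thread. `-B` keeps the scrub in the foreground until it finishes, and `-r` makes it read-only (report errors, do not try to repair).

```python
import shlex
import subprocess


def scrub_command(mount_point: str, read_only: bool = False) -> list[str]:
    """Build the argv for a foreground btrfs scrub of one mounted filesystem."""
    cmd = ["btrfs", "scrub", "start", "-B"]  # -B: do not background
    if read_only:
        cmd.append("-r")  # -r: read-only mode, do not attempt corrections
    cmd.append(mount_point)
    return cmd


def run_scrub(mount_point: str, read_only: bool = False) -> str:
    """Run the scrub (needs root and a mounted btrfs) and return its output."""
    result = subprocess.run(
        scrub_command(mount_point, read_only),
        capture_output=True,
        text=True,
        check=False,  # a scrub that finds errors may exit non-zero
    )
    return result.stdout + result.stderr


# Only print the command here; /mnt/cache is a hypothetical mount point.
print(shlex.join(scrub_command("/mnt/cache")))
```

Calling `run_scrub("/mnt/cache")` on the server itself would actually start the scrub, so the sketch only prints the command by default.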


Okay, I got this loud-and-clear from your post: If I select BTRFS with COW turned on, checksumming will occur automatically.

 

However, I'm still not clear on using the checksum data to verify file integrity. I understand that scrubs have to be run manually, but if I run one will it check the existing data against the stored checksum? Also, if it finds a corrupt file, will it rebuild from the parity disk or just notify me about the inconsistency?

 

Thanks for your reply. I was worried nobody on the forum would know how BTRFS was being implemented. I appreciate you reaching out to me.


On the device page in the webgui, click the help icon and read the text about btrfs scrub for instructions on its use.

 

Corrupt data in btrfs can be repaired outside of parity.  That's the beauty of copy on write.

 

I need to find some time to add a bunch more documentation to the wiki.  We are definitely a little light in that area.


I went to the "Main" tab, which lists all of my devices, but I don't see any instructions in the Help blurbs about scrubbing or BTRFS. Are you referring to a different page?

 

Your second comment makes it sound like if I use BTRFS, then I don't need a parity disk for file corrections. I do, however, still need a parity disk to protect me from disk failures, correct?

 

Last question: Is there any advantage to using BTRFS with unRAID? Does my parity disk basically perform checksumming and have the ability to correct unreadable or error-filled files?


You have to click on one of your btrfs devices from the main tab, then click the help on that specific device page.

 

Checksums do not protect you from device failure, and parity does not protect you from data corruption. The two provide completely independent benefits to data protection, but if I had to choose between the two, I would take parity over checksums, simply because a disk failing isn't just a possibility; with time, it is guaranteed to happen.
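
To make the difference concrete, here is a small Python sketch. It assumes simple byte-wise XOR parity across three equally sized "disks" and SHA-256 as the checksum; both are stand-ins chosen for illustration, and the function names are made up. It shows that parity rebuilds a disk that is known to be missing, while checksums are what tell you which disk silently changed.

```python
import hashlib
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def digest(block: bytes) -> str:
    """Checksum of one block (SHA-256 used purely as an example)."""
    return hashlib.sha256(block).hexdigest()


disks = [b"movie-01", b"photo-02", b"music-03"]
parity = xor_parity(disks)
checksums = [digest(d) for d in disks]

# Case 1: disk 1 dies. Parity plus the surviving disks rebuilds it.
rebuilt = xor_parity([disks[0], disks[2], parity])
print("rebuilt failed disk:", rebuilt == disks[1])

# Case 2: one byte silently flips on disk 2.
rotted = list(disks)
rotted[2] = b"music-0\x00"

# Parity notices that something is inconsistent, but not where.
print("parity mismatch:", xor_parity(rotted) != parity)

# Checksums pinpoint the damaged disk.
bad = [i for i, d in enumerate(rotted) if digest(d) != checksums[i]]
print("disks failing checksum:", bad)

# Once the bad disk is known, parity can reconstruct its content.
repaired = xor_parity([rotted[0], rotted[1], parity])
print("repaired from parity:", repaired == disks[2])
```

The last step is only a thought experiment about combining the two mechanisms; nothing in this thread says unRAID does that automatically.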

 


Ha! I didn't have any BTRFS devices yet, so that was the issue. I'm reformatting the FS on my test array now and will look through those help blurbs when I can. Thank you :)

 

I agree with your risk assessment above, but since unRAID gives me an option to have both I'm going to use BTRFS and parity! Ah ha! I feel like a wizard :)


Test it very heavily before you rely on it. Very heavily. Do a LOT of reading and writing.

 

I had my cache drive formatted as BTRFS and it constantly corrupted. It would barely work for a couple of weeks before requiring a re-format. After the second or third time I got smart and switched to XFS, and it's been fine for a year now. unRAID creates a Docker image file, which it mounts as BTRFS, and my Docker image keeps corrupting too. It has started to show corruption within a few days of being re-created.

 

The scrub button is useless because it never fixes the corruption. This is what I get as a result.

 

"corrected errors: 0, uncorrectable errors: 10, unverified errors: 0"

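A note on reading that summary line: as far as I understand it, "uncorrectable" means the checksum check caught bad data but there was no second good copy to repair from, which is what you would expect on a single device without a redundant data profile. If you want to check scrub results from a script, a minimal Python sketch for pulling the three counters out of a line in exactly the format quoted above could look like this. The function name is made up, and other btrfs-progs versions may print the summary differently.

```python
import re

# Matches the summary format quoted in the post above.
SUMMARY = re.compile(
    r"corrected errors: (\d+), uncorrectable errors: (\d+), unverified errors: (\d+)"
)


def parse_scrub_summary(text: str) -> dict[str, int]:
    """Extract the three error counters from scrub output text."""
    match = SUMMARY.search(text)
    if match is None:
        raise ValueError("no scrub error summary found")
    corrected, uncorrectable, unverified = (int(g) for g in match.groups())
    return {
        "corrected": corrected,
        "uncorrectable": uncorrectable,
        "unverified": unverified,
    }


line = "corrected errors: 0, uncorrectable errors: 10, unverified errors: 0"
counts = parse_scrub_summary(line)
if counts["uncorrectable"] > 0:
    print("scrub found damage it could not repair:", counts)
```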
 

 


When was the last time you tried it? I can tell you that older or lower-quality devices (especially SSDs) can sometimes be problematic with btrfs. I have not had the issues you are mentioning, and in fact I'd say those are extremely rare nowadays and probably an indication of faulty hardware.


Hrm. Well, there are some pretty big name storage companies using BTRFS in production now on their RAID arrays. I don't know which version of BTRFS is included with the current unRAID release, but as a basic file system it is considered very stable now. There are still some limitations with RAID5 and 6, but unRAID wouldn't ever encounter those because of the way it...well, unRAIDs things.

 

So, as long as the BTRFS implementation is current, I don't expect to have any issues with it.

 

Also, for those who use unRAID to manage vast pools of rarely accessed data (media comes to mind), how do you prevent bitrot without checksumming? I'm totally fine with formatting back to XFS, but not without some kind of bitrot protection to layer over it. Any suggestions?


I personally use "bunker". The only downside is that there is currently no pretty interface, so you have to use the CLI. Also, if you want automatic checks every week or month, you have to set up a cron job. More info here:

http://lime-technology.com/forum/index.php?topic=37290.0

 

There is also Checksum Suite which is probably more beginner friendly.

http://lime-technology.com/forum/index.php?topic=43396.0
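
For readers wondering what this kind of tool does in principle, here is a minimal Python sketch of the general idea: record a hash for every file once, then re-hash later and report anything that no longer matches. This is not how bunker or Checksum Suite are actually implemented; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media files do not fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its current hash."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or no longer match the manifest."""
    problems = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or file_digest(p) != expected:
            problems.append(rel)
    return problems
```

A scheduled job would call `verify` against a saved manifest and alert on a non-empty result. Like any plain checksum approach, this only detects a changed file; it cannot repair one.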


I just took a look at Checksum Suite (because I'd like to dodge CLI when possible) and it seems to do everything but file repair. It will MD5 your folder and check at an assigned interval that the data is still intact. I didn't see anything in the documentation about repairing the files after it finds a mismatch, though. That's a pretty big missing feature, in my opinion. Knowing when a file is broken is a huge step, but being able to actually put it back together is the kind of security I'm looking for.

My personal issue with BTRFS is resilience to bad acts and the recovery from them, not how it works when everything is humming along. The previous mainstay of unRAID, ReiserFS, was extremely tolerant of abuse, to the point of mostly not minding if the server was power cycled mid-write. Most of the time the next boot would be a little long as the journal was replayed, and in the worst case an unmountable drive could be recovered flawlessly with the reiserfsck tools. Even overwriting significant portions of a ReiserFS drive has resulted in good recovery of the remaining data.

 

In contrast, I was troubleshooting a lockup issue, and after a power cycle my BTRFS cache drive was unmountable. Subsequent repair attempts showed the journal was corrupted, preventing even a rescue mount. I had to clear the journal (which corrupted several in-progress files) and then was able to rescue mount and recover my data to another disk.

 

I don't blame BTRFS for the lockup or the power cycle, but the recovery process I personally went through, and the ones I have helped others on the board with, left a bad taste in my mouth. I have not had to test XFS resilience yet, but I haven't seen any bad experiences here, so I changed from BTRFS to XFS. That means I don't have cache pool ability, only a single drive, but I'm OK with that since I've been backing up my cache drive to an array drive weekly for years now.


http://lime-technology.com/forum/index.php?topic=43396.msg423820#msg423820

Checksums (any checksum, be it md5, sha or blake2) only detect corruption (silent or otherwise). There is par2 within the plugin, which will repair corruption. The par2 side of things is functional, but the GUI is a little rough around the edges (real life keeps getting in the way of me finishing that; I should be able to do it over the holidays).

 

It sounds like par2 is what you would be looking for. I'll be keeping my eye on this once squid has a chance to clean up the GUI.


A quick search about Par2 shows that it is not as well supported as BTRFS is. There is a single developer working on the project since Par1 was retired in 2010. Par2 is considered stable and is being widely used, though, so it has some community testing to back it up.

 

The BTRFS development effort is huge and has been speeding along for the last two years now. It seems like there are major improvements monthly, so it won't be long before BTRFS is up to snuff.

 

I don't know if any Lime Tech employees will see this thread, but I would like to know how current the BTRFS implementation is and how frequently updates get rolled out. The LT Wiki says that they "are committed to staying up-to-date on developments." If that is still true, then I'm okay using BTRFS.


@JonP is an LT employee.


I had the cache formatted BTRFS about 10 or 11 months ago. My Docker BTRFS image has already corrupted this week after I re-created it last week. My cache drive is an older WD drive, but it has worked without any issue for a few years, first with ReiserFS and then with XFS after BTRFS didn't work, so I'm not seeing how it could be faulty hardware.

 

I won't be trying it on a data drive again any time soon. Besides it not working for me, I also agree with the comments about how little information on recovery is available. ReiserFS may be old but is it ever robust and easy to recover from.

 


Just because older filesystems work fine doesn't mean it's not hardware related. Sitting on the btrfs mailing list has proven this quite consistently. And the fact that your btrfs loopback is getting corrupted as well is even more proof that you have some funky hardware issues.  The first step to recovery is admitting you have a problem ;-)

 

Also, btrfs in a cache pool is not the same as btrfs on a single disk. So comparing reliability between btrfs pool and xfs single is like comparing zfs to ntfs.  Apples and oranges.

 

Totally understand why you don't trust btrfs on YOUR hardware, but there are countless configs I've done now where it works totally fine. And like Noob has said, multiple distros not only support it now, but have it as their default filesystem of choice.

 

Oh and the recovery information is readily available. Btrfs wiki has plenty of info. We need to assemble better docs on our own wiki too, but not sure what else you are looking for as far as docs from btrfs go.


The first step to recovery is admitting you have a problem ;-)

 

Which is exactly what you need to do with BTRFS.

 

 

 

Again, I've never had a problem with it, and neither have the majority of our users, or we'd hear more about it on a regular basis. I've configured it in multiple RAID types and with both HDDs and SSDs. I've tested subvolumes, reflinks (file-level snapshots), and subvolume snapshots, and I've experimented pretty extensively with btrfs repair and recovery. And there are solutions out there that are 100% btrfs only.

 

A few bad apples don't make the orchard worth torching.


We need to assemble better docs on our own wiki too, but not sure what else you are looking for as far as docs from btrfs go.

 

I would love to see examples in the wiki of what btrfs drive corruption looks like when it happens in unRAID, and then the typical steps to attempt to fix said corruption (for example, if a drive gets corrupted while writing data during a sudden power loss). It would also be awesome to see another example of data corruption in unRAID and the steps needed for btrfs to "rebuild" the corrupt data from its checksums (or however btrfs rebuilds corrupt/bit-rotted data).
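
On the "however btrfs rebuilds" part: a checksum by itself cannot rebuild anything. As far as I understand it, btrfs can only repair a block when a second copy exists (for example a DUP or RAID1 profile) and that copy still matches the stored checksum. The toy Python sketch below illustrates that idea only. It is not btrfs code, the class name is made up, and `zlib.crc32` is used as a stand-in for the CRC32C checksum that btrfs uses by default.

```python
import zlib


class MirroredBlock:
    """One data block stored twice, with a checksum recorded at write time."""

    def __init__(self, data: bytes) -> None:
        self.copies = [bytearray(data), bytearray(data)]
        self.checksum = zlib.crc32(data)  # stand-in for btrfs's CRC32C

    def read(self) -> bytes:
        """Return a copy that matches the checksum and repair any bad copies."""
        good = next(
            (bytes(c) for c in self.copies if zlib.crc32(c) == self.checksum),
            None,
        )
        if good is None:
            # Every copy is damaged: detection works, repair is impossible.
            raise OSError("uncorrectable: no copy matches the checksum")
        for i, copy in enumerate(self.copies):
            if zlib.crc32(copy) != self.checksum:
                self.copies[i] = bytearray(good)  # overwrite the bad copy
        return good


block = MirroredBlock(b"important data")
block.copies[0][0] ^= 0xFF  # simulate bit rot in the first copy
print(block.read())         # served from the good copy, first copy repaired
```

With only one copy, the same check would end in the "uncorrectable" branch, which matches the scrub result reported earlier in the thread.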
