Bitrot detection, through filesystem or software



One of the main features that some other storage solutions offer but Unraid does not (as far as I know) is some kind of bitrot detection. This is something I would prioritize for future Unraid releases if I were the one building it.

 

Hope to see some solution to this in the future!

Link to comment
1 hour ago, Kilrah said:

There's the "File Integrity" plugin. 

Thank you, very interesting. How heavy is this on a massive array? I take it that it may require a lot of reads/writes and CPU cycles to hash every file? Once a file has been hashed, are later runs faster, or does it always take the same amount of time?

 

53 minutes ago, JorgeB said:

It does if you use btrfs as the filesystem.

Also interesting. I use btrfs for my cache because I have two cache devices and that was the recommended setup when I started using Unraid. I have never had any issues with it.

 

What are the upsides vs downsides with btrfs? I have a lot of trust in the encrypted xfs filesystem because I have all the tools needed to mount and rescue data in case of total disaster (if both parity drives break and the array goes down, I can still mount each single disk and get the data that way). Not sure if that is possible with btrfs.

Edited by je82
Link to comment
4 minutes ago, je82 said:

What are the upsides vs downsides with btrfs?

Upsides:

Mainly checksum and snapshot support.

 

Downsides:

Not as resilient as xfs, especially with bad hardware, and recovery in case of serious corruption might be more difficult. That said, in my experience (I have around 200 btrfs filesystems, all but about a dozen of them single device), single-device filesystems like the ones used in the array are more resilient against corruption than multi-device filesystems. Not against a disk failure, obviously; for that you use parity in the array.

Link to comment

Isn't there enough hardware level support in hard drives where they detect and correct bitrot from a sector perspective? I vaguely recall a post by Limetech (maybe?) that went into all the various levels of protection that is actually built into the storage hardware systems.

Link to comment
9 minutes ago, JorgeB said:

Bitrot is extremely rare, and the drives have error correction, but it can happen. IMHO checksums are mostly useful when, for example, an issue occurs during a disk rebuild, like errors on another disk, and you can see whether any files were affected and which ones.

 

Interesting. Yes, I see that it would be very useful, in case errors are detected during a rebuild, to see what was actually corrected or changed. I wonder if it is worth the extra load to have these checksums stored. I have many files on my array, over 20 million, and I believe that may be a little heavy lifting. But perhaps I am going about it the wrong way? Perhaps the number of files does not matter, only the size of each file? The bigger the file, the longer its hash takes to compute, or am I completely wrong here?

 

Anyway, I may do some tests to see if I can run the plugin just for the sake of it; it's nice to have that little extra intelligence of checksums in case something was changed. I actually added a little script to my rsync backup routine that checks the checksums of 100 random files across my array that are never supposed to change. Why? Well, it's a crude attempt at some minimal protection against cryptolocker-type attacks: if any of those files have changed, the backup script will not run its routine, which possibly saves me from backing up cryptolocked files. There may be a more sophisticated way of doing this, but I am far from sophisticated, so it works for me ;)
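
A minimal sketch of the canary-file check described above, assuming GNU coreutils (sha256sum, shuf) and findutils are available. The function names and the manifest location are hypothetical and are not taken from the original backup script.

```bash
#!/bin/bash
# Hypothetical helpers: names and paths are illustrative only.

# Record SHA-256 hashes for up to N randomly chosen files under a directory.
# Keep the manifest outside the directory being sampled.
build_canary_manifest() {
    local dir="$1" manifest="$2" count="${3:-100}"
    find "$dir" -type f -print0 \
        | shuf -z -n "$count" \
        | xargs -0 -r sha256sum > "$manifest"
}

# Return 0 only if every file listed in the manifest still matches its hash.
canaries_intact() {
    local manifest="$1"
    sha256sum --check --quiet "$manifest" > /dev/null 2>&1
}
```

A backup script could then guard its rsync call with something like `canaries_intact /path/to/manifest || exit 1`, so a changed or missing canary stops the run before anything is copied.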

Link to comment
7 minutes ago, je82 said:

The bigger the file, the longer its hash takes to compute, or am I completely wrong here?

When the checksums are done by the filesystem, like zfs or btrfs, they are done block by block, not by file. When done by, for example, the File Integrity plugin, they are done file by file; the resulting hashes are always the same size, and very small, so they fit in the extended attributes.
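
To illustrate the file-by-file case, here is a small sketch assuming GNU sha256sum and, for the optional storage step, setfattr from the attr package. The attribute name user.sha256 is made up for this example and is not necessarily what the File Integrity plugin uses.

```bash
#!/bin/bash
# Print only the hex SHA-256 digest of a file. The digest is always
# 64 hex characters, no matter how large the file is.
file_hash() {
    sha256sum < "$1" | cut -d ' ' -f 1
}

# Store the digest in an extended attribute on the file itself.
# Assumes setfattr is installed and the filesystem supports user xattrs;
# "user.sha256" is a hypothetical attribute name.
store_hash_xattr() {
    setfattr -n user.sha256 -v "$(file_hash "$1")" "$1"
}
```

Hashing time still grows with file size because every byte has to be read; only the stored result stays small.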

 

10 minutes ago, je82 said:

There may be a more sophisticated way of doing this, but I am far from sophisticated, so it works for me

There are various ways. I, for example, use snapshots; they are read-only and cannot be modified by that kind of attack.

Link to comment
53 minutes ago, je82 said:


I actually added a little script to my rsync backup routine that checks the checksums of 100 random files across my array that are never supposed to change. Why?

 

For files that are never supposed to change, look into marking them as immutable with chattr +i filename. The only way the file can be changed is by first removing the immutable attribute with chattr -i filename.

Link to comment
On 8/3/2022 at 8:00 PM, BRiT said:

 

For files that are never supposed to change, look into marking them as immutable with chattr +i filename. The only way the file can be changed is by first removing the immutable attribute with chattr -i filename.

 

That's a feature that could easily be implemented in the Dynamix File Manager plugin. It already allows chown and chmod operations. The ability to twiddle the immutability bit in the GUI would be very useful, especially if the file's icon were to change to indicate that it has been set.

 

Link to comment

This is what I've been running for a while now. I created different scripts with the User Scripts plugin to call it, and it works well. It uses chattr, but it's a bit more advanced than that.

 

 

This is some code I wrote to call this script to lock my Movies, and I have another one to lock my TV shows. I of course have a couple to unlock my TV shows or Movies too.

 

#!/bin/bash
#noParity=true
#arrayStarted=true
# Lock every file in the Movies share via the no_ransom script.
/mnt/cache/appdata/scripts/no_ransom.sh --lock-files 'yes' --media-shares 'Movies' --include-extensions '*.*' --debug 'yes'

# Timestamp for the notification: date plus 24-hour time (HH:MM).
stamp="$(date +%D-%H:%M)"
echo "Sending Notification"
/usr/local/emhttp/plugins/dynamix/scripts/notify -e "$stamp Movies Locked" -d "$stamp Movies Locked" -i "normal"
echo "."
echo "."
echo "done"

 

So basically I put Bin-Hex's code on my SSD and then I use a User Scripts entry to call it. I run it on my media on the 15th of every month and get a neat little message to remind me. I've tried deleting files on purpose; the delete appears to work, but after a refresh I see that my files are still there.

 

 

Link to comment
Quote

One of the main features that some other storage solutions offer but Unraid does not (as far as I know) is some kind of bitrot detection. This is something I would prioritize for future Unraid releases if I were the one building it.

 

My thoughts...

 

1) Figure out if the parity error is in parity or in data. Don't leave users to figure out if they should restore the data from parity, repair parity, or recover from a backup. There are tools today to automate this.

 

2) Snapshot RAID (like SnapRAID) is phenomenal.  I'm really liking that I can simply snapshot a folder, not an entire drive.  I can also do snapshots at different intervals.  With snapshots, I can also undelete a file.  This detects checksum errors, does a "backup" via parity calculation, saves storage for multiple drives (via parity calculation, not compression), and can do a restore of a file, even at a point of time depending on your snapshot date.  You can also store your parity offline, external, or cloud.  I'm surprised this technology is not more popular or a plugin.  <--- This really NEEDS to become a plugin!

 

3) Silent error detection (bitrot or whatever causes files to change).  

 

4) I have frequent false positives with the Dynamix File Integrity (DFI) plugin when doing manual checks. So now I'll use the plugin to give me the error, but then manually check my other checksums for verification. I get enough false parity errors and false DFI errors that I no longer consider them credible. How do I know? Because my external BLAKE3 checksums validate, and so do BTRFS scrubs.

 

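5) BTRFS scrubs only detect a problem but do not correct it, at least on single-device filesystems like the array disks, where there is no redundant copy to repair from.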

 

6) Don't like that Unraid is weak at correcting problems.

 

7) Want something at the file level, not block level.

 

Unraid + SnapRAID would be a great integrated platform.

 

 

 

Link to comment
19 minutes ago, Jaybau said:

1) Figure out if the parity error is in parity or in data. Don't leave users to figure out if they should restore the data from parity, repair parity, or recover from a backup. There are tools today to automate this.

Link?

 

How do you know which bit is wrong when all you know is at least one of them is wrong? Any of the data disks or parity disks could have a bit flipped, how do you pinpoint which one it is, when all you know is the sum is odd when it's supposed to be even?
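
A small worked example of that point, using bash arithmetic on made-up byte values (one byte per disk, single XOR parity). It only shows that a flipped bit on a data disk and the same flipped bit on the parity disk produce an identical mismatch, so parity alone cannot say which disk is wrong.

```bash
#!/bin/bash
# XOR all arguments together (single parity: the "sum" must come out even).
xor_all() {
    local acc=0 v
    for v in "$@"; do
        acc=$(( acc ^ v ))
    done
    echo "$acc"
}

# Made-up data bytes, one per disk.
d1=0xA5
d2=0x3C
d3=0x0F
parity=$(xor_all "$d1" "$d2" "$d3")

# A healthy stripe XORs to zero when parity is included.
healthy=$(xor_all "$d1" "$d2" "$d3" "$parity")

# Flip one bit (value 4) on disk 2, or on the parity disk instead.
bad_data=$(xor_all "$d1" $(( d2 ^ 4 )) "$d3" "$parity")
bad_parity=$(xor_all "$d1" "$d2" "$d3" $(( parity ^ 4 )))

echo "healthy=$healthy bad_data=$bad_data bad_parity=$bad_parity"
# prints: healthy=0 bad_data=4 bad_parity=4
```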

Link to comment
9 minutes ago, JonathanM said:

Link?

 

https://forums.unraid.net/search/?q=parity error

https://www.reddit.com/r/unRAID/search/?q=parity errors

 

Quote

How do you know which bit is wrong when all you know is at least one of them is wrong? Any of the data disks or parity disks could have a bit flipped, how do you pinpoint which one it is, when all you know is the sum is odd when it's supposed to be even?

 

BTRFS, ZFS, and SnapRAID have presumably solved this. Instead of comparing data to parity, compare to a known presumed reliable stored hash. I don't believe Unraid has the metadata to perform this logic, so neither Unraid nor the user knows what to do.

Link to comment
6 minutes ago, Jaybau said:

Instead of comparing data to parity, compare to a known presumed reliable stored hash. 

The first hurdle is figuring out which files are implicated by a specific parity error. That's not an insignificant challenge. Not insurmountable, but not easy either. Mapping a raw sector address to the file it contains, repeating that process for every device in the parity calculation, querying each respective file system for possible hash data, etc, etc.

 

There are no tools that I'm aware of to automate this, as you say.

Link to comment
1 minute ago, JonathanM said:

The first hurdle is figuring out which files are implicated by a specific parity error. That's not an insignificant challenge. Not insurmountable, but not easy either. Mapping a raw sector address to the file it contains, repeating that process for every device in the parity calculation, querying each respective file system for possible hash data, etc, etc.

 

There are no tools that I'm aware of to automate this, as you say.

 

It's possible Unraid may need to evolve their parity algorithm or integrate other algorithm choices (snapraid should be simple).

 

Link to comment
On 8/3/2022 at 12:00 PM, BRiT said:

 

For files that are never supposed to change, look into marking them as immutable with chattr +i filename. The only way the file can be changed is by first removing the immutable attribute with chattr -i filename.

 

Any concerns using chattr on the "user" folder (with the links/inode)?  Or do I need to use chattr on the physical drive?  Any issues with balance, moving, or other Unraid operations?

Link to comment
2 hours ago, JonathanM said:

The first hurdle is figuring out which files are implicated by a specific parity error. That's not an insignificant challenge. Not insurmountable, but not easy either. Mapping a raw sector address to the file it contains, repeating that process for every device in the parity calculation, querying each respective file system for possible hash data, etc, etc.

Combining BTRFS integrity checking with the unRAID parity system is a feature I have long wished for, but I am well aware of how dauntingly complex the implementation would be. That said, seeing this brought up again, and your comment specifically, made me wonder how much of a hurdle that first part is. After some digging I think there may be ioctls for that functionality: GETFSMAP for XFS and maybe BTRFS_IOC_LOGICAL_INO for BTRFS. It's only the first of many hurdles, but I have spent enough time down this particular rabbit hole for today.

Link to comment
2 hours ago, Jaybau said:

 

Any concerns using chattr on the "user" folder (with the links/inode)?  Or do I need to use chattr on the physical drive?  Any issues with balance, moving, or other Unraid operations?

 

From my perspective, if you run chattr on files, it will not allow you to move or delete them. That's the reason I run it, but I only run it on items that are in their permanent spot on the array. I run it on shares using that script I posted above.

 

In my experience you cannot run chattr directly on shares; you have to run it on the drives. But again, that script I posted above gets around that.

Link to comment
3 minutes ago, primeval_god said:

Combining BTRFS integrity checking with the unRAID parity system is a feature I have long wished for, but I am well aware of how dauntingly complex the implementation would be. That said, seeing this brought up again, and your comment specifically, made me wonder how much of a hurdle that first part is. After some digging I think there may be ioctls for that functionality: GETFSMAP for XFS and maybe BTRFS_IOC_LOGICAL_INO for BTRFS. It's only the first of many hurdles, but I have spent enough time down this particular rabbit hole for today.

 

I think a solution can be very simple with at least SnapRAID.  

 

I don't know if Unraid is saying that if you want BTRFS integrity+parity or SnapRAID, then find another platform, or if Unraid will change their mind and begin R&D. I don't know if Unraid doesn't see the need, doesn't see the value, finds it too costly, or doesn't want to. Unraid is great, but having the functionality of SnapRAID or BTRFS would be evolutionary.

 

You could have BOTH realtime and snapshot RAID...easily.  That's what I'm doing.  Unfortunately it is not integrated with a GUI, schedule, configuration, notifications, updates, support, etc. 

 

SnapRAID is very powerful and extremely simple, and probably a solution most people would want.  Snapshots can be safer too since there's no data and parity changes happening during a crash/outage.  You can also put your parity anywhere.  You can save multiple parity snapshots too.

 

With SnapRAID, I plan on doing some sort of 3-2-1 backup strategy.  

 

If I'm misguided, please let me know.

 

 

 

Link to comment
4 hours ago, Jaybau said:

Any concerns using chattr on the "user" folder (with the links/inode)?  Or do I need to use chattr on the physical drive?  Any issues with balance, moving, or other Unraid operations?

 

It doesn't work on the user filesystems, at least it didn't when I tried it 5 years ago. So use it on the actual filesystems such as /mnt/disk#/share/directory/file
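
A minimal sketch of finding the per-disk copies of a file that is known by its user share path, assuming the /mnt/disk# layout mentioned above. The function name and the optional second argument (an alternative mount root, handy for trying it out) are hypothetical.

```bash
#!/bin/bash
# List the per-disk copies of a path given relative to the user share root,
# e.g. "Movies/film.mkv". mnt_root defaults to /mnt.
disk_paths_for() {
    local rel="$1" mnt_root="${2:-/mnt}" candidate
    for candidate in "$mnt_root"/disk[0-9]*/"$rel"; do
        # An unmatched glob stays literal, so confirm the path really exists.
        [ -e "$candidate" ] && printf '%s\n' "$candidate"
    done
    return 0
}
```

Each printed path could then be passed, as root, to chattr +i (or chattr -i to unlock), for example in a `while IFS= read -r p; do chattr +i "$p"; done` loop.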

Link to comment
2 hours ago, Jaybau said:

I don't know if Unraid is saying that if you want BTRFS integrity+parity

I thought you can already get this by using BTRFS as your filesystem in your data drives. That way your array drives are parity protected and the contents of your files have Metadata checksums for the file contents.

Link to comment
28 minutes ago, BRiT said:

I thought you can already get this by using BTRFS as your filesystem in your data drives. That way your array drives are parity protected and the contents of your files have Metadata checksums for the file contents.

To an extent you do. What you don't have is the ability to use parity data to recover from a file corruption found by BTRFS, or to use filesystem checksums to determine where a parity invalidation is located (par1, par2, or a data drive).
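
As a toy illustration of what that missing link would look like, here is a sketch with one made-up byte per "disk", a stored checksum per block (using cksum), and single XOR parity. It is not how Unraid, BTRFS, or SnapRAID actually work; it only shows the idea that a checksum mismatch can point at the bad disk, after which parity can rebuild that block.

```bash
#!/bin/bash
# Toy model: one byte per "disk"; the stored checksum is the CRC from cksum.
block_sum() { printf '%s' "$1" | cksum | cut -d ' ' -f 1; }

data=(165 60 15)                          # made-up block values
sums=()
for b in "${data[@]}"; do sums+=("$(block_sum "$b")"); done
parity=$(( data[0] ^ data[1] ^ data[2] ))

data[1]=$(( data[1] ^ 4 ))                # simulate silent corruption on index 1

# Find the block whose current checksum no longer matches the stored one.
bad=-1
for i in "${!data[@]}"; do
    [ "$(block_sum "${data[i]}")" = "${sums[i]}" ] || bad=$i
done

# Rebuild the bad block from parity and the remaining good blocks.
if [ "$bad" -ge 0 ]; then
    fixed=$parity
    for i in "${!data[@]}"; do
        [ "$i" -eq "$bad" ] || fixed=$(( fixed ^ data[i] ))
    done
    data[bad]=$fixed
fi
echo "bad disk index=$bad repaired value=${data[1]}"
# prints: bad disk index=1 repaired value=60
```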

 

3 hours ago, Jaybau said:

I think a solution can be very simple with at least SnapRAID.  

Personally I have no more interest in SnapRAID than I do in ZFS or a pure BTRFS solution. I am very happy with unRAID (the disk pooling system, not the OS, though I am also happy with the OS) and would not trade any of its great features, like realtime parity and independent disk file systems, for any other solution. That said, bitrot resistance (whether it addresses a realistic problem or not) would be the cherry on top, and the underlying data to make it happen is already there, just waiting for someone more clever than myself to figure out how to reach across the layers of the storage stack and bring it together.

Edited by primeval_god
Link to comment
