March 12, 2024

Hello,

This topic has come up in several other forum threads, but in my opinion some of my questions were never clearly answered, so I decided to investigate on my own. I do not agree that when bitrot is detected on the array, the affected data is lost unless we have a backup.

This is how I tested it:

- I added a rather small pendrive (32 GB) formatted with ZFS (uncompressed and not encrypted) to the array. I suppose Btrfs would work as well.
- I created a text file:

root@home:/mnt/disk2# cat 123.txt
abcdefgh_123_elo

- I intentionally broke one bit in this file. How? You cannot simply edit the file, because that gives you a modified file, not a corrupted one. Instead I worked on the raw disk data. I dumped a run of raw sectors from /dev/sdX with the Linux dd command (in my case sda is the pendrive):

dd if=/dev/sda of=/mnt/user/test/sector bs=512 count=2000000

2000000 * 512 bytes gives about 1 GB of raw disk data stored in a file named sector. For the following chunks I used dd's skip= option, set to the number of sectors already read.
- I used a hex editor to locate my file's content in the raw data (I used Vim, but others should work as well). I spotted it at around 9 GB.
- I manually changed only one bit in that sequence (byte 63 to 62 in hex, that is "c" to "b"), so the ASCII content became abbdefgh_123_elo.
- Then I ran a scrub:

root@home:~# zpool status -v
...
  pool: disk2
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:00:01 with 1 errors on Tue Mar 12 14:49:47 2024
config:

        NAME        STATE     READ WRITE CKSUM
        disk2       ONLINE       0     0     0
          md2p1     ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

        /mnt/disk2/123.txt

So basically I achieved a detected corruption of the 123.txt file.
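For anyone who wants to see the single-bit change in isolation, here is a minimal Python sketch of the "find the content in a raw dump and flip one bit" step. It works on an in-memory stand-in for the dump, not on a real device, and the helper name `flip_bit_in_image` is made up for illustration. It is not the procedure used in the post (which used dd and a hex editor), only a model of the same byte change.

```python
def flip_bit_in_image(image: bytearray, pattern: bytes,
                      offset_in_pattern: int, bit: int) -> int:
    """Find `pattern` in a raw image copy and flip one bit in place.

    Returns the absolute byte offset that was changed.
    (Hypothetical helper, for illustration only.)
    """
    start = image.find(pattern)
    if start < 0:
        raise ValueError("pattern not found in image")
    pos = start + offset_in_pattern
    image[pos] ^= 1 << bit  # XOR with a one-bit mask flips exactly that bit
    return pos


# Stand-in for a raw sector dump: zero padding around the file content.
content = b"abcdefgh_123_elo"
image = bytearray(b"\x00" * 64 + content + b"\x00" * 64)

# Flip the lowest bit of the third byte: 0x63 ("c") becomes 0x62 ("b").
pos = flip_bit_in_image(image, content, offset_in_pattern=2, bit=0)
corrupted = bytes(image[64:80])
print(pos, corrupted)
```

The point of going below the file system is visible here: the bytes change, but nothing updates the checksum that the file system stored for them, which is what lets a later scrub notice the difference.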
Just for confirmation:

- I ran an Unraid parity check with "Write corrections to parity" UNCHECKED, which reported 1 error during the check.
- I stopped the array, removed the disk from the array and started the array again; the emulated 123.txt file read back fine.
- I added the same pendrive to the array once again to rebuild it, and everything worked as expected.

So, the conclusion?

- Run a ZFS scrub (or the Btrfs equivalent) first to detect any possible errors.
- If errors are detected, you can run an Unraid parity check, but WITHOUT write corrections! If you enable them, you update the parity drive with the corrupted data.
- Rebuild the corrupted drive, possibly onto a new one.

Hope this helps some of you.

Regards

Edited March 13, 2024 by copper
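Why the "no write corrections" step matters can be shown with a toy model. The sketch below assumes single parity behaves like a byte-wise XOR across the data disks, which is a simplification, and the disk contents are invented. It compares emulating the rotted disk from untouched parity against emulating it after parity has been "corrected" to match the rotted data.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (toy model of single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Invented contents for three data disks, 16 bytes each.
disk1 = b"some other data!"
disk2 = b"abcdefgh_123_elo"
disk3 = b"yet another disk"
parity = xor_blocks([disk1, disk2, disk3])  # parity in sync before the bitrot

disk2_rotted = b"abbdefgh_123_elo"  # one bit flipped on disk2

# Case A: parity left untouched. Emulating disk2 from the other disks
# plus parity yields the original content.
emulated = xor_blocks([disk1, disk3, parity])

# Case B: parity "corrected" so that it matches the rotted disk2.
# The same emulation now reproduces the corruption.
parity_corrected = xor_blocks([disk1, disk2_rotted, disk3])
emulated_after = xor_blocks([disk1, disk3, parity_corrected])

print(emulated, emulated_after)
```

This only holds under the model's assumptions: parity was fully in sync beforehand and the flipped bit is on the disk being emulated.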
September 19, 2024

Thanks for running the timely experiment!! So never run parity checks with write corrections, since parity doesn't know right from wrong? Instead use ZFS to correct bit corruption. Now I wonder if there's any reason for parity checks if we're running ZFS... might as well remove the parity drive. The speed difference between ZFS scrubs and parity checks would be moot (not worth discussing) if both take <1 day for ~10-30 TB of HDD data.
September 19, 2024

ZFS drives in the parity array are single volume, and can't correct corruption. If you want ZFS to correct bit corruption you must use multi drive volumes with the appropriate RAID level in pools. Don't use ZFS in the parity array if you want bit correction.
November 29, 2024
Author

@JonathanM sure, you are right! My point was that when you have 2 copies of one file* (one on the data drive and a second on parity) and the original file on the drive fails due to bitrot or whatever, you can still restore this file from parity. You have no guarantee that this will work, but to have even a small chance of it working you must not overwrite your parity drive with the already broken file. So the best way to avoid such things is to run a scrub to detect broken files. If no broken files exist, go ahead and update the parity drive. If the scrub reports Permanent errors on a file, you can try to recover the file from the parity drive.

*I know those are not simple copies, but for ease of understanding let's pretend they are copies.
November 29, 2024

3 hours ago, copper said:
If scrub detects Permanent errors issue on file you can try to recover files from parity drive.

There is currently no way to restore a file from Unraid parity. Parity doesn't know which drive contains a bit (or more to the issue, possibly bits on multiple drives) that doesn't match, it only knows that a mismatch has occurred. It may be possible to write code that would iteratively "fail" one drive at a time virtually, and run a scrub through each iteration to see if one of the possibilities is more "correct" than the others, but ultimately there is no way without backup file comparison to be mathematically assured that you made the correct choice.

If corruption is found on a ZFS volume (or any file system) in the parity array, the only viable current option is to restore that file from backup.

Parity only deals with 1 specific failure mode, that is, the loss of the entire drive. If parity is NOT in perfect sync across all drives when that occurs, the rebuilt drive will be corrupt. Trying to use parity to deal with file system corruption is not a viable strategy.
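The "fail one drive at a time and scrub each result" idea mentioned above can be sketched in a few lines of Python. Everything here is an assumption for illustration: parity is modeled as a plain XOR, a per-disk SHA-256 stands in for the file system's own checksums and scrub, and `candidates_passing_checksum` is a made-up name. No such tool exists in Unraid.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (toy model of single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def candidates_passing_checksum(disks, parity, checksums):
    """Virtually fail each disk in turn and rebuild it from the rest.

    Returns the indices where the rebuilt content differs from what is
    currently on the disk AND matches the stored checksum.
    (Hypothetical helper; the checksum stands in for a scrub.)
    """
    hits = []
    for i, current in enumerate(disks):
        others = disks[:i] + disks[i + 1:]
        rebuilt = xor_blocks(others + [parity])
        if rebuilt != current and hashlib.sha256(rebuilt).digest() == checksums[i]:
            hits.append(i)
    return hits


# Invented healthy state: three data disks, parity in sync, checksums recorded.
originals = [b"some other data!", b"abcdefgh_123_elo", b"yet another disk"]
parity = xor_blocks(originals)
checksums = [hashlib.sha256(d).digest() for d in originals]

# One bit rots on the second disk.
disks = [originals[0], b"abbdefgh_123_elo", originals[2]]

hits = candidates_passing_checksum(disks, parity, checksums)
print(hits)
```

In this toy case only the rotted disk produces a rebuild that passes its checksum. A match is strong evidence rather than proof, and the model ignores the harder cases raised above, such as mismatched bits on more than one drive or parity that was already out of sync.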
March 18, 2025

On 11/29/2024 at 4:41 PM, JonathanM said:
If parity is NOT in perfect sync across all drives when that occurs, the rebuilt drive will be corrupt. Trying to use parity to deal with file system corruption is not a viable strategy.

Parity is supposed to be in sync, otherwise it is worthless! copper demonstrated that the correct information is present, but a mechanism for easily fixing the problem is absent. Combining Unraid's parity technology with ZFS/Btrfs error detection would be a huge selling point. We just need a utility to re-sync a specific file and not the whole drive. If the parity is not in sync, then 1) we don't lose anything more and 2) we learn about it and can fix it! Is there anything that I am missing?
March 18, 2025

4 hours ago, karateo said:
Is there anything that I am missing?

The value of the time needed to code vs. the actual value of the utility. It would only be useful in an extremely small number of cases, and all of those cases would be covered by a proper backup, which would be needed anyway for a much larger fraction of data loss episodes.

It's just not worth it right now to spend limited time (money) coding something where a solution already exists. If it is worth your time to work on coding it, nobody is saying to not do it.
March 18, 2025

Fair enough. It's not that implementing this is impossible; the necessary information is available. It's simply that, for now, it's not among lime-tech's priorities.
March 18, 2025
Community Expert

I am doubtful that many users would be able to use a complicated feature like that without getting themselves in more trouble.
March 18, 2025

The last time I had an error during a parity check was in 2021 (for a 24/7/365 array of 4-5 discs), and that turned out to be a failing disc. For me (I have a Xeon with ECC, I don't know if that matters) it's not necessary. If it happens, I just need to know which files are affected so I can restore them from a backup. To be fair, I hadn't realized that with a parity check I am also checking file integrity (assuming two bit rots on two discs at the same position is realistically impossible!).

Just to note that all of the above applies only in the case of a filesystem with checksumming that can identify the altered file (through a scrub). Unraid cannot identify which disc had the out-of-sync bit.
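A small Python sketch of that last point, again assuming single parity behaves like a byte-wise XOR (a simplification) and using invented disc contents: the mismatch a parity check sees is identical whether the flipped bit sits on the first disc, the last disc, or the parity itself, so the check alone cannot say where the problem is.

```python
from functools import reduce


def mismatch_offsets(disks, parity):
    """Byte offsets where the XOR of all data discs disagrees with parity."""
    computed = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*disks))
    return [i for i, (c, p) in enumerate(zip(computed, parity)) if c != p]


def flip(block, offset, bit=0):
    """Return a copy of `block` with one bit flipped at `offset`."""
    out = bytearray(block)
    out[offset] ^= 1 << bit
    return bytes(out)


# Invented contents, 8 bytes per disc, parity in sync.
disks = [b"disc-one", b"disc-two", b"disc-3!!"]
parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*disks))

# The same bit position rots in three different places.
rot_on_first = mismatch_offsets([flip(disks[0], 2), disks[1], disks[2]], parity)
rot_on_last = mismatch_offsets([disks[0], disks[1], flip(disks[2], 2)], parity)
rot_on_parity = mismatch_offsets(disks, flip(parity, 2))

print(rot_on_first, rot_on_last, rot_on_parity)
```

All three cases report the same offset, which is why a checksumming file system is needed on top to name the affected disc and file.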