
Now that 6.12 has ZFS, what are our options for recovering from bit rot


Recommended Posts

Even though bit rot or a bit flip is very rare, I would like to protect against it if possible. I saw Spaceinvader One's video about 6.12 and saw that he added a ZFS-formatted drive to the Unraid array (not a zpool). As I understand it, we now have 3 ways to detect bit rot: XFS + the File Integrity plugin, Btrfs, and ZFS.

 

When bit rot is detected, do any of these 3 options have a mechanism to recover from it? Like, could Unraid use the parity drive to recover the data on that block?

Edited by Swirl3208
Link to comment
30 minutes ago, Swirl3208 said:

When bit rot is detected, do any of these 3 options have a mechanism to recover from it? Like, could Unraid use the parity drive to recover the data on that block?

No, not for the array. The parity drive does not hold any data. You would be able to detect bit rot, but the only way to recover from it is to restore from backup (which you should have anyway :)). In a zpool, though, it would self-heal automatically when the data is read (or via a scrub), if it detects corruption.

Link to comment

Theoretically, for an array with 1 parity drive, we could recover from bit rot if only 1 drive has corrupted data for that block. Using the ZFS checksum mechanism, if we know the block on disk 1 is corrupted, then we can use the parity information from the other drives to rebuild the block. This idea is similar to how ZFS currently self-heals, like you said.

 

Quote

but the only way to recover from it is to restore from backup (which you should have anyway :))

 

Yes, everyone should have a backup, but not everyone can afford the setup for one. The backup should be a last-resort option. The reason we have a parity drive and these other checks is to recover faster without going to the backup; otherwise, what's the point of the parity drive? Currently, with XFS, if there's a parity check failure we can't tell which drive is failing the check, so we don't know which drive to fix with the parity data.

Link to comment
  • Swirl3208 changed the title to Now that 6.12 has ZFS, what are our options for recovering from bit rot
4 minutes ago, Swirl3208 said:

Theoretically, for an array with 1 parity drive, we could recover from bit rot if only 1 drive has corrupted data for that block. Using the ZFS checksum mechanism, if we know the block on disk 1 is corrupted, then we can use the parity information from the other drives to rebuild the block. This idea is similar to how ZFS currently self-heals, like you said.

No. Since you mentioned Spaceinvader One's videos, you should watch the video about parity. The parity is for recovery from a failed disk, not for recovery of corrupted data. As I said the parity drive does not hold any data so it's not possible for it to recover any corrupt data. IF it in fact DID hold any data, your theory would be possible.

 

In order to use the healing capabilities of ZFS, you will need a redundant pool like raidz/2/3 etc., or you'll need multiple copies of the data on the same disk if using a single-disk pool. But the parity disk would not be able to help you at all, like I already mentioned. The Unraid parity disk is not like regular RAID/ZFS parity.

Link to comment
16 minutes ago, strike said:

The parity is for recovery from a failed disk

 

This is a good point; I forgot about this scenario, which is definitely more likely than bit rot. I was too focused on bit rot in my last reply.

 

Quote

As I said the parity drive does not hold any data so it's not possible for it to recover any corrupt data.


I'm not quite understanding this part. If the parity drive can recover an entire drive, then it should be able to recover the data for 1 block within the drive. From what I understand, the parity drive is just an XOR of all the data drives. When an entire disk fails, Unraid rebuilds the disk by taking an XOR of the parity drive + the other drives. When a block on a disk fails, why wouldn't we be able to take an XOR of the block on the parity drive + the other drives to rebuild that block?

 

The parity calculation should be the same whether it's Unraid or ZFS. The difference is that Unraid stores all the parity data on 1 drive, whereas ZFS stripes the data across a row and stores the parity on different drives.
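To make the XOR relationship concrete, here is a toy sketch in Python (my own illustration, not Unraid's actual code): treat each "disk" as a list of byte values, compute parity as the bytewise XOR across all data disks, and recover any one missing disk by XORing parity with the survivors.

```python
from functools import reduce
from operator import xor

# Toy model of a single-parity scheme: parity is the bytewise XOR
# of all data disks.
def compute_parity(disks):
    return [reduce(xor, column) for column in zip(*disks)]

def rebuild_disk(surviving_disks, parity):
    # XOR of the parity with all surviving disks recovers the missing disk.
    return [reduce(xor, column) for column in zip(parity, *surviving_disks)]

disk1 = [0x0F, 0xAA, 0x01]
disk2 = [0xF0, 0x55, 0x02]
disk3 = [0xFF, 0x00, 0x03]
parity = compute_parity([disk1, disk2, disk3])

# Simulate losing disk1 entirely and rebuilding it from parity + disks 2-3.
assert rebuild_disk([disk2, disk3], parity) == disk1
```

The same XOR works whether you solve for the parity byte or for any single missing data byte, which is the property the rest of this thread keeps coming back to.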

 

References for zfs parity

Edited by Swirl3208
Broke up the quotes and replied to each sentence.
Link to comment
7 minutes ago, Swirl3208 said:

The difference is that Unraid stores all the parity data on 1 drive, whereas ZFS stripes the data across a row and stores the parity on different drives.

I think you answered your own question here. The Unraid array consists of multiple disks, but each disk has its own filesystem. The array just pools all the disks together so you're able to access them as one. So the data only exists on that single disk. Disk 1 has no idea about the data on disk 2, because each is a single disk with its own filesystem. And the parity disk has no idea about the data on disk 1 or 2 either, unless it checks all the blocks of all the other 8 disks in the array; only then can it recover a drive. And if one file on disk 1 is corrupted, that data does not exist anywhere else. It's not on the parity disk. It could theoretically be on disk 2 if you have a copy there as well, but that doesn't matter, because disk 1 doesn't know about the data on disk 2, so it can't recover its data from disk 2 anyway, other than by copying it back to disk 1. So if the data from disk 1 doesn't exist anywhere else, not even on the parity drive, where do you recover from? You can't, other than from a backup.

 

But yeah, watch spaceinvader one's video about the array and parity; he explains it much better than I do.

Link to comment
1 hour ago, strike said:

And the parity disk has no idea about the data on disk 1 or 2 either, unless it checks all the blocks of all the other 8 disks in the array; only then can it recover a drive. And if one file on disk 1 is corrupted, that data does not exist anywhere else.

 

@strike

Let's say that we have 1 parity + 8 disks in our array.

 

Scenario 1 with xfs + file integrity plugin:

If Unraid found that a file was corrupted on disk 1, why can't Unraid go to the exact spot the file starts and ends on the parity drive and disks 2-8 and do an XOR operation on all the bits in that range to recover the data on disk 1?

 

Scenario 2 with btrfs or zfs:

Alternatively, if our drives are formatted with Btrfs or ZFS, why can't Unraid go to the exact same block on the parity drive and disks 2-8 and do an XOR to recover the corrupted block on disk 1? (Although, since ZFS has dynamic block sizes, we might need to do scenario 1.)

 

I know this isn't how Unraid works today, but is this something that could possibly be implemented in the future?

 

32 minutes ago, JorgeB said:

ZFS can self-heal if you use it with a redundant pool, which you can also do with Unraid; when used in the array it can only detect, but not fix, any corruption.

 

@JorgeB

With today's tools, how do we recover if the File Integrity plugin or the filesystem detects rot?
 

One solution I can think of: if we detect a corrupted file on disk 1, we could proceed to rebuild the drive onto itself. This method uses the parity information from the parity drive + disks 2-8 to rebuild the entire drive. My question is asking why we can't just rebuild the small section of the drive that we know is corrupted instead of the entire drive.

Edited by Swirl3208
Link to comment
29 minutes ago, Swirl3208 said:

Why can't Unraid go to the exact spot the file starts and ends on the parity drive

I feel like I'm only repeating myself. There is no spot where the file starts or ends on the parity drive. The parity drive only holds the answer to the parity calculations for all the disks. And in order to get the answer, you must first ask the question, which requires all the other disks. So if one disk dies, we can ask the question with all the remaining drives and get the answer from the parity drive to calculate the correct data to rebuild. But when a file is corrupt, so is the answer/question, and it can't be answered. I don't think I can explain it in more ways than I already have, so I give up :)

 

29 minutes ago, Swirl3208 said:

I know this isn't how Unraid works today, but is this something that could possibly be implemented in the future?

What the future holds no one knows, but I doubt it.

Edited by strike
Link to comment
1 hour ago, JorgeB said:

From a backup.

... or, depending on the file type, you might be able to obtain a new copy from an online source.

 

That is why each user needs to decide for themselves how important any particular piece of data is, and how they would handle it being corrupted or lost. At the very least, anything important must be replicated somewhere: a local backup, an offsite backup, or an online backup (ideally all of these).

Link to comment
7 hours ago, strike said:

I don't think I can explain it in more ways than I already have, so I give up

 

I guess I'm having a hard time explaining what I have in my head as well. I'll try in 2 different ways: a high-level thought and a lower-level approach.

For the high-level thought:

Think about it this way: today, if 1 drive dies, Unraid can emulate it while it is offline and we can still grab files from the emulated disk. Then, to recover, we take the failed disk out and replace it with a new one, and Unraid starts reconstructing the entire disk. After this you have a completely normal array again. This process is described in the Normal Replacement and Rebuilding a drive onto itself guides from the Unraid support pages. However, reconstructing the disk does carry the risk of another drive failing.

 

When a corrupted file is detected on disk 1, Unraid could easily "emulate" the corrupted file from parity + disks 2-8, then rewrite the file to disk 1 as a correction mechanism (actually, in Unraid's case it doesn't even need to be written back to the same disk, only to the same share). Does that make sense?

 

The benefit I see from this is that you don't have to rebuild an entire disk, which reduces the risk of another drive failing in the process.

For the lower-level approach:

The Parity-Protected Array document states: "To rebuild the data on the newly replaced disk, we use the same method as before, but instead of solving for the parity bit, we solve for the missing bit."

 

Unraid knows that my_file.txt on disk 1 is corrupted. Unraid also knows that my_file.txt is on bits (or "columns") 123 to 1000. Unraid could theoretically go into all the other drives and recalculate bits 123-1000 on disk 1.

 

5 hours ago, itimpi said:

At the very least, anything important must be replicated somewhere

 

That's a very true point. I can probably afford to replicate the important stuff like documents. For this topic, I'm trying to see if we can recover as early in the process as possible.

Link to comment
54 minutes ago, Swirl3208 said:

Unraid knows that my_file.txt on disk 1 is corrupted. Unraid also knows that my_file.txt is on bits (or "columns") 123 to 1000. Unraid could theoretically go into all the other drives and recalculate bits 123-1000 on disk 1.

This is where the logic breaks down. The parity mechanism has no understanding of files and does not know that a particular file is corrupt, let alone which sectors on the drive are involved.

Link to comment
42 minutes ago, Swirl3208 said:

When a corrupted file is detected on disk 1, Unraid could easily "emulate" the corrupted file from parity + disks 2-8, then rewrite the file to disk 1 as a correction mechanism (actually, in Unraid's case it doesn't even need to be written back to the same disk, only to the same share). Does that make sense?

No, it doesn't make sense. Why do you assume the emulated disk holds a healthy file? Where does it magically pull this healthy file from when it doesn't exist anywhere else? In a rebuild, Unraid can recover all the data from the emulated disk, and that includes all the bits of a corrupted file as well. So if disk 1 has a corrupted file and you pull that disk, replace it with a new one, and rebuild, the new disk will still have the corrupted file.

Link to comment
3 hours ago, itimpi said:

This is where the logic breaks down. The parity mechanism has no understanding of files and does not know that a particular file is corrupt, let alone which sectors on the drive are involved.


As of today, Unraid's parity mechanism doesn't have this capability. But could it be implemented in the future, like I outlined above? Unraid could see which blocks are corrupt on a Btrfs/ZFS filesystem, then take an XOR of the bits on the parity + other disks to fix the corrupted blocks on the original drive.

 

3 hours ago, strike said:

Why do you assume the emulated disk holds a healthy file? Where does it magically pull this healthy file from when it doesn't exist anywhere else?


If a corrupted file was written to the drive, then yes, reconstructing it would result in a corrupted file.

 

I’m assuming the emulated disk holds a healthy file because bit rot / bit flip doesn’t update the parity. Bit rot / flip is a physical change on the disk and it happens outside of unraid’s control. The parity only gets updated when unraid is writing to the disk (which isn’t the case with bit rot / bit flip).

 

You keep mentioning that this file doesn't exist on any other drive. I get that. The parity information can be used to reconstruct the bits where the file lives. I think we are on different pages on how parity works and what it's used for. I'm curious, what do you think the parity is there for?
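The assumption being made here can be shown with a quick toy simulation in Python (a simplified single-parity XOR model, not Unraid's actual code): a silent bit flip on a data disk doesn't touch parity, so XOR of parity with the other disks still yields the pre-rot byte.

```python
from functools import reduce
from operator import xor

# Toy single-parity model: parity is the bytewise XOR of three data
# disks, computed at write time.
disk1 = bytearray([0x0F, 0xAA, 0x01])
disk2 = bytearray([0xF0, 0x55, 0x02])
disk3 = bytearray([0xFF, 0x00, 0x03])
parity = bytearray(reduce(xor, column) for column in zip(disk1, disk2, disk3))

original = disk1[1]

# Silent bit flip on disk1: it happens outside the software's control,
# so parity is NOT updated.
disk1[1] ^= 0b00000100

# Parity still encodes the pre-rot data, so XOR of parity with the
# other disks reproduces the original byte.
recovered = parity[1] ^ disk2[1] ^ disk3[1]
assert recovered == original
```

This only holds as long as nothing rewrote the parity byte after the flip (e.g. a correcting parity check), which is part of the objection raised later in the thread.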

Edited by Swirl3208
Link to comment
3 hours ago, Swirl3208 said:

I’m assuming the emulated disk holds a healthy file because bit rot / bit flip doesn’t update the parity. Bit rot / flip is a physical change on the disk and it happens outside of unraid’s control. The parity only gets updated when unraid is writing to the disk (which isn’t the case with bit rot / bit flip).

The emulated disk is calculated using the parity disk, which can also bit-flip with the same probability as a regular array data disk.

Link to comment
4 hours ago, Swirl3208 said:

I think we are on different pages on how parity works and what it's used for. I'm curious, what do you think the parity is there for?

It looks that way, yeah. The parity is there in case of disk failure as I mentioned earlier. It can't recover a corrupt file any more than it can recover a deleted file.

Link to comment
3 hours ago, tjb_altf4 said:

The emulated disk is calculated using the parity disk, which can also bit-flip with the same probability as a regular array data disk.


If the parity drive were formatted with Btrfs or ZFS, wouldn't each block have a checksum (according to this Btrfs document on checksums)?

 

So Unraid could know whether this particular block on the parity disk is corrupted as well. If the checksum doesn't match on the same block for both disk 1 and the parity drive, then it is definitely not recoverable (assuming we have 1 parity).
 

But if the checksum is good on the parity disk + disks 2-8 for that block, then we can safely reconstruct the block on disk 1.
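The repair flow being proposed could be sketched like this (an entirely hypothetical Python illustration, with CRC32 standing in for Btrfs/ZFS block checksums; nothing like this exists in Unraid today): verify the parity block and the other data blocks by checksum first, and only then XOR them together to rebuild the bad block.

```python
import zlib
from functools import reduce
from operator import xor

# Hypothetical per-block repair: a block carries its data plus a
# checksum (CRC32 here as a stand-in for filesystem checksums).
def make_block(data):
    return {"data": bytearray(data), "crc": zlib.crc32(data)}

def is_healthy(block):
    return zlib.crc32(bytes(block["data"])) == block["crc"]

def repair_from_parity(bad_block, parity_block, good_blocks):
    # Refuse to repair unless parity and all other data blocks verify;
    # with single parity, two bad copies are unrecoverable.
    if not is_healthy(parity_block) or not all(map(is_healthy, good_blocks)):
        raise RuntimeError("more than one copy is corrupt; restore from backup")
    repaired = bytearray(
        reduce(xor, column)
        for column in zip(parity_block["data"], *(b["data"] for b in good_blocks))
    )
    bad_block["data"] = repaired
    bad_block["crc"] = zlib.crc32(bytes(repaired))

d1 = make_block(bytes([1, 2, 3]))
d2 = make_block(bytes([4, 5, 6]))
parity = make_block(bytes(a ^ b for a, b in zip(d1["data"], d2["data"])))

d1["data"][0] ^= 0x80          # simulate bit rot on disk 1
assert not is_healthy(d1)      # the checksum detects it

repair_from_parity(d1, parity, [d2])
assert is_healthy(d1) and d1["data"] == bytearray([1, 2, 3])
```

The guard clause is the key design point: the checksum tells you *which* copy is bad, which is exactly the information plain XOR parity lacks on its own.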

 

2 hours ago, strike said:

The parity is there in case of disk failure as I mentioned earlier. It can't recover a corrupt file


This is where I disagree.

 

As of today, Unraid cannot recover a single file. You are right that today Unraid can only reconstruct an entire disk.

 

However I’m talking more about the possibility of implementing a recovery mechanism that utilizes the checksum (to validate the data is still healthy) and parity information (to rebuild the block) in the future so we can recover a small portion of the disk instead of the entire thing.

Edited by Swirl3208
Link to comment
39 minutes ago, Swirl3208 said:

This is where I disagree.

Then we have to agree to disagree. 

 

I'm talking about how the parity in the unraid array works today. 

 

If you want to discuss what you want the parity to be, then I agree with you. It would be nice to be able to recover a corrupted file from parity, but I don't see this changing for the Unraid array in the foreseeable future.

 

And to achieve what you want, you can just set up a redundant ZFS pool. Unraid now has the best of both worlds.

 

You can back up your Unraid array to a ZFS pool if you want. Then all your files are protected from bit rot, even if the Unraid array parity can't fix it. But this would just be a backup like any other backup, which is a must for important files anyway.

 

 

Link to comment
  • 1 month later...
On 6/24/2023 at 12:09 AM, Swirl3208 said:

However I’m talking more about the possibility of implementing a recovery mechanism that utilizes the checksum (to validate the data is still healthy) and parity information (to rebuild the block) in the future so we can recover a small portion of the disk instead of the entire thing.

 

I just want to say I don't think you're crazy. What you've outlined makes perfect sense to me as well. Successful recovery would have to assume that no bit rot had happened on the other disks in the same blocks, and that parity hadn't been updated during a parity check while the bit rot was present. In fact, if a system like you mentioned were implemented, it would be a good idea for the parity check to consult block checksums on the data disks before deciding to update parity when a difference is detected. The problem that I think others were trying to get at is that, currently, the parity part of the Unraid software is "dumb" and doesn't know anything about filesystems.

 

Seems to me like the recovery method you've described could be implemented as an add-on/plugin just by doing raw reads of the parity and data drives, once you've consulted the filesystem and partition info to get the file's actual location on the target disk. And it seems like it would be safest to write out a new copy of the file and delete the old one, rather than try to directly correct the flipped bits on disk. The problem is this would require precise knowledge of the structure of the XFS/ZFS/Btrfs filesystems to isolate just the data and not the metadata. If you tried to correct the data bits in place, the parity system would flip the corresponding bits on the parity drive, and parity would end up out of sync with the corrected data. (If I understand correctly, when writing to a disk, the parity system reads the current value on the data disk - which would have been flipped due to rot - and compares it to the new value, then flips the bit on parity if the value changed on the data disk. That's how it avoids needing to spin up all disks when you write to just one.)
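The read-modify-write parity update described in that parenthetical comes down to one line of XOR arithmetic (a generic single-parity technique, simplified from whatever Unraid actually does internally):

```python
# Read-modify-write parity update for a single-parity scheme: writing
# to one data disk only requires reading that disk and the parity disk,
# not spinning up the whole array.
old = 0b1010_1010          # current byte read back from the data disk
new = 0b1010_1110          # byte being written
parity_old = 0b0110_0011   # current byte on the parity disk

# XOR out the old value and XOR in the new one.
parity_new = parity_old ^ old ^ new

# Only the positions where old and new differ get flipped on parity.
assert parity_new == parity_old ^ (old ^ new)

# The pitfall from the post above: if rot already flipped a bit in
# `old` before the write, parity ends up consistent with the rotted
# value rather than the original data.
```

This is why an in-place "fix" of a rotted bit, routed through the normal write path, would desynchronize parity from the corrected data.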

Link to comment