
Single File restore


Recommended Posts

6 minutes ago, itimpi said:

As far as I know no data is written to drives when starting the array in Maintenance mode.

 

5 minutes ago, trurl said:

No disks are mounted in Maintenance mode so nothing is written


So a single-file restore with the steps I mentioned is actually possible right now, without any modifications. The key requirement is to mount the disks with explicit options that prevent all writes: "ro,norecovery" with btrfs. xfs, reiserfs, and ext* have similar options, and though there is some debate about whether they truly prevent every write, wrapping the disk in a read-only loop device ("ro,loop") guarantees it for any block device.
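
A minimal sketch of what I mean (device and mount point names here are illustrative, not necessarily what unraid uses):

# btrfs supports an explicit read-only, no-log-replay mount:
mount -o ro,norecovery /dev/md1 /mnt/restore

# For xfs/ext*/reiserfs, wrapping the disk in a read-only loop device
# guarantees no writes reach it regardless of the filesystem driver:
losetup --find --show --read-only /dev/md1   # prints e.g. /dev/loop0
mount -o ro /dev/loop0 /mnt/restore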

 

There is an additional layer if the array is encrypted, so one would also have to verify that opening the LUKS container doesn't write any data, just in case.
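
As a hedged sketch, cryptsetup can set up the dm-crypt mapping read-only, which should rule out writes to the underlying device (names again illustrative):

# -r/--readonly creates a read-only dm-crypt mapping
cryptsetup open --readonly /dev/md1 md1_restore
mount -o ro,norecovery /dev/mapper/md1_restore /mnt/restore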

 

Thanks, this is definitely good to know and can potentially save days of rebuilding under very specific scenarios.

Link to comment
10 hours ago, robobub said:

Thanks, this is definitely good to know and can potentially save days of rebuilding under very specific scenarios.

Not unless you know what needs to be restored and where it sits on the disk - both of these are filesystem concepts. A corrupted filesystem can still have a valid parity, so if you restore from parity, you still end up with a corrupted filesystem. If you restore based on filesystem repair tools, that has nothing to do with unraid itself, and parity doesn't play a part in it.

So how do you plan to mix the two concepts?

Link to comment

  

5 hours ago, trurl said:

How are you going to know what file to restore? How would you restore it?

 

49 minutes ago, apandey said:

Not unless you know what needs to be restored and where it sits on the disk - both of these are filesystem concepts. A corrupted filesystem can still have a valid parity, so if you restore from parity, you still end up with a corrupted filesystem. If you restore based on filesystem repair tools, that has nothing to do with unraid itself, and parity doesn't play a part in it.

So how do you plan to mix the two concepts?

 

Ah, I see where the communication breakdown happened. I'll modify the first requirement to be:

  • know a particular drive had a failure in particular files

Yes, you don't get this by default in unraid, but tools exist that give you this, and I would certainly use them.

 

In my case recently, it was btrfs (and its fsck/associated tools) that alerted me to the specific files and folders with errors. The file integrity plugin that I use would likely have done so as well. I grabbed those files from backup.
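
For reference, a sketch of the btrfs commands that surface this kind of corruption (the mount point is illustrative):

btrfs scrub start -B /mnt/disk4    # -B runs in the foreground and prints a summary
btrfs device stats /mnt/disk4      # per-device read/write/corruption error counters
dmesg | grep -iE 'csum|checksum'   # the kernel log identifies the affected inodes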

 

If I had known about the option I put forth in this thread, I would've at least investigated whether requirement 2 (know parity has the right bits) was satisfied by just mounting the emulated drive as true read-only and grabbing a checksum of the file. In my case, the issue ended up being bad RAM, so I would guess it's equally likely that the error was propagated to both the parity and the drive as to just one of the two.
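
A sketch of the check I have in mind, assuming the emulated disk is mounted read-only at /mnt/emulated and a known-good copy (or a stored checksum) exists:

# On btrfs, reading the file verifies each block's checksum as a side effect;
# an I/O error here would already prove parity does not hold the right bits.
sha256sum /mnt/emulated/path/to/file
sha256sum /backup/path/to/file     # compare against the known-good copy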

 

I also did know that requirement 3 (the rest of the disk was good) was satisfied after extensive testing of the disk in another system and discovering the root cause.

Link to comment
3 hours ago, robobub said:

Yes, you don't get this by default in unraid, but tools exist that give you this, and I would certainly use them.

 

All those tools work at filesystem level. So yes, they can potentially tell you that the filesystem is not in a good state and some file is corrupted. This DOES NOT mean that parity is invalid. It also DOES NOT mean that parity holds anything related to the valid non-corrupted file

 

4 hours ago, apandey said:

A corrupted filesystem can still have a valid parity, so if you restore from parity, you still end up with a corrupted filesystem

I had covered this case explicitly. If the filesystem tells you that a file is corrupted, and parity is valid, you will only get the same corrupted state of the filesystem when you rebuild the disk

 

3 hours ago, robobub said:

In my case recently, it was btrfs (and its fsck/associated tools) that alerted me to the specific files and folders with errors. The file integrity plugin that I use would likely have done so as well. I grabbed those files from backup.

If you grab those files from backup, you don't have to rebuild a disk. Simply restore from backup

 

3 hours ago, robobub said:

I would've at least investigated whether requirement 2 (know parity has the right bits) was satisfied by just mounting the emulated drive as true read-only and grabbing a checksum of the file

In what situation would you presume that parity will have valid uncorrupted data but the fs does not? That simply means that parity is not in sync, and we have no way of knowing how badly out of sync it is. All bets are off at this point. It is very unlikely that parity is broken at the same time as the filesystem being broken, and there is no way of knowing by how much parity is off

 

3 hours ago, robobub said:

In my case, the issue ended up being bad RAM, so I would guess it's equally likely that the error was propagated to both the parity and the drive as to just one of the two

Yes, and that is the real communication breakdown. By design, a valid parity reflects the state of the writes on disk, even if those writes lead to a corrupted file on disk. A parity that is known to be out of sync gives no way of knowing how far off it is. It is completely independent of any filesystem management.

 

Perhaps you are looking for a system like zfs, which tackles redundancy and filesystem concepts in unison, rather than trying to achieve that with the unraid array, which works very differently.

 

 

Edited by apandey
fix typos
Link to comment

Indeed, we still have a communication breakdown. I'll try to clarify again.

 

On 2/12/2023 at 4:22 AM, apandey said:

All those tools work at filesystem level. So yes, they can potentially tell you that the filesystem is not in a good state and some file is corrupted. This DOES NOT mean that parity is invalid. It also DOES NOT mean that parity holds anything related to the valid non-corrupted file

 

I had covered this case explicitly. If the filesystem tells you that a file is corrupted, and parity is valid, you will only get the same corrupted state of the filesystem when you rebuild the disk

No, you are covering a different case from the scenario I laid out. In your scenario, the parity is synchronized with the corrupted filesystem. It is numerically valid, but it does not satisfy my condition 2: that the parity has the correct bits, as in uncorrupted.


I explicitly did not say whether or not the parity is invalid in this scenario, because it depends on the point of reference. In this scenario, the parity is synchronized with the uncorrupted filesystem, and out of sync with the physical corrupted filesystem.

 

On 2/12/2023 at 4:22 AM, apandey said:

If you grab those files from backup, you don't have to rebuild a disk. Simply restore from backup

In my original comment's scenario, restoring from backup incurs an undesirable cost: the remaining offsite backup that didn't fail is annoying to access because it's on a tape drive, in a different geographic location, in the cloud with expensive restore costs, etc.

 

On 2/12/2023 at 4:22 AM, apandey said:

In what situation would you presume that parity will have valid uncorrupted data but the fs does not?

There are various scenarios where a write can be successful on one disk and not on another. The one I had thought up was that the connection to an externally connected SAS drive could be disrupted while the connection to the parity drive stays intact. And then there is my actual scenario of RAM corruption: any particular computation can be corrupted, and the parity computation is separate from writing the block metadata or the data itself. You could also have SATA bus or CRC errors due to a bad cable. I stated these conditions up front for my scenario; if this was your big issue, you could've brought it up in the beginning. I made no claims about the likelihood of this scenario; in fact, in my first comment I stated it's probably rare. That doesn't mean rare cases should be ignored.

 

On 2/12/2023 at 4:22 AM, apandey said:

That simply means that parity is not in sync, and we have no way of knowing how badly out of sync it is. All bets are off at this point. It is very unlikely that parity is broken at the same time as the filesystem being broken, and there is no way of knowing by how much parity is off

 

Perhaps you are looking for a system like zfs, which tackles redundancy and filesystem concepts in unison, rather than trying to achieve that with the unraid array, which works very differently.

The filesystem I already mentioned, btrfs, checksums both the data and the metadata of every write. So yes, you can with very high likelihood know whether your parity is "correct" by emulating the disk, and also know, to an extent, how "off" your physical filesystem is. And you'll get similar results with the file integrity plugin and other filesystems: once you notice your checksums are off on the physical disk, you can mount the emulated disk and see if the checksum there is correct. This, of course, requires that you haven't run a parity check that writes corrections to parity, something those who use that plugin are explicitly advised to avoid.

 

There are plenty of advantages of unraid over zfs for my and others' use cases; why would I move to zfs if I can get some protection by adding additional layers on top of unraid's parity?

 

 

Link to comment
42 minutes ago, robobub said:

In this scenario, the parity is synchronized with the uncorrupted filesystem, and out of sync with the physical corrupted filesystem

There is nothing in how unraid parity works that can tell us that. If it's a hypothetical scenario, sure, but in practice, how will you know this is the case? The only communication breakdown is that we are talking hypothetical vs real.

 

When you start proposing things qualified with "high likelihood" or "to an extent", they sound unimplementable to me. What you proposed about knowing parity state from filesystem state is not knowable, because the two don't interact with each other in any way.

 

Quote

why would I move to zfs if I can get some protection by adding additional layers on top of unraid's parity?

I don't think you can, and you haven't shown you can. zfs provides guarantees, and tools based on those guarantees, not just tools which can be used based on guesses about the validity of data. Doing anything similar would need a filesystem to be written, which unraid is not.

 

We are going round in circles, so unless you have a practical way to implement this scenario, we should stop here

 

Edited by apandey
Link to comment

  

2 hours ago, apandey said:

There is nothing in how unraid parity works that can tell us that. If it's a hypothetical scenario, sure, but in practice, how will you know this is the case? The only communication breakdown is that we are talking hypothetical vs real.

 

When you start proposing things qualified with "high likelihood" or "to an extent", they sound unimplementable to me. What you proposed about knowing parity state from filesystem state is not knowable, because the two don't interact with each other in any way.

 

I don't think you can, and you haven't shown you can. zfs provides guarantees, and tools based on those guarantees, not just tools which can be used based on guesses about the validity of data. Doing anything similar would need a filesystem to be written, which unraid is not.

 

We are going round in circles, so unless you have a practical way to implement this scenario, we should stop here

 

At this point it's clear you aren't reading what I write. I told you exactly how you will know: checksums, which are built into btrfs.

 

Let me change my language then: if the btrfs filesystem says the emulated drive from parity is good, then it's good, not just "highly likely"; the only reason I used that terminology is to be mathematically precise. The file integrity plugin will also get you to essentially the same place, in a less real-time fashion with more steps. If you read all those same bits and the checksum comes out the same, then for all practical purposes (barring an essentially impossible random hash collision), the data is good. There is no guessing here; please read up on cryptographic hash functions.

 

As for "to an extent" that was more of a commentary on knowing "how off" your physical filesystem is. You can't really know how much of the block (or file in the plugin's case) but well, it's not really important how damaged your physical filesystem is if your emulated parity drive with checksums is good. Yes, integrating in those pieces together like zfs has advantages, particularly in terms of usability, automation, and ease, but all the information is still there.

 

We're not going in circles; I've been explaining the same scenario since my first comment. I'm more than willing to continue to educate you on this as long as you want to respond. I've already laid out a practical implementation that works, given the other comments here stating when data is and is not written from the unraid side. You don't have to use these utilities to safeguard your data or streamline the restoration process if you don't want to.

Edited by robobub
Link to comment
4 hours ago, robobub said:

At this point it's clear you aren't reading what I write. I told you exactly how you will know: checksums, which are built into btrfs.

Since it's clear I am not following you, I will drop off this conversation now

I don't think what you are saying will work, but you can surely convince others who can understand it better than me.

Good luck

Link to comment

So it turns out the issue I stated earlier (RAM that went bad) was actually something else: after some more googling, a bad SATA cable or PSU. Since replacing the RAM and verifying it with memtest, some writes were submitted that were ignored, and the SATA link had to be reset, so this actually fits my scenario more precisely than just bad RAM. However, there is indeed a problem with not having the checksumming and parity integrated together, as apandey's intuition suggested, at least on a copy-on-write filesystem (the default for BTRFS). There is still a solution I used to recover without a full rebuild, but it's advanced.

 

Some more details: while I was able to recover the particular file that had this corruption (which manifested as an "Input/Output Error"), parity is still out of sync. Why? Not from the recovery process; it was already out of sync. The emulated drive did not have those BTRFS checksum errors when accessing the file, and the array's stats showed 0 in the write column for all drives during the recovery (and a similar number of reads on each drive even though only the emulated device was mounted, which shows emulation was working). But the physical drive did have those BTRFS checksum errors.

 

The problem: updating or deleting the file on a copy-on-write filesystem does not modify the corrupted sectors. As I mentioned earlier, while parity had the correct bits and correctly reflected the written file, parity was out of sync because the physical drive had incorrect bits. When updating or deleting the file, BTRFS by default will place the new data in a new sector unless the file was created with the chattr attribute that disables COW. That means the physical sectors that used to hold the data, while now unreferenced by the BTRFS filesystem, still hold bits that are out of sync with parity. So if one of the other drives ever needed to be rebuilt, the data computed from those out-of-sync sectors would be incorrect.
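
For illustration, this is how the no-COW attribute is set and checked (path illustrative); note it only takes effect for files created after the flag is set, or for empty files:

chattr +C /mnt/disk4/appdata    # new files created here inherit NOCOW
lsattr -d /mnt/disk4/appdata    # a 'C' in the flags confirms it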

 

Eventually BTRFS will reclaim that space, which could be sped up via a balance or defrag, but that's not really acceptable. The solution is to tell unRAID to parity-sync the specific sectors. You can do this with dd or other advanced low-level tools after looking up the physical locations associated with the file by inspecting the BTRFS filesystem metadata, but I'll strongly recommend against it unless you know what you are doing. I was able to get it working with dd, btrfs-progs, and help from #btrfs on IRC.

 

Some snippets of the errors follow. When trying to access the particular file, an "Input/Output Error" is returned.

 

Quote

Feb 14 23:14:26 t1000 kernel: ata4.00: exception Emask 0x10 SAct 0x80e0000f SErr 0x4890000 action 0xe frozen
Feb 14 23:14:26 t1000 kernel: ata4.00: irq_stat 0x04400040, connection status changed
Feb 14 23:14:26 t1000 kernel: ata4: SError: { PHYRdyChg 10B8B LinkSeq DevExch }
Feb 14 23:14:26 t1000 kernel: ata4.00: failed command: WRITE FPDMA QUEUED
Feb 14 23:14:26 t1000 kernel: ata4.00: cmd 61/40:00:b8:c2:2a/05:00:01:00:00/40 tag 0 ncq dma 688128 out
Feb 14 23:14:26 t1000 kernel:         res 40/00:a0:f8:b0:2a/00:00:01:00:00/40 Emask 0x10 (ATA bus error)
Feb 14 23:14:26 t1000 kernel: ata4.00: status: { DRDY }
<repeated>
Feb 14 23:14:26 t1000 kernel: ata4: hard resetting link
Feb 14 23:14:32 t1000 kernel: ata4: link is slow to respond, please be patient (ready=0)
Feb 14 23:14:36 t1000 kernel: ata4: COMRESET failed (errno=-16)
Feb 14 23:14:36 t1000 kernel: ata4: hard resetting link
Feb 14 23:14:42 t1000 kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 14 23:14:42 t1000 kernel: ata4.00: configured for UDMA/133
Feb 14 23:14:42 t1000 kernel: ata4: EH complete
Feb 15 05:20:41 t1000 kernel: ata4.00: exception Emask 0x10 SAct 0x20010 SErr 0x400100 action 0x6 frozen
Feb 15 05:20:41 t1000 kernel: ata4.00: irq_stat 0x08000000, interface fatal error
Feb 15 05:20:41 t1000 kernel: ata4: SError: { UnrecovData Handshk }
Feb 15 05:20:41 t1000 kernel: ata4.00: failed command: WRITE FPDMA QUEUED
Feb 15 05:20:41 t1000 kernel: ata4.00: cmd 61/28:20:08:53:8f/05:00:4d:01:00/40 tag 4 ncq dma 675840 out
Feb 15 05:20:41 t1000 kernel:         res 40/00:20:08:53:8f/00:00:4d:01:00/40 Emask 0x10 (ATA bus error)
Feb 15 05:20:41 t1000 kernel: ata4.00: status: { DRDY }
Feb 15 05:20:41 t1000 kernel: ata4.00: failed command: READ FPDMA QUEUED
Feb 15 05:20:41 t1000 kernel: ata4.00: cmd 60/10:88:30:58:8f/03:00:4d:01:00/40 tag 17 ncq dma 401408 in
Feb 15 05:20:41 t1000 kernel:         res 40/00:20:08:53:8f/00:00:4d:01:00/40 Emask 0x10 (ATA bus error)
Feb 15 05:20:41 t1000 kernel: ata4.00: status: { DRDY }
Feb 15 05:20:41 t1000 kernel: ata4: hard resetting link
Feb 15 05:20:47 t1000 kernel: ata4: link is slow to respond, please be patient (ready=0)
Feb 15 05:20:51 t1000 kernel: ata4: COMRESET failed (errno=-16)
Feb 15 05:20:51 t1000 kernel: ata4: hard resetting link
Feb 15 05:20:57 t1000 kernel: ata4: link is slow to respond, please be patient (ready=0)
Feb 15 05:20:58 t1000 kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 15 05:20:58 t1000 kernel: ata4.00: configured for UDMA/133
Feb 15 05:20:58 t1000 kernel: ata4: EH complete

 

root@t1000:/boot/logs# dmesg | tail
[  390.786359] BTRFS warning (device dm-3): csum failed root 5 ino 1477 off 4214784 csum 0xa73833e3 expected csum 0xb3fde932 mirror 1
[  390.786363] BTRFS error (device dm-3): bdev /dev/mapper/md4 errs: wr 0, rd 0, flush 0, corrupt 6227, gen 0
[  390.786369] BTRFS warning (device dm-3): csum failed root 5 ino 1477 off 4218880 csum 0x1c54baf8 expected csum 0x4c56e317 mirror 1
[  390.786374] BTRFS error (device dm-3): bdev /dev/mapper/md4 errs: wr 0, rd 0, flush 0, corrupt 6228, gen 0
[  390.786380] BTRFS warning (device dm-3): csum failed root 5 ino 1477 off 4222976 csum 0x32ab5ee9 expected csum 0x024fede7 mirror 1
[  390.786384] BTRFS error (device dm-3): bdev /dev/mapper/md4 errs: wr 0, rd 0, flush 0, corrupt 6229, gen 0
[  390.786391] BTRFS warning (device dm-3): csum failed root 5 ino 1477 off 4227072 csum 0xfb12230a expected csum 0x3cf93529 mirror 1
[  390.786395] BTRFS error (device dm-3): bdev /dev/mapper/md4 errs: wr 0, rd 0, flush 0, corrupt 6230, gen 0
[  390.786402] BTRFS warning (device dm-3): csum failed root 5 ino 1477 off 4231168 csum 0x7416f297 expected csum 0x47879011 mirror 1
[  390.786406] BTRFS error (device dm-3): bdev /dev/mapper/md4 errs: wr 0, rd 0, flush 0, corrupt 6231, gen 0

 

I did the steps that I outlined, and after re-mounting with the disk back in the array, a parity check has shown no sync errors so far.

 

Steps (WARNING: you will lose data unless you know what you are doing and how to navigate low-level filesystem metadata)

  • stop array
  • (optional) backup /boot/config
  • select "no device" for disk with error
  • start array in maintenance mode
  • mount device with `-o ro,loop,norecovery`
  • Copy the file in question to somewhere not on the disk/array. I put it in /tmp (RAM-backed), but /boot or an unassigned device can also work
  • stop array
  • Tools -> New Config, preserve all configurations (some settings like shutdown timeout get reset; you can also restore /boot/config)
  • start array with "Parity is valid"
  • write 0's to the sectors with sync errors
    • For btrfs, you have to look up the logical offsets of each extent associated with the file, and zero those extents for their full size (see the sketch after this list)
  • (optional) start a parity check; there should be no corrections, but you can uncheck "write corrections" to see if you messed something up.
    • There is some irony in performing this step, as it causes essentially the same wear/workload as rebuilding, at least for HDDs, where reads and writes cause similar wear
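
To be concrete, here is a heavily hedged sketch of the zeroing step with made-up numbers; every offset must come from your own filesystem's metadata, and a mistake here destroys data:

# 1. List the file's extents (offsets are in btrfs logical address space):
filefrag -v /mnt/disk4/path/to/file

# 2. Translate a btrfs logical address to a raw device offset
#    (btrfs-map-logical ships with btrfs-progs):
btrfs-map-logical -l 123456789 /dev/md4    # logical address is illustrative

# 3. Zero that byte range through the md device so unraid updates parity as
#    the zeros are written (seek/count are illustrative, in units of bs).
#    On an encrypted array, write through the /dev/mapper device instead so
#    the dm-crypt layer is applied; it still flows through md and parity.
dd if=/dev/zero of=/dev/md4 bs=4096 seek=30144 count=1280 conv=notrunc,fsync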
Edited by robobub
Link to comment
