Sigma207 Posted November 1, 2022

Yes, I know RAID is not backup. Everything that is irreplaceable is backed up using the 3-2-1 rule. This mistake will just cost me many hours reinstalling containers and recovering from backups.

My cache pool consists of two SATA SSDs in RAID 1. One of the cache drives kept throwing errors in the log, so I swapped in an M.2 drive I had and let that run until I could get around to looking into it further. Some forum posts suggested that type of error could be caused by the SATA controller on the motherboard. My attempted fix was to use a mini-SAS to 4x SATA cable on the spare port of my LSI card and then immediately swap the "bad" SATA SSD back into the cache pool. Since it is RAID 1, it should just rebuild, as it did when I swapped in the M.2.

Except it didn't. It didn't rebuild because Unraid changed the identifier of drive one (the good one) from sdn to sdc. I didn't notice that at first, so I started up the array. After swapping the drives back to the motherboard SATA controller, it went back to sdn, but it still shows up as unmountable. Nothing has been formatted. I tried the New Config tool in Unraid, but that didn't help either.

Any ideas on how to get that data back? I am a noob, so I hope I explained everything well enough and that I am just missing some simple step.
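A side note on why the sdn-to-sdc jump can happen at all: sdX letters are assigned in detection order, not tied to the physical drive. A minimal hedged sketch for checking the stable identifiers instead (paths are the standard Linux udev layout, which Unraid also exposes):

```shell
# Hedged sketch: sdX letters are assigned in detection order at boot/hotplug,
# so moving a drive from the onboard SATA controller to an LSI HBA can shuffle
# them (sdn becoming sdc, as above). The persistent names live under
# /dev/disk/by-id and are what pool assignments should be matched against.
list_stable_ids() {
    if [ -d /dev/disk/by-id ]; then
        # one symlink per device: stable id -> current sdX/nvme node;
        # filter out per-partition entries to keep the list short
        ls -l /dev/disk/by-id/ | grep -v -- '-part' || true
    else
        echo "no /dev/disk/by-id on this system"
    fi
}
list_stable_ids
```

Comparing this listing before and after moving cables makes it obvious which physical drive got renamed.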
JorgeB Posted November 1, 2022

Please post the diagnostics.
Sigma207 Posted November 1, 2022 (Author)

Quote: "Please post the diagnostics."

Here you go: arceus-diagnostics-20221101-1743.zip
JorgeB Posted November 2, 2022

sdn was one member of the old pool, correct?

Nov 1 13:34:42 Arceus kernel: BTRFS info (device dm-10): bdev /dev/mapper/sdn1 errs: wr 1844641, rd 194170, flush 60578, corrupt 2447461, gen 0
Nov 1 13:34:42 Arceus kernel: BTRFS error (device dm-10): parent transid verify failed on 1113877725184 wanted 33563 found 32798

That device looks to be out of sync; the many read and write errors suggest it dropped offline in the past. Which device was the other pool member?
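For anyone reading later: the error counters quoted from the log can also be read directly off a mounted pool with `btrfs device stats`. A hedged sketch (the /mnt/cache mount point is Unraid's usual cache location, an assumption here; the DRYRUN switch is just for illustrating the command without a btrfs pool present):

```shell
# Hedged sketch: print per-device btrfs error counters for a mounted pool.
# Large wr/rd/flush/corrupt counts like the ones in the log above usually
# mean a member dropped offline at some point. Set DRYRUN=1 to only print
# the command that would run (useful on machines without a btrfs pool).
show_btrfs_stats() {
    mnt="${1:-/mnt/cache}"   # Unraid's usual cache mount point (assumption)
    if [ -n "$DRYRUN" ]; then
        echo "btrfs device stats $mnt"
    else
        btrfs device stats "$mnt"
    fi
}
```

Checking these counters after every unclean shutdown would have caught the out-of-sync member before the swap.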
Sigma207 Posted November 2, 2022 (Author)

Quote: "sdn was one member of the old pool, correct?"

That is correct. The other pool member was labeled nvme0n1. The screenshot posted is of the old pool before I moved things around.
JorgeB Posted November 2, 2022

It looks like the NVMe device was wiped with wipefs. If it wasn't encrypted, it's usually easy to recover, but with encryption I don't know how: the device would first need to be decrypted, and that is likely no longer possible because the LUKS headers were also wiped.
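The preventive step this implies is keeping a LUKS header backup before anything goes wrong. A hedged sketch using cryptsetup's standard luksHeaderBackup subcommand (the device and output paths are placeholders; the DRYRUN switch only prints the command, for illustration):

```shell
# Hedged sketch: back up a LUKS header ahead of time so a wipefs accident
# stays recoverable. Device and output paths here are placeholders.
# Set DRYRUN=1 to print the command instead of running it (running it for
# real requires root and an actual LUKS device).
backup_luks_header() {
    dev="$1"    # e.g. /dev/sdX1
    out="$2"    # e.g. a file on the flash drive, kept off the array
    if [ -n "$DRYRUN" ]; then
        echo "cryptsetup luksHeaderBackup $dev --header-backup-file $out"
    else
        cryptsetup luksHeaderBackup "$dev" --header-backup-file "$out"
    fi
}
```

Note the header backup contains the key slots, so it should be stored somewhere as protected as the passphrase itself.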
Sigma207 Posted November 2, 2022

Well, that is unfortunate. I went ahead and already reconfigured some things since I needed to get this back online, so the two SATA devices are wiped as well, but I was hoping to learn how to recover if this happens in the future. I should just be more vigilant with my backups. Thanks for your help.
JorgeB Posted November 2, 2022

For non-encrypted devices wiped with wipefs you just type:

btrfs-select-super -s 1 /dev/sdX1

or

btrfs-select-super -s 1 /dev/nvmeXn1

and usually that's everything you need. Encryption adds another layer: you'd need to first restore the LUKS headers from a backup (if available), then run:

btrfs-select-super -s 1 /dev/mapper/sdX1

or

btrfs-select-super -s 1 /dev/mapper/nvmeXn1
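Putting those steps together for the encrypted case, a hedged end-to-end sketch (device name, mapping name, and header file are all placeholders; luksHeaderRestore and open are standard cryptsetup subcommands, and the in-between unlock step is an assumption about the order Unraid would need):

```shell
# Hedged sketch of the encrypted recovery path described above:
# 1. restore the LUKS header from a backup, 2. unlock the device,
# 3. point btrfs at the backup superblock. All paths are placeholders.
# Set DRYRUN=1 to print the commands instead of running them.
recover_encrypted_btrfs() {
    dev="$1"    # e.g. /dev/sdX1
    hdr="$2"    # previously saved LUKS header backup file
    name="$3"   # dm-crypt mapping name, e.g. sdX1
    run() { if [ -n "$DRYRUN" ]; then echo "$*"; else "$@"; fi; }
    run cryptsetup luksHeaderRestore "$dev" --header-backup-file "$hdr"
    run cryptsetup open "$dev" "$name"
    run btrfs-select-super -s 1 "/dev/mapper/$name"
}
```

Without the header backup from step 1, steps 2 and 3 have nothing to work with, which is exactly the dead end hit in this thread.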