BTRFS cache pool issue

LittleMike · August 12, 2021

So I'm sure I screwed up something somewhere. Looking for some assistance. I had 2x 512GB SSDs in a cache pool (default BTRFS RAID1). One of the drives died. This happened a few months back. I was able to remove the bad drive, no errors. Everything seemed okay (though under the hood, it may not have been). Recently I bought a 1TB Samsung SSD. Put that in, added it to the cache pool. Everything seemed to be okay still.

The second 512GB drive started giving me errors again. So in my screwing around trying to remove it and just use the 1TB, I started getting BTRFS pool profile errors. I ran a balance and the error went away. However, my 1TB drive is showing as 70% used. Looking at btrfs fi, I'm not sure if it's correct. I am not versed well enough in btrfs to know for sure, but it looks like some of my data is possibly duplicated, but also not entirely. Like maybe a balance didn't complete and I screwed something up?

The original 512 is still in the machine but is now an unassigned device. I was going to try to re-add it to the pool but I got the warning that all data would be wiped, so I decided against it. Could someone just take a peek and see if this looks right?

Data, RAID1: total=75.00GiB, used=55.18GiB
Data, single: total=194.00GiB, used=147.93GiB
System, RAID1: total=32.00MiB, used=64.00KiB
System, single: total=32.00MiB, used=0.00B
Metadata, RAID1: total=1.00GiB, used=455.95MiB
Metadata, single: total=12.00GiB, used=9.98GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

My concern is the RAID1 entries. It should all be single now, I think. But that's where I probably screwed up. Attaching diagnostics in case it helps.
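For readers who want to check this kind of output themselves, here is a minimal Python sketch that parses the figures pasted above and lists which block group types exist in more than one profile. The function names (`parse_df`, `mixed_profiles`) are made up for illustration and the regular expression only assumes the line format shown in this post; it is not an official btrfs tool.

```python
import re

# Output copied from the post above.
DF_OUTPUT = """\
Data, RAID1: total=75.00GiB, used=55.18GiB
Data, single: total=194.00GiB, used=147.93GiB
System, RAID1: total=32.00MiB, used=64.00KiB
System, single: total=32.00MiB, used=0.00B
Metadata, RAID1: total=1.00GiB, used=455.95MiB
Metadata, single: total=12.00GiB, used=9.98GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Assumed line format: "<type>, <profile>: total=<n><unit>, used=<n><unit>"
LINE_RE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>[KMGT]iB|B), "
    r"used=(?P<used>[\d.]+)(?P<uunit>[KMGT]iB|B)$"
)


def parse_df(text):
    """Turn each matching line into a dict with sizes in bytes."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a df line
        rows.append({
            "kind": m.group("kind"),
            "profile": m.group("profile"),
            "total": float(m.group("total")) * UNITS[m.group("tunit")],
            "used": float(m.group("used")) * UNITS[m.group("uunit")],
        })
    return rows


def mixed_profiles(rows):
    """Return the block group types that appear with more than one profile."""
    seen = {}
    for row in rows:
        seen.setdefault(row["kind"], set()).add(row["profile"])
    return {kind: sorted(p) for kind, p in seen.items() if len(p) > 1}


if __name__ == "__main__":
    rows = parse_df(DF_OUTPUT)
    for kind, profiles in mixed_profiles(rows).items():
        print(f"{kind} is split across profiles: {', '.join(profiles)}")
    for row in rows:
        print(f"{row['kind']:<14}{row['profile']:<8}"
              f"used {row['used'] / 1024**3:8.2f} GiB")
```

A type listed with both RAID1 and single is what you would expect to see if a profile conversion stopped partway through, which matches the concern raised here.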

blackpearl-diagnostics-20210811-2218.zip

JorgeB · August 12, 2021

sdg is failing, and because of that the balance aborted. Since there's a lot of data using the single profile on that device, you can't just remove it. Copy everything you can from the pool, then re-format with just the good device.
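One way to spot a failing pool member without reading the diagnostics is to look at the per-device error counters that `btrfs device stats <mountpoint>` prints. The Python sketch below filters that output for non-zero counters. The helper name `nonzero_counters` is hypothetical, and the sample text is made up with placeholder device names (`sdX1`, `sdY1`); it is not taken from the diagnostics attached to this thread.

```python
import re

# Assumed line format of `btrfs device stats`: "[<device>].<counter>  <value>"
STAT_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def nonzero_counters(stats_text):
    """Return {device: {counter: value}} for every counter above zero."""
    bad = {}
    for line in stats_text.splitlines():
        m = STAT_RE.match(line.strip())
        if m and int(m.group("value")) > 0:
            bad.setdefault(m.group("dev"), {})[m.group("counter")] = int(m.group("value"))
    return bad


# Made-up sample for illustration only (placeholder devices and values).
SAMPLE = """\
[/dev/sdX1].write_io_errs    12
[/dev/sdX1].read_io_errs     3
[/dev/sdX1].flush_io_errs    0
[/dev/sdX1].corruption_errs  0
[/dev/sdX1].generation_errs  0
[/dev/sdY1].write_io_errs    0
[/dev/sdY1].read_io_errs     0
[/dev/sdY1].flush_io_errs    0
[/dev/sdY1].corruption_errs  0
[/dev/sdY1].generation_errs  0
"""

if __name__ == "__main__":
    for dev, counters in nonzero_counters(SAMPLE).items():
        print(dev, counters)
```

To use it on a real pool you would paste in the output of `btrfs device stats /mnt/cache` in place of `SAMPLE`. Non-zero counters are a hint to investigate, not proof on their own that a drive is dying.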

LittleMike · August 12, 2021

5 hours ago, JorgeB said:

sdg is failing, and because of that the balance aborted. Since there's a lot of data using the single profile on that device, you can't just remove it. Copy everything you can from the pool, then re-format with just the good device.

When you say copy everything from the pool, do you mean both? Because my concern is that neither contains all the data.

JorgeB · August 12, 2021

Pool is still both devices.

trurl · August 12, 2021

And you can't work with them separately.

LittleMike · August 12, 2021

7 minutes ago, JorgeB said:

Pool is still both devices.

2 minutes ago, trurl said:

And you can't work with them separately.

Understood.

So forgive the noobish question, what's the best way to do that? And more importantly, what's the best way to restore it? If I do an rsync /mnt/cache is that going to grab everything? And then format the good drive, remove the bad one, then rsync back? Will it then see it as one profile? Like is it just metadata and how do I prevent that from being restored back?

Oh, when you say format, did you mean the cache drive or the unRAID OS drive, just to clarify?

JorgeB · August 12, 2021

1 hour ago, LittleMike said:

If I do an rsync /mnt/cache is that going to grab everything?

Depends on the state of the failing device. Also, if using the array as the destination, make sure you rsync to a disk, or use /mnt/user0/share.

1 hour ago, LittleMike said:

And then format the good drive, remove the bad one, then rsync back? Will it then see it as one profile?

Make a new pool with only the remaining device and format it; you can wipe it first.
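The destination advice above can be turned into a small guard. The Python sketch below only builds the rsync command line and refuses a destination under /mnt/user. The function name `build_rsync_command` and the backup folder `/mnt/disk1/cache_backup` are hypothetical; `-a` (archive) and `-v` (verbose) are standard rsync options. The reason usually given for avoiding /mnt/user here is that user shares also include files that live on the pool, so source and destination could end up being the same file; treat that explanation as an assumption, since the post does not spell it out.

```python
import shlex
from pathlib import PurePosixPath

USER_SHARE_ROOT = PurePosixPath("/mnt/user")


def build_rsync_command(source, dest):
    """Build an rsync argument list, rejecting destinations under /mnt/user."""
    dest_path = PurePosixPath(dest)
    # /mnt/user0/... and /mnt/diskN/... pass; /mnt/user/... does not.
    if dest_path == USER_SHARE_ROOT or USER_SHARE_ROOT in dest_path.parents:
        raise ValueError("use a disk path (/mnt/diskN/...) or /mnt/user0/... instead")
    # Trailing slash on the source copies its contents, not the folder itself.
    return ["rsync", "-a", "-v", source.rstrip("/") + "/", dest.rstrip("/") + "/"]


if __name__ == "__main__":
    # Hypothetical destination folder on an array disk.
    cmd = build_rsync_command("/mnt/cache", "/mnt/disk1/cache_backup")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Printing the command first makes it easy to review before running it. With a failing source device, expect rsync to report read errors for any files it cannot copy.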

LittleMike · August 12, 2021

6 minutes ago, JorgeB said:

Depends on the state of the failing device. Also, if using the array as the destination, make sure you rsync to a disk, or use /mnt/user0/share.

Make a new pool with only the remaining device and format it; you can wipe it first.

Okay. So let me see if I got this right:

Copy everything from /mnt/cache somewhere (rsync to /mnt/user0/share or to a disk, or WinSCP to another machine, whatever, correct?)
Create a new cache pool of 1 device using the new/working drive
Format new drive
Copy everything backed up to new drive
Remove old pool
Restart docker services/VMs etc.

Is that it? Just copying back all of the data will line everything up correctly? Is that because the configuration is on the OS drive? So is the profile information stored on the cache pool instead? I'm just trying to figure out how this happened in the first place so I can prevent it from happening again.

trurl · August 12, 2021

13 minutes ago, LittleMike said:

Remove old pool

Create a new cache pool of 1 device using the new/working drive

Format new drive
Copy everything backed up to new drive

Changed the order for you.

You don't want a different pool (name) in the end or you will have to deal with reconfiguring some things to use a different pool.

LittleMike · August 12, 2021

3 minutes ago, trurl said:

Changed the order for you.

You don't want a different pool (name) in the end or you will have to deal with reconfiguring some things to use a different pool.

Okay, so remove old pool first.

I didn't even realize you can name the pools. I should make the new one just "Cache" like the existing one. Is that why you suggest removing the old one first, because the defaults should do what I want?

trurl · August 12, 2021

If you have more than one pool they must have different names so you can work with them separately. "Cache" is the name of the pool from pre 6.9 releases, so that is what your shares are likely configured to use and any path to cache you might have specified explicitly.

LittleMike · August 12, 2021

1 minute ago, trurl said:

If you have more than one pool they must have different names so you can work with them separately. "Cache" is the name of the pool from pre 6.9 releases, so that is what your shares are likely configured to use and any path to cache you might have specified explicitly.

Ah that totally makes sense. Yeah, this is on 6.9.2 but was set up on whatever revision it was 3 years ago or so, so definitely pre 6.9. Okay, currently backing up the contents of /mnt/cache. I should just need appdata, domains, downloads, and system, right? Do I need to back up from anywhere else like /mnt/user0 or anything?
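Since one pool device is failing, it may be worth confirming that the backup really matches the pool before the old pool is deleted. Here is a minimal Python sketch that compares two folder trees by SHA-256 and reports files that are missing from the backup or differ. The names `file_digest` and `compare_trees` are hypothetical, the example paths are placeholders, and the sketch only handles regular files; symlinks and special files, which appdata folders may contain, would need extra handling.

```python
import hashlib
import os


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(source, backup):
    """Return (missing, different): relative paths of source files that are
    absent from the backup, or present with different content."""
    missing, different = [], []
    for root, _dirs, files in os.walk(source):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source)
            dst = os.path.join(backup, rel)
            if not os.path.isfile(dst):
                missing.append(rel)
            elif file_digest(src) != file_digest(dst):
                different.append(rel)
    return sorted(missing), sorted(different)


if __name__ == "__main__":
    # Placeholder paths; adjust to wherever the backup was written.
    missing, different = compare_trees("/mnt/cache", "/mnt/disk1/cache_backup")
    print(f"{len(missing)} missing, {len(different)} different")
```

Hashing every file reads the failing device a second time, so this is slow and may itself hit read errors. A file that cannot be read from the source will raise an exception rather than being skipped.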

LittleMike · August 12, 2021

Okay, so I think there were some steps that were missing.

I deleted the pool, started and stopped the array, created a new pool, and started the array, but nothing had changed and it didn't give me the option to format. So I stopped the array, clicked on the cache pool and selected Erase. Restarted the array and it yelled at me "Unmountable: Unsupported partition layout"

Going to try messing around with trying to get it formatted.

*EDIT* What worked was a combination of: delete the pool, start/stop the array, create the pool with no drive assigned, start/stop the array, then assign the drive. After that the Format checkbox appeared. Now to copy my data back over and cross my fingers.

Edited August 12, 2021 by LittleMike
