BTRFS cache pool is unrestorable

July 1, 2024

I had a BTRFS cache pool with a 12TB disk holding data. I installed a new 18TB disk, and the goal was to replace the 12TB disk with the new 18TB disk in the cache pool.

My first plan was to add the new 18TB disk to the cache pool alongside the existing 12TB disk, move all data from the 12TB to the 18TB, and then remove the 12TB from the pool. So I went ahead and did this:

Changed slots from 1 to 2.
Added the 18TB in the second slot in the pool.
Started the array. The BTRFS rebalance started.

While it was taking time, I searched online and found that to swap a BTRFS disk in cache, all I had to do is just change the 12TB to 18TB and the data will be moved to the new disk (this is probably wrong information, I still don't know). So I did the following:

Canceled the ongoing balance operation.
Stopped the array.
Removed the 12TB from the first slot (the second slot still held the 18TB disk).
Tried to start the array, but it reported missing devices.
Then I removed the 18TB from the second slot and changed the number of slots to 1.

But after that, whenever I add either or both of these devices, in any order, it just doesn't recognize the full BTRFS filesystem.

In console, `btrfs check /dev/sdf1` gives:

Opening filesystem to check...
warning, device 1 is missing
warning, device 1 is missing
warning, device 1 is missing
bad tree block 27803648, bytenr mismatch, want=27803648, have=0
Couldn't read chunk tree
ERROR: cannot open file system

How do I recover from this?

insane-homelab-diagnostics-20240701-0831.zip

Edited July 1, 2024 by rahatzaman
Properly formatted the code

July 1, 2024

Community Expert

45 minutes ago, rahatzaman said:

this is probably wrong information, I still don't know

Most likely, do you have a link for that?

Post the output of

btrfs fi show

July 1, 2024

Author

44 minutes ago, JorgeB said:
btrfs fi show

warning, device 1 is missing
warning, device 1 is missing
warning, device 1 is missing
Couldn't read chunk tree
Label: none  uuid: ef9c4ff3-1b11-42c0-9ebf-2b21aa4eab75   <-- This is the pool I want to restore
        Total devices 2 FS bytes used 9.81TiB
        devid    2 size 16.37TiB used 31.03GiB path /dev/sdf1
        *** Some devices missing

Label: none  uuid: 8a2ed3b1-8c86-4d72-952e-62b4904eaa5a
        Total devices 1 FS bytes used 231.35GiB
        devid    1 size 476.94GiB used 283.02GiB path /dev/nvme0n1p1

Edited July 1, 2024 by rahatzaman
pointed the pool in question

July 1, 2024

Community Expert

1 hour ago, rahatzaman said:

found that to swap a BTRFS disk in cache, all I had to do is just change the 12TB to 18TB and the data will be moved to the new disk (this is probably wrong information, I still don't know).

How could the data be transferred to the new drive if the only one that has it is removed?

Probably misunderstood something that was relevant to a setup that already had 2 drives and one was being upgraded.

July 1, 2024

Community Expert

21 minutes ago, rahatzaman said:
<-- This is the pool I want to restore

For the missing device, type

sfdisk /dev/sdX

then type 64 and hit return, don't do anything else, post the results of that.

Replace X with correct letter.

July 1, 2024

Author

20 minutes ago, JorgeB said:
sfdisk /dev/sdX

Running `sfdisk /dev/sdf`:

Welcome to sfdisk (util-linux 2.38.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Checking that no-one is using this disk right now ... OK

Disk /dev/sdf: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors
Disk model: ST18000NT001-3LU
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 0BE62498-4ACD-4BDC-8C7D-D5C8CA5E109F

Old situation:

Device     Start         End     Sectors  Size Type
/dev/sdf1     64 35156656094 35156656031 16.4T Linux filesystem

Type 'help' to get more information.

>>> 64
Created a new GPT disklabel (GUID: DF1C1DC0-E056-B545-91CA-A2A043C34DA3).
Sector 64 already used.
Failed to add #1 partition: Numerical result out of range
/dev/sdf1:

July 1, 2024

Community Expert

50 minutes ago, rahatzaman said:

Label: none  uuid: ef9c4ff3-1b11-42c0-9ebf-2b21aa4eab75   <-- This is the pool I want to restore
        Total devices 2 FS bytes used 9.81TiB
        devid    2 size 16.37TiB used 31.03GiB path /dev/sdf1
        *** Some devices missing

Isn't sdf the disk that is still there? You need to run it on the other pool disk, the one that is currently missing from that output.
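
One way to read the `btrfs fi show` output quoted above is to compare the "Total devices" count with the number of `devid` lines actually listed. The pool with uuid ef9c4ff3-1b11-42c0-9ebf-2b21aa4eab75 reports 2 devices but lists only devid 2 (/dev/sdf1), so devid 1 is the one that is missing. A minimal sketch of that check follows; the function name `incomplete_pools` is made up for illustration, and the sample text is trimmed from the output posted earlier in the thread.

```shell
# Hypothetical helper: reads `btrfs fi show` text on stdin and prints
# "<uuid> <listed>/<total>" for every filesystem that lists fewer devid
# lines than its "Total devices" count.
incomplete_pools() {
  awk '
    function report() { if (uuid != "" && seen < total) print uuid, seen "/" total }
    /uuid:/         { report(); for (i = 1; i < NF; i++) if ($i == "uuid:") uuid = $(i + 1); seen = 0; total = 0 }
    /Total devices/ { total = $3 }
    /^[ \t]*devid/  { seen++ }
    END             { report() }
  '
}

# Sample trimmed from the output posted in this thread.
incomplete_pools <<'EOF'
Label: none  uuid: ef9c4ff3-1b11-42c0-9ebf-2b21aa4eab75
        Total devices 2 FS bytes used 9.81TiB
        devid    2 size 16.37TiB used 31.03GiB path /dev/sdf1
        *** Some devices missing

Label: none  uuid: 8a2ed3b1-8c86-4d72-952e-62b4904eaa5a
        Total devices 1 FS bytes used 231.35GiB
        devid    1 size 476.94GiB used 283.02GiB path /dev/nvme0n1p1
EOF
```

On a live system the same function could be fed with `btrfs fi show 2>/dev/null | incomplete_pools`.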

July 1, 2024

Author

Sorry, but I don't follow which pool I should run this on.

The image above is currently what I have with all the disks. "Unimportant" is the pool in question (the data is important though, so ignore the name). The old disk is the 12TB one. So I should run the sfdisk command on the old disk (sde).

Output of `sfdisk /dev/sde`:

Welcome to sfdisk (util-linux 2.38.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Checking that no-one is using this disk right now ... OK

Disk /dev/sde: 10.91 TiB, 12000138625024 bytes, 23437770752 sectors
Disk model: ST12000NM0127   
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

sfdisk is going to create a new 'dos' disk label.
Use 'label: <name>' before you define a first partition
to override the default.

Type 'help' to get more information.

>>>

July 1, 2024

Community Expert

22 minutes ago, Rahat Zaman said:

So I should run the sfdisk command on the old disk (sde).

Yes, run the rest.

58 minutes ago, JorgeB said:

then type 64 and hit return, don't do anything else, post the results of that.

July 1, 2024

Author

Output of `sfdisk /dev/sde` followed by `64`:

Welcome to sfdisk (util-linux 2.38.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Checking that no-one is using this disk right now ... OK

Disk /dev/sde: 10.91 TiB, 12000138625024 bytes, 23437770752 sectors
Disk model: ST12000NM0127   
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

sfdisk is going to create a new 'dos' disk label.
Use 'label: <name>' before you define a first partition
to override the default.

Type 'help' to get more information.

>>> 64
The size of this disk is 10.9 TiB (12000138625024 bytes). DOS partition table format cannot be used on drives for volumes larger than 2199023255040 bytes for 512-byte sectors. Use GUID partition table format (GPT).
Created a new DOS disklabel with disk identifier 0xb574ec0d.
Created a new partition 1 of type 'Linux' and of size 2 TiB.
Partition #1 contains a btrfs signature.

Do you want to remove the signature? [Y]es/[N]o:

July 1, 2024

Community Expert

Forgot to change to GPT. Hit CTRL + C to abort, then:

echo "label: gpt" | sfdisk /dev/sde

then 64 and enter

July 1, 2024

Author

1 minute ago, JorgeB said:
echo "label: gpt" | sfdisk /dev/sde

root@insane-Homelab:~# echo "label: gpt" | sfdisk /dev/sde
Checking that no-one is using this disk right now ... OK

Disk /dev/sde: 10.91 TiB, 12000138625024 bytes, 23437770752 sectors
Disk model: ST12000NM0127   
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

>>> Script header accepted.
>>> Done.
Created a new GPT disklabel (GUID: 4C9C9298-B23E-D844-8355-66D6119E3FB2).

New situation:
Disklabel type: gpt
Disk identifier: 4C9C9298-B23E-D844-8355-66D6119E3FB2

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

Then again I typed `sfdisk /dev/sde` followed by `64`:

root@insane-Homelab:~# sfdisk /dev/sde

Welcome to sfdisk (util-linux 2.38.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Checking that no-one is using this disk right now ... OK

Disk /dev/sde: 10.91 TiB, 12000138625024 bytes, 23437770752 sectors
Disk model: ST12000NM0127   
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 4C9C9298-B23E-D844-8355-66D6119E3FB2

Old situation:

Type 'help' to get more information.

>>> 64
Created a new GPT disklabel (GUID: 59B4ACD5-2321-AF4B-8276-7DFB0F360490).
Sector 64 already used.
Failed to add #1 partition: Numerical result out of range
/dev/sde1:

July 1, 2024

Community Expert

Abort and post

fdisk -l /dev/sde

July 1, 2024

Author

Just now, JorgeB said:
fdisk -l /dev/sde

root@insane-Homelab:~# fdisk -l /dev/sde
Disk /dev/sde: 10.91 TiB, 12000138625024 bytes, 23437770752 sectors
Disk model: ST12000NM0127   
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 4C9C9298-B23E-D844-8355-66D6119E3FB2

July 1, 2024

Community Expert

Hmm, not sure why it's reporting sector 64 in use when there's no partition. Try this command instead:

sgdisk -o -a 8 -n 1:32K:0 /dev/sde

Then output of

btrfs fi show
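
For readers following along, a rough reading of the `sgdisk` options used above: `-o` clears the existing partition data and writes a fresh, empty GPT; `-a 8` sets partition alignment to multiples of 8 sectors; and `-n 1:32K:0` creates partition 1 starting 32 KiB into the disk, with the end value `0` meaning the end of the largest available free block. With the 512-byte logical sectors reported for /dev/sde, 32 KiB corresponds to sector 64, the same start sector shown for /dev/sdf1 earlier in the thread. The arithmetic can be checked with a small sketch (the function name `offset_to_sector` is hypothetical):

```shell
# Hypothetical helper: convert a byte offset to a sector number.
# $1 = offset in bytes, $2 = logical sector size in bytes.
offset_to_sector() {
  echo $(( $1 / $2 ))
}

# 32 KiB start offset with 512-byte logical sectors, as reported for /dev/sde.
offset_to_sector $((32 * 1024)) 512     # prints 64

# Cross-check the disk size from the fdisk output: sectors * sector size.
echo $(( 23437770752 * 512 ))           # prints 12000138625024
```

An alignment of 8 sectors of 512 bytes is 4096 bytes, which matches the physical sector size shown in the `fdisk` output.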

July 1, 2024

Author
Solution

Just now, JorgeB said:
sgdisk -o -a 8 -n 1:32K:0 /dev/sde
Then output of
btrfs fi show

Label: none  uuid: ef9c4ff3-1b11-42c0-9ebf-2b21aa4eab75
        Total devices 2 FS bytes used 9.81TiB
        devid    1 size 10.91TiB used 10.59TiB path /dev/sde1
        devid    2 size 16.37TiB used 31.03GiB path /dev/sdf1

Label: none  uuid: 8a2ed3b1-8c86-4d72-952e-62b4904eaa5a
        Total devices 1 FS bytes used 231.35GiB
        devid    1 size 476.94GiB used 283.02GiB path /dev/nvme0n1p1

July 1, 2024

Community Expert

Now unassign both pool devices, start the array, stop the array, and re-assign both pool devices. This warning cannot be there:

Start array and the pool should import.

July 1, 2024

Author

4 minutes ago, JorgeB said:

Start array and the pool should import

Awesome! Now, assigning the 2 disks in the right places, there is no red warning.

Getting back to the initial goal: how do you suggest I approach removing the 12TB disk from the pool?

July 1, 2024

Community Expert

Post current diags to see pool status.

July 1, 2024

Author

1 minute ago, JorgeB said:

Post current diags to see pool status.

insane-homelab-diagnostics-20240701-0831.zip

July 1, 2024

Community Expert

You would need to do a balance to raid1 first; this will take a few hours. When the balance is done, post new diags to confirm all is OK before removing the other device.
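
On Unraid this conversion is typically started from the pool's settings in the GUI. For reference, a command-line equivalent on a generic btrfs system might look like the sketch below. The mount point `/mnt/poolname` is a placeholder that does not come from this thread, and the helper `raid1_balance_plan` is hypothetical: it only prints the commands so they can be reviewed, and runs nothing.

```shell
# Hypothetical dry-run helper: prints the btrfs commands for converting data
# and metadata to raid1 and for checking progress. Nothing is executed.
raid1_balance_plan() {
  mnt=$1   # mount point of the pool (placeholder, an assumption)
  echo "btrfs balance start -dconvert=raid1 -mconvert=raid1 $mnt"
  echo "btrfs balance status $mnt"
}

raid1_balance_plan /mnt/poolname
```

Printing rather than executing is a deliberate choice here, since a balance on a pool of this size runs for hours and should only be started on purpose.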

July 4, 2024

Author

Okay, I ran the balance. Here are the diags.

Also, I have noticed that the share in that pool is Read-only.

insane-homelab-diagnostics-20240704-0858.zip

Edited July 4, 2024 by Rahat Zaman

July 4, 2024

Community Expert

The balance failed because there wasn't enough space. Were you writing new data to the pool?

In any case, you will need to free up some space and then try again; free up at least 100GB. You will need to restart the array or reboot to get the pool read/write again, and you will probably need to manually cancel the balance, so post new diags after that.

July 18, 2024

Author

Okay, so I ended up moving all the data from that pool to the array and deleting the cache pool. But the problem now is that most of the media files that were in the pool (now moved to the array) are corrupted.

In that pool, I had about 11,175 mkv files, of which only 80 are healthy (according to tdarr). I tried to open them and got errors in mpv.

$ mpv Airlift\ \(2016\)\ Bluray-1080p.mkv 

user_input: 
user_input: stack traceback:
user_input: 	[C]: at 0x5cabcfb522c0
user_input: 	[C]: at 0x5cabcfb529a0
user_input: Lua error: /home/insane/.config/mpv/scripts/user-input.lua:541: attempt to call field 'shared_script_property_observe' (a nil value)
mpv_thumbnail_script_client_osc: 
mpv_thumbnail_script_client_osc: stack traceback:
mpv_thumbnail_script_client_osc: 	.../.config/mpv/scripts/mpv_thumbnail_script_client_osc.lua:4232: in function 'visibility_mode'
mpv_thumbnail_script_client_osc: 	.../.config/mpv/scripts/mpv_thumbnail_script_client_osc.lua:4236: in main chunk
mpv_thumbnail_script_client_osc: 	[C]: at 0x5cabcfb522c0
mpv_thumbnail_script_client_osc: 	[C]: at 0x5cabcfb529a0
mpv_thumbnail_script_client_osc: Lua error: .../.config/mpv/scripts/mpv_thumbnail_script_client_osc.lua:3593: attempt to call field 'shared_script_property_set' (a nil value)
  mpv_thumbnail_script_server_2: Thumbnail worker registering timed out
  mpv_thumbnail_script_server_1: Thumbnail worker registering timed out
                        cplayer: Failed to recognize file format.
                        cplayer: Exiting... (Errors when loading file)

So is there any way to recover these files? I did not keep any backups.

My latest diags are attached.

insane-homelab-diagnostics-20240718-1244.zip
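
In the mpv output above, the Lua errors come from user scripts under ~/.config/mpv/scripts and are probably unrelated to the files themselves; the line that matters is `Failed to recognize file format`. A quick way to triage a large number of files is to check whether each one still begins with the EBML magic bytes `1A 45 DF A3` that a Matroska file starts with. This is only a sketch: the function name `has_mkv_magic` is made up, and a file that passes this check can still be damaged further in.

```shell
# Hypothetical helper: succeeds if the file starts with the EBML header
# magic (1A 45 DF A3) used by Matroska (.mkv) files.
has_mkv_magic() {
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  [ "$magic" = "1a45dfa3" ]
}

# Example: list .mkv files under the current directory whose header is not intact.
find . -type f -name '*.mkv' | while IFS= read -r f; do
  has_mkv_magic "$f" || echo "bad header: $f"
done
```

Files flagged this way have lost at least their first bytes; files that are not flagged would still need a full decode or a checksum comparison to be trusted.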

July 19, 2024

Community Expert

How were the files copied to the array?
