Btrfs cache lost due to stupid error on my part


Endy
Solved by JorgeB

Generally I have a rule that I don't work on things when I am really tired, so that I don't make stupid mistakes.

Last night I was really tired, ignored that rule, and now I am paying the price.

 

I just swapped the motherboard/processor/RAM in my Unraid server yesterday, and that went well. I also added 2 NVMe drives to use for my cache pool, replacing the 1 NVMe drive I had been using, so that I would have a little redundancy there. (I know it's not a proper backup; I am actually in the process of setting up a more proper backup system.)

 

I added 1 of the new NVMe drives to the cache pool; that seemed fine and the balance completed. This is where I should have stopped and waited until after I had slept. Instead I was impatient and thought it was a good idea to just swap the old NVMe drive for the other new one directly in the GUI. Bad idea. The btrfs pool is now gone. I turned off the server, went to bed, and now here I am the next day.
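For anyone who finds this thread later: the btrfs-native way to swap one device for another, without the add-then-remove dance, is `btrfs replace`. A sketch only; the device paths and mount point below are hypothetical, and on Unraid you would normally let the GUI drive pool changes:

```shell
OLD=/dev/nvme1n1p1   # device leaving the pool (hypothetical path)
NEW=/dev/nvme2n1p1   # replacement device (hypothetical path)
MNT=/mnt/cache       # pool mount point (hypothetical)
# Built and printed rather than executed, since this rewrites a live pool;
# progress would afterwards be checked with 'btrfs replace status'.
CMD="btrfs replace start -f $OLD $NEW $MNT"
echo "$CMD"
```

The point is that replace copies data directly onto the new device while the pool stays mounted, so there is never a moment where the pool has too many missing devices.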

 

I've tried the restore options in the FAQ up until the last option. Currently I have the original NVMe drive and the one I added first yesterday assigned to the cache pool, like it was when it was working before I made the mistake.

 

I have a feeling that I am screwed, but is there any way to restore from here?

 

 

root@Turtle:~# btrfs fi show
Label: none  uuid: 849f7ded-6fbb-4f5f-9627-fa781a175567
        Total devices 2 FS bytes used 714.96GiB
        devid    1 size 931.51GiB used 826.48GiB path /dev/sdd1
        devid    2 size 931.51GiB used 826.48GiB path /dev/sdb1

Label: none  uuid: db14842a-e91c-4154-bdd8-fbe89dfc7ce3
        Total devices 1 FS bytes used 340.00KiB
        devid    1 size 20.00GiB used 536.00MiB path /dev/loop2

Label: none  uuid: 6331b51d-add2-421b-a015-22c674718eb5
        Total devices 1 FS bytes used 412.00KiB
        devid    1 size 1.00GiB used 126.38MiB path /dev/loop3

 

sdd and sdb are part of another btrfs pool, unrelated to the cache.

 

turtle-diagnostics-20231212-1040.zip

root@Turtle:~# fdisk -l /dev/nvme2n1
Disk /dev/nvme2n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: CT1000P1SSD8                            
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
root@Turtle:~# fdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: WD_BLACK SN850X 2000GB                  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

root@Turtle:~# sfdisk /dev/nvme2n1

Welcome to sfdisk (util-linux 2.38.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Checking that no-one is using this disk right now ... OK

Disk /dev/nvme2n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: CT1000P1SSD8                            
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

sfdisk is going to create a new 'dos' disk label.
Use 'label: <name>' before you define a first partition
to override the default.

Type 'help' to get more information.

>>> 2048
Created a new DOS disklabel with disk identifier 0xfc45e933.
Created a new partition 1 of type 'Linux' and of size 931.5 GiB.
/dev/nvme2n1p1 :         2048   1953525167 (931.5G) Linux
/dev/nvme2n1p2: 

root@Turtle:~# sfdisk /dev/nvme2n1

Welcome to sfdisk (util-linux 2.38.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Checking that no-one is using this disk right now ... OK

Disk /dev/nvme2n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: CT1000P1SSD8                            
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

sfdisk is going to create a new 'dos' disk label.
Use 'label: <name>' before you define a first partition
to override the default.

Type 'help' to get more information.

>>> 64
Created a new DOS disklabel with disk identifier 0xe90b059d.
Created a new partition 1 of type 'Linux' and of size 931.5 GiB.
Partition #1 contains a btrfs signature.

Do you want to remove the signature? [Y]es/[N]o: 

 

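The interactive session above can also be expressed non-interactively: "64," tells sfdisk to start partition 1 at sector 64 and use the rest of the disk, which is why the old btrfs signature reappears. A sketch using the device from this thread (answer "No" when asked about removing the signature, since that signature is the filesystem being recovered):

```shell
DEV=/dev/nvme2n1
SPEC="64,"   # start sector 64, default size = rest of the disk
# Built and printed, not executed, because writing a partition table is destructive.
CMD="printf '%s\n' '$SPEC' | sfdisk $DEV"
echo "$CMD"
```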

That worked. 

 

/dev/nvme2n1p2: write

New situation:
Disklabel type: dos
Disk identifier: 0xe90b059d

Device         Boot Start        End    Sectors   Size Id Type
/dev/nvme2n1p1         64 1953525167 1953525104 931.5G 83 Linux

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
root@Turtle:~# btrfs fi show
Label: none  uuid: 849f7ded-6fbb-4f5f-9627-fa781a175567
        Total devices 2 FS bytes used 714.96GiB
        devid    1 size 931.51GiB used 826.48GiB path /dev/sdd1
        devid    2 size 931.51GiB used 826.48GiB path /dev/sdb1

Label: none  uuid: db14842a-e91c-4154-bdd8-fbe89dfc7ce3
        Total devices 1 FS bytes used 340.00KiB
        devid    1 size 20.00GiB used 536.00MiB path /dev/loop2

Label: none  uuid: 6331b51d-add2-421b-a015-22c674718eb5
        Total devices 1 FS bytes used 412.00KiB
        devid    1 size 1.00GiB used 126.38MiB path /dev/loop3

warning, device 2 is missing
Label: none  uuid: 9bceeb35-e26e-47d8-9ef3-3534abbaa204
        Total devices 2 FS bytes used 383.72GiB
        devid    1 size 931.51GiB used 884.05GiB path /dev/nvme2n1p1
        *** Some devices missing

*edited to show the part after 'write' as well

Edited by Endy
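As a sanity check, the geometry sfdisk printed is internally consistent: a partition from sector 64 through 1953525167 covers 1953525104 sectors, and at 512 bytes per sector that is the 931.5 GiB shown:

```shell
START=64
END=1953525167
SECTOR=512
SECTORS=$(( END - START + 1 ))
echo "$SECTORS"                # matches the 'Sectors' column sfdisk printed
BYTES=$(( SECTORS * SECTOR ))
# Tenths of a GiB, using integer arithmetic only.
GIB_X10=$(( BYTES * 10 / (1024 * 1024 * 1024) ))
echo "$GIB_X10"                # 9315 -> 931.5 GiB
```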
  • Solution

I assume nvme2n1, the Crucial device, was the original cache? If yes, we can try to mount it alone; if needed, you can then still try to recover the other device.

 

To mount that device alone, unassign both pool members from the pool, start array, stop array, re-assign only the Crucial device, start array, post new diags.
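Outside the Unraid GUI, the manual equivalent of mounting a single member of a two-device pool would be a degraded (ideally read-only) mount. A sketch, with an assumed scratch mount point:

```shell
DEV=/dev/nvme2n1p1   # the Crucial device's partition
MNT=/mnt/test        # hypothetical scratch mount point
# Built and printed rather than run; 'degraded' lets btrfs mount with a device missing.
CMD="mount -o degraded,ro $DEV $MNT"
echo "$CMD"
```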


It's alive! As far as I can tell, everything seems to be there. Docker started up and seems to be working.

 

Am I out of the woods now?

If so, since I plan on using just the 2 new drives for cache and removing the Crucial drive, would the next step be to temporarily move all data off the cache pool, then delete and recreate the cache pool using just the 2 new drives, and move the data back?

turtle-diagnostics-20231213-0735.zip

Link to comment

Ok, it finished.

 

If I am planning on removing the Crucial drive, do I want/need to add one of the new drives? Just want to make sure I don't mess up again.

 

Also, thank you so much for your help, JorgeB. While I wouldn't have lost any irreplaceable data, you have saved me countless hours recreating and setting everything up. I truly appreciate it.

turtle-diagnostics-20231213-0936.zip

Dec 13 09:35:19 Turtle kernel: BTRFS error (device nvme2n1p1): bdev /dev/nvme2n1p1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Dec 13 09:35:33 Turtle kernel: BTRFS info (device nvme2n1p1): balance: ended with status: -5

 

The balance didn't finish because some data corruption was detected, which is possibly also why you had issues before adding the device. In this case you should not add the new device; instead, create a new pool with the new device(s), copy everything you can, then remove the old device.
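The suggested plan, sketched as commands (device paths and mount points are hypothetical; on Unraid the pool creation itself happens in the GUI):

```shell
NEW1=/dev/nvme0n1p1   # hypothetical new device 1
NEW2=/dev/nvme1n1p1   # hypothetical new device 2
SRC=/mnt/cache        # old pool with the corruption
DST=/mnt/newpool      # hypothetical new pool mount point
# raid1 mirrors both metadata and data across the two new devices.
MKFS="mkfs.btrfs -m raid1 -d raid1 $NEW1 $NEW2"
COPY="rsync -a $SRC/ $DST/"
# Printed rather than executed, since mkfs would wipe the named devices.
printf '%s\n' "$MKFS" "$COPY"
```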

 

23 minutes ago, Endy said:

Also, thank you so much for your help JorgeB.

You're welcome.


Hopefully a small hiccup... I stopped the array and created a new pool for the 2 new drives, and when I tried to start the array I got a message saying

 

"Wrong Pool State

cache - too many missing/wrong devices"

 

I didn't touch the original cache pool. I tried deleting the new pool and got the same message.

1 hour ago, Endy said:

I tried deleting the new pool and same message.

Missed this part, so this is about the old pool, possibly because the missing device failed to be removed. Try re-importing the pool: unassign the old cache device, assign the new ones to a new pool now (because you may have the same issue later after an array stop), start the array, stop the array, re-assign the old cache device, start the array.

