cache partition lost - can restore btrfs?



I seem to be having bad luck with Unraid.

 

Today one of my NVMe cache disks seemingly failed, so I sourced a larger SSD to replace it.

I read the docs, shut down the array, and started it without the bad cache disk. I then swapped in the new disk, added it to the cache pool in place of the missing broken one, and waited.

 

At this point, I saw that the cache pool was empty, without any data. Not sure why this happened.

 

So I'm assuming that my cache is broken. Great.

 

My 'failed' NVMe seems to be mysteriously working again, and I've put it back in the server to try to recover data from it, but it shows up without a filesystem.

I've been trying steps from here: 

 

Using those steps, I tried to mount the disk to copy data from it, but I get errors saying the filesystem can't be mounted.

 

Can anyone help me to recover the files off the remaining old cache disk that I can access?


Thanks for the reply.

The array was created in 6.7.0/6.7.1, server is running 6.8.1 now.

 

6 hours ago, johnnie.black said:

There was a bug in v6.7.x where any pool created would have non-redundant metadata.

I saw this mentioned in another thread. Does this^ mean there is no recovery data?

 

Here is a log of the commands in the FAQ:

Linux 4.19.94-Unraid.
root@blaster:~# mkdir /bt
root@blaster:~# mount -o usebackuproot,ro /dev/nvme1
nvme1      nvme1n1    nvme1n1p1  
root@blaster:~# mount -o usebackuproot,ro /dev/nvme1n1p1 /bt
mount: /bt: wrong fs type, bad option, bad superblock on /dev/nvme1n1p1, missing codepage or helper program, or other error.
root@blaster:~# mount -o degraded,usebackuproot,ro /dev/nvme1n1p1 /bt
mount: /bt: wrong fs type, bad option, bad superblock on /dev/nvme1n1p1, missing codepage or helper program, or other error.
root@blaster:~# mount -o degraded,usebackuproot,ro /dev/nvme0n1p1 /bt
mount: /bt: wrong fs type, bad option, bad superblock on /dev/nvme0n1p1, missing codepage or helper program, or other error.
root@blaster:~# mount -o ro,notreelog,nologreplay /dev/nvme1
nvme1      nvme1n1    nvme1n1p1  
root@blaster:~# mount -o ro,notreelog,nologreplay /dev/nvme1n1p1 /bt
mount: /bt: wrong fs type, bad option, bad superblock on /dev/nvme1n1p1, missing codepage or helper program, or other error.
root@blaster:~# /dev/nvme1n1p1 /bt
-bash: /dev/nvme1n1p1: Permission denied
root@blaster:~# btrfs restore -v /dev/nvme1n1p1 /bt
bad tree block 479137857536, bytenr mismatch, want=479137857536, have=0
Couldn't setup device tree
Could not open root, trying backup super
bad tree block 479137857536, bytenr mismatch, want=479137857536, have=0
Couldn't setup device tree
Could not open root, trying backup super
ERROR: superblock bytenr 274877906944 is larger than device size 250059317248
Could not open root, trying backup super
root@blaster:~# btrfs restore -vi /dev/nvme1n1p1 /bt
bad tree block 479137857536, bytenr mismatch, want=479137857536, have=0
Couldn't setup device tree
Could not open root, trying backup super
bad tree block 479137857536, bytenr mismatch, want=479137857536, have=0
Couldn't setup device tree
Could not open root, trying backup super
ERROR: superblock bytenr 274877906944 is larger than device size 250059317248
Could not open root, trying backup super
root@blaster:~# btrfs check --repair /dev/nvme1n1p1
enabling repair mode
WARNING:

        Do not use --repair unless you are advised to do so by a developer
        or an experienced user, and then only after having accepted that no
        fsck can successfully repair all types of filesystem corruption. Eg.
        some software or hardware bugs can fatally damage a volume.
        The operation will start in 10 seconds.
        Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting repair.
Opening filesystem to check...
bad tree block 479137857536, bytenr mismatch, want=479137857536, have=0
Couldn't setup device tree
ERROR: cannot open file system
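
A side note on the `superblock bytenr 274877906944 is larger than device size 250059317248` line in the log above: as far as I know, btrfs keeps its superblock copies at fixed offsets of 64 KiB, 64 MiB and 256 GiB, and 274877906944 bytes is exactly 256 GiB. So on a 250 GB device that message probably just means the third copy has nowhere to live, not that something extra is damaged. A minimal sketch of the arithmetic (the function name `supers_that_fit` is made up for illustration):

```shell
#!/bin/sh
# Byte offsets of the btrfs superblock copies: 64 KiB, 64 MiB, 256 GiB.
SUPER_OFFSETS="65536 67108864 274877906944"
# Size of one superblock in bytes.
SUPER_SIZE=4096

# supers_that_fit DEVICE_SIZE_BYTES
# Prints one line per superblock copy, saying whether the whole copy
# lies inside a device of the given size.
supers_that_fit() {
    dev_size=$1
    for off in $SUPER_OFFSETS; do
        if [ $((off + SUPER_SIZE)) -le "$dev_size" ]; then
            echo "$off fits"
        else
            echo "$off beyond-device"
        fi
    done
}

# Device size (bytes) as reported in the log above.
supers_that_fit 250059317248
```

If it helps, `btrfs inspect-internal dump-super -a <device>` should print whichever superblock copies are actually readable on a device.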

 


I'm in need of some serious help, if anyone has time to help me out.

 

At this point, I've managed to get both disks installed back in the original server, and put into the cache pool.

However, the cache pool seems unmountable, and I need help troubleshooting.

[Screenshot of the cache pool]

 

I'm also warned that Unraid wants to format the disk that was previously removed:

[Screenshot of the format warning]

 

Will this delete all data and permanently lose everything?

 

Any advice on how to proceed?

7 hours ago, KptnKMan said:

Does this^ mean there is no recovery data?

Basically yes, metadata will be missing, so recovery is much more difficult, sometimes impossible.
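
For anyone wanting to check whether a pool is affected, the metadata profile shows up in the output of `btrfs filesystem df` on a mounted pool. Here is a minimal sketch that pulls the profile out of that output; the helper name `metadata_profile` and the sample text are illustrative assumptions, not output from this server:

```shell
#!/bin/sh
# metadata_profile
# Reads `btrfs filesystem df` output on stdin and prints the profile
# named on each "Metadata, <profile>: ..." line.
metadata_profile() {
    sed -n 's/^Metadata, \([^:]*\):.*/\1/p'
}

# Illustrative sample only (values are invented, not taken from the thread).
sample='Data, RAID1: total=1.00GiB, used=512.00MiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=1.00GiB, used=100.00MiB'

printf '%s\n' "$sample" | metadata_profile
```

On a live system this would be something like `btrfs filesystem df /mnt/cache | metadata_profile`, assuming the pool is mounted at `/mnt/cache`. On a two-device pool, anything other than `RAID1` suggests the metadata is not mirrored across both devices.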

 

5 hours ago, KptnKMan said:

At this point, I've managed to get both disks installed back in the original server, and put into the cache pool.

Post current diags, after trying to start the array with both original cache devices.

1 hour ago, johnnie.black said:

Basically yes, metadata will be missing, so recovery is much more difficult, sometimes impossible.

Well, that sucks.

I really enjoy using Unraid, but I'm getting frustrated with losing all my appdata every other month because of an update or bug. It always seems to happen at the worst moment.

 

Anyway...

 

1 hour ago, johnnie.black said:

Post current diags, after trying to start the array with both original cache devices.

Attached, array started, I've not formatted the first nvme yet.

Thanks for taking a look.

blaster-diagnostics-20200128-0950.zip

28 minutes ago, KptnKMan said:

I really enjoy using Unraid, but I'm getting frustrated with losing all my appdata every other month because of an update or bug. It always seems to happen at the worst moment.

Yes, that bug sucks, but you need to have backups of anything important; many other things can happen that make you lose data.

 

30 minutes ago, KptnKMan said:

Attached, array started, I've not formatted the first nvme yet.

One of the devices appears to have been cleared, likely during your earlier troubleshooting, such as when you tried to re-add it to the pool. Make sure you try the recovery options in the FAQ on both devices. If that still finds nothing, there's not much more I can help with; you can try looking for more advanced help on IRC (#btrfs) or the btrfs mailing list.
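
To keep track of what has been tried on which device, it can help to write the attempts down as a checklist first. A minimal sketch that only prints the commands and runs nothing; the device names, the `/bt` mount point and the options are copied from the log earlier in the thread, and `print_recovery_plan` is a hypothetical helper name:

```shell
#!/bin/sh
# print_recovery_plan DEVICE MOUNTPOINT
# Prints (does NOT run) the read-only mount attempts and the
# btrfs restore attempts used earlier in the thread, for one device.
print_recovery_plan() {
    dev=$1
    mnt=$2
    for opts in usebackuproot,ro degraded,usebackuproot,ro ro,notreelog,nologreplay; do
        echo "mount -o $opts $dev $mnt"
    done
    echo "btrfs restore -v $dev $mnt"
    echo "btrfs restore -vi $dev $mnt"
}

# Both original cache devices, as named in the log.
for dev in /dev/nvme0n1p1 /dev/nvme1n1p1; do
    print_recovery_plan "$dev" /bt
done
```

Printing rather than executing is deliberate here: each step can then be run by hand and its output noted before moving on to the next one.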

  • 1 month later...

Thanks everyone who helped me with this issue.

After a few weeks of messing around, I eventually gave up and took the config loss and started over with new settings.

 

For the record, I politely reached out to the btrfs mailing list multiple times and got no response.

I wouldn't recommend trying that channel, as I subscribed and saw many people having issues with btrfs and no responses. And I mean a lot of people.

 

So I've gone to a single 1TB NVMe cache and nightly backups, thanks for that tip.

Been working great, although I haven't tested a restore yet. Gonna do a dry run sometime soon.
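
A restore dry run can be as simple as copying the data somewhere and checking that the copy matches. A rough sketch, assuming plain `cp` and `diff` are enough for the data in question; `backup_dir` and `verify_backup` are made-up helper names, and the temporary directories stand in for the real appdata and backup locations:

```shell
#!/bin/sh
# backup_dir SRC DEST_ROOT
# Copies the contents of SRC into DEST_ROOT/<YYYY-MM-DD>/ and prints that path.
backup_dir() {
    from=$1
    to="$2/$(date +%Y-%m-%d)"
    mkdir -p "$to" && cp -a "$from/." "$to/" && echo "$to"
}

# verify_backup SRC BACKUP
# Succeeds only if both directory trees have identical file contents.
verify_backup() {
    diff -r "$1" "$2" >/dev/null
}

# Demo with throwaway directories (placeholders for real paths).
demo_src=$(mktemp -d)
demo_root=$(mktemp -d)
echo "setting=1" > "$demo_src/app.conf"

demo_snap=$(backup_dir "$demo_src" "$demo_root")
if verify_backup "$demo_src" "$demo_snap"; then
    echo "backup matches source"
else
    echo "backup differs from source"
fi
```

The same two helpers could be called from a nightly scheduled job; comparing the trees right after the copy catches a broken backup before it is needed.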

 

For the record also, my new cache is on xfs and I don't think I'll be touching btrfs again.

It's far too buggy and the bugs in it have left me with a sour taste.

 

A warning to future people who may stumble upon this: keep backups and avoid btrfs.

 

A heartfelt thanks to everyone that helped me. 🙂

5 hours ago, KptnKMan said:

I wouldn't recommend trying that channel, as I subscribed and saw many people having issues with btrfs and no responses. And I mean a lot of people.

I also subscribe to the list, and I very rarely see questions go without a response, though recovery isn't always possible. It's rare for a question not to get an answer from a developer; in fact, I don't remember seeing that for some time. Can you link me to your email in the archives?

3 hours ago, johnnie.black said:

I also subscribe to the list, and I very rarely see questions go without a response, though recovery isn't always possible. It's rare for a question not to get an answer from a developer; in fact, I don't remember seeing that for some time. Can you link me to your email in the archives?

Here is my own response to my second mail asking for help. No responses.

Not sure where my first one is, I can't seem to find it. I followed all the bot and mailing list instructions exactly. Oh well.

https://lore.kernel.org/linux-btrfs/CAMry8Zs8omAJGqyJWL=O5=pKBq5yhq1+tnKvS9OFEooZNsv-GQ@mail.gmail.com/

 

Looking at the archive, I can still see many unanswered mails.

 

Edit: Anyway, I'm past that and learned my lesson. Happy to not use btrfs anymore.

I do have a more current issue that I would really appreciate help with resolving if anyone has time:

 

7 minutes ago, johnnie.black said:

Yes, I see that, but I (and likely anyone else on the list) never received the original emails. The first two messages don't appear; apparently there was some problem:

 

[Screenshot of the mailing list archive]

I can see that too, but I followed the exact same process for all my mails, following the exact instructions for using the mailing list.

 

Is it not odd that I received my own message from their mailing list (as I replied to it here), but it doesn't otherwise show up in their archives?

 

¯\_(ツ)_/¯


Either way, I got no help when I REALLY needed it tbh.

 

For me at least, this only highlights that this mailing list is unreliable.

 

Sure, maybe it's not arriving in people's inboxes, and we can pass it off as that, but it makes me wonder how many people have sent mails to this list that never even showed up.

 

I want to thank you for your help though, I really appreciate it. 
