Did Unraid just purge all data from my cache pool?



Hi everyone,

 

I had to open up my Unraid box to replace a fan. Upon closing it back up, I forgot to re-attach the SATA cable of one of my cache drives. I only realised this because of a strange CPU spike at idle directly after boot, which turned out to be a btrfs balance. That was when I knew something was afoot, and upon checking I saw that one of the cache drives was missing. So I shut Unraid down again and re-attached the cable (easy enough).

 

However, contrary to my expectation, the disk was not simply added back to the cache pool. Instead, it was listed under unassigned devices as a btrfs device. So I stopped the array and assigned the drive back into the cache pool. It was then listed as inaccessible, and Unraid said it would have to be reformatted. I thought it might just be a hiccup, so I rebooted again. Alas, after booting it still said the drive needed to be formatted. I wondered why on Earth it would not simply be added back into the pool, since all I did was boot with a disconnected SATA cable, but I gave up and reformatted it. Except Unraid went ahead and formatted the WHOLE CACHE POOL, including the drive that was still fine and contained all the data! I immediately stopped the array.
 

I had a look at the respective section in the FAQ: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=543490

 

But none of the steps could restore any data. I also tried the btrfs-undelete script, but to no avail: https://gist.github.com/Changaco/45f8d171027ea2655d74
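For reference, what those attempts boiled down to is roughly the following (the device name is the one from my pool, the root bytenr and the restore target directory are just placeholders):

btrfs-find-root /dev/sdd1                                              # list candidate tree roots on the wiped device
btrfs restore -t <root_bytenr> -v -i /dev/sdd1 /mnt/disk1/restore/     # try to copy files out using one of those roots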

 

Is there anything else I can do to restore the data, or should I proceed to rebuild the VMs from scratch? Also, if anyone can shed light on what just transpired and why my mirrored cache pool simply imploded in this situation, that would be highly appreciated.

 

Thanks a lot in advance!


Hi @johnnie.black, thanks for your quick response! Is it a general rule that one should not act "too fast", i.e. before the balance is finished?

 

Also, I attached the diags. The wipe of both cache devices is clearly visible in the syslog. But maybe you can see more in there that might help recover the data? Thanks anyway for having a look, it is highly appreciated!

 

Edited by ledon
removed diags
7 minutes ago, ledon said:

Is this a general rule, that one should not act "too fast", i.e. before the balance is finished?

If a btrfs balance is running, the Stop button will be inhibited, with the reason why shown:

 

[screenshot: array Stop button inhibited while the balance is running]
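If you'd rather check from the command line, the balance state can also be queried directly (mount point shown is the usual Unraid cache path, adjust if yours differs):

btrfs balance status /mnt/cache    # reports whether a balance is running and how far along it is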

 

8 minutes ago, ledon said:

The wipe of both cache devices is clearly visible in the syslog.

It is, but your pool wasn't redundant, and that is why it was unmountable the first time, possibly the result of being created on v6.7.x due to a bug:

 

Aug  3 17:09:29 CubeZero kernel: BTRFS warning (device sdd1): devid 1 uuid 82740355-53fc-4d7a-8aaf-0ec4de6f38ce is missing
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Aug  3 17:09:29 CubeZero kernel: BTRFS warning (device sdd1): chunk 402246860800 missing 1 devices, max tolerance is 0 for writeable mount
Aug  3 17:09:29 CubeZero kernel: BTRFS warning (device sdd1): writeable mount is not allowed due to too many missing devices

You then formatted the pool, and yes, by doing that it wiped both devices. It still might be possible to recover the pool using a backup superblock, see here, but I can't really help with this since I've never used it; you can ask for help on the #btrfs IRC channel like mentioned in that thread.
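Since I've never used it myself, take this only as a rough sketch of the kind of commands that thread is talking about (device name taken from your syslog, mount point is just an example):

btrfs rescue super-recover -v /dev/sdd1              # try to repair the primary superblock from one of its backup copies
mount -o ro,usebackuproot /dev/sdd1 /mnt/recovery    # or attempt a read-only mount falling back to a backup tree root

There's also btrfs-select-super -s 1 <device>, which overwrites the primary superblock with backup copy #1, but that's more drastic, so better ask on #btrfs before trying it.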

 


Hm, thanks, interesting to know that the pool actually was not redundant. That actually raises more questions than it answers, since it only had the capacity of one drive instead of both, but if it was a bug, I guess that might account for it. Anyway, thanks again, I will have a look at the superblock option!

1 minute ago, ledon said:

That raises actually more questions than it answers since it only had the capacity of one instead of both drives

The bug is that only the data is redundant, the metadata isn't. Metadata takes very little space, but if a device fails or is missing, the whole pool is lost / won't mount.
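You can check the profiles per chunk type yourself; a quick sketch (cache mount point is the usual Unraid one, the output lines are only illustrative):

btrfs filesystem df /mnt/cache
#   Data, RAID1: ...           <- data is mirrored
#   Metadata, single: ...      <- metadata only on one device, which is the bug

On a pool that is still healthy this can be fixed by converting the metadata to raid1 as well:

btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/cache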

 

 

2 hours ago, johnnie.black said:

The bug is that only the data is redundant, the metadata isn't. Metadata takes very little space, but if a device fails or is missing, the whole pool is lost / won't mount.

 

 

Alright, thanks, that explains the behaviour I saw!

