Unraid Cache Recovery -- What am I missing



So I made a newbie mistake and removed my cache2 drive (I have a large disk-attached storage), and I assumed I knew how to recover the btrfs raid1...

Well, I couldn't, and I ended up botching my filesystem and just restoring from a backup.

 

I spun up an Unraid VM to test some failure scenarios of the btrfs raid1 setup. After simulating a failure, Unraid doesn't seem to have a mechanism to recover...

 

My test plan was to set up cache/cache2, shut down, then remove the cache2 image and reboot.

 

The cache wouldn't mount..

 

Expectation:

 

Kind of like mdadm: when it loses a disk, it just uses the other disk in single mode.

Then if I add another cache disk, it restores and rebuilds...

 

 

What am I missing?


Yeah I have tried multiple methods to get this to work using my logic flow to simulate a failure.

 

  1. Create a VM with 4 virtual SATA disks (two of 200 MB, two of 100 MB) and pass through the USB stick with Unraid
    1. The 2x 100 MB disks will be 1x parity and 1x data
    2. The 2x 200 MB disks will be 2x cache
  2. Boot the system and assign the disks as above
  3. Once the system is up and running, write some data to /mnt/cache
    1. pv < /dev/urandom > /mnt/cache/random.bin

      This pretty much fills up that 200 MB

  4. Then create the kind of havoc a bad hard drive would cause (I've done this before with mdadm in another virtual machine).

    1. pv < /dev/zero > /dev/sde1

      This zeros out the device and starts creating errors in the cache

  5. This is where I am curious what I'm doing wrong... 

    1. My steps are:

      1. Stop the array

      2. Unassign cache2 and agree that's what I want to do

      3. Start up the array and see that there are errors looking for the other drive

      4. Shutdown the machine

      5. Remove vdisk4 and reboot

      6. Start array:

        1. Cache is unmountable file system

    2. My expectations:

      1. Upon unassigning cache2 and starting up the array, it does a raid1 restore and removes cache2 from the btrfs raid1

        1. It does not do that

          1. warning, device 2 is missing
            Label: none  uuid: 19f7dc30-f58d-4f3e-9dc0-52b6fec475bf
                    Total devices 2 FS bytes used 75.11MiB
                    devid    1 size 199.97MiB used 132.00MiB path /dev/sdd1
                    *** Some devices missing

             

That is my method...
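For anyone scripting a test like this, the degraded state can be picked up from the text that `btrfs filesystem show` prints, as in the output above. This is only a minimal sketch that assumes the output format shown in this thread; the function name `pool_has_missing_device` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reads `btrfs filesystem show` output on stdin and
# succeeds (exit 0) if that output reports a missing device.
pool_has_missing_device() {
    grep -Eq 'Some devices missing|device [0-9]+ is missing'
}

# Example usage on a live system (needs root and btrfs-progs; the warning
# line goes to stderr, hence 2>&1):
#   btrfs filesystem show /mnt/cache 2>&1 | pool_has_missing_device \
#       && echo "pool is degraded"
```

The helper only inspects text, so it can be tried against saved output without touching a real pool.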

15 minutes ago, ShoGinn said:
  • Unassign cache2 and agree that's what I want to do

  • Start up the array and see that there are errors looking for the other drive

When you do this, the cache will mount and the pool will be rebalanced to a single device. The stop array button will be disabled with the message that a "btrfs operation is running"; you need to wait for the balance to finish before stopping the array.
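From the command line, one way to see whether the balance is still going is `btrfs balance status`. The sketch below only parses that command's text; the helper name `balance_is_idle` and the `/mnt/cache` path in the usage comment are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reads `btrfs balance status` output on stdin and
# succeeds (exit 0) only if it says no balance is present on the pool.
balance_is_idle() {
    grep -q 'No balance found'
}

# Example polling loop on a live system (needs root):
#   until btrfs balance status /mnt/cache 2>&1 | balance_is_idle; do
#       sleep 10
#   done
```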

 

17 minutes ago, ShoGinn said:

Upon unassigning cache2 and starting up the array, it does a raid1 restore and removes cache2 from the btrfs raid1

That's exactly what is done, a device delete plus balance to single device.
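For reference, a rough manual equivalent with plain btrfs-progs could look like the sketch below. It is not necessarily the exact sequence Unraid runs internally; the function name is made up and the device and mount point are placeholders. It only prints the commands, so nothing is changed until they are run by hand as root. The conversion comes before the removal because a two-device raid1 cannot drop below two devices while it still uses the raid1 profile.

```shell
#!/bin/bash
# Sketch only: prints the commands instead of running them.
# Arguments are placeholders: the surviving pool device and a mount point.
shrink_pool_to_single() {
    local dev="$1" mnt="$2"
    # Mount the surviving device even though its raid1 partner is gone.
    echo "mount -o degraded $dev $mnt"
    # Convert data and metadata from raid1 to single; -f is needed because
    # this reduces metadata redundancy.
    echo "btrfs balance start -f -dconvert=single -mconvert=single $mnt"
    # Drop the record of the absent device from the pool.
    echo "btrfs device remove missing $mnt"
}

shrink_pool_to_single /dev/sdd1 /mnt/cache
```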

 

6 minutes ago, ShoGinn said:

Also, "replacing" the failed cache drive with another virtual image does not create a restore either.. still unmountable

Assign a new cache2 device to the pool, either the old cleanly removed device or a new one, and it will be rebalanced to raid1. There could be problems if a device that was previously in the pool and wasn't cleanly removed is assigned again without being cleared/wiped, but that is a btrfs quirk.
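Done by hand, the wipe, add and rebalance would be roughly the following. Again this is only a sketch with a made-up function name and placeholder arguments, and it prints the commands rather than running them; the Unraid GUI normally handles the add and rebalance for you.

```shell
#!/bin/bash
# Sketch only: prints the commands instead of running them.
# Arguments are placeholders: the device to add and the mounted pool.
readd_pool_device() {
    local new_dev="$1" mnt="$2"
    # Clear old filesystem signatures so a stale pool member is not
    # mistaken for a current one. This destroys what is on the device.
    echo "wipefs -a $new_dev"
    # Add the device to the mounted pool.
    echo "btrfs device add $new_dev $mnt"
    # Convert data and metadata back to raid1 across both devices.
    echo "btrfs balance start -dconvert=raid1 -mconvert=raid1 $mnt"
}

readd_pool_device /dev/sde1 /mnt/cache
```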

 

Not sure why it's not working for you, possibly because you are using vdisks, but I would need the diagnostics grabbed after a cache operation to see why.

 

 


I ran through the following test

 

  1. This time I shut down and removed the virtual disk.
  2. Started the array
    1. All is well
  3. It still shows the array as a raid1
    1. Label: none  uuid: 901e6235-a8c5-443f-bdcb-eb8c496a9c48
              Total devices 2 FS bytes used 131.16MiB
              devid    2 size 199.97MiB used 198.94MiB path /dev/sdd1
              *** Some devices missing

      This is how I would expect it to work

  4. Not sure why it wouldn't mount on my last go-around
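Since `btrfs filesystem show` lists devices but not the profile, a way to confirm the pool really is still raid1 is `btrfs filesystem df`. The sketch below just pulls the profile names out of that command's text; the helper name `pool_profiles` and the path in the usage comment are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reads `btrfs filesystem df` output on stdin, where
# lines look like "Data, RAID1: total=..., used=...", and prints one
# "Type=PROFILE" line for each Data, Metadata and System entry.
pool_profiles() {
    awk -F'[,:] *' '/^(Data|Metadata|System), / { print $1 "=" $2 }'
}

# Example usage on a live system (needs root):
#   btrfs filesystem df /mnt/cache | pool_profiles
```

While a conversion is in progress, the same type can show up with two profiles, so more than one line per type is possible.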

------

I guess the moral of this story is that testing pays dividends.

 

I could not recreate my initial situation.

 

One other issue I saw: if the failed drive is the main cache drive and you replace it, you will lose everything.

 

Thanks @johnnie.black for validating how it should play out.

My Google fu was unable to find a situation like this.

I am ultra paranoid about data loss, and my mistake didn't do much for my confidence.

I like Unraid and want to purchase a license, but if I couldn't get this to work it would have been a deal breaker. So that's been rectified.

 

Hopefully this helps others!
