ShoGinn Posted March 22, 2019

So I made a mistake: I removed my cache 2 drive (it's large disk-attached storage) [newbie mistake] and assumed I knew how to recover the btrfs RAID 1. Well, I couldn't, ended up botching my filesystem, and just restored a backup.

I then spun up an Unraid VM to test some failure scenarios of the btrfs RAID 1 setup. After simulating a failure, Unraid doesn't seem to have a mechanism to recover. My test plan was to set up cache/cache2, shut down, remove the cache2 image, and reboot. The cache wouldn't mount.

Expectation: kind of like mdadm when it loses a disk — it just uses the other disk in single mode, and then if I add another cache disk it restores and rebuilds. What am I missing?
JorgeB Posted March 23, 2019

9 hours ago, ShoGinn said:
    "Kind of like mdadm when it loses a disk — it just uses the other disk in single mode, and then if I add another cache disk it restores and rebuilds."

That's how it works, assuming you're adding a new or clean device for the restore. What exactly did you do?
ShoGinn Posted March 23, 2019 (Author)

Yeah, I have tried multiple methods to get this to work. My logic flow to simulate a failure:

1. Create a VM with 4 virtual SATA disks (2 of them 200 MB, 2 of them 100 MB) and pass through a USB stick with Unraid.
2. The 2x 100 MB disks will be 1x parity, 1x data; the 2x 200 MB disks will be 2x cache.
3. Boot the system and assign the disks as above.
4. Once the system is up and running, write some data to /mnt/cache:

       pv < /dev/urandom > /mnt/cache/random.bin

   This pretty much fills up that 200 MB pool.
5. Then create the havoc of a bad hard drive (I've done this before using mdadm in another virtual machine):

       pv < /dev/zero > /dev/sde1

   This zeroes out the device and starts creating errors on the cache.

This is where I'm curious what I'm doing wrong. My steps are:

1. Stop the array.
2. Unassign cache2 and agree that's what I want to do.
3. Start the array; see that there are errors looking for the other drive.
4. Shut down the machine.
5. Remove vdisk4 and reboot.
6. Start the array: the cache is an unmountable filesystem.

My expectation: upon unassigning cache 2 and starting the array, it does a RAID 1 restore and removes cache 2 from the btrfs RAID 1. It does not do that:

    warning, device 2 is missing
    Label: none  uuid: 19f7dc30-f58d-4f3e-9dc0-52b6fec475bf
        Total devices 2  FS bytes used 75.11MiB
        devid 1 size 199.97MiB used 132.00MiB path /dev/sdd1
        *** Some devices missing

That is my method.
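As an aside, the "Some devices missing" state shown above can be checked mechanically by parsing `btrfs filesystem show` output. A minimal sketch — the sample text is hard-coded (modeled on the output above) so it runs without a real pool; in practice you would capture it with `btrfs filesystem show /mnt/cache`:

```shell
#!/bin/sh
# Sample output modeled on the degraded-pool text above (hard-coded
# here for illustration; normally: btrfs filesystem show /mnt/cache)
fs_show='Label: none  uuid: 19f7dc30-f58d-4f3e-9dc0-52b6fec475bf
        Total devices 2 FS bytes used 75.11MiB
        devid    1 size 199.97MiB used 132.00MiB path /dev/sdd1
        *** Some devices missing'

# "Total devices N" -> expected member count
total=$(printf '%s\n' "$fs_show" | awk '/Total devices/ {print $3}')
# one "devid" line per device that is actually present
present=$(printf '%s\n' "$fs_show" | grep -c 'devid')

if printf '%s\n' "$fs_show" | grep -q 'Some devices missing'; then
    echo "pool degraded: $present of $total devices present"
else
    echo "pool healthy: $present of $total devices present"
fi
```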
ShoGinn Posted March 23, 2019 (Author)

Also, "replacing" the failed cache drive with another virtual image does not trigger a restore either — the cache is still unmountable.
JorgeB Posted March 23, 2019

15 minutes ago, ShoGinn said:
    "Unassign cache2 and agree that's what I want to do. Start the array; see that there are errors looking for the other drive."

When doing this, the cache will mount and the pool will be rebalanced to a single device. The stop-array button will be disabled with the info that a "btrfs operation is running"; you need to wait for the balance to finish before stopping the array.

17 minutes ago, ShoGinn said:
    "Upon unassigning cache 2 and starting the array, it does a RAID 1 restore and removes cache 2 from the btrfs RAID 1."

That's exactly what is done: a device delete plus a balance to the single-device profile.

6 minutes ago, ShoGinn said:
    "Also, 'replacing' the failed cache drive with another virtual image does not trigger a restore either — still unmountable."

Assign a new cache2 device to the pool — either the old, cleanly removed device or a new one — and it will be rebalanced to RAID 1. There can be problems if a previous pool device that wasn't cleanly removed is assigned again without being cleared/wiped, but that is a btrfs quirk. Not sure why it's not working for you, possibly because you're using vdisks, but I would need the diagnostics grabbed after a cache operation to see the reason why.
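For reference, the sequence described here (mount the degraded pool, drop the missing member, rebalance to single, later rebalance back to RAID 1) can be approximated by hand with btrfs-progs. This is a sketch of the general recovery pattern, not Unraid's exact implementation; the device paths are examples from this thread, and these commands are destructive, so only run them against a test pool:

```shell
# Mount the surviving member despite the missing device
mount -o degraded /dev/sdd1 /mnt/cache

# Convert data and metadata down to the single-device profile first
# (a 2-device RAID1 cannot drop below two members while still RAID1)
btrfs balance start -dconvert=single -mconvert=single /mnt/cache

# Now remove the absent device from the pool metadata
btrfs device delete missing /mnt/cache

# Later, to restore redundancy with a replacement disk (example path):
btrfs device add -f /dev/sde1 /mnt/cache
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/cache
```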
ShoGinn Posted March 23, 2019 (Author)

I ran through the test again. This time, after shutting down and removing the virtual disk, I started the array and all was well. It still shows the pool as RAID 1:

    Label: none  uuid: 901e6235-a8c5-443f-bdcb-eb8c496a9c48
        Total devices 2  FS bytes used 131.16MiB
        devid 2 size 199.97MiB used 198.94MiB path /dev/sdd1
        *** Some devices missing

This is how I would expect it to work. Not sure why it wouldn't mount on my last go-around.

I guess the moral of this story is that testing pays dividends; I could not recreate my initial situation. One other issue I saw: if the failed drive is the main drive, you can lose everything even after replacing it.

Thanks @johnnie.black for validating how it should play out. My Google-fu was unable to find a situation like this. I am ultra-paranoid about data loss, and after my mistake this didn't do much for my confidence. I like Unraid and want to purchase a license, but if I couldn't get this to work it would have been a deal breaker — so that's been rectified. Hopefully this helps others!
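Once the replacement balance finishes, the profile can be confirmed from `btrfs filesystem df`. A small sketch with hard-coded sample output, so it runs without a pool — the values are illustrative, and real usage would capture `btrfs filesystem df /mnt/cache` instead:

```shell
#!/bin/sh
# Illustrative sample of `btrfs filesystem df /mnt/cache` output after
# a successful rebalance back to RAID1 (values are made up)
fs_df='Data, RAID1: total=150.00MiB, used=131.00MiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=50.00MiB, used=176.00KiB'

# Pull the profile name from the Data line ("Data, RAID1: ...")
data_profile=$(printf '%s\n' "$fs_df" |
    awk -F'[,:]' '/^Data/ {gsub(/ /, "", $2); print $2}')

echo "data profile: $data_profile"
```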