Cannot mount old cache pool drive in Unassigned Devices

johnsanc · November 28, 2018

I recently added 2 SDDs to my cache pool and I want to remove the old spinner disk. There is still a little bit of data on that cache drive I would like to restore. I added the drives to the cache pool, balanced, then shut down array and removed the old cache disk. Now I am trying to mount that old cache disk in Unassigned Devices and i get this. Any idea how I can get this disk to be mountable so I can retrieve the data from it?

Nov 28 12:23:13 Tower unassigned.devices: Adding disk '/dev/sdm1'...
Nov 28 12:23:13 Tower unassigned.devices: Mount drive command: /sbin/mount -t btrfs -o auto,async,noatime,nodiratime '/dev/sdm1' '/mnt/disks/oldcache'
Nov 28 12:23:13 Tower kernel: BTRFS info (device sdm1): disk space caching is enabled
Nov 28 12:23:13 Tower kernel: BTRFS error (device sdm1): devid 2 uuid e98a22d0-63c9-484b-b6ea-90ca68386ff1 is missing
Nov 28 12:23:13 Tower kernel: BTRFS error (device sdm1): failed to read the system array: -2
Nov 28 12:23:13 Tower kernel: BTRFS error (device sdm1): open_ctree failed
Nov 28 12:23:13 Tower unassigned.devices: Mount of '/dev/sdm1' failed. Error message: mount: /mnt/disks/oldcache: wrong fs type, bad option, bad superblock on /dev/sdm1, missing codepage or helper program, or other error. 
Nov 28 12:23:13 Tower unassigned.devices: Partition 'ST3320620AS_6QF28RW3' could not be mounted...

JorgeB · November 28, 2018

If it was raid1 is should mount using -o degraded, but you'll need to use the command line, see here:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=543490

johnsanc · November 28, 2018

Yes it was raid1. None of the non-destructive repair options worked.

After creating the temp "x" directory, both "mount -o recovery,ro /dev/sdm1 /x" and "mount -o degraded,recovery,ro /dev/sdm1 /x" return an error saying "mount: /x: mount point does not exist."

Trying "btrfs restore -v /dev/sdm1 /mnt/disk13/Restore" I get:

warning, device 4 is missing
warning, device 2 is missing
bytenr mismatch, want=1718833954816, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 4 is missing
warning, device 2 is missing
bytenr mismatch, want=1718833954816, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 4 is missing
warning, device 2 is missing
bytenr mismatch, want=1718833954816, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super

I know very little about btrfs, and this is my first time attempting a cache pool. I have to say the implementation is far from intuitive for a newbie, and it looks like I may have gotten myself into trouble by adding and removing cache disks in the process. I tried to completely remove my old cache drives as well and go back to a single cache drive, but the unraid UI warns me that it will erase all data if I do that... so now I'm kind of stuck and don't know what to do.

If these were supposedly mirrored in RAID1 then shouldn't the data still all be there? Especially since this was the disk I originally used for my single-drive cache. Can't I just undo the RAID1 status of that disk somehow and mount it?

As of right now this is the only disk with any of my old cache data on it. My other 2 SDDs which now occupy my cache pool are empty (which was a surprise because they were balanced before I removed the old cache drive... I would have expected them to have the data from this old cache drive.)

Anyways, any other ideas or do I have to basically wipe out all the data and chalk this up to user error?... which is fair but at no point was I warned that the data would be corrupted somehow.

Edited November 28, 2018 by johnsanc

JorgeB · November 28, 2018

I though you disconnected the disk, If you removed if from the pool letting it balance it would destroy the superblock, making recovery very difficult if not impossible, you can try btrfs restore but without the superblock it likely won't work either.

johnsanc · November 28, 2018

No i did not disconnect the disk. Yeah this is not documented well at all. This should really be a power user feature until the Unraid UI is updated to warn against stuff like this.

JorgeB · November 29, 2018

Something is still missing here, because if you had removed the disk correctly any data on it would be moved to the remaining pool devices

JorgeB · November 29, 2018

6 hours ago, johnsanc said:

I recently added 2 SDDs to my cache pool and I want to remove the old spinner disk. There is still a little bit of data on that cache drive I would like to restore. I added the drives to the cache pool, balanced, then shut down array and removed the old cache disk.

OK, I read this too fast before, if I understand correctly, you added 2 SSDs to the pool, making it a 3 device pool and let it balance, but then instead of removing the disk and let it balance again you just disconnected it? If this is what happened that disk will be unmountbale because it's missing a couple of members, but if the balance finished completely before removing it and assuming you are using the default raid1 profile any data that was there would be on the current pool.

johnsanc · November 29, 2018

How can i remove the disk incorrectly? Here is what I did after all 3 drives were balanced and the cache config showed "No balance found on /mnt/cache" (which is very confusing, but I digress):

Shut down array
Removed all cache disks (Old 320gb HDD + 2x new 1tb SSDs)
Changed cache drives from 3 to 2
Reassigned the 2 SSDs
At this point Cache 2 warned that data would be erased on that disk... wasnt really sure why
Proceeded to start array
Rebalancing automatically took place
Cache drives had no data

I never physically removed any disks, only unassigned and reassigned

Another weird thing I didn't understand is that when all 3 drives were connected it said the total cache space was 1.3ish tb, so basically the 320gb + the 1tb (mirrored I assume?). If this was really in RAID1 shouldn't the max space be the smallest of the drives?

Edited November 29, 2018 by johnsanc

trurl · November 29, 2018

1 minute ago, johnsanc said:

If this was really in RAID1 shouldn't the max space be the smallest of the drives?

If there were only 2 disks it would be, but with more than 2 the mirroring uses the other disks to get more capacity. There are still copies of everything on separate disks.

btrfs disk usage calculator

JorgeB · November 29, 2018

7 minutes ago, johnsanc said:

At this point Cache 2 warned that data would be erased on that disk... wasnt really sure why

This shouldn't happen when doing the normal procedure, if you have them or haven't rebooted yet post the diagnostics showing the replacement, though I will only look at them tomorrow since it's past my bed time now.

Edited November 29, 2018 by johnnie.black

johnsanc · November 29, 2018

Diagnostics attached. Hopefully it contains some clues as to what went wrong here (or more likely what I did wrong...)

tower-diagnostics-20181128-1921.zip

JorgeB · November 29, 2018

The problem apparently was caused by you changing the number o cache slots after re-arranging the disks, if you had done one thing at the time it would have worked, i.e., unassigned the disk, rearrange the disks, start the array to balance and when the balance was done you could have changed the number o cache slots without a problem, but what you did should probably have worked, I have never tried it and likely neither did LT, but there can also be reasons why it can't be done, I'll bring the issue up with LT to see if something can be done for the future to improve that.

JorgeB · November 29, 2018

Created a bug report and also added a warning about this to the procedure on the FAQ.

johnsanc · November 29, 2018

Just for my own reference, how should I have done this? I needed to remove the disk in slot #1 (old drive) and move discs #2 and #3 into positions #1 and #2.

Are you saying I would have needed to do the following, starting with my original state of a single cache drive.

Shut down array
Increase cache slots from 1 to 3
Add new disks to positions 2 and 3
Start array and let the pool balance
Stop array
Rearrange all disks to a different order so that the disk I need to remove is in position 3
Start array and wait until any starting processes complete
Stop array
Remove disk 3 from the cache pool
Change cache pool from 3 to 2 disks without making any other position switches
Start array

I might be missing something, but cannot think of another way to do this one step at a time since the disk I needed to remove was in position #1

Also thank you so much for taking the time to try to reproduce this, logging a bug report, and updating the FAQ. Thankfully I didn't have any data on the cache drive that cannot be replaced.

Edited November 29, 2018 by johnsanc

JorgeB · November 29, 2018

The only thing you needed to do differently was not changing the cache slots back to 2 when you removed the HDD, if you had left it at 3 it would have worked normally, it could then be changed to 2 at any time after the balance completed.

johnsanc · November 29, 2018

Well thats not very intuitive... so you have to start the array with 2 filled slots and 1 blank slot? I didn't even think that was possible.

JorgeB · November 29, 2018

You can have as many array and cache slots select as you want, what counts is the devices actually assigned, I would guess most users don't change the number of slots when removing cache devices, hence why this issue never came up before.

johnsanc · November 30, 2018

Maybe I don't understand. Why wouldn't I change the number of slots if I am reducing the number of drives in the pool? It doesn't make sense at all to me why I would leave 3 slots open when I am only using 2 drives.

Anyways, thanks for your help, hopefully some extra checks are eventually put in place to keep other people from making the same mistake I did.

JorgeB · November 30, 2018

Just now, johnsanc said:

Why wouldn't I change the number of slots if I am reducing the number of drives in the pool?

Because it doesn't matter, Unraid will only use the actual assigned slots, you can have two devices assigned and 24 cache slots select, pool will still be two devices.

johnsanc · November 30, 2018

Well now I know. That's not readily apparent based on the way the cache UI is setup. Unlike the array UI that shows a fixed number of slots, the cache UI does not. This difference in UI behavior is actually what made me think I was supposed to align the number of slots to the number of drives I was using.

I suppose consider this a user experience enhancement suggestion to avoid confusion.

John_M · November 30, 2018

I must say that it's in my nature to tidy things up the way you did. But equally, I wouldn't have made a three device pool as an intermediate step in moving from one to two devices. If my intention was to go from a single HD to a pair of SSDs I would have added one SSD, waited for the balance to complete, then replaced the HD with the other SSD and again waited for the balance to complete. I'm not saying that what you did was wrong - just that it is unlikely to have been tested. I'm sorry you were bitten and I hope the bug gets fixed.

johnsanc · November 30, 2018

No worries, no critical data was lost.

However considering this is a "data loss" issue I would consider this a pretty serious issue, even if it is an edge case. Especially since this is meant to be a feature (cache pool) thats explicitly built with a default configuration to prevent data loss (RAID1). It would be a little different if I was trying to build something totally custom and not supported.

The process isn't dummy proof and one would have to hunt for info buried in forums to not shoot themselves in the foot. I'll hop off my soapbox now

John_M · November 30, 2018

Well, you found a real bug and Johnnie was able to reproduce it and file a report. It will be taken seriously and I expect that it will be fixed pretty quickly. It looks like a logic problem in the GUI: if this, this and this, then don't allow this.

Eksster · August 28, 2019

Hi guys, I can really relate with Johnsanc.

I've been invested in Unraid for about 2 months vigorously watching spaceinvader ones guides. I got pretty far and lost all my data (I of course had backups) due to essentially the issues discussed here. With the main difference (Maybe it's the same) that 1 of my 2 120GB SSD identical cache drives randomly mounts upon a reboot. Currently for example, I have Cache 2 filled and things are fine. After losing my data due to being a newb and fail safes still missing in 6.7.2 and .3, I am very reluctant to trust a single cache drive. Even if it is my hardware having a lemon taste, I would like to have a system in place to prevent failure of all vested data, dockers and vm's.

My tribulations got me to live by a personal motto of: "My server is only as good as the reference guide I've created for all my work." I've got an 80 page word document going so far. What's super amazing for me is having the "Table of Contents" functionality. Everything is linked, and I keep the sidebar up and can jump around and organize large collections of text like never before. It's awesome. I'm reluctant to start due to failure (Which is dumb), but I know I need to work on some backup and restore settings to automatically keep files on the array that I'll be able to use to rebuild VM's and Dockers.

I find it difficult to find information like this thread about cache disk and cache data management and essential operability and reliability steps/ practices. There is from what I can tell, still very much a need for it. I remain not confident on the subject and would enjoy more information.

JorgeB · August 28, 2019

5 hours ago, Eksster said:

With the main difference (Maybe it's the same) that 1 of my 2 120GB SSD identical cache drives randomly mounts upon a reboot.

Not sure what you mean by this, but if need any help you should start a new thread and post the diagnostics, also see here for better pool monitoring.

Cannot mount old cache pool drive in Unassigned Devices

Recommended Posts

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Join the conversation