Cannot mount old cache pool drive in Unassigned Devices


johnsanc


I recently added 2 SSDs to my cache pool and I want to remove the old spinner disk. There is still a little bit of data on that old cache drive I would like to restore. I added the SSDs to the cache pool, balanced, then shut down the array and removed the old cache disk. Now I am trying to mount that old cache disk in Unassigned Devices and I get the errors below. Any idea how I can get this disk to be mountable so I can retrieve the data from it?

 

Nov 28 12:23:13 Tower unassigned.devices: Adding disk '/dev/sdm1'...
Nov 28 12:23:13 Tower unassigned.devices: Mount drive command: /sbin/mount -t btrfs -o auto,async,noatime,nodiratime '/dev/sdm1' '/mnt/disks/oldcache'
Nov 28 12:23:13 Tower kernel: BTRFS info (device sdm1): disk space caching is enabled
Nov 28 12:23:13 Tower kernel: BTRFS error (device sdm1): devid 2 uuid e98a22d0-63c9-484b-b6ea-90ca68386ff1 is missing
Nov 28 12:23:13 Tower kernel: BTRFS error (device sdm1): failed to read the system array: -2
Nov 28 12:23:13 Tower kernel: BTRFS error (device sdm1): open_ctree failed
Nov 28 12:23:13 Tower unassigned.devices: Mount of '/dev/sdm1' failed. Error message: mount: /mnt/disks/oldcache: wrong fs type, bad option, bad superblock on /dev/sdm1, missing codepage or helper program, or other error. 
Nov 28 12:23:13 Tower unassigned.devices: Partition 'ST3320620AS_6QF28RW3' could not be mounted...
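
For anyone hitting the same error: the log shows btrfs complaining that devid 2 of the pool is missing, so the drive is still flagged as a member of a multi-device filesystem and will not mount on its own through the normal path. A rough sketch of things worth trying from the console (assuming the device is /dev/sdm1 as in the log above; a degraded mount of a lone raid1 member does not always succeed):

# show which devices btrfs expects this filesystem to have
btrfs filesystem show /dev/sdm1

# attempt a read-only degraded mount of the lone member
mkdir -p /mnt/disks/oldcache
mount -t btrfs -o degraded,ro /dev/sdm1 /mnt/disks/oldcache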

 

Link to comment

Yes, it was RAID1. None of the non-destructive repair options worked.

 

After creating the temp "x" directory, both "mount -o recovery,ro /dev/sdm1 /x" and "mount -o degraded,recovery,ro /dev/sdm1 /x" return the error "mount: /x: mount point does not exist."
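
In case it helps anyone later: that particular message is about the mount point, not the filesystem. mount needs the target directory to exist at the root of the tree, so the temp directory has to be created as /x; a folder named "x" inside the current directory is not enough. A minimal sketch (note that on newer kernels the "recovery" option has been renamed "usebackuproot"):

mkdir /x
mount -o degraded,recovery,ro /dev/sdm1 /x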

 

Trying "btrfs restore -v /dev/sdm1 /mnt/disk13/Restore" I get: 

 

warning, device 4 is missing
warning, device 2 is missing
bytenr mismatch, want=1718833954816, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 4 is missing
warning, device 2 is missing
bytenr mismatch, want=1718833954816, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 4 is missing
warning, device 2 is missing
bytenr mismatch, want=1718833954816, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super
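
For the record, btrfs restore can also be pointed at a superblock mirror or at an older tree root found by btrfs-find-root. A sketch, though with two of the pool members physically absent these may well fail the same way:

# list the superblock copies on the device
btrfs inspect-internal dump-super -a /dev/sdm1

# search the device for old tree roots
btrfs-find-root /dev/sdm1

# retry restore from superblock mirror 1 (or pass a bytenr from btrfs-find-root with -t)
btrfs restore -v -u 1 /dev/sdm1 /mnt/disk13/Restore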

 

I know very little about btrfs, and this is my first time attempting a cache pool. I have to say the implementation is far from intuitive for a newbie, and it looks like I may have gotten myself into trouble by adding and removing cache disks in the process. I tried to completely remove my old cache drives and go back to a single cache drive, but the Unraid UI warns me that it will erase all data if I do that... so now I'm kind of stuck and don't know what to do.

 

If these were supposedly mirrored in RAID1, shouldn't the data all still be there? Especially since this was the disk I originally used for my single-drive cache. Can't I just undo the RAID1 status of that disk somehow and mount it?

 

As of right now this is the only disk with any of my old cache data on it. My other 2 SSDs, which now occupy my cache pool, are empty (which was a surprise, because they were balanced before I removed the old cache drive... I would have expected them to have the data from this old cache drive).

 

Anyways, any other ideas, or do I have to basically wipe out all the data and chalk this up to user error? Which is fair, but at no point was I warned that the data could be corrupted.

Edited by johnsanc
Link to comment
6 hours ago, johnsanc said:

I recently added 2 SSDs to my cache pool and I want to remove the old spinner disk. There is still a little bit of data on that old cache drive I would like to restore. I added the SSDs to the cache pool, balanced, then shut down the array and removed the old cache disk.

OK, I read this too fast before. If I understand correctly, you added 2 SSDs to the pool, making it a 3-device pool, and let it balance, but then instead of removing the disk from the pool and letting it balance again you just disconnected it? If that is what happened, that disk will be unmountable because it's missing a couple of members. But if the balance finished completely before you removed it, and assuming you are using the default raid1 profile, any data that was there would be on the current pool.
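
Loosely speaking, a proper removal corresponds at the btrfs level to something like the sketch below. Unraid normally drives this itself when you unassign the disk and start the array; /mnt/cache and /dev/sdX are placeholders here:

# migrate all chunks off the device and shrink the pool (the pool must stay mounted)
btrfs device remove /dev/sdX /mnt/cache

# before physically pulling any drive, confirm nothing is listed as missing
btrfs filesystem show /mnt/cache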

Link to comment

How did I remove the disk incorrectly? Here is what I did after all 3 drives were balanced and the cache config showed "No balance found on /mnt/cache" (which is very confusing, but I digress):

 

  1. Shut down array
  2. Unassigned all cache disks (old 320GB HDD + 2x new 1TB SSDs)
  3. Changed cache slots from 3 to 2
  4. Reassigned the 2 SSDs
  5. At this point Cache 2 warned that data would be erased on that disk... wasn't really sure why
  6. Proceeded to start array
  7. Rebalancing automatically took place
  8. Cache drives had no data

 

I never physically removed any disks, only unassigned and reassigned them.

 

Another weird thing I didn't understand is that when all 3 drives were connected it said the total cache space was 1.3ish TB, so basically the 320GB + the 1TB (mirrored, I assume?). If this was really RAID1, shouldn't the max space be the smallest of the drives?
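
For anyone reading later: the reported size does make sense for btrfs, which mirrors raid1 at the chunk level across any two devices rather than mirroring whole disks, so usable space is not limited to the smallest drive. A rough back-of-envelope, assuming drives of 1000, 1000 and 320 GB:

# no device is larger than the sum of the others, so usable space is half the raw total:
#   (1000 + 1000 + 320) / 2 = 1160 GB, in the same ballpark as the 1.3ish TB shown
# the live allocation can be checked with:
btrfs filesystem usage /mnt/cache

The exact figure depends on rounding and on TB vs TiB, but it explains why the total was bigger than the smallest drive.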

Edited by johnsanc
Link to comment
7 minutes ago, johnsanc said:

At this point Cache 2 warned that data would be erased on that disk... wasn't really sure why

This shouldn't happen when following the normal procedure. If you have them, or haven't rebooted yet, post the diagnostics showing the replacement, though I will only look at them tomorrow since it's past my bedtime now.

 

 

Edited by johnnie.black
Link to comment

The problem was apparently caused by you changing the number of cache slots after re-arranging the disks. If you had done one thing at a time it would have worked, i.e., unassign the disk, re-arrange the remaining disks, start the array to balance, and once the balance was done you could have changed the number of cache slots without a problem. That said, what you did should probably have worked; I have never tried it and likely neither did LT, though there may also be reasons why it can't be done. I'll bring the issue up with LT to see if something can be done in the future to improve this.

 

 

Link to comment

Just for my own reference, how should I have done this? I needed to remove the disk in slot #1 (old drive) and move disks #2 and #3 into positions #1 and #2.

 

Are you saying I would have needed to do the following, starting from my original state of a single cache drive?

  1. Shut down array
  2. Increase cache slots from 1 to 3
  3. Add new disks to positions 2 and 3
  4. Start array and let the pool balance
  5. Stop array
  6. Rearrange all disks to a different order so that the disk I need to remove is in position 3
  7. Start array and wait until any starting processes complete
  8. Stop array
  9. Remove disk 3 from the cache pool
  10. Change cache pool from 3 to 2 disks without making any other position switches
  11. Start array

I might be missing something, but I cannot think of another way to do this one step at a time, since the disk I needed to remove was in position #1.
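
One check that would help at steps 4 and 7 is confirming from the console that the balance has genuinely finished before stopping the array; a sketch, assuming the pool is mounted at /mnt/cache:

# shows whether a balance is still running ("No balance found" just means none is in progress)
btrfs balance status /mnt/cache

# confirm every chunk type reports the raid1 profile, with no "single" leftovers
btrfs filesystem df /mnt/cache

That is also, as far as I can tell, what the confusing "No balance found on /mnt/cache" message means: no balance currently running.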

 

Also, thank you so much for taking the time to try to reproduce this, log a bug report, and update the FAQ. Thankfully I didn't have any data on the cache drive that can't be replaced.

Edited by johnsanc
Link to comment

Maybe I don't understand. Why wouldn't I change the number of slots if I am reducing the number of drives in the pool? It doesn't make sense to me to leave 3 slots open when I am only using 2 drives.

 

Anyways, thanks for your help, hopefully some extra checks are eventually put in place to keep other people from making the same mistake I did.

Link to comment

Well, now I know. That's not readily apparent from the way the cache UI is set up. Unlike the array UI, which shows a fixed number of slots, the cache UI does not. This difference in UI behavior is actually what made me think I was supposed to match the number of slots to the number of drives I was using.

 

So consider this a user-experience enhancement suggestion to avoid confusion.

Link to comment

I must say that it's in my nature to tidy things up the way you did. But equally, I wouldn't have made a three-device pool as an intermediate step in moving from one device to two. If my intention was to go from a single HD to a pair of SSDs, I would have added one SSD, waited for the balance to complete, then replaced the HD with the other SSD and again waited for the balance to complete. I'm not saying that what you did was wrong, just that it is unlikely to have been tested. I'm sorry you were bitten and I hope the bug gets fixed.
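
For what it's worth, the HD-to-SSD swap in that plan maps onto btrfs's built-in replace operation, which copies data directly onto the new device instead of doing a remove-then-add with two balances. A sketch with invented device names (on Unraid you would normally just swap the assignment in the GUI rather than run this by hand):

# copy the old device's data straight onto the new one, then retire the old device
btrfs replace start /dev/sdOLD /dev/sdNEW /mnt/cache

# watch progress
btrfs replace status /mnt/cache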

Link to comment

No worries, no critical data was lost.

 

However, since this involves data loss I would consider it a pretty serious issue, even if it is an edge case. Especially since the cache pool is a feature explicitly built with a default configuration (RAID1) meant to prevent data loss. It would be a little different if I were trying to build something totally custom and unsupported.

 

The process isn't dummy-proof, and you have to hunt for info buried in the forums to avoid shooting yourself in the foot. I'll hop off my soapbox now :)

Link to comment
  • 8 months later...

Hi guys, I can really relate to johnsanc.

I've been invested in Unraid for about 2 months, vigorously watching Spaceinvader One's guides. I got pretty far and then lost all my data (I did, of course, have backups) due to essentially the issues discussed here, with the main difference (maybe it's the same) being that one of my two identical 120GB SSD cache drives only randomly mounts after a reboot. Currently, for example, I have Cache 2 filled and things are fine. After losing my data through being a newbie, and with fail-safes still missing in 6.7.2 and 6.7.3, I am very reluctant to trust a single cache drive. Even if my hardware is the lemon here, I would like to have a system in place to prevent losing all my data, dockers and VMs.

 

My tribulations led me to a personal motto: "My server is only as good as the reference guide I've created for all my work." I've got an 80-page Word document going so far. What's super useful is the "Table of Contents" functionality: everything is linked, I keep the sidebar up, and I can jump around and organize large collections of text like never before. I'm reluctant to start over after the failure (which is dumb), but I know I need to work on backup and restore settings that automatically keep files on the array which I can use to rebuild VMs and Dockers.

 

I find it difficult to find information like this thread about cache disk and cache data management and the essential operability and reliability steps/practices. From what I can tell, there is still very much a need for it. I still don't feel confident on the subject and would welcome more information.

Link to comment
