Cache Pool in a bad state (v6.8.3)



Hey everyone,

 

I have a cache pool of two devices, but I would like to revert it back to one device.

What happened was that I plugged in the drive and added it to the cache pool before I realized the implications of the SSDs not being the same size (240GB vs. 2TB). I've been following the guides below, trying to remove the 2TB drive without any success, and I was hoping to get some help here.

 

https://forums.unraid.net/topic/51133-remove-a-drive-from-a-cache-pool/

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?tab=comments#comment-480418

 

Important note: this is my backup Unraid box that I'm trying to set up, and I'm prepared to lose the data on it if I need to, but I would like to avoid having to re-transfer the ~12TB of data I have backed up here, because that takes several days (i.e., it's a matter of convenience).

 

After a fresh reboot with both SSDs successfully in the cache pool, this is the "btrfs filesystem show" output. The device I'd like to remove is /dev/sdb; it is not listed there, but it is showing green on the web page.

root@Fatcat:~# btrfs filesystem show /mnt/cache/
Label: none  uuid: e923abd6-c954-48cd-b2f4-9710cc30eedf
        Total devices 2 FS bytes used 21.00GiB
        devid    1 size 223.57GiB used 44.06GiB path /dev/sdc1
        *** Some devices missing
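
(Side note for anyone following along: a quick way to double-check which /dev/sdX letter maps to which physical SSD is something like the command below; the columns are just the ones I find useful.)

lsblk -o NAME,SIZE,MODEL,SERIAL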

 

Performing the steps in the first link (../topic/51133-*), I do not see any errors when I run the "blkdiscard" command... and yes, I made sure I wasn't missing a digit or anything when running my commands.
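
For reference, the wipe command from that guide is essentially just blkdiscard pointed at the device you want to remove, i.e. something like the line below (the exact invocation is from memory; /dev/sdb was the 2TB drive in my case, so triple-check the device letter before running it, since it discards everything on that disk):

blkdiscard /dev/sdb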

 

At some point I realized that I wasn't able to remove a cache pool disk while the pool is in the default raid1 mode, so I tried to run a balance operation and convert it to single mode. When I attempt this via the web page it seems to work fine and does not display any errors; when I attempted it via a console command it errored out and said there was not enough disk space. To me this explains why I have never seen anything other than "No balance found on '/mnt/cache'" on the web page or the console: I believe the 240GB SSD is too small for whatever operation is being attempted.
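
From memory, the console attempt looked roughly like the lines below (exact flags may have differed slightly, so treat this as approximate); the second command is just what I used to check on progress:

btrfs balance start -dconvert=single -mconvert=single /mnt/cache
btrfs balance status /mnt/cache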

 

Here is the "btrfs filesystem df" output; please let me know if any additional information is required.

 

PS: If I can't figure this out, can I use the "New Config" tool -- preserving my assignments -- and then start fresh, basically just taking the hit of spending several days re-transferring my data?

 

btrfs filesystem df:

Data, single: total=42.00GiB, used=21.00GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=176.00KiB
Metadata, single: total=1.00GiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B

 

Thanks in advance for any and all assistance.

 

17 minutes ago, stah0121 said:

so here is the entire zip folder.

That's what we want.

 

With the array started, type on the console:

 

btrfs balance start -f -mconvert=single /mnt/cache

 

When it's done (it should only take a few seconds), stop and restart the array, then after a few minutes check whether the missing device was deleted; if not, post new diags.
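
If you want to sanity-check it before restarting, run:

btrfs filesystem df /mnt/cache

the Metadata line should show single instead of RAID1 once the balance finishes.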

root@Fatcat:~# btrfs filesystem show /mnt/cache
Label: none  uuid: e923abd6-c954-48cd-b2f4-9710cc30eedf
        Total devices 1 FS bytes used 22.00GiB
        devid    1 size 223.57GiB used 24.03GiB path /dev/sdc1

 

It looks like it cleaned up the missing device ... should I make another attempt at removing the drive from the pool? If so, can you clarify which steps I should use?

In one of the posts there was mention of wiping the drive, and the "btrfs device remove" command was also mentioned... I just want to make sure I am doing the right steps in the right order. I didn't realize the array needed to be stopped and started after a balance operation; apologies if that is supposed to be obvious.
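
For context, the device-removal command that was mentioned looks something like the line below -- I haven't run it yet, /dev/sdb1 is just my guess at the right partition for the 2TB drive, and as I understand it the pool has to stay mounted at /mnt/cache for it to work:

btrfs device remove /dev/sdb1 /mnt/cache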

19 minutes ago, stah0121 said:

should I make another attempt at removing the drive from the pool?

It's already removed from the pool; I didn't notice it was still assigned. It should be OK to just unassign it, but to play it safer do this:

 

Stop the array. If Docker/VM services are using the cache pool, disable them. Unassign all cache devices, then start the array to make Unraid "forget" the current cache config. Stop the array again, reassign only the smaller cache device (there must not be an "All existing data on this device will be OVERWRITTEN when array is Started" warning), re-enable Docker/VMs if needed, and start the array.

 

 

 

 


I don't have any VMs or Docker containers configured, and prior to my last reboot I set all my shares to "No" for the cache setting. So my understanding is that nothing should be pointing to either cache drive.

 

However, when I stop the array and remove one or both of the cache drives from the Cache Devices list, it displays a "missing" label and throws an Unraid error saying "Cache 2 in error state (disk missing) No device identification".

 

When either of the cache SSDs is unassigned, the button to start the array is grayed out and can't be clicked.

Attached is another collection of diag data.

fatcat-diagnostics-2.zip

2 minutes ago, stah0121 said:

it displays a "missing" label and throws an unraid error saying that "Cache 2 in error state (disk missing) No device identification"

That's normal.

 

3 minutes ago, stah0121 said:

When either of the cache SSDs are unassigned the button to Start the array is gray and not able to be clicked.

There's a checkbox you must click to allow array start with a missing cache device.


Wow I'm dumb haha ... yep that did it. I appreciate your patience.

 

For anyone just tuning in, here are the steps I followed to remove one of two cache SSDs from the pool.

#1) btrfs balance start -f -mconvert=single /mnt/cache
#2) Stop and start the array
#3) After a ~2 minute pause, run "btrfs filesystem show /mnt/cache" to ensure the "Some devices missing" message is no longer displayed
#4) Stop the array
#5) Unassign the drive you want to remove (or all of them to reset the cache config)
#6) Check the box that allows the array to start with missing devices
#7) Start the array
#7.1) If you removed all devices, stop the array, re-assign the devices you still want to keep in the cache pool, and start the array

Feel free to correct me if I'm off base on the steps -- I personally did one final stop of the array and set the number of cache drive slots back to 1, but I don't think that is an overly critical step here.
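
One more optional bit, since wiping the removed drive came up earlier in the thread: if you want to clear the old btrfs signature off the 2TB SSD before reusing it somewhere else, something like the line below should do it (/dev/sdb was my removed drive -- adjust the device name, and be aware it erases the filesystem signatures on that disk):

wipefs -a /dev/sdb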

 

Thanks again for your help


This snippet from that post threw me off...
"You can only remove devices from redundant pools (raid1, raid5/6, raid10, etc)"

 

Which is why I tried to convert the pool to single mode: I didn't think single mode counted as redundant, but I figured removing one drive from that configuration might work anyway.
