Cache Pool in a bad state (v6.8.3)

July 1, 2020

Hey everyone,

I have a cache pool of two devices, but I would like to revert it to one device.

What happened was I plugged in the drive and added it to the cache pool before I realized the implications of the SSDs not being the same size (240GB vs. 2TB). I've been following the guides below to remove the 2TB drive without any success, so I was hoping to get some help here.

https://forums.unraid.net/topic/51133-remove-a-drive-from-a-cache-pool/

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?tab=comments#comment-480418

Important note: this is a backup Unraid box I'm trying to set up, and I'm prepared to lose the data on it if I have to, but I would like to avoid re-transferring the ~12TB of data I have backed up on it because that takes several days (i.e. a matter of convenience).

After a fresh reboot with both SSDs assigned to the cache pool, this is the "btrfs filesystem show" output. The device I'd like to remove is /dev/sdb; it is not listed there, but it shows green on the web page.

root@Fatcat:~# btrfs filesystem show /mnt/cache/
Label: none  uuid: e923abd6-c954-48cd-b2f4-9710cc30eedf
        Total devices 2 FS bytes used 21.00GiB
        devid    1 size 223.57GiB used 44.06GiB path /dev/sdc1
        *** Some devices missing

When performing the steps in the first link (../topic/51133-*), I don't see any errors when I run the "blkdiscard" command, and yes, I made sure I wasn't missing a digit or anything when running my commands.

At some point I concluded that I couldn't remove a cache pool disk while the pool was in the default raid1 mode, so I tried a balance operation to convert it to single mode. When I attempt this from the web page it seems to work and displays no errors, but when I tried it from the console it errored out saying there wasn't enough free disk space. To me this explains why I have never seen anything other than "No balance found on '/mnt/cache'" on the web page or the console: I believe the 240GB SSD is too small for whatever operation is being attempted.

Here is the "btrfs filesystem df" output; please let me know if any additional information is required.

PS: If I can't figure this out, can I use the "New Config" tool, preserve my assignments, and start fresh, basically taking the hit of spending several days re-transferring my data?

btrfs filesystem df:

Data, single: total=42.00GiB, used=21.00GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=176.00KiB
Metadata, single: total=1.00GiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B

Thanks in advance for any and all assistance.

July 1, 2020

Community Expert

Please post the diagnostics: Tools -> Diagnostics

July 1, 2020

Author

Not exactly sure which files you're after, so here is the entire zip folder.

fatcat-diagnostics-20200701-0812.zip

July 1, 2020

Community Expert

17 minutes ago, stah0121 said:

so here is the entire zip folder.

That's what we want.

With the array started, type this on the console:

btrfs balance start -f -mconvert=single /mnt/cache

When it's done (it should only take a few seconds), stop and restart the array, then after a few minutes check whether the missing device was deleted. If not, post new diags.
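The goal of that balance is to collapse the mixed metadata profiles (the "Metadata, RAID1" and "Metadata, single" lines in the df output from the first post) into one. As a rough sketch, assuming a shell on the Unraid console and the default /mnt/cache mount point, the result can be checked by parsing "btrfs filesystem df". The function names metadata_profiles, metadata_profile_count and convert_metadata_to_single are made up for illustration, and the array stop/start afterwards still has to be done from the web UI.

```shell
#!/bin/sh
# Hypothetical helpers for checking metadata profiles on a btrfs pool.

# Print each distinct Metadata profile found in `btrfs filesystem df`
# output (read on stdin), e.g. "RAID1" and "single".
metadata_profiles() {
  sed -n 's/^Metadata, \([^:]*\):.*/\1/p' | sort -u
}

# Print how many distinct Metadata profiles there are; more than 1 means
# the mixed ("dual") metadata profiles seen in this thread.
metadata_profile_count() {
  metadata_profiles | wc -l | tr -d ' '
}

# Hypothetical wrapper: run the balance suggested in this thread, then
# confirm exactly one metadata profile remains.
# It does NOT stop/start the array; do that from the web UI afterwards.
convert_metadata_to_single() {
  pool=${1:-/mnt/cache}
  btrfs balance start -f -mconvert=single "$pool" || return 1
  count=$(btrfs filesystem df "$pool" | metadata_profile_count)
  if [ "$count" -ne 1 ]; then
    echo "$pool has $count metadata profiles (expected 1)" >&2
    return 1
  fi
  echo "$pool metadata now uses a single profile"
}
```

Before running anything, piping the df output from the first post into metadata_profile_count would report 2, which is the state the balance is meant to fix.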

July 1, 2020

Author

root@Fatcat:~# btrfs filesystem show /mnt/cache
Label: none  uuid: e923abd6-c954-48cd-b2f4-9710cc30eedf
        Total devices 1 FS bytes used 22.00GiB
        devid    1 size 223.57GiB used 24.03GiB path /dev/sdc1

It looks like it cleaned up the missing device. Should I make another attempt at removing the drive from the pool? If so, can you clarify which steps I should use?

One of the posts mentioned wiping the drive, and the "btrfs device remove" command was also mentioned. I just want to make sure I'm doing the right steps in the right order. I didn't realize the array needed to be stopped and started after a balance operation; apologies if that is supposed to be obvious.

July 1, 2020

Community Expert

19 minutes ago, stah0121 said:

should I make another attempt at removing the drive from the pool?

It's already removed from the pool; I didn't notice it was still assigned. It should be OK to just unassign it, but to play it safe, do this:

Stop the array and, if Docker/VM services are using the cache pool, disable them. Unassign all cache devices and start the array to make Unraid "forget" the current cache config. Then stop the array, reassign only the smaller cache device (make sure there is no "All existing data on this device will be OVERWRITTEN when array is Started" warning), re-enable Docker/VMs if needed, and start the array.

July 1, 2020

Community Expert

After all that is done, you can run a full-device trim on the 2TB SSD to completely wipe it with:

blkdiscard /dev/sdX
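Since blkdiscard wipes the whole device without asking for confirmation, a cautious sketch is to check first that btrfs no longer lists the disk in the pool. This assumes sdX-style device names and the /mnt/cache mount point; device_in_pool and safe_blkdiscard are hypothetical helper names. It only looks at this one pool, and as the first post shows, a disk can be absent from "btrfs filesystem show" while still assigned in the GUI, so it does not replace unassigning the disk first.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the whole-disk device in $1 (e.g. /dev/sdb)
# or one of its numbered partitions (/dev/sdb1, ...) appears as a "path" in
# `btrfs filesystem show` output read on stdin. Assumes sdX-style names.
device_in_pool() {
  grep -Eq "path $1[0-9]*\$"
}

# Hypothetical wrapper: only run blkdiscard if the pool no longer lists the disk.
safe_blkdiscard() {
  dev=$1
  pool=${2:-/mnt/cache}
  # Stop if the pool can't be read, rather than assuming the disk is free.
  show=$(btrfs filesystem show "$pool") || return 1
  if printf '%s\n' "$show" | device_in_pool "$dev"; then
    echo "refusing: $dev is still listed in $pool" >&2
    return 1
  fi
  blkdiscard "$dev"   # irreversible: discards every block on the device
}
```

With the single-device output posted after the balance, device_in_pool /dev/sdc succeeds (via /dev/sdc1) while /dev/sdb does not, so only the 2TB disk would be passed on to blkdiscard.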

July 1, 2020

Author

I don't have any VMs or Docker containers configured, and before my last reboot I set all my shares to "No" for the cache setting, so my understanding is that nothing should be pointing to either cache drive.

However, when I stop the array and remove one or both of the cache drives from the Cache Devices list, it displays a "missing" label and throws an Unraid error saying that "Cache 2 in error state (disk missing) No device identification".

When either of the cache SSDs is unassigned, the button to Start the array is grayed out and can't be clicked.

Attached is another collection of diag data.

fatcat-diagnostics-2.zip

July 1, 2020

Community Expert

2 minutes ago, stah0121 said:

it displays a "missing" label and throws an unraid error saying that "Cache 2 in error state (disk missing) No device identification"

That's normal.

3 minutes ago, stah0121 said:

When either of the cache SSDs are unassigned the button to Start the array is gray and not able to be clicked.

There's a checkbox you must tick to allow the array to start with a missing cache device.

July 1, 2020

Author

Wow, I'm dumb, haha. Yep, that did it. I appreciate your patience.

For anyone just tuning in, here are the steps I followed to remove one of two cache SSDs from the pool:

#1) btrfs balance start -f -mconvert=single /mnt/cache
#2) stop and start the array
#3) after a two-minute pause, run "btrfs filesystem show /mnt/cache" to make sure the "Some devices missing" message is no longer displayed
#4) stop the array
#5) unassign the drive you want to remove (or all of them to reset the cache config)
#6) tick the checkbox that allows the array to start with missing devices
#7) start the array
#7.1) if you removed all devices, stop the array, reassign the devices you still want to use in the cache pool, and start the array
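Step #3 can be checked with a small script. This is a sketch that assumes the "Total devices N" header and the "*** Some devices missing" line look like the outputs posted in this thread; pool_missing_devices is a made-up name. It treats the pool as degraded if btrfs prints the warning or if fewer devid lines are listed than the header reports.

```shell
#!/bin/sh
# Hypothetical helper: reads `btrfs filesystem show` output on stdin and
# exits 0 if a device appears to be missing from the pool, 1 otherwise.
pool_missing_devices() {
  awk '
    /Total devices/ {
      # Header looks like: "Total devices 2 FS bytes used 21.00GiB"
      for (i = 1; i < NF; i++) if ($i == "devices") total = $(i + 1) + 0
    }
    /^[ \t]*devid/ { listed++ }              # one line per device present
    /Some devices missing/ { flagged = 1 }   # explicit btrfs warning
    END {
      missing = (flagged || listed + 0 < total + 0)
      exit (missing ? 0 : 1)
    }'
}

# Usage, after the two-minute pause in step #3:
#   if btrfs filesystem show /mnt/cache | pool_missing_devices; then
#     echo "pool still reports a missing device"
#   fi
```

Fed the output from the first post (Total devices 2, one devid line, warning present) it reports a missing device; fed the single-device output posted after the balance it does not.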

Feel free to correct me if I'm off base on the steps. I personally did one final stop of the array and set the number of cache slots back to 1, but I don't think that's a critical step here.

Thanks again for your help

July 1, 2020

Community Expert

Usually you just need to do this, but for some reason there were dual metadata profiles.

July 1, 2020

Author

This snippet from that post threw me off:
"You can only remove devices from redundant pools (raid1, raid5/6, raid10, etc)"

That's why I tried to convert the pool to single mode. I didn't think single mode was redundant, but I figured removing one drive from that configuration might work.

July 1, 2020

Community Expert

We converted only the metadata to the single profile: the other device was already missing from the filesystem (despite still being assigned), but Unraid couldn't finish deleting it because of the dual metadata profiles.
