DrJake Posted December 29, 2021

Hi all, I currently have 2 drives assigned to the cache pool:
- 1TB SATA SSD
- 500GB NVMe SSD (which I intend to remove from the pool)

I read the FAQ on removing a cache drive, but I've run into issues and I'm not sure how to proceed. I'm running Unraid 6.9.2 and followed the instructions to the letter:
1. stop the array
2. unassign the pool disk to remove
3. did not do any re-ordering
4. start the array (after checking the "I'm sure" box next to the start array button)

The system is asking me if I want to format the cache2 drive, which is the one I intend to keep in the pool. The drive that was removed from the pool could not be mounted as an unassigned device; I had planned to do that later, after the cache pool was sorted. Having encountered this problem, I didn't really want to deal with a server issue during the holidays... I tried to add the removed drive back into the pool, but it seems I can't anymore. I read on the forum about physically removing the drive, but that was for an older version of Unraid, so I wanted to check with the pros before shutting down the server. Help... what are my options at this stage?
DrJake Posted December 29, 2021 (edited)

So I guess I'm just trying to recover the data on my cache drive at this point. The 2 pooled drives were set up as raid1 in case one of them failed, but at the moment I can't figure out or find information about how to recover the data. I just saw this thread and think I might be one of these cases. I can't remember when I set up the redundancy, but I recall it was the Unraid version where the pooled cache feature was first introduced. I believe I have not done anything irreversible (have not formatted any of the drives, have not rebooted the server), but I think I need some expert help...

tower-diagnostics-20211229-1432.zip

Edited December 29, 2021 by DrJake
JorgeB Posted December 29, 2021

The pool wasn't configured correctly before, i.e., only the NVMe device was part of it, and by unassigning it the device was wiped:

Dec 29 11:31:17 Tower emhttpd: shcmd (1202764): /sbin/wipefs -a /dev/nvme1n1p1
Dec 29 11:31:17 Tower root: /dev/nvme1n1p1: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 5f 4d

This usually works in this situation; type in the console:

btrfs-select-super -s 1 /dev/nvme1n1p1

If the command is successful (there's no error), then reset the pool:
- if Docker/VM services are using the cache pool, disable them
- unassign all cache devices
- start the array to make Unraid "forget" the current cache config
- stop the array
- reassign only the NVMe device (there can't be an "All existing data on this device will be OVERWRITTEN when array is Started" warning for any cache device)
- re-enable Docker/VMs if needed
- start the array
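A side note on why this recovery works: the log excerpt above shows wipefs erasing only 8 bytes at offset 0x00010040, which is where btrfs keeps the magic string of its primary superblock. Those hex bytes decode to the ASCII string "_BHRfS_M" (this is the standard btrfs magic, not anything specific to this system), which is why copying a backup superblock over the primary with btrfs-select-super restores the filesystem:

```shell
# Decode the 8 bytes wipefs reported erasing: they are the btrfs
# superblock magic "_BHRfS_M", stored at offset 0x10040 of the partition.
# Since only this magic was wiped, btrfs-select-super -s 1 can copy the
# intact backup superblock back over the primary.
magic=$(printf '\x5f\x42\x48\x52\x66\x53\x5f\x4d')
echo "$magic"
# prints: _BHRfS_M
```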
DrJake Posted December 29, 2021

Oh wow, thank you so much Jorge. The server is back to life now, with VMs and Dockers all working fine. You've saved me a lot of time reconfiguring my VMs and security cameras.

So, back to what I was trying to do in the beginning. I'm trying to swap out the 500GB NVMe the cache data is currently on, and replace it with a 1TB NVMe (which currently has data and is used as an unassigned device) pooled with a 1TB SATA SSD. Following what I originally planned, I guess:
1. add the 1TB SATA SSD to the cache pool (resulting in the redundant cache pool I should have had from the beginning)
2. remove the 500GB NVMe from the cache pool (the 1TB SATA SSD should still function, right?)
3. do my data transfer, so the 1TB NVMe is free
4. add the 1TB NVMe to the cache pool

Is there a better way? I'm a bit worried about steps 2 and 4 because of what I just went through. Would the 1TB SATA SSD be a functional cache drive by itself after step 2?
JorgeB Posted December 29, 2021

3 hours ago, DrJake said:
remove the 500GB NVMe from the cache pool (the 1TB SATA ssd should still function, right?)

Yes, as long as the device is correctly added to the pool. If in doubt you can post diags after doing it; it usually works without issues, but sometimes it doesn't.
DrJake Posted December 30, 2021

Hi Jorge, something is not right. I'm only at step 1, but I don't think the device was correctly added to the pool (for reasons unknown to me). It says the 1TB SATA SSD is "part of a pool". I tried running "balance" in the GUI for Cache (nvme1n1) twice; each time it completed without complaint. FYI, it would not allow me to "convert to raid1 mode" here. Afterwards, running "btrfs fi usage -T /mnt/cache", it doesn't even look like the device is in the pool. I tried to run "btrfs balance start -mconvert=raid1 /mnt/cache" as posted in the other thread and got an error; syslog says:

Dec 30 11:48:38 Tower kernel: BTRFS error (device nvme1n1p1): balance: invalid convert metadata profile raid1

P.S. I tried stopping the array and adding the 2nd cache device twice; same issues. syslog and new diagnostics attached.

syslog.txt tower-diagnostics-20211230-1151.zip
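The syslog error above is consistent with the failed device add: btrfs rejects a -mconvert=raid1 balance when the pool only contains a single device, since raid1 needs two copies on two devices. One way to see how many devices btrfs actually has in the pool is to count "devid" lines in `btrfs filesystem show`. The sample output below is purely illustrative (made-up label, uuid, and sizes), not taken from the diagnostics:

```shell
# On the live system you would run: btrfs filesystem show /mnt/cache
# Illustrative sample output for a single-device pool (values made up):
sample='Label: none  uuid: 00000000-0000-0000-0000-000000000000
        Total devices 1 FS bytes used 40.00GiB
        devid    1 size 465.76GiB used 42.00GiB path /dev/nvme1n1p1'

# Each pool member gets its own "devid" line, so counting them gives
# the number of devices btrfs sees.
ndev=$(echo "$sample" | grep -c 'devid')
echo "devices in pool: $ndev"
# prints: devices in pool: 1
# With only 1 device, "btrfs balance start -mconvert=raid1" fails with
# "invalid convert metadata profile raid1", matching the syslog error.
```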
JorgeB Posted December 30, 2021

Yes, it failed to add the device to the pool. Try this:
- stop the array
- unassign cache2
- start the array
- to completely wipe the device, type in the console: blkdiscard /dev/sdc
- reboot
- try again; post new diags if it still fails
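Since blkdiscard irreversibly discards everything on the target device, a small guard before running it doesn't hurt. This is only a sketch: /dev/sdc is taken from the post above (always confirm the device letter in lsblk first, as letters can change between boots), and the fake mount table exists purely so the check can be demonstrated without touching real hardware:

```shell
# Refuse to wipe a device that still appears in the mount table.
confirm_unmounted() {
  dev=$1
  table=${2:-/proc/mounts}   # second arg lets us demo against a fake table
  if grep -q "^$dev" "$table"; then
    echo "refusing: $dev is mounted"
    return 1
  fi
  echo "ok to wipe $dev"
}

# Demonstration against a fake mount table (no real devices touched):
printf '/dev/sda1 / ext4 rw 0 0\n' > /tmp/fake_mounts
confirm_unmounted /dev/sdc /tmp/fake_mounts    # prints: ok to wipe /dev/sdc

# The real step from the post above would then be:
# confirm_unmounted /dev/sdc && blkdiscard /dev/sdc
```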
Solution DrJake Posted December 30, 2021

Thanks for getting back to me Jorge. I suspected it was an issue with the cache pool; something is still bugging out. So I ended up taking the long way around:
1. transferred all the cache data back to the array (for peace of mind as well)
2. deleted the cache pool
3. rebooted the server
4. recreated the cache pool (at some point I needed to use the "btrfs-select-super -s 1 /dev/nvme1n1p1" command again, because the system was preventing me from mounting the drive as an unassigned device)
5. now I think everything is in working order; the mover is still moving data from the array back onto the new cache pool

So just to confirm, this means the cache pool is working in a RAID1 config, right? Does the ID number matter as to which drive data will be read from or written to? Because one is SATA and one is NVMe.
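For anyone wanting to verify the raid1 question themselves: `btrfs filesystem df /mnt/cache` reports the profile per chunk type, and a fully converted two-device pool shows RAID1 for data, metadata, and system. The sample output below is illustrative (sizes made up), not from this server:

```shell
# On the live pool you would run: btrfs filesystem df /mnt/cache
# Illustrative sample output for a converted raid1 pool (sizes made up):
sample='Data, RAID1: total=100.00GiB, used=40.00GiB
Metadata, RAID1: total=2.00GiB, used=512.00MiB
System, RAID1: total=32.00MiB, used=16.00KiB'

# All three chunk types should report RAID1; a line still showing
# "single" would mean a balance convert is needed.
raid1_lines=$(echo "$sample" | grep -c 'RAID1')
echo "RAID1 chunk types: $raid1_lines"
# prints: RAID1 chunk types: 3
```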
JorgeB Posted December 30, 2021

4 hours ago, DrJake said:
this means the cache pool is working in RAID1 config right?

Yep.

4 hours ago, DrJake said:
Does the ID number matter as to which drive data will be read/written from/to?

No. With raid1 it will always be written to both devices, and reads alternate between both devices according to even/odd PIDs.
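The even/odd PID behaviour mentioned above is btrfs's simple raid1 read-balancing rule: the reading process's PID modulo the number of copies selects which mirror is read. A toy illustration of the selection rule (this is not an actual btrfs command, just the arithmetic):

```shell
# raid1 read selection: mirror index = reader PID % number of copies
# (2 copies in a two-device raid1 pool).
pick_mirror() {
  echo $(( $1 % 2 ))
}

pick_mirror 1234   # even PID -> mirror 0
pick_mirror 1235   # odd PID  -> mirror 1
# A mix of reader PIDs therefore spreads reads across both the SATA and
# NVMe devices; the pool slot/ID number doesn't control which copy is read.
```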
DrJake Posted December 30, 2021 (edited)

Thank you again Jorge, you can mark this one as resolved. Lucky I encountered this issue without actually losing the cache data.

Edited December 30, 2021 by DrJake