Cache drive (BTRFS errors) mirroring issues.

dimitriz · October 27, 2023

So, I had this bright idea a week ago. My cache drive has been a 1TB NVMe for at least the last year. I had a similar drive lying around along with a USB-C to NVMe adapter, so I added that drive as Cache 2. (I was thinking I could have 2 separate cache drives and move my VM images to the second one. Of course I didn't realize that Unraid puts these in a RAID1 config; somebody should seriously put up a warning popup when people do that!) After rebooting I was getting a log-full error and a LOAD of BTRFS errors, and nothing was starting.

I shut the server down and moved the 2nd cache drive to the PCI-e card. Somehow things started to work normally... fast forward 5 days... the log is at 92%. I checked and it's similar BTRFS errors.

At this point I am done with the 2nd cache drive...

If I remove the 2nd drive from the pool, is that all that's needed to go back to a single cache drive? Or do I need to rebuild everything?

Thanks!

tower-diagnostics-20231027-0759.zip

dboonthego · October 27, 2023

Should be able to remove the drive and balance. Under the cache disk settings, choose convert to single mode.

https://docs.unraid.net/unraid-os/manual/storage-management/#removing-disks-from-a-multi-device-pool

dimitriz · October 27, 2023

1 hour ago, dboonthego said:

Should be able to remove the drive and balance. Under the cache disk settings, choose convert to single mode.

https://docs.unraid.net/unraid-os/manual/storage-management/#removing-disks-from-a-multi-device-pool

Thanks. I found the option, selected convert to single and applied. The errors seem to have stopped, however now none of my dockers want to fire up... 403 error on all of them.

Do I need to restart or remove the 2nd disk first?

tower-diagnostics-20231027-0948.zip

Edited October 27, 2023 by dimitriz

JorgeB · October 27, 2023

One of the NVMe devices dropped offline. Reboot and post new diags after array start.

dimitriz · October 27, 2023

35 minutes ago, JorgeB said:

One of the NVMe devices dropped offline. Reboot and post new diags after array start.

Looks like it's running: Balance on '/mnt/cache' is running, 81 out of about 761 chunks balanced (84 considered), 89% left

tower-diagnostics-20231027-1107.zip
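For anyone who wants to keep an eye on a long balance without refreshing the GUI, here is a minimal sketch that pulls the numbers out of a status line shaped like the one quoted above. The function name `parse_balance_status` is made up for illustration, and the pattern is based only on the wording in this post, so other btrfs versions may print something slightly different.

```python
import re

# Matches the progress wording quoted in the post above; this pattern is an
# assumption derived from that one line, not a documented output format.
_PROGRESS = re.compile(
    r"(?P<done>\d+) out of about (?P<total>\d+) chunks balanced "
    r"\((?P<considered>\d+) considered\),\s+(?P<left>\d+)% left"
)


def parse_balance_status(text):
    """Return the balance counters as a dict of ints, or None if not found."""
    match = _PROGRESS.search(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    line = ("Balance on '/mnt/cache' is running, 81 out of about 761 "
            "chunks balanced (84 considered), 89% left")
    print(parse_balance_status(line))
```

The text fed to it would typically come from the balance status shown on the pool's page or from the command line; the sketch only parses a string and does not touch the pool.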

JorgeB · October 27, 2023

See if the balance to single finishes and post new diags after that.

dimitriz · October 28, 2023

11 hours ago, JorgeB said:

See if the balance to single finishes and post new diags after that.

Ok, that took a while.

It rebalanced, but I noticed the first line still said RAID1. I set it to convert to single mode and it runs, but then it says I should balance again... it's like it's stuck in a loop.

tower-diagnostics-20231027-2200.zip

This is what I get after running "convert to single mode"

Edited October 28, 2023 by dimitriz

JorgeB · October 28, 2023

Looks like the balance is failing due to the existing corruption from the device dropping earlier. I suggest backing up what you can and reformatting the pool.
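As a rough way to see which pool member has been logging errors, the per-device counters printed by `btrfs device stats /mnt/cache` can be tallied. Below is a minimal sketch that parses that kind of output. The helper names and the sample text (device names and counts) are hypothetical, not taken from the diagnostics in this thread.

```python
import re

# One counter per line, e.g. "[/dev/nvme0n1p1].write_io_errs    0"
_STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")

# Hypothetical sample output, for illustration only.
SAMPLE = """\
[/dev/nvme0n1p1].write_io_errs    0
[/dev/nvme0n1p1].read_io_errs     0
[/dev/nvme0n1p1].corruption_errs  0
[/dev/nvme1n1p1].write_io_errs    1432
[/dev/nvme1n1p1].read_io_errs     87
[/dev/nvme1n1p1].corruption_errs  0
"""


def parse_device_stats(text):
    """Map each device to a dict of its error counters."""
    stats = {}
    for line in text.splitlines():
        match = _STAT_LINE.match(line.strip())
        if match:
            counters = stats.setdefault(match.group("dev"), {})
            counters[match.group("counter")] = int(match.group("value"))
    return stats


def devices_with_errors(stats):
    """Return the devices that have at least one non-zero counter, sorted."""
    return sorted(dev for dev, counters in stats.items() if any(counters.values()))


if __name__ == "__main__":
    print(devices_with_errors(parse_device_stats(SAMPLE)))
```

Non-zero counters on only one device would be consistent with that device having dropped offline, though the counters persist until they are reset, so old errors can still show up after the underlying problem is fixed.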

dimitriz · October 28, 2023

1 hour ago, JorgeB said:

Looks like the balance is failing due to the existing corruption from the device dropping earlier. I suggest backing up what you can and reformatting the pool.

What's really weird is that everything seems to be working: Dockers, VMs...

Either way, I am moving everything to the array now. I will break up the cache pool when done and reconfigure it all.

Thanks for your help.

dimitriz · October 29, 2023

Moved everything off... stopped the array and removed both NVMe drives as separate pools.

Re-added them, started the array and formatted.

Tested copying some data back to each NVMe... ~~seems like errors returned.~~ No errors. I clicked on the log for each of the NVMe drives and it was just quickly loading all the RED from the old logs. Needless to say, it freaked me out.

tower-diagnostics-20231028-2000.zip

Edited October 29, 2023 by dimitriz

JorgeB · October 29, 2023

It would be good to reboot to clear the logs.
