I tried to add a 500GB SSD as a second drive in my btrfs cache pool, alongside my existing 250GB SSD, with the eventual goal of moving all of the data to the 500GB drive. After waiting 24+ hours, I finally pulled some logs and started digging into the issue.
When I first started debugging, I could see a btrfs balance running at 0% progress, and I couldn't pause or cancel it.
What's the preferred path out of this? I was considering manually removing the sdh drive from the btrfs pool, but wasn't sure if that would just give me more trouble...
FYI, I would include full diagnostics, but my CPU now gets pegged at 100% by kworker (self-detected stall CPU : btrfs_async_reclaim_data_space) whenever I try to start the array, so the only diagnostics I have are the unredacted ones. I've attached a reduced syslog from my initial diagnostics instead.
root@unRAID:~# btrfs device stats /mnt/cache
[/dev/sdg1].write_io_errs 0
[/dev/sdg1].read_io_errs 0
[/dev/sdg1].flush_io_errs 0
[/dev/sdg1].corruption_errs 0
[/dev/sdg1].generation_errs 0
[/dev/sdh1].write_io_errs 0
[/dev/sdh1].read_io_errs 0
[/dev/sdh1].flush_io_errs 0
[/dev/sdh1].corruption_errs 0
[/dev/sdh1].generation_errs 0
root@unRAID:~# btrfs fi show /mnt/cache
Label: none uuid: 8611f812-bd82-4599-bba5-f35f17010bf5
Total devices 2 FS bytes used 157.14GiB
devid 1 size 232.89GiB used 232.89GiB path /dev/sdg1
devid 2 size 465.76GiB used 0.00B path /dev/sdh1
root@unRAID:~# btrfs fi df /mnt/cache
Data, single: total=230.87GiB, used=156.50GiB
System, single: total=4.00MiB, used=48.00KiB
Metadata, single: total=2.01GiB, used=653.88MiB
GlobalReserve, single: total=211.67MiB, used=0.00B
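In case it helps anyone reading the numbers, here is a quick sanity check of the output above. It is only a rough sketch: the values are hand-copied from the `btrfs fi df` output, and the function name `cache_summary` is something I made up for illustration. It adds up the space allocated to chunks and works out how much of the data allocation is not actually holding data.

```shell
#!/bin/sh
# Values hand-copied from the `btrfs fi df /mnt/cache` output above.
# cache_summary is a hypothetical helper name, not a btrfs tool.
cache_summary() {
    data_total_gib=230.87   # Data, single: total
    data_used_gib=156.50    # Data, single: used
    meta_total_gib=2.01     # Metadata, single: total
    sys_total_mib=4.00      # System, single: total (MiB)

    awk -v dt="$data_total_gib" -v du="$data_used_gib" \
        -v mt="$meta_total_gib" -v st="$sys_total_mib" 'BEGIN {
        # Space handed out to chunks across data, metadata and system.
        allocated = dt + mt + st / 1024
        # Space inside data chunks that holds no data.
        slack = dt - du
        printf "allocated=%.2f GiB\n", allocated
        printf "data_slack=%.2f GiB\n", slack
    }'
}

cache_summary
```

By this arithmetic the allocated total lands within rounding of the 232.89GiB that `btrfs fi show` reports as both the size and the used figure for devid 1, so sdg1 looks fully allocated even though roughly 74GiB of the data allocation is empty, and nothing has been allocated on sdh1 yet.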
syslog-reduced.rtf