I've been running this system for years, with a few upgrades here and there. It was very stable until recently, when I started noticing every 6+ days that the Docker service was no longer running. Restarting fixes the issue until another 6+ days pass. I finally bothered to look at the diagnostics, and the crashes seem to be related to:
WARNING: CPU: 0 PID: 16956 at fs/btrfs/extent-tree.c:3061 __btrfs_free_extent+0x466/0xc02
...
Workqueue: events_unbound btrfs_preempt_reclaim_metadata_space
...
BTRFS error (device sdh1): unable to find ref byte nr 2845564928 parent 0 root 5 owner 40359587 offset 0
Jul 13 07:04:47 Storage kernel: ------------[ cut here ]------------
Jul 13 07:04:47 Storage kernel: BTRFS: Transaction aborted (error -2)
etc.
I see these errors in both the 6.12.2 and 6.12.3 logs (attached).
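In case it helps anyone compare the two sets of logs, here is a rough Python sketch for counting BTRFS error lines per device in a syslog. The function name `count_btrfs_errors` and the regex are my own assumptions, based only on the message format in the lines quoted above; other kernel versions may word things differently.

```python
import re
from collections import Counter

# Assumed message shape, taken from the excerpt above:
#   BTRFS error (device sdh1): unable to find ref byte nr ...
# The closing parenthesis is deliberately not matched, in case the
# kernel appends extra state after the device name.
BTRFS_LINE = re.compile(r"BTRFS (error|warning|critical) \(device (\w+)")


def count_btrfs_errors(lines):
    """Count BTRFS error/warning/critical lines per (level, device) pair."""
    counts = Counter()
    for line in lines:
        match = BTRFS_LINE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Two lines copied from the excerpt above; only the first names a device.
sample = [
    "BTRFS error (device sdh1): unable to find ref byte nr 2845564928 "
    "parent 0 root 5 owner 40359587 offset 0",
    "Jul 13 07:04:47 Storage kernel: BTRFS: Transaction aborted (error -2)",
]

for (level, device), n in count_btrfs_errors(sample).items():
    print(f"{device}: {n} {level} line(s)")
```

To run it against a real log, pass an open file instead of `sample`, for example `count_btrfs_errors(open(path, errors="replace"))`, where `path` is wherever the syslog from the diagnostics zip was extracted.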
I'm going to try a BTRFS file system check next. The 2 SSDs that make up the cache drive are definitely old, but I never had an issue with them until 6.12.2.
Thoughts?
storage-diagnostics-20230715-1605.zip
storage-diagnostics-20230810-1055.zip