This morning I found all containers and VMs using the BTRFS RAID 1 cache pool unresponsive, with numerous errors in their logs, and the syslog was spammed with messages like "Dec 13 06:56:54 NAS kernel: sd 5:0:0:0: [sdi] tag#6 access beyond end of device".
The cache pool is composed of two SATA SSDs:
- sdi: 840 Pro 512GB
- sdh: 860 Evo 500GB
I quickly realized that every write to the cache pool had been failing with errors since Dec 13 06:55:14.
A similar issue already happened once on 6.9.0 beta25, but this time I was clever enough (?) to download the attached diagnostics before stopping the VM manager and Docker and then rebooting. After the reboot, I ran a full balance and a scrub (no errors) on the pool, then restarted the VMs and containers, and everything works fine again. However, a parity check was started after the reboot; for whatever reason, the shutdown was considered unclean.
It may be a hardware issue, but I've also always wondered whether it was a good idea on my part to build a RAID 1 pool from two drives of different capacities, which by the way has a reported size of 506GB (!) in the "Main" tab.
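As a rough sanity check on that 506GB figure, here is a minimal sketch. It assumes the nominal drive sizes listed above and the usual rule of thumb for a BTRFS RAID 1 profile (two copies of every block, each on a different device); the function name is purely illustrative and not part of any tool.

```python
def raid1_usable_gb(sizes_gb):
    """Rule-of-thumb usable space for RAID 1 with two copies per block.

    Assumption: every block needs a second copy on a different device,
    so capacity is limited both by half the total and by what the
    devices other than the largest one can mirror.
    """
    total = sum(sizes_gb)
    largest = max(sizes_gb)
    return min(total / 2, total - largest)


# Nominal sizes of the two pool members (sdi, sdh), in GB.
sizes = [512, 500]

usable = raid1_usable_gb(sizes)      # limited by the smaller drive
average = sum(sizes) / len(sizes)    # plain mean of the two sizes

print(f"usable estimate: {usable} GB, average of sizes: {average} GB")
```

With these numbers the usable estimate comes out at the size of the smaller drive, while the plain average of the two sizes matches the 506GB shown. That would suggest the displayed size is an average rather than the real mirrored capacity, but this is only a guess and not something confirmed from the diags.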
Thanks in advance for having a look at the diags and, hopefully, giving me some ideas on how to get rid of this recurring and very worrying instability.