This morning I found all containers and VMs using the BTRFS RAID 1 cache pool unresponsive, with numerous errors in their logs, and the syslog was flooded with messages like "Dec 13 06:56:54 NAS kernel: sd 5:0:0:0: [sdi] tag#6 access beyond end of device".
The cache pool is composed of two SATA SSDs:
- sdi: 840 Pro 512GB
- sdh: 860 Evo 500GB
I quickly realized that every write to the cache pool had been reporting errors since Dec 13 06:55:14.
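For anyone wanting to pin down when such errors start and which device they point at, here is a minimal Python sketch. It assumes the syslog lines follow the same layout as the message quoted above; the function name `summarize` and the regular expression are illustrative, not part of any Unraid tool.

```python
import re
from collections import Counter

# Matches kernel lines shaped like the one quoted in this post, e.g.
# "Dec 13 06:56:54 NAS kernel: sd 5:0:0:0: [sdi] tag#6 access beyond end of device"
# The exact layout is an assumption based on that single sample.
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:.*"
    r"\[(?P<dev>sd[a-z]+)\].*access beyond end of device"
)


def summarize(lines):
    """Return (timestamp of the first match or None, match count per device)."""
    first = None
    per_device = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            if first is None:
                first = match.group("stamp")
            per_device[match.group("dev")] += 1
    return first, per_device


# The only real log line available here is the one quoted in the post.
sample = [
    "Dec 13 06:56:54 NAS kernel: sd 5:0:0:0: [sdi] tag#6 access beyond end of device",
]
first, counts = summarize(sample)
print(first, dict(counts))
```

To run it against a real log, pass an open file object instead of `sample`; the log path depends on the system and is not given in this post.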
A similar issue already happened once on 6.9.0 beta25, but this time I was clever enough (?) to download the attached diags before stopping the VM manager and Docker and rebooting. After the reboot, I ran a full balance and a scrub (no errors) on the pool, then restarted the VMs and containers, and everything works fine again. However, a parity check was launched after the reboot; for whatever reason the shutdown was considered unclean.
It may be a hardware issue, but I've also always wondered whether it was a good idea on my part to build a RAID 1 pool from two drives of different capacity, which btw has a reported size of 506GB (!) in the "Main" tab.
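As a rough check on that 506GB figure, the sketch below compares two numbers: half the raw total of both drives, and the space a two-copy RAID 1 profile can actually mirror. It uses the nominal capacities listed above, not the real byte counts, and the idea that the "Main" tab shows half the raw total is only a guess that happens to match the arithmetic.

```python
def raid1_usable(sizes_gb):
    """Approximate usable space for a two-copy RAID 1 profile.

    Every chunk must be stored on two different devices, so usable space
    is capped by half the raw total and by what the remaining devices can
    mirror from the largest one.
    """
    total = sum(sizes_gb)
    return min(total / 2, total - max(sizes_gb))


sizes = [512, 500]          # nominal capacities from this post, in GB
print(sum(sizes) / 2)       # half the raw total: 506.0
print(raid1_usable(sizes))  # space that can actually be mirrored: 500
```

With two devices the mirrorable space is simply the size of the smaller drive, so the extra 12GB on the larger one cannot hold RAID 1 data.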
Thanks in advance for having a look at the diags and hopefully giving me some ideas on how to get rid of this recurring and very worrying instability.