After anywhere from 30 minutes to 7+ days, Unraid becomes almost completely unresponsive and stays that way until a hard shutdown.
This started happening with some 6.12.x version, IIRC, after almost 2 years of flawless operation.
What still works when "unresponsive"
responds to pings
responds to bare-metal keyboard input, but logging in times out
WireGuard VPN
What I have tried so far
disable C-states
slightly decrease vm.dirty_background_ratio and vm.dirty_ratio
disable the EXPO profile
run a minimal working setup with only 5 Docker containers and no VMs
change quite a few more settings (one at a time), all of which I should have reverted since they did not help
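In case it helps anyone double-check that tweaks like the vm.dirty_* ones are really back to the intended values, here is a minimal Python sketch. It assumes the standard Linux /proc/sys/vm layout; the function name read_vm_sysctls is just something I made up for illustration, not part of Unraid or any plugin.

```python
from pathlib import Path


def read_vm_sysctls(names, base="/proc/sys/vm"):
    """Return {name: int value} for the given vm.* sysctls.

    Entries that do not exist under `base` are skipped instead of raising.
    `base` is a parameter only so the function can be tried on a test directory.
    """
    values = {}
    for name in names:
        path = Path(base) / name
        if path.is_file():
            values[name] = int(path.read_text().strip())
    return values


if __name__ == "__main__":
    # Print the two settings mentioned in the list above.
    wanted = ["dirty_background_ratio", "dirty_ratio"]
    for name, value in read_vm_sysctls(wanted).items():
        print(f"vm.{name} = {value}")
```

Running it before and after a reboot shows whether a changed value is still being applied at boot by something.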
System hardware
MSI B550M Pro-VDH
Ryzen 5 5600G
64GB RAM
Samsung 980 Pro 1TB
Samsung 870 QVO 4TB
Software setup
Unraid v6.12.4
Plugins: CA, Active Streams, System Temp, Fix Common Problems, FolderView, Tips and Tweaks, UD, UD+, Unraid Connect, User Scripts
1TB cache
4TB array
2 auto-start Ubuntu VMs, 1 auto-start Windows VM
several Docker containers, including typical services like Nextcloud behind SWAG
Syslog server recording
The attached syslog was recorded during the following sequence:
Normal operation for 7 days (cut from the log except for the last 2 entries)
15:01 unresponsive
15:05 hard shutdown
15:43 temporary unresponsiveness (only ~2 minutes; the first time we have ever noticed it in this form)
15:56 unresponsive
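To make the log easier to go through, the lines around the times listed above can be pulled out with a small script. This is only a rough sketch: it assumes the classic syslog timestamp format ("Oct  2 15:01:07 hostname ..."), it ignores the date (so it only makes sense on a log covering a single day), and the names lines_near and to_minutes are hypothetical helpers, not an existing tool.

```python
import re

# Assumed line format: "Oct  2 15:01:07 hostname message..."
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+(\d{2}):(\d{2}):\d{2}\s")


def to_minutes(hhmm):
    """Convert an "HH:MM" string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def lines_near(lines, events, before=2, after=2):
    """Return the lines whose timestamp falls within a window around any event.

    `events` is a list of "HH:MM" strings; `before` and `after` are minutes.
    Lines without a recognizable timestamp are skipped.
    """
    windows = [(to_minutes(e) - before, to_minutes(e) + after) for e in events]
    selected = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue
        minute = int(match.group(1)) * 60 + int(match.group(2))
        if any(start <= minute <= end for start, end in windows):
            selected.append(line)
    return selected
```

For the attached file, calling lines_near on the opened syslog_oct_2.txt with the events ["15:01", "15:43", "15:56"] would return only the entries within two minutes of each incident.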
syslog_oct_2.txt server-diagnostics-20231002-1714.zip