I've suddenly started having problems whenever I try to launch a VM. My server was rock solid for well over a year until now.
When I launch a VM, shortly after the guest OS boots, the web UI shows 6 or 7 CPU threads (random ones, not necessarily the ones pinned to the VM) spiking and staying at redline. Seconds later, the web UI becomes unresponsive and the entire system locks up, leaving no choice but to power-cycle.
I've attached my diagnostic file and a couple of p
For posterity, I'd like to report that I solved the problem.
Using smartctl and the nvme CLI, I discovered that even though the disk's main temperature was within the acceptable range, one of the disk's secondary temperature sensors (labeled Temperature Sensor 5) was reading 60-64°C at idle and 70°C or higher under load. This is most likely the temperature of the controller. Apparently, most NVMe controllers begin throttling at around 70°C, so the I/O errors make sense.
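In case it helps anyone chasing the same thing, here is a rough Python sketch of how the per-sensor readings could be checked automatically. It assumes smartctl is installed, that the disk is /dev/nvme0 (a placeholder, adjust for your system), and that the report contains lines of the form "Temperature Sensor N: <value> Celsius"; the function names are my own, and the 70°C limit is just the throttling figure mentioned above, not something read from the drive.

```python
import re
import subprocess

# Matches lines such as "Temperature Sensor 5:   64 Celsius" (smartctl)
# or "Temperature Sensor 5   : 64 C" (nvme smart-log). The exact layout
# varies between tool versions, so treat this pattern as an assumption.
SENSOR_LINE = re.compile(r"Temperature Sensor (\d+)\s*:\s*(\d+)")


def parse_sensor_temps(report: str) -> dict[int, int]:
    """Return {sensor number: temperature in Celsius} found in the report text."""
    return {int(num): int(temp) for num, temp in SENSOR_LINE.findall(report)}


def hot_sensors(temps: dict[int, int], limit_c: int = 70) -> dict[int, int]:
    """Return only the sensors at or above limit_c (70 is the figure from this post)."""
    return {num: temp for num, temp in temps.items() if temp >= limit_c}


def read_report(device: str = "/dev/nvme0") -> str:
    """Run `smartctl -A` on the device and return its text output.

    check=False because smartctl uses non-zero exit codes as status bits.
    Usually needs root.
    """
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    temps = parse_sensor_temps(read_report())
    for num, temp in sorted(temps.items()):
        flag = "  <-- at or above limit" if temp >= 70 else ""
        print(f"Temperature Sensor {num}: {temp} C{flag}")
```

Running this under load as well as at idle matters, since in my case the sensor only crossed 70°C once the VM started hitting the disk.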
I moved the disk