Greetings,
I've been an unRAID user for a couple of months now. My use case for it is simple: it is a backup datastore for my main NAS (Syno), and it is also my hypervisor. At this time, I run 5 VMs, 2 of them Windows and 3 of them Debian (on my cache SSD). I'm using the default settings for all VMs, except that I use qcow2 instead of raw to make moving the vdisk files off the cache drive quicker.
Windows VMs seem to work fine.
Debian VMs seem to work fine. Except for one: my Confluence VM.
On my Confluence VM, after some arbitrary amount of time has passed, I start receiving NMI messages via SSH. The number varies (20, 21, 31, etc.). These occur exactly every 30 seconds. NMI messages are usually due to a hardware problem, but I've spent the last two days testing and it's not a physical hardware problem. I did discover what is causing these messages, though.
Confluence is a jumbled-up mess of Java and Apache, so I wanted to blame Confluence. But it isn't the problem, either. I didn't exonerate Confluence until I attempted to set up Confluence on a CentOS VM. Before you set up Confluence, you set up PostgreSQL. The NMI errors started occurring after the PostgreSQL install, but before the Confluence install. I was able to replicate this on my Debian VM. These distros include old versions of PostgreSQL, so I installed version 9.6.2 from the official repo. The same thing happens. I don't think it's a PostgreSQL bug, but PostgreSQL is without a doubt triggering the problem. I've tried every combination of VM settings imaginable (within unRAID) to remedy these errors, but nothing helps.
These NMI errors don't seem to cause anything bad to happen. They just make using SSH kind of a chore. Here is how you stop those messages on Debian (may work on CentOS, did not test):
sudo sh -c "echo 'kernel.nmi_watchdog=0' >> /etc/sysctl.conf"
Reboot. No more error messages. You can check the current status of the watchdog by using:
cat /proc/sys/kernel/nmi_watchdog
It will return 0 after you reboot, and should from that point onward.
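One thing to watch with the echo approach is that running it twice leaves duplicate lines in /etc/sysctl.conf. Here is a rough sketch of a way around that. The function name persist_sysctl is made up for illustration; it only uses grep and printf, and it takes an optional file argument so you can try it on a scratch file first. I have only reasoned through this, so treat it as a starting point rather than a tested fix.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not part of any tool):
# append "key=value" to a sysctl config file only if that exact line
# is not already present.
persist_sysctl() {
    setting="$1"                     # e.g. kernel.nmi_watchdog=0
    conf="${2:-/etc/sysctl.conf}"    # target file, defaults to /etc/sysctl.conf
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if grep -qxF -- "$setting" "$conf" 2>/dev/null; then
        echo "already present: $setting"
    else
        printf '%s\n' "$setting" >> "$conf"
        echo "added: $setting"
    fi
}

# Usage (in a root shell, since the function writes to /etc):
#   persist_sysctl kernel.nmi_watchdog=0
#   sysctl -p                          # reload /etc/sysctl.conf without rebooting
#   cat /proc/sys/kernel/nmi_watchdog  # check the current value
```

If you only want to silence the messages for the current boot, sysctl -w kernel.nmi_watchdog=0 (as root) changes the running value without touching the config file.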
I've seen other posts on this issue in this forum with no solution. I saw someone mention disabling the watchdog, but they didn't say how. This is only a workaround, but it's the best you can do for now.
I'm pretty certain this is a KVM bug. Any ideas from anyone who knows more about this stuff than I do?