gusgus

Community Answers

  1. Solved. All this time I had assumed that the OOM error was not actually caused by unRAID running out of memory, but I had miscalculated the amount of RAM that should be allocated to each VM to prevent an OOM condition. I reset the memory allocation for each VM (correctly this time) and the system no longer crashes.
  2. Additional info: I found the following line in the VM logs at the time of each crash (with different timestamps, of course):

     2024-03-24 17:41:47.443+0000: shutting down, reason=crashed

     So it would appear that qemu recognizes a problem and recovers long enough to write to the log. This is the only line after each crash.
  3. Hello all,

     I am getting reproducible out-of-memory crashes under very specific circumstances, and they appear to be triggered by one VM. I've not run into this on a KVM system before, so I'm not sure what to do. Details are below; any help is appreciated. I'd be happy to provide further troubleshooting information if anyone could provide a little guidance.

     EDIT: I have not tried reproducing this on another VM, which I could do if you think it's useful. I expect that this is a problem with unRAID, not the VM.

     System Setup

     Hardware specs
       • Dell Optiplex 5080
       • 32 GB RAM

     Network settings
       • All VMs have network source set to br0
       • Only one NIC port is a member of br0
       • All other NIC ports are unused and not a member of br0 (no bonding)

     Plugins installed
       • Community Applications
       • Fix Common Problems
       • GPU Statistics
       • docker.patch2
       • Intel GPU TOP
       • Intel GVT-g
       • iSCSI Initiator
       • Nvidia Driver
       • Unassigned Devices
       • Unassigned Devices Plus
       • Unassigned Devices Preclear

     Offending VM build
       • OS: Linux Mint 20.3 xfce
       • Memory assignment: 20 GB

     See the attached XML file "OffendingVM.xml" for the VM configuration. It reflects the (minor) changes described below.

     How the problem has manifested and what I tried

     I first noticed that the problem appeared to happen during high network usage: I would try to copy a 70 GB file from a remote SMB share to local storage, and it would crash about 3.5 GB in.

     I changed the Network Model for the VM from e1000 to virtio. No change in crash behavior. The crash had the following characteristics:
       • The VM immediately dropped my SSH connections and would no longer respond to connection attempts
       • The unRAID webUI was completely unresponsive and my web browser acted as if the website was down
       • The unRAID LAN interface IP address responded correctly to pings as if there was no issue
       • The unRAID physical console appeared to work correctly and gave no indication of a problem
       • unRAID behaved normally again after a reboot via either the physical button or the physical console

     I changed the Network Model for the VM from virtio to virtio-net after reading that virtio-net is the most stable. No change in crash behavior.

     I suspected an issue with the onboard 1 Gb NIC or its driver, so I replaced it with an Intel X550 10 Gb PCIe card. This did change the crash behavior: after the swap I was able to copy the file as fast as the remote SMB server could manage (140-500 MBps) without triggering a crash.

     Now able to copy files over the network at full speed without a crash, I moved on to computing SHA-512 hashes using the Linux command "sha512sum". This is a single-threaded workload. After 1-2 minutes of this, the following happened:
       • The VM dropped my SSH connections
       • The unRAID webUI became unresponsive
       • 5-10 minutes later, the webUI suddenly became responsive again, and I was able to see that the VM had been shut down
       • I got a notification from the Fix Common Problems plugin: "Out Of Memory errors detected on your server"

     My working theory now is that the NIC swap improved the issue because the new NIC offloads calculations differently, and that this was an OOM issue all along.

     It's worth noting that I was running htop in an SSH window on the VM at the time of the crash. htop clearly showed that the sha512sum command became a dead process (it had not finished executing) right before the crash, and that the memory usage of the VM was no more than 280 MB of 19.5 GB.

     metaverse-diagnostics-20240324-1101.zip
     OffendingVM.xml
  4. Please also add how to recover when a previously working VM boots into the EFI Interactive Shell and the EFI shell does not list the VM disk as a boot device (via map -r). One solution is here, not sure if there are multiple solutions to be listed:
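
Regarding answer 1 above (the miscalculated VM memory allocations): the check that was eventually done by hand can be sketched roughly as below. This is only an illustration, not the poster's actual calculation. The 32 GB host and the 20 GB assignment come from the post; the second VM ("OtherVM", 16 GB) and the 4 GB reserved for unRAID itself are hypothetical values chosen for the example, and `plan_memory` is a made-up helper name.

```python
from dataclasses import dataclass


@dataclass
class MemoryBudget:
    host_total_gb: float    # physical RAM in the host
    host_reserve_gb: float  # RAM kept back for the host OS, Docker, caches (assumed)
    vm_total_gb: float      # sum of all VM memory assignments

    @property
    def headroom_gb(self) -> float:
        # Negative headroom means the VMs can collectively demand more
        # memory than the host can give them.
        return self.host_total_gb - self.host_reserve_gb - self.vm_total_gb

    @property
    def overcommitted(self) -> bool:
        return self.headroom_gb < 0


def plan_memory(host_total_gb, vm_allocations_gb, host_reserve_gb=4.0):
    """Sum the per-VM assignments and compare them with usable host RAM.

    host_reserve_gb defaults to a hypothetical 4 GB; pick a value that
    matches what the host actually needs.
    """
    return MemoryBudget(
        host_total_gb=host_total_gb,
        host_reserve_gb=host_reserve_gb,
        vm_total_gb=sum(vm_allocations_gb.values()),
    )


if __name__ == "__main__":
    # 32 GB host and 20 GB VM are from the post; "OtherVM" is hypothetical.
    budget = plan_memory(32, {"OffendingVM": 20, "OtherVM": 16})
    print(f"VMs assigned: {budget.vm_total_gb} GB")
    print(f"Headroom:     {budget.headroom_gb} GB")
    print("Overcommitted!" if budget.overcommitted else "Fits in RAM.")
```

If the headroom comes out negative, the assignments can exceed what the host can back with real memory, which fits the host-side OOM reported in the post even though htop inside the guest showed very little memory in use.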
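
Regarding answer 2 above (the "shutting down, reason=crashed" line in the VM log): a minimal sketch for pulling such lines out of a log, so crash times can be lined up with other events. The line format is inferred from the single sample quoted in the post; other reason values and the helper name `find_shutdowns` are assumptions.

```python
import re
from datetime import datetime

# Pattern inferred from the one sample line in the post; real logs may vary.
SHUTDOWN_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}): "
    r"shutting down, reason=(?P<reason>\w+)"
)


def find_shutdowns(lines):
    """Return (timestamp, reason) tuples for every shutdown line found."""
    events = []
    for line in lines:
        match = SHUTDOWN_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f%z")
            events.append((ts, match.group("reason")))
    return events


if __name__ == "__main__":
    sample = [
        "2024-03-24 17:41:47.443+0000: shutting down, reason=crashed",
    ]
    for ts, reason in find_shutdowns(sample):
        print(ts.isoformat(), reason)
```

To use it on a real log, pass an open file object (for example `find_shutdowns(open(path))`, where `path` is wherever the VM's log lives on your system) instead of the sample list.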