Can't get rid off OOM crashes for windows VM, PLEASE HELP!!!


Go to solution Solved by JonathanM,

Recommended Posts

I'm at my wits end here, I have wasted so much time trying to get this damn windows VM to work. I literally spent half a year trying to get VM passthrough to work and I finally get it to work and now I'm having constant OOM errors. It's outrageous how much time I've sunk into this.

 

Can someone please help me figure out why my Windows 10 VM ALWAYS crashes with Out of Memory errors. Sometimes the OOM error will cite the VM as the source, sometimes it will cite a seemingly random docker as the source. I have more than a sufficient amount of memory for anything I could want to do with this server, it's insane that I'm having OO memory issues on a modern server.

 

I have run the VM with less and more memory to no avail. I have reduced the niceness of all of my dockers. I have switched to isolated threads, solely to try to get this damn VM to work.

 

Can someone please take a look at my diagnostics and try to help me figure out what is going on?

serv-x370-diagnostics-20220702-2013.zip

Link to comment

Here is my VM XML:

 

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='13'>
  <name>DESK-X370</name>
  <uuid>53c1ad51-0912-f7c2-ba7e-d08b992d4cf2</uuid>
  <description>WIN10PRO (UEFI|GPT) - Q35-6.2</description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>12</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='6'/>
    <vcpupin vcpu='1' cpuset='18'/>
    <vcpupin vcpu='2' cpuset='7'/>
    <vcpupin vcpu='3' cpuset='19'/>
    <vcpupin vcpu='4' cpuset='8'/>
    <vcpupin vcpu='5' cpuset='20'/>
    <vcpupin vcpu='6' cpuset='9'/>
    <vcpupin vcpu='7' cpuset='21'/>
    <vcpupin vcpu='8' cpuset='10'/>
    <vcpupin vcpu='9' cpuset='22'/>
    <vcpupin vcpu='10' cpuset='11'/>
    <vcpupin vcpu='11' cpuset='23'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/53c1ad51-0912-f7c2-ba7e-d08b992d4cf2_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv mode='custom'>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='6' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/_bin/_vm/images/bob.ombs.modified.win10pex64.v4.8.iso' index='3'/>
      <backingStore/>
      <target dev='hda' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <alias name='sata0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/_bin/_vm/images/virtio-win-0.1.217-2.iso' index='2'/>
      <backingStore/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <alias name='sata0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source dev='/dev/sda' index='1'/>
      <backingStore/>
      <target dev='hdc' bus='sata'/>
      <boot order='1'/>
      <alias name='sata0-0-2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <controller type='usb' index='0' model='nec-xhci' ports='15'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0xe'/>
      <alias name='pci.7'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x6'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0xf'/>
      <alias name='pci.8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x7'/>
    </controller>
    <controller type='pci' index='9' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <alias name='pci.9'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:7b:84:29'/>
      <source bridge='br0'/>
      <target dev='vnet12'/>
      <model type='virtio-net'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/30'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/30'>
      <source path='/dev/pts/30'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-13-DESK-X370/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='2'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <audio id='1' type='none'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x2c' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x2c' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc52b'/>
        <address bus='1' device='6'/>
      </source>
      <alias name='hostdev2'/>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>

 

Link to comment

I don't have the ability to shut everything down for the amount of time required to verify at the moment but I would hazard to guess that you are correct as typically the errors occur whenever I'm doing some kind of activity in another docker. But it has happened with multiple different docker applications and doesn't seem to be consistent.

 

It seems as if any sort of RAM-heavy activity in another docker seems to trigger an OOM error that very often (but not always) crashes my VM (in addition to whatever process the OOM process kills of course). Just to restate, the VM often crashes even though the OOM finds and kills only a docker process.

 

I just need to find out WHY OOM is being triggered at all when I have 32GB of RAM. WTF IS GOING ON. This is not okay for a server to be constantly killing random processes and crashing VMS. Who knows what essential files have already been corrupted.

Link to comment
  • Solution

Either

add more RAM

limit the amount of RAM your containers are allowed to use

quit using the containers that use too much RAM

set up a swap file on a pool disk (not officially supported, but has worked for some in the past)

 

Keep in mind that the OS unpacks into, and runs entirely from, a RAM disk, so killing processes that are consuming excessive RAM is the only defense against total crashes. It's also possible that you have something writing to RAM, since all the typical OS locations are in RAM instead of on a hard drive. If a file is being written to a location not mounted to one of your array or pool disks, it's writing to RAM.

 

Without you taking the steps to figure out which container(s) are causing the overflow it's tough to help. The VM reserves as much as it's assigned, and that chunk is untouchable to the OS, so even if the VM has free RAM the OS can't use it. Best to reduce the VM RAM to the smallest possible amount, and let the OS manage the rest.

Link to comment

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.