Win10 VM with Nvidia GPU freezes System


Recommended Posts

Hi,

I am new to Unraid and currently trying to set up two Win10 gaming VMs on my system.

System specs:

 

CPU: Ryzen 1700X

RAM: 16GB

Mobo: AsRock X370 Professional Gaming

GPU 1: GTX 1080ti (for VM 1)

GPU 2: RX 480 (for VM 2)

PSU: beQuiet! 850W DPP 11

 

Unraid Version: 6.7.0-rc6

Plugins:

Unassigned Devices (ver 2019.03.30)

Community Applications (ver 2019.03.30a)

Fix Common Problems (ver 2019.03.26)

SwapFile (ver 2015.09.21) set to 16GB on the Array

Tips and Tweaks (ver 2019.03.29)

Dockers:

binhex-krusader

 

Drives:

1. Samsung EVO 256 GB as Array Disk, vDisk of VM 2 is located here

2. Samsung SM951 128GB, with Win10 installation, set as vDisk for VM1

3. 2 other SSDs, both passed to VM1

 

 

Before setting up Unraid, I had a Win10 installation on the SM951. The whole drive is set as vDisk for VM1, so that I can keep this installation. VM1 also gets two other SSDs passed, with games and programms on them. VM1 gets 8 Threads, 8GB of RAM and the 1080ti.

VM2 gets 8 Threads, 7GB of RAM, the RX 480 and a 100GB Vdisk on the Array.

 

VM2 works fine.

VM1 works fine, until the system randomly freezes. Theese freezes happen only when VM1 is running. There doesn't appear to be any specific trigger for the freeze. Sometimes it happens while loading windows, sometimes while on desktop, sometimes while browsing or stress testing the GPU, with or without VM2 running. When the freeze happens, the whole system freezes, so both VMs freeze and the WebUI doesn't respond. I have to reset the system when that happens. Usually the freeze happens within 15 min from starting VM1, longest time I had it running is around 1:30 hours.

 

What I have tried so far (chronologically):

Installing SwapFile, so the system doesn't run out of memory.

Installing Fix Common Problems, doesn't show anything.

Installing Tips and Tweaks, set vm.dirty_background_ratio to 1 and vm.dirty_ratio to 2, read somewhere it helped with a similiar problem. Didn't.

Updated Unraid from 6.6 to 6.7.

Tried different combinations of i440fx/Q35 Machine and OVMF/SeaBIOS, all combinations that worked had freezing.

Set up a fresh Win10 VM with vDisk on the Array, passed the 1080ti to it. Still freezes.

Pass 1080ti without passing Audio Controller.

Reducing PCIe speed to Gen 2 and tweaking some BIOS settings.

Allow unsafe interrupts.

Updated BIOS to latest Version.

 

Some reasons I can rule out:

PSU not sufficient: had both VMs running 3D stress tests, System was pulling 700W. System also freezes if VM2 is not running.

Unstable OC: no OC on GPUs, 3D stress test running without problems. CPU and RAM are OCed, but have been running like this for more than a year without problems. I also tried lowering RAM speed, without effect. Both VMs run Prime95 without problems.

Heat: Whole system is watercooled, also freezes happen when system is idleing too.

Bad windows intallation: also happens on fresh Win10 VM.

 

 

To me it appears that the problem has to be somehow connected to the 1080ti.

In the attached logfile, these are the dated of the most recent freezes:

Mar 31 19:20:34

Mar 31 18:35:42

Mar 31 17:58:15

There are older ones as well.

Below are the VM configs.

 

I hope someone can help me with this, thaks in advance.

 

 

VM1:

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>Windows 10-1</name>
  <uuid>a1ce7404-65cd-43f6-c214-9aab2eeff8af</uuid>
  <description>Main System</description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
    <vcpupin vcpu='4' cpuset='4'/>
    <vcpupin vcpu='5' cpuset='5'/>
    <vcpupin vcpu='6' cpuset='6'/>
    <vcpupin vcpu='7' cpuset='7'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-3.1'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='8' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source dev='/dev/nvme0n1'/>
      <target dev='hdc' bus='sata'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source dev='/dev/disk/by-id/ata-SMI_DISK_AA000000000000008737'/>
      <target dev='hdd' bus='sata'/>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source dev='/dev/disk/by-id/ata-CT1000MX500SSD1_1846E1D76760'/>
      <target dev='hde' bus='sata'/>
      <address type='drive' controller='0' bus='0' target='0' unit='4'/>
    </disk>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='nec-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:ce:0e:35'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x2e' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/disk1/bios/MSIGTX1080Ti.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x2e' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x0c70'/>
        <product id='0xf00b'/>
      </source>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1b1c'/>
        <product id='0x1b34'/>
      </source>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x2516'/>
        <product id='0x0015'/>
      </source>
      <address type='usb' bus='0' port='3'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x8087'/>
        <product id='0x0aa7'/>
      </source>
      <address type='usb' bus='0' port='4'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>

VM2:

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>Windows 10-2</name>
  <uuid>feab9934-4f21-6ef2-b35e-0c82d7170468</uuid>
  <description>Secondary System</description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>7340032</memory>
  <currentMemory unit='KiB'>7340032</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='8'/>
    <vcpupin vcpu='1' cpuset='9'/>
    <vcpupin vcpu='2' cpuset='10'/>
    <vcpupin vcpu='3' cpuset='11'/>
    <vcpupin vcpu='4' cpuset='12'/>
    <vcpupin vcpu='5' cpuset='13'/>
    <vcpupin vcpu='6' cpuset='14'/>
    <vcpupin vcpu='7' cpuset='15'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-3.0'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='8' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Windows 10-2/vdisk1.img'/>
      <target dev='hdc' bus='sata'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <controller type='usb' index='0' model='nec-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:c5:25:dc'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x2f' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x2f' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x31' slot='0x00' function='0x3'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc52b'/>
      </source>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>

 

syslog(1)

Edited by midgard00
more info
Link to comment

You have both systems setup to consume nearly 100% of available physical ram, unfortunately lock ups are the expected behavior when you do that...  It is not actually locking up, it is trying to use your swapfile like RAM, because the fact that it is a swapfile is not told to the VM's...  This can cause all kinds of processes to have a timeout cascade...  If you really need to not upgrade the RAM on your system, then set the VM's to use much less physical RAM, and setup swapfiles in the individual VM's...  However there is no real substitute for just having the free RAM to begin with...

Link to comment

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.