Win 10 VM crasehes with iGPU


Recommended Posts

I have a the following system specs

 

ASRockRack E3C252D4U

Intel i5-10400

64 GiB Mem

 

The server is very stable and runs very well, I am having problems with a Win 10 vm that I passed through a second GPU (Intel UHD 630) .

So I am able to pass through the igpu with no problems to the vm. After installing the drivers I am able to see the igpu and the system works fine.  The problem starts after the vm runs for a while I get BSOD crashes randomly sometimes after 4 hours sometimes 8.  Looking at the memory dumps I get the following errors which indicates its a driver problem. So I have tried the intel drivers as well as the windows driver which both have the same outcome. 

SYMBOL_NAME: nt!MiIdentifyPfn+729 MODULE_NAME: nt STACK_COMMAND: .cxr; .ecxr ; kb IMAGE_NAME: ntkrnlmp.exe BUCKET_ID_FUNC_OFFSET: 729 FAILURE_BUCKET_ID: AV_nt!MiIdentifyPfn OS_VERSION: 10.0.19041.1 BUILDLAB_STR: vb_release OSPLATFORM_TYPE: x64 OSNAME: Windows 10

I have also tried Machine: Q35-7.1 as well as i440fx-7.1

Bios: OVMF TPM and OVMF

Below is my current XML

 

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='27'>
  <name>BI-Emby</name>
  <uuid>402ae7a8-c33e-7460-7e7e-fb3e2b0dff78</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>25165824</memory>
  <currentMemory unit='KiB'>25165824</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='8'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='9'/>
    <vcpupin vcpu='4' cpuset='4'/>
    <vcpupin vcpu='5' cpuset='10'/>
    <vcpupin vcpu='6' cpuset='5'/>
    <vcpupin vcpu='7' cpuset='11'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-7.1'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi-tpm.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/402ae7a8-c33e-7460-7e7e-fb3e2b0dff78_VARS-pure-efi-tpm.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv mode='custom'>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='4' threads='2'/>
    <cache mode='passthrough'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.225-2.iso' index='3'/>
      <backingStore/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <alias name='sata0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source dev='/dev/nvme0n1' index='2'/>
      <backingStore/>
      <target dev='hdc' bus='sata'/>
      <boot order='1'/>
      <alias name='sata0-0-2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source dev='/dev/sdd' index='1'/>
      <backingStore/>
      <target dev='hdd' bus='sata'/>
      <alias name='sata0-0-3'/>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x18'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x19'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x1a'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x1b'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x13'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='6' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:4c:a0:66'/>
      <source bridge='br0'/>
      <target dev='vnet50'/>
      <model type='virtio-net'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:b4:27:38'/>
      <source bridge='br2'/>
      <target dev='vnet51'/>
      <model type='virtio-net'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/2'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-27-BI-Emby/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <tpm model='tpm-tis'>
      <backend type='emulator' version='2.0' persistent_state='yes'/>
      <alias name='tpm0'/>
    </tpm>
    <graphics type='vnc' port='5902' autoport='yes' websocket='5702' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <audio id='1' type='none'/>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>

 

Some weird stuff that happens is sometimes I cannot RDP ino the vm have to use VNC console or reboot the machine for rdp to start working.

 

Are there recommended settings for the vm in unraid for a setup like this?

Any suggestions what I can try next I am running out of ideas and any help would be greatly appreciated. 

 

 

 

Link to comment
  • 1 month later...

I have ran a memtest a few weeks back with 4 passes and the test came back clean.  Regarding the diagnostics I will have to reproduce the vm and then will post the diagnostics

 

 

Something to note is that when the vm is running without the iGPU passed through the vm is stable, the problem seems to start when I pass the iGPU through.  Below is the memory dump from the vm .

 

SYMBOL_NAME: nt!MiDeletePteRun+17da MODULE_NAME: nt STACK_COMMAND: .cxr; .ecxr ; kb IMAGE_NAME: ntkrnlmp.exe BUCKET_ID_FUNC_OFFSET: 17da FAILURE_BUCKET_ID: 0x1a_41790_nt!MiDeletePteRun OS_VERSION: 10.0.19041.1 BUILDLAB_STR: vb_release OSPLATFORM_TYPE: x64 OSNAME: Windows 10 FAILURE_ID_HASH: {27ce5d5a-97b1-faf0-183f-c86809269d56} Followup: MachineOwner

Link to comment
  • 1 month later...

From your diagnostics, you're running a 10th Gen Intel (10400)

 

There's a quirk with the motherboards for the 10th Gen where the motherboard manufacturers all decided to effectively do a significant overclock on the CPUs out of the box without any user intervention.

 

Namely, they would set the TDP of the CPUs to be "Auto", and instead of setting it to TDP of the CPU installed (what you would think it would do), auto in this case meant unlimited or 4096 Watts.  In your case, the CPU is 65Watt.

 

The rather extreme auto setting that the motherboard is using has the effect of an overclock because it overrides the internal settings on single core turbo boost, effectively allowing all 12 cores to run full tilt all the time with no regard to the temperatures internally.  Even with the best cooling available this has the effect of introducing big instability into any system.

 

See this post https://forums.unraid.net/topic/100360-z490-multi-core-enhancement/#comment-926134, watch the video and then change the TDP within the BIOS to 65Watt.  Limetech support has seen this issue several times.  All Overclocks introduce instability.

 

Link to comment

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.