VM with GPU passthrough works on first boot, but freezing on subsequent boot-ups


techhit

Hi everyone

 

I've got GPU passthrough working on a Windows 10 VM, but after I shut it down and boot it back up (or restart the VM), Unraid hits 100% CPU utilisation and the VM never starts again.  When I force stop the VM and disable GPU passthrough, the problem goes away, but as soon as I switch back to GPU passthrough it locks up again.

 

If I reboot my Unraid host, I can start up the VM without any problems.  The lock-up only happens on every subsequent restart of the VM.

 

I have PCIe ACS Override=both, and I'm only passing through the GPU and HDMI Audio.

 

Can someone please assist?

 

 

vmlog.txt vmxml.txt

Edited by techhit
  • 2 weeks later...

If anyone can help that'd be appreciated.

 

Below is my VM config XML.  As per Space Invader's video above, I have changed the bus and slot to be consistent for all the devices I'm passing through, and set the multifunction='on' flag.

 

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>Emubox</name>
  <uuid>ec1092b2-0329-59e6-65a1-2b4a2b8369fa</uuid>
  <description>Retro games</description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="default.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>6</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <vcpupin vcpu='2' cpuset='4'/>
    <vcpupin vcpu='3' cpuset='5'/>
    <vcpupin vcpu='4' cpuset='8'/>
    <vcpupin vcpu='5' cpuset='9'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-5.1'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='3' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Emubox/vdisk1.img'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='writeback'/>
      <source file='/mnt/user/domains/Emubox/vdisk2.img'/>
      <target dev='hdd' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='8' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='9' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='9' port='0xa'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='10' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='10' port='0xb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='11' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='11' port='0xc'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:30:5c:b8'/>
      <source bridge='br0'/>
      <model type='virtio-net'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/disk1/isos/Cezanne.rom'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x2'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x2'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x5'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x5'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x7'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x7'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc52b'/>
      </source>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'/>
</domain>
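
When comparing a config like this against the IOMMU groups, it can help to extract exactly which host PCI functions the domain XML passes through. The following is only a minimal sketch: `passthrough_functions`, `missing_functions` and `SAMPLE_XML` are hypothetical names made up for illustration, and the sample is a cut-down fragment in the same shape as the config above, not the full file.

```python
import xml.etree.ElementTree as ET

# Cut-down, illustrative fragment shaped like the <hostdev> entries above.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc52b'/>
      </source>
    </hostdev>
  </devices>
</domain>
"""


def passthrough_functions(domain_xml: str) -> list[str]:
    """Return the host PCI addresses (dddd:bb:ss.f) of every PCI <hostdev>."""
    root = ET.fromstring(domain_xml)
    found = []
    # Only PCI hostdevs carry a <source><address .../></source> element;
    # USB hostdevs (vendor/product IDs) are skipped by the type filter.
    for hostdev in root.findall("./devices/hostdev[@type='pci']"):
        addr = hostdev.find("./source/address")
        if addr is None:
            continue
        dom, bus, slot, fn = (
            int(addr.get(key), 16) for key in ("domain", "bus", "slot", "function")
        )
        found.append(f"{dom:04x}:{bus:02x}:{slot:02x}.{fn:x}")
    return found


def missing_functions(domain_xml: str, required: list[str]) -> list[str]:
    """Return the addresses in `required` that the domain does not pass through."""
    return sorted(set(required) - set(passthrough_functions(domain_xml)))


if __name__ == "__main__":
    print(passthrough_functions(SAMPLE_XML))
```

Calling `missing_functions` with the list of functions that share a reset dependency would show which ones the VM config leaves out; which functions belong in that list has to come from the host's own IOMMU grouping and logs.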

 

Here are my IOMMU groups:

image.thumb.png.66772db24b092b61320979fa8b21af1f.png
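
For anyone who would rather post the groups as text than as a screenshot, here is a rough sketch that reads them from sysfs. It assumes the usual Linux layout of `/sys/kernel/iommu_groups/<group>/devices/<pci-address>`; the function name `iommu_groups` and its `root` parameter are made up for this example.

```python
from pathlib import Path


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict[int, list[str]]:
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    base = Path(root)
    groups: dict[int, list[str]] = {}
    if not base.is_dir():
        # IOMMU disabled, or not running on Linux: nothing to report.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    # Return the groups in numeric order for readable output.
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    for group, devices in iommu_groups().items():
        print(f"group {group}: {' '.join(devices)}")
```

Run on the Unraid host, this prints one line per group, which makes it easy to see which 03:00.x functions share a group with a device the host still needs.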

Edited by techhit

This is an issue with the reset. The libvirt log says that it can't reset 03:00.7 and 03:00.5 without also passing through group 14, which contains 03:00.3, but you can't do that because the Unraid USB drive is plugged into that controller.

Try attaching the Unraid USB drive to a port on 03:00.4 so that you can also pass through 03:00.3; hopefully libvirt will not then ask for 03:00.4 as well (but I doubt it will work).

Did you also try not passing through 03:00.7 and 03:00.5?

Edited by ghost82

Hi,

 

Yes, I've also tried not passing through 03:00.7 and 03:00.5.  The error log then complains about 03:00.2.

I have also just tried passing through only the GPU and HDMI audio (03:00.0 and 03:00.1), with the same result.

 

Here is the log where I pass through everything.  This time I've moved the flash drive from a rear port to a front one, which is why it's on a different USB root hub.

 

-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x8,chassis=6,id=pci.6,bus=pcie.0,multifunction=on,addr=0x1 \
-device pcie-root-port,port=0x9,chassis=7,id=pci.7,bus=pcie.0,addr=0x1.0x1 \
-device pcie-pci-bridge,id=pci.8,bus=pci.1,addr=0x0 \
-device pcie-root-port,port=0xa,chassis=9,id=pci.9,bus=pcie.0,addr=0x1.0x2 \
-device pcie-root-port,port=0xb,chassis=10,id=pci.10,bus=pcie.0,addr=0x1.0x3 \
-device pcie-root-port,port=0xc,chassis=11,id=pci.11,bus=pcie.0,addr=0x1.0x4 \
-device pcie-root-port,port=0xd,chassis=12,id=pci.12,bus=pcie.0,addr=0x1.0x5 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pcie.0,addr=0x7 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x0 \
-blockdev '{"driver":"file","filename":"/mnt/user/domains/Emubox/vdisk1.img","node-name":"libvirt-2-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"cache":{"direct":false,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
-device virtio-blk-pci,bus=pci.4,addr=0x0,drive=libvirt-2-format,id=virtio-disk2,bootindex=1,write-cache=on \
-blockdev '{"driver":"file","filename":"/mnt/user/domains/Emubox/vdisk2.img","node-name":"libvirt-1-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":false,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device virtio-blk-pci,bus=pci.5,addr=0x0,drive=libvirt-1-format,id=virtio-disk3,write-cache=on \
-netdev tap,fd=34,id=hostnet0 \
-device virtio-net,netdev=hostnet0,id=net0,mac=52:54:00:30:5c:b8,bus=pci.3,addr=0x0 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=35,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-device vfio-pci,host=0000:03:00.0,id=hostdev0,x-vga=on,bus=pci.6,multifunction=on,addr=0x0,romfile=/mnt/disk1/isos/Cezanne.rom \
-device vfio-pci,host=0000:03:00.1,id=hostdev1,bus=pci.6,addr=0x0.0x1 \
-device vfio-pci,host=0000:03:00.2,id=hostdev2,bus=pci.6,addr=0x0.0x2 \
-device vfio-pci,host=0000:03:00.3,id=hostdev3,bus=pci.6,addr=0x0.0x3 \
-device vfio-pci,host=0000:03:00.5,id=hostdev4,bus=pci.6,addr=0x0.0x5 \
-device vfio-pci,host=0000:03:00.7,id=hostdev5,bus=pci.6,addr=0x0.0x7 \
-device usb-host,hostbus=3,hostaddr=4,id=hostdev6,bus=usb.0,port=2 \
-device usb-host,hostbus=3,hostaddr=3,id=hostdev7,bus=usb.0,port=3 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
2021-10-30 00:55:14.273+0000: Domain id=3 is tainted: high-privileges
2021-10-30 00:55:14.273+0000: Domain id=3 is tainted: host-cpu
char device redirected to /dev/pts/0 (label charserial0)
2021-10-30T00:56:16.768588Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
2021-10-30T00:56:16.785683Z qemu-system-x86_64: vfio: Cannot reset device 0000:03:00.7, depends on group 15 which is not owned.
2021-10-30 00:56:47.351+0000: shutting down, reason=failed
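
The two vfio lines at the end of that log are the interesting part. As a small sketch (not a libvirt or QEMU tool; `RESET_ERROR`, `reset_blockers` and `SAMPLE_LOG` are names invented here), the device and the group it depends on can be pulled out of a saved VM log like this:

```python
import re

# Matches QEMU's message as it appears in the log above.
RESET_ERROR = re.compile(
    r"vfio: Cannot reset device "
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]), "
    r"depends on group (?P<group>\d+) which is not owned"
)

# The two vfio lines copied from the log above.
SAMPLE_LOG = """\
2021-10-30T00:56:16.768588Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
2021-10-30T00:56:16.785683Z qemu-system-x86_64: vfio: Cannot reset device 0000:03:00.7, depends on group 15 which is not owned.
"""


def reset_blockers(log_text: str) -> list[tuple[str, int]]:
    """Return (device, group) pairs for every 'Cannot reset device' message."""
    return [(m["device"], int(m["group"])) for m in RESET_ERROR.finditer(log_text)]


if __name__ == "__main__":
    for device, group in reset_blockers(SAMPLE_LOG):
        print(f"{device} cannot be reset until IOMMU group {group} is owned by vfio")
```

Each reported group number can then be looked up in the host's IOMMU group listing to see which device is keeping that group from being handed over.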

I'm really not sure what else I can do from here.

 

 

 

Edited by techhit
6 hours ago, techhit said:

Here is the log where I pass through everything.

 

6 hours ago, techhit said:
Cannot reset device 0000:03:00.7

That was what I expected: most probably all the 03:00.x functions must be passed through together, but since you don't have any other USB controller, you can't.

6 hours ago, techhit said:

I'm really not sure what else I can do from here.

With this particular hardware, nothing.

  • 1 month later...

I believe there is an Unraid app that can update the Unraid kernel to avoid the reset issues as well.  It is available for 6.9 and 6.10, albeit in different flavors from what I recall.  Unfortunately, due to another issue I had to roll back, so I can't search the apps to find the exact name for you.  If the above works, then great, maybe you don't need this.  You might also try searching for Radeon or AMD in the Unraid Apps tab; you should see something come up showing an AMD logo that talks about fixing the reset issue.
