GPU passthrough issue when switching between VMs



Hi,

I have a weird issue with GPU passthrough.

The short version: I have two VMs (Win 11 and PopOS) with the same GPU assigned (R9 290X). Each VM works fine on its own, including reboots. However, once the PopOS VM has been started, I can no longer pass the GPU through to the Win 11 VM (no display output, and Code 43 when I check via RDP). The PopOS VM still works fine, but to get the Win 11 VM working again I need to reboot Unraid.

The longer version: I have two R9 290X cards in my system with these settings:

[Screenshots: syslinux_conf, vm_settings, sys_devices]

I have two Win 11 VMs, one for each GPU. These are their XML definitions (identical apart from CPU pinning, vdisk and GPU address). They contain tweaks from various sources for better performance:


<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>X299_290X_Win11_1</name>
  <uuid>907b98cb-1942-491d-eb14-61e24cef8943</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 11" icon="windows11.png" os="windowstpm"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <vcpu placement='static'>12</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='6'/>
    <vcpupin vcpu='1' cpuset='24'/>
    <vcpupin vcpu='2' cpuset='7'/>
    <vcpupin vcpu='3' cpuset='25'/>
    <vcpupin vcpu='4' cpuset='8'/>
    <vcpupin vcpu='5' cpuset='26'/>
    <vcpupin vcpu='6' cpuset='9'/>
    <vcpupin vcpu='7' cpuset='27'/>
    <vcpupin vcpu='8' cpuset='10'/>
    <vcpupin vcpu='9' cpuset='28'/>
    <vcpupin vcpu='10' cpuset='11'/>
    <vcpupin vcpu='11' cpuset='29'/>
    <emulatorpin cpuset='1,19'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi-tpm.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/907b98cb-1942-491d-eb14-61e24cef8943_VARS-pure-efi-tpm.fd</nvram>
    <smbios mode='host'/>
  </os>
  <features>
    <acpi/>
    <apic eoi='on'/>
    <hyperv mode='custom'>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vpindex state='on'/>
      <runtime state='on'/>
      <synic state='on'/>
      <stimer state='on'>
        <direct state='on'/>
      </stimer>
      <reset state='on'/>
      <vendor_id state='on' value='1234567890ab'/>
      <frequencies state='on'/>
      <reenlightenment state='on'/>
      <tlbflush state='on'/>
      <ipi state='on'/>
      <evmcs state='off'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
    <vmport state='off'/>
    <ioapic driver='kvm'/>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='off'>
    <topology sockets='1' dies='1' cores='6' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
    <feature policy='require' name='invtsc'/>
    <feature policy='require' name='x2apic'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='disable' name='svm'/>
    <feature policy='disable' name='hypervisor'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='pit' present='no' tickpolicy='discard'/>
    <timer name='rtc' present='no' tickpolicy='catchup'/>
    <timer name='hpet' present='no'/>
    <timer name='kvmclock' present='no'/>
    <timer name='hypervclock' present='yes'/>
    <timer name='tsc' present='yes' mode='native'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/NVMe_Pool/domains/vdisk_X299_290X_Win11_1.img'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:ca:cd:2d'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <tpm model='tpm-tis'>
      <backend type='emulator' version='2.0' persistent_state='yes'/>
    </tpm>
    <audio id='1' type='none'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x66' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/Data_Pool/Dokumente/BIOS Files/MSI290XLightningStock.rom'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x66' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>


<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>X299_290X_Win11_2</name>
  <uuid>bfdffd67-cde7-9f95-d585-e21f244330a4</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 11" icon="windows11.png" os="windowstpm"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <vcpu placement='static'>12</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='12'/>
    <vcpupin vcpu='1' cpuset='30'/>
    <vcpupin vcpu='2' cpuset='13'/>
    <vcpupin vcpu='3' cpuset='31'/>
    <vcpupin vcpu='4' cpuset='14'/>
    <vcpupin vcpu='5' cpuset='32'/>
    <vcpupin vcpu='6' cpuset='15'/>
    <vcpupin vcpu='7' cpuset='33'/>
    <vcpupin vcpu='8' cpuset='16'/>
    <vcpupin vcpu='9' cpuset='34'/>
    <vcpupin vcpu='10' cpuset='17'/>
    <vcpupin vcpu='11' cpuset='35'/>
    <emulatorpin cpuset='2,20'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi-tpm.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/bfdffd67-cde7-9f95-d585-e21f244330a4_VARS-pure-efi-tpm.fd</nvram>
    <smbios mode='host'/>
  </os>
  <features>
    <acpi/>
    <apic eoi='on'/>
    <hyperv mode='custom'>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vpindex state='on'/>
      <runtime state='on'/>
      <synic state='on'/>
      <stimer state='on'>
        <direct state='on'/>
      </stimer>
      <reset state='on'/>
      <vendor_id state='on' value='1234567890ab'/>
      <frequencies state='on'/>
      <reenlightenment state='on'/>
      <tlbflush state='on'/>
      <ipi state='on'/>
      <evmcs state='off'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
    <vmport state='off'/>
    <ioapic driver='kvm'/>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='off'>
    <topology sockets='1' dies='1' cores='6' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
    <feature policy='require' name='invtsc'/>
    <feature policy='require' name='x2apic'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='disable' name='svm'/>
    <feature policy='disable' name='hypervisor'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='pit' present='no' tickpolicy='discard'/>
    <timer name='rtc' present='no' tickpolicy='catchup'/>
    <timer name='hpet' present='no'/>
    <timer name='kvmclock' present='no'/>
    <timer name='hypervclock' present='yes'/>
    <timer name='tsc' present='yes' mode='native'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/NVMe_Pool/domains/vdisk_X299_290X_Win11_2.img'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:84:e8:bc'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <tpm model='tpm-tis'>
      <backend type='emulator' version='2.0' persistent_state='yes'/>
    </tpm>
    <audio id='1' type='none'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x65' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/Data_Pool/Dokumente/BIOS Files/MSI290XLightningStock.rom'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x65' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>

And one PopOS VM that uses the same GPU as the first Win 11 VM:


<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>pop-os</name>
  <uuid>756547f3-c642-8027-cb32-b564199ff46f</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Linux" icon="linux.png" os="linux"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>24</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='6'/>
    <vcpupin vcpu='1' cpuset='24'/>
    <vcpupin vcpu='2' cpuset='7'/>
    <vcpupin vcpu='3' cpuset='25'/>
    <vcpupin vcpu='4' cpuset='8'/>
    <vcpupin vcpu='5' cpuset='26'/>
    <vcpupin vcpu='6' cpuset='9'/>
    <vcpupin vcpu='7' cpuset='27'/>
    <vcpupin vcpu='8' cpuset='10'/>
    <vcpupin vcpu='9' cpuset='28'/>
    <vcpupin vcpu='10' cpuset='11'/>
    <vcpupin vcpu='11' cpuset='29'/>
    <vcpupin vcpu='12' cpuset='12'/>
    <vcpupin vcpu='13' cpuset='30'/>
    <vcpupin vcpu='14' cpuset='13'/>
    <vcpupin vcpu='15' cpuset='31'/>
    <vcpupin vcpu='16' cpuset='14'/>
    <vcpupin vcpu='17' cpuset='32'/>
    <vcpupin vcpu='18' cpuset='15'/>
    <vcpupin vcpu='19' cpuset='33'/>
    <vcpupin vcpu='20' cpuset='16'/>
    <vcpupin vcpu='21' cpuset='34'/>
    <vcpupin vcpu='22' cpuset='17'/>
    <vcpupin vcpu='23' cpuset='35'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-7.1'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/756547f3-c642-8027-cb32-b564199ff46f_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='12' threads='2'/>
    <cache mode='passthrough'/>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/NVMe_Pool/domains/vdisk_X299_290X_popos.img'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </disk>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0xe'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x6'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:c6:2a:85'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <audio id='1' type='none'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x66' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/Data_Pool/Dokumente/BIOS Files/MSI290XLightningStock.rom'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x66' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>

 

Passthrough works for all three of these VMs, including reboots.

Here is the first Win 11 VM working:

[Screenshot]

And here is PopOS working (performance is not good; I assume this is down to the driver in use, but I haven't looked into that yet):

[Screenshot]

The issue appears when I try to start the first Win 11 VM after the PopOS VM has been started. I get no display output, and when I access the VM through RDP I can see a Code 43 on the GPU:

[Screenshot]

PopOS as well as the second Win 11 VM with the other GPU still work fine. So the issue is only with the specific GPU that was passed to PopOS. I had the same issue with a Win 10/Ubuntu combination in the past. The GPU passthrough to Windows starts working again after a reboot of Unraid.

 

I have attached diagnostics, but here is the syslog when I start/stop Win 11 -> start/stop PopOS -> start Win 11, with passthrough to Win 11 working fine the first time but Code 43 the second time.


May  9 17:23:31 X299 kernel: pcieport 0000:00:1c.4: Intel SPT PCH root port ACS workaround enabled
May  9 17:23:31 X299 kernel: pcieport 0000:00:1c.0: Intel SPT PCH root port ACS workaround enabled
May  9 17:23:55 X299 kernel: pcieport 0000:00:1c.4: Intel SPT PCH root port ACS workaround enabled
May  9 17:23:55 X299 kernel: pcieport 0000:00:1c.0: Intel SPT PCH root port ACS workaround enabled
# start win11 vm
May  9 17:23:57 X299 kernel: br0: port 6(vnet3) entered blocking state
May  9 17:23:57 X299 kernel: br0: port 6(vnet3) entered disabled state
May  9 17:23:57 X299 kernel: device vnet3 entered promiscuous mode
May  9 17:23:57 X299 kernel: br0: port 6(vnet3) entered blocking state
May  9 17:23:57 X299 kernel: br0: port 6(vnet3) entered listening state
May  9 17:23:59 X299 kernel: vfio-pci 0000:66:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
May  9 17:23:59 X299 kernel: vfio-pci 0000:66:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
May  9 17:23:59 X299 kernel: br0: port 6(vnet3) entered learning state
May  9 17:24:01 X299 kernel: pcieport 0000:00:1c.4: Intel SPT PCH root port ACS workaround enabled
May  9 17:24:01 X299 kernel: pcieport 0000:00:1c.0: Intel SPT PCH root port ACS workaround enabled
May  9 17:24:01 X299 kernel: br0: port 6(vnet3) entered forwarding state
May  9 17:24:01 X299 kernel: br0: topology change detected, propagating
May  9 17:24:02 X299  script: Started X299_290X_Win11_1 hotplugging.
# stop win11 vm
May  9 17:24:45 X299 kernel: br0: port 6(vnet3) entered disabled state
May  9 17:24:45 X299 kernel: device vnet3 left promiscuous mode
May  9 17:24:45 X299 kernel: br0: port 6(vnet3) entered disabled state
May  9 17:24:47 X299  script: Stopped X299_290X_Win11_1 hotplugging.
# start popos vm
May  9 17:25:17 X299 kernel: pcieport 0000:00:1c.4: Intel SPT PCH root port ACS workaround enabled
May  9 17:25:17 X299 kernel: pcieport 0000:00:1c.0: Intel SPT PCH root port ACS workaround enabled
May  9 17:25:20 X299 kernel: pcieport 0000:00:1c.4: Intel SPT PCH root port ACS workaround enabled
May  9 17:25:20 X299 kernel: br0: port 6(vnet4) entered blocking state
May  9 17:25:20 X299 kernel: br0: port 6(vnet4) entered disabled state
May  9 17:25:20 X299 kernel: device vnet4 entered promiscuous mode
May  9 17:25:20 X299 kernel: br0: port 6(vnet4) entered blocking state
May  9 17:25:20 X299 kernel: br0: port 6(vnet4) entered listening state
May  9 17:25:22 X299 kernel: vfio-pci 0000:66:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
May  9 17:25:22 X299 kernel: vfio-pci 0000:66:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
May  9 17:25:22 X299 kernel: pcieport 0000:00:1c.4: Intel SPT PCH root port ACS workaround enabled
May  9 17:25:22 X299 kernel: br0: port 6(vnet4) entered learning state
May  9 17:25:23 X299 kernel: vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x19@0x200
May  9 17:25:23 X299 kernel: vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x1e@0x400
May  9 17:25:24 X299 kernel: br0: port 6(vnet4) entered forwarding state
May  9 17:25:24 X299 kernel: br0: topology change detected, propagating
May  9 17:25:26 X299 kernel: pcieport 0000:00:1c.0: Intel SPT PCH root port ACS workaround enabled
# stop popos vm
May  9 17:26:13 X299 kernel: br0: port 6(vnet4) entered disabled state
May  9 17:26:13 X299 kernel: device vnet4 left promiscuous mode
May  9 17:26:13 X299 kernel: br0: port 6(vnet4) entered disabled state
May  9 17:26:14 X299 kernel: pcieport 0000:00:1c.4: Intel SPT PCH root port ACS workaround enabled
May  9 17:26:15 X299 kernel: pcieport 0000:00:1c.4: Intel SPT PCH root port ACS workaround enabled
May  9 17:26:15 X299 kernel: pcieport 0000:00:1c.0: Intel SPT PCH root port ACS workaround enabled
# start win11 vm
May  9 17:26:55 X299 kernel: br0: port 6(vnet5) entered blocking state
May  9 17:26:55 X299 kernel: br0: port 6(vnet5) entered disabled state
May  9 17:26:55 X299 kernel: device vnet5 entered promiscuous mode
May  9 17:26:55 X299 kernel: br0: port 6(vnet5) entered blocking state
May  9 17:26:55 X299 kernel: br0: port 6(vnet5) entered listening state
May  9 17:26:57 X299 kernel: vfio-pci 0000:66:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
May  9 17:26:57 X299 kernel: vfio-pci 0000:66:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
May  9 17:26:57 X299 kernel: br0: port 6(vnet5) entered learning state
May  9 17:26:59 X299 kernel: pcieport 0000:00:1c.4: Intel SPT PCH root port ACS workaround enabled
May  9 17:26:59 X299 kernel: pcieport 0000:00:1c.0: Intel SPT PCH root port ACS workaround enabled
May  9 17:27:00 X299 kernel: br0: port 6(vnet5) entered forwarding state
May  9 17:27:00 X299 kernel: br0: topology change detected, propagating
May  9 17:27:00 X299  script: Started X299_290X_Win11_1 hotplugging.

 

It looks to me as though the GPU is not reset properly when PopOS shuts down. As far as I understand, this is different from the typical AMD reset bug, since restarting either VM on its own works fine; only switching from PopOS to Windows causes the issue. I did install the AMD Vendor Reset plugin, but that did not change anything.
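For anyone who wants to see what reset options the host kernel reports for the card, here is a minimal sketch. It assumes the kernel exposes the per-device `reset_method` attribute in sysfs; the helper name `reset_methods` and the `sysfs_root` parameter are only for illustration, and the addresses are the ones from my XML above.

```python
from pathlib import Path
from typing import List

# Standard sysfs location of PCI devices on Linux.
SYSFS_PCI = Path("/sys/bus/pci/devices")


def reset_methods(address: str, sysfs_root: Path = SYSFS_PCI) -> List[str]:
    """Return the reset methods the kernel lists for a PCI device.

    An empty list means the attribute is missing or empty, i.e. the
    kernel does not report any usable reset method for the device.
    """
    attr = Path(sysfs_root) / address / "reset_method"
    try:
        # The attribute holds space-separated method names on one line.
        return attr.read_text().split()
    except OSError:
        return []


if __name__ == "__main__":
    # Video and audio function of the GPU shared by Win 11 and PopOS.
    for addr in ("0000:66:00.0", "0000:66:00.1"):
        print(addr, reset_methods(addr) or "no reset method reported")
```

Running this on the host before and after the PopOS VM has been used would show whether the list itself changes; it does not tell you whether a listed method actually works on this card.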

 

I would appreciate any tips on how to get the GPU to reset properly.

 

Thanks

x299-diagnostics-20230509-2031.zip


I think the AMD Vendor Reset plugin is useless here: as far as I know it doesn't support the 290X. In fact, your log shows:

May  9 20:25:31 X299 kernel: vfio-pci 0000:65:00.0: Unsupported reset method 'device_specific'
May  9 20:25:31 X299 kernel: vfio-pci 0000:66:00.0: Unsupported reset method 'device_specific'
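To pull these messages out of a long syslog, something like the following sketch could help. The regular expression is written against the two lines quoted above, and the function name `unsupported_resets` is made up for this example.

```python
import re

# Matches kernel lines such as:
#   vfio-pci 0000:66:00.0: Unsupported reset method 'device_specific'
UNSUPPORTED_RESET = re.compile(
    r"vfio-pci (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Unsupported reset method '(?P<method>[^']+)'"
)


def unsupported_resets(lines):
    """Return sorted, de-duplicated (pci_address, method) pairs the kernel rejected."""
    found = set()
    for line in lines:
        match = UNSUPPORTED_RESET.search(line)
        if match:
            found.add((match.group("addr"), match.group("method")))
    return sorted(found)


if __name__ == "__main__":
    sample = [
        "May  9 20:25:31 X299 kernel: vfio-pci 0000:65:00.0: Unsupported reset method 'device_specific'",
        "May  9 20:25:31 X299 kernel: vfio-pci 0000:66:00.0: Unsupported reset method 'device_specific'",
    ]
    for addr, method in unsupported_resets(sample):
        print(addr, method)
```

To scan a real log, pass an open file object instead of the sample list, since the function only needs an iterable of lines.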

 

The problem could be driver specific or hardware specific.

If it's driver specific, I suggest setting up the GPU as a multifunction device in the VMs. From your description, the Windows driver can reset the GPU properly even with video and audio on different guest buses, but for some reason Linux does not leave the GPU in a state that Windows can use afterwards (I'm hoping that Linux expects the GPU to be a multifunction device).

So, change the GPU hostdev entries as follows.

For pop-os and win11_1:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x66' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/Data_Pool/Dokumente/BIOS Files/MSI290XLightningStock.rom'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x66' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
    </hostdev>

 

For win11_2:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x65' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/Data_Pool/Dokumente/BIOS Files/MSI290XLightningStock.rom'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x65' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
    </hostdev>
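If you want to double-check the result after editing, here is a rough sketch that inspects a domain XML and flags host devices whose functions ended up on different guest buses, or whose guest function 0 is missing multifunction='on'. The function name `multifunction_problems` is invented for this example, and it only looks at the attributes shown in the snippets above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def multifunction_problems(domain_xml):
    """Return human-readable problems with multi-function hostdev placement."""
    root = ET.fromstring(domain_xml)
    by_host_device = defaultdict(list)
    for hostdev in root.iter("hostdev"):
        source = hostdev.find("source/address")  # host-side address
        guest = hostdev.find("address")          # direct child: guest-side address
        if source is None or guest is None:
            continue
        key = (
            source.get("domain", "?"),
            source.get("bus", "?"),
            source.get("slot", "?"),
        )
        by_host_device[key].append(guest)

    problems = []
    for key, guests in by_host_device.items():
        if len(guests) < 2:
            continue  # only one function passed through, nothing to check
        name = ":".join(key)
        guest_slots = {(g.get("bus"), g.get("slot")) for g in guests}
        if len(guest_slots) > 1:
            problems.append(name + ": functions sit on different guest buses/slots")
            continue
        for g in guests:
            is_function_zero = int(g.get("function", "0x0"), 16) == 0
            if is_function_zero and g.get("multifunction") != "on":
                problems.append(name + ": guest function 0 lacks multifunction='on'")
    return problems
```

Feed it the text printed by `virsh dumpxml pop-os` (or the XML view in the Unraid VM editor); an empty list means the video and audio functions share one guest slot with multifunction enabled.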

 

Edited by ghost82
45 minutes ago, ghost82 said:

I can suggest to set the gpu as multifunction

Thanks for the tip. While that did not change anything regarding the GPU reset, it did massively improve GPU performance in PopOS.
I will try the official AMD driver next to see if that changes anything.

Screenshot from 2023-05-13 14-31-38.png


I was not able to get any display output after switching from the "radeon" driver to "amdgpu". I tried this with PopOS 22.04 and Ubuntu 20.04. In both cases, after switching to "amdgpu" I can't shut down the VM properly and need to force stop it. After that, I don't get any display output on that GPU anymore, not even from a Linux VM with the default "radeon" driver. Only a host reboot fixes that.
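To confirm which kernel driver is actually bound to the card inside the Linux guest, a small sketch like this can be run in the VM. It relies on the standard sysfs `driver` symlink; the function name `bound_driver` is made up here, and the guest address 0000:04:00.0 is an assumption based on the guest-side bus in the XML, so check it with lspci first.

```python
import os
from pathlib import Path


def bound_driver(address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = Path(sysfs_root) / address / "driver"
    try:
        # The 'driver' entry is a symlink whose last component is the driver name.
        return os.path.basename(os.readlink(link))
    except OSError:
        return None  # no such device, or no driver bound


if __name__ == "__main__":
    # Assumed guest-side address of the GPU's video function.
    print(bound_driver("0000:04:00.0"))
```

On the Unraid host the same check should report vfio-pci for 0000:66:00.0 while a VM is using the card.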

Edited by midgard00

Well, all I can say is: revert and live with it. Consider yourself lucky with these GPUs, since you can at least reboot Windows without everything crashing.

With buggy firmware you can see all kinds of crazy behavior. For example, I have a USB Wi-Fi dongle that doesn't work in a Kali Linux VM if the last VM I booted was macOS; I need to start a Windows VM first and then start the Linux one.
