Issues with AMD GPU Passthrough after upgrade to 6.10.3



I upgraded my server from 6.9.2 to 6.10.3 to add support for Windows 11. I use my server as a home lab and need to get more familiar with Windows 11 for work, so a VM was the best option.

 

However, after the upgrade I was unable to get my RX 570 to pass through to Windows VMs. The VM would hang with a single core running at 100% until I killed it.

 

Also, note that my server will not boot in legacy mode; I'm not 100% sure whether this is due to an issue with the flash drive or a limitation of the B350 motherboard's BIOS.

 

I have tried the card both as the primary GPU and as a secondary, with no difference in behavior. I replaced it with a temporary Quadro to determine whether that would work, which it does.

Here is the XML for one of the VMs I tried:
<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>JP-W10-VM2</name>
  <uuid>4919473d-6986-4e32-045f-94a3c13ff796</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>12582912</memory>
  <currentMemory unit='KiB'>12582912</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>12</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='18'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='19'/>
    <vcpupin vcpu='4' cpuset='4'/>
    <vcpupin vcpu='5' cpuset='20'/>
    <vcpupin vcpu='6' cpuset='5'/>
    <vcpupin vcpu='7' cpuset='21'/>
    <vcpupin vcpu='8' cpuset='6'/>
    <vcpupin vcpu='9' cpuset='22'/>
    <vcpupin vcpu='10' cpuset='7'/>
    <vcpupin vcpu='11' cpuset='23'/>
    <emulatorpin cpuset='22-23'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-5.1'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv mode='custom'>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
    <vmport state='off'/>
  </features>
  <cpu mode='custom' match='exact' check='none'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <vendor>AMD</vendor>
    <topology sockets='1' dies='1' cores='6' threads='2'/>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='tsc-deadline'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='clwb'/>
    <feature policy='require' name='umip'/>
    <feature policy='require' name='stibp'/>
    <feature policy='require' name='arch-capabilities'/>
    <feature policy='require' name='ssbd'/>
    <feature policy='require' name='xsaves'/>
    <feature policy='require' name='cmp_legacy'/>
    <feature policy='require' name='perfctr_core'/>
    <feature policy='require' name='clzero'/>
    <feature policy='require' name='wbnoinvd'/>
    <feature policy='require' name='amd-ssbd'/>
    <feature policy='require' name='virt-ssbd'/>
    <feature policy='require' name='rdctl-no'/>
    <feature policy='require' name='skip-l1dfl-vmentry'/>
    <feature policy='require' name='mds-no'/>
    <feature policy='require' name='pschange-mc-no'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='require' name='topoext'/>
    <feature policy='disable' name='svm'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/cache/VM-OS-Disks/JP-W10-VM2/vdisk1.img'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/disk1/domains/JP-W10-VM2/vdisk2.img'/>
      <target dev='hdd' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0xe'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x6'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0xf'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x7'/>
    </controller>
    <controller type='pci' index='9' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='9' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='10' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='10' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='11' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='11' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='12' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='12' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='13' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='13' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='14' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='14' port='0x15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='15' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='15' port='0x16'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='pci' index='16' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='16' port='0x17'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x7'/>
    </controller>
    <controller type='pci' index='17' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='17' port='0x18'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='18' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='18' port='0x19'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x1'/>
    </controller>
    <controller type='pci' index='19' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='19' port='0x1a'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x2'/>
    </controller>
    <controller type='pci' index='20' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='20' port='0x1b'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x3'/>
    </controller>
    <controller type='pci' index='21' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:ef:b8:a1'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <audio id='1' type='none'/>
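    <!-- Passed-through host devices below: GPU video at 0a:00.0, its HDMI audio at 0a:00.1, and one more host device at 07:00.0, all bound to vfio -->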
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>
 

Link to comment

Hey, did you find a solution? I have the same problem with one VM, but with a GeForce 2070.

Another VM starts but has no screen output; Windows doesn't boot. When I enable VNC mode and disable the GPU passthrough, the VM works fine.
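
For reference, "VNC mode" just means the XML gets a virtual display instead of the <hostdev> GPU entries - roughly something like this (a sketch of what Unraid generates; exact attributes vary by version):

    <graphics type='vnc' port='-1' autoport='yes' websocket='-1' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
    </video>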

 

Michael

Link to comment
23 hours ago, Mika said:

Hey, did you find a solution? I have the same problem with one VM, but with a GeForce 2070.

Another VM starts but has no screen output; Windows doesn't boot. When I enable VNC mode and disable the GPU passthrough, the VM works fine.

 

Michael

I think it's caused by a recent Windows 11 update, not by Unraid.

Link to comment
On 6/28/2022 at 6:11 PM, Chenhan said:

I think it's caused by a recent Windows 11 update, not by Unraid.

The VM that starts up without a picture still has Windows 10 installed. Other VMs configured with a newer machine type (i440fx-5.1, for example) work fine and run Win11.
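
(For reference, the machine type is set on the <type> line of the VM's XML; an i440fx-5.1 VM has, for example:

  <type arch='x86_64' machine='pc-i440fx-5.1'>hvm</type>

versus the pc-q35-5.1 machine seen in the XML earlier in this thread.)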

Edited by Mika
Link to comment
  • 1 month later...

Have you guys found a solution to this? I have an AMD Radeon RX 5700 XT and I have this issue with both Windows 10 and Windows 11. I can install the VM just fine, and I see a picture on my monitor throughout the entire installation and Windows setup, so I assume my GPU passthrough is working.

 

However, as soon as I try to install the AMD driver, one of my CPU cores spikes to 100% and the screen blacks out:

[Screenshot: Unraid dashboard showing one CPU core pinned at 100%]

 

 

I then have to force stop the VM, and when I try to restart it I get stuck on the Windows loading screen (the spinning circle doesn't move):

[Photo: Windows boot screen frozen on the spinning loading circle]

 

One of my CPU cores also gets stuck at 100% again until I force stop the VM.

 

Link to comment

Guys, I think I actually may be getting somewhere... I followed this video: 

 

 

Particularly minutes 4:00-8:45. Once I created a new VM with that change, I was able to install the AMD driver without crashing my system. Still testing to make sure this was really the culprit, but I'm giving everyone a heads-up in case it helps.
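
For anyone who can't watch the video: I believe the change is passing the GPU's video and audio functions on the same virtual slot as one multifunction device, instead of on two separate slots. A sketch, reusing the host addresses (0a:00.0/0a:00.1) and guest bus 0x06 from the XML earlier in this thread - yours will differ:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/> <!-- GPU video -->
      </source>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x1'/> <!-- GPU HDMI audio -->
      </source>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x1'/> <!-- same guest slot, function 1 -->
    </hostdev>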

 

Link to comment
  • 2 months later...

I am having a very similar issue after upgrading from 6.9.2 to 6.11.1, but with an Ubuntu VM using SeaBIOS. I'm trying to pass through a Radeon HD 6770 video card to an Ubuntu VM using SeaBIOS, and after the VM starts I get a black screen and one CPU core pegged at 100%. If I recreate the VM with OVMF, the VM still shows a black screen, but I don't have the 100% CPU core problem. In both cases the VM will not load the OS. All was working before the upgrade to 6.11.1. I did downgrade back to 6.9.2 and had some trouble getting the VMs to boot correctly, but after the VM manager crashed and was re-enabled, all is working again. I will attach my diagnostics from before I downgraded. Hopefully someone has some insight into this issue. Thanks.
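
(For context: in the XML, the SeaBIOS/OVMF choice shows up in the <os> section. A SeaBIOS VM has no loader line, while an OVMF VM adds one, roughly like this - the path is the usual one on my Unraid install, yours may differ:

  <os>
    <type arch='x86_64' machine='pc-q35-5.1'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
  </os>

An <nvram> line normally accompanies the loader as well.)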

dnason-diagnostics-20221102-2050.zip

Link to comment
On 11/4/2022 at 10:12 AM, Brownboy said:

I am having a very similar issue after upgrading from 6.9.2 to 6.11.1, but with an Ubuntu VM using SeaBIOS. I'm trying to pass through a Radeon HD 6770 video card to an Ubuntu VM using SeaBIOS, and after the VM starts I get a black screen and one CPU core pegged at 100%. If I recreate the VM with OVMF, the VM still shows a black screen, but I don't have the 100% CPU core problem. In both cases the VM will not load the OS. All was working before the upgrade to 6.11.1. I did downgrade back to 6.9.2 and had some trouble getting the VMs to boot correctly, but after the VM manager crashed and was re-enabled, all is working again. I will attach my diagnostics from before I downgraded. Hopefully someone has some insight into this issue. Thanks.

dnason-diagnostics-20221102-2050.zip (unavailable)

I'm having that exact same issue on Ubuntu with my RX 580. I didn't look at my CPU usage to see if it was the same. I had to roll back to 6.9.2 and the VMs fired right up. The last entry in the log was always stuck at "char device redirected to /dev/pts/0". If I had time I'd upgrade and try making a new VM instance, but I need this VM for work, so I'm kind of stuck on 6.9.2 for a while, I guess. :(

Link to comment
  • 2 months later...

Guys, any progress on this? I have the same issue with passing through an RX550 into VMs (SeaBIOS, both i440fx/Windows 10 and Q35/Ubuntu). Everything works fine on 6.9.2, but not when I upgrade to 6.11.5; I had to downgrade. Something is breaking the VMs. When will this be fixed?

Link to comment

I am still not sure what the core issue is here, but I have a workaround (not a complete fix, though). After upgrading from 6.9.2 to 6.11.5, all of my VMs that had the RX550 passed through got:

- black screen

- 1 core stuck at 100%

and nothing was happening.

 

To get them running again, I did the following:

1. Backed up both the VM disks and the XML files.

2. Deleted the VMs and created them again with only one GPU: the virtual one. I use SeaBIOS in all of them, and I switched to Q35 (7.1) for both my Ubuntu and W10 VMs (I had issues with i440fx).

3. Added the same VM disks to each VM.

4. Ran them and, via VNC, deleted all the GPU drivers (using the AMD Cleanup Utility for Windows, and following the removal procedure for Ubuntu here).

5. Rebooted and then shut down.

6. Added the RX550 as a second GPU (plus its audio device) and started the VM (see the sketch at the end of this post).

7. Installed the newest drivers from the AMD website (following the step-by-step installation instructions for Ubuntu). At some point during driver installation (on both W10 and Ubuntu) I could see a second display via the RX550 - that was really good :) Reboot.

8. Set the physical monitor as my main display and the virtual 'monitor' as secondary, laying out the displays so that the virtual one sits at the top-right corner of the main one.

9. Reboot.

10. Backup :)

 

After that, GPU passthrough runs perfectly every time. This is not ideal, since the second 'screen' is still there and using some resources, but I never got it working with only one GPU (the external, passed-through one) and no virtual one. Whenever I tried that, I ended up with a black screen again, and after a Force Shutdown of the VM something would break it: I'd get a 'No Bootable Disk' message on the next start and would need to create a fresh VM with the same disks. As I said, not ideal, but hopefully this lets some of you (and me) stay on the latest Unraid version and still have GPU passthrough in VMs without the black screen.
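
To make step 6 concrete, each working VM now has both display devices in its XML - the virtual one Unraid generates for VNC plus the passed-through card - roughly like this (a sketch; the 0x0b host address is made up, use your own):

    <graphics type='vnc' port='-1' autoport='yes' websocket='-1' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
    </video>
    <!-- plus the passed-through RX550 video and audio functions -->
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0b' slot='0x00' function='0x1'/>
      </source>
    </hostdev>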

Link to comment
  • 4 weeks later...
On 8/19/2022 at 12:51 AM, venicenerd said:

Particularly minutes 4:00-8:45.

 


Thanks for linking this! I just ran into the same issue with an existing VM and this fixed it beautifully. My passthrough had worked for a couple of years without multifunction, although I've had occasional driver issues before. The VM started running into "code 43" after I upgraded Unraid from 6.9.2 to 6.11.5. I guess some part of KVM or VFIO has become more strict about the config now...

 

(I kept backups of earlier XML configs for this VM and I was able to verify that I never needed multifunction on this video card passthrough in the past)

[EDIT:] Adding multifunction to the VM config was likely not a fix for my issue. It worked once, after I had disabled autostart on the VM to edit the config. After turning autostart back on, the VM could no longer initialize the video card properly and "code 43" returned, even though the XML hadn't changed and multifunction was still in there. I still had a custom VBIOS ROM specified, which may no longer be needed; I've removed it now and will test autostart as I need to.

Edited by convergence
initial fix wasn't as good as I thought
Link to comment
  • 1 month later...
On 2/15/2023 at 12:19 AM, convergence said:


Thanks for linking this! I just ran into the same issue with an existing VM and this fixed it beautifully. My passthrough had worked for a couple of years without multifunction, although I've had occasional driver issues before. The VM started running into "code 43" after I upgraded Unraid from 6.9.2 to 6.11.5. I guess some part of KVM or VFIO has become more strict about the config now...

 

(I kept backups of earlier XML configs for this VM and I was able to verify that I never needed multifunction on this video card passthrough in the past)

[EDIT:] Adding multifunction to the VM config was likely not a fix for my issue. It worked once, after I had disabled autostart on the VM to edit the config. After turning autostart back on, the VM could no longer initialize the video card properly and "code 43" returned, even though the XML hadn't changed and multifunction was still in there. I still had a custom VBIOS ROM specified, which may no longer be needed; I've removed it now and will test autostart as I need to.

 

I have turned autostart back on today, but haven't properly tested it yet. The changes I've made were removing the custom VBIOS ROM and installing the AMD Vendor Reset plugin. My VM can now restart without rebooting the host machine and I'm optimistic that it will be able to autostart again.
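
For reference, the VBIOS line I removed was a <rom> element inside the GPU's <hostdev> entry, like this (the file path is just an example, not my actual one):

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/user/isos/vbios/my-gpu.rom'/> <!-- this is the line I deleted -->
    </hostdev>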

Link to comment
