[SOLVED] Windows 10 GPU Passthrough driver crash


coopooc


Well, I think I've exhausted every option and I need some help.


Problem:

The second I (or Windows Update) attempt to install the Nvidia drivers, my Windows 10 VM crashes, attempts to boot a few times, and enters Automatic Recovery.

 

I've seen a hundred of these threads, believe me, I've read them all. Here's what I have:

Unraid 6.7.0

MSI x470 11/06/2018 Bios

EVGA Geforce 1080 FTW Latest vbios

Ryzen 2700x

Other stuff. 

Booting UEFI (no passthrough works without UEFI booting).

 

This has been a long battle so far. I've made slow progress. 

 

Here's what I've tried:

Updated the mobo bios to the reported good version from some other thread here. Apparently the new ones are broken.

I updated the GPU's vbios to the latest version. I couldn't get UEFI to boot with the original vbios.

 

Every conceivable combination of OVMF, SeaBios, Q35, Hyper-Threading on/off. 

I've ended up with Q35-3.1, SeaBIOS and HyperThreading on (but off works the same). This is the ONLY combination that will ever output anything to HDMI.

 

I've moved the GPU to different slots, added a second GPU, dumped the vbios and downloaded and modified the vbios from techpowerup. 

I ended up back with just one GPU in the primary slot and am booting with the modified vbios I downloaded rather than my own dump. This is the only one that works. 
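
For anyone comparing ROM files, here is a minimal sketch of a sanity check. A PCI expansion ROM image starts with the two bytes 55 AA. As I understand it, the usual "modification" of a techpowerup download is trimming an extra header so the file starts at that signature, but treat that as an assumption rather than something confirmed in this thread. The function name rom_starts_with_signature is made up for this example.

```shell
# Hypothetical helper: succeeds only if the file begins with the
# PCI expansion ROM signature (bytes 55 AA).
rom_starts_with_signature() {
    rom_file="$1"
    # Read the first two bytes and render them as lowercase hex, e.g. "55aa"
    sig=$(head -c 2 "$rom_file" | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}
```

Usage would be something like: rom_starts_with_signature /mnt/user/isos/EVGA.GTX1080.rom && echo "starts with 55 AA". A passing check does not prove the ROM matches the card's firmware version; it only rules out a leftover header.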

 

 

I had previously tried the new pcie-ev stubbing (is that the right term?) option found in this thread. This is not the "old way" but rather the new way. I've since turned that off. 

 

 

Past steps that might be useful to know.

I passed through an old radeon card just fine...on the first try. Cool.

 

With VNC as the primary display and the GTX as a second video card, I found a config that got me as far as a Code 43 error in Device Manager. Progress! No HDMI output, though, so I gave up on that VM. The UEFI change got me way closer.

 

Oh, also, I've tried the ACS overrides and unsafe interrupts. I have both of those disabled again. As above, the UEFI boot seemed to get me closest and I never had an issue with shared groups. 
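
For reference, a small sketch of how the IOMMU grouping can be checked from the Unraid console. It walks the standard sysfs layout under /sys/kernel/iommu_groups; the function name list_iommu_groups and its optional root argument are made up for this example (the argument only exists so it can be pointed at a test directory).

```shell
# List every PCI device together with its IOMMU group number.
# sysfs_root defaults to /sys; pass another path only for testing.
list_iommu_groups() {
    sysfs_root="${1:-/sys}"
    for dev in "$sysfs_root"/kernel/iommu_groups/*/devices/*; do
        # The glob stays literal when no groups exist (IOMMU likely disabled)
        [ -e "$dev" ] || continue
        group=${dev%/devices/*}   # strip the trailing "/devices/<address>"
        group=${group##*/}        # keep only the group number
        echo "group $group: ${dev##*/}"
    done
}
```

With the addresses from my XML, the lines to look for are the ones ending in 0000:1d:00.0 and 0000:1d:00.1 (the GPU and its audio function). If nothing else shares their group number, the ACS override should not be needed.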

 

With that, I feel confident that I've ruled out a few things. I don't think it's actually related to Unraid grabbing the GPU. I don't think it's the slot. I don't think it's the MB bios. I'm down to a few potential culprits and I'd love to get some of your expert help narrowing them down.

 

So, back to the closest I got. No stubbing, GPU primary slot, no secondary GPU, loading techpowerup modded rom, no VNC, Q35, SeaBios, HyperV back on (didn't seem to matter one way or the other). Installed Windows completely through HDMI on the GPU. Awesome! System stable and functional. Display driver is the Windows Basic Display. Load up the virtio drivers for the ethernet card and a few minutes later, Windows Update grabs a new driver, the screen blinks a few times and poof, reboot and then down the automatic recovery path to nothing. 

 

For a while, I thought it might be related to the MSI demonic sound issue, so I made that registry edit before installing the Nvidia drivers. Same result.

 

I'm stuck. 

 

Here's the latest XML for the VM that last died:

 

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>Callisto X5</name>
  <uuid>9d37188a-c81a-0b93-cdcf-c55bff9efa1e</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>14</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <vcpupin vcpu='2' cpuset='4'/>
    <vcpupin vcpu='3' cpuset='5'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='7'/>
    <vcpupin vcpu='6' cpuset='8'/>
    <vcpupin vcpu='7' cpuset='9'/>
    <vcpupin vcpu='8' cpuset='10'/>
    <vcpupin vcpu='9' cpuset='11'/>
    <vcpupin vcpu='10' cpuset='12'/>
    <vcpupin vcpu='11' cpuset='13'/>
    <vcpupin vcpu='12' cpuset='14'/>
    <vcpupin vcpu='13' cpuset='15'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-3.1'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='14' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Callisto X5/vdisk1.img'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/Win10_1903_V1_English_x64.iso'/>
      <target dev='hda' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.171.iso'/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:8a:aa:8a'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x1d' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/user/isos/EVGA.GTX1080.rom'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x1d' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc52b'/>
      </source>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>

UPDATE:

This was FINALLY solved by one (or both) of the following:

 

1. I went back to an earlier VBIOS version from techpowerup. I flashed the vbios of the card and then selected that same version in the VM. 

2. It still didn't work on that initial boot but it might have been due to all of my previous fiddling. These commands freed up the GPU and from there, everything finally worked:

 

echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind
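
The three commands above detach the host's virtual consoles and unbind the EFI framebuffer so the host stops holding the card. Here is a slightly more defensive sketch of the same steps, which skips anything that isn't present instead of erroring out. The function name release_host_framebuffer and its optional root argument are made up for this example; on a real host it would be run as root with no argument, before starting the VM.

```shell
# Sketch: release the host console and EFI framebuffer so the GPU is free.
# sysfs_root defaults to /sys; pass another path only for a dry run.
release_host_framebuffer() {
    sysfs_root="${1:-/sys}"
    # Detach every virtual console that exists (vtcon0, vtcon1, ...)
    for bind in "$sysfs_root"/class/vtconsole/vtcon*/bind; do
        if [ -e "$bind" ]; then
            echo 0 > "$bind"
        fi
    done
    # Unbind the EFI framebuffer only if it is currently bound
    efifb="$sysfs_root/bus/platform/drivers/efi-framebuffer"
    if [ -e "$efifb/efi-framebuffer.0" ]; then
        echo efi-framebuffer.0 > "$efifb/unbind"
    fi
}
```

Running it a second time is harmless, because once the framebuffer is unbound the efi-framebuffer.0 entry is gone and that step is skipped.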

 

 

Thanks to the forum for all the troubleshooting help! 

 

For reference, here's the particular combination that ultimately got my system to work:

Unraid 6.7.0

MSI x470 11/06/2018 Bios

EVGA Geforce 1080 FTW 2016/11/03

Ryzen 2700x

Virtio 0.1.171

SeaBios

i440fx-3.1

HyperV-No

Graphics ROM BIOS 2016/11/03

Graphics card primary, no VNC. Setup all done via connected monitor, no VNC.

Sound card NVIDIA passthrough. 

Unraid booting using UEFI, otherwise no video output.

 

This was my 17th configuration combination. Lucky 17. Hope it helps someone else. 

 

jupiter-diagnostics-20190924-2155.zip

Edited by coopooc
Issue Solved
Link to comment

Then yes. I did that. I also did a dump of my own rom too. However, I'm less confident about that working because after I did that dump, I upgraded the firmware on the card so it might be worthwhile to repeat that at some point. My instinct tells me that isn't the issue but what do I know? 

 

Speaking of vbios, apparently my card has a switch on it that allows you to switch between two different ones. I've tried it with the switch both ways and it didn't matter but I wish I understood what that was doing.

 

My latest plan is to try a new combination that I have not yet attempted, based on another post. Someone said they tried SeaBios with I440fx and it worked. I haven't tried that particular one yet so I figure it's worth a shot.

Link to comment

Unfortunately, that combination didn't work. The second the drivers loaded I crashed. 

 

I wish I understood what would cause a conflict like this. Feels like I should go back to the vbios. Seems like a driver might freak out if it encountered the wrong vbios. Open to other theories though. 

Link to comment

The problem you're describing sounds verbatim like the issue I had when my vbios was no good. Windows would install and boot up, but right when it attempted to install the proper video drivers it would black out, then restart and fail to boot again.

 

So I have a strong feeling it's your vbios. I've seen the Space Invader solution but have never tried it myself. Having failed to back up and use my own BIOS file, I ended up using someone else's, which has worked fine ever since.

 

And really, it's only necessary when you're trying to grab your system's primary video card, such as in a headless config like you're trying to do. This is also my configuration.

Link to comment

Thanks. It's really helpful to hear someone else with the exact same issue and puts me on the right troubleshooting path. I'll have to retry the dump again. I do have another video card that I can add back in so it doesn't have to be headless but I want my good passthrough GPU in the x16 slot so I think it'll always be the primary. 

 

Wonder if there are other sources of vbios I could try other than techpowerup. 

Link to comment
