Nvidia driver installation crashes VM



Hi all.

 

First time poster here. I set up my first Unraid server about a month ago with version 6.8.0. I set up my first Windows 10 VM and passed through an EVGA GTX 1060 3GB.

 

Everything worked fine, but then I updated to 6.8.1 (still worked) and recently to 6.8.2. When I tried to restart my VM it wouldn't start (it would still show the green started arrow). So I plugged a monitor into the graphics card (I was using RDP before) and saw that it was in a constant reboot loop.

 

Video Here: https://streamable.com/ezzry

 

I am able to boot into safe mode and revert the driver for the GTX 1060 to the basic display adapter driver, after which it boots normally. As soon as I install the Nvidia drivers, or let Windows auto-install an appropriate driver, the VM crashes and goes into a reboot loop.

 

 

Here's what I've tried:

1. Reverting to 6.8.1 = didn't work

 

2. Dumping the vbios of the gpu (--youtube.com/watch?v=mM7ntkiUoPk)

- I also tried this:

- echo 0 > /sys/class/vtconsole/vtcon0/bind
- echo 0 > /sys/class/vtconsole/vtcon1/bind
- echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind

But Unraid uses the onboard GPU, so it had no effect since the GTX isn't bound to it.

 

3. Tried a bunch of different combinations of Seabios, OVMF, i440fx and Q35 = nothing worked same problem

 

4. Fresh install of Windows 1809 and 1909 (from my searching I saw there was some bug with RDP and newer versions of Windows) = didn't work regardless of which version of Windows I tried. It had the same driver install problem.

 

5. Tried setting VNC as the main graphics and the gtx as the 2nd gpu but it would load then give a BSOD error (video tdr failure)
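
For reference on item 2: a dumped vbios is usually attached to the passthrough GPU with a rom element inside its hostdev block. A minimal sketch, assuming the dump was saved to a placeholder path (/mnt/user/isos/vbios/gtx1060.rom is hypothetical, not a file from this setup); the PCI addresses are the ones from the XML in this post.

```xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
  </source>
  <!-- hypothetical path: point this at wherever the dumped vbios was saved -->
  <rom file='/mnt/user/isos/vbios/gtx1060.rom'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</hostdev>
```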

 

I've been battling with this for about 2 days now and I'm fresh out of ideas. I've been through a lot of different forum posts hoping to find a solution but nothing seems to work. 

 

Any help would be appreciated. If there is any more information you need on my end, please let me know.
Thanks in Advance.

 

 

Image of my template: https://ibb.co/BnJ4ddQ

Here is the xml

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>Windows 10</name>
  <uuid>647f1c37-5e4a-ca99-f154-ebe43a93f27a</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='16'/>
    <vcpupin vcpu='2' cpuset='1'/>
    <vcpupin vcpu='3' cpuset='17'/>
    <vcpupin vcpu='4' cpuset='2'/>
    <vcpupin vcpu='5' cpuset='18'/>
    <vcpupin vcpu='6' cpuset='3'/>
    <vcpupin vcpu='7' cpuset='19'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-4.1'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/647f1c37-5e4a-ca99-f154-ebe43a93f27a_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='4' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/disk1/domains/Win10 Plex Streaming VM/vdisk1.img'/>
      <target dev='hdc' bus='sata'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:59:7d:83'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc018'/>
      </source>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x413c'/>
        <product id='0x2003'/>
      </source>
      <address type='usb' bus='0' port='3'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>

 

 

macserver-diagnostics-20200206-2339.zip

Edited by macavelly
Added Diagnostics

So just to add a few details.

 

I removed the GPU, connected to the VM over VNC, and then removed all the Nvidia drivers.

 

I then shut down the VM and added the GPU back. With a monitor plugged into the GPU, it booted up and I tried to install the Nvidia drivers manually. Once the driver is installed (where normally the screen would blink and the resolution would be corrected because the appropriate driver is loaded), the VM instead crashes and reboots itself, as shown in the video: https://streamable.com/ezzry

 

I then tried using VNC as the first graphics card and the 1060 as the second. It boots up and, as seen in the attached photos, I can see the 1060 in Device Manager and Task Manager and it seems to be working fine. However, after a minute or so it crashes with the BSOD shown in the attached picture.

 

Also, when it crashes the CPUs are maxed out for some reason.

 

I'm really lost as to what else to try.

 

Thanks.

1.jpg

2.jpg

3.jpg

4.jpg


Bump.

 

Another thing I've tried is removing the 1060 and placing it in another PCIe slot on the motherboard. I am able to boot the VM with a monitor plugged into the GPU and everything works fine with the Microsoft display driver, but as soon as I try to install the Nvidia drivers the VM crashes and the CPUs assigned to it are pinned at 100%.

 

I did also pass in both the GPU and the GPU Audio part and did the multifunction xml edit as shown in the spaceinvader video.
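
For anyone following along, the multifunction edit looks roughly like this. It is a sketch based on the two GPU hostdev entries in the XML above, not a copy of the current template: the video function gets multifunction='on', and the audio function is moved to the same guest slot as function 0x1.

```xml
<!-- GPU video function: guest slot 0x05, function 0x0, multifunction enabled -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
</hostdev>
<!-- GPU audio function: same guest slot 0x05, function 0x1 -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
</hostdev>
```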

 

I've done numerous clean windows installs too but all crash when I try to install the nvidia drivers.

 

 

I really don't know what else to try.

 

😪

 

Edited by macavelly
  • 1 year later...
  • 2 weeks later...

I have the same problem.

 

Just upgraded from Unraid 6.8.3 to 6.9.1

 

Any VM with nvidia GPU passthrough crashes when the nvidia driver loads.

 

If I revert to 6.8.3 it works again, so I know it's not a faulty GPU.

 

It happens on both Ubuntu and Windows 10 VMs: they are fine until the Nvidia driver tries to activate.

 

I have attached diagnostics in case they help, as well as the XML for the two new VMs I created from scratch to test with.

 

Cheers

 

Rod.

tower-diagnostics-20210405-1732.zip Windows 10.xml Ubuntu.xml

