VM Hangs on Launch when GPU Passthrough is Used



Goal
I just rebuilt my father's computer, and he's not very computer savvy. The intent was to put him on a hypervisor so that I can provision hardware for him as he needs it, without him worrying about the physical layer. He games on occasion and GPU mines, so passthrough is important (which is why we went with UnRAID instead of Hyper-V).

 

Problem

Whenever I try to start the VM, it shows the TianoCore logo. Then one of two things happens:

1) If the VM was improperly shut down previously, it'll take a few minutes, but the Windows spinning circles will appear and Windows Recovery pops up. Rebooting after recovery brings you to #2.

2) The screen hangs at the TianoCore logo (before the spinning circles show up), OR the spinning circles show up, spin once, then hang halfway through the spin. The VM does not recover from this and freezes.

 

I've been able to boot with VNC + 1 GPU a total of two times. In both cases I rebooted, and despite making no provisioning changes, the VM failed to boot again after the reboot. I'm not sure why or how it booted those times, as I was not able to reproduce that success on demand.

 

Host Specs

UnRAID Ver 6.5.1
Asus ROG Maximus X (has onboard HDMI + DP, used only for UnRAID)

Intel i7-8700K

8GB Kingston memory (yes, I know it's kinda low; I intend to replace it with better/more in the near future)

EVGA 1080ti FTW3

EVGA 1080ti FTW3
 

VM Specs
All 6 CPU Cores, All 12 Threads

4GB RAM

GPU : 1080ti (01:00.0) - HDMI

GPU : 1080ti (02:00.0)

Audio : 1080ti (01:00.1) - HDMI
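
For reference, here's roughly how the CPU and RAM figures land in the VM XML. This is just a sketch reconstructed from the QEMU command line in the log below, not the full config:

<memory unit='KiB'>4194304</memory>  <!-- 4GB RAM -->
<vcpu placement='static'>12</vcpu>   <!-- all 6 cores / 12 threads -->
<cpu mode='host-passthrough'>        <!-- "-cpu host" in the log -->
  <topology sockets='1' cores='6' threads='2'/>
</cpu>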

 

Troubleshooting done so far

1) Removing both passthrough GPUs and only using VNC allows me to boot normally without fail. Adding one or both cards causes it to fail. I've tried many combinations: VNC + GPU 1, VNC + GPU 2, VNC + GPU 1+2, GPU 1, GPU 2, GPU 1+2, etc.

2) Tried with and without the GPUs' HDMI audio to see if that affected it; it did not.

3) I had an issue previously with a different rig that had similar symptoms (here). I applied the same VBIOS fix, but it did not resolve the issue. I should also note that the other rig had no onboard video, just 4x dedicated GPUs, while in this case there is onboard video plus 2x dedicated GPUs. Based on my research, this issue shouldn't happen on this rig because there is onboard VGA. (See the XML sketch after this list for what the VBIOS fix looks like.)

4) Increased/decreased resources such as RAM and CPU to see if there was a difference. No difference.
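
To clarify point 3, the VBIOS fix just means pointing the passed-through card's hostdev entry at a dumped ROM file. A rough sketch of what that looks like in my VM XML (the source address and rom path are taken from the log below):

<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <!-- the primary 1080ti's video function -->
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <!-- the dumped VBIOS; without this line the card is passed through with no romfile -->
  <rom file='/mnt/user/ISO Repository/Graphics ROM BIOS/EVGA 1080ti FTW3 GPU BIOS.dump'/>
</hostdev>

The HDMI audio function (01:00.1) and the second card (02:00.0) are separate hostdev entries without a rom element.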

 

VM Logs

2018-05-14 23:19:51.682+0000: starting up libvirt version: 4.0.0, qemu version: 2.11.1, hostname: CB-URHost01
LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none /usr/local/sbin/qemu -name guest=CB-01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-CB-01/master-key.aes -machine pc-i440fx-2.11,accel=kvm,usb=off,dump-guest-core=off,mem-merge=off -cpu host -drive file=/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/etc/libvirt/qemu/nvram/c1fc581a-40df-a97d-2753-43497501e9f6_VARS-pure-efi.fd,if=pflash,format=raw,unit=1 -m 4096 -realtime mlock=off -smp 12,sockets=1,cores=6,threads=2 -uuid c1fc581a-40df-a97d-2753-43497501e9f6 -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-CB-01/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device nec-usb-xhci,p2=15,p3=15,id=usb,bus=pci.0,-3-CB-01/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device 'vfio-pci,host=01:00.0,id=hostdev0,bus=pci.0,addr=0x6,romfile=/mnt/user/ISO Repository/Graphics ROM BIOS/EVGA 1080ti FTW3 GPU BIOS.dump' -device vfio-pci,host=02:00.0,id=hostdev1,bus=pci.0,addr=0x8 -device vfio-pci,host=01:00.1,id=hostdev2,bus=pci.0,addr=0x9 -device usb-host,hostbus=1,hostaddr=5,id=hostdev3,bus=usb.0,port=1 -device usb-host,hostbus=1,hostaddr=2,id=hostdev4,bus=usb.0,port=2 -device usb-host,hostbus=1,hostaddr=7,id=hostdev5,bus=usb.0,port=3 -device usb-host,hostbus=1,hostaddr=8,id=hostdev6,bus=usb.0,port=4 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa -msg timestamp=on
2018-05-14 23:19:51.682+0000: Domain id=3 is tainted: high-privileges
2018-05-14 23:19:51.682+0000: Domain id=3 is tainted: host-cpu
2018-05-14T23:19:51.723700Z qemu-system-x86_64: -chardev pty,id=charserial0: char device redirected to /dev/pts/0 (label charserial0)

 

Edited by RichardBoelens

$300/hr is a bit much for a home user, unfortunately.

Hopefully someone will be able to point me in the right direction.

Other than the motherboard model, I've done the exact same configuration with my other build and did not have any issues, so I suspect that there is something else at play.

Edited by RichardBoelens

Assuming it's Windows, you shouldn't boot with VNC plus a GPU.

 

Also assuming it's Windows, what guide did you follow to set up the VM?

 

Best practice is to not assign all CPUs to the VM. Leave CPU 0 for unRaid to manage its business and then send the remaining 11 to the VM.
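
In the VM XML that ends up looking something like this (just a sketch for a 6-core/12-thread chip, leaving CPU 0 unpinned for unRaid):

<vcpu placement='static'>11</vcpu>
<cputune>
  <!-- CPU 0 stays free for unRaid; the VM gets CPUs 1-11 -->
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='2'/>
  <vcpupin vcpu='2' cpuset='3'/>
  <vcpupin vcpu='3' cpuset='4'/>
  <vcpupin vcpu='4' cpuset='5'/>
  <vcpupin vcpu='5' cpuset='6'/>
  <vcpupin vcpu='6' cpuset='7'/>
  <vcpupin vcpu='7' cpuset='8'/>
  <vcpupin vcpu='8' cpuset='9'/>
  <vcpupin vcpu='9' cpuset='10'/>
  <vcpupin vcpu='10' cpuset='11'/>
</cputune>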

 

Make sure your onboard video is set in the BIOS to be the primary display.

 

 

  • 3 weeks later...

Having this same issue with a VM not working with GPU passthrough.

 

My Windows 10 VM works perfectly until I try to pass through the single GPU I want to use. I notice that when starting the VM with a GPU assigned for passthrough and a vbios path listed, the first of the VM's assigned CPU cores shoots up to 100% and freezes there, while the rest of the VM's cores stay at 0%. If I try to pass through the GPU without the vbios path listed, I can get into the VM with Remote Desktop Connection, but not Splashtop, which just goes to a black screen after I log in. Even when seeing the VM through RDC, if I try to launch a game or anything that uses the GPU, it fails to launch; the screen just flicks to black for a second and then comes back to the desktop. Device Manager shows no issues with the GTX 1080 or its driver.

 

This happens regardless of how many cores are assigned. I have left core 0 alone in all tests and instances for UnRaid to use. The motherboard is set to use the onboard VGA first. I have tried changing physical slots and removing all other PCI equipment. I've run memtest for 24 hours (no errors). I've used vbios files for my cards sourced online and modified to work per the guide linked below and its related videos, as well as dumps made locally with GPU-Z. I renamed the files to be recognized by the GUI in the VM edit page and also assigned them in the XML editor. I tried all combinations (minus VNC) and put in multiple cards to see if one would pass through; nope. I am seriously scratching my head here.

 

System as is: https://pcpartpicker.com/list/vW6h7W

 

I haven't rebuilt the VM or created a new one. The system and VM were originally created on a Supermicro board, but it didn't have the PCI slot for a full-sized graphics card, so I swapped to the Asus zp9a-d8 motherboard and then tried the passthrough. Could there be an issue with the VM having been created on the old board and trying GPU passthrough on the new motherboard? If so, why does VNC work with no problem?

 

This is past my pay grade, I think, so I'm hoping someone has an idea or maybe a theory... everything else works perfectly, but if it doesn't pass through the GPU, I've wasted money and hardware.

 

I've attached pictures of the system's hang-up and the VM log at the time, the XML and GUI setup of the VM and the vbios files, along with the last 2 hours of system logs during this.

 

 

 

 

largeserver-diagnostics-20180603-0208.zip

syslog.txt

vbios files.JPG

vm set up gui.JPG

vm xml.JPG

cpu hang up with vbios.JPG

Edited by jlruss9777
added the picture showing cpu hangup with vbios & vm log
  • 3 months later...

I'm having the exact same issue, with my log hanging at the same point and a core at 100% CPU utilization. Was this ever resolved? I've tried a new VM image and that didn't resolve it.

 

Update - I was able to resolve the issue by removing the GPU BIOS from the VM config and disabling Hyper-V. I guess it was a bad BIOS file (from TechPowerUp), so I am now going to try dumping my own BIOS using GPU-Z so I can run headless. Currently I get a black screen unless the HDMI cable is connected to the TV (from the GPU being passed through) at the point the VM is started.
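
For anyone else hitting this: removing the GPU BIOS just meant deleting the <rom file='...'/> line from the GPU's hostdev, and disabling Hyper-V corresponds roughly to switching off the Hyper-V enlightenments in the features block of the VM XML. A sketch (the exact elements the unRAID template writes may differ):

<features>
  <acpi/>
  <apic/>
  <hyperv>
    <relaxed state='off'/>
    <vapic state='off'/>
    <spinlocks state='off'/>
  </hyperv>
  <!-- removing the whole <hyperv> block accomplishes the same thing -->
</features>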

 

Edited by Rusty6285
