brando56894 Posted August 19, 2017

My VMs crash when they're under load because my CPU sucks; I'm dealing with it until my new motherboard comes in a few weeks. Whenever my Windows 10 VM crashes, it locks up the Nvidia GTX 1070 that I have passed through to it, and the VM won't boot back up, citing this error:

root@unRAID:~# virsh start Windows\ 10
error: Failed to start domain Windows 10
error: internal error: process exited while connecting to monitor: 2017-08-19T09:14:20.728766Z qemu-system-x86_64: -chardev pty,id=charserial0: char device redirected to /dev/pts/1 (label charserial0)
2017-08-19T09:14:20.810864Z qemu-system-x86_64: -device vfio-pci,host=04:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio error: 0000:04:00.0: failed to open /dev/vfio/25: Device or resource busy

If I choose VNC as my video output it starts fine. A reboot of unRAID also fixes the issue, but I would rather not have to reboot the whole server when only the VM has crashed and everything else is still working.

I found this string of commands relating to the same thing over on the Red Hat forums, but the last one won't work for me and just fails with "-bash: echo: write error: No such device":

That was exactly what was going wrong. efifb had attached to some of the nvidia device's memory. Since efifb can't be compiled as a module, and I'd rather not turn it off, here's what I did:

echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind

This completely solves the problem and all is well doing passthrough on my Skylake system. Hopefully now that there's a solution with the right magic words in it on the Internet, others will find their answer here. Thanks again!
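For anyone who wants to script the quoted fix: a minimal sketch of the same three writes, wrapped so it only touches consoles that are actually bound (writing to an unbound one is what produces the "No such device" error). It assumes the standard sysfs paths and must run as root on the host; the optional argument only exists so the logic can be exercised against a fake tree.

```shell
#!/bin/bash
# Release the EFI framebuffer so vfio can claim the GPU again.
# Sketch only -- assumes the stock sysfs layout; run as root.
release_efifb() {
    local sys="${1:-/sys}"   # overridable root, for dry-run testing
    local vtcon
    # Unbind every virtual console that is currently bound, instead of
    # hard-coding vtcon0/vtcon1; unbound ones reject the write.
    for vtcon in "$sys"/class/vtconsole/vtcon*; do
        [ -e "$vtcon/bind" ] || continue
        if [ "$(cat "$vtcon/bind")" = "1" ]; then
            echo 0 > "$vtcon/bind"
        fi
    done
    # Detach efifb from the GPU's memory region, if it is attached at all.
    local fb="$sys/bus/platform/drivers/efi-framebuffer"
    if [ -e "$fb/efi-framebuffer.0" ]; then
        echo efi-framebuffer.0 > "$fb/unbind"
    fi
}
```

On the unRAID host you would just call `release_efifb` (no argument) before starting the VM.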
Ziggurat Posted August 19, 2017

Start by giving us all your specs. What are the settings in your VM? And are you passing through your only GPU, or is unRAID using another GPU?

If my VM crashes I can launch it again just fine, but I have dumped the GPU's VBIOS and pass it to the VM using a directive in the XML file.
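For reference, the VBIOS trick Ziggurat describes is a `<rom>` element inside the GPU's `<hostdev>` block. A sketch only: the PCI addresses here match brando56894's config below, and the file path is a placeholder for wherever you saved your dump.

```xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
  </source>
  <rom file='/mnt/user/vbios/gtx1070.rom'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</hostdev>
```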
brando56894 Posted August 19, 2017 (Author)

29 minutes ago, Ziggurat said: Start by giving us all your specs.

Server: SuperMicro X10SDV-F-0 w/Xeon-D 1540 (16x 2 GHz), 1.2 kW EVGA PSU, 2x 32 GB DDR4 ECC RAM
Pool: 5x HGST 4 TB HDDs
Cache: 1x 512 GB Samsung 840 Pro SATA SSD

29 minutes ago, Ziggurat said: What are the settings in your VM?

<domain type='kvm' id='2'>
  <name>Windows 10</name>
  <uuid>f4914b40-ce13-7c85-09cf-1bbe740f2d41</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.9'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/f4914b40-ce13-7c85-09cf-1bbe740f2d41_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='2' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/cache/domains/Windows 10/vdisk1.img'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='nec-xhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:20:41:f6'/>
      <source bridge='br0'/>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-2-Windows 10/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc22a'/>
        <address bus='3' device='6'/>
      </source>
      <alias name='hostdev2'/>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc22b'/>
        <address bus='3' device='4'/>
      </source>
      <alias name='hostdev3'/>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc52b'/>
        <address bus='3' device='8'/>
      </source>
      <alias name='hostdev4'/>
      <address type='usb' bus='0' port='3'/>
    </hostdev>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='none' model='none'/>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>

29 minutes ago, Ziggurat said: And are you passing through your only GPU?

The BMC has a built-in ASPEED 2400 GPU, which isn't used for anything other than IPMI/console access, so I have the GTX passed through to the Windows VM, which is my HTPC.
CrimsonTyphoon Posted September 4, 2017

@brando56894, did you ever find an answer? I am having a very similar problem. Anytime my LibreELEC VM crashes, if I start it again the whole system crashes and I have to manually restart (thank god for IPMI). For an unknown reason, my card isn't being re-released to unRAID. I am passing through a 9500GT. Might just buy a 1050 Ti to see if that solves the issue... :-/
brando56894 Posted September 4, 2017 (Author)

Nope, buying a new card won't help; this is a software issue with either Linux or qemu/libvirt.
CrimsonTyphoon Posted September 4, 2017

6 hours ago, brando56894 said: Nope, buying a new card won't help, this is a software issue with either Linux or qemu/libvirt.

Damn, we have to figure out this issue! Here are my specs and XML:

Supermicro 4U Server CSE-846A-R1200B Chassis
X9DRI-F Motherboard
2x E5-2670 2.6 GHz 8-core CPUs (8.0 GT/s, 20 MB Smart Cache)
8x 8 GB PC3-10600R Server Memory
24x 3.5" Trays
SAS2-846EL1 Backplane
LSI 9207-8i
2x 1200 W PSU
All VMs on SSD Cache (500 GB)

<domain type='kvm' id='3'>
  <name>LibreELEC</name>
  <uuid>b0d00937-53ca-72ef-0e66-cb938ec10e09</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Linux" icon="libreelec.png" os="linux"/>
  </metadata>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='13'/>
    <vcpupin vcpu='1' cpuset='14'/>
    <vcpupin vcpu='2' cpuset='29'/>
    <vcpupin vcpu='3' cpuset='30'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-2.7'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/b0d00937-53ca-72ef-0e66-cb938ec10e09_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='2' threads='2'/>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='writeback'/>
      <source file='/mnt/user/domains/LibreELEC/vdisk2.img'/>
      <backingStore/>
      <target dev='hdc' bus='sata'/>
      <boot order='1'/>
      <alias name='sata0-0-2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <model name='i82801b11-bridge'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='2'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:85:00:be'/>
      <source bridge='br0'/>
      <target dev='vnet2'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/2'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-3-LibreELEC/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x82' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x03' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x82' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x04' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x0c45'/>
        <product id='0x5101'/>
        <address bus='2' device='4'/>
      </source>
      <alias name='hostdev2'/>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='none' model='none'/>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>
brando56894 Posted September 7, 2017 (Author)

After doing a little more research, it may be as simple as killing the qemu process that is hanging onto the device. It crashed for me last night, but I hadn't seen this yet so I haven't had a chance to test it. My hung device is /dev/vfio/25, and IDK why I didn't think of this before, but lsof will show the process that is using the device, which in this case is qemu:

root@unRAID:~# lsof /dev/vfio/25
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
qemu-syst 5388 root  24u   CHR  251,0      0t0 97425 /dev/vfio/25

So if that process still exists after the VM crashes and is shut down, a simple kill -9 5388 should release the device and allow the VM to be restarted, since theoretically nothing will be using that device node. Give it a try the next time you experience a crash and let me know what happens.

I posted a thread about this on Reddit since we're not getting any help here. I also found a similar thread there relating to this, but not Windows VM specific: https://www.reddit.com/r/VFIO/comments/44f1oc/primary_gpu_hotplug/ (now that I see there is a VFIO subreddit I'm gonna cross-post it for more visibility)
CrimsonTyphoon Posted September 9, 2017

Haven't had any crashes in a while, but I will try this next time. Thanks @brando56894
brando56894 Posted September 9, 2017 (Author)

No problem buddy, hope it helps. No responses on either of the Reddit threads, so this may be it.
CrimsonTyphoon Posted September 16, 2017

Didn't work for me :-(

Summary: LibreELEC VM, 9500GT passthrough, see sig for rig specs.

With the VM off, there is nothing in /dev/vfio. With the VM on, there is. Here is the error message in the console when I try to turn the VM back on (from the unRAID console):

Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
Shutting down cpus with NMI
Kernel Offset: disabled
Rebooting in 30 seconds..

It does not actually reboot. I updated to the latest beta (rc8q) hoping it would help, but it did not. I also added the ROM file for the card's VBIOS, but still nothing.
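One sanity check worth doing here: each /dev/vfio/N corresponds to the device's IOMMU group, which sysfs exposes as a symlink, so you can confirm which node your card maps to before hunting for a stale holder. A sketch; the 0000:82:00.0 address is the 9500GT from the XML above, and the optional second argument exists only so the lookup can be tested against a fake tree.

```shell
#!/bin/bash
# Print the IOMMU group number for a PCI device -- the N in /dev/vfio/N.
iommu_group_of() {
    local dev="$1" sys="${2:-/sys}"
    # iommu_group is a symlink to /sys/kernel/iommu_groups/<N>
    basename "$(readlink "$sys/bus/pci/devices/$dev/iommu_group")"
}
# On the host: iommu_group_of 0000:82:00.0
# then see who holds it:  lsof /dev/vfio/$(iommu_group_of 0000:82:00.0)
```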
brando56894 Posted September 16, 2017 (Author)

Ah, that sucks :-/ You may be on your own with this one for now, because I just upgraded my CPU and motherboard so I doubt this will happen again. But it's only been on for less than 48 hours (Windows has only been up for about 8 hours), so who knows.