Endy, posted January 16, 2020 (edited)

I've recently switched hardware, so I've started creating a new Windows 10 VM from scratch to make sure I can get passthrough working. There seem to be two problems. If I start the VM with the USB controller passed through, the VM doesn't start and then the whole Unraid server locks up. If I start the VM with only the video card passed through (no USB controller), I get no video output. Hardware specs are in my sig.

This is the only video card in the system, and I am trying to pass through a motherboard USB controller. Both are in their own IOMMU groups:

IOMMU group 28:
[10de:1b81] 0d:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
[10de:10f0] 0d:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
IOMMU group 33:
[1022:149c] 10:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller

I am using the vfio-pci.cfg file method:
BIND=10:00.3 0d:00.0 0d:00.1

VM XML:

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>Windows 10 Test</name>
  <uuid>8fe401f7-9b18-2ef5-d565-8a916dc0a78c</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='12'/>
    <vcpupin vcpu='2' cpuset='5'/>
    <vcpupin vcpu='3' cpuset='13'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='14'/>
    <vcpupin vcpu='6' cpuset='7'/>
    <vcpupin vcpu='7' cpuset='15'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-4.1'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/8fe401f7-9b18-2ef5-d565-8a916dc0a78c_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='8' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Windows 10 Test/vdisk1.img'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/Data/ISO/Windows 10/Windows1909.iso'/>
      <target dev='hda' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/Data/ISO/virtio-win-0.1.171.iso'/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0xa'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0xc'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='8' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:a3:f7:80'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0d' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/user/domains/vbios/GTX1070.rom'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0d' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x10' slot='0x00' function='0x3'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>

Relevant syslinux section (I'm not sure whether video=efifb:off is necessary; it was just something I was trying):

label unRAID OS
  menu default
  kernel /bzimage
  append isolcpus=4-7,12-15 initrd=/bzroot video=efifb:off mitigations=off

This was in the system log. I'm not sure what's relevant besides the vfio-pci "not ready" messages.

Jan 15 15:21:03 Turtle kernel: br0: port 2(vnet0) entered blocking state
Jan 15 15:21:03 Turtle kernel: br0: port 2(vnet0) entered disabled state
Jan 15 15:21:03 Turtle kernel: device vnet0 entered promiscuous mode
Jan 15 15:21:03 Turtle kernel: br0: port 2(vnet0) entered blocking state
Jan 15 15:21:03 Turtle kernel: br0: port 2(vnet0) entered forwarding state
Jan 15 15:21:08 Turtle kernel: clocksource: timekeeping watchdog on CPU10: Marking clocksource 'tsc' as unstable because the skew is too large:
Jan 15 15:21:08 Turtle kernel: clocksource: 'hpet' wd_now: 9f7787c3 wd_last: 9e931f84 mask: ffffffff
Jan 15 15:21:08 Turtle kernel: clocksource: 'tsc' cs_now: a5d3c006266 cs_last: a5ccefb201b mask: ffffffffffffffff
Jan 15 15:21:08 Turtle kernel: tsc: Marking TSC unstable due to clocksource watchdog
Jan 15 15:21:08 Turtle kernel: TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
Jan 15 15:21:08 Turtle kernel: sched_clock: Marking unstable (2886087668830, -16976841)<-(2886210465096, -139776844)
Jan 15 15:21:08 Turtle kernel: clocksource: Switched to clocksource hpet
Jan 15 15:21:16 Turtle kernel: vfio-pci 0000:10:00.3: not ready 1023ms after FLR; waiting
Jan 15 15:21:18 Turtle kernel: vfio-pci 0000:10:00.3: not ready 2047ms after FLR; waiting
Jan 15 15:21:21 Turtle kernel: vfio-pci 0000:10:00.3: not ready 4095ms after FLR; waiting
Jan 15 15:21:26 Turtle kernel: vfio-pci 0000:10:00.3: not ready 8191ms after FLR; waiting
Jan 15 15:21:35 Turtle kernel: vfio-pci 0000:10:00.3: not ready 16383ms after FLR; waiting
Jan 15 15:21:53 Turtle kernel: vfio-pci 0000:10:00.3: not ready 32767ms after FLR; waiting

Anything I left out that I need to add? I'm not sure what to try next.

Edited January 20, 2020 by Endy
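The `not ready ...ms after FLR` lines above have a recognizable shape: each reported wait is one less than a power of two and roughly double the previous one, which reads like the kernel polling the USB controller with a doubling back-off after a Function Level Reset (FLR) while the controller never answers. A minimal Python sketch for pulling those waits out of a syslog excerpt; `flr_waits` and `doubles_each_time` are hypothetical helpers, not part of Unraid or the kernel:

```python
import re

# Matches kernel lines such as
#   "vfio-pci 0000:10:00.3: not ready 1023ms after FLR; waiting"
# and captures the PCI address and the reported wait in milliseconds.
FLR_RE = re.compile(r"vfio-pci (\S+): not ready (\d+)ms after FLR")


def flr_waits(log_text):
    """Return {pci_address: [wait_ms, ...]} for each 'not ready after FLR' line."""
    waits = {}
    for match in FLR_RE.finditer(log_text):
        address, ms = match.group(1), int(match.group(2))
        waits.setdefault(address, []).append(ms)
    return waits


def doubles_each_time(ms_list):
    """True if each (wait + 1) is exactly twice the previous (wait + 1)."""
    return all(b + 1 == 2 * (a + 1) for a, b in zip(ms_list, ms_list[1:]))


if __name__ == "__main__":
    excerpt = """\
Jan 15 15:21:16 Turtle kernel: vfio-pci 0000:10:00.3: not ready 1023ms after FLR; waiting
Jan 15 15:21:18 Turtle kernel: vfio-pci 0000:10:00.3: not ready 2047ms after FLR; waiting
Jan 15 15:21:21 Turtle kernel: vfio-pci 0000:10:00.3: not ready 4095ms after FLR; waiting
"""
    for address, ms_list in flr_waits(excerpt).items():
        print(address, ms_list, "doubling:", doubles_each_time(ms_list))
```

A device that keeps growing this list without ever coming back is a good sign the problem is the reset itself rather than the VM definition.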
testdasi, posted January 16, 2020

Regarding USB locking things up: have you checked (and double-checked) that your Unraid USB stick isn't connected to the controller you are passing through? Next, have you checked whether the USB controller can be reset?

Regarding the GPU: how did you obtain the vbios file? Are you able to RDP into the VM to see whether the Nvidia driver spits out error 43?
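A minimal Python sketch for the first check from the host side, assuming the flash drive appears as a block device under `/sys/block` (the name `sda` below is only an example). Resolving that sysfs link gives a path through every PCI hop, and for a USB disk the last PCI address on the path should be the USB host controller. `last_pci_address` and `controller_for_block_device` are hypothetical helpers:

```python
import os
import re

# A PCI address in sysfs looks like "0000:10:00.3" (domain:bus:device.function).
BDF_RE = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def last_pci_address(sysfs_path):
    """Return the last PCI address in a resolved sysfs path, or None.

    For a USB disk, the last PCI hop before the usbN node is the host controller.
    """
    matches = BDF_RE.findall(sysfs_path)
    return matches[-1] if matches else None


def controller_for_block_device(name):
    """Resolve /sys/block/<name> and report the PCI device it sits behind."""
    return last_pci_address(os.path.realpath(os.path.join("/sys/block", name)))


if __name__ == "__main__":
    # Compare the result with the addresses on the BIND= line in vfio-pci.cfg.
    print(controller_for_block_device("sda"))
```

If the printed address matches an entry on the `BIND=` line, the flash drive is on a controller that would be handed to the VM.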
Skitals, posted January 16, 2020 (edited)

You cannot pass through that USB controller on X570. That and the onboard audio will lock up Unraid without fail. You can pass through the other two USB controllers together, assuming your Unraid USB isn't plugged into one of them. See my screenshot: on my system there are 4 devices in group 24. Pass all three devices I have checked together.

If you use my VFIO-PCI Config plugin, you can see which USB controller your Unraid USB is connected to. In my case it's the Cruzer Fit on 14:00.3. Note that the ".3" here is a bit of a red flag. Even though it is in its own IOMMU group, it is linked to the other 14:00.x devices (10:00.x in your case), which include the problematic onboard audio (that should be 10:00.4 in your case). It is an AGESA bug at the very least. I've seen reports of getting onboard audio working with a kernel patch; I haven't investigated yet, but it might also fix the USB issue. Either way, the best-case scenario is to use your 10:00.3 for the Unraid USB and pass through the others.

Edited January 16, 2020 by Skitals
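The grouping the plugin displays can also be read from sysfs, where each entry under `/sys/kernel/iommu_groups/<n>/devices/` is named after a PCI address. A minimal Python sketch, assuming a Linux host with the IOMMU enabled; `iommu_groups` and `split_devices` are hypothetical helpers. The second one flags a bus:device whose functions land in different groups, which is the 10:00.x situation described above:

```python
import os
from collections import defaultdict


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: sorted PCI addresses}, or {} if the directory is absent."""
    groups = {}
    if not os.path.isdir(root):
        return groups  # IOMMU disabled, or not a Linux host
    for group in os.listdir(root):
        devices_dir = os.path.join(root, group, "devices")
        if os.path.isdir(devices_dir):
            groups[int(group)] = sorted(os.listdir(devices_dir))
    return groups


def split_devices(groups):
    """Return {bus:device: [groups]} for devices whose functions span several groups."""
    slot_groups = defaultdict(set)
    for group, addresses in groups.items():
        for address in addresses:
            slot = address.rsplit(".", 1)[0]  # "0000:10:00.3" -> "0000:10:00"
            slot_groups[slot].add(group)
    return {slot: sorted(gs) for slot, gs in slot_groups.items() if len(gs) > 1}


if __name__ == "__main__":
    groups = iommu_groups()
    for number in sorted(groups):
        print(f"IOMMU group {number}: {' '.join(groups[number])}")
    print("Functions split across groups:", split_devices(groups))
```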
Endy (author), posted January 16, 2020

Replying to testdasi:

I did my best to map out which USB ports go to which motherboard controller. USB devices connected to the controller I was trying to pass through no longer show up in System Devices, but the Unraid USB stick does. Yes, the USB controller can be reset. It looks like Skitals might have the answer for me on this one.

I dumped the vbios myself and removed the header as shown in SpaceInvaderOne's video. Since this is a fresh VM, I haven't gotten far enough to actually install Windows 10, so no RDP.

Replying to Skitals:

That makes a lot of sense. I will try that out and report back.
Skitals, posted January 16, 2020

Replying to Endy:

Also note that my 0d:00.1 does not have reset functionality. 0d:00.0 ("non-essential instrumentation") controls reset for that controller, which is why the two are passed through together.
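Reset support can also be checked from sysfs: the kernel only creates a `reset` attribute under `/sys/bus/pci/devices/<address>/` when it knows at least one reset method for that function, and newer kernels additionally list the methods in a `reset_method` attribute. Having a reset method does not prove the reset works (the FLR hang in the first post shows that). A minimal Python sketch with hypothetical helper names, using the addresses from the `BIND=` line in the first post:

```python
import os

PCI_DEVICES = "/sys/bus/pci/devices"


def has_reset(address, root=PCI_DEVICES):
    """True if sysfs exposes a 'reset' attribute for this PCI function.

    The attribute exists only when the kernel knows some reset method for the
    function; it says nothing about whether that reset actually succeeds.
    """
    return os.path.exists(os.path.join(root, address, "reset"))


def reset_methods(address, root=PCI_DEVICES):
    """Return the methods listed in 'reset_method', or None if unavailable.

    'reset_method' only exists on newer kernels, so None is a normal answer.
    """
    try:
        with open(os.path.join(root, address, "reset_method")) as f:
            return f.read().split()
    except OSError:
        return None


if __name__ == "__main__":
    for address in ("0000:10:00.3", "0000:0d:00.0", "0000:0d:00.1"):
        print(address, "reset:", has_reset(address), "methods:", reset_methods(address))
```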
Endy (author), posted January 16, 2020

Success! The VM now starts and Unraid doesn't lock up. I am still having the no-video problem. @testdasi I assume what I need to do is use VNC to set up the VM and then add the graphics card afterwards, so that I could RDP into it?

Replying to Skitals:

Mine is the same (except that it's 0a:00.0 and 0a:00.1). What about 03:08.0 [RESET] 1022:57a4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 57a4, since that is in the same IOMMU group? What I usually read is that everything in a group needs to be passed through? So far it seems to be working without passing that through.
testdasi, posted January 16, 2020 (edited)

Replying to Endy's question about using VNC first:

Start with the GPU passed through (note: remember to also include the HDMI audio device) to see if there's a display. The default generic driver from the Windows installer doesn't produce error code 43; error code 43 is an Nvidia driver thing.

If there's no display, THEN do the Windows install in VNC. However, in this case it's more likely due to a bad vbios (e.g. a file downloaded from TechPowerUp that doesn't match the device).

In other words, if you do your passthrough correctly on the host end (e.g. correct VM XML, correct vbios, stubbing etc.), you should be able to install Windows with the GPU passed through. If it doesn't work at this stage, focus on fixing the host config first.

If you lose display after installing Windows + the Nvidia driver (or can't install the Nvidia driver to begin with), then it's likely error code 43, and you deal with that then.

Edited January 16, 2020 by testdasi
Endy (author), posted January 16, 2020

OK, trying to start with just VNC instead of the graphics card, I get this:

internal error: qemu unexpectedly closed the monitor: 2020-01-16T17:19:14.841017Z qemu-system-x86_64: -device pcie-pci-bridge,id=pci.8,bus=pci.1,addr=0x0: Bus 'pci.1' not found
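In libvirt domain XML, the `bus` attribute of a guest `<address type='pci'>` names the `index` of the `<controller type='pci'>` that provides that bus, and libvirt names the matching QEMU bus `pci.<index>` (as the `id=pci.8` in the error shows). Read that way, the error says the `pcie-pci-bridge` (controller index 8) was placed on bus 1, but no controller with index 1 existed in the generated configuration; the thread does not show why it went missing. A minimal Python sketch using only `xml.etree.ElementTree`, with a hypothetical helper name, that reports such dangling bus references:

```python
import sys
import xml.etree.ElementTree as ET


def dangling_pci_buses(domain_xml):
    """Return (tag, bus) pairs whose guest PCI address names a missing controller.

    In libvirt domain XML, <address type='pci' bus='0xNN'/> refers to the
    <controller type='pci' index='NN'> that provides that bus.
    """
    root = ET.fromstring(domain_xml)
    indices = {
        int(c.get("index"))
        for c in root.iter("controller")
        if c.get("type") == "pci" and c.get("index") is not None
    }
    problems = []
    for parent in root.iter():
        for addr in parent.findall("address"):
            # Host-side <source><address .../> has no type attribute, so it is skipped.
            if addr.get("type") == "pci" and addr.get("bus") is not None:
                bus = int(addr.get("bus"), 16)  # libvirt writes buses in hex, e.g. '0x01'
                if bus not in indices:
                    problems.append((parent.tag, bus))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read as bytes so an XML declaration with an encoding is accepted.
    with open(sys.argv[1], "rb") as f:
        for tag, bus in dangling_pci_buses(f.read()):
            print(f"<{tag}> sits on bus {bus:#04x}, but no pci controller has index {bus}")
```

For example, save the XML with `virsh dumpxml "Windows 10 Test" > vm.xml` and pass `vm.xml` to the script; no output means every guest PCI address points at an existing controller.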
Endy (author), posted January 16, 2020

Replying to testdasi's point about a bad vbios:

There was no display, and I got the vbios from my own card; I did not download it from TechPowerUp.
testdasi, posted January 16, 2020

Replying to Endy's VNC error:

Start a new template and use the Q35 machine type (with OVMF). You are better off sorting that out first to make sure your VM boots (to the Windows installer) with a display. At the very least, you want to be able to see the TianoCore screen at VM boot.
Endy (author), posted January 16, 2020

OK, so I just added the 0a:00.3 USB controller to the VM, and now I can get it to start with VNC. The message that comes up is: Guest has not initialized the display (yet).

@testdasi, as you said, I deleted the template and started again. Now it's working. Thank you. I still need to install the Nvidia driver and make sure that works.
Skitals, posted January 16, 2020

Replying to Endy's question about 03:08.0:

The numbers in the address stand for Bus:Device.Function. You want to pass through all functions of a device. Normally, "pass through the whole IOMMU group" means the same thing, but in this case the groupings are odd. It's the same reason you want to pass through your GPU audio along with the graphics card, even if they are in different groups (xx:xx.0 and xx:xx.1).
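Skitals's rule of thumb can be written down directly: split each address into bus, device and function, then group by bus and device so every function of a device ends up in the same passthrough set. A minimal Python sketch with hypothetical names; the sample addresses are the `BIND=` values from the first post:

```python
from collections import defaultdict
from typing import NamedTuple


class PciAddress(NamedTuple):
    bus: int
    device: int
    function: int

    @classmethod
    def parse(cls, text):
        """Parse 'BB:DD.F' or 'DDDD:BB:DD.F' (bus and device in hex, function 0-7)."""
        *_, bus, dev_fn = text.split(":")
        device, function = dev_fn.split(".")
        return cls(int(bus, 16), int(device, 16), int(function))

    def __str__(self):
        return f"{self.bus:02x}:{self.device:02x}.{self.function}"


def functions_by_device(addresses):
    """Group addresses so all functions of one bus:device sit together."""
    groups = defaultdict(list)
    for text in addresses:
        addr = PciAddress.parse(text)
        groups[(addr.bus, addr.device)].append(addr)
    return {key: sorted(fns) for key, fns in groups.items()}


if __name__ == "__main__":
    bind = "10:00.3 0d:00.0 0d:00.1"  # BIND= value from the first post
    for (bus, device), fns in functions_by_device(bind.split()).items():
        print(f"{bus:02x}:{device:02x} ->", " ".join(str(f) for f in fns))
```

Any device that shows up here with only some of its functions listed is a candidate for the kind of trouble described in this thread.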
Skitals, posted January 16, 2020

Oh, and to fix your GPU issue: for a single GPU on X570 you need to disable the framebuffer in Unraid. Add this parameter to your syslinux.cfg:

video=efifb:off

When you boot Unraid, you will get no video output after the bootloader. A GTX 1070 should work fine with a good vbios. This was my previous setup before upgrading to a 5700 XT + a second GPU.
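The parameter goes on the kernel `append` line of the boot entry, next to whatever is already there (for example the `isolcpus=` and `initrd=` options in the first post). A minimal Python sketch that adds a parameter to each `append` line only when it is missing; `add_kernel_param` is a hypothetical helper that works on text only, so reading and writing the real syslinux.cfg (with a backup) is left to you:

```python
def add_kernel_param(cfg_text, param):
    """Return cfg_text with `param` appended to every 'append' line lacking it.

    Only lowercase 'append' lines are matched, which is how the Unraid
    section quoted in this thread is written.
    """
    out = []
    for line in cfg_text.splitlines():
        words = line.split()
        if words and words[0] == "append" and param not in words[1:]:
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    section = (
        "label unRAID OS\n"
        "  menu default\n"
        "  kernel /bzimage\n"
        "  append isolcpus=4-7,12-15 initrd=/bzroot mitigations=off\n"
    )
    print(add_kernel_param(section, "video=efifb:off"), end="")
```

Running it twice leaves the text unchanged the second time, so it is safe to apply to a config that already has the parameter.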
Endy (author), posted January 16, 2020

Replying to Skitals about Bus:Device.Function:

Thanks for this. It's starting to make sense. In all my searching I kept finding things people say to do, but usually without any explanation of why.

Replying to Skitals about video=efifb:off:

Yes, that was already done. As I said before, I saw it mentioned somewhere, but with no explanation of why.

Thank you both for the help. So far everything is working.
mattz, posted June 2, 2020 (edited)

I wanted to mention that this issue has recently been affecting me too. I am on an MSI X470 Gaming M7 AC motherboard. The issue started when I switched from a 2700X CPU to a 3900X (I wanted more cores!!). After swapping the CPUs, all the VFIO Bus:Device.Function numbers changed (that's probably expected). However, the `USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller` (IOMMU group 22 in the pic below) that I had passed through with the 2700X no longer works in passthrough, even after adjusting for the new numbers. Using it now locks up the system no matter which combination of "non-essential" components I bind to vfio and pass along with it. @Skitals described this problem on all X570 boards... it looks like the same applies to X470 boards with a 3000-series CPU. Why the heck is this a problem in the 3000 series, BTW??

I am going down the path of passing through the other USB controllers, as described above. Unfortunately, on this board one of those controllers serves the front USB ports, which are not very accessible to me. The other serves 3 regular USB ports and 1 USB-C port on the back, but it sits in a very large IOMMU group that includes things like the Ethernet controller (IOMMU group 17), so I have my doubts about isolating that rear USB panel.

Will provide an update when I get this going... Worst case, I buy a separate PCI USB controller and go that route.

Edited June 2, 2020 by mattz (added IOMMU groups)
mattz, posted June 3, 2020

So, I think this is the resolution to the problem, "PCI: Avoid FLR for AMD Matisse HD Audio & USB 3.0": https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/virtualization&id=efaa35873d66bf4a4903f757333692766e34e448

It should be brought into some new version of Linux... Does anyone know which version, and when Unraid will get it?? This is my first time looking through these commits.
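The commit title suggests the fix is a per-device quirk keyed on PCI vendor and device IDs. The USB controller in the first post reports `[1022:149c]`, so one host-side check is to read a function's IDs from sysfs (the `vendor` and `device` attributes under `/sys/bus/pci/devices/<address>/`) and compare them with the IDs the quirk is meant to cover. A minimal Python sketch; `SUSPECT_IDS` is an assumption seeded only with the USB controller ID quoted in this thread (check the linked commit for the full list, including the HD Audio ID), and the helper names are hypothetical:

```python
import os

# Assumption: only the Matisse USB 3.0 ID quoted in this thread. Extend it
# from the kernel commit rather than trusting this set.
SUSPECT_IDS = {(0x1022, 0x149C)}


def pci_ids(address, root="/sys/bus/pci/devices"):
    """Return (vendor, device) read from sysfs, or None if unavailable."""
    try:
        with open(os.path.join(root, address, "vendor")) as f:
            vendor = int(f.read().strip(), 16)  # file holds e.g. "0x1022"
        with open(os.path.join(root, address, "device")) as f:
            device = int(f.read().strip(), 16)
    except OSError:
        return None
    return vendor, device


def parse_bracket_id(text):
    """Parse an ID in the '[1022:149c]' form used in the IOMMU listing."""
    vendor, device = text.strip("[]").split(":")
    return int(vendor, 16), int(device, 16)


def may_hang_on_flr(ids):
    """True if (vendor, device) is in the assumed FLR-problem set."""
    return ids in SUSPECT_IDS


if __name__ == "__main__":
    ids = pci_ids("0000:10:00.3")
    if ids is None:
        print("0000:10:00.3 not found in sysfs")
    else:
        print(f"{ids[0]:04x}:{ids[1]:04x}", "suspect" if may_hang_on_flr(ids) else "not in list")
```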
mattz, posted June 4, 2020

Wanted to follow up. The cause was definitely the FLR issue posted above. Luckily, someone on this forum had already compiled a kernel with a temporary fix, and I used that. Note that I tried Unraid 6.9.0-beta1, and its Linux kernel did not yet have the FLR fix. Find that custom kernel for Unraid 6.8.3 here: