erickg Posted June 9, 2020

I have a custom build with a Ryzen 3600, an ASRock B450M Pro4-F, and a GTX 1660 Super. I intend to run a Windows 10 VM to power some graphics-intensive applications. Although Unraid uses KVM, GPU passthrough seems to need some tweaks. I am not familiar with IOMMU and virtualization, so this thread will be my log and my understanding of it. I am happy to share more information if you are interested.

Tools > System Devices gives a pretty good view of device information. My focus is the graphics card, which belongs to IOMMU group 16:

IOMMU group 16:
[10de:21c4] 06:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660 SUPER] (rev a1)
[10de:1aeb] 06:00.1 Audio device: NVIDIA Corporation TU116 High Definition Audio Controller (rev a1)
[10de:1aec] 06:00.2 USB controller: NVIDIA Corporation Device 1aec (rev a1)
[10de:1aed] 06:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 [GeForce GTX 1650 SUPER] (rev a1)

lspci -nnv reveals further information:

06:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU116 [GeForce GTX 1660 SUPER] [10de:21c4] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Device [1b4c:13d3]
	Flags: bus master, fast devsel, latency 0, IRQ 11
	Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at f0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at f000 [size=128]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [250] Latency Tolerance Reporting
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] Secondary PCI Express <?>
	Capabilities: [bb0] Resizable BAR <?>
06:00.1 Audio device [0403]: NVIDIA Corporation TU116 High Definition Audio Controller [10de:1aeb] (rev a1)
	Subsystem: Device [1b4c:13d3]
	Flags: bus master, fast devsel, latency 0, IRQ 10
	Memory at f7080000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting

06:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1aec] (rev a1) (prog-if 30 [XHCI])
	Subsystem: Device [1b4c:13d3]
	Flags: bus master, fast devsel, latency 0, IRQ 41
	Memory at f2000000 (64-bit, prefetchable) [size=256K]
	Memory at f2040000 (64-bit, prefetchable) [size=64K]
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Kernel driver in use: xhci_hcd

06:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 [GeForce GTX 1650 SUPER] [10de:1aed] (rev a1)
	Subsystem: Device [1b4c:13d3]
	Flags: bus master, fast devsel, latency 0, IRQ 11
	Memory at f7084000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Power Management version 3
	Capabilities: [100] Advanced Error Reporting

Only the USB controller has a kernel driver (xhci_hcd). I don't know why the card has a USB controller; perhaps it is for a USB port on the display output.

The problem: trying to create a Win 10 VM fails with an IOMMU error:

internal error: qemu unexpectedly closed the monitor: 2020-06-09T10:07:21.970912Z qemu-system-x86_64: -device vfio-pci,host=0000:06:00.0,id=hostdev0,bus=pci.0,addr=0x6: vfio 0000:06:00.0: group 16 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.

Indeed, vfio knows nothing about these 4 devices: none of them is bound to vfio-pci.
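The same information as Tools > System Devices can be read straight from sysfs, which also shows why QEMU calls the group "not viable". Below is a minimal Python sketch, assuming the standard /sys/kernel/iommu_groups/<N>/devices layout. The helper names are hypothetical, and treating "no driver", vfio-pci and pci-stub as harmless is an assumption about the kernel's viability rule, not something stated in this thread.

```python
import os
from typing import List, Optional, Tuple

# Assumption: a device with no driver, or one held by vfio-pci / pci-stub,
# does not make the group "not viable". Any other driver does.
HARMLESS_DRIVERS = {"vfio-pci", "pci-stub"}

Device = Tuple[str, str, Optional[str]]  # (address, "vendor:device", driver or None)


def read_id(path: str) -> str:
    """Read a sysfs ID file such as 'vendor' (contents like "0x10de\\n") as 4 hex digits."""
    with open(path) as f:
        return f"{int(f.read(), 16):04x}"


def group_devices(group: int, sysfs: str = "/sys") -> List[Device]:
    """List every device in an IOMMU group with its ID and current driver."""
    base = os.path.join(sysfs, "kernel", "iommu_groups", str(group), "devices")
    devices: List[Device] = []
    for addr in sorted(os.listdir(base)):
        dev_dir = os.path.join(base, addr)
        ident = f"{read_id(os.path.join(dev_dir, 'vendor'))}:{read_id(os.path.join(dev_dir, 'device'))}"
        # 'driver' is a symlink to .../drivers/<name> only while a driver is bound.
        link = os.path.join(dev_dir, "driver")
        driver = os.path.basename(os.readlink(link)) if os.path.islink(link) else None
        devices.append((addr, ident, driver))
    return devices


def blockers(devices: List[Device]) -> List[str]:
    """Addresses whose current driver would keep the group from being viable."""
    return [addr for addr, _, drv in devices
            if drv is not None and drv not in HARMLESS_DRIVERS]


def vfio_ids(devices: List[Device]) -> str:
    """Unique IDs, in group order, joined the way vfio-pci.ids= expects."""
    seen: List[str] = []
    for _, ident, _ in devices:
        if ident not in seen:
            seen.append(ident)
    return ",".join(seen)
```

For the group listed above, blockers(group_devices(16)) would be expected to report only 0000:06:00.2, the one device with a kernel driver (xhci_hcd).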
Concepts

As I said, I have no idea how virtualization works, but I understand the basics of PCIe and DMA. These resources were helpful:
- https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Setting_up_IOMMU
- http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html
- http://vfio.blogspot.com/2014/08/vfiovga-faq.html
- https://www.kernel.org/doc/Documentation/vfio.txt

The idea of GPU passthrough is to maximize performance by assigning the GPU exclusively to a single VM (Win 10). VFIO is a driver framework that exposes direct device access to user space in a secure, IOMMU-protected environment, which makes it better suited to virtualization than the common user-space I/O interfaces. Binding devices to vfio only makes sense once IOMMU is understood.

From the kernel's perspective, devices are the main target of any I/O driver. Devices typically create a programming interface made up of I/O access, interrupts, and DMA. Without going into the details of each of these, DMA is by far the most critical aspect for maintaining a secure environment, as allowing a device read-write access to system memory imposes the greatest risk to the overall system integrity.

Basically, I need vfio to handle this device group (Group 16) so that KVM has a chance to handle all the details. From there, the vfio documentation is sufficient: 06:00.2 needs to be unbound from its current driver (xhci_hcd), and every device in the group needs its vendor:device ID registered with vfio-pci and bound to it. After running those commands, I can create a new VM without warnings, and the VM starts.

But it's not over yet. Now I have a black screen. That is progress rather than a regression: the Unraid console is gone, so the VM has the graphics card. I have to dig deeper.
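The unbind/bind step from vfio.txt comes down to two kinds of sysfs writes: echoing the device address into its current driver's unbind file, and echoing "vendor device" into /sys/bus/pci/drivers/vfio-pci/new_id, after which vfio-pci probes matching unbound devices. Below is a minimal Python sketch that plans those writes for a whole group and, by default, only prints the equivalent shell commands. It assumes the vfio-pci module is already loaded. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Device = Tuple[str, str, Optional[str]]  # (address, "vendor:device", current driver or None)
Write = Tuple[str, str]                  # (sysfs file, value to write into it)


def plan_vfio_binding(devices: List[Device], sysfs: str = "/sys") -> List[Write]:
    """Plan the sysfs writes that hand every device of an IOMMU group to vfio-pci."""
    writes: List[Write] = []
    # 1. Release devices still held by another driver (here: xhci_hcd on 06:00.2).
    for addr, _, driver in devices:
        if driver is not None and driver != "vfio-pci":
            writes.append((f"{sysfs}/bus/pci/devices/{addr}/driver/unbind", addr))
    # 2. Register each vendor/device ID once with vfio-pci ("10de 1aec" format).
    seen = set()
    for _, ident, _ in devices:
        if ident not in seen:
            seen.add(ident)
            vendor, device = ident.split(":")
            writes.append((f"{sysfs}/bus/pci/drivers/vfio-pci/new_id", f"{vendor} {device}"))
    return writes


def apply(writes: List[Write], dry_run: bool = True) -> None:
    """Print the equivalent shell commands, or perform the writes (needs root)."""
    for path, value in writes:
        if dry_run:
            print(f"echo {value} > {path}")
        else:
            with open(path, "w") as f:
                f.write(value)
```

Bindings made this way do not survive a reboot; the vfio-pci.ids= kernel parameter is the persistent equivalent. I believe writing an ID that vfio-pci already knows is rejected ("File exists"), so the plan is meant for a freshly booted host.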
erickg (Author) Posted June 10, 2020 (edited)

I am trying various settings to see if I can make some progress. Installing Windows 10 with VNC first and only then trying GPU passthrough seems to be a good idea. The VM uses 4 cores, 4 GB of RAM and no other devices, and I made a backup. With Unraid 6.8.3 there seems to be no way to edit the VM configuration, so I deleted that VM config and created a new VM.

I attempted to boot the system without using the installed Windows, but dmesg showed vfio waiting for a device (0000:08:00.4) that never responded. After that, Unraid crashes and I have to restart the array. The relevant log:

[ 2955.522429] br0: port 2(vnet0) entered forwarding state
[ 2957.518776] vfio_ecap_init: 0000:06:00.0 hiding ecap 0x1e@0x258
[ 2957.518804] vfio_ecap_init: 0000:06:00.0 hiding ecap 0x19@0x900
[ 2957.521145] vfio-pci 0000:06:00.0: BAR 3: can't reserve [mem 0xf0000000-0xf1ffffff 64bit pref]
[ 2957.521522] vfio-pci 0000:06:00.0: No more image in the PCI ROM
[ 2957.546577] vfio-pci 0000:08:00.4: enabling device (0000 -> 0002)
[ 2969.776449] vfio-pci 0000:08:00.4: not ready 1023ms after FLR; waiting
[ 2971.824472] vfio-pci 0000:08:00.4: not ready 2047ms after FLR; waiting
[ 2974.896481] vfio-pci 0000:08:00.4: not ready 4095ms after FLR; waiting
[ 2980.272695] vfio-pci 0000:08:00.4: not ready 8191ms after FLR; waiting
[ 2989.488611] vfio-pci 0000:08:00.4: not ready 16383ms after FLR; waiting

Next, I read the memory mappings with cat /proc/iomem:
e0000000-ffffffff : Reserved
  e0000000-fec2ffff : PCI Bus 0000:00
    e0000000-f20fffff : PCI Bus 0000:06
      e0000000-efffffff : 0000:06:00.0
      f0000000-f1ffffff : 0000:06:00.0
        f1000000-f17e8fff : efifb
      f2000000-f203ffff : 0000:06:00.2
        f2000000-f203ffff : xhci-hcd
      f2040000-f204ffff : 0000:06:00.2

e0000000-fec2ffff : PCI Bus 0000:00
  e0000000-f20fffff : PCI Bus 0000:06
    e0000000-efffffff : 0000:06:00.0
    f0000000-f1ffffff : 0000:06:00.0
    f2000000-f203ffff : 0000:06:00.2
      f2000000-f203ffff : xhci-hcd
    f2040000-f204ffff : 0000:06:00.2
  f2200000-f22fffff : PCI Bus 0000:01
    f2200000-f22fffff : PCI Bus 0000:02
      f2200000-f22fffff : PCI Bus 0000:04
        f2200000-f2203fff : 0000:04:00.0
  f6000000-f70fffff : PCI Bus 0000:06
    f6000000-f6ffffff : 0000:06:00.0
    f7080000-f7083fff : 0000:06:00.1
    f7084000-f7084fff : 0000:06:00.3

Inside the card's BAR 3 range (f0000000-f1ffffff, the one the kernel could not reserve), one region I hadn't touched is still claimed: efifb, the EFI framebuffer. I have no choice but to modify the kernel parameters. Unraid uses syslinux, so the file to edit is /boot/syslinux/syslinux.cfg (also reachable via Main > Boot Device > Flash > Syslinux Configuration). I added these parameters:

video=vesafb:off video=efifb:off

The result is not great: it's still not booting.
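Spotting efifb in that listing by eye works, but the check is mechanical: find every region nested inside the BAR the kernel could not reserve that is not owned by the card itself. Below is a minimal Python sketch, assuming /proc/iomem is read as root (non-root readers see all addresses as zeros). The function names are hypothetical.

```python
import re
from typing import List, Tuple

# One /proc/iomem line: optional indentation, then "start-end : name".
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+?)\s*$")


def parse_iomem(text: str) -> List[Tuple[int, int, str]]:
    """Parse /proc/iomem text into (start, end, name) tuples; end is inclusive."""
    entries = []
    for line in text.splitlines():
        m = IOMEM_LINE.match(line)
        if m:
            entries.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return entries


def squatters(entries: List[Tuple[int, int, str]], start: int, end: int, owner: str) -> List[str]:
    """Names of regions lying entirely inside [start, end] that are not `owner`'s own entries."""
    return [name for s, e, name in entries
            if start <= s and e <= end and name != owner]
```

Calling squatters(parse_iomem(open("/proc/iomem").read()), 0xf0000000, 0xf1ffffff, "0000:06:00.0") on the regions shown above should report only efifb.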
[ 112.241841] virbr0: port 1(virbr0-nic) entered listening state
[ 112.254954] virbr0: port 1(virbr0-nic) entered disabled state
[ 317.096329] xhci_hcd 0000:06:00.2: remove, state 4
[ 317.096339] usb usb4: USB disconnect, device number 1
[ 317.096547] xhci_hcd 0000:06:00.2: USB bus 4 deregistered
[ 317.096554] xhci_hcd 0000:06:00.2: remove, state 4
[ 317.096557] usb usb3: USB disconnect, device number 1
[ 317.097267] xhci_hcd 0000:06:00.2: USB bus 3 deregistered
[ 378.185877] r8169 0000:04:00.0: invalid short VPD tag 00 at offset 1
[ 441.198294] vfio-pci 0000:06:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[ 467.277160] br0: port 2(vnet0) entered blocking state
[ 467.277170] br0: port 2(vnet0) entered disabled state
[ 467.277215] device vnet0 entered promiscuous mode
[ 467.277362] br0: port 2(vnet0) entered blocking state
[ 467.277364] br0: port 2(vnet0) entered forwarding state
[ 467.979374] vfio_ecap_init: 0000:06:00.0 hiding ecap 0x1e@0x258
[ 467.979402] vfio_ecap_init: 0000:06:00.0 hiding ecap 0x19@0x900
[ 467.982513] vfio-pci 0000:06:00.0: No more image in the PCI ROM
[ 468.007177] vfio-pci 0000:08:00.4: enabling device (0000 -> 0002)
[ 480.220983] vfio-pci 0000:08:00.4: not ready 1023ms after FLR; waiting
[ 482.268993] vfio-pci 0000:08:00.4: not ready 2047ms after FLR; waiting
[ 485.341003] vfio-pci 0000:08:00.4: not ready 4095ms after FLR; waiting
[ 490.844991] vfio-pci 0000:08:00.4: not ready 8191ms after FLR; waiting

Edited June 10, 2020 by erickg
erickg (Author) Posted June 10, 2020

I was about to write about IOMMU, but the kernel seems unhappy about other things. For example, 0000:08:00.4 (the device that never became ready after FLR) is a sound controller on the motherboard. After removing that device from the VM, I was able to launch Windows 10. However, there is no image on the screen. The remaining errors are:

[ 556.753561] vfio_ecap_init: 0000:06:00.0 hiding ecap 0x1e@0x258
[ 556.753590] vfio_ecap_init: 0000:06:00.0 hiding ecap 0x19@0x900
[ 556.756856] vfio-pci 0000:06:00.0: No more image in the PCI ROM
[ 559.068985] vfio-pci 0000:06:00.0: No more image in the PCI ROM
[ 559.069002] vfio-pci 0000:06:00.0: No more image in the PCI ROM

This is apparently no longer an IOMMU problem. It's quite a frustrating experience to get this far with no clue about the next step.
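As far as I can tell from drivers/pci/rom.c, "No more image in the PCI ROM" comes from the kernel's walk over the images in a card's expansion ROM. Each image should start with the 55 AA signature and hold, at offset 0x18, a 16-bit pointer to a "PCIR" data structure. That structure gives the image length (offset 0x10, in 512-byte units) and an indicator byte (offset 0x15) whose bit 7 marks the last image. The message is logged when an image is not marked as last but no 55 AA follows it. Below is a minimal Python sketch of the same walk, useful for checking a dumped ROM file. The function name is hypothetical.

```python
import struct
from typing import List, Tuple


def walk_rom_images(rom: bytes) -> List[Tuple[int, int, bool]]:
    """Return (offset, size_in_bytes, is_last) for each image in a PCI expansion ROM.

    Follows the same rules as the kernel's pci_get_rom_size(): stop at the first
    missing 55 AA / "PCIR" signature, or after the image flagged as last.
    """
    images: List[Tuple[int, int, bool]] = []
    offset = 0
    while offset + 0x1A <= len(rom):
        if rom[offset:offset + 2] != b"\x55\xaa":
            # Missing signature; for a follow-on image the kernel logs "No more image".
            break
        (pcir_ptr,) = struct.unpack_from("<H", rom, offset + 0x18)
        pds = offset + pcir_ptr
        if pds + 0x16 > len(rom) or rom[pds:pds + 4] != b"PCIR":
            break  # pointer does not lead to a PCI data structure
        (length,) = struct.unpack_from("<H", rom, pds + 0x10)  # in 512-byte units
        is_last = bool(rom[pds + 0x15] & 0x80)                  # indicator byte, bit 7
        images.append((offset, length * 512, is_last))
        if is_last or length == 0:
            break
        offset += length * 512
    return images
```

An empty list, or a final entry whose is_last is False, means the file does not form a clean image chain from offset 0. A dump with extra bytes in front of the first 55 AA fails immediately.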
erickg (Author) Posted June 10, 2020

After some further internet searching and a look into the kernel source code, this might be attributable to the vbios. The open-source NVIDIA vbios patcher does not seem to support Turing-based cards, so I refrained from providing a vbios to the VM. IMO, GPU passthrough for the GTX 1660 Super requires too much effort. It's not really a happy ending.
erickg (Author) Posted June 17, 2020

I have to admit I was lazy about getting this to work. I installed Windows 10 on an SSD, and then realized I had the chance to export the GPU ROM (vbios). So I did. The Arch Wiki mentions that the KVM guest will see a shadow copy of the vbios, which may be invalid. That explains the blank screen and the ROM errors in the previous log. I first tried the original ROM from GPU-Z, which did not work. I don't know why removing the NVIDIA header from the vbios helps, but it works. I followed this guide and edited the ROM to remove the NVIDIA header.

Now the steps are crystal clear:
1. Set up the prerequisites as described in the Unraid guide.
2. Pass all devices in the IOMMU group of the NVIDIA card to vfio.
3. Disable efifb. My boot parameters for Unraid look like this:
   kernel /bzimage
   append video=efifb:off vfio-pci.ids=10de:21c4,10de:1aeb,10de:1aec,10de:1aed initrd=/bzroot
4. Use SATA for the ISOs, and use the modified vbios for the NVIDIA card.

Getting here is a little tedious, but the reward is great!
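The header removal is normally done by hand in a hex editor. Below is a minimal Python sketch of the approach commonly described for NVIDIA dumps: find the text "VIDEO" in the dump, then cut everything before the 55 AA option-ROM signature that precedes it. This is an assumption about what "removing the NVIDIA header" means for this card, not a copy of the linked guide. The function names are hypothetical, and the output should still be sanity-checked before handing it to a VM.

```python
def strip_nvidia_header(rom: bytes) -> bytes:
    """Drop everything before the 55 AA signature that precedes the 'VIDEO' text."""
    marker = rom.find(b"VIDEO")
    if marker < 0:
        raise ValueError("no 'VIDEO' marker found; leaving this ROM alone")
    # Closest ROM signature before the marker is assumed to be the real image start.
    start = rom.rfind(b"\x55\xaa", 0, marker)
    if start < 0:
        raise ValueError("no 55 AA signature before the 'VIDEO' marker")
    return rom[start:]


def strip_file(src: str, dst: str) -> int:
    """Write a header-less copy of src to dst; return the number of bytes removed."""
    with open(src, "rb") as f:
        rom = f.read()
    stripped = strip_nvidia_header(rom)
    with open(dst, "wb") as f:
        f.write(stripped)
    return len(rom) - len(stripped)
```

Keep the original GPU-Z dump and write the stripped copy to a new file, so a wrong cut can be undone.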
tk40 Posted November 17, 2021

I just wanted to comment that I also have a GTX 1660 Super card, and these instructions from erickg work perfectly!