Nvidia GTX 960 passthrough issue


Hi,

 

This graphics card passthrough issue seems to be very common with KVM 😵

Like many others, I am getting this VFIO error message.

Quote

internal error: qemu unexpectedly closed the monitor: 2020-03-25T10:05:32.401100Z qemu-system-x86_64: -device vfio-pci,host=0000:02:00.0,id=hostdev0,bus=pci.0,multifunction=on,addr=0x5: vfio 0000:02:00.0: group 1 is not viable Please ensure all devices within the iommu_group are bound to their vfio bus driver.

 

I tried several workarounds without success:

  • First I moved the video card to a different slot on the motherboard, but the error came back with a different slot number.
     
  • I added multifunction='on' and changed the virtual slot of the video card's audio function in the VM XML, but the error remains:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
    </hostdev>

  • Finally I tried the VFIO-PCI CFG plugin, and after a reboot the Unraid array could not start because the parity disk had disappeared...

 

I'm stuck and would very much appreciate it if someone could help me sort out this error.

Thank you

nas-diagnostics-20200325-2148.zip

11 hours ago, Fabiolander said:

group 1 is not viable Please ensure all devices within the iommu_group are bound to their vfio bus driver.

Check your IOMMU groupings under Tools >> System Devices. It doesn't look like your GPU is in its own group, separated from other devices. You can only pass through a device if it is separated from other devices. If there are multiple devices in a single group, you have to pass them through all together, BUT don't blindly pass through devices if you don't know what they are.

 

To split up the groupings further, enabling "PCIe ACS override" in the VM Manager settings might help. A server restart is required.
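
For anyone who would rather check the groupings from a terminal than from the web UI, here is a minimal Python sketch. It assumes the usual Linux sysfs layout under /sys/kernel/iommu_groups (the same paths that appear later in this thread); the function names `list_iommu_groups` and `group_of` are made up for this example and are not part of Unraid.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} read from a sysfs-style tree."""
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        # IOMMU disabled, or not a Linux host: nothing to report.
        return groups
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def group_of(groups, pci_address):
    """Return the number of the group containing pci_address, or None."""
    for number, devices in groups.items():
        if pci_address in devices:
            return number
    return None


if __name__ == "__main__":
    groups = list_iommu_groups()
    gpu = "0000:02:00.0"  # address taken from the error message in the first post
    number = group_of(groups, gpu)
    if number is None:
        print(f"{gpu} not found in any IOMMU group")
    else:
        # Everything printed here shares the GPU's group.
        print(f"group {number}: {', '.join(groups[number])}")
```

If the printed group contains anything besides the GPU and its audio function, those other devices are the ones to look at before passing anything through.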

 

11 hours ago, Fabiolander said:

finally I tried VFIO-PCI CFG plugin and after reboot the UNRAID array cannot start again because the Parity disk disappeared...

As mentioned earlier, you have multiple devices in group 1:

/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.1
/sys/kernel/iommu_groups/1/devices/0000:00:01.2
/sys/kernel/iommu_groups/1/devices/0000:02:00.0
/sys/kernel/iommu_groups/1/devices/0000:02:00.1
/sys/kernel/iommu_groups/1/devices/0000:03:00.0

The last one is an LSI controller, presumably the one your parity disk is attached to:

03:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)
	Subsystem: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:3060]
	Kernel driver in use: mpt3sas
	Kernel modules: mpt3sas

 

You can do a couple of things. As mentioned earlier, try the ACS override to split up the groups, or try changing the PCIe slot your GPU is plugged into. Always check the groupings before you pass through any device!
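
The error message asks that all devices in the group be bound to their vfio bus driver, and the lspci output above shows the LSI controller still on mpt3sas. A rough Python sketch for reading the bound driver of each device in the group straight from sysfs follows. It assumes the standard /sys/bus/pci/devices/<address>/driver symlink; `bound_driver` and `drivers_for` are hypothetical helper names used only for illustration.

```python
import os


def bound_driver(pci_address, root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = os.path.join(root, pci_address, "driver")
    if not os.path.islink(link):
        return None  # no driver bound, or the device is not present
    # The symlink points at the driver's directory; its last component is the name.
    return os.path.basename(os.readlink(link))


def drivers_for(addresses, root="/sys/bus/pci/devices"):
    """Map each PCI address to its bound driver name (or None)."""
    return {address: bound_driver(address, root) for address in addresses}


if __name__ == "__main__":
    # The six devices listed in group 1 above.
    group_1 = [
        "0000:00:01.0",
        "0000:00:01.1",
        "0000:00:01.2",
        "0000:02:00.0",
        "0000:02:00.1",
        "0000:03:00.0",
    ]
    for address, driver in drivers_for(group_1).items():
        print(f"{address}: {driver or 'no driver bound'}")
```

This only reports what is bound; it does not change any binding.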


Wow, thanks a lot @bastl!

 

I went to Settings > VM Manager and changed the PCIe ACS override setting from 'off' to 'both'.

Bingo! Now the NVIDIA graphics and audio functions share a group of their own (group 12):

IOMMU group 0:	[8086:0c00] 00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
IOMMU group 1:	[8086:0c01] 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
IOMMU group 2:	[8086:0c05] 00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller (rev 06)
IOMMU group 3:	[8086:0c09] 00:01.2 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x4 Controller (rev 06)
IOMMU group 4:	[8086:8cb1] 00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB xHCI Controller
IOMMU group 5:	[8086:8cba] 00:16.0 Communication controller: Intel Corporation 9 Series Chipset Family ME Interface #1
IOMMU group 6:	[8086:8cad] 00:1a.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2
IOMMU group 7:	[8086:8ca0] 00:1b.0 Audio device: Intel Corporation 9 Series Chipset Family HD Audio Controller
IOMMU group 8:	[8086:8c90] 00:1c.0 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 (rev d0)
IOMMU group 9:	[8086:8c96] 00:1c.3 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 4 (rev d0)
IOMMU group 10:	[8086:8ca6] 00:1d.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1
IOMMU group 11:	[8086:8cc4] 00:1f.0 ISA bridge: Intel Corporation Z97 Chipset LPC Controller
		[8086:8c82] 00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode]
		[8086:8ca2] 00:1f.3 SMBus: Intel Corporation 9 Series Chipset Family SMBus Controller
IOMMU group 12:	[10de:1401] 02:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1)
		[10de:0fba] 02:00.1 Audio device: NVIDIA Corporation GM206 High Definition Audio Controller (rev a1)
IOMMU group 13:	[1000:0072] 03:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
IOMMU group 14:	[1969:e091] 05:00.0 Ethernet controller: Qualcomm Atheros Killer E220x Gigabit Ethernet Controller (rev 13)
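
A listing like the one above can also be checked mechanically. The following Python sketch parses the "IOMMU group N:" text format as it is pasted here and tests whether a device's group contains only functions of that same card. The parser is written against this paste only, so treat the regular expressions as assumptions about the format; `parse_listing` and `is_isolated` are names invented for the example.

```python
import re

GROUP_RE = re.compile(r"^IOMMU group (\d+):\s*(.*)$")
DEVICE_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]\s+(\S+)\s+(.*)")


def parse_listing(text):
    """Map group number -> list of (vendor_id, device_id, address, description)."""
    groups = {}
    current = None
    for line in text.splitlines():
        header = GROUP_RE.match(line)
        if header:
            current = int(header.group(1))
            groups[current] = []
            rest = header.group(2)
        else:
            # Continuation lines belong to the most recent group header.
            rest = line.strip()
        device = DEVICE_RE.match(rest)
        if device and current is not None:
            groups[current].append(device.groups())
    return groups


def is_isolated(groups, address):
    """True if the group holding `address` contains only the same bus:slot."""
    slot = address.rsplit(".", 1)[0]
    for devices in groups.values():
        addresses = [d[2] for d in devices]
        if address in addresses:
            return all(a.rsplit(".", 1)[0] == slot for a in addresses)
    return False  # address not found in the listing


# Excerpt of the listing above (groups 12 and 13).
SAMPLE = (
    "IOMMU group 12:\t[10de:1401] 02:00.0 VGA compatible controller: "
    "NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1)\n"
    "\t\t[10de:0fba] 02:00.1 Audio device: "
    "NVIDIA Corporation GM206 High Definition Audio Controller (rev a1)\n"
    "IOMMU group 13:\t[1000:0072] 03:00.0 Serial Attached SCSI controller: "
    "Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)\n"
)

if __name__ == "__main__":
    groups = parse_listing(SAMPLE)
    print("GPU isolated:", is_isolated(groups, "02:00.0"))
```

With the ACS override enabled, the GPU's group holds only 02:00.0 and 02:00.1, which is the situation this check looks for.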

After a reboot the VM starts!

 

Now a new issue: when I connect to the VM with AnyDesk, the maximum resolution the VM can display is a very laggy 640x480.

I managed to install the latest NVIDIA graphics driver, but nothing changed. I'm stuck at 640x480.

 

Do you have any idea what I missed?

 

 

7 hours ago, bastl said:

@Fabiolander You have to connect a display to the card, or an HDMI dummy plug that simulates a connected display. You also have to remove the VNC graphics from the VM.

Thanks a lot @bastl for your help.

 

The VNC graphics device is disabled (I connect to the VM with AnyDesk).

An old HDMI monitor is connected to the video card.

 

The screen resolution remains locked to 640x480 and the OS is frozen now.

 


 

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='4'>
  <name>WIN10 Gaming 1</name>
  <uuid>4da06f4b-18e3-b9f1-d6ca-80516e89d87b</uuid>
  <description>WIN10 Gaming 1</description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>17301504</memory>
  <currentMemory unit='KiB'>17301504</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='4'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='6'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/4da06f4b-18e3-b9f1-d6ca-80516e89d87b_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='2' threads='2'/>
    <cache mode='passthrough'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/disks/SSD-UNASSIGNED/WIN10 Gaming 1/vdisk1.img' index='1'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:fc:0f:43'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-4-WIN10 Gaming 1/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <rom file='/mnt/user/tmp/MSI.GTX960.2048.150528.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>
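
The advice about removing the VNC graphics can be checked against the XML itself: an emulated display would appear as a <graphics> element under <devices>, and the XML posted above contains none. Below is a small Python sketch that pulls this information out of a domain XML with the standard library's xml.etree.ElementTree. The function name `passthrough_summary` and the trimmed-down excerpt are illustrative assumptions, not something Unraid or libvirt provides.

```python
import xml.etree.ElementTree as ET


def passthrough_summary(domain_xml):
    """List emulated graphics types and passed-through PCI devices in a domain XML."""
    root = ET.fromstring(domain_xml)
    graphics = [g.get("type") for g in root.findall("./devices/graphics")]
    hostdevs = []
    for hostdev in root.findall("./devices/hostdev[@type='pci']"):
        src = hostdev.find("./source/address")
        if src is None:
            continue
        # Attribute values look like '0x02'; int(..., 16) accepts the 0x prefix.
        bus = int(src.get("bus"), 16)
        slot = int(src.get("slot"), 16)
        function = int(src.get("function"), 16)
        rom = hostdev.find("rom")
        hostdevs.append({
            "host_address": f"{bus:02x}:{slot:02x}.{function:x}",
            "rom": rom.get("file") if rom is not None else None,
        })
    return {"graphics": graphics, "hostdevs": hostdevs}


# Excerpt of the XML above, reduced to the two hostdev entries.
EXCERPT = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/user/tmp/MSI.GTX960.2048.150528.rom'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
      </source>
    </hostdev>
  </devices>
</domain>
"""

if __name__ == "__main__":
    summary = passthrough_summary(EXCERPT)
    print("emulated graphics:", summary["graphics"] or "none")
    for dev in summary["hostdevs"]:
        print(dev["host_address"], "rom:", dev["rom"])
```

An empty graphics list together with both GPU functions in the hostdev list matches what was described: no VNC device, and the card plus its audio function passed through.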

 

Other things I discovered:

Since I enabled 'PCIe ACS override', the 'Unraid NVIDIA' plugin no longer detects any video card.

'GPU diagnostic' returns this error: Error 300: Vendor utility not found.

 

I guess it is because the video card is now rerouted to, and locked by, the VM, but I'm not sure, so I'm sharing this info with you.

 

Thanks again for your time 🤘

57 minutes ago, Fabiolander said:

The screen resolution remains locked to 640x480 and the OS is frozen now.

Check Device Manager in Windows to see whether the card shows any errors. It sounds like something is wrong with your driver.

58 minutes ago, Fabiolander said:

I guess it is because the videocard is now rerooted and locked by the VM but not sure so I share this info with you.

I'm not using an Nvidia build of Unraid, so I can't really tell whether there is something you have to do differently. But as soon as you pass a GPU through to a VM, it's gone from the host OS, so I guess this could be a "normal" warning. Keep in mind that as long as the VM is running, neither the host OS nor any Docker containers are able to use this GPU. Better check the forums for the Nvidia build to see if you can find any advice on this.


Maybe I won't use this VM for gaming after all. It looks super tricky to make it work properly with video passthrough.

 

In my dreams, I was thinking about moving my current video editing Hackintosh (i9 with 64GB RAM and Radeon64) to an Unraid server where I could host one WIN10 gaming VM and one Mac video editing VM. I'm afraid this project is a bit too optimistic 🤣🤣

 

For now it is probably a better compromise to keep this Unraid server with the i7 configuration and use it only as a NAS, media server and Home Assistant host. I will continue to dual boot between WIN10 and Catalina for other purposes.

I just installed two cheap 10Gb cards between the two machines and the transfer rate is close to nirvana 😎😎

 

Thanks again for your help

 


  
