Unable to pass NVIDIA GTX 960 through to VM



I am unable to pass my GPU through to either a Win8.1 or an Ubuntu VM unless I set PCIe ACS Override to Yes in the VM settings.

If the override is disabled, I get the following:

Error: internal error: early end of file from monitor: possible problem:
2015-06-25T23:17:14.280593Z qemu-system-x86_64: -device vfio-pci,host=03:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: vfio: error, group 1 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.
2015-06-25T23:17:14.280610Z qemu-system-x86_64: -device vfio-pci,host=03:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: vfio: failed to get group 1
2015-06-25T23:17:14.280618Z qemu-system-x86_64: -device vfio-pci,host=03:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: Device initialization failed
2015-06-25T23:17:14.280624Z qemu-system-x86_64: -device vfio-pci,host=03:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: Device 'vfio-pci' could not be initialized
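The key part of that error is the request that all devices in the IOMMU group be bound to their vfio bus driver: QEMU will not use a group while any member is still held by a regular host driver (the kernel also tolerates unbound devices and a few other cases). A minimal Python sketch, assuming the usual Linux sysfs layout in which `/sys/bus/pci/devices/<address>/iommu_group/devices/` holds one entry per group member and each member's `driver` entry is a symlink named after its driver. The function name `group_members_with_drivers` and the `sysfs_root` parameter are hypothetical, and the sketch only reports drivers; it does not reproduce the kernel's own viability rules.

```python
import os


def group_members_with_drivers(pci_address, sysfs_root="/sys"):
    """Return (address, driver) for each device in pci_address's IOMMU group.

    pci_address uses the full sysfs form, e.g. "0000:03:00.0".
    driver is None when the device is not bound to any driver.
    """
    group_devices = os.path.join(
        sysfs_root, "bus", "pci", "devices", pci_address, "iommu_group", "devices"
    )
    members = []
    for member in sorted(os.listdir(group_devices)):
        driver_link = os.path.join(group_devices, member, "driver")
        if os.path.islink(driver_link):
            # The link target ends in the driver's name, e.g. .../drivers/vfio-pci
            driver = os.path.basename(os.readlink(driver_link))
        else:
            driver = None  # no "driver" symlink: the device is unbound
        members.append((member, driver))
    return members
```

On the host, `group_members_with_drivers("0000:03:00.0")` should return one pair per member of the GPU's group; an endpoint still bound to its normal host driver (a storage controller's driver, for instance) would typically be what keeps the group from being viable.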

 

Excluding the slots in syslinux.cfg via "pci-stub.ids=03:00.0,03:00.1" didn't make a difference.
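A note on that attempt: the `pci-stub.ids` parameter expects vendor:device ID pairs (the hex pairs that `lspci -nn` prints in square brackets), not bus addresses such as `03:00.0`. A minimal Python sketch, assuming `lspci -nn` output lines as input, that turns a set of bus addresses into a `pci-stub.ids=` value; the function name `pci_stub_ids` is hypothetical.

```python
import re

# Matches a "[vendor:device]" pair such as "[10de:0fba]"; class codes like "[0300]" have no colon.
ID_RE = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")


def pci_stub_ids(lspci_nn_lines, addresses):
    """Build a pci-stub.ids= value for the devices at the given bus addresses.

    lspci_nn_lines: lines of `lspci -nn` output, each starting with an address like "03:00.0".
    addresses: the bus addresses to look up, e.g. {"03:00.0", "03:00.1"}.
    """
    ids = []
    for line in lspci_nn_lines:
        if line.split(" ", 1)[0] not in addresses:
            continue
        matches = ID_RE.findall(line)
        if not matches:
            raise ValueError(f"no vendor:device pair found in: {line!r}")
        vendor, device = matches[-1]  # the ID pair is the last bracketed hex pair on the line
        pair = f"{vendor.lower()}:{device.lower()}"
        if pair not in ids:
            ids.append(pair)
    return "pci-stub.ids=" + ",".join(ids)
```

Because pci-stub matches by ID, it claims every device carrying that ID, so identical cards would all be stubbed. Even with the right IDs, stubbing only the GPU would likely not make this group viable while other devices in it stay on their host drivers.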

 

The error hints at the "iommu_group": how does one make sure every device in that group is bound to the vfio driver?
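A first step is seeing what the group actually contains. Each group appears as a numbered directory under `/sys/kernel/iommu_groups/`, with one entry per member device in its `devices/` subdirectory. A minimal Python sketch under that assumption; the function name `iommu_groups` and its `root` parameter are hypothetical, the latter only there so the code can be pointed at a copy of the tree.

```python
import os


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted list of device addresses in it."""
    groups = {}
    for name in os.listdir(root):
        # Each group is a numbered directory whose devices/ entries are named by PCI address.
        groups[int(name)] = sorted(os.listdir(os.path.join(root, name, "devices")))
    return dict(sorted(groups.items()))
```

Looping over `iommu_groups().items()` and printing each group number with its devices gives a listing comparable to the IOMMU groups posted further down this thread.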

 

XML:

<domain type='kvm' id='2' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>Win8</name>
  <uuid>9588dff4-d597-4142-e498-60a388f0ca43</uuid>
  <metadata>
    <vmtemplate name="Custom" icon="windows.png" os="windows"/>
  </metadata>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
    <vcpupin vcpu='4' cpuset='4'/>
    <vcpupin vcpu='5' cpuset='5'/>
    <vcpupin vcpu='6' cpuset='6'/>
    <vcpupin vcpu='7' cpuset='7'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.2'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='8' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='writeback'/>
      <source file='/mnt/cache/VM/Win8/Win8.qcow2'/>
      <backingStore/>
      <target dev='hda' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:00:00:00'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/Win8.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc52b'/>
        <address bus='1' device='2'/>
      </source>
      <alias name='hostdev0'/>
    </hostdev>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-device'/>
    <qemu:arg value='ioh3420,bus=pci.0,addr=1c.0,multifunction=on,port=2,chassis=1,id=root.1'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='vfio-pci,host=03:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='vfio-pci,host=03:00.1,bus=root.1,addr=00.1'/>
  </qemu:commandline>
</domain>
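In this XML the GPU is not attached through a libvirt `<hostdev>` element; it is handed to QEMU directly via the `<qemu:commandline>` arguments, so the `host=` values there are the devices whose IOMMU groups must be viable. A minimal Python sketch using the standard library's `xml.etree.ElementTree` that pulls those addresses out of a domain XML string; the function name `vfio_hosts` is hypothetical. ElementTree only finds the `qemu:` elements when the namespace URI declared on `<domain>` is spelled out.

```python
import xml.etree.ElementTree as ET

# Namespace URI declared on <domain> for libvirt's QEMU command-line passthrough.
QEMU_NS = "{http://libvirt.org/schemas/domain/qemu/1.0}"


def vfio_hosts(domain_xml):
    """Return the host= address of every vfio-pci device passed in <qemu:commandline>."""
    root = ET.fromstring(domain_xml)
    hosts = []
    for arg in root.iterfind(f"{QEMU_NS}commandline/{QEMU_NS}arg"):
        value = arg.get("value", "")
        if not value.startswith("vfio-pci,"):
            continue  # skips "-device" itself and non-vfio devices such as ioh3420
        for option in value.split(",")[1:]:
            key, _, val = option.partition("=")
            if key == "host":
                hosts.append(val)
    return hosts
```

For the Win8 domain above this returns `["03:00.0", "03:00.1"]`, the two functions of the GTX 960.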


Thank you for your assistance!

The motherboard is an ASRock Z87 Extreme9/ac with an i7-4771 CPU. IOMMU is enabled.

PCI Devices

00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)

00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)

00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)

00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-V (rev 05)

00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05)

00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 05)

00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d5)

00:1c.4 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #5 (rev d5)

00:1c.5 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #6 (rev d5)

00:1c.6 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #7 (rev d5)

00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05)

00:1f.0 ISA bridge: Intel Corporation Z87 Express LPC Controller (rev 05)

00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05)

00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 05)

01:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ba)

02:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ba)

02:09.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ba)

02:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ba)

02:11.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ba)

03:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1)

03:00.1 Audio device: NVIDIA Corporation Device 0fba (rev a1)

05:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SX7042 PCI-e 4-port SATA-II (rev 02)

72:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01)

73:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01)

74:00.0 PCI bridge: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch (rev aa)

75:01.0 PCI bridge: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch (rev aa)

75:02.0 PCI bridge: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch (rev aa)

75:03.0 PCI bridge: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch (rev aa)

76:00.0 Network controller: Broadcom Corporation BCM4352 802.11ac Wireless Network Adapter (rev 03)

77:00.0 USB controller: VIA Technologies, Inc. VL80x xHCI USB 3.0 Controller (rev 03)

78:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)

 

IOMMU Groups

/sys/kernel/iommu_groups/0/devices/0000:00:00.0

/sys/kernel/iommu_groups/1/devices/0000:00:01.0

/sys/kernel/iommu_groups/1/devices/0000:01:00.0

/sys/kernel/iommu_groups/1/devices/0000:02:08.0

/sys/kernel/iommu_groups/1/devices/0000:02:09.0

/sys/kernel/iommu_groups/1/devices/0000:02:10.0

/sys/kernel/iommu_groups/1/devices/0000:02:11.0

/sys/kernel/iommu_groups/1/devices/0000:03:00.0

/sys/kernel/iommu_groups/1/devices/0000:03:00.1

/sys/kernel/iommu_groups/1/devices/0000:05:00.0

/sys/kernel/iommu_groups/2/devices/0000:00:02.0

/sys/kernel/iommu_groups/3/devices/0000:00:03.0

/sys/kernel/iommu_groups/4/devices/0000:00:14.0

/sys/kernel/iommu_groups/5/devices/0000:00:16.0

/sys/kernel/iommu_groups/6/devices/0000:00:19.0

/sys/kernel/iommu_groups/7/devices/0000:00:1a.0

/sys/kernel/iommu_groups/8/devices/0000:00:1b.0

/sys/kernel/iommu_groups/9/devices/0000:00:1c.0

/sys/kernel/iommu_groups/10/devices/0000:00:1c.4

/sys/kernel/iommu_groups/11/devices/0000:00:1c.5

/sys/kernel/iommu_groups/12/devices/0000:00:1c.6

/sys/kernel/iommu_groups/13/devices/0000:00:1d.0

/sys/kernel/iommu_groups/14/devices/0000:00:1f.0

/sys/kernel/iommu_groups/14/devices/0000:00:1f.2

/sys/kernel/iommu_groups/14/devices/0000:00:1f.3

/sys/kernel/iommu_groups/15/devices/0000:72:00.0

/sys/kernel/iommu_groups/16/devices/0000:73:00.0

/sys/kernel/iommu_groups/17/devices/0000:74:00.0

/sys/kernel/iommu_groups/18/devices/0000:75:01.0

/sys/kernel/iommu_groups/19/devices/0000:75:02.0

/sys/kernel/iommu_groups/20/devices/0000:75:03.0

/sys/kernel/iommu_groups/21/devices/0000:76:00.0

/sys/kernel/iommu_groups/22/devices/0000:77:00.0

/sys/kernel/iommu_groups/23/devices/0000:78:00.0

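Reading the listing by hand, group 1 holds the GPU (03:00.0), its audio function (03:00.1), the CPU's PCIe x16 root port (00:01.0), the PLX PEX 8747 switch bridges, and the Marvell controller at 05:00.0. A minimal Python sketch that does the same lookup from pasted listing text, assuming the `/sys/kernel/iommu_groups/<group>/devices/<address>` line format shown above; `group_mates` is a hypothetical name.

```python
import re

# Matches lines such as /sys/kernel/iommu_groups/1/devices/0000:03:00.0
LINE_RE = re.compile(r"/sys/kernel/iommu_groups/(\d+)/devices/(\S+)")


def group_mates(listing, address):
    """Return the devices that share an IOMMU group with `address` in a pasted listing."""
    groups = {}
    for match in LINE_RE.finditer(listing):
        groups.setdefault(int(match.group(1)), []).append(match.group(2))
    for members in groups.values():
        if address in members:
            return [m for m in members if m != address]
    raise KeyError(f"{address} not found in listing")
```

With the listing above, `group_mates(listing, "0000:03:00.0")` returns the eight other members of group 1, including `0000:05:00.0`.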

So here's the bad news: your GPU is in an IOMMU group with your Marvell SATA controller. This is the worst-case scenario, because if you use PCIe ACS Override, peer-to-peer DMA could theoretically occur by accident between those devices. If it's SATA to GPU, it's not a huge deal (worst case, your VM locks up or something), but if it's GPU to SATA, well, silent corruption is possible IF the device isn't well behaved.

The issue is that we have no way of detecting whether a device is well behaved, which is why we state that use of this override is experimental.

If you had two GPUs in an IOMMU group, breaking them up with the override would be more acceptable from a risk standpoint: there would be no chance of silent data corruption, just faulty graphics in the VM.

To put this in perspective, though: I have a test system (our AVS 10/4) that has this setting enabled, and its GPU shares a group with the onboard LSI storage controller. No data corruption has occurred on that system over the last 6 months (as far as we can tell).

So what can you do? You can try moving the GPU to another PCIe slot to see whether that puts it in its own group, separate from the storage, USB, and Ethernet controllers.
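After moving a card, the question is whether the GPU's new group still contains other endpoints. A minimal Python sketch, assuming plain `lspci` lines (address, then the device class, then a colon) and a list of group members such as those in the IOMMU listing above. As a simplifying heuristic of this sketch, not a kernel rule, it ignores devices whose class is `PCI bridge` as well as the GPU's own functions (same bus and device number), and reports everything else. The function names are hypothetical.

```python
def _short(addr):
    """Drop a leading PCI domain so sysfs ("0000:03:00.0") and lspci ("03:00.0") forms match."""
    return addr.split(":", 1)[1] if addr.count(":") == 2 else addr


def other_endpoints(group_members, lspci_lines, gpu_address):
    """Return group members that are neither PCI bridges nor functions of the GPU's own slot."""
    # Map "05:00.0" -> "SCSI storage controller" using the text before the first colon.
    classes = {}
    for line in lspci_lines:
        addr, _, rest = line.partition(" ")
        classes[addr] = rest.split(":", 1)[0]

    gpu_slot = _short(gpu_address).rsplit(".", 1)[0]  # "03:00.0" -> "03:00"
    flagged = []
    for member in group_members:
        short = _short(member)
        if short.rsplit(".", 1)[0] == gpu_slot:
            continue  # another function of the GPU itself, e.g. its audio device
        if classes.get(short) == "PCI bridge":
            continue  # heuristic: treat bridges as part of the path, not as conflicts
        flagged.append(member)
    return flagged
```

Fed the current group 1 members and the lspci lines above, this reports only the Marvell controller (`0000:05:00.0`); an empty list after a slot change would suggest the GPU no longer shares its group with a storage, USB, or Ethernet controller.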


I guess my only option here, short of a new motherboard, is to get all the non-video cards out of the PCIe slots. I've got three drives on the Marvell card; I'll move their data off and then take them out of the array to use as replacement drives in the event another drive fails.


No worries! It turns out there's only one drive on the Marvell card; the other two were for the front hot-swap bays that I used for preclearing, so I only have to take one drive out. It's just a matter of time to move all the data off to the cache drive, let the mover put the data back on the other drives (after flagging the shares to exclude the drive being removed), and rebuild parity.

This is actually good news: it means I can put three graphics cards in there for VMs at the cost of just one drive.

 

 


Now that's what I call math I can get behind!!!
