Unstable GPU Passthrough on 2nd PCI-E slot (Windows 10)



Here are my system specs:

 

Unraid 6.2.4

i5-6500

16GB DDR4 RAM

Gigabyte GA-H170M-D3H (F21 BIOS - Latest)

NVIDIA GTX 1080 (Slot 1)

NVIDIA GTX 950 (Slot 2)

 

Whenever I run a VM and pass through the GTX 950, the VM eventually freezes (after a couple of minutes) and this error is displayed in the syslog:

 

Jan 12 14:02:00 Tower kernel: pcieport 0000:00:1c.4: AER: Device recovery failed
Jan 12 14:02:00 Tower kernel: pcieport 0000:00:1c.4: AER: Multiple Uncorrected (Non-Fatal) error received: id=00e4
Jan 12 14:02:00 Tower kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=00e4(Requester ID)
Jan 12 14:02:00 Tower kernel: pcieport 0000:00:1c.4:   device [8086:a114] error status/mask=00100000/00000000
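
For anyone following along: per the device list, 00:1c.4 is one of the chipset's PCIe root ports. These two plain lspci commands (nothing unRAID-specific) show the port's details and which card sits behind it:

lspci -vv -s 00:1c.4     # root port details, link status, AER capabilities
lspci -tv                # tree view showing which device hangs off that port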

 

Any thoughts/ideas/suggestions?


Have you checked for any BIOS updates for your motherboard?

 

Yup. Updated to the latest.

 

Can you change the PCIe speed in your BIOS to Gen1 for the slot the GTX 950 is in?

You can read more here

 

I don't know if it will work, but it's worth a try.
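
If the BIOS doesn't expose anything useful, you can at least check what link speed the slot actually negotiated, from the unRAID console (replace the address with the 950's from lspci):

lspci -vv -s 05:00.0 | grep -iE 'lnkcap:|lnksta:'    # capability vs. negotiated speed/width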

 

Welp, I couldn't see an option exactly like the one mentioned in the site you linked, but I did try changing the Max Link Speed from Auto to Gen1, and no dice.

 

I do have a macOS Sierra VM that seems to work correctly with this card in the 2nd slot. Any idea why a Windows VM would not? It should also be noted that I copied the .img from the VM that runs my GTX 1080 and then changed the passthrough to the GTX 950; that copy is what I am running when this inevitably fails.

 

Here is my XML for Windows 10:

<domain type='kvm'>
  <name>Gaming (2)</name>
  <uuid>b4941539-29d3-279b-c282-afcfe64c2985</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>6291456</memory>
  <currentMemory unit='KiB'>6291456</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.5'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/b4941539-29d3-279b-c282-afcfe64c2985_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor id='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='2' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/cache/vdisks/Gaming/vdisk2.img'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.118-2.iso'/>
      <target dev='hdb' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:fe:25:0b'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <source mode='connect'/>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x062a'/>
        <product id='0x4101'/>
      </source>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
</domain>

 

And here is my XML for the macOS Sierra VM:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>macOS Sierra</name>
  <uuid>102df743-14f6-13f8-a269-0d4ec54538d4</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Ubuntu" icon="ubuntu.png" os="ubuntu"/>
  </metadata>
  <memory unit='KiB'>12582912</memory>
  <currentMemory unit='KiB'>12582912</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-2.5'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/102df743-14f6-13f8-a269-0d4ec54538d4_VARS-pure-efi.fd</nvram>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Penryn</model>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/cache/vdisks/macOS Sierra/vdisk1.img'/>
      <target dev='hda' bus='sata'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <model name='i82801b11-bridge'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='2'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='3' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='3'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x02' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:51:66:48'/>
      <source bridge='br0'/>
      <model type='e1000-82545em'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x03' function='0x0'/>
    </interface>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x03' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x1f' function='0x3'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x04' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1532'/>
        <product id='0x002e'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1b1c'/>
        <product id='0x1b07'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x0a5c'/>
        <product id='0x21e8'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x2357'/>
        <product id='0x0105'/>
      </source>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='none' model='none'/>
  <qemu:commandline>
    <qemu:arg value='-device'/>
    <qemu:arg value='usb-kbd'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='usb-mouse'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='isa-applesmc,osk=ourhardworkbythesewordsguardedpleasedontsteal(c)AppleComputerInc'/>
    <qemu:arg value='-smbios'/>
    <qemu:arg value='type=2'/>
    <qemu:arg value='-cpu'/>
    <qemu:arg value='Penryn,vendor=GenuineIntel'/>
  </qemu:commandline>
</domain>

 

Also, here are my system devices:

00:00.0 Host bridge [0600]: Intel Corporation Skylake Host Bridge/DRAM Registers [8086:191f] (rev 07)
00:01.0 PCI bridge [0604]: Intel Corporation Skylake PCIe Controller (x16) [8086:1901] (rev 07)
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 530 [8086:1912] (rev 06)
00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller [8086:a12f] (rev 31)
00:14.2 Signal processing controller [1180]: Intel Corporation Sunrise Point-H Thermal subsystem [8086:a131] (rev 31)
00:16.0 Communication controller [0780]: Intel Corporation Sunrise Point-H CSME HECI #1 [8086:a13a] (rev 31)
00:17.0 SATA controller [0106]: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] [8086:a102] (rev 31)
00:1b.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Root Port #19 [8086:a169] (rev f1)
00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #3 [8086:a112] (rev f1)
00:1c.4 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #5 [8086:a114] (rev f1)
00:1d.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #9 [8086:a118] (rev f1)
00:1d.4 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #13 [8086:a11c] (rev f1)
00:1f.0 ISA bridge [0601]: Intel Corporation Sunrise Point-H LPC Controller [8086:a144] (rev 31)
00:1f.2 Memory controller [0580]: Intel Corporation Sunrise Point-H PMC [8086:a121] (rev 31)
00:1f.3 Audio device [0403]: Intel Corporation Sunrise Point-H HD Audio [8086:a170] (rev 31)
00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-H SMBus [8086:a123] (rev 31)
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-V [8086:15b8] (rev 31)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
02:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge [1b21:1080] (rev 04)
05:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM206 [GeForce GTX 950] [10de:1402] (rev a1)
05:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:0fba] (rev a1)

 

And IOMMU Groups:

/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/3/devices/0000:00:14.0
/sys/kernel/iommu_groups/3/devices/0000:00:14.2
/sys/kernel/iommu_groups/4/devices/0000:00:16.0
/sys/kernel/iommu_groups/5/devices/0000:00:17.0
/sys/kernel/iommu_groups/6/devices/0000:00:1b.0
/sys/kernel/iommu_groups/6/devices/0000:02:00.0
/sys/kernel/iommu_groups/7/devices/0000:00:1c.0
/sys/kernel/iommu_groups/7/devices/0000:00:1c.4
/sys/kernel/iommu_groups/7/devices/0000:05:00.0
/sys/kernel/iommu_groups/7/devices/0000:05:00.1
/sys/kernel/iommu_groups/8/devices/0000:00:1d.0
/sys/kernel/iommu_groups/8/devices/0000:00:1d.4
/sys/kernel/iommu_groups/9/devices/0000:00:1f.0
/sys/kernel/iommu_groups/9/devices/0000:00:1f.2
/sys/kernel/iommu_groups/9/devices/0000:00:1f.3
/sys/kernel/iommu_groups/9/devices/0000:00:1f.4
/sys/kernel/iommu_groups/10/devices/0000:00:1f.6

 

The GTX 950 is what I am having trouble with.


I see you are not passing through the GPU's audio device; try that first.
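
On this card the audio is function 1 of the same PCI device (05:00.1 in your device list), so it would be a second hostdev entry alongside the existing one. Something along these lines; the guest-side address is just an example of a free slot:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x05' slot='0x00' function='0x1'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</hostdev>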

The problem looks to be your PCIe root port. If the above doesn't work, you could try enabling the ACS override, but it most likely won't help.

 

Download DDU and clean out the drivers from the 1080 and see if that makes any difference.

 



I tried passing through the sound device as well; that did not work.

 

What is interesting is that I tried DDU and then switched the passthrough to the GTX 950. The VM booted fine and stayed up for as long as I wanted, until I installed the drivers for the display adapter. As soon as I installed any driver (NVIDIA's or Microsoft's), the VM crashed again with the same errors. Any thoughts?



Have you tried SeaBIOS instead of OVMF for your Windows install? I could not get drivers installed for several cards with OVMF and Windows 10 (the VM, and occasionally the whole server, would crash during driver install). Switched to SeaBIOS, and the cards installed without issue...



Tried a brand new installation with SeaBIOS instead of OVMF: same problem and same error while installing the NVIDIA drivers. I also tried newer virtio drivers, with the same result.



Too bad :(.

 

Here are some shots in the dark (I'm a firm believer in the "try it all" technique for isolating a problem; more than likely these won't be the solution, but it doesn't hurt to rule them out):

 

1. Have you tried dumping the VBIOS and then passing it in the VM's XML? This sometimes helps with cards that act a bit flaky (there's a sketch of the XML at the end of this post).

Here's a great YouTube video by gridrunner that walks you through it: https://www.youtube.com/embed/mM7ntkiUoPk

(Note: sorry about the YouTube embed link; unRAID seems to eat any other YouTube link I include...)

 

2. Try adding xvga='yes' to your hostdev tag for the card:

<hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>

 

Honestly, I'm not entirely sure that's necessary. I see it sometimes, other times I don't; I've googled whether it's needed and can't find an answer, but at this point it's worth a shot.

 

3. Hyper-V off/on. Have you tried both? Since 6.2 I don't believe it's an issue with NVIDIA GPUs, but since we're trying everything, I thought I'd throw it in there.

 

If you have the time, it wouldn't hurt to try all of these in your OVMF as well as your SeaBIOS VM...

 

Good luck! That's the extent of my pitiful VM knowledge :)
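
P.S. For point 1, once you have a ROM file, wiring it in is just a <rom> element inside the card's hostdev. Something like this, where the .rom path is hypothetical (put the dump wherever you like and point the file attribute at it):

<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
  </source>
  <rom file='/mnt/user/isos/gtx950.rom'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</hostdev>

QEMU then hands that ROM image to the guest instead of reading it from the live card.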

 



Well, I tried those steps and had the same result every time. I think this is related to my motherboard.

 

I actually swapped the positions of the GTX 1080 and the GTX 950 and got the exact same result from the GTX 1080.

 

I know the 2nd PCIe slot runs at x4, not x16, but the board is advertised as CrossFire-capable... which makes me think there should be no issue getting the other slot to work.

 

If I were to purchase another motherboard (one with 2 PCIe x16 slots), would it be reasonable to assume this problem would go away?



That definitely sounds like a shortcoming of the mobo... A couple more thoughts before you invest more $$:

 

1. You said you switched from Auto to Gen1 in your BIOS. The manual for that board shows Gen2 and Gen3 are options as well. Can you try setting it explicitly to Gen3 (and then Gen2 if no luck)?

 

2. Have you tried resetting your BIOS to defaults? (Then make sure to go back through and tweak the settings you want. If you're not sure, take a picture of every screen before the reset and refer to the pictures afterwards; this will also let you spot any unusual pre-reset settings.) Could the voltage for the 2nd PCIe slot be set to something other than Auto? (I usually leave all my voltages on Auto unless I'm messing around with overclocking... which I haven't done in a long time :))

 

3. Finally, you mentioned that your macOS VM has no issues with that slot and the GTX 950. Why not move the GTX 1080 to the second slot and map it to your macOS install, and use the first slot for Windows and the GTX 950? And if the GTX 1080 doesn't work there, use the GTX 950 for macOS (since you know it has no issues in that slot) and the 1080 for Windows in the first slot.

 

As far as a new motherboard... well... there are no guarantees that this is a lane-width issue. My understanding is that a video card will usually work in an x4, x8, or x16 slot (assuming the slot is the correct length); it just won't be as fast as it could be if it's designed for x16 and operating at x4. That said, more than likely a new board won't have these issues, though I would try to source a board that someone else has successfully used with multiple VGA cards and VMs...

 

Also, you mention CrossFire. Keep in mind that CrossFire only supports AMD cards, not NVIDIA... not that it should really matter, as you are not attempting to use CrossFire/SLI...

 


Hi, I hope I'm not jumping into this without having much clue, but to my understanding the GPU cards should be in separate IOMMU groups for passthrough.

 

But I see the 950 is in group 7 along with other devices. I'm not sure whether it's normal for those PCI devices to be bundled into the same group.

 

Also, I see that the 1080 shares its group with another device (group 1).

 

As I remember, on my ASUS X99-M WS mATX motherboard, my 550 Ti and 1060 are in totally separate groups, without any other devices.
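
If it helps, this little loop (a standard sysfs walk, nothing unRAID-specific) prints every group with readable device names, which makes the grouping easy to eyeball:

for dev in /sys/kernel/iommu_groups/*/devices/*; do
  group="${dev#/sys/kernel/iommu_groups/}"; group="${group%%/*}"
  printf 'group %2s: ' "$group"
  lspci -nns "${dev##*/}"
done | sort -V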

 

-d

 

 



I think you might be on to something...

 

The GTX 1080 is in slot 1, which according to the system devices hangs off the Intel Skylake PCIe Controller (x16).

 

The GTX 950 is in slot 2, which according to the system devices sits behind either Sunrise Point-H PCI Express Root Port #3 or Root Port #5.

 

Why does my motherboard not have two Skylake PCIe Controllers (x16)? Or at least another one that says Skylake PCIe Controller (x4) or something?


Hi,

 

The available PCIe lanes come from the processor and the chipset.

 

To my understanding, the H170 platform can allocate 16 lanes from the CPU (to the GPU) and another 16 lanes from the chipset. But most probably the chipset lanes are not intended for a GPU; they serve other devices, like storage, etc.

 

In the case of H170, the CPU's 16 lanes apparently can only run as a single x16; they cannot be split into, say, 2x8. See this table:

https://www.pugetsystems.com/labs/articles/Z170-H170-H110-B170-Q150-Q170---What-is-the-Difference-635/

 

My suspicion: the GPU in slot 1 (closest to your CPU) most probably uses all of the CPU's lanes, and the other GPU does not work properly on the chipset's lanes.

 

The platform does not support SLI (not that you're trying to do SLI), and that might have something to do with this limitation.

 

Maybe others can step in with more clarifications, opinions, or personal experience. I think a single confirmation that somebody has successfully done 2-GPU passthrough on an H170 chipset would be reason enough for you to keep trying :)

 

What I would try next:

 

1) See if you can set both GPUs to Gen2 in the BIOS (if there is such an option) instead of Gen3/Auto. Maybe this will allocate only 8 lanes per GPU... worth trying.

 

2) Keep only one GPU, in the second slot. Does it pass through properly on its own? (This way we eliminate the possibility that slot 2 is faulty.)

 

3) Disable the onboard GPU, and make sure you do not boot into the unRAID GUI option; use a second PC to connect to the admin console, SSH, etc. Do you get different IOMMU groups?

 

 

Also, it may be worth trying the ROM dump method, but I'm not sure of the steps when you have an integrated GPU (I cannot try it, as I have X99). A rough sketch of the sysfs route is below.
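
In case anyone wants to try it, the sysfs route looks roughly like this; 05:00.0 is the 950 from the listing above, and the output path is just an example. It generally only works when the card was not used to boot the host, which is exactly where the integrated GPU helps:

cd /sys/bus/pci/devices/0000:05:00.0
echo 1 > rom                          # make the ROM BAR readable
cat rom > /mnt/user/isos/gtx950.rom   # copy the VBIOS out
echo 0 > rom                          # lock it again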

 

-d



It is your IOMMU groups that are the problem.

Enable the PCIe ACS Override: it is found in Settings > VM Manager once you toggle the advanced view. You will then need to reboot the server for it to take effect. After you have done this, please repost the IOMMU groups. Have a look here: http://lime-technology.com/forum/index.php?topic=53573

 



Here are my IOMMU groups after Enabling the PCIe ACS Override:

 

/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/3/devices/0000:00:14.0
/sys/kernel/iommu_groups/4/devices/0000:00:16.0
/sys/kernel/iommu_groups/5/devices/0000:00:17.0
/sys/kernel/iommu_groups/6/devices/0000:00:1b.0
/sys/kernel/iommu_groups/6/devices/0000:00:1b.3
/sys/kernel/iommu_groups/6/devices/0000:02:00.0
/sys/kernel/iommu_groups/7/devices/0000:00:1c.0
/sys/kernel/iommu_groups/7/devices/0000:00:1c.4
/sys/kernel/iommu_groups/7/devices/0000:06:00.0
/sys/kernel/iommu_groups/7/devices/0000:06:00.1
/sys/kernel/iommu_groups/8/devices/0000:00:1d.0
/sys/kernel/iommu_groups/9/devices/0000:00:1f.0
/sys/kernel/iommu_groups/9/devices/0000:00:1f.2
/sys/kernel/iommu_groups/9/devices/0000:00:1f.3
/sys/kernel/iommu_groups/9/devices/0000:00:1f.4
/sys/kernel/iommu_groups/10/devices/0000:00:1f.6
/sys/kernel/iommu_groups/11/devices/0000:01:00.0
/sys/kernel/iommu_groups/11/devices/0000:01:00.1

 

And devices:

00:00.0 Host bridge [0600]: Intel Corporation Skylake Host Bridge/DRAM Registers [8086:191f] (rev 07)
00:01.0 PCI bridge [0604]: Intel Corporation Skylake PCIe Controller (x16) [8086:1901] (rev 07)
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 530 [8086:1912] (rev 06)
00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller [8086:a12f] (rev 31)
00:16.0 Communication controller [0780]: Intel Corporation Sunrise Point-H CSME HECI #1 [8086:a13a] (rev 31)
00:17.0 SATA controller [0106]: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] [8086:a102] (rev 31)
00:1b.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Root Port #19 [8086:a169] (rev f1)
00:1b.3 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Root Port #20 [8086:a16a] (rev f1)
00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #3 [8086:a112] (rev f1)
00:1c.4 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #5 [8086:a114] (rev f1)
00:1d.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #9 [8086:a118] (rev f1)
00:1f.0 ISA bridge [0601]: Intel Corporation Sunrise Point-H LPC Controller [8086:a144] (rev 31)
00:1f.2 Memory controller [0580]: Intel Corporation Sunrise Point-H PMC [8086:a121] (rev 31)
00:1f.3 Audio device [0403]: Intel Corporation Sunrise Point-H HD Audio [8086:a170] (rev 31)
00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-H SMBus [8086:a123] (rev 31)
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-V [8086:15b8] (rev 31)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
02:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge [1b21:1080] (rev 04)
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM206 [GeForce GTX 950] [10de:1402] (rev a1)
06:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:0fba] (rev a1)


OK, so that has not worked for your 950; it's still in an IOMMU group with other devices. However, the 1080 is in its own group now.

Are there any other PCIe slots you can put your 950 in? It may then end up in its own group.

 

Nope, this board only has 2 PCIe slots. The 1st is x16 and the 2nd is x4 (according to the manual).

 

If I were to purchase this board: GIGABYTE GA-Z170XP-SLI -- do you think I could run 2 VMs at once, each with its own GPU passed through? I am starting to think this is a limitation of either my motherboard or the H170 chipset.



It's hard to know, to be honest; I haven't used that board myself. But I had a Skylake system last year using an ASRock Z170M Extreme4.

With that, I had an NVIDIA GTX 970 and a Radeon HD 6450 passed through at the same time. I did have to use the ACS override patch.

I like ASRock boards; they were recommended to me when I built my first unRAID server and I have used them ever since.


Most probably you missed my post with the suggestions to try next; it is the last reply on the first page of this topic...

 


