
Can't get second GPU working on second VM


deivis163


Hello,

I have an MSI Z170A PC MATE motherboard, an Intel i7-6700 CPU, one NVIDIA GTX 960 and one NVIDIA GTX 650.

My motherboard has only two PCI Express slots: one runs at x16 and the other at x4. The problem is with the x4 slot: the GPU installed in it won't start.

I can see that this GPU is in the same IOMMU group as other devices. How can I move it into an isolated group of its own?

//

PCI devices

 

00:00.0 Host bridge: Intel Corporation Sky Lake Host Bridge/DRAM Registers (rev 07)

00:01.0 PCI bridge: Intel Corporation Sky Lake PCIe Controller (x16) (rev 07)

00:02.0 VGA compatible controller: Intel Corporation Sky Lake Integrated Graphics (rev 06)

00:08.0 System peripheral: Intel Corporation Sky Lake Gaussian Mixture Model

00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller (rev 31)

00:14.2 Signal processing controller: Intel Corporation Sunrise Point-H Thermal subsystem (rev 31)

00:15.0 Signal processing controller: Intel Corporation Sunrise Point-H LPSS I2C Controller #0 (rev 31)

00:15.1 Signal processing controller: Intel Corporation Sunrise Point-H LPSS I2C Controller #1 (rev 31)

00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)

00:17.0 SATA controller: Intel Corporation Device a102 (rev 31)

00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #1 (rev f1)

00:1c.2 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #3 (rev f1)

00:1c.4 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1)

00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #9 (rev f1)

00:1d.2 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #11 (rev f1)

00:1d.3 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #12 (rev f1)

00:1e.0 Signal processing controller: Intel Corporation Sunrise Point-H LPSS UART #0 (rev 31)

00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC Controller (rev 31)

00:1f.2 Memory controller: Intel Corporation Sunrise Point-H PMC (rev 31)

00:1f.3 Audio device: Intel Corporation Sunrise Point-H HD Audio (rev 31)

00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)

01:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1)

01:00.1 Audio device: NVIDIA Corporation Device 0fba (rev a1)

03:00.0 USB controller: ASMedia Technology Inc. Device 1242

04:00.0 VGA compatible controller: NVIDIA Corporation GK107 [GeForce GTX 650] (rev a1)

04:00.1 Audio device: NVIDIA Corporation GK107 HDMI Audio Controller (rev a1)

06:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 03)

08:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)

 

IOMMU Groups

 

/sys/kernel/iommu_groups/0/devices/0000:00:00.0

/sys/kernel/iommu_groups/1/devices/0000:00:01.0

/sys/kernel/iommu_groups/2/devices/0000:00:02.0

/sys/kernel/iommu_groups/3/devices/0000:00:08.0

/sys/kernel/iommu_groups/4/devices/0000:00:14.0

/sys/kernel/iommu_groups/4/devices/0000:00:14.2

/sys/kernel/iommu_groups/5/devices/0000:00:15.0

/sys/kernel/iommu_groups/5/devices/0000:00:15.1

/sys/kernel/iommu_groups/6/devices/0000:00:16.0

/sys/kernel/iommu_groups/7/devices/0000:00:17.0

/sys/kernel/iommu_groups/8/devices/0000:00:1c.0

/sys/kernel/iommu_groups/9/devices/0000:00:1c.2

/sys/kernel/iommu_groups/9/devices/0000:00:1c.4

/sys/kernel/iommu_groups/9/devices/0000:03:00.0

/sys/kernel/iommu_groups/9/devices/0000:04:00.0

/sys/kernel/iommu_groups/9/devices/0000:04:00.1

/sys/kernel/iommu_groups/10/devices/0000:00:1d.0

/sys/kernel/iommu_groups/11/devices/0000:00:1d.2

/sys/kernel/iommu_groups/11/devices/0000:00:1d.3

/sys/kernel/iommu_groups/11/devices/0000:06:00.0

/sys/kernel/iommu_groups/11/devices/0000:08:00.0

/sys/kernel/iommu_groups/12/devices/0000:00:1e.0

/sys/kernel/iommu_groups/13/devices/0000:00:1f.0

/sys/kernel/iommu_groups/13/devices/0000:00:1f.2

/sys/kernel/iommu_groups/13/devices/0000:00:1f.3

/sys/kernel/iommu_groups/13/devices/0000:00:1f.4

/sys/kernel/iommu_groups/14/devices/0000:01:00.0

/sys/kernel/iommu_groups/14/devices/0000:01:00.1
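
For reference, a listing like the one above can be produced from sysfs. This is only a minimal sketch, assuming the standard /sys/kernel/iommu_groups layout; the function name list_iommu_groups is made up for illustration.

```sh
# Print "group <N>: <PCI address>" for every device in every IOMMU group.
# $1 is the IOMMU groups directory; it defaults to the standard sysfs path.
# The body runs in a subshell, so its variables do not leak into the caller.
list_iommu_groups() (
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # nothing matched: IOMMU off or unsupported
        group="${dev#"$root"/}"        # drop the root prefix
        group="${group%%/*}"           # keep only the group number
        echo "group $group: ${dev##*/}"
    done | sort -t' ' -k2,2n -k3,3     # order by group number, then address
)
```

Called with no argument on the host, it reads the live groups. In the listing above, group 9 holds two root ports, the ASMedia USB controller (03:00.0) and both functions of the GTX 650.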

 

Error when starting the VM:

internal error: early end of file from monitor: possible problem:

2016-06-02T14:56:00.315917Z qemu-system-x86_64: -device vfio-pci,host=04:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: vfio: error, group 9 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.

2016-06-02T14:56:00.315930Z qemu-system-x86_64: -device vfio-pci,host=04:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: vfio: failed to get group 9

2016-06-02T14:56:00.315937Z qemu-system-x86_64: -device vfio-pci,host=04:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: Device initialization failed

2016-06-02T14:56:00.315943Z qemu-system-x86_64: -device vfio-pci,host=04:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on: Device 'vfio-pci' could not be initialized
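
The error says every device in group 9 must be bound to a VFIO driver. A rough way to see which group members are still held by a host driver is to read each device's driver symlink in sysfs. This is a sketch that assumes the usual sysfs layout; group_drivers is a hypothetical helper name. As far as I know, PCI bridges such as 00:1c.2 and 00:1c.4 do not need to be rebound, so the lines to look at are the endpoint devices.

```sh
# For each device in IOMMU group $1, print its PCI address and bound driver.
# $2 is the sysfs root (defaults to /sys).
group_drivers() (
    group="$1"
    sys="${2:-/sys}"
    for dev in "$sys/kernel/iommu_groups/$group/devices"/*; do
        [ -e "$dev" ] || continue
        if [ -L "$dev/driver" ]; then
            # "driver" is a symlink to the driver directory; keep its last component.
            drv=$(basename "$(readlink "$dev/driver")")
        else
            drv="(none)"
        fi
        echo "${dev##*/} $drv"
    done
)
```

On the host, `group_drivers 9` would list the members of the group from the error message. Any endpoint that shows a driver other than vfio-pci is a candidate for what makes the group "not viable".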

 

Could someone advise how I can solve this issue?

Link to comment

Your second card is in the same IOMMU group as other devices.

Try enabling the PCIe ACS Override option.

 

You can find it here:

Settings -> VM Manager -> Enable PCIe ACS Override

Set it to enabled.

 

Then reboot your UNRAID system.
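
To check after the reboot whether the override actually reached the kernel, one option is to look for the parameter on the kernel command line. This sketch assumes the GUI switch works by adding a pcie_acs_override=... parameter to the boot line (I believe the value is downstream, but treat that as an assumption); acs_override_setting is a made-up helper name.

```sh
# Print the value of pcie_acs_override from a kernel command line, if present.
# $1 is the file to read (defaults to /proc/cmdline). Prints nothing if unset.
acs_override_setting() (
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^pcie_acs_override=//p'
)
```

If it prints nothing, the option did not take effect and the IOMMU groups will look the same as before.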

 

Link to comment

Thanks for the reply.

I tried enabling this option, but nothing changed after the reboot. I think it would only help if both GPUs were in the same group. I think I need to manually change the IOMMU group of my NVIDIA GTX 650, but I don't know how to do it. I hope someone can suggest something I can try. I saw in other posts that this is not so easy on the Skylake platform, but I hope it is possible.

Link to comment

Your Intel USB 3.0 controller should stay enabled:

00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller (rev 31)

 

You need to deactivate this one:

03:00.0 USB controller: ASMedia Technology Inc. Device 1242

 

Plug your USB flash drive into a port on the Intel USB 3.0 controller.

That might solve it.

Link to comment

I checked the BIOS menu. I can disable USB there, but that disables all USB controllers; there is no option to disable only one. Is it possible to move that controller to another group, or to disable it from the UNRAID command line?

Link to comment

OK. Check the ID of the USB controller with this command:

 

lspci -n 

 

 

Then add this to syslinux.cfg:

 

vfio-pci.ids=10de:1381

 

Replace 10de:1381 with your device ID and reboot your system.
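
Note that 10de:1381 in the example is only a placeholder (10de is NVIDIA's vendor ID); the ID to use is that of the USB controller at 03:00.0. Here is a small sketch for pulling that ID out of the lspci -n output. pci_id_of is a hypothetical helper, and the append line in the comment is an assumed layout for syslinux.cfg, not copied from a real system.

```sh
# Read `lspci -n` output on stdin and print the vendor:device ID of the
# device at PCI address $1 (for example 03:00.0).
# `lspci -n` lines look like: "<address> <class>: <vendor>:<device> (rev xx)"
pci_id_of() {
    awk -v addr="$1" '$1 == addr { print $3 }'
}

# Intended use on the host:
#   lspci -n | pci_id_of 03:00.0
#
# The printed ID then goes on the kernel "append" line in syslinux.cfg,
# assumed here to look roughly like:
#   append vfio-pci.ids=<vendor:device> initrd=/bzroot
```

With that parameter set, vfio-pci claims the controller at boot, so no host driver holds a device in the GPU's IOMMU group.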

 

 

 

Link to comment

OMG it helped! Thank you so much.

 

But now I have one more problem: both of my VMs restart automatically because of interrupts.

I see these messages in the SSH console:

Message from syslogd@Tower at Jun  4 03:09:18 ...

kernel:Disabling IRQ #16

 

Message from syslogd@Tower at Jun  4 03:10:36 ...

kernel:Disabling IRQ #16

 

Message from syslogd@Tower at Jun  4 03:10:37 ...

kernel:Disabling IRQ #16

 

Message from syslogd@Tower at Jun  4 03:11:08 ...

kernel:Disabling IRQ #16

 

Message from syslogd@Tower at Jun  4 03:11:10 ...

kernel:Disabling IRQ #16

 

Message from syslogd@Tower at Jun  4 03:12:26 ...

kernel:Disabling IRQ #16

 

Message from syslogd@Tower at Jun  4 03:12:28 ...

kernel:Disabling IRQ #16

 

And here is my /proc/interrupts:

 

            CPU0      CPU1      CPU2      CPU3      CPU4      CPU5      CPU6      CPU7

  0:        54          0          0          0          0          0          0          0  IR-IO-APIC-edge      timer

  1:          3          0          0          0          0          0          0          0  IR-IO-APIC-edge      i8042

  5:          0          0          0          0          0          0          0          0  IR-IO-APIC-edge      parport0

  7:        40          0          0          0          0          0          0          0  IR-IO-APIC-edge

  8:        24          0          0          0          0          0          0          0  IR-IO-APIC-edge      rtc0

  9:          0          0          0          0          0          0          0          0  IR-IO-APIC-fasteoi  acpi

  12:          3          0          0          0          0          0          0          0  IR-IO-APIC-edge      i8042

  16:    6309442          0          0          0          0          0          0          0  IR-IO-APIC  16-fasteoi

120:          0          0          0          0          0          0          0          0  DMAR_MSI-edge      dmar0

121:          0          0          0          0          0          0          0          0  DMAR_MSI-edge      dmar1

123:      76849          0          0          0          0          0          0          0  IR-PCI-MSI-edge      xhci_hcd

124:    1045372          0          0          0          0          0          0          0  IR-PCI-MSI-edge      0000:00:17.0

125:    370711          0          0          0          0          0          0          0  IR-PCI-MSI-edge      eth0

NMI:          0          0          0          0          0          0          0          0  Non-maskable interrupts

LOC:    1377038    1151678    1022694    1006239    724426    840385    964953    1038576  Local timer interrupts

SPU:          0          0          0          0          0          0          0          0  Spurious interrupts

PMI:          0          0          0          0          0          0          0          0  Performance monitoring interrupts

IWI:        46          0          0          0          0          0          0          0  IRQ work interrupts

RTR:          0          0          0          0          0          0          0          0  APIC ICR read retries

RES:    756658    1516290    695308    1107540    888187    985263    773691    1076654  Rescheduling interrupts

CAL:      2345      2935      2045      2924      2948      2243      1948      1716  Function call interrupts

TLB:      15873      19224      19742      17211      10748      10689      8770      5506  TLB shootdowns

TRM:          0          0          0          0          0          0          0          0  Thermal event interrupts

THR:          0          0          0          0          0          0          0          0  Threshold APIC interrupts

MCE:          0          0          0          0          0          0          0          0  Machine check exceptions

MCP:        11        11        11        11        11        11        11        11  Machine check polls

HYP:          0          0          0          0          0          0          0          0  Hypervisor callback interrupts

ERR:        40

MIS:          0

 

Is this problem already known to you?
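
"Disabling IRQ #16" is what the kernel logs when an interrupt line keeps firing and no driver handles it, and the table above shows IRQ 16 with millions of interrupts and no handler name. One way to narrow it down is to see which PCI devices report that IRQ number in sysfs. This is a sketch that assumes the per-device irq attribute under /sys/bus/pci/devices; devices_on_irq is a made-up name, and devices using MSI will show a different number there.

```sh
# Print the PCI address of every device whose sysfs "irq" attribute equals $1.
# $2 is the sysfs root (defaults to /sys).
devices_on_irq() (
    irq="$1"
    sys="${2:-/sys}"
    for f in "$sys"/bus/pci/devices/*/irq; do
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "$irq" ]; then
            dev="${f%/irq}"            # strip the attribute name
            echo "${dev##*/}"          # keep only the PCI address
        fi
    done
)
```

On the host, `devices_on_irq 16` lists the candidates. If a passed-through GPU shares the line with another device, that pairing is a likely place to start looking.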

Link to comment

Archived

This topic is now archived and is closed to further replies.
