Pass through 2x RTX Titans (with nvlink) to single VM


TheSkaz


Does this mean anything useful with regard to my issue?

 

Sep 17 08:44:57 Tower kernel: vfio-pci 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
Sep 17 08:44:57 Tower kernel: Linux agpgart interface v0.103
Sep 17 08:44:57 Tower kernel: xhci_hcd 0000:01:00.2: remove, state 4
Sep 17 08:44:57 Tower kernel: usb usb2: USB disconnect, device number 1
Sep 17 08:44:57 Tower kernel: xhci_hcd 0000:01:00.2: USB bus 2 deregistered
Sep 17 08:44:57 Tower kernel: xhci_hcd 0000:01:00.2: remove, state 4
Sep 17 08:44:57 Tower kernel: usb usb1: USB disconnect, device number 1
Sep 17 08:44:57 Tower kernel: xhci_hcd 0000:01:00.2: USB bus 1 deregistered
Sep 17 08:44:57 Tower kernel: vfio-pci 0000:50:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
Sep 17 08:44:57 Tower kernel: xhci_hcd 0000:50:00.2: remove, state 4
Sep 17 08:44:57 Tower kernel: usb usb16: USB disconnect, device number 1
Sep 17 08:44:57 Tower kernel: xhci_hcd 0000:50:00.2: USB bus 16 deregistered
Sep 17 08:44:57 Tower kernel: xhci_hcd 0000:50:00.2: remove, state 4
Sep 17 08:44:57 Tower kernel: usb usb15: USB disconnect, device number 1
Sep 17 08:44:57 Tower kernel: xhci_hcd 0000:50:00.2: USB bus 15 deregistered
Sep 17 08:44:57 Tower kernel: nvidia: loading out-of-tree module taints kernel.
Sep 17 08:44:57 Tower kernel: nvidia: loading out-of-tree module taints kernel.
Sep 17 08:44:57 Tower kernel: nvidia: module license 'NVIDIA' taints kernel.
Sep 17 08:44:57 Tower kernel: nvidia: module license 'NVIDIA' taints kernel.
Sep 17 08:44:57 Tower kernel: Disabling lock debugging due to kernel taint
Sep 17 08:44:57 Tower kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 247
Sep 17 08:44:57 Tower kernel: vfio-pci 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
Sep 17 08:44:57 Tower kernel: vfio-pci 0000:50:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
Sep 17 08:44:57 Tower kernel: nvidia 0000:4e:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
Sep 17 08:44:57 Tower kernel: NVRM: The NVIDIA probe routine was not called for 2 device(s).
Sep 17 08:44:57 Tower kernel: NVRM: This can occur when a driver such as: 
Sep 17 08:44:57 Tower kernel: NVRM: nouveau, rivafb, nvidiafb or rivatv 
Sep 17 08:44:57 Tower kernel: NVRM: was loaded and obtained ownership of the NVIDIA device(s).
Sep 17 08:44:57 Tower kernel: NVRM: Try unloading the conflicting kernel module (and/or
Sep 17 08:44:57 Tower kernel: NVRM: reconfigure your kernel without the conflicting
Sep 17 08:44:57 Tower kernel: NVRM: driver(s)), then try loading the NVIDIA kernel module
Sep 17 08:44:57 Tower kernel: NVRM: again.
Sep 17 08:44:57 Tower kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  440.100  Fri May 29 08:45:51 UTC 2020
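
One way to read a log like this is to list which driver announced each PCI address. In the lines above, vfio-pci reports 0000:01:00.0 and 0000:50:00.0 while nvidia reports only 0000:4e:00.0, so the "probe routine was not called for 2 device(s)" message may simply reflect the two cards held by vfio-pci. The sketch below is a rough helper for that tally; the function name `drivers_by_device` and the regular expression are my own assumptions about the line layout, not part of any tool.

```python
import re
from collections import defaultdict

# Assumed layout: "... kernel: <driver> <dddd:bb:dd.f>: <message>"
# e.g. "Sep 17 08:44:57 Tower kernel: vfio-pci 0000:01:00.0: vgaarb: ..."
LINE_RE = re.compile(
    r"kernel: (?P<driver>[\w-]+) "
    r"(?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):"
)

def drivers_by_device(log_text):
    """Map each PCI address to the sorted list of drivers that logged about it."""
    seen = defaultdict(set)
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            seen[match.group("addr")].add(match.group("driver"))
    return {addr: sorted(drivers) for addr, drivers in seen.items()}
```

Calling `drivers_by_device(open("syslog.txt").read())` (the file name is a placeholder) returns a dict keyed by PCI address, which makes it easy to see at a glance whether a GPU is held by vfio-pci or by the nvidia module.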


I have the VM up and able to boot with both GPUs showing. In the VM logs for the machine, I am getting hundreds of these:

 

2020-09-22T06:21:28.221139Z qemu-system-x86_64: vfio_region_write(0000:01:00.0:region1+0x801b8, 0x0,8) failed: Device or resource busy

 

That is my primary video card for the system and one of the two GPUs for the VM. Anything that attempts to use the GPUs freezes.
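
With hundreds of these lines, it can help to count them per device and region to check whether every failure really comes from the same card. This is a minimal sketch that assumes each failure line looks like the one quoted above; `count_vfio_write_failures` and the pattern are hypothetical names of my own, not something QEMU provides.

```python
import re
from collections import Counter

# Assumed layout, taken from the single line quoted in this post:
# "... vfio_region_write(<addr>:region<N>+<offset>, <value>,<size>) failed: <error>"
FAIL_RE = re.compile(
    r"vfio_region_write\((?P<addr>[0-9a-f:.]+?):region(?P<region>\d+)"
    r"\+0x[0-9a-f]+,[^)]*\) failed: (?P<err>.+)$"
)

def count_vfio_write_failures(log_text):
    """Count failed vfio_region_write lines per (PCI address, region, error)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAIL_RE.search(line.rstrip())
        if match:
            key = (match.group("addr"), int(match.group("region")), match.group("err"))
            counts[key] += 1
    return counts
```

Running it over the VM log and printing `counts.most_common()` shows whether the errors are confined to 0000:01:00.0 region 1 or also involve the second GPU.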
