Libvirt: XML and PCI address assignment



Hi, I'm trying to understand how to assign a passed through device (vfio) to a predefined address (domain:bus:slot:function).

As far as I know, in libvirt we have to specify the source address of the device in the host and a target address for the guest.

So, for example:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </hostdev>

This will pass through the host device at 0000:00:1b.0 to 0000:00:02.0 in the guest (both written as domain:bus:slot.function).
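To double-check which host address maps to which guest address, the element can be parsed with Python's standard library. This is only a minimal sketch: the helper names `fmt_addr` and `hostdev_mapping` are made up for illustration, and it assumes the `<hostdev>` layout shown above.

```python
import xml.etree.ElementTree as ET

# The example <hostdev> from the post, unchanged.
HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
</hostdev>
"""

def fmt_addr(addr):
    """Format an <address> element as domain:bus:slot.function."""
    domain = int(addr.get("domain", "0x0"), 16)
    bus = int(addr.get("bus"), 16)
    slot = int(addr.get("slot"), 16)
    function = int(addr.get("function"), 16)
    return f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}"

def hostdev_mapping(hostdev):
    """Return (host_address, guest_address) for one <hostdev> element."""
    host = hostdev.find("./source/address")  # where the device sits on the host
    guest = hostdev.find("./address")        # direct child: address requested in the guest
    return fmt_addr(host), fmt_addr(guest)

host, guest = hostdev_mapping(ET.fromstring(HOSTDEV_XML))
print(f"host {host} -> guest {guest}")
```

With the element above this prints `host 0000:00:1b.0 -> guest 0000:00:02.0`.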

 

In my xml I have the following devices passed through:

    <hostdev mode='subsystem' type='pci' managed='yes'>  <!-- GPU VIDEO [GeForce GTX TITAN Black] [10de:100c] -->
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>  <!-- GPU AUDIO [10de:0e1a] -->
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>  <!-- MB AUDIO C600/X79 High Definition Audio Controller [8086:1d20] -->
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>  <!-- SATA Marvell 88SE9230 Controller [1b4b:9230] -->
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>  <!-- USB 3.0 Fresco Logic FL1100 Controller [1b73:1100] -->
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x84' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>  <!-- FIREWIRE VIA VT6315 Controller [1106:3403] -->
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>

 

Ok, everything shows up in the vm, everything works.

One could stop here, but since I like to understand things, I ran lspci -nn in the guest, and this is what it shows:

00:00.0 Host bridge [0600]: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller [8086:29c0] (subsys 1af4:1100)
00:01.0 PCI bridge [0604]: Red Hat, Inc. (null) [1b36:000c]
00:01.1 PCI bridge [0604]: Red Hat, Inc. (null) [1b36:000c]
00:01.2 PCI bridge [0604]: Red Hat, Inc. (null) [1b36:000c]
00:01.3 PCI bridge [0604]: Red Hat, Inc. (null) [1b36:000c]
00:01.4 PCI bridge [0604]: Red Hat, Inc. (null) [1b36:000c]
01:00.0 PCI bridge [0604]: Red Hat, Inc. (null) [1b36:000e]
00:01.5 PCI bridge [0604]: Red Hat, Inc. (null) [1b36:000c]
03:00.0 Communication controller [0780]: Red Hat, Inc (null) [1af4:1043] (rev 01) (subsys 1af4:1100)
00:02.0 Audio device [0403]: Intel Corporation C600/X79 series chipset High Definition Audio Controller [8086:1d20] (rev 06) (subsys 1043:84d8)
00:02.3 PCI bridge [0604]: Red Hat, Inc. (null) [1b36:000c]
04:00.0 FireWire (IEEE 1394) [0c00]: VIA Technologies, Inc. VT6315 Series Firewire Controller [1106:3403] (rev 01) (subsys 1106:3403)
05:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller [1b4b:9230] (rev 10) (subsys 1b4b:9230)
00:07.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 [8086:2934] (rev 03) (subsys 1af4:1100)
00:07.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 [8086:293a] (rev 03) (subsys 1af4:1100)
00:1f.0 ISA bridge [0601]: Intel Corporation 82801IB (ICH9) LPC Interface Controller [8086:2918] (rev 02) (subsys 1af4:1100)
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation (null) [10de:100c] (rev a1) (subsys 10de:0010)
00:1f.2 SATA controller [0106]: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] [8086:2922] (rev 02) (subsys 1af4:1100)
06:00.1 Audio device [0403]: NVIDIA Corporation GK110 HDMI Audio [10de:0e1a] (rev a1) (subsys 10de:1066)
02:01.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device [1af4:1000] (subsys 1af4:0001)
00:1f.3 SMBus [0c05]: Intel Corporation 82801I (ICH9 Family) SMBus Controller [8086:2930] (rev 02) (subsys 1af4:1100)
02:08.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device [1af4:1000] (subsys 1af4:0001)
08:00.0 USB controller [0c03]: Fresco Logic (null) [1b73:1100] (rev 10) (subsys 1b73:1100)

 

It doesn't correspond to what I assigned...

GPU (video) should be at 0000:03:00.0, instead it's at 0000:06:00.0

GPU (audio) should be at 0000:03:00.1, instead it's at 0000:06:00.1

Fresco USB should be at 0000:04:00.0, instead it's at 0000:08:00.0

Firewire should be at 0000:05:00.0 instead it's at 0000:04:00.0

I have also two network bridges which are at the wrong addresses.

Mainboard audio is correct, it's specified in the xml at 0000:00:02.0 and shows at 0000:00:02.0

Also the other bridges and virtual usb controller are correct.
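The comparison above was done by eye; it could also be scripted. The following is a rough sketch, not a finished tool: it matches devices by their vendor:device ID in the `lspci -nn` output and reports where the guest address differs from the one requested in the XML. The names `guest_addresses` and `mismatches` are hypothetical, the `EXPECTED` table is typed in by hand from the XML above, and the sketch assumes each vendor:device ID appears only once (which is not true for the two virtio network devices, so they are left out).

```python
import re

# A few lines copied from the guest's `lspci -nn` output above.
LSPCI_OUTPUT = """\
00:02.0 Audio device [0403]: Intel Corporation C600/X79 series chipset High Definition Audio Controller [8086:1d20] (rev 06) (subsys 1043:84d8)
04:00.0 FireWire (IEEE 1394) [0c00]: VIA Technologies, Inc. VT6315 Series Firewire Controller [1106:3403] (rev 01) (subsys 1106:3403)
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation (null) [10de:100c] (rev a1) (subsys 10de:0010)
06:00.1 Audio device [0403]: NVIDIA Corporation GK110 HDMI Audio [10de:0e1a] (rev a1) (subsys 10de:1066)
08:00.0 USB controller [0c03]: Fresco Logic (null) [1b73:1100] (rev 10) (subsys 1b73:1100)
"""

# Guest addresses requested in the XML, keyed by vendor:device ID.
EXPECTED = {
    "10de:100c": "03:00.0",  # GPU video
    "10de:0e1a": "03:00.1",  # GPU audio
    "8086:1d20": "00:02.0",  # mainboard audio
    "1b73:1100": "04:00.0",  # Fresco USB
    "1106:3403": "05:00.0",  # Firewire
}

# bus:slot.function at the start of the line, then the first [vvvv:dddd] group.
LINE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) .*?\[([0-9a-f]{4}:[0-9a-f]{4})\]")

def guest_addresses(lspci_text):
    """Map vendor:device ID -> bus:slot.function as reported by `lspci -nn`."""
    found = {}
    for line in lspci_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            found[m.group(2)] = m.group(1)
    return found

def mismatches(expected, actual):
    """Return {id: (expected_address, actual_address)} for every device that differs."""
    return {dev: (want, actual.get(dev))
            for dev, want in expected.items()
            if actual.get(dev) != want}

for dev, (want, got) in sorted(mismatches(EXPECTED, guest_addresses(LSPCI_OUTPUT)).items()):
    print(f"{dev}: expected {want}, found {got}")
```

For the data above it reports the GPU video, GPU audio, Fresco USB and Firewire devices, and stays silent about the mainboard audio.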

 

I know the Marvell SATA entry is wrong in the XML, since 0000:00:00.0 is reserved and hardcoded in QEMU; maybe libvirt automatically reassigns it to 0000:05:00.0.

 

So I modified the XML to use the addresses that showed up in the guest for every vfio device (for example, for the Marvell SATA controller I set <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0' multifunction='on'/>), but lspci (again) reported different addresses.

 

Is anyone able to explain?

Edited by ghost82

This post is UPDATED

 

I think I found the answer myself :D

 

This sentence was my first finding:

Quote

QEMU, and consequently libvirt, uses the bus property of a device's PCI address only to match it with the PCI controller that has the same index property, and not to set the actual PCI address, which is decided by the guest OS.

 

So at first I understood that it's the guest (the VM) that assigns PCI addresses, "regardless" of what's specified in the <address> line of the XML.

However we can define to which bus a device must connect.

 

This turned out not to be true: I was editing the XML the wrong way. After using "virsh edit vmname", which is the correct method to edit the XML of a virtual machine, everything is right and the addresses in the XML match those reported by lspci in the guest.

 

A Q35 machine has:

pcie-root

pcie-root-port

(pcie-to-pci-bridge)

 

Each of these entries has an index='xx' attribute.

I think we can think of them this way:

pcie-root for integrated devices (on the mainboard), such as built-in ethernet, built-in sata controllers, built-in usb controllers, etc.

pcie-root-port for third party devices, such as a pcie gpu, pcie controllers, pcie usb, sata controllers, etc. (mac os doesn't start if the sata controller is attached to bus 0x00)

pcie-to-pci-bridge for third party legacy pci devices.

 

Our vfio and virtual devices are attached to these entries.

For example, let's consider:

....
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
....
    <interface type='bridge'>
      <mac address='aa:bb:cc:11:22:33'/>
      <source bridge='br0'/>
      <model type='virtio-net'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </interface>

 

Here we have pcie-root at index=0, two pcie-root-port controllers at index=1 and index=2, and a virtual network card. The card is attached as a hotpluggable PCIe device because its address specifies bus 0x02, and bus 0x02 matches the controller with index=2, which is a pcie-root-port.
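The bus-to-index matching described here can be illustrated with a few lines of Python. This is a sketch under assumptions: the fragment is wrapped in a `<devices>` element as in a full domain XML, the `<model>` and `<target>` children of the controllers are left out for brevity, and the helper names `controller_models` and `parent_controller` are invented for the example.

```python
import xml.etree.ElementTree as ET

DEVICES_XML = """
<devices>
  <controller type='pci' index='0' model='pcie-root'/>
  <controller type='pci' index='1' model='pcie-root-port'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
  </controller>
  <controller type='pci' index='2' model='pcie-root-port'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
  </controller>
  <interface type='bridge'>
    <mac address='aa:bb:cc:11:22:33'/>
    <source bridge='br0'/>
    <model type='virtio-net'/>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
  </interface>
</devices>
"""

def controller_models(devices):
    """Map PCI controller index -> model, e.g. {0: 'pcie-root', 1: 'pcie-root-port'}."""
    return {int(c.get("index")): c.get("model")
            for c in devices.findall("./controller[@type='pci']")}

def parent_controller(device, controllers):
    """Return (index, model) of the controller whose index equals the device's bus.

    Only meant for devices that carry an <address type='pci'> child.
    """
    bus = int(device.find("./address[@type='pci']").get("bus"), 16)
    return bus, controllers.get(bus)

devices = ET.fromstring(DEVICES_XML)
controllers = controller_models(devices)

nic = devices.find("./interface")
print("network card ->", parent_controller(nic, controllers))

root_port = devices.find("./controller[@index='2']")
print("root port 2  ->", parent_controller(root_port, controllers))
```

The network card resolves to the pcie-root-port with index 2, while that root port itself sits on bus 0x00, i.e. on pcie-root.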

Let's say we need the network card to appear as built-in: macOS, for example, requires a built-in en0 interface for Apple services to work.

Or, in macOS, the AppleALC kext for audio requires a built-in audio device for HDEF.

All we need to do is change to:

....
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
....
    <interface type='bridge'>
      <mac address='aa:bb:cc:11:22:33'/>
      <source bridge='br0'/>
      <model type='virtio-net'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>

Now the network card attaches to bus 0x00, which corresponds to index=0, i.e. pcie-root, and it will be recognized by the OS as built-in.

Note: we need to attach it to slot 2 because 0000:00:00.0 is reserved and 0000:00:01.0 already has a pcie-root-port attached (index=1).
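The slot choice in the note can be reproduced with a small Python sketch. The assumptions are mine: slot 0x00 is skipped because of the host bridge, slot 0x1f is skipped because the guest's lspci output above shows the ICH9 devices at 00:1f.x, and the names `RESERVED_SLOTS`, `used_slots` and `first_free_slot` are hypothetical. Libvirt has its own address allocation logic; this only mirrors the reasoning in the note.

```python
import xml.etree.ElementTree as ET

DEVICES_XML = """
<devices>
  <controller type='pci' index='0' model='pcie-root'/>
  <controller type='pci' index='1' model='pcie-root-port'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
  </controller>
  <controller type='pci' index='2' model='pcie-root-port'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
  </controller>
</devices>
"""

# Assumption for this sketch: slot 0x00 (host bridge) and slot 0x1f (ICH9 devices)
# are never handed out on bus 0x00.
RESERVED_SLOTS = {0x00, 0x1f}

def used_slots(devices, bus):
    """Collect the slots already taken by any <address type='pci'> on the given bus."""
    slots = set()
    for addr in devices.iter("address"):
        # Host-side <source><address> elements have no type attribute and are skipped.
        if addr.get("type") == "pci" and int(addr.get("bus"), 16) == bus:
            slots.add(int(addr.get("slot"), 16))
    return slots

def first_free_slot(devices, bus=0, reserved=RESERVED_SLOTS):
    """Return the lowest slot on `bus` that is neither used nor reserved, or None."""
    taken = used_slots(devices, bus) | set(reserved)
    for slot in range(0x20):  # a PCI bus has 32 slots, 0x00 to 0x1f
        if slot not in taken:
            return slot
    return None

devices = ET.fromstring(DEVICES_XML)
print(f"first free slot on bus 0x00: 0x{first_free_slot(devices):02x}")
```

For the two root ports above (both in slot 1, functions 0 and 1) the first free slot on bus 0x00 comes out as 0x02, matching the address used for the network card.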

 

By knowing how a q35 machine topology works we can build our virtual hardware, define our topology and simplify the xml (by looking at my xml I had more pcie-root-port entries than I needed).

 

This is how my mac os vm is now configured:

 

pcie-root (index=0 --> bus 0x00) :

- Marvell SATA controller (vfio) (slot 0, function 0)

- MB Audio (vfio) (slot 2, function 0)

- Virtio ethernet 2 (slot 3, function 0)

- USB virtual UHCI controller (slot 4, function 0, multifunction)

- USB virtual EHCI controller (slot 4, function 1)

- Firewire controller (vfio) (slot 5, function 0)

- Virtio ethernet 1 (slot 6, function 0)

 

pcie-root-port (index=1 --> bus 0x01, multifunction)

- Marvell SATA controller (vfio) (slot 0, function 0)

 

pcie-root-port (index=2 --> bus 0x02) :

- virtio-serial (slot 0, function 0)

 

pcie-root-port (index=3 --> bus 0x03) :

- GPU (video) (vfio) (slot 0, function 0, multifunction)

- HDMI audio (vfio) (slot 0, function 1)

 

pcie-root-port (index=4 --> bus 0x04) :

- Fresco USB controller (vfio) (slot 0, function 0)

 

[Image: diagram of the topology listed above]

 

A Q35 machine also has reserved, hardcoded addresses whose devices can't be removed (for example, you can't remove the virtual AHCI SATA controller from a virtual machine):

0000:00:00.0 is for host bridge

0000:00:1f.2 is for the SATA controller

0000:00:1f.0 is the ISA bridge

0000:00:1f.3 is the SMBus
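A quick way to catch a device that was placed on one of these addresses by mistake (as happened with the Marvell SATA controller in the first post) is to scan the XML for collisions. The sketch below hardcodes the four addresses listed above and only looks at domain 0x0000; the `RESERVED` table and the `reserved_conflicts` helper are illustrative names, not part of libvirt.

```python
import xml.etree.ElementTree as ET

# (bus, slot, function) -> what the post says lives there on a Q35 machine.
RESERVED = {
    (0x00, 0x00, 0x0): "host bridge",
    (0x00, 0x1f, 0x0): "ISA bridge",
    (0x00, 0x1f, 0x2): "SATA controller",
    (0x00, 0x1f, 0x3): "SMBus",
}

# Two <hostdev> entries from the first post: Marvell SATA (wrongly on 00:00.0)
# and the mainboard audio (on 00:02.0).
DEVICES_XML = """
<devices>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x00' function='0x0' multifunction='on'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
  </hostdev>
</devices>
"""

def reserved_conflicts(devices):
    """Return [(device_tag, 'bb:ss.f', reserved_for)] for guest addresses that hit RESERVED."""
    conflicts = []
    for dev in devices:  # direct children of <devices>
        addr = dev.find("./address[@type='pci']")  # guest-side address only
        if addr is None:
            continue
        key = tuple(int(addr.get(k), 16) for k in ("bus", "slot", "function"))
        if key in RESERVED:
            conflicts.append((dev.tag, "{:02x}:{:02x}.{:x}".format(*key), RESERVED[key]))
    return conflicts

for tag, address, owner in reserved_conflicts(ET.fromstring(DEVICES_XML)):
    print(f"<{tag}> requests {address}, which is reserved for the {owner}")
```

Only the first `<hostdev>` is flagged; the mainboard audio at 00:02.0 passes.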

 

If the XML has no entries for USB and the sound chip, the following addresses will be used:

0000:00:1a.0 for USB2 controller

0000:00:1b.0 for ICH9 sound chip

0000:00:1d.0 for USB2 controller

 

As you can see the lspci command in the guest reflects the addresses specified in the xml:

 

[Screenshot: lspci output in the guest]

Edited by ghost82
