SOLVED - Device is ineligible for IOMMU domain attach due to platform RMRR.


1812


I was setting up my last server tonight and had trouble getting a VM running with a GPU passed through. I got the following IOMMU error:

 

internal error: process exited while connecting to monitor: 2016-12-18T02:12:40.093534Z qemu-system-x86_64: -device vfio-pci,host=07:00.0,id=hostdev0,bus=pci.2,addr=0x1: vfio: failed to set iommu for container: Operation not permitted
2016-12-18T02:12:40.093567Z qemu-system-x86_64: -device vfio-pci,host=07:00.0,id=hostdev0,bus=pci.2,addr=0x1: vfio: failed to setup container for group 18
2016-12-18T02:12:40.093575Z qemu-system-x86_64: -device vfio-pci,host=07:00.0,id=hostdev0,bus=pci.2,addr=0x1: vfio: failed to get group 18
2016-12-18T02:12:40.093588Z qemu-system-x86_64: -device vfio-pci,host=07:00.0,id=hostdev0,bus=pci.2,addr=0x1: Device initialization failed

 

The offending device is a GT 710.

 

The trick I've learned on DL380 servers for passing through a GPU is to add vfio_iommu_type1.allow_unsafe_interrupts=1 to the syslinux config file (syslinux.cfg). That setting is present and correct on this server (I actually copied the file from one of the other servers, which works perfectly).

 

This server is one of three that are nearly identical in every way: memory, CPU, and GPU (including its slot location).

 

So I started digging through the logs and noticed a few GPU-related differences between a working server and this one. Instead of posting the entire log, I'll list the differences:

 

working server

 

Dec 17 17:50:12 Brahms4 kernel: DMAR: Hardware identity mapping for device 0000:03:00.1
Dec 17 17:50:12 Brahms4 kernel: DMAR: Hardware identity mapping for device 0000:07:00.0
Dec 17 17:50:12 Brahms4 kernel: DMAR: Hardware identity mapping for device 0000:07:00.1
Dec 17 17:50:12 Brahms4 kernel: DMAR: Hardware identity mapping for device 0000:3e:00.0

 

error server

 

Dec 17 17:59:12 Brahms2 kernel: DMAR: Hardware identity mapping for device 0000:03:00.1
Dec 17 17:59:12 Brahms2 kernel: DMAR: Hardware identity mapping for device 0000:07:00.0
Dec 17 17:59:12 Brahms2 kernel: DMAR: Hardware identity mapping for device 0000:3e:00.0

 

 

So device 0000:07:00.1, the audio function of the GPU, is missing. Before you say "OK, the GPU is bad": I swapped it for a known-working card from another server, so the GPU is good.
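If you want to repeat this comparison on your own servers, here is a minimal Python sketch. It assumes you have saved the relevant syslog excerpt from each server as a string; the helper name `identity_mapped_devices` is hypothetical. It collects the devices named in "DMAR: Hardware identity mapping" lines and prints the ones present on the working server but missing on the failing one:

```python
import re

# Matches lines such as:
# "... kernel: DMAR: Hardware identity mapping for device 0000:07:00.1"
IDENTITY_RE = re.compile(r"DMAR: Hardware identity mapping for device (\S+)")


def identity_mapped_devices(log_text: str) -> set[str]:
    """Return the PCI addresses that received a hardware identity mapping."""
    return set(IDENTITY_RE.findall(log_text))


working_log = """\
Dec 17 17:50:12 Brahms4 kernel: DMAR: Hardware identity mapping for device 0000:03:00.1
Dec 17 17:50:12 Brahms4 kernel: DMAR: Hardware identity mapping for device 0000:07:00.0
Dec 17 17:50:12 Brahms4 kernel: DMAR: Hardware identity mapping for device 0000:07:00.1
Dec 17 17:50:12 Brahms4 kernel: DMAR: Hardware identity mapping for device 0000:3e:00.0
"""

error_log = """\
Dec 17 17:59:12 Brahms2 kernel: DMAR: Hardware identity mapping for device 0000:03:00.1
Dec 17 17:59:12 Brahms2 kernel: DMAR: Hardware identity mapping for device 0000:07:00.0
Dec 17 17:59:12 Brahms2 kernel: DMAR: Hardware identity mapping for device 0000:3e:00.0
"""

# Devices mapped on the working server but not on the error server.
missing = sorted(identity_mapped_devices(working_log) - identity_mapped_devices(error_log))
print(missing)  # ['0000:07:00.1']
```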

 

Continuing on...

 

IOMMU group assignments from the log:

 

working server

Dec 17 17:50:12 Brahms4 kernel: iommu: Adding device 0000:04:00.0 to group 18
Dec 17 17:50:12 Brahms4 kernel: iommu: Adding device 0000:07:00.0 to group 19
Dec 17 17:50:12 Brahms4 kernel: iommu: Adding device 0000:07:00.1 to group 19
Dec 17 17:50:12 Brahms4 kernel: iommu: Adding device 0000:3e:00.0 to group 20

 

error server

Dec 17 17:59:12 Brahms2 kernel: iommu: Adding device 0000:03:00.1 to group 14
Dec 17 17:59:12 Brahms2 kernel: iommu: Adding device 0000:07:00.0 to group 18
Dec 17 17:59:12 Brahms2 kernel: iommu: Adding device 0000:07:00.1 to group 18
Dec 17 17:59:12 Brahms2 kernel: iommu: Adding device 0000:3e:00.0 to group 19

 

So the group numbers differ, but both functions of the GPU are present. However, just before the IOMMU group assignments, the error server logs the following:
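A similar sketch for the group assignments, again hypothetical and assuming the log format shown above ("iommu: Adding device ... to group N"; other kernel versions may word this line differently). It groups devices by IOMMU group number so you can confirm that both GPU functions landed in the same group:

```python
import re
from collections import defaultdict

# Matches lines such as:
# "... kernel: iommu: Adding device 0000:07:00.0 to group 18"
GROUP_RE = re.compile(r"iommu: Adding device (\S+) to group (\d+)")


def iommu_groups(log_text: str) -> dict[int, list[str]]:
    """Map each IOMMU group number to the devices the kernel added to it, in log order."""
    groups: dict[int, list[str]] = defaultdict(list)
    for device, group in GROUP_RE.findall(log_text):
        groups[int(group)].append(device)
    return dict(groups)


error_log = """\
Dec 17 17:59:12 Brahms2 kernel: iommu: Adding device 0000:03:00.1 to group 14
Dec 17 17:59:12 Brahms2 kernel: iommu: Adding device 0000:07:00.0 to group 18
Dec 17 17:59:12 Brahms2 kernel: iommu: Adding device 0000:07:00.1 to group 18
Dec 17 17:59:12 Brahms2 kernel: iommu: Adding device 0000:3e:00.0 to group 19
"""

groups = iommu_groups(error_log)
print(groups[18])  # ['0000:07:00.0', '0000:07:00.1']
```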

 

error server

Dec 17 17:59:12 Brahms2 kernel: DMAR: Ignoring identity map for HW passthrough device 0000:07:00.0 [0xcf63e000 - 0xcf63ffff]
Dec 17 17:59:12 Brahms2 kernel: DMAR: Setting identity map for device 0000:07:00.1 [0xcf63e000 - 0xcf63ffff]

 

A little later, the following shows up:

 

error server

Dec 17 18:06:59 Brahms2 kernel: br0: port 2(vnet0) entered forwarding state
Dec 17 18:06:59 Brahms2 kernel: vfio-pci 0000:07:00.1: Device is ineligible for IOMMU domain attach due to platform RMRR requirement.  Contact your platform vendor.
Dec 17 18:06:59 Brahms2 kernel: br0: port 2(vnet0) entered disabled state
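To spot the RMRR message quickly in a long log, a small filter like the one below can help. It's a sketch that assumes the exact message text quoted in this thread; the function name is made up for illustration. It returns every PCI address the kernel refused to attach:

```python
import re

# Matches the message quoted in this thread, whatever driver prefix precedes the address:
# "vfio-pci 0000:07:00.1: Device is ineligible for IOMMU domain attach due to platform RMRR requirement."
RMRR_RE = re.compile(
    r"([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): Device is ineligible for IOMMU "
    r"domain attach due to platform RMRR requirement",
    re.IGNORECASE,
)


def rmrr_blocked_devices(log_text: str) -> list[str]:
    """Return the sorted, de-duplicated PCI addresses rejected because of an RMRR."""
    return sorted(set(RMRR_RE.findall(log_text)))


log = """\
Dec 17 18:06:59 Brahms2 kernel: br0: port 2(vnet0) entered forwarding state
Dec 17 18:06:59 Brahms2 kernel: vfio-pci 0000:07:00.1: Device is ineligible for IOMMU domain attach due to platform RMRR requirement.  Contact your platform vendor.
Dec 17 18:06:59 Brahms2 kernel: br0: port 2(vnet0) entered disabled state
"""
print(rmrr_blocked_devices(log))  # ['0000:07:00.1']
```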

 

I compared the BIOS dates on the error server and the working ones, and oddly enough the error server's BIOS is roughly one year NEWER. Perhaps that update broke this functionality. Either way, I'm not going to pay HP for their latest firmware on older hardware like this.

 

Searching this forum, the issue comes up only a few times, with no real resolution for GPU assignment (one thread resolved it for HDD availability, but none reached a conclusion for a GPU).

 

In one of the previous threads on this topic, jonp suggested applying the ACS override only to the offending device:

 

I wanted to chime in on this thread because this is definitely an oddball and the first time I've seen an RMRR message on a GTX 7xx GPU.  That said, RMRR errors like this aren't a good sign, though I do have one more thing for you to try before we have to give up hope.  First, if you want some light-reading on the subject, this RedHat article is pretty comprehensive and specifically covers assigning PCI devices to VMs in this situation.

 

Now with the boring stuff out of the way, the last thing I would ask you to try is to change the PCIe ACS Override option in the syslinux.cfg. The way this normally looks is as such:

 

 

append pcie_acs_override=downstream initrd=/bzroot

 

I want you to change that to this:

 

append pcie_acs_override=id:10de:1381,10de:0fbc initrd=/bzroot

 

Don't forget to reboot your system after applying this change.

 

This is different than the vfio-pci.ids thing you tried before.  Please report back if this changes anything for you.

 

 

This suggestion apparently didn't work for the original poster, but I thought I'd give it a shot. So I substituted my own card's device IDs and rebooted.
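When adapting jonp's line to your own card, the only thing that changes is the list of vendor:device IDs. A hypothetical helper that builds the `append` line from those IDs (and rejects anything that isn't in `xxxx:xxxx` hex form) might look like this; the `initrd=/bzroot` default simply mirrors the lines quoted above:

```python
import re

# One PCI vendor:device pair, e.g. "10de:1381" (four hex digits on each side).
PCI_ID_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{4}$")


def acs_override_append(ids: list[str], initrd: str = "/bzroot") -> str:
    """Build a syslinux 'append' line that applies the ACS override only to `ids`."""
    normalized = [pci_id.strip().lower() for pci_id in ids]
    if not normalized:
        raise ValueError("no device IDs given")
    for pci_id in normalized:
        if not PCI_ID_RE.match(pci_id):
            raise ValueError(f"not a vendor:device ID: {pci_id!r}")
    return f"append pcie_acs_override=id:{','.join(normalized)} initrd={initrd}"


print(acs_override_append(["10de:1381", "10de:0fbc"]))
# append pcie_acs_override=id:10de:1381,10de:0fbc initrd=/bzroot
```

The validation step is there because a bus address such as `07:00.1` looks similar to an ID at a glance but does not belong in this option.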

 

This split the GPU into two different IOMMU groups. I then fired up the VM and... NOPE. Same error. I went back to the log and it still showed "0000:07:00.1: Device is ineligible for IOMMU domain attach due to platform RMRR requirement."

 

So I removed that function of the GPU (07:00.1) from the VM's XML (possible now that it was in a separate IOMMU group), and the VM booted up just fine. Since I'm not using HDMI audio passthrough, it doesn't really matter that the GPU's audio function isn't passed through.
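Removing the audio function by hand in the VM's XML view is what worked here. If you'd rather script it, this is a rough sketch using Python's standard `xml.etree.ElementTree`; `remove_pci_hostdev` is a hypothetical helper, it ignores the PCI domain number, and the sample XML is trimmed to the relevant elements:

```python
import xml.etree.ElementTree as ET


def remove_pci_hostdev(domain_xml: str, bus: int, slot: int, function: int) -> str:
    """Drop PCI <hostdev> entries whose host <source><address> is bus:slot.function."""
    root = ET.fromstring(domain_xml)
    target = (bus, slot, function)
    for devices in root.iter("devices"):
        # Copy the list so removing elements doesn't disturb the iteration.
        for hostdev in list(devices.findall("hostdev")):
            addr = hostdev.find("source/address")
            if hostdev.get("type") != "pci" or addr is None:
                continue
            # libvirt writes these as hex strings such as '0x07'; int(..., 16) accepts the 0x prefix.
            found = tuple(int(addr.get(key, "0"), 16) for key in ("bus", "slot", "function"))
            if found == target:
                devices.remove(hostdev)
    return ET.tostring(root, encoding="unicode")


domain_xml = """<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source><address domain='0x0000' bus='0x07' slot='0x00' function='0x0'/></source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source><address domain='0x0000' bus='0x07' slot='0x00' function='0x1'/></source>
    </hostdev>
  </devices>
</domain>"""

# Keep the video function (07:00.0) and drop the audio function (07:00.1).
print(remove_pci_hostdev(domain_xml, bus=0x07, slot=0x00, function=0x1))
```

Only the host-side `<source><address>` is compared, so the guest-side `<address type='pci' .../>` that libvirt adds under each `<hostdev>` does not affect the match.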

 

This was a bit of a long read, but I thought it was better to document it in case someone else comes across this issue. This post actually started as a request for help; halfway through, I thought I should at least search the forums first, and voila! It turned from a question into a solved problem.

 

 

 

 


Thank you for your post and the detailed walkthrough of your thought process.

 

I am having a similar error on my first VM setup. I am functioning under the trial license so am brand new to unRAID.

 

You made the comment:

the trick i've learned on dl 380 servers to passthrough a gpu is by adding vfio_iommu_type1.allow_unsafe_interrupts=1 to the sysconfig file.

 

Though I have different hardware, I figured I'd give this a shot.

Since I am new to unRAID, where do I find the sysconfig file?

 

 

 

 

 

 

 


 

The file you're looking for is syslinux.cfg, on your flash drive in the syslinux folder. With the text added, it would look like this:

default /syslinux/menu.c32
menu title Lime Technology, Inc.
prompt 0
timeout 50
label unRAID OS
  menu default
  kernel /bzimage
  append vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot
label unRAID OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
label unRAID OS Safe Mode (no plugins, no GUI)
  kernel /bzimage
  append initrd=/bzroot unraidsafemode
label Memtest86+
  kernel /memtest
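If you need to make this change on several flash drives, a rough sketch like the following inserts a kernel parameter into the `append` line of one label (here the `unRAID OS` label, which is the `menu default` entry above). It assumes the simple layout shown above (a `label` line followed by indented `kernel`/`append` lines) and is not a full syslinux parser; `add_kernel_param` is a hypothetical name:

```python
def add_kernel_param(cfg_text: str, param: str, label: str = "unRAID OS") -> str:
    """Prepend `param` to the 'append' line of the given syslinux label, if not already present."""
    out = []
    in_label = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("label "):
            # Track whether we're inside the block for the requested label.
            in_label = stripped[len("label "):] == label
        elif in_label and stripped.startswith("append "):
            args = stripped[len("append "):].split()
            if param not in args:
                args.insert(0, param)
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}append {' '.join(args)}"
        out.append(line)
    return "\n".join(out) + "\n"


stock_cfg = """default /syslinux/menu.c32
menu title Lime Technology, Inc.
prompt 0
timeout 50
label unRAID OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label unRAID OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
"""

print(add_kernel_param(stock_cfg, "vfio_iommu_type1.allow_unsafe_interrupts=1"))
```

Running it twice leaves the file unchanged, and the GUI-mode label is left alone, so only the default boot entry gets the parameter.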

 

Are you getting the "unable to get iommu group" error or something else? Allowing unsafe interrupts can sometimes fix that, which is the problem on my hardware.

I only started using unRAID about 6 months ago and learn something new almost every other day...

  • 2 months later...

Hey, I'm still getting the same error on my DL380 G7 when trying to pass through the following video card:

# lspci|grep -i nvi
09:00.0 VGA compatible controller: NVIDIA Corporation GT218 [NVS 300] (rev a2)
09:00.1 Audio device: NVIDIA Corporation High Definition Audio Controller (rev a1)

 

This is the error on dmesg:

[  156.261493] pci 0000:09:00.1: Device is ineligible for IOMMU domain attach due to platform RMRR requirement.  Contact your platform vendor.

It's strange that it complains about 09:00.1 even though I'm only passing through the video card, not the audio device:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>

 

I've tried several syslinux.cfg edits, but none have worked:

append pcie_acs_override=id:10de:1381,10de:0fbc initrd=/bzroot
append vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot
append pcie_acs_override=downstream initrd=/bzroot

Any ideas what I'm missing here?

 

Thanks!


It's probably complaining because, with unsafe interrupts allowed, you're trying to pass through only one device from the group (which it won't allow). Conversely, the ACS override that would let you pass the single device isn't working because you aren't using the correct device IDs for your specific card.
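One way to see which devices you'd be leaving behind is to compare the devices you pass through against the members of their IOMMU groups. Roughly speaking, VFIO only treats a group as usable when every device in it is bound to vfio-pci or left without a host driver, so leftover members are worth checking. This is a sketch with a made-up group layout (the group numbers are illustrative, not from this server), and the helper name is hypothetical:

```python
def unassigned_group_members(requested: list[str], groups: dict[int, list[str]]) -> dict[int, list[str]]:
    """For every IOMMU group touched by `requested`, list the members NOT being passed through."""
    wanted = set(requested)
    result: dict[int, list[str]] = {}
    for group, members in groups.items():
        if wanted.intersection(members):
            leftover = [m for m in members if m not in wanted]
            if leftover:
                result[group] = leftover
    return result


# Illustrative grouping: the NVS 300's video and audio functions sharing one group.
groups = {18: ["0000:09:00.0", "0000:09:00.1"], 19: ["0000:3e:00.0"]}
print(unassigned_group_members(["0000:09:00.0"], groups))  # {18: ['0000:09:00.1']}
```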

 

Try this in your syslinux.cfg: 

 

append vfio-pci.ids=XXXX:XXXX,YYYY:YYYY vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot

 

XXXX:XXXX is your video card's vendor:device ID

YYYY:YYYY is your sound device's vendor:device ID

 

You can find these IDs under Tools > System Devices: look in the PCI devices list for 09:00.0 and 09:00.1.

 

The device ID is in brackets at the end of the description. Replace the X's and Y's above with those numbers, reboot, and try again.
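Picking the IDs out of the device list can also be scripted. This sketch assumes the descriptions look like `lspci -nn` output, where the vendor:device pair is the bracketed `xxxx:xxxx` token near the end; the two sample lines are illustrative, built from the IDs that appear later in this thread. Helper names are hypothetical:

```python
import re

# The vendor:device pair is the bracketed "xxxx:xxxx" token. Other bracketed text,
# such as the class code "[0300]" or the product name "[NVS 300]", does not match.
ID_RE = re.compile(r"\[([0-9a-f]{4}:[0-9a-f]{4})\]", re.IGNORECASE)


def device_ids(description_lines: list[str], addresses: list[str]) -> list[str]:
    """Return vendor:device IDs for the given bus addresses, in the order of `addresses`.

    Raises KeyError if an address isn't found in the description lines.
    """
    by_address = {}
    for line in description_lines:
        parts = line.split(maxsplit=1)
        match = ID_RE.search(line)
        if parts and match:
            by_address[parts[0]] = match.group(1).lower()
    return [by_address[addr] for addr in addresses]


def vfio_append(ids: list[str], initrd: str = "/bzroot") -> str:
    """Fill the IDs into the append line suggested above."""
    return (
        f"append vfio-pci.ids={','.join(ids)} "
        f"vfio_iommu_type1.allow_unsafe_interrupts=1 initrd={initrd}"
    )


# Illustrative lines in `lspci -nn` style for the NVS 300 discussed in this thread.
lines = [
    "09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GT218 [NVS 300] [10de:10d8] (rev a2)",
    "09:00.1 Audio device [0403]: NVIDIA Corporation High Definition Audio Controller [10de:0be3] (rev a1)",
]
ids = device_ids(lines, ["09:00.0", "09:00.1"])
print(vfio_append(ids))
# append vfio-pci.ids=10de:10d8,10de:0be3 vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot
```

The same comma-separated ID list can also go after `pcie_acs_override=id:` if you are trying the ACS override route instead.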

 

If it doesn't work, post your server diagnostics and the VM's XML.


thanks!

 

After a little more "engineering", the following combination worked!

append pcie_acs_override=id:10de:10d8,10de:0be3 vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot

 

I'm not home to verify that video output works as expected, but I can verify that the VM started and Windows detected my NVIDIA card fine:

 

[Screenshot: snip.JPG]


Updating this thread because the issue has come up three times in two days recently, including on another of my servers after updating from 2010 firmware to 2015 firmware.

 

 

According to HP, the RMRR problem appears with Linux kernel versions above 3.16. There are discussions around the internet about an unofficial workaround, but with varying results.

 

HP has published an advisory on this, with a fix for some systems:

 

https://h20565.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=7271259&docId=emr_na-c04781229&docLocale=en_US

 

 

The steps outlined above solve the problem that occurs when part of the GPU (specifically the audio function) is marked ineligible. They split the video and audio functions into two IOMMU groups and then skip passing through the ineligible one. My workaround for getting sound back was a USB audio device; a PCI sound card or other PCI audio device might also work.

 

 

So this topic is more accurately "solved" (in quotes) than truly solved, as the underlying issue remains.

 

For those of you on newer (G8+) hardware, there is a BIOS fix: https://docs.hpcloud.com/hos-4.x/helion/networking/enabling_pcipt_on_gen9.html

 

For those of you on older HP hardware, you can try going into the BIOS and booting from the backup ROM / earliest firmware you have.

 

 

 

Edited by 1812
  • 2 years later...

I stumbled on this thread after having basically the same issue trying to pass a PCI card through to a VM with QEMU on an HP DL380 G6 (old, I know, but it was cheap on eBay).

 

The card was already in its own IOMMU group, so I had no reason to think it would be fixed with the pcie_acs_override setting (I tried anyway - it didn't). 
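To check whether a card is already alone in its IOMMU group, you can read the groups straight from sysfs, where each group is a numbered directory under `/sys/kernel/iommu_groups/` with a `devices/` subdirectory holding one entry per PCI address. A minimal sketch with hypothetical helper names; it returns an empty mapping if that directory is missing or empty:

```python
from pathlib import Path


def iommu_groups_from_sysfs(root: str = "/sys/kernel/iommu_groups") -> dict[int, list[str]]:
    """Map each IOMMU group number to the PCI addresses listed in its devices/ directory."""
    groups: dict[int, list[str]] = {}
    base = Path(root)
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(entry.name for entry in devices_dir.iterdir())
    return groups


def is_alone_in_group(address: str, groups: dict[int, list[str]]) -> bool:
    """True if `address` (e.g. '0000:07:00.0') is the only device in its IOMMU group."""
    return any(members == [address] for members in groups.values())


if __name__ == "__main__":
    # Print one line per group: the group number followed by its member addresses.
    for number, members in sorted(iommu_groups_from_sysfs().items()):
        print(number, " ".join(members))
```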

 

I finally got the VM to boot by:

  • setting vfio_iommu_type1.allow_unsafe_interrupts=1
  • downgrading the BIOS from a 2016 ROM to the backup one already present from 2009
    • This was the key one for me. Once I had downgraded the BIOS without allow_unsafe_interrupts set, the RMRR error went away and was replaced with a helpful error telling me allow_unsafe_interrupts had to be enabled. 

Anyway, just wanted to share my experience in case anyone else has the same issue. 

Edited by calan