1812 Posted December 18, 2016

I was setting up my last server tonight and had trouble getting a VM running with a passed-through GPU. It failed with the following IOMMU error:

internal error: process exited while connecting to monitor:
2016-12-18T02:12:40.093534Z qemu-system-x86_64: -device vfio-pci,host=07:00.0,id=hostdev0,bus=pci.2,addr=0x1: vfio: failed to set iommu for container: Operation not permitted
2016-12-18T02:12:40.093567Z qemu-system-x86_64: -device vfio-pci,host=07:00.0,id=hostdev0,bus=pci.2,addr=0x1: vfio: failed to setup container for group 18
2016-12-18T02:12:40.093575Z qemu-system-x86_64: -device vfio-pci,host=07:00.0,id=hostdev0,bus=pci.2,addr=0x1: vfio: failed to get group 18
2016-12-18T02:12:40.093588Z qemu-system-x86_64: -device vfio-pci,host=07:00.0,id=hostdev0,bus=pci.2,addr=0x1: Device initialization failed

The offending device is a GT 710. The trick I've learned for passing through a GPU on DL380 servers is to add vfio_iommu_type1.allow_unsafe_interrupts=1 to the sysconfig file (syslinux.cfg). So it's there, and correct for this server (I actually copied the file from one of the others that works perfectly). This server is one of 3 that are nearly identical in every way: memory, CPU, GPU (including slot location). So I started digging through the logs and noticed a few differences in the GPU entries between the working server and this one with the error.
Instead of posting the entire log, I'll list the differences.

Working server:
Dec 17 17:50:12 Brahms4 kernel: DMAR: Hardware identity mapping for device 0000:03:00.1
Dec 17 17:50:12 Brahms4 kernel: DMAR: Hardware identity mapping for device 0000:07:00.0
Dec 17 17:50:12 Brahms4 kernel: DMAR: Hardware identity mapping for device 0000:07:00.1
Dec 17 17:50:12 Brahms4 kernel: DMAR: Hardware identity mapping for device 0000:3e:00.0

Error server:
Dec 17 17:59:12 Brahms2 kernel: DMAR: Hardware identity mapping for device 0000:03:00.1
Dec 17 17:59:12 Brahms2 kernel: DMAR: Hardware identity mapping for device 0000:07:00.0
Dec 17 17:59:12 Brahms2 kernel: DMAR: Hardware identity mapping for device 0000:3e:00.0

So device 0000:07:00.1, the audio portion of the GPU, is missing. Now, before you say "OK, the GPU is bad": I actually swapped it with a known working one from another server. So the GPU is good. Continuing on...

IOMMU assignments from the log.

Working server:
Dec 17 17:50:12 Brahms4 kernel: iommu: Adding device 0000:04:00.0 to group 18
Dec 17 17:50:12 Brahms4 kernel: iommu: Adding device 0000:07:00.0 to group 19
Dec 17 17:50:12 Brahms4 kernel: iommu: Adding device 0000:07:00.1 to group 19
Dec 17 17:50:12 Brahms4 kernel: iommu: Adding device 0000:3e:00.0 to group 20

Error server:
Dec 17 17:59:12 Brahms2 kernel: iommu: Adding device 0000:03:00.1 to group 14
Dec 17 17:59:12 Brahms2 kernel: iommu: Adding device 0000:07:00.0 to group 18
Dec 17 17:59:12 Brahms2 kernel: iommu: Adding device 0000:07:00.1 to group 18
Dec 17 17:59:12 Brahms2 kernel: iommu: Adding device 0000:3e:00.0 to group 19

So the group assignment is different, but both required parts of the GPU are there.
But just before the IOMMU groupings, the error server logs the following:

Dec 17 17:59:12 Brahms2 kernel: DMAR: Ignoring identity map for HW passthrough device 0000:07:00.0 [0xcf63e000 - 0xcf63ffff]
Dec 17 17:59:12 Brahms2 kernel: DMAR: Setting identity map for device 0000:07:00.1 [0xcf63e000 - 0xcf63ffff]

A little bit later, the following shows up on the error server:

Dec 17 18:06:59 Brahms2 kernel: br0: port 2(vnet0) entered forwarding state
Dec 17 18:06:59 Brahms2 kernel: vfio-pci 0000:07:00.1: Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.
Dec 17 18:06:59 Brahms2 kernel: br0: port 2(vnet0) entered disabled state

I compared the BIOS dates on the error server and the working ones, and oddly enough the error server's BIOS is roughly 1 year NEWER. Perhaps that update broke the functionality. Either way, I'm not going to pay HP for their most up-to-date firmware on older hardware like this.

Looking around this forum, this issue only appears a few times, with no real resolution for the GPU assignment (one thread resolved HDD availability but stated no GPU conclusion). In one of the previous threads on this topic, jonp made a suggestion about using the ACS override only for the offending device:

I wanted to chime in on this thread because this is definitely an oddball and the first time I've seen an RMRR message on a GTX 7xx GPU. That said, RMRR errors like this aren't a good sign, though I do have one more thing for you to try before we have to give up hope. First, if you want some light reading on the subject, this RedHat article is pretty comprehensive and specifically covers assigning PCI devices to VMs in this situation. Now with the boring stuff out of the way, the last thing I would ask you to try is a change to the PCIe ACS Override option in the syslinux.cfg.
The way this normally looks is as such:

append pcie_acs_override=downstream initrd=/bzroot

I want you to change that to this:

append pcie_acs_override=id:10de:1381,10de:0fbc initrd=/bzroot

Don't forget to reboot your system after applying this change. This is different than the vfio-pci.ids thing you tried before. Please report back if this changes anything for you.

This suggestion did not seem to work for the original poster, but I thought I'd give it a shot. So I changed the device IDs to match my card and rebooted. This split the GPU into 2 different IOMMU groups. I then fired up the VM and NOPE. Same error. I went back to the log and it still showed "0000:07:00.1: Device is ineligible for IOMMU domain attach due to platform RMRR requirement."

So I tried removing that part of the GPU (07:00.1) from the XML (since it was now in a different IOMMU group), and the VM booted up just fine. Since I'm not using HDMI audio passthrough, it doesn't really matter that the audio portion of the GPU is not there.

This was a bit of a long read, but I thought it was better to document it in case someone else comes across this issue. Actually, this post started off as a request for help, and when I got halfway through, I thought I should at least search the forums first regarding this problem, and voila! This turned from a question into a solved problem.
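For anyone retracing the log comparison in the post above, here is a minimal sketch that scans kernel log text for the two message types quoted there (the "iommu: Adding device" lines and the RMRR "ineligible" line) and reports which IOMMU groups contain a blocked device. The helper name find_rmrr_blocked is made up for illustration, and the patterns assume nothing beyond the message wording shown in this thread.

```python
import re
from collections import defaultdict

# Patterns match only the message wording quoted in this thread.
GROUP_RE = re.compile(r"iommu: Adding device (\S+) to group (\d+)")
RMRR_RE = re.compile(r"(\S+): Device is ineligible for IOMMU domain attach")


def find_rmrr_blocked(log_text):
    """Return {group: [devices]} for every group holding an RMRR-blocked device.

    Hypothetical helper: it only reads text and changes nothing on the host.
    """
    groups = defaultdict(list)  # group number -> device addresses
    blocked = set()             # devices the kernel refused to attach
    for line in log_text.splitlines():
        m = GROUP_RE.search(line)
        if m:
            groups[int(m.group(2))].append(m.group(1))
            continue
        m = RMRR_RE.search(line)
        if m:
            blocked.add(m.group(1))
    return {g: devs for g, devs in groups.items() if blocked & set(devs)}


# Lines taken from the error server's log above (timestamps trimmed).
SAMPLE = """\
kernel: iommu: Adding device 0000:03:00.1 to group 14
kernel: iommu: Adding device 0000:07:00.0 to group 18
kernel: iommu: Adding device 0000:07:00.1 to group 18
kernel: iommu: Adding device 0000:3e:00.0 to group 19
kernel: vfio-pci 0000:07:00.1: Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.
"""

if __name__ == "__main__":
    print(find_rmrr_blocked(SAMPLE))
```

If the video function (07:00.0 here) shares a group with the blocked audio function, that is the situation where the post split the group with pcie_acs_override and then dropped the blocked function from the VM's XML.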
twoBrokenThumbs Posted December 22, 2016

Thank you for your post and the detailed walk through the details and your thought process. I am having a similar error on my first VM setup. I am running under the trial license, so I am brand new to unRAID.

You made the comment:

the trick i've learned on dl 380 servers to passthrough a gpu is by adding vfio_iommu_type1.allow_unsafe_interrupts=1 to the sysconfig file.

Though I have different hardware, I figured I'd give this a shot. Since I am new to unRAID, where do I find the sysconfig file?
1812 Posted December 23, 2016 Author

twoBrokenThumbs said: Since I am new to unRAID, where do I find the sysconfig file?

The sysconfig file (syslinux.cfg) is on your flash drive, in the syslinux folder. Modified with the text added, it would look like this:

default /syslinux/menu.c32
menu title Lime Technology, Inc.
prompt 0
timeout 50
label unRAID OS
  menu default
  kernel /bzimage
  append vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot
label unRAID OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
label unRAID OS Safe Mode (no plugins, no GUI)
  kernel /bzimage
  append initrd=/bzroot unraidsafemode
label Memtest86+
  kernel /memtest

Are you getting the "unable to get iommu group" error or something else? Allowing unsafe interrupts can sometimes fix that, as it does on my hardware. I only started using unRAID about 6 months ago and learn something new almost every other day...
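The edit described above can be sketched as a small text transformation. This is only an illustration, assuming the stanza layout shown in the post: the function add_kernel_param and its label argument are hypothetical names, and it works on a string rather than touching the file on the flash drive.

```python
def add_kernel_param(cfg_text, param, label="unRAID OS"):
    """Insert `param` into the append line of the stanza named `label`.

    Hypothetical helper: operates on text only, so the result can be
    reviewed before anything is written back to syslinux.cfg.
    """
    out = []
    in_target = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("label "):
            # Exact match, so "unRAID OS GUI Mode" is not touched.
            in_target = stripped[len("label "):].strip() == label
        elif in_target and stripped.startswith("append "):
            args = stripped.split()[1:]
            if param not in args:
                # Put the new parameter first so initrd= stays last,
                # matching the layout shown in the thread.
                args.insert(0, param)
            indent = line[: len(line) - len(line.lstrip())]
            line = indent + "append " + " ".join(args)
        out.append(line)
    return "\n".join(out) + "\n"


# Trimmed copy of the stock configuration quoted above.
SAMPLE_CFG = """\
default /syslinux/menu.c32
menu title Lime Technology, Inc.
prompt 0
timeout 50
label unRAID OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label unRAID OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
"""

if __name__ == "__main__":
    print(add_kernel_param(SAMPLE_CFG, "vfio_iommu_type1.allow_unsafe_interrupts=1"))
```

Running it twice with the same parameter leaves the text unchanged, which helps if the step gets repeated while troubleshooting. A reboot is still needed before the kernel sees the new parameter.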
assassinmunky Posted March 13, 2017

Hey, I'm still getting the same error on my DL380 G7 when trying to pass through the following video card:

# lspci|grep -i nvi
09:00.0 VGA compatible controller: NVIDIA Corporation GT218 [NVS 300] (rev a2)
09:00.1 Audio device: NVIDIA Corporation High Definition Audio Controller (rev a1)

This is the error in dmesg:

[ 156.261493] pci 0000:09:00.1: Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.

It is strange that it's complaining about 09:00.1 even though I'm only passing through the video card, not the audio device:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</hostdev>

I've tried several syslinux.cfg edits, but none have worked:

append pcie_acs_override=id:10de:1381,10de:0fbc initrd=/bzroot
append vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot
append pcie_acs_override=downstream initrd=/bzroot

Any ideas what I'm missing here? Thanks!
1812 Posted March 13, 2017 Author

It's probably complaining because you're trying to pass through only 1 device from the group when allowing unsafe interrupts (which it won't allow), and conversely not allowing unsafe interrupts when passing only the single device, which isn't working because you are not using the correct device IDs for your specific card.

Try this in your syslinux.cfg:

append vfio-pci.ids=XXXX:XXXX,YYYY:YYYY vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot

XXXX:XXXX is your video card ID
YYYY:YYYY is your sound card ID

You can obtain your video/sound card IDs by going to Tools > System Devices and looking under PCI Devices for 09:00.0 and 09:00.1. The device ID will be at the end of the description, in the brackets. Replace the X's and Y's above with those numbers, reboot, and try again.

If it doesn't work, then post your server diagnostics and the XML for the VM.
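The ID lookup described above can also be done from a shell with lspci -nn, which prints the vendor:device pair in brackets at the end of each description. The sketch below pulls those pairs out for one slot and builds the vfio-pci.ids= parameter. The helper names are hypothetical, and the sample text is illustrative: it is reconstructed from the lspci output and the IDs that appear elsewhere in this thread, not captured from a real machine.

```python
import re

# Matches a [vendor:device] pair such as [10de:10d8]. Class codes like
# [0300] and model names like [NVS 300] do not match this pattern.
ID_RE = re.compile(r"\[([0-9a-f]{4}:[0-9a-f]{4})\]")


def ids_for_slot(lspci_nn_text, slot_prefix):
    """Collect vendor:device IDs for every function under `slot_prefix`.

    Hypothetical helper; `lspci_nn_text` is the text output of `lspci -nn`.
    """
    ids = []
    for line in lspci_nn_text.splitlines():
        if not line.startswith(slot_prefix):
            continue
        found = ID_RE.findall(line)
        if found:
            ids.append(found[-1])  # the ID pair is the last bracketed match
    return ids


def vfio_ids_param(ids):
    """Format the IDs the way the append line in this thread expects."""
    return "vfio-pci.ids=" + ",".join(ids)


# Illustrative sample, assumed format of `lspci -nn` for the card discussed here.
SAMPLE = """\
09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GT218 [NVS 300] [10de:10d8] (rev a2)
09:00.1 Audio device [0403]: NVIDIA Corporation High Definition Audio Controller [10de:0be3] (rev a1)
"""

if __name__ == "__main__":
    print(vfio_ids_param(ids_for_slot(SAMPLE, "09:00.")))
```

The same list of IDs can be joined after pcie_acs_override=id: instead, if the ACS override route is the one being tried.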
assassinmunky Posted March 13, 2017

Thanks! After a little bit more "engineering", the following combo worked:

append pcie_acs_override=id:10de:10d8,10de:0be3 vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot

I'm not home to verify that video output is working as expected, but I can verify that the VM started and Windows detected my NVIDIA card fine.
1812 Posted March 13, 2017 Author

That's good! Sometimes vfio-pci.ids works, and sometimes using the PCIe ACS override with the specific device IDs works... I'm sure there is a technical reason for the difference between the two, but I don't remember it off the top of my head.
1812 Posted March 21, 2017 Author

Updating this, as the issue has come up 3 times in 2 days recently, including on another of my servers after updating from 2010 firmware to 2015 firmware.

According to HP, the RMRR problem stems from upgrading Linux to a kernel version above 3.16. There are discussions around the internet about an unofficial patch, but with varying results. HP has published an advisory on this, with a fix for some:

https://h20565.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=7271259&docId=emr_na-c04781229&docLocale=en_US

The steps outlined above solve the problem that occurs when part of the GPU (specifically the audio component) becomes marked ineligible. The resolution above splits the video and audio components into 2 IOMMU groups, and then does not attempt to pass through the ineligible component. The workaround for me to get sound back was to use a USB audio device. It may also work with a PCI sound card or other PCI audio device.

So this topic should more appropriately be marked "solved" vs solved, as the underlying issue still remains.

For those of you on newer (G8+) hardware, there is a BIOS fix:

https://docs.hpcloud.com/hos-4.x/helion/networking/enabling_pcipt_on_gen9.html

For those of you on older HP hardware, you can try going into the BIOS and booting with the backup ROM/earliest firmware you have.

Edited March 21, 2017 by 1812
calan Posted March 19, 2020

I stumbled on this thread after having basically the same issue trying to present a PCI card to a VM through qemu on an HP DL380 G6 (old, I know, but it was cheap on eBay). The card was already in its own IOMMU group, so I had no reason to think it would be fixed with the pcie_acs_override setting (I tried anyway; it didn't help).

I finally got the VM to boot by:

- setting vfio_iommu_type1.allow_unsafe_interrupts=1
- downgrading the BIOS from a 2016 ROM to the backup one already present from 2009. This was the key one for me.

Once I had downgraded the BIOS without allow_unsafe_interrupts set, the RMRR error went away and was replaced with a helpful error telling me allow_unsafe_interrupts had to be enabled.

Anyway, just wanted to share my experience in case anyone else has the same issue.

Edited March 19, 2020 by calan
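Whether a card already sits in its own IOMMU group, as in the post above, can be checked from sysfs, where Linux exposes each group as a numbered directory with a devices subdirectory. The sketch below reads that layout; the function names are hypothetical, and the root argument exists only so the code can be pointed at a test directory instead of the real /sys/kernel/iommu_groups.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group number: sorted list of PCI addresses} read from sysfs.

    Returns an empty dict when the directory is missing, for example when
    the IOMMU is disabled or the code runs on a non-Linux host.
    """
    groups = {}
    if not os.path.isdir(root):
        return groups
    for name in os.listdir(root):
        dev_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(dev_dir):
            groups[int(name)] = sorted(os.listdir(dev_dir))
    return groups


def is_alone_in_group(groups, device):
    """True if `device` is the only member of its IOMMU group."""
    return any(devs == [device] for devs in groups.values())


if __name__ == "__main__":
    for number, devices in sorted(list_iommu_groups().items()):
        print(number, " ".join(devices))
```

If the device is already alone in its group, pcie_acs_override has nothing left to split, which is consistent with the override making no difference in that case.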
Impulse1337 Posted October 21, 2021

On 3/13/2017 at 4:27 PM, 1812 said: If it doesn't work, then post your server diagnostics and xml for the vm.

Well, it doesn't work, and I have tried every step here. I get this error when starting the VM. I attached the diagnostics and the VM XML. Could you help?

Attachments: be001srvfs01-diagnostics-20211021-2253.zip, server02.xml
ghost82 Posted October 22, 2021

9 hours ago, Impulse1337 said: Well, it doesn't work

Did you try the steps in the current method?