snowmirage Posted September 19, 2017

I'm hoping someone can give me some ideas on what else I can try here.

I'm running an EVGA SR-2 motherboard and attempting to pass through an AMD RX 480 to a Windows 10 VM. I'm also trying to pass an NVIDIA GTX 980 to another Windows 10 VM. Passing the 980 is working; to do so I needed to use the Q35.27 machine type and SeaBIOS.

Trying to pass the RX 480, even in the same slot, not only keeps the VM from booting but crashes the entire unRAID OS. I can't access unRAID via the web GUI or SSH, and the direct console doesn't respond to keyboard input, so I have to hard reset the entire system.

I was able to run tail -f /var/log/syslog and saw these messages when it crashes (instantly, as soon as I power on the VM with the RX 480 passed through). I had to take a picture of these with my cell phone and then type them out here, as syslog appears to clear itself when the system reboots.

Tower kernel: pcieport 0000:00:07.0: can't find device of ID0000
Tower kernel: DMAR: DRHD: handling fault status reg 100
Tower kernel: pcieport 0000:00:07.0: AER: Uncorrected (Fatal) error received: id=0000

The above messages repeat at least six times. I'll attach the diagnostics from the system as well.

What I have tried:
- Passing through both the RX 480 and its audio device
- Passing through only the RX 480 without its audio device
- Making sure the RX 480 and its audio device are the only devices in their IOMMU group (I do not have ACS enabled)
- Using the same slot as the GTX 980 that did pass through successfully

I don't know what device is being referenced in that crash, but it isn't the RX 480 that I'm passing through to the VM. I feel like there should be other errors that I'm not able to see.

I feel like I'm very close to having this working. I tried previously with another motherboard, and then it was super easy to pass through any AMD card I wanted, but any NVIDIA card was a nightmare. Now on this board that has flipped, and only NVIDIA is playing nice.

Any advice would be greatly appreciated.

tower-diagnostics-20170919-1419.zip
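Since the messages above had to be retyped from a photo, a small script that tallies the fatal AER reports per PCI address can make it easier to see which port keeps reporting them. This is only a minimal sketch: the function name `summarize_fatal_aer` is made up for illustration, and it assumes the log lines look exactly like the three quoted in the post.

```python
import re
from collections import Counter

# PCI address in domain:bus:device.function form, e.g. 0000:00:07.0
PCI_ADDR = re.compile(r"\b([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\b")


def summarize_fatal_aer(lines):
    """Count 'Uncorrected (Fatal)' AER reports per reporting PCI address."""
    counts = Counter()
    for line in lines:
        if "AER" in line and "Uncorrected (Fatal)" in line:
            match = PCI_ADDR.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts


# The three lines quoted in the post, used here as sample input.
sample = [
    "Tower kernel: pcieport 0000:00:07.0: can't find device of ID0000",
    "Tower kernel: DMAR: DRHD: handling fault status reg 100",
    "Tower kernel: pcieport 0000:00:07.0: AER: Uncorrected (Fatal) error received: id=0000",
]

if __name__ == "__main__":
    for address, count in summarize_fatal_aer(sample).items():
        print(f"{address}: {count} fatal AER report(s)")
```

On a live system the same function could be fed the lines of /var/log/syslog instead of the hard-coded sample.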
saarg Posted September 19, 2017
This might be motherboard related, as I have no problem passing through my three RX 480s. Have you checked whether a newer BIOS is available for your motherboard?
snowmirage Posted September 19, 2017
I did, and made sure I flashed to the latest.
saarg Posted September 19, 2017
Have you tried with ACS override enabled? The port mentioned is your root port, I guess (I haven't checked your diagnostics).
snowmirage Posted September 19, 2017

Looks like it is a root port (the middle one below):

IOMMU group 3
[8086:340a] 00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 13)
IOMMU group 4
[8086:340e] 00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 13)
IOMMU group 5
[8086:342d] 00:13.0 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub I/OxAPIC Interrupt Controller (rev 13)

But it's in its own group with nothing else? I'll give ACS a try and see if that changes anything.
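For anyone wanting to double-check the grouping outside the web GUI, the same information can be read from sysfs. The sketch below assumes the usual Linux layout (/sys/kernel/iommu_groups/<n>/devices and /sys/bus/pci/devices); the function names `iommu_groups` and `devices_behind` are hypothetical helpers, not part of any unRAID tooling. The second helper lists what sits directly below a bridge such as 0000:00:07.0, which may help narrow down which device the root port error refers to.

```python
import re
from pathlib import Path

# Full PCI address: domain:bus:device.function
PCI_ADDR = re.compile(r"[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]")


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} as exposed by sysfs."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # IOMMU disabled, or not running on Linux
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def devices_behind(port, root="/sys/bus/pci/devices"):
    """List PCI functions located directly below the given bridge or root port."""
    port_dir = Path(root) / port
    if not port_dir.is_dir():
        return []
    # Child devices appear as subdirectories named by their PCI address;
    # other entries (power, port services, ...) are skipped by the full match.
    return sorted(
        p.name for p in port_dir.iterdir()
        if p.is_dir() and PCI_ADDR.fullmatch(p.name)
    )


if __name__ == "__main__":
    for number, devices in sorted(iommu_groups().items()):
        print(f"IOMMU group {number}: {', '.join(devices)}")
    print("Behind 0000:00:07.0:", devices_behind("0000:00:07.0"))
```

Both functions take the sysfs root as a parameter so they can be pointed at a copy of the directory tree for inspection on another machine.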
snowmirage Posted September 19, 2017
Just tried it with ACS turned on. After a reboot I validated that the RX 480 and its sound card were still in their own IOMMU group. But alas, same error: I saw the same thing via the console before the entire machine froze. Any other ideas?
snowmirage Posted September 19, 2017

I remembered that at some point while attempting to get the GTX 980 to work I added this to the syslinux config:

vfio_iommu_type1.allow_unsafe_interrupts=1

I disabled that, and now when trying to start a VM with the RX 480 attached I'm getting this. But the system didn't crash.

Sep 19 15:36:44 Tower kernel: vgaarb: device changed decodes: PCI:0000:08:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
Sep 19 15:36:44 Tower kernel: br0: port 2(vnet0) entered blocking state
Sep 19 15:36:44 Tower kernel: br0: port 2(vnet0) entered disabled state
Sep 19 15:36:44 Tower kernel: device vnet0 entered promiscuous mode
Sep 19 15:36:44 Tower kernel: br0: port 2(vnet0) entered blocking state
Sep 19 15:36:44 Tower kernel: br0: port 2(vnet0) entered forwarding state
Sep 19 15:36:44 Tower kernel: vfio_iommu_type1_attach_group: No interrupt remapping support. Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
Sep 19 15:36:44 Tower kernel: br0: port 2(vnet0) entered disabled state
Sep 19 15:36:44 Tower kernel: device vnet0 left promiscuous mode
Sep 19 15:36:44 Tower kernel: br0: port 2(vnet0) entered disabled state
Sep 19 15:36:45 Tower kernel: vgaarb: device changed decodes: PCI:0000:08:00.0,olddecodes=io+mem,decodes=io+mem:owns=none

It doesn't make sense to me that it's able to pass through an LSI HBA card and a GTX 980, but trying an RX 480 crashes the system... hmmmmmm
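The log above says the platform lacks interrupt remapping, which is why VFIO refuses to attach the group without the allow_unsafe_interrupts option. A rough way to check both halves (is the option on the kernel command line, and did the kernel complain about remapping) is sketched below. The helper names are invented for this example, and the parsing is deliberately simple: it assumes the option is written exactly as in the post and treats 1, y, and on as enabled.

```python
OPTION = "vfio_iommu_type1.allow_unsafe_interrupts"
REMAP_MESSAGE = "vfio_iommu_type1_attach_group: No interrupt remapping support"


def unsafe_interrupts_requested(cmdline):
    """True if the kernel command line enables allow_unsafe_interrupts."""
    enabled = False
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == OPTION:
            # If the option appears more than once, the last occurrence wins here.
            enabled = value.lower() in {"1", "y", "on"}
    return enabled


def lacks_interrupt_remapping(log_lines):
    """True if any log line carries the VFIO 'no interrupt remapping' message."""
    return any(REMAP_MESSAGE in line for line in log_lines)


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as handle:
            cmdline = handle.read()
    except OSError:
        cmdline = ""  # not on Linux, or /proc is unavailable
    print("allow_unsafe_interrupts on command line:", unsafe_interrupts_requested(cmdline))
```

If `lacks_interrupt_remapping` is true for the syslog and the option is not set, the refusal shown in the post is the expected behaviour rather than a separate fault.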
saarg Posted September 20, 2017
Well, passthrough isn't always easy. There are too many bad implementations from vendors. I did a quick Google search on the first errors you posted, and there are some hits that might be worth checking.
snowmirage Posted September 20, 2017

Thanks. I suspect I've read through those same hits. Most of what I was able to find before my initial post seemed to point to either:
- a few bugs from back in 2015, whose fixes I suspect have trickled their way down into the updated versions of unRAID since then, or
- bad implementations or bugs in the motherboard hardware or BIOS, in which case I'm a bit SOL, as the latest BIOS release from EVGA for this board was years ago now.

Thankfully the 980 I'm passing through is working great. I bit the bullet last night and managed to snag a second one for just over $200 on eBay.

I was really hoping to get that RX 480 working in this crazy build, but I guess that's the price I have to pay for using older hardware, even if it is one of the most interesting motherboards ever built.

Thanks for the help.
saarg Posted September 20, 2017
You could post on the vfio mailing list and see if you get an answer there. They might know whether this is a BIOS issue or a bug in QEMU/libvirt.
Archived
This topic is now archived and is closed to further replies.