I'm getting an error when trying to add a video card to my Windows 10 VM


JustinChase

I already have a Windows 8 VM running successfully without issues, so I know passthru and everything works fine.

 

I've created a new Windows 10 VM, and got it all installed, and updated, and installed some programs using VNC.  But, when I edit the VM and add the second video card and audio device, I'm getting an error when I try to start this VM.

 

internal error: early end of file from monitor: possible problem:
2016-01-19T23:53:38.960578Z qemu-system-x86_64: -device vfio-pci,host=09:00.0,bus=pcie.0,multifunction=on,x-vga=on: vfio: error, group 9 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.
2016-01-19T23:53:38.960601Z qemu-system-x86_64: -device vfio-pci,host=09:00.0,bus=pcie.0,multifunction=on,x-vga=on: vfio: failed to get group 9
2016-01-19T23:53:38.960609Z qemu-system-x86_64: -device vfio-pci,host=09:00.0,bus=pcie.0,multifunction=on,x-vga=on: Device initialization failed
2016-01-19T23:53:38.960617Z qemu-system-x86_64: -device vfio-pci,host=09:00.0,bus=pcie.0,multifunction=on,x-vga=on: Device 'vfio-pci' could not be initialized
!

Not valid!

 

I stopped the Windows 8 VM as a test, in case I was over-allocating resources, but it made no difference.

 

A screenshot of the setup is attached.

 

Thoughts?

vm.png.c718dc0cc430707aa2783693aab9df21.png

Link to comment

00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB xHCI Controller
00:16.0 Communication controller: Intel Corporation 9 Series Chipset Family ME Interface #1
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V
00:1a.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2
00:1b.0 Audio device: Intel Corporation 9 Series Chipset Family HD Audio Controller
00:1c.0 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 (rev d0)
00:1c.3 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 4 (rev d0)
00:1c.4 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 5 (rev d0)
00:1c.6 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 7 (rev d0)
00:1d.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1
00:1f.0 ISA bridge: Intel Corporation 9 Series Chipset Family Z97 LPC Controller
00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode]
00:1f.3 SMBus: Intel Corporation 9 Series Chipset Family SMBus Controller
01:00.0 VGA compatible controller: NVIDIA Corporation GF116 [GeForce GTX 550 Ti] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GF116 High Definition Audio Controller (rev a1)
03:00.0 PCI bridge: ASMedia Technology Inc. Device 1184
04:01.0 PCI bridge: ASMedia Technology Inc. Device 1184
04:03.0 PCI bridge: ASMedia Technology Inc. Device 1184
04:05.0 PCI bridge: ASMedia Technology Inc. Device 1184
04:07.0 PCI bridge: ASMedia Technology Inc. Device 1184
05:00.0 SATA controller: Marvell Technology Group Ltd. Device 9215 (rev 11)
06:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)
07:00.0 Multimedia controller: Philips Semiconductors SAA7160 (rev 02)
08:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)
09:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 720] (rev a1)
09:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)
0a:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller

 

/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/3/devices/0000:00:03.0
/sys/kernel/iommu_groups/4/devices/0000:00:14.0
/sys/kernel/iommu_groups/5/devices/0000:00:16.0
/sys/kernel/iommu_groups/6/devices/0000:00:19.0
/sys/kernel/iommu_groups/7/devices/0000:00:1a.0
/sys/kernel/iommu_groups/8/devices/0000:00:1b.0
/sys/kernel/iommu_groups/9/devices/0000:00:1c.0
/sys/kernel/iommu_groups/9/devices/0000:00:1c.3
/sys/kernel/iommu_groups/9/devices/0000:00:1c.4
/sys/kernel/iommu_groups/9/devices/0000:00:1c.6
/sys/kernel/iommu_groups/9/devices/0000:03:00.0
/sys/kernel/iommu_groups/9/devices/0000:04:01.0
/sys/kernel/iommu_groups/9/devices/0000:04:03.0
/sys/kernel/iommu_groups/9/devices/0000:04:05.0
/sys/kernel/iommu_groups/9/devices/0000:04:07.0
/sys/kernel/iommu_groups/9/devices/0000:05:00.0
/sys/kernel/iommu_groups/9/devices/0000:06:00.0
/sys/kernel/iommu_groups/9/devices/0000:07:00.0
/sys/kernel/iommu_groups/9/devices/0000:08:00.0
/sys/kernel/iommu_groups/9/devices/0000:09:00.0
/sys/kernel/iommu_groups/9/devices/0000:09:00.1
/sys/kernel/iommu_groups/9/devices/0000:0a:00.0
/sys/kernel/iommu_groups/10/devices/0000:00:1d.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.2
/sys/kernel/iommu_groups/11/devices/0000:00:1f.3

Link to comment

It looks like the video and audio from that card are the only things in group 9, so I'm not sure what the problem is.  I suspect it might be a driver problem in Windows, but I can't install the NVIDIA driver in Windows via VNC without the hardware installed, so it seems to be a chicken-and-egg situation.

 

* I just saw your reply, I'll do that and check back.

 

thanks!!

Link to comment

Nope, you're looking at the PCI address of the video card (09:00.0), not at IOMMU group 9. The two just happen to share the number 9. There's a ton of stuff in group 9:

 

/sys/kernel/iommu_groups/9/devices/0000:00:1c.0
/sys/kernel/iommu_groups/9/devices/0000:00:1c.3
/sys/kernel/iommu_groups/9/devices/0000:00:1c.4
/sys/kernel/iommu_groups/9/devices/0000:00:1c.6
/sys/kernel/iommu_groups/9/devices/0000:03:00.0
/sys/kernel/iommu_groups/9/devices/0000:04:01.0
/sys/kernel/iommu_groups/9/devices/0000:04:03.0
/sys/kernel/iommu_groups/9/devices/0000:04:05.0
/sys/kernel/iommu_groups/9/devices/0000:04:07.0
/sys/kernel/iommu_groups/9/devices/0000:05:00.0
/sys/kernel/iommu_groups/9/devices/0000:06:00.0
/sys/kernel/iommu_groups/9/devices/0000:07:00.0
/sys/kernel/iommu_groups/9/devices/0000:08:00.0
/sys/kernel/iommu_groups/9/devices/0000:09:00.0
/sys/kernel/iommu_groups/9/devices/0000:09:00.1
/sys/kernel/iommu_groups/9/devices/0000:0a:00.0
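
For anyone following along, here is a rough sketch (not from the original posters) of how to tell the group number apart from the PCI address. It parses sysfs paths like the ones above and lists everything that shares a group with a given device. The names `parse_groups` and `group_of` are made up for illustration, and the sample is only an excerpt of the listing in this thread. On a live host the paths would come from walking `/sys/kernel/iommu_groups/`.

```python
import re
from collections import defaultdict

# Matches paths such as /sys/kernel/iommu_groups/9/devices/0000:09:00.0
PATH_RE = re.compile(r"/sys/kernel/iommu_groups/(\d+)/devices/([0-9a-f:.]+)$")


def parse_groups(paths):
    """Map IOMMU group number -> list of PCI addresses, given sysfs paths."""
    groups = defaultdict(list)
    for path in paths:
        match = PATH_RE.search(path.strip())
        if match:
            groups[int(match.group(1))].append(match.group(2))
    return dict(groups)


def group_of(groups, address):
    """Return the group number containing the PCI address, or None."""
    for number, devices in groups.items():
        if address in devices:
            return number
    return None


# Excerpt of the listing posted in this thread (not the full output).
SAMPLE_PATHS = [
    "/sys/kernel/iommu_groups/1/devices/0000:00:01.0",
    "/sys/kernel/iommu_groups/1/devices/0000:01:00.0",
    "/sys/kernel/iommu_groups/1/devices/0000:01:00.1",
    "/sys/kernel/iommu_groups/9/devices/0000:00:1c.0",
    "/sys/kernel/iommu_groups/9/devices/0000:05:00.0",
    "/sys/kernel/iommu_groups/9/devices/0000:06:00.0",
    "/sys/kernel/iommu_groups/9/devices/0000:09:00.0",
    "/sys/kernel/iommu_groups/9/devices/0000:09:00.1",
]

groups = parse_groups(SAMPLE_PATHS)
gpu_group = group_of(groups, "0000:09:00.0")
# Everything else that lives in the same group as the GT 720.
neighbours = [dev for dev in groups[gpu_group] if not dev.startswith("0000:09:")]
print(gpu_group, neighbours)
```

The point of the sketch is only that the number after `iommu_groups/` is the group, while the last path component is the device address. The card at 09:00.0 has company in its group, including SATA controllers.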

 

Your original error message was

 

internal error: early end of file from monitor: possible problem:
2016-01-19T23:53:38.960578Z qemu-system-x86_64: -device vfio-pci,host=09:00.0,bus=pcie.0,multifunction=on,x-vga=on: vfio: error, group 9 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.
2016-01-19T23:53:38.960601Z qemu-system-x86_64: -device vfio-pci,host=09:00.0,bus=pcie.0,multifunction=on,x-vga=on: vfio: failed to get group 9
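
A small sketch, assuming only the wording of the log line quoted above, that pulls the two different numbers out of the message: the `host=` PCI address and the IOMMU group that vfio reports as not viable. The function name `parse_vfio_error` is hypothetical, and the regular expressions are tailored to this one message rather than to any documented QEMU log format.

```python
import re

# PCI address as it appears in the -device argument, e.g. host=09:00.0
HOST_RE = re.compile(r"host=([0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])")
# Group number as it appears in the vfio complaint
GROUP_RE = re.compile(r"group (\d+) is not viable")


def parse_vfio_error(line):
    """Return (pci_address, iommu_group); either may be None if absent."""
    host = HOST_RE.search(line)
    group = GROUP_RE.search(line)
    return (
        host.group(1) if host else None,
        int(group.group(1)) if group else None,
    )


ERROR_LINE = (
    "2016-01-19T23:53:38.960578Z qemu-system-x86_64: -device vfio-pci,"
    "host=09:00.0,bus=pcie.0,multifunction=on,x-vga=on: vfio: error, "
    "group 9 is not viable, please ensure all devices within the "
    "iommu_group are bound to their vfio bus driver."
)

address, group = parse_vfio_error(ERROR_LINE)
print(address, group)
```

Here the bus number in the address and the group number are both 9, which is a coincidence of this particular system and probably the source of the mix-up.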

Link to comment

You need to enable PCIe ACS Override in settings ==> VM manager.  That may help.  You've got a lot of stuff on IOMMU group 9...

 

Check the wiki...

 

I did this, then rebooted, then tried again.  The VM seems to start without any errors, but I do not see anything on the monitor, nor can I connect to it via RDP.

 

I'll try setting it back to VNC, and add just the audio card and see if that lets me install the nVidia driver, then try again.

Link to comment

Edit: Never mind, I looked at your pic...

 

Why Q35, and why min/max for RAM allocation?

I'm not saying this is the issue, but I don't think either is recommended (440FX is the usual choice for Windows, and I thought min/max was still experimental).

 

One more thing: you have a lot of important shit in group 9 (SATA controllers especially). You shouldn't have an issue, but assuming isolation with that many important items grouped together worries me a little... I'm just saying you must tread lightly, young Padawan.  :P

Link to comment

I actually thought Q35 was the recommendation, and I had no idea min/max was experimental.  Both are easy enough to change.

 

I'm not really sure what the isolation does, which means I'm also not sure what risk I'm assuming by doing this.  Are there any other options I can try to limit the risk here?

 

I can't move the card to another slot, I don't have any others I can put it into.  Best case is swap cards, but I can't see how that would be any better.

 

I'll do a little more reading on the Q35 vs 440fx and the isolation issue.

 

Sadly, after getting the VM pretty much configured, I realized I picked the Pro install instead of Home, and I don't have any licenses I can use to register the Pro version, so I spent the last couple of hours reinstalling the Home version :( :( :(

Link to comment

I'm not really sure what the isolation does, which means I'm also not sure what risk I'm assuming by doing this.  Are there any other options I can try to limit the risk here?

 

This is a bit of oversimplification, and also not guaranteed to be completely factually correct, but here goes.

 

The ACS override ("downstream") patch makes the kernel assume that devices are isolated from each other even when the hardware has not reported that they are.

The idea is that the hardware may actually provide proper Access Control Services (ACS) isolation, but the board/kernel doesn't see this capability.

 

The kernel groups your devices together based on what isolation it can see between them.

What can happen with the ACS override patch is that a device in one group (now that they're all split up) accesses another device directly, without going through the IOMMU (peer-to-peer DMA between devices).

The IOMMU never sees that access, so it cannot block it.

With proper ACS support, DMA between cards (I believe) doesn't happen, as ACS on the ports (not the device) doesn't allow this situation.

 

The worry with having SATA cards/drives in a group together is that your data "could" get corrupted silently if a write/request to another device interferes with your SATA card, or the other way around. This could also lead to instability or other odd behavior IF it happens.

 

However a LOT of people use this patch and haven't specifically had issues, but it is certainly not the recommended way.

So I am likely casting a bad shadow on something that may never be an issue, however you should at least be aware of the potential issues that could occur.

Link to comment

Thank you for taking the time to explain that for me.  Perfectly correct or not, I get the gist.

 

So, I assume there isn't any other way to 'split apart' my card from the other devices.  It seems weird to me that so much is assigned to group 9, but not much to any of the other groups.

 

Is there some setting or other in my BIOS which might alleviate the need for this patch?

 

it's the 'silent errors' which concern me.

 

thanks again

Link to comment

Not really, no.

My previous MB was a Z97-Extreme4 and it was EXACTLY the same way.

All expansion slots (the 16x slots, not all of them wired as 16x, and all the 1X slots) were in the same group!

I used the patch without issues, however I felt as if I was pushing my luck with 4 video cards (I had one in a 1X slot) and a SATA card, NIC, etc...

 

It also concerned me that the Z97 chipset does not officially support VT-d, according to the Intel ARK site: http://ark.intel.com/products/82012/Intel-DH82Z97-PCH

Even so, some manufacturers offer it somehow/someway.

From what I had read, it sounded as if the chipset didn't properly support all VT-d functions, and therefore wasn't spec'd that way by Intel (possibly hearsay).

 

I contacted ASRock hoping they could do a better job of IOMMU grouping, however once I reached someone who cared enough, they wanted a laundry list of information to look into.

This was far too much trouble for me, so I gave up on any hope of some of them being split into separate groups without applying the patch.

 

Link to comment

Is it possible that your motherboard shares those PCI slots that you're trying to use?

 

I'm not sure they would show up in unraid if that were the case, but you never know! Maybe check your mobo manual.

 

In the old days the MB manual would show you which slots share which resources with each other (remember IRQ listings?).

Today it seems that this isn't the case too often.

The one thing that is typically mentioned is if eSata ports are used, the SATA internal ports are disabled.

Or the M2 slot would share with other SATA slots.

 

As far as PCIe devices, it seems as if it is what it is.

You have 8 lanes from the Platform Controller Hub (PCH, the Z97 in this case) and 16 from the CPU (a 4790K in this example).

How the MB chooses to use them depends partly on the cards you have installed.

If you have a 16X card (in the appropriate slot) and no others, it gets all 16 lanes.

If you have another card installed in a 16x slot (possibly only wired as 8x), you'd split those lanes between them.

 

A lot of times the PCIe 1X lanes come from the PCH, and not directly from the CPU, so it'd be nice if those devices are in separate groups, however even that isn't necessarily the case (definitely wasn't for my Z97-Extreme4).

 

At this point I'm just rambling, hopefully it was helpful to someone...  ;)

You'd think knowing these details prior to choosing your hardware would be nice, but I don't see that it is published typically.

 

Link to comment

I think you're spot on here, it's just the sort of detail you need to know to purchase a new mobo for Unraid 6 imho.  And one which isn't really documented and there isn't a lot of warning about the potential problems with virtualisation and your hardware in terms of compatibility.

Link to comment

Sadly, this is my third motherboard for my server and I really don't want to replace it again.

 

I suppose I'll have to ask a lot of questions here before I replace it once again.

 

Thanks again for all the input everyone.

 

Any suggestions for a good replacement are most welcome.

Link to comment

Thank you for taking the time to explain that for me.  Perfectly correct or not, I get the gist.

 

So, I assume there isn't any other way to 'split apart' my card from the other devices.  It seems weird to me that so much is assigned to group 9, but not much to any of the other groups.

 

Is there some setting or other in my BIOS which might alleviate the need for this patch?

 

it's the 'silent errors' which concern me.

 

thanks again

 

No, there is no other way to split the card into its own group other than procuring different hardware that doesn't group the devices together, or enabling the override.  The ideal hardware for passthrough would be a Xeon E5 or any of the Extreme Edition desktop processors.  Both of those lines have ACS built in, so each device should get its own IOMMU group.

Link to comment

At this point I'm just rambling, hopefully it was helpful to someone...  ;)

You'd think knowing these details prior to choosing your hardware would be nice, but I don't see that it is published typically.

 

I think you're spot on here, it's just the sort of detail you need to know to purchase a new mobo for Unraid 6 imho.  And one which isn't really documented and there isn't a lot of warning about the potential problems with virtualisation and your hardware in terms of compatibility.

 

Exactly, those are "server" things, and manufacturers feel as if they don't have to share it.

Link to comment

thanks jonp, I appreciate the confirmation.

 

One of these days I'll investigate a new motherboard, but I'll just keep my fingers crossed in the meantime :)

You'll likely do fine without it, just keep it in mind is all.

Ideally, if you can get some of the SATA drives out of the group, you minimize the potential for problems.  Again, a LOT of people use the ACS Downstream option without any known issues.

Link to comment
  • 2 weeks later...

Okay, this evening I moved some things around.  I moved the better video card (GT 720) to the slot that feeds the HTPC, moved the old HTPC video card (GTX 550 Ti) to the second PCIe 3.0 slot (which had been unused), and moved the SATA card to the other PCIe 1 slot.

 

I was hoping that by putting the second video card into the second PCIe 3.0 slot, it would move it out of group 9, and maybe fix this issue.  Well, that kinda worked.  It moved out of group 9, but now both video cards are in group 1.  But, the good news is that only the 2 video cards are in this group, nothing else.

 

So, if I turn on ACS override now, do I reduce/remove my chance of write errors, since none of the SATA controllers are in the group with my video cards?  Or, is my risk basically the same since they are all in group 9, which I'm 'ungrouping'?

 

I'm not entirely sure how this all works exactly, so I wanted to get some feedback on my situation before making the decision on whether or not to run the risk of write problems.

 

I'm guessing the risk is much less in the new situation, but that really is just a guess.

 

feedback is most welcome.

 

Here is everything in group 1...

 

/sys/kernel/iommu_groups/1/devices/0000:00:01.0	00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
/sys/kernel/iommu_groups/1/devices/0000:00:01.1	00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller (rev 06)
/sys/kernel/iommu_groups/1/devices/0000:01:00.0	01:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 720] (rev a1)
/sys/kernel/iommu_groups/1/devices/0000:01:00.1	01:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)
/sys/kernel/iommu_groups/1/devices/0000:02:00.0	02:00.0 VGA compatible controller: NVIDIA Corporation GF116 [GeForce GTX 550 Ti] (rev a1)
/sys/kernel/iommu_groups/1/devices/0000:02:00.1	02:00.1 Audio device: NVIDIA Corporation GF116 High Definition Audio Controller (rev a1)
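
As a rough illustration of the check being described (my own sketch, not something posted by the thread's authors), the snippet below reads lines in the combined "sysfs path, then lspci description" form used above and reports whether any IOMMU group contains both a VGA controller and a SATA controller. The function names are hypothetical, and it only looks at the natural grouping reported by the kernel. It says nothing about what the ACS override does afterwards.

```python
import re

# One line per device: sysfs path, whitespace, then the lspci description.
LINE_RE = re.compile(
    r"/sys/kernel/iommu_groups/(\d+)/devices/\S+\s+\S+ ([^:]+): .+$"
)


def classes_by_group(lines):
    """Map IOMMU group number -> set of lspci device classes in that group."""
    groups = {}
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match:
            groups.setdefault(int(match.group(1)), set()).add(match.group(2))
    return groups


def gpu_shares_group_with_sata(lines):
    """True if any group holds both a VGA controller and a SATA controller."""
    return any(
        "VGA compatible controller" in classes and "SATA controller" in classes
        for classes in classes_by_group(lines).values()
    )


# The group 1 listing from the post above.
NEW_LAYOUT = [
    "/sys/kernel/iommu_groups/1/devices/0000:00:01.0\t00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)",
    "/sys/kernel/iommu_groups/1/devices/0000:00:01.1\t00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller (rev 06)",
    "/sys/kernel/iommu_groups/1/devices/0000:01:00.0\t01:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 720] (rev a1)",
    "/sys/kernel/iommu_groups/1/devices/0000:01:00.1\t01:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)",
    "/sys/kernel/iommu_groups/1/devices/0000:02:00.0\t02:00.0 VGA compatible controller: NVIDIA Corporation GF116 [GeForce GTX 550 Ti] (rev a1)",
    "/sys/kernel/iommu_groups/1/devices/0000:02:00.1\t02:00.1 Audio device: NVIDIA Corporation GF116 High Definition Audio Controller (rev a1)",
]

print(gpu_shares_group_with_sata(NEW_LAYOUT))
```

For the group 1 listing above this reports that no SATA controller shares a group with either video card.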

Link to comment

 

 

So, if I turn on ACS override now, do I reduce/remove my chance of write errors, since none of the SATA controllers are in the group with my video cards?  Or, is my risk basically the same since they are all in group 9, which I'm 'ungrouping'?

 

Yes, you completely remove the chance of illegal peer to peer DMA with your SATA controller due to the ACS override in this scenario since the natural IOMMU grouping doesn't include that controller anyway.

Link to comment

Great, thanks for confirming. [emoji3]

Link to comment

 

 

So, if I turn on ACS override now, do I reduce/remove my chance of write errors, since none of the SATA controllers are in the group with my video cards?  Or, is my risk basically the same since they are all in group 9, which I'm 'ungrouping'?

 

Yes, you completely remove the chance of illegal peer to peer DMA with your SATA controller due to the ACS override in this scenario since the natural IOMMU grouping doesn't include that controller anyway.

 

Are we 100% sure about this?  I ask because I started seeing some really weird behavior with my media software, which makes me think something's not right with how my data is being handled on the drives.

 

I've been updating my music, updating tags, adding/changing cover art, moving files, etc.  Pretty much all/only music files.  This morning I went to look at some videos I have, and the thumbnails that had existed for months have been replaced with music/album cover art, or are missing altogether.

 

It could certainly be a media player issue, and I brought it up with them, but I wanted to circle back to the ACS override issue, since it's such a weird thing that's happening.  In the attached screenshot, you can see the thumbnails of my video files (which I've not messed with).  They are videos of an online training course, and most/all thumbnails should be of that guy, but you can see many are album covers instead.

messed_up.jpg.bbee4951580f9b541bfbd65cb5c8f7bf.jpg

Link to comment

Archived

This topic is now archived and is closed to further replies.
