Win10 VM graphics pass-through broke after AMD BIOS update



 Wanted to follow up.  The cause of my issue [with the Ryzen 3900x hanging while trying to pass-through USB Controller 3.0] was totally that FLR issue posted above.  Luckily, someone on this forum had already compiled a kernel with a temporary fix, and I used that.  Find that custom kernel for Unraid 6.8.3 here:

 

On 6/3/2020 at 6:50 AM, killeriq said:

i was playing around with that , but cant definitely tell which step did the fix... "i assume as i moved to 6.9.0 beta 1 it upgraded kernel and fix it somehow."

Note that I tried Unraid 6.9.0-beta1 and it did not yet have the FLR fix in the Linux kernel.  The fix will eventually make it into the Linux kernel, but probably not until 5.8, so it might be a while before it makes it into Unraid.  Read more about the commit here: https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/virtualization&id=0d14f06cd665

 

@killeriq - not sure how you got it to work with the Unraid 6.9.0 beta 1, but if it works, I would say that's the important part.

 

Edited by mattz
additional note about original problem with 3900x
  • 4 weeks later...
On 6/3/2020 at 2:53 AM, mattz said:

 

@killeriq- I think I'm in the same boat now.  I just upgraded my x470 board to the Ryzen 3900x from the 2700x (wanted the cores!).  However, I am no longer able to pass through my motherboard's USB Controller 3.0 the same way I did with the 2700x.  I now get the same error you had and the whole system will lock up, requiring a hard reboot:

 


kernel: vfio-pci 0000:0c:00.0: not ready 1023ms after FLR; waiting

It is something others are encountering--the only way to fix it is to avoid passing through that particular USB controller, and use other USB Controllers, if you can: 

There is also a Kernel patch, it appears, that could fix it.  So, I am not sure, does the latest Unraid BIOS fix it for you?  It could be the kernel patch made it in... 

 

 

After I added a 2nd GPU card (needed to do some testing), all was good. Then I removed it and kept only one, and the same issue started again and the server FREEZES.

Read through your notes: some custom patch has to be applied (for version 6.8.3). I was already on 6.9.1b22, so I was not able to revert two versions back.

Anyway, I'm not really sure how I was able to run it before without any patch, but I assume this is the way:

I wasn't able to start the VM module; as soon as I tried, the server froze with the error below.

So what to do:

1. In the BIOS, disable IOMMU.

2. Start Unraid.

3. Start the VM module. For every VM that has "AMD Starship/Matisse PCIe Dummy Function | Non-Essential Instrumentation (0c:00.0)" attached, disable autostart, then restart Unraid.

4. Enable IOMMU in the BIOS.

5. Unraid should boot and the VM module should be visible. Edit the VMs and look for "AMD Starship/Matisse PCIe Dummy Function | Non-Essential Instrumentation (0c:00.0)" added to your VM image. UNTICK it, then SAVE. The next time you EDIT the VM image, it is no longer present.

6. Start the VM and all should be running fine.

 

 

I added limetech to my reply so they can include the patch, as it seems like all users with the new Ryzen 3xxx series have the same problem.

"AMD Starship/Matisse PCIe Dummy Function | Non-Essential Instrumentation (0c:00.0)" is the source of the issues:

 

Jul 5 13:02:30 unRAIDTower kernel: vfio-pci 0000:0c:00.0: not ready 1023ms after FLR; waiting
Jul 5 13:02:32 unRAIDTower kernel: vfio-pci 0000:0c:00.0: not ready 2047ms after FLR; waiting
Jul 5 13:02:35 unRAIDTower kernel: vfio-pci 0000:0c:00.0: not ready 4095ms after FLR; waiting
Jul 5 13:02:40 unRAIDTower kernel: vfio-pci 0000:0c:00.0: not ready 8191ms after FLR; waiting
Jul 5 13:02:50 unRAIDTower kernel: vfio-pci 0000:0c:00.0: not ready 16383ms after FLR; waiting
Jul 5 13:03:07 unRAIDTower kernel: vfio-pci 0000:0c:00.0: not ready 32767ms after FLR; waiting
Jul 5 13:03:42 unRAIDTower kernel: vfio-pci 0000:0c:00.0: not ready 65535ms after FLR; giving up
Jul 5 13:03:43 unRAIDTower kernel: clocksource: timekeeping watchdog on CPU10: Marking clocksource 'tsc' as unstable because the skew is too large:
Jul 5 13:03:43 unRAIDTower kernel: clocksource: 'hpet' wd_now: b4700ed2 wd_last: b3954a18 mask: ffffffff
Jul 5 13:03:43 unRAIDTower kernel: clocksource: 'tsc' cs_now: 1d337ecfa60 cs_last: 1d337dd658c mask: ffffffffffffffff
Jul 5 13:03:43 unRAIDTower kernel: tsc: Marking TSC unstable due to clocksource watchdog
Jul 5 13:03:43 unRAIDTower kernel: TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
Jul 5 13:03:43 unRAIDTower kernel: sched_clock: Marking unstable (510899129422, -8570651)<-(510996221197, -105679272)
Jul 5 13:03:45 unRAIDTower kernel: clocksource: Switched to clocksource hpet
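
For anyone comparing their own syslog against the one above, here is a minimal sketch (not anything Unraid ships) that pulls the FLR wait times out of a log and reports whether the kernel gave up on the device. The function name `summarize_flr` and the trimmed `SAMPLE` text are my own choices for illustration; the regular expression only assumes the message format shown in the log above.

```python
import re

# Matches kernel lines such as:
#   vfio-pci 0000:0c:00.0: not ready 1023ms after FLR; waiting
FLR_LINE = re.compile(
    r"vfio-pci (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"not ready (?P<ms>\d+)ms after FLR; (?P<state>waiting|giving up)"
)


def summarize_flr(log_text):
    """Return {pci_address: {"waits_ms": [...], "gave_up": bool}}."""
    summary = {}
    for line in log_text.splitlines():
        match = FLR_LINE.search(line)
        if match is None:
            continue  # unrelated log line
        entry = summary.setdefault(
            match["addr"], {"waits_ms": [], "gave_up": False}
        )
        entry["waits_ms"].append(int(match["ms"]))
        if match["state"] == "giving up":
            entry["gave_up"] = True
    return summary


# Three lines copied from the log above, trimmed for brevity.
SAMPLE = """\
Jul 5 13:02:30 unRAIDTower kernel: vfio-pci 0000:0c:00.0: not ready 1023ms after FLR; waiting
Jul 5 13:02:32 unRAIDTower kernel: vfio-pci 0000:0c:00.0: not ready 2047ms after FLR; waiting
Jul 5 13:03:42 unRAIDTower kernel: vfio-pci 0000:0c:00.0: not ready 65535ms after FLR; giving up
"""

if __name__ == "__main__":
    for addr, info in summarize_flr(SAMPLE).items():
        verdict = "gave up" if info["gave_up"] else "still waiting"
        print(addr, info["waits_ms"], verdict)
```

In the full log the reported wait roughly doubles each time (1023 ms up to 65535 ms) before the kernel gives up, so a device that reaches "giving up" is the one to stop passing through or to list in the no-FLR workaround.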

 

13 hours ago, killeriq said:

I added limetech to my reply , to include patch...as seems like all users with new Ryzen 3xxx series have the same problem.

 

"AMD Starship/Matisse PCIe Dummy Function | Non-Essential Instrumentation (0c:00.0)" source of issues

Good idea adding limetech.  They may defer until the fix is included in the Linux kernel, which should happen based on that commit I referenced.  However, with the Ryzen 3600 and others SO CHEAP and performant, I am sure there are quite a few people moving to them.

 

BTW - Those steps you had to take, good points.  Super annoying, it's because the VM image will "remember" devices that are "removed".  You can also edit the XML directly to remove the reference so you don't need the checkbox; however, it's a little bit of guesswork to figure out which XML element(s) it is.
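
To take some of the guesswork out of the XML route, here is a rough sketch of finding and dropping the `<hostdev>` entry by its host PCI address. It assumes the usual libvirt layout where each passed-through PCI device is a `<hostdev type='pci'>` with a `<source><address .../></source>` child; the function name `remove_hostdev` and the trimmed `SAMPLE` domain are made up for illustration, and you would still need to put the result back through whatever XML editor you normally use.

```python
import xml.etree.ElementTree as ET


def remove_hostdev(domain_xml, bus, slot, function):
    """Drop every PCI <hostdev> whose host <source><address> matches.

    bus, slot and function are integers, e.g. 0x0c, 0x00, 0x0 for 0c:00.0.
    Returns (new_xml, number_removed).
    """
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    removed = 0
    if devices is None:
        return domain_xml, removed
    # Copy the list first so removing children does not disturb iteration.
    for hostdev in list(devices.findall("hostdev")):
        addr = hostdev.find("./source/address")
        if addr is None or hostdev.get("type") != "pci":
            continue
        host = tuple(
            int(addr.get(key, "0"), 16) for key in ("bus", "slot", "function")
        )
        if host == (bus, slot, function):
            devices.remove(hostdev)
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed


# Hypothetical, heavily trimmed domain XML for illustration only.
SAMPLE = """<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
  </devices>
</domain>"""

if __name__ == "__main__":
    new_xml, count = remove_hostdev(SAMPLE, 0x0C, 0x00, 0x0)
    print(f"removed {count} hostdev element(s)")
    print(new_xml)
```

Matching on the host address (0c:00.0 here) rather than on the guest-side `<address type='pci' .../>` is the point: the guest address is whatever slot the VM was assigned, while the `<source>` address is the physical device that causes the hang.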

 

11 hours ago, mattz said:

BTW - Those steps you had to take, good points.  Super annoying, it's because the VM image will "remember" devices that are "removed".  You can also edit the XML directly to remove the reference so you don't need the checkbox; however, it's a little bit of guesswork to figure out which XML element(s) it is.

 

I was in the state where I had the VM module OFF; as soon as I enabled it, the server froze and needed a reboot. So I couldn't get into any VM config via the WebUI.

FYI: someone replied that Limetech will fix it in the next release.

But Ryzen 3xxx CPUs have been on the market for over 6 months and still have this issue...

Everyone complains about Windows, but hardware support seems to arrive much faster there... Linux has always lagged behind, unless you are a Linux guru who compiles his own kernel :D

  • 3 weeks later...

Does anyone know when the next release is? I've used the kernel that was linked by another member here, and that allowed me to finally pass my audio card (kernel + pcie_no_flr=1022:1487, because pcie_no_flr=1022:149c,1022:1487 crashes the system). But trying to pass both the audio card and the USB controller crashes unRaid. I'm done testing random things: I've rebooted my poor server out of more than 20 unRaid hangups, and I'm going to kill my array if I keep this up.

I just bought my own copy of Unraid 3 days ago. I love it for dockers and arrays, but VMs have been an absolute nightmare.

 

 

16 minutes ago, RaidBoi1904 said:

Does anyone know when the next release is? I've used the kernel that was linked by another member here and that allowed me to finally pass my audio card (


kernel + pcie_no_flr=1022:1487 because pcie_no_flr=1022:149c,1022:1487 crashes the system

). But trying to pass both the audio card and the usb controller crashes unRaid (I'm done testing random things I've rebooted my poor server out of more than 20 unRaid hangups, I'm going to kill my array if I keep this up. 

I just bought my own copy of unraid 3 days ago, I love it for dockers and arrays but VMs have been an absolute nightmare. 

 

 

@RaidBoi1904 You are a champ for jumping head-first into this issue with a new UnRaid setup.  And, sorry to hear the problems all at once...  they are not so bad when they pop up once every 2 years after a major hardware upgrade.  But your first time out can be rough.

 

So, to pass through Audio and USB (or anything), you will need to isolate them (in addition to the no_flr hack right now for this mobo/cpu combo). 

 

It looks like you know where you're going: Main > Flash > Syslinux Configuration, to add these lines:

image.thumb.png.ce046d6181e5eb332134361c79ff8f58.png

 

My setup looks like this for just the USB -- notice the vfio-pci.ids for isolation--I don't know if I need all of them, but I do them as a group and it works:

pcie_no_flr=1022:149c,1022:1487,1022:1485 vfio-pci.ids=1022:149c,1022:1487,1022:1485 

 

You will also need to isolate the Audio device to pass it through.  On my mobo it looks like it's 10de:10f0, so you would add that to vfio-pci.ids:

image.thumb.png.441fa793f86024d9e6ece25158b81478.png
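
If you keep editing that append line by hand, a tiny helper can at least keep the two lists in sync. This is only a sketch under my own assumptions: `build_passthrough_args` is a made-up name, and `pcie_no_flr` exists only on the custom patched kernel discussed in this thread, not on a stock kernel.

```python
import re

ID_PATTERN = re.compile(r"[0-9a-f]{4}:[0-9a-f]{4}")


def _normalize(pairs):
    """Lower-case, validate and de-duplicate vendor:device pairs, keeping order."""
    seen = []
    for pair in pairs:
        pair = pair.strip().lower()
        if ID_PATTERN.fullmatch(pair) is None:
            raise ValueError(f"expected vvvv:dddd, got {pair!r}")
        if pair not in seen:
            seen.append(pair)
    return seen


def build_passthrough_args(ids, no_flr_ids=None):
    """Return 'pcie_no_flr=... vfio-pci.ids=...' for the syslinux append line.

    ids:        devices to bind to vfio-pci at boot.
    no_flr_ids: devices that should skip FLR (patched kernel only);
                defaults to the same list as ids.
    """
    vfio = _normalize(ids)
    no_flr = _normalize(ids if no_flr_ids is None else no_flr_ids)
    parts = []
    if no_flr:
        parts.append("pcie_no_flr=" + ",".join(no_flr))
    if vfio:
        parts.append("vfio-pci.ids=" + ",".join(vfio))
    return " ".join(parts)


if __name__ == "__main__":
    usb_group = ["1022:149c", "1022:1487", "1022:1485"]
    # USB group only, as in my setup above.
    print(build_passthrough_args(usb_group))
    # Same, plus the audio device isolated but left out of the no-FLR list.
    print(build_passthrough_args(usb_group + ["10de:10f0"], no_flr_ids=usb_group))
```

The output is just the text to paste after `append` and before `initrd=/bzroot`; nothing here touches the flash drive for you.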

 

I use the Arctis Pro Wireless headset that has an external USB driver, so don't need the audio controller.

 

 


@mattz

Thank you for your reply. Since I wrote this message I have tried a ton of things to get this to work; out of desperation I went back and tried all of those things once again! I did learn a few things, so it is not all wasted, but I still haven't achieved a working solution (I did have a mobo brick itself via a wifi update, that was cool lol and apparently also common!). On several occasions I got all the devices installed with their drivers, but the computer always ended up freezing and refused to reboot (displaying a blue screen with a kernel security error).

It turns out I can't pass any of the USB hubs to my VM. They appear to share an ID, so regardless of which USB hub I pass, as soon as one is passed Unraid is no longer able to access the Unraid USB.
39723329_Annotation2020-07-27195607.thumb.png.2e5289336d347221140c87ad20242aa7.png
 

As I write this I'm thinking I should go change the groupings in the VM settings to see if that gives them different IDs.

At any rate I'm back from trying beta .22 .23 .24 with regular and custom kernels. I'm now on the stable branch with a custom kernel and the following flags:

 

label Unraid OS
  menu default
  kernel /bzimage
  append pcie_no_flr=1022:1487,144d:a808,8086:2526 vfio-pci.ids=1022:1487,144d:a808,8086:2526 initrd=/bzroot

This is how my current VM looks: 
image.thumb.png.ceb17e72faa72bcb36551fd090e5e3bd.png

I've put my video card and its sound card on the same bus in the XML like so:
 

<hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/user/isos/zotak-1070.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x09' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x1'/>
    </hostdev>
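
For anyone checking a similar layout, here is a small sketch that lists which host function lands on which guest address and whether `multifunction='on'` is set. The helper name `hostdev_map` is mine, and the `SAMPLE` text is the fragment above with the `<driver>` and `<rom>` lines left out; it is not a validator for libvirt XML, just a way to read the mapping at a glance.

```python
import xml.etree.ElementTree as ET


def _bdf(addr):
    """Format a libvirt <address> element as bus:slot.function (hex)."""
    bus, slot, func = (
        int(addr.get(key), 16) for key in ("bus", "slot", "function")
    )
    return f"{bus:02x}:{slot:02x}.{func:x}"


def hostdev_map(fragment):
    """Return [(host_bdf, guest_bdf, multifunction_on), ...] for each hostdev."""
    # The fragment has several top-level elements, so wrap it in one root.
    root = ET.fromstring(f"<devices>{fragment}</devices>")
    rows = []
    for hostdev in root.findall("hostdev"):
        host = hostdev.find("./source/address")   # physical device
        guest = hostdev.find("./address")         # where the VM sees it
        if host is None or guest is None:
            continue
        rows.append(
            (_bdf(host), _bdf(guest), guest.get("multifunction") == "on")
        )
    return rows


SAMPLE = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0' multifunction='on'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x09' slot='0x00' function='0x1'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x1'/>
</hostdev>
"""

if __name__ == "__main__":
    for host, guest, multi in hostdev_map(SAMPLE):
        print(f"host {host} -> guest {guest} multifunction={multi}")
```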

Right now I'm stuck with error 43, although I've passed this card to 20+ different VMs with this BIOS and this XML without issues (passing the video card is the only thing that always worked, and now it doesn't lol). So I'm somehow worse off than a week ago.

 

---------------------

Update #1: I don't like this at all, but the fix for my error 43 was disabling UEFI boot... that makes a lot of sense to me. I will attempt to pass audio now, giving up on the USB hubs for now, as changing the grouping method didn't change the "1022:149c" ID for all of my USB hubs =/.
Update #2: I'm unable to install drivers for the sound device I passed through, but it appears to be working OK with the generic Windows driver at the moment. I will start testing some games to see if this try doesn't crash.
image.png.2e16e692b1956cfc112c652343017d18.png

Oh, I was going to pass group 39, but that is the USB hub with the same vendor ID as group 31. As I type this, I will go see if I can find a fix for passing same-vendor-ID devices and duct-tape one more thing onto this VM!
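
The ID clash matters because `vfio-pci.ids` matches on vendor:device, so every controller carrying the same ID gets claimed, including the one the Unraid flash drive hangs off. Here is a quick sketch for spotting such clashes from `lspci -n` output. The helper name `shared_ids` is mine, and the addresses in `SAMPLE` are hypothetical, not copied from my system.

```python
import re
from collections import defaultdict

# `lspci -n` prints lines like: "0c:00.3 0c03: 1022:149c"
LSPCI_N = re.compile(
    r"^(?P<slot>[0-9a-f:.]+)\s+(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<id>[0-9a-f]{4}:[0-9a-f]{4})"
)


def shared_ids(lspci_n_output):
    """Return {vendor:device: [slots]} for IDs used by more than one device."""
    by_id = defaultdict(list)
    for line in lspci_n_output.splitlines():
        match = LSPCI_N.match(line.strip())
        if match:
            by_id[match["id"]].append(match["slot"])
    return {dev_id: slots for dev_id, slots in by_id.items() if len(slots) > 1}


# Hypothetical `lspci -n` excerpt for illustration only.
SAMPLE = """\
0b:00.3 0c03: 1022:149c
0c:00.3 0c03: 1022:149c
0c:00.4 0403: 1022:1487
"""

if __name__ == "__main__":
    for dev_id, slots in shared_ids(SAMPLE).items():
        print(f"{dev_id} is shared by {', '.join(slots)}")
```

Any ID that shows up here cannot be isolated one controller at a time through `vfio-pci.ids` alone, which matches what I'm seeing with 1022:149c.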

Edited by RaidBoi1904
  • 7 months later...

Wanted to close the loop on this.  I *think* this issue has been fully resolved with the release of UnRaid 6.9.0, since they are using the Linux Kernel 5.10.x branch: https://wiki.unraid.net/Unraid_OS_6.9.0

 

The original Linux kernel fix for the AMD 3xxx/Zen CPUs was implemented in 5.8.x, so we should be good now:  https://github.com/torvalds/linux/commit/39a1af76195086349c4302f01e498a5fcbcb11d6

 

I have not yet tried it, but I will when I have potentially a few days to feel the frustration if I have to revert.  :)

7 hours ago, mattz said:

Wanted to close the loop on this.  I *think* this issue has been fully resolved with the release of UnRaid 6.9.0, since they are using the Linux Kernel 5.10.x branch: https://wiki.unraid.net/Unraid_OS_6.9.0

 

The original Linux Kernel fix for the AMD 3xxx/Xen CPUs was implemented in 5.8.x, so we should be good now:  https://github.com/torvalds/linux/commit/39a1af76195086349c4302f01e498a5fcbcb11d6

 

I have not yet tried it, but I will when I have potentially a few days to feel the frustration if I have to revert.  :)

 

I updated my Asus ROG STRIX X370-F bios that previously had this issue and forced me to run a bios from 2018.  Win10 VM with gpu passthrough running issue free on 6.9.1.

21 hours ago, xsinmyeyes said:

 

I updated my Asus ROG STRIX X370-F bios that previously had this issue and forced me to run a bios from 2018.  Win10 VM with gpu passthrough running issue free on 6.9.1.

 

Just did my upgrade to Unraid 6.9.1, and it is all running smoothly!

I have not yet removed the kernel VFIO definitions in the boot flash, but I will switch over to the new, integrated menu in the Settings when I get a chance, because I should be able to remove both pcie_no_flr (no longer needed) and vfio-pci.ids (now in Settings > VFIO-PCI Config). 🍻

 

image.thumb.png.7b7dc97d5cddb7a3c291f6a97689daae.png
