Win10 VM graphics pass-through broke after AMD BIOS update



UPDATE - Solution

It looks like AMD AGESA BIOS version 1.0.0.4 Patch B (1004 B) fixes the Nvidia pass-through D3 errors for folks with X470 and X370 Ryzen motherboards.  This patch started rolling out to AMD motherboards in late November 2019.  Mobos confirmed working after updating to this BIOS version:

* Asus Prime X470 Pro + 3900x

* MSI B450 Gaming AC

* MSI x470 Gaming M7 AC + 2700x

* X370 Taichi + ryzen 3900x

* Gigabyte AX-370

* Asus x470-F with 2700x and 3900x

* ...

 

Check your motherboard manufacturer's website to see if the latest BIOS includes this version, test it out, and report back.

 

----------

 

UPDATE - As of August 2019, it looks like AMD Tech Support addressed this issue.  Unsure how long it will take to roll out to your mobo manufacturer:

Quote

This is with reference to your issue with the AGESA update breaking VFIO IOMMU GPU passthrough.

This issue has been addressed with the updated AGESA A1003 ABB BIOS version. Please check for the availability of this BIOS on the motherboard manufacturers web site.

However, people are reporting in this AMD Tech Support thread that the 1003 ABB BIOS update alone does not fix the issue, and that a Linux kernel update may also be required.

 

Temporary "fix" for this issue - Downgrade your motherboard BIOS to the last version that worked!  Use the utilities below.  Look for a BIOS version from before March 2019, and check the change log for the last release before support for the Ryzen 3000 CPUs was added.

 

Asus downgrade utility Afuefix64 from overclock.net

MSI downgrade utility Flash Tool at MSI forums

Asrock downgrade utility AFUWIN Tool
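
Before and after flashing, it can help to confirm which BIOS the board is actually running. The following is only a minimal sketch, assuming a Linux host that exposes the standard DMI attributes under /sys/class/dmi/id; the function name bios_info and its dmi_root parameter are made up for illustration.

```python
from pathlib import Path


def bios_info(dmi_root="/sys/class/dmi/id"):
    """Collect board and BIOS identification strings from the DMI sysfs tree.

    Missing attributes are reported as None instead of raising.
    """
    info = {}
    for key in ("board_vendor", "board_name", "bios_version", "bios_date"):
        path = Path(dmi_root) / key
        info[key] = path.read_text().strip() if path.exists() else None
    return info
```

Calling bios_info() from the host console and comparing bios_version and bios_date against the manufacturer's download page should show whether a downgrade actually took effect.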


-----------------

 

Motherboards and CPUs reported affected so far on this thread:

 

3x Asus PRIME X470-Pro, (1201 broken)

3x Asus ROG STRIX X370-F Gaming, with Ryzen 5 2600X, 2x AMD Ryzen 7 1700 (BIOS 4207 worked, 4801 broken)

1x MSI x470 Gaming M7 AC (good at v7B77v14 BIOS) with Ryzen 2700x

1x MSI B450 gaming carbon Ac with ryzen 2600

1x Asus B450-F Gaming (good at BIOS 2008) w/ ryzen 2600

1x Asus Crosshair Hero VI (x370) - BIOS 6903 failed; 6808 is the latest tested and "working"

1x Asrock x370 Taichi  with Ryzen 2700 (good at P5.10, broken at P5.60)

1x Asus ROG STRIX B350-F Gaming motherboard with Ryzen 1700 (broken at BIOS version 5008)

1x x570 board with Ryzen 2600

 

-------------------

 

Original post-

 

A BIOS update broke my Win10 gaming VM.  I updated my MSI x470 Gaming M7 AC mobo from 7B77v14 to 7B77v18 (I must have missed a few).  After I saw this I updated Unraid to 6.6.7 to no avail. 

 

The error is a double-whammy.  I now get a couple errors:

  1.  vfio: Cannot reset device 0000:1f:00.3, depends on group 20
  2. vfio: Unable to power on device, stuck in D3

I can fix the first by removing the calls to my Audio card (which is 1f:00.3), but I am left with the "stuck in D3" error.  And after I unsuccessfully try to start the Win10 VM I have tried to reboot Unraid, but it hangs after the shut down procedure and requires a hard-reset to bring the system back up.
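
For anyone trying to pick these two messages out of a longer VM log, here is a small sketch. It assumes the log wording matches the two lines quoted above exactly; classify_vfio_errors is a hypothetical helper name, and any other vfio messages are simply ignored.

```python
import re

# Matches the "Cannot reset device" line and captures the PCI address and
# the IOMMU group it depends on. The wording is taken from the errors above.
RESET_RE = re.compile(
    r"vfio: Cannot reset device "
    r"(?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]), "
    r"depends on group (?P<group>\d+)"
)
# Matches the power-state error.
D3_RE = re.compile(r"vfio: Unable to power on device, stuck in D3")


def classify_vfio_errors(log_text):
    """Return (reset_dependencies, stuck_in_d3_count) for a block of log text.

    reset_dependencies is a list of (pci_address, group_number) tuples.
    """
    deps = [
        (m.group("addr"), int(m.group("group")))
        for m in RESET_RE.finditer(log_text)
    ]
    d3_count = len(D3_RE.findall(log_text))
    return deps, d3_count
```

Separating the two counts makes it easier to tell whether a change (such as removing the audio device) cleared the reset dependency while leaving the D3 error in place.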

 

The BIOS update did do something funny:  It re-assigned all my CPU pin pairings, so I had to fix that in both the PIN assignments on the VM and in my append isolcpus code (attached that shot).  However, as far as I see, it did not change any of the IOMMU groups, vfio-pci.ids, or device numbers.
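
One way to double-check the new core/thread pairings before editing the VM pinning and the isolcpus line is to read them from sysfs. This is a rough sketch, assuming a Linux host that exposes topology/thread_siblings_list for each CPU; the function names and the cpu_root parameter are illustrative only.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0,8' or '0-1' into a sorted tuple."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return tuple(sorted(cpus))


def thread_sibling_groups(cpu_root="/sys/devices/system/cpu"):
    """Return the distinct sibling-thread groups reported by the kernel."""
    groups = set()
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    for path in Path(cpu_root).glob(pattern):
        groups.add(parse_cpu_list(path.read_text()))
    return sorted(groups)
```

Each tuple returned by thread_sibling_groups() is one physical core's set of hardware threads, which is what the pin pairings should follow.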

 

Through my searching, the only references to the Stuck in D3 are related to graphics card pass-through, which I am using.  However, it has worked without a hitch until this mobo BIOS update, so I am not sure what has changed for me to tackle.  I wouldn't think the mobo upgrade would affect that.

 

What can I try to fix this D3 error?  What else can I provide to better assist with this?  Attached my Win10 VM XML, and put some other specs below.

 

Note- I have some Ubuntu VMs that are working just fine after the update--after re-assigning the CPU pin pairings.  Of course, I do not pass through GPUs or any devices to these instances. 

 

Thanks for reading!  I appreciate the help.

 

Unraid OS 6.6.7

Mobo - MSI x470 Gaming M7 AC, latest BIOS v18 (March 7, 2019)

CPU - Ryzen 2700x

GPU - EVGA 1070 FTW
 

image.png

vm-win10-xml-20190316.xml

Edited by mattz
updated with solution BIOS version
Link to comment

David, Thanks for the response!  Tried switching to i440fx-3.0 (Unraid doesn't have 3.1 yet, it seems), but that didn't help and had the same messages about stuck in D3 and depends on group.

 

Also started from scratch:  Created a new Windows 10 VM with the 3.0 machine with newer virtio 1.16 driver ISO but had the same error messages when I tried attaching the Nvidia 1070 GPU (stuck in D3) or attaching the sound card (depends on group xx).

 

Then switched on the PCIe ACS Override to better separate the IOMMU groups to target the "depends on groups".  Same error messages.  Tackling that "Depends on" message has been more difficult than I thought.  What exactly does that mean?  Here is what I tried:

  • Sound card is Group 33, so I isolated that with append vfio-pci.ids=1022:1457.  Restart, attach, log says "Depends on Group 31"
  • Then isolate Group 31 with vfio-pci.ids=1022:1457,1022:1455.  Restart, attach both of them to the VM, now log says "Depends on Group 32"  What does the SATA controller have to do with anything?
  • Then isolate Group 32 with vfio-pci.ids=1022:1457,1022:1455,1022:7901.  Restart, attach all three of them to the VM. Now log goes back to a "Stuck in D3" message, even though I didn't add the Nvidia GPU back.
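
To see which devices share an IOMMU group, and which vendor:device ID each one would contribute to a vfio-pci.ids list, something like the sketch below may help. It assumes the usual Linux sysfs layout (/sys/kernel/iommu_groups and /sys/bus/pci/devices); the function names and root parameters are hypothetical.

```python
from pathlib import Path


def iommu_groups(groups_root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    result = {}
    for dev in Path(groups_root).glob("*/devices/*"):
        group = int(dev.parent.parent.name)
        result.setdefault(group, []).append(dev.name)
    return {group: sorted(addrs) for group, addrs in sorted(result.items())}


def pci_id(addr, pci_root="/sys/bus/pci/devices"):
    """Return 'vendor:device' in lowercase hex for a PCI address."""
    base = Path(pci_root) / addr
    vendor = int((base / "vendor").read_text().strip(), 16)
    device = int((base / "device").read_text().strip(), 16)
    return f"{vendor:04x}:{device:04x}"
```

Printing pci_id(addr) for every address in a group shows what else would be bound to vfio-pci if that ID were added, since vfio-pci.ids matches by ID rather than by slot.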

I think this mobo is just yanking my chain.  What are some next steps I can try to get this going?  My thought right now is to revert the BIOS to v14, but I assume I will just end up in this same place if I need to update the BIOS at a later date. :(

 

Thanks again, would still appreciate the input.

 


Edited by mattz
moved image to attachment
Link to comment

I'm not an expert, but why are you passing the sound from group 31, 32, 33? Your GPU is 26 and the corresponding sound is group 27; you should always pass those as a pair. If you want to pass another sound device, use it as an additional one.
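
As a rough illustration of the pairing: a GPU and its HDMI audio usually show up as two functions of the same PCI device (same domain:bus:device, different function number). The sketch below finds those siblings from a list of addresses; the helper name and the example addresses are hypothetical, not taken from this system.

```python
def sibling_functions(gpu_addr, all_addrs):
    """Return addresses that share domain:bus:device with gpu_addr.

    The GPU's own address is excluded from the result.
    """
    slot = gpu_addr.rsplit(".", 1)[0]  # strip the function number
    return sorted(
        addr
        for addr in all_addrs
        if addr != gpu_addr and addr.rsplit(".", 1)[0] == slot
    )


# Hypothetical example: a GPU at function .0 with its audio at function .1.
example = ["0000:1c:00.0", "0000:1c:00.1", "0000:1f:00.3"]
print(sibling_functions("0000:1c:00.0", example))
```

Anything returned here is a candidate to pass through together with the GPU; an onboard audio controller at a different slot would not be included.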

Also: Do you have a valid rom for the GPU? Is Hyper-V disabled?

Link to comment

@Jaster Good Q's - I do have the GPU audio as primary, my Audio controller as 2nd Audio.  However, I was passing through those other groups because of the log messages I was getting (screen shot below).  I hope I do not need to pass through anything else, but after the BIOS update I got those messages.

 

For the other Q's

  • GPU ROM is valid... or at least, it worked for me until the mobo BIOS update.  Would I need a new ROM after a mobo BIOS update??
  • HVM is enabled

image.thumb.png.24995b1e554449c74cf734c8eebeeaca.png

 

This is the selection I have for the error below:

 

image.thumb.png.8cc6a62fddefed383b74f71ff9a74efb.png

image.thumb.png.d73380d97e823068fdffb49c4f4e38e0.png

Link to comment

Ok, removed all PCIe isolation:

image.thumb.png.35f3ebe66d99839261f2de0c146349c8.png 

 

Removed the GPU and passed through the mobo audio:

image.thumb.png.8242c01a7fc962282c3429d3e09c2251.png

 

The VM boots OK and I do see the audio device!  But I do still see the warnings in the Log:

image.thumb.png.c0359abcbfa7490577fe3976df7deaca.png

image.thumb.png.eced674bf23b7550cd09e640b2d55978.png

 

 

Now... If I add back the GPU, GPU ROM, and GPU Audio card... 

image.thumb.png.928680d1401f60a9cc6671fe334a89f0.png

 

I get the "stuck in D3" error again:

image.thumb.png.37accff15d2e3e38e9dbb8a670702998.png

 

 

It is strange that in the second shot, under "Other PCI Devices", the "USB 3.0 Host controller" went away, especially since it shouldn't have been available in the first place because I didn't isolate it.

 

 

Link to comment

And... just for good measure, if I try it _without_ the GPU ROM BIOS, I get the same error, plus some other messages...

image.thumb.png.c1da8b56a46a62ade9edb66b80d3451c.png

image.thumb.png.21bc73c98d7b2ee1f6f8ea7a0b8efe69.png

 

I am doing a full computer reboot after every attempt with a "stuck in D3" error to make sure everything is clean.

 

Would having a second graphics card be a way out of this mess?  Again, this all worked BEFORE the v18 BIOS update, when I was on v14...

Link to comment

Ok, so... I can't revert my BIOS; the MSI M-Flash does not allow going back to an older version.  So at this point I am stuck, unable to load my GPU into any VM, even Ubuntu.

 

Searching for anything around a D3 power state with Nvidia has turned up very little.  I did see a few references to a Linux Kernel (even the few posts that show D3 on this forum), and it pointed to Linux Kernel 4.19 (as of Unraid 6.6.7 it is still 4.18).  I did notice that a Kernel update to 4.19 is coming in the 6.7.0 release--not to mention qemu to v.3.1 which @david7279 originally mentioned, so maybe I hold my breath until then! 
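
To check which kernel a given Unraid release is actually running against the 4.19 mentioned above, a comparison like this sketch could be used. The function names are made up for illustration, and the example release strings in the usage note are assumptions about the format, not output captured from this system.

```python
import platform
import re


def kernel_major_minor(release=None):
    """Parse (major, minor) from a kernel release string such as '4.18.20'."""
    if release is None:
        release = platform.release()  # release string of the running kernel
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(required, release=None):
    """True if the kernel release is at or above the (major, minor) tuple."""
    return kernel_major_minor(release) >= required
```

For example, kernel_at_least((4, 19), "4.18.20") is False while kernel_at_least((4, 19), "4.19.33") is True; calling it without a release string checks the running host.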

Should I get into the Next branch or wait until Stable?  Would it do any good for me to get into the conversation about this next version or is my problem such an edge case I should wait for Stable?

 

Cheers,

 


 

 

Link to comment

You know, I really don't get it.  It seems like this is a Threadripper / Ryzen 2 issue from over a year ago.  Some pretty ingenious folks on reddit posted a fix for it.  However, the final fix was a BIOS update to x399 mobos. 

 

My CPU is a Ryzen 2 2700x... And it only broke with this error message on the _LATEST_ BIOS update.  I have emailed MSI's tech support, but I am not getting much help from them... They still think I need to reinstall Windows to make it work.  😕

Link to comment

Hey mattz,

 

Just wanted to chime in that I'm having the same issues with my MSI B450 Gaming Carbon AC after updating my BIOS to the same version you did (7B85v16). Same two errors on W10 VM start, same CPU pairings changing after update. Running 6.7.0 rc4. GPU stubbed and unable to pass through to VM. Cheers

 

Running: ryzen 2600

Sapphire Rx 580

MSI B450 gaming carbon Ac. 

 

Note: I did find a tool that is supposed to be able to downgrade the MSI BIOS here

 

 

Edited by tw0884
Add MSI BIOS downgrade tool
Link to comment

@tw0884 - Sorry to hear you are also having problems.  But it also makes me feel good I'm not just crazy and someone else sees it.  I'll bet there are a lot more ppl noticing the problem as they update.  Awesome find on that downgrade tool!  I am going to give that a try... hope I don't brick my mobo but so hungry for a fix for the VM.

 

FAILED - I bought a $40 video card and tried using it to no avail:  1)  The mobo has no way to specify that card as the primary unless I move it to the top physical slot and 2) when I have a second card, the PCI-E goes from 16x lanes down to 8x lanes.  I am not sure how big of an impact that may have, but I am not a fan of losing half the lanes.
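
To confirm whether a card really dropped from 16x to 8x, the negotiated and maximum PCIe link widths can be read from sysfs. This is a minimal sketch, assuming a Linux host where the device exposes current_link_width and max_link_width; link_width and its pci_root parameter are illustrative names, and the address in the usage note is a placeholder.

```python
from pathlib import Path


def link_width(addr, pci_root="/sys/bus/pci/devices"):
    """Return (current, maximum) PCIe link width in lanes for a PCI address."""
    base = Path(pci_root) / addr
    current = int((base / "current_link_width").read_text().strip())
    maximum = int((base / "max_link_width").read_text().strip())
    return current, maximum
```

For example, link_width("0000:1c:00.0") returning (8, 16) would mean the card supports 16 lanes but negotiated 8 in its current slot configuration.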

 

FAILED - I also tried using the Flashback+ built into the BIOS with no luck.  I put in the flash drive with a "MSI.ROM" for the v14 BIOS, the computer powers on for 1 second, powers off, then the Flashback+ LED flashes 3 times and stays on.  I guess that indicates an error with the flash process.  I hit the power button and the computer boots up with the current BIOS v18.  I have read that it is very particular about USB sticks, so I could go out and buy a few other brands but... Blah.

 

I am going to load Windows 10 natively on an old SSD I have to see if everything boots OK.  And then try out the MSI downgrade BIOS tool (thank you tw0884!).

 

And then, the more desperate move, I will look to buy a new motherboard, different brand.  MSI said I could start the RMA process, but if I have to disconnect everything and be out a motherboard for a week (or 2??), I am going to bail on MSI--the same problem will happen anyway once the BIOS is updated.  I do worry, though, that it's not exclusively an MSI problem but linked to how Ryzen 2 talks to the system... That means that the Linux Kernel(??) would have to get changed (maybe it's fixed in v5?) or... something else?  But I don't want to sit idly by for it.  However, since it was only the latest BIOS update where I had this problem, I am hoping other brands completely avoided causing this issue.

Link to comment

@mattz,

 

No problem for the tool. 

 

Fix with second GPU: I also found this to be the case with a GT 710, and I never actually got it working with unRAID (didn't need it once I got the VM working).

 

I have since booted into Windows 10 from the MSI boot menu, as I had W10 installed on a passed-through NVMe (it was accessible by both bare metal and VM).  All was well within my W10 NVMe and it ran fine.  Launching through the unRAID VM interface caused my unRAID to go haywire again (unmounted cache drive, showing 4 trillion writes to cache) and the VM hung, not allowing unRAID to be restarted.

 

I think my initial problems with the docker page were due to some docker files being moved from cache (speculation). I ran mover and docker is now responsive again.

 

I'm pretty tempted to update to rc6 for a fix but may try the downgrade first...let me know how this turns out as I cannot run unRAID and W10 concurrently ATM. 

Link to comment

@tw0884 - that Flash Tool worked like a charm with the older (v14) BIOS!  I recommend you do the mobo BIOS downgrade.  Updating to rc5 didn't help me at all with my problem.  Maybe the Linux Kernel bump they brought in on rc6 will help, but I wouldn't bet on it...  Maybe for a bigger upgrade I would try again.

 

I have successfully booted into my Windows 10 VM with the GPU pass-through!  I am ecstatic.  Sucks that the only solution is to use an unofficial flash tool to downgrade the BIOS to get this working... but good enough for me!

Link to comment
27 minutes ago, tw0884 said:

@mattz

 

Glad it worked out. Did you wipe the CMOS after the downgrade?

I did not, but it did it for me.  Everything was cleared, including any OC Profile you may have had saved.  So you'll have to modify back all your settings.  Using custom settings on the Smart Fan curve is a real pain.  😐

Link to comment

I just registered to say thank you for this thread. I searched for 6 hours without any clue why the passthrough wouldn't work on my Asus PRIME X470-Pro.

The answer was the newest Bios-Update. I managed to downgrade with the tool Afuefix64 and the Instructions found here: 

https://www.overclock.net/forum/11-amd-motherboards/1640394-ryzen-bios-mods-how-update-bios-correctly.html

Look under "How to flash a official bios".

When you try to downgrade, it will warn you with the message "rom file information does not match system bios", but it works nonetheless.

 

Thanks again!

 

Link to comment

@Borbosch  Glad we can commiserate with you on this one.  I find it both worrisome and hopeful that your Asus motherboard also had this error.  At one point I was hoping it was just an MSI issue, but it sounds much larger.

 

I hope that a future Linux Kernel update (maybe??) is able to address this issue.  Otherwise, all these manufacturers are going to have to come out with "fix" BIOS updates again.  And being stuck on an old BIOS version is not a long term solution for this issue, but it does work right now...

Link to comment
  • 5 weeks later...
9 hours ago, xsinmyeyes said:

I also wanted to say thank you for this thread.

I  just built a Ryzen 5 2600X with an Asus ROG STRIX X370-F setup. Wasted a few hours defeated by the stuck in D3 issue. I rolled back the bios to an early version (4009) and windows VM works with GPU passthrough (gtx 1060). 

 

What a way to discover a work-around, right??  I bet there are _a lot_ of folks going through the same issue.  Wonder if there are any other keywords we can use for this thread? 

I imagine one day there will be some sort of update to correct this, but I am not sure if it'll be a motherboard BIOS or Linux Kernel, or ...

Link to comment