amelius Posted November 4, 2017

Hi, I've been trying to get my GPU(s) to pass through properly to my VMs, and I keep running into two strange things.

1) If I don't reboot between VM startups, I get a weird error: "internal error: Unknown PCI header type '127'"

2) More problematically, "vfio: Unable to power on device, stuck in D3" shows up in the logs whenever I boot a VM with GPU passthrough. The GPU doesn't get passed through and nothing shows up on the screens; if I check via VNC, the card doesn't appear in Device Manager on Windows, and on Ubuntu the whole OS seems to hang at login.

System specs:

Threadripper 1950X
Asus ROG Zenith Extreme motherboard
64 GB DDR4-3000 memory
3x Samsung 960 Evo (this is my array)
2x GTX 1080 Ti Founders Edition (what I'm trying to pass through, one to a Windows 10 VM, one to an Ubuntu 16.04 VM)

So far I've tried blacklisting the GPUs and manually specifying the ROM dump. Both VMs use OVMF and Q35, and both work fine when VNC is the only graphics adapter. I've also tried hiding KVM from the guest, to avoid the NVIDIA issue where the GPUs don't work if they detect KVM, but I'm not sure I did that right. VM XML files are attached.

Syslinux config:

default menu.c32
menu title Lime Technology, Inc.
prompt 0
timeout 50
label unRAID OS
  menu default
  kernel /bzimage
  append iommu=pt vfio-pci.ids=10de:1b06 initrd=/bzroot
label unRAID OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
label unRAID OS Safe Mode (no plugins, no GUI)
  kernel /bzimage
  append initrd=/bzroot unraidsafemode
label unRAID OS GUI Safe Mode (no plugins)
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui unraidsafemode
label Memtest86+
  kernel /memtest

So, anyone got any ideas?

ubuntuvm.xml windowsvm.xml
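A general note for anyone debugging this kind of setup (a generic vfio sanity check, not something specific to this thread): the append line above binds only the GPU's video function to vfio-pci, but the card's HDMI audio function normally needs to be bound as well, and every device sharing the GPU's IOMMU group has to be handed over together. A small script along these lines lists the groups and the [vendor:device] IDs that go in vfio-pci.ids=:

```shell
#!/bin/sh
# Print every IOMMU group and the devices inside it. For clean GPU
# passthrough, the card's video function and its HDMI audio function
# should sit in a group containing nothing else the host still needs.
if [ -d /sys/kernel/iommu_groups ]; then
    for group in /sys/kernel/iommu_groups/*/; do
        echo "IOMMU group $(basename "$group"):"
        for dev in "$group"devices/*; do
            # lspci -nns shows the [vendor:device] IDs for vfio-pci.ids=
            echo "  $(lspci -nns "$(basename "$dev")")"
        done
    done
else
    echo "No IOMMU groups found (IOMMU disabled in BIOS/kernel?)"
fi
```

If each 1080 Ti lands in a group of its own (plus its audio function), binding by ID should be enough; if other devices share the group, that's usually where the ACS override discussions start.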
coppit Posted November 6, 2017

LOL. Did you pull the trigger on Threadripper as soon as the passthrough fix was announced? I ran into exactly the same issues today. (Stuck in D3, and needing to reboot after each VM boot.)

System specs:

Threadripper 1950X
ASRock X399 Taichi
64 GB DDR4 memory
Lots of drives
2x GTX 960

I'm trying to pass one of my GPUs through to a Windows 10 VM, and the other to a different Windows 10 VM. I have a 3rd GPU in the first slot, which I use for my unRAID bootup. Before the server rebuild, I was able to pass the NVIDIA GPUs through with my Xeon processor as long as I used the ACS override; I would hear the fans spin up when the VM started. No such luck this time. The first problem I ran into was the ROM error, so I followed the instructions in this video on how to download a ROM and edit it to work with KVM. I didn't try blacklisting the devices in the kernel, but I did try adding disable_idle_d3=1 to the boot options. No luck. Attaching my VM XML.

juggernaut-2017-11-04.xml
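For anyone trying the same experiment: disable_idle_d3 is a parameter of the vfio-pci module, so on unRAID's append line it needs the module prefix to take effect. A sketch of how the boot entry might look (the device IDs here are the GTX 960 pair from the lspci output later in this thread; substitute your own from lspci -nn):

```
label unRAID OS
  menu default
  kernel /bzimage
  append iommu=pt vfio-pci.ids=10de:1401,10de:0fba vfio-pci.disable_idle_d3=1 initrd=/bzroot
```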
coppit Posted November 6, 2017

I was going to try blacklisting my cards in the kernel boot params, but both my cards have the same IDs, so I'm not sure if "vfio-pci.ids=10de:1401,10de:0fba" will work...

$ lspci -nn | grep NVIDIA
09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM206 [GeForce GTX 960] [10de:1401] (rev ff)
09:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:0fba] (rev ff)
41:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM206 [GeForce GTX 960] [10de:1401] (rev ff)
41:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:0fba] (rev ff)
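Since vfio-pci.ids= matches on vendor:device IDs, it will grab both identical cards or neither. One common workaround (a sketch using the kernel's standard sysfs driver_override mechanism; run as root early in boot, with the addresses taken from the lspci output above) is to bind one specific card by its PCI address instead:

```shell
#!/bin/sh
# Bind only the card at 41:00 (video + audio function) to vfio-pci,
# leaving the identical card at 09:00 for the host. Run as root before
# the nvidia/nouveau driver claims the device.
for dev in 0000:41:00.0 0000:41:00.1; do
    if [ -e "/sys/bus/pci/devices/$dev" ]; then
        # Restrict which driver may claim this device...
        echo vfio-pci > "/sys/bus/pci/devices/$dev/driver_override"
        # ...unbind whatever currently owns it...
        if [ -e "/sys/bus/pci/devices/$dev/driver" ]; then
            echo "$dev" > "/sys/bus/pci/devices/$dev/driver/unbind"
        fi
        # ...and ask the PCI core to re-probe, which now picks vfio-pci.
        echo "$dev" > /sys/bus/pci/drivers_probe
    else
        echo "$dev not present on this machine, skipping"
    fi
done
```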
mikeyosm Posted November 6, 2017

Bugger that. I pulled the trigger on an Asus Zenith Extreme X399 but not on a processor yet. I'm wondering whether to return it and go X299 to avoid all the TR4 passthrough issues... Advice?
amelius Posted November 6, 2017 Author

So after struggling with Unraid and looking around some forums, I ended up pivoting to ESXi, which actually has no problem with GPU passthrough (though the configuration is a pain), but it has a shocking amount of difficulty passing through USB devices (also surmountable). The only downside is the lack of convenient software RAID support.
mikeyosm Posted November 6, 2017

6 minutes ago, amelius said: So after struggling with trying to use Unraid and looking around some forums, I ended up pivoting to ESXi, which actually has no problem with GPU passthrough (though the configuration is a pain), but has a shocking amount of difficulty passing through USB devices (but this is also surmountable). The only downside is the lack of convenient software RAID support.

I thought NVIDIA passthrough was a no-go with ESXi? Also, doesn't ESXi lack temperature readings for the motherboard etc.?
amelius Posted November 6, 2017 Author

20 minutes ago, mikeyosm said: I thought nvidia passthrough was a no go with ESXi? Also, ESXi does not show temperatures for motherboard etc?

Idk where you heard that; sure, that's what their site *claims*, but in reality it's totally not an issue. You just need to set hypervisor.cpuid.v0 = FALSE and it's all fine. ESXi also handles that D3 issue no problem, since you can set the way it handles turning PCI devices on and off. (Tip: if you want to avoid rebooting between uses of the same GPU, make sure you don't do a forced shutdown on a VM that has a GPU passed through; only a proper shutdown will make it available again for that same, or another, VM without a reboot of the host.)

If you want a guide that outlines passthrough in ESXi, https://www.reddit.com/r/Amd/comments/72ula0/tr1950x_gtx_1060_passthrough_with_esxi/ has a rough outline that works perfectly. I tested it with my configuration and had no issues getting up and running on both Windows 10 and Ubuntu 16.04, each with a GTX 1080 Ti passed through (though on Ubuntu I ran into an annoying login loop I've hit before, but that's just Xorg and the NVIDIA drivers not playing nice, not a passthrough issue).
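For reference, the setting amelius mentions is a single line in the VM's .vmx file (or the equivalent advanced parameter in the ESXi UI); it hides the hypervisor CPUID bit that the GeForce driver checks before bailing out with the well-known Code 43 error:

```
hypervisor.cpuid.v0 = "FALSE"
```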
amelius Posted November 6, 2017 Author

6 hours ago, coppit said: I didn't try blacklisting the devices in the kernel, but I did try disable_idle_d3=1 to the boot options. No luck.

Tried that, didn't help. From what I've read, this seems to be an issue with KVM virtualization, and the only hypervisors that work are a) the Windows hypervisor and b) ESXi. I've tested ESXi and it's working well for me. If you want to make use of your system rather than wait for fixes for this issue, you might want to give ESXi a shot.
mikeyosm Posted November 6, 2017

3 minutes ago, amelius said: Idk where you heard that, sure, that's what their site *claims* but in reality, it's totally not an issue, you just need to set hypervisor.cpuid.v0 = FALSE and it's all fine. Also, ESXi handles that D3 issue no problem, since you can set the way it handles turning PCI devices on and off. (Tip, if you want to not reboot between using the same GPU, make sure you don't do a forced shutdown on a VM that has a GPU passed through, only a proper shutdown will make it available again for that same (or another) vm without a reboot of the host. If you want a guide that outlines passing through in ESXi, https://www.reddit.com/r/Amd/comments/72ula0/tr1950x_gtx_1060_passthrough_with_esxi/ has a rough outline that works perfectly. I tested it with my configuration and had no issues with getting up and running on both Windows 10 and Ubuntu 16.04 with a GTX 1080 Ti passed through on each one (though on Ubuntu I ran into an annoying login loop i've run into before, but that's just an issue with Xorg and Nvidia drivers not playing nice, not an issue with the passthrough though.)

Ah, I see, that's good to know. I did read somewhere that hypervisor.cpuid.v0 = FALSE disables certain performance enhancements within the W10 VM, so I was reluctant to use that parameter. How about temperature monitoring? What does ESXi / vCenter show in terms of temperatures for your devices?
amelius Posted November 6, 2017 Author

Just now, mikeyosm said: Ah, I see, that's good to know. I did read somewhere that hypervisor.cpuid.v0 = FALSE disables certain performance enhancements within the W10 VM so I was reluctant to use this parameter. How about temperature monitoring? What does ESXi / vCenter show in terms of temperatures for your devices?

I haven't really bothered with temperature monitoring at all yet; it supposedly might require extra drivers (not really sure about that), but I don't really care, since I have a custom watercooling loop with 600W more thermal dissipation capability than the balls-to-the-wall TDP my system components can generate overclocked. As for hypervisor.cpuid.v0 = FALSE disabling performance enhancements: maybe it does, but when I benchmarked it against another rig with a 1080 Ti in it, the performance difference was pretty negligible (and even then, the one that won out simply had a slightly higher overclock anyway). I also tested it in a game, and saw maybe a 2 fps difference at 4K and 1440p. I would put any potential performance hit squarely in the "entirely imperceptible" category.
coppit Posted November 6, 2017 Share Posted November 6, 2017 (edited) 2 hours ago, mikeyosm said: Bugger that. I pulled the trigger on an Asus Zenith Extreme x399 but not on a processor yet. I'm thinking of whether to return it and go x299 to avoid all the TR4 passthrough issues... Advice? Wait... Am I to understand that when folks report success with GPU passthrough on the Ryzen threads, none of it is with Threadripper? Ugh. I assumed all Ryzen chips would work at this point! Edited November 6, 2017 by coppit Quote Link to comment
amelius Posted November 6, 2017 Author

11 minutes ago, coppit said: Wait... Am I to understand that when folks report success with GPU passthrough on the Ryzen threads, none of it is with Threadripper? Ugh. I assumed all Ryzen chips would work at this point!

Yeah, Threadripper still has issues with KVM-based virtualization. So far, the only thing I've seen work is ESXi, which operates on a different virtualization system entirely. I've heard that the Windows hypervisor also works, but I've not had a reason to test that.
coppit Posted November 7, 2017 Share Posted November 7, 2017 (edited) #!$#!@! I wish I had seen this before buying: https://www.reddit.com/r/Amd/comments/6vbe6w/threadripper_broken_on_linux_for_pci_passthrough/ Edited November 7, 2017 by coppit Quote Link to comment
heratic Posted January 4, 2018

On 06/11/2017 at 9:48 PM, amelius said: Idk where you heard that, sure, that's what their site *claims* but in reality, it's totally not an issue, you just need to set hypervisor.cpuid.v0 = FALSE and it's all fine. Also, ESXi handles that D3 issue no problem, since you can set the way it handles turning PCI devices on and off. (Tip, if you want to not reboot between using the same GPU, make sure you don't do a forced shutdown on a VM that has a GPU passed through, only a proper shutdown will make it available again for that same (or another) vm without a reboot of the host. If you want a guide that outlines passing through in ESXi, https://www.reddit.com/r/Amd/comments/72ula0/tr1950x_gtx_1060_passthrough_with_esxi/ has a rough outline that works perfectly. I tested it with my configuration and had no issues with getting up and running on both Windows 10 and Ubuntu 16.04 with a GTX 1080 Ti passed through on each one (though on Ubuntu I ran into an annoying login loop i've run into before, but that's just an issue with Xorg and Nvidia drivers not playing nice, not an issue with the passthrough though.)

So are you saying you can pass GeForce GTX cards through on ESXi? I have always read that wasn't possible. When did that change?
amelius Posted January 4, 2018 Author

8 hours ago, heratic said: So are you saying you can passthrough GTX geforce cards through on ESXi? I have always read that was not possible. When did that change?

I have no idea if or when it changed, but as long as you set the property I listed above, it works fine. I have 3 1080 Tis and a Titan V passed through, all working. The only caveat is that if you don't shut down a VM gracefully (you power it off instead of shutting down), the GPU associated with that VM won't work till you reboot the entire thing.
amelius Posted January 30, 2018 Author

Update: Unraid can now pass GPUs through properly for TR as well.
mattz Posted March 23, 2019 Share Posted March 23, 2019 (edited) Not to dredge up ancient history, but I am now having the "stuck in D3" issue after an upgrade of my BIOS from v14 to v18 on the MSI X470 Gaming M7 motherboard with a Ryzen 2 2700x CPU. Using an Nvidia EVGA 1070 card. This just happened last weekend (Mar 2019). The BIOS update seemed innocuous enough, but it did quite a number to my VM setup... In fact, I cannot pass through my GPU or sound card without errors. Sounds like all the TR folks went with a BIOS update in late 2017 and got everything working. Why this cropped up for me now is beyond me! I am going to buy a super cheap secondary GPU to try to run for Unraid so I can get my main GPU on a VM again... Any other options you can see for me? Edited March 23, 2019 by mattz Deleted extra screen shot. Quote Link to comment
shuruga2 Posted April 22, 2019 Share Posted April 22, 2019 (edited) On 3/22/2019 at 11:49 PM, mattz said: Not to dredge up ancient history, but I am now having the "stuck in D3" issue after an upgrade of my BIOS from v14 to v18 on the MSI X470 Gaming M7 motherboard with a Ryzen 2 2700x CPU. Using an Nvidia EVGA 1070 card. This just happened last weekend (Mar 2019). The BIOS update seemed innocuous enough, but it did quite a number to my VM setup... In fact, I cannot pass through my GPU or sound card without errors. Sounds like all the TR folks went with a BIOS update in late 2017 and got everything working. Why this cropped up for me now is beyond me! I am going to buy a super cheap secondary GPU to try to run for Unraid so I can get my main GPU on a VM again... Any other options you can see for me? Did you get anywhere with this? I've tried 3 different cards now all with exactly the same result (as described in the first post) R5 230 GT 710 GTX 1070 I had a GT 760 I was passing through without any trouble but "things" started acting up. The eventual solution was to pull the 760 but not before I tried a system BIOS update. It sounds like that was a mistake and I cant flash back Currently running the 710 as system and passing the 230 If I dont try and pass through a rom the screen never lights up, vfio: Unable to power on device, stuck in D3 appears in the log but the VM eventually does start (I get a steam notification) If I shut down the VM I cannot restart it Turning off the VM service then restarting it give a libvert failed to start error After any attempt to start a VM the server will not restart/reboot and I have to hit the reset button or power cycle 1700x on a Prime x370-pro Edited April 22, 2019 by shuruga2 added my hardware Quote Link to comment
mattz Posted April 29, 2019

@shuruga2 - Sorry about the delayed response. The only success I had with this was downgrading the BIOS back to an earlier version, and that works like a charm. You should be able to do this with some *unsupported* software. Check the post I started here; other folks have helped me with my MSI BIOS downgrade, but I think someone mentioned Asus (which is your Prime X370, right?):
shuruga2 Posted April 29, 2019

18 hours ago, mattz said: @shuruga2 - Sorry about the delayed response. The only success I had with this was to downgrade the BIOS back to an earlier edition, and that works like a charm. You should be able to do this with some *unsupported* software. Check the post I started here--other folks have helped me with my MSI bios downgrade, but I think someone mentioned Asus (which is your Prime x370, right?):

Thanks, I'll take a look and see what I can do
react Posted May 25, 2019

Hello all, same here with the X470 Gaming Pro and a Ryzen 2700. Downgrading from v19 to v17 solved the issue. Cheers
Goon666 Posted June 15, 2019

Hello, I have the same issue, though with another mobo (Asus). I downgraded as well, but maybe too far. In order to isolate the AGESA/AM4 ComboPI version that's causing the issue, can you provide me with the links to your MSI BIOSes please? Regards, Greg Bahde
Goldmaster Posted October 1, 2021

Wondering if anyone else has had any luck with this, as I'm having a similar issue with Windows 11 hanging in the same way. There are BIOS updates here, but I'm unsure whether any of them would fix the issue. https://www.asus.com/Motherboards-Components/Motherboards/Workstation/Pro-WS-WRX80E-SAGE-SE-WIFI/HelpDesk_BIOS/
Fabrizio_G Posted April 8

In my case I had the same problem. I ended up solving it by disabling PCIe bifurcation in the BIOS: it was previously configured for x4/x4/x4/x4, and when I changed it to x16 the VM booted perfectly and the "stuck in D3" error no longer appeared.