• [6.9.2] GPU passthrough not working


    Brydezen
    • Annoyance

    I recently updated from 6.8.3 to 6.9.2, and the update itself went fine. But none of my virtual machines work when I pass a GPU through to them: they either boot loop into Windows recovery or don't load at all. I get the TianoCore loading screen on my display just fine, but after that the VM either freezes or boot loops. I have tried many of the fixes I could find in other threads, but nothing seems to work for me. I have no idea what to do at this point.

    tower-diagnostics-20210412-1039.zip




    User Feedback

    Recommended Comments



    I just tried a completely new install of unRAID, and at first glance everything seemed great. My main Windows VM actually started up and was somewhat usable, far more than after just updating the existing USB install, but still not stable enough for me to be happy running it daily. So I decided to create a completely new VM. Everything went fine over VNC, and the first boot with a graphics card also seemed fine, but when the Nvidia driver install was about 15% in, the VM just froze. At this point I don't know what else to do. I can't stay on 6.8.3 forever; I want to upgrade to 6.9.2 or beyond, but I'm beginning to give up on unRAID. If it's this much hassle I might just end up switching to another hypervisor. I feel like I have tried close to everything, running the VM in a bazillion different configurations: Hypervisor yes/no, USB 2.0 or 3.0, more RAM, less RAM, and so on. Once in a while it will actually start up to the desktop with only the GPU and RDP, but it crashes after two or three minutes.

    Link to comment

    I feel your pain. I get the exact same symptoms. I think that sometimes, if you make a big enough change to the VM config, the GPU hardware info changes slightly; Windows notices that, and the Nvidia driver doesn't load fully right away. Then, a bit after Windows boots and re-initialises the GPU fully, it dies. That's why it sometimes appears to work, or so I suspect. I popped the cork thinking it was fixed a couple of times, only to come back to a black screen and a boot loop. I also tried some older Nvidia drivers (about two years old) with no luck.

    Link to comment

    Some good news: I have been able to install a new Windows 10 VM with only the GPU passed through and connect to it over RDP, though Windows keeps reporting error 43 for the GPU in Device Manager. I followed this guide to set up the VM itself: 

    I then also unticked the GPU in Tools > System Devices and instead added its IDs directly to the flash drive boot command using: 

    pci-stub.ids=XXXX:XXXX,XXXX:XXXX
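
    For reference, on unRAID that parameter normally goes on the append line of the flash drive's syslinux.cfg. A sketch with placeholder IDs is below; newer setups typically use vfio-pci.ids instead of pci-stub.ids, and any other options already on your append line should be kept.

    label Unraid OS
      menu default
      kernel /bzimage
      append pci-stub.ids=xxxx:xxxx,xxxx:xxxx initrd=/bzroot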

    I have not overcome error 43 yet, but this is definitely further than I have ever gotten before. I think I will try to follow this long guide next: 

     

    Link to comment
    3 hours ago, mikeg_321 said:

    I hope you are on to something!  I headed down that road at one point too but stopped.  I was under the impression that with newer Nvidia drivers, error 43 should no longer be an issue for VMs. Perhaps it still is, despite Nvidia now allowing VMs.  Worth a try for sure.  Let me know if you want me to test or check anything on my system.    NVIDIA enables GeForce GPU passthrough for Windows virtual machines

    If you don't mind trying it out? I have seen people talking about older graphics driver versions, but I haven't tried that yet. My next move is to try going back to legacy boot; right now I'm on UEFI. I want to see if that gets me further. Installing the machine as Q35 and adding those lines to my template definitely got me further. I don't have time until Monday to work on it more, but I'm only going to give it two more days before migrating back to 6.8.3. I can't be spending this much time on something that should just work.

    Edited by Brydezen
    Link to comment

    Hey, same for me for months. I've tried another hypervisor, Proxmox: GPU passthrough works on kernel 5.3.10, but from 5.4.34 onwards I get the same crash/reboot loop in the VM after Windows loads the driver, and it sometimes crashes Proxmox itself.

    Link to comment

    @Hetimop, sorry to hear you are having trouble too. What motherboard are you using? I have a theory that the issue is linked to certain motherboards, but I would like to disprove that. I also think it's something in newer kernels that disagrees with KVM/Nvidia in combination with another factor (maybe the motherboard or BIOS or the like).

     

    I wonder if one of the unRAID 6.9 RC versions had a kernel older than 5.4.34. I would like to try that just for fun if I could find an RC release somewhere. Does anyone know where I could get an old 6.9 release candidate to test with?

     

    @Brydezen, what did you want me to try out? The first part, hiding the KVM stuff, or the second part, patching the driver? I have done the KVM hiding already and, like you are seeing, it didn't work 100%. The last guide you posted (patching the driver) is very outdated and I'm afraid would no longer work; someone on its last pages basically said as much. What Nvidia driver version were you using when you saw code 43?

     

    When I have a chance, hopefully this week, I'll run through the first guide (hiding the VM with commands) again and see if I have any luck. I'm also going to try the Nvidia Studio driver instead; I used that in a Win11 VM recently and it fixed some other unrelated issues.

    Link to comment
    On 1/9/2022 at 7:17 PM, mikeg_321 said:

    @Hetimop, sorry to hear you are having trouble too. What motherboard are you using? I have a theory that the issue is linked to certain motherboards, but I would like to disprove that. I also think it's something in newer kernels that disagrees with KVM/Nvidia in combination with another factor (maybe the motherboard or BIOS or the like).

     

    I wonder if one of the unRAID 6.9 RC versions had a kernel older than 5.4.34. I would like to try that just for fun if I could find an RC release somewhere. Does anyone know where I could get an old 6.9 release candidate to test with?

     

    @Brydezen, what did you want me to try out? The first part, hiding the KVM stuff, or the second part, patching the driver? I have done the KVM hiding already and, like you are seeing, it didn't work 100%. The last guide you posted (patching the driver) is very outdated and I'm afraid would no longer work; someone on its last pages basically said as much. What Nvidia driver version were you using when you saw code 43?

     

    When I have a chance, hopefully this week, I'll run through the first guide (hiding the VM with commands) again and see if I have any luck. I'm also going to try the Nvidia Studio driver instead; I used that in a Win11 VM recently and it fixed some other unrelated issues.

    I wanted you to maybe try a new Windows 10 VM on 6.9.2, Q35, with full UEFI booting. 

    I just tried it, and it still seemed to give me error 43. Then I tried pulling out my old main VM vdisk and creating a new template for it, and lo and behold it somewhat worked. It wasn't unusable, but I still wasn't able to play any games. I then tried to upgrade to the latest Nvidia GeForce Game Ready driver using a "clean install" in the advanced section, and after doing that it went back to being totally unusable. I blame Nvidia for the issue now, but it's hard to say for sure. Before that it was running Nvidia driver 471.68. Not sure what to do now. Maybe I will try this guide and see if it can fix the VM for good:
    https://forums.unraid.net/topic/103501-gpu-passthrough-doesnt-work-after-updating-to-unraid-69/?do=findComment&comment=961341

    Link to comment

    UPDATE:
    I pulled the plug after spending over 5 days trying to fix it and rolled back to 6.8.3. Before I did, I also tried 6.10-RC2 as a last straw. I read somewhere that the Linux kernel had problems with VFIO passthrough in versions 5.1, 5.2 and 5.3, and unRAID moved to a newer kernel in 6.9.2, so I blame it on the kernel choice. I hope later versions of unRAID move beyond the kernels with these potential problems.

    https://www.heiko-sieger.info/running-windows-10-on-linux-using-kvm-with-vga-passthrough/#Kernel_51_through_53_having_Issues_with_VFIO-solved_with_latest_kernel_update

     

    EDIT: Not saying he is right, but it seems odd that so many people are having problems with the kernel in unRAID 6.9(.x).

    Edited by Brydezen
    Link to comment
    On 1/10/2022 at 1:49 PM, Brydezen said:

    I wanted you to maybe try a new Windows 10 VM on 6.9.2, Q35, with full UEFI booting. 
     

    I tried a bunch more stuff last night, including a change to UEFI, and still nothing (although that was on 6.10-RC2). I believe this is kernel/Nvidia-driver interaction stuff, but it must also have something to do with specific hardware. If it weren't somewhat hardware dependent, I would expect more people to be saying "me too" in this thread; it feels like the majority of folks on 6.9.x and later must have working Nvidia VMs in Windows 10. Maybe we need to start a poll or something...

     

    By the way, I spun up an Ubuntu VM with no issues at all. I have one more thing to try, and then I'm also going to have to give up for a bit.

    Link to comment

    At last, some success and, I think, a viable workaround to this issue: enable MSIs for the GPU. I was helping someone in another thread with an audio issue, and it dawned on me that the audio portion of the GPU won't work well without MSIs enabled; maybe the GPU itself needs them now too. It's a bit of a chicken-and-egg scenario though: the "fix" needs to be applied in the Windows registry with the GPU passed through and working, but the Nvidia driver install, or any update to the GPU, will undo the fix and cause a BSOD.

     

    We'll now always need to force the GPU to use Message Signaled Interrupts (MSIs). Something in the newer kernel, or the hypervisor, or both has made this a requirement for some setups, although I still think it is likely not biting everyone; it must also depend on your motherboard/CPU, the interrupts in use, and so on, I guess.

     

    To close out this painful experience, here is a screenshot of the actual error that Windows throws when the GPU is initialised with line-based interrupts: 

    [screenshot: MSIissue.png]

    That is also a clue, as it indicates something timing out related to nvlddmkm.sys (the Nvidia driver). The way I read it, Windows waits on this driver, times out, and throws the error (Video TDR Failure).

     

    Why Nvidia doesn't enable MSIs by default, I don't know. If they did, this would not be a problem for us, and audio passthrough would also work better. (Edited: or maybe this is the part that makes it hardware dependent; perhaps MSI is enabled by default on newer motherboards.) This is a bit tricky, like I said. I'll post how I did it shortly below, but it involves using Safe Mode.

     

    I have had my VM running for 2 hours now, doing benchmarks and surviving multiple reboots, so I think this is the solution we need. But I have yet to implement it on my main unRAID server, so hopefully I'm not jumping the gun here.

     

    It won't survive driver updates, so the process will likely need to be redone after those. I suspect anything that slightly changes the address Windows references the GPU by will revert things back to a boot loop, as I think the driver undoes the MSI changes when it installs or updates. It does for the audio part for sure, based on my experience.

    Edited by mikeg_321
    Link to comment

    To enable MSIs we'll need to:

     

    1. Boot into the VM with only VNC graphics enabled.
    2. Enable Safe Mode via MSCONFIG (use Search, type in MSCONFIG, and run the program "System Configuration").
    3. Go to the Boot tab and select Safe boot. I always enable Network as well, but it isn't strictly needed for this.
    4. Press OK and power down the VM.

     

    [screenshot: msconfig.jpeg]

     

    1. Now, within the unRAID VM interface, change the graphics adapter and assign your Nvidia GPU (passed through to the VM as typically done for 6.8 and prior).
    2. Boot back into the VM. (The GPU should display Windows content on your monitor and be usable, but it is not using the NVIDIA driver, so it should not crash. This is important because we need the GPU ID.)
    3. Go to Device Manager --> Display Adapters --> GPU Properties. Click on Details and change the drop-down to Device Instance Path.

    Copy or note down the entire path, as you'll need to locate it in the Registry.

     

    [screenshot: DeviceInst.jpeg]

     

    1. Now open Regedit (Run: regedit).
    2. Find the device instance under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\PCI\<your device instance info> and expand its Device Parameters subkey (green boxes in the screenshot).
    3. Add 2 new keys and 1 DWORD as per the screenshot (red boxes):
      1. new key: Interrupt Management
      2. new key: MessageSignaledInterruptProperties
      3. new DWORD: MSISupported
    4. Set the DWORD value MSISupported to 1 (enabled).
    5. Close Regedit. (A scripted equivalent of this registry step is sketched at the end of this comment.)

    [screenshot: MSI-Regedit.jpeg]

     

    1. Now go back into MSCONFIG and disable Safe boot (the reverse of enabling it).
    2. Reboot, and if all went well the GPU will function as expected.

     

    Reference: this post on enabling MSIs (it has more details and is where I heard of MSIs a while back). Note there is also a utility to enable MSIs, but it doesn't seem to work in Safe Mode, so the manual method is needed in this case.
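
    For anyone who prefers the command line, a scripted equivalent of the Regedit steps above can be run from an elevated PowerShell prompt inside the VM (while in Safe Mode). This is only a sketch; the $inst value is a placeholder and must be replaced with your GPU's actual Device Instance Path from Device Manager.

    # Placeholder: replace with your GPU's Device Instance Path from Device Manager
    $inst = "PCI\VEN_10DE&DEV_XXXX&SUBSYS_XXXXXXXX&REV_A1\X&XXXXXXXX&X&XXXX"
    $key  = "HKLM:\SYSTEM\CurrentControlSet\Enum\$inst\Device Parameters\Interrupt Management\MessageSignaledInterruptProperties"
    New-Item -Path (Split-Path $key) -Force | Out-Null    # creates "Interrupt Management" if missing
    New-Item -Path $key -Force | Out-Null                 # creates "MessageSignaledInterruptProperties"
    New-ItemProperty -Path $key -Name MSISupported -PropertyType DWord -Value 1 -Force | Out-Null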

    Link to comment
    14 hours ago, mikeg_321 said:

    To enable MSIs we'll need to:

     

    1. Boot into the VM with only VNC graphics enabled.
    2. Enable Safe Mode via MSCONFIG (use Search, type in MSCONFIG, and run the program "System Configuration").
    3. Go to the Boot tab and select Safe boot. I always enable Network as well, but it isn't strictly needed for this.
    4. Press OK and power down the VM.

     

    [screenshot: msconfig.jpeg]

     

    1. Now, within the unRAID VM interface, change the graphics adapter and assign your Nvidia GPU (passed through to the VM as typically done for 6.8 and prior).
    2. Boot back into the VM. (The GPU should display Windows content on your monitor and be usable, but it is not using the NVIDIA driver, so it should not crash. This is important because we need the GPU ID.)
    3. Go to Device Manager --> Display Adapters --> GPU Properties. Click on Details and change the drop-down to Device Instance Path.

    Copy or note down the entire path, as you'll need to locate it in the Registry.

     

    [screenshot: DeviceInst.jpeg]

     

    1. Now open Regedit (Run: regedit).
    2. Find the device instance under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\PCI\<your device instance info> and expand its Device Parameters subkey (green boxes in the screenshot).
    3. Add 2 new keys and 1 DWORD as per the screenshot (red boxes):
      1. new key: Interrupt Management
      2. new key: MessageSignaledInterruptProperties
      3. new DWORD: MSISupported
    4. Set the DWORD value MSISupported to 1 (enabled).
    5. Close Regedit.

    [screenshot: MSI-Regedit.jpeg]

     

    1. Now go back into MSCONFIG and disable Safe boot (the reverse of enabling it).
    2. Reboot, and if all went well the GPU will function as expected.

     

    Reference: this post on enabling MSIs (it has more details and is where I heard of MSIs a while back). Note there is also a utility to enable MSIs, but it doesn't seem to work in Safe Mode, so the manual method is needed in this case.

    So you are saying that you now have a fully functional VM with a GPU passed through, with no hiccups at all, just by enabling MSI interrupts?

     

    I also have some other questions about your VM:

    What BIOS and machine version did you use?

    Did you do a fresh reinstall?

    What Nvidia driver did you install?

    Do you have Hyper-V enabled on the VM? If yes, what settings do you have in there?

    Any other special XML you have added?

    Tell me as much as you can, so I can try to recreate it on my own machine 🤞🏻

     

    I thought it was mostly needed if you had audio issues on your VM. Looking at lspci -v -s <ID>, I can see that my current VM on 6.8.3 does have MSI enabled on the GPU. It just seems odd that it should all be down to that. Maybe someone has created, or could create, a script to check on every boot whether it's enabled.
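
    For reference, from the unRAID host you can check whether the passed-through GPU is actually using MSI while the VM is running with something like the command below; the PCI address is a placeholder for your own card's.

    lspci -v -s 01:00.0 | grep -i msi
    # "MSI: Enable+" in the output means MSI is active; "Enable-" means line-based interrupts.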

     

    EDIT: This little PowerShell snippet can grab the installed "DISPLAY" device, a.k.a. the GPU, and give you the path. I will see if I can get some sort of script up and running to check whether MSISupported is set to 1 or not. 

    gwmi Win32_PnPSignedDriver | ? DeviceClass -eq "DISPLAY" | Select DeviceID

    EDIT 2: I think I have most of the pieces for creating a basic script. I'm in no way good at PowerShell; this is my first ever attempt at writing one. But it will: check if there is a graphics card and get its device instance path; check whether the "MessageSignaledInterruptProperties" key exists in the registry; then check whether "MSISupported" exists and what value it has, and change it if needed. If it changes anything, I might make it do an automatic reboot, or maybe just show a popup saying it's been changed and you should reboot.
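
    A rough, untested sketch of what such a script might look like, building on the snippet above and on the registry location from the guide earlier in the thread (illustrative only):

    # Illustrative sketch: check MSISupported for each display device and set it to 1 if needed
    $gpus = Get-WmiObject Win32_PnPSignedDriver | Where-Object DeviceClass -eq "DISPLAY"
    foreach ($gpu in $gpus) {
        $key = "HKLM:\SYSTEM\CurrentControlSet\Enum\$($gpu.DeviceID)\Device Parameters\Interrupt Management\MessageSignaledInterruptProperties"
        $current = (Get-ItemProperty -Path $key -Name MSISupported -ErrorAction SilentlyContinue).MSISupported
        if ($current -ne 1) {
            New-Item -Path (Split-Path $key) -Force | Out-Null
            New-Item -Path $key -Force | Out-Null
            New-ItemProperty -Path $key -Name MSISupported -PropertyType DWord -Value 1 -Force | Out-Null
            Write-Output "$($gpu.DeviceID): MSISupported set to 1 - reboot required"
        } else {
            Write-Output "$($gpu.DeviceID): MSI already enabled"
        }
    }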

    Edited by Brydezen
    Link to comment
    8 hours ago, Brydezen said:

    So you are saying that you now have a fully functional VM with a GPU passed through, with no hiccups at all, just by enabling MSI interrupts?

     

    I also have some other questions about your VM:

    What BIOS and machine version did you use?

    Did you do a fresh reinstall?

    What Nvidia driver did you install?

    Tell me as much as you can, so I can try to recreate it on my own machine 🤞🏻

     

    • I know, hard to believe, right? But correct: fully functional, and the Unigine benchmark numbers are really good, the same as or better than my 6.8.3 VMs.
    • unRAID 6.10-RC2 (I suspect 6.9.2 will work too).
    • It looks like I settled on Q35 (5.1) and OVMF-TPM. I suspect a newer version will work too.
    • It was a recent fresh install, a few days old with a lot of trial-and-error miles on it, so not virgin (I was able to enable MSI on it, taking it from dead to working in an obvious way).
    • The Nvidia driver is the Studio series (not Game Ready). (I don't believe the Studio part matters; it was just something I had tried before that didn't work out, i.e. it still BSOD'd on that version before enabling MSIs.)

              Version: 511.09

              Release Date: 2022.1.4

              Operating System: Windows 10 64-bit, Windows 11

    • I'm attaching my VM config so you can see the other things I have in there. It does have a couple of the KVM-hiding options (the stuff from one of your earlier posts); a generic sketch of that block is included at the end of this comment:

      <kvm>
        <hidden state='on'/>
      </kvm>

    • I'm also passing in my GPU BIOS, but that may or may not make a difference. None of my VMs on 6.8.3 need the BIOS to run, but I pass it in anyhow.  

    • NOTE: the VM config file is from when I had both VNC and the GPU enabled, essentially 2 video cards. After it was working I just deleted the video lines pertaining to VNC and saved the config. It booted up fine and I ran benchmarks like that.

    Quote

    I thought it was mostly needed if you had audio issues on your VM. Looking at lspci -v -s <ID>, I can see that my current VM on 6.8.3 does have MSI enabled on the GPU. It just seems odd that it should all be down to that. Maybe someone has created, or could create, a script to check on every boot whether it's enabled.

    Agreed, in the past it was needed just for audio. This is the part that confuses and slightly worries me: I thought my GPUs were MSI-enabled too when I went from 6.8 to 6.9. So either they were not, or the upgrade itself triggered the Nvidia driver to disable it, or there's more going on here than I thought (I hope not). The acid test will be when I migrate my main server past 6.8; I think I'll go to 6.10-RC2 in the next little while.

     

    A tool to handle this would be great. I'm just not sure how it would work when you need to install the driver first and then run the tool, but the driver install crashes the machine. I could only work around that by going to Safe Mode, but it sounds like you have some ideas here, which would be fantastic! Maybe just run the script from Safe Mode, or early in the boot process before the GPU fully initialises.

     

    Good luck, and let me know if you need any other info. I'll keep an eye on this and post back once I get my main system past 6.8 too.

    Workig_VM_Config w MSI ON.txt
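
    (For readers who skip the attachment: KVM-hiding options like the fragment above usually live inside the <features> block of the VM XML. A commonly used form is sketched below; it is illustrative rather than a copy of the attached config, and the vendor_id value is an arbitrary example.)

      <features>
        <acpi/>
        <apic/>
        <hyperv>
          <relaxed state='on'/>
          <vapic state='on'/>
          <spinlocks state='on' retries='8191'/>
          <vendor_id state='on' value='none'/>  <!-- arbitrary spoofed vendor string -->
        </hyperv>
        <kvm>
          <hidden state='on'/>  <!-- hide the KVM signature from the guest -->
        </kvm>
      </features>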

    Link to comment
    26 minutes ago, mikeg_321 said:
    • I know, hard to believe, right? But correct: fully functional, and the Unigine benchmark numbers are really good, the same as or better than my 6.8.3 VMs.
    • unRAID 6.10-RC2 (I suspect 6.9.2 will work too).
    • It looks like I settled on Q35 (5.1) and OVMF-TPM. I suspect a newer version will work too.
    • It was a recent fresh install, a few days old with a lot of trial-and-error miles on it, so not virgin (I was able to enable MSI on it, taking it from dead to working in an obvious way).
    • The Nvidia driver is the Studio series (not Game Ready). (I don't believe the Studio part matters; it was just something I had tried before that didn't work out, i.e. it still BSOD'd on that version before enabling MSIs.)

              Version: 511.09

              Release Date: 2022.1.4

              Operating System: Windows 10 64-bit, Windows 11

    • I'm attaching my VM config so you can see the other things I have in there. It does have a couple of the KVM-hiding options (the stuff from one of your earlier posts):

      <kvm>
        <hidden state='on'/>
      </kvm>

    • I'm also passing in my GPU BIOS, but that may or may not make a difference. None of my VMs on 6.8.3 need the BIOS to run, but I pass it in anyhow.  

    • NOTE: the VM config file is from when I had both VNC and the GPU enabled, essentially 2 video cards. After it was working I just deleted the video lines pertaining to VNC and saved the config. It booted up fine and I ran benchmarks like that.

    Agreed, in the past it was needed just for audio. This is the part that confuses and slightly worries me: I thought my GPUs were MSI-enabled too when I went from 6.8 to 6.9. So either they were not, or the upgrade itself triggered the Nvidia driver to disable it, or there's more going on here than I thought (I hope not). The acid test will be when I migrate my main server past 6.8; I think I'll go to 6.10-RC2 in the next little while.

     

    A tool to handle this would be great. I'm just not sure how it would work when you need to install the driver first and then run the tool, but the driver install crashes the machine. I could only work around that by going to Safe Mode, but it sounds like you have some ideas here, which would be fantastic! Maybe just run the script from Safe Mode, or early in the boot process before the GPU fully initialises.

     

    Good luck, and let me know if you need any other info. I'll keep an eye on this and post back once I get my main system past 6.8 too.

    Workig_VM_Config w MSI ON.txt

    That was my next question: how did you install the Nvidia drivers? Loading the VM with no graphics driver but MSI enabled works fine, but halfway through the install it might crash, because the update may change the device instance path. So you just installed them in Safe Mode? Having the installer crash midway is for sure going to cause some kind of weird problem if it's not allowed to finish fully.

     

    I think I might prep a fresh new VM as Q35 5.1 on my 6.8.3 machine and try to migrate it to either 6.9.2 or 6.10-RC2.

    Edited by Brydezen
    Link to comment

    That's right. My steps to install the GPU driver were:

    1. Boot into Windows with just the VNC video adapter enabled and set the boot mode to Safe Mode (like the screenshot in my earlier post).
    2. Shut down, reconfigure with the Nvidia GPU passed through, and remove the VNC config (see the sketch at the end of this comment).
    3. Boot up; you should be in Safe Mode with the GPU displaying things, but using a basic display adapter driver.
    4. Note the device instances of the GPU and its HDMI sound and enable MSI for both.
    5. Uncheck Safe Mode booting.
    6. Reboot and cross your fingers. As long as the device instance didn't change, you should be up and running.

     

    Yeah, with a fresh install and MSI enabled on 6.8.3, with any luck it will stay enabled when you boot up on 6.9+. If not, though, try the steps above, as it seems the device instance updates and/or MSI gets disabled just from unRAID changing version. Probably the new hypervisor triggers something in Windows that looks like new hardware.
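
    (For step 2, "remove the VNC config" generally means deleting the VNC-related elements from the VM's XML before booting with only the GPU. They typically look something like the sketch below; exact attribute values vary per VM.)

        <graphics type='vnc' port='-1' autoport='yes' websocket='-1' listen='0.0.0.0' keymap='en-us'>
          <listen type='address' address='0.0.0.0'/>
        </graphics>
        <video>
          <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
        </video>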

    Link to comment




