• [6.9.2] GPU passthrough not working


    Brydezen
    • Annoyance

    I recently updated from 6.8.3 to 6.9.2; everything went fine. But none of my virtual machines work when I pass any GPU through to them. They just boot-loop into Windows recovery or don't load at all. I get the TianoCore loading screen on my display just fine, but after that it either freezes or boot-loops. I have tried many of the things I could find in other threads, but nothing seems to work for me. I have no idea what to do at this point.

    tower-diagnostics-20210412-1039.zip




    User Feedback

    Recommended Comments



    I haven't upgraded my production 6.8.3 server to 6.9.x yet, so I don't know if I will have a similar issue. Have you tried adding a vBIOS to the card? I don't use one on my 6.8.3 machine, but it may be a requirement in 6.9.x.
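    For reference, a vBIOS can usually be dumped straight from the host console via sysfs. This is only a rough sketch: the PCI address and output path are placeholders (the card's real address is under Tools > System Devices), and it generally only works while the card is not the host's active boot GPU:

        # Run on the Unraid host console. 0000:03:00.0 and the output folder are placeholders.
        GPU=0000:03:00.0
        mkdir -p /mnt/user/isos/vbios
        echo 1 > "/sys/bus/pci/devices/$GPU/rom"    # make the ROM readable
        cat "/sys/bus/pci/devices/$GPU/rom" > /mnt/user/isos/vbios/gpu.rom
        echo 0 > "/sys/bus/pci/devices/$GPU/rom"    # switch it back off

    The resulting .rom file is what gets selected in the VM's graphics card settings (or referenced via a <rom file='...'/> line in the XML).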

    Link to comment

    Try updating the machine type to the latest (or even the second-latest) version.

    My VMs with a GTX 1060 passed through work fine on 6.9.2.
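    If you want to see which machine types your host's QEMU actually offers, or change the setting outside the web form, something like this should do it (the VM name is a placeholder):

        # List the versioned i440fx/Q35 machine types libvirt reports for this host
        virsh capabilities | grep -oE "pc-(i440fx|q35)-[0-9.]+" | sort -u

        # The setting itself is a single line in the VM's XML, e.g.
        #   <type arch='x86_64' machine='pc-i440fx-5.1'>hvm</type>
        # which the Unraid VM form edits for you, or edit it directly with:
        virsh edit "Windows 10"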

    Link to comment
    1 hour ago, SimonF said:

    I haven't upgraded my production 6.8.3 server to 6.9.x yet, so I don't know if I will have a similar issue. Have you tried adding a vBIOS to the card? I don't use one on my 6.8.3 machine, but it may be a requirement in 6.9.x.

    I have tried using a vBIOS, both one I dumped myself and one I downloaded. Neither works either.

    Edited by Brydezen
    Link to comment
    13 minutes ago, tjb_altf4 said:

    Try updating the machine type to the latest (or even the second-latest) version.

    My VMs with a GTX 1060 passed through work fine on 6.9.2.

    Weird. Maybe I will just reinstall my USB using the Unraid tool and transfer my settings files over. If that doesn't work, I think I will just downgrade. I'm on my second day of troubleshooting.

    Link to comment

    Have you tried other machine types?

     

    I see you are now running -machine pc-i440fx-5.1.

     

    I think 4.2 was the maximum on 6.8.3.

     

    If you revert, you will need to manually add the cache back in 6.8.3.

     

    This was in the announcement for 6.9.0.

    Reverting back to 6.8.3

    If you have a cache disk/pool it will be necessary to either:

    restore the flash backup you created before upgrading (you did create a backup, right?), or

    on your flash, copy 'config/disk.cfg.bak' to 'config/disk.cfg' (restore 6.8.3 cache assignment), or

    manually re-assign storage devices assigned to cache back to cache

     

    This is because, to support multiple pools, the code detects the upgrade to 6.9.0 and moves the 'cache' device settings out of 'config/disk.cfg' and into 'config/pools/cache.cfg'. If you downgrade back to 6.8.3, these settings need to be restored.
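    For the second option in that list, the copy is a one-liner from the Unraid console, since the flash drive is mounted at /boot; a sketch, keeping a spare copy of the current file first:

        cp /boot/config/disk.cfg /boot/config/disk.cfg.69bak   # keep the 6.9 version, just in case
        cp /boot/config/disk.cfg.bak /boot/config/disk.cfg     # restore the 6.8.3 cache assignment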

    Link to comment
    1 minute ago, SimonF said:

    Have you tried other machine types?

     

    I see you are now running -machine pc-i440fx-5.1.

     

    I think 4.2 was the maximum on 6.8.3.

     

    If you revert, you will need to manually add the cache back in 6.8.3.

     

    This was in the announcement for 6.9.0.

    Reverting back to 6.8.3

    If you have a cache disk/pool it will be necessary to either:

    restore the flash backup you created before upgrading (you did create a backup, right?), or

    on your flash, copy 'config/disk.cfg.bak' to 'config/disk.cfg' (restore 6.8.3 cache assignment), or

    manually re-assign storage devices assigned to cache back to cache

     

    This is because, to support multiple pools, the code detects the upgrade to 6.9.0 and moves the 'cache' device settings out of 'config/disk.cfg' and into 'config/pools/cache.cfg'. If you downgrade back to 6.8.3, these settings need to be restored.

    I have tried all the new machine types, as well as 4.2, which is what it was on when I ran 6.8.3. None of them seem to make any difference at all. I did see that in the announcement. I also had to reformat my drive for the 1MiB "alignment bug", as I was getting crazy reads and writes for some reason.

    Link to comment

    Try 5.0 instead of 5.1.

    One of my Windows VMs black-screens on 5.1 but is happy on 5.0.

    Edited by tjb_altf4
    Link to comment
    39 minutes ago, tjb_altf4 said:

    Try 5.0 instead of 5.1.

    One of my Windows VMs black-screens on 5.1 but is happy on 5.0.

    I just did a fresh install of Unraid on my USB and copied all the files over. I tried 5.1, 5.0, and 4.2 and still get the same result. Nothing seems to work for me.

    Link to comment

    Is your server booting UEFI or legacy? Legacy seems to have fewer issues.

    When you first set up your VM, are you using noVNC/RDP and then installing the GPU drivers? Are you installing the virtio drivers and the guest agent?
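    A quick way to confirm which mode the server actually booted in, from the Unraid console (just a sketch):

        # A UEFI-booted kernel exposes /sys/firmware/efi; a legacy/CSM boot does not
        if [ -d /sys/firmware/efi ]; then
            echo "booted in UEFI mode"
        else
            echo "booted in legacy (CSM) mode"
        fi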

    Link to comment
    16 minutes ago, tjb_altf4 said:

    Is your server booting UEFI or legacy? Legacy seems to have fewer issues.

    When you first set up your VM, are you using noVNC/RDP and then installing the GPU drivers? Are you installing the virtio drivers and the guest agent?

    Pretty sure it's UEFI. I said yes to the UEFI option when running the make_bootable script, and made sure my boot order was using "UEFI: General USB 1.0" or something like that.

    I do install all the virtio drivers the VM needs, but I'm not 100% sure I have installed the guest agent. I might need to try doing that over VNC.

     

    EDIT: I just tried installing the guest agent. It had no positive effect on the virtual machine.

    Edited by Brydezen
    Link to comment

    I could not even get it to boot in legacy mode for some reason; it just kept telling me it's not a bootable device. So I'm giving up for now and going back to 6.8.3. I don't want to waste more time troubleshooting when nothing seems to help at all. Hopefully the next release will work for me.

    Link to comment

    Brydezen, did you ever get this figured out? I just tried to go from 6.8.2 to 6.9.x and am having what appears to be the same thing you described. I've tried a million things like a fresh Windows VM install, passing through a vBIOS, and various settings, and nothing fixes it for me. I am on legacy boot too. The same thing happens on 6.10-RC2; I've gone back to 6.8.3 now.

     

    When I have more time I am going to go back to 6.9, try posting a problem report with diagnostics etc., and tackle it further. My AMD cards work fine; two Nvidia ones, an RTX 2070 and a 1060, don't. The only thing I have not done that I can think might have an impact is removing all add-ons/plugins in case something conflicts, but I kind of doubt it.

     

    I even ran with VNC video plus a second video card (the Nvidia passed through). You can actually see Windows halt with an error once the Nvidia drivers kick in, and then it reboots.

     

    It kind of looks like we have some hardware in common too, based on your diagnostics. I'm running dual Xeons on an ASRock EP2C602. Are you by chance on that board or a similar one?

    Link to comment

    I am encountering the exact same issue with Unraid 6.9.2 and am beyond frustrated. I have followed every one of SpaceInvaderOne's tutorials, read countless forums, edited the XML, switched to legacy boot, dumped the vBIOS, and tried i440fx and Q35. Nothing, I repeat, nothing works. I am trying to pass through an MSI GeForce GTX 970 Gaming GPU. I managed to get the card to show up in Device Manager in the VM, but the second I attempt to install drivers the entire server becomes unresponsive/crashes and I have to hard-reset everything. PLEASE HELP!

    Link to comment
    On 4/12/2021 at 7:13 PM, SimonF said:

    Built a new W10 VM with a K4000; it works OK on 6.9.2 with no vBIOS. Will try copying the image from the 6.8.3 machine.

    What motherboard and CPU are you running?

    Link to comment
    On 12/8/2021 at 5:53 AM, mikeg_321 said:

    Brydezen, did you ever get this figured out? I just tried to go from 6.8.2 to 6.9.x and am having what appears to be the same thing you described. I've tried a million things like a fresh Windows VM install, passing through a vBIOS, and various settings, and nothing fixes it for me. I am on legacy boot too. The same thing happens on 6.10-RC2; I've gone back to 6.8.3 now.

     

    When I have more time I am going to go back to 6.9, try posting a problem report with diagnostics etc., and tackle it further. My AMD cards work fine; two Nvidia ones, an RTX 2070 and a 1060, don't. The only thing I have not done that I can think might have an impact is removing all add-ons/plugins in case something conflicts, but I kind of doubt it.

     

    I even ran with VNC video plus a second video card (the Nvidia passed through). You can actually see Windows halt with an error once the Nvidia drivers kick in, and then it reboots.

     

    It kind of looks like we have some hardware in common too, based on your diagnostics. I'm running dual Xeons on an ASRock EP2C602. Are you by chance on that board or a similar one?

    I'm using the same motherboard, but I doubt it's the motherboard; it could maybe be the UEFI boot option. I will maybe try doing a completely fresh install of Unraid with legacy and UEFI and just copy over the disk and cache arrays, so everything else is totally clean. It's a long shot, but I can't keep staying on 6.8.3; every plugin is getting outdated for my version. I just don't have that much time to deal with it. I need it to just work, so it's really frustrating. Please let me know if you find a way to fix your issue.

    Link to comment
    4 hours ago, Dythnire2022 said:

    I am encountering the exact same issue with Unraid 6.9.2 and am beyond frustrated. I have followed every one of SpaceInvaderOne's tutorials, read countless forums, edited the XML, switched to legacy boot, dumped the vBIOS, and tried i440fx and Q35. Nothing, I repeat, nothing works. I am trying to pass through an MSI GeForce GTX 970 Gaming GPU. I managed to get the card to show up in Device Manager in the VM, but the second I attempt to install drivers the entire server becomes unresponsive/crashes and I have to hard-reset everything. PLEASE HELP!

    What motherboard are you using? And are you using UEFI or legacy boot? My problems also start the second a graphics card is installed. It works just fine with VNC, but a graphics card makes the VM unusable.

    Link to comment
    1 minute ago, SimonF said:

    It was an MSI B460M-A Pro + Celeron G5900 at the time.

    Alright. Can you remember what boot option you were running? UEFI or legacy?

    Link to comment
    18 minutes ago, Brydezen said:

    Alright. Can you remember what boot option you were running? UEFI or legacy?

    It was UEFI. I have since replaced the CPU and motherboard with a 12600K + MSI Z690Pro-A.

     

    And fast boot was turned off.

    Edited by SimonF
    Link to comment

    Just FYI, I'm on legacy boot and have always been. (Clicking on Flash under Main, I have Server Boot Mode: Legacy and "Permit UEFI boot mode" unchecked.)

     

    Have either of you tried a SeaBIOS VM? That was going to be my next move, along with maybe booting in safe mode to eliminate Docker conflicts (although that feels unlikely). I think you can boot a VM in safe mode, but I'm not sure. I like your idea of a clean install of Unraid.

     

    I've attached my diagnostics for reference (current working setup on 6.3). I'll let you know if I get anywhere on this too. Just not sure when I'll have time to go back to 6.10 and test.

    tower-diagnostics-20211229-1020.zip

    Edited by mikeg_321
    Link to comment
    46 minutes ago, mikeg_321 said:

    Just FYI, I'm on legacy boot and have always been. (Clicking on Flash under Main, I have Server Boot Mode: Legacy and "Permit UEFI boot mode" unchecked.)

     

    Have either of you tried a SeaBIOS VM? That was going to be my next move, along with maybe booting in safe mode to eliminate Docker conflicts (although that feels unlikely). I think you can boot a VM in safe mode, but I'm not sure. I like your idea of a clean install of Unraid.

     

    I've attached my diagnostics for reference. I'll let you know if I get anywhere on this too. Just not sure when I'll have time to go back to 6.10 and test.

    tower-diagnostics-20211229-1020.zip

    So you are running legacy and I'm running UEFI, on the same motherboard, and we both have problems with Nvidia GPUs. Seems weird. Have you done any customization outside of the Unraid web GUI? I have done a few things related to CPUs, but now I can't remember exactly what they were. That's why I was thinking about doing a completely fresh Unraid install to test.

    I haven't tried any SeaBIOS VMs. It seems like most people recommend OVMF, so it never really struck me to use SeaBIOS.

     

    If a completely fresh Unraid install doesn't work, I'm sadly moving to another hypervisor. It makes me kind of sad if it has to come down to that, but it seems like no one is really interested in helping anymore.

    Edited by Brydezen
    Link to comment

    I definitely did some customisation outside the UI way back on the 6.3-ish versions, but I think I've taken most of it out (assuming I didn't forget anything). I was mostly stubbing devices, and I think I had to add something else to split out the IOMMU grouping. Newer Unraid versions didn't need those options, so I think I removed them. I'll have to go and double-check in case there is something still in there, but I kind of don't think so... worth checking though. That's another reason for trying safe mode, if that will work; it removes the bootloader/Grub kernel options, I believe. Edit: I just remembered I rolled back to 6.8, so I guess I can't really check what I had as a 6.10 config.

     

    I'm pretty much out of options here too... I spent at least two solid days trying various things and am stumped. I was thinking it must be our BIOS, or something deep down that we can't fix, that is not agreeing with the newer hypervisor code in the 6.9+ Unraid versions. I was going to try a different BIOS, but noticed you are on a newer BIOS than me, so I stopped down that path. I'm not sure how to get further help, but I suspect this is more a hypervisor issue than an Unraid problem... It would be nice if there was a way to get more help here though, as I'm sure a few others must be impacted. I manually searched about 10-15 pages in the KVM topic 2-3 weeks back and found a few posts that are like ours.

     

    I think I saw another thread on here where a guy used SeaBIOS and had some luck, but it was very vague, so it's tough to tell if it was the same root cause/"fix". Even if that works, it's not really a long-term solution; I agree OVMF seems to be the way to go from what I have seen... but it would be useful to know and may narrow things down.

     

    @Dythnire2022, what motherboard and CPU are you using? If it's the same as or similar to mine/Brydezen's, that would help us narrow this down one way or another.

    Edited by mikeg_321
    Link to comment

    Well, I spun up a second server (a very similar ASRock motherboard to my main server, EP2C602-based with 4 LAN ports). I tried fresh Unraid 6.10-RC2 and 6.9.0 installs: no tweaks, no Dockers, apps, or special settings. Same results in the VMs, where they crash and reboot endlessly as the OP described; the error is Stop Code: Video TDR Failure, in nvlddmkm.sys or dxgkrnl.sys. It seems to alternate, or there may be two error screens one after the other. (I can only see this if the Nvidia card is a second graphics card and the primary is VNC.)

     

    I've tried everything I can find on the forums to resolve it, with no luck at all: multifunction=on, with and without Hyper-V, and what feels like about 100 other little tweaks. Nothing works. I even tried a multitude of different BIOS settings and the video card in different PCIe slots. The only thing I haven't done is get a SeaBIOS system running; I've tried, but it just won't work (black screens, and the video card gets disabled by Windows).
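    For anyone comparing notes, the Hyper-V and multifunction tweaks mentioned above usually amount to a few lines in the domain XML. This is purely an illustration of what those edits tend to look like (the vendor_id value is arbitrary and the VM name is a placeholder), not something confirmed to fix this particular crash:

        # "Windows 10" is a placeholder VM name
        virsh edit "Windows 10"

        # Typical edits people try for Nvidia driver crashes inside the guest:
        #
        #   <features>
        #     <hyperv>
        #       <vendor_id state='on' value='0123456789ab'/>  <!-- any 12-character string -->
        #     </hyperv>
        #     <kvm>
        #       <hidden state='on'/>
        #     </kvm>
        #   </features>
        #
        # plus multifunction='on' on the guest-side <address> of the GPU's <hostdev>,
        # with the card's audio function placed on the same bus/slot as function 0x1.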

     

    So I'm pretty much done, as there's nothing more I can think to try. It would be helpful if there were some error in a log to see, but I'm not really sure where to look for a smoking-gun type error that might shed more light. Diagnostics from my test server, taken after a Windows VM crash, are attached for reference, in the hope that someone with the skills can help narrow things down. 🙂
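    For anyone who wants to dig, the usual host-side places to look after a crash are the kernel log, the per-VM QEMU log, and the syslog (which the diagnostics zip also includes); a rough sketch, with the VM name as a placeholder:

        # Kernel messages about the passthrough device (vfio, PCI resets, AER errors)
        dmesg | grep -iE 'vfio|nvidia|AER'

        # Per-VM QEMU log kept by libvirt; the file is named after the VM
        cat "/var/log/libvirt/qemu/Windows 10.log"

        # Unraid's syslog
        tail -n 100 /var/log/syslog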

     

    I'm stuck on 6.8.3 for now, which sucks, as I also need the Radeon reset patch, which is super simple to install on 6.10... I'd be happy for any suggestions on digging further into what might be causing the VM/Nvidia driver crash. I'm just not sure where to turn next. My gut tells me this is some mismatch between this motherboard and the hypervisor or the 5.x kernel, which is new as of Unraid 6.9+, I believe (it was kernel 4.x before).

    tower-test 6_9_0-diagnostics-20211229-1754.zip

    Link to comment




