• [6.10-RC1] Quadro card doesn't work correctly in VMs


    Mirai Misaka
    • Solved Minor

After I updated to 6.10-rc1, one of my GPUs no longer works correctly.
I have two GPUs in my system:
primary slot: GTX 1650
secondary slot: Quadro P620
The 1650 works just fine with a vBIOS dump.
The Quadro P620 works if I use it for the Plex Docker container (with the Nvidia driver), but in VMs it shows Code 43 in Windows, and a Pop!_OS VM can't initialize the GPU correctly either.
AFAIK I don't need a vBIOS dump for Quadros in order to use them in VMs, and it worked before the update (from 6.9).


Tested the P620 in a different system and verified that the GPU itself is working correctly.

Replaced the 620 with a GT710, and with the vBIOS dump it worked fine, so the slot is working.
Got no display output whatsoever when passing through the 620, whether without the sound device or with the vBIOS (no change).

     

I've attached some pictures and logs in case they are of any help:

    P620 used in different windows PC, wokes fine (secondary Slot).jpg

    Device manager in Win11 VM P620.jpg

    POP os screen before Desktop P620.jpg

    hiiragi-diagnostics-20210809-1814.zip WIN11 LOG on p620.txt





    Recommended Comments

I was able to recreate the problem.

I first checked the GPU again; it's fine.

After that I took my main PC apart and tested it with a separate USB stick, first on 6.10-rc1 and then on 6.9.2. This time I got issues on both, though not exactly the same ones I was having on my real server.

Then I wiped the test machine again and started from 6.9.0, brought the VM to POST with that P620, and worked my way up from there.

     

I think the new QEMU version is causing this, but I'm not sure.

     

I will post my "test report" here for more details on what I did; maybe that helps the devs save some time.

And sorry for typos etc., it's really late...

     

Test with a separate system: all drives removed, just an empty 1 TB NVMe for the array and a USB stick to boot Unraid.
Two GPUs:
primary slot: GT710
secondary slot: P620

First test: a new stick directly with 6.10-rc1, non-UEFI.
The GT710 worked.
The P620 did not: no display output. When paired with the GT710, the VM booted on the GT710 and showed the P620 with Code 43.

Erased the stick (not the SSD) and put 6.9.2 on it. Same issue. I did manage to bring the P620 online by uninstalling it together with its driver in Device Manager, but after a VM restart the P620 was on Code 43 again; that trick did not work twice.

Marked the P620 in System Devices together with its sound function, bound them to VFIO, and rebooted: no effect.
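To verify the binding actually took effect after a reboot, the current owner of each PCI function can be read from sysfs. A hedged sketch (the 0000:24:00.0/.1 addresses are examples; both functions of the card should show vfio-pci):

```shell
# Print which kernel driver currently owns the GPU and its HDMI audio function
# (example addresses; adjust to your own system, both should say vfio-pci)
for f in 0000:24:00.0 0000:24:00.1; do
  drv=$(readlink "/sys/bus/pci/devices/$f/driver" 2>/dev/null || echo none)
  echo "$f -> ${drv##*/}"
done
```

If either function still shows nvidia or nouveau, the VFIO bind did not stick for that function.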
    ---------------------------------------

Erased the stick again, this time with 6.9.0 and UEFI, and secure-erased the SSD.

There are two GPUs in the system (primary slot GT710 and secondary P620, both with displays attached); Unraid always chooses the P620 as the primary GPU.
If I try to start the VM on the primary GPU, I get a messed-up display on it.
Back in the day I had that error as well and fixed it with the following commands:

# release the virtual consoles so they let go of the GPU
echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind
# unbind the EFI framebuffer so the GPU's memory can be handed to the VM
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind

I put it in the User Scripts plugin (aside from Apps, the only plugin installed during testing).
If it triggers, the primary GPU can function without problems.
The VM on the P620 booted five times without any issue.

(I had a minor issue with the P620 during testing on 6.9.0: I had selected only the GPU itself and not its sound device. Result: the first VM boot was fine, but after that the GPU didn't come up on VM boot. I added the GPU's sound device and all was fine after that.)

    --------------------------------------------

Updated to 6.9.2:

Works like 6.9.0.

When the primary GPU is used without the script, the log fills up with: "Aug 10 22:43:49 Tower kernel: vfio-pci 0000:24:00.0: BAR 1: can't reserve [mem 0xc0000000-0xcfffffff 64bit pref]"
Both GPUs behave the same; most likely it did that on 6.9.0 as well, so I guess it's a known issue.
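For what it's worth, that "can't reserve" message usually means the EFI framebuffer still owns the card's memory range, which is exactly what the unbind commands above release. A hedged way to check (the efifb/BOOTFB names depend on how the host was booted):

```shell
# If an "efifb" or "BOOTFB" entry appears inside the GPU's BAR range in
# /proc/iomem, the console framebuffer still holds the region and
# vfio-pci cannot reserve it until the framebuffer is unbound.
grep -iE 'efifb|bootfb' /proc/iomem || echo "no EFI framebuffer claim found"
```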

The VM on the P620 booted five times without any issue.
The script is set to run automatically on array startup (like on the main server).

Tested a safe-mode boot, no GUI, no plugins: passed.

Reboot and VM boot test again: passed.

Everything normal.
    --------------------------------------------

Updated to 6.10-RC1:

Set a password,
didn't log in.
The VM tab reads "Libvirt service failed to start".
Logged in to my servers;
VM tab still dead.
Restarted the array;
VM tab still dead.
Found the issue: stopped the array, enabled user shares, and started the array back up; the VM tab is back.

VM boot on the 620: the VM did boot, but the issue is back. The GPU shows display output but is locked to 800×600, and the P620 is back at Code 43.

The VM boots up with the 620 every time, but always locked to 800×600 with Code 43. Unlike the direct test of 6.10-rc1 in non-UEFI mode, where there was no display output whatsoever.

Disabled the startup trigger for the script and rebooted.

The issue is like on 6.9.2: the error log fills up and the screen is messed up.
Re-enabled the script trigger and rebooted.

The VM boots up like before: 800×600.

Stopped the VMs, deleted libvirt, and set the VM up again.
Now the network driver was gone; removed the VirtIO drivers and reinstalled them.
VM reboot: network back up, but the GPU is not.
Uninstalled the GPU with its drivers in Device Manager and reinstalled it: no change.

    -------------------------------------------

Restored to 6.9.2.

The login screen still looks like 6.10-rc1,
password the same.
Tried to start the VM:

Execution error: unsupported configuration: Emulator '/usr/local/sbin/qemu' does not support machine type 'pc-i440fx-6.0'
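As a side note, this error comes from the machine type recorded in the VM's libvirt XML: 'pc-i440fx-6.0' only exists in QEMU 6.0 and newer, so the older emulator shipped with 6.9.2 refuses the definition. A hedged sketch of the relevant XML fragment (version strings are illustrative); editing it to a machine type the running QEMU supports would likely avoid having to delete libvirt:

```xml
<os>
  <!-- written by the newer QEMU on 6.10-rc1; after a downgrade, change it
       to a type the older emulator knows, e.g. pc-i440fx-5.1 -->
  <type arch='x86_64' machine='pc-i440fx-5.1'>hvm</type>
</os>
```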
     

Deleted libvirt.

Set the VM up again:
the VM on the P620 works normally,
the VM booted fine multiple times.
Is the QEMU version the problem?

    -----------------------------------

Back to 6.10-rc1.

Booted the VM: again the GPU is locked at 800×600 with Code 43.

Changed the machine type in the VM tab to i440fx-6.0 (from 5.1):
no change, still Code 43.
Changed to q35-6.0:

     

internal error: Bus 0 must be PCI for integrated PIIX3 USB or IDE controllers

Not valid!
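That error is expected when an i440fx definition is switched to q35: the q35 root bus is PCIe, while the old definition still carries PIIX3-era IDE/USB devices that can only live on a legacy PCI bus 0. A hedged illustration of the kind of leftover device that has to change (device names are illustrative, not taken from this VM's actual XML):

```xml
<!-- an i440fx leftover like this blocks q35; moving the disk from the
     ide bus to sata (or virtio) avoids the integrated PIIX3 controller -->
<disk type='file' device='disk'>
  <target dev='hdc' bus='sata'/>
</disk>
```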


Created a new VM with q35-6.0:
same issue.
Changed the first VM to i440fx-5.0:
same issue.


Done for today.

     

    Edited by Mirai Misaka
    Link to comment

Okay, a little update.

Another user had trouble getting his 1030 to work in a VM on 6.10-rc1, but it was his first time setting up a VM, so that doesn't really mean anything.

     

Because I wondered whether the GPU architecture might be the problem, I replaced the GT710 with a GTX 1080 Ti.

     

In 6.10-rc1 the 1080 Ti just worked fine, every time. The 620 still shows Code 43.

     

I went back to 6.9.2 and the 620 worked again. Then I tried something different, and I know I probably should have written this earlier: the driver I use is the GeForce driver, because the P620 always worked fine with it.

     

After I encountered the 620's Code 43 in 6.10-rc1, I tried to reinstall the driver, and I also tried the proper Quadro one (471.41).

     

No matter how often I tried to install the Quadro driver, it did not work.

     

With the 1080 Ti, as soon as Windows saw it, it loaded the GeForce driver anyway.

     

But when I install the Quadro driver (471.41) in 6.9.2 on the P620 and then update to 6.10-rc1, it does work!

Multiple reboots of the VM, and the P620 worked fine with the VM that was on Quadro drivers before the update to 6.10-rc1.

     

Booted the VM with the 1080 Ti; Windows switched back to the GeForce driver (471.68). 1080 Ti: fine.

Back on the P620, and it was on Code 43 again. I was not able to install the Quadro driver again.

     

Restored to 6.9.2. On the first boot of the VM with both GPUs in, it shut down on its own after a few seconds. The second boot of the VM was fine. Both GPUs worked fine and were automatically on GeForce driver 471.68.

    Link to comment

I used this procedure: before upgrading the server, back up the VM XML. I restored the XML for the Windows VM, then started it and got the Code 43 error. I uninstalled the existing driver, restarted Windows, and installed Nvidia driver 451.48 (this is what Windows installs for the P620 GPU). For me this worked on 6.9.2 (disclaimer: I've not tried it on 6.10.0-rc1 yet). Finally I updated to the latest Nvidia driver.

     

I did find that the XML order had been changed by my playing around with it, and that made a difference in stability (for whatever reason). I also used a ROM file with a stripped header (for me, it can't boot otherwise).
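For context, "stripped header" refers to cutting off the NVIDIA-specific header that GPU-Z style dumps carry in front of the actual option ROM, which begins at the 0x55AA signature. A rough sketch of how that could be scripted (strip_rom_header is a hypothetical helper and the filenames are placeholders; double-check the offset in a hex editor, since a vendor header can itself contain the bytes 55 AA):

```shell
# Drop everything before the first byte-aligned 0x55AA ROM signature.
strip_rom_header() {
  local in="$1" out="$2" off
  # Dump the file as one continuous hex string, find the first even
  # (byte-aligned) offset of "55aa", and convert it back to a byte offset.
  off=$(od -v -An -tx1 "$in" | tr -d ' \n' \
        | grep -bo '55aa' | awk -F: '$1 % 2 == 0 { print $1 / 2; exit }')
  dd if="$in" of="$out" bs=1 skip="$off" 2>/dev/null
}
```

Usage would be something like `strip_rom_header P620_dump.rom P620_stripped.rom`, then pointing the VM's ROM file setting at the stripped copy.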

     

This is just what I've tried; hopefully it gives you some clues.

     

     

    Edited by sand
    Link to comment
    16 hours ago, Mirai Misaka said:

Tried again with rc2 today: same issue, also when using Q35-4.2.

I've read that Unraid bundles an Nvidia driver per release version, so the inflexibility of this will invariably cause compatibility issues by pinning to one compatibility list.

     

A possible solution I'd suggest is to let the user choose between two GPU driver versions (i.e., 1. the latest, or 2. a compatibility version). For me this would be a way to simply resolve a lot of these compatibility-type issues, but obviously, given that this has other touch points in Unraid, it would be more work to build/test vs. supporting one driver per Unraid version.

    Edited by sand
    Link to comment

I'm experiencing the same issue with 6.10-rc2 trying to pass my primary GPU, a 1660 Ti, to a VM. Code 43 all day. Roll back to 6.9.2 and it's fine. Tried a bunch of common-sense stuff like you did, but still no success.

    Link to comment
    On 11/4/2021 at 9:33 AM, bigbangus said:

I'm experiencing the same issue with 6.10-rc2 trying to pass my primary GPU, a 1660 Ti, to a VM. Code 43 all day. Roll back to 6.9.2 and it's fine. Tried a bunch of common-sense stuff like you did, but still no success.

    Please attach your system diagnostics.

    Link to comment
    On 8/11/2021 at 12:01 AM, Mirai Misaka said:

I was able to recreate the problem.

Were you able to solve the issue?

From what I see in your diagnostics, you run BIOS version 5837 on your motherboard; the latest version is 5861.

     

Can you set the generation for your PCIe slots? I would recommend setting it to Gen3 or Gen2; this should have no performance impact on most cards.

     

    On 8/11/2021 at 6:49 PM, Mirai Misaka said:

In 6.10-rc1 the 1080 Ti just worked fine, every time. The 620 still shows Code 43.

    Do you use a GPU BIOS file in your VM configuration?

If you use driver version 465.89 or newer, you should not need a GPU BIOS file: Click

     

     

I've tried it now with my Nvidia T400, which is basically a Quadro, although the Quadro line is no more.

No issue here passing it through to a Windows 10 and a Windows 11 VM with driver version 472.39.

Please note that this is not my primary graphics card; the unRAID terminal uses the iGPU of my i5-10600.

I didn't stub the card or bind it to VFIO. Also, I'm using the Nvidia Driver plugin (I generally don't recommend using the driver plugin and the same card in a VM, since this can crash the host at times if you are not really careful!).

    Link to comment
    On 11/3/2021 at 10:45 AM, sand said:

    I've read that uraid bundles an nvidia driver per release version.

Only if you install the Nvidia Driver plugin; otherwise it is not installed by default.

     

    On 11/3/2021 at 10:45 AM, sand said:

    A possible solution I'd suggest is to allow the user can say choose between 2 gpu driver versions (i.e., 1. a latest or 2. a compatability version).

This is already possible: the plugin lets you choose between the latest production-branch driver, the latest new-feature-branch driver, sometimes the latest beta driver, and also the last v470.x driver.

But keep in mind that the Nvidia Driver plugin is for another use case, namely utilizing the card in Docker containers.

     

    You don't need the plugin if you want to use the card in a VM!

    Link to comment
    8 hours ago, ich777 said:

Were you able to solve the issue?

From what I see in your diagnostics, you run BIOS version 5837 on your motherboard; the latest version is 5861.

Can you set the generation for your PCIe slots? I would recommend setting it to Gen3 or Gen2; this should have no performance impact on most cards.

Do you use a GPU BIOS file in your VM configuration?

If you use driver version 465.89 or newer, you should not need a GPU BIOS file: Click

I've tried it now with my Nvidia T400, which is basically a Quadro, although the Quadro line is no more.

No issue here passing it through to a Windows 10 and a Windows 11 VM with driver version 472.39.

Please note that this is not my primary graphics card; the unRAID terminal uses the iGPU of my i5-10600.

I didn't stub the card or bind it to VFIO. Also, I'm using the Nvidia Driver plugin (I generally don't recommend using the driver plugin and the same card in a VM, since this can crash the host at times if you are not really careful!).

I updated the BIOS (that is always a good idea), but unfortunately it did not solve the issue (Quadro driver 472.39; it works with the 1650 without issue).
I also tried to set the PCIe generation, but I'm not sure my motherboard even lets me do this; at least I could not find the option. It should be Gen 3 anyway, AFAIK.

I settled on using the P620 for Plex transcoding; it does a fine job there, but I would like to use it for display output as well.

The GTX 1650 works fine with and without the vBIOS dump file in all VMs.
The P620 does not have a vBIOS dump file.

The 1650 is in the primary slot and the P620 in the secondary; if I replace it with an old GT710, that works fine in the slot.

I will attach an updated diagnostics file just in case there is something useful in it.

Thank you all for your time and recommendations.

    hiiragi-diagnostics-20211108-2111.zip

    Edited by Mirai Misaka
    Link to comment

    Hi @Mirai Misaka, if you try to install a new Windows VM from scratch, do you still get the Code 43 error?  We need to isolate whether this is an issue within the guest OS or something to do with the host configuration.  @bigbangus had a similar issue after updating to 6.10, but found the problem was that the NVIDIA driver needed updating there.

     

    Another possible issue is whether or not this system features an integrated graphics device and whether that device is enabled or not.  Generally speaking when you want to pass through a GPU, you need one GPU per guest and one for the host.  In your setup, you seem to only have the two GPUs.

     

    I know that some users have found workarounds to making the primary GPU pass through, but NVIDIA does not officially support that configuration as an FYI:  https://nvidia.custhelp.com/app/answers/detail/a_id/5173/~/geforce-gpu-passthrough-for-windows-virtual-machine-(beta)

     

    Quote

     

    Do you need to have more than one GPU installed or can you leverage the same GPU being used by the host OS for virtualization?

     

    One GPU is required for the Linux host OS and one GPU is required for the Windows virtual machine.

     

     

    Link to comment

Hi @jonp, I tried installing the Windows VM from scratch and still got the same Code 43 error.
I tried different Q35 and i440fx versions, and both BIOS types (on SeaBIOS no GPU worked at all, neither the 620 nor the 1650; VNC was fine).
I used a fresh ISO for Win 10, the latest VirtIO drivers (virtio-win-0.1.208-1.iso), and the latest GPU drivers, both the normal and the Quadro driver from Nvidia.
But I found that I could not install the VirtIO drivers on QEMU 6.1 machine types (both i440fx and q35); I needed to change to 4.2, where the installation worked fine, then switched back to 6.1. Of course I also tried 4.2 and even 2.4; all the same issue in the end:
P620 Code 43, and the 1650 worked fine except on SeaBIOS.

The 1650 is the primary GPU; I use a script that runs on startup to make it work for VMs:

    echo 0 > /sys/class/vtconsole/vtcon0/bind
    echo 0 > /sys/class/vtconsole/vtcon1/bind
    echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind

But I've used that for ages, and it has always worked.

TBH: I've settled on using the 620 just for Plex; it works fine for that, and I don't have much time to work on this issue anyway. I would still like to solve it, not just for me (having options is always nice) but also for other people who might run into the same issue, but it's no longer a high priority for me.

(There is still the possibility that my GPU itself has a weird bug, because otherwise way more people would have the same issue, but it did work in every other use case and machine.)

Of course, if anyone has a suggestion on what I could try, I will gladly do it, but it might take some time; real life keeps rolling…

     

But thank you all for your time and help.

    Link to comment

Hi,

any resolution or progress on this issue? I am facing the same: I upgraded to 6.10.0-RC2, and both my Quadro NVS 510 and P400 show Code 43 now.

If I downgrade back to 6.9.2, everything works perfectly.

I appreciate your help.

    Link to comment

I'm happy to report that the issue seems to be fixed, at least sort of.

I just tried it again last week and was kind of surprised that the Win10 VM (i440fx, driver: Quadro 511.65, DCH type) just installed the driver and worked.

On a Win11 VM on Q35 I had to install the driver manually; it worked, but crashed about every 20 minutes. Not sure if that was because of the VM type (Q35) or if the VM was just broken; it stopped working completely after a bit of testing, sooo...
New Win11 VM on i440fx, and everything has worked perfectly for days. So I deem this fixed; not sure whether Unraid changed something or MS/Nvidia did. But I'm happy now; hope this helps others who had the same issue.
(Everything on 6.10-RC2.)

    Link to comment

Finally this issue was solved for me, similar to what Mirai mentioned above. The latest Nvidia driver for the P400 Quadro solves the problem for a Windows VM. I also have an Nvidia NVS 510, and for that one it's not working yet; its latest driver is about a month older than the P400 one, so I hope it will be fixed in the future.

I moved the NVS 510 to the Plex container for now and am working with the P400 on the Windows VM on 6.10.0-RC3.

Thank you all.

     

    Thank you all.

    Link to comment



