First time Unraid User trying to get GFX passthrough working for first time


Vidamus


Hi All,

 

I am trying to turn my Video Editing PC into an Unraid server with Windows installed into a VM with hardware graphics pass through.

 

The primary purpose of this machine is to be both a NAS and a Video Editing Workstation. The key idea being that all my source footage and video projects are safely stored on a redundant NAS, but at the same time my Video Editing Workstation has "local machine" access to the files because it's running on the same hardware.

 

The machine specs are:
MSI MPG X570 Gaming Edge Wifi Motherboard

AMD Ryzen 7 3700X

32GB DDR4 2133MHz

MSI nVidia GTX 1660 Super

nVidia GTX 1050 Ti (brand escapes me right now)

 

Parity 1: WD Red Pro 16TB

Data 1: WD Red Pro 16TB

Cache: Seagate Firecuda 1TB

 

Unassigned: Gigabyte Aorus 1TB SSD (my existing Windows install)

 

So far I have successfully configured and started the Array as above, and I am trying to get a Windows 10 VM going for the first time.

 

Note that I am not trying to import the existing Win 10 install, I am starting from scratch.

 

I am having trouble understanding how to properly set up the graphics cards for GPU passthrough. I want to use the GTX 1660 Super in PCIe slot 1 for GPU passthrough and the GTX 1050 Ti for Unraid or perhaps other VMs in future.

 

When I assign the GPU to the VM and try to start it, I get an "Execution Error".

Quote

internal error: qemu unexpectedly closed the monitor: 2022-01-19T06:57:11.158239Z qemu-system-x86_64: -device vfio-pci,host=0000:2d:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio 0000:2d:00.0: group 31 is not viable Please ensure all devices within the iommu_group are bound to their vfio bus driver.
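For anyone hitting the same "group is not viable" error: it means at least one device in that IOMMU group is still bound to a host driver. Below is a minimal Python sketch for checking this from the Unraid console, assuming the standard Linux sysfs layout under /sys/kernel/iommu_groups. The function names are made up for illustration, and group 31 is simply the number from the error above. As far as I know, a PCIe bridge in the group that is bound to pcieport does not need to be moved to vfio-pci.

```python
import os
from pathlib import Path


def group_devices(group, sysfs_root="/sys/kernel/iommu_groups"):
    """Return {pci_address: driver_name_or_None} for one IOMMU group.

    Assumes the usual sysfs layout: <root>/<group>/devices/<pci_address>,
    where each device directory has a 'driver' symlink while a driver is bound.
    """
    devices_dir = Path(sysfs_root) / str(group) / "devices"
    result = {}
    for dev in sorted(devices_dir.iterdir()):
        driver_link = dev / "driver"
        if driver_link.is_symlink():
            # The link target's last path component is the driver name.
            result[dev.name] = os.path.basename(os.readlink(driver_link))
        else:
            # No 'driver' link means no driver is currently bound.
            result[dev.name] = None
    return result


def not_on_vfio(devices):
    """List the addresses whose driver is anything other than vfio-pci."""
    return [addr for addr, drv in devices.items() if drv != "vfio-pci"]


if __name__ == "__main__":
    group = 31  # group number taken from the error message
    try:
        devices = group_devices(group)
    except FileNotFoundError:
        print(f"IOMMU group {group} not found (is IOMMU enabled?)")
    else:
        for addr, drv in devices.items():
            print(f"{addr}: {drv or 'no driver'}")
        print("not bound to vfio-pci:", not_on_vfio(devices))
```

Any address listed on the last line is a candidate for "Bind Selected to VFIO at Boot" under Tools -> System Devices.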

 

After some digging around I got a hunch (still haven't found docs on this yet) that the thing to do to fix this might be to use Tools -> System Devices to "Bind Selected to VFIO at Boot". However, every time I select the GTX 1660 super here and bind it to VFIO at boot, Unraid then hangs on boot and I have to delete /config/vfio-pci.cfg to get it to boot again.

After further digging around, my next hunch was that I can't bind the default boot graphics card to VFIO at boot, and the GTX 1660 is the default boot card. So I went to the BIOS to see if I could select a different boot GFX device but I can't see any BIOS settings relating to this anywhere. I don't think the BIOS allows it.

 

Where can I go from here? Am I on the right path? I am open to buying new hardware if I have to, but I want to make sure I have exhausted all my options with this hardware first.

Thanks in advance for any guidance anyone can offer.

Cheers

 

Link to comment

My current concern is that IF:

  • the Motherboard does not allow me to choose the boot graphics device AND
  • The motherboard always selects the card in PCIe slot 1 as the boot graphics device (Which is the only x16 slot) AND
  • I cannot bind the default boot graphics device to VFIO

 

THEN: It is impossible to pass through any graphics card in the x16 slot on this Motherboard.

 

Is that correct, or are there other things I can try here?

Edited by Vidamus
Link to comment
43 minutes ago, Vidamus said:

I wanted to use the GTX 1660 Super in the PCI-e Slot 1 for GPU passthrough and the GTX 1050 Ti for Unraid

Not a good idea, unless you can specify in the bios which gpu to use as primary.

Not impossible to passthrough a gpu in the first slot, but I would move it to another slot (swap the gpus).

 

After this change, it is recommended to bind all the components of the gpu to vfio at boot. They are currently in iommu group 31 (video, digital audio, usb controller, serial bus controller): 2d:00.0, 2d:00.1, 2d:00.2, 2d:00.3.

These should be set up as a multifunction device in the xml of the vm.

 

Your diagnostics are not useful:

- you bound the 1050 ti to vfio, not the 1660 super.

- there is no vm defined.

 

43 minutes ago, Vidamus said:

However, every time I select the GTX 1660 super here and bind it to VFIO at boot, Unraid then hangs on boot

Are you sure it hangs? When you bind the gpu to vfio it gets isolated from the host, so the display output seems frozen even though the server is still running, like this:

https://forums.unraid.net/topic/115168-videoefifboffvesafboff-not-working-in-unraid-610-rc1/?tab=comments#comment-1046963

 

Summarizing:

1a. bind the 1660 super to vfio (leave it in slot 1), select all 4 components, then check that unraid doesn't crash: connect another device to the lan and try to reach the server from it

2a. set up the vm with the gpu passthrough (as a multifunction device), try to start the vm, attach new diagnostics if it doesn't work

 

1b. if unraid does crash in 1a (though I cannot see a reason why it would), swap the gpus

2b. bind the 1660 super to vfio, select all 4 components

3b. set up the vm with the gpu passthrough (as a multifunction device), try to start the vm, attach new diagnostics if it doesn't work

 

 

Edited by ghost82
Link to comment
Thanks @ghost82, that was a very useful post!

You were right, it was not hanging on boot, the video merely freezes after VFIO binding as you said. My router assigned a different IP after reboot and I'd not realised it yet. I misinterpreted the frozen screen and inaccessible web interface as a crash.

 

I now have the GTX 1660 Super (all 4 devices) bound to VFIO on boot.

Now when I start the VM, the "frozen" boot log display switches to a black screen but then goes no further. Actually it's not quite a "black" screen, my monitor reports that there is no video output signal and goes to sleep.

 

So I am now working my way through this guide here: https://wiki.unraid.net/Manual/VM_Management#Help.21_I_can_start_my_VM_with_a_GPU_assigned_but_all_I_get_is_a_black_screen_on_my_monitor.21

 

If/when I get stuck again I will post a new diagnostic report.
 

Link to comment
5 hours ago, Vidamus said:

Now when I start the VM, the "frozen" boot log display switches to a black screen but then goes no further

 

That's probably because the gpu is still in use by the host (by efifb?) even though you bound it to vfio.

If the gpu is still in the top slot, passing also a vbios (dumped from your card) may help in initializing the gpu.

 

5 hours ago, Vidamus said:

try to start the vm, attach new diagnostics if it doesn't work

 

Link to comment

I still cannot get the graphics card to initialise. I have:

  • updated the motherboard BIOS to the latest version
  • downloaded the latest vBIOS for the video card from TechPowerup, put it on the array, and specified the file location in the VM config
  • created another VM with machine Q35-5.1 (all other settings the same)

But I still get the same result that the video card is not initialised when the VM starts.

 

The latest diagnostics are attached.

 

Regarding what slot the card is in, if I cannot get the GTX 1660 Super working in slot 1 then I'd rather buy new hardware. The whole purpose of this project was to install Unraid while still being able to use this machine as a Windows 10 media workstation with native performance. So I am not interested in running the primary video card in an x4 slot just to get it to work.

 

tower-diagnostics-20220121-1409.zip

Link to comment

. You should switch to machine type pc-q35-5.1.

 

. Redo the vm and pass ALL the components of the gpu, not only video and audio.

 

. You need to assign the components to a multifunction device: add the multifunction parameter and put all the components on the same bus and the same slot, each with a different function.

For example, with your old settings, this:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x2d' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/user/system/firmware/MSI.GTX1660Super.6144.191029.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x2d' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>

 

Should be replaced with this:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x2d' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/user/system/firmware/MSI.GTX1660Super.6144.191029.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x2d' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x2d' slot='0x00' function='0x2'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x2d' slot='0x00' function='0x3'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x3'/>
    </hostdev>

 

But note that for a pc-q35 machine type the address lines outside the <source></source> will not have bus=0x00. So add the multifunction='on' parameter to the address line outside the <source></source> of the video component, then change the bus, slot and function of devices 2d:00.1, 2d:00.2 and 2d:00.3 (audio, usb controller and serial bus controller) so that bus and slot are the same as the video component and each function is different.
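To make the rule concrete, here is a small Python sketch that generates the four <hostdev> entries with a shared guest bus and slot. It is only an illustration: the function name multifunction_hostdevs is made up, and the guest bus 0x04 / slot 0x00 in the example are placeholders, so use whatever bus and slot libvirt already assigned to the video component in your own Q35 xml.

```python
import xml.etree.ElementTree as ET


def multifunction_hostdevs(host_bus, host_slot, functions,
                           guest_bus, guest_slot, rom_file=None):
    """Build one <hostdev> element per host PCI function.

    Every guest address gets the same bus and slot; the guest function
    mirrors the host function, and function 0 carries multifunction='on'.
    """
    hostdevs = []
    for fn in functions:
        hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
        ET.SubElement(hostdev, "driver", name="vfio")
        source = ET.SubElement(hostdev, "source")
        # Host-side address: where the component lives on the real machine.
        ET.SubElement(source, "address", domain="0x0000",
                      bus=f"0x{host_bus:02x}", slot=f"0x{host_slot:02x}",
                      function=f"0x{fn:x}")
        if fn == 0 and rom_file:
            # The vbios is only attached to the video function.
            ET.SubElement(hostdev, "rom", file=rom_file)
        # Guest-side address: the line outside <source></source>.
        guest = {"type": "pci", "domain": "0x0000",
                 "bus": f"0x{guest_bus:02x}", "slot": f"0x{guest_slot:02x}",
                 "function": f"0x{fn:x}"}
        if fn == 0:
            guest["multifunction"] = "on"
        ET.SubElement(hostdev, "address", guest)
        hostdevs.append(hostdev)
    return hostdevs


if __name__ == "__main__":
    # Host address 2d:00.x is from this thread; guest bus/slot are placeholders.
    for dev in multifunction_hostdevs(0x2D, 0x00, [0, 1, 2, 3],
                                      guest_bus=0x04, guest_slot=0x00):
        print(ET.tostring(dev, encoding="unicode"))
```

The output is not pretty-printed, but it can be compared against the hostdev blocks in the vm xml to check that only the function differs between the four guest addresses.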

 

. MSI.GTX1660Super.6144.191029.rom: this looks like a downloaded rom. It is possible to use a downloaded rom, but you have to download a suitable one and modify it properly: first remove the header (spaceinvaderone published a video about it). Keep in mind that even roms for a different revision of the same gpu can cause issues because of layout differences, so if you are not sure what to check and how to modify it, the suggestion is to dump the rom from your own card.

Edited by ghost82
Link to comment

Thanks again for such a detailed and helpful post @ghost82!

 

I tried the following steps:

  • Created a new Q35 machine VM using similar settings to the existing i440fx vm
    • I now have 2 VMs called "i440fx" and "Q35" respectively
  • Used GPU-Z to dump the current vBIOS from both of my video cards.
  • I updated the vBIOS for both VMs to use the dumped vBIOS
  • In the XML of both VMs I:
    • Added all 4 devices related to the video card to VM config
    • set multifunction='on' for the primary video card device
    • set bus and slot the same on the 3 other video card devices
    • set function to a different, incrementing number on each device

I then tried starting each VM, the Q35 first. Unfortunately in each case the video card still does not initialise and I have to "force stop" the VM to get it to shut down.


After attempting to start both, and then force stopping each one, I took a new diagnostics report which I have attached.

 

Many thanks for your time in helping me with this. It's a lot more complicated than it appears.

 

tower-diagnostics-20220122-1020.zip

Link to comment

I got it working! Hooray!!

 

The last piece of the puzzle was that I had to hex edit the vbios and remove the nVidia header. I found this in a Spaceinvader One video.

 

Now that I have the vBIOS file right, (and also the multifunction group) it works on both i440fx and Q35 machines.

 

Thanks again for all the help!

 

Link to comment
7 hours ago, Vidamus said:

The last piece of the puzzle was that I had to hex edit the vbios and remove the nVidia header

Yes, this applies to any vbios dumped with gpu-z or downloaded from techpowerup (because the uploaded roms were probably dumped with gpu-z).

That header is not added if the nvidia roms are dumped directly from linux.

Gpu-z adds an additional header to the rom so that it can be used to flash the gpu chip, but when the rom is passed through to a vm the system doesn't like this header, so it must be removed.
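For those who would rather script the hex edit, here is a rough Python sketch of it. It rests on two assumptions: a PCI option rom starts with the signature bytes 0x55 0xAA, and, following the hex editor method as I understand it, the real rom starts at the last 0x55 0xAA found before the first "VIDEO" string. The function names are made up. Keep the original dump and check the result yourself before using it.

```python
ROM_SIGNATURE = b"\x55\xaa"  # first two bytes of a PCI option rom
MARKER = b"VIDEO"            # string used as a landmark in the hex editor method


def strip_vbios_header(data: bytes) -> bytes:
    """Return the rom with everything before the rom signature removed.

    Assumption: the real rom begins at the last 0x55 0xAA that appears
    before the first "VIDEO" string. A rom that already starts with the
    signature is returned unchanged.
    """
    if data.startswith(ROM_SIGNATURE):
        return data
    marker = data.find(MARKER)
    if marker == -1:
        raise ValueError("no VIDEO marker found; not touching this file")
    start = data.rfind(ROM_SIGNATURE, 0, marker)
    if start == -1:
        raise ValueError("no 0x55 0xAA signature found before the VIDEO marker")
    return data[start:]


def strip_vbios_file(src_path: str, dst_path: str) -> int:
    """Write a stripped copy of src_path to dst_path; return bytes removed."""
    with open(src_path, "rb") as src:
        original = src.read()
    stripped = strip_vbios_header(original)
    with open(dst_path, "wb") as dst:
        dst.write(stripped)
    return len(original) - len(stripped)
```

Calling strip_vbios_file with the gpu-z dump as the first argument and a new file name as the second leaves the dump untouched, so a wrong guess about the offset costs nothing.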

Link to comment
