Please Help with GPU Passthrough


n0rx

Hello,

 

First post to the forums, but have been running Unraid for 3 years and have decent Linux experience.

Thanks to everyone that came before me with GPU passthrough Q&A's - your time has been incredibly helpful over the last few years.

 

I've recently run into a problem with GPU passthrough after swapping out my old NVIDIA RTX A2000 workstation GPU with a new RTX A4500.

I managed to get GPU passthrough working on the A2000 when installing it about 1 year ago, after 4 days of trial and error. 

It's been several days since swapping in the A4500 and still no luck with getting GPU passthrough working properly.

More specifically: When I boot up the Windows 11 VM I'm passing the GPU through to, my screen just goes black. It switches to black from the static blue GRUB menu that was displayed during PC boot, which tells me something is going on.

 

I've watched every SpaceInvader video on the subject of GPU passthrough, and gone through what must be dozens of forum posts trying to find a solution. Below is what I've tried so far, none of which has worked:

 

1) Ensured all virtualisation settings are enabled in motherboard BIOS (I know everything is set correctly due to A2000 working previously)

2) Appending "video=efifb:off", "earlymodules=vfio-pci", "vesafb:off" and "gfxpayload=text" to Unraid syslinux boot parameters

3) Four different VBIOS options: (1) None, (2) Personally created using SpaceInvader User Script, (3) from TechPowerup, (4) TechPowerup with NVIDIA headers removed using hex edit

4) Attempted 3 different Windows 11 installation types: (1) Booting into existing OS install that I used with A2000, (2) New vanilla OS install using image from MS website, (3) Custom NTLite OS install with preloaded NVIDIA drivers

5) Binding NVIDIA graphics and audio parts to VFIO (both parts were already in same IOMMU group by default)

6) Edited the VM XML to ensure the virtual NVIDIA graphics and audio counterparts are on the same bus and slot, but have different functions '0x0' & '0x1' (and appended multifunction='on')

7) Made sure the screen is connected to the GPU with an HDMI cable (not DisplayPort, which I had problems with before) - Note: As my GPU only has DisplayPort outputs, I'm using a DP -> HDMI adapter.

8) Switched between the different physical display output ports on the GPU (in case video output was only being sent to select ports)

9) Tried both Q35-7.1 and i440fx-7.1 machine types

10) Installed Windows 11 VM via Unraid/noVNC, then installed a VNC Service (TightVNC) and configured it to start on Windows boot. I then passed through the GPU to VM and tried to connect to the VNC Service from another computer. Either it doesn't connect ("Host is down") or it does connect and I just see a black screen.

11) Tried not passing through the NVIDIA audio component (09:00.1) to see if this was causing the issue.

12) Connected 2 different PC screens (both Displayport and HDMI interfaces).

13) After installing Windows 11 VM using VNC, uninstalled the Virtio/RedHat display driver.

14) Disabled "Above 4G Decode" and "Resizable BAR" in motherboard BIOS.

15) Attempted a Pop!_OS Linux VM installation with pre-installed NVIDIA drivers to see if Windows 11 or its drivers were the issue (still got a black screen).

16) Tried using PCIe ACS Override ("Both") and then binding NVIDIA graphics & audio to VFIO.

17) Booted Windows-To-Go (Windows 11 Live USB) and dumped my GPU VBIOS using GPU-Z. This VBIOS was identical to the above mentioned TechPowerup VBIOS, but this step at least confirmed that the GPU is working (image rendered to screen, GPU-Z recognised GPU).
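
For anyone following along, the "headers removed using hex edit" step from point 3 can be sketched in code. This is a minimal Python sketch, assuming the layout that the community hex-edit guides describe: the usable option ROM starts at the last `0x55 0xAA` signature before the first `VIDEO` string, and everything in front of it is vendor header. The function names are hypothetical and the layout is an assumption rather than anything NVIDIA documents, so keep the untouched dump as a backup.

```python
from pathlib import Path

ROM_SIGNATURE = b"\x55\xaa"  # standard PCI option ROM signature
MARKER = b"VIDEO"            # text marker the hex-edit guides search for (assumption)


def strip_vbios_header(data: bytes) -> bytes:
    """Return the ROM starting at the last 0x55AA signature before 'VIDEO'."""
    marker_at = data.find(MARKER)
    if marker_at == -1:
        raise ValueError("'VIDEO' marker not found; is this a VBIOS dump?")
    start = data.rfind(ROM_SIGNATURE, 0, marker_at)
    if start == -1:
        raise ValueError("no 0x55AA signature found before the marker")
    return data[start:]


def strip_vbios_file(src: str, dst: str) -> int:
    """Write a stripped copy of src to dst and return the bytes removed."""
    raw = Path(src).read_bytes()
    trimmed = strip_vbios_header(raw)
    Path(dst).write_bytes(trimmed)
    return len(raw) - len(trimmed)
```

A dump that already starts with `0x55 0xAA` comes back unchanged, which gives a quick way to tell whether a downloaded ROM still carries the extra header.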

 

Additionally, when I physically swapped the cards I forgot to uncheck "Autostart VM" and so on first Unraid boot-up I mistakenly started the Windows 11 VM with a custom A2000 VBIOS on the new A4500 card. I don't know if this might have bricked the card, and have not been able to find much info on this topic. The new A4500 card is definitely on/running when I turn the PC on and it renders the BIOS menu so I think it's not bricked.

 

I'm running out of options and don't know what to try next. The A4500 was released at a similar time to the A2000 and both are part of the same family of GPUs, so I'm confused as to why I can't get this new card working. It really shouldn't be this difficult to get GPU passthrough working; I can see this being too large a technical hurdle for the majority of Unraid's potential market. I've spent more time on GPU passthrough than on building my server and installing Unraid itself.

 

Anyway, enough complaining. PLEASE HELP!! 🙂

 

Attached/below are 5 files showing my Windows 11 VM config (UI & XML) along with logs after starting up when the screen goes black. Also posted a snippet from my IOMMU groups showing NVIDIA graphics and audio counterparts.
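
The IOMMU grouping mentioned above can also be read straight from sysfs on the host. Below is a minimal Python sketch, assuming the usual Linux layout of `/sys/kernel/iommu_groups/<group>/devices/<pci-address>`; the function names are hypothetical, and the `root` parameter exists only so the logic can be tried against a test directory.

```python
from pathlib import Path
from typing import Dict, List, Optional


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> Dict[int, List[str]]:
    """Map each IOMMU group number to the PCI addresses it contains."""
    groups: Dict[int, List[str]] = {}
    for group_dir in Path(root).iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return dict(sorted(groups.items()))


def group_of(address: str, groups: Dict[int, List[str]]) -> Optional[int]:
    """Return the group number holding the given PCI address, if any."""
    for number, devices in groups.items():
        if address in devices:
            return number
    return None
```

Calling `group_of("0000:09:00.0", iommu_groups())` and comparing it with the result for `0000:09:00.1` shows whether the graphics and audio functions share a group, and the group's device list shows whether anything else would have to be passed through with them.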

Edited by n0rx
Link to comment

Bump. I'm still within my 14 day return policy for the new RTX A4500 GPU. If there's something wrong with it I want to find out soon so I can return it.
Also reaching my limit of patience with Unraid OS and am considering migrating away if I can't find a solution soon. I need to know if the card is working before my return policy expires, so would need to migrate to another OS anyway. 

 

If anyone has any info, even if only leads on potential new solutions for me to try, or comments/feedback on what I have already tried - PLEASE let me know.

Edited by n0rx
Link to comment

Anyone have any thoughts? Could really use the help of someone with more knowledge and experience.

I've edited the original post saying that I booted Windows-To-Go (Live USB), which confirmed that the GPU is in fact working.

It was rendering the Live USB to the screen and GPU-Z recognised the GPU.

 

However, I'm still no closer to getting this GPU passthrough working in Unraid.

 

Here are some interesting VFIO related lines taken from dmesg:

 

"vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none"

"pci 0000:09:00.0: BAR 1: assigned to efifb"

"vfio-pci 0000:09:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258"

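Lines like these are easy to miss in a long kernel log, so here is a minimal Python sketch that pulls them out. It assumes the message wording matches the three lines quoted above; the regular expressions and the `summarize` helper are hypothetical and only cover those two message shapes.

```python
import re

PCI_ADDR = r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]"
EFIFB_RE = re.compile(rf"pci (?P<addr>{PCI_ADDR}): BAR (?P<bar>\d+): assigned to efifb")
VFIO_RE = re.compile(rf"vfio-pci (?P<addr>{PCI_ADDR}):")


def summarize(dmesg_text: str) -> dict:
    """Collect BARs claimed by efifb and devices that vfio-pci logged about."""
    summary = {"efifb_bars": [], "vfio_devices": set()}
    for line in dmesg_text.splitlines():
        match = EFIFB_RE.search(line)
        if match:
            summary["efifb_bars"].append((match["addr"], int(match["bar"])))
        match = VFIO_RE.search(line)
        if match:
            summary["vfio_devices"].add(match["addr"])
    return summary
```

Feeding it the output of `dmesg` shows at a glance whether the host framebuffer touched a BAR on the same device that vfio-pci later took over.
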
Link to comment

Thanks for pointing this out ghost82. 

 

Please find attached the requested Diagnostics.

At the time the Diagnostics was captured, the Windows 11 VM was running, PCI ACS Override was disabled, and the NVIDIA video/audio parts were bound to VFIO/each other (in System Devices). Below were my Unraid syslinux boot parameters.

 

"kernel /bzimage
append initrd=/bzroot video=efifb:off"

 

I know that "video=efifb:off" is being recognised because without this parameter the GRUB UI doesn't display at PC boot and the Windows 11 VM logs also give this additional warning:

 

"2023-11-02T15:31:27.415492Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:09:00.0","id":"hostdev0","bus":"pci.4","multifunction":true,"addr":"0x0","romfile":"/mnt/cache/isos/vbios/rtx a4500 v3 GPUZ HexEdit 2.rom"}: Failed to mmap 0000:09:00.0 BAR 1. Performance may be slow"

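The `-device` argument in that warning is plain JSON, so the passthrough settings QEMU actually received can be read out of the log line. A minimal Python sketch, assuming the line has the shape shown above (timestamp, `-device`, a JSON object, then a colon and the message); `parse_device_warning` is a hypothetical helper name.

```python
import json


def parse_device_warning(line: str):
    """Split a QEMU '-device {...}: message' log line into (device, message)."""
    marker = "-device "
    start = line.find(marker)
    if start == -1:
        return None
    start += len(marker)
    # raw_decode parses one JSON value and reports the index where it stopped
    device, end = json.JSONDecoder().raw_decode(line, start)
    message = line[end:].lstrip(": ").strip()
    return device, message
```

The returned dictionary exposes `host`, `romfile`, and `multifunction`, which makes it simple to confirm that the VM started with the ROM file and PCI address that were intended.
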
 

So to me it looks like "video=efifb:off" is necessary, but not sufficient, for getting the Windows 11 VM to work.

I've also tried appending parameters "video=vesafb:off" and "video=simplefb:off" which didn't work.
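
When testing several of these parameters in turn, it is easy to end up with duplicates or typos on the `append` line. Here is a minimal Python sketch that adds parameters to an existing `append` line without repeating any; `add_append_params` is a hypothetical helper, and it only builds the text, it does not edit the syslinux configuration file.

```python
def add_append_params(append_line: str, params) -> str:
    """Return the append line with each missing parameter added once."""
    tokens = append_line.split()
    if not tokens or tokens[0] != "append":
        raise ValueError("expected a line starting with 'append'")
    for param in params:
        if param not in tokens:
            tokens.append(param)
    return " ".join(tokens)
```

For example, passing the current line `append initrd=/bzroot video=efifb:off` together with `["video=efifb:off", "video=vesafb:off"]` keeps the existing entry and adds only the new one.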

 

Edited by n0rx
Link to comment

Configuration seems good to me.

I imagine the vbios is dumped from your card and hex edited, and not a downloaded one, right?

Did you try to enable remote desktop in windows vm (booted without gpu passthrough) and see if it boots or if it's hanging?

Once you enable remote desktop and the os is able to boot but with a black screen, try to install the gpu drivers.

 

I noticed that your mb bios is not the latest, I would try to update to v. 4401 released on 31st of october.

Link to comment

I've tried all the VBIOS options (dumped, downloaded, hex edit) I can think of, covered in point 3 of my original post.

Also, I cover my attempt to remote connect/desktop into the VM from another computer in point 10 of my original post (it is hanging, so I can't get into the OS to install the GPU drivers). Anyway, I've tried installing a custom Windows iso pre-loaded with NVIDIA drivers as well as Pop!_OS Linux also pre-loaded with NVIDIA drivers.

 

I'll try the mobo bios update - I think I last updated it in December 2022 so I don't suspect that it's the issue. But it won't hurt to try.

 

Do you still think the "BAR 1: assigned to efifb" message is indicative of the issue?

Even if you won't know what the solution is, it would be good to have a lead on the problem.

 

Do you think re-installing Unraid might resolve the issue?

I don't think so, but I'm running out of options and am willing to try anything at this stage.

Edited by n0rx
Link to comment
6 hours ago, n0rx said:

Do you still think the "BAR 1: assigned to efifb"

No, I've seen that line in other logs, most probably efifb attaches early and then it is detached because of the syslinux directive. If you look at the memory I'm quite sure efifb will not be there.
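
"Looking at the memory" here presumably means the host's `/proc/iomem`. A minimal Python sketch of that check follows, assuming the usual `start-end : owner` line format; `efifb_regions` is a hypothetical helper name.

```python
def efifb_regions(iomem_text: str):
    """Return (start, end) address pairs of regions currently owned by efifb."""
    regions = []
    for line in iomem_text.splitlines():
        span, sep, owner = line.strip().partition(" : ")
        if sep and owner == "efifb":
            start, _, end = span.partition("-")
            regions.append((int(start, 16), int(end, 16)))
    return regions
```

Run it as root on the contents of `/proc/iomem`, since unprivileged reads show zeroed addresses. An empty list would support the idea that efifb has already let go of the card.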

 

6 hours ago, n0rx said:

Do you think re-installing Unraid might resolve the issue?

I think not..

 

In my opinion it is either related to the gpu itself (note that I'm not saying the gpu isn't working) or the motherboard (and that's why I suggested a bios update, because agesa was updated).

Link to comment

After a very frustrating 2 weeks and 2 dozen different attempts at getting this to work, I gave up and installed Windows 11 directly onto my disk.

The GPU works exactly as expected in Windows 11, so I'm guessing there's a compatibility issue between this GPU and Unraid GPU passthrough.

 

Thanks for your help anyway ghost82!

Link to comment
