VM Boot issue w/ GPU Passthrough


  • Dell Precision Tower 5810
  • Intel Xeon E5-2690 V4
  • nvidia Quadro RTX 4000

 

Attempting to set up a VM with GPU passthrough. (Done this before on consumer hardware; first time on this workstation/server setup.)

 

When I build a VM with a virtual VNC GPU, it works fine - boots and runs normally. When I pass a GPU through to it, the VM starts and then hangs almost immediately (Log file attached in image). Last line is *always*:

 

char device redirected to /dev/pts/6 (label charserial0)

 

I have done this with the GPU devices (GPU, audio, USB controller) bound to VFIO, and I have done it without them bound. I have also done it with a VBIOS loaded: one I pulled with a script in Unraid, one I downloaded from techpowerup, a raw one I dumped with GPUZ, and one I dumped with GPUZ and then edited in a hex editor to remove the nvidia header.
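For anyone who wants to double-check the binding state described above, here is a minimal Python sketch that reports which kernel driver a PCI function is bound to by reading its `driver` symlink in sysfs. The address `0000:03:00.0` is taken from the log lines later in this thread; the `.1` function and the `sysfs_root` parameter are assumptions added for illustration.

```python
import os

def bound_driver(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the driver name bound to a PCI function, or None.

    None is returned both when no driver is bound and when the
    device directory does not exist under sysfs_root.
    """
    link = os.path.join(sysfs_root, pci_addr, "driver")
    if not os.path.islink(link):
        return None
    # The symlink points at .../drivers/<name>; the last component is the driver.
    return os.path.basename(os.readlink(link))

if __name__ == "__main__":
    # 0000:03:00.0 appears in the logs in this thread; .1 is a guess at
    # the GPU's audio function and may differ on a given system.
    for addr in ("0000:03:00.0", "0000:03:00.1"):
        print(addr, bound_driver(addr) or "no driver bound")
```

A function bound for passthrough would be expected to report `vfio-pci`.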

 

I have modified the PCIe slot locations of the passed-through devices so that all of the GPU's functions match within the VM, as spaceinvaderone recommends in his video.

 

I have also tried this with both the i440fx-7.1 Machine and the Q35-7.1 machine.

 

No matter what I do, the VM hangs at the exact same spot.

 

ALSO: I pulled the nvidia GPU, slotted in an AMD Vega 64 instead, and repeated all of the above, with the exact same symptoms. (I have used this exact GPU for passthrough on basic consumer hardware with no issue.)

 

 

I am at a loss for what to try. Any advice helps!  Thanks.

Screenshot 2023-05-06 105124.png

Can you post diagnostics?

Quote

You need to pass a vbios for that Quadro, otherwise it won't work.

 

I have tried this as well, all of the below:

  • VBIOS extracted with a script on unraid
  • VBIOS downloaded from techpowerup
  • VBIOS extracted directly off the card using GPUZ in windows
  • VBIOS extracted directly off the card using GPUZ in windows AND using a hexeditor to remove the "nvidia header"

 

No matter what, the VM stopped at the exact same point.
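As a rough illustration of the hex-editor step in the last bullet, here is a minimal Python sketch. It assumes the commonly circulated guidance for GPU-Z dumps: find the ASCII string `VIDEO`, then keep everything from the nearest preceding `0x55 0xAA` bytes (the PCI option ROM signature) onward. The function name is hypothetical, and whether this matches the exact edit made in this thread is not known.

```python
def strip_rom_header(data: bytes) -> bytes:
    """Drop any bytes that precede the option ROM image in a VBIOS dump.

    Assumption: the ROM image is the one whose 0x55 0xAA signature is
    the closest one before the first ASCII "VIDEO" string.
    """
    marker = data.find(b"VIDEO")
    if marker == -1:
        raise ValueError("no 'VIDEO' marker found in dump")
    # Search backwards from the marker for the ROM signature bytes.
    start = data.rfind(b"\x55\xaa", 0, marker)
    if start == -1:
        raise ValueError("no 0x55 0xAA signature before 'VIDEO'")
    return data[start:]
```

To use it, read the dump with `open(path, "rb").read()`, pass the bytes through `strip_rom_header`, and write the result to a new file so the original dump stays untouched. A dump that already starts at the signature comes back unchanged.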

  • 2 weeks later...
11 hours ago, Modna said:

Here is diagnostics with vbios attached to VM.

Thanks. I have no solution; as far as I can see, everything is configured correctly.

The only thing I found is:

May 23 20:57:30 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap [email protected]
May 23 20:57:30 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap [email protected]
May 23 20:57:30 Tower kernel: pmd_set_huge: Cannot satisfy [mem 0xe0000000-0xe0200000] with a huge-page mapping due to MTRR override.

which may break the GPU passthrough.

Some suggest booting with the 'nohugeiomap' kernel argument (in your syslinux configuration). I don't know whether it will work.

Thank you for looking into this!

 

I have never edited the syslinux configuration. I googled around and I don't see "nohugeiomap" anywhere on the forums or in the Unraid documentation, so I want to make sure I actually add/edit the right thing.

 

I found "nohugeiomap" in some Linux documentation (https://lwn.net/Articles/635357/), but I am not sure how to actually set it properly.

 

 

EDIT: I just changed "append initrd=/bzroot" to "append initrd=/bzroot nohugeiomap" under Unraid OS in the syslinux configuration on the flash drive, with no change in symptoms.
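For reference, the edit described above can be sketched in Python as a small text transformation. It adds a kernel argument to the `append` line of one boot entry only and leaves other entries alone. The `SAMPLE` layout is an assumption about what a typical syslinux configuration looks like (only the "Unraid OS" label and the `append initrd=/bzroot` line come from this thread), and `add_kernel_arg` is a hypothetical helper name.

```python
def add_kernel_arg(cfg_text, label, arg):
    """Append `arg` to the `append` line of the boot entry named `label`.

    Entries that already carry the argument are left unchanged.
    """
    out = []
    in_entry = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("label "):
            # Track whether we are inside the requested boot entry.
            in_entry = stripped[6:].strip() == label
        elif in_entry and stripped.startswith("append ") and arg not in stripped.split():
            line = line.rstrip() + " " + arg
        out.append(line)
    return "\n".join(out) + "\n"

# Assumed example layout, for illustration only.
SAMPLE = """label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label Unraid OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
"""

if __name__ == "__main__":
    print(add_kernel_arg(SAMPLE, "Unraid OS", "nohugeiomap"))
```

Working on a copy of the configuration text rather than the file on the flash drive makes it easy to compare before and after.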

 

 

 

 


DOUBLE EDIT: I went and checked syslog.txt and your kernel argument did get rid of the warning you saw:

 

pmd_set_huge: Cannot satisfy [mem 0xe0000000-0xe0200000] with a huge-page mapping due to MTRR override.

 

That no longer shows up; it now shows:

 

May 25 19:22:14 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap [email protected]
May 25 19:22:14 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap [email protected]
May 25 19:22:14 Tower kernel: vfio-pci 0000:03:00.0: No more image in the PCI ROM
May 25 19:22:14 Tower  acpid: input device has been disconnected, fd 5
May 25 19:22:14 Tower  acpid: input device has been disconnected, fd 6
May 25 19:22:14 Tower  acpid: input device has been disconnected, fd 7
May 25 19:22:14 Tower  acpid: input device has been disconnected, fd 8
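To compare syslogs before and after a change like this, a small Python sketch can count how often each of the messages quoted in this thread appears. The pattern names are hypothetical labels, and the script only counts matches; it does not interpret what the messages mean for passthrough.

```python
import re

# Substrings taken from the log lines quoted in this thread.
PATTERNS = {
    "huge_page_mtrr": re.compile(r"pmd_set_huge: Cannot satisfy"),
    "hidden_ecap": re.compile(r"vfio_ecap_init: hiding ecap"),
    "rom_end": re.compile(r"No more image in the PCI ROM"),
}

def summarize(lines):
    """Count how many lines match each known message pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

For example, `summarize(open("syslog.txt", errors="replace"))` run on the syslog from each boot shows at a glance whether the `pmd_set_huge` warning is still present.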

 
