VM Boot issue w/ GPU Passthrough



  • Dell Precision Tower 5810
  • Intel Xeon E5-2690 V4
  • NVIDIA Quadro RTX 4000

 

Attempting to set up a VM with GPU passthrough. (I have done this before on consumer hardware; this is my first time on this workstation/server setup.)

 

When I build a VM with a virtual VNC GPU, it works fine - boots and runs normally. When I pass a GPU through to it, the VM starts and then hangs almost immediately (Log file attached in image). Last line is *always*:

 

char device redirected to  /dev/pts/6 (label charserial0)

 

I have done this with the GPU devices (GPU, audio, USB controller) bound to VFIO, and I have done it without them bound. I have also done it with a VBIOS loaded: one I pulled using a script in Unraid, one I downloaded from TechPowerUp, a raw one I pulled with GPU-Z, and one I pulled with GPU-Z and then edited in a hex editor to remove the NVIDIA header.
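
To confirm what each function of the card is actually bound to before starting the VM, something like the following can be run from the Unraid terminal. It is only a sketch: `pci_driver` is a made-up helper name, the address 0000:03:00 is a placeholder for the GPU's slot, and the function numbers .0 to .3 are assumptions that need checking against your own device list.

```shell
#!/bin/sh
# Print the kernel driver bound to a PCI device, or "none" if nothing is
# bound (or the device does not exist).
# $1 = full PCI address; $2 = sysfs root, defaults to /sys.
pci_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The driver entry is a symlink whose last component is the driver name.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Hypothetical addresses: replace 0000:03:00 with your card's slot.
for fn in 0 1 2 3; do
    echo "0000:03:00.$fn -> $(pci_driver "0000:03:00.$fn")"
done
```

If the devices are bound to VFIO at boot, each function that exists should report vfio-pci.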

 

I have modified the PCIe slot location of the passthrough devices so that all of the GPU devices match within the VM, as SpaceInvaderOne recommends in his video.

 

I have also tried this with both the i440fx-7.1 Machine and the Q35-7.1 machine.

 

No matter what I do, the VM hangs at the exact same spot.

 

ALSO: I pulled the NVIDIA GPU, slotted in an AMD Vega 64 instead, and repeated all of the above, with the exact same symptoms. (I have used this exact GPU for passthrough on basic consumer hardware with no issue.)

 

 

I am at a loss for what to try. Any advice helps!  Thanks.

Screenshot 2023-05-06 105124.png

Can you post diagnostics?


 

 

Quote

You need to pass a VBIOS for that Quadro, otherwise it won't work.

 

I have tried this as well, all of the below:

  • VBIOS extracted with a script on unraid
  • VBIOS download from techpowerup
  • VBIOS extracted directly off the card using GPUZ in windows
  • VBIOS extracted directly off the card using GPU-Z in Windows AND edited with a hex editor to remove the "nvidia header"
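
One quick sanity check on a trimmed dump is whether it begins with the bytes 0x55 0xAA, the standard PCI option ROM signature. A minimal sketch, where `rom_has_signature` is an illustrative helper name and the path in the comment is a placeholder; passing this check does not prove the ROM is the right one for the card.

```shell
#!/bin/sh
# Succeed if the file starts with the PCI option ROM signature 0x55 0xAA.
# $1 = path to a VBIOS dump.
rom_has_signature() {
    # Read the first two bytes and render them as lowercase hex, e.g. "55aa".
    sig=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Example (hypothetical path):
#   rom_has_signature /mnt/user/isos/vbios/rtx4000.rom && echo "starts with 55 AA"
```

A raw dump that still carries a vendor header in front of the ROM image would fail this check.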

 

No matter what, the VM stopped at the exact same point.

Edited by Modna
  • 2 weeks later...
11 hours ago, Modna said:

Here is diagnostics with vbios attached to VM.

Thanks. I have no solution; as far as I can see, everything is configured correctly.

The only thing I found is:

May 23 20:57:30 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
May 23 20:57:30 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
May 23 20:57:30 Tower kernel: pmd_set_huge: Cannot satisfy [mem 0xe0000000-0xe0200000] with a huge-page mapping due to MTRR override.

which may break the GPU passthrough.

Some suggest booting with the 'nohugeiomap' kernel argument (in your syslinux configuration); I don't know whether it will work.


Thank you for looking into this!

 

I have never edited the syslinux configuration, but I googled around and I don't see "nohugeiomap" anywhere on the forums or in the Unraid documentation. I want to make sure I actually add/edit the right thing.

 

I found "nohugeiomap" in some Arch Linux documentation, but I am not sure how to actually set it properly (https://lwn.net/Articles/635357/)

 

 

EDIT: I just changed "append initrd=/bzroot" to "append initrd=/bzroot nohugeiomap" under Unraid OS in the syslinux configuration of the flash drive, with no change in symptoms.
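
To double-check that an edit like this reaches the running kernel after a reboot, the argument can be looked up in /proc/cmdline. A small sketch; `has_kernel_arg` is an illustrative helper name, not an Unraid tool.

```shell
#!/bin/sh
# Succeed if kernel argument $1 is present as a whole word in command line $2.
# $2 defaults to the running kernel's /proc/cmdline.
has_kernel_arg() {
    cmdline="${2:-$(cat /proc/cmdline 2>/dev/null)}"
    # Unquoted on purpose: split the command line into words on whitespace.
    for word in $cmdline; do
        [ "$word" = "$1" ] && return 0
    done
    return 1
}

if has_kernel_arg nohugeiomap; then
    echo "nohugeiomap is active"
else
    echo "nohugeiomap is NOT on the running kernel's command line"
fi
```

Matching is on whole words only, so asking for `iommu` would not match `iommu=pt`.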

 

 

 

 

Edited by Modna

DOUBLE EDIT: I went and checked syslog.txt and your kernel argument did get rid of the warning you saw:

 

pmd_set_huge: Cannot satisfy [mem 0xe0000000-0xe0200000] with a huge-page mapping due to MTRR override.

 

That no longer shows up; it now shows:

 

May 25 19:22:14 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
May 25 19:22:14 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
May 25 19:22:14 Tower kernel: vfio-pci 0000:03:00.0: No more image in the PCI ROM
May 25 19:22:14 Tower  acpid: input device has been disconnected, fd 5
May 25 19:22:14 Tower  acpid: input device has been disconnected, fd 6
May 25 19:22:14 Tower  acpid: input device has been disconnected, fd 7
May 25 19:22:14 Tower  acpid: input device has been disconnected, fd 8
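
For anyone wanting to pull the same lines out of their own log, a filter along these lines should do it. This is a sketch only: `vfio_lines` is an illustrative helper name, and /var/log/syslog is assumed to be where the live log sits.

```shell
#!/bin/sh
# Print log lines that mention vfio-pci for PCI address $1, read from file $2.
vfio_lines() {
    # -F treats the pattern as a fixed string, so the dots in the address
    # are matched literally.
    grep -F "vfio-pci $1" "$2"
}

# Example, assuming the live log is at /var/log/syslog:
#   vfio_lines 0000:03:00.0 /var/log/syslog | tail -n 20
```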

 


I found this on the Proxmox forum. "Allow unsafe interrupts" is in the VM settings.

 

I only added intel_iommu=on iommu=pt to /etc/kernel/cmdline and options vfio_iommu_type1 allow_unsafe_interrupts=1 to /etc/modprobe.d/iommu_unsafe_interrupts.conf as well as the blacklistings. No Kernel update (since I reinstalled the whole thing) and bam... it worked out of the box even without rom file  However I didn't install the Nvidia drivers.

 

I have not checked your diagnostics yet, so you may have these set already. Is your BIOS the latest version?

nvidia-error-43-with-quadro-rtx4000.1259

 


Thank you, but I don't totally get your instructions.

 

/etc/kernel/cmdline doesn't exist on my Unraid system (/etc/kernel itself doesn't exist, either as a directory or as a file).

 

And /etc/modprobe.d is a directory, so I am not sure what you mean when you want me to add those options to it.

 

I'm sorry that I am so ignorant of this stuff. I have never had to dive this deep to make a passthrough VM work.


Add this to your syslinux config (in the block under the unRAID OS label), found at Main - Boot Device - Flash - Syslinux Configuration:

 

label unRAID OS
  menu default
  kernel /bzimage
  append intel_iommu=on iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot

 

Reboot.

Then, obviously, start Unraid without the GUI (that is, boot the unRAID OS entry you just edited).
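
After the reboot, it may be worth confirming that the option actually took effect. A rough sketch: `unsafe_interrupts_state` is just an illustrative helper name, and I'm assuming the standard sysfs location for module parameters.

```shell
#!/bin/sh
# Print the current allow_unsafe_interrupts value (Y or N), or "not loaded"
# when the parameter file is absent. $1 overrides the sysfs path for testing.
unsafe_interrupts_state() {
    param="${1:-/sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts}"
    if [ -r "$param" ]; then
        cat "$param"
    else
        echo "not loaded"
    fi
}

echo "allow_unsafe_interrupts: $(unsafe_interrupts_state)"
```

A value of Y means the option is on.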


I added those append options under "Unraid OS" within the Syslinux configuration for the flash drive (through the web browser, not manually editing).

 

I also have "Unraid OS" selected as default boot menu.

 

Unfortunately no change, exact same symptom. (Ignore the "nohugeiomap"; I found that recommended somewhere and tried it. There is no change with or without it when using the new append options.)

 

image.thumb.png.22c0b95bb8f79a32ad94c73ab0d133ee.png

  • 4 months later...

Hey @Modna, were you able to find any solution to your problem?

 

I'm in a similar situation with an RTX A4500 workstation GPU (it is not a Quadro series).

My VM log file ends with the following:

 

"char device redirected to /dev/pts/0 (label charserial0)"

 

Opened the below thread earlier today. Any info you might have would be greatly appreciated.

 

 

Edited by n0rx