Modna Posted May 8, 2023

Hardware: Dell Precision Tower 5810, Intel Xeon E5-2690 V4, NVIDIA Quadro RTX 4000.

I'm attempting to set up a VM with GPU passthrough. (I've done this before on consumer hardware; this is my first time on a workstation/server setup.)

When I build a VM with a virtual VNC GPU, it works fine: it boots and runs normally. When I pass the GPU through, the VM starts and then hangs almost immediately (log file attached as an image). The last line is *always*:

char device redirected to /dev/pts/6 (label charserial0)

I have tried it with the GPU devices (GPU, audio, USB controller) bound to VFIO, and without them bound. I have tried it with a VBIOS loaded: one pulled with a script in Unraid, one downloaded from TechPowerUp, a raw dump pulled with GPU-Z, and a GPU-Z dump with the NVIDIA header removed in a hex editor. I have set the passthrough devices' PCIe slot location so that all of the GPU's functions match within the VM, as SpaceInvaderOne recommends in his video. I have also tried both the i440fx-7.1 and the Q35-7.1 machine types. No matter what I do, the VM hangs at exactly the same spot.

ALSO: I pulled the NVIDIA GPU, slotted in an AMD Vega 64 instead, and repeated all of the above with exactly the same symptoms (I have used this exact GPU in passthrough on basic consumer hardware with no issue).

I am at a loss for what to try. Any advice helps! Thanks.
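Matching the slot location means every function of the card (03:00.0, 03:00.1, and so on) lands on one guest bus/slot, conventionally with the same function numbers as on the host and with multifunction='on' on function 0. The following is a minimal Python sketch that checks a libvirt domain XML (for example, the text from the VM's XML view) against that convention. It assumes the standard libvirt `<hostdev type='pci'>` layout; the function name `check_multifunction` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def check_multifunction(domain_xml):
    """Return a list of problems with how multi-function PCI devices are mapped into the guest.

    Assumes the usual libvirt layout: each passed-through function is a
    <hostdev type='pci'> whose <source><address/></source> is the host address
    and whose direct child <address type='pci'/> is the guest address.
    """
    root = ET.fromstring(domain_xml)
    # (domain, bus, slot) on the host -> list of (host function, guest <address> element)
    groups = defaultdict(list)
    for hd in root.iter("hostdev"):
        if hd.get("type") != "pci":
            continue
        src = hd.find("source/address")
        guest = hd.find("address")  # direct child only: the guest-side address
        if src is None or guest is None:
            continue  # no explicit guest address; libvirt will pick one
        host_key = (src.get("domain"), src.get("bus"), src.get("slot"))
        groups[host_key].append((src.get("function"), guest))

    problems = []
    for host_key, members in groups.items():
        if len(members) < 2:
            continue  # single-function device: nothing to line up
        guest_slots = {(g.get("domain"), g.get("bus"), g.get("slot")) for _, g in members}
        if len(guest_slots) != 1:
            problems.append(f"{host_key}: functions are split across guest slots {sorted(guest_slots, key=str)}")
        for host_fn, g in members:
            # int(..., 16) accepts libvirt's '0x1' style values
            if int(host_fn, 16) != int(g.get("function"), 16):
                problems.append(f"{host_key}: host function {host_fn} -> guest function {g.get('function')}")
            if int(host_fn, 16) == 0 and g.get("multifunction") != "on":
                problems.append(f"{host_key}: function 0 is missing multifunction='on'")
    return problems
```

An empty list means the layout follows the convention; each string otherwise describes one deviation.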
SimonF Posted May 8, 2023

Can you post diagnostics?
Modna Posted May 8, 2023

No problem, diagnostics attached. The one ending in "1637" is from a fresh boot, without attempting to start the VM; the one ending in "1638" is after attempting to start the VM.

tower-diagnostics-20230508-1638.zip
tower-diagnostics-20230508-1637.zip
ghost82 Posted May 9, 2023

May 8 16:11:37 Tower kernel: pci 0000:03:00.0: vgaarb: setting as boot VGA device

You need to pass a vbios for that Quadro, otherwise it won't work.
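The vgaarb line means the kernel treats the Quadro at 0000:03:00.0 as the boot (primary) VGA device, i.e. the one the host firmware initialized. A commonly cited explanation for why a vbios file is then needed is that the host has already run the card's ROM, so the guest may not get a clean image from the card itself. As a minimal Python sketch, which device the kernel marked can be checked through the `boot_vga` sysfs attribute; the helper name `is_boot_vga` is hypothetical.

```python
from pathlib import Path


def is_boot_vga(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """True if the kernel marked this PCI device as the boot VGA device.

    Reads the 'boot_vga' sysfs attribute, which the kernel only exposes for
    VGA-class devices; a missing file is treated as 'not boot VGA'.
    """
    attr = Path(sysfs_root) / pci_addr / "boot_vga"
    try:
        return attr.read_text().strip() == "1"
    except FileNotFoundError:
        return False
```

For example, `is_boot_vga("0000:03:00.0")` on the host should return True for the card named in the log line above.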
Modna Posted May 10, 2023 (edited May 10, 2023)

ghost82 said: "You need to pass a vbios for that Quadro, otherwise it won't work."

I have tried this as well, all of the below:
- VBIOS extracted with a script on Unraid
- VBIOS downloaded from TechPowerUp
- VBIOS extracted directly off the card using GPU-Z in Windows
- VBIOS extracted directly off the card using GPU-Z in Windows AND edited in a hex editor to remove the "nvidia header"

No matter what, the VM stopped at exactly the same point.
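The hex-editor step above can be sketched in Python. This assumes the "nvidia header" is everything that precedes the first PCI option ROM image in the dump. Per the PCI spec, such an image starts with the bytes 0x55 0xAA, and the little-endian word at offset 0x18 points to a data structure beginning with "PCIR"; checking that pointer avoids cutting at a stray 0x55 0xAA. The function names are hypothetical, and a GPU-Z dump that already starts at 0x55 0xAA comes back unchanged.

```python
import struct


def find_option_rom_start(data):
    """Return the offset of the first valid PCI option ROM image, or -1."""
    pos = data.find(b"\x55\xaa")
    while pos != -1:
        if pos + 0x1A <= len(data):
            # 16-bit little-endian pointer to the 'PCIR' data structure
            (pcir_off,) = struct.unpack_from("<H", data, pos + 0x18)
            if data[pos + pcir_off : pos + pcir_off + 4] == b"PCIR":
                return pos
        pos = data.find(b"\x55\xaa", pos + 1)
    return -1


def strip_vendor_header(data):
    """Drop everything before the first valid option ROM image."""
    start = find_option_rom_start(data)
    if start == -1:
        raise ValueError("no 0x55AA signature with a valid PCIR pointer found")
    return data[start:]
```

Usage, with the file names from later in this thread: `Path("TU104mod.rom").write_bytes(strip_vendor_header(Path("TU104.rom").read_bytes()))` (with `from pathlib import Path`).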
Modna Posted May 14, 2023 (edited May 14, 2023)

Any advice would be hugely appreciated. I've been fighting this for two weeks now and I'm hurtin'.
ghost82 Posted May 14, 2023

Attach diagnostics with the vbios passed to the VM, and attach the vbios itself.
Modna Posted May 24, 2023

tower-diagnostics-20230523-2058.zip

Here are the diagnostics with the vbios attached to the VM. Also attached are TU104.rom, the vbios pulled directly off the card, and TU104mod.rom, the "modified" vbios with the header removed. Thanks!

TU104mod.rom
TU104.rom
ghost82 Posted May 24, 2023

Thanks. I have no solution; as far as I can see, everything is configured correctly. The only thing I found is:

May 23 20:57:30 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
May 23 20:57:30 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
May 23 20:57:30 Tower kernel: pmd_set_huge: Cannot satisfy [mem 0xe0000000-0xe0200000] with a huge-page mapping due to MTRR override.

which may break GPU passthrough. Some suggest booting with the 'nohugeiomap' kernel argument (in your syslinux configuration); I don't know if it will work.
Modna Posted May 26, 2023 (edited May 26, 2023)

Thank you for looking into this! I have never edited the syslinux configuration, and googling around I don't see "nohugeiomap" anywhere on the forums or in the Unraid documentation. I want to make sure I add/edit the right thing. I found "nohugeiomap" in some Arch Linux documentation, but I am not sure how to actually set it properly (https://lwn.net/Articles/635357/).

EDIT: I just changed "append initrd=/bzroot" to "append initrd=/bzroot nohugeiomap" under Unraid OS in the syslinux configuration of the flash drive. No change in symptoms.
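After a reboot, the parameters the running kernel was actually booted with can be read from /proc/cmdline, which is a quick way to confirm that an edit to the append line took effect. A minimal Python sketch; the function names are hypothetical, and the plain whitespace split does not handle quoted values that contain spaces.

```python
from pathlib import Path
from typing import Dict, Optional


def kernel_params(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")  # split on the first '=' only
        params[name] = value if sep else None
    return params


def running_kernel_params(path: str = "/proc/cmdline") -> Dict[str, Optional[str]]:
    """Parameters the currently running kernel was booted with."""
    return kernel_params(Path(path).read_text())
```

On the host, `"nohugeiomap" in running_kernel_params()` should be True once the edited entry has been booted.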
Modna Posted May 26, 2023

DOUBLE EDIT: I checked syslog.txt, and your kernel argument did get rid of the warning you saw:

pmd_set_huge: Cannot satisfy [mem 0xe0000000-0xe0200000] with a huge-page mapping due to MTRR override.

That no longer shows up; instead it now shows:

May 25 19:22:14 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
May 25 19:22:14 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
May 25 19:22:14 Tower kernel: vfio-pci 0000:03:00.0: No more image in the PCI ROM
May 25 19:22:14 Tower acpid: input device has been disconnected, fd 5
May 25 19:22:14 Tower acpid: input device has been disconnected, fd 6
May 25 19:22:14 Tower acpid: input device has been disconnected, fd 7
May 25 19:22:14 Tower acpid: input device has been disconnected, fd 8
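The by-hand check in the post above (looking in syslog.txt for the pmd_set_huge warning and for the vfio-pci lines of 0000:03:00.0) can be sketched in Python. It works on the text of syslog.txt from a diagnostics zip; the function name and the returned keys are hypothetical.

```python
import re

# Matches the MTRR huge-page warning quoted in this thread, e.g.
# "pmd_set_huge: Cannot satisfy [mem 0xe0000000-0xe0200000] ..."
HUGE_PAGE_WARNING = re.compile(r"pmd_set_huge: Cannot satisfy \[mem [0-9a-fx-]+\]")


def scan_syslog(text, pci_addr="0000:03:00.0"):
    """Summarise passthrough-related kernel messages in a syslog dump."""
    lines = text.splitlines()
    return {
        "huge_page_warning": any(HUGE_PAGE_WARNING.search(line) for line in lines),
        "device_lines": [line for line in lines if f"vfio-pci {pci_addr}:" in line],
    }
```

Comparing the result for a syslog from before and after the kernel-argument change shows whether the warning disappeared and which device messages replaced it.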
SimonF Posted June 8, 2023

I found this on the Proxmox forum. Allow unsafe interrupts is in the VM settings.

"I only added intel_iommu=on iommu=pt to /etc/kernel/cmdline and options vfio_iommu_type1 allow_unsafe_interrupts=1 to /etc/modprobe.d/iommu_unsafe_interrupts.conf, as well as the blacklistings. No kernel update (since I reinstalled the whole thing) and bam... it worked out of the box, even without a ROM file. However, I didn't install the Nvidia drivers."

I haven't checked the diagnostics yet, so you may have these set already. Is your BIOS the latest version?
SimonF Posted June 8, 2023

https://forum.proxmox.com/threads/nvidia-error-43-with-quadro-rtx4000.125964/
SimonF Posted June 10, 2023

On 6/1/2023 at 6:54 AM, Modna said: "Anyone?"

I have not found anything further at this point, apart from the above.
Modna Posted June 12, 2023

Thank you, but I don't totally follow your instructions. /etc/kernel/cmdline doesn't exist on my Unraid system (/etc/kernel isn't a directory or a file), and /etc/modprobe.d is a directory, so I am not sure what you mean by adding those options to it. I'm sorry that I am so ignorant of this stuff; I have never had to dive this deep to make a passthrough VM work.
ghost82 Posted June 12, 2023

Add this to your syslinux config, in the block of the unRAID OS label (Main - Boot Device - Flash - Syslinux Configuration):

label unRAID OS
  menu default
  kernel /bzimage
  append intel_iommu=on iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot

Reboot. Then, obviously, start Unraid without the GUI.
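Editing through the web GUI as described above changes only the append line of one label block. That text change can be sketched in Python. The function `add_append_params` is hypothetical: it assumes the label/append layout shown in the post, compares label names case-insensitively ("unRAID OS" vs "Unraid OS"), skips parameters that are already present, and appends new ones after the existing tokens, as in the earlier nohugeiomap edit.

```python
def add_append_params(cfg, label, params):
    """Return syslinux config text with params added to the append line of one label block."""
    out, in_block = [], False
    for line in cfg.splitlines():
        stripped = line.strip()
        parts = stripped.split(None, 1)
        if parts and parts[0].lower() == "label":
            # A new label starts a new block; only the requested one is edited.
            in_block = len(parts) > 1 and parts[1].strip().lower() == label.lower()
        elif in_block and parts and parts[0] == "append":
            indent = line[: len(line) - len(line.lstrip())]
            tokens = stripped.split()[1:]
            tokens += [p for p in params if p not in tokens]  # no duplicates
            line = indent + "append " + " ".join(tokens)
        out.append(line)
    return "\n".join(out) + "\n"
```

Running it twice with the same parameters leaves the text unchanged, and other label blocks (such as a GUI-mode entry) are not touched.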
Modna Posted June 13, 2023

I added those append options under "Unraid OS" in the Syslinux configuration for the flash drive (through the web browser, not by editing the file manually). I also have "Unraid OS" selected as the default boot menu entry. Unfortunately, no change: exactly the same symptom.

(Ignore the "nohugeiomap"; I found that recommended somewhere and tried it. No change with or without it when using the new append options.)
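To confirm that vfio_iommu_type1.allow_unsafe_interrupts=1 actually took effect after the reboot, the module parameter can be read back from sysfs. A minimal Python sketch, assuming the parameter is exposed at /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts (boolean module parameters read back as Y or N); the helper name is hypothetical.

```python
from pathlib import Path

PARAM = Path("/sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts")


def unsafe_interrupts_enabled(path=PARAM):
    """True if vfio_iommu_type1 is running with allow_unsafe_interrupts on.

    A missing file means the module is not loaded (yet) or the path differs on
    this kernel; that case is reported as False.
    """
    try:
        return path.read_text().strip() == "Y"
    except FileNotFoundError:
        return False
```

If this returns False after the reboot, the option did not reach the module, and the append line is the first place to look.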
n0rx Posted October 27, 2023 (edited October 27, 2023)

Hey @Modna, were you able to find a solution to your problem? I'm in a similar situation with an RTX A4500 workstation GPU (not a Quadro-series card). My VM log file ends with the following:

"char device redirected to /dev/pts/0 (label charserial0)"

I opened the thread below earlier today. Any info you might have would be greatly appreciated.