Modna Posted May 8

Dell Precision Tower 5810
Intel Xeon E5-2690 v4
NVIDIA Quadro RTX 4000

I am attempting to set up a VM with GPU passthrough. (I have done this before on consumer hardware; this is my first time on this workstation/server setup.)

When I build a VM with a virtual VNC GPU, it works fine: it boots and runs normally. When I pass a GPU through to it, the VM starts and then hangs almost immediately (log file attached as an image). The last line is *always*:

char device redirected to /dev/pts/6 (label charserial0)

I have tried this with the GPU devices (GPU, audio, USB controller) bound to VFIO, and without them bound.

I have tried it with a BIOS loaded (a BIOS I pulled with a script in Unraid, a BIOS I downloaded from TechPowerUp, a raw BIOS I pulled with GPU-Z, and a BIOS I pulled with GPU-Z and then edited in a hex editor to remove the NVIDIA header).

I have modified the PCIe slot location of the passthrough devices so that it matches across all of the GPU devices within the VM, as SpaceInvaderOne recommends in his video.

I have also tried this with both the i440fx-7.1 machine type and the Q35-7.1 machine type.

No matter what I do, the VM hangs at the exact same spot.

ALSO: I pulled the NVIDIA GPU, slotted in an AMD Vega 64 instead, and repeated all of the above, with exactly the same symptoms. (I have used this exact GPU for passthrough on basic consumer hardware with no issue.)

I am at a loss for what to try. Any advice helps! Thanks.
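For anyone repeating the "bound to VFIO" versus "not bound" comparison above, it can help to confirm which kernel driver actually holds each device before starting the VM. The following is only a minimal Python sketch: it assumes the standard Linux sysfs layout, where each PCI device directory under /sys/bus/pci/devices has a `driver` symlink when a driver is bound. The function name `bound_driver` is made up for illustration.

```python
import os


def bound_driver(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None.

    pci_address is a full address such as "0000:03:00.0".
    sysfs_root is a parameter only so the function can be tested
    against a fake directory tree (an illustrative choice).
    """
    driver_link = os.path.join(sysfs_root, pci_address, "driver")
    if not os.path.islink(driver_link):
        # No "driver" symlink means no driver is currently bound.
        return None
    # The symlink points at the driver directory; its last path
    # component is the driver name, for example "vfio-pci".
    return os.path.basename(os.path.realpath(driver_link))
```

Calling `bound_driver("0000:03:00.0")` on the host (that is the GPU address shown in the logs later in this thread) should return "vfio-pci" if the binding took effect; the audio and USB functions of the card would need the same check with their own addresses.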
SimonF Posted May 8

Can you post diagnostics?
Modna Posted May 8

No problem, diagnostics attached.

The diagnostics file ending in "1637" was taken after a fresh boot, without attempting to boot the VM.
The diagnostics file ending in "1638" was taken after attempting to boot the VM.

tower-diagnostics-20230508-1638.zip
tower-diagnostics-20230508-1637.zip
ghost82 Posted May 9

May 8 16:11:37 Tower kernel: pci 0000:03:00.0: vgaarb: setting as boot VGA device

The card is being set as the boot VGA device, so you need to pass a vbios for that Quadro, otherwise it won't work.
Modna Posted May 10

Quote: You need to pass a vbios for that Quadro, otherwise it won't work.

I have tried this as well, with all of the following:

VBIOS extracted with a script on Unraid
VBIOS downloaded from TechPowerUp
VBIOS extracted directly off the card using GPU-Z in Windows
VBIOS extracted directly off the card using GPU-Z in Windows, then edited with a hex editor to remove the "nvidia header"

No matter which one I used, the VM stopped at the exact same point.

Edited May 10 by Modna
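The hex editor step in the last item can also be scripted, which makes it easier to repeat and to rule out a slip of the hand. This is a rough Python sketch under stated assumptions, not a verified tool: it assumes the GPU-Z dump contains the text "VIDEO" shortly after the real start of the ROM, and that the ROM image begins at the nearest 0x55 0xAA signature before that text (0x55 0xAA is the standard PCI option ROM signature). The function name `strip_vbios_header` is hypothetical.

```python
ROM_SIGNATURE = b"\x55\xaa"  # standard PCI option ROM signature
MARKER = b"VIDEO"            # assumed to appear just after the real ROM start


def strip_vbios_header(data):
    """Return the dump with everything before the ROM image removed.

    The ROM image is taken to start at the last 0x55AA signature that
    precedes the first "VIDEO" marker. This layout is an assumption
    about the dump, so the function refuses to guess when either
    landmark is missing.
    """
    marker_at = data.find(MARKER)
    if marker_at == -1:
        raise ValueError("no 'VIDEO' marker found; leaving the dump alone")
    start = data.rfind(ROM_SIGNATURE, 0, marker_at)
    if start == -1:
        raise ValueError("no 0x55AA signature before the marker")
    return data[start:]
```

To use it, read the dump with `open(path, "rb").read()`, pass the bytes through `strip_vbios_header`, and write the result to a new file so the original dump stays untouched. A dump that already starts with the signature comes back unchanged.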
Modna Posted May 14

Any advice would be hugely appreciated. I've been fighting this for 2 weeks now and I'm hurtin'.

Edited May 14 by Modna
ghost82 Posted May 14

Attach diagnostics taken with the vbios passed, and attach the vbios itself.
Modna Posted May 24

tower-diagnostics-20230523-2058.zip

Here are the diagnostics with the vbios attached to the VM. Also attached are TU104.rom, the vbios pulled directly off the card, and TU104mod.rom, the "modified" vbios where I removed the header. Thanks!

TU104mod.rom
TU104.rom
ghost82 Posted May 24

11 hours ago, Modna said: Here is diagnostics with vbios attached to VM.

Thanks. I have no solution; as far as I can see, everything is configured correctly. The only thing I found is:

May 23 20:57:30 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap [email protected]
May 23 20:57:30 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap [email protected]
May 23 20:57:30 Tower kernel: pmd_set_huge: Cannot satisfy [mem 0xe0000000-0xe0200000] with a huge-page mapping due to MTRR override.

which may break the GPU passthrough. Some suggest booting with the 'nohugeiomap' kernel argument (in your syslinux configuration); I don't know whether it will work.
Modna Posted May 26

Thank you for looking into this!

I have never edited the syslinux configuration. I googled around and I don't see "nohugeiomap" anywhere on the forums or in the Unraid documentation, so I want to make sure I actually add/edit the right thing.

I found "nohugeiomap" in some archlinux documentation, but I am not sure how to set it properly (https://lwn.net/Articles/635357/).

EDIT: I just changed "append initrd=/bzroot" to "append initrd=/bzroot nohugeiomap" under Unraid OS in the syslinux configuration on the flash drive, with no change in symptoms.

Edited May 26 by Modna
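The edit described above (adding an argument to the "append" line of one boot entry) can be sketched in Python for anyone who would rather script it than hand-edit. This is an illustration under assumptions: it treats the configuration as plain text made of "label" entries followed by indented "kernel" and "append" lines, and the function name `add_kernel_arg` is made up. It only transforms text; reading and writing the file on the flash drive is left to the reader.

```python
def add_kernel_arg(config_text, label, arg):
    """Add `arg` to the append line of the boot entry named `label`.

    Other entries are left untouched, and the argument is not added
    a second time if it is already present.
    """
    out = []
    in_target = False
    for line in config_text.splitlines():
        stripped = line.strip()
        lowered = stripped.lower()
        if lowered.startswith("label "):
            # A new boot entry starts; check whether it is the one we want.
            in_target = stripped[len("label "):].strip() == label
        elif in_target and lowered.startswith("append "):
            if arg not in stripped.split():
                line = line.rstrip() + " " + arg
        out.append(line)
    return "\n".join(out) + "\n"
```

For the change in this post, the call would be `add_kernel_arg(text, "Unraid OS", "nohugeiomap")`. Matching the label exactly keeps similarly named entries from being modified by accident.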
Modna Posted May 26

DOUBLE EDIT: I went and checked syslog.txt, and your kernel argument did get rid of the warning you saw:

pmd_set_huge: Cannot satisfy [mem 0xe0000000-0xe0200000] with a huge-page mapping due to MTRR override.

That no longer shows up. It now shows:

May 25 19:22:14 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap [email protected]
May 25 19:22:14 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap [email protected]
May 25 19:22:14 Tower kernel: vfio-pci 0000:03:00.0: No more image in the PCI ROM
May 25 19:22:14 Tower acpid: input device has been disconnected, fd 5
May 25 19:22:14 Tower acpid: input device has been disconnected, fd 6
May 25 19:22:14 Tower acpid: input device has been disconnected, fd 7
May 25 19:22:14 Tower acpid: input device has been disconnected, fd 8
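Comparing two syslog excerpts by eye, as done above, gets harder as the logs grow. Here is a small Python sketch of the same before/after comparison. It assumes each line starts with a "Mon DD HH:MM:SS hostname" prefix like the excerpts in this thread, and it strips that prefix so identical messages with different timestamps are treated as the same. The names `message_of` and `compare_logs` are invented for illustration.

```python
import re

# Matches a leading "May 25 19:22:14 Tower " style prefix.
_PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+")


def message_of(line):
    """Return a log line without its timestamp and hostname prefix."""
    return _PREFIX.sub("", line.strip())


def compare_logs(before, after):
    """Return (messages only in `before`, messages only in `after`).

    Both results are sorted lists of distinct messages.
    """
    old = {message_of(l) for l in before.splitlines() if l.strip()}
    new = {message_of(l) for l in after.splitlines() if l.strip()}
    return sorted(old - new), sorted(new - old)
```

Feeding it the VM start section of the syslog from before and after a configuration change gives a short list of messages that disappeared and messages that are new, which is usually the part worth posting.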
Modna Posted Thursday at 05:54 AM

Anyone?