tyrelius

  1. So, I've gone down many different GPU passthrough threads, and none of them seem to fix my issue. Through SpaceInvaderOne I managed to find a workaround, though it's not great and presents issues of its own. Here's my particular situation, which I can't seem to find any info on: I have a Ryzen Threadripper 1950X on an ASRock X399 Taichi (firmware P3.90), with multiple GPUs that I'm trying to set up for passthrough. I honestly don't think the GPU models matter, as the issue affects all of them equally, but in case it does: one is an ASUS ROG STRIX Radeon Vega 64 8GB, one is a MacVidCards-flashed EVGA Nvidia GTX 980 4GB, and one is a Zotac Nvidia GT 710 1GB PCIe x1 card. Virtualization is set up properly in the BIOS/UEFI settings, my IOMMU groups are mapped out using the ACS Override "Both" option (needed to separate out the PCIe x1 slot and the M.2 Wi-Fi slot), I'm running Unraid 6.9.2 on a clean install (no upgrades from previous versions), and the GPUs and Wi-Fi card are stubbed/bound to VFIO at boot.

     Everything works great except passthrough. I have passed an M.2 NVMe drive through to a VM without issues, and even booted from it (SpaceInvaderOne to credit for that), but that's not what I'm doing right now; I'm just trying to pass through a GPU. I dumped the vBIOS using SpaceInvaderOne's tutorial, but to do so I had to modify the script to force a reset, which puts the server to sleep in the middle of the script in order to reset the card. No other method worked for any of the GPUs in any of the slots. Each and every one required the forced reset, which according to his documentation should only be needed for the primary GPU (in my case, the Vega 64).

     If I want to pass through a GPU, I first have to reset the card using the sleep method from SpaceInvaderOne. I even have a dedicated script for it, reverse engineered from his vBIOS dump script:

     #!/bin/bash
     # PCI address of the GPU to reset
     gpuid="45:00.0"
     # Trim any stray whitespace from the address
     gpuid=$(echo "$gpuid" | sed 's/ *$//;s/^ *//')
     dumpid="0000:$gpuid"
     vganame=$(lspci -s "$gpuid")

     echo "Disconnecting the graphics card: $vganame"
     # Remove the device from the PCI bus (sysfs paths take plain, unescaped colons)
     echo 1 | tee /sys/bus/pci/devices/$dumpid/remove

     echo "Entering suspended (sleep) state ......"
     echo
     echo " PRESS POWER BUTTON ON SERVER TO CONTINUE"
     echo
     echo -n mem > /sys/power/state

     echo "Rescanning PCI bus"
     echo 1 | tee /sys/bus/pci/rescan
     echo "Graphics card has now successfully been disconnected and reconnected"

     However, every time a VM is done with a GPU, I have to run the reset, which requires sleeping the server, before I can use that GPU again. This goes for any and all of the GPUs. Safe shutdowns of the VMs don't seem to release the cards properly, and if I reboot the whole server, I also have to run the reset before I can use any of the GPUs for passthrough the first time. One issue that has been really hard to get around: when I install the drivers for the GPU inside the VM, the driver install resets the GPU from inside the guest, which makes the GPU stop sending a signal. I then have to force stop the VM, run the reset script that sleeps the server, and go back in to try again; but the driver install process resets the card again (I think) and puts it right back into needing another reset.

     I have also started messing with issuing the reset command on higher-level PCIe devices, hoping to maybe reset the slot. That hasn't worked at all either. And a remove/rescan only seems to work if I sleep the server between the remove and the rescan. Without the sleep, it's as if the GPU isn't powering down to be ready for use again, so the rescan just adds it back while the GPU is still locked by whatever last used it, be it a VM or the boot process.

     My problem is this: how can I reset the GPUs without having to sleep the server? And how can I make the cards properly release (or whatever they are supposed to do) so they are ready for use again without having to reset them? (A sketch of a per-device reset attempt follows this post.)
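A minimal sketch of the per-device reset attempt mentioned above, assuming the same 0000:45:00.0 address used in the script (adjust to your own lspci output). The kernel only exposes the sysfs reset file when it has found some reset method for the device (FLR, a power-management reset, or a secondary-bus reset), and many consumer GPUs, Vega 10 in particular, are known not to reset cleanly this way, which is what the third-party vendor-reset kernel module tries to address. Treat this as a diagnostic, not a guaranteed fix.

#!/bin/bash
# Sketch: try a kernel-driven function reset instead of suspending the host.
# The address below is an assumption taken from the reset script above.
dev="0000:45:00.0"

# Show which reset mechanisms the device advertises (look for "FLReset+"
# in the DevCap line of the PCI Express capability).
lspci -vv -s "$dev" | grep -i -e 'FLReset' -e 'DevCap:'

# Ask the kernel to reset the function. The reset file only exists if the
# kernel found a usable reset method for this device.
if [ -w "/sys/bus/pci/devices/$dev/reset" ]; then
    echo 1 > "/sys/bus/pci/devices/$dev/reset"
    echo "Issued a function reset on $dev"
else
    echo "No usable reset method exposed for $dev"
fi

If the reset file is missing for a given card, that by itself says the kernel has no sanctioned way to reset that function, which would be consistent with only a full suspend/resume cycle bringing the card back.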
  2. Mkay, so long delay here for the update. After lots more troubleshooting, I finally decided to plug in a monitor and watch the POST. Well, that pointed to an even bigger problem: when the GTX 1650 is plugged into the server, I see an error during boot, which I think is why I can't get IOMMU to work with it:

     Plug & Play Configuration Error: Memory Allocation
     Embedded I/O Bridge Device 71
     Bus#40/Dev#14/Func#0: Embedded I/O Bridge Device 71

     And when I move the card to another slot that is x16 length, I get this instead:

     Plug & Play Configuration Error: Memory Allocation
     Embedded I/O Bridge Device 44
     Bus#40/Dev#03/Func#0: Embedded I/O Bridge Device 44

     So, for some reason, my BIOS just can't initialize this GPU, and I don't fully understand why. When I looked up PCIe memory allocation, it seemed like all GPUs should request the same amount of memory on the PCIe bus, which makes me wonder why the old GT 120 works and the GTX 1650 doesn't. So maybe not all GPUs require the same amount of memory allocation on the PCIe bus after all? (A sketch for comparing the two cards' BAR sizes follows this post.)

     I also found that Dell says the PowerEdge R910 only supports 25W of power to PCIe devices, which I know is incorrect, since my GT 120 pulls 50W. (The PCIe spec allows up to 75W from the slot itself, with anything above that provided by 6-pin or 8-pin auxiliary power, as on more powerful graphics cards.) But I don't think power consumption is behind the memory allocation error anyway, because it's a memory allocation error, not a power error or the outright crash you normally see when a PCIe device doesn't get enough power.

     So now I'm down to figuring out more about PCIe memory allocation: where it is allocated from (memory, CPU cache, etc.), how PCIe devices are initialized, and whether Dell even implemented PCIe properly in the R910, given that they claim it can only push 25W to a PCIe device (which I've already disproved).
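On the question of whether all GPUs request the same amount of address space: the lshw output quoted in another post in this thread shows the GT 120's large prefetchable window as e8000000-efffffff (128 MB) and the GTX 1650's as e0000000-efffffff (256 MB), so the BAR sizes do differ. A minimal sketch for comparing them directly, assuming the cards land at 0000:44:00.x as in that output (run once per card, with that card installed):

#!/bin/bash
# Sketch: list each BAR ("Region") and its size for the GPU and its audio
# function, so the two cards' MMIO requirements can be compared.
for dev in 0000:44:00.0 0000:44:00.1; do
    echo "== $dev =="
    # Lines look like: "Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]"
    lspci -vv -s "$dev" 2>/dev/null | grep 'Region'
done

If the BIOS's 32-bit MMIO window cannot fit the larger BAR, a "Memory Allocation" error during POST is a plausible symptom; whether the R910's final 2.12.0 BIOS offers any way to enlarge that window (for example an above-4G decoding option) is something I'm not certain of.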
  3. Mkay. I'll try this and report back.
  4. I can't even add a second GPU, or change the VM's GPU to the real one. Without IOMMU working, it won't let me pass the card in, so the virtual machine manager doesn't even give me the option; as a result, I can't select the vbios file for it either.

     I've seen this solution already. The problem is, I don't want Fedora installed, I want Unraid installed. And Unraid is already set up in 64-bit mode, or should be; I don't think it even runs in 32-bit mode.

     My BIOS settings are already set to enable all virtualization features. That's the problem: virtualization is already enabled, but IOMMU isn't working, and only when I have the GTX 1650 installed.

     I would love to figure out how to change the boot configuration on Unraid. I'm assuming it uses grub? There doesn't appear to be any system setting for it in the GUI, so I'm hesitant to change it manually through the terminal; although, if that's the correct way, I'm willing to try it (a sketch of what that change might look like follows this post). Will enabling this affect my HBA card? If so, is there a risk of it affecting the drives with my data on them? (Transferring the data to another server for backup just to change this setting would take 10 days to complete.) Also, will it even do anything at all, since IOMMU isn't being enabled in the kernel in the first place?
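On the boot-configuration question: as far as I know (an assumption, not something confirmed in this thread), Unraid boots through syslinux rather than grub, and kernel parameters go on the "append" line of syslinux.cfg on the flash drive, which the web GUI exposes when you click the flash device on the Main tab. A minimal sketch of checking the current line, with an example edit shown as a comment:

#!/bin/bash
# Sketch: inspect the kernel command line Unraid boots with.
# Path assumed from the usual Unraid flash layout (flash mounted at /boot).
cfg="/boot/syslinux/syslinux.cfg"

# Show the current append line(s).
grep -n 'append' "$cfg"

# An edited entry forcing the Intel IOMMU on might look like:
#   append initrd=/bzroot intel_iommu=on iommu=pt
# (made through the GUI's Syslinux Configuration editor rather than by hand)

That said, if the kernel is already refusing to enable the IOMMU because of the firmware, adding intel_iommu=on may change nothing, which is essentially the last question in the post above.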
  5. Okay, more info. I have dug deeper into this, and what seems to be going on is that the new GPU isn't getting an IRQ. I don't know if that is normal or not. Below are the hardware profile logs for each GPU, plugged into the same physical slot (labeled slot 7 on the motherboard). You can see they both end up at PCI 44:00.x, so they really are mapping onto the same bus. (This thing takes forever to reboot between GPU swaps, so it has not been a fun troubleshooting process.)

     GeForce 9500 GT (Apple GeForce GT 120):

     <node id="display" claimed="true" class="display" handle="PCI:0000:44:00.0">
       <description>VGA compatible controller</description>
       <product>G96C [GeForce 9500 GT]</product>
       <vendor>NVIDIA Corporation</vendor>
       <physid>0</physid>
       <businfo>pci@0000:44:00.0</businfo>
       <version>a1</version>
       <width units="bits">64</width>
       <clock units="Hz">33000000</clock>
       <configuration>
         <setting id="driver" value="vfio-pci" />
         <setting id="latency" value="0" />
       </configuration>
       <capabilities>
         <capability id="pm">Power Management</capability>
         <capability id="msi">Message Signalled Interrupts</capability>
         <capability id="pciexpress">PCI Express</capability>
         <capability id="vga_controller" />
         <capability id="cap_list">PCI capabilities listing</capability>
         <capability id="rom">extension ROM</capability>
       </capabilities>
       <resources>
         <resource type="irq" value="15" />
         <resource type="memory" value="f7000000-f7ffffff" />
         <resource type="memory" value="e8000000-efffffff" />
         <resource type="memory" value="f8000000-f9ffffff" />
         <resource type="ioport" value="dc80(size=128)" />
         <resource type="memory" value="f6000000-f607ffff" />
       </resources>
     </node>

     GTX 1650:

     <node id="display" class="display" handle="PCI:0000:44:00.0">
       <description>VGA compatible controller</description>
       <product>TU117 [GeForce GTX 1650]</product>
       <vendor>NVIDIA Corporation</vendor>
       <physid>0</physid>
       <businfo>pci@0000:44:00.0</businfo>
       <version>a1</version>
       <width units="bits">64</width>
       <clock units="Hz">33000000</clock>
       <configuration>
         <setting id="latency" value="0" />
       </configuration>
       <capabilities>
         <capability id="pm">Power Management</capability>
         <capability id="msi">Message Signalled Interrupts</capability>
         <capability id="pciexpress">PCI Express</capability>
         <capability id="vga_controller" />
         <capability id="cap_list">PCI capabilities listing</capability>
       </capabilities>
       <resources>
         <resource type="memory" value="fa000000-faffffff" />
         <resource type="memory" value="e0000000-efffffff" />
         <resource type="memory" value="de000000-dfffffff" />
         <resource type="ioport" value="dc80(size=128)" />
         <resource type="memory" value="f9000000-f907ffff" />
       </resources>
     </node>
     <node id="multimedia" class="multimedia" handle="PCI:0000:44:00.1">
       <description>Audio device</description>
       <product>NVIDIA Corporation</product>
       <vendor>NVIDIA Corporation</vendor>
       <physid>0.1</physid>
       <businfo>pci@0000:44:00.1</businfo>
       <version>a1</version>
       <width units="bits">32</width>
       <clock units="Hz">33000000</clock>
       <configuration>
         <setting id="latency" value="0" />
       </configuration>
       <capabilities>
         <capability id="pm">Power Management</capability>
         <capability id="msi">Message Signalled Interrupts</capability>
         <capability id="pciexpress">PCI Express</capability>
         <capability id="bus_master">bus mastering</capability>
         <capability id="cap_list">PCI capabilities listing</capability>
       </capabilities>
       <resources>
         <resource type="memory" value="f9ffc000-f9ffffff" />
       </resources>
     </node>

     As you can see, the old GPU ends up on IRQ 15, but the new GPU doesn't seem to get an IRQ at all. So I dug deeper into the syslog, and found this when the GTX 1650 is plugged in, but not when the GT 120 is:

     kernel: DMAR: [Firmware Bug]: Your BIOS is broken; DMAR reported at address 0!
     kernel: BIOS vendor: Dell Inc.; Ver: 2.12.0; Product Version:

     I've done a lot of research on this error, and from the looks of it, Dell is the only one who can fix it, unless I can patch the BIOS myself. Can I? The next question I began asking myself is: why does one GPU work while the other triggers this error? If DMAR is screwed up in the BIOS, wouldn't IOMMU fail regardless of which card is plugged in? (A quick way to pull the DMAR/IOMMU lines out of the kernel log is sketched after this post.)
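For reference, pulling the IOMMU-related lines out of the kernel log with each card installed is a quick way to compare the DMAR state between boots; nothing beyond a stock shell is assumed here. The raw ACPI DMAR table the firmware hands to the kernel is also visible under /sys/firmware/acpi/tables/DMAR for anyone who wants to inspect what Dell actually reports.

#!/bin/bash
# Sketch: gather the IOMMU-related kernel messages so the DMAR state can be
# compared between the GT 120 boot and the GTX 1650 boot.
dmesg | grep -i -e 'dmar' -e 'iommu' -e 'remapping'

# The firmware-provided DMAR table (binary) lives here when ACPI exposes it;
# comparing its presence and size between the two boots may be informative.
ls -l /sys/firmware/acpi/tables/DMAR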
  6. I've researched this into the ground. I don't know if I just missed the right thread, or if it has yet to exist, but I'm at my wit's end.

     I have a Dell PowerEdge R910 with quad Intel Xeon E7-4870 CPUs on BIOS 2.12.0 with virtualization enabled. IOMMU groups work exactly as they should, as long as I have the old Apple Nvidia GT 120 plugged in: I can pass the GPU to any VM I wish, the IOMMU groups show up properly, and everything works as expected. However, as soon as I plug in this new GTX 1650 running a TU117 core, it all fails. Unraid shows that IOMMU is no longer working, even though the BIOS still shows virtualization as enabled. It's like this card is somehow breaking my system, and I can't fathom why or how. As soon as I plug the old GT 120 back in, everything starts working properly again. But I want the GTX 1650 plugged in instead, because it's the card I want for the virtual machine I plan on passing it into.

     Please, what can I do to troubleshoot this? What can I do to make this work? I have changed PCIe slots. I have tried the card in another machine (a Windows machine that loads the drivers and plays games on it just fine). I have checked the BIOS settings for anything that may differ or have changed between the cards. Nothing I have done changes the fact that Unraid refuses to utilize the GTX 1650 in a way that lets me pass it to virtual machines; instead the card completely breaks IOMMU functionality. (The IOMMU group check I'm relying on is sketched after this post.)
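For completeness, the IOMMU group check mentioned above is the standard sysfs walk (nothing Unraid-specific assumed): when IOMMU initialization fails, /sys/kernel/iommu_groups is simply empty.

#!/bin/bash
# Sketch: list every device the kernel placed into an IOMMU group.
# An empty /sys/kernel/iommu_groups means the IOMMU never came up, which
# matches what Unraid reports as IOMMU not working.
shopt -s nullglob
for group in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${group##*/}:"
    for dev in "$group"/devices/*; do
        # One-line lspci description for each device in the group.
        lspci -nns "${dev##*/}"
    done
done

With the GT 120 installed this should list every device as Unraid shows them; with the GTX 1650 installed, an empty listing would confirm the failure happens at the kernel/firmware level rather than inside Unraid.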