Troubleshoot Crashes on Gaming VM


Hey Guys,

 

I have the following setup running:

 

Unraid 6.8 RC6

Gigabyte Aorus AX370 Gaming K7

Ryzen 5 3600

Zotac GTX 1060

32GB DDR4 (Crucial Ballistix Sport)

 

I passed the GPU through to a Windows 10 VM, but I'm getting random crashes in different games. In Forza Horizon 4 it doesn't take long until the game hangs and closes itself.

 

How would you proceed to find the root cause?

 

Thank you :)

Link to comment

How long have you been using this setup? Did it work before and the issues started recently, or have you always had issues with GPU passthrough? Error messages, logs, and diagnostics could all be helpful in diagnosing your issues. What else is running on your Unraid server? A couple more VMs at the same time, Docker containers maybe? You have to give us more info about your configuration, what your setup looks like, and what you have tried already.

Link to comment

I'm still in the trial; I started with RC4 and it worked. I think it all started when I wanted to pass through a whole USB controller. But now I get these problems even when I don't pass the controller through. I recreated the Windows VM and did a fresh Windows install. I had to do some ACS tweaks in the BIOS and in Unraid to get better IOMMU groups. Can this be related?

Link to comment

@glockmane Can you post the log from the VM when it crashes? Make sure that the GPU and its HDMI audio device are not grouped together with something else, and select the HDMI audio as well for passthrough. Reduce everything to a minimum: no other devices like USB controllers or onboard sound. Select the mouse and keyboard in the USB devices list instead of passing through the complete controller.
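
For pulling that log, here is a minimal Python sketch that returns the last lines of a VM's log file. It assumes the log sits at libvirt's default location, `/var/log/libvirt/qemu/<vm-name>.log`; the function name `tail_vm_log` and the VM name in the usage comment are made up for illustration, so adjust them to your host.

```python
from collections import deque
from pathlib import Path

# Assumption: libvirt's default per-VM log directory; your host may differ.
DEFAULT_LOG_DIR = Path("/var/log/libvirt/qemu")


def tail_vm_log(vm_name, lines=50, log_dir=DEFAULT_LOG_DIR):
    """Return the last `lines` lines of <log_dir>/<vm_name>.log as a list."""
    log_path = Path(log_dir) / f"{vm_name}.log"
    # deque with maxlen keeps only the newest lines while streaming the file.
    with log_path.open("r", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


# Hypothetical usage right after a crash:
#   print("\n".join(tail_vm_log("Windows 10")))
```

Grabbing the tail right after the crash keeps the relevant lines together with their timestamps before the VM is restarted and appends new output.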

Link to comment

Okay, I'll try to get a crash again and pull a log (mostly only the game crashes, so I don't know if the log will show anything at this point).

 

The GPU and its HDMI audio device are in different IOMMU groups and do not share them with other devices.
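
One way to double-check the grouping from the host is to read the kernel's sysfs tree. This is a rough sketch, assuming the standard Linux layout `/sys/kernel/iommu_groups/<group>/devices/<pci-address>`; the helper names `list_iommu_groups` and `shared_groups` are made up for this example.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return groups  # IOMMU disabled, or the path does not exist on this host
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(
                entry.name for entry in devices_dir.iterdir()
            )
    return groups


def shared_groups(groups, addresses):
    """Return the groups that hold any of `addresses` together with other devices."""
    wanted = set(addresses)
    return {
        number: devices
        for number, devices in groups.items()
        if wanted & set(devices) and set(devices) - wanted
    }


# Usage with the GPU and HDMI audio addresses from this thread:
#   shared_groups(list_iommu_groups(), ["0000:07:00.0", "0000:07:00.1"])
# An empty dict means neither device shares a group with anything else.
```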

 

Both devices are attached to the VM:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <rom file='/mnt/user/images/Zotac.GTX1060.6144.170630.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
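
To confirm what is actually attached, the domain XML can also be checked programmatically. A small sketch using only the Python standard library: it takes the full domain XML as a string (for example the output of `virsh dumpxml <vm-name>`) and lists each passed-through PCI device with its host address and ROM file, if any. The function name `passthrough_devices` is made up for this example.

```python
import xml.etree.ElementTree as ET


def passthrough_devices(domain_xml):
    """Return (host_address, rom_file) for every PCI <hostdev> in a domain XML string."""
    root = ET.fromstring(domain_xml)
    found = []
    for hostdev in root.iter("hostdev"):
        if hostdev.get("type") != "pci":
            continue  # skip USB and other non-PCI host devices
        addr = hostdev.find("source/address")
        if addr is None:
            continue
        # libvirt writes the fields as hex strings such as '0x07'.
        host_address = "{:04x}:{:02x}:{:02x}.{:x}".format(
            int(addr.get("domain"), 16),
            int(addr.get("bus"), 16),
            int(addr.get("slot"), 16),
            int(addr.get("function"), 16),
        )
        rom = hostdev.find("rom")
        found.append((host_address, rom.get("file") if rom is not None else None))
    return found
```

For the two entries above this should give `0000:07:00.0` with the Zotac ROM file and `0000:07:00.1` without a ROM, which are the addresses to look for in the IOMMU group listing.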

Docker containers and other VMs are stopped, so they should not be the problem.

 

Edit:

 

Trying i440fx-4.1 again, as it seems to be back with RC6. I remember having this version selected with RC4.

Edited by glockmane
Link to comment
4 hours ago, glockmane said:

Trying i440fx-4.1 again, as it seems to be back with RC6.. I remember with RC4 I had this version selected.

RC5 rolled back the QEMU version from 4.1 to 4.0.1 because of qcow2 vdisk corruption on XFS disks. 4.1.1 fixed this and is included in RC6. That is why the machine types i440fx and Q35 were downgraded from 4.1 to 4.0 in RC5.

 

What you can try is to set up a VM as Q35. It's basically a more modern machine type, and I have been running it for my main VM for over a year now without any issues. Select the Windows 10 template, switch only the machine type, and you're good to go. Switching an existing VM can work, but it can also cause issues: Windows activation can be lost, and devices that are already passed through can have problems. For some people with AMD cards, Q35 fixed their GPU passthrough issues, and some couldn't install drivers without it, so it's worth a try. 3rd-gen Ryzen and the BIOS versions released for it on older boards also had some issues. Maybe that's also something you can play around with. Some BIOS versions work, some have issues; it differs for every board manufacturer and every board.

Link to comment

Okay, Forza Horizon crashed again with i440fx-4.1. I recreated the machine with Q35 (why the hell does it hang for minutes at the TianoCore logo?) and after some playing got the same freeze. Now trying with ACS override off.

 

Edit:

 

Disabled ACS override in Unraid, and ACS and AER Cap in the BIOS. Now I'm back to the bad IOMMU grouping I had when I first installed Unraid. Let's see if it still crashes.

 

Edit2:

 

And a bluescreen :(

 

Edit3:

 

I think I may have found the cause of the instabilities: I got random RAM-related errors. As a first step I disabled the Intel XMP profile and reduced the speed from 3200 MHz to 2400 MHz. Later I will try to get back to 3200, but first I'll test whether the instabilities are gone. Shame on me for not running RAM tests before posting here *facepalm*

 

Thank you guys for your help!

Edited by glockmane
Link to comment
