macOS VM with AMD GPU passthrough crashes unraid host


Hi,

 

I know this kind of issue has been discussed several times with several different solution approaches.
Unfortunately, none of these approaches seems to solve my problem.

I have a Win10, Win11 and macOS Big Sur VM running completely fine with a NVIDIA Quadro K2000 passed through.

Now that Apple has ended support for Kepler-based GPUs in Monterey, I wanted to extend the life of my macOS VM a little longer and bought an RX460 on eBay. The GPU is fully functional with the Windows VMs on unraid and also works fine with a Monterey VM on a Proxmox Server I keep for testing purposes. But on my unraid server, the RX460 in combination with my daily-driver macOS VM crashes the host system once the VM has been used and I then try to start the same or another VM:

  • Windows VM - shutdown and start another VM/reboot - works
  • Ubuntu VM - shutdown and start another VM/reboot - works
  • macOS VM - shutdown and start another VM/reboot - crash
  • same macOS VM with VNC only - shutdown and start another VM/reboot - works

 

What I have done so far:

  • installed ich777's AMD Vendor Reset plugin
  • adjusted the macOS VM xml to use the GPU as a multifunction device
  • added VTI=12 to boot-args

I didn't append "pcie_no_flr=1022:149c,1022:1487" since my system does not have any PCI devices with those vendor:device IDs.

 

The crash completely freezes the system, so I am not able to generate a diagnostics file AFTER starting a VM following a macOS shutdown.

But I used unraid's console window to capture this final error, which repeats endlessly:

 

Mar 8 11:40:53 BadBoysUncle kernel: DMAR: Invalidation Time-out Error (ITE) cleared
Mar 8 11:40:53 BadBoysUncle kernel: DMAR: VT-d detected Invalidation Time-out Error: SID 0
Mar 8 11:40:53 BadBoysUncle kernel: DMAR: QI HEAD: Device-TLB Invalidation qw0 = 0x50000000003, qw1 = 0x87fff001
Mar 8 11:40:53 BadBoysUncle kernel: DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1001510cc
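
The `QI HEAD` line can be decoded to see which PCI function the stuck invalidation targets. Below is a minimal sketch, assuming the descriptor layout used by Linux's Intel IOMMU driver (descriptor type in bits 3:0 of qw0, PCI source ID in bits 47:32 for Device-TLB invalidations). The helper name `decode_qi_qw0` is hypothetical. Under that assumption, qw0 = 0x50000000003 decodes to a Device-TLB invalidation addressed to bus 0x05, slot 0x00, function 0.

```python
# Sketch: decode qw0 of a VT-d queued-invalidation descriptor.
# ASSUMPTION: bit layout follows Linux's Intel IOMMU driver
# (type = bits 3:0, source ID = bits 47:32 for Device-TLB invalidations).

# Only the two descriptor types that appear in the log above.
QI_TYPES = {0x3: "Device-TLB Invalidation", 0x5: "Invalidation Wait"}


def decode_qi_qw0(qw0: int) -> dict:
    """Return the descriptor type and, for Device-TLB invalidations, the target BDF."""
    desc_type = qw0 & 0xF
    info = {"type": QI_TYPES.get(desc_type, f"unknown (0x{desc_type:x})")}
    if desc_type == 0x3:
        sid = (qw0 >> 32) & 0xFFFF  # 16-bit PCI requester ID
        bus, dev, fn = sid >> 8, (sid >> 3) & 0x1F, sid & 0x7
        info["bdf"] = f"{bus:02x}:{dev:02x}.{fn}"
    return info


if __name__ == "__main__":
    print(decode_qi_qw0(0x50000000003))  # QI HEAD from the log
    print(decode_qi_qw0(0x200000025))    # QI PRIOR from the log
```

If the assumption holds, the time-out is the IOMMU waiting for the device at 05:00.0 to acknowledge an invalidation request that it never answers.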

 

I also attached a diagnostics file that has been created after a shutdown of the macOS VM.

Maybe someone is able to help. Thanks, d.

badboysuncle-diagnostics-20220308-1008.zip

Link to comment

Hi,

your syslog ends with entries logged by the gnif patch.

Also, your qemu log reports that 05:00.1 cannot be reset:

2022-03-08T09:08:25.543430Z qemu-system-x86_64: vfio: Cannot reset device 0000:05:00.1, no available reset mechanism.
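
To see what the host kernel itself offers for each function, sysfs can be queried. This is a minimal sketch, assuming the standard Linux sysfs layout: a `reset` file is present only when the kernel has some reset mechanism for the function, and `reset_method` (which lists the mechanisms) exists only on newer kernels. The function name `reset_info` is hypothetical.

```python
# Sketch: report whether the kernel exposes a reset mechanism for a PCI function.
# ASSUMPTION: standard Linux sysfs layout; `reset_method` may be absent on older kernels.
from pathlib import Path


def reset_info(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Collect reset-related sysfs attributes for one PCI function."""
    dev = Path(sysfs_root) / bdf
    method_file = dev / "reset_method"
    methods = method_file.read_text().split() if method_file.is_file() else []
    return {
        "device_present": dev.is_dir(),
        "has_reset": (dev / "reset").exists(),
        "reset_methods": methods,
    }


if __name__ == "__main__":
    # The two functions of the GPU discussed in this thread.
    for bdf in ("0000:05:00.0", "0000:05:00.1"):
        print(bdf, reset_info(bdf))
```

A function that reports `has_reset: False` would be consistent with the "no available reset mechanism" message from qemu.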

 

On gnif's github I remember reading that some users complained that on some gpus (5000 series, if I remember correctly) the audio reset wasn't working properly**.

Since your 05:00.1 is the gpu's digital audio function, can you try to pass through only the video function to the vm, without the audio (keeping the audio attached to vfio at boot), and see if it still crashes?

From this:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x05' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
    </hostdev>

 

To this:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </hostdev>

 

------

2 hours ago, disco4000 said:

and also works fine with a Monterey VM on a Proxmox Server

 

This may be helpful: how do you run it? Any vendor reset patch? Which version? The ich777 patch should be version 1.1.

 

**Update:

Here is a link:

https://github.com/gnif/vendor-reset/issues/29

Edited by ghost82
Link to comment

OK, after intensive testing this morning I ended up removing the multifunction config and uninstalling the Vendor Reset plugin.

I have a pair of 1st-gen SoundSticks connected via USB, so I don't need the digital audio output anyway.

 

Besides having only one more crash from the macOS VM (after several boots and shutdowns with other VMs), two other strange things happened:

  • After a reboot into a Windows VM the system console showed this message after initializing the GPU:
    vfio-pci 0000:05:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xaa55
  • Another boot into a Windows VM resulted in the infamous Error Code 43 in Device Manager

Next thing I will do is isolate the respective CPUs I use for my VMs. Maybe this brings it into a state that could be called "acceptable stability".

Link to comment
1 hour ago, disco4000 said:
vfio-pci 0000:05:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xaa55

This is nonsense: it expects 0xaa55 and got 0xaa55, so what's the problem? :D

Are you sure you pasted it right?

 

If the gpu is isolated and the vbios was extracted correctly, the first two bytes you see when you open it in a hex editor should be '55 AA' (the 0xaa55 signature stored in little-endian byte order).

If the gpu is grabbed by something else, part of the vbios will be masked, but the error will be different, something like 'expecting 0xaa55, got 0xffff'.

You get a similar error if the vbios flashed to the card was modded.

Btw, this is not an 'error', but a harmless warning.
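
The signature check on an extracted vbios file can also be done without a hex editor. This is a minimal sketch that reads the first two bytes of a file as a little-endian 16-bit word and compares it with 0xaa55 (on disk that corresponds to the bytes 55 AA). The function names are hypothetical, and the ROM path is whatever file you dumped, passed as the first command-line argument.

```python
# Sketch: check the 2-byte signature at the start of a dumped vbios / option ROM.
# The 16-bit value 0xaa55 is stored little-endian, so a valid file begins with 55 AA.
import sys


def rom_signature(path: str) -> int:
    """Return the first two bytes of the file as a little-endian 16-bit value."""
    with open(path, "rb") as f:
        head = f.read(2)
    if len(head) < 2:
        raise ValueError("file is shorter than 2 bytes")
    return int.from_bytes(head, "little")


def has_valid_signature(path: str) -> bool:
    """True if the file starts with the expected 0xaa55 signature."""
    return rom_signature(path) == 0xAA55


if __name__ == "__main__" and len(sys.argv) > 1:
    sig = rom_signature(sys.argv[1])
    print(f"signature 0x{sig:04x}:", "ok" if sig == 0xAA55 else "unexpected")
```

A masked dump would show up here as 0xffff instead of 0xaa55.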

 

1 hour ago, disco4000 said:

Another boot into a Windows VM resulted in the infamous Error Code 43

Is the gpu attached to vfio at boot? Are you sure it doesn't attach to efifb?
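
Both questions can be checked from the host. This is a minimal sketch, assuming the standard Linux sysfs layout (each bound PCI function has a `driver` symlink) and assuming that a framebuffer claimed by efifb shows up as an `efifb` entry in /proc/iomem. The function names are hypothetical.

```python
# Sketch: report which kernel driver a PCI function is bound to, and whether efifb
# has claimed a memory region.
# ASSUMPTION: standard Linux sysfs layout; efifb regions are labelled in /proc/iomem.
import os


def bound_driver(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the bound driver, or None if no driver is bound."""
    link = os.path.join(sysfs_root, bdf, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.path.realpath(link))


def efifb_claims(iomem_text):
    """Return the /proc/iomem lines that mention efifb."""
    return [line.strip() for line in iomem_text.splitlines() if "efifb" in line]


if __name__ == "__main__":
    print("05:00.0 driver:", bound_driver("0000:05:00.0"))
    if os.path.exists("/proc/iomem"):
        with open("/proc/iomem") as f:
            print("efifb regions:", efifb_claims(f.read()))
```

For a card reserved for passthrough, one would expect `vfio-pci` as the driver and an empty efifb list.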

 

1 hour ago, disco4000 said:

Next thing I will do is isolate the respective CPUs I use for my VMs

That should make no difference...

 

The RX460 needs a reset patch, so removing the plugin without having any other reset patch in place is not a good idea.

Edited by ghost82
Link to comment
22 hours ago, ghost82 said:

Are you sure you pasted it right?

I pasted it exactly right and was able to reproduce it.

 

22 hours ago, ghost82 said:

Is the gpu attached to vfio at boot? Are you sure it doesn't attach to efifb?

It was attached to vfio at boot and not attached to efifb.

 

22 hours ago, ghost82 said:

The RX460 needs a reset patch, so removing the plugin without having any other reset patch in place is not a good idea.

You're totally right. Without the plugin the VMs ended up with a black screen after a couple of shutdowns.

 

I ended up reconfiguring the complete setup from scratch and it worked for about 4-5 hours with several VM restarts, but unfortunately the Time-Out error struck again.

 

Since I need the VM for work and can't afford to spend more time on tweaking and testing, I reverted to the Quadro, which still is capable enough for light UI/UX design tasks.

If Apple sticks to its usual support cycles, Big Sur should receive security updates for another two years. And maybe at that point the ARM transition will already have ended the hackintosh era. 😞

Link to comment
