Is this a version of the AMD reset bug?


eribob

Recommended Posts

Hi! 

I am passing through a Rx 580 (Sapphire nitro+ 4GB) to my Windows 10 VM. It works fine until the VM has to restart for any reason (even if it is only a "normal" restart). When it boots up again the display turns on and I see the "Tiano Core" bios logo and the circulating dots notifying me that windows 10 is starting. However, when this phase of the boot process is finished and Windows 10 is about to start the screen goes black and turns off. Now I have to force stop the VM and reboot the server in order to get it working again. If I try to start the VM again without rebooting the server, the same thing happens again.

 

I find it strange that the GPU is turning on at first, but then seems to turn off again, just as windows is about to load. Could it have something to do with the AMD drivers that are probably being loaded as windows is starting? 

 

I have the vBIOS (downloaded using GPU-Z) passed through to the VM. I tried removing that and it did not help. The GPU is not bound to VFIO at boot. I tried binding it and it did not help. I am running Unraid 6.9 beta 25. 

 

Windows 10 is version 2004.

Radeon drivers are 19.12.2 (not the latest actually - I can try to update that.)

 

Hardware: x570 Taichi, Ryzen 9 3950X, 80GB DDR4, Rx580 and and R7 370 (used for other VM:s).

 

Here is my VM XML: 

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='5' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>Windows</name>
  <uuid>8dcf9d3b-6010-947b-543a-6216bf778f0b</uuid>
  <description>Main computer</description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>33554432</memory>
  <currentMemory unit='KiB'>33554432</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>16</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='8'/>
    <vcpupin vcpu='1' cpuset='24'/>
    <vcpupin vcpu='2' cpuset='9'/>
    <vcpupin vcpu='3' cpuset='25'/>
    <vcpupin vcpu='4' cpuset='10'/>
    <vcpupin vcpu='5' cpuset='26'/>
    <vcpupin vcpu='6' cpuset='11'/>
    <vcpupin vcpu='7' cpuset='27'/>
    <vcpupin vcpu='8' cpuset='12'/>
    <vcpupin vcpu='9' cpuset='28'/>
    <vcpupin vcpu='10' cpuset='13'/>
    <vcpupin vcpu='11' cpuset='29'/>
    <vcpupin vcpu='12' cpuset='14'/>
    <vcpupin vcpu='13' cpuset='30'/>
    <vcpupin vcpu='14' cpuset='15'/>
    <vcpupin vcpu='15' cpuset='31'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-5.0'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/8dcf9d3b-6010-947b-543a-6216bf778f0b_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' dies='1' cores='8' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/VM_disks/Windows/windows_disk.img' index='5'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/Win10_1809Oct_v2_Swedish_x64.iso' index='4'/>
      <backingStore/>
      <target dev='hda' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <alias name='sata0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.173-2.iso' index='3'/>
      <backingStore/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <alias name='sata0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source dev='/dev/disk/by-id/ata-Samsung_SSD_840_EVO_250GB_S1DBNSBF464063N' index='2'/>
      <backingStore/>
      <target dev='hdd' bus='sata'/>
      <alias name='sata0-0-3'/>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source dev='/dev/disk/by-id/ata-INTEL_SSDSC2CT180A3_CVMP2224015B180CGN' index='1'/>
      <backingStore/>
      <target dev='hde' bus='sata'/>
      <alias name='sata0-0-4'/>
      <address type='drive' controller='0' bus='0' target='0' unit='4'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='5' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='5'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='6' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='6'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='7' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='7'/>
      <alias name='pci.7'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x03' function='0x0'/>
    </controller>
    <controller type='pci' index='8' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='8'/>
      <alias name='pci.8'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x04' function='0x0'/>
    </controller>
    <controller type='pci' index='9' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='9'/>
      <alias name='pci.9'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x05' function='0x0'/>
    </controller>
    <controller type='pci' index='10' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='10' port='0xb'/>
      <alias name='pci.10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='11' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='11' port='0xc'/>
      <alias name='pci.11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='12' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='12' port='0xd'/>
      <alias name='pci.12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='pci' index='13' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='13' port='0xe'/>
      <alias name='pci.13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x6'/>
    </controller>
    <controller type='pci' index='14' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='14' port='0xf'/>
      <alias name='pci.14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x7'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:99:fc:d9'/>
      <source bridge='br0'/>
      <target dev='vnet4'/>
      <model type='virtio-net'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x07' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/4'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/4'>
      <source path='/dev/pts/4'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-5-Windows/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0f' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <rom file='/mnt/user/isos/RX580.rom'/>
      <address type='pci' domain='0x0000' bus='0x0b' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0f' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x0b' slot='0x00' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev3'/>
      <rom file='/mnt/user/isos/RX580.rom'/>
      <address type='pci' domain='0x0000' bus='0x0c' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0c' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev4'/>
      <address type='pci' domain='0x0000' bus='0x0c' slot='0x00' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0c' slot='0x00' function='0x3'/>
      </source>
      <alias name='hostdev5'/>
      <address type='pci' domain='0x0000' bus='0x0c' slot='0x00' function='0x2'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
  <qemu:commandline>
    <qemu:arg value='-cpu'/>
    <qemu:arg value='host,topoext=on,invtsc=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vpindex,hv-synic,hv-stimer,hv-reset,hv-frequencies,host-cache-info=on,l3-cache=off,-amd-stibp'/>
  </qemu:commandline>
</domain>

I also attach diagnostics after the error occurred. In the VM error log I get the following message (0000:0f:00.1 is the address of my GPU):

2020-09-05T11:49:00.017364Z qemu-system-x86_64: vfio_err_notifier_handler(0000:0f:00.1) Unrecoverable error detected. Please collect any data possible and then kill the guest
2020-09-05T11:49:00.017488Z qemu-system-x86_64: vfio_err_notifier_handler(0000:0f:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest

Any help would be very much appreciated!

/Erik

monsterservern-diagnostics-20200905-1354-GPU-VM-error.zip

Link to comment

Hi! Yes I have tried it. It works, however, after a few resets I suddenly got a read error on one of my disks. I suspect it was some kind of bug because the smart-report from the disk was without errors. So I did a rebuild of the disk data and it works fine. My suspicion is that putting the server in a suspended state and then restarting it may have caused this bug. 

 

So I stopped using the reset script. 

 

What I am curious about is the fact that my monitor turns on and I see the boot logo from the VM bios. Then it turns off again and I have to force stop it. It sounds more like a driver/software issue that might be solvable... 

 

/erik 

Link to comment

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.