[Solved]VM stuck in D3 after a while


Go to solution Solved by ghost82,

Recommended Posts

I have two VMs setup, one works just fine all the time and second one works for a while then gets stuck in D3 (vfio: Unable to power on device, stuck in D3) error, any hints or tips are appreciated :) 

 

My unraid server specs: 

Motherboard: Z490 Aorus Prox AX 

BIOS dated: Mar 25 2020 

CPU: Intel Core i9-10850K 

GPUs:  2* AMD Radeon RX 5500 

 

VM #1 setup, this one works consistently without any issues: 

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='1'>
  <name>Zahra-VM</name>
  <uuid>69128a02-abdf-37fb-4c9d-79118377eade</uuid>
  <description>Zahra</description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>6291456</memory>
  <currentMemory unit='KiB'>6291456</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='12'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='13'/>
    <vcpupin vcpu='4' cpuset='4'/>
    <vcpupin vcpu='5' cpuset='14'/>
    <vcpupin vcpu='6' cpuset='5'/>
    <vcpupin vcpu='7' cpuset='15'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-4.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/69128a02-abdf-37fb-4c9d-79118377eade_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='4' threads='2'/>
    <cache mode='passthrough'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/cache/Zahra-VM/vdisk1.img' index='3'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/Windows.iso' index='2'/>
      <backingStore/>
      <target dev='hda' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <alias name='sata0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.173-2.iso' index='1'/>
      <backingStore/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <alias name='sata0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0xe'/>
      <alias name='pci.7'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x6'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0xf'/>
      <alias name='pci.8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x7'/>
    </controller>
    <controller type='pci' index='9' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <alias name='pci.9'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:33:46:de'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-1-Zahra-VM/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x06' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>

 

VM #2 setup, this one works for a while then gets stuck in D3 

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='3'>
  <name>Fatimah-VM</name>
  <uuid>f62c76b8-6933-15eb-f6e3-3c2639e6c5dc</uuid>
  <description>Fatimah</description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>6291456</memory>
  <currentMemory unit='KiB'>6291456</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='6'/>
    <vcpupin vcpu='1' cpuset='16'/>
    <vcpupin vcpu='2' cpuset='7'/>
    <vcpupin vcpu='3' cpuset='17'/>
    <vcpupin vcpu='4' cpuset='8'/>
    <vcpupin vcpu='5' cpuset='18'/>
    <vcpupin vcpu='6' cpuset='9'/>
    <vcpupin vcpu='7' cpuset='19'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-4.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/f62c76b8-6933-15eb-f6e3-3c2639e6c5dc_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='4' threads='2'/>
    <cache mode='passthrough'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/cache/Windows 10/vdisk1.img' index='3'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/Windows.iso' index='2'/>
      <backingStore/>
      <target dev='hda' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <alias name='sata0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.173-2.iso' index='1'/>
      <backingStore/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <alias name='sata0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0xe'/>
      <alias name='pci.7'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x6'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:b9:c1:13'/>
      <source bridge='br0'/>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-3-Fatimah-VM/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>

 

VM #2 Log

-no-hpet \
-no-shutdown \
-boot strict=on \
-device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
-device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 \
-device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 \
-device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 \
-device pcie-root-port,port=0xc,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x4 \
-device pcie-root-port,port=0xd,chassis=6,id=pci.6,bus=pcie.0,addr=0x1.0x5 \
-device pcie-root-port,port=0xe,chassis=7,id=pci.7,bus=pcie.0,addr=0x1.0x6 \
-device ich9-usb-ehci1,id=usb,bus=pcie.0,addr=0x7.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pcie.0,multifunction=on,addr=0x7 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pcie.0,addr=0x7.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pcie.0,addr=0x7.0x2 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x0 \
-blockdev '{"driver":"file","filename":"/mnt/cache/Windows 10/vdisk1.img","node-name":"libvirt-3-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-3-format","read-only":false,"cache":{"direct":false,"no-flush":false},"driver":"raw","file":"libvirt-3-storage"}' \
-device virtio-blk-pci,scsi=off,bus=pci.3,addr=0x0,drive=libvirt-3-format,id=virtio-disk2,bootindex=1,write-cache=on \
-blockdev '{"driver":"file","filename":"/mnt/user/isos/Windows.iso","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":true,"driver":"raw","file":"libvirt-2-storage"}' \
-device ide-cd,bus=ide.0,drive=libvirt-2-format,id=sata0-0-0,bootindex=2 \
-blockdev '{"driver":"file","filename":"/mnt/user/isos/virtio-win-0.1.173-2.iso","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":true,"driver":"raw","file":"libvirt-1-storage"}' \
-device ide-cd,bus=ide.1,drive=libvirt-1-format,id=sata0-0-1 \
-netdev tap,fd=36,id=hostnet0,vhost=on,vhostfd=37 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:b9:c1:13,bus=pci.1,addr=0x0 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=39,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-device vfio-pci,host=0000:03:00.0,id=hostdev0,bus=pci.4,addr=0x0 \
-device vfio-pci,host=0000:03:00.1,id=hostdev1,bus=pci.5,addr=0x0 \
-device vfio-pci,host=0000:0b:00.0,id=hostdev2,bus=pci.6,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
2021-12-30 18:56:37.049+0000: Domain id=3 is tainted: high-privileges
2021-12-30 18:56:37.049+0000: Domain id=3 is tainted: host-cpu
char device redirected to /dev/pts/1 (label charserial0)
2021-12-31T08:21:10.903151Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
2021-12-31T08:21:10.907147Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3

 

Edited by iskoala
Link to comment
5 hours ago, ghost82 said:

You need to provide diagnostics file, you didn't specify which device is stuck in D3, we can only read devices like these:

06:00.0

06:00.1 

08:00.0

03:00.0

03:00.1

0b:00.0

 

in other words this is like reading arabic for me (NB: I'm not from Arabia :D)

 

My bad, first time doing this wasn't too sure what I needed to provide. Here are the diagnostics report and syslog. 

 

if I'm reading the syslog correctly, it seems like its the sounds part of the gpu i'm passing through 

Jan  2 00:01:43 vader kernel: vfio-pci 0000:03:00.1: refused to change power state from D0 to D3hot
[1002:7340] 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 14 [Radeon RX 5500/5500M / Pro 5500M] (rev ff)
[1002:ab38] 03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio (rev ff)

 

vader-diagnostics-20220102-0725.zip vader-syslog-20220102-1531.zip

Edited by iskoala
Link to comment

For Fatimah vm:

are you using the amd reset bug patch?series 5000 gpus suffer this issue.

Is it happening as soon as you boot your unraid or after a reboot/shutdown of a vm?

Change also gpu to multifunction, replace this:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>

With this:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
    </hostdev>

 

It is also good practice to attach to vfio at boot all the devices you want to passthrough to the vm.

Link to comment
6 hours ago, ghost82 said:

For Fatimah vm:

are you using the amd reset bug patch?series 5000 gpus suffer this issue.

Is it happening as soon as you boot your unraid or after a reboot/shutdown of a vm?

Change also gpu to multifunction, replace this:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>

With this:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
    </hostdev>

 

It is also good practice to attach to vfio at boot all the devices you want to passthrough to the vm.

 

Thanks a lot for helping on this and taking the time to reply, please bare with my ignorance  here :D ... I much appreciate your help. I will change to multifunction and report back. 

 

1- - For the AMD reset bug patch do you mean kernel patch ? I know this card has the AMD reset bug, I usually use the script below to reset it (its still annoying) so if there is a better way to deal with the reset issue it would be awesome. 

2-- The issue is actually not on boot or reboot the VM runs for a while fine, sometimes hours, then eventually freeze and get stuck in this error. 

3--  RE "attach to vfio at boot all the devices you want to passthrough to the vm." do you mind expanding a bit further on how do I do that ? or if you have a handy link.   (Already found the answer to this part :) 

 

This is what I use for the reset bug, its annoying since it puts the server in sleep mode and I have to turn it back on to recover. 

#!/bin/bash
echo "disconnecting amd graphics"
echo "1" | tee -a /sys/bus/pci/devices/0000\:06\:00.0/remove
echo "1" | tee -a /sys/bus/pci/devices/0000\:03\:00.0/remove
echo "disconnecting amd sound counterpart"
echo "1" | tee -a /sys/bus/pci/devices/0000\:06\:00.1/remove
echo "1" | tee -a /sys/bus/pci/devices/0000\:03\:00.1/remove
echo "entered suspended state press power button to continue"
echo -n mem > /sys/power/state
echo "reconnecting amd gpu and sound counterpart"
echo "1" | tee -a /sys/bus/pci/rescan
echo "AMD graphics card successfully reset"

 

Edited by iskoala
Link to comment
13 hours ago, iskoala said:

1- - For the AMD reset bug patch do you mean kernel patch ? I know this card has the AMD reset bug, I usually use the script below to reset it (its still annoying) so if there is a better way to deal with the reset issue it would be awesome. 

There is an unraid docker to compile your custom kernel with the patch applied: unraid kernel helper:

I never used it because I don't need it, if you have issue, post in the support thread.

 

13 hours ago, iskoala said:

2-- The issue is actually not on boot or reboot the VM runs for a while fine, sometimes hours, then eventually freeze and get stuck in this error.

This could be more problematic, check for bios updates for your motherboard.

 

Link to comment
5 hours ago, ghost82 said:

There is an unraid docker to compile your custom kernel with the patch applied: unraid kernel helper:

I never used it because I don't need it, if you have issue, post in the support thread.

 

This could be more problematic, check for bios updates for your motherboard.

 

 

 

For the reset bug, I followed up on the thread you linked and looks like there is a plugin in 6.10.x (or I can use the kernal helper with 6.9.x or below), I will apply that fix. 

Few things I updated yesterday,  is updating the bios on the motherboard and changing the passthrough to multifunction like you suggested, the VM been running for ~10 hours no crashes so far, so thats a good sign (usually it crashes within 2-3 hours). 

 

Will keep it running for another day or so to make sure no crashes. 

 

Thanks @ghost82 for all your help and guidance, much appreciated  :) 

 

Link to comment
  • Solution
34 minutes ago, iskoala said:

is updating the bios on the motherboard

I think this was the most important thing you did, since you have that errors not on boot, but after a xx amount of time the vm was running, apparently random.

Anyway, glad that you (hopefully) solved your issue.

Edited by ghost82
Link to comment
9 hours ago, ghost82 said:

There is an unraid docker to compile your custom kernel with the patch applied: unraid kernel helper:

This is now deprecated, the last version that I support is 6.9.2

 

3 hours ago, iskoala said:

For the reset bug, I followed up on the thread you linked and looks like there is a plugin in 6.10.x (or I can use the kernal helper with 6.9.x or below), I will apply that fix. 

Exactly, I deprecated the Kernel Helper because nearly everything can now be integrated to unRAID via plugins.

  • Like 1
Link to comment
  • iskoala changed the title to [Solved]VM stuck in D3 after a while
4 hours ago, ghost82 said:

I think this was the most important thing you did, since you have that errors not on boot, but after a xx amount of time the vm was running, apparently random.

Anyway, glad that you (hopefully) solved your issue.

@ghost82 Things have been running pretty smooth so far, thanks a ton for your help and patience here, much appreciated :D 

 

 

2 hours ago, ich777 said:

This is now deprecated, the last version that I support is 6.9.2

 

Exactly, I deprecated the Kernel Helper because nearly everything can now be integrated to unRAID via plugins.

@ich777 Thanks a ton, I updated to 6.10rc2 and installed the plug instead, things been running pretty smoothly.

One  thing I noticed in the look is this msg "vfio: Cannot reset device 0000:03:00.1, no available reset mechanism." I'm assuming this is expected since the sound part doesn't need a reset, is that a fair assumption ? 

 

 

 

Link to comment
9 hours ago, iskoala said:

"vfio: Cannot reset device 0000:03:00.1, no available reset mechanism."

It means that the vendor reset patch is not working properly.

You should not be able to power on the vm after this error occurs.

If you read in github, issues in the gnif repo (gnif/vendor-reset/issues/) you will find some opened issues about audio not resetting properly, most based on 5700/5700 XT.

Some users had luck without passing through the audio portion of the gpu to the vm..however you can understand that this is a workaround and not a fix.

Link to comment
14 hours ago, ghost82 said:

It means that the vendor reset patch is not working properly.

You should not be able to power on the vm after this error occurs.

If you read in github, issues in the gnif repo (gnif/vendor-reset/issues/) you will find some opened issues about audio not resetting properly, most based on 5700/5700 XT.

Some users had luck without passing through the audio portion of the gpu to the vm..however you can understand that this is a workaround and not a fix.

 

hmmm thats interesting, so in my case I'm actually able to boot/reboot the VM (gracefully and after a force stop) I didn't see any weird behavior on the VM I just noticed the msg in the log by accident, it shows up every time I reboot or boot. any logs other than the VM log I should look into ?

Link to comment
11 hours ago, iskoala said:

so in my case I'm actually able to boot/reboot the VM (gracefully and after a force stop) I didn't see any weird behavior on the VM I just noticed the msg in the log by accident

If it works don't touch it :D

Usually that message means that the device cannot be resetted and this means that it can't be reinitialized after a reboot or shutdown.

Maybe that message appears before the patch is able to properly reset it, so a false positive (?)

  • Like 1
Link to comment
On 1/5/2022 at 6:12 AM, ghost82 said:

If it works don't touch it :D

Usually that message means that the device cannot be resetted and this means that it can't be reinitialized after a reboot or shutdown.

Maybe that message appears before the patch is able to properly reset it, so a false positive (?)

Exactly ! They have been running smoothly for 4 days or now, so I backup a copy of the templates and dont plan to touch it :D

Thanks a ton for all your help on this :) 

 

  • Like 1
Link to comment

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.