Update to 6.10.1 broke VMs



Recently updated to Unraid version 6.10.1 and lost functionality of my virtual machines. I have my main/gaming VM set up with Windows 10, and had recently set up a Windows 11 VM that I mess around with once in a while. Both were functioning fully with the use of my CPU, GPU, and peripherals.

 

Now any time I boot up either VM I get an error message saying:


Unable to power on device, stuck in D3


When I try to stop the VM, all of Unraid just freezes and forces me to manually restart the computer. If I remove my GPU from the VM it seems to work no problem, but as soon as I add it back, everything dies. I have been combing the forums for the last couple of days and have not found much. A few people mentioned updating the BIOS; I have done that, and it made no difference. I have created new VMs with different settings, with the same results.
 

Stuck and seeking assistance lol. Diagnostics attached.

gringots-diagnostics-20220526-2109.zip

6 hours ago, Malachi89 said:

Recently updated to Unraid version 6.10.1 and lost functionality of my virtual machines. [...]

 

Your vfio config is wrong:

Loading config from /boot/config/vfio-pci.cfg
BIND=0000:01:00.0|10de:2184 0000:01:00.1|10de:1aeb 0000:01:00.2|10de:1aec 0000:01:00.3|10de:1aed
---
Processing 0000:01:00.0 10de:2184
Error: Vendor:Device 10de:2184 not found at 0000:01:00.0, unable to bind device
---
Processing 0000:01:00.1 10de:1aeb
Error: Device 0000:01:00.1 does not exist, unable to bind device
---
Processing 0000:01:00.2 10de:1aec
Error: Device 0000:01:00.2 does not exist, unable to bind device
---
Processing 0000:01:00.3 10de:1aed
Error: Device 0000:01:00.3 does not exist, unable to bind device

 

None of the 10de Nvidia devices are present in your build.

 

You have to reassign the AMD devices to vfio at boot instead (see the sketch below):

03:00.0 1002:731f

03:00.1 1002:ab38
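
For reference, once those two devices are selected, the resulting /boot/config/vfio-pci.cfg should look something like this (a sketch based on the device IDs above and the BIND format shown in the log; Unraid normally writes this file for you from Tools -> System Devices, so there is no need to edit it by hand):

BIND=0000:03:00.0|1002:731f 0000:03:00.1|1002:ab38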

 

But first you may need to use the ACS override patch, because your GPU is not isolated in its own IOMMU group (I'm not sure it will work without isolating it from the PCI bridges). A note on how the override is applied follows this listing:

/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:02:00.0
/sys/kernel/iommu_groups/1/devices/0000:03:00.0
/sys/kernel/iommu_groups/1/devices/0000:03:00.1
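
For context, the ACS override is just a kernel boot parameter; when it is enabled from the VM Manager settings (as described further down), Unraid appends something like the following to the append line in /boot/syslinux/syslinux.cfg, so this is shown only for illustration and doesn't need to be edited manually:

append pcie_acs_override=downstream,multifunction initrd=/bzroot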

 

Moreover, you need to pass the GPU video and audio functions as a multifunction device. For example, in your Win10 VM you currently have:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/user/ISOs/vbios/5700 xt.rom'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>

 

Change it to:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/user/ISOs/vbios/5700 xt.rom'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
    </hostdev>

 

The Win11 VM has VNC graphics (no GPU passed through).

 

Reboot the server after applying the changes.
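
After the reboot, you can also confirm from a terminal that the card is actually bound to vfio-pci, for example (the surrounding output will differ; the "Kernel driver in use" line is what matters):

lspci -nnk -s 03:00
# both 03:00.0 and 03:00.1 should report: Kernel driver in use: vfio-pci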

2 hours ago, ghost82 said:

Your vfio config is wrong: [...]

 

Oooofff... that went over my head a bit. I don't know enough about that side of things to implement your fixes. I appreciate your help though. Maybe it's best if I downgrade back to 6.9 till they fix the kinks in the VMs.

5 hours ago, Malachi89 said:

till they fix the kinks in the VMs

Sorry, but there's nothing to fix! You just need to follow best practices for VMs.

 

5 hours ago, Malachi89 said:

I don't know enough about that side of things to implement your fixes

 

7 hours ago, ghost82 said:

But first you may need to use the ACS override patch

Go to Settings -> VM Manager and set PCIe ACS override to both.

Reboot the server.

 

7 hours ago, ghost82 said:

You have to reassign the AMD devices to vfio at boot:

03:00.0 1002:731f

03:00.1 1002:ab38

Go to Tools -> System Devices and put a check next to:

VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev ff)

 

and

 

Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38] (rev ff)

 

7 hours ago, ghost82 said:

Moreover, you need to pass the GPU video and audio functions as a multifunction device

Go to the VMs tab, left-click the VM's icon, choose Edit, switch to XML view (top right), and find this:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/user/ISOs/vbios/5700 xt.rom'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>

Replace with this:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/user/ISOs/vbios/5700 xt.rom'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
    </hostdev>
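
If you want to double-check that the edit stuck after saving (switching the VM back to form view and saving can regenerate these addresses and drop the change), you can dump the active definition from a terminal; the VM name here is just a placeholder for whatever yours is actually called:

virsh dumpxml "Windows 10" | grep multifunction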

 

Reboot the server.

 

Try.

 

Attach new diagnostics after the changes if it doesn't work.

5 hours ago, ghost82 said:

Sorry, but there's nothing to fix! You just need to follow best practices for VMs. [...]


Been running VMs since 6.5 with no issues till now. This is a lot of extra work just to get these going. Glad to have someone with your knowledge on hand :P

So I followed your steps (very thorough, thank you so much) on the Windows 10 VM, and added the GPU to the Windows 11 VM and made sure to configure it the same way (same GPU; I only run one of the VMs at a time). They seem to boot without any hiccups to the system at least, and no errors are being tossed my way this time, but they both just boot into a black screen.
 

Updated diagnostics attached

gringots-diagnostics-20220527-1606.zip

