Unraid 6.8 freezes when a VM is shut down or rebooted


sendas


Hi, I'm a newbie at this and would appreciate any insight.

I'm not sure how to troubleshoot an issue where the server crashes when I reboot or shut down a VM.

I'm on Unraid 6.8 with two VMs: a Windows VM with an RTX 2080 passed through and a Fedora Linux VM with an RX 580 passed through. Both have the issue.

Whenever I restart or shut down a VM, there is about an 80% chance that the Unraid server locks up and is no longer reachable via the web GUI or SSH.

I attached my syslog, my IOMMU groups, and the XML for the Windows VM.

 

groups.txt syslog.txt windows-vm-2080.txt


@sendas You are passing through the "NVIDIA Corporation TU104 USB 3.1 Host Controller" from the GPU.

 

From your xml:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x2'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </hostdev>

If you want to pass through the audio part, it should be function='0x1' instead of function='0x2'.

 

Try not to pass through any devices from a 2xxx Nvidia card other than the GPU and the audio part. A couple of days ago another user had issues as soon as he handed over the USB device from his Nvidia card to the VM. The VM crashed, and almost always the server became unstable, froze, or crashed as well.
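
To see which functions the card exposes before editing the XML, something like the following could be used. It is only a sketch: the function name `list_pci_functions` and its optional second argument (an alternative sysfs directory, handy for dry runs) are made up for illustration, and `0000:03:00` is simply the slot taken from the XML above.

```shell
#!/bin/bash
# Hypothetical helper: print every PCI function found under one slot.
# $1 = domain:bus:slot, e.g. 0000:03:00
# $2 = directory holding the PCI device entries (assumed default: /sys/bus/pci/devices)
list_pci_functions() {
    local slot="$1"
    local root="${2:-/sys/bus/pci/devices}"
    local dev
    for dev in "$root/$slot".*; do
        # Skip the unexpanded glob pattern when nothing matches.
        [[ -e "$dev" ]] || continue
        basename "$dev"
    done
}
```

Running `list_pci_functions 0000:03:00` prints one address per function; each address can then be fed to `lspci -nns` to see whether it is the GPU, the audio part, or the USB controller.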

 

Edited by bastl

Ok will try without it.

 

I was reading the Arch wiki warning about devices that don't support reset. There's a bash script that lists the devices and shows which ones do or do not support reset. This was the output. I'm assuming that if [RESET] is missing, it means the USB controller cannot be reset?

https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Passing_through_a_device_that_does_not_support_resetting

 

IOMMU group 21
[RESET]    03:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2080 SUPER] [10de:1e81] (rev a1)

IOMMU group 23
    03:00.2 USB controller [0c03]: NVIDIA Corporation TU104 USB 3.1 Host Controller [10de:1ad8] (rev a1)


@sendas Not sure how accurate the output of the command is. In case someone reads this and wants to check their own output, here is the command:

for iommu_group in $(find /sys/kernel/iommu_groups/ -maxdepth 1 -mindepth 1 -type d); do
    echo "IOMMU group $(basename "$iommu_group")"
    for device in $(\ls -1 "$iommu_group"/devices/); do
        if [[ -e "$iommu_group"/devices/"$device"/reset ]]; then
            echo -n "[RESET]"
        fi
        echo -n $'\t'
        lspci -nns "$device"
    done
done

 

The HDMI audio controllers of both of my 10xx Nvidia cards aren't marked as resettable, but both are working. Same for my onboard audio controller. I never had an issue restarting a VM with it passed through. Not sure if @limetech did something special to reset these devices that have no RESET info. If so, and you can confirm that the USB controller of the card is the issue for you, we already have 2 users reporting this. Maybe one of the devs can then have a look into it.

IOMMU group 35
        0b:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]

IOMMU group 28
[RESET] 09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
IOMMU group 29
        09:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)

IOMMU group 49
[RESET] 43:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1)
IOMMU group 50
        43:00.1 Audio device [0403]: NVIDIA Corporation GP102 HDMI Audio Controller [10de:10ef] (rev a1)
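
The check inside that command boils down to whether a `reset` file exists in the device's sysfs directory. Below is a small sketch for testing a single device. The function name `supports_reset` and its optional second argument (an alternative sysfs directory) are my own, for illustration only.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the kernel exposes a "reset" file for the device.
# $1 = full PCI address, e.g. 0000:03:00.2
# $2 = directory holding the PCI device entries (assumed default: /sys/bus/pci/devices)
supports_reset() {
    local addr="$1"
    local root="${2:-/sys/bus/pci/devices}"
    [[ -e "$root/$addr/reset" ]]
}
```

For example, `supports_reset 0000:03:00.2 && echo "[RESET]" || echo "no reset file"`. A missing file does not prove the device will misbehave; the audio controllers listed above lack it and still work.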

 


Removing the Nvidia USB-C hub seems to have resolved the issue. I tested a mix of restarts and shutdowns well over 20 times with no problems.

It would be nice to be able to use the built-in hub, but if it's not a very VM-friendly controller, I'll look into passing through one of the built-in controllers and see if that works.

Bastl, thanks for trying that out. I find it interesting that the audio is missing the RESET. My Nvidia audio is also missing the RESET, but I never use it; I just use a USB headset.


I read through this thread, and it sounds like the long and short of it is that if you don't pass through or use the USB-C controller on the GPU, the lockups don't occur.

 

Unfortunately with any type of hardware passthrough using KVM, this is a possibility.  Hardware passthrough today works fairly well, but relies on a combination of the right firmware in the device and the right quirks in the kernel/KVM/QEMU.  Over time, these things may improve, but for now, we lack the ability to provide any type of resolution on these hardware-specific issues.

  • 3 months later...

When I run the command nothing happens??

 

for iommu_group in $(find /sys/kernel/iommu_groups/ -maxdepth 1 -mindepth 1 -type d);do echo "IOMMU group $(basename "$iommu_group")"; for device in $(\ls -1 "$iommu_group"/devices/); do if [[ -e "$iommu_group"/devices/"$device"/reset ]]; then echo -n "[RESET]"; fi; echo -n $'\t';lspci -nns "$device"; done; done

 

image.png.52a1ae4bdf7dbd89e4f8fa782c3b2601.png
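
One possible cause (an assumption, not confirmed anywhere in this thread): the loop prints nothing at all when `/sys/kernel/iommu_groups/` contains no groups, which is typically the case when IOMMU is not enabled in the BIOS or on the kernel command line. A quick sketch to count the groups follows; the function name `count_iommu_groups` and its optional path argument are invented for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print how many IOMMU groups exist.
# $1 = groups directory (assumed default: /sys/kernel/iommu_groups)
count_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    # Count immediate subdirectories; prints 0 if the directory is empty or missing.
    find "$root" -maxdepth 1 -mindepth 1 -type d 2>/dev/null | wc -l | tr -d ' '
}
```

If `count_iommu_groups` prints 0, the command above has nothing to iterate over, so empty output would be expected.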

 

 
