sendas Posted November 24, 2019 Hi, I'm a newbie at this and would appreciate any insight. I'm not sure how to troubleshoot an issue where the server crashes when I reboot or shut down a VM. I'm on Unraid 6.8 with two VMs: a Windows VM with an RTX 2080 passed through and a Fedora Linux VM with an RX 580 passed through. Both have the issue. Whenever I restart or shut down a VM, there is about an 80% chance the Unraid server locks up and is no longer reachable via the web GUI or SSH. I attached my syslog, my IOMMU groups, and the XML for the Windows VM: groups.txt, syslog.txt, windows-vm-2080.txt
trurl Posted November 24, 2019 There is no stable 6.8 release yet, and you don't mention which release candidate you are running. Instead of attaching only the syslog, you should always go to Tools - Diagnostics and attach the complete diagnostics zip file to your NEXT post. It includes the syslog and many other things.
sendas Posted November 24, 2019 OK, thanks. It's 6.8 RC6. tower-diagnostics-20191124-1906.zip
bastl Posted November 24, 2019 (edited) @sendas You are passing through the "NVIDIA Corporation TU104 USB 3.1 Host Controller" from the GPU. From your XML:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x2'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </hostdev>

If you want to pass through the audio part, it should be function='0x1' instead of 0x2. Try not to pass through any devices from a 2xxx Nvidia card other than the GPU and the audio part. A couple of days ago another user had issues as soon as he handed the USB device from his Nvidia card over to the VM. The VM crashed, and almost always the server became unstable, froze, or crashed as well.
sendas Posted November 24, 2019 Interesting. I intentionally passed through the USB-C controller to use it as a dedicated USB controller for the VM. I'm out of PCIe slots for adding another card.
bastl Posted November 24, 2019 @sendas As far as I know, the USB-C port is only usable for VR headsets. Correct me if I'm wrong.
sendas Posted November 24, 2019 The USB-C port seems to work as a normal USB hub while the VM is running. I can plug in a USB headset or thumb drive with no problem. The VM is rock solid for days, as long as I don't try to restart it.
bastl Posted November 24, 2019 @sendas Try without it and report back how it behaves without the USB controller. Just an idea. 😉
sendas Posted November 24, 2019 OK, I will try without it. I was reading the Arch wiki warning about devices that don't accept a reset. There's a bash script that lists the devices and shows which ones do or do not support reset. This was the output. I'm assuming that if [RESET] is missing, it means the USB controller cannot be reset?

https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Passing_through_a_device_that_does_not_support_resetting

    IOMMU group 21
    [RESET] 03:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2080 SUPER] [10de:1e81] (rev a1)
    IOMMU group 23
            03:00.2 USB controller [0c03]: NVIDIA Corporation TU104 USB 3.1 Host Controller [10de:1ad8] (rev a1)
bastl Posted November 24, 2019 @sendas I'm not sure how accurate the output of the command is. In case someone reads this and wants to check their own output, here is the command:

    for iommu_group in $(find /sys/kernel/iommu_groups/ -maxdepth 1 -mindepth 1 -type d);do echo "IOMMU group $(basename "$iommu_group")"; for device in $(\ls -1 "$iommu_group"/devices/); do if [[ -e "$iommu_group"/devices/"$device"/reset ]]; then echo -n "[RESET]"; fi; echo -n $'\t';lspci -nns "$device"; done; done

The HDMI audio controllers of both of my 10xx Nvidia cards aren't marked as resettable, but both are working. The same goes for my onboard audio controller. I never had an issue restarting a VM with it passed through. I'm not sure if @limetech did something special to reset these devices that have no RESET info. If so, and you can confirm that the USB controller of the card is the issue for you, we already have 2 users reporting this. Maybe one of the devs can then have a look into it.

    IOMMU group 35
            0b:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]
    IOMMU group 28
    [RESET] 09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
    IOMMU group 29
            09:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
    IOMMU group 49
    [RESET] 43:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1)
    IOMMU group 50
            43:00.1 Audio device [0403]: NVIDIA Corporation GP102 HDMI Audio Controller [10de:10ef] (rev a1)
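Editor's sketch: the one-liner above can be spread out into a small function that is easier to read and does not depend on lspci. This is only an illustration, not the Arch wiki's script. The function name list_reset_support and its optional sysfs-root argument are hypothetical; the argument defaults to /sys and exists only so the function can be tried against a test directory. A "reset" file in sysfs indicates that the kernel knows a way to reset that single function. As the posts above suggest, a missing file does not by itself prove that passthrough will fail.

```shell
#!/usr/bin/env bash
# Sketch only: list every PCI device by IOMMU group and flag whether
# sysfs exposes a "reset" file for it.
# list_reset_support and its SYSFS_ROOT argument are hypothetical names.
list_reset_support() {
    local root="${1:-/sys}"
    local group_dir dev_dir flag
    for group_dir in "$root"/kernel/iommu_groups/*; do
        [ -d "$group_dir" ] || continue        # skip an unmatched glob
        for dev_dir in "$group_dir"/devices/*; do
            [ -e "$dev_dir" ] || continue      # group without devices
            if [ -e "$dev_dir/reset" ]; then
                flag="reset"
            else
                flag="no-reset"
            fi
            # One tab-separated line per device: group, PCI address, flag
            printf 'group %s\t%s\t%s\n' \
                "$(basename "$group_dir")" "$(basename "$dev_dir")" "$flag"
        done
    done
}
```

Calling list_reset_support with no arguments on the host reads the real /sys tree. Groups appear in glob order rather than numeric order.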
sendas Posted November 24, 2019 Removing the Nvidia USB-C hub seems to have removed the issue. I tested a mix of restarts and shutdowns, well over 20 times, with no issue. It would be nice to be able to use the built-in hub, but if it's not a very VM-friendly controller, I'll look into passing through one of the motherboard's built-in controllers and see if that works. Bastl, thanks for trying that out. I find it interesting that the audio is missing the RESET. My Nvidia audio is also missing the RESET, but I never use it; I just use a USB headset.
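Editor's sketch: when looking for another USB controller to pass through, it may help to list every USB controller on the host together with its IOMMU group and reset flag. The function below is a rough illustration under stated assumptions, not something posted in the thread. The name list_usb_controllers and the optional sysfs-root argument are hypothetical. It relies on the sysfs "class" file starting with 0x0c03 for USB controllers, which matches the [0c03] class shown in the lspci output above.

```shell
#!/usr/bin/env bash
# Sketch only: print each USB controller's PCI address, IOMMU group,
# and whether sysfs exposes a "reset" file for it.
# list_usb_controllers and its SYSFS_ROOT argument are hypothetical names.
list_usb_controllers() {
    local root="${1:-/sys}"
    local dev class group flag
    for dev in "$root"/bus/pci/devices/*; do
        [ -r "$dev/class" ] || continue
        class="$(cat "$dev/class")"
        case "$class" in
            0x0c03*) ;;              # PCI class 0x0c03: USB controller
            *) continue ;;
        esac
        if [ -L "$dev/iommu_group" ]; then
            # iommu_group is a symlink whose last component is the group id
            group="$(basename "$(readlink "$dev/iommu_group")")"
        else
            group="none"
        fi
        if [ -e "$dev/reset" ]; then flag="reset"; else flag="no-reset"; fi
        printf '%s\tgroup %s\t%s\n' "$(basename "$dev")" "$group" "$flag"
    done
}
```

A controller that sits alone in its IOMMU group is usually the easier candidate, since VFIO expects all devices in a group to be handed to the VM together.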
sendas Posted November 24, 2019 As a side note, I'm getting 400 more points on my PassMark score after removing that Nvidia USB hub. I'm not sure what's going on there. Now on to passing through a different controller.
jonp Posted November 25, 2019 I read through this thread, and the long and short of it seems to be that if you don't pass through or use the USB-C controller on the GPU, the lockups don't occur. Unfortunately, with any type of hardware passthrough using KVM, this is a possibility. Hardware passthrough today works fairly well, but it relies on a combination of the right firmware in the device and the right quirks in the kernel/KVM/QEMU. Over time these things may improve, but for now we lack the ability to provide any type of resolution for these hardware-specific issues.
BrianK Posted March 1, 2020 When I run the command, nothing happens?

    for iommu_group in $(find /sys/kernel/iommu_groups/ -maxdepth 1 -mindepth 1 -type d);do echo "IOMMU group $(basename "$iommu_group")"; for device in $(\ls -1 "$iommu_group"/devices/); do if [[ -e "$iommu_group"/devices/"$device"/reset ]]; then echo -n "[RESET]"; fi; echo -n $'\t';lspci -nns "$device"; done; done
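Editor's sketch: the thread never confirms why the command printed nothing here. One possible explanation, offered only as an assumption, is that the host has no IOMMU groups at all, for example because the IOMMU is disabled in the BIOS/UEFI or not enabled for the running kernel. In that case the find in the one-liner has nothing to iterate over. The hypothetical helper below (the name iommu_status and its optional sysfs-root argument are made up for this illustration) reports how many groups exist.

```shell
#!/usr/bin/env bash
# Sketch only: report whether the host exposes any IOMMU groups.
# iommu_status and its SYSFS_ROOT argument are hypothetical names.
iommu_status() {
    local root="${1:-/sys}"
    local dir="$root/kernel/iommu_groups"
    local group count=0
    if [ ! -d "$dir" ]; then
        echo "no iommu_groups directory found under $root/kernel"
        return 1
    fi
    for group in "$dir"/*; do
        # Count only real group directories; an unmatched glob is skipped
        if [ -d "$group" ]; then
            count=$((count + 1))
        fi
    done
    if [ "$count" -eq 0 ]; then
        echo "0 IOMMU groups: the IOMMU may be disabled"
        return 1
    fi
    echo "$count IOMMU groups found"
}
```

If iommu_status reports zero groups, the listing command has nothing to print, and the firmware and kernel IOMMU settings are the first place to look.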