Rebooting GPU Passthrough VM takes entire unraid to freeze


takkkkkkk


So this is where I am right now.

1. Booted Windows with the vBIOS.
2. The NVIDIA control panel announced that there was a new driver.
3. Updated the driver.
4. Almost at the end... bang, everything froze. I noticed that really quickly, because the internet in the whole house died... and the kids went crazy :-)
5. Hard reset Unraid.
6. Booted Windows again.
7. The new driver was definitely installed, because of the new Ultra HD resolutions.
8. Started Diablo 3.
9. It gave me a D3D error.
10. Downloaded and installed DirectX; that went fine.
11. Restarted Windows without any problems.
12. Started Diablo 3 again; that gave me some weird shit colors and resolution on the screen.
13. Tried another game; that worked fine.
14. Went to the NVIDIA control panel and reinstalled the driver, without any issue.
15. Tried Diablo again, without luck.
16. Looked further into Diablo; it seems the game had picked the highest available resolution and used that as its default.
17. Changed the settings in the D3Prefs file, and now the game is running again.
18. Shut down Windows... bang... everything froze within seconds.
Back to 1. :-(

I was watching the Unraid and Windows logs the whole time.
The last line in the Unraid log says:

kernel: usb 2-1-port3: cannot reset (err = -110) 








Sent from my iPhone using Tapatalk

3 hours ago, perhansen said:

I can use the VM for 10 hours of gaming without any problems. The issue only occurs when I shut down or restart the VM, not while I'm using it.

You said something completely different earlier. I must have read it wrong 😂, even if there isn't much room to interpret it differently.

10 hours ago, perhansen said:

No, actually not every time. When I only use the VM for a short amount of time, let's say 30 minutes, it works fine. But when I game with my son for a couple of hours, it hangs the entire system.

 

1 minute ago, bastl said:

@perhansen Please post the XML of that VM. The diagnostics might also be helpful for looking into the system devices and the IOMMU groupings.

bigassserver-diagnostics-20191104-2113.zip

Here you go:

 

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>Windows 10 NVIDIA</name>
  <uuid>3872329d-c795-ad63-e319-f9a42aec7665</uuid>
  <description>gamer</description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="/mnt/cache/VM/icons/windows_teamgreen.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>12582912</memory>
  <currentMemory unit='KiB'>12582912</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='11'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='12'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='13'/>
    <vcpupin vcpu='6' cpuset='4'/>
    <vcpupin vcpu='7' cpuset='14'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-4.0.1'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='4' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/disks/KINGSTON_SA400S37120G_50026B7682823576/Windows 10/vdisk1.img'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/iso/Win10_1903_V2_English_x64.iso'/>
      <target dev='hda' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/iso/virtio-win-0.1.160-1.iso'/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0xe'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x6'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0xf'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x7'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:0c:02:99'/>
      <source bridge='virbr0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/disks/KINGSTON_SA400S37120G_50026B7682823576/Asus.GTX1660.6144.190225.rom'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x2'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x3'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc521'/>
      </source>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc52b'/>
      </source>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x2101'/>
        <product id='0x8501'/>
      </source>
      <address type='usb' bus='0' port='3'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>

 

8 minutes ago, perhansen said:

 

9b0f7ba7c6696ab02e6ffcaf9f04e7c0.jpg

 

Not actually sure if you even need the 2 extra Nvidia devices. The 10 series were the last cards without these devices; the newer 20 series and the 16xx series introduced them. If you aren't passing through any other USB controllers from the board itself or from extra add-in cards, we might have found the culprit. Remove both and see what happens.
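A minimal sketch of what that change looks like in the XML posted above: the two `<hostdev>` entries for host functions `02:00.2` and `02:00.3` are deleted, and the video (`02:00.0`) and HDMI audio (`02:00.1`) entries are kept exactly as posted. This assumes, as the thread indicates, that `.2` and `.3` are the card's extra USB-related functions; check the device list on your own system before copying it.

```xml
<devices>
  <!-- ...emulator, disks, controllers, interface etc. unchanged... -->

  <!-- GPU video function (host 02:00.0), kept as posted -->
  <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </source>
    <rom file='/mnt/disks/KINGSTON_SA400S37120G_50026B7682823576/Asus.GTX1660.6144.190225.rom'/>
    <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
  </hostdev>

  <!-- GPU HDMI audio function (host 02:00.1), kept as posted -->
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
    </source>
    <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
  </hostdev>

  <!-- Removed: hostdev entries for host 02:00.2 and 02:00.3
       (assumed to be the card's USB controller and its companion device) -->

  <!-- ...USB hostdevs and memballoon unchanged... -->
</devices>
```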

 

7 minutes ago, bastl said:

Not actually sure if you even need the 2 extra Nvidia devices. The 10 series were the last cards without these devices; the newer 20 series and the 16xx series introduced them. If you aren't passing through any other USB controllers from the board itself or from extra add-in cards, we might have found the culprit. Remove both and see what happens.

 

I just booted Windows, started a game and shut down the VM. No issues so far; I will test more tomorrow.

 

I can't believe it was that easy. I've been struggling with this for so long, and then it was just a fucking USB controller. I just thought it was supposed to be passed through, because it was in the same IOMMU group and part of the NVIDIA card.

 

You are the man, bastl. Thanks for all the help, I appreciate it.


@perhansen The "ActionStar" is an old USB 2.0 hub, right? It is listed 2 times in your USB devices.

Bus 002 Device 005: ID 2101:8501 ActionStar 
Bus 002 Device 003: ID 2101:8500 ActionStar

This could also be an issue: you only have one device and tell Unraid to pass it through, but there is only one entry on the VM settings page, whereas the system lists 2. I don't know if this can cause issues. Just an idea if nothing else helps.
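An untested sketch of that idea, purely as an experiment and not a confirmed fix: inside `<devices>`, add a second USB `<hostdev>` for the other ActionStar ID from the `lsusb` output above (`2101:8500`), next to the existing `2101:8501` entry. The `<address>` element is left out of the new entry so libvirt can pick a free port itself. This may not work at all if one of the two IDs is the hub itself rather than a device behind it.

```xml
<!-- Existing entry from the posted XML, unchanged -->
<hostdev mode='subsystem' type='usb' managed='no'>
  <source>
    <vendor id='0x2101'/>
    <product id='0x8501'/>
  </source>
  <address type='usb' bus='0' port='3'/>
</hostdev>

<!-- Hypothetical addition for the second ActionStar ID seen in lsusb;
     no <address> element, so libvirt assigns a free USB port -->
<hostdev mode='subsystem' type='usb' managed='no'>
  <source>
    <vendor id='0x2101'/>
    <product id='0x8500'/>
  </source>
</hostdev>
```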

 

7 minutes ago, perhansen said:

I just thought it was supposed to be passed through, because it was in the same IOMMU group and part of the NVIDIA card.

In general it's advised to pass through the whole group, and only if everything in it belongs to the same device you want to use in the VM. But if you only pass through a portion of that group and the rest of the devices aren't used by something else (which Unraid in general won't do), it should work too. Passing a GPU without its HDMI audio controller has also helped some users. But as with everything in tech, there is no solution that helps everyone.

 

Let's hope your issue is solved. Fingers crossed 😉

bastl said:

@perhansen The "ActionStar" is an old USB 2.0 hub, right? It is listed 2 times in your USB devices.

Yes, the ActionStar is an old USB hub. I use it to make passing through the USB devices easier. My server sits in the basement, and I use an HDMI/USB-to-Ethernet converter that only has one USB port.

Strange thing: even if I unplug it multiple times, it shows up twice in the system list.

I will do some testing, and buy a new one if this becomes a problem.

 

 

@perhansen @peter_sm If this issue really depends on passthrough of a USB controller for both of you and started with the 6.8 RC versions, it might be worth opening a bug report for it.


You are right, but isn't there a difference between a normal USB card and the one on the NVIDIA card I was trying to pass through?

As I remember, my problem with the complete lockup of Unraid was also present in 6.7. But I didn't spend much time investigating it.


2 minutes ago, perhansen said:

You are right, but isn't there a difference between a normal USB card and the one on the NVIDIA card I was trying to pass through?

Not sure if there is a difference in how the kernel resets and handles USB devices depending on whether they are dedicated devices or sub-devices, like the one on your GPU. I only mentioned it in case this issue is new for you in the 6.8 RC builds.


@peter_sm Hard to tell from your logs. The "Windows10_TacX" log looks clean, and the "Mojave" log complains about some CPU features set in your XML that your CPU doesn't have. That is only a warning, plus a couple of USB-related warnings that I'm not sure how to interpret:

usb_desc_get_descriptor: 2 unknown type 33 (len 10)
usb_desc_get_descriptor: 1 unknown type 33 (len 10)
usb_desc_get_descriptor: 1 unknown type 33 (len 10)
usb_desc_get_descriptor: 2 unknown type 33 (len 10)
usb_desc_get_descriptor: 1 unknown type 33 (len 10)
usb_desc_get_descriptor: 1 unknown type 33 (len 10)

And the QEMU logs themselves show that some vdisks don't exist:

2019-11-05 15:15:11.604+0000: 10858: error : qemuOpenFileAs:3264 : Failed to open file '/mnt/disks/nvme2/OSX-Trainer.img': No such file or directory
2019-11-05 15:15:11.941+0000: 10862: error : qemuOpenFileAs:3264 : Failed to open file '/mnt/disks/nvme1/OSX.raw': No such file or directory

But nothing special like in perhansen's case.

 

I think the Unraid console output at the time the VM hangs or crashes would be interesting. Did you try removing the passed-through USB card you mentioned earlier? Does that work?


I will try more later this week; I'm a little busy right now. First I will try to see what the log shows when it freezes. Then I will look into using onboard USB for my Windows 10 VM to see whether it still freezes.
The USB card is the only way to get any USB port working on the macOS VM, and therefore I have it on my W10 VM as well.

 

Thanks for your time and I will let you know later.

 

Have a nice day

 

//Peter


Got a freeze today 😞. What happens is that several cores go up to 100% when shutting down the VM, and then Unraid stops responding. And there's nothing in the log.

 

BUT even during a successful VM shutdown, the cores go to 100% for a few seconds, then drop back to normal, and the shutdown then completes cleanly.

 

@limetech Is this info enough to start looking at this issue?

2019-11-10 08_28_50-Tower_Dashboard.png

 

Screenshot 2019-11-10 at 08.47.23.png

Edited by peter_sm
