• System hangs when VM changes state (shutdown, restart, etc.)


    manosioa

    Hi all,

    I've been trying to build a Hackintosh VM over the last few months.
    After a lot of research and suggestions from the community, I decided to buy a new "ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller" card so that USB would work natively with my Mac setup.

    Following @SpaceInvaderOne's guides (thank you so much, man), I set up the VM and passed through the PCI card.

    Everything works fine until I ask the Mac VM to restart or shut down.
    If either action is triggered (from inside the VM or from the VM section in Unraid), the VM shuts down and then the whole server hangs.
    I have to force-reset the server manually because everything is unresponsive: I cannot reach the web GUI, SSH, etc.

    I started investigating the issue and checked the logs.
    The problem seems to be related to the ASM1142 PCI card.

    In the system log, right after I trigger a restart or shutdown, I get the following:

    Quote

    Feb 10 14:47:03 Tower avahi-daemon[11251]: Registering new address record for fe80::fc54:ff:fe51:c393 on vnet2.*.
    Feb 10 14:47:04 Tower kernel: vfio_ecap_init: 0000:02:00.0 hiding ecap 0x19@0x270
    Feb 10 14:47:04 Tower kernel: vfio_ecap_init: 0000:02:00.0 hiding ecap 0x1b@0x2d0
    Feb 10 14:47:04 Tower kernel: vfio_ecap_init: 0000:02:00.0 hiding ecap 0x1e@0x370
    Feb 10 14:47:04 Tower kernel: pmd_set_huge: Cannot satisfy [mem 0xe0000000-0xe0200000] with a huge-page mapping due to MTRR override.
    Feb 10 14:47:05 Tower kernel: vfio_ecap_init: 0000:01:00.0 hiding ecap 0x19@0x280
    Feb 10 14:47:12 Tower kernel: vfio-pci 0000:02:00.0: No more image in the PCI ROM
    Feb 10 14:47:12 Tower kernel: vfio-pci 0000:02:00.0: No more image in the PCI ROM
    Feb 10 14:48:26 Tower kernel: DMAR: DRHD: handling fault status reg 2
    Feb 10 14:48:26 Tower kernel: DMAR: [DMA Write] Request device [01:00.0] fault addr 4e0c9ce277000 [fault reason 04] Access beyond MGAW
    Feb 10 14:48:26 Tower kernel: DMAR: [DMA Write] Request device [01:00.0] fault addr 4e0c9ce278000 [fault reason 04] Access beyond MGAW
    Feb 10 14:48:26 Tower kernel: DMAR: [DMA Write] Request device [01:00.0] fault addr 4e0c9ce277000 [fault reason 04] Access beyond MGAW
    Feb 10 14:48:26 Tower kernel: DMAR: [DMA Write] Request device [01:00.0] fault addr 4e0c9ce277000 [fault reason 04] Access beyond MGAW
    Feb 10 14:48:26 Tower kernel: DMAR: [DMA Write] Request device [01:00.0] fault addr 4e0c9ce277000 [fault reason 04] Access beyond MGAW
    Feb 10 14:48:26 Tower kernel: DMAR: [DMA Write] Request device [01:00.0] fault addr 4e0c9ce277000 [fault reason 04] Access beyond MGAW
    Feb 10 14:48:26 Tower kernel: DMAR: [DMA Write] Request device [01:00.0] fault addr 4e0c9ce278000 [fault reason 04] Access beyond MGAW
    Feb 10 14:48:26 Tower kernel: DMAR: [DMA Write] Request device [01:00.0] fault addr 4e0c9ce278000 [fault reason 04] Access beyond MGAW
    Feb 10 14:48:26 Tower kernel: DMAR: [DMA Write] Request device [01:00.0] fault addr 4e0c9ce278000 [fault reason 04] Access beyond MGAW
    Feb 10 14:48:26 Tower kernel: pcieport 0000:00:02.0: AER: Uncorrected (Fatal) error received: 0000:00:02.0
    Feb 10 14:48:26 Tower kernel: pcieport 0000:00:02.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)

    Feb 10 14:48:26 Tower kernel: pcieport 0000:00:02.0: device [8086:2f04] error status/mask=00040000/00000000
    Feb 10 14:48:26 Tower kernel: pcieport 0000:00:02.0: [18] MalfTLP (First)
    Feb 10 14:48:26 Tower kernel: pcieport 0000:00:02.0: TLP Header: 60000005 01000078 578ce0c9 ce277ff0
    Feb 10 14:48:26 Tower kernel: vfio-pci 0000:01:00.0: Relaying device request to user (#0)
    Feb 10 14:48:31 Tower kernel: dmar_fault: 1134 callbacks suppressed
    Feb 10 14:48:31 Tower kernel: DMAR: DRHD: handling fault status reg 100
    Feb 10 14:48:32 Tower kernel: DMAR: DRHD: handling fault status reg 100
    Feb 10 14:48:33 Tower kernel: DMAR: DRHD: handling fault status reg 100

     

    Posting my IOMMU groups: 
     

    Quote

    System Devices
    PCI Devices and IOMMU Groups

    IOMMU group 0:    [8086:2f81] ff:0b.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring (rev 02)
    [8086:2f36] ff:0b.1 Performance counters: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring (rev 02)
    [8086:2f37] ff:0b.2 Performance counters: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring (rev 02)
    IOMMU group 1:    [8086:2fe0] ff:0c.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
    [8086:2fe1] ff:0c.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
    [8086:2fe2] ff:0c.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
    [8086:2fe3] ff:0c.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
    [8086:2fe4] ff:0c.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
    [8086:2fe5] ff:0c.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
    IOMMU group 2:    [8086:2ff8] ff:0f.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Buffered Ring Agent (rev 02)
    [8086:2ff9] ff:0f.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Buffered Ring Agent (rev 02)
    [8086:2ffc] ff:0f.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers (rev 02)
    [8086:2ffd] ff:0f.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers (rev 02)
    [8086:2ffe] ff:0f.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers (rev 02)
    IOMMU group 3:    [8086:2f1d] ff:10.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCIe Ring Interface (rev 02)
    [8086:2f34] ff:10.1 Performance counters: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCIe Ring Interface (rev 02)
    [8086:2f1e] ff:10.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 02)
    [8086:2f7d] ff:10.6 Performance counters: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 02)
    [8086:2f1f] ff:10.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 02)
    IOMMU group 4:    [8086:2fa0] ff:12.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Home Agent 0 (rev 02)
    [8086:2f30] ff:12.1 Performance counters: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Home Agent 0 (rev 02)
    IOMMU group 5:    [8086:2fa8] ff:13.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Target Address, Thermal & RAS Registers (rev 02)
    [8086:2f71] ff:13.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Target Address, Thermal & RAS Registers (rev 02)
    [8086:2faa] ff:13.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder (rev 02)
    [8086:2fab] ff:13.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder (rev 02)
    [8086:2fac] ff:13.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder (rev 02)
    [8086:2fad] ff:13.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder (rev 02)
    [8086:2fae] ff:13.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Channel 0/1 Broadcast (rev 02)
    [8086:2faf] ff:13.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Global Broadcast (rev 02)
    IOMMU group 6:    [8086:2fb0] ff:14.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 0 Thermal Control (rev 02)
    [8086:2fb1] ff:14.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 1 Thermal Control (rev 02)
    [8086:2fb2] ff:14.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 0 ERROR Registers (rev 02)
    [8086:2fb3] ff:14.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 1 ERROR Registers (rev 02)
    [8086:2fbc] ff:14.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 (rev 02)
    [8086:2fbd] ff:14.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 (rev 02)
    [8086:2fbe] ff:14.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 (rev 02)
    [8086:2fbf] ff:14.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 (rev 02)
    IOMMU group 7:    [8086:2fb4] ff:15.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 2 Thermal Control (rev 02)
    [8086:2fb5] ff:15.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 3 Thermal Control (rev 02)
    [8086:2fb6] ff:15.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 2 ERROR Registers (rev 02)
    [8086:2fb7] ff:15.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 3 ERROR Registers (rev 02)
    IOMMU group 8:    [8086:2f68] ff:16.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 1 Target Address, Thermal & RAS Registers (rev 02)
    [8086:2f6e] ff:16.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Channel 2/3 Broadcast (rev 02)
    [8086:2f6f] ff:16.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Global Broadcast (rev 02)
    IOMMU group 9:    [8086:2fd0] ff:17.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 1 Channel 0 Thermal Control (rev 02)
    [8086:2fb8] ff:17.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 (rev 02)
    [8086:2fb9] ff:17.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 (rev 02)
    [8086:2fba] ff:17.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 (rev 02)
    [8086:2fbb] ff:17.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 (rev 02)
    IOMMU group 10:    [8086:2f98] ff:1e.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
    [8086:2f99] ff:1e.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
    [8086:2f9a] ff:1e.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
    [8086:2fc0] ff:1e.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
    [8086:2f9c] ff:1e.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
    IOMMU group 11:    [8086:2f88] ff:1f.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 VCU (rev 02)
    [8086:2f8a] ff:1f.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 VCU (rev 02)
    IOMMU group 12:    [8086:2f00] 00:00.0 Host bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DMI2 (rev 02)
    IOMMU group 13:    [8086:2f04] 00:02.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 (rev 02)
    IOMMU group 14:    [8086:2f08] 00:03.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 (rev 02)
    IOMMU group 15:    [8086:2f28] 00:05.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Address Map, VTd_Misc, System Management (rev 02)
    [8086:2f29] 00:05.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Hot Plug (rev 02)
    [8086:2f2a] 00:05.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 RAS, Control Status and Global Errors (rev 02)
    [8086:2f2c] 00:05.4 PIC: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 I/O APIC (rev 02)
    IOMMU group 16:    [8086:8d7c] 00:11.0 Unassigned class [ff00]: Intel Corporation C610/X99 series chipset SPSR (rev 05)
    [8086:8d62] 00:11.4 SATA controller: Intel Corporation C610/X99 series chipset sSATA Controller [AHCI mode] (rev 05)
    IOMMU group 17:    [8086:8d31] 00:14.0 USB controller: Intel Corporation C610/X99 series chipset USB xHCI Host Controller (rev 05)
    IOMMU group 18:    [8086:8d3a] 00:16.0 Communication controller: Intel Corporation C610/X99 series chipset MEI Controller #1 (rev 05)
    IOMMU group 19:    [8086:15a1] 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V (rev 05)
    IOMMU group 20:    [8086:8d2d] 00:1a.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #2 (rev 05)
    IOMMU group 21:    [8086:8d20] 00:1b.0 Audio device: Intel Corporation C610/X99 series chipset HD Audio Controller (rev 05)
    IOMMU group 22:    [8086:8d10] 00:1c.0 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #1 (rev d5)
    IOMMU group 23:    [8086:8d14] 00:1c.2 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #3 (rev d5)
    IOMMU group 24:    [8086:8d26] 00:1d.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #1 (rev 05)
    IOMMU group 25:    [8086:8d47] 00:1f.0 ISA bridge: Intel Corporation C610/X99 series chipset LPC Controller (rev 05)
    [8086:8d02] 00:1f.2 SATA controller: Intel Corporation C610/X99 series chipset 6-Port SATA Controller [AHCI mode] (rev 05)
    [8086:8d22] 00:1f.3 SMBus: Intel Corporation C610/X99 series chipset SMBus Controller (rev 05)
    IOMMU group 26:    [1b21:1242] 01:00.0 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller
    IOMMU group 27:    [1002:67df] 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev c7)
    [1002:aaf0] 02:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590]
    IOMMU group 28:    [1912:0015] 04:00.0 USB controller: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller (rev 02)

     

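    As you can see, the "AER: Uncorrected (Fatal) error" in the system log is reported on device "[8086:2f04] 00:02.0", which is the PCI Express root port that the ASM1142 card sits behind. The DMAR faults logged just before it are raised for device 01:00.0, which is the ASM1142 itself (IOMMU group 26).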
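To make this kind of correlation less manual, here is a minimal sketch that tallies the DMAR fault lines in a syslog excerpt by requesting device and fault reason. The regular expression is written against the log lines quoted above only; the function name `count_dmar_faults` is made up for illustration, and other kernel versions may format these messages differently.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   DMAR: [DMA Write] Request device [01:00.0] fault addr 4e0c9ce277000 [fault reason 04] Access beyond MGAW
DMAR_FAULT = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)\] Request device \[(?P<dev>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>[0-9a-f]+) \[fault reason (?P<reason>\d+)\]"
)


def count_dmar_faults(log_text):
    """Return a Counter of (device, fault reason) pairs found in syslog text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DMAR_FAULT.search(line)
        if match:
            counts[(match["dev"], match["reason"])] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python dmar_faults.py < syslog.txt
    for (dev, reason), n in count_dmar_faults(sys.stdin.read()).most_common():
        print(f"{dev}: {n} fault(s), reason {reason}")
```

The device address it reports can then be looked up in the IOMMU group listing to see which card is generating the faults.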

    To be completely sure that the card is causing the issue, I tried the following:

    1. Used the ASM1142 PCI card in a Windows 10 VM and got the same error and behaviour (without it, everything works fine).
    2. Enabled (previously disabled) "PCIe ACS override" and tried all combinations.
    3. Enabled (previously disabled) "VFIO allow unsafe interrupts" and tried all combinations.
    4. Passed through "IOMMU group 13:    [8086:2f04] 00:02.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 (rev 02)", the device the system log reports the "AER: Uncorrected (Fatal) error" on.

    None of the above solved the problem.

     

    I should mention that sometimes the system doesn't hang completely: the VM page fails to load, while other VMs and Docker containers keep running fine. When that happens, the RAM allocated to the failed VM stays allocated, even after I manually turn off the VM Manager in Settings. When I turn the VM Manager back on, it is still unresponsive and I cannot access it.
    If I then try to restart or shut down the server, it hangs and I have to reset it again.

    Any help would be greatly appreciated.

    BR

     

    tower-diagnostics-20200211-1059.zip




    User Feedback

    Recommended Comments



    I also have this issue: Unraid freezes when shutting down a VM.

    Here is some more info, but there has been no response at all on this issue. It looks like more people are affected, so maybe we can get more attention on it?

     

    Link to comment

    Hi peter,

     

    I have read both your posts and the other guy's, which is why I posted this here as a bug.

    Let's hope we draw some attention and someone actually helps us resolve our issues.

     

    Link to comment

    Hello,

    Thanks to moderator johnnie.black, who linked me to this topic, I can confirm I also have this bug.

    I'm also passing through a USB controller integrated into the motherboard, with an external USB hub plugged into it:

    ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller | USB controller (08:00.0)


    This is the only USB 3.1 Gen 2 (red) port on the back of my Asus X399 Strix motherboard.
    I use a KVM switch to pass the mouse, keyboard and so on between Unraid (other USB 2.0 ports) and the Windows VM (the one USB 3.1 port).

    Here are the diagnostics and the log with W10 running, just after a reboot:

    server-diagnostics-20200211-1503.zip

    image.png.f7d22874bf70b01820dc9c22c5dc2ee2.png

     

    Sincerely,

     

    Edited by dboris
    Link to comment

    Hi @dboris,

     

    The PCI USB controller you are using is different from mine and from the others mentioned in the posts linked by @peter_sm.
    So I'm guessing there is a bigger issue here with passing through PCI USB controllers (maybe in the latest Unraid build?).

    Let's hope someone from @limetech will see this, collect all the data posted and start debugging the issue.
     

    Crossing fingers 🤞

    Link to comment

    Thanks, I corrected my post. I've only been in the Unraid world for a few weeks. I've had so many parity repairs, all caused by hard resets. It seems like similar behaviour.
    image.thumb.png.5bc5845d5abd63507ad82c8b1c714b6f.png
    What would be the best test to confirm it?

    Should I stop passing through this USB PCI controller in the Unraid OS options and see whether the Windows VM will reboot?

    It usually never works for me.

    Be aware that I also ran into a case where no VMs were working anymore; I had to reinstall Unraid.

    Edited by dboris
    Link to comment
    First and foremost, I would recommend turning off PCI ACS override, rebooting, and then posting your IOMMU groups here in a quote so we can see what you have passed through.

    Things to try:

    1) Enable "VFIO allow unsafe interrupts".

    2) Boot your VM with the PCI card passed through, then restart/shut down.

    3) Boot your VM without any devices passed through (except the GPU), then restart/shut down.

    4) Remove the PCI devices from the passthrough config.

    5) Boot your VM with the GPU passed through and the USB devices attached to your PCI USB controller selected (rather than the controller itself), then restart/shut down.

    Every time you restart or shut down your VM, keep a tab open with the system log so that you can see what is going on.

    Then post your results for each attempt (2, 3, 5) so that we can compare and better understand your situation.
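For posting the IOMMU groups as text, a minimal sketch that reads them from sysfs may help. It assumes a Linux host that exposes `/sys/kernel/iommu_groups` (the standard location when the IOMMU is enabled); the function name `list_iommu_groups` is made up for illustration, and the output is only PCI addresses, not the full device descriptions shown on the Unraid System Devices page.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    if not os.path.isdir(root):
        return groups  # IOMMU disabled, or not a Linux host
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


if __name__ == "__main__":
    for group, devices in sorted(list_iommu_groups().items()):
        print(f"IOMMU group {group}: {', '.join(devices)}")
```

Running it once with ACS override off and once with it on makes it easy to compare how the groups change between the two settings.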

     

    Link to comment

    I think I had to enable PCI ACS override because the IOMMU group contained the GPU plus other devices 🤨

    I'll update you in the coming days.

    Also, the log screenshot I posted earlier isn't what I always get; after a reboot, I got this instead:

    Unraid Errors.PNG

    Edited by dboris
    Link to comment
    I see.
    Is 1822:1453 the PCI bridge that the ASM2142 sits behind?
    Did I get that right?

    Link to comment

    Actually, I have no idea, but I reported it because I didn't want to mislead anyone.

    Maybe you are right?

    How could I check whether that's the case?
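One way to check which bridge a device sits behind is to resolve its sysfs symlink: on Linux, `/sys/bus/pci/devices/<address>` points to a path whose parent directory is the upstream bridge. The sketch below assumes that layout; the function names are made up for illustration, and the example path uses the topology from the first post (01:00.0 behind root port 00:02.0), not this system.

```python
import os


def upstream_bridge(resolved_sysfs_path):
    """Return the PCI address of the parent bridge, or None for a root-bus device.

    `resolved_sysfs_path` is the symlink-resolved device path, for example
    /sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0
    """
    parent = os.path.basename(os.path.dirname(resolved_sysfs_path.rstrip("/")))
    # A parent named like "pci0000:00" is the host bridge domain, not a PCI device.
    return None if parent.startswith("pci") else parent


def find_upstream_bridge(address, sysfs_root="/sys/bus/pci/devices"):
    """Resolve the sysfs symlink for `address` and report its parent bridge."""
    return upstream_bridge(os.path.realpath(os.path.join(sysfs_root, address)))


if __name__ == "__main__":
    # 0000:08:00.0 is the ASM2142 address quoted earlier in the thread.
    print(find_upstream_bridge("0000:08:00.0"))
```

The address it prints can be matched against the System Devices page to see whether the bridge named in the error log is the one above the USB controller.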

     

    I'll try turning off PCI ACS override, but do we agree that if the IOMMU groups aren't split up enough for GPU and USB passthrough, I have no other choice? I'll copy/paste the original IOMMU groups here so we can investigate.

    image.png

    Edited by dboris
    Link to comment

    The last screenshot you posted shows a different device in a different IOMMU group.

    There you're showing us an AMD USB controller (possibly your motherboard's USB controller).

    The previous screenshot showed a PCI bridge that had issues with the VM.

    Let me get this straight.

    You said you're trying to pass through the following:

    On 2/11/2020 at 3:54 PM, dboris said:

    ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller | USB controller (08:00.0)

    Is the ASM2142 a PCI USB controller card installed in a PCI slot?
     

     

    Link to comment

    To be honest, I don't really know.

    I don't have any PCIe USB cards; the only ports are the ones provided by the motherboard.
    So the ASM2142 is either the USB hub's chip or the motherboard's chip.

    As I said, I have a big USB 3.0 hub plugged into the only USB 3.1 port.
    I'm passing through this port/hub so that I have no issues with the USB devices plugged into it.

    This morning I found 30 minutes to test the VM with ACS override off.
    I saved the devices web page with ACS override disabled.
    I will share it later, as I forgot to start TeamViewer and can't access my system from work (the students' shared Wi-Fi doesn't let me forward ports, etc.).

    With ACS override disabled, and therefore without passing through the USB hub/port, the VM started successfully without any errors in the log.

    With the actual IOMMU groups, I can't pass through my USB 3.1 hub.

    Maybe I should pass through all the devices in the group?
    I think the SATA ports are part of it. I will receive an HBA card in a few weeks, so possibly not a problem?
    I'll post an update tonight or tomorrow.
     

    Edited by dboris
    Link to comment

    As promised, here are the SysDevs.

    Ideally I would pass through the whole of group 14, but it also holds the server's single gigabit network port, ahaha.

    The VM ran all day, and I can confirm I got no errors in the log.

    Unrelated, but I had instability in Modern Warfare; disabling ACS override didn't help.

    Server_SysDevs.pdf

    Edited by dboris
    Link to comment

    Here's something to test.  Please disable the use of your AMD GPU in the VM and use VNC.  Then keep the USB controller passed through.  See if that works.  If so, this is a GPU issue, not a USB controller issue.  The reason this test is a good one to perform is simply because AMD-based GPUs are notorious for causing the exact issues being described here (VM works fine until shutdown/restart).  This is because most AMD GPUs don't support function level resets, which are vital for a good experience in a VM.
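As a rough way to see whether a device advertises function level reset, the sketch below looks for the `FLReset+` / `FLReset-` flag that `lspci -vv` prints in the PCI Express device capabilities. The function name `supports_flr` is made up for illustration, and `lspci` usually needs root to show capabilities. Treat the result as a hint only: the kernel can fall back to other reset methods, and a device that advertises FLR can still misbehave.

```python
import re
import subprocess


def supports_flr(lspci_vv_text):
    """Return True/False from the FLReset flag, or None if the flag is absent."""
    match = re.search(r"\bFLReset([+-])", lspci_vv_text)
    if match is None:
        return None  # no PCIe capability shown, or lspci lacked permission
    return match.group(1) == "+"


if __name__ == "__main__":
    # 02:00.0 is the Radeon address from the first post's IOMMU listing;
    # adjust it for your own system.
    output = subprocess.run(
        ["lspci", "-vv", "-s", "02:00.0"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(supports_flr(output))
```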

     

    The same could be the case with the USB controller, but that's harder to say as we don't have as much experience with those devices.  In general when it comes to any VFIO / PCI device assignment, some hardware just plain doesn't work well with it.  The VFIO project aims to do the best possible job it can to support generic PCI device assignment to VMs, but there are just some cases where the way the hardware was designed, it just doesn't work correctly and there is little we can do here at LT to resolve these types of issues.

     

    That said, if things were working for you on a previous release that aren't working now, please be sure to include that detail in your posts and be sure to mention which version of Unraid was the last known working version for your setup.

    Link to comment

    Thanks for the reply, Jonp. I will look into this, disable the GPU and test. For me this issue has been here for at least a year, perhaps longer.

    So it's impossible to say when it last worked, BUT it has definitely worked in the past.

    //Peter 

    Link to comment

    That's going to make it really tough to figure out, especially with AMD GPUs as they are notorious for having these problems.  We actually spent a good amount of time at one point communicating directly with some folks at AMD about this and while there was some interest from them at first, it fizzled out quickly.  I just don't think they want to dedicate resources to this to fix it and in some cases, this may not be fixable via a patch.

    Link to comment


    I have 2 VMs running.

    One VM has an AMD GPU and a USB card passed through, and it freezes Unraid when shutting down.

    When shutting down the other VM, with an NVIDIA GPU and NO USB card passed through, it still freezes the Unraid server 😞 😞

    And nothing in the log 😞 😞

     

     

    //Peter

     

     

    Link to comment

    I have done a test, lowering the QEMU setting to 3.0, and have not seen any issues yet. I will leave the VMs on for a week or so before I do another shutdown or reboot.

    Link to comment

    My 2 VMs have been OK since I lowered QEMU to 3.0; so far no issues with shutdown (2 W10 and one OSX).

    How did you patch the Unraid kernel, @Leoyzen? And what are your QEMU settings for the VM?

     

    //Peter

    Link to comment

    It looks stable with QEMU 3.0. I have now added "pcie_no_flr=1022:149c,1022:1487" to the boot parameters and changed QEMU to 4.2 on one of my W10 VMs, so we will see how it works after a few days.
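For anyone who prefers to script the boot parameter change, here is a minimal sketch that adds a parameter to the `append` line of one boot entry in syslinux-style config text. It is an illustration under assumptions: the label name "Unraid OS" and the file location `/boot/syslinux/syslinux.cfg` are assumed rather than taken from this thread, the function name `add_kernel_param` is made up, and the vendor:device IDs are the ones quoted above, which will differ on other hardware. Back up the file before writing anything to it.

```python
def add_kernel_param(config_text, param, label="Unraid OS"):
    """Add `param` to the `append` line under `label`, if it is not already there."""
    current_label = None
    updated = []
    for line in config_text.splitlines():
        words = line.split()
        if words and words[0].lower() == "label":
            current_label = " ".join(words[1:])
        elif (
            words
            and words[0].lower() == "append"
            and current_label == label
            and param not in words
        ):
            line = f"{line.rstrip()} {param}"
        updated.append(line)
    return "\n".join(updated) + "\n"


if __name__ == "__main__":
    sample = (
        "label Unraid OS\n"
        "  kernel /bzimage\n"
        "  append initrd=/bzroot\n"
    )
    print(add_kernel_param(sample, "pcie_no_flr=1022:149c,1022:1487"), end="")
```

Only the entry with the matching label is touched, so other boot entries keep their original parameters, and running it twice does not add the parameter a second time.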

    Link to comment

    I have done several shutdowns/restarts over the last few weeks with QEMU 4.2 and the added boot parameters, with no issues at all.

    Link to comment

    Thanks so much for sharing your tests.

    I've been away for a while.
    Tonight I'm going to try all your suggestions and will let you know.

     

    Link to comment

    I'm now on Q35 v2.12 and have added "pcie_no_flr=1022:149c,1022:1487". Two shutdowns and starts without any issues.

    I will test again later.

    Link to comment




