flaggart Posted May 4, 2020 (edited)

Hi all

Over the past few months I have been experiencing complete hard lockups of Unraid and have had to power cycle each time. Every lockup is a direct result of attempting to reboot the same Windows 10 VM (via the shutdown menu inside the VM, not using the web GUI).

Syslog as follows:

May 4 11:17:41 SERVER kernel: mdcmd (552): spindown 10
May 4 12:25:40 SERVER kernel: mdcmd (553): spindown 0
May 4 13:22:59 SERVER kernel: mdcmd (554): spindown 10
May 4 13:23:00 SERVER kernel: mdcmd (555): spindown 5
May 4 13:52:18 SERVER kernel: mdcmd (556): spindown 9
May 4 16:04:17 SERVER kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
May 4 16:04:17 SERVER kernel: rcu: 24-...0: (1 GPs behind) idle=a12/1/0x4000000000000000 softirq=109480389/109480389 fqs=14466
May 4 16:04:17 SERVER kernel: rcu: (detected by 25, t=60002 jiffies, g=587531517, q=76713)
May 4 16:04:17 SERVER kernel: Sending NMI from CPU 25 to CPUs 24:
May 4 16:04:17 SERVER kernel: NMI backtrace for cpu 24
May 4 16:04:17 SERVER kernel: CPU: 24 PID: 307 Comm: CPU 1/KVM Tainted: P O 4.19.107-Unraid #1
May 4 16:04:17 SERVER kernel: Hardware name: ASUSTek Computer INC. TS700-E7-RS8/Z9PE-D16 Series, BIOS 5601 06/11/2015
May 4 16:04:17 SERVER kernel: RIP: 0010:qi_submit_sync+0x154/0x2db
May 4 16:04:17 SERVER kernel: Code: 30 02 0f 84 40 01 00 00 4d 8b 96 b0 00 00 00 49 8b 42 10 83 3c 30 03 75 0b 41 bc f5 ff ff ff e9 27 01 00 00 49 8b 06 8b 48 34 <f6> c1 10 74 68 49 8b 06 8b 80 80 00 00 00 c1 f8 04 41 39 c3 75 57
May 4 16:04:17 SERVER kernel: RSP: 0018:ffffc90006a8bb50 EFLAGS: 00000093
May 4 16:04:17 SERVER kernel: RAX: ffffc9000001f000 RBX: 0000000000000100 RCX: 0000000000000000
May 4 16:04:17 SERVER kernel: RDX: 0000000000000001 RSI: 000000000000006c RDI: ffff88903f418a00
May 4 16:04:17 SERVER kernel: RBP: ffffc90006a8bba8 R08: 0000000000640000 R09: 0000000000000000
May 4 16:04:17 SERVER kernel: R10: ffff88903f418a00 R11: 000000000000001a R12: 00000000000001b0
May 4 16:04:17 SERVER kernel: R13: ffff88903f418a00 R14: ffff88903f40f400 R15: 0000000000000086
May 4 16:04:17 SERVER kernel: FS: 000014bc64dff700(0000) GS:ffff88a03fa00000(0000) knlGS:0000000000000000
May 4 16:04:17 SERVER kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 4 16:04:17 SERVER kernel: CR2: ffffe4881907d478 CR3: 000000108a6ba005 CR4: 00000000001626e0
May 4 16:04:17 SERVER kernel: Call Trace:
May 4 16:04:17 SERVER kernel: modify_irte+0xf0/0x136
May 4 16:04:17 SERVER kernel: intel_irq_remapping_deactivate+0x2d/0x47
May 4 16:04:17 SERVER kernel: __irq_domain_deactivate_irq+0x27/0x33
May 4 16:04:17 SERVER kernel: irq_domain_deactivate_irq+0x15/0x22
May 4 16:04:17 SERVER kernel: __free_irq+0x1d8/0x238
May 4 16:04:17 SERVER kernel: free_irq+0x5d/0x75
May 4 16:04:17 SERVER kernel: vfio_msi_set_vector_signal+0x84/0x231
May 4 16:04:17 SERVER kernel: ? flush_workqueue+0x2bf/0x2e3
May 4 16:04:17 SERVER kernel: vfio_msi_set_block+0x6c/0xac
May 4 16:04:17 SERVER kernel: vfio_msi_disable+0x61/0xa0
May 4 16:04:17 SERVER kernel: vfio_pci_set_msi_trigger+0x44/0x230
May 4 16:04:17 SERVER kernel: ? pci_bus_read_config_word+0x44/0x66
May 4 16:04:17 SERVER kernel: vfio_pci_ioctl+0x52d/0x9a2
May 4 16:04:17 SERVER kernel: ? vfio_pci_config_rw+0x209/0x2a6
May 4 16:04:17 SERVER kernel: ? __seccomp_filter+0x39/0x1ed
May 4 16:04:17 SERVER kernel: vfs_ioctl+0x19/0x26
May 4 16:04:17 SERVER kernel: do_vfs_ioctl+0x533/0x55d
May 4 16:04:17 SERVER kernel: ksys_ioctl+0x37/0x56
May 4 16:04:17 SERVER kernel: __x64_sys_ioctl+0x11/0x14
May 4 16:04:17 SERVER kernel: do_syscall_64+0x57/0xf2
May 4 16:04:17 SERVER kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 4 16:04:17 SERVER kernel: RIP: 0033:0x14bc687a54b7
May 4 16:04:17 SERVER kernel: Code: 00 00 90 48 8b 05 d9 29 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 29 0d 00 f7 d8 64 89 01 48
May 4 16:04:17 SERVER kernel: RSP: 002b:000014bc64dfe2e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 4 16:04:17 SERVER kernel: RAX: ffffffffffffffda RBX: 000014b84150a200 RCX: 000014bc687a54b7
May 4 16:04:17 SERVER kernel: RDX: 000014bc64dfe2f0 RSI: 0000000000003b6e RDI: 000000000000004b
May 4 16:04:17 SERVER kernel: RBP: 000014b84150a200 R08: 000000000000006c R09: 00000000ffffff00
May 4 16:04:17 SERVER kernel: R10: 000014b82cd8406b R11: 0000000000000246 R12: 000000000000006a
May 4 16:04:17 SERVER kernel: R13: 0000000000000080 R14: 0000000000000002 R15: 000014b84150a200

At this point the VM's display shows the Windows shutdown screen saying "Restarting...". If I try to use virsh to shut the VM down I get the following, and shortly afterwards a total Unraid lockup:

virsh # destroy "Windows 10"
error: Failed to destroy domain Windows 10
error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainReset)

General info:
Unraid 6.8.3
Platform: Asus Z9PE-D16 with 2x Xeon 2667 v2
RAM: 128GB ECC

The VM is pinned to the second physical CPU, which is isolated from Unraid.
It has an RTX 2070 Super and an NVMe drive passed through to it.

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='2'>
  <name>Windows 10</name>
  <uuid>39ac96f3-a777-0c5e-419f-596878b407e9</uuid>
  <description>Win-10</description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>17301504</memory>
  <currentMemory unit='KiB'>17301504</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>16</vcpu>
  <iothreads>2</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='8'/>
    <vcpupin vcpu='1' cpuset='24'/>
    <vcpupin vcpu='2' cpuset='9'/>
    <vcpupin vcpu='3' cpuset='25'/>
    <vcpupin vcpu='4' cpuset='10'/>
    <vcpupin vcpu='5' cpuset='26'/>
    <vcpupin vcpu='6' cpuset='11'/>
    <vcpupin vcpu='7' cpuset='27'/>
    <vcpupin vcpu='8' cpuset='12'/>
    <vcpupin vcpu='9' cpuset='28'/>
    <vcpupin vcpu='10' cpuset='13'/>
    <vcpupin vcpu='11' cpuset='29'/>
    <vcpupin vcpu='12' cpuset='14'/>
    <vcpupin vcpu='13' cpuset='30'/>
    <vcpupin vcpu='14' cpuset='15'/>
    <vcpupin vcpu='15' cpuset='31'/>
    <emulatorpin cpuset='0,16'/>
    <iothreadpin iothread='1' cpuset='1,17'/>
    <iothreadpin iothread='2' cpuset='2,18'/>
  </cputune>
  <numatune>
    <memory mode='preferred' nodeset='1'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-3.1'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/39ac96f3-a777-0c5e-419f-596878b407e9_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='1278467890ab'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='8' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:be:8c:35'/>
      <source bridge='br0'/>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/2'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-2-Windows 10/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x83' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x83' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x1d' function='0x0'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x82' slot='0x00' function='0x0'/>
      </source>
      <boot order='1'/>
      <alias name='hostdev3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x83' slot='0x00' function='0x2'/>
      </source>
      <alias name='hostdev4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x83' slot='0x00' function='0x3'/>
      </source>
      <alias name='hostdev5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>

Can anyone advise what the issue might be, or at least suggest a way to deal with this without it taking down the entire system?

Thanks

Edited May 6, 2020 by flaggart
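
One possible way to cross-check the trace against the VM definition: the NMI backtrace names host CPU 24 and the thread "CPU 1/KVM", and the XML pins vCPU 1 to host CPU 24. The Python sketch below automates that comparison. The function names and the trimmed sample strings are hypothetical, and it assumes QEMU's "CPU N/KVM" thread naming and single-CPU cpuset values. It only confirms which vCPU thread stalled and whether it was running where the pinning says it should; it does not diagnose the cause of the lockup.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed excerpts from the post, kept short for illustration only.
SAMPLE_XML = """<domain type='kvm'>
  <cputune>
    <vcpupin vcpu='0' cpuset='8'/>
    <vcpupin vcpu='1' cpuset='24'/>
  </cputune>
</domain>"""

SAMPLE_LOG = (
    "May 4 16:04:17 SERVER kernel: CPU: 24 PID: 307 "
    "Comm: CPU 1/KVM Tainted: P O 4.19.107-Unraid #1"
)

# Matches the backtrace header "CPU: <host cpu> PID: <pid> Comm: CPU <vcpu>/KVM".
# Assumption: vCPU threads are named "CPU N/KVM", as in the log above.
BACKTRACE_RE = re.compile(r"CPU: (\d+) PID: \d+ Comm: CPU (\d+)/KVM")


def vcpu_pinning(domain_xml):
    """Map each guest vCPU index to the cpuset string it is pinned to."""
    root = ET.fromstring(domain_xml)
    return {int(p.get("vcpu")): p.get("cpuset") for p in root.iter("vcpupin")}


def stalled_vcpu_threads(syslog):
    """Return (host_cpu, vcpu) pairs for vCPU threads named in backtraces."""
    return [(int(h), int(v)) for h, v in BACKTRACE_RE.findall(syslog)]


def check_stalls(domain_xml, syslog):
    """Return (host_cpu, vcpu, matches_pinning) for each stalled vCPU thread."""
    pins = vcpu_pinning(domain_xml)
    # Assumption: each cpuset is a single CPU number, not a list or range.
    return [
        (host, vcpu, pins.get(vcpu) == str(host))
        for host, vcpu in stalled_vcpu_threads(syslog)
    ]


if __name__ == "__main__":
    for host, vcpu, ok in check_stalls(SAMPLE_XML, SAMPLE_LOG):
        print(f"host CPU {host}: vCPU {vcpu} thread, pinning matches: {ok}")
```

To run it against real data, pass the output of `virsh dumpxml` for the domain and the contents of the syslog in place of the two sample strings.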