josetann Posted September 27, 2017 (edited)

Running unRAID 6.3.5. I have a Windows 10 VM that works pretty well (a few small issues, but nothing major). For some reason, if it's been up for a while, shutting down or rebooting the Windows 10 VM causes it to hang and the network goes crazy: the router lights up like a Christmas tree, I lose my internet connection, and I have to unplug the unRAID server's network cable to get the router working again. The server then becomes more and more inaccessible (the webgui is the first to go, and eventually the entire thing locks up). I've tried issuing a reboot and a shutdown from the command line, but neither actually goes through. I end up having to do a hard poweroff, which is not good.

It does not exhibit the behaviour shortly after booting, i.e. I can sit here and reboot the VM over and over without an issue. It has to be online for an indeterminate amount of time, which makes troubleshooting a bit difficult.

I believe the issue stems from the GT 1030 I have passed through to the Windows 10 guest, but I can't be certain. The other VM uses a virtual graphics card and has absolutely no problem when I shut it down or reboot it.

Here's what you've all been waiting for, the end of the syslog. You can ignore the first four lines regarding mover; they're only there to show that nothing important happened before the 14:36:22 mark. The two lines about USB devices resetting are also normal (when the VM reboots properly, those messages are repeated multiple times).
Sep 26 03:40:01 Tower root: mover started
Sep 26 03:40:01 Tower root: mover finished
Sep 27 03:40:01 Tower root: mover started
Sep 27 03:40:01 Tower root: mover finished
Sep 27 14:36:22 Tower kernel: usb 3-2.2: reset full-speed USB device number 3 using xhci_hcd
Sep 27 14:36:22 Tower kernel: usb 1-1.5: reset full-speed USB device number 4 using ehci-pci
Sep 27 14:37:23 Tower kernel: INFO: rcu_preempt detected stalls on CPUs/tasks:
Sep 27 14:37:23 Tower kernel: 5-...: (1 GPs behind) idle=f91/140000000000000/0 softirq=3578633/3578634 fqs=14959
Sep 27 14:37:23 Tower kernel: (detected by 12, t=60002 jiffies, g=5962201, c=5962200, q=10759)
Sep 27 14:37:23 Tower kernel: Task dump for CPU 5:
Sep 27 14:37:23 Tower kernel: qemu-system-x86 R running task 0 9930 1 0x00000008
Sep 27 14:37:23 Tower kernel: ffff881fa2d20cc0 ffff881fdf157b00 ffff881fd2753fc0 ffff880f97a10000
Sep 27 14:37:23 Tower kernel: 0000000000000000 ffffc9000d08fb88 ffffffff8167c00e 0000000000000002
Sep 27 14:37:23 Tower kernel: ffff881fa2d20cc0 7fffffffffffffff ffff881fa2d20cc0 ffffc9000d08fd20
Sep 27 14:37:23 Tower kernel: Call Trace:
Sep 27 14:37:23 Tower kernel: [<ffffffff8167c00e>] ? __schedule+0x2b1/0x46a
Sep 27 14:37:23 Tower kernel: [<ffffffff8167c24b>] schedule+0x84/0x95
Sep 27 14:37:23 Tower kernel: [<ffffffff8147872c>] ? qi_submit_sync+0x2b2/0x2d0
Sep 27 14:37:23 Tower kernel: [<ffffffff8147f255>] ? modify_irte+0xd9/0x10f
Sep 27 14:37:23 Tower kernel: [<ffffffff8147f2af>] ? intel_irq_remapping_deactivate+0x24/0x26
Sep 27 14:37:23 Tower kernel: [<ffffffff81087f79>] ? __irq_domain_deactivate_irq+0x28/0x39
Sep 27 14:37:23 Tower kernel: [<ffffffff81087f87>] ? __irq_domain_deactivate_irq+0x36/0x39
Sep 27 14:37:23 Tower kernel: [<ffffffff810891d2>] ? irq_domain_deactivate_irq+0x18/0x25
Sep 27 14:37:23 Tower kernel: [<ffffffff81086dc8>] ? irq_shutdown+0x4f/0x5c
Sep 27 14:37:23 Tower kernel: [<ffffffff81084b7a>] ? __free_irq+0x10d/0x20a
Sep 27 14:37:23 Tower kernel: [<ffffffff81084d23>] ? free_irq+0x69/0x78
Sep 27 14:37:23 Tower kernel: [<ffffffff814ed9f6>] ? vfio_intx_set_signal+0x32/0x190
Sep 27 14:37:23 Tower kernel: [<ffffffff814ee135>] ? vfio_intx_disable+0x33/0x56
Sep 27 14:37:23 Tower kernel: [<ffffffff814ee17d>] ? vfio_pci_set_intx_trigger+0x25/0x141
Sep 27 14:37:23 Tower kernel: [<ffffffff814ee640>] ? vfio_pci_set_irqs_ioctl+0x87/0xa4
Sep 27 14:37:23 Tower kernel: [<ffffffff814ecc42>] ? vfio_pci_ioctl+0x5d1/0x9d5
Sep 27 14:37:23 Tower kernel: [<ffffffff81069ae7>] ? wake_up_q+0x51/0x51
Sep 27 14:37:23 Tower kernel: [<ffffffff814e8c8c>] ? vfio_device_fops_unl_ioctl+0x1e/0x28
Sep 27 14:37:23 Tower kernel: [<ffffffff81130112>] ? vfs_ioctl+0x13/0x2f
Sep 27 14:37:23 Tower kernel: [<ffffffff81130642>] ? do_vfs_ioctl+0x49c/0x50a
Sep 27 14:37:23 Tower kernel: [<ffffffff8113921f>] ? __fget+0x72/0x7e
Sep 27 14:37:23 Tower kernel: [<ffffffff811306ee>] ? SyS_ioctl+0x3e/0x5c
Sep 27 14:37:23 Tower kernel: [<ffffffff8167f537>] ? entry_SYSCALL_64_fastpath+0x1a/0xa9

Edited September 27, 2017 by josetann
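
For anyone hitting something similar, the trace above can be skimmed mechanically. The sketch below is only an illustration, not a tool used in this thread: it pulls the function names out of call-trace frames and checks whether the stalled task was inside `vfio_intx_disable`, i.e. tearing down a legacy INTx interrupt for a passed-through device. The helper names (`trace_symbols`, `summarize`) are made up for this example, and the sample is a four-line excerpt of the log above.

```python
import re

# Matches kernel call-trace frames such as
#   "[<ffffffff814ee135>] ? vfio_intx_disable+0x33/0x56"
# and captures the function name (the "?" marker is optional).
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def trace_symbols(log_text):
    """Return function names from call-trace frames, in order of appearance."""
    return [m.group(1) for m in FRAME_RE.finditer(log_text)]


def summarize(log_text):
    """Hypothetical helper: flag an RCU stall and any VFIO frames in the trace."""
    symbols = trace_symbols(log_text)
    return {
        "rcu_stall": "rcu_preempt detected stalls" in log_text,
        "vfio_frames": [s for s in symbols if s.startswith("vfio_")],
        "intx_teardown": "vfio_intx_disable" in symbols,
    }


# Excerpt of the syslog posted above.
sample = """\
Sep 27 14:37:23 Tower kernel: INFO: rcu_preempt detected stalls on CPUs/tasks:
Sep 27 14:37:23 Tower kernel: [<ffffffff81084d23>] ? free_irq+0x69/0x78
Sep 27 14:37:23 Tower kernel: [<ffffffff814ed9f6>] ? vfio_intx_set_signal+0x32/0x190
Sep 27 14:37:23 Tower kernel: [<ffffffff814ee135>] ? vfio_intx_disable+0x33/0x56
"""

if __name__ == "__main__":
    print(summarize(sample))
```

A stall inside the INTx teardown path fits the suspicion that the passed-through card is involved, but on its own it does not prove which device is at fault.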
josetann (Author) Posted October 2, 2017

Ok, so the troubleshooting continued. It took a while, since I had to let the machine sit for so long between reboots/shutdowns. It looks like enabling MSI for the graphics card may have fixed the issue. That means enabling it for both the graphics device AND the sound device; it still crashed with MSI enabled for just the graphics device.

I still get the occasional Kodi crash (I have an app that monitors it and restarts it if necessary, and crashes are rare enough not to be a real bother) and some sound issues (a weird "popping" that isn't fixed by a VM powerdown or reboot, only by unplugging and replugging the HDMI cable), but it's working for the most part. I'm hoping that passing through a dedicated USB port and using an amplifier with USB input will fix the sound.

I'll update if I notice it crashing again. So far it has survived a shutdown after being left overnight (with Hyper-V disabled, which was one of the things I was testing), one reboot after 36 hours with just MSI enabled (Hyper-V left enabled), and one poweroff/restart after 36 hours with MSI enabled. Do note that I did get a crash with Hyper-V off, MSI enabled for the graphics device, and MSI off for the sound device (same graphics card, GT 1030).
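
Since the crash came back when only one of the card's two functions had MSI enabled, it can help to confirm the state of both. A rough sketch, assuming you save the output of `lspci -vv` from the unRAID host while the VM is running (the host view should reflect what the guest driver enabled, though that is an assumption worth verifying on your own system): the script looks for the `MSI: Enable+` / `MSI: Enable-` capability line under each device. The function name `msi_states`, the PCI addresses, and the sample text are invented for illustration and are not output from the server in this thread.

```python
import re

# A device header line starts at column 0 with a PCI address,
# e.g. "01:00.0 ..." or "0000:01:00.0 ..." when lspci prints the domain.
DEVICE_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
# Plain MSI capability line; "MSI-X: Enable+" is deliberately not matched.
MSI_RE = re.compile(r"\bMSI: Enable([+-])")


def msi_states(lspci_text):
    """Map each PCI address to True/False (MSI on/off) or None (no MSI line)."""
    states = {}
    current = None
    for line in lspci_text.splitlines():
        dev = DEVICE_RE.match(line)
        if dev:
            current = dev.group(1)
            states[current] = None  # no MSI capability line seen yet
            continue
        msi = MSI_RE.search(line)
        if msi and current is not None:
            states[current] = msi.group(1) == "+"
    return states


# Invented sample in the layout of `lspci -vv` output (not real data).
SAMPLE = """\
01:00.0 VGA compatible controller: example GPU function
\tCapabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
01:00.1 Audio device: example HDMI audio function
\tCapabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
"""

if __name__ == "__main__":
    for address, enabled in msi_states(SAMPLE).items():
        print(address, "MSI on" if enabled else "MSI off or not reported")
```

In the invented sample the audio function reports `Enable-`, which corresponds to the configuration that still crashed.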
xccrev Posted October 6, 2017

Are you passing any other PCIe devices through? Like a USB card? I had a very similar experience to yours. It happens on reboot or shutdown and takes down the router and network. The VM and unRAID eventually go unresponsive.

My USB PCIe card wasn't playing nicely when it got the signal to shut down. I took the USB card out and the problem went away.
josetann (Author) Posted October 6, 2017

Was the USB card in MSI mode? No, I didn't have any other PCIe devices passed through, just the graphics card. Enabling MSI mode seems to have fixed it: I just performed a poweroff/poweron of the Windows VM after 2.5 days of uptime with no issues.

In fact, the last time I powered it down was to add a USB card (the Sonnet Allegro Pro USB 3.0 PCIe card, to be specific). I passed through one of its USB ports (each port has its own controller). It's working flawlessly so far.