Crash when shutting down VM


josetann

Running unRAID 6.3.5. I have a Windows 10 VM that works pretty well (a few small issues, but nothing major). For some reason, if it's been up for a while, shutting down or rebooting the Windows 10 VM causes it to hang. The network then goes crazy: the router lights up like a Christmas tree, I lose my internet connection, and I have to unplug the network cable from the unRAID server to get the router working again. The server itself becomes more and more inaccessible (the webgui is the first to go, and eventually the entire thing locks up). I've tried issuing a reboot and a shutdown from the command line, but neither actually goes through. I end up having to do a hard poweroff, which is not good.

 

It does not exhibit the behaviour shortly after booting, i.e. I can sit here and reboot the VM over and over and it won't have an issue. The VM has to have been up for an indeterminate amount of time, which makes troubleshooting a bit difficult.

 

I believe the issue stems from the GT 1030 I have passed through to the Windows 10 guest, but I cannot be certain. My other VM uses a virtual graphics card and has absolutely no problem when I shut it down or reboot it.

 

Here's what you've all been waiting for, the end of the syslog. You can ignore the first four lines regarding mover; they are only there to show that nothing important happened before the 14:36:22 mark. Also, the two lines about USB devices resetting are normal (when the VM reboots properly, those messages are repeated multiple times).

 

Sep 26 03:40:01 Tower root: mover started
Sep 26 03:40:01 Tower root: mover finished
Sep 27 03:40:01 Tower root: mover started
Sep 27 03:40:01 Tower root: mover finished
Sep 27 14:36:22 Tower kernel: usb 3-2.2: reset full-speed USB device number 3 using xhci_hcd
Sep 27 14:36:22 Tower kernel: usb 1-1.5: reset full-speed USB device number 4 using ehci-pci
Sep 27 14:37:23 Tower kernel: INFO: rcu_preempt detected stalls on CPUs/tasks:
Sep 27 14:37:23 Tower kernel: 	5-...: (1 GPs behind) idle=f91/140000000000000/0 softirq=3578633/3578634 fqs=14959 
Sep 27 14:37:23 Tower kernel: 	(detected by 12, t=60002 jiffies, g=5962201, c=5962200, q=10759)
Sep 27 14:37:23 Tower kernel: Task dump for CPU 5:
Sep 27 14:37:23 Tower kernel: qemu-system-x86 R  running task        0  9930      1 0x00000008
Sep 27 14:37:23 Tower kernel: ffff881fa2d20cc0 ffff881fdf157b00 ffff881fd2753fc0 ffff880f97a10000
Sep 27 14:37:23 Tower kernel: 0000000000000000 ffffc9000d08fb88 ffffffff8167c00e 0000000000000002
Sep 27 14:37:23 Tower kernel: ffff881fa2d20cc0 7fffffffffffffff ffff881fa2d20cc0 ffffc9000d08fd20
Sep 27 14:37:23 Tower kernel: Call Trace:
Sep 27 14:37:23 Tower kernel: [<ffffffff8167c00e>] ? __schedule+0x2b1/0x46a
Sep 27 14:37:23 Tower kernel: [<ffffffff8167c24b>] schedule+0x84/0x95
Sep 27 14:37:23 Tower kernel: [<ffffffff8147872c>] ? qi_submit_sync+0x2b2/0x2d0
Sep 27 14:37:23 Tower kernel: [<ffffffff8147f255>] ? modify_irte+0xd9/0x10f
Sep 27 14:37:23 Tower kernel: [<ffffffff8147f2af>] ? intel_irq_remapping_deactivate+0x24/0x26
Sep 27 14:37:23 Tower kernel: [<ffffffff81087f79>] ? __irq_domain_deactivate_irq+0x28/0x39
Sep 27 14:37:23 Tower kernel: [<ffffffff81087f87>] ? __irq_domain_deactivate_irq+0x36/0x39
Sep 27 14:37:23 Tower kernel: [<ffffffff810891d2>] ? irq_domain_deactivate_irq+0x18/0x25
Sep 27 14:37:23 Tower kernel: [<ffffffff81086dc8>] ? irq_shutdown+0x4f/0x5c
Sep 27 14:37:23 Tower kernel: [<ffffffff81084b7a>] ? __free_irq+0x10d/0x20a
Sep 27 14:37:23 Tower kernel: [<ffffffff81084d23>] ? free_irq+0x69/0x78
Sep 27 14:37:23 Tower kernel: [<ffffffff814ed9f6>] ? vfio_intx_set_signal+0x32/0x190
Sep 27 14:37:23 Tower kernel: [<ffffffff814ee135>] ? vfio_intx_disable+0x33/0x56
Sep 27 14:37:23 Tower kernel: [<ffffffff814ee17d>] ? vfio_pci_set_intx_trigger+0x25/0x141
Sep 27 14:37:23 Tower kernel: [<ffffffff814ee640>] ? vfio_pci_set_irqs_ioctl+0x87/0xa4
Sep 27 14:37:23 Tower kernel: [<ffffffff814ecc42>] ? vfio_pci_ioctl+0x5d1/0x9d5
Sep 27 14:37:23 Tower kernel: [<ffffffff81069ae7>] ? wake_up_q+0x51/0x51
Sep 27 14:37:23 Tower kernel: [<ffffffff814e8c8c>] ? vfio_device_fops_unl_ioctl+0x1e/0x28
Sep 27 14:37:23 Tower kernel: [<ffffffff81130112>] ? vfs_ioctl+0x13/0x2f
Sep 27 14:37:23 Tower kernel: [<ffffffff81130642>] ? do_vfs_ioctl+0x49c/0x50a
Sep 27 14:37:23 Tower kernel: [<ffffffff8113921f>] ? __fget+0x72/0x7e
Sep 27 14:37:23 Tower kernel: [<ffffffff811306ee>] ? SyS_ioctl+0x3e/0x5c
Sep 27 14:37:23 Tower kernel: [<ffffffff8167f537>] ? entry_SYSCALL_64_fastpath+0x1a/0xa9
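
For anyone skimming a trace like this one, here is a rough sketch of how it could be summarized with a script. It only looks for the RCU stall message and pulls the function names out of the call trace lines, then checks whether any `vfio_intx_*` frames show up, which would suggest the stalled qemu task was tearing down a legacy (INTx) interrupt for a passed-through device. The function name `summarize_stall` and the regular expression are my own assumptions for illustration, not part of any unRAID tooling, and the sample lines are copied from the log above.

```python
import re

# Matches kernel call-trace frames such as:
#   [<ffffffff814ee135>] ? vfio_intx_disable+0x33/0x56
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)
STALL_MARKER = "rcu_preempt detected stalls"


def summarize_stall(lines):
    """Summarize a syslog excerpt (hypothetical helper for illustration)."""
    stall_detected = False
    frames = []
    for line in lines:
        if STALL_MARKER in line:
            stall_detected = True
        match = FRAME_RE.search(line)
        if match:
            frames.append(match.group(1))
    return {
        "stall_detected": stall_detected,
        "frames": frames,
        # vfio_intx_* frames point at legacy INTx interrupt handling.
        "intx_teardown": any(f.startswith("vfio_intx") for f in frames),
    }


# A few lines taken verbatim from the syslog excerpt in this post.
SAMPLE = [
    "Sep 27 14:37:23 Tower kernel: INFO: rcu_preempt detected stalls on CPUs/tasks:",
    "Sep 27 14:37:23 Tower kernel: [<ffffffff8167c24b>] schedule+0x84/0x95",
    "Sep 27 14:37:23 Tower kernel: [<ffffffff81084d23>] ? free_irq+0x69/0x78",
    "Sep 27 14:37:23 Tower kernel: [<ffffffff814ee135>] ? vfio_intx_disable+0x33/0x56",
]

if __name__ == "__main__":
    print(summarize_stall(SAMPLE))
```

This is only a reading aid: it tells you which code path the stalled task was in, not why the stall happened.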

 


OK, so the troubleshooting continued. It took a while, since I had to let the machine sit for so long between reboots/shutdowns.

 

It looks like enabling MSI for the graphics card may have fixed the issue. That means enabling it for both the graphics device AND the sound device; it still crashed with MSI enabled for just the graphics device. I still have the occasional Kodi crash (I have an app that monitors it and restarts it if necessary, and crashes are rare enough to not be a real bother) and some sound issues (I start to get a weird "popping" that isn't fixed by a VM powerdown or reboot, only by unplugging/replugging the HDMI cable), but it's working for the most part. I hope that passing through a dedicated USB port and using an amplifier with USB input will fix the sound.

 

I'll update if I notice it crashing again. So far it has survived a shutdown after being left overnight (but with Hyper-V disabled, which was one of the things I was testing), one reboot after 36 hours with just MSI enabled (Hyper-V left enabled), and one poweroff/restart after 36 hours with MSI enabled. Do note that I did get a crash with Hyper-V off and MSI enabled for the graphics device but MSI off for the sound device (same graphics card, GT 1030).
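
Since the fix depended on MSI being on for both functions of the card, here is a minimal sketch of one way to double-check that from the host side. It assumes you have saved the verbose `lspci -vv` output for the card to a string; on Linux, the MSI capability line reads `MSI: Enable+` when MSI is active and `MSI: Enable-` when it is not. The helper name `msi_status`, the slot addresses, and the sample text are all made up for illustration and are not taken from my server.

```python
import re

# lspci -vv prints a line like:
#   Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
# "MSI-X:" lines are deliberately not matched by this pattern.
MSI_RE = re.compile(r"Capabilities: \[[0-9a-f]+\] MSI: Enable([+-])")


def msi_status(lspci_text):
    """Map each PCI slot to True/False/None (hypothetical helper).

    True  -> MSI capability present and enabled
    False -> MSI capability present but disabled
    None  -> no MSI capability line found for that device
    """
    status = {}
    # lspci separates devices with a blank line.
    for block in re.split(r"\n\s*\n", lspci_text.strip()):
        slot = block.splitlines()[0].split()[0]
        match = MSI_RE.search(block)
        status[slot] = None if match is None else match.group(1) == "+"
    return status


# Made-up sample output: graphics function with MSI on, audio with MSI off.
SAMPLE = """\
01:00.0 VGA compatible controller: NVIDIA Corporation (hypothetical GPU function)
\tCapabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+

01:00.1 Audio device: NVIDIA Corporation (hypothetical audio function)
\tCapabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
"""

if __name__ == "__main__":
    for slot, enabled in msi_status(SAMPLE).items():
        print(slot, "MSI enabled" if enabled else "MSI not enabled")
```

In the sample above the audio function still shows `Enable-`, which is the half-enabled combination that still crashed for me. Note that `lspci` typically needs root to show the capability lines, and the state is only meaningful while the VM is running and the guest driver has set up its interrupts.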


Are you passing any other PCIe devices through? Like a USB card? I had a very similar experience to yours. It happens on reboot or shutdown and takes down the router and network. The VM and unRAID eventually go unresponsive. My USB PCIe card wasn't playing nicely when it got the signal to shut down. I took the USB card out and the problem went away.



Was the USB card in MSI mode?

 

No, I didn't have any other PCIe devices passed through, just the graphics card. Enabling MSI mode seems to have fixed it: I just performed a poweroff/poweron of the Windows VM after 2.5 days of uptime, with no issues.

 

In fact, the last time I powered it down was to add a USB card (specifically, the Sonnet Allegro Pro USB 3.0 PCIe card). I passed through one of its USB ports (each port has its own controller). It's working flawlessly so far.

 

 
