Call trace after VM shutdown/reboot



As the title says, this call trace appears and then I cannot connect to my server at all. The only thing I can do is an unclean reboot and try again. It seems to happen only when I shut down or reboot the VM.

 


Nov 12 15:07:46 Gaming php-fpm[17706]: [WARNING] [pool www] server reached max_children setting (20), consider raising it
Nov 12 15:07:55 Gaming kernel: INFO: rcu_sched detected stalls on CPUs/tasks:
Nov 12 15:07:55 Gaming kernel:     27-...0: (1 ticks this GP) idle=c32/0/1 softirq=1547981/1547981 fqs=14983 
Nov 12 15:07:55 Gaming kernel:     35-...0: (1 GPs behind) idle=9fa/1/4611686018427387904 softirq=1679893/1679896 fqs=14983 
Nov 12 15:07:55 Gaming kernel:     (detected by 11, t=60002 jiffies, g=869761, c=869760, q=77420)
Nov 12 15:07:55 Gaming kernel: Sending NMI from CPU 11 to CPUs 27:
Nov 12 15:07:55 Gaming kernel: NMI backtrace for cpu 27
Nov 12 15:07:55 Gaming kernel: CPU: 27 PID: 0 Comm: swapper/27 Tainted: G           O      4.18.17-unRAID #1
Nov 12 15:07:55 Gaming kernel: Hardware name: Cirrascale VB1416/GA-7PESH2, BIOS R17 06/26/2018
Nov 12 15:07:55 Gaming kernel: RIP: 0010:native_queued_spin_lock_slowpath+0xfd/0x16d
Nov 12 15:07:55 Gaming kernel: Code: e1 eb 7b 31 c9 eb 36 c1 e9 12 83 e0 03 ff c9 48 c1 e0 04 48 63 c9 48 05 c0 17 02 00 48 03 04 cd 00 17 da 81 48 89 10 8b 42 08 <85> c0 75 04 f3 90 eb f5 48 8b 0a 48 85 c9 74 c9 0f 18 09 8b 07 66 
Nov 12 15:07:55 Gaming kernel: RSP: 0018:ffff88046fc43dc0 EFLAGS: 00000046
Nov 12 15:07:55 Gaming kernel: RAX: 0000000000000001 RBX: 0000000000000100 RCX: 0000000000000023
Nov 12 15:07:55 Gaming kernel: RDX: ffff88046fc617c0 RSI: 0000000000700000 RDI: ffff88046f421dc0
Nov 12 15:07:55 Gaming kernel: RBP: ffff88046fc43e28 R08: 0000000000900000 R09: ffff88046f421dc0
Nov 12 15:07:55 Gaming kernel: R10: 0000000000000214 R11: 0000000000000084 R12: 0000000000000850
Nov 12 15:07:55 Gaming kernel: R13: ffff88046f421dc0 R14: 0000000000000084 R15: ffff88046f427400
Nov 12 15:07:55 Gaming kernel: FS:  0000000000000000(0000) GS:ffff88046fc40000(0000) knlGS:0000000000000000
Nov 12 15:07:55 Gaming kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 12 15:07:55 Gaming kernel: CR2: fffff804ac8bf070 CR3: 0000000004e0a004 CR4: 00000000001626e0
Nov 12 15:07:55 Gaming kernel: Call Trace:
Nov 12 15:07:55 Gaming kernel: <IRQ>
Nov 12 15:07:55 Gaming kernel: _raw_spin_lock+0x16/0x19
Nov 12 15:07:55 Gaming kernel: qi_submit_sync+0x265/0x2db
Nov 12 15:07:55 Gaming kernel: qi_flush_iotlb+0x66/0x80
Nov 12 15:07:55 Gaming kernel: iommu_flush_iova+0x5c/0xa7
Nov 12 15:07:55 Gaming kernel: iova_domain_flush+0x18/0x22
Nov 12 15:07:55 Gaming kernel: fq_flush_timeout+0x2e/0x90
Nov 12 15:07:55 Gaming kernel: call_timer_fn+0x12/0x6f
Nov 12 15:07:55 Gaming kernel: ? fq_ring_free+0x96/0x96
Nov 12 15:07:55 Gaming kernel: expire_timers+0x7f/0x8e
Nov 12 15:07:55 Gaming kernel: run_timer_softirq+0x72/0x120
Nov 12 15:07:55 Gaming kernel: ? __hrtimer_run_queues+0xbd/0x105
Nov 12 15:07:55 Gaming kernel: ? recalibrate_cpu_khz+0x1/0x1
Nov 12 15:07:55 Gaming kernel: ? ktime_get+0x3a/0x8d
Nov 12 15:07:55 Gaming kernel: __do_softirq+0xce/0x1c8
Nov 12 15:07:55 Gaming kernel: irq_exit+0x56/0x95
Nov 12 15:07:55 Gaming kernel: smp_apic_timer_interrupt+0x7e/0x89
Nov 12 15:07:55 Gaming kernel: apic_timer_interrupt+0xf/0x20
Nov 12 15:07:55 Gaming kernel: </IRQ>
Nov 12 15:07:55 Gaming kernel: RIP: 0010:cpuidle_enter_state+0xe8/0x141
Nov 12 15:07:55 Gaming kernel: Code: ff 45 84 ff 74 1d 9c 58 0f 1f 44 00 00 0f ba e0 09 73 09 0f 0b fa 66 0f 1f 44 00 00 31 ff e8 e2 b7 be ff fb 66 0f 1f 44 00 00 <48> 2b 1c 24 b8 ff ff ff 7f 48 b9 ff ff ff ff f3 01 00 00 48 39 cb 
Nov 12 15:07:55 Gaming kernel: RSP: 0018:ffffc90003353ea0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Nov 12 15:07:55 Gaming kernel: RAX: ffff88046fc60c00 RBX: 000008bfdf1b91be RCX: 000000000000001f
Nov 12 15:07:55 Gaming kernel: RDX: 000008bfdf1b91be RSI: 0000000000000000 RDI: 0000000000000000
Nov 12 15:07:55 Gaming kernel: RBP: ffff88046fc69370 R08: 00001a75706d8f8a R09: 00000000000002e9
Nov 12 15:07:55 Gaming kernel: R10: 00000000001feeac R11: 071c71c71c71c71c R12: 0000000000000004
Nov 12 15:07:55 Gaming kernel: R13: 0000000000000004 R14: ffffffff81e588d8 R15: 0000000000000000
Nov 12 15:07:55 Gaming kernel: do_idle+0x192/0x20e
Nov 12 15:07:55 Gaming kernel: cpu_startup_entry+0x6a/0x6c
Nov 12 15:07:55 Gaming kernel: start_secondary+0x197/0x1b2
Nov 12 15:07:55 Gaming kernel: secondary_startup_64+0xa5/0xb0
Nov 12 15:07:55 Gaming kernel: Sending NMI from CPU 11 to CPUs 35:
Nov 12 15:07:55 Gaming kernel: NMI backtrace for cpu 35
Nov 12 15:07:55 Gaming kernel: CPU: 35 PID: 36818 Comm: CPU 15/KVM Tainted: G           O      4.18.17-unRAID #1
Nov 12 15:07:55 Gaming kernel: Hardware name: Cirrascale VB1416/GA-7PESH2, BIOS R17 06/26/2018
Nov 12 15:07:55 Gaming kernel: RIP: 0010:_raw_spin_lock+0xb/0x19
Nov 12 15:07:55 Gaming kernel: Code: ff 48 29 e8 48 3d 24 f4 00 00 77 aa b8 c9 00 00 00 eb cb 89 d8 5b 5d c3 90 90 90 90 90 90 90 31 c0 ba 01 00 00 00 f0 0f b1 17 <85> c0 74 09 89 c6 e8 fc 9a a4 ff 66 90 c3 fa 66 0f 1f 44 00 00 31 
Nov 12 15:07:55 Gaming kernel: RSP: 0018:ffffc900042c3b48 EFLAGS: 00000087
Nov 12 15:07:55 Gaming kernel: RAX: 0000000000600000 RBX: 0000000000000100 RCX: ffff88046fc617c0
Nov 12 15:07:55 Gaming kernel: RDX: 0000000000000001 RSI: 0000000000900000 RDI: ffff88046f421dc0
Nov 12 15:07:55 Gaming kernel: RBP: ffffc900042c3ba8 R08: 00000000006c0000 R09: ffff88046f421dc0
Nov 12 15:07:55 Gaming kernel: R10: 000000000000020c R11: 0000000000000082 R12: 0000000000000830
Nov 12 15:07:55 Gaming kernel: R13: ffff88046f421dc0 R14: 0000000000000082 R15: ffff88046f427400
Nov 12 15:07:55 Gaming kernel: FS:  00001495e7ee3700(0000) GS:ffff88087fbc0000(0000) knlGS:0000000000000000
Nov 12 15:07:55 Gaming kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 12 15:07:55 Gaming kernel: CR2: ffffb38e68cfcc08 CR3: 0000000412c4a002 CR4: 00000000001626e0
Nov 12 15:07:55 Gaming kernel: Call Trace:
Nov 12 15:07:55 Gaming kernel: qi_submit_sync+0x265/0x2db
Nov 12 15:07:55 Gaming kernel: modify_irte+0xe3/0x129
Nov 12 15:07:55 Gaming kernel: intel_irq_remapping_deactivate+0x2d/0x47
Nov 12 15:07:55 Gaming kernel: __irq_domain_deactivate_irq+0x27/0x33
Nov 12 15:07:55 Gaming kernel: irq_domain_deactivate_irq+0x15/0x22
Nov 12 15:07:55 Gaming kernel: __free_irq+0x10a/0x216
Nov 12 15:07:55 Gaming kernel: free_irq+0x42/0x59
Nov 12 15:07:55 Gaming kernel: vfio_msi_set_vector_signal+0x72/0x233
Nov 12 15:07:55 Gaming kernel: ? kvm_fast_pio+0x10a/0x147 [kvm]
Nov 12 15:07:55 Gaming kernel: vfio_msi_set_block+0x64/0x96
Nov 12 15:07:55 Gaming kernel: vfio_msi_disable+0x61/0xa0
Nov 12 15:07:55 Gaming kernel: vfio_pci_set_msi_trigger+0x44/0x228
Nov 12 15:07:55 Gaming kernel: ? pci_bus_read_config_word+0x44/0x66
Nov 12 15:07:55 Gaming kernel: vfio_pci_ioctl+0x4e1/0x974
Nov 12 15:07:55 Gaming kernel: ? vfio_msi_config_write+0x7b/0x89
Nov 12 15:07:55 Gaming kernel: ? __seccomp_filter+0x39/0x1ed
Nov 12 15:07:55 Gaming kernel: vfs_ioctl+0x19/0x26
Nov 12 15:07:55 Gaming kernel: do_vfs_ioctl+0x518/0x540
Nov 12 15:07:55 Gaming kernel: ksys_ioctl+0x39/0x58
Nov 12 15:07:55 Gaming kernel: __x64_sys_ioctl+0x11/0x14
Nov 12 15:07:55 Gaming kernel: do_syscall_64+0x57/0xe6
Nov 12 15:07:55 Gaming kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 12 15:07:55 Gaming kernel: RIP: 0033:0x1495f20e0427
Nov 12 15:07:55 Gaming kernel: Code: 00 00 90 48 8b 05 69 0a 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 0a 0d 00 f7 d8 64 89 01 48 
Nov 12 15:07:55 Gaming kernel: RSP: 002b:00001495e7ee0be8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Nov 12 15:07:55 Gaming kernel: RAX: ffffffffffffffda RBX: 000014936573ec00 RCX: 00001495f20e0427
Nov 12 15:07:55 Gaming kernel: RDX: 00001495e7ee0bf0 RSI: 0000000000003b6e RDI: 000000000000003e
Nov 12 15:07:55 Gaming kernel: RBP: 0000000000000001 R08: 0000000000000000 R09: 000000000000006c
Nov 12 15:07:55 Gaming kernel: R10: 000000000000006c R11: 0000000000000246 R12: 0000000000000002
Nov 12 15:07:55 Gaming kernel: R13: 000000000000006a R14: 0000000000000080 R15: 0000000000000002


M/B: GIGABYTE - GA-7PESH2

CPU: Intel® Xeon® CPU E5-2690 v2 @ 3.00GHz

HVM: Enabled

IOMMU: Enabled

Cache: 320 kB, 2560 kB, 25600 kB

Memory: 32 GB Single-bit ECC (max. installable capacity 768 GB)

Network: bond0: fault-tolerance (active-backup), mtu 1500 
 eth0: 1000 Mb/s, full duplex, mtu 1500 
 eth1: 1000 Mb/s, full duplex, mtu 1500

Kernel: Linux 4.18.17-unRAID x86_64

OpenSSL: 1.1.1

GPU: NVIDIA GTX 1080

If any more information is needed please let me know.
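Looking at the two NMI backtraces together: CPU 27 is spinning on a lock inside qi_submit_sync() while a timer flushes the IOTLB (fq_flush_timeout -> qi_flush_iotlb), and CPU 35 (the VM's vCPU thread, "CPU 15/KVM") is spinning inside the same function while VFIO tears down the guest's MSI interrupts (vfio_msi_disable -> free_irq -> modify_irte). qi_submit_sync() is the Intel VT-d queued-invalidation path, which is consistent with the stall appearing when the VM with passed-through devices shuts down. A small script that groups the frames per CPU and prints the ones they share makes this kind of overlap easier to spot in longer logs. This is a minimal sketch, assuming the syslog line format shown above; LOG is a trimmed excerpt of this trace, and frames_per_cpu/shared_frames are hypothetical helper names. A shared frame is a hint about where to look, not a root-cause diagnosis.

```python
import re

# Trimmed excerpt of the trace above (in practice, read the full syslog instead).
LOG = """\
Nov 12 15:07:55 Gaming kernel: NMI backtrace for cpu 27
Nov 12 15:07:55 Gaming kernel: RIP: 0010:native_queued_spin_lock_slowpath+0xfd/0x16d
Nov 12 15:07:55 Gaming kernel: _raw_spin_lock+0x16/0x19
Nov 12 15:07:55 Gaming kernel: qi_submit_sync+0x265/0x2db
Nov 12 15:07:55 Gaming kernel: qi_flush_iotlb+0x66/0x80
Nov 12 15:07:55 Gaming kernel: ? fq_ring_free+0x96/0x96
Nov 12 15:07:55 Gaming kernel: fq_flush_timeout+0x2e/0x90
Nov 12 15:07:55 Gaming kernel: NMI backtrace for cpu 35
Nov 12 15:07:55 Gaming kernel: RIP: 0010:_raw_spin_lock+0xb/0x19
Nov 12 15:07:55 Gaming kernel: qi_submit_sync+0x265/0x2db
Nov 12 15:07:55 Gaming kernel: modify_irte+0xe3/0x129
Nov 12 15:07:55 Gaming kernel: vfio_msi_disable+0x61/0xa0
"""

CPU_RE = re.compile(r"NMI backtrace for cpu (\d+)")
# A call-trace frame looks like "func+0xOFF/0xSIZE", optionally prefixed with "? "
# (the kernel marks frames it is unsure about with "?"). "RIP:" lines do not match.
FRAME_RE = re.compile(r"kernel:\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frames_per_cpu(log: str) -> dict[int, list[str]]:
    """Map each CPU that got an NMI backtrace to its reliable frames, innermost first."""
    traces: dict[int, list[str]] = {}
    cpu = None
    for line in log.splitlines():
        m = CPU_RE.search(line)
        if m:
            cpu = int(m.group(1))
            traces[cpu] = []
            continue
        f = FRAME_RE.search(line)
        # Skip frames marked "?" and anything before the first "NMI backtrace" line.
        if f and cpu is not None and not f.group(1):
            traces[cpu].append(f.group(2))
    return traces


def shared_frames(traces: dict[int, list[str]]) -> set[str]:
    """Functions that appear in every CPU's backtrace."""
    sets = [set(frames) for frames in traces.values()]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    traces = frames_per_cpu(LOG)
    for cpu, frames in traces.items():
        print(f"cpu {cpu}: {' <- '.join(frames)}")
    print("shared:", sorted(shared_frames(traces)))
```

Run against the excerpt, it reports qi_submit_sync as the only function common to both CPUs.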
 


So today, after 6 days without problems, I had to restart the VM because of an NVIDIA GameStream issue. I restarted from INSIDE the VM and all of a sudden the server just took a nosedive: I couldn't connect at all and it was completely locked up. I did an unclean reboot and everything is working fine again. Is there anything else I should be doing here with the Windows 10 VM and the NVIDIA GTX 1080? I don't want to have to worry about a VM reboot causing a hard crash.


After reviewing your logs and hardware configuration, my bet is that this is the result of using a Gigabyte motherboard. I'm sorry to say it, but I've had nothing but problems with that hardware, both myself and with other users trying to use it. For whatever reason, Gigabyte hardware (motherboards, GPUs, etc.) seems to have issues that mainstream vendors like ASUS, ASRock, and Supermicro just don't have. All I can suggest is that you check for a BIOS update, but I'd bet heavily on switching motherboards resolving the issue. Otherwise, you'll have to wait for the 4.20 kernel to see whether it includes any fixes specific to your hardware/drivers, though I doubt that will solve your issue.

If there were something more specific in the logs/call trace, I would gladly investigate, but the generic messages and the lack of any smoking gun lead me to look at the hardware. Without having seen it yet, I was already betting that either the GPU or the motherboard was Gigabyte-based, and it appears that at least the motherboard is.

If you can reproduce the issue using a different motherboard, then I think we'd have something to look at, but considering how many people here are using Xeon CPUs and NVIDIA GPUs with VMs and no issues, I doubt that's going to be the case.


That's very unfortunate to hear. I do have an ASUS motherboard and can keep the Gigabyte one as a backup only. Just to make sure: I can use the same USB boot drive on another motherboard, correct? Are there any other changes I need to make besides the obvious ones?

Edit: I wanted to add that I'm also passing through the board's USB 3.0 hub. I just read elsewhere that others had issues when the hub was included in the passthrough, so it might be worth investigating.

Edited by slimshizn

I wanted to reply here in case anyone is following or reading this. As a workaround for now, I logged out of the VM completely instead of selecting reboot or shutdown, and THEN shut it down. Doing this let me avoid the server lockup/crash. I am still troubleshooting everything I can before I use this motherboard only for file serving and VMs without passthrough.

Edit: after disabling VFIO allow interrupts, rebooting the server, and shutting down the VM the same way, I had another crash. I'm enabling VFIO allow interrupts again to see whether that setting is what made this workaround succeed or whether it was just a coincidence.
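Because this setting is being toggled as a test variable across reboots, it helps to confirm which state the host actually booted with each time. A minimal sketch, assuming (this thread does not confirm it) that Unraid's "VFIO allow interrupts" option works by adding vfio_iommu_type1.allow_unsafe_interrupts=1 to the kernel command line; /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts is where the kernel exposes that module parameter. The function names are hypothetical.

```python
from pathlib import Path

# sysfs location of the vfio_iommu_type1 "allow_unsafe_interrupts" module parameter.
PARAM = Path("/sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts")
CMDLINE = Path("/proc/cmdline")
CMDLINE_KEY = "vfio_iommu_type1.allow_unsafe_interrupts"


def parse_bool(text: str) -> bool:
    """sysfs reports bool params as 'Y'/'N'; treat common truthy spellings as true."""
    return text.strip().lower() in {"y", "yes", "1", "on", "true"}


def unsafe_interrupts_status(param: Path = PARAM, cmdline: Path = CMDLINE) -> dict:
    """Report the live parameter value and what the boot command line requested.

    None means "not available" (module not present, or key absent from the cmdline).
    """
    status = {"live": None, "cmdline": None}
    if param.exists():
        status["live"] = parse_bool(param.read_text())
    if cmdline.exists():
        for token in cmdline.read_text().split():
            key, sep, value = token.partition("=")
            if key == CMDLINE_KEY:
                # A bool parameter given without "=value" means true.
                status["cmdline"] = parse_bool(value) if sep else True
    return status


if __name__ == "__main__":
    print(unsafe_interrupts_status())
```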

Edit 2: It seems to have been the USB controller I had passed through. After taking it out of the equation, I had a single error, which I'm still working out.

Edited by slimshizn
  • 1 year later...
On 12/5/2018 at 5:31 PM, slimshizn said:

Edit 2: Seems to have been the USB controller I had passed through. Took that out of the equation and had a single error which I'm still working out. 

I have the same motherboard and am having the same issues. Which USB controller were you passing through?
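To see which USB controllers a given host could pass through, it can help to list every IOMMU group that contains a USB controller together with whatever else shares that group, since VFIO only lets a guest use a device once every device in its group is bound to vfio-pci (or left without a host driver). A minimal sketch, assuming a Linux host with VT-d enabled and the standard sysfs layout under /sys/kernel/iommu_groups; the function name is hypothetical, and `lspci -nnk` gives human-readable names for the printed PCI addresses.

```python
from pathlib import Path

IOMMU_ROOT = Path("/sys/kernel/iommu_groups")
USB_CLASS_PREFIX = "0x0c03"  # PCI base class 0x0c (serial bus), subclass 0x03 (USB)


def usb_iommu_groups(root: Path = IOMMU_ROOT) -> dict[str, list[str]]:
    """Return {group number: [PCI addresses]} for every IOMMU group that contains a
    USB controller, listing all devices in that group, not just the controller."""
    groups: dict[str, list[str]] = {}
    if not root.is_dir():
        return groups  # IOMMU disabled, or not a Linux host
    for group in sorted(root.iterdir(), key=lambda p: int(p.name)):
        devices = sorted((group / "devices").iterdir())
        # Each device entry links to its PCI device, whose "class" file reads e.g. "0x0c0330".
        if any((dev / "class").read_text().strip().startswith(USB_CLASS_PREFIX)
               for dev in devices):
            groups[group.name] = [dev.name for dev in devices]
    return groups


if __name__ == "__main__":
    for group, devices in usb_iommu_groups().items():
        print(f"IOMMU group {group}: {', '.join(devices)}")
```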

 
