Sudden call traces: KVM related?



Started getting call traces, which seem to occur while previously working VMs are running:

Feb 28 13:58:53 OCHO kernel: CPU: 33 PID: 17724 Comm: unraidd Not tainted 4.18.20-unRAID #1
Feb 28 13:58:53 OCHO kernel: Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
Feb 28 13:58:53 OCHO kernel: Call Trace:
Feb 28 13:58:53 OCHO kernel: <IRQ>
Feb 28 13:58:53 OCHO kernel: dump_stack+0x5d/0x79
Feb 28 13:58:53 OCHO kernel: nmi_cpu_backtrace+0x71/0x83
Feb 28 13:58:53 OCHO kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Feb 28 13:58:53 OCHO kernel: nmi_trigger_cpumask_backtrace+0x57/0xd7
Feb 28 13:58:53 OCHO kernel: rcu_dump_cpu_stacks+0x91/0xbb
Feb 28 13:58:53 OCHO kernel: rcu_check_callbacks+0x23f/0x5ca
Feb 28 13:58:53 OCHO kernel: ? tick_sched_handle.isra.5+0x2f/0x2f
Feb 28 13:58:53 OCHO kernel: update_process_times+0x23/0x45
Feb 28 13:58:53 OCHO kernel: tick_sched_timer+0x36/0x64
Feb 28 13:58:53 OCHO kernel: __hrtimer_run_queues+0xb1/0x105
Feb 28 13:58:53 OCHO kernel: hrtimer_interrupt+0xf4/0x20d
Feb 28 13:58:53 OCHO kernel: smp_apic_timer_interrupt+0x79/0x91
Feb 28 13:58:53 OCHO kernel: apic_timer_interrupt+0xf/0x20
Feb 28 13:58:53 OCHO kernel: </IRQ>
Feb 28 13:58:53 OCHO kernel: RIP: 0010:xor_avx_5+0x1c5/0x352
Feb 28 13:58:53 OCHO kernel: Code: c5 fd 7f 98 e0 00 00 00 c4 c1 7d 6f 82 00 01 00 00 c4 c1 7c 57 83 00 01 00 00 c5 fc 57 83 00 01 00 00 c5 fc 57 85 00 01 00 00 <c5> fc 57 80 00 01 00 00 c5 fd 7f 80 00 01 00 00 c4 c1 7d 6f 8a 20 
Feb 28 13:58:53 OCHO kernel: RSP: 0018:ffffc9000b6cfc68 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff13
Feb 28 13:58:53 OCHO kernel: RAX: ffff880e118ada00 RBX: ffff880e118a5a00 RCX: ffff880e118a5000
Feb 28 13:58:53 OCHO kernel: RDX: 0000000000000000 RSI: ffff880e118ad000 RDI: 0000000000001000
Feb 28 13:58:53 OCHO kernel: RBP: ffff880e118a4a00 R08: ffff880e118a6000 R09: ffff880e118a7000
Feb 28 13:58:53 OCHO kernel: R10: ffff880e118a7a00 R11: ffff880e118a6a00 R12: 0000000000000a00
Feb 28 13:58:53 OCHO kernel: R13: ffff880e118ad000 R14: ffff880e118a4000 R15: ffff880e118a5000
Feb 28 13:58:53 OCHO kernel: ? xor_avx_5+0x2d/0x352
Feb 28 13:58:53 OCHO kernel: check_parity+0x118/0x349 [md_mod]
Feb 28 13:58:53 OCHO kernel: handle_stripe+0xe8a/0x1226 [md_mod]
Feb 28 13:58:53 OCHO kernel: unraidd+0xbc/0x123 [md_mod]
Feb 28 13:58:53 OCHO kernel: ? md_open+0x2c/0x2c [md_mod]
Feb 28 13:58:53 OCHO kernel: md_thread+0xcc/0xf1 [md_mod]
Feb 28 13:58:53 OCHO kernel: ? wait_woken+0x68/0x68
Feb 28 13:58:53 OCHO kernel: kthread+0x10b/0x113
Feb 28 13:58:53 OCHO kernel: ? kthread_flush_work_fn+0x9/0x9
Feb 28 13:58:53 OCHO kernel: ret_from_fork+0x35/0x40
Feb 28 14:01:53 OCHO kernel: INFO: rcu_sched self-detected stall on CPU
Feb 28 14:01:53 OCHO kernel: 	33-....: (1140023 ticks this GP) idle=92a/1/4611686018427387906 softirq=714962/714976 fqs=282448 
Feb 28 14:01:53 OCHO kernel: 	 (t=1140024 jiffies g=275169 c=275168 q=9034795)
Feb 28 14:01:53 OCHO kernel: NMI backtrace for cpu 33
Feb 28 14:01:53 OCHO kernel: CPU: 33 PID: 17724 Comm: unraidd Not tainted 4.18.20-unRAID #1
Feb 28 14:01:53 OCHO kernel: Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
Feb 28 14:01:53 OCHO kernel: Call Trace:
Feb 28 14:01:53 OCHO kernel: <IRQ>
Feb 28 14:01:53 OCHO kernel: dump_stack+0x5d/0x79
Feb 28 14:01:53 OCHO kernel: nmi_cpu_backtrace+0x71/0x83
Feb 28 14:01:53 OCHO kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Feb 28 14:01:53 OCHO kernel: nmi_trigger_cpumask_backtrace+0x57/0xd7
Feb 28 14:01:53 OCHO kernel: rcu_dump_cpu_stacks+0x91/0xbb
Feb 28 14:01:53 OCHO kernel: rcu_check_callbacks+0x23f/0x5ca
Feb 28 14:01:53 OCHO kernel: ? tick_sched_handle.isra.5+0x2f/0x2f
Feb 28 14:01:53 OCHO kernel: update_process_times+0x23/0x45
Feb 28 14:01:53 OCHO kernel: tick_sched_timer+0x36/0x64
Feb 28 14:01:53 OCHO kernel: __hrtimer_run_queues+0xb1/0x105
Feb 28 14:01:53 OCHO kernel: hrtimer_interrupt+0xf4/0x20d
Feb 28 14:01:53 OCHO kernel: smp_apic_timer_interrupt+0x79/0x91
Feb 28 14:01:53 OCHO kernel: apic_timer_interrupt+0xf/0x20
Feb 28 14:01:53 OCHO kernel: </IRQ>
Feb 28 14:01:53 OCHO kernel: RIP: 0010:unraidd+0xb1/0x123 [md_mod]
Feb 28 14:01:53 OCHO kernel: Code: 48 89 12 48 89 52 08 f0 80 62 20 fe f0 ff 42 28 8b 42 28 ff c8 74 02 0f 0b 48 89 df c6 07 00 0f 1f 40 00 fb 66 0f 1f 44 00 00 <4c> 89 ff 41 ff c5 e8 1e ed ff ff 4c 89 ff e8 69 e1 ff ff 48 89 df 
Feb 28 14:01:53 OCHO kernel: RSP: 0018:ffffc9000b6cfe50 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Feb 28 14:01:53 OCHO kernel: RAX: 0000000000000000 RBX: ffff880e27bf6268 RCX: ffff880e1643d818
Feb 28 14:01:53 OCHO kernel: RDX: ffff880e129489d8 RSI: 0000000000000046 RDI: ffff880e27bf6268
Feb 28 14:01:53 OCHO kernel: RBP: ffffc9000b6cfeb8 R08: 0000000000000000 R09: ffffc9000b6cfdb8
Feb 28 14:01:53 OCHO kernel: R10: 0000000000000001 R11: ffff880e19d36000 R12: ffff880e27bf6000
Feb 28 14:01:53 OCHO kernel: R13: 0000000001ebd7e8 R14: ffff880e27bf6220 R15: ffff880e129489c8
Feb 28 14:01:53 OCHO kernel: ? md_open+0x2c/0x2c [md_mod]
Feb 28 14:01:53 OCHO kernel: md_thread+0xcc/0xf1 [md_mod]
Feb 28 14:01:53 OCHO kernel: ? wait_woken+0x68/0x68
Feb 28 14:01:53 OCHO kernel: kthread+0x10b/0x113
Feb 28 14:01:53 OCHO kernel: ? kthread_flush_work_fn+0x9/0x9
Feb 28 14:01:53 OCHO kernel: ret_from_fork+0x35/0x40
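One way to check whether a trace like this is actually KVM related is to pull out the stalled task (`Comm:`), the function the CPU was executing (`RIP:`), and the modules named in the frames. The minimal Python sketch below does that for a trimmed excerpt of the log above. `summarize` and `ASSUMED_HZ` are hypothetical names, and converting the `t=` jiffies count to seconds assumes CONFIG_HZ=1000, which has not been verified for the unRAID kernel.

```python
import re

# Trimmed excerpt of the 14:01:53 trace posted above; in practice you would
# read the syslog file instead (e.g. the one inside the diagnostics zip).
TRACE = """\
Feb 28 14:01:53 OCHO kernel: INFO: rcu_sched self-detected stall on CPU
Feb 28 14:01:53 OCHO kernel: \t (t=1140024 jiffies g=275169 c=275168 q=9034795)
Feb 28 14:01:53 OCHO kernel: CPU: 33 PID: 17724 Comm: unraidd Not tainted 4.18.20-unRAID #1
Feb 28 14:01:53 OCHO kernel: RIP: 0010:unraidd+0xb1/0x123 [md_mod]
Feb 28 14:01:53 OCHO kernel: md_thread+0xcc/0xf1 [md_mod]
Feb 28 14:01:53 OCHO kernel: kthread+0x10b/0x113
"""

# Assumption: the kernel was built with CONFIG_HZ=1000 (1 jiffy = 1 ms).
ASSUMED_HZ = 1000


def summarize(log: str, hz: int = ASSUMED_HZ) -> dict:
    """Extract the stalled task, its RIP, module names and stall length."""
    summary = {"comm": None, "pid": None, "cpu": None, "rip": None,
               "modules": set(), "stall_seconds": None}
    for line in log.splitlines():
        # Drop the "Mon DD HH:MM:SS host kernel: " syslog prefix.
        msg = line.split("kernel: ", 1)[-1].strip()
        if m := re.match(r"CPU: (\d+) PID: (\d+) Comm: (\S+)", msg):
            summary["cpu"], summary["pid"], summary["comm"] = int(m[1]), int(m[2]), m[3]
        elif m := re.match(r"RIP: \w+:(\S+)", msg):
            summary["rip"] = m[1]
        elif m := re.search(r"\(t=(\d+) jiffies", msg):
            # t= is the number of jiffies counted for the stalled grace period.
            summary["stall_seconds"] = int(m[1]) / hz
        # Frames such as "md_thread+0xcc/0xf1 [md_mod]" name their module in brackets.
        if m := re.search(r"\[(\w+)\]\s*$", msg):
            summary["modules"].add(m[1])
    return summary


s = summarize(TRACE)
print(s["comm"], s["rip"], sorted(s["modules"]), s["stall_seconds"])
# unraidd unraidd+0xb1/0x123 ['md_mod'] 1140.024
```

On this log the sketch reports `unraidd` running `unraidd+0xb1/0x123`, with `md_mod` as the only module in the frames. That is the Unraid array driver's thread (the earlier trace shows it in `check_parity` and `xor_avx_5`), not a qemu/KVM task. The stall length is only as good as the HZ assumption: at HZ=250 the same count would be about 76 minutes rather than 19.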

 

Can I get some guidance on what I'm looking at here?

EDIT:

Running Unraid 6.6.6.

Diagnostics attached:

ocho-diagnostics-20190228-1409.zip

Edited by therapist
5 minutes ago, johnnie.black said:

Was there a parity check going on? These look like the tunable-related call traces.

There was, in fact.

I have:

Tunable (nr_requests): 128 (default)
Tunable (md_num_stripes): 4096 (user-set)
Tunable (md_sync_window): 2048 (user-set)
Tunable (md_sync_thresh): 2000 (user-set)

These have been set this way for quite some time. Why would a parity check cause a KVM lockup?

Edited by therapist
