therapist Posted February 28, 2019 (edited)

Started getting call traces, which seem to occur when previously functional VMs are running:

Feb 28 13:58:53 OCHO kernel: CPU: 33 PID: 17724 Comm: unraidd Not tainted 4.18.20-unRAID #1
Feb 28 13:58:53 OCHO kernel: Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
Feb 28 13:58:53 OCHO kernel: Call Trace:
Feb 28 13:58:53 OCHO kernel: <IRQ>
Feb 28 13:58:53 OCHO kernel: dump_stack+0x5d/0x79
Feb 28 13:58:53 OCHO kernel: nmi_cpu_backtrace+0x71/0x83
Feb 28 13:58:53 OCHO kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Feb 28 13:58:53 OCHO kernel: nmi_trigger_cpumask_backtrace+0x57/0xd7
Feb 28 13:58:53 OCHO kernel: rcu_dump_cpu_stacks+0x91/0xbb
Feb 28 13:58:53 OCHO kernel: rcu_check_callbacks+0x23f/0x5ca
Feb 28 13:58:53 OCHO kernel: ? tick_sched_handle.isra.5+0x2f/0x2f
Feb 28 13:58:53 OCHO kernel: update_process_times+0x23/0x45
Feb 28 13:58:53 OCHO kernel: tick_sched_timer+0x36/0x64
Feb 28 13:58:53 OCHO kernel: __hrtimer_run_queues+0xb1/0x105
Feb 28 13:58:53 OCHO kernel: hrtimer_interrupt+0xf4/0x20d
Feb 28 13:58:53 OCHO kernel: smp_apic_timer_interrupt+0x79/0x91
Feb 28 13:58:53 OCHO kernel: apic_timer_interrupt+0xf/0x20
Feb 28 13:58:53 OCHO kernel: </IRQ>
Feb 28 13:58:53 OCHO kernel: RIP: 0010:xor_avx_5+0x1c5/0x352
Feb 28 13:58:53 OCHO kernel: Code: c5 fd 7f 98 e0 00 00 00 c4 c1 7d 6f 82 00 01 00 00 c4 c1 7c 57 83 00 01 00 00 c5 fc 57 83 00 01 00 00 c5 fc 57 85 00 01 00 00 <c5> fc 57 80 00 01 00 00 c5 fd 7f 80 00 01 00 00 c4 c1 7d 6f 8a 20
Feb 28 13:58:53 OCHO kernel: RSP: 0018:ffffc9000b6cfc68 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff13
Feb 28 13:58:53 OCHO kernel: RAX: ffff880e118ada00 RBX: ffff880e118a5a00 RCX: ffff880e118a5000
Feb 28 13:58:53 OCHO kernel: RDX: 0000000000000000 RSI: ffff880e118ad000 RDI: 0000000000001000
Feb 28 13:58:53 OCHO kernel: RBP: ffff880e118a4a00 R08: ffff880e118a6000 R09: ffff880e118a7000
Feb 28 13:58:53 OCHO kernel: R10: ffff880e118a7a00 R11: ffff880e118a6a00 R12: 0000000000000a00
Feb 28 13:58:53 OCHO kernel: R13: ffff880e118ad000 R14: ffff880e118a4000 R15: ffff880e118a5000
Feb 28 13:58:53 OCHO kernel: ? xor_avx_5+0x2d/0x352
Feb 28 13:58:53 OCHO kernel: check_parity+0x118/0x349 [md_mod]
Feb 28 13:58:53 OCHO kernel: handle_stripe+0xe8a/0x1226 [md_mod]
Feb 28 13:58:53 OCHO kernel: unraidd+0xbc/0x123 [md_mod]
Feb 28 13:58:53 OCHO kernel: ? md_open+0x2c/0x2c [md_mod]
Feb 28 13:58:53 OCHO kernel: md_thread+0xcc/0xf1 [md_mod]
Feb 28 13:58:53 OCHO kernel: ? wait_woken+0x68/0x68
Feb 28 13:58:53 OCHO kernel: kthread+0x10b/0x113
Feb 28 13:58:53 OCHO kernel: ? kthread_flush_work_fn+0x9/0x9
Feb 28 13:58:53 OCHO kernel: ret_from_fork+0x35/0x40
Feb 28 14:01:53 OCHO kernel: INFO: rcu_sched self-detected stall on CPU
Feb 28 14:01:53 OCHO kernel: 33-....: (1140023 ticks this GP) idle=92a/1/4611686018427387906 softirq=714962/714976 fqs=282448
Feb 28 14:01:53 OCHO kernel: (t=1140024 jiffies g=275169 c=275168 q=9034795)
Feb 28 14:01:53 OCHO kernel: NMI backtrace for cpu 33
Feb 28 14:01:53 OCHO kernel: CPU: 33 PID: 17724 Comm: unraidd Not tainted 4.18.20-unRAID #1
Feb 28 14:01:53 OCHO kernel: Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
Feb 28 14:01:53 OCHO kernel: Call Trace:
Feb 28 14:01:53 OCHO kernel: <IRQ>
Feb 28 14:01:53 OCHO kernel: dump_stack+0x5d/0x79
Feb 28 14:01:53 OCHO kernel: nmi_cpu_backtrace+0x71/0x83
Feb 28 14:01:53 OCHO kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Feb 28 14:01:53 OCHO kernel: nmi_trigger_cpumask_backtrace+0x57/0xd7
Feb 28 14:01:53 OCHO kernel: rcu_dump_cpu_stacks+0x91/0xbb
Feb 28 14:01:53 OCHO kernel: rcu_check_callbacks+0x23f/0x5ca
Feb 28 14:01:53 OCHO kernel: ? tick_sched_handle.isra.5+0x2f/0x2f
Feb 28 14:01:53 OCHO kernel: update_process_times+0x23/0x45
Feb 28 14:01:53 OCHO kernel: tick_sched_timer+0x36/0x64
Feb 28 14:01:53 OCHO kernel: __hrtimer_run_queues+0xb1/0x105
Feb 28 14:01:53 OCHO kernel: hrtimer_interrupt+0xf4/0x20d
Feb 28 14:01:53 OCHO kernel: smp_apic_timer_interrupt+0x79/0x91
Feb 28 14:01:53 OCHO kernel: apic_timer_interrupt+0xf/0x20
Feb 28 14:01:53 OCHO kernel: </IRQ>
Feb 28 14:01:53 OCHO kernel: RIP: 0010:unraidd+0xb1/0x123 [md_mod]
Feb 28 14:01:53 OCHO kernel: Code: 48 89 12 48 89 52 08 f0 80 62 20 fe f0 ff 42 28 8b 42 28 ff c8 74 02 0f 0b 48 89 df c6 07 00 0f 1f 40 00 fb 66 0f 1f 44 00 00 <4c> 89 ff 41 ff c5 e8 1e ed ff ff 4c 89 ff e8 69 e1 ff ff 48 89 df
Feb 28 14:01:53 OCHO kernel: RSP: 0018:ffffc9000b6cfe50 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Feb 28 14:01:53 OCHO kernel: RAX: 0000000000000000 RBX: ffff880e27bf6268 RCX: ffff880e1643d818
Feb 28 14:01:53 OCHO kernel: RDX: ffff880e129489d8 RSI: 0000000000000046 RDI: ffff880e27bf6268
Feb 28 14:01:53 OCHO kernel: RBP: ffffc9000b6cfeb8 R08: 0000000000000000 R09: ffffc9000b6cfdb8
Feb 28 14:01:53 OCHO kernel: R10: 0000000000000001 R11: ffff880e19d36000 R12: ffff880e27bf6000
Feb 28 14:01:53 OCHO kernel: R13: 0000000001ebd7e8 R14: ffff880e27bf6220 R15: ffff880e129489c8
Feb 28 14:01:53 OCHO kernel: ? md_open+0x2c/0x2c [md_mod]
Feb 28 14:01:53 OCHO kernel: md_thread+0xcc/0xf1 [md_mod]
Feb 28 14:01:53 OCHO kernel: ? wait_woken+0x68/0x68
Feb 28 14:01:53 OCHO kernel: kthread+0x10b/0x113
Feb 28 14:01:53 OCHO kernel: ? kthread_flush_work_fn+0x9/0x9
Feb 28 14:01:53 OCHO kernel: ret_from_fork+0x35/0x40

Can I get some guidance on what I may be looking at here?

EDIT: running 6.6.6, diagnostics attached: ocho-diagnostics-20190228-1409.zip

Edited February 28, 2019 by therapist

JorgeB Posted February 28, 2019

Was there a parity check going on? These look like the tunable-related call traces.
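
For anyone who wants to check a syslog for the same pattern, here is a minimal Python sketch. It only counts the "rcu_sched self-detected stall" lines and lists the functions in frames tagged [md_mod], as they appear in the trace above. The function name summarize_stalls and the regular expressions are illustrative assumptions based on the log format in this thread, not part of any Unraid tool.

```python
import re

# Matches the stall banner exactly as it appears in the log quoted in this thread.
STALL_RE = re.compile(r"rcu_sched self-detected stall on CPU")

# Matches reliable stack frames (no leading "?") that belong to the md_mod module,
# e.g. "kernel: check_parity+0x118/0x349 [md_mod]".
MD_FRAME_RE = re.compile(r"kernel:\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[md_mod\]")


def summarize_stalls(log_text):
    """Return (stall_count, md_mod_functions) for a block of syslog text.

    md_mod_functions keeps first-seen order and drops duplicates.
    """
    stall_count = 0
    md_functions = []
    for line in log_text.splitlines():
        if STALL_RE.search(line):
            stall_count += 1
        match = MD_FRAME_RE.search(line)
        if match and match.group(1) not in md_functions:
            md_functions.append(match.group(1))
    return stall_count, md_functions


# A few lines copied from the trace above, used as sample input.
SAMPLE = """\
Feb 28 13:58:53 OCHO kernel: ? xor_avx_5+0x2d/0x352
Feb 28 13:58:53 OCHO kernel: check_parity+0x118/0x349 [md_mod]
Feb 28 13:58:53 OCHO kernel: handle_stripe+0xe8a/0x1226 [md_mod]
Feb 28 13:58:53 OCHO kernel: ? md_open+0x2c/0x2c [md_mod]
Feb 28 14:01:53 OCHO kernel: INFO: rcu_sched self-detected stall on CPU
"""

if __name__ == "__main__":
    print(summarize_stalls(SAMPLE))
```

A check_parity frame in the result is consistent with a parity check running when the stall was reported, which is the situation asked about above.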

therapist Posted February 28, 2019 Author (edited)

5 minutes ago, johnnie.black said: Was there a parity check going on? These look like the tunable related call traces.

There was, in fact. I have:

Tunable (nr_requests): 128 (default)
Tunable (md_num_stripes): 4096 (user-set)
Tunable (md_sync_window): 2048 (user-set)
Tunable (md_sync_thresh): 2000 (user-set)

They have been set this way for quite some time. Why would a parity check cause KVM to lock up?

Edited February 28, 2019 by therapist

JorgeB Posted February 28, 2019

These can usually be fixed by lowering the md_sync_thresh tunable. Start lowering it little by little and stop when the call traces stop; you can do this during a check.
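
The "lower it little by little" procedure can be sketched in Python as below. This is only an outline of the loop, not something that changes any setting: the step size, the floor, and the has_call_traces callback are all hypothetical, and applying each value and checking the syslog are left to whoever runs the test.

```python
def step_down_candidates(current, step, floor):
    """Yield successively lower values, starting one step below `current`.

    `step` and `floor` are illustrative choices, not recommended values.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    value = current - step
    while value >= floor:
        yield value
        value -= step


def first_quiet_value(candidates, has_call_traces):
    """Return the first candidate for which has_call_traces(value) is False.

    `has_call_traces` is a hypothetical callback: apply the value yourself,
    let the check run for a while, then report whether new traces appeared.
    Returns None if every candidate still produced traces.
    """
    for value in candidates:
        if not has_call_traces(value):
            return value
    return None


if __name__ == "__main__":
    # Starting from the md_sync_thresh of 2000 quoted in this thread.
    for value in step_down_candidates(2000, 100, 1500):
        print("try md_sync_thresh =", value)
```

Stopping at the first value that produces no traces keeps the setting as close to the original as possible, which matches the advice to stop as soon as the call traces stop.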