December 1, 2019

While running my monthly parity check, the system became unresponsive numerous times. The GUI and console would stop responding for roughly 30-60 seconds at a time. I've attached a snippet of the call trace below, along with the diagnostics. Whichever core of my FX-8370 was pegged at 100% appears to be the same core that showed up in the log as "NMI backtrace for cpu X".

Nov 30 04:23:03 Tower kernel: NMI backtrace for cpu 1
Nov 30 04:23:03 Tower kernel: CPU: 1 PID: 7410 Comm: unraidd Tainted: P W O 4.19.56-Unraid #1
Nov 30 04:23:03 Tower kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./970 PRO GAMING/AURA, BIOS 0901 11/07/2016
Nov 30 04:23:03 Tower kernel: Call Trace:
Nov 30 04:23:03 Tower kernel: <IRQ>
Nov 30 04:23:03 Tower kernel: dump_stack+0x5d/0x79
Nov 30 04:23:03 Tower kernel: nmi_cpu_backtrace+0x71/0x83
Nov 30 04:23:03 Tower kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Nov 30 04:23:03 Tower kernel: nmi_trigger_cpumask_backtrace+0x57/0xd7
Nov 30 04:23:03 Tower kernel: rcu_dump_cpu_stacks+0x91/0xbb
Nov 30 04:23:03 Tower kernel: rcu_check_callbacks+0x28f/0x58e
Nov 30 04:23:03 Tower kernel: ? tick_sched_handle.isra.5+0x2f/0x2f
Nov 30 04:23:03 Tower kernel: update_process_times+0x23/0x45
Nov 30 04:23:03 Tower kernel: tick_sched_timer+0x36/0x64
Nov 30 04:23:03 Tower kernel: __hrtimer_run_queues+0xb1/0x105
Nov 30 04:23:03 Tower kernel: hrtimer_interrupt+0xf4/0x20d
Nov 30 04:23:03 Tower kernel: smp_apic_timer_interrupt+0x79/0x91
Nov 30 04:23:03 Tower kernel: apic_timer_interrupt+0xf/0x20
Nov 30 04:23:03 Tower kernel: </IRQ>

The parity check is still chugging along at ~70%. Any thoughts?

tower-diagnostics-20191130-2039.zip
December 1, 2019
Community Expert

This can usually be fixed by lowering the md_sync_thresh variable a little at a time until the NMIs stop. Alternatively, upgrade to v6.8, which uses a different resync process and has so far been immune to this issue.
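To judge whether the NMIs have actually stopped after each change, it may help to count the "NMI backtrace for cpu X" lines in the syslog rather than watching for freezes. The following is a minimal Python sketch, not an Unraid tool: the function name count_nmi_backtraces is hypothetical, and the pattern assumes the syslog line format shown in the excerpt above.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "Nov 30 04:23:03 Tower kernel: NMI backtrace for cpu 1"
# (format assumed from the log excerpt in this thread)
NMI_RE = re.compile(r"kernel: NMI backtrace for cpu (\d+)")


def count_nmi_backtraces(lines):
    """Return a Counter mapping CPU number -> number of NMI backtraces seen."""
    counts = Counter()
    for line in lines:
        match = NMI_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Small demonstration using two lines taken from the excerpt above.
excerpt = [
    "Nov 30 04:23:03 Tower kernel: NMI backtrace for cpu 1",
    "Nov 30 04:23:03 Tower kernel: Call Trace:",
]
for cpu, n in sorted(count_nmi_backtraces(excerpt).items()):
    print(f"cpu {cpu}: {n} NMI backtrace(s)")
```

To run it against a real log, pass an open file object instead of the list, for example the syslog on the server (commonly /var/log/syslog, though that path is an assumption here). If the count stops growing during a parity check after a change, the new setting is probably low enough.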