[SOLVED] Server randomly goes unresponsive [6.9.1]



Over the past few days, my server has been going into an unresponsive state at random times. My only recourse has been to force the system down via the power button.

 

I originally captured an "rcu_sched self-detected stall on CPU" error early this morning (before it locked up). Once I brought the system back online, I ran an XFS repair on my cache drive (after reading this post on the forum), and have not seen any further instances of the error.

Mar 30 02:54:36 WadeWilson kernel: rcu: INFO: rcu_sched self-detected stall on CPU
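
For anyone who wants to check a saved syslog for the same message, here is a minimal Python sketch. It collects the timestamps of the "self-detected stall" lines and tallies the CPU, PID, and task named in the `CPU: ... PID: ... Comm: ...` header lines that accompany the backtraces. The function name `summarize_stalls` is made up for illustration, and the sample lines are copied from this server's syslog.

```python
import re
from collections import Counter

STALL_MARKER = "rcu_sched self-detected stall on CPU"
# Header line printed with each NMI backtrace, e.g. "CPU: 7 PID: 425 Comm: kcompactd0 ..."
HEADER = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (\S+)")


def summarize_stalls(lines):
    """Return (stall timestamps, Counter keyed by (cpu, pid, comm))."""
    stamps = []
    tasks = Counter()
    for line in lines:
        if STALL_MARKER in line:
            # The first 15 characters are the syslog timestamp, e.g. "Mar 30 02:54:36"
            stamps.append(line[:15])
        match = HEADER.search(line)
        if match:
            cpu, pid, comm = match.groups()
            tasks[(int(cpu), int(pid), comm)] += 1
    return stamps, tasks


sample = [
    "Mar 30 02:54:36 WadeWilson kernel: rcu: INFO: rcu_sched self-detected stall on CPU",
    "Mar 30 02:57:36 WadeWilson kernel: rcu: INFO: rcu_sched self-detected stall on CPU",
    "Mar 30 02:57:36 WadeWilson kernel: CPU: 7 PID: 425 Comm: kcompactd0 Tainted: G      D           5.10.1-Unraid #1",
]

stamps, tasks = summarize_stalls(sample)
print("stall reports at:", stamps)
for (cpu, pid, comm), count in tasks.most_common():
    print(f"CPU {cpu}, PID {pid}, task {comm}: {count} backtrace header(s)")
```

To run it against a whole file, pass the open file object instead of `sample`; the path depends on where the syslog was saved.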

 

However, a few hours after rebooting and repairing the cache drive, the system went unresponsive again this morning (around 10:28:56 am CST) while I was at work. Once I got home, I forced a shutdown with the power button and brought the system back online around 5:41:45 pm CST (17:41:45).

 

Unfortunately, the syslog did not have anything useful this time (no mentions of "traces" or "self-detected stall"):

Mar 30 10:00:01 WadeWilson crond[2094]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Mar 30 10:28:50 WadeWilson dhcpcd[1991]: br0: Router Advertisement from fe80::aa5e:45ff:feee:1a38
Mar 30 10:28:56 WadeWilson dhcpcd[1991]: br0: Router Advertisement from fe80::aa5e:45ff:feee:1a38
Mar 30 17:41:45 WadeWilson kernel: mdcmd (36): set md_write_method 1
Mar 30 17:41:45 WadeWilson kernel: 
Mar 30 17:41:45 WadeWilson root: Delaying execution of fix common problems scan for 10 minutes
Mar 30 17:41:45 WadeWilson unassigned.devices: Mounting 'Auto Mount' Devices...
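
When the log goes quiet like this, the silent window itself is the main clue. A minimal Python sketch for finding the largest gap between consecutive syslog entries follows. It assumes every line starts with the usual 15-character syslog timestamp; the helper name `largest_gap` is hypothetical, and the year is an assumption taken from the date in the diagnostics file name, since syslog timestamps do not carry one.

```python
from datetime import datetime


def largest_gap(lines, year=2021):
    """Return (gap, earlier, later) for the longest pause between log entries.

    Assumes each non-empty line begins with a timestamp like "Mar 30 10:28:56".
    Raises ValueError if fewer than two timestamped lines are given.
    """
    stamps = [
        datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        for line in lines
        if line.strip()
    ]
    gaps = [(later - earlier, earlier, later)
            for earlier, later in zip(stamps, stamps[1:])]
    return max(gaps)


sample = [
    "Mar 30 10:00:01 WadeWilson crond[2094]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null",
    "Mar 30 10:28:50 WadeWilson dhcpcd[1991]: br0: Router Advertisement from fe80::aa5e:45ff:feee:1a38",
    "Mar 30 10:28:56 WadeWilson dhcpcd[1991]: br0: Router Advertisement from fe80::aa5e:45ff:feee:1a38",
    "Mar 30 17:41:45 WadeWilson kernel: mdcmd (36): set md_write_method 1",
    "Mar 30 17:41:45 WadeWilson root: Delaying execution of fix common problems scan for 10 minutes",
]

gap, earlier, later = largest_gap(sample)
print(f"Longest silence: {gap} (from {earlier:%H:%M:%S} to {later:%H:%M:%S})")
```

On the excerpt above, the longest silence runs from the last Router Advertisement at 10:28:56 to the first entry after the restart at 17:41:45. That only bounds when the lock-up happened; it does not say what caused it.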

 

A few additional notes:

  • I was concerned that my system was hitting a bug reported in the forum, since the majority of my docker containers use a static IP on br0. However, I have not seen any "kernel panics" in the syslog either.
  • I did try to downgrade to 6.8 yesterday (3/29). However, I was unable to start the array because of my cache drive (I assume because the drive needed the XFS repair and I didn't know it).
  • I also wound up trying 6.9.0-rc2 again yesterday (3/29), because I did not recall having these stability issues while on it. However, that made no improvement, so I went back to 6.9.1.
  • The stability issues were not present when I originally upgraded to 6.9 and 6.9.1 stable as each was released.
  • The only other item worth mentioning is that my server lost power about a week ago due to a local power outage in my neighborhood. The system currently does not have a UPS connected to it.

 

wadewilson-diagnostics-20210330-1747.zip syslog-172.28.3.249.log

Edited by MarkRMonaco
Clarifications in the timeline of the system lock-ups, and additional info

Additional things that I've checked:

  • BIOS Version - Was one version behind. Just brought it current (after the most recent lock-up).
  • Global C-States (BIOS) - Verified it was disabled
  • Current Control (BIOS) - Verified it was set to "Typical Current Idle"
  • XMP Profiles (BIOS) - Verified it was disabled
  • Downcore Control (BIOS) - Verified it was disabled
  • Docker - "Host access to custom networks" was already disabled/off.
Edited by MarkRMonaco
1 hour ago, MarkRMonaco said:

The system went down (unresponsive) some time after 10:28:56 am CST this morning

You have a ton of call traces in your syslog. I don't think I have ever seen so many. Call traces are almost always related to hardware issues, as yours seem to be.

 

These are just some of them, and they seem to be related to CPU processes. Unfortunately, I can't tell you what they mean, but a bunch of call traces will definitely result in an eventual server lockup. Hopefully, someone will have more insights.

 

Mar 30 02:54:36 WadeWilson kernel: Call Trace:
Mar 30 02:54:36 WadeWilson kernel: <IRQ>
Mar 30 02:54:36 WadeWilson kernel: dump_stack+0x6b/0x83
Mar 30 02:54:36 WadeWilson kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Mar 30 02:54:36 WadeWilson kernel: nmi_cpu_backtrace+0x7d/0x8f
Mar 30 02:54:36 WadeWilson kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
Mar 30 02:54:36 WadeWilson kernel: rcu_dump_cpu_stacks+0x9f/0xc6
Mar 30 02:54:36 WadeWilson kernel: rcu_sched_clock_irq+0x1ec/0x543
Mar 30 02:54:36 WadeWilson kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
Mar 30 02:54:36 WadeWilson kernel: update_process_times+0x50/0x6e
Mar 30 02:54:36 WadeWilson kernel: tick_sched_timer+0x36/0x64
Mar 30 02:54:36 WadeWilson kernel: __hrtimer_run_queues+0xb7/0x10b
Mar 30 02:54:36 WadeWilson kernel: ? tick_sched_do_timer+0x39/0x39
Mar 30 02:54:36 WadeWilson kernel: hrtimer_interrupt+0x8d/0x160
Mar 30 02:54:36 WadeWilson kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Mar 30 02:54:36 WadeWilson kernel: asm_call_irq_on_stack+0x12/0x20
Mar 30 02:54:36 WadeWilson kernel: </IRQ>
Mar 30 02:54:36 WadeWilson kernel: sysvec_apic_timer_interrupt+0x71/0x95
Mar 30 02:54:36 WadeWilson kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Mar 30 02:54:36 WadeWilson kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x18a
Mar 30 02:54:36 WadeWilson kernel: Code: c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 74 0c 0f ba e0 08 72 1a c6 47 01 00 eb 14 85 c0 74 0a 8b 07 84 c0 74 04 f3 90 <eb> f6 66 c7 07 01 00 c3 48 c7 c0 80 2f 02 00 65 48 03 05 90 92 f8
Mar 30 02:54:36 WadeWilson kernel: RSP: 0018:ffffc90000e9fb70 EFLAGS: 00000202
Mar 30 02:54:36 WadeWilson kernel: RAX: 0000000000000101 RBX: ffffc90000e9fbb8 RCX: 000ffffffffff000
Mar 30 02:54:36 WadeWilson kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffea000403b6e8
Mar 30 02:54:36 WadeWilson kernel: RBP: ffffea0015370e00 R08: ffff888000000000 R09: 0000000000000000
Mar 30 02:54:36 WadeWilson kernel: R10: 0000000000000001 R11: 000000000000000c R12: ffffea00040945c0
Mar 30 02:54:36 WadeWilson kernel: R13: ffff88810133b300 R14: 0000160000000000 R15: ffff888133da1140
Mar 30 02:54:36 WadeWilson kernel: queued_spin_lock_slowpath+0x7/0xa
Mar 30 02:54:36 WadeWilson kernel: page_vma_mapped_walk+0x4a4/0x4f8
Mar 30 02:54:36 WadeWilson kernel: remove_migration_pte+0x59/0x214
Mar 30 02:54:36 WadeWilson kernel: rmap_walk_anon+0xe7/0x156
Mar 30 02:54:36 WadeWilson kernel: remove_migration_ptes+0x49/0x63
Mar 30 02:54:36 WadeWilson kernel: ? pmd_pfn+0x3a/0x3a
Mar 30 02:54:36 WadeWilson kernel: migrate_pages+0x4e0/0x7c1
Mar 30 02:54:36 WadeWilson kernel: ? move_freelist_tail+0xba/0xba
Mar 30 02:54:36 WadeWilson kernel: ? isolate_freepages_block+0x26b/0x26b
Mar 30 02:54:36 WadeWilson kernel: compact_zone+0x6b2/0x905
Mar 30 02:54:36 WadeWilson kernel: ? set_next_entity+0x47/0x6c
Mar 30 02:54:36 WadeWilson kernel: proactive_compact_node+0x75/0xa2
Mar 30 02:54:36 WadeWilson kernel: ? fragmentation_score_node+0x2b/0x59
Mar 30 02:54:36 WadeWilson kernel: kcompactd+0x1ee/0x22c
Mar 30 02:54:36 WadeWilson kernel: ? init_wait_entry+0x24/0x24
Mar 30 02:54:36 WadeWilson kernel: ? kcompactd_do_work+0x16f/0x16f
Mar 30 02:54:36 WadeWilson kernel: kthread+0xe5/0xea
Mar 30 02:54:36 WadeWilson kernel: ? kthread_unpark+0x52/0x52
Mar 30 02:54:36 WadeWilson kernel: ret_from_fork+0x22/0x30
Mar 30 02:54:55 WadeWilson kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 60222 jiffies s: 3377 root: 0x1/.
Mar 30 02:54:55 WadeWilson kernel: rcu: blocking rcu_node structures: l=1:0-15:0x80/.
Mar 30 02:54:55 WadeWilson kernel: Task dump for CPU 7:
Mar 30 02:54:55 WadeWilson kernel: task:kcompactd0      state:R  running task     stack:    0 pid:  425 ppid:     2 flags:0x00004008
Mar 30 02:54:55 WadeWilson kernel: Call Trace:
Mar 30 02:54:55 WadeWilson kernel: ? proactive_compact_node+0x75/0xa2
Mar 30 02:54:55 WadeWilson kernel: ? fragmentation_score_node+0x2b/0x59
Mar 30 02:54:55 WadeWilson kernel: ? kcompactd+0x1ee/0x22c
Mar 30 02:54:55 WadeWilson kernel: ? init_wait_entry+0x24/0x24
Mar 30 02:54:55 WadeWilson kernel: ? kcompactd_do_work+0x16f/0x16f
Mar 30 02:54:55 WadeWilson kernel: ? kthread+0xe5/0xea
Mar 30 02:54:55 WadeWilson kernel: ? kthread_unpark+0x52/0x52
Mar 30 02:54:55 WadeWilson kernel: ? ret_from_fork+0x22/0x30
Mar 30 02:56:16 WadeWilson crond[2053]: exit status 255 from user root /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "monitor" &>/dev/null
Mar 30 02:57:36 WadeWilson kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Mar 30 02:57:36 WadeWilson kernel: rcu: #0117-....: (240002 ticks this GP) idle=03e/1/0x4000000000000000 softirq=389008/389008 fqs=59351 
Mar 30 02:57:36 WadeWilson kernel: #011(t=240003 jiffies g=1768205 q=103075)
Mar 30 02:57:36 WadeWilson kernel: NMI backtrace for cpu 7
Mar 30 02:57:36 WadeWilson kernel: CPU: 7 PID: 425 Comm: kcompactd0 Tainted: G      D           5.10.1-Unraid #1
Mar 30 02:57:36 WadeWilson kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS (MS-7B79), BIOS A.K1 12/22/2020
Mar 30 02:57:36 WadeWilson kernel: Call Trace:
Mar 30 02:57:36 WadeWilson kernel: <IRQ>
Mar 30 02:57:36 WadeWilson kernel: dump_stack+0x6b/0x83
Mar 30 02:57:36 WadeWilson kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Mar 30 02:57:36 WadeWilson kernel: nmi_cpu_backtrace+0x7d/0x8f
Mar 30 02:57:36 WadeWilson kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
Mar 30 02:57:36 WadeWilson kernel: rcu_dump_cpu_stacks+0x9f/0xc6
Mar 30 02:57:36 WadeWilson kernel: rcu_sched_clock_irq+0x1ec/0x543
Mar 30 02:57:36 WadeWilson kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
Mar 30 02:57:36 WadeWilson kernel: update_process_times+0x50/0x6e
Mar 30 02:57:36 WadeWilson kernel: tick_sched_timer+0x36/0x64
Mar 30 02:57:36 WadeWilson kernel: __hrtimer_run_queues+0xb7/0x10b
Mar 30 02:57:36 WadeWilson kernel: ? tick_sched_do_timer+0x39/0x39
Mar 30 02:57:36 WadeWilson kernel: hrtimer_interrupt+0x8d/0x160
Mar 30 02:57:36 WadeWilson kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Mar 30 02:57:36 WadeWilson kernel: asm_call_irq_on_stack+0x12/0x20
Mar 30 02:57:36 WadeWilson kernel: </IRQ>
Mar 30 02:57:36 WadeWilson kernel: sysvec_apic_timer_interrupt+0x71/0x95
Mar 30 02:57:36 WadeWilson kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Mar 30 02:57:36 WadeWilson kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x18a
Mar 30 02:57:36 WadeWilson kernel: Code: c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 74 0c 0f ba e0 08 72 1a c6 47 01 00 eb 14 85 c0 74 0a 8b 07 84 c0 74 04 f3 90 <eb> f6 66 c7 07 01 00 c3 48 c7 c0 80 2f 02 00 65 48 03 05 90 92 f8
Mar 30 02:57:36 WadeWilson kernel: RSP: 0018:ffffc90000e9fb70 EFLAGS: 00000202
Mar 30 02:57:36 WadeWilson kernel: RAX: 0000000000000101 RBX: ffffc90000e9fbb8 RCX: 000ffffffffff000
Mar 30 02:57:36 WadeWilson kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffea000403b6e8
Mar 30 02:57:36 WadeWilson kernel: RBP: ffffea0015370e00 R08: ffff888000000000 R09: 0000000000000000
Mar 30 02:57:36 WadeWilson kernel: R10: 0000000000000001 R11: 000000000000000c R12: ffffea00040945c0
Mar 30 02:57:36 WadeWilson kernel: R13: ffff88810133b300 R14: 0000160000000000 R15: ffff888133da1140
Mar 30 02:57:36 WadeWilson kernel: queued_spin_lock_slowpath+0x7/0xa
Mar 30 02:57:36 WadeWilson kernel: page_vma_mapped_walk+0x4a4/0x4f8
Mar 30 02:57:36 WadeWilson kernel: remove_migration_pte+0x59/0x214
Mar 30 02:57:36 WadeWilson kernel: rmap_walk_anon+0xe7/0x156
Mar 30 02:57:36 WadeWilson kernel: remove_migration_ptes+0x49/0x63
Mar 30 02:57:36 WadeWilson kernel: ? pmd_pfn+0x3a/0x3a
Mar 30 02:57:36 WadeWilson kernel: migrate_pages+0x4e0/0x7c1
Mar 30 02:57:36 WadeWilson kernel: ? move_freelist_tail+0xba/0xba
Mar 30 02:57:36 WadeWilson kernel: ? isolate_freepages_block+0x26b/0x26b
Mar 30 02:57:36 WadeWilson kernel: compact_zone+0x6b2/0x905
Mar 30 02:57:36 WadeWilson kernel: ? set_next_entity+0x47/0x6c
Mar 30 02:57:36 WadeWilson kernel: proactive_compact_node+0x75/0xa2
Mar 30 02:57:36 WadeWilson kernel: ? fragmentation_score_node+0x2b/0x59
Mar 30 02:57:36 WadeWilson kernel: kcompactd+0x1ee/0x22c
Mar 30 02:57:36 WadeWilson kernel: ? init_wait_entry+0x24/0x24
Mar 30 02:57:36 WadeWilson kernel: ? kcompactd_do_work+0x16f/0x16f
Mar 30 02:57:36 WadeWilson kernel: kthread+0xe5/0xea
Mar 30 02:57:36 WadeWilson kernel: ? kthread_unpark+0x52/0x52
Mar 30 02:57:36 WadeWilson kernel: ret_from_fork+0x22/0x30
Mar 30 02:57:56 WadeWilson kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 240446 jiffies s: 3377 root: 0x1/.
Mar 30 02:57:56 WadeWilson kernel: rcu: blocking rcu_node structures: l=1:0-15:0x80/.
Mar 30 02:57:56 WadeWilson kernel: Task dump for CPU 7:
Mar 30 02:57:56 WadeWilson kernel: task:kcompactd0      state:R  running task     stack:    0 pid:  425 ppid:     2 flags:0x00004008
Mar 30 02:57:56 WadeWilson kernel: Call Trace:
Mar 30 02:57:56 WadeWilson kernel: ? proactive_compact_node+0x75/0xa2
Mar 30 02:57:56 WadeWilson kernel: ? fragmentation_score_node+0x2b/0x59
Mar 30 02:57:56 WadeWilson kernel: ? kcompactd+0x1ee/0x22c
Mar 30 02:57:56 WadeWilson kernel: ? init_wait_entry+0x24/0x24
Mar 30 02:57:56 WadeWilson kernel: ? kcompactd_do_work+0x16f/0x16f
Mar 30 02:57:56 WadeWilson kernel: ? kthread+0xe5/0xea
Mar 30 02:57:56 WadeWilson kernel: ? kthread_unpark+0x52/0x52
Mar 30 02:57:56 WadeWilson kernel: ? ret_from_fork+0x22/0x30
Mar 30 03:00:16 WadeWilson crond[2053]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Mar 30 03:00:16 WadeWilson crond[2053]: exit status 255 from user root /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "monitor" &>/dev/null
Mar 30 03:00:36 WadeWilson kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Mar 30 03:00:36 WadeWilson kernel: rcu: #0117-....: (420005 ticks this GP) idle=03e/1/0x4000000000000000 softirq=389008/389008 fqs=104001 
Mar 30 03:00:36 WadeWilson kernel: #011(t=420006 jiffies g=1768205 q=206469)
Mar 30 03:00:36 WadeWilson kernel: NMI backtrace for cpu 7
Mar 30 03:00:36 WadeWilson kernel: CPU: 7 PID: 425 Comm: kcompactd0 Tainted: G      D           5.10.1-Unraid #1
Mar 30 03:00:36 WadeWilson kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS (MS-7B79), BIOS A.K1 12/22/2020
Mar 30 03:00:36 WadeWilson kernel: Call Trace:
Mar 30 03:00:36 WadeWilson kernel: <IRQ>
Mar 30 03:00:36 WadeWilson kernel: dump_stack+0x6b/0x83
Mar 30 03:00:36 WadeWilson kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Mar 30 03:00:36 WadeWilson kernel: nmi_cpu_backtrace+0x7d/0x8f
Mar 30 03:00:36 WadeWilson kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
Mar 30 03:00:36 WadeWilson kernel: rcu_dump_cpu_stacks+0x9f/0xc6
Mar 30 03:00:36 WadeWilson kernel: rcu_sched_clock_irq+0x1ec/0x543
Mar 30 03:00:36 WadeWilson kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
Mar 30 03:00:36 WadeWilson kernel: update_process_times+0x50/0x6e
Mar 30 03:00:36 WadeWilson kernel: tick_sched_timer+0x36/0x64
Mar 30 03:00:36 WadeWilson kernel: __hrtimer_run_queues+0xb7/0x10b
Mar 30 03:00:36 WadeWilson kernel: ? tick_sched_do_timer+0x39/0x39
Mar 30 03:00:36 WadeWilson kernel: hrtimer_interrupt+0x8d/0x160
Mar 30 03:00:36 WadeWilson kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Mar 30 03:00:36 WadeWilson kernel: asm_call_irq_on_stack+0x12/0x20
Mar 30 03:00:36 WadeWilson kernel: </IRQ>
Mar 30 03:00:36 WadeWilson kernel: sysvec_apic_timer_interrupt+0x71/0x95
Mar 30 03:00:36 WadeWilson kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Mar 30 03:00:36 WadeWilson kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x18a
Mar 30 03:00:36 WadeWilson kernel: Code: c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 74 0c 0f ba e0 08 72 1a c6 47 01 00 eb 14 85 c0 74 0a 8b 07 84 c0 74 04 f3 90 <eb> f6 66 c7 07 01 00 c3 48 c7 c0 80 2f 02 00 65 48 03 05 90 92 f8
Mar 30 03:00:36 WadeWilson kernel: RSP: 0018:ffffc90000e9fb70 EFLAGS: 00000202
Mar 30 03:00:36 WadeWilson kernel: RAX: 0000000000000101 RBX: ffffc90000e9fbb8 RCX: 000ffffffffff000
Mar 30 03:00:36 WadeWilson kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffea000403b6e8
Mar 30 03:00:36 WadeWilson kernel: RBP: ffffea0015370e00 R08: ffff888000000000 R09: 0000000000000000
Mar 30 03:00:36 WadeWilson kernel: R10: 0000000000000001 R11: 000000000000000c R12: ffffea00040945c0
Mar 30 03:00:36 WadeWilson kernel: R13: ffff88810133b300 R14: 0000160000000000 R15: ffff888133da1140
Mar 30 03:00:36 WadeWilson kernel: queued_spin_lock_slowpath+0x7/0xa
Mar 30 03:00:36 WadeWilson kernel: page_vma_mapped_walk+0x4a4/0x4f8
Mar 30 03:00:36 WadeWilson kernel: remove_migration_pte+0x59/0x214
Mar 30 03:00:36 WadeWilson kernel: rmap_walk_anon+0xe7/0x156
Mar 30 03:00:36 WadeWilson kernel: remove_migration_ptes+0x49/0x63
Mar 30 03:00:36 WadeWilson kernel: ? pmd_pfn+0x3a/0x3a
Mar 30 03:00:36 WadeWilson kernel: migrate_pages+0x4e0/0x7c1
Mar 30 03:00:36 WadeWilson kernel: ? move_freelist_tail+0xba/0xba
Mar 30 03:00:36 WadeWilson kernel: ? isolate_freepages_block+0x26b/0x26b
Mar 30 03:00:36 WadeWilson kernel: compact_zone+0x6b2/0x905
Mar 30 03:00:36 WadeWilson kernel: ? set_next_entity+0x47/0x6c
Mar 30 03:00:36 WadeWilson kernel: proactive_compact_node+0x75/0xa2
Mar 30 03:00:36 WadeWilson kernel: ? fragmentation_score_node+0x2b/0x59
Mar 30 03:00:36 WadeWilson kernel: kcompactd+0x1ee/0x22c
Mar 30 03:00:36 WadeWilson kernel: ? init_wait_entry+0x24/0x24
Mar 30 03:00:36 WadeWilson kernel: ? kcompactd_do_work+0x16f/0x16f
Mar 30 03:00:36 WadeWilson kernel: kthread+0xe5/0xea
Mar 30 03:00:36 WadeWilson kernel: ? kthread_unpark+0x52/0x52
Mar 30 03:00:36 WadeWilson kernel: ret_from_fork+0x22/0x30
Mar 30 03:00:56 WadeWilson kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 420670 jiffies s: 3377 root: 0x1/.
Mar 30 03:00:56 WadeWilson kernel: rcu: blocking rcu_node structures: l=1:0-15:0x80/.
Mar 30 03:00:56 WadeWilson kernel: Task dump for CPU 7:
Mar 30 03:00:56 WadeWilson kernel: task:kcompactd0      state:R  running task     stack:    0 pid:  425 ppid:     2 flags:0x00004008
Mar 30 03:00:56 WadeWilson kernel: Call Trace:
Mar 30 03:00:56 WadeWilson kernel: ? proactive_compact_node+0x75/0xa2
Mar 30 03:00:56 WadeWilson kernel: ? fragmentation_score_node+0x2b/0x59
Mar 30 03:00:56 WadeWilson kernel: ? kcompactd+0x1ee/0x22c
Mar 30 03:00:56 WadeWilson kernel: ? init_wait_entry+0x24/0x24
Mar 30 03:00:56 WadeWilson kernel: ? kcompactd_do_work+0x16f/0x16f
Mar 30 03:00:56 WadeWilson kernel: ? kthread+0xe5/0xea
Mar 30 03:00:56 WadeWilson kernel: ? kthread_unpark+0x52/0x52
Mar 30 03:00:56 WadeWilson kernel: ? ret_from_fork+0x22/0x30
Mar 30 03:03:36 WadeWilson kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Mar 30 03:03:36 WadeWilson kernel: rcu: #0117-....: (600008 ticks this GP) idle=03e/1/0x4000000000000000 softirq=389008/389008 fqs=148640 
Mar 30 03:03:36 WadeWilson kernel: #011(t=600009 jiffies g=1768205 q=291987)
Mar 30 03:03:36 WadeWilson kernel: NMI backtrace for cpu 7
Mar 30 03:03:36 WadeWilson kernel: CPU: 7 PID: 425 Comm: kcompactd0 Tainted: G      D           5.10.1-Unraid #1
Mar 30 03:03:36 WadeWilson kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS (MS-7B79), BIOS A.K1 12/22/2020
Mar 30 03:03:36 WadeWilson kernel: Call Trace:
Mar 30 03:03:36 WadeWilson kernel: <IRQ>
Mar 30 03:03:36 WadeWilson kernel: dump_stack+0x6b/0x83
Mar 30 03:03:36 WadeWilson kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Mar 30 03:03:36 WadeWilson kernel: nmi_cpu_backtrace+0x7d/0x8f
Mar 30 03:03:36 WadeWilson kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
Mar 30 03:03:36 WadeWilson kernel: rcu_dump_cpu_stacks+0x9f/0xc6
Mar 30 03:03:36 WadeWilson kernel: rcu_sched_clock_irq+0x1ec/0x543
Mar 30 03:03:36 WadeWilson kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
Mar 30 03:03:36 WadeWilson kernel: update_process_times+0x50/0x6e
Mar 30 03:03:36 WadeWilson kernel: tick_sched_timer+0x36/0x64
Mar 30 03:03:36 WadeWilson kernel: __hrtimer_run_queues+0xb7/0x10b
Mar 30 03:03:36 WadeWilson kernel: ? tick_sched_do_timer+0x39/0x39
Mar 30 03:03:36 WadeWilson kernel: hrtimer_interrupt+0x8d/0x160
Mar 30 03:03:36 WadeWilson kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Mar 30 03:03:36 WadeWilson kernel: asm_call_irq_on_stack+0x12/0x20
Mar 30 03:03:36 WadeWilson kernel: </IRQ>
Mar 30 03:03:36 WadeWilson kernel: sysvec_apic_timer_interrupt+0x71/0x95
Mar 30 03:03:36 WadeWilson kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Mar 30 03:03:36 WadeWilson kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x18a
Mar 30 03:03:36 WadeWilson kernel: Code: c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 74 0c 0f ba e0 08 72 1a c6 47 01 00 eb 14 85 c0 74 0a 8b 07 84 c0 74 04 f3 90 <eb> f6 66 c7 07 01 00 c3 48 c7 c0 80 2f 02 00 65 48 03 05 90 92 f8
Mar 30 03:03:36 WadeWilson kernel: RSP: 0018:ffffc90000e9fb70 EFLAGS: 00000202
Mar 30 03:03:36 WadeWilson kernel: RAX: 0000000000000101 RBX: ffffc90000e9fbb8 RCX: 000ffffffffff000
Mar 30 03:03:36 WadeWilson kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffea000403b6e8
Mar 30 03:03:36 WadeWilson kernel: RBP: ffffea0015370e00 R08: ffff888000000000 R09: 0000000000000000
Mar 30 03:03:36 WadeWilson kernel: R10: 0000000000000001 R11: 000000000000000c R12: ffffea00040945c0
Mar 30 03:03:36 WadeWilson kernel: R13: ffff88810133b300 R14: 0000160000000000 R15: ffff888133da1140
Mar 30 03:03:36 WadeWilson kernel: queued_spin_lock_slowpath+0x7/0xa
Mar 30 03:03:36 WadeWilson kernel: page_vma_mapped_walk+0x4a4/0x4f8
Mar 30 03:03:36 WadeWilson kernel: remove_migration_pte+0x59/0x214
Mar 30 03:03:36 WadeWilson kernel: rmap_walk_anon+0xe7/0x156
Mar 30 03:03:36 WadeWilson kernel: remove_migration_ptes+0x49/0x63
Mar 30 03:03:36 WadeWilson kernel: ? pmd_pfn+0x3a/0x3a
Mar 30 03:03:36 WadeWilson kernel: migrate_pages+0x4e0/0x7c1
Mar 30 03:03:36 WadeWilson kernel: ? move_freelist_tail+0xba/0xba
Mar 30 03:03:36 WadeWilson kernel: ? isolate_freepages_block+0x26b/0x26b
Mar 30 03:03:36 WadeWilson kernel: compact_zone+0x6b2/0x905
Mar 30 03:03:36 WadeWilson kernel: ? set_next_entity+0x47/0x6c
Mar 30 03:03:36 WadeWilson kernel: proactive_compact_node+0x75/0xa2
Mar 30 03:03:36 WadeWilson kernel: ? fragmentation_score_node+0x2b/0x59
Mar 30 03:03:36 WadeWilson kernel: kcompactd+0x1ee/0x22c
Mar 30 03:03:36 WadeWilson kernel: ? init_wait_entry+0x24/0x24
Mar 30 03:03:36 WadeWilson kernel: ? kcompactd_do_work+0x16f/0x16f
Mar 30 03:03:36 WadeWilson kernel: kthread+0xe5/0xea
Mar 30 03:03:36 WadeWilson kernel: ? kthread_unpark+0x52/0x52
Mar 30 03:03:36 WadeWilson kernel: ret_from_fork+0x22/0x30
Mar 30 03:03:56 WadeWilson kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 600894 jiffies s: 3377 root: 0x1/.
Mar 30 03:03:56 WadeWilson kernel: rcu: blocking rcu_node structures: l=1:0-15:0x80/.
Mar 30 03:03:56 WadeWilson kernel: Task dump for CPU 7:
Mar 30 03:03:56 WadeWilson kernel: task:kcompactd0      state:R  running task     stack:    0 pid:  425 ppid:     2 flags:0x00004008
Mar 30 03:03:56 WadeWilson kernel: Call Trace:
Mar 30 03:03:56 WadeWilson kernel: ? proactive_compact_node+0x75/0xa2
Mar 30 03:03:56 WadeWilson kernel: ? fragmentation_score_node+0x2b/0x59
Mar 30 03:03:56 WadeWilson kernel: ? kcompactd+0x1ee/0x22c
Mar 30 03:03:56 WadeWilson kernel: ? init_wait_entry+0x24/0x24
Mar 30 03:03:56 WadeWilson kernel: ? kcompactd_do_work+0x16f/0x16f
Mar 30 03:03:56 WadeWilson kernel: ? kthread+0xe5/0xea
Mar 30 03:03:56 WadeWilson kernel: ? kthread_unpark+0x52/0x52
Mar 30 03:03:56 WadeWilson kernel: ? ret_from_fork+0x22/0x30
Mar 30 03:06:36 WadeWilson kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Mar 30 03:06:36 WadeWilson kernel: rcu: #0117-....: (780011 ticks this GP) idle=03e/1/0x4000000000000000 softirq=389008/389008 fqs=193359 
Mar 30 03:06:36 WadeWilson kernel: #011(t=780012 jiffies g=1768205 q=388324)
Mar 30 03:06:36 WadeWilson kernel: NMI backtrace for cpu 7
Mar 30 03:06:36 WadeWilson kernel: CPU: 7 PID: 425 Comm: kcompactd0 Tainted: G      D           5.10.1-Unraid #1
Mar 30 03:06:36 WadeWilson kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS (MS-7B79), BIOS A.K1 12/22/2020
Mar 30 03:06:36 WadeWilson kernel: Call Trace:
Mar 30 03:06:36 WadeWilson kernel: <IRQ>
Mar 30 03:06:36 WadeWilson kernel: dump_stack+0x6b/0x83
Mar 30 03:06:36 WadeWilson kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Mar 30 03:06:36 WadeWilson kernel: nmi_cpu_backtrace+0x7d/0x8f
Mar 30 03:06:36 WadeWilson kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
Mar 30 03:06:36 WadeWilson kernel: rcu_dump_cpu_stacks+0x9f/0xc6
Mar 30 03:06:36 WadeWilson kernel: rcu_sched_clock_irq+0x1ec/0x543
Mar 30 03:06:36 WadeWilson kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
Mar 30 03:06:36 WadeWilson kernel: update_process_times+0x50/0x6e
Mar 30 03:06:36 WadeWilson kernel: tick_sched_timer+0x36/0x64
Mar 30 03:06:36 WadeWilson kernel: __hrtimer_run_queues+0xb7/0x10b
Mar 30 03:06:36 WadeWilson kernel: ? tick_sched_do_timer+0x39/0x39
Mar 30 03:06:36 WadeWilson kernel: hrtimer_interrupt+0x8d/0x160
Mar 30 03:06:36 WadeWilson kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Mar 30 03:06:36 WadeWilson kernel: asm_call_irq_on_stack+0x12/0x20
Mar 30 03:06:36 WadeWilson kernel: </IRQ>
Mar 30 03:06:36 WadeWilson kernel: sysvec_apic_timer_interrupt+0x71/0x95
Mar 30 03:06:36 WadeWilson kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Mar 30 03:06:36 WadeWilson kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x18a
Mar 30 03:06:36 WadeWilson kernel: Code: c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 74 0c 0f ba e0 08 72 1a c6 47 01 00 eb 14 85 c0 74 0a 8b 07 84 c0 74 04 f3 90 <eb> f6 66 c7 07 01 00 c3 48 c7 c0 80 2f 02 00 65 48 03 05 90 92 f8
Mar 30 03:06:36 WadeWilson kernel: RSP: 0018:ffffc90000e9fb70 EFLAGS: 00000202
Mar 30 03:06:36 WadeWilson kernel: RAX: 0000000000000101 RBX: ffffc90000e9fbb8 RCX: 000ffffffffff000
Mar 30 03:06:36 WadeWilson kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffea000403b6e8
Mar 30 03:06:36 WadeWilson kernel: RBP: ffffea0015370e00 R08: ffff888000000000 R09: 0000000000000000
Mar 30 03:06:36 WadeWilson kernel: R10: 0000000000000001 R11: 000000000000000c R12: ffffea00040945c0
Mar 30 03:06:36 WadeWilson kernel: R13: ffff88810133b300 R14: 0000160000000000 R15: ffff888133da1140
Mar 30 03:06:36 WadeWilson kernel: queued_spin_lock_slowpath+0x7/0xa
Mar 30 03:06:36 WadeWilson kernel: page_vma_mapped_walk+0x4a4/0x4f8
Mar 30 03:06:36 WadeWilson kernel: remove_migration_pte+0x59/0x214
Mar 30 03:06:36 WadeWilson kernel: rmap_walk_anon+0xe7/0x156
Mar 30 03:06:36 WadeWilson kernel: remove_migration_ptes+0x49/0x63
Mar 30 03:06:36 WadeWilson kernel: ? pmd_pfn+0x3a/0x3a
Mar 30 03:06:36 WadeWilson kernel: migrate_pages+0x4e0/0x7c1
Mar 30 03:06:36 WadeWilson kernel: ? move_freelist_tail+0xba/0xba
Mar 30 03:06:36 WadeWilson kernel: ? isolate_freepages_block+0x26b/0x26b
Mar 30 03:06:36 WadeWilson kernel: compact_zone+0x6b2/0x905
Mar 30 03:06:36 WadeWilson kernel: ? set_next_entity+0x47/0x6c
Mar 30 03:06:36 WadeWilson kernel: proactive_compact_node+0x75/0xa2
Mar 30 03:06:36 WadeWilson kernel: ? fragmentation_score_node+0x2b/0x59
Mar 30 03:06:36 WadeWilson kernel: kcompactd+0x1ee/0x22c
Mar 30 03:06:36 WadeWilson kernel: ? init_wait_entry+0x24/0x24
Mar 30 03:06:36 WadeWilson kernel: ? kcompactd_do_work+0x16f/0x16f
Mar 30 03:06:36 WadeWilson kernel: kthread+0xe5/0xea
Mar 30 03:06:36 WadeWilson kernel: ? kthread_unpark+0x52/0x52
Mar 30 03:06:36 WadeWilson kernel: ret_from_fork+0x22/0x30
Mar 30 03:06:56 WadeWilson kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 781118 jiffies s: 3377 root: 0x1/.
Mar 30 03:06:56 WadeWilson kernel: rcu: blocking rcu_node structures: l=1:0-15:0x80/.
Mar 30 03:06:56 WadeWilson kernel: Task dump for CPU 7:
Mar 30 03:06:56 WadeWilson kernel: task:kcompactd0      state:R  running task     stack:    0 pid:  425 ppid:     2 flags:0x00004008
Mar 30 03:06:56 WadeWilson kernel: Call Trace:
Mar 30 03:06:56 WadeWilson kernel: ? proactive_compact_node+0x75/0xa2
Mar 30 03:06:56 WadeWilson kernel: ? fragmentation_score_node+0x2b/0x59
Mar 30 03:06:56 WadeWilson kernel: ? kcompactd+0x1ee/0x22c
Mar 30 03:06:56 WadeWilson kernel: ? init_wait_entry+0x24/0x24
Mar 30 03:06:56 WadeWilson kernel: ? kcompactd_do_work+0x16f/0x16f
Mar 30 03:06:56 WadeWilson kernel: ? kthread+0xe5/0xea
Mar 30 03:06:56 WadeWilson kernel: ? kthread_unpark+0x52/0x52
Mar 30 03:06:56 WadeWilson kernel: ? ret_from_fork+0x22/0x30
Mar 30 03:07:16 WadeWilson crond[2053]: exit status 255 from user root /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "monitor" &>/dev/null
Mar 30 03:09:36 WadeWilson kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Mar 30 03:09:36 WadeWilson kernel: rcu: #0117-....: (960014 ticks this GP) idle=03e/1/0x4000000000000000 softirq=389008/389008 fqs=238149 
Mar 30 03:09:36 WadeWilson kernel: #011(t=960015 jiffies g=1768205 q=548943)
Mar 30 03:09:36 WadeWilson kernel: NMI backtrace for cpu 7
Mar 30 03:09:36 WadeWilson kernel: CPU: 7 PID: 425 Comm: kcompactd0 Tainted: G      D           5.10.1-Unraid #1
Mar 30 03:09:36 WadeWilson kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS (MS-7B79), BIOS A.K1 12/22/2020
Mar 30 03:09:36 WadeWilson kernel: Call Trace:
Mar 30 03:09:36 WadeWilson kernel: <IRQ>
Mar 30 03:09:36 WadeWilson kernel: dump_stack+0x6b/0x83
Mar 30 03:09:36 WadeWilson kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Mar 30 03:09:36 WadeWilson kernel: nmi_cpu_backtrace+0x7d/0x8f
Mar 30 03:09:36 WadeWilson kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
Mar 30 03:09:36 WadeWilson kernel: rcu_dump_cpu_stacks+0x9f/0xc6
Mar 30 03:09:36 WadeWilson kernel: rcu_sched_clock_irq+0x1ec/0x543
Mar 30 03:09:36 WadeWilson kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
Mar 30 03:09:36 WadeWilson kernel: update_process_times+0x50/0x6e
Mar 30 03:09:36 WadeWilson kernel: tick_sched_timer+0x36/0x64
Mar 30 03:09:36 WadeWilson kernel: __hrtimer_run_queues+0xb7/0x10b
Mar 30 03:09:36 WadeWilson kernel: ? tick_sched_do_timer+0x39/0x39
Mar 30 03:09:36 WadeWilson kernel: hrtimer_interrupt+0x8d/0x160
Mar 30 03:09:36 WadeWilson kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Mar 30 03:09:36 WadeWilson kernel: asm_call_irq_on_stack+0x12/0x20
Mar 30 03:09:36 WadeWilson kernel: </IRQ>
Mar 30 03:09:36 WadeWilson kernel: sysvec_apic_timer_interrupt+0x71/0x95
Mar 30 03:09:36 WadeWilson kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Mar 30 03:09:36 WadeWilson kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x18a
Mar 30 03:09:36 WadeWilson kernel: Code: c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 74 0c 0f ba e0 08 72 1a c6 47 01 00 eb 14 85 c0 74 0a 8b 07 84 c0 74 04 f3 90 <eb> f6 66 c7 07 01 00 c3 48 c7 c0 80 2f 02 00 65 48 03 05 90 92 f8
Mar 30 03:09:36 WadeWilson kernel: RSP: 0018:ffffc90000e9fb70 EFLAGS: 00000202
Mar 30 03:09:36 WadeWilson kernel: RAX: 0000000000000101 RBX: ffffc90000e9fbb8 RCX: 000ffffffffff000
Mar 30 03:09:36 WadeWilson kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffea000403b6e8
Mar 30 03:09:36 WadeWilson kernel: RBP: ffffea0015370e00 R08: ffff888000000000 R09: 0000000000000000
Mar 30 03:09:36 WadeWilson kernel: R10: 0000000000000001 R11: 000000000000000c R12: ffffea00040945c0
Mar 30 03:09:36 WadeWilson kernel: R13: ffff88810133b300 R14: 0000160000000000 R15: ffff888133da1140
Mar 30 03:09:36 WadeWilson kernel: queued_spin_lock_slowpath+0x7/0xa
Mar 30 03:09:36 WadeWilson kernel: page_vma_mapped_walk+0x4a4/0x4f8
Mar 30 03:09:36 WadeWilson kernel: remove_migration_pte+0x59/0x214
Mar 30 03:09:36 WadeWilson kernel: rmap_walk_anon+0xe7/0x156
Mar 30 03:09:36 WadeWilson kernel: remove_migration_ptes+0x49/0x63
Mar 30 03:09:36 WadeWilson kernel: ? pmd_pfn+0x3a/0x3a
Mar 30 03:09:36 WadeWilson kernel: migrate_pages+0x4e0/0x7c1
Mar 30 03:09:36 WadeWilson kernel: ? move_freelist_tail+0xba/0xba
Mar 30 03:09:36 WadeWilson kernel: ? isolate_freepages_block+0x26b/0x26b
Mar 30 03:09:36 WadeWilson kernel: compact_zone+0x6b2/0x905
Mar 30 03:09:36 WadeWilson kernel: ? set_next_entity+0x47/0x6c
Mar 30 03:09:36 WadeWilson kernel: proactive_compact_node+0x75/0xa2
Mar 30 03:09:36 WadeWilson kernel: ? fragmentation_score_node+0x2b/0x59
Mar 30 03:09:36 WadeWilson kernel: kcompactd+0x1ee/0x22c
Mar 30 03:09:36 WadeWilson kernel: ? init_wait_entry+0x24/0x24
Mar 30 03:09:36 WadeWilson kernel: ? kcompactd_do_work+0x16f/0x16f
Mar 30 03:09:36 WadeWilson kernel: kthread+0xe5/0xea
Mar 30 03:09:36 WadeWilson kernel: ? kthread_unpark+0x52/0x52
Mar 30 03:09:36 WadeWilson kernel: ret_from_fork+0x22/0x30
Mar 30 03:09:57 WadeWilson kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 961342 jiffies s: 3377 root: 0x1/.
Mar 30 03:09:57 WadeWilson kernel: rcu: blocking rcu_node structures: l=1:0-15:0x80/.
Mar 30 03:09:57 WadeWilson kernel: Task dump for CPU 7:
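
The `t=... jiffies` figure on each self-detected stall report counts ticks since the start of the stalled RCU grace period, so the reports above can be used to estimate when the stall began. The Python sketch below is a rough worked example, not a diagnostic tool. It infers the tick rate from the first and last reports instead of assuming a kernel setting, and the function names and the year (taken from the diagnostics file name) are assumptions for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches the "(t=240003 jiffies g=... q=...)" line of a self-detected stall report
T_JIFFIES = re.compile(r"\(t=(\d+) jiffies")


def parse_reports(lines, year=2021):
    """Return [(timestamp, jiffies)] for every line carrying a t=... figure."""
    reports = []
    for line in lines:
        match = T_JIFFIES.search(line)
        if match:
            when = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
            reports.append((when, int(match.group(1))))
    return reports


def estimate_stall_start(reports):
    """Estimate ticks per second and the start time of the stalled grace period."""
    (first_when, first_t), (last_when, last_t) = reports[0], reports[-1]
    hz = (last_t - first_t) / (last_when - first_when).total_seconds()
    start = first_when - timedelta(seconds=first_t / hz)
    return hz, start


sample = [
    "Mar 30 02:57:36 WadeWilson kernel: #011(t=240003 jiffies g=1768205 q=103075)",
    "Mar 30 03:00:36 WadeWilson kernel: #011(t=420006 jiffies g=1768205 q=206469)",
    "Mar 30 03:03:36 WadeWilson kernel: #011(t=600009 jiffies g=1768205 q=291987)",
    "Mar 30 03:06:36 WadeWilson kernel: #011(t=780012 jiffies g=1768205 q=388324)",
    "Mar 30 03:09:36 WadeWilson kernel: #011(t=960015 jiffies g=1768205 q=548943)",
]

hz, start = estimate_stall_start(parse_reports(sample))
print(f"about {round(hz)} jiffies per second; grace period began near {start:%H:%M:%S}")
```

With these five reports the counter advances by roughly 1000 jiffies per second, which puts the start of the stalled grace period at about 02:53:36, around a minute before the first stall message at 02:54:36. The grace-period number (`g=1768205`) is the same in every report, so this looks like one continuous stall on CPU 7 in `kcompactd0`, not several separate ones.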

 


Thanks @Hoopster. Those traces were from the crash/lock-up that occurred overnight.

 

I brought the system back online around 7 am CST this morning and ran an XFS repair on my cache drive (once I saw the "rcu_sched self-detected stall on CPU" error in the logs and checked the forum). After that, I rebooted the system at least once more (maybe twice) before I went to work.

 

Unfortunately, the syslog did not have any further mentions of "traces" or "self-detected stalls" before or after the most recent lock-up this morning (10:28:56 am CST).

Edited by MarkRMonaco
  • MarkRMonaco changed the title to [SOLVED] Server randomly goes unresponsive [6.9.1]