John Catral Posted November 10, 2021

My Unraid server, running on an HP Gen8 MicroServer, freezes and stops responding at random, usually after about 1-2 months of uptime. The last time this happened was at the beginning of October, so I finally set up Unraid to save a copy of the syslog to the flash drive. It looks like the server froze on November 7, and I didn't notice it was down until today, November 9. Can anyone spot anything out of the ordinary in this syslog snippet? I would really appreciate it, and it would help me figure out why this keeps happening.

Nov 7 13:43:16 blacktower kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Nov 7 13:43:16 blacktower kernel: rcu: 2-....: (31380244 ticks this GP) idle=89a/1/0x4000000000000000 softirq=172338525/172338525 fqs=7841656
Nov 7 13:43:16 blacktower kernel: (t=31380522 jiffies g=343853069 q=19638568)
Nov 7 13:43:16 blacktower kernel: NMI backtrace for cpu 2
Nov 7 13:43:16 blacktower kernel: CPU: 2 PID: 340 Comm: kcompactd0 Tainted: P D W IO 5.10.28-Unraid #1
Nov 7 13:43:16 blacktower kernel: Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 04/04/2019
Nov 7 13:43:16 blacktower kernel: Call Trace:
Nov 7 13:43:16 blacktower kernel: <IRQ>
Nov 7 13:43:16 blacktower kernel: dump_stack+0x6b/0x83
Nov 7 13:43:16 blacktower kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Nov 7 13:43:16 blacktower kernel: nmi_cpu_backtrace+0x7d/0x8f
Nov 7 13:43:16 blacktower kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
Nov 7 13:43:16 blacktower kernel: rcu_dump_cpu_stacks+0x9f/0xc6
Nov 7 13:43:16 blacktower kernel: rcu_sched_clock_irq+0x1ec/0x543
Nov 7 13:43:16 blacktower kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
Nov 7 13:43:16 blacktower kernel: update_process_times+0x50/0x6e
Nov 7 13:43:16 blacktower kernel: tick_sched_timer+0x36/0x64
Nov 7 13:43:16 blacktower kernel: __hrtimer_run_queues+0xb7/0x10b
Nov 7 13:43:16 blacktower kernel: ? tick_sched_do_timer+0x39/0x39
Nov 7 13:43:16 blacktower kernel: hrtimer_interrupt+0x8d/0x15b
Nov 7 13:43:16 blacktower kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Nov 7 13:43:16 blacktower kernel: asm_call_irq_on_stack+0x12/0x20
Nov 7 13:43:16 blacktower kernel: </IRQ>
Nov 7 13:43:16 blacktower kernel: sysvec_apic_timer_interrupt+0x71/0x95
Nov 7 13:43:16 blacktower kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Nov 7 13:43:16 blacktower kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x18a
Nov 7 13:43:16 blacktower kernel: Code: c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 74 0c 0f ba e0 08 72 1a c6 47 01 00 eb 14 85 c0 74 0a 8b 07 84 c0 74 04 f3 90 <eb> f6 66 c7 07 01 00 c3 48 c7 c0 00 30 02 00 65 48 03 05 f0 8e f8
Nov 7 13:43:16 blacktower kernel: RSP: 0018:ffffc90000a4fb80 EFLAGS: 00000202
Nov 7 13:43:16 blacktower kernel: RAX: 0000000000080101 RBX: ffff888001faf080 RCX: 000ffffffffff000
Nov 7 13:43:16 blacktower kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffea0005b9a068
Nov 7 13:43:16 blacktower kernel: RBP: ffffc90000a4fbc8 R08: ffff888000000000 R09: 0000000000000000
Nov 7 13:43:16 blacktower kernel: R10: ffffc90000a4fb88 R11: 000000000000000d R12: ffff8881d1751100
Nov 7 13:43:16 blacktower kernel: R13: ffffea00074579c0 R14: 0000160000000000 R15: 0000000000000001
Nov 7 13:43:16 blacktower kernel: queued_spin_lock_slowpath+0x7/0xa
Nov 7 13:43:16 blacktower kernel: page_vma_mapped_walk+0x497/0x4dc
Nov 7 13:43:16 blacktower kernel: remove_migration_pte+0x59/0x214
Nov 7 13:43:16 blacktower kernel: rmap_walk_file+0xbc/0x125
Nov 7 13:43:16 blacktower kernel: remove_migration_ptes+0x49/0x63
Nov 7 13:43:16 blacktower kernel: ? pmd_pfn+0x3a/0x3a
Nov 7 13:43:16 blacktower kernel: migrate_pages+0x4e0/0x7c1
Nov 7 13:43:16 blacktower kernel: ? move_freelist_tail+0xba/0xba
Nov 7 13:43:16 blacktower kernel: ? isolate_freepages_block+0x26b/0x26b
Nov 7 13:43:16 blacktower kernel: compact_zone+0x6b7/0x90a
Nov 7 13:43:16 blacktower kernel: proactive_compact_node+0x75/0xa2
Nov 7 13:43:16 blacktower kernel: ? fragmentation_score_node+0x2b/0x59
Nov 7 13:43:16 blacktower kernel: kcompactd+0x1ee/0x22c
Nov 7 13:43:16 blacktower kernel: ? init_wait_entry+0x24/0x24
Nov 7 13:43:16 blacktower kernel: ? kcompactd_do_work+0x16f/0x16f
Nov 7 13:43:16 blacktower kernel: kthread+0xe5/0xea
Nov 7 13:43:16 blacktower kernel: ? __kthread_bind_mask+0x57/0x57
Nov 7 13:43:16 blacktower kernel: ret_from_fork+0x22/0x30
Nov 7 13:46:16 blacktower kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Nov 7 13:46:16 blacktower kernel: rcu: 2-....: (31560242 ticks this GP) idle=89a/1/0x4000000000000000 softirq=172338525/172338525 fqs=7886625
Nov 7 13:46:16 blacktower kernel: (t=31560525 jiffies g=343853069 q=19649207)
Nov 7 13:46:16 blacktower kernel: NMI backtrace for cpu 2
Nov 7 13:46:16 blacktower kernel: CPU: 2 PID: 340 Comm: kcompactd0 Tainted: P D W IO 5.10.28-Unraid #1
Nov 7 13:46:16 blacktower kernel: Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 04/04/2019
Nov 7 13:46:16 blacktower kernel: Call Trace:
Nov 7 13:46:16 blacktower kernel: <IRQ>
Nov 7 13:46:16 blacktower kernel: dump_stack+0x6b/0x83
Nov 7 13:46:16 blacktower kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Nov 7 13:46:16 blacktower kernel: nmi_cpu_backtrace+0x7d/0x8f
Nov 7 13:46:16 blacktower kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
Nov 7 13:46:16 blacktower kernel: rcu_dump_cpu_stacks+0x9f/0xc6
Nov 7 13:46:16 blacktower kernel: rcu_sched_clock_irq+0x1ec/0x543
Nov 7 13:46:16 blacktower kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
Nov 7 13:46:16 blacktower kernel: update_process_times+0x50/0x6e
Nov 7 13:46:16 blacktower kernel: tick_sched_timer+0x36/0x64
Nov 7 13:46:16 blacktower kernel: __hrtimer_run_queues+0xb7/0x10b
Nov 7 13:46:16 blacktower kernel: ? tick_sched_do_timer+0x39/0x39
Nov 7 13:46:16 blacktower kernel: hrtimer_interrupt+0x8d/0x15b
Nov 7 13:46:16 blacktower kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Nov 7 13:46:16 blacktower kernel: asm_call_irq_on_stack+0x12/0x20
Nov 7 13:46:16 blacktower kernel: </IRQ>
Nov 7 13:46:16 blacktower kernel: sysvec_apic_timer_interrupt+0x71/0x95
Nov 7 13:46:16 blacktower kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Nov 7 13:46:16 blacktower kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x18a
Nov 7 13:46:16 blacktower kernel: Code: c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 74 0c 0f ba e0 08 72 1a c6 47 01 00 eb 14 85 c0 74 0a 8b 07 84 c0 74 04 f3 90 <eb> f6 66 c7 07 01 00 c3 48 c7 c0 00 30 02 00 65 48 03 05 f0 8e f8
Nov 7 13:46:16 blacktower kernel: RSP: 0018:ffffc90000a4fb80 EFLAGS: 00000202
Nov 7 13:46:16 blacktower kernel: RAX: 0000000000080101 RBX: ffff888001faf080 RCX: 000ffffffffff000
Nov 7 13:46:16 blacktower kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffea0005b9a068
Nov 7 13:46:16 blacktower kernel: RBP: ffffc90000a4fbc8 R08: ffff888000000000 R09: 0000000000000000
Nov 7 13:46:16 blacktower kernel: R10: ffffc90000a4fb88 R11: 000000000000000d R12: ffff8881d1751100
Nov 7 13:46:16 blacktower kernel: R13: ffffea00074579c0 R14: 0000160000000000 R15: 0000000000000001
Nov 7 13:46:16 blacktower kernel: queued_spin_lock_slowpath+0x7/0xa
Nov 7 13:46:16 blacktower kernel: page_vma_mapped_walk+0x497/0x4dc
Nov 7 13:46:16 blacktower kernel: remove_migration_pte+0x59/0x214
Nov 7 13:46:16 blacktower kernel: rmap_walk_file+0xbc/0x125
Nov 7 13:46:16 blacktower kernel: remove_migration_ptes+0x49/0x63
Nov 7 13:46:16 blacktower kernel: ? pmd_pfn+0x3a/0x3a
Nov 7 13:46:16 blacktower kernel: migrate_pages+0x4e0/0x7c1
Nov 7 13:46:16 blacktower kernel: ? move_freelist_tail+0xba/0xba
Nov 7 13:46:16 blacktower kernel: ? isolate_freepages_block+0x26b/0x26b
Nov 7 13:46:16 blacktower kernel: compact_zone+0x6b7/0x90a
Nov 7 13:46:16 blacktower kernel: proactive_compact_node+0x75/0xa2
Nov 7 13:46:16 blacktower kernel: ? fragmentation_score_node+0x2b/0x59
Nov 7 13:46:16 blacktower kernel: kcompactd+0x1ee/0x22c
Nov 7 13:46:16 blacktower kernel: ? init_wait_entry+0x24/0x24
Nov 7 13:46:16 blacktower kernel: ? kcompactd_do_work+0x16f/0x16f
Nov 7 13:46:16 blacktower kernel: kthread+0xe5/0xea
Nov 7 13:46:16 blacktower kernel: ? __kthread_bind_mask+0x57/0x57
Nov 7 13:46:16 blacktower kernel: ret_from_fork+0x22/0x30
Nov 9 22:48:21 blacktower kernel: Linux version 5.10.28-Unraid (root@Develop) (gcc (GCC) 9.3.0, GNU ld version 2.33.1-slack15) #1 SMP Wed Apr 7 08:23:18 PDT 2021
Nov 9 22:48:21 blacktower kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot
Nov 9 22:48:21 blacktower kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Nov 9 22:48:21 blacktower kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Nov 9 22:48:21 blacktower kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Nov 9 22:48:21 blacktower kernel: x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
Nov 9 22:48:21 blacktower kernel: x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
Nov 9 22:48:21 blacktower kernel: BIOS-provided physical RAM map:
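The two stall reports above can be read a bit more quantitatively with a small script. This is a minimal sketch under stated assumptions: it treats the `t=N jiffies` field as the time elapsed since the stalled RCU grace period began, estimates the kernel tick rate (HZ) from the two consecutive reports instead of assuming a value, and decodes the `Tainted:` letters using the meanings from the kernel's tainted-kernels documentation (only the letters that appear here). The year 2021 is an assumption because syslog timestamps omit it, and the helper names `stall_reports`, `estimate_hz`, `stall_start` and `decode_taint` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Lines copied from the syslog above (trimmed to what this sketch needs).
LOG = """\
Nov 7 13:43:16 blacktower kernel: rcu: 2-....: (31380244 ticks this GP) idle=89a/1/0x4000000000000000 softirq=172338525/172338525 fqs=7841656
Nov 7 13:43:16 blacktower kernel: (t=31380522 jiffies g=343853069 q=19638568)
Nov 7 13:43:16 blacktower kernel: CPU: 2 PID: 340 Comm: kcompactd0 Tainted: P D W IO 5.10.28-Unraid #1
Nov 7 13:46:16 blacktower kernel: rcu: 2-....: (31560242 ticks this GP) idle=89a/1/0x4000000000000000 softirq=172338525/172338525 fqs=7886625
Nov 7 13:46:16 blacktower kernel: (t=31560525 jiffies g=343853069 q=19649207)
"""

# Taint letters seen in this log, as described in the kernel's
# tainted-kernels documentation.
TAINT_FLAGS = {
    "P": "proprietary module was loaded",
    "D": "kernel died recently (OOPS or BUG)",
    "W": "kernel issued a warning",
    "I": "workaround for a platform firmware bug applied",
    "O": "externally-built (out-of-tree) module was loaded",
}

STALL_RE = re.compile(r"\(t=(\d+) jiffies")
TAINT_RE = re.compile(r"Tainted: ([A-Z ]+)")


def stall_reports(lines, year=2021):
    """Yield (timestamp, jiffies_since_gp_start) for each '(t=N jiffies ...)' line.

    Syslog timestamps carry no year, so `year` is an assumption.
    """
    for line in lines:
        match = STALL_RE.search(line)
        if match:
            stamp = " ".join(line.split()[:3])  # e.g. "Nov 7 13:43:16"
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
            yield ts, int(match.group(1))


def estimate_hz(reports):
    """Estimate jiffies per second from the first and last stall report."""
    if len(reports) < 2:
        raise ValueError("need at least two stall reports")
    (t0, j0), (t1, j1) = reports[0], reports[-1]
    rate = (j1 - j0) / (t1 - t0).total_seconds()
    # HZ is a fixed kernel build setting, so round the measured rate.
    return round(rate)


def stall_start(reports, hz):
    """Approximate when the stalled grace period began."""
    ts, jiffies = reports[0]
    return ts - timedelta(seconds=jiffies / hz)


def decode_taint(line):
    """Return the meaning of each taint letter on a 'Tainted:' line."""
    match = TAINT_RE.search(line)
    if not match:
        return []
    return [TAINT_FLAGS.get(c, f"unknown flag {c!r}") for c in match.group(1) if c != " "]


if __name__ == "__main__":
    lines = LOG.splitlines()
    reports = list(stall_reports(lines))
    hz = estimate_hz(reports)
    print(f"estimated HZ: {hz}")
    print(f"stall began around: {stall_start(reports, hz)}")
    for line in lines:
        for meaning in decode_taint(line):
            print("taint:", meaning)
```

For the two reports posted above, the jiffies counter advanced by 180003 over 180 seconds, so the rate rounds to 1000 per second. At that rate, `t=31380522` means the stall had been going for roughly 8.7 hours, which places its start around 05:00 on Nov 7. The `D` taint flag also hints that an oops or BUG was logged before these messages, so the lines preceding this snippet in the full syslog may be the more informative ones.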
John Catral Posted January 26, 2022 (Author)

My Unraid server still crashes, to the point where I can't depend on it to host some of my apps.
JorgeB Posted January 26, 2022

There's not much to go on in the call trace you posted. One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled, and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
John Catral Posted January 26, 2022 (Author)

That's a good idea; I'll try it. I will also post the full syslog, which will hopefully help. Thank you!