Unraid server locking up every 3-5 days requiring a cold boot



I'm not sure what the cause is here. I have tried my Google-fu but cannot find an exact solution, so I wanted to drop my syslog here. This is a snippet of it.

 

May  9 18:23:22 TerryBytes kernel: rcu: INFO: rcu_sched self-detected stall on CPU
May  9 18:23:22 TerryBytes kernel: rcu:     1-....: (1140017 ticks this GP) idle=99a/1/0x4000000000000000 softirq=15221315/15221315 fqs=284927 
May  9 18:23:22 TerryBytes kernel:     (t=1140018 jiffies g=67614025 q=5292668)
May  9 18:23:22 TerryBytes kernel: NMI backtrace for cpu 1
May  9 18:23:22 TerryBytes kernel: CPU: 1 PID: 14852 Comm: kworker/u50:1 Tainted: G        W I       5.10.28-Unraid #1
May  9 18:23:22 TerryBytes kernel: Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS 6.4.0 07/23/2013
May  9 18:23:22 TerryBytes kernel: Workqueue: events_power_efficient gc_worker [nf_conntrack]
May  9 18:23:22 TerryBytes kernel: Call Trace:
May  9 18:23:22 TerryBytes kernel: <IRQ>
May  9 18:23:22 TerryBytes kernel: dump_stack+0x6b/0x83
May  9 18:23:22 TerryBytes kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
May  9 18:23:22 TerryBytes kernel: nmi_cpu_backtrace+0x7d/0x8f
May  9 18:23:22 TerryBytes kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
May  9 18:23:22 TerryBytes kernel: rcu_dump_cpu_stacks+0x9f/0xc6
May  9 18:23:22 TerryBytes kernel: rcu_sched_clock_irq+0x1ec/0x543
May  9 18:23:22 TerryBytes kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
May  9 18:23:22 TerryBytes kernel: update_process_times+0x50/0x6e
May  9 18:23:22 TerryBytes kernel: tick_sched_timer+0x36/0x64
May  9 18:23:22 TerryBytes kernel: __hrtimer_run_queues+0xb7/0x10b
May  9 18:23:22 TerryBytes kernel: ? tick_sched_do_timer+0x39/0x39
May  9 18:23:22 TerryBytes kernel: hrtimer_interrupt+0x8d/0x15b
May  9 18:23:22 TerryBytes kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
May  9 18:23:22 TerryBytes kernel: asm_call_irq_on_stack+0x12/0x20
May  9 18:23:22 TerryBytes kernel: </IRQ>
May  9 18:23:22 TerryBytes kernel: sysvec_apic_timer_interrupt+0x71/0x95
May  9 18:23:22 TerryBytes kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
May  9 18:23:22 TerryBytes kernel: RIP: 0010:nf_ct_tuplehash_to_ctrack+0x8/0xe [nf_conntrack]
May  9 18:23:22 TerryBytes kernel: Code: a8 01 48 89 02 75 04 48 89 50 08 c3 48 8b 06 48 89 77 08 48 89 07 a8 01 48 89 3e 75 04 48 89 78 08 c3 0f b6 47 37 48 6b c0 c8 <48> 8d 44 07 f0 c3 48 8b 87 b8 00 00 00 48 85 c0 74 12 40 0f b6 f6
May  9 18:23:22 TerryBytes kernel: RSP: 0018:ffffc900274efe40 EFLAGS: 00000282
May  9 18:23:22 TerryBytes kernel: RAX: ffffffffffffffc8 RBX: 0000000000000000 RCX: ffff888156880000
May  9 18:23:22 TerryBytes kernel: RDX: 00000001168ffd6f RSI: ffffc900274efe5c RDI: ffff88893e4eb1c8
May  9 18:23:22 TerryBytes kernel: RBP: 0000000000009655 R08: 0000000000000000 R09: 0000746e65696369
May  9 18:23:22 TerryBytes kernel: R10: 8080808080808080 R11: fefefefefefefeff R12: ffffffffa02045a0
May  9 18:23:22 TerryBytes kernel: R13: 00000000fbce36bf R14: ffff88893e4eb1c8 R15: ffff88893e4eb180
May  9 18:23:22 TerryBytes kernel: gc_worker+0x9a/0x240 [nf_conntrack]
May  9 18:23:22 TerryBytes kernel: process_one_work+0x13c/0x1d5
May  9 18:23:22 TerryBytes kernel: worker_thread+0x18b/0x22f
May  9 18:23:22 TerryBytes kernel: ? process_scheduled_works+0x27/0x27
May  9 18:23:22 TerryBytes kernel: kthread+0xe5/0xea
May  9 18:23:22 TerryBytes kernel: ? __kthread_bind_mask+0x57/0x57
May  9 18:23:22 TerryBytes kernel: ret_from_fork+0x22/0x30

 

Followed by this:


 

May  9 18:25:07 TerryBytes kernel: NETDEV WATCHDOG: eth0 (mlx4_core): transmit queue 17 timed out
May  9 18:25:07 TerryBytes kernel: WARNING: CPU: 16 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0xcf/0x12b
May  9 18:25:07 TerryBytes kernel: Modules linked in: vhost_net tun vhost vhost_iotlb tap kvm_intel kvm xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle nf_tables veth macvlan xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs nfsd lockd grace sunrpc md_mod ip6table_filter ip6_tables iptable_filter ip_tables x_tables mlx4_en mlx4_core bnx2 ipmi_ssif i2c_core intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd mpt3sas glue_helper intel_cstate intel_uncore raid_class input_leds scsi_transport_sas led_class wmi ipmi_si ata_piix acpi_power_meter button acpi_cpufreq i7core_edac [last unloaded: tun]
May  9 18:25:07 TerryBytes kernel: CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W I       5.10.28-Unraid #1
May  9 18:25:07 TerryBytes kernel: Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS 6.4.0 07/23/2013
May  9 18:25:07 TerryBytes kernel: RIP: 0010:dev_watchdog+0xcf/0x12b
May  9 18:25:07 TerryBytes kernel: Code: 79 b7 00 00 75 38 48 89 ef c6 05 63 79 b7 00 01 e8 79 dd fc ff 44 89 e1 48 89 ee 48 c7 c7 ef 7f de 81 48 89 c2 e8 50 16 10 00 <0f> 0b eb 10 41 ff c4 48 05 40 01 00 00 41 39 f4 75 9d eb 16 48 8b
May  9 18:25:07 TerryBytes kernel: RSP: 0018:ffffc90006640ed8 EFLAGS: 00010286
May  9 18:25:07 TerryBytes kernel: RAX: 0000000000000000 RBX: ffff888124100438 RCX: 0000000000000027
May  9 18:25:07 TerryBytes kernel: RDX: 00000000ffffdfff RSI: 0000000000000001 RDI: ffff88900fc18920
May  9 18:25:07 TerryBytes kernel: RBP: ffff888124100000 R08: 0000000000000000 R09: 00000000ffffdfff
May  9 18:25:07 TerryBytes kernel: R10: ffffc90006640d08 R11: ffffc90006640d00 R12: 0000000000000011
May  9 18:25:07 TerryBytes kernel: R13: ffffc90006640f10 R14: ffffc90006640f10 R15: ffffffff820060c8
May  9 18:25:07 TerryBytes kernel: FS:  0000000000000000(0000) GS:ffff88900fc00000(0000) knlGS:0000000000000000
May  9 18:25:07 TerryBytes kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  9 18:25:07 TerryBytes kernel: CR2: 0000148b60702080 CR3: 000000000200a002 CR4: 00000000000206e0
May  9 18:25:07 TerryBytes kernel: Call Trace:
May  9 18:25:07 TerryBytes kernel: <IRQ>
May  9 18:25:07 TerryBytes kernel: call_timer_fn.isra.0+0x12/0x6f
May  9 18:25:07 TerryBytes kernel: ? netif_tx_lock+0x7a/0x7a
May  9 18:25:07 TerryBytes kernel: __run_timers.part.0+0x144/0x185
May  9 18:25:07 TerryBytes kernel: ? update_process_times+0x68/0x6e
May  9 18:25:07 TerryBytes kernel: ? hrtimer_forward+0x73/0x7b
May  9 18:25:07 TerryBytes kernel: ? tick_sched_timer+0x5a/0x64
May  9 18:25:07 TerryBytes kernel: ? timerqueue_add+0x62/0x68
May  9 18:25:07 TerryBytes kernel: ? recalibrate_cpu_khz+0x1/0x1
May  9 18:25:07 TerryBytes kernel: run_timer_softirq+0x21/0x43
May  9 18:25:07 TerryBytes kernel: __do_softirq+0xc4/0x1c2
May  9 18:25:07 TerryBytes kernel: asm_call_irq_on_stack+0x12/0x20
May  9 18:25:07 TerryBytes kernel: </IRQ>
May  9 18:25:07 TerryBytes kernel: do_softirq_own_stack+0x2c/0x39
May  9 18:25:07 TerryBytes kernel: __irq_exit_rcu+0x45/0x80
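
With a long syslog it can help to pull out only the lines that mark the start of each incident, so the timing between lockups is easier to see. Below is a minimal Python sketch, assuming the classic syslog line format shown in the excerpts above and matching only the two signatures that appear there. The names `SIGNATURES`, `LINE_RE` and `find_events` are illustrative and not part of any Unraid tooling.

```python
import re

# Signatures copied from the two excerpts above; extend the dict as needed.
SIGNATURES = {
    "rcu_stall": re.compile(r"rcu: INFO: rcu_sched self-detected stall"),
    "netdev_watchdog": re.compile(
        r"NETDEV WATCHDOG: \S+ \(\S+\): transmit queue \d+ timed out"
    ),
}

# Assumed line layout: "May  9 18:23:22 <hostname> kernel: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (\S+) kernel: (.*)$")


def find_events(lines):
    """Return (timestamp, signature_name, message) for each matching kernel line."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # not a kernel line in the assumed format
        timestamp, _host, message = match.groups()
        for name, pattern in SIGNATURES.items():
            if pattern.search(message):
                events.append((timestamp, name, message))
    return events


# Three lines taken from the excerpts above, used here as a small sample.
sample = [
    "May  9 18:23:22 TerryBytes kernel: rcu: INFO: rcu_sched self-detected stall on CPU",
    "May  9 18:23:22 TerryBytes kernel: NMI backtrace for cpu 1",
    "May  9 18:25:07 TerryBytes kernel: NETDEV WATCHDOG: eth0 (mlx4_core): transmit queue 17 timed out",
]

for timestamp, name, _message in find_events(sample):
    print(timestamp, name)
```

To scan a whole file instead of the sample, pass an open file object, for example `find_events(open("syslog.txt", errors="replace"))`. Only the incident-marking lines are reported; the backtraces that follow them are skipped.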

 

syslog.txt


If your server is locking up every 3 - 5 days then you're doing something on your server every 3 - 5 days (and I don't mean that facetiously).

 

How long did the server run before "locking up" (maximum EVER! run time)?

 

Assuming this isn't a new server, when did it start "locking up"?

 

6.

7 hours ago, 6of6 said:

If your server is locking up every 3 - 5 days then you're doing something on your server every 3 - 5 days (and I don't mean that facetiously).

 

How long did the server run before "locking up" (maximum EVER! run time)?

 

Assuming this isn't a new server, when did it start "locking up"?

 

6.

 

If you mean scheduled tasks, I don't have anything like that.

 

As for uptime, before the last two weeks of going up and down my server would stay up for months. Sorry, I'm bad at recording the actual time. It started locking up after I really started getting into Docker, so I am going to look into JorgeB's suggestions above.

 

Thanks! 
