February 24, 2023 Every few days, I end up having to hard reboot my server because it has locked up and become inaccessible. After reboot, it runs just fine for several days, then locks up again. Here is the latest syslog, but I can't make anything out of it (in terms of troubleshooting):

Quote
Feb 21 18:00:42 Tower kernel: BUG: unable to handle page fault for address: ffffffff82694e40
Feb 21 18:00:42 Tower kernel: #PF: supervisor write access in kernel mode
Feb 21 18:00:42 Tower kernel: #PF: error_code(0x0002) - not-present page
Feb 21 18:00:42 Tower kernel: PGD 220e067 P4D 220e067 PUD 220f063 PMD 13665c063 PTE 800ffffffd96b062
Feb 21 18:00:42 Tower kernel: Oops: 0002 [#1] PREEMPT SMP PTI
Feb 21 18:00:42 Tower kernel: CPU: 1 PID: 9219 Comm: node Not tainted 5.19.17-Unraid #2
Feb 21 18:00:42 Tower kernel: Hardware name: Equus Computer Systems Nobilis/DQ77MK, BIOS MKQ7710H.86A.0071.2015.0728.1443 07/28/2015
Feb 21 18:00:42 Tower kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x153/0x1d0
Feb 21 18:00:42 Tower kernel: Code: b9 01 00 00 00 f0 0f b1 0b 74 76 eb cc c1 ee 12 83 e0 03 ff ce 48 c1 e0 05 48 63 f6 48 05 00 ce 02 00 48 03 04 f5 e0 6a 16 82 <48> 89 10 8b 42 08 85 c0 75 04 f3 90 eb f5 48 8b 32 48 85 f6 74 bc
Feb 21 18:00:42 Tower kernel: RSP: 0018:ffffc90000cb7ad0 EFLAGS: 00010082
Feb 21 18:00:42 Tower kernel: RAX: ffffffff82694e40 RBX: ffff88817686c440 RCX: 0000000000080000
Feb 21 18:00:42 Tower kernel: RDX: ffff88840e46ce00 RSI: 000000000000000f RDI: ffff88817686c440
Feb 21 18:00:42 Tower kernel: RBP: 0000000000000001 R08: ffffc90000cb7c30 R09: fefefefefefefeff
Feb 21 18:00:42 Tower kernel: R10: 0000000000000003 R11: fefefefefefefeff R12: ffff88840e46ce00
Feb 21 18:00:42 Tower kernel: R13: 0000000000000000 R14: 0000000000000202 R15: ffff88816a8f4780
Feb 21 18:00:42 Tower kernel: FS: 00001539378b3780(0000) GS:ffff88840e440000(0000) knlGS:0000000000000000
Feb 21 18:00:42 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 21 18:00:42 Tower kernel: CR2: ffffffff82694e40 CR3: 00000001c3df4001 CR4: 00000000000606e0
Feb 21 18:00:42 Tower kernel: Call Trace:
Feb 21 18:00:42 Tower kernel: <TASK>
Feb 21 18:00:42 Tower kernel: do_raw_spin_lock+0x14/0x1a
Feb 21 18:00:42 Tower kernel: _raw_spin_lock_irqsave+0x2c/0x37
Feb 21 18:00:42 Tower kernel: folio_memcg_lock+0x47/0x88
Feb 21 18:00:42 Tower kernel: page_remove_rmap+0x1b/0x239
Feb 21 18:00:42 Tower kernel: unmap_page_range+0x451/0x66e
Feb 21 18:00:42 Tower kernel: unmap_vmas+0x87/0xbb
Feb 21 18:00:42 Tower kernel: exit_mmap+0xdc/0x15a
Feb 21 18:00:42 Tower kernel: __mmput+0x43/0xdb
Feb 21 18:00:42 Tower kernel: begin_new_exec+0x6f7/0x945
Feb 21 18:00:42 Tower kernel: load_elf_binary+0x22c/0x12ae
Feb 21 18:00:42 Tower kernel: ? __kernel_read+0x100/0x145
Feb 21 18:00:42 Tower kernel: ? __kernel_read+0x100/0x145
Feb 21 18:00:42 Tower kernel: bprm_execve+0x23a/0x52b
Feb 21 18:00:42 Tower kernel: do_execveat_common.isra.0+0x1a9/0x1d2
Feb 21 18:00:42 Tower kernel: __x64_sys_execve+0x38/0x44
Feb 21 18:00:42 Tower kernel: do_syscall_64+0x6b/0x81
Feb 21 18:00:42 Tower kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
Feb 21 18:00:42 Tower kernel: RIP: 0033:0x15393797ea07
Feb 21 18:00:42 Tower kernel: Code: Unable to access opcode bytes at RIP 0x15393797e9dd.
Feb 21 18:00:42 Tower kernel: RSP: 002b:00007ffe9f332bc8 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
Feb 21 18:00:42 Tower kernel: RAX: ffffffffffffffda RBX: 00007ffe9f332e98 RCX: 000015393797ea07
Feb 21 18:00:42 Tower kernel: RDX: 000015391007daf0 RSI: 0000000002fa7710 RDI: 00007ffe9f332bd0
Feb 21 18:00:42 Tower kernel: RBP: 00007ffe9f332c90 R08: 000000000301b291 R09: 0000000000000003
Feb 21 18:00:42 Tower kernel: R10: 000000000301c240 R11: 0000000000000202 R12: 0000000002fa7710
Feb 21 18:00:42 Tower kernel: R13: 000015391007daf0 R14: 0000000000000001 R15: 000000000301b28d
Feb 21 18:00:42 Tower kernel: </TASK>
Feb 21 18:00:42 Tower kernel: Modules linked in: xt_mark xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod nct6775 nct6775_core hwmon_vid wmi ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet 8021q garp mrp bridge stp llc bonding tls e1000e i915 x86_pkg_temp_thermal intel_powerclamp coretemp iosf_mbi drm_buddy i2c_algo_bit kvm_intel ttm drm_display_helper drm_kms_helper kvm drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl intel_cstate intel_gtt intel_uncore i2c_i801 firewire_ohci agpgart i2c_smbus firewire_core ahci i2c_core tpm_tis libahci tpm_tis_core syscopyarea sysfillrect sysimgblt fb_sys_fops tpm thermal fan video backlight button unix [last unloaded: e1000e]
Feb 21 18:00:42 Tower kernel: CR2: ffffffff82694e40
Feb 21 18:00:42 Tower kernel: ---[ end trace 0000000000000000 ]---
Feb 21 18:00:42 Tower kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x153/0x1d0
Feb 21 18:00:42 Tower kernel: Code: b9 01 00 00 00 f0 0f b1 0b 74 76 eb cc c1 ee 12 83 e0 03 ff ce 48 c1 e0 05 48 63 f6 48 05 00 ce 02 00 48 03 04 f5 e0 6a 16 82 <48> 89 10 8b 42 08 85 c0 75 04 f3 90 eb f5 48 8b 32 48 85 f6 74 bc
Feb 21 18:00:42 Tower kernel: RSP: 0018:ffffc90000cb7ad0 EFLAGS: 00010082
Feb 21 18:00:42 Tower kernel: RAX: ffffffff82694e40 RBX: ffff88817686c440 RCX: 0000000000080000
Feb 21 18:00:42 Tower kernel: RDX: ffff88840e46ce00 RSI: 000000000000000f RDI: ffff88817686c440
Feb 21 18:00:42 Tower kernel: RBP: 0000000000000001 R08: ffffc90000cb7c30 R09: fefefefefefefeff
Feb 21 18:00:42 Tower kernel: R10: 0000000000000003 R11: fefefefefefefeff R12: ffff88840e46ce00
Feb 21 18:00:42 Tower kernel: R13: 0000000000000000 R14: 0000000000000202 R15: ffff88816a8f4780
Feb 21 18:00:42 Tower kernel: FS: 00001539378b3780(0000) GS:ffff88840e440000(0000) knlGS:0000000000000000
Feb 21 18:00:42 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 21 18:00:42 Tower kernel: CR2: 000015393797e9dd CR3: 00000001c3df4001 CR4: 00000000000606e0
Feb 21 18:00:42 Tower kernel: note: node[9219] exited with preempt_count 2

This was repeated several times before getting to:

Quote
Feb 21 19:03:30 Tower kernel: rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-... } 3757375 jiffies s: 992877 root: 0x2/.
Feb 21 19:03:30 Tower kernel: rcu: blocking rcu_node structures (internal RCU debug):
Feb 21 19:03:30 Tower kernel: Task dump for CPU 1:
Feb 21 19:03:30 Tower kernel: task:node state:R running task stack: 0 pid:26563 ppid: 25966 flags:0x00004008
Feb 21 19:03:30 Tower kernel: Call Trace:
Feb 21 19:03:30 Tower kernel: <TASK>
Feb 21 19:03:30 Tower kernel: ? get_page_from_freelist+0x6ff/0x82d
Feb 21 19:03:30 Tower kernel: ? __schedule+0x59e/0x5f6
Feb 21 19:03:30 Tower kernel: ? preempt_schedule_common+0x25/0x39
Feb 21 19:03:30 Tower kernel: ? __cond_resched+0x17/0x21
Feb 21 19:03:30 Tower kernel: ? native_queued_spin_lock_slowpath+0xc6/0x1d0
Feb 21 19:03:30 Tower kernel: ? do_raw_spin_lock+0x14/0x1a
Feb 21 19:03:30 Tower kernel: ? _raw_spin_lock_irqsave+0x2c/0x37
Feb 21 19:03:30 Tower kernel: ? folio_memcg_lock+0x47/0x88
Feb 21 19:03:30 Tower kernel: ? page_remove_rmap+0x1b/0x239
Feb 21 19:03:30 Tower kernel: ? wp_page_copy+0x39a/0x448
Feb 21 19:03:30 Tower kernel: ? __handle_mm_fault+0x6ac/0xc7d
Feb 21 19:03:30 Tower kernel: ? __fget+0x33/0x41
Feb 21 19:03:30 Tower kernel: ? handle_mm_fault+0x113/0x1d7
Feb 21 19:03:30 Tower kernel: ? do_user_addr_fault+0x36a/0x514
Feb 21 19:03:30 Tower kernel: ? exc_page_fault+0xfc/0x11e
Feb 21 19:03:30 Tower kernel: ? asm_exc_page_fault+0x22/0x30
Feb 21 19:03:30 Tower kernel: </TASK>

That was the last log recorded until the server was rebooted.
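For anyone trying to make sense of a syslog like the one above, a small script can pull out the lines that usually matter: the page fault, the Oops marker, RCU stall reports, and the process named in `Comm:`. The following is only a rough sketch. The patterns are taken from this particular excerpt and are not an exhaustive list of kernel crash signatures, and the function name `summarize` is made up for illustration.

```python
import re

# Patterns modelled on the log excerpt in this thread (illustrative, not exhaustive).
SIGNATURES = {
    "page_fault": re.compile(r"BUG: unable to handle page fault"),
    "oops": re.compile(r"Oops: [0-9a-f]+ \[#\d+\]"),
    "rcu_stall": re.compile(r"rcu: INFO: \w+ detected (?:expedited )?stalls"),
}
# "Comm: <name>" identifies the process that was running when the kernel faulted.
COMM = re.compile(r"Comm: (\S+)")


def summarize(lines):
    """Count crash signatures and collect the process names seen in Comm: fields."""
    counts = {name: 0 for name in SIGNATURES}
    processes = set()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
        match = COMM.search(line)
        if match:
            processes.add(match.group(1))
    return counts, sorted(processes)
```

Calling `summarize(open(path))` on a saved copy of the syslog (the path is whatever file you exported) gives a quick tally. In the excerpt above, every fault names `node` as the running process, which is one hint that a container running Node.js is involved, though it does not prove which one.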
February 24, 2023 Author That sure does look like a similar issue, but I'm not running qbittorrent (or any other torrent containers). That does make me think it is a container issue rather than a hardware issue, though.
February 24, 2023 Community Expert Try stopping all containers, and if it doesn't crash, start re-enabling them one by one.
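The procedure suggested here can be sketched with the standard Docker CLI (`docker ps`, `docker stop`, `docker start`). This is a hedged outline only: it assumes the `docker` command is available in the server's shell, the helper names are hypothetical, and on Unraid the same thing can be done from the Docker tab in the web GUI. The waiting period between stages is up to you; this thread suggests the lockups take several days to show up.

```python
import subprocess


def running_containers():
    """Return the names of currently running containers via `docker ps`."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return [name for name in result.stdout.splitlines() if name]


def stop_commands(names):
    """Build one `docker stop` command per container (not executed here)."""
    return [["docker", "stop", name] for name in names]


def staged_restart_plan(names):
    """Stage i lists the containers running after re-enabling the i-th one.

    If the lockup returns during stage i, the container added at that stage
    is the first suspect.
    """
    return [names[: i + 1] for i in range(len(names))]
```

A possible way to use it: record `running_containers()` once, run each command from `stop_commands(...)` with `subprocess.run`, confirm the server stays up with everything stopped, then work through `staged_restart_plan(...)` one stage at a time, issuing `docker start <name>` for the newly added container.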
February 24, 2023 Author Since it may be a couple of weeks before I know anything, I'll try the opposite route: stop my least important/used containers first.
February 27, 2023 Following this thread. Mine locked up this afternoon. I couldn't ping it or bring up the web GUI. When I opened a remote session to it with iDRAC, it seemed unresponsive/locked up. I went down to the basement to check it out and it appeared to be powered on.
March 10, 2023 Author It has been a couple of weeks now and I haven't had any issues yet. I'll still give it a bit longer, but I'm fairly confident in saying that it was a container.