Docker (BTRFS Image) causing Kernel Panic


Solved by Balor

Hello,

 

My server had been stable for months, but lately it has been crashing anywhere from once a week to every 12 hours. It's hard to tell what is actually causing it.

 

I was finally able to capture the log at the time of the crash, and it shows a kernel NULL pointer dereference triggered in the dockerd process:

Jun 15 12:51:46 Tower kern kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Jun 15 12:51:46 Tower kern kernel: #PF: supervisor write access in kernel mode
Jun 15 12:51:46 Tower kern kernel: #PF: error_code(0x0002) - not-present page
Jun 15 12:51:46 Tower kern kernel: PGD 17fd79067 P4D 17fd79067 PUD 17fd7c067 PMD 0
Jun 15 12:51:46 Tower kern kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Jun 15 12:51:46 Tower kern kernel: CPU: 1 PID: 4858 Comm: dockerd Not tainted 5.19.17-Unraid #2
Jun 15 12:51:46 Tower kern kernel: Hardware name: HC Technology.,Ltd. HCAR357-NR/HCAR357-NR, BIOS 5.14 09/09/2021
Jun 15 12:51:46 Tower kern kernel: RIP: 0010:do_raw_spin_lock+0x7/0x1a
Jun 15 12:51:46 Tower kern kernel: Code: c1 07 e9 11 c1 b4 00 31 c0 48 81 ff 78 ac 83 81 72 0c 31 c0 48 81 ff 70 b1 83 81 0f 92 c0 e9 f5 c0 b4 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 74 08 89 c6 e8 b3 03 00 00 90 e9 db c0 b4 00 8b 07 31
Jun 15 12:51:46 Tower kern kernel: RSP: 0018:ffffc90001a07d60 EFLAGS: 00010046
Jun 15 12:51:46 Tower kern kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Jun 15 12:51:46 Tower kern kernel: RDX: 0000000000000001 RSI: ffffc90001a07dc8 RDI: 0000000000000000
Jun 15 12:51:46 Tower kern kernel: RBP: ffffc90001a07dc8 R08: 0000000000000010 R09: 0000000000000000
Jun 15 12:51:46 Tower kern kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
Jun 15 12:51:46 Tower kern kernel: R13: ffff8881e7cfe300 R14: 0000000000000246 R15: 0000000000000008
Jun 15 12:51:46 Tower kern kernel: FS:  00001524017eb700(0000) GS:ffff888800c40000(0000) knlGS:0000000000000000
Jun 15 12:51:46 Tower kern kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 15 12:51:46 Tower kern kernel: CR2: 0000000000000000 CR3: 0000000155bd8000 CR4: 00000000003506e0
Jun 15 12:51:46 Tower kern kernel: Call Trace:
Jun 15 12:51:46 Tower kern kernel: <TASK>
Jun 15 12:51:46 Tower kern kernel: _raw_spin_lock_irqsave+0x2c/0x37
Jun 15 12:51:46 Tower kern kernel: prepare_to_wait_event+0x19/0xa0
Jun 15 12:51:46 Tower kern kernel: pipe_read+0x229/0x33e
Jun 15 12:51:46 Tower kern kernel: ? _raw_spin_rq_lock_irqsave+0x20/0x20
Jun 15 12:51:46 Tower kern kernel: new_sync_read+0x7c/0xb3
Jun 15 12:51:46 Tower kern kernel: ? 0xffffffff81000000
Jun 15 12:51:46 Tower kern kernel: vfs_read+0xc6/0x10c
Jun 15 12:51:46 Tower kern kernel: ksys_read+0x76/0xc2
Jun 15 12:51:46 Tower kern kernel: ? fpregs_assert_state_consistent+0x1d/0x41
Jun 15 12:51:46 Tower kern kernel: do_syscall_64+0x6b/0x81
Jun 15 12:51:46 Tower kern kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
Jun 15 12:51:46 Tower kern kernel: RIP: 0033:0x4baa7b
Jun 15 12:51:46 Tower kern kernel: Code: e8 2a e5 fa ff eb 88 cc cc cc cc cc cc cc cc e8 db 2c fb ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
Jun 15 12:51:46 Tower kern kernel: RSP: 002b:000000c0183f6ad8 EFLAGS: 00000212 ORIG_RAX: 0000000000000000
Jun 15 12:51:46 Tower kern kernel: RAX: ffffffffffffffda RBX: 000000c00005e500 RCX: 00000000004baa7b
Jun 15 12:51:46 Tower kern kernel: RDX: 0000000000000008 RSI: 000000c0183f6bc0 RDI: 0000000000000096
Jun 15 12:51:46 Tower kern kernel: RBP: 000000c0183f6b28 R08: 0000000000000001 R09: 0000000000000000
Jun 15 12:51:46 Tower kern kernel: R10: 0000000000000008 R11: 0000000000000212 R12: 00000000004af3ed
Jun 15 12:51:46 Tower kern kernel: R13: 0000000000000000 R14: 000000c0107fcd00 R15: ffffffffffffffff
Jun 15 12:51:46 Tower kern kernel: </TASK>
Jun 15 12:51:46 Tower kern kernel: Modules linked in: xt_connmark xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha nfsv3 nfs xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle iptable_mangle vhost_net vhost vhost_iotlb tap veth xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter zstd zram zsmalloc xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod xt_MASQUERADE xt_mark iptable_nat ip6table_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun tcp_diag inet_diag efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet 8021q garp mrp bridge stp llc bonding tls r8169 realtek amdgpu edac_mce_amd edac_core kvm_amd kvm gpu_sched i2c_algo_bit drm_ttm_helper ttm drm_display_helper drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd drm rapl k10temp ahci libahci
Jun 15 12:51:46 Tower kern kernel: i2c_piix4 agpgart i2c_core syscopyarea sysfillrect sysimgblt ccp fb_sys_fops nvme nvme_core thermal tpm_crb tpm_tis video tpm_tis_core backlight tpm button acpi_cpufreq unix [last unloaded: realtek]
Jun 15 12:51:46 Tower kern kernel: CR2: 0000000000000000
Jun 15 12:51:46 Tower kern kernel: ---[ end trace 0000000000000000 ]---
Jun 15 12:51:46 Tower kern kernel: RIP: 0010:do_raw_spin_lock+0x7/0x1a
Jun 15 12:51:46 Tower kern kernel: Code: c1 07 e9 11 c1 b4 00 31 c0 48 81 ff 78 ac 83 81 72 0c 31 c0 48 81 ff 70 b1 83 81 0f 92 c0 e9 f5 c0 b4 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 74 08 89 c6 e8 b3 03 00 00 90 e9 db c0 b4 00 8b 07 31
Jun 15 12:51:46 Tower kern kernel: RSP: 0018:ffffc90001a07d60 EFLAGS: 00010046
Jun 15 12:51:46 Tower kern kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Jun 15 12:51:46 Tower kern kernel: RDX: 0000000000000001 RSI: ffffc90001a07dc8 RDI: 0000000000000000
Jun 15 12:51:46 Tower kern kernel: RBP: ffffc90001a07dc8 R08: 0000000000000010 R09: 0000000000000000
Jun 15 12:51:46 Tower kern kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
Jun 15 12:51:46 Tower kern kernel: R13: ffff8881e7cfe300 R14: 0000000000000246 R15: 0000000000000008
Jun 15 12:51:46 Tower kern kernel: FS:  00001524017eb700(0000) GS:ffff888800c40000(0000) knlGS:0000000000000000
Jun 15 12:51:46 Tower kern kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 15 12:51:46 Tower kern kernel: CR2: 0000000000000000 CR3: 0000000155bd8000 CR4: 00000000003506e0
Jun 15 12:51:46 Tower kern kernel: note: dockerd[4858] exited with preempt_count 1
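
For anyone triaging a similar log, here is a minimal sketch of pulling the headline fields out of an oops like the one above. `summarize_oops` is a hypothetical helper, not part of Unraid or any kernel tooling, and the patterns assume the exact message format seen in this trace (an untainted kernel and a NULL pointer dereference); other oops types would need different patterns.

```python
import re

# Hypothetical helper for illustration only; the regexes assume the
# message layout seen in the trace above.
def summarize_oops(log_text):
    """Return a dict with a few headline fields from a kernel oops dump."""
    patterns = {
        # Address the kernel tried to access when it faulted.
        "fault_address": r"BUG: kernel NULL pointer dereference, address: ([0-9a-f]+)",
        # Process that was running when the fault happened.
        "process": r"Comm: (\S+)",
        "pid": r"PID: (\d+)",
        # Kernel release; assumes the "Not tainted" form of the line.
        "kernel": r"Not tainted (\S+)",
        # Kernel-mode instruction pointer (code segment 0010), as symbol+offset.
        "rip": r"RIP: 0010:(\S+)",
    }
    summary = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, log_text)
        summary[name] = match.group(1) if match else None
    return summary


sample = """\
Jun 15 12:51:46 Tower kern kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Jun 15 12:51:46 Tower kern kernel: CPU: 1 PID: 4858 Comm: dockerd Not tainted 5.19.17-Unraid #2
Jun 15 12:51:46 Tower kern kernel: RIP: 0010:do_raw_spin_lock+0x7/0x1a
"""

for field, value in summarize_oops(sample).items():
    print(f"{field}: {value}")
```

Running this over several captured crashes and comparing the `process` and `rip` fields is one way to see whether the faults keep landing in the same place or move around between crashes.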

I'm attaching the diagnostics that I captured after hard-restarting the server following this crash.

 

Has anybody encountered an issue like this?

 

 

tower-diagnostics-20230615-1311.zip

  • Solution

Well, in the end it was a hardware issue: the memory was failing.

I replaced those sticks with good ones from Corsair and have had no more issues since. The previous sticks look like they were overheating (I can see that the server's temperature has dropped drastically since I replaced them).
