aeleos Posted January 24

I have been battling random lockups on my usually stable server. The only recent change was adding some Docker containers with relatively high CPU and memory usage. I was able to capture some logs using a remote syslog server, since once the lockup happened no logs were written locally. I noticed some 'shfs invoked oom-killer: ' errors, but they appeared both before and during the lockup. The logs I have from around the time of the crash show the trace below. It was followed by a series of OOM errors, and the full lockup eventually came around 45 minutes later. I can post the full remote syslog if someone thinks it would help, but I would need to anonymize it. I have run out of ideas other than upgrading the RAM from 32GB to 64GB or getting a new CPU/motherboard, because I can only suspect it's a hardware issue.

Jan 24 09:46:51 Tower kernel: BUG: kernel NULL pointer dereference, address: 0000000000000116
Jan 24 09:46:51 Tower kernel: #PF: supervisor read access in kernel mode
Jan 24 09:46:51 Tower kernel: #PF: error_code(0x0000) - not-present page
Jan 24 09:46:51 Tower kernel: PGD 164e69067 P4D 164e69067 PUD 59bea3067 PMD 0
Jan 24 09:46:51 Tower kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Jan 24 09:46:51 Tower kernel: CPU: 10 PID: 27565 Comm: traefik Tainted: G O 5.19.17-Unraid #2
Jan 24 09:46:51 Tower kernel: Hardware name: Gigabyte Technology Co., Ltd. X470 AORUS GAMING 5 WIFI/X470 AORUS GAMING 5 WIFI-CF, BIOS F63a 02/17/2022
Jan 24 09:46:51 Tower kernel: RIP: 0010:folio_try_get_rcu+0x0/0x21
Jan 24 09:46:51 Tower kernel: Code: e8 9d fd 67 00 48 8b 84 24 80 00 00 00 65 48 2b 04 25 28 00 00 00 74 05 e8 c1 35 69 00 48 81 c4 88 00 00 00 5b e9 ef 59 a6 00 <8b> 57 34 85 d2 74 10 8d 4a 01 89 d0 f0 0f b1 4f 34 74 04 89 c2 eb
Jan 24 09:46:51 Tower kernel: RSP: 0000:ffffc90001dd7cc0 EFLAGS: 00010246
Jan 24 09:46:51 Tower kernel: RAX: 00000000000000e2 RBX: 00000000000000e2 RCX: 00000000000000e2
Jan 24 09:46:51 Tower kernel: RDX: 0000000000000001 RSI: ffff88830490afe8 RDI: 00000000000000e2
Jan 24 09:46:51 Tower kernel: RBP: 0000000000000000 R08: 000000000000003c R09: ffffc90001dd7cd0
Jan 24 09:46:51 Tower kernel: R10: ffffc90001dd7cd0 R11: ffffc90001dd7d48 R12: 0000000000000000
Jan 24 09:46:51 Tower kernel: R13: ffff888186926f38 R14: 0000000000004dfe R15: ffff888186926f40
Jan 24 09:46:51 Tower kernel: FS: 000000c000570090(0000) GS:ffff88881ea80000(0000) knlGS:0000000000000000
Jan 24 09:46:51 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 24 09:46:51 Tower kernel: CR2: 0000000000000116 CR3: 00000002fb6ce000 CR4: 00000000003506e0
Jan 24 09:46:51 Tower kernel: Call Trace:
Jan 24 09:46:51 Tower kernel: <TASK>
Jan 24 09:46:51 Tower kernel: __filemap_get_folio+0x98/0x1ff
Jan 24 09:46:51 Tower kernel: filemap_fault+0x6e/0x524
Jan 24 09:46:51 Tower kernel: __do_fault+0x30/0x6e
Jan 24 09:46:51 Tower kernel: __handle_mm_fault+0x9a5/0xc7d
Jan 24 09:46:51 Tower kernel: ? __fget_light+0x3d/0x4c
Jan 24 09:46:51 Tower kernel: handle_mm_fault+0x113/0x1d7
Jan 24 09:46:51 Tower kernel: do_user_addr_fault+0x36a/0x514
Jan 24 09:46:51 Tower kernel: exc_page_fault+0xfc/0x11e
Jan 24 09:46:51 Tower kernel: asm_exc_page_fault+0x22/0x30
Jan 24 09:46:51 Tower kernel: RIP: 0033:0x45f173
Jan 24 09:46:51 Tower kernel: Code: 94 24 08 01 00 00 48 39 c6 0f 8e d8 0b 00 00 4c 89 9c 24 00 01 00 00 4d 89 e0 4c 8b a4 24 08 03 00 00 4c 8b 9c 24 10 03 00 00 <41> 83 7c 24 14 00 0f 84 b1 0b 00 00 4c 89 9c 24 f0 02 00 00 4c 89
Jan 24 09:46:51 Tower kernel: RSP: 002b:000000c00058d8e0 EFLAGS: 00010206
Jan 24 09:46:51 Tower kernel: RAX: 0000000000000005 RBX: 0000000000000000 RCX: 000000c000bfa4e0
Jan 24 09:46:51 Tower kernel: RDX: 0000000000c08500 RSI: 000000007fffffff RDI: 0000000000000000
Jan 24 09:46:51 Tower kernel: RBP: 000000c00058dc40 R08: 0000000000000000 R09: 000000000043ce36
Jan 24 09:46:51 Tower kernel: R10: 000000c000c08498 R11: 0000000005d7be80 R12: 00000000051fe940
Jan 24 09:46:51 Tower kernel: R13: 0000000000000000 R14: 000000c000561380 R15: 0000000000000000
Jan 24 09:46:51 Tower kernel: </TASK>
Jan 24 09:46:51 Tower kernel: Modules linked in: vhost_net tun vhost tap kvm_amd ccp kvm xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb veth xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xfs dm_crypt dm_mod dax md_mod it87 hwmon_vid efivarfs iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet 8021q garp mrp bridge stp llc bonding tls mpt3sas igb btusb btrtl btbcm raid_class gigabyte_wmi wmi_bmof mxm_wmi edac_mce_amd edac_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl btintel k10temp bluetooth nvme i2c_algo_bit i2c_piix4 apex(O) scsi_transport_sas gasket(O) i2c_core ahci nvme_core ecdh_generic ecc libahci thermal
Jan 24 09:46:51 Tower kernel: tpm_crb tpm_tis tpm_tis_core tpm wmi button unix [last unloaded: tun]
Jan 24 09:46:51 Tower kernel: CR2: 0000000000000116
Jan 24 09:46:51 Tower kernel: ---[ end trace 0000000000000000 ]---
Jan 24 09:46:51 Tower kernel: RIP: 0010:folio_try_get_rcu+0x0/0x21
Jan 24 09:46:51 Tower kernel: Code: e8 9d fd 67 00 48 8b 84 24 80 00 00 00 65 48 2b 04 25 28 00 00 00 74 05 e8 c1 35 69 00 48 81 c4 88 00 00 00 5b e9 ef 59 a6 00 <8b> 57 34 85 d2 74 10 8d 4a 01 89 d0 f0 0f b1 4f 34 74 04 89 c2 eb
Jan 24 09:46:51 Tower kernel: RSP: 0000:ffffc90001dd7cc0 EFLAGS: 00010246
Jan 24 09:46:51 Tower kernel: RAX: 00000000000000e2 RBX: 00000000000000e2 RCX: 00000000000000e2
Jan 24 09:46:51 Tower kernel: RDX: 0000000000000001 RSI: ffff88830490afe8 RDI: 00000000000000e2
Jan 24 09:46:51 Tower kernel: RBP: 0000000000000000 R08: 000000000000003c R09: ffffc90001dd7cd0
Jan 24 09:46:51 Tower kernel: R10: ffffc90001dd7cd0 R11: ffffc90001dd7d48 R12: 0000000000000000
Jan 24 09:46:51 Tower kernel: R13: ffff888186926f38 R14: 0000000000004dfe R15: ffff888186926f40
Jan 24 09:46:51 Tower kernel: FS: 000000c000570090(0000) GS:ffff88881ea80000(0000) knlGS:0000000000000000
Jan 24 09:46:51 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 24 09:46:51 Tower kernel: CR2: 0000000000000116 CR3: 00000002fb6ce000 CR4: 00000000003506e0

tower-diagnostics-20230124-1604.zip
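Because the 'shfs invoked oom-killer' messages showed up both before and during the lockup, one way to line them up against the oops is to pull both kinds of kernel message out of the remote syslog along with their timestamps. The following is a minimal Python sketch, not a tool used in this thread. It assumes the file uses the same "Mon DD HH:MM:SS host kernel: message" layout as the excerpt above. The script and log file names are hypothetical.

```python
import re
import sys
from collections import Counter
from dataclasses import dataclass

# ASSUMPTION: syslog lines look like the excerpt in this thread:
#   "Jan 24 09:46:51 Tower kernel: <message>"
# Other layouts (RFC 5424, printk "[12345.678]" prefixes) need a different pattern.
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+(?P<msg>.*)$"
)
# The kernel logs "<comm> invoked oom-killer: gfp_mask=..." when a task
# triggers the OOM killer; <comm> is the task name (e.g. "shfs").
OOM_RE = re.compile(r"^(?P<comm>\S+) invoked oom-killer:")


@dataclass
class KernelEvent:
    timestamp: str  # syslog timestamp kept as text (this format has no year)
    kind: str       # "oom" or "bug"
    detail: str     # task name for "oom", full message for "bug"


def scan_kernel_events(lines):
    """Return OOM-killer invocations and BUG headers in file order."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\r\n"))
        if not m:
            continue  # not a kernel line in the assumed format
        msg = m.group("msg")
        oom = OOM_RE.match(msg)
        if oom:
            events.append(KernelEvent(m.group("ts"), "oom", oom.group("comm")))
        elif msg.startswith("BUG:"):
            events.append(KernelEvent(m.group("ts"), "bug", msg))
    return events


def oom_counts(events):
    """Count OOM-killer invocations per task name."""
    return Counter(e.detail for e in events if e.kind == "oom")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python scan_syslog.py remote-syslog.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        found = scan_kernel_events(f)
    for e in found:
        print(f"{e.timestamp}  {e.kind}  {e.detail}")
    print(dict(oom_counts(found)))
```

Run against an exported remote syslog, this prints each OOM invocation and BUG line in the order they were logged, followed by a per-task count of OOM invocations. That makes it easier to see whether the OOM activity started before or after the oops. Timestamps are only compared as text, because this syslog format carries no year.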
JorgeB Posted January 25 (Solution)

Possibly this:
aeleos Posted January 25 (Author)

7 hours ago, JorgeB said:
Possibly this:

Interesting, you may just be right: one of my torrent containers was running libtorrent >2. I'm downgrading, and I will mark this as the solution if it resolves the issue.
aeleos Posted January 28 (Author)

On 1/25/2023 at 2:13 AM, JorgeB said:
Possibly this:

After downgrading, it's been over 3 days with no crashes, so I think this was the issue. I can't believe an auto-update could trigger a bug like this. Thanks so much for your help!