January 24, 2023

I have been battling random lockups on my usually stable server. The only recent change was adding some Docker containers with relatively high CPU and memory usage. I captured logs with a remote syslog server, because as soon as the lockup happened no more logs were written. I noticed some 'shfs invoked oom-killer: ' errors, but I was seeing them both before and during the lockup.

The logs from the time of the crash show the following, then a series of OOM errors, and eventually the full lockup around 45 minutes later. I can post the full remote syslog if someone thinks it would help, but I would need to anonymize it first.

I have run out of ideas other than replacing the RAM with 64GB instead of 32GB, or getting a new CPU/motherboard, as I can only suspect it's a hardware issue.

Jan 24 09:46:51 Tower kernel: BUG: kernel NULL pointer dereference, address: 0000000000000116
Jan 24 09:46:51 Tower kernel: #PF: supervisor read access in kernel mode
Jan 24 09:46:51 Tower kernel: #PF: error_code(0x0000) - not-present page
Jan 24 09:46:51 Tower kernel: PGD 164e69067 P4D 164e69067 PUD 59bea3067 PMD 0
Jan 24 09:46:51 Tower kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Jan 24 09:46:51 Tower kernel: CPU: 10 PID: 27565 Comm: traefik Tainted: G O 5.19.17-Unraid #2
Jan 24 09:46:51 Tower kernel: Hardware name: Gigabyte Technology Co., Ltd. X470 AORUS GAMING 5 WIFI/X470 AORUS GAMING 5 WIFI-CF, BIOS F63a 02/17/2022
Jan 24 09:46:51 Tower kernel: RIP: 0010:folio_try_get_rcu+0x0/0x21
Jan 24 09:46:51 Tower kernel: Code: e8 9d fd 67 00 48 8b 84 24 80 00 00 00 65 48 2b 04 25 28 00 00 00 74 05 e8 c1 35 69 00 48 81 c4 88 00 00 00 5b e9 ef 59 a6 00 <8b> 57 34 85 d2 74 10 8d 4a 01 89 d0 f0 0f b1 4f 34 74 04 89 c2 eb
Jan 24 09:46:51 Tower kernel: RSP: 0000:ffffc90001dd7cc0 EFLAGS: 00010246
Jan 24 09:46:51 Tower kernel: RAX: 00000000000000e2 RBX: 00000000000000e2 RCX: 00000000000000e2
Jan 24 09:46:51 Tower kernel: RDX: 0000000000000001 RSI: ffff88830490afe8 RDI: 00000000000000e2
Jan 24 09:46:51 Tower kernel: RBP: 0000000000000000 R08: 000000000000003c R09: ffffc90001dd7cd0
Jan 24 09:46:51 Tower kernel: R10: ffffc90001dd7cd0 R11: ffffc90001dd7d48 R12: 0000000000000000
Jan 24 09:46:51 Tower kernel: R13: ffff888186926f38 R14: 0000000000004dfe R15: ffff888186926f40
Jan 24 09:46:51 Tower kernel: FS: 000000c000570090(0000) GS:ffff88881ea80000(0000) knlGS:0000000000000000
Jan 24 09:46:51 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 24 09:46:51 Tower kernel: CR2: 0000000000000116 CR3: 00000002fb6ce000 CR4: 00000000003506e0
Jan 24 09:46:51 Tower kernel: Call Trace:
Jan 24 09:46:51 Tower kernel: <TASK>
Jan 24 09:46:51 Tower kernel: __filemap_get_folio+0x98/0x1ff
Jan 24 09:46:51 Tower kernel: filemap_fault+0x6e/0x524
Jan 24 09:46:51 Tower kernel: __do_fault+0x30/0x6e
Jan 24 09:46:51 Tower kernel: __handle_mm_fault+0x9a5/0xc7d
Jan 24 09:46:51 Tower kernel: ? __fget_light+0x3d/0x4c
Jan 24 09:46:51 Tower kernel: handle_mm_fault+0x113/0x1d7
Jan 24 09:46:51 Tower kernel: do_user_addr_fault+0x36a/0x514
Jan 24 09:46:51 Tower kernel: exc_page_fault+0xfc/0x11e
Jan 24 09:46:51 Tower kernel: asm_exc_page_fault+0x22/0x30
Jan 24 09:46:51 Tower kernel: RIP: 0033:0x45f173
Jan 24 09:46:51 Tower kernel: Code: 94 24 08 01 00 00 48 39 c6 0f 8e d8 0b 00 00 4c 89 9c 24 00 01 00 00 4d 89 e0 4c 8b a4 24 08 03 00 00 4c 8b 9c 24 10 03 00 00 <41> 83 7c 24 14 00 0f 84 b1 0b 00 00 4c 89 9c 24 f0 02 00 00 4c 89
Jan 24 09:46:51 Tower kernel: RSP: 002b:000000c00058d8e0 EFLAGS: 00010206
Jan 24 09:46:51 Tower kernel: RAX: 0000000000000005 RBX: 0000000000000000 RCX: 000000c000bfa4e0
Jan 24 09:46:51 Tower kernel: RDX: 0000000000c08500 RSI: 000000007fffffff RDI: 0000000000000000
Jan 24 09:46:51 Tower kernel: RBP: 000000c00058dc40 R08: 0000000000000000 R09: 000000000043ce36
Jan 24 09:46:51 Tower kernel: R10: 000000c000c08498 R11: 0000000005d7be80 R12: 00000000051fe940
Jan 24 09:46:51 Tower kernel: R13: 0000000000000000 R14: 000000c000561380 R15: 0000000000000000
Jan 24 09:46:51 Tower kernel: </TASK>
Jan 24 09:46:51 Tower kernel: Modules linked in: vhost_net tun vhost tap kvm_amd ccp kvm xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb veth xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xfs dm_crypt dm_mod dax md_mod it87 hwmon_vid efivarfs iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet 8021q garp mrp bridge stp llc bonding tls mpt3sas igb btusb btrtl btbcm raid_class gigabyte_wmi wmi_bmof mxm_wmi edac_mce_amd edac_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl btintel k10temp bluetooth nvme i2c_algo_bit i2c_piix4 apex(O) scsi_transport_sas gasket(O) i2c_core ahci nvme_core ecdh_generic ecc libahci thermal
Jan 24 09:46:51 Tower kernel: tpm_crb tpm_tis tpm_tis_core tpm wmi button unix [last unloaded: tun]
Jan 24 09:46:51 Tower kernel: CR2: 0000000000000116
Jan 24 09:46:51 Tower kernel: ---[ end trace 0000000000000000 ]---
Jan 24 09:46:51 Tower kernel: RIP: 0010:folio_try_get_rcu+0x0/0x21
Jan 24 09:46:51 Tower kernel: Code: e8 9d fd 67 00 48 8b 84 24 80 00 00 00 65 48 2b 04 25 28 00 00 00 74 05 e8 c1 35 69 00 48 81 c4 88 00 00 00 5b e9 ef 59 a6 00 <8b> 57 34 85 d2 74 10 8d 4a 01 89 d0 f0 0f b1 4f 34 74 04 89 c2 eb
Jan 24 09:46:51 Tower kernel: RSP: 0000:ffffc90001dd7cc0 EFLAGS: 00010246
Jan 24 09:46:51 Tower kernel: RAX: 00000000000000e2 RBX: 00000000000000e2 RCX: 00000000000000e2
Jan 24 09:46:51 Tower kernel: RDX: 0000000000000001 RSI: ffff88830490afe8 RDI: 00000000000000e2
Jan 24 09:46:51 Tower kernel: RBP: 0000000000000000 R08: 000000000000003c R09: ffffc90001dd7cd0
Jan 24 09:46:51 Tower kernel: R10: ffffc90001dd7cd0 R11: ffffc90001dd7d48 R12: 0000000000000000
Jan 24 09:46:51 Tower kernel: R13: ffff888186926f38 R14: 0000000000004dfe R15: ffff888186926f40
Jan 24 09:46:51 Tower kernel: FS: 000000c000570090(0000) GS:ffff88881ea80000(0000) knlGS:0000000000000000
Jan 24 09:46:51 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 24 09:46:51 Tower kernel: CR2: 0000000000000116 CR3: 00000002fb6ce000 CR4: 00000000003506e0

tower-diagnostics-20230124-1604.zip
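For anyone triaging a similar remote syslog capture, here is a minimal sketch of how the interesting events could be pulled out of a large log file. It only assumes the classic "Mon DD HH:MM:SS host source: message" prefix seen in the lines above. The function names (`scan`, `summarize`) and the category labels are made up for illustration, and the patterns are simply the message fragments quoted in this post, not an exhaustive list of kernel error strings.

```python
import re
from collections import Counter

# Message fragments quoted in this thread. The dictionary keys are
# hypothetical labels chosen for this sketch, not kernel terminology.
PATTERNS = {
    "oom_killer": re.compile(r"invoked oom-killer"),
    "null_deref": re.compile(r"BUG: kernel NULL pointer dereference"),
    "oops": re.compile(r"Oops: [0-9a-f]{4} \[#\d+\]"),
}

# Classic syslog prefix, e.g. "Jan 24 09:46:51 Tower kernel: <message>".
PREFIX = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (\S+) ([^:]+): (.*)$")


def scan(lines):
    """Return (timestamp, source, label) for each line matching a pattern."""
    events = []
    for line in lines:
        match = PREFIX.match(line.rstrip("\n"))
        if not match:
            continue  # skip lines that do not look like syslog entries
        timestamp, _host, source, message = match.groups()
        for label, pattern in PATTERNS.items():
            if pattern.search(message):
                events.append((timestamp, source, label))
    return events


def summarize(events):
    """Count how often each label occurred."""
    return Counter(label for _timestamp, _source, label in events)
```

To use it, open the captured syslog file and pass the file object to `scan`, then print the returned events in order. Seeing whether the oom-killer entries cluster before the first oops, or are spread evenly through the log, helps judge whether memory pressure is related to the crash at all.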
January 25, 2023 (Author)

7 hours ago, JorgeB said: Possibly this:

Interesting, you may just be right: one of my torrent containers was running libtorrent >2. I'm downgrading, and I will mark this as the solution if it resolves the issue.
January 28, 2023 (Author)

On 1/25/2023 at 2:13 AM, JorgeB said: Possibly this:

After downgrading, it's been over 3 days with no crashes, so I think this was the issue. I can't believe an auto-update could trigger a bug like this. Thanks so much for your help!