After upgrading from 6.12.6 to 6.12.8, I started getting segfaults and call traces in various processes. The WebUI would hang after a while, and eventually the entire system would become unresponsive (ssh, kvm, etc). Safe mode without plugins was similarly affected.
Unfortunately, I was not able to capture the syslog for every crash. I did not want to enable writing the syslog to flash, because I had to force a shutdown of the server each time and did not want to deal with a corrupted flash drive on top of everything else. These are some of the occurrences I managed to capture:
Mar 3 01:00:01 Tower root: mover: started
Mar 3 01:00:02 Tower root: mover: finished
Mar 3 01:04:31 Tower kernel: traps: python3[12337] general protection fault ip:14b7635f7b2c sp:14b70994f730 error:0 in libpython3.11.so.1.0[14b763548000+1d3000]
Mar 3 01:12:39 Tower kernel: python3[21664]: segfault at 590c5402 ip 000014a8678c8585 sp 00007ffd90b8faf0 error 6 in libpython3.9.so.1.0[14a86772a000+201000] likely on CPU 12 (core 24, socket 0)
Mar 3 01:12:39 Tower kernel: Code: 24 10 4c 8b 44 24 08 44 89 ea 48 8b 0c 24 48 8d 35 95 4b 0b 00 e8 5b de ff ff c7 85 30 03 00 00 00 00 00 00 e9 2c ff ff ff 8b <87> a8 02 00 00 39 87 ac 02 00 00 7f 10 8b 87 90 02 00 00 39 87 94
The faults were not isolated to python3 (which is not part of stock Unraid); smartctl, php-fpm and other processes crashed as well.
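For anyone comparing their own logs, here is a minimal sketch that tallies userspace faults per process from syslog lines. It assumes the two message formats shown in the excerpt above ("traps: NAME[PID] general protection fault ..." and "NAME[PID]: segfault at ..."); the function name `tally_faults` and the regular expression are my own and not part of any Unraid tool, and other kernel versions may format these messages differently.

```python
import re
from collections import Counter

# Assumed formats, taken from the log excerpt in this post:
#   kernel: traps: NAME[PID] general protection fault ... in OBJECT[base+size]
#   kernel: NAME[PID]: segfault at ADDR ... in OBJECT[base+size]
FAULT_RE = re.compile(
    r"kernel: (?:traps: )?(?P<proc>[\w.+-]+)\[(?P<pid>\d+)\]:? "
    r"(?P<kind>general protection fault|segfault)"
    r".*? in (?P<obj>[^\[\s]+)\["
)

def tally_faults(lines):
    """Count userspace faults keyed by (process, fault kind, faulting object)."""
    counts = Counter()
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            counts[(m["proc"], m["kind"], m["obj"])] += 1
    return counts

# Two lines copied from the excerpt above, used as sample input.
sample = [
    "Mar 3 01:04:31 Tower kernel: traps: python3[12337] general protection fault "
    "ip:14b7635f7b2c sp:14b70994f730 error:0 in libpython3.11.so.1.0[14b763548000+1d3000]",
    "Mar 3 01:12:39 Tower kernel: python3[21664]: segfault at 590c5402 ip 000014a8678c8585 "
    "sp 00007ffd90b8faf0 error 6 in libpython3.9.so.1.0[14a86772a000+201000] "
    "likely on CPU 12 (core 24, socket 0)",
]

for key, n in tally_faults(sample).items():
    print(n, *key)
```

Faults spread across unrelated binaries and libraries, as seen here, point away from any single application and toward something lower in the stack.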
Example of call traces:
Mar 3 04:00:01 Tower root: mover: finished
Mar 3 04:32:44 Tower kernel: BUG: kernel NULL pointer dereference, address: 0000000000000038
Mar 3 04:32:44 Tower kernel: #PF: supervisor read access in kernel mode
Mar 3 04:32:44 Tower kernel: #PF: error_code(0x0000) - not-present page
Mar 3 04:32:44 Tower kernel: PGD 52c52a067 P4D 52c52a067 PUD 4a43f3067 PMD 0
Mar 3 04:32:44 Tower kernel: Oops: 0000 [#2] PREEMPT SMP NOPTI
Mar 3 04:32:44 Tower kernel: CPU: 13 PID: 8018 Comm: smartctl_type Tainted: P D O 6.1.74-Unraid #1
Mar 3 04:32:44 Tower kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 3302 02/21/2024
Mar 3 04:32:44 Tower kernel: RIP: 0010:memcg_slab_free_hook+0x28/0xcf
Mar 3 04:32:44 Tower kernel: Code: cc cc 41 57 41 56 49 89 d6 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 10 89 4c 24 0c e8 5a e1 ff ff 84 c0 0f 84 94 00 00 00 <4c> 8b 65 38 49 83 fc 03 0f 86 86 00 00 00 49 83 e4 fc 45 31 ed 41
Mar 3 04:32:44 Tower kernel: RSP: 0018:ffffc90030997ca0 EFLAGS: 00010202
Mar 3 04:32:44 Tower kernel: RAX: 0000000000000001 RBX: ffff888100045a00 RCX: 0000000000000001
Mar 3 04:32:44 Tower kernel: RDX: ffffc90030997cf0 RSI: 0000000000000000 RDI: ffff888100045a00
Mar 3 04:32:44 Tower kernel: RBP: 0000000000000000 R08: ffff8889a6b4d300 R09: ffffffff8184e49c
Mar 3 04:32:44 Tower kernel: R10: ffff8889a6b4d300 R11: ffff888aa3934100 R12: 0000000000000000
Mar 3 04:32:44 Tower kernel: R13: ffff8889a6b4d500 R14: ffffc90030997cf0 R15: 0000000000000071
Mar 3 04:32:44 Tower kernel: FS: 0000000000000000(0000) GS:ffff889fffb40000(0000) knlGS:0000000000000000
Mar 3 04:32:44 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 3 04:32:44 Tower kernel: CR2: 0000000000000038 CR3: 0000000648b1e000 CR4: 0000000000750ee0
Mar 3 04:32:44 Tower kernel: PKRU: 55555554
Mar 3 04:32:44 Tower kernel: Call Trace:
Mar 3 04:32:44 Tower kernel: <TASK>
Mar 3 04:32:44 Tower kernel: ? __die_body+0x1a/0x5c
Mar 3 04:32:44 Tower kernel: ? page_fault_oops+0x329/0x376
Mar 3 04:32:44 Tower kernel: ? do_user_addr_fault+0x12e/0x48d
Mar 3 04:32:44 Tower kernel: ? exc_page_fault+0xfb/0x11d
Mar 3 04:32:44 Tower kernel: ? asm_exc_page_fault+0x22/0x30
Mar 3 04:32:44 Tower kernel: ? mas_destroy+0xa8/0xbb
Mar 3 04:32:44 Tower kernel: ? memcg_slab_free_hook+0x28/0xcf
Mar 3 04:32:44 Tower kernel: kmem_cache_free+0xb7/0x154
Mar 3 04:32:44 Tower kernel: ? mas_destroy+0xa8/0xbb
Mar 3 04:32:44 Tower kernel: mas_destroy+0xa8/0xbb
Mar 3 04:32:44 Tower kernel: mmap_region+0x457/0x61e
Mar 3 04:32:44 Tower kernel: ? preempt_latency_start+0x1e/0x46
Mar 3 04:32:44 Tower kernel: do_mmap+0x3bc/0x428
Mar 3 04:32:44 Tower kernel: vm_mmap_pgoff+0xbb/0x112
Mar 3 04:32:44 Tower kernel: ksys_mmap_pgoff+0x138/0x166
Mar 3 04:32:44 Tower kernel: do_syscall_64+0x68/0x81
Mar 3 04:32:44 Tower kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
Mar 3 04:32:44 Tower kernel: RIP: 0033:0x15215c49fe33
Mar 3 04:32:44 Tower kernel: Code: 1f 84 00 00 00 00 00 4c 89 23 31 c0 48 c7 43 08 00 04 00 00 eb e2 90 41 89 ca 41 f7 c1 ff 0f 00 00 75 14 b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 1d c3 0f 1f 40 00 c7 05 36 34 01 00 16 00 00
Mar 3 04:32:44 Tower kernel: RSP: 002b:00007fff326694c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
Mar 3 04:32:44 Tower kernel: RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 000015215c49fe33
Mar 3 04:32:44 Tower kernel: RDX: 0000000000000001 RSI: 0000000000162000 RDI: 0000152158bef000
Mar 3 04:32:44 Tower kernel: RBP: 00007fff32669860 R08: 0000000000000004 R09: 000000000004f000
Mar 3 04:32:44 Tower kernel: R10: 0000000000000812 R11: 0000000000000246 R12: 00007fff32669540
Mar 3 04:32:44 Tower kernel: R13: 0000152158d78690 R14: 00007fff32669900 R15: 0000152158ba0000
Mar 3 04:32:44 Tower kernel: </TASK>
Mar 3 04:32:44 Tower kernel: Modules linked in: veth xt_nat xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge nvidia_uvm(PO) xfs dm_crypt dm_mod nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag ipmi_devintf nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap 8021q garp mrp stp llc igc nvidia_drm(PO) nvidia_modeset(PO) intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm nvidia(PO) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel ast drm_vram_helper i2c_algo_bit
Mar 3 04:32:44 Tower kernel: drm_ttm_helper crypto_simd ttm cryptd drm_kms_helper mei_hdcp mei_pxp i2c_i801 rapl intel_cstate drm ipmi_ssif mpt3sas agpgart mei_me ahci cdc_ether wmi_bmof i2c_smbus nvme tpm_crb syscopyarea input_leds raid_class sr_mod usbnet sysfillrect intel_uncore sysimgblt i2c_core mei joydev led_class nvme_core cdrom libahci mii scsi_transport_sas vmd acpi_ipmi fb_sys_fops thermal fan video tpm_tis tpm_tis_core ipmi_si wmi backlight tpm intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igc]
Mar 3 04:32:44 Tower kernel: CR2: 0000000000000038
Mar 3 04:32:44 Tower kernel: ---[ end trace 0000000000000000 ]---
Mar 3 04:32:44 Tower kernel: RIP: 0010:do_dentry_open+0x206/0x304
Mar 3 04:32:44 Tower kernel: Code: 43 44 a8 04 74 11 48 8b 53 28 48 83 7a 08 00 75 06 83 e0 fb 89 43 44 48 8b 8b d0 00 00 00 48 8b 81 90 00 00 00 48 85 c0 74 0e <48> 83 78 58 00 74 07 81 4b 44 00 00 40 00 8b 53 40 89 d0 25 3f fc
Mar 3 04:32:44 Tower kernel: RSP: 0018:ffffc90034467cd8 EFLAGS: 00010282
Mar 3 04:32:44 Tower kernel: RAX: c350ffff8881e30d RBX: ffff8887885be300 RCX: ffff8881e30dc2c2
Mar 3 04:32:44 Tower kernel: RDX: ffffffffa485a140 RSI: 0000000000000000 RDI: 00000000ffffffff
Mar 3 04:32:44 Tower kernel: RBP: 0000000000000000 R08: ffffffffa4820098 R09: ffffffffa482090f
Mar 3 04:32:44 Tower kernel: R10: 0000000000000000 R11: ffff88814f199268 R12: ffff8881e30dc138
Mar 3 04:32:44 Tower kernel: R13: ffff8887885be310 R14: ffffffffa4824635 R15: 0000000000000000
Mar 3 04:32:44 Tower kernel: FS: 0000000000000000(0000) GS:ffff889fffb40000(0000) knlGS:0000000000000000
Mar 3 04:32:44 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 3 04:32:44 Tower kernel: CR2: 0000000000000038 CR3: 0000000648b1e000 CR4: 0000000000750ee0
Mar 3 04:32:44 Tower kernel: PKRU: 55555554
Mar 3 04:32:44 Tower kernel: note: smartctl_type[8018] exited with irqs disabled
Mar 3 05:03:30 Tower kernel: traps: cache_dirs[22836] general protection fault ip:4e932f sp:7ffe6185a8a0 error:0 in bash[426000+c5000]
And another one:
Mar 3 02:00:01 Tower root: mover: finished
Mar 3 02:13:49 Tower kernel: general protection fault, probably for non-canonical address 0xc350ffff8881e365: 0000 [#1] PREEMPT SMP NOPTI
Mar 3 02:13:49 Tower kernel: CPU: 12 PID: 32677 Comm: find Tainted: P O 6.1.74-Unraid #1
Mar 3 02:13:49 Tower kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 3302 02/21/2024
Mar 3 02:13:49 Tower kernel: RIP: 0010:do_dentry_open+0x206/0x304
Mar 3 02:13:49 Tower kernel: Code: 43 44 a8 04 74 11 48 8b 53 28 48 83 7a 08 00 75 06 83 e0 fb 89 43 44 48 8b 8b d0 00 00 00 48 8b 81 90 00 00 00 48 85 c0 74 0e <48> 83 78 58 00 74 07 81 4b 44 00 00 40 00 8b 53 40 89 d0 25 3f fc
Mar 3 02:13:49 Tower kernel: RSP: 0018:ffffc90034467cd8 EFLAGS: 00010282
Mar 3 02:13:49 Tower kernel: RAX: c350ffff8881e30d RBX: ffff8887885be300 RCX: ffff8881e30dc2c2
Mar 3 02:13:49 Tower kernel: RDX: ffffffffa485a140 RSI: 0000000000000000 RDI: 00000000ffffffff
Mar 3 02:13:49 Tower kernel: RBP: 0000000000000000 R08: ffffffffa4820098 R09: ffffffffa482090f
Mar 3 02:13:49 Tower kernel: R10: 0000000000000000 R11: ffff88814f199268 R12: ffff8881e30dc138
Mar 3 02:13:49 Tower kernel: R13: ffff8887885be310 R14: ffffffffa4824635 R15: 0000000000000000
Mar 3 02:13:49 Tower kernel: FS: 000014c2faaeb740(0000) GS:ffff889fffb00000(0000) knlGS:0000000000000000
Mar 3 02:13:49 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 3 02:13:49 Tower kernel: CR2: 0000000000475048 CR3: 0000000455dbe000 CR4: 0000000000750ee0
Mar 3 02:13:49 Tower kernel: PKRU: 55555554
Mar 3 02:13:49 Tower kernel: Call Trace:
Mar 3 02:13:49 Tower kernel: <TASK>
Mar 3 02:13:49 Tower kernel: ? __die_body+0x1a/0x5c
Mar 3 02:13:49 Tower kernel: ? die_addr+0x38/0x51
Mar 3 02:13:49 Tower kernel: ? exc_general_protection+0x30f/0x345
Mar 3 02:13:49 Tower kernel: ? asm_exc_general_protection+0x22/0x30
Mar 3 02:13:49 Tower kernel: ? xfs_dir_fsync+0x61/0x61 [xfs]
Mar 3 02:13:49 Tower kernel: ? xfs_buf_readahead_map+0x5/0x50 [xfs]
Mar 3 02:13:49 Tower kernel: ? xfs_buf_get_map+0x66c/0x804 [xfs]
Mar 3 02:13:49 Tower kernel: ? do_dentry_open+0x206/0x304
Mar 3 02:13:49 Tower kernel: ? do_dentry_open+0x192/0x304
Mar 3 02:13:49 Tower kernel: path_openat+0x8f4/0xa4d
Mar 3 02:13:49 Tower kernel: do_filp_open+0x55/0xb8
Mar 3 02:13:49 Tower kernel: ? getname_flags+0x29/0x152
Mar 3 02:13:49 Tower kernel: ? kmem_cache_alloc+0x122/0x14d
Mar 3 02:13:49 Tower kernel: ? _raw_spin_unlock+0x14/0x29
Mar 3 02:13:49 Tower kernel: do_sys_openat2+0x6c/0xd9
Mar 3 02:13:49 Tower kernel: do_sys_open+0x3a/0x5a
Mar 3 02:13:49 Tower kernel: do_syscall_64+0x68/0x81
Mar 3 02:13:49 Tower kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
Mar 3 02:13:49 Tower kernel: RIP: 0033:0x14c2fabf19ef
Mar 3 02:13:49 Tower kernel: Code: 89 4c 24 58 f6 c2 40 75 32 89 d0 45 31 d2 25 00 00 41 00 3d 00 00 41 00 74 21 80 3d f2 cb 0e 00 00 74 45 b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 85 00 00 00 48 83 c4 78 c3 48 8d 84 24 80
Mar 3 02:13:49 Tower kernel: RSP: 002b:00007fff682c9fc0 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
Mar 3 02:13:49 Tower kernel: RAX: ffffffffffffffda RBX: 00007fff682ca13c RCX: 000014c2fabf19ef
Mar 3 02:13:49 Tower kernel: RDX: 00000000000b0900 RSI: 000000000045d2d0 RDI: 000000000000000e
Mar 3 02:13:49 Tower kernel: RBP: 000000000045d1d0 R08: 0000000000000073 R09: 0000000000000000
Mar 3 02:13:49 Tower kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
Mar 3 02:13:49 Tower kernel: R13: 0000000000000000 R14: 0000000000444c90 R15: 0000000000000004
Mar 3 02:13:49 Tower kernel: </TASK>
Mar 3 02:13:49 Tower kernel: Modules linked in: veth xt_nat xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge nvidia_uvm(PO) xfs dm_crypt dm_mod nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag ipmi_devintf nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap 8021q garp mrp stp llc igc nvidia_drm(PO) nvidia_modeset(PO) intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm nvidia(PO) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel ast drm_vram_helper i2c_algo_bit
Mar 3 02:13:49 Tower kernel: drm_ttm_helper crypto_simd ttm cryptd drm_kms_helper mei_hdcp mei_pxp i2c_i801 rapl intel_cstate drm ipmi_ssif mpt3sas agpgart mei_me ahci cdc_ether wmi_bmof i2c_smbus nvme tpm_crb syscopyarea input_leds raid_class sr_mod usbnet sysfillrect intel_uncore sysimgblt i2c_core mei joydev led_class nvme_core cdrom libahci mii scsi_transport_sas vmd acpi_ipmi fb_sys_fops thermal fan video tpm_tis tpm_tis_core ipmi_si wmi backlight tpm intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igc]
Mar 3 02:13:49 Tower kernel: ---[ end trace 0000000000000000 ]---
Mar 3 02:13:49 Tower kernel: RIP: 0010:do_dentry_open+0x206/0x304
Mar 3 02:13:49 Tower kernel: Code: 43 44 a8 04 74 11 48 8b 53 28 48 83 7a 08 00 75 06 83 e0 fb 89 43 44 48 8b 8b d0 00 00 00 48 8b 81 90 00 00 00 48 85 c0 74 0e <48> 83 78 58 00 74 07 81 4b 44 00 00 40 00 8b 53 40 89 d0 25 3f fc
Mar 3 02:13:49 Tower kernel: RSP: 0018:ffffc90034467cd8 EFLAGS: 00010282
Mar 3 02:13:49 Tower kernel: RAX: c350ffff8881e30d RBX: ffff8887885be300 RCX: ffff8881e30dc2c2
Mar 3 02:13:49 Tower kernel: RDX: ffffffffa485a140 RSI: 0000000000000000 RDI: 00000000ffffffff
Mar 3 02:13:49 Tower kernel: RBP: 0000000000000000 R08: ffffffffa4820098 R09: ffffffffa482090f
Mar 3 02:13:49 Tower kernel: R10: 0000000000000000 R11: ffff88814f199268 R12: ffff8881e30dc138
Mar 3 02:13:49 Tower kernel: R13: ffff8887885be310 R14: ffffffffa4824635 R15: 0000000000000000
Mar 3 02:13:49 Tower kernel: FS: 000014c2faaeb740(0000) GS:ffff889fffb00000(0000) knlGS:0000000000000000
Mar 3 02:13:49 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 3 02:13:49 Tower kernel: CR2: 0000000000475048 CR3: 0000000455dbe000 CR4: 0000000000750ee0
Mar 3 02:13:49 Tower kernel: PKRU: 5555555
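To make traces like the ones above easier to compare across reports, here is a rough sketch that pulls the crashing task and the first kernel-side RIP symbol out of each trace. It assumes the "CPU: N PID: N Comm: NAME" and "RIP: 0010:symbol+0xOFF/0xLEN" line formats seen in this post (0010 being the kernel code segment selector on x86-64); `summarize_oops` is a name I made up for illustration, and the regular expressions are not exhaustive.

```python
import re

# Assumed header formats, based only on the traces pasted in this post.
COMM_RE = re.compile(r"CPU: \d+ PID: (?P<pid>\d+) Comm: (?P<comm>\S+)")
RIP_RE = re.compile(r"RIP: 0010:(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")

def summarize_oops(text):
    """Return (comm, pid, kernel RIP symbol) for each 'CPU: ... Comm:' header.

    Works on one-entry-per-line logs as well as logs flattened into a
    single line, because it only looks for the two patterns in order.
    """
    results = []
    for m in COMM_RE.finditer(text):
        # First kernel-mode RIP line that follows this header, if any.
        rip = RIP_RE.search(text, m.end())
        results.append((m["comm"], int(m["pid"]), rip["sym"] if rip else None))
    return results

# Header lines copied from the two traces in this post.
sample = (
    "Mar 3 02:13:49 Tower kernel: CPU: 12 PID: 32677 Comm: find Tainted: P O 6.1.74-Unraid #1\n"
    "Mar 3 02:13:49 Tower kernel: RIP: 0010:do_dentry_open+0x206/0x304\n"
    "Mar 3 04:32:44 Tower kernel: CPU: 13 PID: 8018 Comm: smartctl_type Tainted: P D O 6.1.74-Unraid #1\n"
    "Mar 3 04:32:44 Tower kernel: RIP: 0010:memcg_slab_free_hook+0x28/0xcf\n"
)

for comm, pid, sym in summarize_oops(sample):
    print(f"{comm}[{pid}] faulted in {sym}")
```

In the two traces here, different tasks fault in unrelated kernel functions (file open versus slab free), which is why the crashes look like a general kernel or memory problem and not a bug in one driver.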
Rolling back to 6.12.6 resolved the issue: there are no call traces or segfaults there, with the rest of the system in exactly the same state, including plugins, configuration, Docker, etc.
System information:
Model: Custom
M/B: ASUSTeK COMPUTER INC. Pro WS W680-ACE IPMI Version Rev 1.xx
BIOS: American Megatrends Inc. Version 3302 Dated 02/21/2024
CPU: 13th Gen Intel® Core™ i9-13900K @ 5445 MHz
HVM: Enabled
IOMMU: Enabled
Cache: L1 Cache: 384 KiB, L1 Cache: 256 KiB, L2 Cache: 16 MiB, L3 Cache: 36 MiB, L1 Cache: 512 KiB, L1 Cache: 1 MiB, L2 Cache: 16 MiB, L3 Cache: 36 MiB
Memory: 96 GiB DDR5 Single-bit ECC (max. installable capacity 256 GiB)
Network: eth0: 1000 Mbps, full duplex, mtu 1500
Kernel: Linux 6.1.64-Unraid x86_64
OpenSSL: 1.1.1v
It appears that this is a global issue affecting multiple users:
As such, this is likely an issue in the kernel shipped with 6.12.8, and it requires urgent attention.