user2579 Posted January 16 (edited)

I have been battling progressively worse system issues for some time now. Originally, it looked like my cache pool (raidz) was locking up, but the issues continued to get worse. I was getting errors that looked like:

```
Jan 15 01:08:48 NAS846 kernel: general protection fault, maybe for address 0x80000000: 0000 [#1] PREEMPT SMP NOPTI
Jan 15 01:08:48 NAS846 kernel: CPU: 8 PID: 7541 Comm: zfs Tainted: P O 6.1.64-Unraid #1
Jan 15 01:08:48 NAS846 kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 2703 08/11/2023
Jan 15 01:08:48 NAS846 kernel: RIP: 0010:migrate_disable+0x71/0x76
Jan 15 01:08:48 NAS846 kernel: Code: 83 50 0c 00 00 66 c7 85 08 03 00 00 01 00 bf 01 00 00 00 e8 75 f9 ff ff 65 8b 05 17 68 f8 7e 85 c0 75 05 0f 1f 44 00 00 5b 5d <c3> cc cc cc cc 0f 1f 44 00 00 65 8b 05 fb 67 f8 7e ff c8 8b 17 74
Jan 15 01:08:48 NAS846 kernel: RSP: 0018:ffffc90053f5f928 EFLAGS: 00010286
```

Here `Comm: zfs tainted` is sometimes dockerd tainted, lohs tainted, etc., so there is no smoking gun pointing at any one piece of software. Troubleshooting threads on similar topics tend to point to memory, so when the system could no longer get through a `Start Array` without errors popping up as it mounted the cache pool, I went down the hardware investigation route. Memtest passed 3 runs, so I physically inspected the sticks and the slots and found nothing. Next was the elimination method: after I pulled the first stick, I was at least able to come up in Safe Mode and bring all my containers up to do some load tests. Everything seemed fine there, so I moved on to a normal boot for a load soak. Overnight I saw one segfault come up without a crash, and then this morning the whole server crashed, but luckily I had a syslog up.
First segfault, which didn't cause any issues:

```
Jan 16 02:11:16 NAS846 kernel: PMS LoudnessCmd[11497]: segfault at 0 ip 000014fdfdd25080 sp 000014fdfa8920c8 error 4 in libswresample.so.4[14fdfdd1d000+18000] likely on CPU 6 (core 12, socket 0)
Jan 16 02:11:16 NAS846 kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 02 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
Jan 16 02:11:17 NAS846 kernel: PMS LoudnessCmd[11524]: segfault at 0 ip 0000148ac081a080 sp 0000148abd1f30c8 error 4 in libswresample.so.4[148ac0812000+18000] likely on CPU 2 (core 4, socket 0)
Jan 16 02:11:17 NAS846 kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 02 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
Jan 16 02:11:19 NAS846 kernel: PMS LoudnessCmd[12223]: segfault at 0 ip 000015071f607fc3 sp 000015071c0d00c8 error 4 in libswresample.so.4[15071f606000+18000] likely on CPU 2 (core 4, socket 0)
Jan 16 02:11:19 NAS846 kernel: Code: 0f 00 00 00 0f 85 73 ff ff ff 48 f7 c6 0f 00 00 00 0f 85 66 ff ff ff 48 8d 34 56 48 8d 3c 97 48 f7 da 66 0f 6f 2d 7d 64 ff ff <66> 0f 6f 04 56 66 0f 6f 4c 56 10 66 0f ef d2 66 0f ef db 66 0f 61
```

Last segfault, leading to the crash. I am also seeing 2 `tainted` errors: the first happens an hour earlier without issue, and the second seems chained to the segfault that brings down the system:

```
Jan 16 06:46:56 NAS846 kernel: traps: node[12120] trap int3 ip:1e75f12 sp:14b4ce428400 error:0 in node[400000+4d69000]
Jan 16 07:00:01 NAS846 kernel: mdcmd (57): set md_write_method 1
Jan 16 07:00:01 NAS846 kernel:
Jan 16 07:00:01 NAS846 root: Log Level: 1
Jan 16 07:00:01 NAS846 root: mover: started
Jan 16 07:00:01 NAS846 root: mover: finished
Jan 16 07:03:48 NAS846 ntpd[2526]: no peer for too long, server running free now
Jan 16 07:17:12 NAS846 kernel: traps:
node[31901] trap int3 ip:1e75f12 sp:1538fa647400 error:0 in node[400000+4d69000] Jan 16 07:30:22 NAS846 kernel: mdcmd (58): set md_write_method 1 Jan 16 07:30:22 NAS846 kernel: Jan 16 08:00:01 NAS846 kernel: mdcmd (59): set md_write_method 1 Jan 16 08:00:01 NAS846 kernel: Jan 16 08:00:01 NAS846 root: Log Level: 1 Jan 16 08:00:01 NAS846 root: mover: started Jan 16 08:00:01 NAS846 root: mover: finished Jan 16 08:27:01 NAS846 kernel: mdcmd (60): set md_write_method 1 Jan 16 08:27:01 NAS846 kernel: Jan 16 08:46:06 NAS846 kernel: BUG: unable to handle page fault for address: ffff886bd16feb80 Jan 16 08:46:06 NAS846 kernel: #PF: supervisor write access in kernel mode Jan 16 08:46:06 NAS846 kernel: #PF: error_code(0x0002) - not-present page Jan 16 08:46:06 NAS846 kernel: PGD 0 P4D 0 Jan 16 08:46:06 NAS846 kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI Jan 16 08:46:06 NAS846 kernel: CPU: 8 PID: 31241 Comm: Whisparr Tainted: P O 6.1.64-Unraid #1 Jan 16 08:46:06 NAS846 kernel: Hardware name: ASUSTeK COMPUTER INC. 
System Product Name/Pro WS W680-ACE IPMI, BIOS 3101 12/08/2023 Jan 16 08:46:06 NAS846 kernel: RIP: 0010:zil_itx_assign+0x295/0x312 [zfs] Jan 16 08:46:06 NAS846 kernel: Code: 00 00 48 89 de e8 cb 6b f3 ff 48 8b 83 08 04 00 00 48 39 e8 48 0f 42 c5 48 89 83 08 04 00 00 4c 8b 64 24 08 31 c0 49 c1 e4 06 <4a> 89 84 23 80 01 00 00 4a 8d 9c 33 78 01 00 00 48 89 df e8 3f 3f Jan 16 08:46:06 NAS846 kernel: RSP: 0018:ffffc90063177b40 EFLAGS: 00010287 Jan 16 08:46:06 NAS846 kernel: RAX: 0000000000000000 RBX: ffff88821201a800 RCX: 0000000000000040 Jan 16 08:46:06 NAS846 kernel: RDX: 0000000000000001 RSI: ffff88821201abe0 RDI: 00000000ffffffff Jan 16 08:46:06 NAS846 kernel: RBP: ffff88841fb763c0 R08: 0000000000000000 R09: 00000000ffffffff Jan 16 08:46:06 NAS846 kernel: R10: 0000000000000000 R11: ffff8884955e7c4e R12: ffffffe9bf6e4200 Jan 16 08:46:06 NAS846 kernel: R13: ffff8885ee2ac1e0 R14: 0000000000000000 R15: ffff8882249c0000 Jan 16 08:46:06 NAS846 kernel: FS: 0000154fcf660b30(0000) GS:ffff88981f400000(0000) knlGS:0000000000000000 Jan 16 08:46:06 NAS846 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jan 16 08:46:06 NAS846 kernel: CR2: ffff886bd16feb80 CR3: 00000004ad894000 CR4: 0000000000750ee0 Jan 16 08:46:06 NAS846 kernel: PKRU: 55555554 Jan 16 08:46:06 NAS846 kernel: Call Trace: Jan 16 08:46:06 NAS846 kernel: <TASK> Jan 16 08:46:06 NAS846 kernel: ? __die_body+0x1a/0x5c Jan 16 08:46:06 NAS846 kernel: ? page_fault_oops+0x329/0x376 Jan 16 08:46:06 NAS846 kernel: ? fixup_exception+0x22/0x24b Jan 16 08:46:06 NAS846 kernel: ? exc_page_fault+0xf4/0x11d Jan 16 08:46:06 NAS846 kernel: ? asm_exc_page_fault+0x22/0x30 Jan 16 08:46:06 NAS846 kernel: ? zil_itx_assign+0x295/0x312 [zfs] Jan 16 08:46:06 NAS846 kernel: ? zil_itx_assign+0x9c/0x312 [zfs] Jan 16 08:46:06 NAS846 kernel: ? zfs_log_write+0x352/0x3ab [zfs] Jan 16 08:46:06 NAS846 kernel: ? zfs_write+0x8d0/0xa29 [zfs] Jan 16 08:46:06 NAS846 kernel: ? zpl_iter_write+0xcf/0x122 [zfs] Jan 16 08:46:06 NAS846 kernel: ? 
vfs_write+0x10c/0x1b9 Jan 16 08:46:06 NAS846 kernel: ? ksys_pwrite64+0x64/0x84 Jan 16 08:46:06 NAS846 kernel: ? do_syscall_64+0x68/0x81 Jan 16 08:46:06 NAS846 kernel: ? entry_SYSCALL_64_after_hwframe+0x64/0xce Jan 16 08:46:06 NAS846 kernel: </TASK> Jan 16 08:46:06 NAS846 kernel: Modules linked in: nvidia_uvm(PO) xt_connmark xt_mark iptable_mangle xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter md_mod nfsd auth_rpcgss oid_registry lockd grace sunrpc tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap bridge stp llc ixgbe xfrm_algo mdio igc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp nvidia_drm(PO) kvm_intel nvidia_modeset(PO) zfs(PO) i915 kvm zunicode(PO) zzstd(O) nvidia(PO) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 zlua(O) sha256_ssse3 ast sha1_ssse3 drm_vram_helper iosf_mbi aesni_intel drm_buddy zavl(PO) i2c_algo_bit drm_ttm_helper icp(PO) Jan 16 08:46:06 NAS846 kernel: drm_display_helper mei_hdcp mei_pxp crypto_simd ttm i2c_i801 intel_gtt cryptd zcommon(PO) znvpair(PO) rapl spl(O) drm_kms_helper intel_cstate wmi_bmof drm mpt3sas mei_me agpgart i2c_smbus input_leds raid_class nvme ahci intel_uncore i2c_core joydev led_class scsi_transport_sas mei syscopyarea nvme_core libahci sysfillrect vmd sysimgblt fb_sys_fops thermal fan video tpm_crb tpm_tis tpm_tis_core tpm wmi backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: md_mod] Jan 16 08:46:06 NAS846 kernel: CR2: ffff886bd16feb80 Jan 16 08:46:06 NAS846 kernel: ---[ end trace 0000000000000000 ]--- Jan 16 08:46:06 NAS846 kernel: RIP: 0010:zil_itx_assign+0x295/0x312 [zfs] Jan 16 08:46:06 
NAS846 kernel: Code: 00 00 48 89 de e8 cb 6b f3 ff 48 8b 83 08 04 00 00 48 39 e8 48 0f 42 c5 48 89 83 08 04 00 00 4c 8b 64 24 08 31 c0 49 c1 e4 06 <4a> 89 84 23 80 01 00 00 4a 8d 9c 33 78 01 00 00 48 89 df e8 3f 3f Jan 16 08:46:06 NAS846 kernel: RSP: 0018:ffffc90063177b40 EFLAGS: 00010287 Jan 16 08:46:06 NAS846 kernel: RAX: 0000000000000000 RBX: ffff88821201a800 RCX: 0000000000000040 Jan 16 08:46:06 NAS846 kernel: RDX: 0000000000000001 RSI: ffff88821201abe0 RDI: 00000000ffffffff Jan 16 08:46:06 NAS846 kernel: RBP: ffff88841fb763c0 R08: 0000000000000000 R09: 00000000ffffffff Jan 16 08:46:06 NAS846 kernel: R10: 0000000000000000 R11: ffff8884955e7c4e R12: ffffffe9bf6e4200 Jan 16 08:46:06 NAS846 kernel: R13: ffff8885ee2ac1e0 R14: 0000000000000000 R15: ffff8882249c0000 Jan 16 08:46:06 NAS846 kernel: FS: 0000154fcf660b30(0000) GS:ffff88981f400000(0000) knlGS:0000000000000000 Jan 16 08:46:06 NAS846 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jan 16 08:46:06 NAS846 kernel: CR2: ffff886bd16feb80 CR3: 00000004ad894000 CR4: 0000000000750ee0 Jan 16 08:46:06 NAS846 kernel: PKRU: 55555554 Jan 16 08:46:06 NAS846 kernel: note: Whisparr[31241] exited with irqs disabled Jan 16 09:00:01 NAS846 kernel: mdcmd (61): set md_write_method 1 Jan 16 09:00:01 NAS846 kernel: Jan 16 09:00:01 NAS846 root: Log Level: 1 Jan 16 09:00:01 NAS846 root: mover: started Jan 16 09:00:01 NAS846 root: mover: finished Jan 16 09:27:56 NAS846 kernel: mdcmd (62): set md_write_method 1 Jan 16 09:27:56 NAS846 kernel: Jan 16 09:28:24 NAS846 kernel: traps: Tdarr_Server[3949] trap invalid opcode ip:1e75f12 sp:7fff6e7bdc90 error:0 in node[400000+4d69000] Jan 16 09:43:40 NAS846 webGUI: Successful login user root from 192.168.1.56 Jan 16 09:45:55 NAS846 kernel: device_list[10228]: segfault at 0 ip 0000000000935623 sp 00007ffce8413600 error 6 in php[600000+3b3000] likely on CPU 10 (core 20, socket 0) Jan 16 09:45:55 NAS846 kernel: Code: 13 6f fe ff 41 ff 27 49 63 47 0c 49 01 c7 48 c7 c0 80 2a 62 01 0f b6 80 
22 02 00 00 84 c0 0f 85 dc 05 00 00 41 ff 27 4c 89 f8 <83> 01 01 4d 8d 7f 20 ff 60 20 83 f8 05 0f 85 5f 04 00 00 8b 46 08 Jan 16 09:45:56 NAS846 monitor: Stop running nchan processes Jan 16 09:46:24 NAS846 kernel: BUG: unable to handle page fault for address: 000000000cfc67c0 Jan 16 09:46:24 NAS846 kernel: #PF: supervisor read access in kernel mode Jan 16 09:46:24 NAS846 kernel: #PF: error_code(0x0000) - not-present page Jan 16 09:46:24 NAS846 kernel: PGD 0 P4D 0 Jan 16 09:46:24 NAS846 kernel: Oops: 0000 [#2] PREEMPT SMP NOPTI Jan 16 09:46:24 NAS846 kernel: CPU: 8 PID: 215 Comm: kcompactd0 Tainted: P D O 6.1.64-Unraid #1 Jan 16 09:46:24 NAS846 kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 3101 12/08/2023 Jan 16 09:46:24 NAS846 kernel: RIP: 0010:PageHuge+0x5/0x31 Jan 16 09:46:24 NAS846 kernel: Code: cc cc f7 c7 ff 0f 00 00 75 16 48 8b 17 0f ba e2 10 73 0d 48 8b 57 48 f6 c2 01 74 04 48 8d 42 ff c3 cc cc cc cc 0f 1f 44 00 00 <48> 8b 07 0f ba e0 10 73 14 e8 b7 ff ff ff 80 78 50 02 0f 94 c0 0f Jan 16 09:46:24 NAS846 kernel: RSP: 0018:ffffc90000923cb0 EFLAGS: 00010206 Jan 16 09:46:24 NAS846 kernel: RAX: 0000000000000000 RBX: ffffc90000923e10 RCX: 000000000cfc67c0 Jan 16 09:46:24 NAS846 kernel: RDX: 0000000080000000 RSI: ffffea000cfc0000 RDI: 000000000cfc67c0 Jan 16 09:46:24 NAS846 kernel: RBP: 000000000033f1a0 R08: 0000000000000000 R09: 0000000000000000 Jan 16 09:46:24 NAS846 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000008458ce Jan 16 09:46:24 NAS846 kernel: R13: 0000000000000000 R14: 0000000000028f44 R15: 000000000cfc67c0 Jan 16 09:46:24 NAS846 kernel: FS: 0000000000000000(0000) GS:ffff88981f400000(0000) knlGS:0000000000000000 Jan 16 09:46:24 NAS846 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jan 16 09:46:24 NAS846 kernel: CR2: 000000000cfc67c0 CR3: 000000000420a000 CR4: 0000000000750ee0 Jan 16 09:46:24 NAS846 kernel: PKRU: 55555554 Jan 16 09:46:24 NAS846 kernel: Call Trace: Jan 16 09:46:24 
NAS846 kernel: <TASK> Jan 16 09:46:24 NAS846 kernel: ? __die_body+0x1a/0x5c Jan 16 09:46:24 NAS846 kernel: ? page_fault_oops+0x329/0x376 Jan 16 09:46:24 NAS846 kernel: ? do_user_addr_fault+0x12e/0x48d Jan 16 09:46:24 NAS846 kernel: ? exc_page_fault+0xfb/0x11d Jan 16 09:46:24 NAS846 kernel: ? asm_exc_page_fault+0x22/0x30 Jan 16 09:46:24 NAS846 kernel: ? PageHuge+0x5/0x31 Jan 16 09:46:24 NAS846 kernel: isolate_migratepages_block+0x276/0xbb9 Jan 16 09:46:24 NAS846 kernel: ? folio_add_lru+0x86/0x9d Jan 16 09:46:24 NAS846 kernel: compact_zone+0x7c9/0xa28 Jan 16 09:46:24 NAS846 kernel: ? finish_task_switch.isra.0+0x140/0x218 Jan 16 09:46:24 NAS846 kernel: proactive_compact_node+0x7c/0xad Jan 16 09:46:24 NAS846 kernel: ? fragmentation_score_node+0x32/0x62 Jan 16 09:46:24 NAS846 kernel: kcompactd+0x1f7/0x249 Jan 16 09:46:24 NAS846 kernel: ? _raw_spin_rq_lock_irqsave+0x20/0x20 Jan 16 09:46:24 NAS846 kernel: ? kcompactd_do_work+0x1d4/0x1d4 Jan 16 09:46:24 NAS846 kernel: kthread+0xe4/0xef Jan 16 09:46:24 NAS846 kernel: ? 
kthread_complete_and_exit+0x1b/0x1b Jan 16 09:46:24 NAS846 kernel: ret_from_fork+0x1f/0x30 Jan 16 09:46:24 NAS846 kernel: </TASK> Jan 16 09:46:24 NAS846 kernel: Modules linked in: nvidia_uvm(PO) xt_connmark xt_mark iptable_mangle xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter md_mod nfsd auth_rpcgss oid_registry lockd grace sunrpc tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap bridge stp llc ixgbe xfrm_algo mdio igc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp nvidia_drm(PO) kvm_intel nvidia_modeset(PO) zfs(PO) i915 kvm zunicode(PO) zzstd(O) nvidia(PO) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 zlua(O) sha256_ssse3 ast sha1_ssse3 drm_vram_helper iosf_mbi aesni_intel drm_buddy zavl(PO) i2c_algo_bit drm_ttm_helper icp(PO) Jan 16 09:46:24 NAS846 kernel: drm_display_helper mei_hdcp mei_pxp crypto_simd ttm i2c_i801 intel_gtt cryptd zcommon(PO) znvpair(PO) rapl spl(O) drm_kms_helper intel_cstate wmi_bmof drm mpt3sas mei_me agpgart i2c_smbus input_leds raid_class nvme ahci intel_uncore i2c_core joydev led_class scsi_transport_sas mei syscopyarea nvme_core libahci sysfillrect vmd sysimgblt fb_sys_fops thermal fan video tpm_crb tpm_tis tpm_tis_core tpm wmi backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: md_mod] Jan 16 09:46:24 NAS846 kernel: CR2: 000000000cfc67c0 Jan 16 09:46:24 NAS846 kernel: ---[ end trace 0000000000000000 ]--- Jan 16 09:46:24 NAS846 kernel: RIP: 0010:zil_itx_assign+0x295/0x312 [zfs] Jan 16 09:46:24 NAS846 kernel: Code: 00 00 48 89 de e8 cb 6b f3 ff 48 8b 83 08 04 00 00 48 39 e8 48 0f 42 c5 48 89 83 08 04 00 00 4c 8b 64 
24 08 31 c0 49 c1 e4 06 <4a> 89 84 23 80 01 00 00 4a 8d 9c 33 78 01 00 00 48 89 df e8 3f 3f
Jan 16 09:46:24 NAS846 kernel: RSP: 0018:ffffc90063177b40 EFLAGS: 00010287
Jan 16 09:46:24 NAS846 kernel: RAX: 0000000000000000 RBX: ffff88821201a800 RCX: 0000000000000040
Jan 16 09:46:24 NAS846 kernel: RDX: 0000000000000001 RSI: ffff88821201abe0 RDI: 00000000ffffffff
Jan 16 09:46:24 NAS846 kernel: RBP: ffff88841fb763c0 R08: 0000000000000000 R09: 00000000ffffffff
Jan 16 09:46:24 NAS846 kernel: R10: 0000000000000000 R11: ffff8884955e7c4e R12: ffffffe9bf6e4200
Jan 16 09:46:24 NAS846 kernel: R13: ffff8885ee2ac1e0 R14: 0000000000000000 R15: ffff8882249c0000
Jan 16 09:46:24 NAS846 kernel: FS: 0000000000000000(0000) GS:ffff88981f400000(0000) knlGS:0000000000000000
Jan 16 09:46:24 NAS846 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 16 09:46:24 NAS846 kernel: CR2: 000000000cfc67c0 CR3: 00000007836e6000 CR4: 0000000000750ee0
Jan 16 09:46:24 NAS846 kernel: PKRU: 55555554
Jan 16 09:46:24 NAS846 kernel: note: kcompactd0[215] exited with irqs disabled
Jan 16 10:00:01 NAS846 kernel: mdcmd (63): set md_write_method 1
Jan 16 10:00:01 NAS846 kernel:
Jan 16 10:00:01 NAS846 root: Log Level: 1
Jan 16 10:00:02 NAS846 root: mover: started
Jan 16 10:00:02 NAS846 root: mover: finished
```

nas846-diagnostics-20240116-1051.zip

Edited January 16 by user2579 (formatting)
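Since the `Comm:` name changes from trace to trace, it may help to tally the fault headers across the whole syslog instead of reading them one at a time. The following is only a rough sketch: the function name `tally_fault_headers` and the `SAMPLE` list are made up for illustration, and the regex assumes the `CPU: … PID: … Comm: … Tainted:` header layout seen in the excerpts above.

```python
import re
from collections import Counter

# Matches the header line the kernel prints for each oops, e.g.
# "kernel: CPU: 8 PID: 7541 Comm: zfs Tainted: P O 6.1.64-Unraid #1"
HEADER = re.compile(
    r"kernel: CPU: (\d+) PID: \d+ Comm: (.+?) (?:Tainted:|Not tainted)"
)

# Header lines copied from the traces in this post (illustrative input only).
SAMPLE = [
    "Jan 15 01:08:48 NAS846 kernel: CPU: 8 PID: 7541 Comm: zfs Tainted: P O 6.1.64-Unraid #1",
    "Jan 16 08:46:06 NAS846 kernel: CPU: 8 PID: 31241 Comm: Whisparr Tainted: P O 6.1.64-Unraid #1",
    "Jan 16 09:46:24 NAS846 kernel: CPU: 8 PID: 215 Comm: kcompactd0 Tainted: P D O 6.1.64-Unraid #1",
]


def tally_fault_headers(lines):
    """Count how often each CPU number and each Comm name appears in oops headers."""
    cpus = Counter()
    comms = Counter()
    for line in lines:
        match = HEADER.search(line)
        if match:
            cpus[int(match.group(1))] += 1
            comms[match.group(2)] += 1
    return cpus, comms


cpus, comms = tally_fault_headers(SAMPLE)
print("by CPU:", dict(cpus))
print("by Comm:", dict(comms))
```

To run it against a real log, pass an open file object instead of `SAMPLE`. In the three headers posted here the process differs each time while the CPU number is the same, but three samples are far too few to read much into that.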
JorgeB Posted January 16

48 minutes ago, user2579 said:
Jan 16 02:11:16 NAS846 kernel: PMS LoudnessCmd[11497]: segfault at

I'm used to seeing these in various diags, so I assume they are harmless. You can also ignore the "tainted": since zfs is an external module, the kernel is always considered "tainted" when running it.

The first call trace appears to be zfs related. I would try running the server with just one stick of RAM; if issues persist, try a different one. That will basically rule out any RAM issues.

There was a user with similar zfs related call traces who had no more issues after changing to btrfs. Strange, but just mentioning it as a possibility.
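For anyone wondering what the letters after `Tainted:` stand for, they can be decoded with a few lines of Python. This is a minimal sketch, not a complete decoder: the table covers only a handful of flags (meanings as given in the kernel's tainted-kernels documentation), and the helper name `decode_taint` is made up for illustration.

```python
import re

# Partial table of kernel taint letters; the kernel defines more than these.
TAINT_FLAGS = {
    "P": "proprietary module was loaded",
    "O": "externally built (out-of-tree) module was loaded",
    "D": "kernel died recently (an earlier oops or BUG)",
    "W": "kernel issued a warning earlier",
    "E": "unsigned module was loaded",
}


def decode_taint(line):
    """Return (letter, meaning) pairs for the flags in a 'Tainted:' log line."""
    # Flags are single capital letters separated by whitespace,
    # followed by the kernel version string.
    match = re.search(r"Tainted:\s+((?:[A-Z]\s+)+)", line)
    if match is None:
        return []
    return [
        (flag, TAINT_FLAGS.get(flag, "not in this partial table"))
        for flag in match.group(1).split()
    ]


line = "Jan 16 09:46:24 NAS846 kernel: CPU: 8 PID: 215 Comm: kcompactd0 Tainted: P D O 6.1.64-Unraid #1"
for flag, meaning in decode_taint(line):
    print(flag, "=", meaning)
```

On the traces in this thread, `P` and `O` match the zfs and nvidia modules in the "Modules linked in" list, and the `D` on the second oops only records that an earlier oops had already happened.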
user2579 Posted January 16 (Author)

@JorgeB What is especially concerning about these call traces is that they have gotten progressively worse from the baseline, and that has been the trend since 6.12.6. Progressively worse as in, to the point where the system won't come up. I am still investigating the hardware branch of the tree to isolate memory, but I wouldn't expect problems from bad memory to get worse over time?
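One way to put numbers on "progressively worse" is to count fault events per day across saved syslogs. The sketch below assumes the message formats seen in the excerpts in this thread; the category names and the function `count_faults_per_day` are made up for illustration, and syslog timestamps carry no year, so logs spanning a new year would need extra handling.

```python
import re
from collections import Counter

# Patterns based on the kernel messages quoted in this thread.
FAULT_PATTERNS = {
    "gpf": re.compile(r"kernel: general protection fault"),
    "page_fault": re.compile(r"kernel: BUG: unable to handle page fault"),
    "segfault": re.compile(r"kernel: .*segfault at"),
    "trap": re.compile(r"kernel: traps: "),
}


def count_faults_per_day(lines):
    """Count fault events keyed by (day, kind), where day is e.g. 'Jan 16'."""
    counts = Counter()
    for line in lines:
        # The first two whitespace-separated fields are month and day.
        day = " ".join(line.split()[:2])
        for kind, pattern in FAULT_PATTERNS.items():
            if pattern.search(line):
                counts[(day, kind)] += 1
                break  # one category per line
    return counts
```

Feeding it an open syslog file and printing `sorted(counts.items())` gives a per-day table, which makes it easier to see whether the rate really climbs after an OS update or a hardware change.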
user2579 Posted January 16 (Author)

Side question: is there a 6.12 version that is considered `stable` on the level of 6.11.5?
JorgeB Posted January 16

AFAIK 6.12.6 is pretty stable for most users, including myself; I use it on multiple servers.