System crashes from segfaults/tainted messages.


I have been battling progressively worse system issues for some time now.  Originally, it looked like my raidz cache pool was locking up, but the issues kept getting worse.  I was getting errors that looked like:
```

Jan 15 01:08:48 NAS846 kernel: general protection fault, maybe for address 0x80000000: 0000 [#1] PREEMPT SMP NOPTI
Jan 15 01:08:48 NAS846 kernel: CPU: 8 PID: 7541 Comm: zfs Tainted: P           O       6.1.64-Unraid #1
Jan 15 01:08:48 NAS846 kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 2703 08/11/2023
Jan 15 01:08:48 NAS846 kernel: RIP: 0010:migrate_disable+0x71/0x76
Jan 15 01:08:48 NAS846 kernel: Code: 83 50 0c 00 00 66 c7 85 08 03 00 00 01 00 bf 01 00 00 00 e8 75 f9 ff ff 65 8b 05 17 68 f8 7e 85 c0 75 05 0f 1f 44 00 00 5b 5d <c3> cc cc cc cc 0f 1f 44 00 00 65 8b 05 fb 67 f8 7e ff c8 8b 17 74
Jan 15 01:08:48 NAS846 kernel: RSP: 0018:ffffc90053f5f928 EFLAGS: 00010286


```
The `Comm:` field is not always `zfs`; sometimes it is dockerd, lohs, etc., so there is no smoking gun pointing at any one piece of software.

 

Troubleshooting threads on similar topics tend to point to memory. So when the system could no longer get through a `Start Array` because of errors that popped up while mounting the cache pool, I went down the hardware investigation route.

 

Memtest passed 3 runs, so I moved on to a physical inspection of the sticks and the slots and found nothing.  Next was the elimination method: after I pulled the first stick, I was at least able to come up in Safe Mode and bring all my containers up for some load tests.  Everything seemed fine there, so I went to the next step, a normal boot for a load soak.  Overnight I saw one segfault come up without a crash, and then this morning the whole server crashed, but luckily I had a syslog up.

First segfault, which didn't cause any issues:

```

Jan 16 02:11:16 NAS846 kernel: PMS LoudnessCmd[11497]: segfault at 0 ip 000014fdfdd25080 sp 000014fdfa8920c8 error 4 in libswresample.so.4[14fdfdd1d000+18000] likely on CPU 6 (core 12, socket 0)
Jan 16 02:11:16 NAS846 kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 02 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
Jan 16 02:11:17 NAS846 kernel: PMS LoudnessCmd[11524]: segfault at 0 ip 0000148ac081a080 sp 0000148abd1f30c8 error 4 in libswresample.so.4[148ac0812000+18000] likely on CPU 2 (core 4, socket 0)
Jan 16 02:11:17 NAS846 kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 02 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
Jan 16 02:11:19 NAS846 kernel: PMS LoudnessCmd[12223]: segfault at 0 ip 000015071f607fc3 sp 000015071c0d00c8 error 4 in libswresample.so.4[15071f606000+18000] likely on CPU 2 (core 4, socket 0)
Jan 16 02:11:19 NAS846 kernel: Code: 0f 00 00 00 0f 85 73 ff ff ff 48 f7 c6 0f 00 00 00 0f 85 66 ff ff ff 48 8d 34 56 48 8d 3c 97 48 f7 da 66 0f 6f 2d 7d 64 ff ff <66> 0f 6f 04 56 66 0f 6f 4c 56 10 66 0f ef d2 66 0f ef db 66 0f 61


```
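As a side note on reading the lines above: the number after `error` in a `segfault` line is the x86 page-fault error code, a small bit field. Below is a minimal sketch of a decoder, assuming only the standard low bits (present, write, user, reserved, instruction fetch). The function name `decode_pf_error` is made up for illustration and is not part of any tool.

```python
def decode_pf_error(code: int) -> str:
    """Describe an x86 page-fault error code (hypothetical helper)."""
    # Bit 0: 0 = page not present, 1 = protection violation
    cause = "protection violation" if code & 0x01 else "not-present page"
    # Bit 1: 0 = read access, 1 = write access
    access = "write" if code & 0x02 else "read"
    # Bit 2: 0 = kernel (supervisor) mode, 1 = user mode
    mode = "user" if code & 0x04 else "kernel"

    extras = []
    if code & 0x08:
        extras.append("reserved bit set")
    if code & 0x10:
        extras.append("instruction fetch")

    text = f"{mode}-mode {access}, {cause}"
    if extras:
        text += " (" + ", ".join(extras) + ")"
    return text


# Codes that appear in this thread: 4 and 6 in the userspace segfault lines,
# 0x0002 and 0x0000 in the kernel "#PF: error_code(...)" lines.
for code in (4, 6, 0x0002, 0x0000):
    print(f"error {code}: {decode_pf_error(code)}")
```

Read this way, `segfault at 0 ... error 4` is a user-mode read of address 0 that hit a not-present page, which is what a null pointer read typically looks like. That describes the symptom only; it says nothing about whether the cause is software or hardware.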

Last segfault leading to the crash. I'm also seeing 2 `tainted` errors: the first happens an hour earlier without issue, and the second seems chained to the segfault that brings down the system:
```

Jan 16 06:46:56 NAS846 kernel: traps: node[12120] trap int3 ip:1e75f12 sp:14b4ce428400 error:0 in node[400000+4d69000]
Jan 16 07:00:01 NAS846 kernel: mdcmd (57): set md_write_method 1
Jan 16 07:00:01 NAS846 kernel:
Jan 16 07:00:01 NAS846 root: Log Level: 1
Jan 16 07:00:01 NAS846 root: mover: started
Jan 16 07:00:01 NAS846 root: mover: finished
Jan 16 07:03:48 NAS846 ntpd[2526]: no peer for too long, server running free now
Jan 16 07:17:12 NAS846 kernel: traps: node[31901] trap int3 ip:1e75f12 sp:1538fa647400 error:0 in node[400000+4d69000]
Jan 16 07:30:22 NAS846 kernel: mdcmd (58): set md_write_method 1
Jan 16 07:30:22 NAS846 kernel:
Jan 16 08:00:01 NAS846 kernel: mdcmd (59): set md_write_method 1
Jan 16 08:00:01 NAS846 kernel:
Jan 16 08:00:01 NAS846 root: Log Level: 1
Jan 16 08:00:01 NAS846 root: mover: started
Jan 16 08:00:01 NAS846 root: mover: finished
Jan 16 08:27:01 NAS846 kernel: mdcmd (60): set md_write_method 1
Jan 16 08:27:01 NAS846 kernel:
Jan 16 08:46:06 NAS846 kernel: BUG: unable to handle page fault for address: ffff886bd16feb80
Jan 16 08:46:06 NAS846 kernel: #PF: supervisor write access in kernel mode
Jan 16 08:46:06 NAS846 kernel: #PF: error_code(0x0002) - not-present page
Jan 16 08:46:06 NAS846 kernel: PGD 0 P4D 0
Jan 16 08:46:06 NAS846 kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Jan 16 08:46:06 NAS846 kernel: CPU: 8 PID: 31241 Comm: Whisparr Tainted: P           O       6.1.64-Unraid #1
Jan 16 08:46:06 NAS846 kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 3101 12/08/2023
Jan 16 08:46:06 NAS846 kernel: RIP: 0010:zil_itx_assign+0x295/0x312 [zfs]
Jan 16 08:46:06 NAS846 kernel: Code: 00 00 48 89 de e8 cb 6b f3 ff 48 8b 83 08 04 00 00 48 39 e8 48 0f 42 c5 48 89 83 08 04 00 00 4c 8b 64 24 08 31 c0 49 c1 e4 06 <4a> 89 84 23 80 01 00 00 4a 8d 9c 33 78 01 00 00 48 89 df e8 3f 3f
Jan 16 08:46:06 NAS846 kernel: RSP: 0018:ffffc90063177b40 EFLAGS: 00010287
Jan 16 08:46:06 NAS846 kernel: RAX: 0000000000000000 RBX: ffff88821201a800 RCX: 0000000000000040
Jan 16 08:46:06 NAS846 kernel: RDX: 0000000000000001 RSI: ffff88821201abe0 RDI: 00000000ffffffff
Jan 16 08:46:06 NAS846 kernel: RBP: ffff88841fb763c0 R08: 0000000000000000 R09: 00000000ffffffff
Jan 16 08:46:06 NAS846 kernel: R10: 0000000000000000 R11: ffff8884955e7c4e R12: ffffffe9bf6e4200
Jan 16 08:46:06 NAS846 kernel: R13: ffff8885ee2ac1e0 R14: 0000000000000000 R15: ffff8882249c0000
Jan 16 08:46:06 NAS846 kernel: FS:  0000154fcf660b30(0000) GS:ffff88981f400000(0000) knlGS:0000000000000000
Jan 16 08:46:06 NAS846 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 16 08:46:06 NAS846 kernel: CR2: ffff886bd16feb80 CR3: 00000004ad894000 CR4: 0000000000750ee0
Jan 16 08:46:06 NAS846 kernel: PKRU: 55555554
Jan 16 08:46:06 NAS846 kernel: Call Trace:
Jan 16 08:46:06 NAS846 kernel: <TASK>
Jan 16 08:46:06 NAS846 kernel: ? __die_body+0x1a/0x5c
Jan 16 08:46:06 NAS846 kernel: ? page_fault_oops+0x329/0x376
Jan 16 08:46:06 NAS846 kernel: ? fixup_exception+0x22/0x24b
Jan 16 08:46:06 NAS846 kernel: ? exc_page_fault+0xf4/0x11d
Jan 16 08:46:06 NAS846 kernel: ? asm_exc_page_fault+0x22/0x30
Jan 16 08:46:06 NAS846 kernel: ? zil_itx_assign+0x295/0x312 [zfs]
Jan 16 08:46:06 NAS846 kernel: ? zil_itx_assign+0x9c/0x312 [zfs]
Jan 16 08:46:06 NAS846 kernel: ? zfs_log_write+0x352/0x3ab [zfs]
Jan 16 08:46:06 NAS846 kernel: ? zfs_write+0x8d0/0xa29 [zfs]
Jan 16 08:46:06 NAS846 kernel: ? zpl_iter_write+0xcf/0x122 [zfs]
Jan 16 08:46:06 NAS846 kernel: ? vfs_write+0x10c/0x1b9
Jan 16 08:46:06 NAS846 kernel: ? ksys_pwrite64+0x64/0x84
Jan 16 08:46:06 NAS846 kernel: ? do_syscall_64+0x68/0x81
Jan 16 08:46:06 NAS846 kernel: ? entry_SYSCALL_64_after_hwframe+0x64/0xce
Jan 16 08:46:06 NAS846 kernel: </TASK>
Jan 16 08:46:06 NAS846 kernel: Modules linked in: nvidia_uvm(PO) xt_connmark xt_mark iptable_mangle xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter md_mod nfsd auth_rpcgss oid_registry lockd grace sunrpc tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap bridge stp llc ixgbe xfrm_algo mdio igc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp nvidia_drm(PO) kvm_intel nvidia_modeset(PO) zfs(PO) i915 kvm zunicode(PO) zzstd(O) nvidia(PO) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 zlua(O) sha256_ssse3 ast sha1_ssse3 drm_vram_helper iosf_mbi aesni_intel drm_buddy zavl(PO) i2c_algo_bit drm_ttm_helper icp(PO)
Jan 16 08:46:06 NAS846 kernel: drm_display_helper mei_hdcp mei_pxp crypto_simd ttm i2c_i801 intel_gtt cryptd zcommon(PO) znvpair(PO) rapl spl(O) drm_kms_helper intel_cstate wmi_bmof drm mpt3sas mei_me agpgart i2c_smbus input_leds raid_class nvme ahci intel_uncore i2c_core joydev led_class scsi_transport_sas mei syscopyarea nvme_core libahci sysfillrect vmd sysimgblt fb_sys_fops thermal fan video tpm_crb tpm_tis tpm_tis_core tpm wmi backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: md_mod]
Jan 16 08:46:06 NAS846 kernel: CR2: ffff886bd16feb80
Jan 16 08:46:06 NAS846 kernel: ---[ end trace 0000000000000000 ]---
Jan 16 08:46:06 NAS846 kernel: RIP: 0010:zil_itx_assign+0x295/0x312 [zfs]
Jan 16 08:46:06 NAS846 kernel: Code: 00 00 48 89 de e8 cb 6b f3 ff 48 8b 83 08 04 00 00 48 39 e8 48 0f 42 c5 48 89 83 08 04 00 00 4c 8b 64 24 08 31 c0 49 c1 e4 06 <4a> 89 84 23 80 01 00 00 4a 8d 9c 33 78 01 00 00 48 89 df e8 3f 3f
Jan 16 08:46:06 NAS846 kernel: RSP: 0018:ffffc90063177b40 EFLAGS: 00010287
Jan 16 08:46:06 NAS846 kernel: RAX: 0000000000000000 RBX: ffff88821201a800 RCX: 0000000000000040
Jan 16 08:46:06 NAS846 kernel: RDX: 0000000000000001 RSI: ffff88821201abe0 RDI: 00000000ffffffff
Jan 16 08:46:06 NAS846 kernel: RBP: ffff88841fb763c0 R08: 0000000000000000 R09: 00000000ffffffff
Jan 16 08:46:06 NAS846 kernel: R10: 0000000000000000 R11: ffff8884955e7c4e R12: ffffffe9bf6e4200
Jan 16 08:46:06 NAS846 kernel: R13: ffff8885ee2ac1e0 R14: 0000000000000000 R15: ffff8882249c0000
Jan 16 08:46:06 NAS846 kernel: FS:  0000154fcf660b30(0000) GS:ffff88981f400000(0000) knlGS:0000000000000000
Jan 16 08:46:06 NAS846 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 16 08:46:06 NAS846 kernel: CR2: ffff886bd16feb80 CR3: 00000004ad894000 CR4: 0000000000750ee0
Jan 16 08:46:06 NAS846 kernel: PKRU: 55555554
Jan 16 08:46:06 NAS846 kernel: note: Whisparr[31241] exited with irqs disabled
Jan 16 09:00:01 NAS846 kernel: mdcmd (61): set md_write_method 1
Jan 16 09:00:01 NAS846 kernel:
Jan 16 09:00:01 NAS846 root: Log Level: 1
Jan 16 09:00:01 NAS846 root: mover: started
Jan 16 09:00:01 NAS846 root: mover: finished
Jan 16 09:27:56 NAS846 kernel: mdcmd (62): set md_write_method 1
Jan 16 09:27:56 NAS846 kernel:
Jan 16 09:28:24 NAS846 kernel: traps: Tdarr_Server[3949] trap invalid opcode ip:1e75f12 sp:7fff6e7bdc90 error:0 in node[400000+4d69000]
Jan 16 09:43:40 NAS846 webGUI: Successful login user root from 192.168.1.56
Jan 16 09:45:55 NAS846 kernel: device_list[10228]: segfault at 0 ip 0000000000935623 sp 00007ffce8413600 error 6 in php[600000+3b3000] likely on CPU 10 (core 20, socket 0)
Jan 16 09:45:55 NAS846 kernel: Code: 13 6f fe ff 41 ff 27 49 63 47 0c 49 01 c7 48 c7 c0 80 2a 62 01 0f b6 80 22 02 00 00 84 c0 0f 85 dc 05 00 00 41 ff 27 4c 89 f8 <83> 01 01 4d 8d 7f 20 ff 60 20 83 f8 05 0f 85 5f 04 00 00 8b 46 08
Jan 16 09:45:56 NAS846 monitor: Stop running nchan processes
Jan 16 09:46:24 NAS846 kernel: BUG: unable to handle page fault for address: 000000000cfc67c0
Jan 16 09:46:24 NAS846 kernel: #PF: supervisor read access in kernel mode
Jan 16 09:46:24 NAS846 kernel: #PF: error_code(0x0000) - not-present page
Jan 16 09:46:24 NAS846 kernel: PGD 0 P4D 0
Jan 16 09:46:24 NAS846 kernel: Oops: 0000 [#2] PREEMPT SMP NOPTI
Jan 16 09:46:24 NAS846 kernel: CPU: 8 PID: 215 Comm: kcompactd0 Tainted: P      D    O       6.1.64-Unraid #1
Jan 16 09:46:24 NAS846 kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 3101 12/08/2023
Jan 16 09:46:24 NAS846 kernel: RIP: 0010:PageHuge+0x5/0x31
Jan 16 09:46:24 NAS846 kernel: Code: cc cc f7 c7 ff 0f 00 00 75 16 48 8b 17 0f ba e2 10 73 0d 48 8b 57 48 f6 c2 01 74 04 48 8d 42 ff c3 cc cc cc cc 0f 1f 44 00 00 <48> 8b 07 0f ba e0 10 73 14 e8 b7 ff ff ff 80 78 50 02 0f 94 c0 0f
Jan 16 09:46:24 NAS846 kernel: RSP: 0018:ffffc90000923cb0 EFLAGS: 00010206
Jan 16 09:46:24 NAS846 kernel: RAX: 0000000000000000 RBX: ffffc90000923e10 RCX: 000000000cfc67c0
Jan 16 09:46:24 NAS846 kernel: RDX: 0000000080000000 RSI: ffffea000cfc0000 RDI: 000000000cfc67c0
Jan 16 09:46:24 NAS846 kernel: RBP: 000000000033f1a0 R08: 0000000000000000 R09: 0000000000000000
Jan 16 09:46:24 NAS846 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000008458ce
Jan 16 09:46:24 NAS846 kernel: R13: 0000000000000000 R14: 0000000000028f44 R15: 000000000cfc67c0
Jan 16 09:46:24 NAS846 kernel: FS:  0000000000000000(0000) GS:ffff88981f400000(0000) knlGS:0000000000000000
Jan 16 09:46:24 NAS846 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 16 09:46:24 NAS846 kernel: CR2: 000000000cfc67c0 CR3: 000000000420a000 CR4: 0000000000750ee0
Jan 16 09:46:24 NAS846 kernel: PKRU: 55555554
Jan 16 09:46:24 NAS846 kernel: Call Trace:
Jan 16 09:46:24 NAS846 kernel: <TASK>
Jan 16 09:46:24 NAS846 kernel: ? __die_body+0x1a/0x5c
Jan 16 09:46:24 NAS846 kernel: ? page_fault_oops+0x329/0x376
Jan 16 09:46:24 NAS846 kernel: ? do_user_addr_fault+0x12e/0x48d
Jan 16 09:46:24 NAS846 kernel: ? exc_page_fault+0xfb/0x11d
Jan 16 09:46:24 NAS846 kernel: ? asm_exc_page_fault+0x22/0x30
Jan 16 09:46:24 NAS846 kernel: ? PageHuge+0x5/0x31
Jan 16 09:46:24 NAS846 kernel: isolate_migratepages_block+0x276/0xbb9
Jan 16 09:46:24 NAS846 kernel: ? folio_add_lru+0x86/0x9d
Jan 16 09:46:24 NAS846 kernel: compact_zone+0x7c9/0xa28
Jan 16 09:46:24 NAS846 kernel: ? finish_task_switch.isra.0+0x140/0x218
Jan 16 09:46:24 NAS846 kernel: proactive_compact_node+0x7c/0xad
Jan 16 09:46:24 NAS846 kernel: ? fragmentation_score_node+0x32/0x62
Jan 16 09:46:24 NAS846 kernel: kcompactd+0x1f7/0x249
Jan 16 09:46:24 NAS846 kernel: ? _raw_spin_rq_lock_irqsave+0x20/0x20
Jan 16 09:46:24 NAS846 kernel: ? kcompactd_do_work+0x1d4/0x1d4
Jan 16 09:46:24 NAS846 kernel: kthread+0xe4/0xef
Jan 16 09:46:24 NAS846 kernel: ? kthread_complete_and_exit+0x1b/0x1b
Jan 16 09:46:24 NAS846 kernel: ret_from_fork+0x1f/0x30
Jan 16 09:46:24 NAS846 kernel: </TASK>
Jan 16 09:46:24 NAS846 kernel: Modules linked in: nvidia_uvm(PO) xt_connmark xt_mark iptable_mangle xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter md_mod nfsd auth_rpcgss oid_registry lockd grace sunrpc tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap bridge stp llc ixgbe xfrm_algo mdio igc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp nvidia_drm(PO) kvm_intel nvidia_modeset(PO) zfs(PO) i915 kvm zunicode(PO) zzstd(O) nvidia(PO) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 zlua(O) sha256_ssse3 ast sha1_ssse3 drm_vram_helper iosf_mbi aesni_intel drm_buddy zavl(PO) i2c_algo_bit drm_ttm_helper icp(PO)
Jan 16 09:46:24 NAS846 kernel: drm_display_helper mei_hdcp mei_pxp crypto_simd ttm i2c_i801 intel_gtt cryptd zcommon(PO) znvpair(PO) rapl spl(O) drm_kms_helper intel_cstate wmi_bmof drm mpt3sas mei_me agpgart i2c_smbus input_leds raid_class nvme ahci intel_uncore i2c_core joydev led_class scsi_transport_sas mei syscopyarea nvme_core libahci sysfillrect vmd sysimgblt fb_sys_fops thermal fan video tpm_crb tpm_tis tpm_tis_core tpm wmi backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: md_mod]
Jan 16 09:46:24 NAS846 kernel: CR2: 000000000cfc67c0
Jan 16 09:46:24 NAS846 kernel: ---[ end trace 0000000000000000 ]---
Jan 16 09:46:24 NAS846 kernel: RIP: 0010:zil_itx_assign+0x295/0x312 [zfs]
Jan 16 09:46:24 NAS846 kernel: Code: 00 00 48 89 de e8 cb 6b f3 ff 48 8b 83 08 04 00 00 48 39 e8 48 0f 42 c5 48 89 83 08 04 00 00 4c 8b 64 24 08 31 c0 49 c1 e4 06 <4a> 89 84 23 80 01 00 00 4a 8d 9c 33 78 01 00 00 48 89 df e8 3f 3f
Jan 16 09:46:24 NAS846 kernel: RSP: 0018:ffffc90063177b40 EFLAGS: 00010287
Jan 16 09:46:24 NAS846 kernel: RAX: 0000000000000000 RBX: ffff88821201a800 RCX: 0000000000000040
Jan 16 09:46:24 NAS846 kernel: RDX: 0000000000000001 RSI: ffff88821201abe0 RDI: 00000000ffffffff
Jan 16 09:46:24 NAS846 kernel: RBP: ffff88841fb763c0 R08: 0000000000000000 R09: 00000000ffffffff
Jan 16 09:46:24 NAS846 kernel: R10: 0000000000000000 R11: ffff8884955e7c4e R12: ffffffe9bf6e4200
Jan 16 09:46:24 NAS846 kernel: R13: ffff8885ee2ac1e0 R14: 0000000000000000 R15: ffff8882249c0000
Jan 16 09:46:24 NAS846 kernel: FS:  0000000000000000(0000) GS:ffff88981f400000(0000) knlGS:0000000000000000
Jan 16 09:46:24 NAS846 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 16 09:46:24 NAS846 kernel: CR2: 000000000cfc67c0 CR3: 00000007836e6000 CR4: 0000000000750ee0
Jan 16 09:46:24 NAS846 kernel: PKRU: 55555554
Jan 16 09:46:24 NAS846 kernel: note: kcompactd0[215] exited with irqs disabled
Jan 16 10:00:01 NAS846 kernel: mdcmd (63): set md_write_method 1
Jan 16 10:00:01 NAS846 kernel:
Jan 16 10:00:01 NAS846 root: Log Level: 1
Jan 16 10:00:02 NAS846 root: mover: started
Jan 16 10:00:02 NAS846 root: mover: finished


```
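To get an overview of excerpts like the ones above without reading every line, a small script can tally faults by process name and CPU. This is only a rough sketch: the two regular expressions are assumptions based on the line shapes pasted in this thread, and `tally_faults` and `per_cpu` are hypothetical helper names. It would need adjusting for other log formats.

```python
import re
from collections import Counter

# Patterns modelled on the log lines in this thread (an assumption, not a spec).
SEGFAULT = re.compile(
    r"kernel: (?P<comm>.+?)\[\d+\]: segfault at .*likely on CPU (?P<cpu>\d+)"
)
OOPS = re.compile(r"kernel: CPU: (?P<cpu>\d+) PID: \d+ Comm: (?P<comm>\S+) Tainted:")


def tally_faults(lines):
    """Count (kind, process name, CPU) for userspace segfaults and kernel oopses."""
    counts = Counter()
    for line in lines:
        for kind, pattern in (("segfault", SEGFAULT), ("oops", OOPS)):
            match = pattern.search(line)
            if match:
                counts[(kind, match["comm"], int(match["cpu"]))] += 1
                break
    return counts


def per_cpu(counts):
    """Collapse the tally to a total number of faults per CPU."""
    totals = Counter()
    for (_kind, _comm, cpu), n in counts.items():
        totals[cpu] += n
    return totals


# A few lines copied from the excerpts in this thread.
SAMPLE = [
    "Jan 15 01:08:48 NAS846 kernel: CPU: 8 PID: 7541 Comm: zfs Tainted: P           O       6.1.64-Unraid #1",
    "Jan 16 02:11:16 NAS846 kernel: PMS LoudnessCmd[11497]: segfault at 0 ip 000014fdfdd25080 sp 000014fdfa8920c8 error 4 in libswresample.so.4[14fdfdd1d000+18000] likely on CPU 6 (core 12, socket 0)",
    "Jan 16 08:46:06 NAS846 kernel: CPU: 8 PID: 31241 Comm: Whisparr Tainted: P           O       6.1.64-Unraid #1",
    "Jan 16 09:46:24 NAS846 kernel: CPU: 8 PID: 215 Comm: kcompactd0 Tainted: P      D    O       6.1.64-Unraid #1",
]

counts = tally_faults(SAMPLE)
for key, n in sorted(counts.items()):
    print(key, n)
print("per CPU:", dict(per_cpu(counts)))
```

To run it against a full syslog, pass an open file instead of `SAMPLE`, e.g. `tally_faults(open("syslog"))`, where the file name is a placeholder. Whether the faults cluster on one process or one CPU is the kind of pattern this could surface, but a tally alone does not prove a cause.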

 

nas846-diagnostics-20240116-1051.zip

48 minutes ago, user2579 said:
Jan 16 02:11:16 NAS846 kernel: PMS LoudnessCmd[11497]: segfault at

I'm used to seeing these in various diags, so I assume they are harmless. You can also ignore the "tainted": since zfs is an external module, the kernel is always considered "tainted" when running it.
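For reference, the letters after `Tainted:` in those log lines are individual flags. Here is a minimal sketch of a decoder that covers only the three letters seen in this thread (`P`, `O`, `D`). The table is deliberately incomplete, and `decode_taint` is a hypothetical helper name.

```python
# Only the flags that show up in this thread; the kernel defines more.
TAINT_FLAGS = {
    "P": "proprietary module was loaded",
    "O": "externally-built (out-of-tree) module was loaded",
    "D": "kernel has died recently (an earlier oops or BUG)",
}


def decode_taint(line):
    """Return (flag, meaning) pairs for the 'Tainted:' field of an oops header line."""
    _, found, rest = line.partition("Tainted:")
    if not found:
        return []
    flags = []
    for token in rest.split():
        # The flag field is followed by the kernel release (e.g. 6.1.64-Unraid),
        # which is the first token that is not purely letters.
        if not token.isalpha():
            break
        flags.extend(token)
    return [(f, TAINT_FLAGS.get(f, "not covered by this table")) for f in flags]


line = "Jan 16 09:46:24 NAS846 kernel: CPU: 8 PID: 215 Comm: kcompactd0 Tainted: P      D    O       6.1.64-Unraid #1"
for flag, meaning in decode_taint(line):
    print(flag, "=", meaning)
```

In the `Modules linked in:` lines of the original post, the zfs and nvidia modules are the ones tagged `(PO)`, which fits `P` and `O` being set on every trace. The `D` only shows up on the second oops, after the first one had already happened.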

 

The first call trace appears to be zfs related. I would try running the server with just one stick of RAM; if the issues persist, try a different one. That will basically rule out any RAM issues.

 

There was a user with similar zfs related call traces who had no more issues after changing to btrfs. Strange, but I'm just mentioning it as a possibility.

@JorgeB What is especially concerning about these call traces is that they have gotten progressively worse relative to the baseline, and that has been the trend since 6.12.6.  Progressively worse as in, to the point where the system won't come up.

 

I'm still investigating the hardware branch of the tree to isolate memory, but with bad memory I wouldn't expect the problems to get worse over time?
