user2579


  1. Hello everyone, I've been experiencing consistent kernel errors and system crashes on my Unraid setup (version 6.12.8) that seem to revolve around ZFS operations and memory handling issues. Below, I've detailed the symptoms, hardware specifics, and logs for reference.

     System Information:
     - Unraid Version: 6.12.8
     - Hardware: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS version 3101 dated 12/08/2023
     - Storage: ZFS file system in use

     Symptoms:
     - The system encounters kernel NULL pointer dereferences and page faults leading to crashes.
     - Errors often mention ZFS-related operations.
     - Mutex lock failures are frequently observed in the log right before a crash.

     Error Logs (condensed to highlight key errors):
     - BUG: kernel NULL pointer dereference, address: 0000000000000020
     - BUG: unable to handle page fault for address: 000000000001cb80
     - Involvement of ZFS functions such as rrw_exit, zfs_getattr_fast, buf_hash_remove, and arc_change_state.constprop.0
     - Repeated failures around mutex_lock

     Attempts to Resolve:
     - 80+ hour memtest, changed out memory completely, reseated CPU, booted in safe mode, etc.

     I am seeking advice on further troubleshooting steps and any known fixes for these issues. Has anyone experienced similar problems, or does anyone have insight into potential causes and solutions? Any help or guidance would be greatly appreciated. Thank you in advance!

     nas846-diagnostics-20240316-1226.zip
  2. Really appreciate that mktorrent is included. Have you considered adding sox, flac, and lame?
  3. This was mentioned on Discord as well. Can this also be related to other hardware? Great news if it's just a flash drive swap.
  4. Updated to 6.12.8, and for almost a week everything ran a lot more stably than on 6.12.6. Prior to the update, one of the things I had tried for stability was moving my main cache (where my appdata/system data lives) off a ZFS raidz pool and down to a single btrfs drive, which was enough to keep things up. After switching to 6.12.8, everything seemed to be getting back to normal. Yesterday, I noticed crashes that seemed tied to Plex transcoder issues, plus at least one instance of:

     ```
     Feb 19 20:47:17 NAS846 emhttpd: shcmd (474): /usr/local/sbin/mount_image '/mnt/user/system/docker/docker/' /var/lib/docker 20
     Feb 19 20:47:17 NAS846 emhttpd: shcmd (476): /etc/rc.d/rc.docker start
     Feb 19 20:47:17 NAS846 root: starting dockerd ...
     Feb 19 20:47:17 NAS846 kernel: SQUASHFS error: xz decompression failed, data probably corrupt
     Feb 19 20:47:17 NAS846 kernel: SQUASHFS error: Failed to read block 0x2e91a60: -5
     Feb 19 20:47:18 NAS846 avahi-daemon[23264]: Server startup complete. Host name is
     ```

     Then I started getting a litany of call traces/crashes, almost always `Comm: lsof Tainted:` but sometimes `dockerd`:

     ```
     Feb 20 07:09:02 NAS846 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028
     Feb 20 07:09:02 NAS846 kernel: #PF: supervisor read access in kernel mode
     Feb 20 07:09:02 NAS846 kernel: #PF: error_code(0x0000) - not-present page
     Feb 20 07:09:02 NAS846 kernel: PGD 348e20067 P4D 348e20067 PUD 52816a067 PMD 0
     Feb 20 07:09:02 NAS846 kernel: Oops: 0000 [#5] PREEMPT SMP NOPTI
     Feb 20 07:09:02 NAS846 kernel: CPU: 10 PID: 23116 Comm: lsof Tainted: P D O 6.1.74-Unraid #1
     Feb 20 07:09:02 NAS846 kernel: Hardware name: ASUSTeK COMPUTER INC.
System Product Name/Pro WS W680-ACE IPMI, BIOS 3101 12/08/2023 Feb 20 07:09:02 NAS846 kernel: RIP: 0010:__slab_free+0x9c/0x229 Feb 20 07:09:02 NAS846 kernel: Code: 89 de 4c 8b 4c 24 58 4c 8b 44 24 10 e8 2d c8 ff ff 84 c0 0f 85 a6 00 00 00 4d 85 e4 74 0c 48 8b 34 24 4c 89 e7 e8 1c f7 65 00 <48> 8b 4b 28 4c 8b 6b 20 8b 45 28 48 8b 54 24 18 48 89 4c 24 58 41 Feb 20 07:09:02 NAS846 kernel: RSP: 0018:ffffc9005c3cfdc8 EFLAGS: 00010246 Feb 20 07:09:02 NAS846 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff888104c29000 Feb 20 07:09:02 NAS846 kernel: RDX: ffff888104c29000 RSI: 0000000000210d00 RDI: ffff8881001e6e00 Feb 20 07:09:02 NAS846 kernel: RBP: ffff8881001e6e00 R08: 0000000000000001 R09: ffffffff8125053c Feb 20 07:09:02 NAS846 kernel: R10: ffff888104c29000 R11: 0000000000000fe0 R12: 0000000000000000 Feb 20 07:09:02 NAS846 kernel: R13: 0000000000494830 R14: 00007fff16d06250 R15: 0000000000000002 Feb 20 07:09:02 NAS846 kernel: FS: 000014caae93fe00(0000) GS:ffff889fff480000(0000) knlGS:0000000000000000 Feb 20 07:09:02 NAS846 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Feb 20 07:09:02 NAS846 kernel: CR2: 0000000000000028 CR3: 00000002b12d8000 CR4: 0000000000750ee0 Feb 20 07:09:02 NAS846 kernel: PKRU: 55555554 Feb 20 07:09:02 NAS846 kernel: Call Trace: Feb 20 07:09:02 NAS846 kernel: <TASK> Feb 20 07:09:02 NAS846 kernel: ? __die_body+0x1a/0x5c Feb 20 07:09:02 NAS846 kernel: ? page_fault_oops+0x329/0x376 Feb 20 07:09:02 NAS846 kernel: ? do_user_addr_fault+0x12e/0x48d Feb 20 07:09:02 NAS846 kernel: ? exc_page_fault+0xfb/0x11d Feb 20 07:09:02 NAS846 kernel: ? asm_exc_page_fault+0x22/0x30 Feb 20 07:09:02 NAS846 kernel: ? user_path_at_empty+0x42/0x4f Feb 20 07:09:02 NAS846 kernel: ? __slab_free+0x9c/0x229 Feb 20 07:09:02 NAS846 kernel: ? __slab_free+0x32/0x229 Feb 20 07:09:02 NAS846 kernel: ? user_path_at_empty+0x42/0x4f Feb 20 07:09:02 NAS846 kernel: ? memcg_slab_free_hook+0x20/0xcf Feb 20 07:09:02 NAS846 kernel: ? 
kmem_cache_alloc+0x122/0x14d Feb 20 07:09:02 NAS846 kernel: ? slab_free_freelist_hook.constprop.0+0x3b/0xaf Feb 20 07:09:02 NAS846 kernel: kmem_cache_free+0x10f/0x154 Feb 20 07:09:02 NAS846 kernel: ? user_path_at_empty+0x42/0x4f Feb 20 07:09:02 NAS846 kernel: user_path_at_empty+0x42/0x4f Feb 20 07:09:02 NAS846 kernel: do_readlinkat+0x61/0x106 Feb 20 07:09:02 NAS846 kernel: __x64_sys_readlink+0x1a/0x21 Feb 20 07:09:02 NAS846 kernel: do_syscall_64+0x68/0x81 Feb 20 07:09:02 NAS846 kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce Feb 20 07:09:02 NAS846 kernel: RIP: 0033:0x14caaebcd197 Feb 20 07:09:02 NAS846 kernel: Code: 73 01 c3 48 8b 0d 81 2c 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 59 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 51 2c 0e 00 f7 d8 64 89 02 48 Feb 20 07:09:02 NAS846 kernel: RSP: 002b:00007fff16d061d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000059 Feb 20 07:09:02 NAS846 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000014caaebcd197 Feb 20 07:09:02 NAS846 kernel: RDX: 0000000000001000 RSI: 00007fff16d06250 RDI: 0000000000494830 Feb 20 07:09:02 NAS846 kernel: RBP: 00007fff16d06210 R08: 0000000000000007 R09: 00000000004bd6f0 Feb 20 07:09:02 NAS846 kernel: R10: b8f4c0f719b7152a R11: 0000000000000206 R12: 0000000000000000 Feb 20 07:09:02 NAS846 kernel: R13: 00007fff16d099d0 R14: 0000000000433dd0 R15: 000014caaed33000 Feb 20 07:09:02 NAS846 kernel: </TASK> Feb 20 07:09:02 NAS846 kernel: Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter md_mod nvidia_uvm(PO) nfsd auth_rpcgss oid_registry lockd grace sunrpc tcp_diag inet_diag nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap bridge stp llc ixgbe xfrm_algo mdio igc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp nvidia_drm(PO) 
nvidia_modeset(PO) kvm_intel i915 zfs(PO) kvm zunicode(PO) nvidia(PO) zzstd(O) iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper zlua(O) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 zavl(PO) sha1_ssse3 icp(PO) aesni_intel drm_kms_helper mei_hdcp mei_pxp crypto_simd intel_gtt cryptd zcommon(PO) znvpair(PO) rapl spl(O) intel_cstate drm wmi_bmof mpt3sas i2c_i801 nvme agpgart mei_me i2c_smbus ahci raid_class input_leds Feb 20 07:09:02 NAS846 kernel: intel_uncore syscopyarea i2c_core scsi_transport_sas mei libahci led_class joydev nvme_core vmd sysfillrect sysimgblt thermal fb_sys_fops fan video tpm_crb tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: md_mod] Feb 20 07:09:02 NAS846 kernel: CR2: 0000000000000028 Feb 20 07:09:02 NAS846 kernel: ---[ end trace 0000000000000000 ]--- Feb 20 07:09:02 NAS846 kernel: RIP: 0010:__slab_free+0x9c/0x229 Feb 20 07:09:02 NAS846 kernel: Code: 89 de 4c 8b 4c 24 58 4c 8b 44 24 10 e8 2d c8 ff ff 84 c0 0f 85 a6 00 00 00 4d 85 e4 74 0c 48 8b 34 24 4c 89 e7 e8 1c f7 65 00 <48> 8b 4b 28 4c 8b 6b 20 8b 45 28 48 8b 54 24 18 48 89 4c 24 58 41 Feb 20 07:09:02 NAS846 kernel: RSP: 0018:ffffc9005b9dbdc8 EFLAGS: 00010246 Feb 20 07:09:02 NAS846 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff888104c2e000 Feb 20 07:09:02 NAS846 kernel: RDX: ffff888104c2e000 RSI: 0000000000210d00 RDI: ffff8881001e6e00 Feb 20 07:09:02 NAS846 kernel: RBP: ffff8881001e6e00 R08: 0000000000000001 R09: ffffffff8125053c Feb 20 07:09:02 NAS846 kernel: R10: ffff888104c2e000 R11: 0000000000000fe0 R12: 0000000000000000 Feb 20 07:09:02 NAS846 kernel: R13: 0000000000441d80 R14: 00007ffcb2c7af50 R15: 0000000000000002 Feb 20 07:09:02 NAS846 kernel: FS: 000014caae93fe00(0000) GS:ffff889fff480000(0000) knlGS:0000000000000000 Feb 20 07:09:02 NAS846 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Feb 20 07:09:02 NAS846 kernel: CR2: 0000000000000028 CR3: 
00000002b12d8000 CR4: 0000000000750ee0
Feb 20 07:09:02 NAS846 kernel: PKRU: 55555554
Feb 20 07:09:02 NAS846 kernel: note: lsof[23116] exited with irqs disabled
```

     Other things I have done:
     - 80 hour memtest with all sticks, nothing
     - reseated CPU, no physical anomalies
     - physical inspection
     - formatted USB and restored from backup

     nas846-diagnostics-20240220-0709.zip
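For anyone reading traces like the one above, here is a minimal sketch (not from the original post) that decodes two fields the kernel prints: the x86 page-fault `error_code` and the `Tainted:` letters. The function names `decode_pf_error_code` and `decode_taint` are made up for illustration. Only the three low error-code bits and the three taint letters that appear in these logs are covered, so treat it as a reading aid rather than a complete decoder.

```python
# Reading aid for kernel oops headers. Names here are hypothetical helpers.

# Low bits of the x86 page-fault error code: (mask, meaning if set, meaning if clear)
PF_BITS = [
    (0x1, "protection violation", "not-present page"),
    (0x2, "write access", "read access"),
    (0x4, "user mode", "kernel mode"),
]

# Taint letters seen in the logs above (a subset of the kernel's full list)
TAINT_FLAGS = {
    "P": "proprietary module was loaded",
    "D": "kernel has died recently (an earlier OOPS or BUG)",
    "O": "out-of-tree module was loaded",
}


def decode_pf_error_code(code: int) -> list[str]:
    """Describe the three low bits of a page-fault error code."""
    return [when_set if code & mask else when_clear
            for mask, when_set, when_clear in PF_BITS]


def decode_taint(flags: str) -> list[str]:
    """Translate taint letters such as 'P D O'; unknown letters are skipped."""
    return [TAINT_FLAGS[ch] for ch in flags if ch in TAINT_FLAGS]


if __name__ == "__main__":
    print(decode_pf_error_code(0x0000))  # the lsof oops: kernel-mode read
    print(decode_taint("P D O"))
```

The `D` flag is worth noticing: it means an earlier oops already happened in the same boot, so later traces (this one is `[#5]`) may be fallout from the first one rather than independent faults.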
  5. This has been a concern for me as well. Everything had been fine for about 6 months, and then recently it got progressively worse, to the point that I have to wipe that raidz pool and start fresh or else I won't make it past mounting disks. I have new RAM from a different vendor inbound to test that (this past week I got through an 80 hour memtest on the current RAM with no issues). I re-seated/inspected the CPU, clean there as well. If the RAM swap doesn't do anything, then yeah, I'll investigate the NVMe. What make/model NVMe was giving you trouble? When you say RAID 1 NVMe cache pool, are you talking about raidz1 or RAID 1? On your current stable config, is the single NVMe one from your original pool?
  6. I have been going mad over this as well: the F6 issue that requires a CMOS reset to clear, as well as that OData Server issue. My latest issue is that the ECC RAM I have looks like it is either going bad or there's some other kind of issue, as I'm getting all kinds of segfaults and process tainted errors. Did you ever figure out the F6 issue?
  7. @barnowan I'm having very similar issues. When you said you deleted the appdata, did you completely nuke it and start from scratch?
  8. Side question, is there a 6.12 version that is considered `stable` on the level of 6.11.5?
  9. @JorgeB What is especially concerning about these call traces is that they have gotten progressively worse relative to the baseline, and that's been a trend since 6.12.6. Progressively worse as in, to the point where the system won't come up. I'm still investigating the hardware branch of the tree to isolate memory, but with bad memory I wouldn't expect to see problems get worse over time, would I?
  10. I have been battling progressively worse system issues for some time now. Originally, it looked like it was my cache pool raidz locking up, but the issues continued to get worse. I was getting errors that looked like:

      ```
      Jan 15 01:08:48 NAS846 kernel: general protection fault, maybe for address 0x80000000: 0000 [#1] PREEMPT SMP NOPTI
      Jan 15 01:08:48 NAS846 kernel: CPU: 8 PID: 7541 Comm: zfs Tainted: P O 6.1.64-Unraid #1
      Jan 15 01:08:48 NAS846 kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 2703 08/11/2023
      Jan 15 01:08:48 NAS846 kernel: RIP: 0010:migrate_disable+0x71/0x76
      Jan 15 01:08:48 NAS846 kernel: Code: 83 50 0c 00 00 66 c7 85 08 03 00 00 01 00 bf 01 00 00 00 e8 75 f9 ff ff 65 8b 05 17 68 f8 7e 85 c0 75 05 0f 1f 44 00 00 5b 5d <c3> cc cc cc cc 0f 1f 44 00 00 65 8b 05 fb 67 f8 7e ff c8 8b 17 74
      Jan 15 01:08:48 NAS846 kernel: RSP: 0018:ffffc90053f5f928 EFLAGS: 00010286
      ```

      Where `Comm: zfs tainted` is sometimes dockerd tainted, lohs tainted, etc., so there is no smoking gun pointing at any one piece of software. Troubleshooting on similar topics tends to point to memory, so when the system wasn't able to get through a `Start Array` because of errors that popped up when it got to mounting the cache pool, I went down the hardware investigation route. Memtest passed for 3 runs, so I moved on to a physical inspection of the sticks and the slots, finding nothing. Next was the elimination method: after I pulled the first stick, I was at least able to come up in Safe Mode and bring all my containers up to do some load tests. Everything seemed fine there, so I went to the next step of a normal boot for a load soak. Overnight I saw one segfault come up without a crash, and then this morning the whole server crashed, but luckily I had a syslog up.
      First segfault, which didn't cause any issues:

      ```
      Jan 16 02:11:16 NAS846 kernel: PMS LoudnessCmd[11497]: segfault at 0 ip 000014fdfdd25080 sp 000014fdfa8920c8 error 4 in libswresample.so.4[14fdfdd1d000+18000] likely on CPU 6 (core 12, socket 0)
      Jan 16 02:11:16 NAS846 kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 02 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
      Jan 16 02:11:17 NAS846 kernel: PMS LoudnessCmd[11524]: segfault at 0 ip 0000148ac081a080 sp 0000148abd1f30c8 error 4 in libswresample.so.4[148ac0812000+18000] likely on CPU 2 (core 4, socket 0)
      Jan 16 02:11:17 NAS846 kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 02 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
      Jan 16 02:11:19 NAS846 kernel: PMS LoudnessCmd[12223]: segfault at 0 ip 000015071f607fc3 sp 000015071c0d00c8 error 4 in libswresample.so.4[15071f606000+18000] likely on CPU 2 (core 4, socket 0)
      Jan 16 02:11:19 NAS846 kernel: Code: 0f 00 00 00 0f 85 73 ff ff ff 48 f7 c6 0f 00 00 00 0f 85 66 ff ff ff 48 8d 34 56 48 8d 3c 97 48 f7 da 66 0f 6f 2d 7d 64 ff ff <66> 0f 6f 04 56 66 0f 6f 4c 56 10 66 0f ef d2 66 0f ef db 66 0f 61
      ```

      Last segfault leading to the crash, along with 2 `tainted` errors (the first tainted error happens an hour earlier without issue; the second seems chained to the segfault that brings down the system):

      ```
      Jan 16 06:46:56 NAS846 kernel: traps: node[12120] trap int3 ip:1e75f12 sp:14b4ce428400 error:0 in node[400000+4d69000]
      Jan 16 07:00:01 NAS846 kernel: mdcmd (57): set md_write_method 1
      Jan 16 07:00:01 NAS846 kernel:
      Jan 16 07:00:01 NAS846 root: Log Level: 1
      Jan 16 07:00:01 NAS846 root: mover: started
      Jan 16 07:00:01 NAS846 root: mover: finished
      Jan 16 07:03:48 NAS846 ntpd[2526]: no peer for too long, server running free now
      Jan 16 07:17:12 NAS846 kernel: traps:
node[31901] trap int3 ip:1e75f12 sp:1538fa647400 error:0 in node[400000+4d69000] Jan 16 07:30:22 NAS846 kernel: mdcmd (58): set md_write_method 1 Jan 16 07:30:22 NAS846 kernel: Jan 16 08:00:01 NAS846 kernel: mdcmd (59): set md_write_method 1 Jan 16 08:00:01 NAS846 kernel: Jan 16 08:00:01 NAS846 root: Log Level: 1 Jan 16 08:00:01 NAS846 root: mover: started Jan 16 08:00:01 NAS846 root: mover: finished Jan 16 08:27:01 NAS846 kernel: mdcmd (60): set md_write_method 1 Jan 16 08:27:01 NAS846 kernel: Jan 16 08:46:06 NAS846 kernel: BUG: unable to handle page fault for address: ffff886bd16feb80 Jan 16 08:46:06 NAS846 kernel: #PF: supervisor write access in kernel mode Jan 16 08:46:06 NAS846 kernel: #PF: error_code(0x0002) - not-present page Jan 16 08:46:06 NAS846 kernel: PGD 0 P4D 0 Jan 16 08:46:06 NAS846 kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI Jan 16 08:46:06 NAS846 kernel: CPU: 8 PID: 31241 Comm: Whisparr Tainted: P O 6.1.64-Unraid #1 Jan 16 08:46:06 NAS846 kernel: Hardware name: ASUSTeK COMPUTER INC. 
System Product Name/Pro WS W680-ACE IPMI, BIOS 3101 12/08/2023 Jan 16 08:46:06 NAS846 kernel: RIP: 0010:zil_itx_assign+0x295/0x312 [zfs] Jan 16 08:46:06 NAS846 kernel: Code: 00 00 48 89 de e8 cb 6b f3 ff 48 8b 83 08 04 00 00 48 39 e8 48 0f 42 c5 48 89 83 08 04 00 00 4c 8b 64 24 08 31 c0 49 c1 e4 06 <4a> 89 84 23 80 01 00 00 4a 8d 9c 33 78 01 00 00 48 89 df e8 3f 3f Jan 16 08:46:06 NAS846 kernel: RSP: 0018:ffffc90063177b40 EFLAGS: 00010287 Jan 16 08:46:06 NAS846 kernel: RAX: 0000000000000000 RBX: ffff88821201a800 RCX: 0000000000000040 Jan 16 08:46:06 NAS846 kernel: RDX: 0000000000000001 RSI: ffff88821201abe0 RDI: 00000000ffffffff Jan 16 08:46:06 NAS846 kernel: RBP: ffff88841fb763c0 R08: 0000000000000000 R09: 00000000ffffffff Jan 16 08:46:06 NAS846 kernel: R10: 0000000000000000 R11: ffff8884955e7c4e R12: ffffffe9bf6e4200 Jan 16 08:46:06 NAS846 kernel: R13: ffff8885ee2ac1e0 R14: 0000000000000000 R15: ffff8882249c0000 Jan 16 08:46:06 NAS846 kernel: FS: 0000154fcf660b30(0000) GS:ffff88981f400000(0000) knlGS:0000000000000000 Jan 16 08:46:06 NAS846 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jan 16 08:46:06 NAS846 kernel: CR2: ffff886bd16feb80 CR3: 00000004ad894000 CR4: 0000000000750ee0 Jan 16 08:46:06 NAS846 kernel: PKRU: 55555554 Jan 16 08:46:06 NAS846 kernel: Call Trace: Jan 16 08:46:06 NAS846 kernel: <TASK> Jan 16 08:46:06 NAS846 kernel: ? __die_body+0x1a/0x5c Jan 16 08:46:06 NAS846 kernel: ? page_fault_oops+0x329/0x376 Jan 16 08:46:06 NAS846 kernel: ? fixup_exception+0x22/0x24b Jan 16 08:46:06 NAS846 kernel: ? exc_page_fault+0xf4/0x11d Jan 16 08:46:06 NAS846 kernel: ? asm_exc_page_fault+0x22/0x30 Jan 16 08:46:06 NAS846 kernel: ? zil_itx_assign+0x295/0x312 [zfs] Jan 16 08:46:06 NAS846 kernel: ? zil_itx_assign+0x9c/0x312 [zfs] Jan 16 08:46:06 NAS846 kernel: ? zfs_log_write+0x352/0x3ab [zfs] Jan 16 08:46:06 NAS846 kernel: ? zfs_write+0x8d0/0xa29 [zfs] Jan 16 08:46:06 NAS846 kernel: ? zpl_iter_write+0xcf/0x122 [zfs] Jan 16 08:46:06 NAS846 kernel: ? 
vfs_write+0x10c/0x1b9 Jan 16 08:46:06 NAS846 kernel: ? ksys_pwrite64+0x64/0x84 Jan 16 08:46:06 NAS846 kernel: ? do_syscall_64+0x68/0x81 Jan 16 08:46:06 NAS846 kernel: ? entry_SYSCALL_64_after_hwframe+0x64/0xce Jan 16 08:46:06 NAS846 kernel: </TASK> Jan 16 08:46:06 NAS846 kernel: Modules linked in: nvidia_uvm(PO) xt_connmark xt_mark iptable_mangle xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter md_mod nfsd auth_rpcgss oid_registry lockd grace sunrpc tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap bridge stp llc ixgbe xfrm_algo mdio igc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp nvidia_drm(PO) kvm_intel nvidia_modeset(PO) zfs(PO) i915 kvm zunicode(PO) zzstd(O) nvidia(PO) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 zlua(O) sha256_ssse3 ast sha1_ssse3 drm_vram_helper iosf_mbi aesni_intel drm_buddy zavl(PO) i2c_algo_bit drm_ttm_helper icp(PO) Jan 16 08:46:06 NAS846 kernel: drm_display_helper mei_hdcp mei_pxp crypto_simd ttm i2c_i801 intel_gtt cryptd zcommon(PO) znvpair(PO) rapl spl(O) drm_kms_helper intel_cstate wmi_bmof drm mpt3sas mei_me agpgart i2c_smbus input_leds raid_class nvme ahci intel_uncore i2c_core joydev led_class scsi_transport_sas mei syscopyarea nvme_core libahci sysfillrect vmd sysimgblt fb_sys_fops thermal fan video tpm_crb tpm_tis tpm_tis_core tpm wmi backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: md_mod] Jan 16 08:46:06 NAS846 kernel: CR2: ffff886bd16feb80 Jan 16 08:46:06 NAS846 kernel: ---[ end trace 0000000000000000 ]--- Jan 16 08:46:06 NAS846 kernel: RIP: 0010:zil_itx_assign+0x295/0x312 [zfs] Jan 16 08:46:06 
NAS846 kernel: Code: 00 00 48 89 de e8 cb 6b f3 ff 48 8b 83 08 04 00 00 48 39 e8 48 0f 42 c5 48 89 83 08 04 00 00 4c 8b 64 24 08 31 c0 49 c1 e4 06 <4a> 89 84 23 80 01 00 00 4a 8d 9c 33 78 01 00 00 48 89 df e8 3f 3f Jan 16 08:46:06 NAS846 kernel: RSP: 0018:ffffc90063177b40 EFLAGS: 00010287 Jan 16 08:46:06 NAS846 kernel: RAX: 0000000000000000 RBX: ffff88821201a800 RCX: 0000000000000040 Jan 16 08:46:06 NAS846 kernel: RDX: 0000000000000001 RSI: ffff88821201abe0 RDI: 00000000ffffffff Jan 16 08:46:06 NAS846 kernel: RBP: ffff88841fb763c0 R08: 0000000000000000 R09: 00000000ffffffff Jan 16 08:46:06 NAS846 kernel: R10: 0000000000000000 R11: ffff8884955e7c4e R12: ffffffe9bf6e4200 Jan 16 08:46:06 NAS846 kernel: R13: ffff8885ee2ac1e0 R14: 0000000000000000 R15: ffff8882249c0000 Jan 16 08:46:06 NAS846 kernel: FS: 0000154fcf660b30(0000) GS:ffff88981f400000(0000) knlGS:0000000000000000 Jan 16 08:46:06 NAS846 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jan 16 08:46:06 NAS846 kernel: CR2: ffff886bd16feb80 CR3: 00000004ad894000 CR4: 0000000000750ee0 Jan 16 08:46:06 NAS846 kernel: PKRU: 55555554 Jan 16 08:46:06 NAS846 kernel: note: Whisparr[31241] exited with irqs disabled Jan 16 09:00:01 NAS846 kernel: mdcmd (61): set md_write_method 1 Jan 16 09:00:01 NAS846 kernel: Jan 16 09:00:01 NAS846 root: Log Level: 1 Jan 16 09:00:01 NAS846 root: mover: started Jan 16 09:00:01 NAS846 root: mover: finished Jan 16 09:27:56 NAS846 kernel: mdcmd (62): set md_write_method 1 Jan 16 09:27:56 NAS846 kernel: Jan 16 09:28:24 NAS846 kernel: traps: Tdarr_Server[3949] trap invalid opcode ip:1e75f12 sp:7fff6e7bdc90 error:0 in node[400000+4d69000] Jan 16 09:43:40 NAS846 webGUI: Successful login user root from 192.168.1.56 Jan 16 09:45:55 NAS846 kernel: device_list[10228]: segfault at 0 ip 0000000000935623 sp 00007ffce8413600 error 6 in php[600000+3b3000] likely on CPU 10 (core 20, socket 0) Jan 16 09:45:55 NAS846 kernel: Code: 13 6f fe ff 41 ff 27 49 63 47 0c 49 01 c7 48 c7 c0 80 2a 62 01 0f b6 80 
22 02 00 00 84 c0 0f 85 dc 05 00 00 41 ff 27 4c 89 f8 <83> 01 01 4d 8d 7f 20 ff 60 20 83 f8 05 0f 85 5f 04 00 00 8b 46 08 Jan 16 09:45:56 NAS846 monitor: Stop running nchan processes Jan 16 09:46:24 NAS846 kernel: BUG: unable to handle page fault for address: 000000000cfc67c0 Jan 16 09:46:24 NAS846 kernel: #PF: supervisor read access in kernel mode Jan 16 09:46:24 NAS846 kernel: #PF: error_code(0x0000) - not-present page Jan 16 09:46:24 NAS846 kernel: PGD 0 P4D 0 Jan 16 09:46:24 NAS846 kernel: Oops: 0000 [#2] PREEMPT SMP NOPTI Jan 16 09:46:24 NAS846 kernel: CPU: 8 PID: 215 Comm: kcompactd0 Tainted: P D O 6.1.64-Unraid #1 Jan 16 09:46:24 NAS846 kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 3101 12/08/2023 Jan 16 09:46:24 NAS846 kernel: RIP: 0010:PageHuge+0x5/0x31 Jan 16 09:46:24 NAS846 kernel: Code: cc cc f7 c7 ff 0f 00 00 75 16 48 8b 17 0f ba e2 10 73 0d 48 8b 57 48 f6 c2 01 74 04 48 8d 42 ff c3 cc cc cc cc 0f 1f 44 00 00 <48> 8b 07 0f ba e0 10 73 14 e8 b7 ff ff ff 80 78 50 02 0f 94 c0 0f Jan 16 09:46:24 NAS846 kernel: RSP: 0018:ffffc90000923cb0 EFLAGS: 00010206 Jan 16 09:46:24 NAS846 kernel: RAX: 0000000000000000 RBX: ffffc90000923e10 RCX: 000000000cfc67c0 Jan 16 09:46:24 NAS846 kernel: RDX: 0000000080000000 RSI: ffffea000cfc0000 RDI: 000000000cfc67c0 Jan 16 09:46:24 NAS846 kernel: RBP: 000000000033f1a0 R08: 0000000000000000 R09: 0000000000000000 Jan 16 09:46:24 NAS846 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000008458ce Jan 16 09:46:24 NAS846 kernel: R13: 0000000000000000 R14: 0000000000028f44 R15: 000000000cfc67c0 Jan 16 09:46:24 NAS846 kernel: FS: 0000000000000000(0000) GS:ffff88981f400000(0000) knlGS:0000000000000000 Jan 16 09:46:24 NAS846 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jan 16 09:46:24 NAS846 kernel: CR2: 000000000cfc67c0 CR3: 000000000420a000 CR4: 0000000000750ee0 Jan 16 09:46:24 NAS846 kernel: PKRU: 55555554 Jan 16 09:46:24 NAS846 kernel: Call Trace: Jan 16 09:46:24 
NAS846 kernel: <TASK> Jan 16 09:46:24 NAS846 kernel: ? __die_body+0x1a/0x5c Jan 16 09:46:24 NAS846 kernel: ? page_fault_oops+0x329/0x376 Jan 16 09:46:24 NAS846 kernel: ? do_user_addr_fault+0x12e/0x48d Jan 16 09:46:24 NAS846 kernel: ? exc_page_fault+0xfb/0x11d Jan 16 09:46:24 NAS846 kernel: ? asm_exc_page_fault+0x22/0x30 Jan 16 09:46:24 NAS846 kernel: ? PageHuge+0x5/0x31 Jan 16 09:46:24 NAS846 kernel: isolate_migratepages_block+0x276/0xbb9 Jan 16 09:46:24 NAS846 kernel: ? folio_add_lru+0x86/0x9d Jan 16 09:46:24 NAS846 kernel: compact_zone+0x7c9/0xa28 Jan 16 09:46:24 NAS846 kernel: ? finish_task_switch.isra.0+0x140/0x218 Jan 16 09:46:24 NAS846 kernel: proactive_compact_node+0x7c/0xad Jan 16 09:46:24 NAS846 kernel: ? fragmentation_score_node+0x32/0x62 Jan 16 09:46:24 NAS846 kernel: kcompactd+0x1f7/0x249 Jan 16 09:46:24 NAS846 kernel: ? _raw_spin_rq_lock_irqsave+0x20/0x20 Jan 16 09:46:24 NAS846 kernel: ? kcompactd_do_work+0x1d4/0x1d4 Jan 16 09:46:24 NAS846 kernel: kthread+0xe4/0xef Jan 16 09:46:24 NAS846 kernel: ? 
kthread_complete_and_exit+0x1b/0x1b Jan 16 09:46:24 NAS846 kernel: ret_from_fork+0x1f/0x30 Jan 16 09:46:24 NAS846 kernel: </TASK> Jan 16 09:46:24 NAS846 kernel: Modules linked in: nvidia_uvm(PO) xt_connmark xt_mark iptable_mangle xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter md_mod nfsd auth_rpcgss oid_registry lockd grace sunrpc tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap bridge stp llc ixgbe xfrm_algo mdio igc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp nvidia_drm(PO) kvm_intel nvidia_modeset(PO) zfs(PO) i915 kvm zunicode(PO) zzstd(O) nvidia(PO) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 zlua(O) sha256_ssse3 ast sha1_ssse3 drm_vram_helper iosf_mbi aesni_intel drm_buddy zavl(PO) i2c_algo_bit drm_ttm_helper icp(PO) Jan 16 09:46:24 NAS846 kernel: drm_display_helper mei_hdcp mei_pxp crypto_simd ttm i2c_i801 intel_gtt cryptd zcommon(PO) znvpair(PO) rapl spl(O) drm_kms_helper intel_cstate wmi_bmof drm mpt3sas mei_me agpgart i2c_smbus input_leds raid_class nvme ahci intel_uncore i2c_core joydev led_class scsi_transport_sas mei syscopyarea nvme_core libahci sysfillrect vmd sysimgblt fb_sys_fops thermal fan video tpm_crb tpm_tis tpm_tis_core tpm wmi backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: md_mod] Jan 16 09:46:24 NAS846 kernel: CR2: 000000000cfc67c0 Jan 16 09:46:24 NAS846 kernel: ---[ end trace 0000000000000000 ]--- Jan 16 09:46:24 NAS846 kernel: RIP: 0010:zil_itx_assign+0x295/0x312 [zfs] Jan 16 09:46:24 NAS846 kernel: Code: 00 00 48 89 de e8 cb 6b f3 ff 48 8b 83 08 04 00 00 48 39 e8 48 0f 42 c5 48 89 83 08 04 00 00 4c 8b 64 
24 08 31 c0 49 c1 e4 06 <4a> 89 84 23 80 01 00 00 4a 8d 9c 33 78 01 00 00 48 89 df e8 3f 3f Jan 16 09:46:24 NAS846 kernel: RSP: 0018:ffffc90063177b40 EFLAGS: 00010287 Jan 16 09:46:24 NAS846 kernel: RAX: 0000000000000000 RBX: ffff88821201a800 RCX: 0000000000000040 Jan 16 09:46:24 NAS846 kernel: RDX: 0000000000000001 RSI: ffff88821201abe0 RDI: 00000000ffffffff Jan 16 09:46:24 NAS846 kernel: RBP: ffff88841fb763c0 R08: 0000000000000000 R09: 00000000ffffffff Jan 16 09:46:24 NAS846 kernel: R10: 0000000000000000 R11: ffff8884955e7c4e R12: ffffffe9bf6e4200 Jan 16 09:46:24 NAS846 kernel: R13: ffff8885ee2ac1e0 R14: 0000000000000000 R15: ffff8882249c0000 Jan 16 09:46:24 NAS846 kernel: FS: 0000000000000000(0000) GS:ffff88981f400000(0000) knlGS:0000000000000000 Jan 16 09:46:24 NAS846 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jan 16 09:46:24 NAS846 kernel: CR2: 000000000cfc67c0 CR3: 00000007836e6000 CR4: 0000000000750ee0 Jan 16 09:46:24 NAS846 kernel: PKRU: 55555554 Jan 16 09:46:24 NAS846 kernel: note: kcompactd0[215] exited with irqs disabled Jan 16 10:00:01 NAS846 kernel: mdcmd (63): set md_write_method 1 Jan 16 10:00:01 NAS846 kernel: Jan 16 10:00:01 NAS846 root: Log Level: 1 Jan 16 10:00:02 NAS846 root: mover: started Jan 16 10:00:02 NAS846 root: mover: finished ``` nas846-diagnostics-20240116-1051.zip
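Since the question in this thread is whether the faults cluster on one process, one CPU, or one module, a rough way to tally that from a saved syslog is sketched below. This is not part of the original post. The `summarize_oopses` name is invented for illustration, and the regex assumes the header layout seen in the excerpts above (`CPU: N PID: N Comm: name` followed later by `RIP: 0010:function+0x...`). It scans the whole text rather than line by line, so it also works on logs that were pasted as one run.

```python
import re
from collections import Counter

# One match per oops header: the "CPU/PID/Comm" line followed by the first
# kernel-mode RIP after it. The RIP repeated after "end trace" is not preceded
# by a new CPU line, so it is not counted a second time.
OOPS_RE = re.compile(
    r"CPU: (\d+) PID: \d+ Comm: (\S+) .*?"
    r"RIP: 0010:([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?",
    re.DOTALL,
)


def summarize_oopses(log_text: str) -> dict[str, Counter]:
    """Count kernel oopses by CPU, process name, faulting function and module."""
    by_cpu, by_comm, by_func, by_module = Counter(), Counter(), Counter(), Counter()
    for cpu, comm, func, module in OOPS_RE.findall(log_text):
        by_cpu[int(cpu)] += 1
        by_comm[comm] += 1
        by_func[func] += 1
        # No "[module]" suffix means the function lives in the core kernel image
        by_module[module or "core kernel"] += 1
    return {"cpu": by_cpu, "comm": by_comm, "func": by_func, "module": by_module}


if __name__ == "__main__":
    sample = (
        "kernel: CPU: 8 PID: 31241 Comm: Whisparr Tainted: P O 6.1.64-Unraid #1 "
        "kernel: RIP: 0010:zil_itx_assign+0x295/0x312 [zfs] "
        "kernel: CPU: 8 PID: 215 Comm: kcompactd0 Tainted: P D O 6.1.64-Unraid #1 "
        "kernel: RIP: 0010:PageHuge+0x5/0x31 "
    )
    for key, counts in summarize_oopses(sample).items():
        print(key, dict(counts))
```

A tally spread across unrelated functions and processes fits the "no smoking gun in software" observation above, while a tally concentrated on one CPU number would be a different kind of hint. Either way it is only a summary of what the log says, not a diagnosis.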
  11. This is something I liked from QNAP with Qtier. In Unraid it would be fundamentally different, but with the plethora of storage options out there, it would be nice to let users both use more than one cache pool within a share and specify tiering options. For instance, I would like the ability to define a tier1 pool of high-speed write NVMe, a tier2 pool of high-capacity NVMe, and the array as tier3. My use case would be that all incoming data hits tier1, rolls off slowly to tier2 after 24 hours, and then, depending on how much it is accessed, either stays there until it is barely touched or is moved down to the array.
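The placement rule described in this request can be sketched in a few lines. This is only an illustration of the idea, not an existing Unraid or QNAP feature or API. `FileStats`, `choose_tier`, the 30-day access window, and the `min_accesses` threshold are all assumptions made up here; the request itself only fixes the 24-hour roll-off from tier1.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file bookkeeping a tiering mover would need."""
    age_hours: float        # time since the file landed on the share
    accesses_last_30d: int  # how often it was read recently (assumed window)


def choose_tier(stats: FileStats, hot_hours: float = 24.0, min_accesses: int = 2) -> str:
    """Pick a target tier: new data on tier1, still-used data on tier2, the rest on the array.

    hot_hours comes from the request (24 hours); min_accesses is an assumed
    threshold for "barely touched".
    """
    if stats.age_hours < hot_hours:
        return "tier1"   # high-speed write NVMe pool
    if stats.accesses_last_30d >= min_accesses:
        return "tier2"   # high-capacity NVMe pool
    return "array"       # tier3


if __name__ == "__main__":
    print(choose_tier(FileStats(age_hours=3, accesses_last_30d=0)))
    print(choose_tier(FileStats(age_hours=72, accesses_last_30d=10)))
    print(choose_tier(FileStats(age_hours=72, accesses_last_30d=0)))
```

A mover built on a rule like this would need per-file access counts, which is the main thing the current cache-then-array mover does not track.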
  12. I think the spirit of this FR gets at something that is missing natively in Unraid. It would be nice to have a wizard that guides you through what you are trying to do. In this case, it would have been nice for OP to have selected the drive > "what do you want to do?" > "replace it", with the wizard then walking through the steps and what to expect, even stopping and starting the array. The current process is "just know it" or google it, but in the spirit of Unraid, the OS itself should be capable of guiding you through an expected part of owning an Unraid system: changing out disks, whether for maintenance or upgrading.