
user2579

Members · 19 posts

Posts posted by user2579

  1. 9 hours ago, barnowan said:

    @user2579 Not for all apps, just the troublesome ones.

     

    The issues I was experiencing ended up coming back though. Something with the RAID 1 NVMe cache pool was causing issues even though SMART status on both drives was fine, no errors were seen in the details, etc. Happened on 3 different motherboards, so I don't think it was a board issue either. I went to a single NVMe drive setup using a different drive for my appdata, and it's been rock solid ever since. I still don't know the true root cause though.

    This has been concerning for me as well. Everything had been fine for about 6 months, and then recently it has gotten progressively worse, to the point where I have to wipe that raidz pool and start fresh or else I won't make it past mounting disks. I have new RAM from a different vendor inbound to test that (this past week I got through an 80-hour memtest on the current RAM with no issues). I reseated and inspected the CPU, and it's clean as well. If the RAM swap doesn't help, then yeah, I'm investigating the NVMe drives next.

     

    What make/model of NVMe drive was giving you trouble? When you say RAID 1 NVMe cache pool, do you mean ZFS raidz1 or a RAID 1 mirror? And in your current stable config, is the single NVMe drive one from your original pool?

  2. On 9/18/2023 at 8:39 AM, frodr said:

    I have this board, and I guess I'm the unlucky duck here. I had all kinds of trouble:

    • Startup sequence randomly gets stuck on F6, especially after adding/removing PCIe cards.
    • Startup sequence randomly stays on the OData Server info screen forever, or gets fully stuck.
    • IPMI card loses contact with the mainboard; sometimes I have to physically remove and reinstall it to get it up and running.
    • IPMI remote connection drops "all the time".
    • IPMI remote control does not follow the startup procedure to the end.
    • BIOS resets itself to defaults, mostly after turning main power off.

    I'm in contact with Asus Support, which is sometimes like talking to someone with dementia who asks the same question over and over again.

     

    A question about this board: are the two PCIe 5.0 slots "connected"? My understanding is that if I run an x16 card in slot 1, there are only 4 lanes to the CPU in PCIe slot 2. Does this mean an x8 card in PCIe slot 2 will not work? Or will it work with x4 bandwidth?

     

    // 

    I have been going mad over this as well: the F6 issue, which requires a CMOS reset to clear, as well as that OData Server issue.

     

    My latest issue is that my ECC RAM looks like it's either going bad or affected by some other problem, as I'm getting all kinds of segfaults and tainted-process errors.

     

    Did you ever figure out the F6 issue?
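On the lane question quoted above: PCIe links negotiate their width during link training, so an x8 card in a slot that is only wired for x4 generally comes up at x4 rather than not working at all. How this board actually splits lanes between its slots is a question for the ASUS manual. The sketch below only estimates what x4 vs. x8 costs, assuming the card itself runs at PCIe 5.0 and ignoring protocol overhead; `link_gbytes_per_s` and the table names are illustrative.

```python
# Rough one-direction PCIe bandwidth estimate for a given generation and link width.
# Per-lane signaling rates and 128b/130b encoding are from the PCIe 3.0-5.0 specs;
# packet/protocol overhead is ignored, so real transfers will be somewhat lower.
GT_PER_LANE = {3: 8.0, 4: 16.0, 5: 32.0}  # gigatransfers per second per lane
ENCODING_EFFICIENCY = 128 / 130           # 128b/130b line encoding


def link_gbytes_per_s(gen: int, lanes: int) -> float:
    """Approximate GB/s in one direction for a PCIe link of `lanes` width."""
    gbits = GT_PER_LANE[gen] * ENCODING_EFFICIENCY * lanes
    return gbits / 8  # 8 bits per byte


if __name__ == "__main__":
    for lanes in (4, 8, 16):
        print(f"PCIe 5.0 x{lanes}: ~{link_gbytes_per_s(5, lanes):.1f} GB/s")
```

Under these assumptions, an x8 card trained at x4 keeps roughly half its link bandwidth (about 15.8 GB/s instead of about 31.5 GB/s per direction). A PCIe 4.0 card would see half of each figure, since the link runs at the slower generation.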

  3. @JorgeB What is especially concerning about these call traces is that they have gotten progressively worse from the baseline, a trend that has continued since 6.12.6. Progressively worse as in, to the point where the system won't come up.

     

    I'm still investigating the hardware branch of the tree to isolate memory, but would bad memory really make problems get worse over time? I wouldn't expect that.

  4. I have been battling progressively worse system issues for some time now. Originally, it looked like my raidz cache pool was locking up, but the issues continued to get worse. I was getting errors like:
    ```

    Jan 15 01:08:48 NAS846 kernel: general protection fault, maybe for address 0x80000000: 0000 [#1] PREEMPT SMP NOPTI
    Jan 15 01:08:48 NAS846 kernel: CPU: 8 PID: 7541 Comm: zfs Tainted: P           O       6.1.64-Unraid #1
    Jan 15 01:08:48 NAS846 kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 2703 08/11/2023
    Jan 15 01:08:48 NAS846 kernel: RIP: 0010:migrate_disable+0x71/0x76
    Jan 15 01:08:48 NAS846 kernel: Code: 83 50 0c 00 00 66 c7 85 08 03 00 00 01 00 bf 01 00 00 00 e8 75 f9 ff ff 65 8b 05 17 68 f8 7e 85 c0 75 05 0f 1f 44 00 00 5b 5d <c3> cc cc cc cc 0f 1f 44 00 00 65 8b 05 fb 67 f8 7e ff c8 8b 17 74
    Jan 15 01:08:48 NAS846 kernel: RSP: 0018:ffffc90053f5f928 EFLAGS: 00010286


    ```
    Here the `Comm: zfs` before `Tainted` is sometimes `dockerd`, `lohs`, etc., so it's not a smoking gun for any particular piece of software.
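The letters after `Tainted:` are kernel taint flags, which describe the state of the kernel rather than pointing at a culprit. A minimal sketch that decodes them from an oops header line, using a subset of the flags documented in the kernel's tainted-kernels guide (`decode_taint` and `TAINT_FLAGS` are illustrative names). In these logs, `P` and `O` would be expected whenever the proprietary, out-of-tree nvidia and zfs modules are loaded (both appear as `(PO)` in the module list), and an extra `D` on a later oops only records that an earlier oops already happened.

```python
import re

# A subset of the taint letters described in the Linux kernel's
# Documentation/admin-guide/tainted-kernels.rst; unlisted letters map to "unknown".
TAINT_FLAGS = {
    "G": "only GPL-compatible modules loaded",
    "P": "proprietary module loaded",
    "F": "module was force-loaded",
    "M": "processor reported a Machine Check Exception",
    "B": "bad page referenced or unexpected page flags",
    "D": "kernel died recently (an earlier OOPS or BUG)",
    "W": "kernel issued a warning",
    "O": "externally-built (out-of-tree) module loaded",
    "E": "unsigned module loaded",
    "L": "soft lockup occurred",
}

# Matches e.g. "Comm: zfs Tainted: P           O       6.1.64-Unraid #1"
OOPS_HEADER = re.compile(r"Comm: (.+?) Tainted: ([A-Z ]*?)\s*(\d\S*) #\d+")


def decode_taint(line: str):
    """Return (command, taint letters, descriptions) from an oops header, or None."""
    m = OOPS_HEADER.search(line)
    if m is None:
        return None
    comm, field, _kernel_version = m.groups()
    letters = sorted(set(field) - {" "})  # blank positions mean "flag not set"
    return comm, letters, [TAINT_FLAGS.get(c, "unknown") for c in letters]


if __name__ == "__main__":
    print(decode_taint(
        "kernel: CPU: 8 PID: 215 Comm: kcompactd0 Tainted: P      D    O       6.1.64-Unraid #1"
    ))
```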

     

    Troubleshooting threads on similar topics tend to point to memory, so when the system couldn't get through a `Start Array` because of errors that popped up when it reached mounting the cache pool, I went down the hardware investigation route.

     

    Memtest passed 3 runs, so I moved on to physically inspecting the sticks and the slots, which turned up nothing. Next came the elimination method: after I pulled the first stick, I was at least able to boot into Safe Mode and bring all my containers up to do some load tests. Everything seemed fine there, so I moved to the next step, a normal boot for a load soak. Overnight I saw one segfault that didn't crash anything, and then this morning the whole server crashed, but luckily I had syslog capturing it.

    The first segfault, which didn't cause any issues:

    ```

    Jan 16 02:11:16 NAS846 kernel: PMS LoudnessCmd[11497]: segfault at 0 ip 000014fdfdd25080 sp 000014fdfa8920c8 error 4 in libswresample.so.4[14fdfdd1d000+18000] likely on CPU 6 (core 12, socket 0)
    Jan 16 02:11:16 NAS846 kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 02 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
    Jan 16 02:11:17 NAS846 kernel: PMS LoudnessCmd[11524]: segfault at 0 ip 0000148ac081a080 sp 0000148abd1f30c8 error 4 in libswresample.so.4[148ac0812000+18000] likely on CPU 2 (core 4, socket 0)
    Jan 16 02:11:17 NAS846 kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 02 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
    Jan 16 02:11:19 NAS846 kernel: PMS LoudnessCmd[12223]: segfault at 0 ip 000015071f607fc3 sp 000015071c0d00c8 error 4 in libswresample.so.4[15071f606000+18000] likely on CPU 2 (core 4, socket 0)
    Jan 16 02:11:19 NAS846 kernel: Code: 0f 00 00 00 0f 85 73 ff ff ff 48 f7 c6 0f 00 00 00 0f 85 66 ff ff ff 48 8d 34 56 48 8d 3c 97 48 f7 da 66 0f 6f 2d 7d 64 ff ff <66> 0f 6f 04 56 66 0f 6f 4c 56 10 66 0f ef d2 66 0f ef db 66 0f 61


    ```
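Segfault lines like these can be compared across processes by reducing each one to the faulting instruction's offset within its mapping (`ip` minus the mapping start printed in brackets) and decoding the x86 page-fault error code (bit 0: protection violation vs. not-present page, bit 1: write, bit 2: user mode, bit 4: instruction fetch). A minimal sketch; the regex assumes exactly the line format shown above, and `parse_segfault`/`describe_error` are illustrative names.

```python
import re

# Matches kernel lines such as:
# "kernel: PMS LoudnessCmd[11497]: segfault at 0 ip 000014fdfdd25080 sp ... error 4 in libswresample.so.4[14fdfdd1d000+18000]"
SEGFAULT = re.compile(
    r"kernel: (?P<comm>.+?)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<start>[0-9a-f]+)\+[0-9a-f]+\]"
)


def describe_error(code: int) -> str:
    """Decode the low bits of an x86 page-fault error code."""
    cause = "protection violation" if code & 0x1 else "not-present page"
    access = "write" if code & 0x2 else "read"
    if code & 0x10:
        access = "instruction fetch"
    mode = "user" if code & 0x4 else "kernel"
    return f"{mode}-mode {access}, {cause}"


def parse_segfault(line: str):
    """Return (comm, object, ip offset within the mapping, decoded error) or None."""
    m = SEGFAULT.search(line)
    if m is None:
        return None
    offset = int(m["ip"], 16) - int(m["start"], 16)
    # The kernel prints the error code in hex.
    return m["comm"], m["obj"], hex(offset), describe_error(int(m["err"], 16))
```

On the first two `PMS LoudnessCmd` lines above, both faults work out to offset `0x8080` in `libswresample.so.4` despite different load addresses, with error 4 decoding to a user-mode read of a not-present page (consistent with the `segfault at 0` null address). Faults that repeat at the same instruction are often taken as a hint of a code-path problem rather than random memory corruption, but that is a heuristic, not proof.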

    The last segfault, which led to the crash. I'm also seeing 2 `Tainted` errors: the first happens an hour before without issue, and the second seems chained to the segfault that brings down the system:
    ```

    Jan 16 06:46:56 NAS846 kernel: traps: node[12120] trap int3 ip:1e75f12 sp:14b4ce428400 error:0 in node[400000+4d69000]
    Jan 16 07:00:01 NAS846 kernel: mdcmd (57): set md_write_method 1
    Jan 16 07:00:01 NAS846 kernel:
    Jan 16 07:00:01 NAS846 root: Log Level: 1
    Jan 16 07:00:01 NAS846 root: mover: started
    Jan 16 07:00:01 NAS846 root: mover: finished
    Jan 16 07:03:48 NAS846 ntpd[2526]: no peer for too long, server running free now
    Jan 16 07:17:12 NAS846 kernel: traps: node[31901] trap int3 ip:1e75f12 sp:1538fa647400 error:0 in node[400000+4d69000]
    Jan 16 07:30:22 NAS846 kernel: mdcmd (58): set md_write_method 1
    Jan 16 07:30:22 NAS846 kernel:
    Jan 16 08:00:01 NAS846 kernel: mdcmd (59): set md_write_method 1
    Jan 16 08:00:01 NAS846 kernel:
    Jan 16 08:00:01 NAS846 root: Log Level: 1
    Jan 16 08:00:01 NAS846 root: mover: started
    Jan 16 08:00:01 NAS846 root: mover: finished
    Jan 16 08:27:01 NAS846 kernel: mdcmd (60): set md_write_method 1
    Jan 16 08:27:01 NAS846 kernel:
    Jan 16 08:46:06 NAS846 kernel: BUG: unable to handle page fault for address: ffff886bd16feb80
    Jan 16 08:46:06 NAS846 kernel: #PF: supervisor write access in kernel mode
    Jan 16 08:46:06 NAS846 kernel: #PF: error_code(0x0002) - not-present page
    Jan 16 08:46:06 NAS846 kernel: PGD 0 P4D 0
    Jan 16 08:46:06 NAS846 kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
    Jan 16 08:46:06 NAS846 kernel: CPU: 8 PID: 31241 Comm: Whisparr Tainted: P           O       6.1.64-Unraid #1
    Jan 16 08:46:06 NAS846 kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 3101 12/08/2023
    Jan 16 08:46:06 NAS846 kernel: RIP: 0010:zil_itx_assign+0x295/0x312 [zfs]
    Jan 16 08:46:06 NAS846 kernel: Code: 00 00 48 89 de e8 cb 6b f3 ff 48 8b 83 08 04 00 00 48 39 e8 48 0f 42 c5 48 89 83 08 04 00 00 4c 8b 64 24 08 31 c0 49 c1 e4 06 <4a> 89 84 23 80 01 00 00 4a 8d 9c 33 78 01 00 00 48 89 df e8 3f 3f
    Jan 16 08:46:06 NAS846 kernel: RSP: 0018:ffffc90063177b40 EFLAGS: 00010287
    Jan 16 08:46:06 NAS846 kernel: RAX: 0000000000000000 RBX: ffff88821201a800 RCX: 0000000000000040
    Jan 16 08:46:06 NAS846 kernel: RDX: 0000000000000001 RSI: ffff88821201abe0 RDI: 00000000ffffffff
    Jan 16 08:46:06 NAS846 kernel: RBP: ffff88841fb763c0 R08: 0000000000000000 R09: 00000000ffffffff
    Jan 16 08:46:06 NAS846 kernel: R10: 0000000000000000 R11: ffff8884955e7c4e R12: ffffffe9bf6e4200
    Jan 16 08:46:06 NAS846 kernel: R13: ffff8885ee2ac1e0 R14: 0000000000000000 R15: ffff8882249c0000
    Jan 16 08:46:06 NAS846 kernel: FS:  0000154fcf660b30(0000) GS:ffff88981f400000(0000) knlGS:0000000000000000
    Jan 16 08:46:06 NAS846 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jan 16 08:46:06 NAS846 kernel: CR2: ffff886bd16feb80 CR3: 00000004ad894000 CR4: 0000000000750ee0
    Jan 16 08:46:06 NAS846 kernel: PKRU: 55555554
    Jan 16 08:46:06 NAS846 kernel: Call Trace:
    Jan 16 08:46:06 NAS846 kernel: <TASK>
    Jan 16 08:46:06 NAS846 kernel: ? __die_body+0x1a/0x5c
    Jan 16 08:46:06 NAS846 kernel: ? page_fault_oops+0x329/0x376
    Jan 16 08:46:06 NAS846 kernel: ? fixup_exception+0x22/0x24b
    Jan 16 08:46:06 NAS846 kernel: ? exc_page_fault+0xf4/0x11d
    Jan 16 08:46:06 NAS846 kernel: ? asm_exc_page_fault+0x22/0x30
    Jan 16 08:46:06 NAS846 kernel: ? zil_itx_assign+0x295/0x312 [zfs]
    Jan 16 08:46:06 NAS846 kernel: ? zil_itx_assign+0x9c/0x312 [zfs]
    Jan 16 08:46:06 NAS846 kernel: ? zfs_log_write+0x352/0x3ab [zfs]
    Jan 16 08:46:06 NAS846 kernel: ? zfs_write+0x8d0/0xa29 [zfs]
    Jan 16 08:46:06 NAS846 kernel: ? zpl_iter_write+0xcf/0x122 [zfs]
    Jan 16 08:46:06 NAS846 kernel: ? vfs_write+0x10c/0x1b9
    Jan 16 08:46:06 NAS846 kernel: ? ksys_pwrite64+0x64/0x84
    Jan 16 08:46:06 NAS846 kernel: ? do_syscall_64+0x68/0x81
    Jan 16 08:46:06 NAS846 kernel: ? entry_SYSCALL_64_after_hwframe+0x64/0xce
    Jan 16 08:46:06 NAS846 kernel: </TASK>
    Jan 16 08:46:06 NAS846 kernel: Modules linked in: nvidia_uvm(PO) xt_connmark xt_mark iptable_mangle xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter md_mod nfsd auth_rpcgss oid_registry lockd grace sunrpc tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap bridge stp llc ixgbe xfrm_algo mdio igc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp nvidia_drm(PO) kvm_intel nvidia_modeset(PO) zfs(PO) i915 kvm zunicode(PO) zzstd(O) nvidia(PO) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 zlua(O) sha256_ssse3 ast sha1_ssse3 drm_vram_helper iosf_mbi aesni_intel drm_buddy zavl(PO) i2c_algo_bit drm_ttm_helper icp(PO)
    Jan 16 08:46:06 NAS846 kernel: drm_display_helper mei_hdcp mei_pxp crypto_simd ttm i2c_i801 intel_gtt cryptd zcommon(PO) znvpair(PO) rapl spl(O) drm_kms_helper intel_cstate wmi_bmof drm mpt3sas mei_me agpgart i2c_smbus input_leds raid_class nvme ahci intel_uncore i2c_core joydev led_class scsi_transport_sas mei syscopyarea nvme_core libahci sysfillrect vmd sysimgblt fb_sys_fops thermal fan video tpm_crb tpm_tis tpm_tis_core tpm wmi backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: md_mod]
    Jan 16 08:46:06 NAS846 kernel: CR2: ffff886bd16feb80
    Jan 16 08:46:06 NAS846 kernel: ---[ end trace 0000000000000000 ]---
    Jan 16 08:46:06 NAS846 kernel: RIP: 0010:zil_itx_assign+0x295/0x312 [zfs]
    Jan 16 08:46:06 NAS846 kernel: Code: 00 00 48 89 de e8 cb 6b f3 ff 48 8b 83 08 04 00 00 48 39 e8 48 0f 42 c5 48 89 83 08 04 00 00 4c 8b 64 24 08 31 c0 49 c1 e4 06 <4a> 89 84 23 80 01 00 00 4a 8d 9c 33 78 01 00 00 48 89 df e8 3f 3f
    Jan 16 08:46:06 NAS846 kernel: RSP: 0018:ffffc90063177b40 EFLAGS: 00010287
    Jan 16 08:46:06 NAS846 kernel: RAX: 0000000000000000 RBX: ffff88821201a800 RCX: 0000000000000040
    Jan 16 08:46:06 NAS846 kernel: RDX: 0000000000000001 RSI: ffff88821201abe0 RDI: 00000000ffffffff
    Jan 16 08:46:06 NAS846 kernel: RBP: ffff88841fb763c0 R08: 0000000000000000 R09: 00000000ffffffff
    Jan 16 08:46:06 NAS846 kernel: R10: 0000000000000000 R11: ffff8884955e7c4e R12: ffffffe9bf6e4200
    Jan 16 08:46:06 NAS846 kernel: R13: ffff8885ee2ac1e0 R14: 0000000000000000 R15: ffff8882249c0000
    Jan 16 08:46:06 NAS846 kernel: FS:  0000154fcf660b30(0000) GS:ffff88981f400000(0000) knlGS:0000000000000000
    Jan 16 08:46:06 NAS846 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jan 16 08:46:06 NAS846 kernel: CR2: ffff886bd16feb80 CR3: 00000004ad894000 CR4: 0000000000750ee0
    Jan 16 08:46:06 NAS846 kernel: PKRU: 55555554
    Jan 16 08:46:06 NAS846 kernel: note: Whisparr[31241] exited with irqs disabled
    Jan 16 09:00:01 NAS846 kernel: mdcmd (61): set md_write_method 1
    Jan 16 09:00:01 NAS846 kernel:
    Jan 16 09:00:01 NAS846 root: Log Level: 1
    Jan 16 09:00:01 NAS846 root: mover: started
    Jan 16 09:00:01 NAS846 root: mover: finished
    Jan 16 09:27:56 NAS846 kernel: mdcmd (62): set md_write_method 1
    Jan 16 09:27:56 NAS846 kernel:
    Jan 16 09:28:24 NAS846 kernel: traps: Tdarr_Server[3949] trap invalid opcode ip:1e75f12 sp:7fff6e7bdc90 error:0 in node[400000+4d69000]
    Jan 16 09:43:40 NAS846 webGUI: Successful login user root from 192.168.1.56
    Jan 16 09:45:55 NAS846 kernel: device_list[10228]: segfault at 0 ip 0000000000935623 sp 00007ffce8413600 error 6 in php[600000+3b3000] likely on CPU 10 (core 20, socket 0)
    Jan 16 09:45:55 NAS846 kernel: Code: 13 6f fe ff 41 ff 27 49 63 47 0c 49 01 c7 48 c7 c0 80 2a 62 01 0f b6 80 22 02 00 00 84 c0 0f 85 dc 05 00 00 41 ff 27 4c 89 f8 <83> 01 01 4d 8d 7f 20 ff 60 20 83 f8 05 0f 85 5f 04 00 00 8b 46 08
    Jan 16 09:45:56 NAS846 monitor: Stop running nchan processes
    Jan 16 09:46:24 NAS846 kernel: BUG: unable to handle page fault for address: 000000000cfc67c0
    Jan 16 09:46:24 NAS846 kernel: #PF: supervisor read access in kernel mode
    Jan 16 09:46:24 NAS846 kernel: #PF: error_code(0x0000) - not-present page
    Jan 16 09:46:24 NAS846 kernel: PGD 0 P4D 0
    Jan 16 09:46:24 NAS846 kernel: Oops: 0000 [#2] PREEMPT SMP NOPTI
    Jan 16 09:46:24 NAS846 kernel: CPU: 8 PID: 215 Comm: kcompactd0 Tainted: P      D    O       6.1.64-Unraid #1
    Jan 16 09:46:24 NAS846 kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 3101 12/08/2023
    Jan 16 09:46:24 NAS846 kernel: RIP: 0010:PageHuge+0x5/0x31
    Jan 16 09:46:24 NAS846 kernel: Code: cc cc f7 c7 ff 0f 00 00 75 16 48 8b 17 0f ba e2 10 73 0d 48 8b 57 48 f6 c2 01 74 04 48 8d 42 ff c3 cc cc cc cc 0f 1f 44 00 00 <48> 8b 07 0f ba e0 10 73 14 e8 b7 ff ff ff 80 78 50 02 0f 94 c0 0f
    Jan 16 09:46:24 NAS846 kernel: RSP: 0018:ffffc90000923cb0 EFLAGS: 00010206
    Jan 16 09:46:24 NAS846 kernel: RAX: 0000000000000000 RBX: ffffc90000923e10 RCX: 000000000cfc67c0
    Jan 16 09:46:24 NAS846 kernel: RDX: 0000000080000000 RSI: ffffea000cfc0000 RDI: 000000000cfc67c0
    Jan 16 09:46:24 NAS846 kernel: RBP: 000000000033f1a0 R08: 0000000000000000 R09: 0000000000000000
    Jan 16 09:46:24 NAS846 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000008458ce
    Jan 16 09:46:24 NAS846 kernel: R13: 0000000000000000 R14: 0000000000028f44 R15: 000000000cfc67c0
    Jan 16 09:46:24 NAS846 kernel: FS:  0000000000000000(0000) GS:ffff88981f400000(0000) knlGS:0000000000000000
    Jan 16 09:46:24 NAS846 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jan 16 09:46:24 NAS846 kernel: CR2: 000000000cfc67c0 CR3: 000000000420a000 CR4: 0000000000750ee0
    Jan 16 09:46:24 NAS846 kernel: PKRU: 55555554
    Jan 16 09:46:24 NAS846 kernel: Call Trace:
    Jan 16 09:46:24 NAS846 kernel: <TASK>
    Jan 16 09:46:24 NAS846 kernel: ? __die_body+0x1a/0x5c
    Jan 16 09:46:24 NAS846 kernel: ? page_fault_oops+0x329/0x376
    Jan 16 09:46:24 NAS846 kernel: ? do_user_addr_fault+0x12e/0x48d
    Jan 16 09:46:24 NAS846 kernel: ? exc_page_fault+0xfb/0x11d
    Jan 16 09:46:24 NAS846 kernel: ? asm_exc_page_fault+0x22/0x30
    Jan 16 09:46:24 NAS846 kernel: ? PageHuge+0x5/0x31
    Jan 16 09:46:24 NAS846 kernel: isolate_migratepages_block+0x276/0xbb9
    Jan 16 09:46:24 NAS846 kernel: ? folio_add_lru+0x86/0x9d
    Jan 16 09:46:24 NAS846 kernel: compact_zone+0x7c9/0xa28
    Jan 16 09:46:24 NAS846 kernel: ? finish_task_switch.isra.0+0x140/0x218
    Jan 16 09:46:24 NAS846 kernel: proactive_compact_node+0x7c/0xad
    Jan 16 09:46:24 NAS846 kernel: ? fragmentation_score_node+0x32/0x62
    Jan 16 09:46:24 NAS846 kernel: kcompactd+0x1f7/0x249
    Jan 16 09:46:24 NAS846 kernel: ? _raw_spin_rq_lock_irqsave+0x20/0x20
    Jan 16 09:46:24 NAS846 kernel: ? kcompactd_do_work+0x1d4/0x1d4
    Jan 16 09:46:24 NAS846 kernel: kthread+0xe4/0xef
    Jan 16 09:46:24 NAS846 kernel: ? kthread_complete_and_exit+0x1b/0x1b
    Jan 16 09:46:24 NAS846 kernel: ret_from_fork+0x1f/0x30
    Jan 16 09:46:24 NAS846 kernel: </TASK>
    Jan 16 09:46:24 NAS846 kernel: Modules linked in: nvidia_uvm(PO) xt_connmark xt_mark iptable_mangle xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter md_mod nfsd auth_rpcgss oid_registry lockd grace sunrpc tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap bridge stp llc ixgbe xfrm_algo mdio igc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp nvidia_drm(PO) kvm_intel nvidia_modeset(PO) zfs(PO) i915 kvm zunicode(PO) zzstd(O) nvidia(PO) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 zlua(O) sha256_ssse3 ast sha1_ssse3 drm_vram_helper iosf_mbi aesni_intel drm_buddy zavl(PO) i2c_algo_bit drm_ttm_helper icp(PO)
    Jan 16 09:46:24 NAS846 kernel: drm_display_helper mei_hdcp mei_pxp crypto_simd ttm i2c_i801 intel_gtt cryptd zcommon(PO) znvpair(PO) rapl spl(O) drm_kms_helper intel_cstate wmi_bmof drm mpt3sas mei_me agpgart i2c_smbus input_leds raid_class nvme ahci intel_uncore i2c_core joydev led_class scsi_transport_sas mei syscopyarea nvme_core libahci sysfillrect vmd sysimgblt fb_sys_fops thermal fan video tpm_crb tpm_tis tpm_tis_core tpm wmi backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: md_mod]
    Jan 16 09:46:24 NAS846 kernel: CR2: 000000000cfc67c0
    Jan 16 09:46:24 NAS846 kernel: ---[ end trace 0000000000000000 ]---
    Jan 16 09:46:24 NAS846 kernel: RIP: 0010:zil_itx_assign+0x295/0x312 [zfs]
    Jan 16 09:46:24 NAS846 kernel: Code: 00 00 48 89 de e8 cb 6b f3 ff 48 8b 83 08 04 00 00 48 39 e8 48 0f 42 c5 48 89 83 08 04 00 00 4c 8b 64 24 08 31 c0 49 c1 e4 06 <4a> 89 84 23 80 01 00 00 4a 8d 9c 33 78 01 00 00 48 89 df e8 3f 3f
    Jan 16 09:46:24 NAS846 kernel: RSP: 0018:ffffc90063177b40 EFLAGS: 00010287
    Jan 16 09:46:24 NAS846 kernel: RAX: 0000000000000000 RBX: ffff88821201a800 RCX: 0000000000000040
    Jan 16 09:46:24 NAS846 kernel: RDX: 0000000000000001 RSI: ffff88821201abe0 RDI: 00000000ffffffff
    Jan 16 09:46:24 NAS846 kernel: RBP: ffff88841fb763c0 R08: 0000000000000000 R09: 00000000ffffffff
    Jan 16 09:46:24 NAS846 kernel: R10: 0000000000000000 R11: ffff8884955e7c4e R12: ffffffe9bf6e4200
    Jan 16 09:46:24 NAS846 kernel: R13: ffff8885ee2ac1e0 R14: 0000000000000000 R15: ffff8882249c0000
    Jan 16 09:46:24 NAS846 kernel: FS:  0000000000000000(0000) GS:ffff88981f400000(0000) knlGS:0000000000000000
    Jan 16 09:46:24 NAS846 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jan 16 09:46:24 NAS846 kernel: CR2: 000000000cfc67c0 CR3: 00000007836e6000 CR4: 0000000000750ee0
    Jan 16 09:46:24 NAS846 kernel: PKRU: 55555554
    Jan 16 09:46:24 NAS846 kernel: note: kcompactd0[215] exited with irqs disabled
    Jan 16 10:00:01 NAS846 kernel: mdcmd (63): set md_write_method 1
    Jan 16 10:00:01 NAS846 kernel:
    Jan 16 10:00:01 NAS846 root: Log Level: 1
    Jan 16 10:00:02 NAS846 root: mover: started
    Jan 16 10:00:02 NAS846 root: mover: finished


    ```
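With a log this long, it can help to pull out just the oops headers to see which task and function each oops hit. A rough sketch that assumes the standard layout shown above (a `BUG:` line, an `Oops: ... [#N]` line, then `Comm:` and `RIP:` lines); `summarize_oopses` and its dictionary keys are illustrative, and only the first `RIP:` after each `Oops:` line is kept.

```python
import re
from typing import Iterable


def summarize_oopses(lines: Iterable[str]) -> list:
    """Group kernel oops output into dicts with oops number, BUG text, task and RIP."""
    oopses = []
    bug = None       # most recent "BUG:" description, attached to the next oops
    current = None   # oops record currently being filled in
    for line in lines:
        if "kernel: BUG:" in line:
            bug = line.split("kernel: BUG:", 1)[1].strip()
        elif (m := re.search(r"Oops: \w+ \[#(\d+)\]", line)):
            current = {"n": int(m.group(1)), "bug": bug, "comm": None, "rip": None}
            oopses.append(current)
            bug = None
        elif current is not None:
            if current["comm"] is None and (m := re.search(r"Comm: (.+?) Tainted:", line)):
                current["comm"] = m.group(1)
            elif current["rip"] is None and (m := re.search(r"RIP: \w+:(\S+)", line)):
                current["rip"] = m.group(1)  # e.g. "zil_itx_assign+0x295/0x312"
    return oopses
```

Fed the log above (for example `summarize_oopses(open("syslog.txt"))`, with a hypothetical file name), this would report oops #1 in `Whisparr` at `zil_itx_assign` in the zfs module and oops #2 in `kcompactd0` at `PageHuge` during memory compaction.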

     

    nas846-diagnostics-20240116-1051.zip

  5. On 1/3/2024 at 8:39 PM, pspfreak said:

    @pspfreak I came to the same solution as well. Be wary of high-capacity NVMe drives, as they may not have DRAM. Additionally, regarding mover congestion: if you're, say, downloading a massive queue to rebuild a library and you have parity enabled, mover transfer speeds will be much slower than without parity. Use that info at your own risk.

     

  6. This is something I liked about QNAP's Qtier. In Unraid it would work fundamentally differently, but with the plethora of storage options out there, it would be nice to let users use more than one cache pool within a share and then specify tiering options. For instance, I would like the ability to define tier 1 as a pool of high-write-speed NVMe, tier 2 as a pool of high-capacity NVMe, and tier 3 as the array. My use case: all incoming data lands on tier 1, rolls off slowly to tier 2 after 24 hours, and then, depending on how often it is accessed, either stays there until it is barely touched or is moved down to the array.
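The tiering policy described above could be expressed as a small placement rule. A minimal sketch under stated assumptions: Unraid has no such feature, so the tier numbers, `FileState`, `target_tier`, and both thresholds are hypothetical, and "how much it is accessed" is approximated by time since last access (the 30-day cold threshold is an arbitrary placeholder).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds for the proposed 3-tier layout.
TIER1_HOLD = timedelta(hours=24)  # new data stays on fast-write NVMe this long
COLD_AFTER = timedelta(days=30)   # assumed "barely touched" cutoff (placeholder)


@dataclass
class FileState:
    created: datetime
    last_access: datetime


def target_tier(f: FileState, now: datetime) -> int:
    """Return 1 (fast NVMe), 2 (high-capacity NVMe) or 3 (array) for a file."""
    if now - f.created < TIER1_HOLD:
        return 1  # still within the 24-hour landing window
    if now - f.last_access < COLD_AFTER:
        return 2  # aged out of tier 1 but still being accessed
    return 3      # rarely touched: move down to the array
```

A mover-style job would then evaluate each file and relocate those whose target tier differs from their current one; counting accesses over a window instead of using last access would be a different, equally plausible design.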

  7. I think the spirit of this FR gets at something Unraid is missing natively. It would be nice to have a wizard that guides you through what you are trying to do. In this case, it would have been nice for the OP to select the drive > What do you want to do? > Replace it, and then have the wizard walk through the steps and what to expect, including stopping and starting the array.

     

    The current process is "just know it" or Google it, but in the spirit of Unraid, the OS itself should be able to guide you through the expected mechanics of owning an Unraid system, such as changing out disks for maintenance or upgrades.

  8. Okay, it looks like this is a red herring. Removing the GPU stopped this error, but the symptoms are still there. If I deliberately throttle sabnzbd down to 10 MB/s, things seem to stay stable, but it's still an issue.

  9. I am noticing cascading errors when using containers that move a lot of data (sabnzbd, Firefox downloads, Tdarr), which present as system hangs and dropped connectivity. The error in the log is:

    ```
    May 31 10:23:23 NAS846 kernel: ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20220331/dsfield-184)
    May 31 10:23:23 NAS846 kernel: ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20220331/dswload2-477)
    May 31 10:23:23 NAS846 kernel: ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20220331/psparse-529)
    ```

     

    I am on the latest BIOS and the latest drivers for my GPU.

    nas846-diagnostics-20230530-1721.zip
