  • 6.12.8 Call traces and crashes still


    user2579
    • Urgent

    I updated to 6.12.8, and for almost a week everything ran much more stably than on 6.12.6.

     

    Prior to the update, one of the things I tried for stability was moving my main cache, which holds my appdata/system data, off a ZFS raidz pool and onto a single btrfs drive, and that was enough to keep things up. After switching to 6.12.8, everything seemed to be getting back to normal. Yesterday I noticed crashes that seemed tied to Plex transcoder issues, and at least one instance of:

    ```
    Feb 19 20:47:17 NAS846 emhttpd: shcmd (474): /usr/local/sbin/mount_image '/mnt/user/system/docker/docker/' /var/lib/docker 20
    Feb 19 20:47:17 NAS846 emhttpd: shcmd (476): /etc/rc.d/rc.docker start
    Feb 19 20:47:17 NAS846 root: starting dockerd ...
    Feb 19 20:47:17 NAS846 kernel: SQUASHFS error: xz decompression failed, data probably corrupt
    Feb 19 20:47:17 NAS846 kernel: SQUASHFS error: Failed to read block 0x2e91a60: -5
    Feb 19 20:47:18 NAS846 avahi-daemon[23264]: Server startup complete. Host name is
    ```

     

    Then I started getting a litany of call traces/crashes, almost always `Comm: lsof Tainted:` but sometimes `dockerd`:

     

    ```

    Feb 20 07:09:02 NAS846 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028
    Feb 20 07:09:02 NAS846 kernel: #PF: supervisor read access in kernel mode
    Feb 20 07:09:02 NAS846 kernel: #PF: error_code(0x0000) - not-present page
    Feb 20 07:09:02 NAS846 kernel: PGD 348e20067 P4D 348e20067 PUD 52816a067 PMD 0
    Feb 20 07:09:02 NAS846 kernel: Oops: 0000 [#5] PREEMPT SMP NOPTI
    Feb 20 07:09:02 NAS846 kernel: CPU: 10 PID: 23116 Comm: lsof Tainted: P      D    O       6.1.74-Unraid #1
    Feb 20 07:09:02 NAS846 kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 3101 12/08/2023
    Feb 20 07:09:02 NAS846 kernel: RIP: 0010:__slab_free+0x9c/0x229
    Feb 20 07:09:02 NAS846 kernel: Code: 89 de 4c 8b 4c 24 58 4c 8b 44 24 10 e8 2d c8 ff ff 84 c0 0f 85 a6 00 00 00 4d 85 e4 74 0c 48 8b 34 24 4c 89 e7 e8 1c f7 65 00 <48> 8b 4b 28 4c 8b 6b 20 8b 45 28 48 8b 54 24 18 48 89 4c 24 58 41
    Feb 20 07:09:02 NAS846 kernel: RSP: 0018:ffffc9005c3cfdc8 EFLAGS: 00010246
    Feb 20 07:09:02 NAS846 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff888104c29000
    Feb 20 07:09:02 NAS846 kernel: RDX: ffff888104c29000 RSI: 0000000000210d00 RDI: ffff8881001e6e00
    Feb 20 07:09:02 NAS846 kernel: RBP: ffff8881001e6e00 R08: 0000000000000001 R09: ffffffff8125053c
    Feb 20 07:09:02 NAS846 kernel: R10: ffff888104c29000 R11: 0000000000000fe0 R12: 0000000000000000
    Feb 20 07:09:02 NAS846 kernel: R13: 0000000000494830 R14: 00007fff16d06250 R15: 0000000000000002
    Feb 20 07:09:02 NAS846 kernel: FS:  000014caae93fe00(0000) GS:ffff889fff480000(0000) knlGS:0000000000000000
    Feb 20 07:09:02 NAS846 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Feb 20 07:09:02 NAS846 kernel: CR2: 0000000000000028 CR3: 00000002b12d8000 CR4: 0000000000750ee0
    Feb 20 07:09:02 NAS846 kernel: PKRU: 55555554
    Feb 20 07:09:02 NAS846 kernel: Call Trace:
    Feb 20 07:09:02 NAS846 kernel: <TASK>
    Feb 20 07:09:02 NAS846 kernel: ? __die_body+0x1a/0x5c
    Feb 20 07:09:02 NAS846 kernel: ? page_fault_oops+0x329/0x376
    Feb 20 07:09:02 NAS846 kernel: ? do_user_addr_fault+0x12e/0x48d
    Feb 20 07:09:02 NAS846 kernel: ? exc_page_fault+0xfb/0x11d
    Feb 20 07:09:02 NAS846 kernel: ? asm_exc_page_fault+0x22/0x30
    Feb 20 07:09:02 NAS846 kernel: ? user_path_at_empty+0x42/0x4f
    Feb 20 07:09:02 NAS846 kernel: ? __slab_free+0x9c/0x229
    Feb 20 07:09:02 NAS846 kernel: ? __slab_free+0x32/0x229
    Feb 20 07:09:02 NAS846 kernel: ? user_path_at_empty+0x42/0x4f
    Feb 20 07:09:02 NAS846 kernel: ? memcg_slab_free_hook+0x20/0xcf
    Feb 20 07:09:02 NAS846 kernel: ? kmem_cache_alloc+0x122/0x14d
    Feb 20 07:09:02 NAS846 kernel: ? slab_free_freelist_hook.constprop.0+0x3b/0xaf
    Feb 20 07:09:02 NAS846 kernel: kmem_cache_free+0x10f/0x154
    Feb 20 07:09:02 NAS846 kernel: ? user_path_at_empty+0x42/0x4f
    Feb 20 07:09:02 NAS846 kernel: user_path_at_empty+0x42/0x4f
    Feb 20 07:09:02 NAS846 kernel: do_readlinkat+0x61/0x106
    Feb 20 07:09:02 NAS846 kernel: __x64_sys_readlink+0x1a/0x21
    Feb 20 07:09:02 NAS846 kernel: do_syscall_64+0x68/0x81
    Feb 20 07:09:02 NAS846 kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
    Feb 20 07:09:02 NAS846 kernel: RIP: 0033:0x14caaebcd197
    Feb 20 07:09:02 NAS846 kernel: Code: 73 01 c3 48 8b 0d 81 2c 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 59 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 51 2c 0e 00 f7 d8 64 89 02 48
    Feb 20 07:09:02 NAS846 kernel: RSP: 002b:00007fff16d061d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000059
    Feb 20 07:09:02 NAS846 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000014caaebcd197
    Feb 20 07:09:02 NAS846 kernel: RDX: 0000000000001000 RSI: 00007fff16d06250 RDI: 0000000000494830
    Feb 20 07:09:02 NAS846 kernel: RBP: 00007fff16d06210 R08: 0000000000000007 R09: 00000000004bd6f0
    Feb 20 07:09:02 NAS846 kernel: R10: b8f4c0f719b7152a R11: 0000000000000206 R12: 0000000000000000
    Feb 20 07:09:02 NAS846 kernel: R13: 00007fff16d099d0 R14: 0000000000433dd0 R15: 000014caaed33000
    Feb 20 07:09:02 NAS846 kernel: </TASK>
    Feb 20 07:09:02 NAS846 kernel: Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter md_mod nvidia_uvm(PO) nfsd auth_rpcgss oid_registry lockd grace sunrpc tcp_diag inet_diag nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap bridge stp llc ixgbe xfrm_algo mdio igc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp nvidia_drm(PO) nvidia_modeset(PO) kvm_intel i915 zfs(PO) kvm zunicode(PO) nvidia(PO) zzstd(O) iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper zlua(O) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 zavl(PO) sha1_ssse3 icp(PO) aesni_intel drm_kms_helper mei_hdcp mei_pxp crypto_simd intel_gtt cryptd zcommon(PO) znvpair(PO) rapl spl(O) intel_cstate drm wmi_bmof mpt3sas i2c_i801 nvme agpgart mei_me i2c_smbus ahci raid_class input_leds
    Feb 20 07:09:02 NAS846 kernel: intel_uncore syscopyarea i2c_core scsi_transport_sas mei libahci led_class joydev nvme_core vmd sysfillrect sysimgblt thermal fb_sys_fops fan video tpm_crb tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: md_mod]
    Feb 20 07:09:02 NAS846 kernel: CR2: 0000000000000028
    Feb 20 07:09:02 NAS846 kernel: ---[ end trace 0000000000000000 ]---
    Feb 20 07:09:02 NAS846 kernel: RIP: 0010:__slab_free+0x9c/0x229
    Feb 20 07:09:02 NAS846 kernel: Code: 89 de 4c 8b 4c 24 58 4c 8b 44 24 10 e8 2d c8 ff ff 84 c0 0f 85 a6 00 00 00 4d 85 e4 74 0c 48 8b 34 24 4c 89 e7 e8 1c f7 65 00 <48> 8b 4b 28 4c 8b 6b 20 8b 45 28 48 8b 54 24 18 48 89 4c 24 58 41
    Feb 20 07:09:02 NAS846 kernel: RSP: 0018:ffffc9005b9dbdc8 EFLAGS: 00010246
    Feb 20 07:09:02 NAS846 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff888104c2e000
    Feb 20 07:09:02 NAS846 kernel: RDX: ffff888104c2e000 RSI: 0000000000210d00 RDI: ffff8881001e6e00
    Feb 20 07:09:02 NAS846 kernel: RBP: ffff8881001e6e00 R08: 0000000000000001 R09: ffffffff8125053c
    Feb 20 07:09:02 NAS846 kernel: R10: ffff888104c2e000 R11: 0000000000000fe0 R12: 0000000000000000
    Feb 20 07:09:02 NAS846 kernel: R13: 0000000000441d80 R14: 00007ffcb2c7af50 R15: 0000000000000002
    Feb 20 07:09:02 NAS846 kernel: FS:  000014caae93fe00(0000) GS:ffff889fff480000(0000) knlGS:0000000000000000
    Feb 20 07:09:02 NAS846 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Feb 20 07:09:02 NAS846 kernel: CR2: 0000000000000028 CR3: 00000002b12d8000 CR4: 0000000000750ee0
    Feb 20 07:09:02 NAS846 kernel: PKRU: 55555554
    Feb 20 07:09:02 NAS846 kernel: note: lsof[23116] exited with irqs disabled

    ```
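For anyone triaging a similar log, the pattern described above (mostly `lsof`, sometimes `dockerd`, plus the earlier SQUASHFS errors) can be tallied with a short script. This is only a rough sketch: the function names are made up for illustration, and it assumes the syslog lines look like the excerpts in this post, with the process name following `Comm:` as a single word.

```python
import re
from collections import Counter

# Matches the "CPU: <n> PID: <n> Comm: <name>" line the kernel prints once per oops.
# Assumption: the command name contains no spaces, as in the traces quoted above.
COMM_RE = re.compile(r"kernel: CPU: \d+ PID: \d+ Comm: (\S+)")


def count_oops_by_comm(lines):
    """Count kernel oops header lines per process name (the Comm field)."""
    counts = Counter()
    for line in lines:
        match = COMM_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def find_squashfs_errors(lines):
    """Return every log line that reports a SQUASHFS error."""
    return [line.rstrip("\n") for line in lines if "SQUASHFS error" in line]
```

Both functions accept any iterable of lines, so an open syslog file from the diagnostics zip can be passed in directly. Comparing the timestamp of the first SQUASHFS error with the first oops may help show whether the read errors came before the crashes.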

     

    Other things I have done:

    - 80 hour memtest with all sticks, nothing

    - reseated CPU, no physical anomalies

    - physical inspection

    - formatted USB and restored from backup

    nas846-diagnostics-20240220-0709.zip





    Quote

    Feb 19 20:47:17 NAS846 kernel: SQUASHFS error: xz decompression failed, data probably corrupt
    Feb 19 20:47:17 NAS846 kernel: SQUASHFS error: Failed to read block 0x2e91a60: -5

     

    These usually indicate a flash drive problem

    9 hours ago, JorgeB said:

     

    These usually indicate a flash drive problem

    This was mentioned on Discord as well. Could it also be related to other hardware? It would be great news if it's just a flash drive swap.


    IIRC, every time I've seen those errors they were caused by flash drive issues, though I guess it could also be RAM related.
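One way to test the flash drive theory before swapping hardware is to re-read the files on the flash drive and compare them against known checksums. The sketch below is an illustration only: it assumes each file has a companion `<name>.sha256` file whose first whitespace-separated token is the expected hex digest, which should be confirmed on the actual device. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sidecar(path):
    """Compare a file's SHA-256 with the first token stored in '<path>.sha256'.

    Assumption: the sidecar file exists and starts with the expected hex digest.
    """
    sidecar = Path(str(path) + ".sha256")
    tokens = sidecar.read_text().split()
    if not tokens:
        return False
    return sha256_of(path) == tokens[0].lower()
```

A mismatch, or a result that changes between repeated runs on the same file, would point toward the flash drive or the read path rather than the software. A consistent match does not rule out RAM.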

     

     

     

     





