  • [6.12.4] Server hangs once a day since updating to 6.12.4


    bastl
    • Urgent

    Hello everyone,

     

    Coming from 6.12.2 with a stable server, the 6.12.4 update I did a week ago broke something. Once a day I find the server frozen, mostly in the morning: no WebUI, no SMB access, no SSH, no ping response. I have to force-reboot the system.

     

    The server is mainly used for light media consumption with Jellyfin, Nextcloud sync from my phone (CalDAV, CardDAV), Unifi, etc., from time to time some media conversion with the Tdarr or Handbrake dockers, and rarely some remote access with WG. Most dockers sit idle, as do a VM or two doing nothing. The server is idle most of the day. No config changes on my side with the last update. No custom scripts running during this time.

     

    On 6.12.2 the server never had any issues or crashes. It started the night after the update.

     

    I activated the syslog server and caught the latest crash.

    Sep 29 19:13:48 mini root: /mnt/cache: 284 GiB (304924037120 bytes) trimmed on /dev/nvme0n1p1
    Sep 30 02:44:09 mini kernel: general protection fault, maybe for address 0xffffc900033abe6c: 0000 [#1] PREEMPT SMP NOPTI
    Sep 30 02:44:09 mini kernel: CPU: 6 PID: 31855 Comm: ps Tainted: P           O       6.1.49-Unraid #1
    Sep 30 02:44:09 mini kernel: Hardware name: BESSTAR TECH LIMITED HM90/HM90, BIOS 5.16 10/13/2021
    Sep 30 02:44:09 mini kernel: RIP: 0010:mntput_no_expire+0x59/0x1f2
    Sep 30 02:44:09 mini kernel: Code: 2e e7 ff 48 8b 83 e8 00 00 00 48 85 c0 74 16 48 8b 7b 50 83 ce ff e8 2f ef ff ff e8 cc 7a e7 ff e9 78 01 00 00 e8 91 ed ff ff <f0> 83 44 24 fc 00 48 8b 7b 50 83 ce ff e8 0e ef ff ff 48 89 df e8
    Sep 30 02:44:09 mini kernel: RSP: 0018:ffffc900033abe70 EFLAGS: 00010286
    Sep 30 02:44:09 mini kernel: RAX: 0000000000000000 RBX: ffff888134bf0838 RCX: 0000000000000064
    Sep 30 02:44:09 mini kernel: RDX: 0000000000000001 RSI: 00000000ffffffff RDI: ffff888134bf09c8
    Sep 30 02:44:09 mini kernel: RBP: ffff888106220b00 R08: 0000000000000000 R09: ffff888134bf0858
    Sep 30 02:44:09 mini kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000a801d
    Sep 30 02:44:09 mini kernel: R13: ffff888134bf0858 R14: ffff88818ab54e40 R15: 0000000000000000
    Sep 30 02:44:09 mini kernel: FS:  0000147c21ef77c0(0000) GS:ffff888712d80000(0000) knlGS:0000000000000000
    Sep 30 02:44:09 mini kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Sep 30 02:44:09 mini kernel: CR2: 00000cdb8942f000 CR3: 000000033b1e2000 CR4: 0000000000350ee0
    Sep 30 02:44:09 mini kernel: Call Trace:
    Sep 30 02:44:09 mini kernel: <TASK>
    Sep 30 02:44:09 mini kernel: ? __die_body+0x1a/0x5c
    Sep 30 02:44:09 mini kernel: ? die_addr+0x38/0x51
    Sep 30 02:44:09 mini kernel: ? exc_general_protection+0x30f/0x345
    Sep 30 02:44:09 mini kernel: ? asm_exc_general_protection+0x22/0x30
    Sep 30 02:44:09 mini kernel: ? mntput_no_expire+0x59/0x1f2
    Sep 30 02:44:09 mini kernel: ? mntput_no_expire+0x6b/0x1f2
    Sep 30 02:44:09 mini kernel: ? dput+0x39/0x17b
    Sep 30 02:44:09 mini kernel: ? __fput+0x19f/0x1d2
    Sep 30 02:44:09 mini kernel: ? task_work_run+0x6b/0x80
    Sep 30 02:44:09 mini kernel: ? exit_to_user_mode_prepare+0x75/0x10d
    Sep 30 02:44:09 mini kernel: ? syscall_exit_to_user_mode+0x18/0x2c
    Sep 30 02:44:09 mini kernel: ? do_syscall_64+0x77/0x81
    Sep 30 02:44:09 mini kernel: ? entry_SYSCALL_64_after_hwframe+0x64/0xce
    Sep 30 02:44:09 mini kernel: </TASK>
    Sep 30 02:44:09 mini kernel: Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap xt_nat xt_tcpudp veth macvlan xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter dm_crypt dm_mod xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) it87 tcp_diag inet_diag hwmon_vid vendor_reset(O) iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc igc r8169 realtek amdgpu edac_mce_amd edac_core intel_rapl_msr intel_rapl_common iosf_mbi gpu_sched drm_buddy kvm_amd i2c_algo_bit drm_ttm_helper ttm drm_display_helper kvm
    Sep 30 02:44:09 mini kernel: drm_kms_helper drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 btusb btrtl aesni_intel btbcm btintel crypto_simd cryptd bluetooth agpgart i2c_piix4 syscopyarea rapl ahci ecdh_generic nvme sysfillrect i2c_core k10temp libahci amd_sfh ecc sysimgblt ccp fb_sys_fops nvme_core tpm_crb tpm_tis video tpm_tis_core wmi tpm backlight acpi_cpufreq button unix [last unloaded: igc]
    Sep 30 02:44:09 mini kernel: ---[ end trace 0000000000000000 ]---
    Sep 30 02:44:09 mini kernel: RIP: 0010:mntput_no_expire+0x59/0x1f2
    Sep 30 02:44:09 mini kernel: Code: 2e e7 ff 48 8b 83 e8 00 00 00 48 85 c0 74 16 48 8b 7b 50 83 ce ff e8 2f ef ff ff e8 cc 7a e7 ff e9 78 01 00 00 e8 91 ed ff ff <f0> 83 44 24 fc 00 48 8b 7b 50 83 ce ff e8 0e ef ff ff 48 89 df e8
    Sep 30 02:44:09 mini kernel: RSP: 0018:ffffc900033abe70 EFLAGS: 00010286
    Sep 30 02:44:09 mini kernel: RAX: 0000000000000000 RBX: ffff888134bf0838 RCX: 0000000000000064
    Sep 30 02:44:09 mini kernel: RDX: 0000000000000001 RSI: 00000000ffffffff RDI: ffff888134bf09c8
    Sep 30 02:44:09 mini kernel: RBP: ffff888106220b00 R08: 0000000000000000 R09: ffff888134bf0858
    Sep 30 02:44:09 mini kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000a801d
    Sep 30 02:44:09 mini kernel: R13: ffff888134bf0858 R14: ffff88818ab54e40 R15: 0000000000000000
    Sep 30 02:44:09 mini kernel: FS:  0000147c21ef77c0(0000) GS:ffff888712d80000(0000) knlGS:0000000000000000
    Sep 30 02:44:09 mini kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Sep 30 02:44:09 mini kernel: CR2: 00000cdb8942f000 CR3: 000000033b1e2000 CR4: 0000000000350ee0
    Sep 30 02:44:09 mini kernel: note: ps[31855] exited with preempt_count 2
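
    For anyone triaging a similar syslog capture, the fields that matter in a trace like the one above (time, faulting process, faulting kernel symbol) can be pulled out mechanically. The following is a minimal Python sketch, not an Unraid tool: `summarize_faults` and its regular expressions are hypothetical helpers written against the line format shown in this excerpt only, and the marker list beyond "general protection fault" is an assumption. Other kernels or syslog formats may need different patterns.

```python
import re
from typing import Dict, Iterable, List

# Syslog prefix as it appears in the excerpt: "Sep 30 02:44:09 mini kernel: ..."
PREFIX = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s?(?P<msg>.*)$"
)
# "CPU: 6 PID: 31855 Comm: ps Tainted: ..."
COMM = re.compile(r"PID:\s*(?P<pid>\d+)\s+Comm:\s*(?P<comm>\S+)")
# "RIP: 0010:mntput_no_expire+0x59/0x1f2"
RIP = re.compile(r"RIP:\s*\w+:(?P<sym>[\w.]+)\+")
# Assumed set of first-line markers; only the first one occurs in this thread.
FAULT_MARKERS = ("general protection fault", "kernel panic", "oops:")


def summarize_faults(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Return one summary dict per kernel fault found in syslog lines."""
    faults: List[Dict[str, object]] = []
    current = None
    for raw in lines:
        match = PREFIX.match(raw.strip())
        if match is None:
            continue  # not a kernel line (e.g. the fstrim message from root)
        msg = match.group("msg")
        if any(marker in msg.lower() for marker in FAULT_MARKERS):
            current = {
                "time": match.group("ts"),
                "host": match.group("host"),
                "fault": msg,
                "pid": None,
                "comm": None,
                "rip": None,
            }
            faults.append(current)
            continue
        if current is None:
            continue
        comm = COMM.search(msg)
        if comm and current["comm"] is None:
            current["pid"] = int(comm.group("pid"))
            current["comm"] = comm.group("comm")
        rip = RIP.search(msg)
        if rip and current["rip"] is None:
            current["rip"] = rip.group("sym")  # keep the first RIP only
        if msg.startswith("---[ end trace"):
            current = None  # the register dump repeated after this is ignored
    return faults


# Example use (file name taken from the attachment in this post):
# with open("syslog-10.0.0.4.log") as handle:
#     for fault in summarize_faults(handle):
#         print(fault["time"], fault["comm"], fault["pid"], fault["rip"])
```

    Run against the excerpt above, this should report a single fault at Sep 30 02:44:09, raised while process `ps` (PID 31855) was executing `mntput_no_expire`.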

    mini-diagnostics-20230930-1231.zip

     

    No idea how to fix this issue. Any help is appreciated.

     

    syslog-10.0.0.4.log




    User Feedback

    Recommended Comments



    OK, I don't know why I said I would wait. I stopped all containers, disabled autostart for all of them, stopped the array, rebooted the host, restarted the array and started only Plex.

     

    We'll see.

    Link to comment
    13 hours ago, David Grenon said:

    restarted the array and started only Plex.

    If it still crashes try again without Plex, since that one has been known to be the culprit for some users.

    Link to comment

    It did crash! But not the Unraid server itself:
    my Unraid WebUI is still available at this time.
    RAM usage on Docker:
    [Screenshot: Tower Docker page, 2024-02-20 06:17:41]

    and ramping up as we talk:
    [Screenshot: Tower Docker page, 2024-02-20 06:32:50]


    Here's the config:

    [Screenshot: Tower Update Container page, 2024-02-20 06:18:38]
    [Screenshot: Tower Update Container page, 2024-02-20 06:19:27]

     

    Here's HTOP:

    [Screenshot: root@Tower bash terminal, 2024-02-20 06:23:06]
    [Screenshot: Tower Docker page, 2024-02-20 06:23:44]

    Here's HTOP 10 minutes after:
    and... oh well, it's locked.

    By the time I wanted to take a second screenshot of htop, the terminal AND the WebUI were locked.

    So that's that.
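
    Since the lock-up above hit before a second htop screenshot could be taken, one way to keep the memory trend is to append a reading to a file at a fixed interval, so the last samples survive a hard freeze. This is only a sketch under assumptions: `parse_meminfo`, `format_sample` and `log_forever` are hypothetical helper names, the log path is a placeholder, and it reads the standard Linux `/proc/meminfo` rather than anything Unraid-specific. It records host-wide memory, not per-container usage.

```python
import os
import time
from typing import Dict, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])  # sizes are reported in kB
    return values


def format_sample(info: Dict[str, int], now: Optional[float] = None) -> str:
    """Build one log line with a timestamp, MemAvailable and percent used."""
    now = time.time() if now is None else now
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    total = info.get("MemTotal", 0)
    available = info.get("MemAvailable", 0)
    used_pct = 100.0 * (total - available) / total if total else 0.0
    return f"{stamp} MemAvailable={available} kB used={used_pct:.1f}%"


def log_forever(path: str, interval_s: float = 60.0) -> None:
    """Append one sample per interval; fsync so it survives a freeze."""
    while True:
        with open("/proc/meminfo") as source:
            info = parse_meminfo(source.read())
        with open(path, "a") as out:
            out.write(format_sample(info) + "\n")
            out.flush()
            os.fsync(out.fileno())  # push the line to disk before sleeping
        time.sleep(interval_s)
```

    Calling `log_forever` with a path on storage that persists across reboots (the path is up to you) would leave a file whose last lines show how fast available memory was falling before the freeze.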

    Edited by David Grenon
    Link to comment

    For future reference, I uninstalled Unassigned Devices (all 3 of them: the normal one, the Plus one, and the one with Preclear). It was stable for 2 days, then crashed.

     

    Reverted to 6.11.5 and I'm now up 2 days 9 hours. The "Fix Common Problems" plugin doesn't report any OOM errors, and my htop looks perfectly normal. Just running Plex for now.

     

    If it's still stable with all the dockers I had, I'll try updating to the latest version. (Some people mentioned that after reverting to 6.11.5, uninstalling/reinstalling Unassigned Devices and then upgrading again removed the issue.)

     

    Why do I bother upgrading? Because as of now, I don't have access to the app store anymore.

     

    If it still crashes after all this on 6.12, I'll wait for 6.13 and go without the store.

    Link to comment

    Solved my issue, if anyone wonders :)

    I'm now running 6.12.8 full stable:
     

    Thank you all for the support. Sorry @bastl for hijacking your thread :P

    Link to comment





