  • [6.11.5] Unraid crash (kernel: BUG: Bad rss-counter state mm:00000000f7646224 type:MM_SHMEMPAGES val:1)


    altyne
    • Urgent

    Dec  6 14:14:50 NightOwl kernel: ------------[ cut here ]------------
    Dec  6 14:14:50 NightOwl kernel: kernel BUG at mm/huge_memory.c:2154!
    Dec  6 14:14:50 NightOwl kernel: invalid opcode: 0000 [#1] PREEMPT SMP PTI
    Dec  6 14:14:50 NightOwl kernel: CPU: 1 PID: 30090 Comm: rocket-worker-t Tainted: G        W         5.19.17-Unraid #2
    Dec  6 14:14:50 NightOwl kernel: Hardware name: Supermicro Super Server/X11SSM-F, BIOS 2.3 11/26/2019
    Dec  6 14:14:50 NightOwl kernel: RIP: 0010:__split_huge_pmd+0x565/0x6ab
    Dec  6 14:14:50 NightOwl kernel: Code: 00 48 8b 54 24 40 48 8b 74 24 48 48 0f 45 cf bf 11 ff ff 01 48 c1 e7 27 48 01 d7 48 21 c8 48 01 c7 48 f7 07 9f ff ff ff 74 02 <0f> 0b e8 db cf ff ff 83 7c 24 10 00 75 06 f0 41 ff 44 24 30 49 81
    Dec  6 14:14:50 NightOwl kernel: RSP: 0018:ffffc90003defbf8 EFLAGS: 00010282
    Dec  6 14:14:50 NightOwl kernel: RAX: 000000043596c000 RBX: ffff888186d4f9c0 RCX: 000ffffffffff000
    Dec  6 14:14:50 NightOwl kernel: RDX: 0000000000000a98 RSI: 800000044d753027 RDI: ffff88843596ca98
    Dec  6 14:14:50 NightOwl kernel: RBP: ffffea0011358000 R08: 0000000000000000 R09: 0000000000000000
    Dec  6 14:14:50 NightOwl kernel: R10: 0000000000000003 R11: 0000000000000000 R12: ffffea001135d4c0
    Dec  6 14:14:50 NightOwl kernel: R13: ffff888104b0a510 R14: 00000000ff800001 R15: 000014e354553000
    Dec  6 14:14:50 NightOwl kernel: FS:  000014e3875fa700(0000) GS:ffff888857a80000(0000) knlGS:0000000000000000
    Dec  6 14:14:50 NightOwl kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Dec  6 14:14:50 NightOwl kernel: CR2: 0000149b96961000 CR3: 00000001d682e001 CR4: 00000000003706e0
    Dec  6 14:14:50 NightOwl kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Dec  6 14:14:50 NightOwl kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Dec  6 14:14:50 NightOwl kernel: Call Trace:
    Dec  6 14:14:50 NightOwl kernel: <TASK>
    Dec  6 14:14:50 NightOwl kernel: ? _raw_spin_unlock+0x14/0x29
    Dec  6 14:14:50 NightOwl kernel: unmap_page_range+0x23d/0x66e
    Dec  6 14:14:50 NightOwl kernel: zap_page_range+0x96/0xd6
    Dec  6 14:14:50 NightOwl kernel: do_madvise+0x685/0xa04
    Dec  6 14:14:50 NightOwl kernel: ? percpu_counter_add_batch+0x85/0xa2
    Dec  6 14:14:50 NightOwl kernel: ? __seccomp_filter+0x89/0x313
    Dec  6 14:14:50 NightOwl kernel: ? __do_munmap+0x2ca/0x2e2
    Dec  6 14:14:50 NightOwl kernel: __x64_sys_madvise+0x28/0x2f
    Dec  6 14:14:50 NightOwl kernel: do_syscall_64+0x68/0x81
    Dec  6 14:14:50 NightOwl kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
    Dec  6 14:14:50 NightOwl kernel: RIP: 0033:0x14e38eddd6e7
    Dec  6 14:14:50 NightOwl kernel: Code: ff ff ff ff c3 66 0f 1f 44 00 00 48 8b 15 a1 87 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 b8 1c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 79 87 0d 00 f7 d8 64 89 01 48
    Dec  6 14:14:50 NightOwl kernel: RSP: 002b:000014e3875f9998 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
    Dec  6 14:14:50 NightOwl kernel: RAX: ffffffffffffffda RBX: 00000000003a7180 RCX: 000014e38eddd6e7
    Dec  6 14:14:50 NightOwl kernel: RDX: 0000000000000004 RSI: 0000000000387000 RDI: 000014e354170000
    Dec  6 14:14:50 NightOwl kernel: RBP: 000014e354000020 R08: 0000000000000007 R09: 0000000000000000
    Dec  6 14:14:50 NightOwl kernel: R10: 000014e35406c4e0 R11: 0000000000000246 R12: 0000000000387000
    Dec  6 14:14:50 NightOwl kernel: R13: 0000000000170000 R14: 000014e354000000 R15: 000014e35414fe80
    Dec  6 14:14:50 NightOwl kernel: </TASK>
    Dec  6 14:14:50 NightOwl kernel: Modules linked in: xt_mark xt_nat veth xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod efivarfs wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls igb x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ast aesni_intel drm_vram_helper crypto_simd drm_ttm_helper ttm cryptd rapl intel_cstate intel_uncore ipmi_ssif drm_kms_helper i2c_i801 drm i2c_smbus nvme agpgart i2c_algo_bit syscopyarea sysfillrect sysimgblt fb_sys_fops mpt3sas i2c_core nvme_core ahci libahci raid_class
    Dec  6 14:14:50 NightOwl kernel: intel_pch_thermal joydev input_leds led_class scsi_transport_sas thermal fan acpi_ipmi ipmi_si video button backlight acpi_power_meter acpi_pad unix [last unloaded: igb]
    Dec  6 14:14:50 NightOwl kernel: ---[ end trace 0000000000000000 ]---
    Dec  6 14:14:52 NightOwl kernel: RIP: 0010:__split_huge_pmd+0x565/0x6ab
    Dec  6 14:14:52 NightOwl kernel: Code: 00 48 8b 54 24 40 48 8b 74 24 48 48 0f 45 cf bf 11 ff ff 01 48 c1 e7 27 48 01 d7 48 21 c8 48 01 c7 48 f7 07 9f ff ff ff 74 02 <0f> 0b e8 db cf ff ff 83 7c 24 10 00 75 06 f0 41 ff 44 24 30 49 81
    Dec  6 14:14:52 NightOwl kernel: RSP: 0018:ffffc90003defbf8 EFLAGS: 00010282
    Dec  6 14:14:52 NightOwl kernel: RAX: 000000043596c000 RBX: ffff888186d4f9c0 RCX: 000ffffffffff000
    Dec  6 14:14:52 NightOwl kernel: RDX: 0000000000000a98 RSI: 800000044d753027 RDI: ffff88843596ca98
    Dec  6 14:14:52 NightOwl kernel: RBP: ffffea0011358000 R08: 0000000000000000 R09: 0000000000000000
    Dec  6 14:14:52 NightOwl kernel: R10: 0000000000000003 R11: 0000000000000000 R12: ffffea001135d4c0
    Dec  6 14:14:52 NightOwl kernel: R13: ffff888104b0a510 R14: 00000000ff800001 R15: 000014e354553000
    Dec  6 14:14:52 NightOwl kernel: FS:  000014e3875fa700(0000) GS:ffff888857a80000(0000) knlGS:0000000000000000
    Dec  6 14:14:52 NightOwl kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Dec  6 14:14:52 NightOwl kernel: CR2: 0000149b96961000 CR3: 00000001d682e001 CR4: 00000000003706e0
    Dec  6 14:14:52 NightOwl kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Dec  6 14:14:52 NightOwl kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Dec  6 14:14:52 NightOwl kernel: note: rocket-worker-t[30090] exited with preempt_count 1
    Dec  6 14:14:52 NightOwl kernel: ------------[ cut here ]------------
    Dec  6 14:14:52 NightOwl kernel: WARNING: CPU: 1 PID: 30090 at kernel/exit.c:741 do_exit+0x39/0x8e5
    Dec  6 14:14:52 NightOwl kernel: Modules linked in: xt_mark xt_nat veth xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod efivarfs wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls igb x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ast aesni_intel drm_vram_helper crypto_simd drm_ttm_helper ttm cryptd rapl intel_cstate intel_uncore ipmi_ssif drm_kms_helper i2c_i801 drm i2c_smbus nvme agpgart i2c_algo_bit syscopyarea sysfillrect sysimgblt fb_sys_fops mpt3sas i2c_core nvme_core ahci libahci raid_class
    Dec  6 14:14:52 NightOwl kernel: intel_pch_thermal joydev input_leds led_class scsi_transport_sas thermal fan acpi_ipmi ipmi_si video button backlight acpi_power_meter acpi_pad unix [last unloaded: igb]
    Dec  6 14:14:52 NightOwl kernel: CPU: 1 PID: 30090 Comm: rocket-worker-t Tainted: G      D W         5.19.17-Unraid #2
    Dec  6 14:14:52 NightOwl kernel: Hardware name: Supermicro Super Server/X11SSM-F, BIOS 2.3 11/26/2019
    Dec  6 14:14:52 NightOwl kernel: RIP: 0010:do_exit+0x39/0x8e5
    Dec  6 14:14:52 NightOwl kernel: Code: 89 fd 53 48 83 ec 28 65 48 8b 04 25 28 00 00 00 48 89 44 24 20 31 c0 65 48 8b 1c 25 c0 bb 01 00 48 83 bb a0 07 00 00 00 74 02 <0f> 0b 48 8b bb c8 06 00 00 e8 b7 c0 7c 00 48 8b 83 c0 06 00 00 83
    Dec  6 14:14:52 NightOwl kernel: RSP: 0018:ffffc90003defee0 EFLAGS: 00010282
    Dec  6 14:14:52 NightOwl kernel: RAX: 0000000000000000 RBX: ffff8881065f0000 RCX: 0000000000000000
    Dec  6 14:14:52 NightOwl kernel: RDX: 0000000000000001 RSI: 0000000000000003 RDI: 000000000000000b
    Dec  6 14:14:52 NightOwl kernel: RBP: 000000000000000b R08: 0000000000000000 R09: ffffc9000117a020
    Dec  6 14:14:52 NightOwl kernel: R10: 0000000000aaaaaa R11: 0000000000000001 R12: ffffc90003defb48
    Dec  6 14:14:52 NightOwl kernel: R13: ffff8881065f0000 R14: 0000000000000002 R15: ffffffff820b236d
    Dec  6 14:14:52 NightOwl kernel: FS:  000014e3875fa700(0000) GS:ffff888857a80000(0000) knlGS:0000000000000000
    Dec  6 14:14:52 NightOwl kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Dec  6 14:14:52 NightOwl kernel: CR2: 0000149b96961000 CR3: 00000001d682e001 CR4: 00000000003706e0
    Dec  6 14:14:52 NightOwl kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Dec  6 14:14:52 NightOwl kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Dec  6 14:14:52 NightOwl kernel: Call Trace:
    Dec  6 14:14:52 NightOwl kernel: <TASK>
    Dec  6 14:14:52 NightOwl kernel: make_task_dead+0xba/0xba
    Dec  6 14:14:52 NightOwl kernel: rewind_stack_and_make_dead+0x17/0x17
    Dec  6 14:14:52 NightOwl kernel: RIP: 0033:0x14e38eddd6e7
    Dec  6 14:14:52 NightOwl kernel: Code: ff ff ff ff c3 66 0f 1f 44 00 00 48 8b 15 a1 87 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 b8 1c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 79 87 0d 00 f7 d8 64 89 01 48
    Dec  6 14:14:52 NightOwl kernel: RSP: 002b:000014e3875f9998 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
    Dec  6 14:14:52 NightOwl kernel: RAX: ffffffffffffffda RBX: 00000000003a7180 RCX: 000014e38eddd6e7
    Dec  6 14:14:52 NightOwl kernel: RDX: 0000000000000004 RSI: 0000000000387000 RDI: 000014e354170000
    Dec  6 14:14:52 NightOwl kernel: RBP: 000014e354000020 R08: 0000000000000007 R09: 0000000000000000
    Dec  6 14:14:52 NightOwl kernel: R10: 000014e35406c4e0 R11: 0000000000000246 R12: 0000000000387000
    Dec  6 14:14:52 NightOwl kernel: R13: 0000000000170000 R14: 000014e354000000 R15: 000014e35414fe80
    Dec  6 14:14:52 NightOwl kernel: </TASK>
    Dec  6 14:14:52 NightOwl kernel: ---[ end trace 0000000000000000 ]---
    Dec  6 16:17:35 NightOwl kernel: BUG: Bad rss-counter state mm:00000000f7646224 type:MM_SHMEMPAGES val:1

     

    ====

     

    The Unraid server becomes unresponsive and the machine has to be force-rebooted. It crashes at least once a week.

     

    See attached logs. 

    nightowl-diagnostics-20221206-1659.zip unraid.crash.log
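For triage, the identifying lines of a trace like the one above can be pulled out with a short script. This is only a sketch: the function name `summarize_oops` and the regular expressions are assumptions based on the log format shown in this report, not part of any Unraid tooling, and the sample lines are copied from the log above.

```python
import re

# Patterns for the lines that usually identify where a kernel oops happened.
BUG_AT = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
RIP = re.compile(r"RIP: 0010:(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")
COMM = re.compile(r"PID: (?P<pid>\d+) Comm: (?P<comm>\S+)")
RSS = re.compile(r"Bad rss-counter state mm:\S+ type:(?P<type>\w+) val:(?P<val>-?\d+)")


def summarize_oops(log_text):
    """Collect BUG locations, kernel-mode RIP symbols, tasks and rss-counter notes."""
    summary = {"bug_at": [], "rip": [], "tasks": [], "rss": []}

    def add_unique(key, value):
        # The kernel repeats register dumps, so skip values already recorded.
        if value not in summary[key]:
            summary[key].append(value)

    for line in log_text.splitlines():
        if (m := BUG_AT.search(line)):
            add_unique("bug_at", (m["file"], int(m["line"])))
        if (m := RIP.search(line)):
            add_unique("rip", m["symbol"])
        if (m := COMM.search(line)):
            add_unique("tasks", (int(m["pid"]), m["comm"]))
        if (m := RSS.search(line)):
            add_unique("rss", (m["type"], int(m["val"])))
    return summary


# A few lines copied from the log in this report.
SAMPLE = """\
Dec  6 14:14:50 NightOwl kernel: kernel BUG at mm/huge_memory.c:2154!
Dec  6 14:14:50 NightOwl kernel: CPU: 1 PID: 30090 Comm: rocket-worker-t Tainted: G        W         5.19.17-Unraid #2
Dec  6 14:14:50 NightOwl kernel: RIP: 0010:__split_huge_pmd+0x565/0x6ab
Dec  6 14:14:52 NightOwl kernel: RIP: 0010:do_exit+0x39/0x8e5
Dec  6 16:17:35 NightOwl kernel: BUG: Bad rss-counter state mm:00000000f7646224 type:MM_SHMEMPAGES val:1
"""

summary = summarize_oops(SAMPLE)
for key, values in summary.items():
    print(key, values)
```

Run against the full syslog, this should make it easier to compare crashes from different days and see whether they all hit the same function and the same task.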




    User Feedback

    Recommended Comments

    The call trace doesn't give many clues, at least not to me. If you can, boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it could be a hardware problem; if it doesn't, start turning the other services back on one by one.

    1 hour ago, JorgeB said:

    The call trace doesn't give many clues, at least not to me. If you can, boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it could be a hardware problem; if it doesn't, start turning the other services back on one by one.

    All VMs are disabled; only Docker containers are running. I cannot stop all the containers and wait a couple of weeks to observe, since I rely on a lot of self-hosted services.

     

    That is the dilemma: I cannot pinpoint the root cause.


    As the bug appears to originate within huge_memory.c, would it be worth turning off transparent huge pages (THP) and seeing whether that cures the issue? There seem to be a lot of issues with huge pages at the moment, and the kernel forums show unresolved issues with splitting pages, particularly with Docker, which also uses huge pages if they are enabled.

     

    See this post for details on turning off THP - https://forums.unraid.net/bug-reports/stable-releases/crashes-since-updating-to-v611x-for-qbittorrent-and-deluge-users-r2153/?do=findComment&comment=21761

     

    Alternatively you could add transparent_hugepage=never to the end of your flash drive syslinux config (Main -> Flash -> syslinux configuration -> Unraid OS -> add to end of append statement 'append initrd=/bzroot transparent_hugepage=never')
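To confirm which THP mode is actually in effect after a reboot, the kernel's sysfs file can be read; the active mode is the one shown in square brackets. The snippet below is a minimal sketch that assumes the standard sysfs location `/sys/kernel/mm/transparent_hugepage/enabled`; the helper names `active_thp_mode` and `read_thp_mode` are made up for illustration.

```python
from pathlib import Path

# Standard sysfs location for the THP setting (assumed to exist on this kernel).
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text):
    """Return the bracketed (active) mode from a line such as 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_thp_mode(path=THP_ENABLED):
    """Read the sysfs file; return None if it is missing or unreadable."""
    try:
        return active_thp_mode(path.read_text())
    except OSError:
        return None


# Prints the active mode on a Linux host, or None where the file is unavailable.
print("THP mode:", read_thp_mode())
```

If `transparent_hugepage=never` was picked up from the append line, the reported mode should be `never`.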

    Edited by Bebbo
    1 hour ago, Bebbo said:

    As the bug appears to originate within huge_memory.c, would it be worth turning off transparent huge pages (THP) and seeing whether that cures the issue? There seem to be a lot of issues with huge pages at the moment, and the kernel forums show unresolved issues with splitting pages, particularly with Docker, which also uses huge pages if they are enabled.

     

    See this post for details on turning off THP - https://forums.unraid.net/bug-reports/stable-releases/crashes-since-updating-to-v611x-for-qbittorrent-and-deluge-users-r2153/?do=findComment&comment=21761

     

    Alternatively you could add transparent_hugepage=never to the end of your flash drive syslinux config (Main -> Flash -> syslinux configuration -> Unraid OS -> add to end of append statement 'append initrd=/bzroot transparent_hugepage=never')

     

    It is currently disabled (see the attached screenshot, hub.pages.disabled.png).

     




