  • v6.12.3 CPU runs 100%. Hard reset required


    high-level-fudge7319
    • Urgent

    This has happened 3 times in the 2 weeks since upgrading to 6.12.3. Before the upgrade the server had run with 6 months of uptime.

    Symptoms :

    I run PiHole in a Docker container. The first indication is that DNS entries stop resolving as this container crashes.


    If I'm quick I can get onto the WebUI and see 1 CPU core at 100% (Intel® Xeon® CPU E3-1226 v3 @ 3.30GHz, 4 cores).

    As soon as I try to stop the array, etc., the rest of the cores follow until the server becomes unresponsive and requires a hard reset.
     

    I managed to retrieve a log for the latest crash, as the log console responded before the system started running hot.

    I do have the Syslog server enabled and a share created to store the logs, but no files have been logged to that share. This is what the log console contained as the system began to fail.

     

    -------------------------------

    Aug 23 16:14:33 Tower kernel: Workqueue: events bpf_prog_free_deferred
    Aug 23 16:14:33 Tower kernel: RIP: 0010:free_vmap_area_noflush+0x222/0x285
    Aug 23 16:14:33 Tower kernel: Code: 46 2e 82 31 c0 48 89 73 10 4c 89 ff 48 c7 c6 b8 88 97 82 48 89 43 18 48 89 43 20 4c 89 39 e8 e0 71 65 00 49 8b 16 48 8d 43 28 <48> 89 42 08 48 89 53 28 4c 89 73 30 49 89 06 48 c7 c7 b0 88 97 82
    Aug 23 16:14:33 Tower kernel: RSP: 0018:ffffc90013be7e20 EFLAGS: 00010202
    Aug 23 16:14:33 Tower kernel: RAX: ffff8881a7d98f68 RBX: ffff8881a7d98f40 RCX: ffff888158be1a68
    Aug 23 16:14:33 Tower kernel: RDX: 0000000000000000 RSI: ffff888158be1a60 RDI: ffff8882c2a25b91
    Aug 23 16:14:33 Tower kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff829788b8
    Aug 23 16:14:33 Tower kernel: R10: ffff8881a7c88800 R11: fefefefefefefeff R12: 0000000000000000
    Aug 23 16:14:33 Tower kernel: R13: 0000000000004833 R14: ffff888158be1a78 R15: ffff8881a7d98f50
    Aug 23 16:14:33 Tower kernel: FS:  0000000000000000(0000) GS:ffff88880dd00000(0000) knlGS:0000000000000000
    Aug 23 16:14:33 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Aug 23 16:14:33 Tower kernel: CR2: 0000000000000008 CR3: 000000000220a003 CR4: 00000000001706e0
    Aug 23 16:14:33 Tower kernel: Call Trace:
    Aug 23 16:14:33 Tower kernel: <TASK>
    Aug 23 16:14:33 Tower kernel: ? __die_body+0x1a/0x5c
    Aug 23 16:14:33 Tower kernel: ? page_fault_oops+0x329/0x376
    Aug 23 16:14:33 Tower kernel: ? fixup_exception+0x22/0x24b
    Aug 23 16:14:33 Tower kernel: ? exc_page_fault+0xfb/0x11d
    Aug 23 16:14:33 Tower kernel: ? asm_exc_page_fault+0x22/0x30
    Aug 23 16:14:33 Tower kernel: ? free_vmap_area_noflush+0x222/0x285
    Aug 23 16:14:33 Tower kernel: remove_vm_area+0x5a/0x74
    Aug 23 16:14:33 Tower kernel: __vunmap+0x7c/0x1a3
    Aug 23 16:14:33 Tower kernel: process_one_work+0x1ab/0x295
    Aug 23 16:14:33 Tower kernel: worker_thread+0x18b/0x244
    Aug 23 16:14:33 Tower kernel: ? rescuer_thread+0x281/0x281
    Aug 23 16:14:33 Tower kernel: kthread+0xe7/0xef
    Aug 23 16:14:33 Tower kernel: ? kthread_complete_and_exit+0x1b/0x1b
    Aug 23 16:14:33 Tower kernel: ret_from_fork+0x22/0x30
    Aug 23 16:14:33 Tower kernel: </TASK>
    Aug 23 16:14:33 Tower kernel: Modules linked in: udp_diag xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap macvlan veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls amdgpu gpu_sched drm_buddy intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm radeon crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 video sr_mod i2c_algo_bit aesni_intel cdrom drm_ttm_helper crypto_simd ttm cryptd drm_display_helper rapl mei_hdcp mei_pxp wmi_bmof intel_cstate drm_kms_helper tpm_infineon intel_uncore ahci drm i2c_i801 i2c_smbus backlight agpgart
    Aug 23 16:14:33 Tower kernel: libahci e1000e i2c_core mei_me syscopyarea sysfillrect mei tpm_tis sysimgblt tpm_tis_core fb_sys_fops fan thermal wmi tpm button unix
    Aug 23 16:14:33 Tower kernel: CR2: 0000000000000008
    Aug 23 16:14:33 Tower kernel: ---[ end trace 0000000000000000 ]---
    Aug 23 16:14:33 Tower kernel: RIP: 0010:free_vmap_area_noflush+0x222/0x285
    Aug 23 16:14:33 Tower kernel: Code: 46 2e 82 31 c0 48 89 73 10 4c 89 ff 48 c7 c6 b8 88 97 82 48 89 43 18 48 89 43 20 4c 89 39 e8 e0 71 65 00 49 8b 16 48 8d 43 28 <48> 89 42 08 48 89 53 28 4c 89 73 30 49 89 06 48 c7 c7 b0 88 97 82
    Aug 23 16:14:33 Tower kernel: RSP: 0018:ffffc90013be7e20 EFLAGS: 00010202
    Aug 23 16:14:33 Tower kernel: RAX: ffff8881a7d98f68 RBX: ffff8881a7d98f40 RCX: ffff888158be1a68
    Aug 23 16:14:33 Tower kernel: RDX: 0000000000000000 RSI: ffff888158be1a60 RDI: ffff8882c2a25b91
    Aug 23 16:14:33 Tower kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff829788b8
    Aug 23 16:14:33 Tower kernel: R10: ffff8881a7c88800 R11: fefefefefefefeff R12: 0000000000000000
    Aug 23 16:14:33 Tower kernel: R13: 0000000000004833 R14: ffff888158be1a78 R15: ffff8881a7d98f50
    Aug 23 16:14:33 Tower kernel: FS:  0000000000000000(0000) GS:ffff88880dd00000(0000) knlGS:0000000000000000
    Aug 23 16:14:33 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Aug 23 16:14:33 Tower kernel: CR2: 0000000000000008 CR3: 000000000220a003 CR4: 00000000001706e0
    Aug 23 16:14:33 Tower kernel: note: kworker/2:2[12330] exited with irqs disabled
    Aug 23 16:14:33 Tower kernel: note: kworker/2:2[12330] exited with preempt_count 1
    Aug 23 16:15:33 Tower kernel: rcu: INFO: rcu_preempt self-detected stall on CPU
    Aug 23 16:15:33 Tower kernel: rcu:      1-....: (60000 ticks this GP) idle=435c/1/0x4000000000000000 softirq=4661696/4661734 fqs=18620
    Aug 23 16:15:33 Tower kernel:   (t=60001 jiffies g=7201641 q=86503 ncpus=4)
    Aug 23 16:15:33 Tower kernel: CPU: 1 PID: 5507 Comm: kworker/1:1 Tainted: P      D W  O       6.1.38-Unraid #2
    Aug 23 16:15:33 Tower kernel: Hardware name: Hewlett-Packard HP Z230 Tower Workstation/1905, BIOS L51 v01.63 04/22/2020
    Aug 23 16:15:33 Tower kernel: Workqueue: events free_work
    Aug 23 16:15:33 Tower kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x86/0x1cf
    Aug 23 16:15:33 Tower kernel: Code: c2 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 76 0c 0f ba e0 08 72 1e c6 43 01 00 eb 18 85 c0 74 0a 8b 03 84 c0 74 04 f3 90 <eb> f6 66 c7 03 01 00 e9 32 01 00 00 e8 4a 3f ff ff 49 c7 c4 80 e1
    Aug 23 16:15:33 Tower kernel: RSP: 0018:ffffc90007c83de0 EFLAGS: 00000202
    Aug 23 16:15:33 Tower kernel: RAX: 0000000000000101 RBX: ffffffff829788b0 RCX: ffff88823a32c451
    Aug 23 16:15:33 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff829788b0
    Aug 23 16:15:33 Tower kernel: RBP: ffff8881031d0500 R08: ffffffff829788c0 R09: 0000000000000000
    Aug 23 16:15:33 Tower kernel: R10: 8080808080808080 R11: fefefefefefefeff R12: 0000000000000000
    Aug 23 16:15:33 Tower kernel: R13: 0000000000004838 R14: 0000000000000000 R15: ffff88815783a7d0
    Aug 23 16:15:33 Tower kernel: FS:  0000000000000000(0000) GS:ffff88880dc80000(0000) knlGS:0000000000000000
    Aug 23 16:15:33 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Aug 23 16:15:33 Tower kernel: CR2: 000000c0001a3000 CR3: 000000013ef56001 CR4: 00000000001706e0
    Aug 23 16:15:33 Tower kernel: Call Trace:
    Aug 23 16:15:33 Tower kernel: <IRQ>
    Aug 23 16:15:33 Tower kernel: ? rcu_dump_cpu_stacks+0x95/0xb9
    Aug 23 16:15:33 Tower kernel: ? rcu_sched_clock_irq+0x337/0xa37
    Aug 23 16:15:33 Tower kernel: ? tick_init_jiffy_update+0x7c/0x7c
    Aug 23 16:15:33 Tower kernel: ? update_process_times+0x62/0x81
    Aug 23 16:15:33 Tower kernel: ? tick_sched_timer+0x43/0x71
    Aug 23 16:15:33 Tower kernel: ? __hrtimer_run_queues+0xeb/0x190
    Aug 23 16:15:33 Tower kernel: ? hrtimer_interrupt+0x9c/0x16e
    Aug 23 16:15:33 Tower kernel: ? __sysvec_apic_timer_interrupt+0xc5/0x12f
    Aug 23 16:15:33 Tower kernel: ? sysvec_apic_timer_interrupt+0x80/0xa6
    Aug 23 16:15:33 Tower kernel: </IRQ>
    Aug 23 16:15:33 Tower kernel: <TASK>
    Aug 23 16:15:33 Tower kernel: ? asm_sysvec_apic_timer_interrupt+0x16/0x20
    Aug 23 16:15:33 Tower kernel: ? native_queued_spin_lock_slowpath+0x86/0x1cf
    Aug 23 16:15:33 Tower kernel: do_raw_spin_lock+0x14/0x1a
    Aug 23 16:15:33 Tower kernel: free_vmap_area_noflush+0x7a/0x285
    Aug 23 16:15:33 Tower kernel: remove_vm_area+0x5a/0x74
    Aug 23 16:15:33 Tower kernel: __vunmap+0x7c/0x1a3
    Aug 23 16:15:33 Tower kernel: free_work+0x26/0x34
    Aug 23 16:15:33 Tower kernel: process_one_work+0x1ab/0x295
    Aug 23 16:15:33 Tower kernel: worker_thread+0x18b/0x244
    Aug 23 16:15:33 Tower kernel: ? rescuer_thread+0x281/0x281
    Aug 23 16:15:33 Tower kernel: kthread+0xe7/0xef
    Aug 23 16:15:33 Tower kernel: ? kthread_complete_and_exit+0x1b/0x1b
    Aug 23 16:15:33 Tower kernel: ret_from_fork+0x22/0x30
    Aug 23 16:15:33 Tower kernel: </TASK>

    ---------------------------
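
A reading of the first oops, offered as a sketch rather than a diagnosis: in a kernel `Code:` line the byte in angle brackets sits at RIP, and the bytes `<48> 89 42 08` decode on x86-64 to `mov %rax,0x8(%rdx)`, a store to the address RDX + 8. RDX is 0 in the register dump, so the store lands at address 0x8, which is what CR2 reports. The snippet below only redoes that arithmetic on register lines copied from the log; the `parse_registers` helper name is made up for illustration.

```python
import re

# Register lines copied verbatim from the first oops in the log above.
OOPS_LINES = """\
Aug 23 16:14:33 Tower kernel: RAX: ffff8881a7d98f68 RBX: ffff8881a7d98f40 RCX: ffff888158be1a68
Aug 23 16:14:33 Tower kernel: RDX: 0000000000000000 RSI: ffff888158be1a60 RDI: ffff8882c2a25b91
Aug 23 16:14:33 Tower kernel: CR2: 0000000000000008 CR3: 000000000220a003 CR4: 00000000001706e0
"""


def parse_registers(text):
    """Return {register_name: int} for every 'NAME: <16 hex digits>' pair."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


regs = parse_registers(OOPS_LINES)

# '48 89 42 08' is 'mov %rax,0x8(%rdx)': a store to the address RDX + 8.
fault_address = regs["RDX"] + 0x8

print(hex(fault_address))            # address the faulting store targets
print(fault_address == regs["CR2"])  # compare with the page-fault address
```

A pointer-sized store at offset 8 from a NULL base looks like a write through a NULL list pointer, though that is an inference from the bytes, not something confirmed in this report. If it is right, the later RCU stall would fit a second worker spinning on a lock that the crashed `kworker/2:2` never released (it "exited with preempt_count 1"), which would match the core pinned at 100%.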

    tower-diagnostics-20230823-1650.zip





    Aug 23 16:30:19 Tower kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
    Aug 23 16:30:19 Tower kernel: ? _raw_spin_unlock+0x14/0x29
    Aug 23 16:30:19 Tower kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
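
One way to check whether a saved syslog contains the same kind of frames is to scan the kernel lines for call-trace symbols tagged `[macvlan]`. A minimal sketch, assuming the log is plain text with one entry per line; the `find_module_frames` name is made up for illustration, and the sample lines are the three quoted above.

```python
import re

# Matches call-trace frames such as 'macvlan_broadcast+0x10a/0x150 [macvlan]'.
# Frames without a trailing '[module]' tag belong to the core kernel and are skipped.
FRAME_RE = re.compile(r"kernel: (?:\? )?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]")


def find_module_frames(lines, module):
    """Return the function names of trace frames that belong to `module`."""
    hits = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match and match.group(2) == module:
            hits.append(match.group(1))
    return hits


sample = [
    "Aug 23 16:30:19 Tower kernel: macvlan_broadcast+0x10a/0x150 [macvlan]",
    "Aug 23 16:30:19 Tower kernel: ? _raw_spin_unlock+0x14/0x29",
    "Aug 23 16:30:19 Tower kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]",
]

print(find_module_frames(sample, "macvlan"))
```

The same function accepts an open file object in place of `sample`, since it only iterates over lines.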

     

    You can try changing the docker network to ipvlan, or try this release:

     


    To switch to ipvlan it seems I need to stop Docker.

     

    Does this require the following sequence?

    Switching Docker off

    Restart unraid

    Switch to ipvlan

    Re-enable docker 

    Restart unraid 

     

    Or can I stop/start docker from console without restart? 

     

    System is currently in parity check following crash. 

     

    Thanks

    Martin 

    43 minutes ago, JorgeB said:

    You just need to stop the docker service, not the array.

    I've found the following. Is there a way to stop the Docker service from the WebUI, or is it console only?

     

    /etc/rc.d/rc.docker stop


    Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right).

     

    First stop the docker service there.
