blub3k Posted October 24, 2021 (edited)

Hey,

I have some problems with random freezes / kernel panics on my (kind of) old hardware.

Hardware
CPU: Intel® Xeon® CPU L3426 @ 1.87GHz
MB: TYAN S5502 (latest BIOS)

I have some Docker containers running. I have already uninstalled the CoreFreq plugin.

syslog:

Oct 24 11:57:27 Tower kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Oct 24 11:57:27 Tower kernel: rcu: 1-...0: (1 GPs behind) idle=986/1/0x4000000000000000 softirq=33050598/33050599 fqs=14997
Oct 24 11:57:27 Tower kernel: (detected by 7, t=60002 jiffies, g=77560113, q=2994)
Oct 24 11:57:27 Tower kernel: Sending NMI from CPU 7 to CPUs 1:
Oct 24 11:57:27 Tower kernel: NMI backtrace for cpu 1
Oct 24 11:57:27 Tower kernel: CPU: 1 PID: 25887 Comm: sh Tainted: G O 5.10.28-Unraid #1
Oct 24 11:57:27 Tower kernel: Hardware name: empty empty/S5502, BIOS 'V1.03 ' 05/02/2011
Oct 24 11:57:27 Tower kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x18a
Oct 24 11:57:27 Tower kernel: Code: c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 74 0c 0f ba e0 08 72 1a c6 47 01 00 eb 14 85 c0 74 0a 8b 07 84 c0 74 04 f3 90 <eb> f6 66 c7 07 01 00 c3 48 c7 c0 00 30 02 00 65 48 03 05 f0 8e f8
Oct 24 11:57:27 Tower kernel: RSP: 0018:ffffc900000e0e60 EFLAGS: 00000002
Oct 24 11:57:27 Tower kernel: RAX: 0000000000040101 RBX: ffffc900000e0e88 RCX: 0000000000000000
Oct 24 11:57:27 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8881000b9540
Oct 24 11:57:27 Tower kernel: RBP: 0000000000000001 R08: ffffffff82013888 R09: ffffffff82013580
Oct 24 11:57:27 Tower kernel: R10: 0003e41dce43e828 R11: 0000167e7f20a000 R12: 0000ad8919ca0000
Oct 24 11:57:27 Tower kernel: R13: 0000000000000000 R14: 0000bf1345f81f01 R15: 00002b4a148a6555
Oct 24 11:57:27 Tower kernel: FS: 0000146ef9c98740(0000) GS:ffff888237c40000(0000) knlGS:0000000000000000
Oct 24 11:57:27 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 24 11:57:27 Tower kernel: CR2: 0000146ef9d1be40 CR3: 000000019fd2e000 CR4: 00000000000006e0
Oct 24 11:57:27 Tower kernel: Call Trace:
Oct 24 11:57:27 Tower kernel: <IRQ>
Oct 24 11:57:27 Tower kernel: queued_spin_lock_slowpath+0x7/0xa
Oct 24 11:57:27 Tower kernel: nr_blockdev_pages+0x13/0x64
Oct 24 11:57:27 Tower kernel: si_meminfo+0x3a/0x57
Oct 24 11:57:27 Tower kernel: Sys_MemInfo+0x20/0x9b [corefreqk]
Oct 24 11:57:27 Tower kernel: ? dbs_update_util_handler+0x11/0x74
Oct 24 11:57:27 Tower kernel: ? recalibrate_cpu_khz+0x1/0x1
Oct 24 11:57:27 Tower kernel: ? timekeeping_get_ns+0x19/0x2f
Oct 24 11:57:27 Tower kernel: ? Sys_DumpTask+0xe9/0xf1 [corefreqk]
Oct 24 11:57:27 Tower kernel: Cycle_Nehalem+0x3a2/0x56f [corefreqk]
Oct 24 11:57:27 Tower kernel: __hrtimer_run_queues+0xb7/0x10b
Oct 24 11:57:27 Tower kernel: ? Cycle_Silvermont+0x691/0x691 [corefreqk]
Oct 24 11:57:27 Tower kernel: hrtimer_interrupt+0x8d/0x15b
Oct 24 11:57:27 Tower kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Oct 24 11:57:27 Tower kernel: asm_call_irq_on_stack+0x12/0x20
Oct 24 11:57:27 Tower kernel: </IRQ>
Oct 24 11:57:27 Tower kernel: sysvec_apic_timer_interrupt+0x71/0x95
Oct 24 11:57:27 Tower kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Oct 24 11:57:27 Tower kernel: RIP: 0010:nr_blockdev_pages+0x31/0x64
Oct 24 11:57:27 Tower kernel: Code: 48 8d b8 40 05 00 00 e8 73 6d 53 00 48 8b 3d 9f b5 f9 00 45 31 c0 48 8b 87 48 05 00 00 48 8d 97 48 05 00 00 48 2d 10 01 00 00 <48> 8d 88 10 01 00 00 48 39 d1 74 17 48 8b 48 30 48 8b 80 10 01 00
Oct 24 11:57:27 Tower kernel: RSP: 0018:ffffc900013bbe78 EFLAGS: 00000286
Oct 24 11:57:27 Tower kernel: RAX: ffff888100469eb8 RBX: ffffc900013bbec0 RCX: ffff888100446928
Oct 24 11:57:27 Tower kernel: RDX: ffff8881000b9548 RSI: 00000000000041c0 RDI: ffff8881000b9000
Oct 24 11:57:27 Tower kernel: RBP: ffffc900013bbeb0 R08: 0000000000000000 R09: 0000000000000000
Oct 24 11:57:27 Tower kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffc900013bbec0
Oct 24 11:57:27 Tower kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Oct 24 11:57:27 Tower kernel: ? nr_blockdev_pages+0x13/0x64
Oct 24 11:57:27 Tower kernel: si_meminfo+0x3a/0x57
Oct 24 11:57:27 Tower kernel: do_sysinfo+0x95/0x12e
Oct 24 11:57:27 Tower kernel: __do_sys_sysinfo+0x20/0x59
Oct 24 11:57:27 Tower kernel: do_syscall_64+0x5d/0x6a
Oct 24 11:57:27 Tower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 24 11:57:27 Tower kernel: RIP: 0033:0x146ef9db2307
Oct 24 11:57:27 Tower kernel: Code: f0 ff ff 73 01 c3 48 8b 0d 86 7b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 63 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 59 7b 0c 00 f7 d8 64 89 01 48
Oct 24 11:57:27 Tower kernel: RSP: 002b:00007fff040d7258 EFLAGS: 00000206 ORIG_RAX: 0000000000000063
Oct 24 11:57:27 Tower kernel: RAX: ffffffffffffffda RBX: 000000000048bdf0 RCX: 0000146ef9db2307
Oct 24 11:57:27 Tower kernel: RDX: 0000146ef9e3d8d8 RSI: 000000000000004c RDI: 00007fff040d7260
Oct 24 11:57:27 Tower kernel: RBP: 0000000000000055 R08: 0000000000000000 R09: fffffffffffffc00
Oct 24 11:57:27 Tower kernel: R10: 0000000000418012 R11: 0000000000000206 R12: 00000000000004f0
Oct 24 11:57:27 Tower kernel: R13: 00000000004d8504 R14: 0000000000000000 R15: 0000000000000030

Before this, I had some issues with my HBA, but after changing the PCIe slot those issues went away.

Is this some kind of hardware problem, or could it be a software/OS/kernel bug? It is always this "sh" process, which I can't identify.

tower-diagnostics-20211024-2116.zip

Edited October 24, 2021 by blub3k: diagnostics added
JorgeB Posted October 25, 2021

13 hours ago, blub3k said: [corefreqk]

Try uninstalling the corefreq plugin.
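For readers following along: the hint comes from the bracketed module name that the kernel appends to stack frames belonging to a loadable module. A minimal sketch of how one might pull those names out of a saved trace is below. The function name `modules_in_trace` and the `sample` excerpt are illustrative only (the sample lines are copied from the log above), and the regular expression is an assumption about the frame layout seen in this particular log, not a general kernel log parser.

```python
import re
from collections import Counter

# Matches a stack frame such as "Sys_MemInfo+0x20/0x9b [corefreqk]" and
# captures the module name inside the square brackets.
FRAME_WITH_MODULE = re.compile(r"\S+\+0x[0-9a-f]+/0x[0-9a-f]+ \[([^\]]+)\]")


def modules_in_trace(log_text):
    """Count how many stack frames in log_text name each loadable module."""
    return Counter(FRAME_WITH_MODULE.findall(log_text))


# Excerpt of the trace posted above (hypothetical variable name).
sample = """\
Oct 24 11:57:27 Tower kernel: si_meminfo+0x3a/0x57
Oct 24 11:57:27 Tower kernel: Sys_MemInfo+0x20/0x9b [corefreqk]
Oct 24 11:57:27 Tower kernel: ? Sys_DumpTask+0xe9/0xf1 [corefreqk]
Oct 24 11:57:27 Tower kernel: Cycle_Nehalem+0x3a2/0x56f [corefreqk]
Oct 24 11:57:27 Tower kernel: __hrtimer_run_queues+0xb7/0x10b
"""

for module, frames in modules_in_trace(sample).items():
    print(f"{module}: {frames} frame(s)")
```

Frames without a bracketed suffix belong to the core kernel image, so a module that shows up repeatedly inside the interrupt section of a stall trace is a reasonable first suspect, though not proof of the cause.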
blub3k (Author) Posted October 26, 2021

Hmm, I did not notice that. I uninstalled the plugin before but have not rebooted the system since. I'll wait and see what happens now...

Uptime 2 days, 6 hours, 40 minutes
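Removing a plugin does not necessarily unload a kernel module that is already running, which would explain `corefreqk` frames appearing in a trace after the uninstall. As a hedged sketch, one way to check without rebooting is to look for the module in `/proc/modules`, where each line starts with the module name. The helper name `module_loaded` is made up for this example, and the module name `corefreqk` is taken from the trace above.

```python
from pathlib import Path


def module_loaded(name, modules_text):
    """Return True if name is the first field of any /proc/modules line."""
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


# /proc/modules only exists on Linux, so guard the read.
proc_modules = Path("/proc/modules")
if proc_modules.exists():
    loaded = module_loaded("corefreqk", proc_modules.read_text())
    print("corefreqk still loaded" if loaded else "corefreqk not loaded")
```

If the module is still listed, a reboot (or unloading it manually) would be needed before the uninstall can be expected to have any effect on the stalls.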
CyrIng Posted October 27, 2021

Hello,

Apologies for that crash, but this plugin embeds an outdated version of CoreFreq. Please check with the UNRAID CoreFreq plugin author for a refresh.

The vanilla CoreFreq project develops fixes for Xeon families as soon as I am informed through GitHub issues: https://github.com/cyring/CoreFreq/issues

Feel free to contact me through a complete and detailed issue, titled with your processor brand string, with the issue body containing, in Markdown, the kernel crash log and the Daemon and Client outputs, such as the output of option "-s".

It usually doesn't take me much time to find the source of the issue and to provide a fix on the development branch for you to test. Recently I have provided fixes for Server processors.

You can first try the develop branch at: https://github.com/cyring/CoreFreq/tree/develop

Regards
CyrIng
blub3k (Author) Posted November 14, 2021

21 days uptime now. I think a reboot after uninstalling the CoreFreq plugin solved the problem.

Thanks @CyrIng for reaching out to me, but I currently don't have time to diagnose the problem any further. Sorry.