
[SOLVED] Random Kernel Panic after several days - old Hardware


blub3k


Hey,

 

I am seeing random freezes / kernel panics on my somewhat old hardware.

 

Hardware

CPU: Intel® Xeon® CPU L3426 @ 1.87GHz

MB: TYAN S5502 (latest BIOS)

 

I have some Docker containers running.

I have already uninstalled the CoreFreq plugin.
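
One way to confirm that the kernel module is no longer loaded after removing the plugin is to look for it in `/proc/modules`. The following is a minimal Python sketch, not part of the plugin or of Unraid. The module name `corefreqk` is taken from the syslog in this post, and the helper names `module_loaded` and `read_proc_modules` are hypothetical.

```python
def module_loaded(name, proc_modules_text):
    """Return True if `name` is the first field of any /proc/modules line."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field is the module name.
        if fields and fields[0] == name:
            return True
    return False


def read_proc_modules(path="/proc/modules"):
    """Read the list of currently loaded modules (Linux only)."""
    with open(path) as f:
        return f.read()
```

On the server itself, `module_loaded("corefreqk", read_proc_modules())` would return `True` while the module is still loaded.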

 

syslog:

Oct 24 11:57:27 Tower kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Oct 24 11:57:27 Tower kernel: rcu: 	1-...0: (1 GPs behind) idle=986/1/0x4000000000000000 softirq=33050598/33050599 fqs=14997 
Oct 24 11:57:27 Tower kernel: 	(detected by 7, t=60002 jiffies, g=77560113, q=2994)
Oct 24 11:57:27 Tower kernel: Sending NMI from CPU 7 to CPUs 1:
Oct 24 11:57:27 Tower kernel: NMI backtrace for cpu 1
Oct 24 11:57:27 Tower kernel: CPU: 1 PID: 25887 Comm: sh Tainted: G           O      5.10.28-Unraid #1
Oct 24 11:57:27 Tower kernel: Hardware name: empty empty/S5502, BIOS 'V1.03    ' 05/02/2011
Oct 24 11:57:27 Tower kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x18a
Oct 24 11:57:27 Tower kernel: Code: c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 74 0c 0f ba e0 08 72 1a c6 47 01 00 eb 14 85 c0 74 0a 8b 07 84 c0 74 04 f3 90 <eb> f6 66 c7 07 01 00 c3 48 c7 c0 00 30 02 00 65 48 03 05 f0 8e f8
Oct 24 11:57:27 Tower kernel: RSP: 0018:ffffc900000e0e60 EFLAGS: 00000002
Oct 24 11:57:27 Tower kernel: RAX: 0000000000040101 RBX: ffffc900000e0e88 RCX: 0000000000000000
Oct 24 11:57:27 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8881000b9540
Oct 24 11:57:27 Tower kernel: RBP: 0000000000000001 R08: ffffffff82013888 R09: ffffffff82013580
Oct 24 11:57:27 Tower kernel: R10: 0003e41dce43e828 R11: 0000167e7f20a000 R12: 0000ad8919ca0000
Oct 24 11:57:27 Tower kernel: R13: 0000000000000000 R14: 0000bf1345f81f01 R15: 00002b4a148a6555
Oct 24 11:57:27 Tower kernel: FS:  0000146ef9c98740(0000) GS:ffff888237c40000(0000) knlGS:0000000000000000
Oct 24 11:57:27 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 24 11:57:27 Tower kernel: CR2: 0000146ef9d1be40 CR3: 000000019fd2e000 CR4: 00000000000006e0
Oct 24 11:57:27 Tower kernel: Call Trace:
Oct 24 11:57:27 Tower kernel: <IRQ>
Oct 24 11:57:27 Tower kernel: queued_spin_lock_slowpath+0x7/0xa
Oct 24 11:57:27 Tower kernel: nr_blockdev_pages+0x13/0x64
Oct 24 11:57:27 Tower kernel: si_meminfo+0x3a/0x57
Oct 24 11:57:27 Tower kernel: Sys_MemInfo+0x20/0x9b [corefreqk]
Oct 24 11:57:27 Tower kernel: ? dbs_update_util_handler+0x11/0x74
Oct 24 11:57:27 Tower kernel: ? recalibrate_cpu_khz+0x1/0x1
Oct 24 11:57:27 Tower kernel: ? timekeeping_get_ns+0x19/0x2f
Oct 24 11:57:27 Tower kernel: ? Sys_DumpTask+0xe9/0xf1 [corefreqk]
Oct 24 11:57:27 Tower kernel: Cycle_Nehalem+0x3a2/0x56f [corefreqk]
Oct 24 11:57:27 Tower kernel: __hrtimer_run_queues+0xb7/0x10b
Oct 24 11:57:27 Tower kernel: ? Cycle_Silvermont+0x691/0x691 [corefreqk]
Oct 24 11:57:27 Tower kernel: hrtimer_interrupt+0x8d/0x15b
Oct 24 11:57:27 Tower kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Oct 24 11:57:27 Tower kernel: asm_call_irq_on_stack+0x12/0x20
Oct 24 11:57:27 Tower kernel: </IRQ>
Oct 24 11:57:27 Tower kernel: sysvec_apic_timer_interrupt+0x71/0x95
Oct 24 11:57:27 Tower kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Oct 24 11:57:27 Tower kernel: RIP: 0010:nr_blockdev_pages+0x31/0x64
Oct 24 11:57:27 Tower kernel: Code: 48 8d b8 40 05 00 00 e8 73 6d 53 00 48 8b 3d 9f b5 f9 00 45 31 c0 48 8b 87 48 05 00 00 48 8d 97 48 05 00 00 48 2d 10 01 00 00 <48> 8d 88 10 01 00 00 48 39 d1 74 17 48 8b 48 30 48 8b 80 10 01 00
Oct 24 11:57:27 Tower kernel: RSP: 0018:ffffc900013bbe78 EFLAGS: 00000286
Oct 24 11:57:27 Tower kernel: RAX: ffff888100469eb8 RBX: ffffc900013bbec0 RCX: ffff888100446928
Oct 24 11:57:27 Tower kernel: RDX: ffff8881000b9548 RSI: 00000000000041c0 RDI: ffff8881000b9000
Oct 24 11:57:27 Tower kernel: RBP: ffffc900013bbeb0 R08: 0000000000000000 R09: 0000000000000000
Oct 24 11:57:27 Tower kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffc900013bbec0
Oct 24 11:57:27 Tower kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Oct 24 11:57:27 Tower kernel: ? nr_blockdev_pages+0x13/0x64
Oct 24 11:57:27 Tower kernel: si_meminfo+0x3a/0x57
Oct 24 11:57:27 Tower kernel: do_sysinfo+0x95/0x12e
Oct 24 11:57:27 Tower kernel: __do_sys_sysinfo+0x20/0x59
Oct 24 11:57:27 Tower kernel: do_syscall_64+0x5d/0x6a
Oct 24 11:57:27 Tower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 24 11:57:27 Tower kernel: RIP: 0033:0x146ef9db2307
Oct 24 11:57:27 Tower kernel: Code: f0 ff ff 73 01 c3 48 8b 0d 86 7b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 63 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 59 7b 0c 00 f7 d8 64 89 01 48
Oct 24 11:57:27 Tower kernel: RSP: 002b:00007fff040d7258 EFLAGS: 00000206 ORIG_RAX: 0000000000000063
Oct 24 11:57:27 Tower kernel: RAX: ffffffffffffffda RBX: 000000000048bdf0 RCX: 0000146ef9db2307
Oct 24 11:57:27 Tower kernel: RDX: 0000146ef9e3d8d8 RSI: 000000000000004c RDI: 00007fff040d7260
Oct 24 11:57:27 Tower kernel: RBP: 0000000000000055 R08: 0000000000000000 R09: fffffffffffffc00
Oct 24 11:57:27 Tower kernel: R10: 0000000000418012 R11: 0000000000000206 R12: 00000000000004f0
Oct 24 11:57:27 Tower kernel: R13: 00000000004d8504 R14: 0000000000000000 R15: 0000000000000030
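
The names in square brackets in the trace above (for example `[corefreqk]`) mark frames that belong to a loadable kernel module rather than to the core kernel. As a rough aid for reading such traces, here is a minimal Python sketch that counts those frames per module. It assumes the syslog text is available as a string, and the function name `modules_in_trace` is hypothetical.

```python
import re
from collections import Counter

# Matches a "symbol+offset/size [module]" frame, e.g.
# "Sys_MemInfo+0x20/0x9b [corefreqk]"
FRAME_RE = re.compile(r"(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]")


def modules_in_trace(log_text):
    """Count call-trace frames per kernel module tag."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```

Frames without a bracketed tag are ignored, so the result only lists modules that actually appear on the stack.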

 

Previously I had some issues with my HBA, but they went away after I moved it to a different PCIe slot.

 

Is this a hardware problem, or could it be a software/OS/kernel bug?

 

The trace always shows this "sh" process, which I can't identify.

tower-diagnostics-20211024-2116.zip

Edited by blub3k
diagnostics added

Hello,

Apologies for that crash, but this plugin embeds an outdated version of CoreFreq.

 

Please check with the Unraid CoreFreq plugin author for a refresh.

 

The vanilla CoreFreq project develops fixes for Xeon families as soon as I am informed through GitHub issues:

https://github.com/cyring/CoreFreq/issues

 

Feel free to contact me through a complete and detailed issue, titled with your processor brand string, with the issue body containing, in Markdown, the kernel crash log and the Daemon and Client outputs, such as option "-s".
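
To make such a report easier to put together, here is a minimal Python sketch that builds the title and a Markdown body. It is only an illustration, not a CoreFreq tool. It assumes the brand string can be read from the `model name` field of `/proc/cpuinfo`, and that the crash log and the Daemon and Client outputs have already been collected as text. The names `brand_string` and `issue_body` are hypothetical.

```python
def brand_string(cpuinfo_text):
    """Return the first 'model name' value from /proc/cpuinfo text, or None."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            return value.strip()
    return None


def issue_body(crash_log, daemon_output, client_output):
    """Wrap each collected output in its own Markdown code fence."""
    fence = "`" * 3
    sections = [
        ("Kernel crash log", crash_log),
        ("Daemon output", daemon_output),
        ("Client output (-s)", client_output),
    ]
    parts = []
    for title, text in sections:
        parts.append(f"### {title}\n{fence}\n{text.strip()}\n{fence}")
    return "\n\n".join(parts)
```

The value returned by `brand_string` can be used as the issue title, and the string returned by `issue_body` can be pasted as the issue text.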

 

It usually doesn't take me long to find the source of the issue and to provide a fix on the development branch for you to test.

 

I have recently provided fixes for server processors.

You can first try the develop branch at:

https://github.com/cyring/CoreFreq/tree/develop

 

Regards

CyrIng

 

 

  • 3 weeks later...
  • JorgeB changed the title to [SOLVED] Random Kernel Panic after several days - old Hardware
