Parity check causing system to be slow and errors in logs


chesh


I've been troubleshooting an issue with my Unraid server for the last month, and have mostly been avoiding it by not running a parity check. At the beginning of the month I noticed that my Docker containers and VMs performed very poorly whenever a parity check was running. At least, that's what I eventually figured out after downgrading to 6.5.3, thinking it was an issue with the new 6.6.x releases. It started with my Windows 7 VM becoming unresponsive and my containers timing out. I eventually found the following in my logs:

 

Nov 29 11:22:08 Tower kernel: INFO: rcu_sched self-detected stall on CPU
Nov 29 11:22:08 Tower kernel: 	30-...: (60000 ticks this GP) idle=a26/140000000000001/0 softirq=1132375/1132375 fqs=14147 
Nov 29 11:22:08 Tower kernel: 	 (t=60001 jiffies g=52263 c=52262 q=104379)
Nov 29 11:22:08 Tower kernel: NMI backtrace for cpu 30
Nov 29 11:22:08 Tower kernel: CPU: 30 PID: 11752 Comm: unraidd Not tainted 4.14.49-unRAID #1
Nov 29 11:22:08 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EP2C602-4L/D16, BIOS P1.80 01/16/2014
Nov 29 11:22:08 Tower kernel: Call Trace:
Nov 29 11:22:08 Tower kernel: <IRQ>
Nov 29 11:22:08 Tower kernel: dump_stack+0x5d/0x79
Nov 29 11:22:08 Tower kernel: nmi_cpu_backtrace+0x9b/0xba
Nov 29 11:22:08 Tower kernel: ? irq_force_complete_move+0xf3/0xf3
Nov 29 11:22:08 Tower kernel: nmi_trigger_cpumask_backtrace+0x56/0xd4
Nov 29 11:22:08 Tower kernel: rcu_dump_cpu_stacks+0x8e/0xb8
Nov 29 11:22:08 Tower kernel: rcu_check_callbacks+0x212/0x5f0
Nov 29 11:22:08 Tower kernel: update_process_times+0x23/0x45
Nov 29 11:22:08 Tower kernel: tick_sched_timer+0x33/0x61
Nov 29 11:22:08 Tower kernel: __hrtimer_run_queues+0x78/0xc1
Nov 29 11:22:08 Tower kernel: hrtimer_interrupt+0x87/0x157
Nov 29 11:22:08 Tower kernel: smp_apic_timer_interrupt+0x75/0x85
Nov 29 11:22:08 Tower kernel: apic_timer_interrupt+0x7d/0x90
Nov 29 11:22:08 Tower kernel: </IRQ>
Nov 29 11:22:08 Tower kernel: RIP: 0010:memcmp+0x2/0x1d
Nov 29 11:22:08 Tower kernel: RSP: 0018:ffffc900077c7cd0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
Nov 29 11:22:08 Tower kernel: RAX: 0000000000000000 RBX: ffff881015ec0ce8 RCX: 0000000000000fd7
Nov 29 11:22:08 Tower kernel: RDX: 0000000000001000 RSI: ffff88103b417000 RDI: ffff881015ed7000
Nov 29 11:22:08 Tower kernel: RBP: ffff881015ed7000 R08: 00000000000000b6 R09: ffff881015ec0d88
Nov 29 11:22:08 Tower kernel: R10: 0000000000000fd0 R11: 0000000000000ff0 R12: ffff88103856c800
Nov 29 11:22:08 Tower kernel: R13: 0000000000000000 R14: ffff881015ec0d60 R15: 000000000000000f
Nov 29 11:22:08 Tower kernel: check_parity+0x27c/0x30b [md_mod]
Nov 29 11:22:08 Tower kernel: ? ttwu_do_wakeup.isra.4+0xd/0x84
Nov 29 11:22:08 Tower kernel: handle_stripe+0xefc/0x1293 [md_mod]
Nov 29 11:22:08 Tower kernel: unraidd+0xb8/0x111 [md_mod]
Nov 29 11:22:08 Tower kernel: ? md_open+0x2c/0x2c [md_mod]
Nov 29 11:22:08 Tower kernel: ? md_thread+0xbc/0xcc [md_mod]
Nov 29 11:22:08 Tower kernel: ? handle_stripe+0x1293/0x1293 [md_mod]
Nov 29 11:22:08 Tower kernel: md_thread+0xbc/0xcc [md_mod]
Nov 29 11:22:08 Tower kernel: ? wait_woken+0x68/0x68
Nov 29 11:22:08 Tower kernel: kthread+0x111/0x119
Nov 29 11:22:08 Tower kernel: ? kthread_create_on_node+0x3a/0x3a
Nov 29 11:22:08 Tower kernel: ret_from_fork+0x35/0x40
Nov 29 11:22:12 Tower kernel: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 30-... } 63749 jiffies s: 7381 root: 0x2/.
Nov 29 11:22:12 Tower kernel: blocking rcu_node structures: l=1:16-31:0x4000/.
Nov 29 11:22:12 Tower kernel: Task dump for CPU 30:
Nov 29 11:22:12 Tower kernel: unraidd         R  running task        0 11752      2 0x80000008
Nov 29 11:22:12 Tower kernel: Call Trace:
Nov 29 11:22:12 Tower kernel: ? md_open+0x2c/0x2c [md_mod]
Nov 29 11:22:12 Tower kernel: ? md_thread+0xbc/0xcc [md_mod]
Nov 29 11:22:12 Tower kernel: ? handle_stripe+0x1293/0x1293 [md_mod]
Nov 29 11:22:12 Tower kernel: ? md_thread+0xbc/0xcc [md_mod]
Nov 29 11:22:12 Tower kernel: ? wait_woken+0x68/0x68
Nov 29 11:22:12 Tower kernel: ? kthread+0x111/0x119
Nov 29 11:22:12 Tower kernel: ? kthread_create_on_node+0x3a/0x3a
Nov 29 11:22:12 Tower kernel: ? ret_from_fork+0x35/0x40
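
For anyone wanting to check how often this happens across a whole syslog, here is a minimal Python sketch that counts the stall reports and shows which CPU, task, and module functions were involved. The regular expressions are based only on the lines quoted above, and the function name `summarize_stalls` is made up for illustration. Other kernel versions may format these messages differently, so treat it as a starting point rather than a definitive parser.

```python
import re
from collections import Counter

# Patterns derived only from the log excerpt in this post (an assumption:
# other kernels may print these lines in a different format).
STALL_RE = re.compile(r"rcu_sched self-detected stall on CPU")
CPU_RE = re.compile(r"kernel: CPU: (\d+) PID: (\d+) Comm: (\S+)")
FRAME_RE = re.compile(
    r"kernel: (\? )?(\w[\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]"
)


def summarize_stalls(lines):
    """Return (stall_count, tasks, frames) for an iterable of syslog lines.

    tasks  counts (cpu, comm) pairs taken from the "CPU: .. Comm: .." line.
    frames counts (module, function) pairs from call-trace frames that name
    a module in brackets; frames prefixed with "?" are skipped because the
    kernel marks them as unreliable.
    """
    stalls = 0
    tasks = Counter()
    frames = Counter()
    for line in lines:
        if STALL_RE.search(line):
            stalls += 1
            continue
        m = CPU_RE.search(line)
        if m:
            tasks[(int(m.group(1)), m.group(3))] += 1
            continue
        m = FRAME_RE.search(line)
        if m and not m.group(1):
            frames[(m.group(3), m.group(2))] += 1
    return stalls, tasks, frames
```

It can be run against a saved copy of the log with something like `summarize_stalls(open(path, errors="replace"))`, where `path` is wherever the syslog was saved. On the excerpt above it should report one self-detected stall, the `unraidd` task on CPU 30, and the `md_mod` functions `check_parity`, `handle_stripe`, `unraidd`, and `md_thread`.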

 

Is this a bad SATA/Molex power connector or a bad cable to this part of my backplane? Could some of the ports be failing? Any help would be much appreciated.

tower-diagnostics-20181130-1023.zip

Edited by chesh
Uploaded diagnostics