Slower Parity/Performance Issue


bondoo0


I've been having an issue over the last several parity checks. Speed has dropped from around 75-80 MB/s to 55-60 MB/s, and of course the duration has gone up as well, from 11 hours to 14.5. The longer duration isn't a big deal, but I have also noticed that the server itself isn't as responsive while the parity check is running. Previously I was able to use Plex as normal during a parity check, but now playback buffers or stops while the check is running. I had previously changed the tunable attributes, so I changed those back to the defaults for the last 2 parity checks, but the issue remains. I haven't changed any hardware or added anything during that time. When I looked at the system stats before the issue started, I would see about 900 MB/s total throughput at the start of the check; now I'm seeing about 600 MB/s. Any suggestions to improve this?
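As a rough sanity check on those numbers: a parity check has to read to the end of the largest disk, so the elapsed time is about that disk's size divided by the average speed. The sketch below assumes the parity drives are the 3 TB disks and treats the reported MB/s as an average over the whole check (both are assumptions, not stated in the post).

```python
# Rough parity-check duration estimate. ASSUMPTIONS: the largest disk
# (here taken to be a 3 TB parity drive) bounds the check, and the quoted
# MB/s figure is the average speed over the whole run.

def check_hours(largest_disk_tb: float, avg_mb_per_s: float) -> float:
    """Return estimated hours to read `largest_disk_tb` (decimal TB) at `avg_mb_per_s`."""
    total_bytes = largest_disk_tb * 1e12          # drive makers use decimal TB
    seconds = total_bytes / (avg_mb_per_s * 1e6)  # MB/s -> bytes/s
    return seconds / 3600


if __name__ == "__main__":
    for speed in (77.5, 57.5):  # midpoints of the 75-80 and 55-60 MB/s ranges
        print(f"{speed:5.1f} MB/s -> {check_hours(3, speed):4.1f} h")
```

With the midpoints of the two speed ranges this gives roughly 10.8 h and 14.5 h, which is consistent with the 11 and 14.5 hours reported.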

 

Hardware is 2 and 3 TB drives on a Supermicro H8DM8-2 motherboard with 2 six-core AMD Opteron 2431 2.4 GHz CPUs and 24 GB of RAM. I use 5 ports on the motherboard; the other drives are connected to 3 Supermicro SAT2-MV8 HBA cards, with a total of 12 disks connected. I'm running 6.6.0 RC 4 with dual parity drives, and Plex runs in a Docker container.

 

I have attached my diagnostics zip, so let me know if there is something else that is needed to help diagnose this.

 

Thanks for any help.

unraid-server1-diagnostics-20181031-1529.zip

31 minutes ago, bondoo0 said:

I'm running 6.6.0 RC 4

Current stable version is 6.6.3. I think some people reported this behaviour during the RC testing. Have you tried the current stable build? My parity check is running right now and doesn't show any slowdown. I get between 110-130 MB/s, the same as on 6.5.3.


So I upgraded to 6.6.3 and changed the tunables. When I manually kicked off a parity check, I was still getting the slower (600 MB/s) throughput, and I would say the accessibility of the server is worse than before. For example, now not only can I not stream through Plex, but SMB shares won't come up and the log viewer won't come up either.

 

I can also say I don't think it's CPU-constrained, since the CPU usage never goes above 20%.

 

I did discover that with the higher tunable values the server itself became unresponsive (I couldn't view the log, couldn't cancel the parity check, and finally had to reboot from the command line). When I reduce the values, the server becomes responsive again; I just can't stream, etc.

 

I did notice this in the logs, which I assume points to a CPU issue (but why am I only seeing it during parity checks?):

 

Nov  1 07:37:07 unraid-server1 kernel: INFO: rcu_sched self-detected stall on CPU
Nov  1 07:37:07 unraid-server1 kernel:     8-....: (420008 ticks this GP) idle=85a/1/4611686018427387906 softirq=9839/9839 fqs=103727 
Nov  1 07:37:07 unraid-server1 kernel:      (t=420008 jiffies g=7518 c=7517 q=888072)
Nov  1 07:37:07 unraid-server1 kernel: NMI backtrace for cpu 8
Nov  1 07:37:07 unraid-server1 kernel: CPU: 8 PID: 10774 Comm: unraidd Not tainted 4.18.15-unRAID #1
Nov  1 07:37:07 unraid-server1 kernel: Hardware name: Supermicro H8DM8-2/H8DM8-2, BIOS 080014  10/22/2009
Nov  1 07:37:07 unraid-server1 kernel: Call Trace:
Nov  1 07:37:07 unraid-server1 kernel: <IRQ>
Nov  1 07:37:07 unraid-server1 kernel: dump_stack+0x5d/0x79
Nov  1 07:37:07 unraid-server1 kernel: nmi_cpu_backtrace+0x71/0x83
Nov  1 07:37:07 unraid-server1 kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Nov  1 07:37:07 unraid-server1 kernel: nmi_trigger_cpumask_backtrace+0x57/0xd7
Nov  1 07:37:07 unraid-server1 kernel: rcu_dump_cpu_stacks+0x91/0xbb
Nov  1 07:37:07 unraid-server1 kernel: rcu_check_callbacks+0x23f/0x5ca
Nov  1 07:37:07 unraid-server1 kernel: ? tick_sched_handle.isra.5+0x2f/0x2f
Nov  1 07:37:07 unraid-server1 kernel: update_process_times+0x23/0x45
Nov  1 07:37:07 unraid-server1 kernel: tick_sched_timer+0x36/0x64
Nov  1 07:37:07 unraid-server1 kernel: __hrtimer_run_queues+0xb1/0x105
Nov  1 07:37:07 unraid-server1 kernel: hrtimer_interrupt+0xf4/0x20d
Nov  1 07:37:07 unraid-server1 kernel: smp_apic_timer_interrupt+0x79/0x89
Nov  1 07:37:07 unraid-server1 kernel: apic_timer_interrupt+0xf/0x20
Nov  1 07:37:07 unraid-server1 kernel: </IRQ>
Nov  1 07:37:07 unraid-server1 kernel: RIP: 0010:raid6_sse24_gen_syndrome+0xed/0x1b3
Nov  1 07:37:07 unraid-server1 kernel: Code: db e8 66 0f db f8 66 44 0f db e8 66 44 0f db f8 66 0f ef e5 66 0f ef f7 66 45 0f ef e5 66 45 0f ef f7 48 8b 0a 66 0f 6f 2c 01 <66> 42 0f 6f 3c 11 66 46 0f 6f 2c 01 66 46 0f 6f 3c 19 66 0f ef d5 
Nov  1 07:37:07 unraid-server1 kernel: RSP: 0018:ffffc9000414fc80 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Nov  1 07:37:07 unraid-server1 kernel: RAX: 0000000000000440 RBX: 0000000000000008 RCX: ffff8802ee9a7000
Nov  1 07:37:07 unraid-server1 kernel: RDX: ffff8802ee9893f0 RSI: 0000000000000440 RDI: 0000000000000004
Nov  1 07:37:07 unraid-server1 kernel: RBP: ffff8806110da000 R08: 0000000000000460 R09: ffff8802ee989410
Nov  1 07:37:07 unraid-server1 kernel: R10: 0000000000000450 R11: 0000000000000470 R12: ffff8806110db000
Nov  1 07:37:07 unraid-server1 kernel: R13: 0000000000001000 R14: ffff8802ee9893d0 R15: 0000000000000008
Nov  1 07:37:07 unraid-server1 kernel: check_parity+0x202/0x349 [md_mod]
Nov  1 07:37:07 unraid-server1 kernel: ? autoremove_wake_function+0x9/0x2a
Nov  1 07:37:07 unraid-server1 kernel: ? __wake_up_common+0xa5/0x121
Nov  1 07:37:07 unraid-server1 kernel: handle_stripe+0xe8a/0x1226 [md_mod]
Nov  1 07:37:07 unraid-server1 kernel: unraidd+0xbc/0x123 [md_mod]
Nov  1 07:37:07 unraid-server1 kernel: ? md_open+0x2c/0x2c [md_mod]
Nov  1 07:37:07 unraid-server1 kernel: md_thread+0xcc/0xf1 [md_mod]
Nov  1 07:37:07 unraid-server1 kernel: ? wait_woken+0x68/0x68
Nov  1 07:37:07 unraid-server1 kernel: kthread+0x10b/0x113
Nov  1 07:37:07 unraid-server1 kernel: ? kthread_flush_work_fn+0x9/0x9
Nov  1 07:37:07 unraid-server1 kernel: ret_from_fork+0x22/0x40
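If you want to track whether these stalls keep recurring (for example while adjusting tunables), a small script can pull the stalled CPU, the tick count, and the function named on the RIP line out of the syslog. This is a minimal Python sketch written only against the message format shown above; other kernel versions may word these lines differently, and `summarize_stalls` is a hypothetical helper, not an Unraid tool.

```python
import re

# Header line of each RCU stall report.
STALL_RE = re.compile(r"rcu_sched self-detected stall on CPU")
# Per-CPU detail line, e.g. "8-....: (420008 ticks this GP)".
TICKS_RE = re.compile(r"kernel:\s+(\d+)-\S*: \((\d+) ticks this GP\)")
# Function the stalled CPU was executing, e.g. "RIP: 0010:raid6_sse24_gen_syndrome+0xed/0x1b3".
RIP_RE = re.compile(r"RIP: \S+:([A-Za-z0-9_.]+)\+")


def summarize_stalls(lines):
    """Return a list of (cpu, ticks, function) tuples, one per stall report."""
    reports = []
    current = None
    for line in lines:
        if STALL_RE.search(line):
            # Start a new report; following lines fill in its details.
            current = {"cpu": None, "ticks": None, "func": None}
            reports.append(current)
        elif current is not None:
            m = TICKS_RE.search(line)
            if m and current["cpu"] is None:
                current["cpu"], current["ticks"] = int(m.group(1)), int(m.group(2))
            m = RIP_RE.search(line)
            if m and current["func"] is None:
                current["func"] = m.group(1)
    return [(r["cpu"], r["ticks"], r["func"]) for r in reports]
```

Usage, assuming the log lives at /var/log/syslog: `with open("/var/log/syslog") as f: print(summarize_stalls(f))`. For the excerpt above it reports CPU 8, 420008 ticks, inside `raid6_sse24_gen_syndrome`.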

19 hours ago, bondoo0 said:

I can also say I don't think it's CPU-constrained, since the CPU usage never goes above 20%.

Parity check is single threaded.
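What that thread spends its time on is visible in the earlier stack trace: the RIP line points at `raid6_sse24_gen_syndrome`, the Linux kernel's SSE2 routine for computing RAID-6 P/Q syndromes, called from `check_parity` in md_mod. With dual parity, every stripe read during a check needs both values recomputed. Below is a byte-at-a-time Python sketch of the standard RAID-6 calculation (P is the XOR of the data bytes; Q is a weighted sum over GF(2^8) with generator 2 and polynomial 0x11d, as in the Linux raid6 library). It illustrates the arithmetic only and is not Unraid's code.

```python
def gf_mul2(x: int) -> int:
    """Multiply a byte by the generator g = 2 in GF(2^8), reducing by 0x11d."""
    x <<= 1
    if x & 0x100:      # overflowed 8 bits: reduce by the field polynomial
        x ^= 0x11D
    return x & 0xFF


def gen_syndrome(data_disks):
    """Return (P, Q) for a stripe given as equal-length byte strings, one per data disk."""
    length = len(data_disks[0])
    p = bytearray(length)
    q = bytearray(length)
    # Horner's rule from the highest-numbered disk down, so that
    # Q = g^(n-1)*D_(n-1) + ... + g*D_1 + D_0 (addition is XOR in GF(2^8)).
    for disk in reversed(data_disks):
        for i, byte in enumerate(disk):
            p[i] ^= byte                   # P: plain XOR parity
            q[i] = gf_mul2(q[i]) ^ byte    # Q: multiply running value by g, add this disk
    return bytes(p), bytes(q)
```

P costs one XOR per byte per disk, while Q adds a field multiplication on top, which is why dual parity puts noticeably more load on the CPU than single parity.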

 

19 hours ago, bondoo0 said:

I did notice this in the logs, which I assume points to a CPU issue (but why am I only seeing it during parity checks?):

This happens to some users in recent releases. Start lowering the tunables little by little until the call traces stop.
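That advice amounts to a step-down search: start from the current value, lower it a little at a time, and stop at the first value where the call traces no longer appear. A minimal Python sketch, assuming a hypothetical `causes_stalls(value)` predicate; in practice each probe means setting the tunable (md_sync_thresh turned out to be the relevant one later in this thread), running a check for a while, and looking for "rcu_sched self-detected stall" in the syslog.

```python
from typing import Callable, Optional


def step_down(start: int, floor: int, step: int,
              causes_stalls: Callable[[int], bool]) -> Optional[int]:
    """Lower a tunable from `start` by `step` until `causes_stalls(value)` is False.

    `causes_stalls` is a HYPOTHETICAL stand-in for "apply this value, run a
    parity check, and see whether call traces appear". Returns the first
    (i.e. highest) value that ran clean, or None if even `floor` stalled.
    """
    value = start
    while value >= floor:
        if not causes_stalls(value):
            return value
        value -= step
    return None
```

Large steps find a clean value quickly but may undershoot; once a clean value is found, stepping back up with a smaller step gets closer to the limit.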

 

3 hours ago, johnnie.black said:

Parity check is single threaded.

Good to know, and that makes sense now, since it was running at about 16-17%, which would be one of the six cores at 100 percent.
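For reference, the arithmetic behind reading overall CPU usage as one saturated core: if one core is pegged and the rest are idle, the overall figure is 100% divided by the number of cores the kernel sees. Each Opteron 2431 has six cores (no SMT), and this board carries two of them, so the sketch below computes both cases; it assumes the dashboard figure is an average across all cores.

```python
def one_core_share(cores: int) -> float:
    """Overall CPU % shown when exactly one of `cores` cores is at 100% and the rest are idle."""
    return 100.0 / cores


for cores in (6, 12):  # one Opteron 2431 vs. both sockets on the H8DM8-2
    print(f"{cores:2d} cores -> {one_core_share(cores):.1f}%")
```

That gives about 16.7% for 6 cores and 8.3% for 12, so a steady 16-17% matches one busy core out of six, or roughly two cores' worth if all 12 are counted.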

 

I played with it a bit yesterday and found some things. md_num_stripes and md_sync_window don't seem to affect this issue, but md_sync_thresh is the value that makes a huge difference. If I bump it up to, say, the 2000 value suggested above, the machine basically locks up and has to be brought down from the command line. I had to reduce it all the way to 96 before the call traces stopped showing up and I could use the machine normally. The other thing of note is that changing this didn't appear to slow the parity check: I'm still getting about 60 MB/s even with the reduced value.

 

Thanks everyone for the help with this.


1 hour ago, johnnie.black said:

You'll need a better CPU for that to improve.

Understood. Basically, as long as the parity check isn't killing usability (i.e., getting asked why the rest of the family can't watch a movie), I'm fine with that speed, especially since I think my HBA cards would become a bottleneck shortly after the CPU issue was fixed :).

