bondoo0 Posted October 31, 2018

I'm having an issue over the last several parity checks. The speed has gone from around 75-80 MB/s to 55-60 MB/s, and the duration has gone up as well, from 11 hours to 14.5. The longer duration isn't a big deal, but I have also noticed that the server itself isn't as responsive while the parity check is running. Previously I was able to use Plex as normal while the parity check was running, but now playback buffers or stops.

I had previously changed the tunable attributes, so I changed those back to normal for the last 2 parity checks, but the issue still remains. I haven't changed any hardware or added anything during that time. When I looked at system stats before the issue I would see about 900 MB/s total throughput at the start of the check; now I'm seeing about 600 MB/s. Any suggestions to improve this?

Hardware is 2 and 3 TB drives on a Supermicro H8DM8-2 motherboard with 2 six-core AMD Opteron 2431 2.4 GHz chips and 24 GB of RAM. I use 5 ports on the motherboard; the other drives are connected to 3 Supermicro SAT2-MV8 HBA cards with a total of 12 disks connected. I'm running 6.6.0 RC 4 with dual parity drives, and Plex is in a Docker container.

I have attached my diagnostics zip, so let me know if anything else is needed to help diagnose this. Thanks for any help.

unraid-server1-diagnostics-20181031-1529.zip
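As a rough sanity check on the figures in the post above, the duration of a parity check is approximately the size of the largest disk divided by the average speed. The sketch below is only illustrative and assumes a 3 TB (decimal) largest disk, a constant average speed, and 57.5 MB/s as the midpoint of the reported 55-60 MB/s range; the function name is made up for this example.

```python
def parity_check_hours(disk_bytes: float, avg_mb_per_s: float) -> float:
    """Estimate hours needed to read a disk end to end at a constant average speed."""
    seconds = disk_bytes / (avg_mb_per_s * 1_000_000)
    return seconds / 3600


# Assumption: the largest disk is 3 TB, counted in decimal bytes.
LARGEST_DISK_BYTES = 3e12

before = parity_check_hours(LARGEST_DISK_BYTES, 75.0)   # earlier speed
after = parity_check_hours(LARGEST_DISK_BYTES, 57.5)    # midpoint of 55-60 MB/s

print(f"{before:.1f} h at 75 MB/s, {after:.1f} h at 57.5 MB/s")
```

Under these assumptions the estimates come out close to the reported 11 and 14.5 hours, which suggests the longer duration is explained by the lower average speed alone.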
bastl Posted October 31, 2018

31 minutes ago, bondoo0 said: "I'm running 6.6.0 RC 4"

The current stable version is 6.6.3. I think some people reported this behaviour during the RC tests. Have you tried the current stable build? My parity check is running right now and doesn't show any slowdowns. I get between 110-130 MB/s, the same as on 6.5.3.
JorgeB Posted November 1, 2018

You're likely CPU limited; later releases require a little more CPU power. If you haven't tried them yet, see if these tunables help:

Tunable (md_num_stripes): 4096
Tunable (md_sync_window): 2048
Tunable (md_sync_thresh): 2000
bondoo0 (Author) Posted November 1, 2018

So I upgraded to 6.6.3 and changed the tunables. When I manually kicked off a parity check, I was still getting the slower (600 MB/s) throughput, and I would say the accessibility of the server is worse than before. For example, now not only can I not stream through Plex, but SMB shares won't come up, and the log viewer won't come up either. I can also say I don't think it's CPU constrained since the CPU usage never goes above 20%.

I did discover that with the higher tunable values the server itself became unresponsive (I couldn't view the log, couldn't cancel the parity check, and finally had to reboot from the command line). When I reduce the values, the server goes back to being responsive; I just can't stream, etc.

I did notice this happening in the logs, which I assume points to CPU issues (but why am I only seeing it on parity check)?

Nov 1 07:37:07 unraid-server1 kernel: INFO: rcu_sched self-detected stall on CPU
Nov 1 07:37:07 unraid-server1 kernel: 8-....: (420008 ticks this GP) idle=85a/1/4611686018427387906 softirq=9839/9839 fqs=103727
Nov 1 07:37:07 unraid-server1 kernel: (t=420008 jiffies g=7518 c=7517 q=888072)
Nov 1 07:37:07 unraid-server1 kernel: NMI backtrace for cpu 8
Nov 1 07:37:07 unraid-server1 kernel: CPU: 8 PID: 10774 Comm: unraidd Not tainted 4.18.15-unRAID #1
Nov 1 07:37:07 unraid-server1 kernel: Hardware name: Supermicro H8DM8-2/H8DM8-2, BIOS 080014 10/22/2009
Nov 1 07:37:07 unraid-server1 kernel: Call Trace:
Nov 1 07:37:07 unraid-server1 kernel: <IRQ>
Nov 1 07:37:07 unraid-server1 kernel: dump_stack+0x5d/0x79
Nov 1 07:37:07 unraid-server1 kernel: nmi_cpu_backtrace+0x71/0x83
Nov 1 07:37:07 unraid-server1 kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Nov 1 07:37:07 unraid-server1 kernel: nmi_trigger_cpumask_backtrace+0x57/0xd7
Nov 1 07:37:07 unraid-server1 kernel: rcu_dump_cpu_stacks+0x91/0xbb
Nov 1 07:37:07 unraid-server1 kernel: rcu_check_callbacks+0x23f/0x5ca
Nov 1 07:37:07 unraid-server1 kernel: ? tick_sched_handle.isra.5+0x2f/0x2f
Nov 1 07:37:07 unraid-server1 kernel: update_process_times+0x23/0x45
Nov 1 07:37:07 unraid-server1 kernel: tick_sched_timer+0x36/0x64
Nov 1 07:37:07 unraid-server1 kernel: __hrtimer_run_queues+0xb1/0x105
Nov 1 07:37:07 unraid-server1 kernel: hrtimer_interrupt+0xf4/0x20d
Nov 1 07:37:07 unraid-server1 kernel: smp_apic_timer_interrupt+0x79/0x89
Nov 1 07:37:07 unraid-server1 kernel: apic_timer_interrupt+0xf/0x20
Nov 1 07:37:07 unraid-server1 kernel: </IRQ>
Nov 1 07:37:07 unraid-server1 kernel: RIP: 0010:raid6_sse24_gen_syndrome+0xed/0x1b3
Nov 1 07:37:07 unraid-server1 kernel: Code: db e8 66 0f db f8 66 44 0f db e8 66 44 0f db f8 66 0f ef e5 66 0f ef f7 66 45 0f ef e5 66 45 0f ef f7 48 8b 0a 66 0f 6f 2c 01 <66> 42 0f 6f 3c 11 66 46 0f 6f 2c 01 66 46 0f 6f 3c 19 66 0f ef d5
Nov 1 07:37:07 unraid-server1 kernel: RSP: 0018:ffffc9000414fc80 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Nov 1 07:37:07 unraid-server1 kernel: RAX: 0000000000000440 RBX: 0000000000000008 RCX: ffff8802ee9a7000
Nov 1 07:37:07 unraid-server1 kernel: RDX: ffff8802ee9893f0 RSI: 0000000000000440 RDI: 0000000000000004
Nov 1 07:37:07 unraid-server1 kernel: RBP: ffff8806110da000 R08: 0000000000000460 R09: ffff8802ee989410
Nov 1 07:37:07 unraid-server1 kernel: R10: 0000000000000450 R11: 0000000000000470 R12: ffff8806110db000
Nov 1 07:37:07 unraid-server1 kernel: R13: 0000000000001000 R14: ffff8802ee9893d0 R15: 0000000000000008
Nov 1 07:37:07 unraid-server1 kernel: check_parity+0x202/0x349 [md_mod]
Nov 1 07:37:07 unraid-server1 kernel: ? autoremove_wake_function+0x9/0x2a
Nov 1 07:37:07 unraid-server1 kernel: ? __wake_up_common+0xa5/0x121
Nov 1 07:37:07 unraid-server1 kernel: handle_stripe+0xe8a/0x1226 [md_mod]
Nov 1 07:37:07 unraid-server1 kernel: unraidd+0xbc/0x123 [md_mod]
Nov 1 07:37:07 unraid-server1 kernel: ? md_open+0x2c/0x2c [md_mod]
Nov 1 07:37:07 unraid-server1 kernel: md_thread+0xcc/0xf1 [md_mod]
Nov 1 07:37:07 unraid-server1 kernel: ? wait_woken+0x68/0x68
Nov 1 07:37:07 unraid-server1 kernel: kthread+0x10b/0x113
Nov 1 07:37:07 unraid-server1 kernel: ? kthread_flush_work_fn+0x9/0x9
Nov 1 07:37:07 unraid-server1 kernel: ret_from_fork+0x22/0x40
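When comparing tunable settings, it can help to count how many stall reports like the one above end up in the log during each run. The following is a minimal sketch, not an Unraid tool: the function name is hypothetical, and it assumes the stall header and "NMI backtrace for cpu N" lines look exactly like those in the excerpt above.

```python
import re
from collections import Counter

# Strings taken from the log excerpt above; other kernel versions may word them differently.
STALL_HEADER = "rcu_sched self-detected stall on CPU"
NMI_CPU = re.compile(r"NMI backtrace for cpu (\d+)")


def count_rcu_stalls(log_lines):
    """Return (number of stall headers, Counter of CPU numbers seen in NMI backtrace lines)."""
    stalls = 0
    cpus = Counter()
    for line in log_lines:
        if STALL_HEADER in line:
            stalls += 1
        match = NMI_CPU.search(line)
        if match:
            cpus[int(match.group(1))] += 1
    return stalls, cpus


sample = [
    "Nov 1 07:37:07 unraid-server1 kernel: INFO: rcu_sched self-detected stall on CPU",
    "Nov 1 07:37:07 unraid-server1 kernel: NMI backtrace for cpu 8",
    "Nov 1 07:37:07 unraid-server1 kernel: CPU: 8 PID: 10774 Comm: unraidd Not tainted 4.18.15-unRAID #1",
]
stalls, cpus = count_rcu_stalls(sample)
print(stalls, dict(cpus))
```

To run it against a real log, pass an open file object instead of the sample list; the location of the syslog file depends on the system and is not assumed here.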
JorgeB Posted November 2, 2018

19 hours ago, bondoo0 said: "I can also say I don't think it's CPU constrained since the CPU usage never goes above 20%"

Parity check is single threaded.

19 hours ago, bondoo0 said: "I did notice this happening in the logs, which I assume points to CPU issues (but why am I only seeing it on parity check)?"

This happens to some users in recent releases. Start lowering the tunables little by little until the call traces stop.
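One way to approach "lowering the tunables little by little" is to write down a fixed list of candidate values and try them one at a time, running a parity check and checking the log after each change. The sketch below only generates such a list; it does not apply any setting. The 25% reduction per step and the floor of 64 are arbitrary assumptions for illustration, not Unraid recommendations, and the function name is hypothetical.

```python
def step_down(start: int, floor: int, factor: float = 0.75):
    """Yield successively smaller candidate values from start down to floor."""
    if not 0 < factor < 1:
        raise ValueError("factor must be between 0 and 1")
    value = start
    while value > floor:
        yield value
        value = int(value * factor)  # shrink by the chosen factor, rounding down
    yield floor  # always finish on the floor value


# Starting point is the md_sync_thresh value suggested earlier in the thread.
candidates = list(step_down(start=2000, floor=64))
print(candidates)
```

A smaller factor reaches a stable value in fewer parity check attempts, at the cost of possibly overshooting the highest value that would still have worked.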
bondoo0 (Author) Posted November 2, 2018

3 hours ago, johnnie.black said: "Parity check is single threaded."

Good to know, and that makes sense now, since it was running at about 16-17%, which would be one of the six cores at 100 percent.

I played with it a bit yesterday and found some things. The md_num_stripes and md_sync_window values don't seem to impact this issue, but md_sync_thresh is the value that makes a huge difference. If I bump it up to, say, the 2000 suggested above, the machine basically locks up and has to be brought down from the command line. I had to reduce it all the way to 96 before the call traces stopped showing up and I could use the machine normally.

The other thing of note is that changing this didn't appear to slow the parity check. I'm still getting about 60 MB/s even with the reduced value.

Thanks everyone for the help with this.
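The 16-17% figure can be checked with simple arithmetic: if one core is fully busy and the rest are idle, the overall usage is 100 divided by the number of cores. The sketch below shows this for 6 cores, as stated in the post, and for 12, since the first post lists two six-core chips. How the dashboard actually aggregates usage across cores is an assumption here, and the function name is made up for this example.

```python
def one_core_share(total_cores: int) -> float:
    """Overall CPU percentage when exactly one core is 100% busy and the rest are idle."""
    if total_cores < 1:
        raise ValueError("total_cores must be at least 1")
    return 100.0 / total_cores


for cores in (6, 12):
    print(f"{cores} cores: {one_core_share(cores):.1f}%")
```

Either way, a low overall percentage does not rule out a single saturated thread, which is consistent with the point that the parity check is single threaded.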
JorgeB Posted November 2, 2018

15 minutes ago, bondoo0 said: "I'm still getting about 60 MB/s even with the reduced value."

You'll need a better CPU for that to improve.
bondoo0 (Author) Posted November 2, 2018

1 hour ago, johnnie.black said: "You'll need a better CPU for that to improve."

Understood. Basically, as long as the parity check isn't killing usability (aka getting asked why the rest of the family can't watch a movie), I'm fine with that speed, especially since I think my HBA cards would become a bottleneck shortly after the CPU issue was fixed.