darckhart Posted October 26, 2020
So I upgraded to 6.9.0-beta30 from beta25 a couple of days ago. I started a parity check last night, woke up this morning wanting to see the progress, and it was nowhere near where I thought it would be. Turns out the checking rate is only around 23 MB/s! I try to run a parity check every month or every other month as I remember to, and usually the checking rate is about 85 MB/s (and when things were brand new I swear it was around 160 MB/s). Nothing has changed hardware-wise; the only things that change are the Community Applications plugins, which I update fairly regularly. SMART status says all my drives are healthy (6x 6 TB data plus 2x 6 TB parity, all HGST Deskstar 7200 rpm) and 2x 1 TB Samsung 850 SSDs as cache. Any ideas? Thanks
trurl Posted October 26, 2020
Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.
darckhart Posted October 26, 2020
OK, attached! Thanks
amphora-diagnostics-20201026-1313.zip
trurl Posted October 26, 2020
Nothing obvious. The only thing I see in the syslog is a possible connection problem with an Unassigned Device that looks like it isn't connected anymore. That shouldn't have any effect on parity check speed. Is anything reading from or writing to the server while it is checking parity?
darckhart Posted October 26, 2020
By the way, thanks for your troubleshooting support! The Unassigned Device is a USB hard drive dock, so nothing to worry about there. I don't think anything should be interfering while it's checking? I really only have a media server docker (Jellyfin), and I turn it off before doing the check. I was wondering (maybe) if the Unraid updates to newer Linux kernels invoke some harsh mitigation penalty on the crappy Intel CPU and so it is slower? Then again, I don't know if any of those operations are even involved in a parity check. Maybe not?
trurl Posted October 26, 2020
39 minutes ago, darckhart said: "harsh mitigation penalty"
Don't know if this is something that might help or not
JorgeB Posted October 27, 2020
Going from -beta25 to -beta30 shouldn't make much of a difference. You can also try the DiskSpeed docker to confirm all disks and controllers are performing normally.
darckhart Posted October 27, 2020
Thanks both. I will check into those two things and report back.
darckhart Posted October 28, 2020
I installed the Disable Mitigation plugin and the DiskSpeed docker. I had to reboot for these to take effect, which means the parity check was cancelled about 70% in, so I guess I have to start over. Attached are the results of DiskSpeed; they are all over 100 MB/s. How do I interpret these results against the parity check throughput? I will have to wait until tonight to restart the parity check to see if disabling the CPU mitigations has any effect. (Too hot during the day.)
JorgeB Posted October 28, 2020
Disks look fine. Was the controller bandwidth test also normal? If yes and the parity check is still slow, install sysstat using the Nerdpack plugin, then start a parity check, let it run for a couple of minutes, go to the console and post the output of:
sar -dp 5 5
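(For readers unfamiliar with sysstat, here is the same command annotated; the flags shown are standard sar options:)
# -d   report activity for each block device
# -p   print persistent device names (sdc, sdd, ...) instead of device numbers
# 5 5  take five samples at five-second intervals, then print an average
sar -dp 5 5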
darckhart Posted October 29, 2020 (edited)
Looks like disabling mitigations did not affect anything. I am still seeing checking throughput at about 23 MB/s. I attached two images: one is for Parity 1 and Disk 4, the other is Parity 2 and Disk 2. Controller throughput looks fine to me? Here is the output of the terminal command. Seems like something is capping it?

Average:  DEV      tps     rkB/s    wkB/s  areq-sz  aqu-sz  await  svctm  %util
Average:  loop0   0.00      0.00     0.00     0.00    0.00   0.00   0.00   0.00
Average:  loop1   0.00      0.00     0.00     0.00    0.00   0.00   0.00   0.00
Average:  loop2   0.00      0.00     0.00     0.00    0.00   0.00   0.00   0.00
Average:  loop3   0.00      0.00     0.00     0.00    0.00   0.00   0.00   0.00
Average:  sda     0.00      0.00     0.00     0.00    0.00   0.00   0.00   0.00
Average:  sdb     0.00      0.00     0.00     0.00    0.00   0.00   0.00   0.00
Average:  sdc    33.53  22534.29     0.00   672.00    1.06  31.66   2.93   9.83
Average:  sdd    33.53  22534.29     0.00   672.00    1.03  30.58   2.28   7.65
Average:  sde    33.49  22507.43     0.00   672.00    1.21  36.14   7.50  25.11
Average:  sdf    33.49  22507.43     0.00   672.00    1.23  36.66   7.88  26.39
Average:  sdg    33.49  22507.43     0.00   672.00    1.23  36.66   7.78  26.07
Average:  sdh    33.49  22507.43     0.00   672.00    1.25  37.19   8.22  27.55
Average:  sdi     0.00      0.00     0.00     0.00    0.00   0.00   0.00   0.00
Average:  sdk     0.00      0.00     0.00     0.00    0.00   0.00   0.00   0.00
Average:  sdj    33.53  22534.29     0.00   672.00    1.11  33.11   4.59  15.41
Average:  sdl    33.49  22507.43     0.00   672.00    1.13  33.74   5.20  17.41
Average:  md1     0.00      0.00     0.00     0.00    0.00   0.00   0.00   0.00
Average:  md2     0.00      0.00     0.00     0.00    0.00   0.00   0.00   0.00
Average:  md3     0.00      0.00     0.00     0.00    0.00   0.00   0.00   0.00
Average:  md4     0.00      0.00     0.00     0.00    0.00   0.00   0.00   0.00
Average:  md5     0.00      0.00     0.00     0.00    0.00   0.00   0.00   0.00
Average:  md6     0.00      0.00     0.00     0.00    0.00   0.00   0.00   0.00
JorgeB Posted October 29, 2020
darckhart said: "Here is the output of the terminal command"
This confirms it's not disk related; all the drives are very far from 100% utilization. Taking a second look at your hardware, this is almost certainly the problem:
Model name: Intel(R) Celeron(R) CPU J3160 @ 1.60GHz
As you add more disks the CPU won't be able to keep up, especially with dual parity.
darckhart Posted October 29, 2020
Really appreciate your continued help troubleshooting this! Good to know it is not disk related. Your highlighting of the CPU is interesting. However, I have NOT added or changed any hardware since the beginning: same CPU, same drives, same number of drives. Additionally, it is not a gradual decrease in throughput over time; it appears as big drops. See the attached throughput history. In the beginning it was blazing fast, around 160 MB/s. Then somewhere along the way it dropped to 85 MB/s. (I believe the super-high outlier rates are an artefact of how the table is generated. I don't think it takes into account that I pause the check, so when it resumes and there's a teeny bit left to check, it might appear way faster than it should.)
I am trying to guess (with no real knowledge about how all this works) what might be influencing throughput.
1. We seem to have concluded that it's not disk related nor controller related. However, when looking at the output of that terminal command, the speeds looked odd to me. They are all such similar numbers that it made me wonder if something is acting to cap them.
2. Is it software related? I.e., I thought maybe the Linux kernel updates with mitigations for Intel might be a factor, but we checked that too and it doesn't seem so. Has the parity check function itself maybe changed in the Unraid OS updates?
3. I would also think that as the parity drive fills up with parity data, it would take longer to get through it all. However, that shouldn't influence throughput, correct?
parity-history.txt
trurl Posted October 29, 2020
23 minutes ago, darckhart said: "same drives and number of drives"
Parity2 is known to require more CPU. Was it working OK with dual parity before and now it isn't?
trurl Posted October 29, 2020
26 minutes ago, darckhart said: "I would also think that as the parity drive fills up with parity data, it would take longer to get through it all. However, that shouldn't influence throughput, correct?"
Parity is always "full". Better not to think of it as "data" at all; it is just a bunch of parity bits, and a parity check always checks all of them.
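(To illustrate that point with a toy sketch, not Unraid's actual code, and ignoring the second Reed-Solomon-based parity: single parity is just a bitwise XOR across the data disks at each position, so every parity bit exists and gets checked regardless of how full the data disks are.)
# toy example: one byte position across three hypothetical data disks
d1=0xA5; d2=0x3C; d3=0x0F
p=$(( d1 ^ d2 ^ d3 ))           # parity byte for this position
printf 'parity byte: 0x%02X\n' "$p"
# a parity check re-reads every position on every disk and recomputes this,
# which is why the check always scans the full length of the drives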
JorgeB Posted October 29, 2020
46 minutes ago, darckhart said: "2. Is it software related?"
If the array config is the same, including parity2, then most likely it's Unraid/kernel release related. It's not unusual for speeds to get a little slower with each newer kernel; sometimes there's a more noticeable drop, and a low-end CPU might amplify these small drops and turn them into larger ones. You could confirm that by, for example, downgrading to v6.7.2 and seeing if it's better with that one.
darckhart Posted October 30, 2020
23 hours ago, trurl said: "Parity2 is known to require more CPU. Was it working OK with dual parity before and now it isn't?"
Yes, that is correct. There have been no hardware configuration changes; I have always had dual parity.
23 hours ago, JorgeB said: "If the array config is the same, including parity2, then most likely it's Unraid/kernel release related..."
Thanks very much for pointing out that issue thread. I will add to it. I am a little worried about downgrading, though. Might it break compatibility with things?
JorgeB Posted October 30, 2020
44 minutes ago, darckhart said: "Might it break compatibility with things?"
It won't break anything on the NAS side. You can just do a quick test after booting, but it's better to disable Docker/VMs first, just in case.
JorgeB Posted October 30, 2020
Forgot to mention: since you're on the beta, downgrading will unassign the cache pool. Just re-assign it, or wait until you go back to the beta and it will be picked up as it was.
darckhart Posted October 31, 2020
Thanks for both pieces of advice. I will report back after trying the downgrade. (Might be a while, since I want to let the check complete.)
JorgeB Posted November 4, 2020
@darckhart your issue might be related to this: https://forums.unraid.net/bug-reports/prereleases/690-beta-30-pre-skylake-intel-cpus-stuck-at-lowest-pstate-r1108/?do=findComment&comment=11255
darckhart Posted November 5, 2020 (edited)
Interesting link, thanks so much for the find! I think it is making a difference. The first /proc/cpuinfo command shows me 480 MHz. After the second command, checking the CPU again shows one core boosted up to 2.2 GHz (I'm doing a SHA2 hash). I don't understand your last post though: "add that line to the go file." What is that and how do I do it? Also, in case I need to do it, how do I downgrade? I found instructions for going back to the previously used release (but for me that would be 6.9.0-beta25), and I need to go further back. How is that done?
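(For reference, the clock check mentioned above just reads the live per-core frequencies. The exact commands come from the linked thread, but a minimal way to watch the clocks looks like this:)
# show the current clock speed of every core
grep 'MHz' /proc/cpuinfo
# or refresh every second while a parity check or hash is running
watch -n1 "grep MHz /proc/cpuinfo"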
JorgeB Posted November 5, 2020
3 hours ago, darckhart said: "What is that and how do I do it?"
On the flash drive: /config/go
3 hours ago, darckhart said: "How is that done?"
Copy all the bz* files from the release to the flash drive, replacing the existing ones.
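(A rough sketch of both steps, run from the Unraid console. This is only an illustration: the exact line to append should come from the linked bug-report thread, and the paths assume the flash drive is mounted at /boot and that the older release zip was extracted to a hypothetical /tmp/unraid-6.7.2.)
# 1) Append a line to the go file (the flash drive is mounted at /boot on a
#    running server). This example forces the "performance" CPU governor on
#    every core; substitute whatever line the bug-report thread actually gives.
echo 'for g in /sys/devices/system/cpu/cpufreq/policy*/scaling_governor; do echo performance > "$g"; done' >> /boot/config/go

# 2) Downgrade by replacing the bz* files on the flash drive with the ones
#    from the older release (back up the current ones first).
mkdir -p /boot/previous-manual && cp /boot/bz* /boot/previous-manual/
cp /tmp/unraid-6.7.2/bz* /boot/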
darckhart Posted November 6, 2020
Thanks. I'll give it a try this weekend. If method 1 works permanently, I guess that will be good since I won't have to downgrade.
darckhart Posted January 18, 2021
Just an update: I ran a parity check over this weekend. Same issue (throughput is around 65 MB/s) even with the CPU MHz uncapped, mitigations off, and the pstate set to performance. Seems like downgrading may be the only option now.