Why is my parity check rate so slow?



So I upgraded from 6.9.0-beta25 to 6.9.0-beta30 a couple of days ago and started a parity check last night. When I woke up this morning to check the progress, it was nowhere near where I expected: the checking rate is only around 23 MB/s! I try to run a parity check every month or two as I remember to, and the rate is usually about 85 MB/s (when everything was brand new I swear it was around 160 MB/s). No hardware has changed; the only changes are the Community Applications plugins, which I update fairly regularly. SMART status says all my drives are healthy (6x 6 TB data plus 2x 6 TB parity, all HGST Deskstar 7200 rpm, and 2x 1 TB Samsung 850 SSDs as cache). Any ideas? Thanks


btw thanks for your troubleshooting support!

 

The Unassigned Device is a USB hard drive dock, so nothing to worry about there.

 

I don't think anything should be interfering while it's checking? I really only run a media server docker (Jellyfin), and I turn it off before doing the check.

 

I was wondering whether Unraid's move to newer Linux kernels might invoke some harsh mitigation penalty on this low-end Intel CPU and so make it slower? Then again, I don't know whether any of those operations are even involved in a parity check. Maybe not?
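For what it's worth, which mitigations the running kernel is actually applying can be read straight from sysfs; a minimal check, assuming a stock Linux sysfs layout (kernel 4.15 or later):

```shell
# List the mitigation status the kernel reports for each known CPU flaw.
# Each entry reads "Mitigation: ...", "Vulnerable", or "Not affected".
for f in /sys/devices/system/cpu/vulnerabilities/*; do
    printf '%s: %s\n' "$(basename "$f")" "$(cat "$f")"
done
```

If these show active mitigations, they can be disabled globally with the `mitigations=off` kernel boot parameter (on Unraid that would go on the append line of the syslinux configuration), which is what the plugin mentioned below automates.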


I installed the Disable Mitigation plugin and the DiskSpeed docker.

 

I had to reboot for these to take effect, which meant the parity check was cancelled about 70% in, so I guess I have to start over.

 

Attached are the results from DiskSpeed. All the drives benchmark over 100 MB/s. How do I interpret these results against the parity check throughput?

 

I will have to wait until tonight to restart the parity check and see whether disabling the CPU mitigations has any effect. (It's too hot during the day.)

benchmark-speeds-20201028.png


It looks like disabling the mitigations did not change anything. I am still seeing a checking rate of about 23 MB/s.

 

I attached two images: one for Parity 1 and Disk 4, the other for Parity 2 and Disk 2. Controller throughput looks fine to me?

 

Here is the output of the terminal command. It seems like something is capping the speed?

 

Average:          DEV       tps     rkB/s     wkB/s   areq-sz    aqu-sz     await     svctm     %util
Average:        loop0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        loop1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        loop2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        loop3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sda      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdb      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdc     33.53  22534.29      0.00    672.00      1.06     31.66      2.93      9.83
Average:          sdd     33.53  22534.29      0.00    672.00      1.03     30.58      2.28      7.65
Average:          sde     33.49  22507.43      0.00    672.00      1.21     36.14      7.50     25.11
Average:          sdf     33.49  22507.43      0.00    672.00      1.23     36.66      7.88     26.39
Average:          sdg     33.49  22507.43      0.00    672.00      1.23     36.66      7.78     26.07
Average:          sdh     33.49  22507.43      0.00    672.00      1.25     37.19      8.22     27.55
Average:          sdi      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdk      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          sdj     33.53  22534.29      0.00    672.00      1.11     33.11      4.59     15.41
Average:          sdl     33.49  22507.43      0.00    672.00      1.13     33.74      5.20     17.41
Average:          md1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          md2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          md3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          md4      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          md5      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:          md6      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
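In case it helps anyone reading the numbers above: the per-disk read rate in that output already matches the reported check speed. A quick conversion, using the rkB/s figure from the sample:

```shell
# Each active disk reads ~22534 kB/s in the stats above.
rkbs=22534
echo "$(( rkbs / 1024 )) MB/s"   # ~22 MB/s per disk, matching the ~23 MB/s check rate
```

The fact that every disk sits at the same rate while %util stays well below 100% is what suggests the limit is upstream of the disks rather than in the disks themselves.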

controller-benchmark-p1d4.png

controller-benchmark-p2d2.png

Here is output of the terminal command

This confirms it's not disk related; all the drives are very far from 100% utilization.

 

Taking a second look at your hardware, this is almost certainly the problem:

Model name:                      Intel(R) Celeron(R) CPU  J3160  @ 1.60GHz

 

As you add more disks, the CPU won't be able to keep up, especially with dual parity.

 

 


Really appreciate your continued help troubleshooting this!

 

Good to know it is not disk related.

 

Your highlighting of the CPU is interesting. However, I have NOT added or changed any hardware since the beginning: same CPU, same drives, same number of drives. Additionally, it has not been a gradual decrease in throughput over time; it appears as large discrete drops. See the attached throughput history. In the beginning it was blazing fast, around 160 MB/s. Then somewhere along the way it dropped to 85 MB/s. (I believe the super-high outlier rates are an artefact of how the table is generated. I don't think it accounts for pausing the check, so when the check resumes with only a tiny bit left, the rate can appear far higher than it really was.)

 

I am trying to guess (with no knowledge really about how all this works) what might be influencing throughput.

 

1. We seem to have concluded that it's neither disk related nor controller related. However, the speeds in the terminal output looked odd to me: they are all such similar numbers that it made me wonder if something is acting as a cap.

 

2. Is it software related? I.e., I thought the Linux kernel updates with the Intel mitigations might be a factor, but we checked that and it doesn't seem so. Has the parity check function itself perhaps changed across the Unraid OS updates?

 

3. I would also think that as the parity drive fills up with parity data, it would take longer to get through it all. However, that shouldn't influence throughput, correct?

 

parity-history.txt

26 minutes ago, darckhart said:

I would think also that as the Parity drive fills up with Parity data, it would take longer to get through it all. However, that shouldn't influence throughput correct?

Parity is always "full". Better not to even think of it as "data": it is just a bunch of parity bits, and a parity check always checks all of them.
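To illustrate (a toy sketch, not the actual Unraid md driver code): single parity is simply the XOR of the corresponding bits on every data disk, so a check has to visit every bit position regardless of what data, if any, is stored:

```shell
# Toy example with one byte per "disk"; real parity covers every byte of every disk.
d1=178; d2=53; d3=240            # example data bytes from three data disks
p=$(( d1 ^ d2 ^ d3 ))            # the stored parity byte
echo "parity byte: $p"           # prints 119
# A parity check recomputes the XOR across data plus parity and expects zero:
echo "check: $(( d1 ^ d2 ^ d3 ^ p ))"   # prints 0 when parity is consistent
```

Parity2 (the Q drive) uses a different, more CPU-intensive computation than plain XOR, which is why dual parity leans on a weak CPU much harder during checks.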

46 minutes ago, darckhart said:

2. Is it software related?

If the array config is the same, including parity2, then most likely it's Unraid/kernel release related. It's not unusual for speeds to get a little slower with each newer kernel, and sometimes there's a more noticeable drop; a low-end CPU can amplify these small drops into larger ones.

 

You could confirm that by, for example, downgrading to v6.7.2 and seeing if it's better on that release.

23 hours ago, trurl said:

Parity2 is known to require more CPU. Was it working OK with dual parity and now it isn't?

Yes, that is correct. There have been no hardware configuration changes; I have always run dual parity.

 

23 hours ago, JorgeB said:

If the array config is the same, including parity2, then most likely it's Unraid/kernel release related. It's not unusual for speeds to get a little slower with each newer kernel, and sometimes there's a more noticeable drop; a low-end CPU can amplify these small drops into larger ones.

 

You could confirm that by, for example, downgrading to v6.7.2 and seeing if it's better on that release.

Thanks very much for pointing out that issue thread. I will add to it. I am a little worried about downgrading, though. Might it break compatibility with anything?


Interesting link! Thanks so much for the find!

 

I think it is making a difference. The first /proc/cpuinfo check showed me 480 MHz. After running the second command and checking the CPU again, one core now boosts up to 2.2 GHz. (I'm computing a SHA2 hash as the load.)

 

I don't understand your last post though: "add that line to the go file." What is that and how do I do it?
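For any later readers: on Unraid, the "go" file is the boot script at /boot/config/go on the flash drive; commands appended there run at every startup, which makes a tweak like this survive reboots. A sketch of what that might look like, assuming the line in question forces the performance cpufreq governor (the exact line may differ from the one in the post above):

```shell
#!/bin/bash
# /boot/config/go -- Unraid runs this script once at boot
/usr/local/sbin/emhttp &

# Assumed example tweak: pin every core to the performance governor
# so the CPU does not idle at its minimum frequency during parity checks.
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"
done
```

The file can be edited from a terminal with `nano /boot/config/go` or via the Config Editor plugin.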

 

Also, in case I need to do it, how do I downgrade? I found instructions for rolling back to the previously used release, but for me that would be 6.9.0-beta25, and I need to go further back. How is that done?
