Sudden Change in Parity-Check Times



Even on what I consider to be a P.O.S. CPU in my main server, I don't think it's that bad:

 

raid6: using algorithm sse2x4 gen() 13394 MB/s

 

Interesting. By all accounts my A6 6400K ought to be even more of a POS than your A8 6600K, except for its power consumption, but at stock speed I get

 

raid6: using algorithm sse2x4 gen() 13824 MB/s

 

I'm actually rather fond of those AMD APUs, and the AM1 Kabinis are great value for everyday use.
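Those gen() figures come from the kernel's boot-time RAID6 benchmark: it times each syndrome routine once and keeps the winner. They can be fished back out of the log on any running box (paths are the usual Linux defaults; your syslog location may differ):

```shell
# The kernel benchmarks its RAID6 gen()/xor() routines at boot and
# records the chosen algorithm in the kernel ring buffer / syslog:
dmesg 2>/dev/null | grep -E 'raid6|xor:' || \
  grep -E 'raid6|xor:' /var/log/syslog 2>/dev/null || \
  echo "no raid6 lines found (boot messages may have rotated out)"
```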

 

With RC4 I just got my best dual parity check speed so far (2 parity + 6 data, all 5 TB Toshiba MD04ACA500 disks)

 

Last check completed on Friday, 19 August 2016, 08:58 (today), finding 0 errors.

Duration: 10 hours, 3 seconds. Average speed: 138.9 MB/sec
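That average is consistent with the disk size. As a sanity check (assuming the check sweeps one full 5 TB disk's worth of positions exactly once):

```shell
# 5 TB swept in 10 h 3 s -> average MB/s:
awk 'BEGIN { printf "%.1f MB/s\n", 5e12 / (10 * 3600 + 3) / 1e6 }'
# prints 138.9 MB/s
```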


Memory configuration and speed will also play a part

 

I agree, but is the speed quoted per core or total? Our processors are essentially the same, except that yours has twice as many cores and a better GPU.

 

Two 8 GB sticks of DDR3-1600, btw. Asus A88M-A mobo (again, very similar to yours but less capable), SAS-LP HBA (compared with your SAS2).

[Attached image: A6-6400K vs A8-6600K comparison]


No. Actually you can confirm that. Just do diffs of /usr/src/linux*/drivers/md/ on each version of unraid. Pay attention to md.c and unraid.c.
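A minimal sketch of that comparison, assuming you have both releases' source trees available somewhere; the version paths here are illustrative stand-ins, not the actual directory names under /usr/src:

```shell
# Illustrative paths -- point these at the kernel trees each unRAID
# release actually ships under /usr/src:
OLD=/usr/src/linux-old   # tree from the 6.1.x release
NEW=/usr/src/linux-new   # tree from the 6.2 release

for f in md.c unraid.c; do
  # '|| true' so a missing file doesn't abort the loop
  diff -u "$OLD/drivers/md/$f" "$NEW/drivers/md/$f" || true
done
```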

 

Yes, but there could be changes that have nothing to do with parity calculation speed. I know nothing about programming, so I wouldn't be able to tell. As I suspected, Tom already confirmed on the rc4 thread that there are no changes that would affect speed.


Just for the hell of it, I posted a 6.2 / 6.1 compatible unraid-tunables-tester (http://lime-technology.com/forum/index.php?topic=29009.0) to see if it helps out any.

 

Actually found in the last post of this thread.  For reference, here is a direct link:

 

      http://lime-technology.com/forum/index.php?topic=29009.msg492140#msg492140

 

But it is a good idea to read the first few posts in the thread to figure out what the script does and how to use it!

 

EDIT: Running it now to see what the results are...


Just for the hell of it, I posted a 6.2 / 6.1 compatible unraid-tunables-tester (http://lime-technology.com/forum/index.php?topic=29009.0) to see if it helps out any.

 

Actually found in the last post of this thread.  For reference, here is a direct link:

 

      http://lime-technology.com/forum/index.php?topic=29009.msg492140#msg492140

 

But it is a good idea to read the first few posts in the thread to figure out what the script does and how to use it!

thx.  posted the wrong link  :o

Quick update.  I downloaded Squid's modified unraid-tunables-test.sh script and ran it to get settings for the tunables variables.  I modified the settings on the 'Disk Settings' icon to the 'Best Bang-for-the-buck' suggestions and ran the non-correcting parity check again.  Here are the results:

 

Event: unRAID Parity check
Subject: Notice [ROSE] - Parity check finished (0 errors)
Description: Duration: 12 hours, 59 minutes, 20 seconds. Average speed: 64.2 MB/s
Importance: normal

 

The change was about one minute (longer).  Except for the knowledge that was gained, it doesn't seem worth the effort! 


Quick update.  I downloaded Squid's modified unraid-tunables-test.sh script and ran it to get settings for the tunables variables.  I modified the settings on the 'Disk Settings' icon to the 'Best Bang-for-the-buck' suggestions and ran the non-correcting parity check again.  Here are the results:

 

Event: unRAID Parity check
Subject: Notice [ROSE] - Parity check finished (0 errors)
Description: Duration: 12 hours, 59 minutes, 20 seconds. Average speed: 64.2 MB/s
Importance: normal

 

The change was about one minute (longer).  Except for the knowledge that was gained, it doesn't seem worth the effort!

If you still have the table of results, it would probably be worthwhile to post them in the tunables thread, as it might help out Pauven with his new modifications that he's been working on.

OK, after having suffered through a game of whack-a-mole (on this thread and the rc4 one) about my AMD Sempron 140, I decided to upgrade to a dual-core processor.  Looking around on eBay, I found an AMD Athlon II X2 260 3.2GHz CPU for $14.00 and got it a couple of days ago.  (Note:  The unRAID version used for all of these tests is 6.2 rc4.)

 

Let's begin by comparing specs:

 

CPU                                Pass Mark                    P + Q Algorithm 
Sempron 140 @ 2.7GHz               739                       prefetch64-sse (10844 MB/s) + sse2x4 (7386 MB/s)
Athlon II  260 @ 3.2 GHz           1894                      prefetch64-sse (12848 MB/s) + sse2x4 (8082 MB/s)

 

Perhaps the first thing you will note is that there isn't much increase in the P + Q numbers.  For a 20% increase in clock speed, there was only a 9% increase in the P + Q algorithm speed.  Measured against the increase in the Pass Mark number, it is even worse!  So much for the preliminary look at the numbers.  The real question is what is happening in the real world.  For this part, I ran both the single and dual parity non-correcting Parity Check.  I also had a comparable dual-parity number for the Sempron and I will include it. (I don't have a single parity test for rc4 because I never thought I would need it!)

 

CPU                             Parity type               Average Speed
Athlon II  260 @ 3.2 GHz        Single                        105.8MB/s
Athlon II  260 @ 3.2 GHz        Dual                           77.8MB/s

Sempron 140 @ 2.7GHz            Dual                           64.2MB/s

 

An interesting observation is that the dual parity test was 21% faster for the Athlon II than for the Sempron.  That is almost identical to the increase in clock frequency!  And it corresponds with something I observed while the dual parity test was running: one of the cores was pegged at 100% while the other one was running between (say) 10% and 60%.  Remember, this is still a low-end CPU and the GUI can really suck up CPU resources.  (And that is what it looks like was happening on that second core.)  Since this is a dual-core processor, the Pass Mark for each core is 947, only a 28% increase over the Sempron.
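The arithmetic behind those two percentages, from the speeds in the table above and the two clock frequencies:

```shell
# Dual-parity speed-up (Athlon II vs Sempron) vs clock-speed increase:
awk 'BEGIN {
  printf "dual-parity speed-up: %.1f%%\n", (77.8 / 64.2 - 1) * 100
  printf "clock-speed increase: %.1f%%\n", (3.2  / 2.7  - 1) * 100
}'
# prints 21.2% and 18.5% -- close, though not exactly equal
```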

 

What I suspect is happening (at least for the AMD processors) is that the parity checking subroutine runs only on a single core!  This would mean that one would need an AMD processor with a Pass Mark of about 1300 per core for dual parity speeds to be approximately equal to the single parity speeds.  (I am not sure what happens with Intel processors.)
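The "about 1300" figure follows from scaling the 947 per-core Pass Mark by the single/dual speed ratio measured above (a rough linear-scaling assumption, of course):

```shell
# 947 per-core Pass Mark x (105.8 MB/s single / 77.8 MB/s dual):
awk 'BEGIN { printf "%.0f\n", 947 * 105.8 / 77.8 }'
# prints 1288, i.e. "about 1300"
```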

 

Of course, this issue probably disappears when one has a processor that supports the AVX2 extensions.  And I found that I made a mistake when I said earlier that support started with Sandy Bridge.  (That was AVX.)  For AVX2, it is actually Haswell (2013) for Intel and Carrizo (2015) for AMD. 

 

 


That's surprising, based on my tests I would expect a much bigger improvement.

 

I used a slower Sempron and got worse results than yours, as expected, but a less powerful dual core got much better results, so maybe there's something else affecting the results.

 

Do you mind posting your current diagnostics?  If possible, start a parity check, wait a couple of minutes, and then grab them so they capture CPU usage and I can compare.

 

 


That's surprising, based on my tests I would expect a much bigger improvement.

 

I used a slower Sempron and got worse results than yours, as expected, but a less powerful dual core got much better results, so maybe there's something else affecting the results.

 

Do you mind posting your current diagnostics?  If possible, start a parity check, wait a couple of minutes, and then grab them so they capture CPU usage and I can compare.

 

No problem.  Here is the diagnostics file.  There will be a delay in the screen capture, as the array was still set up for single parity after running the single parity tests.  The array is currently rebuilding P + Q.  That will take about three to four hours.

rose-diagnostics-20160827-0650.zip


OK, sorry about the confusion.  The rebuild of P + Q is complete (3 hrs, 7 min, 56 sec).

 

I started the non-correcting parity check, shut down the GUI, and logged in via PuTTY.  I ran 'diagnostics' from the command line at 1025.  (Figured the CLI would have less impact on CPU usage than the GUI!)

 

A few minutes later, I restarted the GUI and collected the diagnostics file using the GUI at 1029.

 

Both files are attached.  As a piece of information for an inquiring mind, where is the CPU usage information stored in these diagnostics files? 

rose-diagnostics-20160827-1025.zip

rose-diagnostics-20160827-1029.zip


system\ps.txt

 

UID        PID  PPID  C STIME TTY          TIME CMD

...

root    10626    2 10 06:54 ?        00:22:19 [mdrecoveryd]

...

root    10763    2 30 06:54 ?        01:04:34 [unraidd]

...

 

At this time the parity check is using only 40% CPU, rather low.

 

Interesting.  But it is difficult to determine exactly what the numbers mean.  I went googling for the column headings, and apparently 'STIME' is the start time of the process.  Both of those processes started at 6:54 today (at least, that is what I got from the googled description), and that is when I began the rebuild of the P + Q.  The parity check started about 10:22.  And the 'C' value is (or may be) calculated over the entire time that the process has been running.  Do you (or anyone else) know of a command that allows us to look at the CPU usage on a 'snap-shot' basis?  Especially from the command line...

 

If not, I wonder if a reboot is in order to clarify things a bit?


I'm not seeing anything obvious, but CPU usage is low, so it's not that.  Two things you can try if you're willing, one easy, another more complicated:

 

1- Change tunables.  You're using the defaults, which are usually sufficient for a small array, but try these:

 

md_num_stripes: 4096

md_sync_window: 2048

md_sync_thresh: 2000

 

No need to do a full check, start one and check the GUI a couple of times for current speed.
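For a quick test these can also be poked at runtime from the console.  Assumption: unRAID's mdcmd helper (the same mechanism the tunables tester script uses) accepts these names; if not, change them under the 'Disk Settings' icon and hit Apply:

```shell
# Assumed mdcmd syntax; changes are not persistent across reboots,
# so use Disk Settings to make them stick:
command -v mdcmd >/dev/null && {
  mdcmd set md_num_stripes 4096
  mdcmd set md_sync_window 2048
  mdcmd set md_sync_thresh 2000
} || echo "mdcmd not found (run this on the unRAID box itself)"
```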

 

2- This one will leave you with an unprotected array.  Since this is your test server it may not be an issue, but don't do it if you have critical data here:

 

Do a new config, re-assign all disks as data disks, start the array and do a read check.  This is a new option on v6.2 when there's no parity; since there are no calculations, it's faster on CPU-limited servers.  If the speed remains the same, there's something else limiting your speed.

 

When done you can do another new config, re-assign your parity disks and trust parity, though it's normal for the next check to correct a few sync errors.


Suggestion (1) -- I changed the values and started the parity check.  Speed (at around 9 minutes in) was around 53.5 MB/s, which is slower than the default.  I also took some time and had a look at the GUI Dashboard for CPU usage for a couple of minutes.  Currently, total CPU usage is varying from 67% to 97%.  CPU0 ranges from 53% to 99%.  CPU1 ranges from 94% to 100%.  Shut down the GUI for a bit...

 

Restarted GUI.  54.7MB/s at 19 minutes in.  So no real improvement.  (This confirms what I found earlier when I changed them.) 

 

I will copy over the critical data on this server to the Media server and see what happens with suggestion (2).  (I also have off-site storage of the irreplaceable stuff but that is never quite current.)  It will take a while...

 

 


system\ps.txt

 

UID        PID  PPID  C STIME TTY          TIME CMD

...

root    10626    2 10 06:54 ?        00:22:19 [mdrecoveryd]

...

root    10763    2 30 06:54 ?        01:04:34 [unraidd]

...

 

At this time the parity check is using only 40% CPU, rather low.

 

Interesting.  But it is difficult to determine exactly what the numbers mean.  I went googling for the column headings, and apparently 'STIME' is the start time of the process.  Both of those processes started at 6:54 today (at least, that is what I got from the googled description), and that is when I began the rebuild of the P + Q.  The parity check started about 10:22.  And the 'C' value is (or may be) calculated over the entire time that the process has been running.  Do you (or anyone else) know of a command that allows us to look at the CPU usage on a 'snap-shot' basis?  Especially from the command line...

 

As far as I can remember, the 'C' value is processor utilisation: the CPU time used divided by the time the process has been running, so it is a lifetime average rather than a true snapshot.  The cumulative CPU usage column is under 'TIME'.
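Those columns can be lined up on any box with procps installed (a generic example, not specific to unRAID; the column names are standard ps output specifiers):

```shell
# C and %CPU are lifetime averages (cputime / elapsed time);
# ETIME is elapsed wall time, TIME is cumulative CPU time.
ps -eo pid,c,pcpu,etime,time,comm --sort=-pcpu | head -10
```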


Interesting find!  Using the  top  command, you can see the CPU % on a real-time basis, with a snap-shot every few seconds.  This is a copy of a portion of the output screen that I got after I killed the top process.

 

Tasks: 169 total,   3 running, 166 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.5 us, 55.5 sy,  0.0 ni, 21.7 id,  0.0 wa,  0.0 hi, 22.3 si,  0.0 st
KiB Mem :  6114940 total,   248304 free,   104628 used,  5762008 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  5249896 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
10763 root      20   0       0      0      0 R  98.7  0.0 106:23.68 unraidd
10626 root      20   0       0      0      0 D  34.9  0.0  39:58.82 mdrecoveryd
17323 root       0 -20       0      0      0 S  25.3  0.0  16:06.83 kworker/0:2H
1474 root      20   0    9680   2504   2052 S   0.7  0.0  14:02.18 cpuload
2398 root      20   0   16604   2852   2112 R   0.7  0.0   0:00.10 top
15051 root      20   0   90904   4700   2888 S   0.7  0.1  22:59.99 emhttp
  960 root       0 -20       0      0      0 S   0.3  0.0  13:12.26 kworker/1:1H

 

The mdrecoveryd process would peak as high as 51%, and this listed value was close to its low.  There were two other processes, kworker/0:2H and kworker/1:1H, that seemed to combine for about 25% of %CPU.  It seemed to vary which of these two was the higher one.  (I would assume that the %CPU is the percent of the CPU core that the process is running on.)  unraidd was always the highest-percentage process, and it seemed it was locked at about 98%+ virtually all of the time.

 

The remaining processes seemed to be sleeping. At least, they were not above a 1 or 2 percent at any time.

 

 

  • 9 months later...

I can corroborate your findings on my J1900 box. Dual parity + 5 data disks (seven 8 TB disks in total) is bottlenecked due to one core (the CPU has 4 real cores, not hyperthreaded ones) being maxed out by the unraidd process. This is with 6.3.5. The Passmark of my CPU is 1834 and the relevant internal speed tests show:

Jun  6 13:53:13 Tower kernel:   prefetch64-sse:  9376.000 MB/sec
Jun  6 13:53:13 Tower kernel:   generic_sse:  8332.000 MB/sec
Jun  6 13:53:13 Tower kernel: xor: using function: prefetch64-sse (9376.000 MB/sec)
..
Jun  6 13:53:13 Tower kernel: raid6: sse2x1   gen()   605 MB/s
Jun  6 13:53:13 Tower kernel: raid6: sse2x1   xor()  2052 MB/s
Jun  6 13:53:13 Tower kernel: raid6: sse2x2   gen()  1289 MB/s
Jun  6 13:53:13 Tower kernel: raid6: sse2x2   xor()  2687 MB/s
Jun  6 13:53:13 Tower kernel: raid6: sse2x4   gen()  2039 MB/s
Jun  6 13:53:13 Tower kernel: raid6: sse2x4   xor()  2457 MB/s
Jun  6 13:53:13 Tower kernel: raid6: using algorithm sse2x4 gen() 2039 MB/s
Jun  6 13:53:13 Tower kernel: raid6: .... xor() 2457 MB/s, rmw enabled
Jun  6 13:53:13 Tower kernel: raid6: using ssse3x2 recovery algorithm

My conclusion is that the parity calculation code is probably single threaded. Which is good news, as it should be simple to use a multithreaded implementation: either by assigning different groups of stripes to different threads or, not quite as efficient, by using one thread per parity disk. As unRAID is geared towards reusing old/low-power hardware to build NAS storage, I would very much welcome it if this gets some priority with the devs. After all, it is much better to use hardware to its fullest potential than to just throw it away and buy new stuff all the time (sustainability, save the planet, bla bla).
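The "different groups of stripes to different threads" idea is structurally simple. A toy sketch (four background workers, each taking a contiguous quarter of the stripe range; the worker body is a stand-in, not unRAID's actual md driver code):

```shell
# Split NUM_STRIPES among WORKERS background jobs -- the shape of
# the proposed parallel parity sweep, with echo standing in for
# the actual P/Q computation over each stripe group.
NUM_STRIPES=8192; WORKERS=4; CHUNK=$((NUM_STRIPES / WORKERS))
w=0
while [ "$w" -lt "$WORKERS" ]; do
  lo=$((w * CHUNK)); hi=$(((w + 1) * CHUNK - 1))
  ( echo "worker $w: stripes $lo-$hi" ) &
  w=$((w + 1))
done
wait   # all stripe groups done
```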

 

Disclaimer: Despite our findings, there is still a chance that there is another culprit for the unraidd process saturating one core, but I ruled out PCI saturation/A-Link/SATA/tunables etc. as far as I could.

 

On 27.8.2016 at 9:25 PM, Frank1940 said:

The mdrecoveryd process would peak as high as 51%, and this listed value was close to its low.  There were two other processes, kworker/0:2H and kworker/1:1H, that seemed to combine for about 25% of %CPU.  It seemed to vary which of these two was the higher one.  (I would assume that the %CPU is the percent of the CPU core that the process is running on.)  unraidd was always the highest-percentage process, and it seemed it was locked at about 98%+ virtually all of the time.

 

The remaining processes seemed to be sleeping. At least, they were not above a 1 or 2 percent at any time.

 

 

 

Edited by Videodr0me