unraid-tunables-tester.sh - A New Utility to Optimize unRAID md_* Tunables



Tests with the requested values:

 

Thanks johnnie.black!

 

stripes | window | nr_reqs | thresh |   Speed
-----------------------------------------------------------
4096  |   2048   |  128   |  2047  |  72.4MB/s 
4096  |   2048   |  128   |  2040  |  76.8MB/s    
4096  |   2048   |  128   |  2032  |  78.3MB/s    
4096  |   2048   |  128   |  2024  |  78.9MB/s    
4096  |   2048   |  128   |  2016  |  80.0MB/s    
4096  |   2048   |  128   |  1984  |  80.0MB/s    
4096  |   2048   |  128   |  1960  |  79.8MB/s    
4096  |   2048   |  128   |  1952  |  80.0MB/s   
4096  |   2048   |  128   |  1920  |  79.8MB/s 
4096  |   2048   |  128   |  1856  |  79.8MB/s 
4096  |   2048   |  128   |  1792  |  79.8MB/s 
4096  |   2048   |  128   |  1728  |  79.8MB/s
4096  |   2048   |  128   |  1664  |  78.5MB/s 
4096  |   2048   |  128   |  1536  |  77.7MB/s
4096  |   2048   |  128   |  1280  |  77.5MB/s
4096  |   2048   |  128   |  1024  |  77.1MB/s 

 

78.8 to 80.0MB/s is a single second difference in total time.

 

You mean 79.8 to 80.0, correct?

 

To my eye, I see a bell curve.

 

stripes | window | nr_reqs | thresh |   Speed
-----------------------------------------------------------
4096  |   2048   |  128   |  1024  |  77.1MB/s 
4096  |   2048   |   64   |  1024  |  77.5MB/s 
4096  |   2048   |   32   |  1024  |  77.3MB/s 
4096  |   2048   |   16   |  1024  |  79.6MB/s 
4096  |   2048   |    8   |  1024  |  79.8MB/s 
4096  |   2048   |    4   |  1024  |  80.0MB/s 
4096  |   2048   |    1   |  1024  |  ? MB/s 

 

Although it can be set to 1 in unRAID, it will remain at 4, which I believe is the minimum possible setting.

 

Hmmm, didn't know it had a minimum value.  Interesting that 4 looks better than 8, even if only very slightly.
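
 

Easy enough to verify from the shell, for anyone curious about the floor on their own hardware (sdX stands in for one of your array disks):

echo 1 > /sys/block/sdX/queue/nr_requests
cat /sys/block/sdX/queue/nr_requests    # reads back as 4 here - the effective minimum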

 

I'm running a similar test on my server now, using 7 different nr_requests values of 128, 64, 32, 16, 8, 4, and 1 (perhaps I can set it at the command line...), in combination with 4 different sync_thresh values:  sync_window-1, sync_window-8, sync_window-64 and sync_window/2.  That's 28 test points.  I'm repeating it three times, at sync_windows of 384, 768 and 1536. 

 

That's 84 total test points.  I should have preview results this evening, and more formal results from a longer test overnight.  Early assessment is that sync_window-64 is looking nice for sync_thresh.
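
 

For anyone curious about the mechanics, the sweep is basically just nested loops over the knobs.  A rough sketch of the shape of it - assuming unRAID's mdcmd accepts these setters and that a non-correcting parity check can be started/stopped with "check NOCORRECT"/"nocheck", which is what I believe the tester does under the hood:

WINDOW=384
mdcmd set md_num_stripes $((WINDOW * 2))   # stripes kept at 2x window, as in the tables
mdcmd set md_sync_window $WINDOW
for NR in 128 64 32 16 8 4 1; do
  for THRESH in $((WINDOW - 1)) $((WINDOW - 8)) $((WINDOW - 64)) $((WINDOW / 2)); do
    for q in /sys/block/sd*/queue/nr_requests; do echo $NR > $q; done
    mdcmd set md_sync_thresh $THRESH
    mdcmd check NOCORRECT    # kick off a non-correcting parity check
    sleep 60                 # one short test point; longer = more accurate
    # sample mdResyncPos from /proc/mdcmd before/after the sleep to compute MB/s
    mdcmd nocheck            # stop the check before the next point
  done
done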

Link to comment

The following are all 1 minute tests, so pretty low accuracy.  Consider this a rough preview.

 

Here's the first round of nr_request and sync_thresh tests, all with sync_window=384:

 
Test | num_stripes | sync_window | nr_requests | sync_thresh |   Speed 
---------------------------------------------------------------------------
   1  |      768    |     384     |     128     |      383    |  47.1 MB/s 
   2  |      768    |     384     |     128     |      376    |  49.8 MB/s 
   3  |      768    |     384     |     128     |      320    | 114.1 MB/s  Fastest
   4  |      768    |     384     |     128     |      192    | 104.5 MB/s 
---------------------------------------------------------------------------
   5  |      768    |     384     |      64     |      383    |  45.5 MB/s 
   6  |      768    |     384     |      64     |      376    |  50.1 MB/s 
   7  |      768    |     384     |      64     |      320    | 112.2 MB/s  Fastest
   8  |      768    |     384     |      64     |      192    | 106.4 MB/s 
---------------------------------------------------------------------------
   9  |      768    |     384     |      32     |      383    |  46.5 MB/s 
  10  |      768    |     384     |      32     |      376    |  50.2 MB/s 
  11  |      768    |     384     |      32     |      320    | 113.6 MB/s  Fastest
  12  |      768    |     384     |      32     |      192    | 103.1 MB/s 
---------------------------------------------------------------------------
  13  |      768    |     384     |      16     |      383    |  80.6 MB/s 
  14  |      768    |     384     |      16     |      376    |  81.9 MB/s 
  15  |      768    |     384     |      16     |      320    | 112.8 MB/s  Fastest
  16  |      768    |     384     |      16     |      192    | 103.7 MB/s 
---------------------------------------------------------------------------
  17  |      768    |     384     |       8     |      383    | 101.2 MB/s 
  18  |      768    |     384     |       8     |      376    | 105.3 MB/s 
  19  |      768    |     384     |       8     |      320    | 112.7 MB/s  Fastest
  20  |      768    |     384     |       8     |      192    | 107.2 MB/s 
---------------------------------------------------------------------------
  21  |      768    |     384     |       4     |      383    | 107.1 MB/s 
  22  |      768    |     384     |       4     |      376    | 110.0 MB/s 
  23  |      768    |     384     |       4     |      320    | 111.2 MB/s  Fastest
  24  |      768    |     384     |       4     |      192    | 105.6 MB/s 
---------------------------------------------------------------------------
  25  |      768    |     384     |       1     |      383    | 109.8 MB/s 
  26  |      768    |     384     |       1     |      376    | 111.1 MB/s 
  27  |      768    |     384     |       1     |      320    | 111.9 MB/s  Fastest
  28  |      768    |     384     |       1     |      192    | 108.7 MB/s 

Fastest vals were nr_reqs=128 and sync_thresh=(sync_window - 64) at 114.1 MB/s

 

Interestingly, regardless of nr_requests value, sync_thresh=(sync_window - 64) always produced the fastest result.

 

 

Second round at sync_window=768:

 Test | num_stripes | sync_window | nr_requests | sync_thresh |   Speed 
---------------------------------------------------------------------------
   1  |     1536    |     768     |     128     |      767    |  73.7 MB/s 
   2  |     1536    |     768     |     128     |      760    |  79.3 MB/s 
   3  |     1536    |     768     |     128     |      704    | 111.5 MB/s  Fastest
   4  |     1536    |     768     |     128     |      384    | 108.1 MB/s 
---------------------------------------------------------------------------
   5  |     1536    |     768     |      64     |      767    |  74.8 MB/s 
   6  |     1536    |     768     |      64     |      760    |  78.4 MB/s 
   7  |     1536    |     768     |      64     |      704    | 111.4 MB/s  Fastest
   8  |     1536    |     768     |      64     |      384    | 108.9 MB/s 
---------------------------------------------------------------------------
   9  |     1536    |     768     |      32     |      767    |  73.2 MB/s 
  10  |     1536    |     768     |      32     |      760    |  80.6 MB/s 
  11  |     1536    |     768     |      32     |      704    | 112.4 MB/s  Fastest
  12  |     1536    |     768     |      32     |      384    | 103.9 MB/s 
---------------------------------------------------------------------------
  13  |     1536    |     768     |      16     |      767    | 102.1 MB/s 
  14  |     1536    |     768     |      16     |      760    | 107.4 MB/s 
  15  |     1536    |     768     |      16     |      704    | 111.1 MB/s  Fastest
  16  |     1536    |     768     |      16     |      384    | 108.6 MB/s 
---------------------------------------------------------------------------
  17  |     1536    |     768     |       8     |      767    | 105.2 MB/s 
  18  |     1536    |     768     |       8     |      760    | 106.5 MB/s 
  19  |     1536    |     768     |       8     |      704    | 112.1 MB/s  Fastest
  20  |     1536    |     768     |       8     |      384    | 107.7 MB/s 
---------------------------------------------------------------------------
  21  |     1536    |     768     |       4     |      767    | 110.0 MB/s  Fastest
  22  |     1536    |     768     |       4     |      760    | 108.9 MB/s 
  23  |     1536    |     768     |       4     |      704    | 109.3 MB/s 
  24  |     1536    |     768     |       4     |      384    | 107.8 MB/s 
---------------------------------------------------------------------------
  25  |     1536    |     768     |       1     |      767    | 108.6 MB/s 
  26  |     1536    |     768     |       1     |      760    | 109.4 MB/s 
  27  |     1536    |     768     |       1     |      704    | 109.6 MB/s  Fastest
  28  |     1536    |     768     |       1     |      384    | 107.8 MB/s 

Fastest vals were nr_reqs=32 and sync_thresh=(sync_window - 64) at 112.4 MB/s

 

Except for a single test point (nr_requests=4), once again sync_thresh=(sync_window - 64) was fastest.

 

 

Third round at sync_window=1536:

Test | num_stripes | sync_window | nr_requests | sync_thresh |   Speed 
---------------------------------------------------------------------------
   1  |     3072    |    1536     |     128     |     1535    |  99.7 MB/s 
   2  |     3072    |    1536     |     128     |     1528    | 114.4 MB/s  Fastest
   3  |     3072    |    1536     |     128     |     1472    | 109.1 MB/s 
   4  |     3072    |    1536     |     128     |      768    | 110.3 MB/s 
---------------------------------------------------------------------------
   5  |     3072    |    1536     |      64     |     1535    | 100.9 MB/s 
   6  |     3072    |    1536     |      64     |     1528    |  99.2 MB/s 
   7  |     3072    |    1536     |      64     |     1472    | 107.9 MB/s 
   8  |     3072    |    1536     |      64     |      768    | 109.8 MB/s  Fastest
---------------------------------------------------------------------------
   9  |     3072    |    1536     |      32     |     1535    |  99.1 MB/s 
  10  |     3072    |    1536     |      32     |     1528    | 104.3 MB/s 
  11  |     3072    |    1536     |      32     |     1472    | 110.3 MB/s 
  12  |     3072    |    1536     |      32     |      768    | 110.4 MB/s  Fastest
---------------------------------------------------------------------------
  13  |     3072    |    1536     |      16     |     1535    | 104.8 MB/s 
  14  |     3072    |    1536     |      16     |     1528    | 103.5 MB/s 
  15  |     3072    |    1536     |      16     |     1472    | 107.8 MB/s 
  16  |     3072    |    1536     |      16     |      768    | 110.5 MB/s  Fastest
---------------------------------------------------------------------------
  17  |     3072    |    1536     |       8     |     1535    | 110.9 MB/s 
  18  |     3072    |    1536     |       8     |     1528    | 110.6 MB/s 
  19  |     3072    |    1536     |       8     |     1472    | 112.3 MB/s 
  20  |     3072    |    1536     |       8     |      768    | 113.9 MB/s  Fastest
---------------------------------------------------------------------------
  21  |     3072    |    1536     |       4     |     1535    | 112.9 MB/s 
  22  |     3072    |    1536     |       4     |     1528    | 115.7 MB/s  Fastest
  23  |     3072    |    1536     |       4     |     1472    | 112.9 MB/s 
  24  |     3072    |    1536     |       4     |      768    | 110.4 MB/s 
---------------------------------------------------------------------------
  25  |     3072    |    1536     |       1     |     1535    | 107.6 MB/s 
  26  |     3072    |    1536     |       1     |     1528    | 110.2 MB/s  Fastest
  27  |     3072    |    1536     |       1     |     1472    | 108.8 MB/s 
  28  |     3072    |    1536     |       1     |      768    | 107.5 MB/s 

Fastest vals were nr_reqs=4 and sync_thresh=(sync_window - 8) at 115.7 MB/s

 

My server likes higher sync_window values, and it's starting to get into the sweet spot with sync_window set to 1536, which changed up the results quite a bit.  This time sync_thresh=(sync_window - 64) wasn't fastest at all; instead it was (sync_window/2) battling (sync_window - 8).

 

 

The fourth and final round at sync_window=3072:

Test | num_stripes | sync_window | nr_requests | sync_thresh |   Speed 
---------------------------------------------------------------------------
   1  |     6144    |    3072     |     128     |     3071    | 117.6 MB/s  Fastest
   2  |     6144    |    3072     |     128     |     3064    | 112.9 MB/s 
   3  |     6144    |    3072     |     128     |     3008    | 116.5 MB/s 
   4  |     6144    |    3072     |     128     |     1536    | 112.0 MB/s 
---------------------------------------------------------------------------
   5  |     6144    |    3072     |      64     |     3071    | 116.0 MB/s  Fastest
   6  |     6144    |    3072     |      64     |     3064    | 102.5 MB/s 
   7  |     6144    |    3072     |      64     |     3008    | 113.4 MB/s 
   8  |     6144    |    3072     |      64     |     1536    | 110.7 MB/s 
---------------------------------------------------------------------------
   9  |     6144    |    3072     |      32     |     3071    | 102.2 MB/s 
  10  |     6144    |    3072     |      32     |     3064    | 105.7 MB/s 
  11  |     6144    |    3072     |      32     |     3008    | 114.1 MB/s  Fastest
  12  |     6144    |    3072     |      32     |     1536    | 111.1 MB/s 
---------------------------------------------------------------------------
  13  |     6144    |    3072     |      16     |     3071    | 118.5 MB/s 
  14  |     6144    |    3072     |      16     |     3064    | 124.0 MB/s  Fastest
  15  |     6144    |    3072     |      16     |     3008    | 121.4 MB/s 
  16  |     6144    |    3072     |      16     |     1536    | 117.1 MB/s 
---------------------------------------------------------------------------
  17  |     6144    |    3072     |       8     |     3071    | 112.6 MB/s  Fastest
  18  |     6144    |    3072     |       8     |     3064    | 112.3 MB/s 
  19  |     6144    |    3072     |       8     |     3008    | 111.5 MB/s 
  20  |     6144    |    3072     |       8     |     1536    | 111.8 MB/s 
---------------------------------------------------------------------------
  21  |     6144    |    3072     |       4     |     3071    | 107.7 MB/s 
  22  |     6144    |    3072     |       4     |     3064    | 108.0 MB/s 
  23  |     6144    |    3072     |       4     |     3008    | 107.0 MB/s 
  24  |     6144    |    3072     |       4     |     1536    | 109.3 MB/s  Fastest
---------------------------------------------------------------------------
  25  |     6144    |    3072     |       1     |     3071    | 109.2 MB/s 
  26  |     6144    |    3072     |       1     |     3064    | 110.1 MB/s  Fastest
  27  |     6144    |    3072     |       1     |     3008    | 109.4 MB/s 
  28  |     6144    |    3072     |       1     |     1536    | 108.7 MB/s 

Fastest vals were nr_reqs=16 and sync_thresh=(sync_window - 8) at 124.0 MB/s

 

This sync_window is in my server's sweet spot, and now for the first time setting sync_thresh=(sync_window - 1) finally eked out a few victories.  Actually, every method of setting sync_thresh got a victory here. 

 

Perhaps once a server gets closer to optimized on sync_window, sync_thresh becomes less important...

 

Also very interesting is that nr_requests=16 produced a dramatically faster result than all other tests.

 

Going to run a long version of this test overnight, with 10 minute tests instead of 1 minute.  It will take about 18 hours.  Let you know tomorrow afternoon how it went.

 

Link to comment

Interesting results, look forward to the normal test results.

 

PS: are you sure nr_requests=1 works?

 

You can check the current value after setting it to 1; for me it never goes lower than 4.

 

cat /sys/block/sdX/queue/nr_requests
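
 

To check every array disk at once, something like this works:

for q in /sys/block/sd*/queue/nr_requests; do echo "$q: $(cat $q)"; done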

 

Now I wish I had bothered to look at this before running the current long test.  I had pulled nr_requests=1 out of my test routine after your previous post, but then decided I would test it at least once and put it back in - adding 2.5 hours to the overall test length.

 

My test routine just hit nr_requests=1, and here's what the system is showing:

 

root@Tower:~# cat /sys/block/sdj/queue/nr_requests
4

 

I'm really surprised, as I set the nr_requests value directly by echoing the value I want to /sys/block/sdX/queue/nr_requests, bypassing the GUI.  I didn't anticipate that some process would come along behind and "fix" the value after I assigned it.

 

Live and learn.  Good news is we can eliminate nr_requests=1 from the tests.

 

Might as well ask: is 128 truly the upper limit?  I can't look now - the test is running.
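
 

(A quick probe anyone can run to find the ceiling on their own hardware - invalid values should simply be rejected by the kernel; sdX is a placeholder:)

for v in 256 512 1024 2048; do
  echo $v > /sys/block/sdX/queue/nr_requests 2>/dev/null \
    && echo "$v accepted" || echo "$v rejected"
done
echo 128 > /sys/block/sdX/queue/nr_requests   # put the default back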

Link to comment

Isn't the nr_requests setting actually intended for use with transactional servers that are seeing thousands of data requests per second, for both reads and writes, with the idea of being able to consolidate and order those requests to minimize head thrashing?  However, at some point this searching-and-ordering apparently becomes counter-productive, as the CPU time spent processing the queue exceeds the disk operation time.

 

Apparently, at some point, someone decided that 128 was a good compromise number to use for the default.  It worked 'OK' for light-duty servers and didn't penalize general-purpose computer usage too much.  (I found some folks doing database re-indexing operations talking about sweet spots of 32768 and 65536...) 

Link to comment

Isn't the nr_requests setting actually intended for use with transactional servers that are seeing thousands of data requests per second, for both reads and writes, with the idea of being able to consolidate and order those requests to minimize head thrashing?  However, at some point this searching-and-ordering apparently becomes counter-productive, as the CPU time spent processing the queue exceeds the disk operation time.

 

Apparently, at some point, someone decided that 128 was a good compromise number to use for the default.  It worked 'OK' for light-duty servers and didn't penalize general-purpose computer usage too much.  (I found some folks doing database re-indexing operations talking about sweet spots of 32768 and 65536...)

 

So you're thinking nr_requests is somehow related to NCQ?

 

I don't have NCQ enabled on my server, and nr_requests is having a huge impact on performance.  Don't know if that means anything.

Link to comment

The following are all 1 minute tests, so pretty low accuracy.  Consider this a rough preview.

 

Here's the first round of nr_request and sync_thresh tests, all with sync_window=384:

 

<<snip>>

Going to run a long version of this test overnight, with 10 minute tests instead of 1 minute.  It will take about 18 hours.  Let you know tomorrow afternoon how it went.

 

 

My test routine just hit nr_requests=1, and here's what the system is showing:

 

root@Tower:~# cat /sys/block/sdj/queue/nr_requests
4

 

I'm really surprised, as I set the nr_requests value directly by echoing the value I want to /sys/block/sdX/queue/nr_requests, bypassing the GUI.  I didn't anticipate that some process would come along behind and "fix" the value after I assigned it.

 

Live and learn.  Good news is we can eliminate nr_requests=1 from the tests.

 

Look at all of the results for the lines where nr_requests = 4 and 1 in the tests in the first post quoted.  How do you explain the difference in results when the nr_requests (for when you thought it was 1) was actually set to 4?

 

Not picking on what you are doing, but I am wondering if there are some other variables in the test methodology which have not been considered that are having a big impact on the results you are getting.  Perhaps you should run the same test three or four times with the same variables and see if you get consistent results! 

Link to comment
Not picking on what you are doing, but I am wondering if there are some other variables in the test methodology which have not been considered that are having a big impact on the results you are getting.  Perhaps you should run the same test three or four times with the same variables and see if you get consistent results!

In this same vein, it may be productive to offer the option to take down the ethernet interface for the duration of the test. That way if it suits the user, they could remove the possibility of external influence during testing. Trick would be to only allow it if there was local console access to get back from a bad state.

 

Can you detect in your script if it was called from a local console?
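
 

For what it's worth, a script can usually tell by inspecting its controlling terminal.  A rough sketch (note that IPMI serial-over-LAN consoles typically show up as /dev/ttyS*):

case "$(tty)" in
  /dev/tty[0-9]*|/dev/ttyS*) echo "local or serial console - OK to down the NIC" ;;
  /dev/pts/*)                echo "ssh/telnet session - leave the network alone" ;;
  *)                         echo "no controlling terminal (cron?) - play it safe" ;;
esac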

Link to comment

 

Look at all of the results for the lines where nr_requests = 4 and 1 in the tests in the first post quoted.  How do you explain the difference in results when the nr_requests (for when you thought it was 1) was actually set to 4?

 

Not picking on what you are doing, but I am wondering if there are some other variables in the test methodology which have not been considered that are having a big impact on the results you are getting.  Perhaps you should run the same test three or four times with the same variables and see if you get consistent results!

 

Low accuracy due to short test lengths.  The exact same test run multiple times with the same parameters will return different results each time when the test length is short.
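
 

One way to put a number on that noise: run the identical test point several times and look at the spread.  A sketch, where run_test is a stand-in for whatever runs a single timed pass:

for i in 1 2 3 4; do
  run_test   # hypothetical: one timed pass, prints a bare MB/s number
done | awk '{ s += $1; ss += $1*$1; n++ }
            END { m = s/n; printf "mean %.1f  stddev %.2f\n", m, sqrt(ss/n - m*m) }'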

 

In the 10 minute tests I just completed, the nr_requests=1 result is virtually identical to nr_requests=4. 

Link to comment

Not picking on what you are doing, but I am wondering if there are some other variables in the test methodology which have not been considered that are having a big impact on the results you are getting.  Perhaps you should run the same test three or four times with the same variables and see if you get consistent results!

In this same vein, it may be productive to offer the option to take down the ethernet interface for the duration of the test. That way if it suits the user, they could remove the possibility of external influence during testing. Trick would be to only allow it if there was local console access to get back from a bad state.

 

Can you detect in your script if it was called from a local console?

 

That feature is already available, has been there since day one.  To turn it on, simply unplug your ethernet cable.  ::)

Link to comment
In this same vein, it may be productive to offer the option to take down the ethernet interface for the duration of the test. That way if it suits the user, they could remove the possibility of external influence during testing. Trick would be to only allow it if there was local console access to get back from a bad state.

 

Can you detect in your script if it was called from a local console?

 

That feature is already available, has been there since day one.  To turn it on, simply unplug your ethernet cable.  ::)

Ha! Yes, that is the obvious solution, but not all of us have easily accessible infrastructure.  I typically down the server before messing with any of the wiring, just because I don't want to be moving things around with the server up, and it's a pain to get to otherwise.  With IPMI, I have a local console anywhere, so downing just the network interface programmatically is a much easier option.

 

If it's not in the cards, I understand. Just wanted to offer an option to make testing more consistent.

Link to comment

And here are the long results.  Longer than expected, at 20 hours!!!  Each individual test point lasted 10 minutes.

 

NOTE:  I hand calculated the percentages compared to the fastest speed, which came from the last test group below.
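
 

(The math is just each speed divided by the fastest - e.g. with awk, where speeds.txt is a hypothetical file with one MB/s value per line:)

awk -v max=136.3 '{ printf "%5.1f MB/s - %4.1f%%\n", $1, 100*$1/max }' speeds.txt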

 

 

sync_window=384, which is not ideal on my server: 

Test | num_stripes | sync_window | nr_requests | sync_thresh |   Speed 
---------------------------------------------------------------------------
   1  |      768    |     384     |     128     |      383    |  51.3 MB/s - 37%
   2  |      768    |     384     |     128     |      376    |  53.1 MB/s - 39%
   3  |      768    |     384     |     128     |      320    | 129.9 MB/s - 95.3% *
   4  |      768    |     384     |     128     |      192    | 117.6 MB/s - 86%
---------------------------------------------------------------------------
   5  |      768    |     384     |      64     |      383    |  51.8 MB/s - 37%
   6  |      768    |     384     |      64     |      376    |  53.2 MB/s - 39%
   7  |      768    |     384     |      64     |      320    | 129.9 MB/s - 95.3% *
   8  |      768    |     384     |      64     |      192    | 121.2 MB/s - 89%
---------------------------------------------------------------------------
   9  |      768    |     384     |      32     |      383    |  51.4 MB/s - 37%
  10  |      768    |     384     |      32     |      376    |  55.4 MB/s - 40%
  11  |      768    |     384     |      32     |      320    | 131.4 MB/s - 96.4% *
  12  |      768    |     384     |      32     |      192    | 119.4 MB/s - 87%
---------------------------------------------------------------------------
  13  |      768    |     384     |      16     |      383    |  88.6 MB/s - 65%
  14  |      768    |     384     |      16     |      376    |  92.5 MB/s - 68%
  15  |      768    |     384     |      16     |      320    | 129.9 MB/s - 95.3% *
  16  |      768    |     384     |      16     |      192    | 116.9 MB/s - 85%
---------------------------------------------------------------------------
  17  |      768    |     384     |       8     |      383    | 117.5 MB/s - 86% 
  18  |      768    |     384     |       8     |      376    | 120.1 MB/s - 88%
  19  |      768    |     384     |       8     |      320    | 131.5 MB/s - 96.5% ***
  20  |      768    |     384     |       8     |      192    | 119.6 MB/s - 88%
---------------------------------------------------------------------------
  21  |      768    |     384     |       4     |      383    | 124.2 MB/s - 91.1%
  22  |      768    |     384     |       4     |      376    | 124.7 MB/s - 91.5%
  23  |      768    |     384     |       4     |      320    | 129.5 MB/s - 95.0% *
  24  |      768    |     384     |       4     |      192    | 121.4 MB/s - 89%
---------------------------------------------------------------------------
  25  |      768    |     384     |  1 (act 4)  |      383    | 124.3 MB/s - 91.2%
  26  |      768    |     384     |  1 (act 4)  |      376    | 125.0 MB/s - 91.7%
  27  |      768    |     384     |  1 (act 4)  |      320    | 129.3 MB/s - 94.7% *
  28  |      768    |     384     |  1 (act 4)  |      192    | 121.0 MB/s - 89%

Fastest vals were nr_reqs=8 and sync_thresh=(sync_window - 64) at 131.5 MB/s

 

These results seem to illustrate how nr_requests=8 gained such popularity: not because of how good it is, but rather how bad 128 can be when combined with commonly used sync_window and sync_thresh values, delivering <40% of maximum speed, wow!!!  But changing nr_requests alone isn't enough; it needs to be set in conjunction with sync_window and sync_thresh.

 

Noteworthy here is that using sync_thresh=(sync_window-64) produces good results, averaging around 95% of maximum tested speed, regardless of nr_requests.

 

In this test group, it appears that sync_window-1 is equivalent to sync_window-8.

 

Also noteworthy is that nr_requests=4 (and 1, which is really 4 again) produce decent results with all sync_thresh values.

 

 

sync_window=768, which is a little better but still not optimized on my server:

Test | num_stripes | sync_window | nr_requests | sync_thresh |   Speed 
---------------------------------------------------------------------------
   1  |     1536    |     768     |     128     |      767    |  93.7 MB/s - 69%
   2  |     1536    |     768     |     128     |      760    |  93.5 MB/s - 69%
   3  |     1536    |     768     |     128     |      704    | 132.1 MB/s - 96.9% *
   4  |     1536    |     768     |     128     |      384    | 124.9 MB/s - 91.6%
---------------------------------------------------------------------------
   5  |     1536    |     768     |      64     |      767    |  93.9 MB/s - 69%
   6  |     1536    |     768     |      64     |      760    |  93.3 MB/s - 69%
   7  |     1536    |     768     |      64     |      704    | 132.2 MB/s - 97.0% ***
   8  |     1536    |     768     |      64     |      384    | 125.1 MB/s - 91.8%
---------------------------------------------------------------------------
   9  |     1536    |     768     |      32     |      767    |  89.9 MB/s - 66%
  10  |     1536    |     768     |      32     |      760    |  94.4 MB/s - 69%
  11  |     1536    |     768     |      32     |      704    | 132.0 MB/s - 96.8% *
  12  |     1536    |     768     |      32     |      384    | 125.4 MB/s - 92.0%
---------------------------------------------------------------------------
  13  |     1536    |     768     |      16     |      767    | 125.8 MB/s - 92.3%
  14  |     1536    |     768     |      16     |      760    | 125.8 MB/s - 92.3%
  15  |     1536    |     768     |      16     |      704    | 132.1 MB/s - 96.9% *
  16  |     1536    |     768     |      16     |      384    | 123.9 MB/s - 90.9%
---------------------------------------------------------------------------
  17  |     1536    |     768     |       8     |      767    | 128.2 MB/s - 94.1%
  18  |     1536    |     768     |       8     |      760    | 129.2 MB/s - 94.8%
  19  |     1536    |     768     |       8     |      704    | 132.0 MB/s - 96.8% *
  20  |     1536    |     768     |       8     |      384    | 123.8 MB/s - 90.8%
---------------------------------------------------------------------------
  21  |     1536    |     768     |       4     |      767    | 131.1 MB/s - 96.2% *
  22  |     1536    |     768     |       4     |      760    | 130.1 MB/s - 95.5%
  23  |     1536    |     768     |       4     |      704    | 128.3 MB/s - 94.1%
  24  |     1536    |     768     |       4     |      384    | 128.8 MB/s - 94.5%
---------------------------------------------------------------------------
  25  |     1536    |     768     |   1 (act 4) |      767    | 129.9 MB/s - 95.3%
  26  |     1536    |     768     |   1 (act 4) |      760    | 130.5 MB/s - 95.7% *
  27  |     1536    |     768     |   1 (act 4) |      704    | 128.1 MB/s - 94.0%
  28  |     1536    |     768     |   1 (act 4) |      384    | 128.9 MB/s - 94.6%

Fastest vals were nr_reqs=64 and sync_thresh=(sync_window - 64) at 132.2 MB/s

 

In general, all results are getting faster as the md_sync_window approaches optimization.

 

Again, using sync_thresh=(sync_window-64) produces good results, averaging around 96% of maximum tested speed, regardless of nr_requests.

 

Still appears that sync_window-1 is equivalent to sync_window-8.

 

Also noteworthy is that nr_requests=4 (and 1, which is really 4 again) produce decent results with all sync_thresh values.

 

 

sync_window=1536, on the fringe of my server's sweet zone:

 Test | num_stripes | sync_window | nr_requests | sync_thresh |   Speed 
---------------------------------------------------------------------------
   1  |     3072    |    1536     |     128     |     1535    | 124.5 MB/s - 91.3%
   2  |     3072    |    1536     |     128     |     1528    | 127.6 MB/s - 93.6%
   3  |     3072    |    1536     |     128     |     1472    | 133.3 MB/s - 97.8% *
   4  |     3072    |    1536     |     128     |      768    | 130.2 MB/s - 95.5%
---------------------------------------------------------------------------
   5  |     3072    |    1536     |      64     |     1535    | 124.9 MB/s - 91.6%
   6  |     3072    |    1536     |      64     |     1528    | 127.9 MB/s - 93.8%
   7  |     3072    |    1536     |      64     |     1472    | 133.6 MB/s - 98.0% ***
   8  |     3072    |    1536     |      64     |      768    | 128.2 MB/s - 94.1%
---------------------------------------------------------------------------
   9  |     3072    |    1536     |      32     |     1535    | 124.8 MB/s - 91.6%
  10  |     3072    |    1536     |      32     |     1528    | 129.8 MB/s - 95.2%
  11  |     3072    |    1536     |      32     |     1472    | 133.6 MB/s - 98.0% ***
  12  |     3072    |    1536     |      32     |      768    | 130.3 MB/s - 95.6%
---------------------------------------------------------------------------
  13  |     3072    |    1536     |      16     |     1535    | 132.2 MB/s - 97.0%
  14  |     3072    |    1536     |      16     |     1528    | 132.4 MB/s - 97.1%
  15  |     3072    |    1536     |      16     |     1472    | 133.5 MB/s - 97.9% *
  16  |     3072    |    1536     |      16     |      768    | 127.9 MB/s - 93.8%
---------------------------------------------------------------------------
  17  |     3072    |    1536     |       8     |     1535    | 132.6 MB/s - 97.3%
  18  |     3072    |    1536     |       8     |     1528    | 132.8 MB/s - 97.4% *
  19  |     3072    |    1536     |       8     |     1472    | 132.8 MB/s - 97.4% *
  20  |     3072    |    1536     |       8     |      768    | 130.2 MB/s - 95.5%
---------------------------------------------------------------------------
  21  |     3072    |    1536     |       4     |     1535    | 129.7 MB/s - 95.2%
  22  |     3072    |    1536     |       4     |     1528    | 129.8 MB/s - 95.2% *
  23  |     3072    |    1536     |       4     |     1472    | 129.6 MB/s - 95.1%
  24  |     3072    |    1536     |       4     |      768    | 129.2 MB/s - 94.8%
---------------------------------------------------------------------------
  25  |     3072    |    1536     |   1 (act 4) |     1535    | 129.4 MB/s - 94.9%
  26  |     3072    |    1536     |   1 (act 4) |     1528    | 130.8 MB/s - 96.0% *
  27  |     3072    |    1536     |   1 (act 4) |     1472    | 129.6 MB/s - 95.1%
  28  |     3072    |    1536     |   1 (act 4) |      768    | 129.9 MB/s - 95.3%

Fastest vals were nr_reqs=64 and sync_thresh=(sync_window - 64) at 133.6 MB/s

 

All results are still getting faster as the md_sync_window approaches optimization.

 

Again, using sync_thresh=(sync_window-64) produces good results, averaging around 96% of maximum tested speed, regardless of nr_requests.

 

Still appears that sync_window-1 is mostly equivalent to sync_window-8, though in a few cases sync_window-8 appears a little better.

 

And again, nr_requests=4 (and 1, which is really 4 again) produced good results with all sync_thresh values, but this time so did 8 and 16, and on average both did better than 4.

 

 

sync_window=3072, finally into my server's sweet zone, and the results prove it:

 Test | num_stripes | sync_window | nr_requests | sync_thresh |   Speed 
---------------------------------------------------------------------------
   1  |     6144    |    3072     |     128     |     3071    | 130.4 MB/s - 95.7%
   2  |     6144    |    3072     |     128     |     3064    | 132.3 MB/s - 97.1%
   3  |     6144    |    3072     |     128     |     3008    | 136.3 MB/s - 100% ***FASTEST***
   4  |     6144    |    3072     |     128     |     1536    | 130.9 MB/s - 96.0%
---------------------------------------------------------------------------
   5  |     6144    |    3072     |      64     |     3071    | 129.7 MB/s - 95.2%
   6  |     6144    |    3072     |      64     |     3064    | 132.1 MB/s - 96.9%
   7  |     6144    |    3072     |      64     |     3008    | 136.2 MB/s - 99.9% *
   8  |     6144    |    3072     |      64     |     1536    | 130.6 MB/s - 95.8%
---------------------------------------------------------------------------
   9  |     6144    |    3072     |      32     |     3071    | 130.2 MB/s - 95.5%
  10  |     6144    |    3072     |      32     |     3064    | 133.7 MB/s - 98.1%
  11  |     6144    |    3072     |      32     |     3008    | 136.2 MB/s - 99.9% *
  12  |     6144    |    3072     |      32     |     1536    | 131.0 MB/s - 96.1%
---------------------------------------------------------------------------
  13  |     6144    |    3072     |      16     |     3071    | 134.9 MB/s - 99.0% *
  14  |     6144    |    3072     |      16     |     3064    | 135.0 MB/s - 99.0% *
  15  |     6144    |    3072     |      16     |     3008    | 134.8 MB/s - 98.9%
  16  |     6144    |    3072     |      16     |     1536    | 134.4 MB/s - 98.6%
---------------------------------------------------------------------------
  17  |     6144    |    3072     |       8     |     3071    | 132.9 MB/s - 97.5% *
  18  |     6144    |    3072     |       8     |     3064    | 132.8 MB/s - 97.4%
  19  |     6144    |    3072     |       8     |     3008    | 132.9 MB/s - 97.5% *
  20  |     6144    |    3072     |       8     |     1536    | 132.8 MB/s - 97.4%
---------------------------------------------------------------------------
  21  |     6144    |    3072     |       4     |     3071    | 130.7 MB/s - 95.9% *
  22  |     6144    |    3072     |       4     |     3064    | 129.8 MB/s - 95.2%
  23  |     6144    |    3072     |       4     |     3008    | 129.6 MB/s - 95.1%
  24  |     6144    |    3072     |       4     |     1536    | 128.3 MB/s - 94.1%
---------------------------------------------------------------------------
  25  |     6144    |    3072     |   1 (act 4) |     3071    | 129.7 MB/s - 95.2% *
  26  |     6144    |    3072     |   1 (act 4) |     3064    | 129.5 MB/s - 95.0%
  27  |     6144    |    3072     |   1 (act 4) |     3008    | 129.5 MB/s - 95.0%
  28  |     6144    |    3072     |   1 (act 4) |     1536    | 129.5 MB/s - 95.0%

Fastest vals were nr_reqs=128 and sync_thresh=(sync_window - 64) at 136.3 MB/s

 

We see a jump of 3 MB/s now that the md_sync_window is in the optimized region for my server.

 

Pretty much undefeated, using sync_thresh=(sync_window-64) produces great results, averaging about 98% of maximum tested speed, regardless of nr_requests value, plus it produced the 3 fastest measured speeds during the entire test.

 

Still appears that sync_window-1 is mostly equivalent to sync_window-8, and even turned in a great result at nr_requests=16.

 

Actually, nr_requests=16 made every sync_thresh value look great here.  8 and 4 did okay.  8 was interesting in that almost every result was identical.

 

Accuracy:

Since nr_requests=1 was really just a retest of nr_requests=4, you can get a feel for how accurate these results are by comparing them.  Most of the time they are within a few points of each other, worst case was a 1.2 MB/s variance.  That's a variance of less than 1%, so these are pretty accurate results at 10 minutes.

 

This test was too long, with too much fat in it. 

 

Obvious fat to cut is nr_requests=1, which unRAID overrides to 4.  I also don't see much value in nr_requests=4, 16, or 64.  I think 8, 32 and 128 are more than representative.  Or perhaps 8, 16 and 128, since 16 did really well once we got into a sync_window optimized region.

 

I also don't see any value in sync_thresh=sync_window-1.  sync_window-8 and sync_window/2 are questionable in light of sync_window-64's excellence, but this might just be on my server, so for now I think I will keep them.

 

If I cut the fat above, that would bring this overall test down to 36 test points, compared to the 112 I just ran.  It would complete in about 6 hours.

Link to comment

I also don't see any value in sync_thresh=sync_window-1.  sync_window-8 and sync_window/2 are questionable in light of sync_window-64's excellence, but this might just be on my server, so for now I think I will keep them.

 

Agree, these settings may be optimal for some servers.  While I didn't have much time this week and intend to do more testing with different controllers later, all results point to an optimal setting that is usually a little below sync_window; I'm just not sure if there is a set value, like -60, that is optimal for everything.  Ideally the script would run a short test in the beginning to try and find it.

 

I retested a single LSI9211 with larger (and faster) SSDs in the hope of seeing better-defined results, and while they are, the optimal thresh value changed from the previous tests.  (Previous tests were done with 2 controllers at the same time, which may explain the difference, but I don't have 16 of the largest SSDs to test with both again, and using only one controller with the smallest SSDs won't help either, because results would be limited by their max speed in almost all tests.)

 

Sync_window=2048

 

stripes | window | nr_reqs | thresh |   Speed
-----------------------------------------------------------
4096  |   2048   |  128   |  2047  |  289.7MB/s
4096  |   2048   |  128   |  2040  |  321.7MB/s
4096  |   2048   |  128   |  2036  |  335.2MB/s
4096  |   2048   |  128   |  2032  |  337.0MB/s
4096  |   2048   |  128   |  2028  |  340.5MB/s
4096  |   2048   |  128   |  2024  |  333.5MB/s
4096  |   2048   |  128   |  2016  |  330.0MB/s   
4096  |   2048   |  128   |  1984  |  330.0MB/s   
4096  |   2048   |  128   |  1960  |  330.0MB/s
4096  |   2048   |  128   |  1952  |  330.0MB/s
4096  |   2048   |  128   |  1920  |  330.0MB/s
4096  |   2048   |  128   |  1856  |  325.0MB/s
4096  |   2048   |  128   |  1792  |  326.6MB/s
4096  |   2048   |  128   |  1536  |  323.3MB/s
4096  |   2048   |  128   |  1280  |  320.1MB/s
4096  |   2048   |  128   |  1024  |  314.4MB/s 

 

Same sync_window but nr_requests=8 for the 4 fastest results (like before, looks like it doesn't make a big difference with LSI controllers)

 

stripes | window | nr_reqs | thresh |   Speed
-----------------------------------------------------------
4096  |   2048   |    8   |  2036  |  337.0MB/s
4096  |   2048   |    8   |  2032  |  340.5MB/s
4096  |   2048   |    8   |  2028  |  340.5MB/s
4096  |   2048   |    8   |  2024  |  335.2MB/s

 

Sync_window=1024 and nr_requests back to default:

 

stripes | window | nr_reqs | thresh |   Speed
-----------------------------------------------------------
2048  |   1024   |  128   |  1023  |  293.7MB/s
2048  |   1024   |  128   |  1016  |  328.3MB/s
2048  |   1024   |  128   |  1012  |  331.7MB/s
2048  |   1024   |  128   |  1008  |  333.5MB/s
2048  |   1024   |  128   |  1004  |  337.0MB/s
2048  |   1024   |  128   |  1000  |  325.0MB/s
2048  |   1024   |  128   |   996  |  316.9MB/s

 

Sync_window=3072

stripes | window | nr_reqs | thresh |   Speed
-----------------------------------------------------------
6144  |   3072   |  128   |  3071  |  295.0MB/s
6144  |   3072   |  128   |  3064  |  321.7MB/s
6144  |   3072   |  128   |  3056  |  335.2MB/s
6144  |   3072   |  128   |  3052  |  337.0MB/s
6144  |   3072   |  128   |  3048  |  333.5MB/s
6144  |   3072   |  128   |  3040  |  333.5MB/s
6144  |   3072   |  128   |  3032  |  331.7MB/s
6144  |   3072   |  128   |  3024  |  331.7MB/s
6144  |   3072   |  128   |  3016  |  326.6MB/s

 

Best results were always with thresh=sync_window-20; in the previous tests with 2 controllers the best setting for thresh was sync_window-60.

Link to comment

Best results were always with thresh=sync_window-20; in the previous tests with 2 controllers the best setting for thresh was sync_window-60.

 

Well, that's disappointing, in that the optimum sync_thresh assignment changes with controllers/drives.  Means more exhaustive testing is required.

 

Perhaps what would be better is a multi-staged approach. 

 

1st)  Run the nr_requests test at various md_sync_windows to determine if nr_requests should be 128, 16 or 8 for the next tests, and determine if sync_thresh should be -8, -64, or /2.  This test would also identify whether md_sync_window testing should be in the low, medium or high range.  This test would take about 6 hours.

 

2nd) Run the md_sync_window test using the nr_requests & sync_thresh values identified in the first test.  This test would identify the best md_sync_window.  Not sure at the moment, but this test should take 3-6 hours.

 

3rd) Run an enhanced nr_requests/sync_thresh test using the md_sync_window from the second test.  Because it's only testing a single md_sync_window, we can run a lot more tests at various nr_requests & sync_thresh values, to hone in on what's best.  This test could take 6-10 hours, depending up how thorough we make it.

 

Based upon your results, if I had to take a guess the above strategy would play out as follows:

 

1)  The first test:  nr_requests=8 and sync_thresh=sync_window-8 would probably come out on top, and with md_sync_window values in the high range.  While obviously not the fastest, these should be good enough to find the best md_sync_window in test 2.

 

2)  The second test would focus on md_sync_windows in the high range (say 1024-3072), and would find something in the neighborhood of 2048 works best.

 

3)  The third test would focus on md_sync_window=2048, and try all the potential sync_thresh (-1, -4 to -64 in increments of 4, and /2) and nr_requests (128, 16, 8) values, finding sync_thresh=-20 and whichever nr_requests value works best.
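
 

If it helps to picture it, the staged driver might skeleton out like this - scan_windows_and_thresh, sweep_window and grid_thresh_nr are hypothetical placeholders for the three measurement loops described above:

read BEST_WINDOW THRESH_METHOD < <(scan_windows_and_thresh)   # Test 1: coarse scan
BEST_WINDOW=$(sweep_window $BEST_WINDOW $THRESH_METHOD)       # Test 2: fine window sweep
read BEST_NR BEST_THRESH < <(grid_thresh_nr $BEST_WINDOW)     # Test 3: thresh x nr_requests grid
echo "Recommended: md_sync_window=$BEST_WINDOW md_sync_thresh=$BEST_THRESH nr_requests=$BEST_NR"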

 

Thoughts?

 

 

Link to comment

Perhaps what would be better is a multi-staged approach. 

 

1st)  Run the nr_requests test at various md_sync_windows to determine if nr_requests should be 128, 16 or 8 for the next tests, and determine if sync_thresh should be -8, -64, or /2.  This test would also identify whether md_sync_window testing should be in the low, medium or high range.  This test would take about 6 hours.

 

 

How about using the first test to find only sync_window and sync_thresh?

 

Looks to me like with nr_requests set at default there's a better chance of finding the optimal sync_thresh.  It also looks like the best sync_thresh is the same (or, in the case of your last test, practically the same) across the various nr_requests values, so after finding the optimal window and thresh values you could run a test on those changing only nr_requests.  I believe this would be faster and provide better results than trying to find optimal values for all 3 settings at the same time.

Link to comment

Perhaps what would be better is a multi-staged approach. 

 

1st)  Run the nr_requests test at various md_sync_windows to determine if nr_requests should be 128, 16 or 8 for the next tests, and determine if sync_thresh should be -8, -64, or /2.  This test would also identify whether md_sync_window testing should be in the low, medium or high range.  This test would take about 6 hours.

 

 

How about using the first test to find only sync_window and sync_thresh?

 

Looks to me like with nr_requests set at default there's a better chance of finding the optimal sync_thresh.  It also looks like the best sync_thresh is the same (or, in the case of your last test, practically the same) across the various nr_requests values, so after finding the optimal window and thresh values you could run a test on those changing only nr_requests.  I believe this would be faster and provide better results than trying to find optimal values for all 3 settings at the same time.

 

You're absolutely right.  I like it.

Link to comment

Fleshing out the testing strategy, on my server Test 1 would look like this:

 

Test | num_stripes | sync_window | nr_requests | sync_thresh |   Speed 
---------------------------------------------------------------------------
   1  |      768    |     384     |     128     |      376    |  53.1 MB/s 
   2  |      768    |     384     |     128     |      320    | 129.9 MB/s
   3  |      768    |     384     |     128     |      192    | 117.6 MB/s 
   4  |     1536    |     768     |     128     |      760    |  93.5 MB/s 
   5  |     1536    |     768     |     128     |      704    | 132.1 MB/s 
   6  |     1536    |     768     |     128     |      384    | 124.9 MB/s 
   7  |     3072    |    1536     |     128     |     1528    | 127.6 MB/s 
   8  |     3072    |    1536     |     128     |     1472    | 133.3 MB/s 
   9  |     3072    |    1536     |     128     |      768    | 130.2 MB/s 
  10  |     6144    |    3072     |     128     |     3064    | 132.3 MB/s 
  11  |     6144    |    3072     |     128     |     3008    | 136.3 MB/s 
  12  |     6144    |    3072     |     128     |     1536    | 130.9 MB/s 

Fastest vals were sync_window=3072 sync_thresh=(sync_window - 64) at 136.3 MB/s

 

Because we're only testing nr_requests=128, the number of test points is dramatically reduced.  With 10-minute length tests, this completes in 2 hours.

 

On my server, Test 2 then runs in the very high range from sync_window 1536 to 3456 in increments of 64, using the fastest sync_thresh method from Test 1.  This test is also run at nr_requests=128.  That's 30 test points, for a running time of 5 hours.

 

Hypothetically, Test 2 finds that sync_window=3072 is the leading edge of the peak.

 

Finally, Test 3 runs another nr_requests test, this time at sync_window=3072, testing nr_requests=128/16/8 and sync_thresh=-1, -4 to -64 in increments of 4, and /2.

 

This still feels a little excessive, at 9 hours for test 3.  16 hours would be the total running time for tests 1-3.

 

Test | num_stripes | sync_window | nr_requests | sync_thresh |   Speed 
---------------------------------------------------------------------------
   1  |     6144    |    3072     |     128     |     3071    | 130.4 MB/s 
   2  |     6144    |    3072     |     128     |     3068    | 130.4 MB/s 
   3  |     6144    |    3072     |     128     |     3064    | 132.3 MB/s 
   4  |     6144    |    3072     |     128     |     3060    | 132.3 MB/s 
   5  |     6144    |    3072     |     128     |     3056    | 132.3 MB/s 
   6  |     6144    |    3072     |     128     |     3052    | 132.3 MB/s 
   7  |     6144    |    3072     |     128     |     3048    | 132.3 MB/s 
   8  |     6144    |    3072     |     128     |     3044    | 132.3 MB/s 
   9  |     6144    |    3072     |     128     |     3040    | 132.3 MB/s 
  10  |     6144    |    3072     |     128     |     3036    | 132.3 MB/s 
  11  |     6144    |    3072     |     128     |     3032    | 132.3 MB/s 
  12  |     6144    |    3072     |     128     |     3028    | 132.3 MB/s 
  13  |     6144    |    3072     |     128     |     3024    | 132.3 MB/s 
  14  |     6144    |    3072     |     128     |     3020    | 132.3 MB/s 
  15  |     6144    |    3072     |     128     |     3016    | 132.3 MB/s 
  16  |     6144    |    3072     |     128     |     3012    | 132.3 MB/s 
  17  |     6144    |    3072     |     128     |     3008    | 136.3 MB/s 
  18  |     6144    |    3072     |     128     |     1536    | 130.9 MB/s
---------------------------------------------------------------------------
  19  |     6144    |    3072     |      16     |     3071    | 130.4 MB/s 
  20  |     6144    |    3072     |      16     |     3068    | 130.4 MB/s 
  21  |     6144    |    3072     |      16     |     3064    | 132.3 MB/s 
  22  |     6144    |    3072     |      16     |     3060    | 132.3 MB/s 
  23  |     6144    |    3072     |      16     |     3056    | 132.3 MB/s 
  24  |     6144    |    3072     |      16     |     3052    | 132.3 MB/s 
  25  |     6144    |    3072     |      16     |     3048    | 132.3 MB/s 
  26  |     6144    |    3072     |      16     |     3044    | 132.3 MB/s 
  27  |     6144    |    3072     |      16     |     3040    | 132.3 MB/s 
  28  |     6144    |    3072     |      16     |     3036    | 132.3 MB/s 
  29  |     6144    |    3072     |      16     |     3032    | 132.3 MB/s 
  30  |     6144    |    3072     |      16     |     3028    | 132.3 MB/s 
  31  |     6144    |    3072     |      16     |     3024    | 132.3 MB/s 
  32  |     6144    |    3072     |      16     |     3020    | 132.3 MB/s 
  33  |     6144    |    3072     |      16     |     3016    | 132.3 MB/s 
  34  |     6144    |    3072     |      16     |     3012    | 132.3 MB/s 
  35  |     6144    |    3072     |      16     |     3008    | 136.3 MB/s 
  36  |     6144    |    3072     |      16     |     1536    | 130.9 MB/s
---------------------------------------------------------------------------
  37  |     6144    |    3072     |       8     |     3071    | 130.4 MB/s 
  38  |     6144    |    3072     |       8     |     3068    | 130.4 MB/s 
  39  |     6144    |    3072     |       8     |     3064    | 132.3 MB/s 
  40  |     6144    |    3072     |       8     |     3060    | 132.3 MB/s 
  41  |     6144    |    3072     |       8     |     3056    | 132.3 MB/s 
  42  |     6144    |    3072     |       8     |     3052    | 132.3 MB/s 
  43  |     6144    |    3072     |       8     |     3048    | 132.3 MB/s 
  44  |     6144    |    3072     |       8     |     3044    | 132.3 MB/s 
  45  |     6144    |    3072     |       8     |     3040    | 132.3 MB/s 
  46  |     6144    |    3072     |       8     |     3036    | 132.3 MB/s 
  47  |     6144    |    3072     |       8     |     3032    | 132.3 MB/s 
  48  |     6144    |    3072     |       8     |     3028    | 132.3 MB/s 
  49  |     6144    |    3072     |       8     |     3024    | 132.3 MB/s 
  50  |     6144    |    3072     |       8     |     3020    | 132.3 MB/s 
  51  |     6144    |    3072     |       8     |     3016    | 132.3 MB/s 
  52  |     6144    |    3072     |       8     |     3012    | 132.3 MB/s 
  53  |     6144    |    3072     |       8     |     3008    | 136.3 MB/s 
  54  |     6144    |    3072     |       8     |     1536    | 130.9 MB/s

Fastest vals were nr_reqs=128 and sync_thresh=(sync_window - 64) at 136.3 MB/s

 

 

 

Link to comment

16 hours is not that long for a test that most people only need to run once, but I wonder if running the nr_requests test before or after test 2 wouldn't give similar results in less time.  I.e., after test 1, test nr_requests 8/16/128 with the best result from test 1 and use the better value from then on; or, as an alternative, do it after test 2.  Either way, the last test would be done with a single nr_requests.

Link to comment

16 hours is not that long for a test that most people only need to run once, but I wonder if running the nr_requests test before or after test 2 wouldn't give similar results in less time.  I.e., after test 1, test nr_requests 8/16/128 with the best result from test 1 and use the better value from then on; or, as an alternative, do it after test 2.  Either way, the last test would be done with a single nr_requests.

 

The problem with that approach is that, on my server at optimum md_sync_window:

  • nr_requests=128 worked best at sync_thresh=-64
  • nr_requests=16 worked best at sync_thresh=-8
  • nr_requests=8 worked best at both sync_thresh=-1 and -64

 

If I only test for one, I might not find the right combination of sync_thresh vs nr_requests.  I have no way of accurately predicting which nr_requests is going to be fastest, nor which sync_thresh is going to be fastest.  I think I have to test all the combos to see which one wins.

 

And on my server at optimum md_sync_window, from the best to the worst sync_thresh&nr_requests combo there was a 6% variance.

 

Hypothetically, if Test 1 found that sync_thresh=-8 was best, and that was used for Test 2, and then again for Test 3 to find the best nr_requests value, it would have come back as nr_requests=16.  Then in Test 4 I would have tested all values for nr_requests=16 and found that sync_thresh=-8 is best.  The problem is that nr_requests=128 with sync_thresh=-64 was actually best, but got excluded from the last round of testing because nr_requests=128 didn't test well at sync_thresh=-8.

Link to comment

Just an observation, from a lazy observer, feeling too lazy to properly study every one of the long reports here - I can't see any point in testing with nr_requests=128.  I did a cursory search and never saw a test where nr_requests=128 was better (beyond margin of error) than nr_requests=8 or 16.  Even at 128's best, 8 or 16 was essentially the same.

 

I think I would test at 16 only, and determine the best of the other numbers, then at the end just 2 more tests, both 8 and 32, and pick the best of these 3.  If you wish and 32 is better, then you could test 128.  Or if 8 is better, then test 4.  But I think you're essentially done once you've picked between 8, 16, and 32.  Just my opinion, based on less experience than either of you.  (Or you can do all tests at 8 only, then test upward at the end.)

Link to comment

Just an observation, from a lazy observer, feeling too lazy to properly study every one of the long reports here - I can't see any point in testing with nr_requests=128.  I did a cursory search and never saw a test where nr_requests=128 was better (beyond margin of error) than nr_requests=8 or 16.  Even at 128's best, 8 or 16 was essentially the same.

 

I think I would test at 16 only, and determine the best of the other numbers, then at the end just 2 more tests, both 8 and 32, and pick the best of these 3.  If you wish and 32 is better, then you could test 128.  Or if 8 is better, then test 4.  But I think you're essentially done once you've picked between 8, 16, and 32.  Just my opinion, based on less experience than either of you.  (Or you can do all tests at 8 only, then test upward at the end.)

 

For both johnnie.black and myself, nr_requests=128 produced the fastest result, but only when properly tuned with the other values.

 

Looking at my results, using nr_requests=128 or 16 would have produced nearly identical results, and based upon 128 providing better ultimate performance I would probably stick with 128. 

 

Going as low as nr_requests=8 might have come up short in finding the best md_sync_window range, though, as it seems to both peak and flatline a bit early, but with ultimately lower performance than 128.  This is easier to see if you look at nr_requests=4, which peaks early at sync_window=768 and stays flat at higher values, never exceeding 96% of peak performance.

 

nr_requests=8 peaked early at 1536 and stayed flat from there, never exceeding 97.5% of peak performance.

 

Both nr_requests=16 and 128 peaked at 3072, with 16 providing 99% of peak performance, and 128 providing 100%.

 

Actually, looks like 32, 64 and 128 all peaked the same, and followed very similar curves.  I guess I could be convinced that 32 might be a nice middle of the road approach that is known to mimic 128, but may provide better tracking on some systems for which 128 never works well.

Link to comment

Sorry for breaking in, not my intention to hijack this thread, but I may have something of interest to you guys while you are running your performance tests.

 

I have made an update to the GUI which allows real-time monitoring of disk read and write performance.  For the moment it is just a test, available as a plugin, so it can be uninstalled easily.  ;)

 

See the attached picture for the explanation.

 

At the top right a "button" is added which lets the user toggle between read/write counters and read/write speed.  Each disk is reported separately, and when the display refresh setting is set to "real-time" it will show the disk speeds in ... real-time.

 

Use the button to switch between views while at the main page.

 

I would be interested to know if the displayed speeds (they are taken from /proc/diskstats) correspond with your performance measurements. Or maybe the displayed speeds can help you in further tuning the tool.
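
 

(For anyone who wants to cross-check by hand, the /proc/diskstats math is straightforward - field 6 is sectors read, field 10 sectors written, both in 512-byte units:)

s1=$(awk '$3 == "sdj" { print $6 }' /proc/diskstats)   # sectors read so far
sleep 5
s2=$(awk '$3 == "sdj" { print $6 }' /proc/diskstats)
echo "sdj read speed: $(( (s2 - s1) * 512 / 5 / 1000000 )) MB/s"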

 

The plugin can be installed by copy/pasting this URL into the plugin manager: https://raw.githubusercontent.com/bergware/dynamix/master/unRAIDv6/dynamix.disk.io.plg (requires unRAID 6.2rc4).

[Attached image: diskio.png]

Link to comment
