unraid-tunables-tester.sh - A New Utility to Optimize unRAID md_* Tunables



Normal test finished on bare metal:

 


       unRAID Tunables Tester v4.0b3 by Pauven (for unRAID v6.2)

        Tunables Report produced Fri Aug 26 22:16:52 CDT 2016

                         Run on server: nas

                  Normal Automatic Parity Sync Test


Current Values:  md_num_stripes=1040, md_sync_window=520, md_sync_thresh=260
                 Global nr_requests=8
                    sdq nr_requests=8
                    sdp nr_requests=8
                    sdo nr_requests=8
                    sdn nr_requests=8
                    sdj nr_requests=8
                    sdm nr_requests=8
                    sdl nr_requests=8
                    sdk nr_requests=8
                    sdi nr_requests=8
                    sdh nr_requests=8
                    sdf nr_requests=8
                    sde nr_requests=8
                    sdg nr_requests=8


--- INITIAL BASELINE TEST OF CURRENT VALUES (1 Sample Point @ 5min Duration)---

Test | RAM | stripes | window | reqs | thresh |  MB/s 
-------------------------------------------------------
   1  |  59 |   1040  |   520  |   8  |   260  | 146.2 

--- FULLY AUTOMATIC nr_requests TEST 1 (4 Sample Points @ 10min Duration)---

Test | num_stripes | sync_window | nr_requests | sync_thresh |   Speed 
---------------------------------------------------------------------------
  1   |     1536    |     768     |     128     |      767    |  62.0 MB/s 
  2   |     1536    |     768     |     128     |      384    |  79.2 MB/s 
  3   |     1536    |     768     |       8     |      767    |  79.6 MB/s 
  4   |     1536    |     768     |       8     |      384    | 105.8 MB/s 

Fastest vals were nr_reqs=8 and sync_thresh=50% of sync_window at 105.8 MB/s

This nr_requests value will be used for the next test.


--- FULLY AUTOMATIC TEST PASS 1a (Rough - 13 Sample Points @ 5min Duration)---

Test | RAM | stripes | window | reqs | thresh |  MB/s | thresh |  MB/s 
------------------------------------------------------------------------
   1  |  43 |    768  |   384  |   8  |   383  |  92.0 |   192  | 149.8 
   2  |  51 |    896  |   448  |   8  |   447  | 108.0 |   224  |  81.3 
   3  |  58 |   1024  |   512  |   8  |   511  |  77.1 |   256  | 133.1 
   4  |  65 |   1152  |   576  |   8  |   575  | 102.6 |   288  | 149.3 
   5  |  73 |   1280  |   640  |   8  |   639  |  82.1 |   320  |  82.9 
   6  |  80 |   1408  |   704  |   8  |   703  |  78.9 |   352  |  98.0 
   7  |  87 |   1536  |   768  |   8  |   767  | 131.3 |   384  | 125.8 
   8  |  95 |   1664  |   832  |   8  |   831  |  89.4 |   416  |  75.9 
   9  | 102 |   1792  |   896  |   8  |   895  |  75.8 |   448  | 102.8 
  10  | 109 |   1920  |   960  |   8  |   959  |  89.9 |   480  |  94.2 
  11  | 117 |   2048  |  1024  |   8  |  1023  |  93.8 |   512  |  79.8 
  12  | 124 |   2176  |  1088  |   8  |  1087  | 132.4 |   544  | 132.8 
  13  | 131 |   2304  |  1152  |   8  |  1151  | 125.5 |   576  | 108.0 

--- FULLY AUTOMATIC TEST PASS 1b (Rough - 5 Sample Points @ 5min Duration)---

Test | RAM | stripes | window | reqs | thresh |  MB/s | thresh |  MB/s 
------------------------------------------------------------------------
   1  |   7 |    128  |    64  |   8  |    63  |  71.8 |    32  | 111.1 
   2  |  14 |    256  |   128  |   8  |   127  | 114.6 |    64  | 102.2 
   3  |  21 |    384  |   192  |   8  |   191  |  63.9 |    96  |  84.6 
   4  |  29 |    512  |   256  |   8  |   255  |  87.2 |   128  |  74.2 
   5  |  36 |    640  |   320  |   8  |   319  |  83.2 |   160  |  59.4 

--- Targeting Fastest Result of md_sync_window 384 bytes for Final Pass ---


--- FULLY AUTOMATIC nr_requests TEST 2 (4 Sample Points @ 10min Duration)---

Test | num_stripes | sync_window | nr_requests | sync_thresh |   Speed 
---------------------------------------------------------------------------
  1   |      768    |     384     |     128     |      383    |  65.8 MB/s 
  2   |      768    |     384     |     128     |      192    |  72.3 MB/s 
  3   |      768    |     384     |       8     |      383    |  69.4 MB/s 
  4   |      768    |     384     |       8     |      192    |  96.4 MB/s 

Fastest vals were nr_reqs=8 and sync_thresh=50% of sync_window at 96.4 MB/s

This nr_requests value will be used for the next test.



--- FULLY AUTOMATIC TEST PASS 2 (Fine - 33 Sample Points @ 5min Duration)---

Test | RAM | stripes | window | reqs | thresh |  MB/s | thresh |  MB/s 
------------------------------------------------------------------------
   1  |  29 |    512  |   256  |   8  |   255  | 104.5 |   128  | 134.5 
   2  |  30 |    528  |   264  |   8  |   263  |  72.6 |   132  |  69.4 
   3  |  31 |    544  |   272  |   8  |   271  |  97.5 |   136  | 121.8 
   4  |  32 |    560  |   280  |   8  |   279  |  99.8 |   140  |  84.3 
   5  |  32 |    576  |   288  |   8  |   287  |  57.6 |   144  |  54.9 
   6  |  33 |    592  |   296  |   8  |   295  |  72.1 |   148  |  36.6 
   7  |  34 |    608  |   304  |   8  |   303  |  60.2 |   152  |  27.2 
   8  |  35 |    624  |   312  |   8  |   311  |  59.4 |   156  |  27.3 
   9  |  36 |    640  |   320  |   8  |   319  |  54.9 |   160  |  27.6 
  10  |  37 |    656  |   328  |   8  |   327  |  42.5 |   164  |  28.8 
  11  |  38 |    672  |   336  |   8  |   335  |  34.8 |   168  |  24.4 
  12  |  39 |    688  |   344  |   8  |   343  |  35.2 |   172  |  35.2 
  13  |  40 |    704  |   352  |   8  |   351  |  62.9 |   176  |  32.9 
  14  |  41 |    720  |   360  |   8  |   359  |  63.6 |   180  |  36.3 
  15  |  42 |    736  |   368  |   8  |   367  |  71.6 |   184  |  66.2 
  16  |  43 |    752  |   376  |   8  |   375  |  76.5 |   188  |  35.0 
  17  |  43 |    768  |   384  |   8  |   383  |  52.9 |   192  |  30.8 
  18  |  44 |    784  |   392  |   8  |   391  |  62.3 |   196  |  33.5 
  19  |  45 |    800  |   400  |   8  |   399  |  73.2 |   200  |  38.5 
  20  |  46 |    816  |   408  |   8  |   407  |  62.1 |   204  |  56.6 
  21  |  47 |    832  |   416  |   8  |   415  |  75.6 |   208  |  55.7 
  22  |  48 |    848  |   424  |   8  |   423  |  58.2 |   212  |  46.9 
  23  |  49 |    864  |   432  |   8  |   431  |  78.1 |   216  |  54.9 
  24  |  50 |    880  |   440  |   8  |   439  |  67.2 |   220  |  49.0 
  25  |  51 |    896  |   448  |   8  |   447  |  62.3 |   224  |  58.9 
  26  |  52 |    912  |   456  |   8  |   455  |  77.5 |   228  | 146.1 
  27  |  53 |    928  |   464  |   8  |   463  |  97.8 |   232  |  81.0 
  28  |  54 |    944  |   472  |   8  |   471  |  67.4 |   236  |  56.9 
  29  |  54 |    960  |   480  |   8  |   479  |  83.1 |   240  |  54.4 
  30  |  55 |    976  |   488  |   8  |   487  |  66.9 |   244  |  38.1 
  31  |  56 |    992  |   496  |   8  |   495  |  64.0 |   248  |  58.5 
  32  |  57 |   1008  |   504  |   8  |   503  |  79.0 |   252  |  54.0 
  33  |  58 |   1024  |   512  |   8  |   511  |  74.8 |   256  |  69.2 

The results below do NOT include the Baseline test of current values.

The Fastest Sync Speed tested was md_sync_window=576 at 149.3 MB/s
     Tunable (md_num_stripes): 1152
     Tunable (md_sync_window): 576
     Tunable (md_sync_thresh): 288
     Tunable (nr_requests): 8
This will consume 65 MB with md_num_stripes=1152, 2x md_sync_window.
This is 6MB more than your current utilization of 59MB.

The Thriftiest Sync Speed tested was md_sync_window=456 at 146.1 MB/s
     Tunable (md_num_stripes): 912
     Tunable (md_sync_window): 456
     Tunable (md_sync_thresh): 228
     Tunable (nr_requests): 8
This will consume 52 MB with md_num_stripes=912, 2x md_sync_window.
This is 7MB less than your current utilization of 59MB.

The Recommended Sync Speed is md_sync_window=576 at 149.3 MB/s
     Tunable (md_num_stripes): 1152
     Tunable (md_sync_window): 576
     Tunable (md_sync_thresh): 288
     Tunable (nr_requests): 8
This will consume 65 MB with md_num_stripes=1152, 2x md_sync_window.
This is 6MB more than your current utilization of 59MB.

NOTE: Adding additional drives will increase memory consumption.

In unRAID, go to Settings > Disk Settings to set your chosen parameter values.
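
For anyone applying these from the command line instead of the WebGUI, here is a minimal sketch.  It assumes unRAID's mdcmd utility is available (typically /root/mdcmd) and that the device list matches the report above; values set this way are runtime-only and will not survive a reboot.

#!/bin/bash
# Sketch: apply the recommended tunables from the shell (assumption: mdcmd
# accepts these tunable names, as reported above; runtime-only change).
mdcmd set md_num_stripes 1152
mdcmd set md_sync_window 576
mdcmd set md_sync_thresh 288

# nr_requests is a standard Linux block-queue setting, applied per device:
for dev in sdq sdp sdo sdn sdj sdm sdl sdk sdi sdh sdf sde sdg; do
    echo 8 > /sys/block/$dev/queue/nr_requests
done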

Completed: 11 Hrs 22 Min 47 Sec.


NOTE: Use the smallest set of values that produce good results. Larger values
      increase server memory use, and may cause stability issues with unRAID,
      especially if you have any add-ons or plug-ins installed.


System Info:  nas
              unRAID version 6.2.0-rc4
                   md_num_stripes=1040
                   md_sync_window=520
                   md_sync_thresh=260
                   nr_requests=8 (Global Setting)
                   sbNumDisks=14
              CPU: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
              RAM: System Memory

Outputting lshw information for Drives and Controllers:

H/W path            Device     Class          Description
=========================================================
/0/100/1f.2                    storage        C600/X79 series chipset 6-Port SATA AHCI Controller
/0/2/0              scsi1      storage        SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]
/0/2/0/0.0.0        /dev/sdd   disk           512GB Crucial_CT512M55
/0/2/0/0.1.0        /dev/sde   disk           4TB Hitachi HDS72404
/0/2/0/0.2.0        /dev/sdf   disk           4TB HGST HDN724040AL
/0/2/0/0.3.0        /dev/sdg   disk           4TB HGST HDN724040AL
/0/2/0/0.4.0        /dev/sdh   disk           4TB Hitachi HDS72404
/0/2/0/0.5.0        /dev/sdi   disk           4TB HGST HDS724040AL
/0/3/0              scsi8      storage        SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]
/0/3/0/0.1.0        /dev/sdk   disk           4TB Hitachi HDS72404
/0/3/0/0.2.0        /dev/sdl   disk           4TB HGST HDS724040AL
/0/3/0/0.3.0        /dev/sdm   disk           4TB HGST HDS724040AL
/0/3/0/0.4.0        /dev/sdn   disk           4TB HGST HDS724040AL
/0/3/0/0.5.0        /dev/sdo   disk           4TB HGST HDS724040AL
/0/3/0/0.6.0        /dev/sdp   disk           4TB HGST HDS724040AL
/0/3/0/0.7.0        /dev/sdq   disk           4TB Hitachi HDS72404
/0/3/0/0.0.0        /dev/sdj   disk           4TB HGST HDS724040AL
/0/68               scsi0      storage        
/0/68/0.0.0         /dev/sda   disk           15GB Reader     SD/MS
/0/68/0.0.0/0       /dev/sda   disk           15GB 
/0/68/0.0.1         /dev/sdb   disk           Reader  MicSD/M2
/0/68/0.0.1/0       /dev/sdb   disk           
/0/69               scsi2      storage        
/0/69/0.0.0         /dev/sdc   disk           60GB INTEL SSDSC2CT06

Array Devices:
    Disk0 sdq is a Parity drive named parity
    Disk1 sdp is a Data drive named disk1
    Disk2 sdo is a Data drive named disk2
    Disk3 sdn is a Data drive named disk3
    Disk4 sdj is a Data drive named disk4
    Disk5 sdm is a Data drive named disk5
    Disk6 sdl is a Data drive named disk6
    Disk7 sdk is a Data drive named disk7
    Disk8 sdi is a Data drive named disk8
    Disk9 sdh is a Data drive named disk9
    Disk10 sdf is a Data drive named disk10
    Disk11 sde is a Data drive named disk11
    Disk12 sdg is a Data drive named disk12

Outputting free low memory information...

              total        used        free      shared  buff/cache   available
Mem:      132071772     5972928   122539640      415652     3559204   124705276
Low:      132071772     9532132   122539640
High:             0           0           0
Swap:             0           0           0


                      *** END OF REPORT ***


 

Another test with poor results.  I can't really make a judgment call on good values, other than that lower seems a bit better.

 

I don't recall v2.2 on unRAID 5.x having such inconsistency, but maybe I'm just forgetting.  I was anticipating better results since the new test sample is almost twice the size of the v2.2 test sample.

 

I'm thinking I need to revamp the test strategy. 

 

The first pass seems to give a good overall picture, and if the curve forms it makes it pretty easy to see what values to try.

 

With the results I'm seeing on 6.2, the second pass seems pretty worthless.  It "might" find a slightly better value, but then again it seems just as likely to give crap results and be a total waste of time.  Considering the 2nd pass runs for almost 6 hours alone, I could eliminate it and put that time towards making the first pass more accurate.

 

I can double the sample running time to 10 minutes on the first pass.  It could run as short as 6 hours, even with the doubled accuracy.  If the extra 1b/1c/1d tests get triggered, it could run for a max of around 20 hours.

 

Having a single pass, start to finish, instead of multiple overlapping passes, should also make the results easier to graph.

 

I'm also thinking of separating out the nr_requests test, making it standalone.  This new standalone test would replace the Short Auto.  To me, the whole purpose of the Short Auto was just to give a quick preview to determine if the Normal Auto is worth running.  The nr_requests portion of the Short Auto appears to be answering that question, and at the same time the low accuracy of the Short Auto is just causing too many questions and doubts.

 

I could make the nr_requests test more robust too, testing more points at low, medium and high values, to give a better idea of what values are right for each system.  The best result could feed into the Normal Auto test, so it could skip testing nr_requests and instead focus on md_sync_window values.
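
For discussion, a rough sketch of what that standalone test could look like.  Here measure_sync_speed is a hypothetical helper standing in for the tester's existing sampling logic (start a parity check, wait N seconds, compute MB/s from the position delta, cancel the check):

#!/bin/bash
# Hypothetical standalone nr_requests test: sweep low, medium and high
# values, timing a short parity-check sample at each one.
for reqs in 4 8 16 32 64 128; do
    for q in /sys/block/sd?/queue/nr_requests; do
        echo $reqs > $q
    done
    speed=$(measure_sync_speed 60)   # hypothetical helper, 60-second sample
    echo "nr_requests=$reqs -> $speed MB/s"
done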

 

I would appreciate some feedback from the beta testers.  Does this sound like a better solution?  Do you have any other ideas?

 

Paul


I would appreciate some feedback from the beta testers.  Does this sound like a better solution?  Do you have any other ideas?

 

Paul

 

I don't really know enough about the inner workings to help you with that question.

 

These are my settings right now.

 

The Recommended Sync Speed is md_sync_window=576 at 149.3 MB/s
     Tunable (md_num_stripes): 1152
     Tunable (md_sync_window): 576
     Tunable (md_sync_thresh): 288
     Tunable (nr_requests): 8
This will consume 65 MB with md_num_stripes=1152, 2x md_sync_window.
This is 6MB more than your current utilization of 59MB.

 

I'm going to reboot back to ESXi, and I have a scheduled parity check next week.  I will just see what the results are.  Hopefully it's better than it has been.

 

Thanks, Pauven!


The script was not finding the best values for some of my servers, so I've been doing some testing on my test server, since I don't remember exactly how I arrived at the settings I've been using.  Some interesting findings; not sure if this helps or makes things more difficult, but it turns out there was a good reason why I chose my go-to values:

 

 

[attached image: test results]

 

 

So, because the script only tests 2 sync_thresh settings and md_num_stripes is always 2 x md_sync_window, it can't find the best settings.

 

PS: test server has only LSI controllers, I think that's the reason why nr_requests doesn't make much difference.

 

Can't run the normal test on this server since the parity check finishes in 2.5 minutes  :P

 

 


After some more testing, it looks like the important setting is md_sync_thresh; if it's set to the optimal value, md_num_stripes can be left at 2 x md_sync_window.

 

[attached image: thresh test results]

 

 

Is it possible for the script to test a few values for md_sync_thresh between md_sync_window-1 and md_sync_window/2?


The script was not finding the best values for some of my servers, so I've been doing some testing on my test server, since I don't remember exactly how I arrived at the settings I've been using.  Some interesting findings; not sure if this helps or makes things more difficult, but it turns out there was a good reason why I chose my go-to values:

...

So, because the script only tests 2 sync_thresh settings and md_num_stripes is always 2 x md_sync_window, it can't find the best settings.

 

I really appreciate this information.  Please keep it up.

 

Are you sure the unRAID default is md_num_stripes=1280?  I really don't know, so I'm asking.

 

Can't run the normal test on this server since the parity check finishes in 2.5 minutes  :P

 

I think this server should be renamed: Rosetta Stone.  With the ability to complete a full parity check in only 2.5 minutes, and at such high speeds, this server can fully reveal the interactions between different values, and in a short amount of time.

 

 

After some more testing, it looks like the important setting is md_sync_thresh; if it's set to the optimal value, md_num_stripes can be left at 2 x md_sync_window.

I'm not 100% sure of how v6.2 behaves now that md_write_limit is gone, but here is what we uncovered with v5 - hopefully this will give you some ideas for testing v6.2:

 

md_num_stripes is the total # of stripes available for reading/writing/syncing.  It must always remain the highest number.

 

md_sync_window is the total # of stripes that parity syncs are limited to.  While parity checks are underway, md_num_stripes-md_sync_window = the # of remaining stripes available to handle read/write requests.  So for example, if md_num_stripes=5000 and md_sync_window=1000, then during a parity check there are 4000 stripes remaining to handle reads/writes.

 

md_write_limit is the total # of stripes that array writes were limited to on v5.x.  While writing is underway, md_num_stripes-md_write_limit = the # of remaining stripes available to handle read/sync requests.  So for example, if md_num_stripes=5000 and md_write_limit=1500, then during writing, 3500 stripes remain to handle reads/syncs.

 

Using the above examples, if you were reading, writing and running a parity sync all at the same time, 5000-1000-1500=2500, so parity checks would get 1000 stripes, writes would get 1500 stripes, and reads would get the remaining 2500 stripes.
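
Purely as an illustration of that arithmetic (v5-era tunables, with md_write_limit still present):

# Stripe budget example from the paragraph above.
num_stripes=5000; sync_window=1000; write_limit=1500
reads=$(( num_stripes - sync_window - write_limit ))
echo "sync=$sync_window write=$write_limit read=$reads"
# -> sync=1000 write=1500 read=2500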

 

Lime-Tech never revealed any priority of reads vs. writes vs syncs, but if I had to guess, on v5 syncs had priority because of how much people complained about stuttering while watching movies during parity checks.  That's just a guess, though, plus I haven't been on the forums enough to know if this general behavior has changed in 6.x.

 

We had to make some educated guesses in both the above and in interpreting how to come up with the right values.  Of the three activities, reading vs. writing vs. parity checks, we figured parity checks were the most intensive activity, as it was the only one that engaged all drives simultaneously.  Based upon that, one could surmise that if 1000 was enough for a parity check, then it was more than enough for reading or writing, as those were "easier" tasks.

 

So then we deduced that this was really a balancing equation.  md_num_stripes vs. md_write_limit vs. md_sync_window determined how the server was balanced to do one task over another.  If you set num_stripes to 3000, write_limit to 1000, and sync_window to 1000, then during all three you get 1000 read, 1000 write, 1000 sync, or a perfectly balanced 1:1:1 ratio and more than sufficient for reads and writes.  (To be clear, Lime-Tech never confirmed this deduction, so it could be wrong.)

 

You could also re-balance the server by changing the ratio.  So if 1000 was enough for parity checks, and if you set num_stripes=1500, write_limit=250, and sync_window=1000, then when the server is doing all three at the same time, you get 250 read, 250 write, and 1000 sync, or a 1:1:4 balancing ratio. 

 

If you did something crazy, like set num_stripes=1001, write_limit=1 and sync_window=1000, then if you're only reading you get 1001 stripes, only writing you get 1 stripe, reading and writing you get 1 write and 1000 read, but if reading/writing/syncing you get 1000 sync, 1 write, 0 read - completely starving the reads.

 

Since md_write_limit is gone, it might be that Lime-Tech is now auto-balancing reads vs writes.  If the default values are num_stripes=1280 and sync_window=384, then LT is leaving 896 stripes to handle reads and writes during a parity check, and giving us no control or visibility over the balancing of reads vs writes.

 

I think it is also telling that LT is setting default values of num_stripes 1280 and sync_window 384, as that is more than 3:1.  Having my script set num_stripes 2:1 vs. sync_window may be too low for a properly balanced read vs. write vs. sync ratio.

 

Getting back to your tests, for a parity check test, num_stripes should have nearly zero impact as long as A) you're not read/writing during the test, and B) num_stripes is bigger than sync_window.  Sync_window is the primary factor for parity check speed.

 

On Rosetta Stone, you could test this by running nr_requests=8, sync_window=2048, sync_thresh=1960, doing this twice: once with num_stripes=4096, and again with num_stripes=2096, barely above sync_window.  Just make sure nothing is reading/writing to the array during this test.
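
As a sketch, those two runs could be scripted like this (measure_sync_speed again standing in for whatever timing method is used):

for stripes in 4096 2096; do
    mdcmd set md_num_stripes $stripes
    mdcmd set md_sync_window 2048
    mdcmd set md_sync_thresh 1960
    echo "num_stripes=$stripes -> $(measure_sync_speed 60) MB/s"
done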

 

From what I've read, md_sync_thresh is how aggressively unRAID works to refill md_sync_window.  Like a waiter at a restaurant refilling your drink, do they refill after every sip (md_sync_thresh=md_sync_window-1), after the glass is half empty (md_sync_thresh=md_sync_window/2), or when you're sucking on ice (md_sync_thresh=8).

 

If I had to guess as to what's happening with various controller cards, some cards have a slow processor that can't multi-task, while others have a fast one.  A slow processor is like the diner who pauses eating while the waiter refills their drink, so refilling after every sip makes it nearly impossible to eat.  A fast processor is like the diner who keeps chowing down, ignoring the waiter refilling their glass, so refilling after every sip doesn't slow anything down.  Lower nr_requests values probably work for slow and fast cards alike, but if you go too low, then eventually sync is starved of stripes waiting on the next refill.  It might be that, for cards that like fewer refills, using even lower values like 25% of md_sync_window will gain even more performance, so long as we don't go so low that the glass ever runs dry.

 

Is it possible for the script to test a few values for md_sync_thresh between md_sync_window-1 and md_sync_window/2?

 

Yes, anything is possible.  More test points only take more time.  I could try 25%, 50%, 75% and md_sync_window-1 to paint a nice general picture.  I could also do 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% and sync_window-1.
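
Generating those candidate points is trivial, e.g.:

# Candidate md_sync_thresh values for a given md_sync_window:
window=2048
for pct in 25 50 75; do
    echo $(( window * pct / 100 ))
done
echo $(( window - 1 ))
# -> 512, 1024, 1536 and 2047 for a 2048 window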

 

The real challenge here in optimizing parity check speeds is that now instead of the single parameter we had to worry about on unRAID v5 (sync_window), we now have 3 (sync_window, sync_thresh and nr_requests), plus we see that adjusting some parameters up/down affects the behavior of other parameters.  I can't just test 50 settings and call it a day, I need to test 50 x 2 x 10, or 1000 combinations of settings.  This is an exponentially bigger problem on unRAID v6.2.

 

Also challenging is that because nr_requests and sync_thresh seem to directly impact each other, I need to test both simultaneously.  And because they behave differently at different sync_window values, I need to test them at various sync_windows too.  I have yet to wrap my head around a good routine for testing all three values.

 

I could easily program a routine that tests every possible combination, millions of them; it would just take years to run.

 

For setting md_num_stripes, for now we are probably okay just setting it to 2x or 3x the md_sync_window value.  We'll need read/write tests in the future to figure out the best strategy.


Are you sure the unRAID default is md_num_stripes=1280?  I really don't know, so I'm asking.

Yes.  (Since that section in Disk Settings doesn't have a "default" button, just clearing the entries and then hitting Apply will set them to, and display, the default values.)

 


On Rosetta Stone, you could test this by running nr_requests=8, sync_window=2048, sync_thresh=1960, doing this twice: once with num_stripes=4096, and again with num_stripes=2096, barely above sync_window.  Just make sure nothing is reading/writing to the array during this test.

 

num_stripes=4096: 210.6

num_stripes=2096: 209.3

 

You're right; the difference represents only one second of total time, so we can consider it the same speed.



 

Excellent.  I think that is the first proof.

 

Next, how about testing various md_sync_window values with a static md_sync_thresh?  Since 1960 works as a good sync_thresh on Rosetta Stone, let's use that:

 

Test | stripes | window | nr_reqs | thresh |  Speed
-----------------------------------------------------------
  1  |  6400  |  1960  |      8    |  1960  |  ? MB/s  <--what happens when sync_thresh = sync_window?  Probably bad...
  2  |  6400  |  1968  |      8    |  1960  |  ? MB/s  <--basically sync_window-1, but -8 to give a nice round #
  3  |  6400  |  2240  |      8    |  1960  |  ? MB/s  <--thresh = sync_window * 7/8
  4  |  6400  |  2614  |      8    |  1960  |  ? MB/s  <--thresh = sync_window * 3/4
  5  |  6400  |  3920  |      8    |  1960  |  ? MB/s  <--thresh = sync_window * 2/4
  6  |  6400  |  5880  |      8    |  1960  |  ? MB/s  <--thresh = sync_window * 1/3
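
(A sketch of running that matrix, with the same hypothetical measure_sync_speed helper as before:)

mdcmd set md_num_stripes 6400
for window in 1960 1968 2240 2614 3920 5880; do
    mdcmd set md_sync_window $window
    mdcmd set md_sync_thresh 1960
    echo "window=$window -> $(measure_sync_speed 60) MB/s"
done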

 


Here you go:

Test | stripes | window | nr_reqs | thresh |  Speed
-----------------------------------------------------------
  1  |  6400  |  1960  |      8    |  1960  |  177.9MB/s 
  2  |  6400  |  1968  |      8    |  1960  |  205.2MB/s 
  3  |  6400  |  2240  |      8    |  1960  |  206.6MB/s 
  4  |  6400  |  2614  |      8    |  1960  |  203.9MB/s 
  5  |  6400  |  3920  |      8    |  1960  |  203.9MB/s 
  6  |  6400  |  5880  |      8    |  1960  |  202.6MB/s 



 

Thank you, that helps.  Not sure what it means yet, but it helps.

 

I'm really surprised at how much this:

 

stripes | window | nr_reqs | thresh |  Speed
-----------------------------------------------------------
6400  |  2240  |      8    |  1960  |  206.6MB/s 

 

 

is slower than this:

 

stripes | window | nr_reqs | thresh |  Speed
-----------------------------------------------------------
4096  |  2048  |      8    |  1960  |  210.6MB/s 

 

Please run this:

 

stripes | window | nr_reqs | thresh |  Speed
-----------------------------------------------------------
4096  |  2240  |      8    |  1960  |  ? MB/s 

 

 

I'm trying to understand whether we cranked num_stripes up too high at 6400 and it is causing problems, or whether the speed impact was due solely to the different sync_window.

 

 

I've also noticed in your results that sync_window=2048 with md_sync_thresh=1960 consistently produces speeds of 210.6 MB/s.

 

But increasing sync_window to 2240 dropped it by 4 MB/s.

 

Similarly, decreasing sync_thresh to 1945 also dropped it by 4 MB/s.  It seems that the two values have to be spaced a specific distance apart - 88 in this case. 

 

You didn't try a spacing of 80 in your tests; I wonder if that would work even better?  i.e. sync_window=2048, sync_thresh=1968.


You didn't try a spacing of 80 in your tests; I wonder if that would work even better?  i.e. sync_window=2048, sync_thresh=1968.

 

Giving it some more thought, I'm not sure why 80 would be special, other than it is 10x8.  8x8 (64), 8x12 (96), and 8x16 (128) might be more interesting.

 

sync_window=2048, sync_thresh=1984 (-64) = ?

sync_window=2048, sync_thresh=1968 (-80) = ?

sync_window=2048, sync_thresh=1952 (-96) = ?

sync_window=2048, sync_thresh=1920 (-128) = ?

 

I can't help but think that the best numbers are multiples of 8.  2048=(8*256), in binary this is a very nice number. 


Please run this:

 

stripes | window | nr_reqs | thresh |  Speed
-----------------------------------------------------------
4096  |  2240  |      8    |  1960  |  ? MB/s 

 

 

stripes | window | nr_reqs | thresh |  Speed
-----------------------------------------------------------
4096  |  2240  |      8    |  1960  |  206.6MB/s 

 

 

Giving it some more thought, I'm not sure why 80 would be special, other than it is 10x8.  8x8 (64), 8x12 (96), and 8x16 (128) might be more interesting.

 

stripes | window | nr_reqs | thresh |  Speed
-----------------------------------------------------------
4096  |  2048  |      8    |  1984  |  207.9MB/s 
4096  |  2048  |      8    |  1968  |  207.9MB/s 
4096  |  2048  |      8    |  1952  |  209.3MB/s 
4096  |  2048  |      8    |  1920  |  207.9MB/s 

 

 

Note that a check with the same settings is sometimes a second shorter or longer; because it's a very small array, that makes a difference of a few MB/s, so when the results are very close they can practically be considered the same, e.g.:

 

Duration: 2 minutes, 32 seconds. Average speed: 210.6 MB/s
Duration: 2 minutes, 33 seconds. Average speed: 209.3 MB/s
Duration: 2 minutes, 34 seconds. Average speed: 207.9 MB/s

 

So with a sync_window=2048 a sync_thresh from ~1900 to ~1990 gives very similar results.

 

 

 

 

 

Link to comment

Unless unRAID tracks milliseconds in the parity check start and end times, a difference of just 1001 milliseconds could result in a swing of 2.7 MB/s.

 

Theoretically, a time delta of 1001 milliseconds could look like a run of 1 second, or of 2 seconds.

 

Start @ 00:00:00.000

End @ 00:00:01.001

Reported if not using millis: 1 second

 

Start @ 00:00:00.999

End @ 00:00:02.000

Reported if not using millis: 2 seconds
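
To put numbers on it, using the durations quoted earlier (2m32s at 210.6 MB/s implies roughly 152 x 210.6, or about 32,000 MB checked; that size is back-calculated from the quotes, not measured):

size=32011   # MB, assumed from "2m32s at 210.6 MB/s"
for secs in 152 153 154; do
    echo "$secs s -> $(( size / secs )) MB/s"   # integer division, rounds down
done
# -> 210, 209 and 207 MB/s: a ~3 MB/s swing from just 2 seconds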

 


Does this help?

 

       unRAID Tunables Tester v4.0b3 by Pauven (for unRAID v6.2)

        Tunables Report produced Sun Aug 28 14:38:27 CDT 2016

                         Run on server: nas2

                   Short Automatic Parity Sync Test


Current Values:  md_num_stripes=1280, md_sync_window=384, md_sync_thresh=192
                 Global nr_requests=128
                    sdb nr_requests=128
                    sdc nr_requests=128
                    sdd nr_requests=128


--- INITIAL BASELINE TEST OF CURRENT VALUES (1 Sample Point @ 30sec Duration)---

Test | RAM | stripes | window | reqs | thresh |  MB/s 
-------------------------------------------------------
   1  |  23 |   1280  |   384  | 128  |   192  | 887.4 

--- FULLY AUTOMATIC nr_requests TEST 1 (4 Sample Points @ 60sec Duration)---

Test | num_stripes | sync_window | nr_requests | sync_thresh |   Speed 
---------------------------------------------------------------------------
  1   |     1536    |     768     |     128     |      767    | 795.6 MB/s 
  2   |     1536    |     768     |     128     |      384    | 954.8 MB/s 
  3   |     1536    |     768     |       8     |      767    | 812.2 MB/s 
  4   |     1536    |     768     |       8     |      384    | 961.8 MB/s 

Fastest vals were nr_reqs=8 and sync_thresh=50% of sync_window at 961.8 MB/s

This nr_requests value will be used for the next test.


--- FULLY AUTOMATIC TEST PASS 1a (Rough - 4 Sample Points @ 30sec Duration)---

Test | RAM | stripes | window | reqs | thresh |  MB/s | thresh |  MB/s 
------------------------------------------------------------------------
   1  |  13 |    768  |   384  |   8  |   383  | 829.4 |   192  | 962.8 
   2  |  23 |   1280  |   640  |   8  |   639  | 808.7 |   320  | 964.8 
   3  |  32 |   1792  |   896  |   8  |   895  | 816.9 |   448  | 962.1 
   4  |  41 |   2304  |  1152  |   8  |  1151  | 816.3 |   576  | 948.3 

--- FULLY AUTOMATIC TEST PASS 1b (Rough - 2 Sample Points @ 30sec Duration)---

Test | RAM | stripes | window | reqs | thresh |  MB/s | thresh |  MB/s 
------------------------------------------------------------------------
   1  |   2 |    128  |    64  |   8  |    63  | 825.9 |    32  | 577.2 
   2  |  11 |    640  |   320  |   8  |   319  | 838.0 |   160  | 961.2 

--- FULLY AUTOMATIC TEST PASS 1c (Rough - 5 Sample Points @ 30sec Duration)---

Test | RAM | stripes | window | reqs | thresh |  MB/s | thresh |  MB/s 
------------------------------------------------------------------------
   1  |  44 |   2432  |  1216  |   8  |  1215  | 813.5 |   608  | 939.8 
   2  |  53 |   2944  |  1472  |   8  |  1471  | 803.0 |   736  | 812.6 
   3  |  62 |   3456  |  1728  |   8  |  1727  | 804.8 |   864  | 821.1 
   4  |  71 |   3968  |  1984  |   8  |  1983  | 806.7 |   992  | 813.2 
   5  |  81 |   4480  |  2240  |   8  |  2239  | 808.1 |  1120  | 809.4 

--- END OF SHORT AUTO TEST FOR DETERMINING IF YOU SHOULD RUN THE NORMAL AUTO ---

If the speeds changed with different values you should run a NORMAL AUTO test.

Completed: 0 Hrs 15 Min 11 Sec.


NOTE: Use the smallest set of values that produce good results. Larger values
      increase server memory use, and may cause stability issues with unRAID,
      especially if you have any add-ons or plug-ins installed.


System Info:  nas2
              unRAID version 6.2.0-rc4
                   md_num_stripes=1280
                   md_sync_window=384
                   md_sync_thresh=192
                   nr_requests=128 (Global Setting)
                   sbNumDisks=4
              CPU: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
  CPU (3 more entries)
  CPU [empty] (124 empty slots, collapsed)
              RAM: 2GiB System Memory

Outputting lshw information for Drives and Controllers:

H/W path             Device      Class      Description
=======================================================
/0/100/7.1                       storage    82371AB/EB/MB PIIX4 IDE
/0/100/15/0          scsi2       storage    SAS1068 PCI-X Fusion-MPT SAS
/0/100/15/0/0.0.0    /dev/sda    disk       209MB Virtual disk
/0/100/15/0/0.1.0    /dev/sdb    disk       107GB Virtual disk
/0/100/15/0/0.2.0    /dev/sdc    disk       107GB Virtual disk
/0/100/15/0/0.3.0    /dev/sdd    disk       107GB Virtual disk
/0/1                 scsi3       storage    
/0/1/0.0.0           /dev/sde    disk       2004MB U3 Cruzer Micro
/0/1/0.0.0/0         /dev/sde    disk       2004MB 
/0/1/0.0.1           /dev/cdrom  disk       U3 Cruzer Micro
/0/1/0.0.1/0         /dev/cdrom  disk       

Array Devices:
    Disk0 sdb is a Parity drive named parity
    Disk1 sdc is a Data drive named disk1
    Disk2 sdd is a Data drive named disk2

Outputting free low memory information...

              total        used        free      shared  buff/cache   available
Mem:        2053356       68440     1620416      337380      364500     1510656
Low:        2053356      432940     1620416
High:             0           0           0
Swap:             0           0           0


                      *** END OF REPORT ***

 

 

Honestly, I was just super jealous of @johnnie.black's speeds.

 


All my previous tests were done using 2 LSI 9211s (flashed H310s).  I now did some tests using a SASLP; since it's bandwidth-challenged and a parity check will take more time, the differences should be more noticeable, and it also responds differently to the tunable changes.

 

Only thresh was changed to find the optimal values

 

stripes | window | nr_reqs | thresh |   Speed
-----------------------------------------------------------
4096  |   2048   |  128   |  2047  |  72.4MB/s 
4096  |   2048   |  128   |  2016  |  80.0MB/s    
4096  |   2048   |  128   |  1984  |  80.0MB/s    
4096  |   2048   |  128   |  1952  |  80.0MB/s   
4096  |   2048   |  128   |  1920  |  79.8MB/s 
4096  |   2048   |  128   |  1856  |  79.8MB/s 
4096  |   2048   |  128   |  1792  |  79.8MB/s 
4096  |   2048   |  128   |  1728  |  79.8MB/s
4096  |   2048   |  128   |  1664  |  78.5MB/s 
4096  |   2048   |  128   |  1536  |  77.7MB/s
4096  |   2048   |  128   |  1280  |  77.5MB/s
4096  |   2048   |  128   |  1024  |  77.1MB/s 

 

 

With a sync_window of 2048 there's a big range where it works very well, from ~1728 to ~2016, with an apparent sweet spot from ~1950 to ~2000, and like the LSI, neither sync_window-1 nor sync_window/2 provides the best results.

 

Note also that with nr_requests=8 this controller always performs at optimal speed, making the thresh setting practically irrelevant.  Of course, if this controller is used together with one that responds differently, the trick is to find the best values for both together.
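
Since nr_requests is per-device, a mixed-controller box could in principle run different values per controller.  A quick way to see which PCI controller each disk hangs off (sketch):

for dev in /sys/block/sd?; do
    echo "$(basename $dev) -> $(readlink -f $dev/device)"
done
# the PCI address embedded in each path identifies the controller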

 

Using nr_requests=8 with the 2 slowest thresh values:

 

stripes | window | nr_reqs | thresh |   Speed
-----------------------------------------------------------
4096  |   2048   |    8   |  2047  |  79.8MB/s 
4096  |   2048   |    8   |  1024  |  79.8MB/s 

 

 

Next I'm going to test the SAS2LP, the same chipset as your controller, but since I don't have a spare I'll have to use one from a server, so I'll do it as soon as I can.  IIRC the results were similar to the SASLP but with much bigger differences.


Unless unRAID tracks milliseconds in the parity check start and end times, a difference of just 1001 milliseconds could result in a swing of 2.7 MB/s.

 


 

You both make excellent points.

 

On the flip side, I'm seeing a lot of consistency in the results, and not the flickering of results that would be expected when running the same test multiple times.  Perhaps I missed it, but I didn't see even a single second variance when the same test point was run multiple times.

 

To me, it appears that there is more accuracy feeding into the internal calculation, but that a rounding function near the end of the calculation is rounding to the nearest second.

 

Hard to say who is right; it could go either way.  Regardless, I'm loving Rosetta Stone.

 

And for a parity check that runs 152 seconds, 3 seconds is 2%.  Not huge, but bigger than a typical rounding error.



So with a sync_window=2048 a sync_thresh from ~1900 to ~1990 gives very similar results.

 

Very interesting.  I can see that you found a great combination in 2048/1960, but I can't fathom why it is such a great combination.  I'm typically pretty good at finding the pattern.  This one is eluding me.


Does this help?

[nas2 Short Auto Tunables Report quoted above]

Honestly, I was just super jealous of @johnnie.black's speeds.

I'm so disappointed in your sub-gigabyte speeds...  :P

 

Sure would be awesome to run a parity check on my 3TB drives in under an hour!


All my previous tests were done using 2 LSI 9211s (flashed H310s).  I now did some tests using a SASLP; since it's bandwidth-challenged and a parity check will take more time, the differences should be more noticeable, and it also responds differently to the tunable changes.

 

johnnie.black, you're freaking awesome!

 

Only thresh was changed to find the optimal values

 

stripes | window | nr_reqs | thresh |   Speed
-----------------------------------------------------------
4096  |   2048   |  128   |  2047  |  72.4MB/s 
4096  |   2048   |  128   |  2040  |  ? MB/s    
4096  |   2048   |  128   |  2032  |  ? MB/s    
4096  |   2048   |  128   |  2024  |  ? MB/s    
4096  |   2048   |  128   |  2016  |  80.0MB/s    
4096  |   2048   |  128   |  1984  |  80.0MB/s    
4096  |   2048   |  128   |  1960  |  ? MB/s    
4096  |   2048   |  128   |  1952  |  80.0MB/s   
4096  |   2048   |  128   |  1920  |  79.8MB/s 
4096  |   2048   |  128   |  1856  |  79.8MB/s 
4096  |   2048   |  128   |  1792  |  79.8MB/s 
4096  |   2048   |  128   |  1728  |  79.8MB/s
4096  |   2048   |  128   |  1664  |  78.5MB/s 
4096  |   2048   |  128   |  1536  |  77.7MB/s
4096  |   2048   |  128   |  1280  |  77.5MB/s
4096  |   2048   |  128   |  1024  |  77.1MB/s 

 

I can't quite tell if that is a bell curve with a peak around 1984, or a linear decline with a peak around 2040.  You also skipped over 1960, which worked best on the other controller.  If you can run the 4 extra test points above, that would help.

 

With a sync_window of 2048 there's a big range where it works very well, from ~1728 to ~2016, with an apparent sweet spot from ~1950 to ~2000, and like the LSI, neither sync_window-1 nor sync_window/2 provides the best results.

 

That is an awesome discovery.  It may help explain why my script is having a hard time finding good values. 

 

I'm still looking for the pattern in why certain values are better than others.  If I can see the pattern, I can script it.  In this case, if the peak is 1984, that is sync_window-64, one of the values I was interested in.  But how to interpret that?  Is that 50% of nr_requests?  Is that -8 per drive on that controller?  Is that 3.125% of sync_window?  Or is it just that -64 works nicely on this controller for no explainable reason...

 

Note also that with nr_requests=8 this controller always performs at optimal speed, making the thresh setting practically irrelevant.  Of course, if this controller is used together with one that responds differently, the trick is to find the best values for both together.

 

Using nr_requests=8 with the 2 slowest thresh values:

 

stripes | window | nr_reqs | thresh |   Speed
-----------------------------------------------------------
4096  |   2048   |    8   |  2047  |  79.8MB/s 
4096  |   2048   |    8   |  1024  |  79.8MB/s 

 

This has me wondering why 128 and 8 are the only values we're testing for nr_requests.  Could you complete the following?

 

stripes | window | nr_reqs | thresh |   Speed
-----------------------------------------------------------
4096  |   2048   |  128   |  1024  |  77.1MB/s 
4096  |   2048   |   64   |  1024  |  ? MB/s 
4096  |   2048   |   32   |  1024  |  ? MB/s 
4096  |   2048   |   16   |  1024  |  ? MB/s 
4096  |   2048   |    8   |  1024  |  79.8MB/s 
4096  |   2048   |    4   |  1024  |  ? MB/s 
4096  |   2048   |    1   |  1024  |  ? MB/s 
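
(Scripted, that sweep might look like the following; measure_sync_speed is the same hypothetical sampling helper as in the earlier sketches:)

mdcmd set md_num_stripes 4096
mdcmd set md_sync_window 2048
mdcmd set md_sync_thresh 1024
for reqs in 128 64 32 16 8 4 1; do
    for q in /sys/block/sd?/queue/nr_requests; do
        echo $reqs > $q   # the kernel may clamp very low values
    done
    echo "nr_requests=$reqs -> $(measure_sync_speed 60) MB/s"
done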

 

Next I'm going to test the SAS2LP, same chipset as your controller, but since I don't have a spare I'll have to use one from a server, so I'll do it as soon as I can, IIRC the results were similar to the SASLP but with much bigger differences.

 

Did I say you're awesome?  I think I need a bigger word.  Thanks so much!


Requested values tests:

 

stripes | window | nr_reqs | thresh |   Speed
-----------------------------------------------------------
4096  |   2048   |  128   |  2047  |  72.4MB/s 
4096  |   2048   |  128   |  2040  |  76.8MB/s    
4096  |   2048   |  128   |  2032  |  78.3MB/s    
4096  |   2048   |  128   |  2024  |  78.9MB/s    
4096  |   2048   |  128   |  2016  |  80.0MB/s    
4096  |   2048   |  128   |  1984  |  80.0MB/s    
4096  |   2048   |  128   |  1960  |  79.8MB/s    
4096  |   2048   |  128   |  1952  |  80.0MB/s   
4096  |   2048   |  128   |  1920  |  79.8MB/s 
4096  |   2048   |  128   |  1856  |  79.8MB/s 
4096  |   2048   |  128   |  1792  |  79.8MB/s 
4096  |   2048   |  128   |  1728  |  79.8MB/s
4096  |   2048   |  128   |  1664  |  78.5MB/s 
4096  |   2048   |  128   |  1536  |  77.7MB/s
4096  |   2048   |  128   |  1280  |  77.5MB/s
4096  |   2048   |  128   |  1024  |  77.1MB/s 

 

79.8 to 80.0MB/s is a single second difference in total time.

 

 

stripes | window | nr_reqs | thresh |   Speed
-----------------------------------------------------------
4096  |   2048   |  128   |  1024  |  77.1MB/s 
4096  |   2048   |   64   |  1024  |  77.5MB/s 
4096  |   2048   |   32   |  1024  |  77.3MB/s 
4096  |   2048   |   16   |  1024  |  79.6MB/s 
4096  |   2048   |    8   |  1024  |  79.8MB/s 
4096  |   2048   |    4   |  1024  |  80.0MB/s 
4096  |   2048   |    1   |  1024  |  ? MB/s 

 

Although it can be set to 1 in unRAID, it will remain at 4; I believe that is the minimum possible setting.

 

 

