unraid-tunables-tester.sh - A New Utility to Optimize unRAID md_* Tunables



You did. I have it in there as an xbmc cache drive for my xbmc clients so there's no spinup delays in accessing thumbnails & the like. Though I'm starting to lean towards not having a central cache and letting the icons exist on each server.

 

Here's the results of version 2.2:

Tunables Report from  unRAID Tunables Tester v2.2 by Pauven

NOTE: Use the smallest set of values that produce good results. Larger values
      increase server memory use, and may cause stability issues with unRAID,
      especially if you have any add-ons or plug-ins installed.

Test | num_stripes | write_limit | sync_window |   Speed 
--- FULLY AUTOMATIC TEST PASS 1 (Rough - 20 Sample Points @ 3min Duration)---
   1  |    1408     |     768     |     512     |  90.5 MB/s 
   2  |    1536     |     768     |     640     |  87.7 MB/s 
   3  |    1664     |     768     |     768     |  87.2 MB/s 
   4  |    1920     |     896     |     896     |  86.9 MB/s 
   5  |    2176     |    1024     |    1024     |  86.0 MB/s 
   6  |    2560     |    1152     |    1152     |  86.5 MB/s 
   7  |    2816     |    1280     |    1280     |  86.4 MB/s 
   8  |    3072     |    1408     |    1408     |  86.7 MB/s 
   9  |    3328     |    1536     |    1536     |  86.6 MB/s 
  10  |    3584     |    1664     |    1664     |  87.4 MB/s 
  11  |    3968     |    1792     |    1792     |  85.9 MB/s 
  12  |    4224     |    1920     |    1920     |  84.6 MB/s 
  13  |    4480     |    2048     |    2048     |  87.4 MB/s 
  14  |    4736     |    2176     |    2176     |  86.6 MB/s 
  15  |    5120     |    2304     |    2304     |  86.4 MB/s 
  16  |    5376     |    2432     |    2432     |  85.2 MB/s 
  17  |    5632     |    2560     |    2560     |  83.7 MB/s 
  18  |    5888     |    2688     |    2688     |  85.6 MB/s 
  19  |    6144     |    2816     |    2816     |  86.3 MB/s 
  20  |    6528     |    2944     |    2944     |  85.2 MB/s 
--- Targeting Fastest Result of md_sync_window 512 bytes for Special Pass ---
--- FULLY AUTOMATIC TEST PASS 1b (Rough - 4 Sample Points @ 3min Duration)---
  21  |     896     |     768     |     128     |  93.3 MB/s 
  22  |    1024     |     768     |     256     |  95.7 MB/s 
  23  |    1280     |     768     |     384     |  93.6 MB/s 
  24  |    1408     |     768     |     512     |  89.5 MB/s 
--- Targeting Fastest Result of md_sync_window 256 bytes for Final Pass ---
--- FULLY AUTOMATIC TEST PASS 2 (Final - 16 Sample Points @ 4min Duration)---
  25  |    1000     |     768     |     136     |  92.3 MB/s 
  26  |    1008     |     768     |     144     |  92.0 MB/s 
  27  |    1016     |     768     |     152     |  93.3 MB/s 
  28  |    1024     |     768     |     160     |  94.0 MB/s 
  29  |    1040     |     768     |     168     |  94.0 MB/s 
  30  |    1048     |     768     |     176     |  90.7 MB/s 
  31  |    1056     |     768     |     184     |  94.2 MB/s 
  32  |    1064     |     768     |     192     |  95.2 MB/s 
  33  |    1072     |     768     |     200     |  97.3 MB/s 
  34  |    1080     |     768     |     208     |  96.2 MB/s 
  35  |    1088     |     768     |     216     |  96.1 MB/s 
  36  |    1096     |     768     |     224     |  95.6 MB/s 
  37  |    1104     |     768     |     232     |  98.0 MB/s 
  38  |    1120     |     768     |     240     |  96.8 MB/s 
  39  |    1128     |     768     |     248     |  96.5 MB/s 
  40  |    1136     |     768     |     256     |  95.9 MB/s 

Completed: 2 Hrs 21 Min 31 Sec.

Best Bang for the Buck: Test 1 with a speed of 90.5 MB/s

     Tunable (md_num_stripes): 1408
     Tunable (md_write_limit): 768
     Tunable (md_sync_window): 512

These settings will consume 55MB of RAM on your hardware.


Unthrottled values for your server came from Test 37 with a speed of 98.0 MB/s

     Tunable (md_num_stripes): 1104
     Tunable (md_write_limit): 768
     Tunable (md_sync_window): 232

These settings will consume 43MB of RAM on your hardware.
This is 8MB more than your current utilization of 35MB.
NOTE: Adding additional drives will increase memory consumption.

In unRAID, go to Settings > Disk Settings to set your chosen parameter values.
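The RAM figures in the report can be sanity-checked with simple arithmetic. The sketch below is an assumption-laden approximation only: it assumes roughly 4 KiB of buffer per stripe per drive and a hypothetical 10-drive array, which happens to reproduce the 43MB and 55MB figures above; the driver's real accounting may differ.

```shell
#!/bin/sh
# Rough RAM estimate for a given md_num_stripes value.
# Assumption (hypothetical): ~4 KiB of stripe buffer per stripe per drive.
estimate_ram_mb() {
  stripes=$1
  drives=$2
  echo $(( stripes * 4096 * drives / 1024 / 1024 ))
}

estimate_ram_mb 1104 10   # prints 43 with these assumed inputs
estimate_ram_mb 1408 10   # prints 55 with these assumed inputs
```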


You did. I have it in there as an xbmc cache drive for my xbmc clients so there's no spinup delays in accessing thumbnails & the like. Though I'm starting to lean towards not having a central cache and letting the icons exist on each server.

 

 

Just as a heads up (sorry if you already know this) but while there are no spin-up delays on read, there sure will be on writes AND your writes will be limited to either cache or parity spinner speeds (i.e. slow for small random I/O). 

 

Q: Is that data so important that it needs to be inside the array? 

 

If not, why not mount the SSD outside the array (I've done that, it is easy) for fast reads and writes without any need to spin up parity. 

 

If it is that important, I'd say you could still put the SSD outside the array, and then you can rsync the data into the array on a nightly basis just like the mover script.

 

Just a thought.  Now back on topic.


Now that the time needed to run this script has decreased, you should probably update the original posts to reflect the decrease in time.

 

Got it now, thanks for pointing it out.  I haven't been maintaining the 2nd post, mainly because the utility has been changing frequently enough that it felt like a moving target.  There's a lot more I need to update in the 2nd post.


@Pauven:  There's some logic anomaly in RockDawg's test:  Notice how that "Unthrottled" result is worse than the "Best Bang for the Buck" result.

 

Yup, the Best Bang for the Buck logic is pretty much broken in two scenarios:  servers that respond well to lower values, like John's (this issue is easily fixable, though), and servers that have spikey results.  Currently the Best Bang recommendation only looks at pass 1, and takes the last set of values that shows a nice improvement (at least 1% over the previous values), but stops checking as soon as the tests turn up a sub-par improvement.  While this logic worked well on my server with a very smooth bell curve, on spikey servers the logic stops checking too early.  I have some ideas on how to handle this, but probably the best way is with an array of values that I analyze after the test is done.

 

That bug aside, "Unthrottled" is a misleading name for that result.  There's no real throttling in the md-mod driver at this time, but the idea has been floated around.  There was a discussion with Tom some time ago about implementing another tunable, specifically for throttling parity checks for the purpose of giving more I/O resources to other tasks, and Tom said that he'll think about doing that.

 

I guess I disagree with you there (but I'm open to better words if you have any suggestions).  If you look at my test results with extra low values, my performance drops below 10MB/s with values below 128 bytes.  If that isn't throttling, then what is it?

 

I understand your point that throttling wasn't the intent of that parameter, and that true throttling (for the purpose of prioritizing reads vs. writes vs. syncs) has not been implemented, or at least not exposed as parameters.  But I would prefer to call that wished-for feature 'load prioritization', not throttling.

 

Semantics...


Interesting point in the results from RockDawg's test.  It appears the "unthrottled" value is the best found during pass 2, ignoring pass 1.

Correct, I don't compare pass 2 results to pass 1, since they run for different lengths of time.  Pass 2 results are expected to be slightly lower, as the test extends farther into the parity check, which gradually slows from beginning to end.  It is expected that pass 1 gets you into the right range, and pass 2 finds the best value in that range.

 

If I compared pass 2 to pass 1, the majority of the time the logic would probably pick the pass 1 result since it would probably have the faster time due to the shorter test length.

 

Unfortunately some servers are not testing well (he's not slow, he just doesn't test well...).  It's not so much about comparing pass 1 to pass 2, rather it's more about servers producing inconsistent results, which no amount of logic can power through.  Myself, I'm looking at RockDawg's results and have no freaking idea which values are good values...

 

Noted something else:  Look at the results of test #1 compared to the almost identical test #36:

 

1  |    1408    |    768    |    512    | 101.9 MB/s

36  |    1416    |    768    |    512    |  91.6 MB/s

 

Really strange.    The only difference (aside from an insignificant 8-stripe difference in the total allowed) is the 4-minute duration vs. 3 minutes.

 

I have to wonder if this is related to the virtualization.

 

My point exactly, inconsistent results.  Test 36 should have been pretty close, but slightly below test 1.

 

Actually, if you look at the bigger picture (and RockDawg's server is not the first to show this behavior), the 1st test started off at a nice speed, and each subsequent test gets slower, until a lower threshold is reached, and then all results hover around that lower threshold.  I would hazard a guess that these md_* values are not actually affecting anything on RockDawg's server - he could run a 512 byte test 10 times in a row, and each subsequent test would be a little bit slower.
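That theory could be checked by running the same point back-to-back and watching for downward drift. This is only a sketch of the idea: `mdcmd` is stubbed with echo here so the flow is visible without touching a real array; on an actual server you'd call /root/mdcmd and time a fixed-length parity-check segment each pass.

```shell
#!/bin/sh
# Sketch: repeat one md_sync_window point several times; if speeds fall
# on every pass regardless of the value, the tunable isn't the variable.
mdcmd() { echo "mdcmd $*"; }   # stub for illustration; real path is /root/mdcmd

for run in 1 2 3; do
  mdcmd set md_sync_window 512
  mdcmd check NOCORRECT
  # ...time a fixed-length segment here, record the speed...
  mdcmd nocheck
done
```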

 

I think VM is highly suspect. 

 

zoggy had nearly identical behavior in his test results:

Tunables Report from  unRAID Tunables Tester v2.0 by Pauven

Test | num_stripes | write_limit | sync_window |   Speed 
--- FULLY AUTOMATIC TEST PASS 1 (Rough - 20 Sample Points @ 3min Duration)---
   1  |    1408     |     768     |     512     |  88.0 MB/s 
   2  |    1536     |     768     |     640     |  87.8 MB/s 
   3  |    1664     |     768     |     768     |  87.4 MB/s 
   4  |    1920     |     896     |     896     |  87.0 MB/s 
   5  |    2176     |    1024     |    1024     |  87.2 MB/s 
   6  |    2560     |    1152     |    1152     |  86.8 MB/s 
   7  |    2816     |    1280     |    1280     |  86.6 MB/s 
   8  |    3072     |    1408     |    1408     |  86.2 MB/s 
   9  |    3328     |    1536     |    1536     |  86.0 MB/s 
  10  |    3584     |    1664     |    1664     |  85.7 MB/s 
  11  |    3968     |    1792     |    1792     |  85.7 MB/s 
  12  |    4224     |    1920     |    1920     |  86.1 MB/s 
  13  |    4480     |    2048     |    2048     |  86.2 MB/s 
  14  |    4736     |    2176     |    2176     |  85.7 MB/s 
  15  |    5120     |    2304     |    2304     |  85.3 MB/s 
  16  |    5376     |    2432     |    2432     |  85.3 MB/s 
  17  |    5632     |    2560     |    2560     |  85.1 MB/s 
  18  |    5888     |    2688     |    2688     |  85.1 MB/s 
  19  |    6144     |    2816     |    2816     |  84.8 MB/s 
  20  |    6528     |    2944     |    2944     |  84.8 MB/s 
--- Targeting Fastest Result of md_sync_window 512 bytes for Medium Pass ---
--- FULLY AUTOMATIC TEST PASS 2 (Final - 16 Sample Points @ 4min Duration)---
  21  |    1288     |     768     |     392     |  84.9 MB/s 
  22  |    1296     |     768     |     400     |  84.8 MB/s 
  23  |    1304     |     768     |     408     |  84.7 MB/s 
  24  |    1312     |     768     |     416     |  84.7 MB/s 
  25  |    1320     |     768     |     424     |  84.4 MB/s 
  26  |    1328     |     768     |     432     |  84.7 MB/s 
  27  |    1336     |     768     |     440     |  84.7 MB/s 
  28  |    1344     |     768     |     448     |  84.4 MB/s 
  29  |    1360     |     768     |     456     |  84.6 MB/s 
  30  |    1368     |     768     |     464     |  84.7 MB/s 
  31  |    1376     |     768     |     472     |  84.3 MB/s 
  32  |    1384     |     768     |     480     |  84.5 MB/s 
  33  |    1392     |     768     |     488     |  84.7 MB/s 
  34  |    1400     |     768     |     496     |  84.6 MB/s 
  35  |    1408     |     768     |     504     |  84.5 MB/s 
  36  |    1416     |     768     |     512     |  84.7 MB/s 

 

Notice that the 512 byte test is both the fastest and one of the slowest!

 

I talked to zoggy the other day about his build, and I don't think he mentioned VM, but I didn't think to ask either.

 

-Paul


Seeing differing results on 5.0 final, a lot higher and more steady. Going to restart with the script unmodified.

 

Test 1  - md_sync_window=512  - Completed in 604.779 seconds =  94.0 MB/s

Test 2  - md_sync_window=640  - Completed in 605.112 seconds =  89.1 MB/s

Test 3  - md_sync_window=768  - Completed in 605.048 seconds =  89.5 MB/s

Test 4  - md_sync_window=896  - Completed in 605.122 seconds =  88.6 MB/s

Test 5  - md_sync_window=1024 - Completed in 604.995 seconds =  89.2 MB/s

 

Alright John, I think your server is from Bizarro World.  Your results are pretty much the opposite of what should be happening.

 

Higher values make your sync slower, not faster.  Bizarro.

 

Longer tests produce higher speeds, not slower.  Bizarro.

 

As I described in a post above, longer tests should produce slower results because hard drives get slower the further into a parity check you traverse (due to the tracking of the read head from outside cylinders to inside cylinders on the platters).  Longer tests are more accurate, as the noise floor is reduced, but they "should" be slightly slower.

 

But your 10-minute Extreme level tests produced significantly higher results compared to your 3-minute results.

 

I think you're right about your SSD skewing the results.  I think your SSD is SLOOOOOW,  so slow that it is dragging down your results.  But because your SSD is so small, longer tests give less weight to the SSD speed.  By my math, approximately the first 6 minutes of your parity check includes the SSD, beyond which your parity check position has passed the 32GB SSD.  So your 10-minute tests have about 4 minutes of HD only tests, pulling your avg speed back up.
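The "first ~6 minutes" figure falls out of simple division, assuming the roughly 90 MB/s pace seen in the tests (both numbers here are just the estimates from this thread, not measurements):

```shell
#!/bin/sh
# Time for a parity check to traverse a 32 GB SSD at ~90 MB/s:
# 32,000 MB / 90 MB/s = 355 seconds, i.e. about 6 minutes.
echo $(( 32 * 1000 / 90 ))
```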

 

I don't know that this is a problem, so to speak, but for accurate testing on your server, you need to be running extreme length tests (10 minute), or maybe even longer. 

 

Your results are also slightly inconsistent.  On pass 1 and 1b, the 512 byte point gets tested twice, both times at 3 minutes.  The result varied by 1 MB/s (90.5 vs. 89.5).  This is enough variance to make it difficult to choose a best value.  Again, longer tests should help here.  I see similar variance on my server, but typically at shorter run times around 2 minutes or less.

 

I was also expecting 128 bytes to be the winner from pass 1b, allowing pass 2 to test from 8 to 128 bytes.  That didn't happen, so we've yet to see what results look like on your server below 128 bytes (I'm very interested to see).

 

If you're willing, please run the following test:  (E)xtreme, 8 byte interval, starting at 8 bytes, ending at 256 bytes.  This would be a 5.5 hour test, but I think it targets the right range for your server and each test is long enough to cut through the noise the SSD is generating.
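The requested run could also be done by hand as a simple sweep. This sketch only prints the 32 points it would visit (which at ~10 minutes each is roughly the 5.5-hour estimate); the actual timing and mdcmd calls are left out.

```shell
#!/bin/sh
# Enumerate the proposed test points: md_sync_window from 8 to 256
# in steps of 8 (32 points; at ~10 minutes each, roughly 5.5 hours).
for window in $(seq 8 8 256); do
  echo "md_sync_window=$window"
done
```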

 

I think you may also want to consider some of the options proposed by others, removing the SSD from your protected array, not so much because it is slowing down your parity check for a few minutes, but because the slow speed of an SSD may be indicating it is in poor health (maybe a lack of proper maintenance like trim or other garbage collection, or possibly even getting closer to failure).  I don't think that a slow 32GB drive is worth risking parity on your whole array just for instant access capability.

 

System specs

...

Hard drives are connected to MB and to two PCI-E SATA extenders.

 

Been replacing the EARS drives, one a month, with the 4TB Seagates. Preclearing another 4TB now.

 

-John

 

Wow, you slipped that one in there, and it almost got by me!  What are these two 'PCI-E SATA extenders'?!!  I think whatever these things are is what is causing your server to work well with lower values.

 

Also, I am excited to hear that you are upgrading drives.  If you're willing, I propose a test during your next upgrade.  Hopefully you already know how long it takes to upgrade a drive, if not, you need to get a baseline.  Once you have a baseline, I propose performing an upgrade with an md_write_limit of 128 instead of the unRAID stock 768, and see how that affects the running time.  If that's a test you're willing to try, before we do it I think we need to test lower md_write_limit values just in everyday writing before going whole hog on a rebuild - but higher values seem to work well on servers that respond well to higher md_sync_window values, so I think this is a reasonable test on your server.

 

This test would help establish 2 things:  That rebuilds are connected to md_write_limit in the same way that parity checks are connected to md_sync_window, and that md_write_limit values should be set to md_sync_window values for optimal results.

 

Obviously this test is not without risks, so if you are not interested I understand.

 

-Paul


If not, why not mount the SSD outside the array (I've done that, it is easy) for fast reads and writes without any need to spin up parity. 

 

I really like this idea.  I've been thinking about doing a dual-ssd cache drive setup (not even using a cache drive today), but that's obviously expensive and requires functionality Tom has not yet released.

 

Do you have a link, or instructions, for mounting a drive outside the array? 

 

I wouldn't mind having a smaller SSD for everyday file transfers, and then I can use bigger, cheaper hard drives for the cache drive role to speed up my BR rips.

 

Thanks,

Paul


Well, this thread has inspired me to do some upgrades.  The sub-100MB/s speeds are just killing me.

 

I have two M1015's coming on Saturday, as well as new SSDs for cache. I also plan on getting rid of the 1.5TB drives finally.  At least for now, Im going to take the Areca card out to see if thats the bottleneck...I find it hard to believe it is.


Well, this thread has inspired me to do some upgrades.  The sub-100MB/s speeds are just killing me.

 

I have two M1015's coming on Saturday, as well as new SSDs for cache. I also plan on getting rid of the 1.5TB drives finally.  At least for now, Im going to take the Areca card out to see if thats the bottleneck...I find it hard to believe it is.

 

Hey Steven,

 

I just went back and looked at your results, and I would agree that your parity drive array is probably not the problem.  Most likely those 1.5 TB drives are the main culprit.

 

Having a mix of drive sizes impacts parity check / rebuild speed in multiple ways.  Primarily, the slowest drive sets the pace for the whole array.  Additionally, you get multiple slow-downs as each drive reaches the inner cylinders at different points during the parity check:  so you would have slowdowns approaching  1.5TB, 2TB and 4TB.
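The "slowdown at each size boundary" effect can be illustrated with a toy calculation. The drive sizes below are hypothetical examples matching the mix mentioned in the post; past a given parity position, only drives larger than that position are still being read.

```shell
#!/bin/sh
# Count how many data drives still participate at a given parity
# position (positions and sizes in GB; sizes are hypothetical examples).
drives_active_at() {
  pos=$1
  n=0
  for size in 1500 1500 2000 2000 4000; do
    [ "$size" -gt "$pos" ] && n=$(( n + 1 ))
  done
  echo "$n"
}

drives_active_at 1000   # all 5 drives still active
drives_active_at 1600   # the 1.5 TB drives have dropped out
```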

 

This doesn't necessarily affect read or write performance unless you're accessing data on one of those drives.  Unless your parity check/rebuild times are unfathomably long, upgrades may not be cost effective.

 

Anyway, I'm interested to hear how your upgrades go.

 

-Paul


This doesn't necessarily affect read or write performance unless you're accessing data on one of those drives.  Unless your parity check/rebuild times are unfathomably long, upgrades may not be cost effective.

 

My parity checks are longer than I would like them to be (~15 hours).  I see some folks with ~8 hour checks with a 4TB parity. 

 

Besides the speed of a RAID0, one of the reasons I went with the Areca was the ability to configure up to an 8TB parity without buying new drives. 

 

I can play around with it and see what my best solution will be.  I've been wanting to run unRAID on top of ESXi, and the two additional controllers will allow me to do that.  Right now, I'm using the motherboard controllers in my array, so I don't have any controllers left over for ESXi after passing them through.  I plan on putting in a new dual-port NIC that I will pass through to unRAID and experiment with NIC teaming.

 

My unRAID server hosts all my media for the TVs in my house, as well as via Plex to several family members outside my house.  My HTPCs are my only sources on each of my TVs, so I need to maximize performance as much as possible.


Would replacing the 32GB SSD with the new 4TB and doing the rebuild suit your test needs, or would it be better to upgrade one of the 1.5TB drives? I've decided to remove the SSD as its primary purpose is no longer being utilized.

 

I'm currently running a parity check on the last tests recommended values. It's at 46%, 1.7TB mark. Currently getting 84.7MB/sec

 

It'll be tomorrow at the earliest before I can do the rebuild but I can do the other request once the parity check completes in 7-8 hours.


Would replacing the 32GB SSD with the new 4TB and doing the rebuild suit your test needs, or would it be better to upgrade one of the 1.5TB drives?

 

Yes, halfway.  I think the SSD change is so significant that it completely changes your baseline.  If you did two upgrades, one replacing the SSD, and one replacing a 1.5TB drive, with stock vs. tuned values, that would be best.

Yes, halfway.  I think the SSD change is so significant that it completely changes your baseline.  If you did two upgrades, one replacing the SSD, and one replacing a 1.5TB drive, with stock vs. tuned values, that would be best.

 

With upgrading the 32GB SSD, I would only need to let the parity rebuild run for less than 10 minutes to see how long it takes to rebuild the 32GB range.

 

If you really want to test the rebuild timings, create a script that'll kick off the parity rebuild, let it run to 32 GB, then abort the rebuild and repeat looping over the different values. I'm game with running it on my server - as long as one of my normal drives doesn't fail  :P


 

If you really want to test the rebuild timings, create a script that'll kick off the parity rebuild, let it run to 32 GB, then abort the rebuild and repeat looping over the different values. I'm game with running it on my server - as long as one of my normal drives doesn't fail  :P

 

I'm not sure I want to create a script for that; it sounds like an inappropriate tool to put out there.

 

But testing it manually wouldn't be too hard.

 

Basically, set the three params, start a rebuild, and cancel after 10 minutes, measuring what position was reached.  Rinse and repeat with new params.  I did this dozens of times on a parity check when I first tested md_sync_window.

 

The first time you could run with stock values (1280/768/384) and the second time with tuned values (288/128/128).  If the second run is looking good after 10 minutes, no need to cancel, just let it continue to the end.  Otherwise, cancel, go back to stock values, and rebuild again.

 

If there is any impact at all, it should be very noticeable.
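The manual procedure above can be sketched as a small helper. `mdcmd` is stubbed with echo here so nothing touches a real array; on unRAID the real binary is /root/mdcmd, and the set/check/nocheck commands are the ones listed from md.c later in the thread. The stock and tuned triples are the ones from this post.

```shell
#!/bin/sh
# Sketch of the manual rebuild-timing test: set the three tunables,
# start a non-correcting check, and on a real run cancel after ten
# minutes and note the position reached.
mdcmd() { echo "mdcmd $*"; }   # stub; the real command is /root/mdcmd

run_timed_pass() {
  stripes=$1; limit=$2; window=$3
  mdcmd set md_num_stripes "$stripes"
  mdcmd set md_write_limit "$limit"
  mdcmd set md_sync_window "$window"
  mdcmd check NOCORRECT
  # sleep 600 && mdcmd nocheck   # cancel after 10 minutes on a real server
}

run_timed_pass 1280 768 384   # stock values
run_timed_pass 288 128 128    # tuned values
```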


Come to think of it, I can use your script with a minor modification to do all the parity rebuild tests. You have the following which kicks off a parity check: /root/mdcmd check NOCORRECT

 

What is the command to kick off a parity rebuild?


I was curious whether there'd be any change in the recommendations with the v2.2 script and v5.0 vs. my previous run with the v2.0 script and RC16c .. and WOW were there ever.

 

I'm not sure what the changes in the various RCs have been that impacted parity (I think the newer kernel had the most impact ~ RC15) ... but my system used to do parity checks in ~ 7:40 and is now up to 8:15 !!

 

I was at about 8:10 with RC16c (I did a LOT of tunables testing about 3 months or so ago, so had fairly good settings before running this script).  Ran the Tunables-Tester script, and adjusted my parameters to the B4B values, and was pleased to see my time drop back down to just under 8 hours (7:57).

 

Then upgraded to v5.0 ... and the time jumped up to 8:17 !!

 

Reran Tunables-Tester and the suggestions changed dramatically !!  Just for grins, I tried the very high UnT values .. and it still took 8:12

 

FWIW these were the Tunables-Tester results:

 

With v2 running on RC16c:  B4B:  3072/1408/1408      UnT:  4408/1984/1984

 

With v2.2 running on v5.0:  B4B:  1408/768/512          UnT:  5968/2688/2688

 

What I HAD been using (after all my prior testing):  2560/768/1024

 

I've just changed back to the settings I had originally used and am going to kick off yet-another parity test.  Details in 8 hrs or so  :)

 

Paul -- since both of our systems have all 3TB WD Reds, I'm curious what kind of parity check times you're getting these days.  (with v5.0)

 


I talked to zoggy the other day about his build, and I don't think he mentioned VM, but I didn't think to ask either.

 

nope, no vm here.


Come to think of it, I can use your script with a minor modification to do all the parity rebuild tests. You have the following which kicks off a parity check: /root/mdcmd check NOCORRECT

 

What is the command to kick off a parity rebuild?

 

I just looked in the md.c source code, and here are the commands I've found:

 

  • set (md_trace, md_num_stripes, md_write_limit, md_sync_window, invalidateslot, resync_start, resync_end, rderror, wrerror, spinup_group, rdev_size)
  • import
  • start
  • stop
  • check (with CORRECT or NOCORRECT options)
  • nocheck
  • clear
  • dump
  • spindown
  • spinup

 

I don't see a parameter that looks like it kicks off a rebuild.

 

I read through the source code, and I don't see a command line entry point for a rebuild.  It seems you might only be able to do this through the GUI.

 

It's possible that if exactly one disk is 'invalid', running 'mdcmd check CORRECT' might start a rebuild, but that is purely a guess.  From looking at the source code, I don't think it connects.

 


Paul -- since both of our systems have all 3TB WD Reds, I'm curious what kind of parity check times you're getting these days.  (with v5.0)

 

I'll check again.  Balls to the Wall, I was at 7:35 on RC16c, and my Best Bang was 7:40.  I had no idea your times had climbed back up from sub 7:30's.

 

I've been saving my TunablesReport.txt reports, so I should be able to spot any differences.

 

I'm baffled at the differences you're seeing - nothing in Tom's change log would have made me think this would be affected, but maybe he's doing some under the hood tweaking that he doesn't post in the change logs.

 

-Paul


What is the command to kick off a parity rebuild?

It's possible that if exactly one disk is 'invalid', running 'mdcmd check CORRECT' might start a rebuild, but that is purely a guess.  From looking at the source code, I don't think it connects.

 

It is initiated with '/root/mdcmd check CORRECT'.  If the driver detects one invalid disk it begins rebuilding it, if all disks are valid it does a correcting check.

 

