unraid-tunables-tester.sh - A New Utility to Optimize unRAID md_* Tunables



root@Tower:/boot/utt# screen -version
Screen version 4.06.01 (GNU) 10-Jul-17

 

I found this on my server in the NerdPack packages folder, so definitely the 64-bit version.  Since it is already downloaded by NerdPack, you could just install it from there.

 

\\<servername>\flash\config\plugins\NerdPack\packages\6.6\screen-4.6.1s-x86_64-1.txz

Edited by Pauven
Link to comment

I just checked in NerdPack too, and looks like I have an update available:

 

[screenshot: NerdPack plugin showing an available update for the screen package]

 

The version/name is a bit odd.  On the Slackware repository, the version is screen-4.6.2-x86_64-2.txz, but this one in NerdPack has an 's', 4.6.2s.  Not sure if that means anything special...

Link to comment
13 hours ago, wgstarks said:

Any thoughts on this test? 

The speeds seem artificially low.  My 3TB 5400 RPM constrained array can hit 140 MB/s, and your 4TB drives should be marginally faster.  While 130 MB/s is close, I think you have a bottleneck somewhere.

 

With 7 drives on your SAS 2008 controller, let's check and see if that could be the culprit.  7 * 130 * 1.36 (this is an easier version of the formula I detailed above) = 1237 MB/s going through your controller.  PCIe 1.0 x8 and PCIe 2.0 x4 both support 2000 MB/s, and PCIe 1.0 x4 supports 1000 MB/s.  None of that lines up with 1237 MB/s, so it doesn't seem like this is a PCIe bus related constraint.  That doesn't rule out the SAS 2008 controller, though - maybe it is just slow...
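
 

If you want to run this same sanity check against your own drive counts, here's a quick throwaway script (not part of UTT, just an illustration - the 1.36 overhead factor and the PCIe figures are the same ones used above):

#!/bin/bash
# bus_check.sh - rough parity-check bus traffic estimate (illustrative only, not part of UTT)
# Usage: ./bus_check.sh <drives_on_controller> <per_drive_MBps>
DRIVES=${1:-7}      # number of array drives behind the controller
SPEED=${2:-130}     # observed per-drive speed in MB/s
OVERHEAD=1.36       # approximate parity-sync bus overhead factor from above

TOTAL=$(awk -v d="$DRIVES" -v s="$SPEED" -v o="$OVERHEAD" 'BEGIN { printf "%d", d * s * o }')
echo "Estimated bus traffic: ${TOTAL} MB/s"
echo "Common limits: PCIe 1.0 x4 ~1000 MB/s, PCIe 1.0 x8 / 2.0 x4 ~2000 MB/s, DMI 2.0 x4 ~2000 MB/s"

Running it with 7 drives at 130 MB/s reproduces the 1237 MB/s figure above, which you can then compare against whatever link your controller is actually sitting on.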

 

Perhaps you have something about your build that doesn't show up in the report.  Expanders? 

 

Maybe when using all of your SATA ports on your motherboard (sdb, sdc, sdd, sde) you are hitting some kind of bus limit?  4 * 130 * 1.36 = 707 MB/s, which again doesn't really seem like a common bus limit.

 

I think you should try @jbartlett's DiskSpeed testing tool.

 

Other thoughts:  You have one of those servers that doesn't seem to react to changing the Unraid disk tunables.  Except in extreme edge cases, you get basically the same speed no matter what.  On the repeated tests, most seem to be within +/- 0.9 MB/s, which is a fairly large variation, and for that reason your fastest measured speed of 129.7 is essentially the same as anything else hitting 127+ MB/s. 

 

Also, on at least one repeated test (Pass 1_Low Test 2 @ Thresh 120 = 127.8, and Pass 2 Test 1 = 116.6), the speed variation was 11.2 MB/s, which is huge.  Perhaps you had some process/PC accessing the array during one of those, bringing down the score.  For that reason, I'd say pretty much every test result was essentially identical, and you probably won't notice much of any difference between any of the values.

 

There's certainly no harm in using the Fastest values, as the memory utilization is so low there's no reason for you to chase more efficiency. 

 

Keep in mind if you use jbartlett's DiskSpeed test and find the bottleneck, and you make changes to fix it, you would want to rerun UTT to see if the Fastest settings change.

  • Upvote 1
Link to comment

Thanks.

 

I ran the test in safe mode with SMB/AFP disabled. Also disabled syslog server. Set mover to run monthly since there is no “disabled” setting for it. Not sure how to eliminate any other internal processes that might be running.

 

Don't have any expanders not shown in the report. The Dell H310 SAS controller is installed in an x8 slot so I don't think that is limiting speed, although it may be that the H310 is slow as you say. I haven't seen any reports of this though. Same thing is true for the SuperMicro X10SLL-F motherboard.

 

I'll give DiskSpeed a shot, but if it's really just slow components I don't think any amount of tweaking will change that.

Link to comment
8 hours ago, Pauven said:

Keep in mind if you use jbartlett's DiskSpeed test and find the bottleneck, and you make changes to fix it, you would want to rerun UTT to see if the Fastest settings change. 

I had eight WD 6TB Red Pros on a SAS PCIe 2 controller and I had to take two of them off of it before I could read all drives at the same time at the same speed as each drive on its own. My UTT scores changed a large amount after.

  • Upvote 1
Link to comment
2 hours ago, jbartlett said:

I had eight WD 6TB Red Pros on a SAS PCIe 2 controller and I had to take two of them off of it before I could read all drives at the same time at the same speed as each drive on its own. My UTT scores changed a large amount after.

I might try moving one drive from the H310 to the motherboard. That would leave 6 on the H310 and 6 on the motherboard.

Link to comment
22 hours ago, wgstarks said:

The Dell H310 SAS controller is installed in an X8 slot so I don’t think that is limiting speed

 

12 hours ago, wgstarks said:

I might try moving one drive from the H310 to the motherboard. That would leave 6 on the H310 and 6 on the motherboard.

 

Depending upon your motherboard's design, and which slot you have the card installed in, the card may be communicating with the CPU through the southbridge chipset (PCH), which might be the bottleneck.  Often the PCH has a smaller pipe to the CPU, which is shared with all southbridge devices, commonly including SATA ports.

 

If the southbridge connection is the limiting factor, then even moving a drive from the H310 to a motherboard SATA port might not make any difference if the motherboard SATA port is also going through the southbridge.

 

I followed the link you provided to your current Unraid server (hopefully it still is current), and downloaded the manual for your mainboard.  I see that there is one physical x16 slot (electrically x8) and two physical x8 slots (one electrically x8, the other electrically x4):

 

[screenshot: PCIe slot listing from the motherboard manual]

 

It looks like the two electrically x8 slots both connect directly to the CPU, so as long as you are using either of those, I think you would be okay.  The electrically x4 slot, furthest from the CPU, connects to the PCH - you should not be using this one.

 

Looking at the system block diagram, I see that all 6 SATA ports are connected through the PCH.  If the PCH is the bottleneck, and if you have the H310 correctly installed in one of the two x8 PCIe slots connected to the CPU, then moving a drive from the H310 to the motherboard may actually further slow down speeds.  In that case, you may want to try the opposite, and move an array drive from the motherboard to the H310, so that you have 8 drives connected directly to the CPU, and only 4 drives connected through the PCH.

 

[image: motherboard system block diagram from the manual]

 

Lastly, the PCH connects to the CPU via a DMI v2.0 x4 link, which is good for 2 GB/s.  That should be more than sufficient for 4 array drives (I'm not counting your cache), but if you have the H310 installed in the PCH-connected PCIe slot, then you have 11 drives going over this link. 

 

11 drives * 130 MB/s * 1.36 overhead = 1945 MB/s, which is suspiciously close to the 2000 MB/s limit of the DMI connection between the PCH and the CPU.

 

  • Upvote 1
Link to comment
2 hours ago, wgstarks said:

IIRC, there are 8 drives connected to the H310, which is mounted in the first x8 slot, and 4 connected to the motherboard. This does not include cache, which is an M.2 SSD mounted in the middle x8 slot. It does include 1 external drive mounted via UD and an eSATA cable.

Correction: There are 5 drives connected to the motherboard (including the eSATA) and 7 connected to the H310.

Link to comment
1 minute ago, jbartlett said:

This seems to highlight a recent thought of adding a system-wide benchmark of the drives, testing multiple controllers together and all together, with all combinations of controllers.

Actually, I do create a bus tree so I can test each level for a better view.

Link to comment
On 8/15/2019 at 7:08 PM, interwebtech said:

Ran the Long Test overnight with Docker disabled.

I have set my server to use the Fastest setting based on results. I was hoping for better throughput numbers. Do I need to get rid of all my shingled drives and run all 7200 RPM to get the scores I am seeing from others? Or is 89.2 MB/s the best I can expect?

LongSyncTestReport_2019_08_14_2118.txt 8.97 kB · 4 downloads

Ended up replacing 4 drives, but one of them was likely the actual boat anchor on throughput... a retail boxed Seagate 6TB from 2014.  The first swap produced the middle run from the 19th, with a 4TB HGST 7200 RPM drive.  Still scratching my head over why what should be a fast drive benches so poorly.  The aforementioned 6TB was removed prior to the last check.  Still not the best, but a huge improvement from 24+ hours down to 17 for a parity check. 

All drives are now 8TB.

Date			Duration		Speed
2019-08-20, 03:06:36	16 hr, 43 min		133.0 MB/s
2019-08-19, 10:07:58	18 hr, 39 min, 26 sec	119.1 MB/s
2019-08-18, 15:05:02	1 day, 39 min, 11 sec	90.2 MB/s

Will be running the tunables Long Test again as soon as time permits.

Edited by interwebtech
Link to comment
2 hours ago, interwebtech said:

Will be running the tunables Long Test again as soon as time permits.

This might help a little, as the previous results are no longer valid.  I'm surprised you got the increase you got without retesting the tunables.

 

Also, your 16h43m run is not bad at all.  133 MB/s is the average speed, and sounds close to right for a 5400 RPM 8TB drive, but low for a 7200 RPM 8TB drive.  An 8TB 7200 RPM drive will provide over 200 MB/s at the beginning of the disk, gradually falling to around 90 MB/s at the end of the disk.  All HDDs do this.  Your average speed will be somewhere in the middle, i.e. around 160 MB/s for a 7200 RPM 8TB drive.

Link to comment
10 hours ago, JoeUnraidUser said:

If it helps, I ran UTT v4.1 and here are my results:

Wow, you got some really good speeds for having 4TB drives in the mix - I'm guessing those are 7200 RPM units.

 

Looks like the repeated tests are +/- 2.6 MB/s, maybe more, so there's a lot of variance in your run to run results.  What that means is that, except for the handful of results in the 140's to low 150's, all of the results are essentially the same.  So almost any combo of values would work fine.

 

Also, the Unraid stock settings look marvelous on your server - I would use those, and in the process save yourself over half a gig of RAM.

  • Upvote 1
Link to comment

I've just downloaded the latest script to give it a run.  Tried following the instructions but I don't have a FULLAUTO option. I guess that's for the old script.  Where can I find the guidelines on how I should run this script?  I ran the short test already without any issues.

Link to comment

There's only a handful of options to choose from; the menu has been greatly simplified.

 

 

Short Test

Run this to see if your system appears to respond to changing the Unraid disk Tunables.  If your results look mostly flat, then go on with life and forget about this tool - your server doesn't need it.  Some servers behave the same no matter what tunables you use. 

 

But if you see dramatically different speeds from the Short Test, then that shows your server appears to react to changing the tunables, and one of the real tests below could be worth the time.  Sometimes you will even see the outlines of a bell curve forming in the Short Test results, which is a very strong indicator that your server responds well to tuning.

 

This test only takes a few minutes, so you don't have to waste much time to see if your server responds to tuning.

 

Also, keep in mind that even if your server responds well to tuning, the fastest parameters might still be the Unraid stock values, so there's no guarantee that running the tests will discover values that make your server faster.

 

 

Normal Test

This is the quickest real test.  It does not test the nr_requests values, and it uses a 5 minute duration for each test.  Because the test adapts to how your HD controller responds to the tunables, it will optionally test some additional value ranges, so the run time varies from 8 to 10 hours.

 

Thorough Test

Same as the Normal Test, but includes the nr_requests tests, which add another 4 hours to the Normal Test duration.  So far we have found that once all the other tunables have been optimized (by the normal tests), the nr_requests default value of 128 is best, making the nr_requests tests basically a waste of time.  But there is always the possibility that your server might be different, so I make this optional if you want to check.

 

Long Test (Recommended)

This is exactly the same as the Normal Test, except each test duration is doubled from 5 minutes to 10 minutes.  That means the test takes twice as long.  Longer tests improve accuracy, making it easier to identify which settings work best.  For example, if the Normal Test had an accuracy of +/- 1.0 MB/s, then the Long Test might double that accuracy to +/- 0.5 MB/s or better.  Because the test duration is doubled, the total test time also doubles to 16-20 hours.

 

I recommend this test because it has the increased accuracy of the 10 minute duration, without the extra 8 hours for the nr_requests tests, which are probably a waste of time.

 

Xtra-Long Test

This is exactly the same as the Thorough Test, except each test duration is doubled from 5 minutes to 10 minutes, for the same reason as the Long Test.  Another way to think of this is that this is the Long Test plus the nr_requests tests.  Because the test duration is doubled, the nr_requests tests add 8 hours, bringing total test length up to the 24-28 hour range.

 

 

FYI on Test Accuracy

Test accuracy is determined by looking at tests that get repeated in successive passes, for example Pass 2 Test 25 is always a repeat of the test result chosen from Pass 1, and Pass 2 Test 1 is usually a repeat of another test in Pass 1 as well.  The fastest test result from Passes 1 & 2 also gets repeated in Pass 3.  Because the test points can vary by server, sometimes you will get several more repeated test points to compare to determine accuracy. 

 

By comparing the reported speeds from one pass to the others for the exact same tests, you can determine the accuracy.  The accuracy varies by server.  Some servers, like mine, produce an accuracy of +/- 0.1 MB/s every single time; it's incredibly consistent.  Other servers might be +/- 2.5 MB/s, while a few servers are +/- 10 MB/s or worse.  Note, if you are seeing large accuracy variances, that might mean that you have processes running that are accessing the array, reading or writing data, which essentially makes the test results invalid.

 

When I look at the results and make an accuracy determination, I usually use the worst result (biggest variance) and use that as the accuracy for the entire test.  So if the test chosen from Pass 1 was 140.5 MB/s, and the Pass 2 Test 25 was 140.7 MB/s, then that is an accuracy of +/- 0.2 MB/s.  But if another repeated test was 143.0 MB/s in one pass, and 142.0 MB/s in another pass, then that indicates an accuracy of +/- 1.0 MB/s, so I say the entire test is +/- 1.0 MB/s.
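
 

If you'd rather not eyeball it, here's a trivial helper (again just an illustration, not something UTT ships) that takes the repeated-test speeds as pass1:pass2 pairs and reports the worst-case variance:

#!/bin/bash
# accuracy.sh - worst-case variance across repeated UTT test points (illustrative only)
# Each argument is a "pass1:pass2" speed pair in MB/s, e.g. 140.5:140.7
worst=0
for pair in "$@"; do
  p1=${pair%%:*}   # speed from the earlier pass
  p2=${pair##*:}   # speed from the repeated test in the later pass
  worst=$(awk -v w="$worst" -v a="$p1" -v b="$p2" \
    'BEGIN { d = a - b; if (d < 0) d = -d; if (d > w) w = d; printf "%.1f", w }')
done
echo "Accuracy for this run: +/- ${worst} MB/s"

Feeding it the two examples above - ./accuracy.sh 140.5:140.7 143.0:142.0 - reports +/- 1.0 MB/s, the worse of the two variances.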

 

It takes time for servers to 'settle down', so to speak, and produce accurate results.  Modern hard drives have huge caches, and HD controllers often have caches, all designed to improve short-term performance.  System activity may temporarily affect throughput.  The longer tests minimize these effects, improving accuracy.

 

Also, the longer tests just provide for better math.  For example, consider a 10 second test versus a 10 minute (600 seconds) test.  2000 MB moved in 10 seconds = 200 MB/s, and 2060 MB moved in 10 seconds = 206 MB/s.  120,000 MB moved in 600 seconds is also 200 MB/s, but 120,060 MB moved in 600 seconds is 200.1 MB/s.  In this example, the variance in both tests was just 60 MB, but the average speed accuracy increased from +/- 6.0 MB/s to +/- 0.1 MB/s, 60 times more accurate.  This helps illustrate why the Short Test, which uses a 10 second duration, is not accurate enough for usable results.
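
 

Here's the same arithmetic in script form (illustrative only), showing how the same fixed 60 MB wobble shrinks in impact as the measurement window grows:

#!/bin/bash
# duration_vs_accuracy.sh - effect of a fixed data wobble on measured speed (illustrative)
WOBBLE=60                           # fixed 60 MB of measurement variance
for secs in 10 600; do
  base=$(( 200 * secs ))            # MB moved at a true 200 MB/s
  awk -v b="$base" -v w="$WOBBLE" -v t="$secs" \
    'BEGIN { printf "%4d sec window: %.1f to %.1f MB/s\n", t, (b - w) / t, (b + w) / t }'
done

The 10 second window reports anywhere from 194.0 to 206.0 MB/s, while the 600 second window narrows that to 199.9 to 200.1 MB/s.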

 

Understanding the accuracy of your results is important when trying to determine which result is fastest.  If your accuracy is +/- 1.0 MB/s, then for all intents and purposes, 162 MB/s is the same as 163 MB/s, and there's no reason to pick 163 over 162.

Edited by Pauven
  • Upvote 1
Link to comment
1 hour ago, Pauven said:

There's only a handful of options to choose from, the menu has been greatly simplified.

Thanks for that.  I went and ran the short test earlier and the results seemed to imply I should run the Long one.  It's now at sample point 17 so I'll see what it says in the morning.

 

FWIW, here are my short test results

 

--- INITIAL BASELINE TEST OF CURRENT VALUES (1 Sample Point @ 10sec Duration)---
Tst | RAM | stri |  win | req | thresh |  MB/s
----------------------------------------------
  1 | 138 | 4096 | 2048 | 128 |  2000  | 115.2 


--- BASELINE TEST OF UNRAID DEFAULT VALUES (1 Sample Point @ 10sec Duration)---
Tst | RAM | stri |  win | req | thresh |  MB/s
----------------------------------------------
  1 |  43 | 1280 |  384 | 128 |   192  |  15.5 


 --- TEST PASS 1 (2 Min - 12 Sample Points @ 10sec Duration) ---
Tst | RAM | stri |  win | req | thresh |  MB/s | thresh |  MB/s | thresh |  MB/s
--------------------------------------------------------------------------------
  1 |  25 |  768 |  384 | 128 |   376  |  22.2 |   320  |  23.7 |   192  |  15.7
  2 |  51 | 1536 |  768 | 128 |   760  |  54.7 |   704  |  44.7 |   384  |  26.0
  3 | 103 | 3072 | 1536 | 128 |  1528  | 122.0 |  1472  |  92.9 |   768  |  48.3
  4 | 207 | 6144 | 3072 | 128 |  3064  | 129.7 |  3008  | 123.2 |  1536  |  97.2

 --- TEST PASS 1_HIGH (30 Sec - 3 Sample Points @ 10sec Duration)---
Tst | RAM | stri |  win | req | thresh |  MB/s | thresh |  MB/s | thresh |  MB/s
--------------------------------------------------------------------------------
  1 | 414 |12288 | 6144 | 128 |  6136  | 127.9 |  6080  | 130.2 |  3072  | 130.7

 --- TEST PASS 1_VERYHIGH (30 Sec - 3 Sample Points @ 10sec Duration)---
Tst | RAM | stri |  win | req | thresh |  MB/s | thresh |  MB/s | thresh |  MB/s
--------------------------------------------------------------------------------
  1 | 622 |18432 | 9216 | 128 |  9208  | 129.0 |  9152  | 128.7 |  4608  | 130.4

 

Link to comment
13 minutes ago, dalben said:

I went and ran the short test earlier and the results seemed to imply I should run the Long one.

 

Definitely!  I see a strong linear progression from 15 MB/s to 130 MB/s as the settings increase.  I've never seen a range this large, or a speed that slow using Unraid default values!  Fascinating! 

 

I'm very interested in seeing the Long results.  Please include the CSV file too, I'll probably chart this one for everyone.

Link to comment

Might be a good time to mention that I'm having some really bad concurrent array performance issues with 6.7 (thread in the stable bug report forum), as are others.  It's what has got me hunting all over in an attempt to solve it. 

 

So my results might be impacted by that.  I also had to stop the test just now. With it running, my wife couldn't watch a movie as it was stuttering and stalling continuously. I'll run the long one when I know the server should be idle. 

Link to comment

Here's a short test report from when I know the server's disk activity was very low.  Also added the aborted Long Test results in case that is of interest.  I'll run the Long when I know I have a 10-hour window when no one needs the server.

ShortSyncTestReport_2019_08_24_1630.csv ShortSyncTestReport_2019_08_24_1630.txt ShortSyncTestReport_2019_08_25_0649.csv ShortSyncTestReport_2019_08_25_0649.txt LongSyncTestReport_2019_08_24_1638.csv LongSyncTestReport_2019_08_24_1638.txt

Link to comment

Uhmmmm...

 

 --- TEST PASS 2 (10 Hrs - 49 Sample Points @ 10min Duration) ---
Tst | RAM | stri |  win | req | thresh |  MB/s
----------------------------------------------
  1 | 207 | 6144 | 3072 | 128 |  3064  | 148.0
  2 | 216 | 6400 | 3200 | 128 |  3192  | 176.9   <-- !!!!!!!!!!!!!!!!!!!!!!!!!!
  3 | 224 | 6656 | 3328 | 128 |  3320  | 146.6
  4 | 233 | 6912 | 3456 | 128 |  3448  | 148.5
  5 | 242 | 7168 | 3584 | 128 |  3576  | 148.3
  6 | 250 | 7424 | 3712 | 128 |  3704  | 148.3
  7 | 259 | 7680 | 3840 | 128 |  3832  | 148.8
  8 | 267 | 7936 | 3968 | 128 |  3960  | 148.6
  9 | 276 | 8192 | 4096 | 128 |  4088  | 146.9
 10 | 285 | 8448 | 4224 | 128 |  4216  | 149.0
 11 | 293 | 8704 | 4352 | 128 |  4344  | 148.3
 12 | 302 | 8960 | 4480 | 128 |  4472  | 149.1
 13 | 311 | 9216 | 4608 | 128 |  4600  | 108.6
 14 | 319 | 9472 | 4736 | 128 |  4728  | 148.5
 15 | 328 | 9728 | 4864 | 128 |  4856  | 145.8
 16 | 337 | 9984 | 4992 | 128 |  4984  | 149.0
 17 | 345 |10240 | 5120 | 128 |  5112  | 148.5

I think that has to be some kind of glitch, but I can't imagine how.  I've never seen a 30 MB/s jump on a specific setting combo like that.  Unless your wife was watching a movie the whole time, except during that one test.

Link to comment
3 hours ago, Pauven said:

Uhmmmm...

 


 --- TEST PASS 2 (10 Hrs - 49 Sample Points @ 10min Duration) ---
Tst | RAM | stri |  win | req | thresh |  MB/s
----------------------------------------------
  1 | 207 | 6144 | 3072 | 128 |  3064  | 148.0
  2 | 216 | 6400 | 3200 | 128 |  3192  | 176.9   <-- !!!!!!!!!!!!!!!!!!!!!!!!!!
  3 | 224 | 6656 | 3328 | 128 |  3320  | 146.6
  4 | 233 | 6912 | 3456 | 128 |  3448  | 148.5
  5 | 242 | 7168 | 3584 | 128 |  3576  | 148.3
  6 | 250 | 7424 | 3712 | 128 |  3704  | 148.3
  7 | 259 | 7680 | 3840 | 128 |  3832  | 148.8
  8 | 267 | 7936 | 3968 | 128 |  3960  | 148.6
  9 | 276 | 8192 | 4096 | 128 |  4088  | 146.9
 10 | 285 | 8448 | 4224 | 128 |  4216  | 149.0
 11 | 293 | 8704 | 4352 | 128 |  4344  | 148.3
 12 | 302 | 8960 | 4480 | 128 |  4472  | 149.1
 13 | 311 | 9216 | 4608 | 128 |  4600  | 108.6
 14 | 319 | 9472 | 4736 | 128 |  4728  | 148.5
 15 | 328 | 9728 | 4864 | 128 |  4856  | 145.8
 16 | 337 | 9984 | 4992 | 128 |  4984  | 149.0
 17 | 345 |10240 | 5120 | 128 |  5112  | 148.5

I think that has to be some kind of glitch, but I can't imagine how.  I've never seen a 30 MB/s jump on a specific setting combo like that.  Unless your wife was watching a movie the whole time, except during that one test.

I think there's a substantial computational advantage showing its face here.
6400 is 80^2 (power of two advantage for computing is strong)
3200 and 6400 divide evenly - 3200 is exactly one half of 6400 - again, whole integers with no floating point precision needed, meaning the CPU can take shortcuts. I'm not 100% certain how the stripe size and window size interact, but those two mathematical advantages compared to the rest of the table could make a massive difference on a lower-powered CPU.

Link to comment
