unraid-tunables-tester.sh - A New Utility to Optimize unRAID md_* Tunables

Just FYI... I reverted to 6.6.7 last night.

 

Results of short test:

 

                   Unraid 6.x Tunables Tester v4.1 BETA 3 by Pauven

             Tunables Report produced Tue Aug 13 15:59:37 CDT 2019

                              Run on server: nas

                             Short Parity Sync Test


Current Values:  md_num_stripes=6144, md_sync_window=3072, md_sync_thresh=3064
                 Global nr_requests=128
                 Disk Specific nr_requests Values:
                    sdl=128, sdj=128, sdg=128, sde=128, sdn=128, sdm=128, 
                    sdp=128, sdr=128, sdk=128, sdf=128, sdi=128, sdh=128, 
                    sdq=128, sdo=128, 


--- INITIAL BASELINE TEST OF CURRENT VALUES (1 Sample Point @ 10sec Duration)---
Tst | RAM | stri |  win | req | thresh |  MB/s
----------------------------------------------
  1 | 351 | 6144 | 3072 | 128 |  3064  | 171.2 


--- BASELINE TEST OF UNRAID DEFAULT VALUES (1 Sample Point @ 10sec Duration)---
Tst | RAM | stri |  win | req | thresh |  MB/s
----------------------------------------------
  1 |  73 | 1280 |  384 | 128 |   192  |  62.5 


 --- TEST PASS 1 (2 Min - 12 Sample Points @ 10sec Duration) ---
Tst | RAM | stri |  win | req | thresh |  MB/s | thresh |  MB/s | thresh |  MB/s
--------------------------------------------------------------------------------
  1 |  43 |  768 |  384 | 128 |   376  |  74.1 |   320  |  62.1 |   192  |  59.9
  2 |  87 | 1536 |  768 | 128 |   760  | 117.4 |   704  | 127.2 |   384  | 111.0
  3 | 175 | 3072 | 1536 | 128 |  1528  | 133.7 |  1472  | 149.9 |   768  | 118.0
  4 | 351 | 6144 | 3072 | 128 |  3064  | 145.7 |  3008  | 145.7 |  1536  | 145.8

 --- END OF SHORT AUTO TEST FOR DETERMINING IF YOU SHOULD RUN THE REAL TEST ---

If the speeds changed with different values you should run a NORMAL/LONG test.
If speeds didn't change then adjusting Tunables likely won't help your system.

Completed: 0 Hrs 3 Min 15 Sec.


NOTE: Use the smallest set of values that produce good results. Larger values
      increase server memory use, and may cause stability issues with Unraid,
      especially if you have any add-ons or plug-ins installed.


System Info:  nas
              Unraid version 6.6.7
                   md_num_stripes=6144
                   md_sync_window=3072
                   md_sync_thresh=3064
                   nr_requests=128 (Global Setting)
                   sbNumDisks=14
              CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
              RAM: 32GiB System Memory

Outputting free low memory information...

              total        used        free      shared  buff/cache   available
Mem:       32942816     4570920    27368236      674948     1003660    27190564
Low:       32942816     5574580    27368236
High:             0           0           0
Swap:             0           0           0


SCSI Host Controllers and Connected Drives
--------------------------------------------------

[0] scsi0	usb-storage 	
[0:0:0:0]	flash		sda	31.9GB	Reader SD MS

[1] scsi1	ata_piix 	

[2] scsi2	ata_piix 	

[3] scsi3	vmw_pvscsi 	PVSCSI SCSI Controller

[4] scsi4	vmw_pvscsi 	PVSCSI SCSI Controller

[5] scsi5	mpt3sas 	SAS3416 Fusion-MPT Tri-Mode I/O Controller Chip (IOC)
[5:0:0:0]	disk3		sde	8.00TB	HGST HDN728080AL
[5:0:2:0]	disk9		sdf	8.00TB	HGST HDN728080AL
[5:0:3:0]	disk2		sdg	8.00TB	HGST HDN728080AL
[5:0:4:0]	disk11		sdh	8.00TB	HGST HDN728080AL
[5:0:5:0]	disk10		sdi	8.00TB	HGST HDN728080AL
[5:0:6:0]	disk1		sdj	8.00TB	HGST HDN728080AL
[5:0:7:0]	disk8		sdk	8.00TB	HGST HDN728080AL
[5:0:8:0]	parity		sdl	8.00TB	HGST HDN728080AL
[5:0:9:0]	disk5		sdm	8.00TB	HGST HDN728080AL
[5:0:10:0]	disk4		sdn	8.00TB	HGST HDN728080AL
[5:0:11:0]	parity2		sdo	8.00TB	HGST HDN728080AL
[5:0:12:0]	disk6		sdp	8.00TB	HGST HDN728080AL
[5:0:13:0]	disk12		sdq	8.00TB	HGST HDN728080AL
[5:0:14:0]	disk7		sdr	8.00TB	HGST HDN728080AL

[N0] scsiN0	nvme0 	NVMe
[N:0:4:1]	cache		nvme0n1	512GB	Samsung SSD 970 


                      *** END OF REPORT ***

 


That seems to have left off some tests.  Of course, I already closed the window.  I will go run it again.

 


Never mind, I see what it did.  The results aren't accurate anyway, as I have several things using the array at the moment.

 

                   Unraid 6.x Tunables Tester v4.1 BETA 3 by Pauven


--- INITIAL BASELINE TEST OF CURRENT VALUES (1 Sample Point @ 10sec Duration)---

Test  1 - window=3072, thresh=3064: 1.689 GB in 10.039 sec  = 172.3 MB/s        

--- BASELINE TEST OF UNRAID DEFAULT VALUES (1 Sample Point @ 10sec Duration)---

Setting all drives to nr_requests=128 for the following tests
Test  1 - window= 384, thresh= 192: 1.359 GB in 10.041 sec  = 138.6 MB/s        

 --- TEST PASS 1 (2 Min - 12 Sample Points @ 10sec Duration) ---

Setting all drives to nr_requests=128 for the following tests
Test  1a - window= 384, thresh= 376: 1.071 GB in 10.042 sec  = 109.2 MB/s       
Test  1b - window= 384, thresh= 320: 1.474 GB in 10.038 sec  = 150.4 MB/s       
Test  1c - window= 384, thresh= 192: 0.977 GB in 10.038 sec  =  99.7 MB/s       
Test  2a - window= 768, thresh= 760: 1.685 GB in 10.039 sec  = 171.9 MB/s       
Test  2b - window= 768, thresh= 704: 1.668 GB in 10.041 sec  = 170.1 MB/s       
Test  2c - window= 768, thresh= 384: 1.638 GB in 10.041 sec  = 167.1 MB/s       
Test  3a - window=1536, thresh=1528: 1.693 GB in 10.043 sec  = 172.6 MB/s       
Test  3b - window=1536, thresh=1472: 1.527 GB in 10.043 sec  = 155.7 MB/s       
Test  3c - window=1536, thresh= 768: 1.342 GB in 10.042 sec  = 136.9 MB/s       
Test  4a - window=3072, thresh=3064: 1.476 GB in 10.043 sec  = 150.5 MB/s       
Test  4b - window=3072, thresh=3008: 1.365 GB in 10.042 sec  = 139.2 MB/s       
Test  4c - window=3072, thresh=1536: 1.415 GB in 10.038 sec  = 144.3 MB/s       

 --- END OF SHORT AUTO TEST FOR DETERMINING IF YOU SHOULD RUN THE REAL TEST ---

If the speeds changed with different values you should run a NORMAL/LONG test.
If speeds didn't change then adjusting Tunables likely won't help your system.

Completed: 0 Hrs 2 Min 34 Sec.

Results have been written to ShortSyncTestReport_2019_08_13_1604.txt
Show ShortSyncTestReport_2019_08_13_1604.txt now? (Y to show):

 

7 minutes ago, StevenD said:

Just FYI... I reverted to 6.6.7 last night.

Why?

 

I'm still on 6.6.6 because I've seen too many issues reported in the 6.6.7 and 6.7.x branches that just never seemed to get any solutions.

 

Looks like those two drives still aren't showing.  I'll test again on my side.  I got it working with Xaero's data, and assumed it would fix yours too.

7 minutes ago, StevenD said:

Never mind, I see what it did.  The results aren't accurate anyway, as I have several things using the array at the moment.

Right.  The Short test omits Passes 2 & 3, to make it quicker, and never makes any recommendations - primarily because the 10 second tests are way too quick to be accurate and you get a lot of fake numbers.

 

For some users, their server responds the same no matter what tunables are used.  That's the point of the Short test, to save them 8+ hours of running the longer tests if it won't help them.

1 hour ago, StevenD said:

If it matters, those two "missing" disks are mounted via Unassigned Devices and they are not part of the array.

My mistake, looks like you were right.

 

I build a DiskName2Num lookup array, but it is based upon the data from mdcmd status, which of course only provides data on array devices.  That means these unassigned disks don't get a Disk Name to Disk Number lookup entry, so it's not available for the final report.

 

I'm a little conflicted on this.  On the one hand, I wanted the report to be a complete picture of all controllers and attached drives, but on the other hand I guess having it only display array devices is nice too, since these are the only drives being tested and tuned.

 

I don't think I would be able to include non-array drives without a significant rewrite of this report.  So.... no. Not gonna happen.

8 minutes ago, Pauven said:

My mistake, looks like you were right.

 

I build a DiskName2Num lookup array, but it is based upon the data from mdcmd status, which of course only provides data on array devices.  That means these unassigned disks don't get a Disk Name to Disk Number lookup entry, so it's not available for the final report.

 

I'm a little conflicted on this.  On the one hand, I wanted the report to be a complete picture of all controllers and attached drives, but on the other hand I guess having it only display array devices is nice too, since these are the only drives being tested and tuned.

 

I don't think I would be able to include non-array drives without a significant rewrite of this report.  So.... no. Not gonna happen.

 

I see no reason to include non-array devices. 

28 minutes ago, Pauven said:

Why?

 

I'm still on 6.6.6 because I've seen too many issues reported in the 6.6.7 and 6.7.x branches that just never seemed to get any solutions.

 

Looks like those two drives still aren't showing.  I'll test again on my side.  I got it working with Xaero's data, and assumed it would fix yours too.

 

Tons of SMB issues. On 6.6.7, I can write to my cache at a steady 1GB/s. On 6.7.x, it fluctuates between 1GB/s and ZERO.  It literally pauses during transfers. 

58 minutes ago, jbartlett said:

Information attached.

Thanks @jbartlett!

 

Any chance your two NVMe drives are non-array devices?

31 minutes ago, Pauven said:

I build a DiskName2Num lookup array, but it is based upon the data from mdcmd status, which of course only provides data on array devices.  That means these unassigned disks don't get a Disk Name to Disk Number lookup entry, so it's not available for the final report.

Small correction on what I wrote here.  The mdcmd status output only has drives 0-29, which is predefined by Unraid to Parity and Data disks only.  54 is the flash drive, and 30 & 31 are cache drives (I'm sure there's other predefined assignments, but that is all I've mapped out).

 

So I was getting myself confused as to how I was getting the flash and cache drives to show in the report, since they are not in the mdcmd status output.

 

I finally realized that I am using both mdcmd status and the /var/local/emhttp/disks.ini file to build the DiskName2Num lookup.  Looks like /var/local/emhttp/disks.ini has all array drives, up to 54, so it includes the flash and cache.  (yes, that means I have an unnecessary, redundant operation using the mdcmd output to build the DiskName2Num lookup, but it doesn't hurt anything)

 

Ultimately the story stays the same - non-array drives aren't in /var/local/emhttp/disks.ini either, so they still don't get in the report.
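The lookup build described above can be sketched in shell. This is a hypothetical illustration, not the script's actual code, and the field names (`device=`, `idx=`) are assumptions about the disks.ini layout that may differ across Unraid versions:

```shell
#!/bin/bash
# Hypothetical sketch: build a device-name -> disk-number lookup from an
# ini file shaped like /var/local/emhttp/disks.ini. The device=/idx=
# field names are assumptions, not verified against every Unraid release.
build_disk_lookup() {
  awk -F'=' '
    /^\[/      { dev = ""; idx = "" }             # new [section]: reset state
    /^device=/ { dev = $2; gsub(/"/, "", dev) }   # e.g. device="sdl"
    /^idx=/    { idx = $2; gsub(/"/, "", idx) }   # e.g. idx="0"
    dev != "" && idx != "" { print dev, idx; dev = ""; idx = "" }
  ' "$1"
}
```

Since non-array drives never appear in that file, they simply never get a lookup entry, which matches the behavior described above.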

52 minutes ago, StevenD said:

 

Tons of SMB issues. On 6.6.7, I can write to my cache at a steady 1GB/s. On 6.7.x, it fluctuates between 1GB/s and ZERO.  It literally pauses during transfers.  

I was starting to feel a bit guilty for still rockin' the beastly 6.6.6, especially while trying to troubleshoot all these storage report issues for users running 6.7.x.  Now I feel a bit vindicated for sticking with Damienraid, and happy I avoided all that SMB/SQLite nonsense.  Hopefully my server hasn't sold its circuits to Beelzebub and won't be stuck on 6.6.6 forever in a journey to the bottomless pit...

 

Perhaps I need to rename my server from Tower.  Abaddon... Apollyon... Beelzebub... Belial... Dragon... I know, Leviathan!

51 minutes ago, Pauven said:

Thanks @jbartlett!

 

Any chance your two NVMe drives are non-array devices?

Bingo! I didn't make the connection until reading the posts just prior to this.

1 minute ago, jbartlett said:

Bingo! I didn't make the connection until reading the posts just prior to this.

Thanks for confirming.  I think UTT v4.1 BETA 3 is ready to make the jump to final.

38 minutes ago, Pauven said:

I was starting to feel a bit guilty for still rockin' the beastly 6.6.6, especially while trying to troubleshoot all these storage report issues for users running 6.7.x.  Now I feel a bit vindicated for sticking with Damienraid, and happy I avoided all that SMB/SQLite nonsense.  Hopefully my server hasn't sold its circuits to Beelzebub and won't be stuck on 6.6.6 forever in a journey to the bottomless pit...

 

Perhaps I need to rename my server from Tower.  Abaddon... Apollyon... Beelzebub... Belial... Dragon... I know, Leviathan!

 

I believe the only people who have experienced corruption have appdata on their array. Mine has always been on cache. 



SCSI Host Controllers and Connected Drives
--------------------------------------------------

[0] scsi0       usb-storage 
[0:0:0:0]       flash           sda     62.7GB  Extreme

[1] scsi1       megaraid_sas    MegaRAID SAS 2008 [Falcon]
[1:0:11:0]      disk13          sdb     8.00TB  WDC WD80EFAX-68L
[1:0:12:0]      disk5           sdc     8.00TB  WDC WD80EFAX-68L
[1:0:13:0]      disk7           sdd     8.00TB  WDC WD80EFAX-68L
[1:0:14:0]      disk2           sde     8.00TB  WDC WD80EFAX-68L
[1:0:15:0]      disk3           sdf     8.00TB  WDC WD80EFAX-68L
[1:0:16:0]      disk4           sdg     8.00TB  WDC WD80EFAX-68L
[1:0:17:0]      disk10          sdh     8.00TB  WDC WD80EFAX-68L
[1:0:18:0]      disk21          sdi     8.00TB  WDC WD80EFAX-68L
[1:0:19:0]      disk8           sdj     8.00TB  WDC WD80EFAX-68L
[1:0:20:0]      disk12          sdk     8.00TB  WDC WD80EFAX-68L
[1:0:21:0]      disk11          sdl     8.00TB  WDC WD80EFAX-68L
[1:0:22:0]      disk15          sdm     8.00TB  WDC WD80EFAX-68L
[1:0:23:0]      disk16          sdn     8.00TB  WDC WD80EFAX-68L
[1:0:24:0]      disk19          sdo     8.00TB  WDC WD80EFAX-68L
[1:0:25:0]      disk22          sdp     8.00TB  WDC WD80EMAZ-00W
[1:0:26:0]      disk17          sdq     8.00TB  WDC WD80EFAX-68L
[1:0:27:0]      disk18          sdr     8.00TB  WDC WD80EFAX-68L
[1:0:28:0]      disk20          sds     8.00TB  WDC WD80EFAX-68L
[1:0:29:0]      disk6           sdt     8.00TB  WDC WD80EFAX-68L
[1:0:30:0]      disk9           sdu     8.00TB  WDC WD80EFAX-68L
[1:0:31:0]      disk14          sdv     8.00TB  WDC WD80EFAX-68L
[1:0:32:0]      disk1           sdw     8.00TB  WDC WD80EFAX-68L
[1:0:33:0]      parity2         sdx     8.00TB  WDC WD80EMAZ-00W
[1:0:34:0]      parity          sdy     8.00TB  WDC WD80EMAZ-00W

[N0] scsiN0     nvme0   NVMe
[N:0:1:1]       cache           nvme0n1 1.02TB  INTEL SSDPEKNW01

[N1] scsiN1     nvme1   NVMe
[N:1:1:1]       cache2          nvme1n1 1.02TB  INTEL SSDPEKNW01

Results from B3 look good!

11 hours ago, Xaero said:

Results from B3 look good!

Fantastic!  That settles it then, I'll release UTT v4.1 final today.

 

 

5 hours ago, DanielCoffey said:

Version 4.1 Beta 2 Xtra Long Test completed...

Those results look perfect to me.  Proof that, even as good as Unraid v6.x performs with stock settings on most servers, some servers still need tuning.  Going from 141 MB/s stock to 164 MB/s tuned nets you a nice 16% bump in peak performance.

 

I also find the Thriftiest settings very interesting.  Only 22 MB of RAM consumed (16 MB less than stock Unraid), yet a solid 15 MB/s (11%) gain over stock performance.

 

The consistency of your results for the repeated tests is +/- 0.1 MB/s, so you can trust the report accuracy on this server.

 

I really appreciate you doing the Extra Long test.  As I expected, the extra nr_requests tests only provided slower speeds once the other settings were tuned.  I'm still curious if there will be a server out there that responds well to lower nr_requests values once tuned, but it seems less and less likely.

 

Personally, I'd probably go with the Fastest values on your server.  The Recommended values only save you 122 MB over the Fastest, and the Fastest are only consuming 366 MB.  If you had a lot more drives, the memory consumption would go up proportionally and the lower Recommended values to save RAM would make more sense then.

20 minutes ago, Pauven said:

Fantastic!  That settles it then, I'll release UTT v4.1 final today.

Sounds good.  Please provide some basic instructions on how to run the test(s) and how to interpret and use them!   It is probably obvious to those who have assisted you in the development of this tool but the rest of us could use some guidance! 


The 4.0 test that I downloaded off the first page yesterday at 12:17 PM EST, running on my 6.6.7 Unraid, is returning that I should be using a negative md_sync_thresh ("md_sync_thresh: -56"). Is it even possible to use a negative md_sync_thresh? Or does the script need some code added so it never tests down into negative values? The test also indicated that the settings will consume 0 MB, which I found confusing too. It seemed strange, so before I updated my Unraid to these settings I thought I'd better double-check.

 

If it makes any difference I do NOT have parity disks in my array.

 

Example: 

The Fastest settings tested give a peak speed of 143.7 MB/s
     md_sync_window: 8          md_num_stripes: 16
     md_sync_thresh: -56             nr_requests: 128
This will consume 0 MB (38 MB less than your current utilization of 38 MB)

 

Full report attached.

 

 

NormalSyncTestReport_2019_08_13_1334.txt NormalSyncTestReport_2019_08_13_1334.csv

1 hour ago, ispaydeu said:

The 4.0 test that I downloaded off the first page yesterday at 12:17 PM EST, running on my 6.6.7 Unraid, is returning that I should be using a negative md_sync_thresh ("md_sync_thresh: -56"). Is it even possible to use a negative md_sync_thresh? Or does the script need some code added so it never tests down into negative values? The test also indicated that the settings will consume 0 MB, which I found confusing too. It seemed strange, so before I updated my Unraid to these settings I thought I'd better double-check.

 

If it makes any difference I do NOT have parity disks in my array.

 

Example: 

The Fastest settings tested give a peak speed of 143.7 MB/s
     md_sync_window: 8          md_num_stripes: 16
     md_sync_thresh: -56             nr_requests: 128
This will consume 0 MB (38 MB less than your current utilization of 38 MB)

 

Wow.  I've said it before and I'll say it again, every server is unique, some in very surprising ways.

 

I scanned through your results, and for repeated tests I see fairly large variances of up to +/- 2.3 MB/s, so keep that in mind when comparing results.  The Long test, with a 10 minute duration for each test, should provide more accurate results.

 

Regarding consuming 0 MB, it's actually not 0.  I'm rounding to the nearest MB, so anything under 0.5 MB would round down to 0 MB.  Here's the formula and your actual result:

(( ( md_num_stripes * (2640 + (4096 * sbNumDisks)) ) / 1048576 )) = RAM Consumed (In Megabytes)

With your values:
(( ( 16 * (2640 + (4096 * 7)) ) / 1048576 )) = 0.477783203 MB

*NOTE:  I've just added a new function to UTT v4.1 to show memory used in KB when it rounds down to 0 MB.
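That formula and the KB fallback can be sketched as follows. `utt_ram_use` is a hypothetical helper name for illustration, not the function UTT actually uses:

```shell
#!/bin/bash
# Sketch of the RAM formula quoted above, falling back to KB when the
# result rounds down to 0 MB. utt_ram_use is a hypothetical name.
utt_ram_use() {
  local stripes=$1 disks=$2
  local bytes=$(( stripes * (2640 + 4096 * disks) ))
  local mb=$(( bytes / 1048576 ))
  if [ "$mb" -eq 0 ]; then
    echo "$(( bytes / 1024 )) KB"
  else
    echo "$mb MB"
  fi
}

utt_ram_use 6144 14   # 14 array disks at md_num_stripes=6144 -> 351 MB, matching the report above
utt_ram_use 16 7      # the 0 MB rounding case: shown in KB instead
```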

 

Regarding the negative md_sync_thresh values, I had to double-check the code to see if UTT was really setting negative values, and it is.

 

While UTT is setting negative md_sync_thresh values, I'm not sure if Unraid is overriding the values when they are below a certain threshold.  While I know how to read the currently 'configured' value, I don't know how to query the currently 'set' value.  Does anyone know how to do this?
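For the 'configured' side, here is a minimal sketch of pulling one tunable out of `mdcmd status`-style output, assuming that output contains plain key=value lines (an assumption about the format; and this only reads the configured value, not necessarily the kernel's effective one):

```shell
#!/bin/bash
# Hypothetical helper: extract one tunable from key=value output fed on
# stdin. Assumes mdcmd status emits lines like md_sync_thresh=3064.
get_tunable() {
  sed -n "s/^$1=//p"
}

# On a live server this might be invoked as (untested assumption):
#   mdcmd status | get_tunable md_sync_thresh
```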

 

I did go into the Unraid Disk settings, and manually set a negative value and applied it, and Unraid saved it!

 

[screenshot: Unraid Disk Settings accepting and saving the negative md_sync_thresh value]

 

So best I can tell, the UTT script is setting negative md_sync_thresh values, Unraid is accepting them, and your server is responding better with them.

 

Perhaps @limetech can share some insight.

 

Paul

Edited by Pauven


I just posted UTT v4.1 final, in the first post.

 

3 hours ago, Frank1940 said:

Please provide some basic instructions on how to run the test(s) and how to interpret and use them!   It is probably obvious to those who have assisted you in the development of this tool but the rest of us could use some guidance! 

Everything you need should be in the first two posts. 

 

Perhaps @SpaceInvaderOne could do one of his great videos on using UTT...  ;)

 

Edited by Pauven

3 hours ago, ispaydeu said:

If it makes any difference I do NOT have parity disks in my array.

Sorry I missed this comment earlier.  I'm not sure what to make of this.  UTT performs tests of the Unraid Disk Tunables by running dozens or hundreds of non-correcting parity checks.  But if you don't have parity disks... then how in the world are you even running UTT?

 

I don't know if you can trust any of the results - I don't even know what the results mean anymore.  If you don't have parity disks, then you shouldn't be able to check parity, and you shouldn't be able to use this tool to check parity check speeds with different tunables configured.

 

That also might explain why negative md_sync_thresh values were responding well on your machine.

 

Is there even a [CHECK] Parity button on your Unraid Main screen?

2 hours ago, Pauven said:

Is there even a [CHECK] Parity button on your Unraid Main screen?

My backup server has no Parity drive. The Check button then does a Read check of all the array drives looking for errors.


I started a new test today with 4.1. Rebooted my server into safe mode, as was recommended. When I started the script I got a notice that I should be running screen, but in safe mode the NerdPack plugin is disabled, so screen isn't available. How are people running screen in safe mode?

1 hour ago, wgstarks said:

I started a new test today with 4.1. Rebooted my server into safe mode, as was recommended. When I started the script I got a notice that I should be running screen, but in safe mode the NerdPack plugin is disabled, so screen isn't available. How are people running screen in safe mode?

LOL!  So true!  Hadn't thought of that...

 

You really only need to use safe mode if you have a ton of stuff that's just too hard to disable individually. 

 

Instead, just make sure you stop all VM's and Dockers, plus any plugins that would be accessing your array disks.  I haven't noticed any issues from the CacheDirs plug-in, since once it is running it's mainly pinging RAM to prevent disks from spinning up, but you can always stop that one too just to be safe.

 

Other alternatives: 

 

You can run directly on your server's console instead of remote access, completely eliminating the need for screen.

 

Using screen is optional, though recommended when running remote.  If you have confidence that your network connection is solid, that your PC won't sleep, shutdown, or randomly update and reboot itself during the test, and that power brownouts/blackouts won't disrupt your connection, then screen really isn't needed.  Screen is like insurance - many people get by without it.

 

Though I am also curious - how can you run screen in safe mode?

