unraid-tunables-tester.sh - A New Utility to Optimize unRAID md_* Tunables



Just spotted a possible improvement for the script...

 

Prior to running the new script, I reset my Disk Settings to the default values. When I started the script, it first ran an initial baseline of the current values and then a baseline test of the unRAID default values. After that it went into Test Pass 1.

 

Given that the current values were actually set to unraid default, might it be useful for the script to notice this and skip one of those two tests?

                   Unraid 6.x Tunables Tester v4.0 by Pauven


--- INITIAL BASELINE TEST OF CURRENT VALUES (1 Sample Point @ 10min Duration)---

Test  1 - window= 384, thresh= 192: 83.778 GB in 600.150 sec = 142.9 MB/s

--- BASELINE TEST OF UNRAID DEFAULT VALUES (1 Sample Point @ 10min Duration)---

Setting all drives to nr_requests=128 for the following tests
Test  1 - window= 384, thresh= 192: 83.838 GB in 600.093 sec = 143.1 MB/s

 

Link to comment
15 minutes ago, DanielCoffey said:

Given that the current values were actually set to unraid default, might it be useful for the script to notice this and skip one of those two tests?

 

I think that would be a negligible improvement.  Considering the test you're running is somewhere in the neighborhood of 16-28 hours, reducing run time by 10 minutes isn't going to make much of a difference.

 

Besides, another way of looking at this is that by running the exact same test back-to-back, you get an idea of the accuracy of the tests.  You've already established that your measurement accuracy is no better than +/- 0.2 MB/s.  There are a few places in the script where the exact same test gets rerun (i.e. the fastest from Pass 1 = Test 25 from Pass 2, and the fastest from Pass 2 will be repeated once in Pass 3). 

 

There's value in being able to see how consistent the results are when running the exact same test more than once, as that's the only way to tell what is just noise and what is a true speed difference.  Otherwise you might put too much importance on a 0.2 MB/s speed increase from one combo to another.

Link to comment

Well the test completed successfully, thank you.

 

My system has eight WD 8TB Red 5400rpm drives: dual parity, five data and one unallocated. The default unRAID speed reported was 142.1 MB/s, the peak reported by the script was 162.5 MB/s, and the 99% result was 161.6 MB/s.

 

The best result the v5.x script could come up with was around 156.0 MB/s, so the new script definitely explores new areas.

LongSyncTestReport_2019_08_05_1907.txt

Link to comment
1 hour ago, DanielCoffey said:

Well the test completed successfully, thank you.

 

My system has eight WD 8TB Red 5400rpm drives: dual parity, five data and one unallocated. The default unRAID speed reported was 142.1 MB/s, the peak reported by the script was 162.5 MB/s, and the 99% result was 161.6 MB/s.

 

The best result the v5.x script could come up with was around 156.0 MB/s, so the new script definitely explores new areas. 

LongSyncTestReport_2019_08_05_1907.txt 8.47 kB · 2 downloads

 

Excellent!

 

The previous UTT v2.2 script didn't test md_sync_thresh at all; it just left it at the current value (probably 192) for the entire test.  UTT v4.0 for Unraid 6.x replaces the old md_write_limit tests with md_sync_thresh tests.  As you can see from your results, it makes a big difference in peak speeds.

 

I just did some napkin math, and by my estimate bumping up peak speeds from 156 to 162 MB/s might drop your parity check time by around 3 minutes.  The jump from 143 MB/s stock settings is much larger, probably a 30 minute speedup.  I look forward to you posting your new parity check times.

 

I think the Thrifty settings are very interesting.  Where stock Unraid settings only net you around 143 MB/s, the Thrifty settings get you over 156 MB/s with lower memory utilization than stock!

The Thriftiest settings (95% of Fastest) give a peak speed of 156.4 MB/s
     md_sync_window: 384          md_num_stripes: 768
     md_sync_thresh: 376             nr_requests: 128
This will consume 22 MB (16 MB less than your current utilization of 38 MB)

 

I do see a bug in the report output, at the end where the drives are listed.  The sdb drive (7:0:7:0) should have been listed at the very bottom, but somehow got listed at the top.  And drives sdf and sdg got listed twice.  None of this affects the test, it's just extra info for the report.  It just bugs me that I thought I had that sorting worked out.

SCSI Host Controllers and Connected Drives
--------------------------------------------------

[0] scsi0	usb-storage -	
[7:0:7:0]	parity		sdb	8.00TB	WDC WD80EFZX-68U

[1] scsi1	ahci -	

[2] scsi2	ahci -	

[3] scsi3	ahci -	
[7:0:4:0]	disk3		sdf	8.00TB	WDC WD80EFZX-68U

[4] scsi4	ahci -	
[7:0:5:0]	disk4		sdg	8.00TB	WDC WD80EFZX-68U

[5] scsi5	ahci -	

[6] scsi6	ahci -	

[7] scsi7	mpt2sas -	SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]
					
[7:0:1:0]	parity2		sdc	8.00TB	WDC WD80EFZX-68U
[7:0:2:0]	disk1		sdd	8.00TB	WDC WD80EFZX-68U
[7:0:3:0]	disk2		sde	8.00TB	WDC WD80EFZX-68U
[7:0:4:0]	disk3		sdf	8.00TB	WDC WD80EFZX-68U
[7:0:5:0]	disk4		sdg	8.00TB	WDC WD80EFZX-68U
[7:0:6:0]	disk5		sdh	8.00TB	WDC WD80EFZX-68U

 

Thanks for sharing!

 

Paul

 

Link to comment
2 hours ago, Pauven said:

I do see a bug in the report output, at the end where the drives are listed.  The sdb drive (7:0:7:0) should have been listed at the very bottom, but somehow got listed at the top.  And drives sdf and sdg got listed twice.  None of this affects the test, it's just extra info for the report.  It just bugs me that I thought I had that sorting worked out.

I'll take a look at the sorting and filtering for those drives at the end tonight when I get off work. I have an idea for a good way to process and organize that, but need to see how the information is pulled and whether or not my idea will work. Would you rather them be ordered by Bus position or devfs node (sdX)?

Also line 516 - you have "[TOWER]" hard coded. /etc/hostname has the server's name stored in it. 
I could nitpick over giant blocks of echos but it really is negligible at the end of the day lol.

Edited by Xaero
Link to comment

I just noticed something else in your results.

 

For the most part, 162.4 MB/s is the same as 162.5 MB/s; you're not going to notice the difference.  But the script searches for the highest value, and so it homes in on 162.5.

 

In Pass 2, it hit 162.5 at both md_sync_window 6400 (Test 3) and 7296 (Test 10) (plus several other even higher md_sync_window values).  It used the lowest md_sync_window that hit the peak speed in Pass 3, so 6400.

 

In Pass 3, md_sync_window 6400 was retested 16 times with different md_sync_thresh values, and the highest peak speed reached was 162.4.  Since the exact same values from Pass 2 Test 3 were retested in Pass 3 Test 1r, the new 162.4 result ended up replacing 162.5 in the results table.

 

Then, in the very final step of the interface, all of the results were scanned again to find the fastest result.  Since the 6400 result was now 162.4 (Pass 3 replaced the Pass 2 value), the fastest became 7296.  This is a side effect of how the results are stored and that the script does retest a few combos more than once.  While I've thought about this issue, I've never figured out a good solution.

 

Long story short, md_sync_window 6144 @ 162.4 MB/s is every bit as fast as 7296 @ 162.5 MB/s, and it has a significantly lower memory utilization. 

 

In fact, you can probably go even lower than 6144, as it looks like your server hits peak speeds between 3072 and 6144, but the script didn't test those because 9216 happened to get lucky and spit out a 162.5, so the script targeted the highest range.

 

  • Upvote 1
Link to comment
6 minutes ago, Xaero said:

I'll take a look at the sorting and filtering for those drives at the end tonight when I get off work. I have an idea for a good way to process and organize that, but need to see how the information is pulled and whether or not my idea will work. Would you rather them be ordered by Bus position or devfs node (sdX)?

 

Thanks!  Bus position.  I want to show which drives are connected to each controller.  For example, here's mine:

 

SCSI Host Controllers and Connected Drives
--------------------------------------------------

[0] scsi0	usb-storage -	
[0:0:0:0]	flash		sda	4.00GB	Patriot Memory

[1] scsi1	ahci -	

[2] scsi2	ahci -	

[3] scsi3	ahci -	

[4] scsi4	ahci -	

[5] scsi5	ahci -	

[6] scsi6	ahci -	

[7] scsi7	ahci -	

[8] scsi8	ahci -	

[9] scsi9	ahci -	

[10] scsi10	ahci -	

[11] scsi11	ahci -	

[12] scsi12	mvsas -	HighPoint Technologies, Inc.
[12:0:0:0]	disk17		sdb	3.00TB	WDC WD30EFRX-68A
[12:0:1:0]	disk18		sdc	3.00TB	WDC WD30EFRX-68A
[12:0:2:0]	disk19		sdd	3.00TB	WDC WD30EFRX-68E
[12:0:3:0]	disk20		sde	3.00TB	WDC WD30EFRX-68E
[12:0:4:0]	parity2		sdf	8.00TB	HGST HUH728080AL
[12:0:5:0]	parity		sdg	8.00TB	HGST HUH728080AL

[13] scsi13	mvsas -	HighPoint Technologies, Inc.
[13:0:0:0]	disk1		sdh	8.00TB	HGST HUH728080AL
[13:0:1:0]	disk2		sdi	3.00TB	WDC WD30EFRX-68A
[13:0:2:0]	disk3		sdj	3.00TB	WDC WD30EFRX-68E
[13:0:3:0]	disk4		sdk	3.00TB	WDC WD30EFRX-68A
[13:0:4:0]	disk5		sdl	3.00TB	WDC WD30EFRX-68A
[13:0:5:0]	disk6		sdm	3.00TB	WDC WD30EFRX-68A
[13:0:6:0]	disk7		sdn	3.00TB	WDC WD30EFRX-68A
[13:0:7:0]	disk8		sdo	3.00TB	WDC WD30EFRX-68A

[14] scsi14	mvsas -	HighPoint Technologies, Inc.
[14:0:0:0]	disk9		sdp	3.00TB	WDC WD30EFRX-68A
[14:0:1:0]	disk10		sdq	3.00TB	WDC WD30EFRX-68A
[14:0:2:0]	disk11		sdr	3.00TB	WDC WD30EFRX-68A
[14:0:3:0]	disk12		sds	3.00TB	WDC WD30EFRX-68A
[14:0:4:0]	disk13		sdt	3.00TB	WDC WD30EFRX-68A
[14:0:5:0]	disk14		sdu	3.00TB	WDC WD30EFRX-68E
[14:0:6:0]	disk15		sdv	4.00TB	ST4000VN000-1H41
[14:0:7:0]	disk16		sdw	4.00TB	ST4000VN000-1H41

 

Link to comment
9 minutes ago, Pauven said:

I just noticed something else in your results.

 

For the most part, 162.4 MB/s is the same as 162.5 MB/s; you're not going to notice the difference.  But the script searches for the highest value, and so it homes in on 162.5.

 

In Pass 2, it hit 162.5 at both md_sync_window 6400 (Test 3) and 7296 (Test 10) (plus several other even higher md_sync_window values).  It used the lowest md_sync_window that hit the peak speed in Pass 3, so 6400.

 

In Pass 3, md_sync_window 6400 was retested 16 times with different md_sync_thresh values, and the highest peak speed reached was 162.4.  Since the exact same values from Pass 2 Test 3 were retested in Pass 3 Test 1r, the new 162.4 result ended up replacing 162.5 in the results table.

 

Then, in the very final step of the interface, all of the results were scanned again to find the fastest result.  Since the 6400 result was now 162.4 (Pass 3 replaced the Pass 2 value), the fastest became 7296.  This is a side effect of how the results are stored and that the script does retest a few combos more than once.  While I've thought about this issue, I've never figured out a good solution.

 

Long story short, md_sync_window 6144 @ 162.4 MB/s is every bit as fast as 7296 @ 162.5 MB/s, and it has a significantly lower memory utilization. 

 

In fact, you can probably go even lower than 6144, as it looks like your server hits peak speeds between 3072 and 6144, but the script didn't test those because 9216 happened to get lucky and spit out a 162.5, so the script targeted the highest range.

 

To solve this, you could discard "low" values if within a threshold, rather than replacing them?

Edited by Xaero
Link to comment
10 minutes ago, Xaero said:

Also line 516 - you have "[TOWER]" hard coded. /etc/hostname has the server's name stored in it.

Good catch!  Luckily that's only the Notification subject line, so it doesn't affect the test results.  Easy fix.
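
Since /etc/hostname was pointed out, a minimal sketch of that fix might look like the following (the variable names are mine, not UTT's):

SERVER_NAME=$(cat /etc/hostname 2>/dev/null || hostname)    # fall back to the hostname command if the file is missing
NOTIFY_SUBJECT="[${SERVER_NAME}] Unraid Tunables Tester finished"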

 

12 minutes ago, Xaero said:

I could nitpick over giant blocks of echos but it really is negligible at the end of the day lol. 

I briefly looked at alternatives, but I find that the giant block of echos makes it slightly easier for me to do text alignment, and to find text output statements.  It didn't seem worth the effort for me to clean it up.  It doesn't affect script performance or functionality, so it really only matters to the rare few people who poke their head under the hood.  Like I said before, Bash is not my forte. 
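
For anyone curious, the kind of alternative being hinted at is a single here-doc in place of a long run of echo statements; this is purely illustrative, not a change anyone is requesting:

# One cat here-doc prints the whole block, and the alignment is still visible in the source:
cat <<EOF
Tst | RAM | stri |  win | req | thresh |  MB/s
----------------------------------------------
EOF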

 

I haven't touched Bash in 3 years, since I last worked on this script, and this might be the last time I ever touch it.

 

If this ever gets converted into a plug-in, it would probably need to be rewritten in PHP, so one less reason to spend effort on cosmetic code enhancements.

 

 

Link to comment

Thanks, @Pauven!

 

I can't remove all access to my production unRAID right now, but I wanted to test your script so I ran it on my test VM.

 

                   Unraid 6.x Tunables Tester v4.0 by Pauven

             Tunables Report produced Tue Aug  6 12:11:09 CDT 2019

                              Run on server: Tower

                             Short Parity Sync Test


Current Values:  md_num_stripes=1280, md_sync_window=384, md_sync_thresh=192
                 Global nr_requests=128
                 Disk Specific nr_requests Values:
                    sdd=128, sde=128, sdf=128, 


--- INITIAL BASELINE TEST OF CURRENT VALUES (1 Sample Point @ 10sec Duration)---
Tst | RAM | stri |  win | req | thresh |  MB/s
----------------------------------------------
  1 |  23 | 1280 |  384 | 128 |   192  | 543.6 


--- BASELINE TEST OF UNRAID DEFAULT VALUES (1 Sample Point @ 10sec Duration)---
Tst | RAM | stri |  win | req | thresh |  MB/s
----------------------------------------------
  1 |  23 | 1280 |  384 | 128 |   192  | 557.7 


 --- TEST PASS 1 (2 Min - 12 Sample Points @ 10sec Duration) ---
Tst | RAM | stri |  win | req | thresh |  MB/s | thresh |  MB/s | thresh |  MB/s
--------------------------------------------------------------------------------
  1 |  13 |  768 |  384 | 128 |   376  | 598.0 |   320  | 573.5 |   192  | 567.4
  2 |  27 | 1536 |  768 | 128 |   760  | 602.2 |   704  | 604.7 |   384  | 589.9
  3 |  55 | 3072 | 1536 | 128 |  1528  | 584.8 |  1472  | 596.6 |   768  | 565.4
  4 | 111 | 6144 | 3072 | 128 |  3064  | 583.8 |  3008  | 570.9 |  1536  | 593.3

 --- TEST PASS 1_LOW (2.5 Min - 15 Sample Points @ 10sec Duration) ---
Tst | RAM | stri |  win | req | thresh |  MB/s | thresh |  MB/s | thresh |  MB/s
--------------------------------------------------------------------------------
  1 |   2 |  128 |   64 | 128 |    56  | 358.2 |     0  | 288.7 |    32  | 317.1
  2 |   4 |  256 |  128 | 128 |   120  | 440.2 |    64  | 370.6 |    64  | 371.4
  3 |   6 |  384 |  192 | 128 |   184  | 452.2 |   128  | 526.4 |    96  | 432.6
  4 |   9 |  512 |  256 | 128 |   248  | 504.7 |   192  | 527.9 |   128  | 574.1
  5 |  11 |  640 |  320 | 128 |   312  | 548.5 |   256  | 538.2 |   160  | 557.2

 --- TEST PASS 1_HIGH (30 Sec - 3 Sample Points @ 10sec Duration)---
Tst | RAM | stri |  win | req | thresh |  MB/s | thresh |  MB/s | thresh |  MB/s
--------------------------------------------------------------------------------
  1 | 222 |12288 | 6144 | 128 |  6136  | 570.0 |  6080  | 575.8 |  3072  | 584.5

 --- END OF SHORT AUTO TEST FOR DETERMINING IF YOU SHOULD RUN THE REAL TEST ---

If the speeds changed with different values you should run a NORMAL/LONG test.
If speeds didn't change then adjusting Tunables likely won't help your system.

Completed: 0 Hrs 5 Min 27 Sec.


NOTE: Use the smallest set of values that produce good results. Larger values
      increase server memory use, and may cause stability issues with Unraid,
      especially if you have any add-ons or plug-ins installed.


System Info:  Tower
              Unraid version 6.7.2
                   md_num_stripes=1280
                   md_sync_window=384
                   md_sync_thresh=192
                   nr_requests=128 (Global Setting)
                   sbNumDisks=4
              CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
              RAM: System Memory

Outputting free low memory information...

              total        used        free      shared  buff/cache   available
Mem:        4042176      154352     3285988      591668      601836     3065212
Low:        4042176      756188     3285988
High:             0           0           0
Swap:             0           0           0


SCSI Host Controllers and Connected Drives
--------------------------------------------------

[0] scsi0	usb-storage -	
[3:0:1:0]	parity		sdd	21.4GB	Virtual disk 600

[1] scsi1	ata_piix -	

[2] scsi2	ata_piix -	

[3] scsi3	mptsas -	SAS1068 PCI-X Fusion-MPT SAS
	disk3				
[3:0:1:0]	parity		sdd	21.4GB	Virtual disk 600
[3:0:2:0]	disk1		sde	21.4GB	Virtual disk 600
[3:0:3:0]	disk2		sdf	21.4GB	Virtual disk 600


                      *** END OF REPORT ***

 

Link to comment
4 minutes ago, StevenD said:

Thanks, @Pauven!

 

I can't remove all access to my production unRAID right now, but I wanted to test your script so I ran it on my test VM.

 

You're welcome!

 

With lows of 289 MB/s and highs of 602 MB/s, this VM server definitely looks like it will respond well to tuning.  Though with only 21 GB of virtual storage (SSD?), there's not much point in tuning... 🤔

 

I don't know how closely your production box matches your VM, so you should always run the Short test on each box the first time just to see if it responds to changing values.  Some servers produce the same speeds no matter what values are used, and for them there's no need to try to tune; it would just be a waste of time and electricity.

Link to comment
2 hours ago, Pauven said:

Then, in the very final step of the interface, all of the results were scanned again to find the fastest result.  Since the 6400 result was now 162.4 (Pass 3 replaced the Pass 2 value), the fastest became 7296.  This is a side effect of how the results are stored and that the script does retest a few combos more than once.  While I've thought about this issue, I've never figured out a good solution.

 

Long story short, md_sync_window 6144 @ 162.4 MB/s is every bit as fast as 7296 @ 162.5 MB/s, and it has a significantly lower memory utilization. 

The threshold mentioned above might be a way to work around this. 162.5 MB/s is only 0.06% faster than 162.4 MB/s. Maybe if the speed difference is less than 0.2% and a combo uses less memory, take the values that result in lower memory use. A 0.2% threshold would match 162.175 MB/s and up; 0.5% would allow up to nearly a 1 MB/s difference.

Link to comment
47 minutes ago, Pauven said:

 

You're welcome!

 

With lows of 289 MB/s and highs of 602 MB/s, this VM server definitely looks like it will respond well to tuning.  Though with only 21 GB of virtual storage (SSD?), there's not much point in tuning... 🤔

 

I don't know how closely your production box matches your VM, so you should always run the Short test on each box the first time just to see if it responds to changing values.  Some servers produce the same speeds no matter what values are used, and for them there's no need to try to tune; it would just be a waste of time and electricity.

 

This VM only exists for me to compile open-vm-tools, so it's 99% virtual (I have a passed-through licensed USB).  My production box uses a passed-through LSI controller with 8TB drives.

 

Hopefully I can run the tuning script on it soon.

 

Link to comment
12 minutes ago, jbartlett said:

The threshold mentioned above might be a way to work around this. 162.5 MB/s is only 0.06% faster than 162.4 MB/s. Maybe if the speed difference is less than 0.2% and a combo uses less memory, take the values that result in lower memory use. A 0.2% threshold would match 162.175 MB/s and up; 0.5% would allow up to nearly a 1 MB/s difference.

That sounds good.  If I combine that with the two-pass logic that comes up with the Thrifty/Recommended values, this should be trivial.  First I scan for the fastest speed, then I scan for the lowest value that achieves 99.5% or 99.9% of fastest measured speed.

 

In DanielCoffey's test, that would have resulted in 6144 being used for Pass 2 (which is what I would have picked) instead of 9216.

 

A threshold of 99.4% would have been too low, as that would have picked up the 3072 @ 161.6 MB/s result from Pass 1.  I think 99.8% is the lowest I would want to go.
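
For reference, here's a minimal sketch of that two-scan pick, assuming the pass results live in two parallel arrays ordered by ascending md_sync_window (the array names and sample numbers are made up purely to exercise the logic; they are not from UTT or anyone's report):

#!/bin/bash
WindowVals=(1536 3072 6144 9216)
WindowMBs=(150.3 159.8 160.1 160.2)

# Scan 1: find the fastest measured speed (awk handles the float comparison).
fastest=0
for s in "${WindowMBs[@]}"; do
    awk -v s="$s" -v f="$fastest" 'BEGIN { exit !(s > f) }' && fastest=$s
done

# Scan 2: take the smallest md_sync_window that still reaches 99.8% of that speed.
for i in "${!WindowVals[@]}"; do
    if awk -v s="${WindowMBs[$i]}" -v f="$fastest" 'BEGIN { exit !(s >= f * 0.998) }'; then
        echo "Next pass targets md_sync_window ${WindowVals[$i]} (${WindowMBs[$i]} MB/s)"
        break
    fi
done

With these sample numbers the pick would be 6144: 159.8 falls just short of 99.8% of 160.2, while 160.1 clears it.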

Link to comment

I've implemented a new function to use the lowest md_sync_window that is at least 99.8% of max speed for the next test pass.  Early testing shows it is working as expected.  I'll test it overnight before releasing UTT v4.1.

 

5 hours ago, Xaero said:

To solve this, you could discard "low" values if within a threshold, rather than replacing them?

Thanks for the suggestion, which I used with a slight modification.  I'm now comparing to the existing test result (if it exists), and only replacing it if the new result is faster.  This will be in UTT v4.1 too.
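
A rough sketch of that retest handling, with the array name and key format being my own shorthand rather than anything lifted from the script:

declare -A BestMBs

record_result() {
    local key=$1 speed=$2                      # e.g. key="win=6400,thresh=376" (illustrative)
    local prev=${BestMBs[$key]:-0}
    # keep whichever run of this combo was faster (awk does the float compare)
    if awk -v n="$speed" -v p="$prev" 'BEGIN { exit !(n > p) }'; then
        BestMBs[$key]=$speed
    fi
}

record_result "win=6400,thresh=376" 162.5      # first run of a combo
record_result "win=6400,thresh=376" 162.4      # slower retest of the same combo
echo "${BestMBs[win=6400,thresh=376]}"         # prints 162.5, the faster of the two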

 

5 hours ago, Xaero said:

I'll take a look at the sorting and filtering for those drives at the end tonight when I get off work.

No pressure, and I appreciate the help, but thought I should mention that I would like to include this fix in v4.1 if possible.  My eyes start swimming when I look at this code section, and I'm having a hard time seeing what I'm doing wrong.

 

I had a similar issue on my own server, and I reworked the code and thought it was fixed.  My previous issue was that I was hard-coding the substring of the bus ID, and with my server now having 14 SCSI buses, the transition from 1-digit IDs to 2-digit IDs was throwing off the logic.  So I modified the substring of the bus ID to work regardless of the string length.  I don't see how that would be affecting Daniel's report, though, since all of his SCSI IDs are single-digit. 

 

There was also an issue with the lsscsi -H output and the $SCSIs array variable under Unraid v6.6.  Back when I first wrote the GetDisks function under Unraid v6.2, the first 3 entries were always strings I needed to skip.  But under Unraid v6.6, somehow the array order got reversed, and now I have to skip the last 3 values.  That's why I have that $maxentry-3 calculation in the DiskReport function.  I've noticed that before I did the $maxentry fix ( eval Disks=\${$SCSI[@]:0:$maxentry} ), I was getting similar results to what Daniel is getting, so perhaps the bug in my logic is related to the $maxentry.
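
Not a fix for the actual bug, but one way to sidestep the positional skip entirely would be to filter the lsscsi -H entries by pattern, so it no longer matters whether the stray strings land at the start or the end of the array (sketch only, array names are mine):

mapfile -t HostLines < <(lsscsi -H | tr -d '[]')

Controllers=()
for line in "${HostLines[@]}"; do
    id=${line%%[[:space:]]*}                             # first field, e.g. "7" or "N:0"
    [[ $id =~ ^[0-9]+$ ]] && Controllers+=("scsi$id")    # keep numeric SCSI hosts only
done

printf '%s\n' "${Controllers[@]}"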

  • Upvote 1
Link to comment
1 hour ago, Pauven said:

I've implemented a new function to use the lowest md_sync_window that is at least 99.8% of max speed for the next test pass.  Early testing shows it is working as expected.  I'll test it overnight before releasing UTT v4.1.

I've got my drive cleared off, ready to be moved to a different controller, and ready to test. 👍

Link to comment

I'll go ahead and warn that since I don't have a desktop setup at the moment and I'm limited to a 720p screen, editing this script is not easy.

I see what you've done with the associative array - but I think that some systems may not output everything as you assume. So some of the array variables are null or 0 instead of being defined, which results in other logic failing and causing the erroneous output.
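
A tiny sketch of the failure mode being described and the usual guard for it; the names here are illustrative, not pulled from the script:

declare -A DriveSize
size=""                                    # a field some systems leave empty
DriveSize[sdb]=${size:-0}                  # store 0 instead of an unset/empty value
(( ${DriveSize[sdb]} > 0 )) || echo "sdb: size unknown, skipping"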

I don't know if I can debug this with my limited free time at the moment, so I won't make any promises.


root@BlackHole:~# tr -d "]" < <( sed 's/\[/scsi/g' < <( lsscsi -H )
> )
scsi0    usb-storage
scsi1    megaraid_sas
scsiN:0  /dev/nvme0  INTEL SSDPEKNW010T8               BTNH8435071Z1P0B    002C
scsiN:1  /dev/nvme1  INTEL SSDPEKNW010T8               BTNH843506KY1P0B    002C
root@BlackHole:~# lshw -quiet -short -c storage | grep storage
/0/100/1/0           scsi1            storage        MegaRAID SAS 2008 [Falcon]
/0/100/2.2/0                          storage        SSDPEKNW020T8 [660p, 2TB]
/0/100/2.3/0                          storage        SSDPEKNW020T8 [660p, 2TB]
/0/a1                scsi0            storage
root@BlackHole:~# lsscsi -st
[0:0:0:0]    disk    usb:3-9:1.0                     /dev/sda   62.7GB
[1:0:10:0]   enclosu                                 -               -
[1:0:11:0]   disk                                    /dev/sdb   8.00TB
[1:0:12:0]   disk                                    /dev/sdc   8.00TB
[1:0:13:0]   disk                                    /dev/sdd   8.00TB
[1:0:14:0]   disk                                    /dev/sde   8.00TB
[1:0:15:0]   disk                                    /dev/sdf   8.00TB
[1:0:16:0]   disk                                    /dev/sdg   8.00TB
[1:0:17:0]   disk                                    /dev/sdh   8.00TB
[1:0:18:0]   disk                                    /dev/sdi   8.00TB
[1:0:19:0]   disk                                    /dev/sdj   8.00TB
[1:0:20:0]   disk                                    /dev/sdk   8.00TB
[1:0:21:0]   disk                                    /dev/sdl   8.00TB
[1:0:22:0]   disk                                    /dev/sdm   8.00TB
[1:0:23:0]   disk                                    /dev/sdn   8.00TB
[1:0:24:0]   disk                                    /dev/sdo   8.00TB
[1:0:25:0]   disk                                    /dev/sdp   8.00TB
[1:0:26:0]   disk                                    /dev/sdq   8.00TB
[1:0:27:0]   disk                                    /dev/sdr   8.00TB
[1:0:28:0]   disk                                    /dev/sds   8.00TB
[1:0:29:0]   disk                                    /dev/sdt   8.00TB
[1:0:30:0]   disk                                    /dev/sdu   8.00TB
[1:0:31:0]   disk                                    /dev/sdv   8.00TB
[1:0:32:0]   disk                                    /dev/sdw   8.00TB
[1:0:33:0]   disk                                    /dev/sdx   8.00TB
[1:0:34:0]   disk                                    /dev/sdy   8.00TB
[N:0:1:1]    disk    pcie 0x8086:0x390d                         /dev/nvme0n1  1.02TB
[N:1:1:1]    disk    pcie 0x8086:0x390d                         /dev/nvme1n1  1.02TB
root@BlackHole:~# tr -d "]" < <( sed 's/\[/scsi/g' < <( lsscsi -H ) )
scsi0    usb-storage
scsi1    megaraid_sas
scsiN:0  /dev/nvme0  INTEL SSDPEKNW010T8               BTNH8435071Z1P0B    002C
scsiN:1  /dev/nvme1  INTEL SSDPEKNW010T8               BTNH843506KY1P0B    002C

 

For example, I think some of my hardware may break your script - note that my NVMe SSDs report an extra column (pcie, then the bus address) compared to regular disks.
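
For what it's worth, one way to cope with that extra transport column is to anchor on the right-hand fields, which stay in the same position for SATA, SAS and NVMe rows alike (a sketch, not how UTT parses it today):

lsscsi -st | awk '
    $2 == "disk" {
        size = $NF          # capacity is always the last column
        dev  = $(NF - 1)    # /dev node is always next-to-last
        printf "%-12s %-14s %s\n", $1, dev, size
    }'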

Link to comment

Parity check complete on the 8TB WD Reds using the 99% setting... a nice improvement that shaved 90 minutes off the vanilla settings.

 

On Vanilla settings I had checks that were usually around 18h05 and up to 20h25 if there had been an unclean shutdown that had dropped a disk.

The old v2 script 99% setting got me a check that came in at 17h25.

The new v4 script on the 99% setting brought the check down to 16h29.

 

While it is still a shame that 8TB checks take that long for physical reasons, it is nice to have a tuned array.

 

This is a hobby server rather than a business one, so if you make a v4.1 script available I am happy to rerun it to check the layout of the HDD reporting and also to see how it homes in on the most suitable results with the new margins you have included.

  • Like 1
Link to comment

scsistring=`echo $scsistring | tr -d "[]"}` - why is this brace here? line 181.

Seems a bit... out of place. Even reading the code that defines the context of that line.

 

Also, since you are doing a lot to re-arrange the data into a format you like, I don't think I'll mess with that logic. You've already got associative arrays and such; I think we have a small syntax error or other tiny mistake causing the erroneous output. The logic looks "okay".

Edited by Xaero
Link to comment
8 hours ago, DanielCoffey said:

On Vanilla settings I had checks that were usually around 18h05 and up to 20h25 if there had been an unclean shutdown that had dropped a disk.

The old v2 script 99% setting got me a check that came in at 17h25.

The new v4 script on the 99% setting brought the check down to 16h29.

 

While it is still a shame that 8TB checks take that long for physical reasons, it is nice to have a tuned array.

 

Wow, that's better than I anticipated.  Congrats!  Just wondering, how long does it take to do one pass of a pre-clear (i.e. the final read) for one of your disks? 

Link to comment
32 minutes ago, Xaero said:

scsistring=`echo $scsistring | tr -d "[]"}` - why is this brace here? line 181.

Seems a bit... out of place. Even reading the code that defines the context of that line.

Hmmm, I don't know.  Typo?  Copy&Paste relic?  I agree it looks wrong.  Do you think that was the problem?
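
For what it's worth, since the brace sits outside the quotes it only adds } to tr's delete set, so it's most likely harmless noise rather than the cause of the sorting bug; the line presumably wants to be:

scsistring=`echo $scsistring | tr -d "[]"`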

Link to comment
8 hours ago, DanielCoffey said:

This is a hobby server rather than a business one, so if you make a v4.1 script available I am happy to rerun it to check the layout of the HDD reporting and also to see how it homes in on the most suitable results with the new margins you have included.

 

Awesome, thanks!  Maybe next time you can run the Xtra-Long version, which includes an extra 36 test points evaluating nr_requests values.  It would be nice to know if there are any servers that like low nr_requests values after everything else is tuned.

Edited by Pauven
Link to comment
9 hours ago, tmchow said:

Just ran v4.0 on my array. Latest Unraid 6.7.2 with a 27TB array:

 

Parity: 6TB WD Red

Array:

  • 3 x 6TB WD Red
  • 2 x 3TB Seagate
  • 1 x 3TB WD Red

Test took 17 Hrs 45 Min 25 Sec to run.  Attached the reports in case anyone was interested, including @Pauven

With 32GB of RAM, I'd say go with the fastest.  Looks like your current values weren't too far off the mark, performance-wise, so I'm not sure if you'll see a performance improvement or not. 

 

Any reason why the disk report at the end is showing you have a 512 GB  Parity drive, but 3 & 6 TB data drives?

Link to comment
19 minutes ago, Pauven said:

Just wondering, how long does it take to do one pass of a pre-clear (i.e. the final read) for one of your disks? 

I am sorry, but I must have cleared those logs a long time ago. I would have cleared them back in 2017 and don't have a local copy of the logs anymore. Unless they are buried on the flash, I don't know where they are.

 

I can certainly do an extra-long test on the 4.1 script when you have it.

Link to comment
