Posts posted by Pauven

  1. 1 hour ago, wgstarks said:

    I started a new test today with 4.1. Rebooted my server into safe mode as was recommended. When I started the script I got a notice that I should be running screen, but in safe mode nerdpack plugin is disabled so screen isn’t available. How are people running screen in safe mode?

    LOL!  So true!  Hadn't thought of that...

     

    You really only need to use safe mode if you have a ton of stuff that's just too hard to disable individually. 

     

    Instead, just make sure you stop all VMs and Dockers, plus any plugins that would be accessing your array disks.  I haven't noticed any issues from the CacheDirs plugin, since once it's running it mainly pings RAM to prevent disks from spinning up, but you can always stop that one too, just to be safe.

     

    Other alternatives: 

     

    You can run the script directly on your server's console instead of over remote access, completely eliminating the need for screen.

     

    Using screen is optional, though recommended when running remotely.  If you're confident that your network connection is solid, that your PC won't sleep, shut down, or randomly update and reboot itself during the test, and that power brownouts/blackouts won't disrupt your connection, then screen really isn't needed.  Screen is like insurance - many people get by without it.
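
     

    For anyone who hasn't used screen before, here's a rough sketch of the typical remote workflow (the session name and script path are just examples, so adjust them for wherever you saved the script):

    screen -S utt                         # start a new screen session named 'utt'
    /boot/unraid6x-tunables-tester.sh     # launch UTT inside the session
    # Press Ctrl-A then D to detach; the test keeps running even if your SSH connection drops
    screen -r utt                         # reattach later to check on progress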

     

    Though I am also curious - how can you run screen in safe mode?

  2. 3 hours ago, ispaydeu said:

    If it makes any difference I do NOT have parity disks in my array.

    Sorry I missed this comment earlier.  I'm not sure what to make of this.  UTT performs tests of the Unraid Disk Tunables by running dozens or hundreds of non-correcting parity checks.  But if you don't have parity disks... then how in the world are you even running UTT?

     

    I don't know if you can trust any of the results - I don't even know what the results mean anymore.  If you don't have parity disks, then you shouldn't be able to check parity, and you shouldn't be able to use this tool to check parity check speeds with different tunables configured.

     

    That also might explain why negative md_sync_thresh values were responding well on your machine.

     

    Is there even a [CHECK] Parity button on your Unraid Main screen?

  3. I just posted UTT v4.1 final in the first post.

     

    3 hours ago, Frank1940 said:

    Please provide some basic instructions on how to run the test(s) and how to interpret and use them!  It is probably obvious to those who have assisted you in the development of this tool, but the rest of us could use some guidance!

    Everything you need should be in the first two posts. 

     

    Perhaps @SpaceInvaderOne could do one of his great videos on using UTT...  ;)

     

  4. 1 hour ago, ispaydeu said:

    The 4.0 test that I downloaded off the first page yesterday at 12:17 PM EST, running on my 6.6.7 Unraid, is returning that I should be using a negative md_sync_thresh ("md_sync_thresh: -56"). Is it even possible to use a negative md_sync_thresh? Or does the script need some sort of code added so it never tests down into negative values? The test also indicated that the setting will consume 0 MB, which I thought was confusing too. It seemed strange, so before I updated my Unraid to these settings I thought I'd better double check.

     

    If it makes any difference I do NOT have parity disks in my array.

     

    Example: 

    The Fastest settings tested give a peak speed of 143.7 MB/s
         md_sync_window: 8          md_num_stripes: 16
         md_sync_thresh: -56             nr_requests: 128
    This will consume 0 MB (38 MB less than your current utilization of 38 MB)

     

    Wow.  I've said it before and I'll say it again: every server is unique, some in very surprising ways.

     

    I scanned through your results, and for the repeated tests I see fairly large variances of up to +/- 2.3 MB/s, so keep that in mind when comparing numbers.  The Long test, with a 10-minute duration for each test, should provide more accurate results.

     

    Regarding consuming 0 MB, it's actually not 0.  I'm rounding to the nearest MB, so anything under 0.5 MB would round down to 0 MB.  Here's the formula and your actual result:

    (( ( md_num_stripes * (2640 + (4096 * sbNumDisks)) ) / 1048576 )) = RAM Consumed (In Megabytes)
    
    With your values:
    (( ( 16 * (2640 + (4096 * 7)) ) / 1048576 )) = 0.477783203 MB

    *NOTE:  I've just added a new function to UTT v4.1 to show memory used in KB when it rounds down to 0 MB.
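
     

    If you want to double-check the math yourself, here's a quick shell sketch of the same calculation with your numbers plugged in (awk is only used because bash can't do the fractional division):

    md_num_stripes=16
    sbNumDisks=7
    awk -v s="$md_num_stripes" -v d="$sbNumDisks" \
        'BEGIN { bytes = s * (2640 + 4096 * d); printf "%.3f MB (%.0f KB)\n", bytes / 1048576, bytes / 1024 }'
    # prints: 0.478 MB (489 KB)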

     

    Regarding the negative md_sync_thresh values, I had to double-check the code to see if UTT was really setting negative values, and it is.

     

    While UTT is setting negative md_sync_thresh values, I'm not sure if Unraid is overriding the values when they are below a certain threshold.  While I know how to read the currently 'configured' value, I don't know how to query the currently 'set' value.  Does anyone know how to do this?
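
     

    The closest I can get is comparing the saved setting against whatever the md driver reports.  This is only a sketch, and I'm assuming the saved value lives in /boot/config/disk.cfg and that mdcmd status echoes something with 'thresh' in it - neither of which I've confirmed:

    grep -i "md_sync_thresh" /boot/config/disk.cfg   # the 'configured' value saved from Disk Settings (assumed location)
    mdcmd status | grep -i "thresh"                  # whatever the md driver reports, if it exposes the 'set' value at all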

     

    I did go into the Unraid Disk settings, and manually set a negative value and applied it, and Unraid saved it!

     

    [Screenshot: the Unraid Disk Settings page accepting a negative md_sync_thresh value]

     

    So best I can tell, the UTT script is setting negative md_sync_thresh values, Unraid is accepting them, and your server is responding better with them.

     

    Perhaps @limetech can share some insight.

     

    Paul

  5. 11 hours ago, Xaero said:

    Results from B3 look good!

    Fantastic!  That settles it then, I'll release UTT v4.1 final today.

     

     

    5 hours ago, DanielCoffey said:

    Version 4.1 Beta 2 Xtra Long Test completed...

    Those results look perfect to me.  Proof that, even as good as Unraid v6.x performs with stock settings on most servers, some servers still need tuning.  Going from 141 MB/s stock to 164 MB/s tuned nets you a nice 16% bump in peak performance.

     

    I also find the Thriftiest settings very interesting.  Only 22 MB of RAM consumed (16 MB less than stock Unraid), yet a solid 15 MB/s (11%) gain over stock performance.

     

    The consistency of your results for the repeated tests is +/- 0.1 MB/s, so you can trust the report accuracy on this server.

     

    I really appreciate you doing the Extra Long test.  As I expected, the extra nr_requests tests only provided slower speeds once the other settings were tuned.  I'm still curious if there will be a server out there that responds well to lower nr_requests values once tuned, but it seems less and less likely.

     

    Personally, I'd probably go with the Fastest values on your server.  The Recommended values only save you 122 MB over the Fastest, and the Fastest are only consuming 366 MB.  If you had a lot more drives, the memory consumption would go up proportionally, and then the lower, RAM-saving Recommended values would make more sense.

  6. 52 minutes ago, StevenD said:

     

    Tons of SMB issues. On 6.6.7, I can write to my cache at a steady 1GB/s. On 6.7.x, it fluctuates between 1GB/s and ZERO.  It literally pauses during transfers.  

    I was starting to feel a bit guilty for still rockin' the beastly 6.6.6, especially while trying to troubleshoot all these storage report issues for users running 6.7.x.  Now I feel a bit vindicated for sticking with Damienraid, and happy I avoided all that SMB/SQLite nonsense.  Hopefully my server hasn't sold its circuits to Beelzebub and won't be stuck on 6.6.6 forever in a journey to the bottomless pit...

     

    Perhaps I need to rename my server from Tower.  Abaddon... Apollyon... Beelzebub... Belial... Dragon... I know, Leviathan!

  7. 31 minutes ago, Pauven said:

    I build a DiskName2Num lookup array, but it is based upon the data from mdcmd status, which of course only provides data on array devices.  That means these unassigned disks don't get a Disk Name to Disk Number lookup entry, so it's not available for the final report.

    Small correction on what I wrote here.  The mdcmd status output only has drives 0-29, which Unraid predefines for Parity and Data disks only.  54 is the flash drive, and 30 & 31 are cache drives (I'm sure there are other predefined assignments, but that's all I've mapped out).

     

    So I was getting myself confused as to how I was getting the flash and cache drives to show in the report, since they are not in the mdcmd status output.

     

    I finally realized that I am using both mdcmd status and the /var/local/emhttp/disks.ini file to build the DiskName2Num lookup.  Looks like /var/local/emhttp/disks.ini has all array drives, up to 54, so it includes the flash and cache.  (Yes, that means I have an unnecessary, redundant operation using the mdcmd output to build the DiskName2Num lookup, but it doesn't hurt anything.)
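
     

    For anyone curious, here's a stripped-down sketch of the disks.ini half of that lookup.  The idx and device keys are the ones I grep for elsewhere; the exact ordering of keys inside each section is an assumption, so treat this as illustrative rather than the actual UTT code:

    declare -A DiskName2Num                  # maps device name (e.g. sdf) to Unraid disk number
    idx=""
    while IFS='=' read -r k v; do
    	v=${v//\"/}                          # strip the quotes around ini values
    	case "$k" in
    		idx)    idx=$v ;;
    		device) [ -n "$v" ] && DiskName2Num[$v]=$idx ;;  # assumes device follows idx in each section
    	esac
    done < /var/local/emhttp/disks.ini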

     

    Ultimately the story stays the same - non-array drives aren't in /var/local/emhttp/disks.ini either, so they still don't get in the report.

  8. 1 hour ago, StevenD said:

    If it matters, those two "missing" disks are mounted via Unassigned Devices and they are not part of the array.

    My mistake, looks like you were right.

     

    I build a DiskName2Num lookup array, but it is based upon the data from mdcmd status, which of course only provides data on array devices.  That means these unassigned disks don't get a Disk Name to Disk Number lookup entry, so it's not available for the final report.

     

    I'm a little conflicted on this.  On the one hand, I wanted the report to be a complete picture of all controllers and attached drives, but on the other hand I guess having it only display array devices is nice too, since these are the only drives being tested and tuned.

     

    I don't think I would be able to include non-array drives without a significant rewrite of this report.  So.... no. Not gonna happen.

  9. 7 minutes ago, StevenD said:

    Nevermind, I see what it did.  The results aren't accurate anyway, as I have several things using the array at the moment.

    Right.  The Short test omits Passes 2 & 3 to make it quicker, and never makes any recommendations - primarily because the 10-second tests are way too quick to be accurate and you get a lot of fake numbers.

     

    For some users, their server responds the same no matter what tunables are used.  That's the point of the Short test, to save them 8+ hours of running the longer tests if it won't help them.

  10. 2 minutes ago, Xaero said:

    It looks like his NVMe drives have a 5th column. Not sure if that's the cause or not.

    I'm thinking no.  Here are his two NVMe drives next to your two:

    [N:0:2:1]    disk    pcie 0x144d:0xa801                         /dev/nvme0n1   500GB
    [N:1:0:1]    disk    pcie 0x1b4b:0x1093                         /dev/nvme1n1   256GB
    [N:0:1:1]    disk    pcie 0x8086:0x390d                         /dev/nvme0n1  1.02TB
    [N:1:1:1]    disk    pcie 0x8086:0x390d                         /dev/nvme1n1  1.02TB

    Columns line up the same, and UTT 4.1 Beta 2+ already accounts for the extra column on NVMe drives.

  11. 27 minutes ago, Xaero said:

    I'm not sure if this will work:
    awk '{ if ($3 == "") print "Column 3 is empty for " NR }'

     

    This should print "Column 3 is empty for #", where # is the row number.

     

    awk 'BEGIN { FS = OFS = "\t" } { for(i=1; i<=NF; i++) if($i ~ /^ *$/) $i = 0 }; 1'

     

    This may also work to replace empty column fields with "0", but it's a copy-paste from a different application.

    Thanks for the suggestion.  That code is pretty complex, beyond my comfort level to use.  Instead I went down this path, which seems to do the trick:

    if [[ ${key[3]} == *"dev"* ]]; then  #transport column present, so field 3 holds the /dev path
    	DN=${key[3]/\/dev\//}  #path
    	DSP=${key[4]} #size
    else  #transport column missing, so the path shifted left to field 2
    	DN=${key[2]/\/dev\//}  #path
    	DSP=${key[3]} #size
    fi

    I should have BETA 3 ready to test in a few...

  12. Okay, I think I've found the issue with the report not showing certain disks. 

     

    For reference, here's @StevenD's lsscsi -st output:

    [0:0:0:0]    disk    usb:1-1.1:1.0                   /dev/sda   31.9GB
    [0:0:0:1]    disk    usb:1-1.1:1.0                   /dev/sdb        -
    [3:0:0:0]    disk                                    /dev/sdc   1.07GB
    [4:0:0:0]    disk                                    /dev/sdd    960GB
    [5:0:0:0]    disk    sas:0x300605b00e84f8bf          /dev/sde   8.00TB
    [5:0:1:0]    enclosu sas:0x300705b00e84f8b0          -               -
    [5:0:2:0]    disk    sas:0x300605b00e84f8bb          /dev/sdf   8.00TB
    [5:0:3:0]    disk    sas:0x300605b00e84f8b3          /dev/sdg   8.00TB
    [5:0:4:0]    disk    sas:0x300605b00e84f8b5          /dev/sdh   8.00TB
    [5:0:5:0]    disk    sas:0x300605b00e84f8b9          /dev/sdi   8.00TB
    [5:0:6:0]    disk    sas:0x300605b00e84f8bd          /dev/sdj   8.00TB
    [5:0:7:0]    disk    sas:0x300605b00e84f8b7          /dev/sdk   8.00TB
    [5:0:8:0]    disk    sas:0x300605b00e84f8ba          /dev/sdl   8.00TB
    [5:0:9:0]    disk    sas:0x300605b00e84f8b4          /dev/sdm   8.00TB
    [5:0:10:0]   disk    sas:0x300605b00e84f8b1          /dev/sdn   8.00TB
    [5:0:11:0]   disk    sas:0x300605b00e84f8be          /dev/sdo   8.00TB
    [5:0:12:0]   disk    sas:0x300605b00e84f8bc          /dev/sdp   8.00TB
    [5:0:13:0]   disk    sas:0x300605b00e84f8b8          /dev/sdq   8.00TB
    [5:0:14:0]   disk    sas:0x300605b00e84f8b0          /dev/sdr   8.00TB
    [N:0:4:1]    disk    pcie 0x144d:0xa801                         /dev/nvme0n1   512GB

    Notice that rows 3 & 4 (disks sdc & sdd) don't have the 3rd column populated. 

     

    Now, here is StevenD's storage report:

    SCSI Host Controllers and Connected Drives
    --------------------------------------------------
    
    [0] scsi0	usb-storage -	
    [0:0:0:0]	flash		sda	31.9GB	
    
    [1] scsi1	ata_piix -	
    
    [2] scsi2	ata_piix -	
    
    [3] scsi3	vmw_pvscsi -	PVSCSI SCSI Controller
    
    [4] scsi4	vmw_pvscsi -	PVSCSI SCSI Controller
    
    [5] scsi5	mpt3sas -	SAS3416 Fusion-MPT Tri-Mode I/O Controller Chip (IOC)
    [5:0:0:0]	disk3		sde	8.00TB	
    [5:0:10:0]	disk8		sdn	8.00TB	
    [5:0:11:0]	disk7		sdo	8.00TB	
    [5:0:12:0]	disk6		sdp	8.00TB	
    [5:0:13:0]	disk4		sdq	8.00TB	
    [5:0:14:0]	disk12		sdr	8.00TB	
    [5:0:2:0]	disk1		sdf	8.00TB	
    [5:0:3:0]	disk9		sdg	8.00TB	
    [5:0:4:0]	disk10		sdh	8.00TB	
    [5:0:5:0]	parity		sdi	8.00TB	
    [5:0:6:0]	disk2		sdj	8.00TB	
    [5:0:7:0]	disk11		sdk	8.00TB	
    [5:0:8:0]	disk5		sdl	8.00TB	
    [5:0:9:0]	parity2		sdm	8.00TB	
    
    [N0] scsiN0	nvme0 -	NVMe
    [N:0:4:1]	cache		nvme0n1	512GB	

    Notice that those two disks are missing.

     

    Now let's look at @Xaero's lsscsi -st output:

    [0:0:0:0]    disk    usb:3-9:1.0                     /dev/sda   62.7GB
    [1:0:10:0]   enclosu                                 -               -
    [1:0:11:0]   disk                                    /dev/sdb   8.00TB
    [1:0:12:0]   disk                                    /dev/sdc   8.00TB
    [1:0:13:0]   disk                                    /dev/sdd   8.00TB
    [1:0:14:0]   disk                                    /dev/sde   8.00TB
    [1:0:15:0]   disk                                    /dev/sdf   8.00TB
    [1:0:16:0]   disk                                    /dev/sdg   8.00TB
    [1:0:17:0]   disk                                    /dev/sdh   8.00TB
    [1:0:18:0]   disk                                    /dev/sdi   8.00TB
    [1:0:19:0]   disk                                    /dev/sdj   8.00TB
    [1:0:20:0]   disk                                    /dev/sdk   8.00TB
    [1:0:21:0]   disk                                    /dev/sdl   8.00TB
    [1:0:22:0]   disk                                    /dev/sdm   8.00TB
    [1:0:23:0]   disk                                    /dev/sdn   8.00TB
    [1:0:24:0]   disk                                    /dev/sdo   8.00TB
    [1:0:25:0]   disk                                    /dev/sdp   8.00TB
    [1:0:26:0]   disk                                    /dev/sdq   8.00TB
    [1:0:27:0]   disk                                    /dev/sdr   8.00TB
    [1:0:28:0]   disk                                    /dev/sds   8.00TB
    [1:0:29:0]   disk                                    /dev/sdt   8.00TB
    [1:0:30:0]   disk                                    /dev/sdu   8.00TB
    [1:0:31:0]   disk                                    /dev/sdv   8.00TB
    [1:0:32:0]   disk                                    /dev/sdw   8.00TB
    [1:0:33:0]   disk                                    /dev/sdx   8.00TB
    [1:0:34:0]   disk                                    /dev/sdy   8.00TB
    [N:0:1:1]    disk    pcie 0x8086:0x390d                         /dev/nvme0n1  1.02TB
    [N:1:1:1]    disk    pcie 0x8086:0x390d                         /dev/nvme1n1  1.02TB

    Notice that the disks missing the 3rd column are the same ones missing from his storage report.

     

    What's happening is that the code that extracts data from the 4th column (grabbing the /dev/sd? value) breaks because column 3 is missing, so instead of grabbing the disk's sd? value, it grabs the disk's size (8.00TB).

     

    The solution is to make this code smarter to pull the correct column when column 3 is missing.  As a quick test, I hardcoded it to pull the previous column, and got good data for Xaero's MegaRAID drives:

    [1] scsi1	megaraid_sas 	MegaRAID SAS 2008 [Falcon]
    [1:0:11:0]	disk13		sdb	8.00TB	WDC WD30EFRX-68A
    [1:0:12:0]	disk5		sdc	8.00TB	WDC WD30EFRX-68A
    [1:0:13:0]	disk7		sdd	8.00TB	WDC WD30EFRX-68A
    [1:0:14:0]	disk2		sde	8.00TB	WDC WD30EFRX-68A
    [1:0:15:0]	disk3		sdf	8.00TB	WDC WD30EFRX-68E
    [1:0:16:0]	disk4		sdg	8.00TB	WDC WD30EFRX-68A
    [1:0:17:0]	disk10		sdh	8.00TB	WDC WD30EFRX-68A
    [1:0:18:0]	disk21		sdi	8.00TB	
    [1:0:19:0]	disk8		sdj	8.00TB	WDC WD30EFRX-68A
    [1:0:20:0]	disk12		sdk	8.00TB	WDC WD30EFRX-68A
    [1:0:21:0]	disk11		sdl	8.00TB	WDC WD30EFRX-68A
    [1:0:22:0]	disk15		sdm	8.00TB	ST4000VN000-1H41
    [1:0:23:0]	disk16		sdn	8.00TB	ST4000VN000-1H41
    [1:0:24:0]	disk19		sdo	8.00TB	WDC WD30EFRX-68E
    [1:0:25:0]	disk22		sdp	8.00TB	
    [1:0:26:0]	disk17		sdq	8.00TB	WDC WD30EFRX-68A
    [1:0:27:0]	disk18		sdr	8.00TB	WDC WD30EFRX-68A
    [1:0:28:0]	disk20		sds	8.00TB	WDC WD30EFRX-68E
    [1:0:29:0]	disk6		sdt	8.00TB	WDC WD30EFRX-68A
    [1:0:30:0]	disk9		sdu	8.00TB	WDC WD30EFRX-68A
    [1:0:31:0]	disk14		sdv	8.00TB	WDC WD30EFRX-68E
    [1:0:32:0]	disk1		sdw	8.00TB	HGST HUH728080AL
    [1:0:33:0]	parity2		sdx	8.00TB	HGST HUH728080AL
    [1:0:34:0]	parity		sdy	8.00TB	HGST HUH728080AL

     

    So how do I detect the missing column and auto-adjust which field I'm extracting?  Here's the code in question:

    #Get Drives linked to SCSI Hosts
    echo " Querying lsscsi for the HDD's connected to each SCSI Host"
    while read -r line 
    do
    	key=($line)
    	scsistring=${key[0]//:/ }
    	scsistring=`echo $scsistring | tr -d "[]"`
    	scsistring=($scsistring)
    	scsi=${scsistring[0]}       #${key[0]:1:1}
    	if [ $scsi == "N" ]; then
    		scsi=$scsi${scsistring[1]}
    		x=${scsistring[3]} #${key[0]:5:1}
    		DN=${key[4]/\/dev\//}  #path   #<--This would also be affected by a missing column 3
    		DSP=${key[5]} #size            #<--This would also be affected by a missing column 3
    	else
    		x=${scsistring[2]} #${key[0]:5:1}
    		DN=${key[3]/\/dev\//}  #path   #<--This is the line pulling the wrong column when col 3 is missing
    		DSP=${key[4]} #size            #<--This ends up wrong too
    	fi
    	DK=${key[0]}
    	Disk=${DiskName2Num[$DN]}          #<--This lookup fails when $DN has disk size instead of sd? drive letter
    	if [ "$Disk" != "" ]; then         #<--Which in turn makes $Disk == "", so this section is skipped
    		DiskSCSI[$Disk]="$DK"
    		DiskSizePretty[$Disk]=$DSP #size
    		eval scsi${scsi}disks["Z$x"]=$Disk  #"${key[0]:1:-1}" #scsi12[[2]=19
    	fi
    done < <( lsscsi -st; )
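
     

    One option is to test which field actually holds the /dev/ path instead of hardcoding its position.  A rough sketch, assuming the path always starts with /dev/ whenever a disk row has one:

    if [[ ${key[3]} == /dev/* ]]; then      #transport column present
    	DN=${key[3]#/dev/}   #device name, e.g. sde
    	DSP=${key[4]}        #size, e.g. 8.00TB
    elif [[ ${key[2]} == /dev/* ]]; then    #transport column missing (the MegaRAID and PVSCSI rows)
    	DN=${key[2]#/dev/}
    	DSP=${key[3]}
    fi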

     

  13. 18 hours ago, Xaero said:

    Don't worry, I'm sure some obnoxious niche thing with my server will cause a hiccup.

     

    13 hours ago, Xaero said:

    The megaraid disks don't show up still, not sure what's causing that.

     

    True to form!

     

    I'm working on replicating your data this morning.  I have everything except:

    egrep -i "\[|idx|name|type|device|color" /var/local/emhttp/disks.ini

     

  14. 7 hours ago, DanielCoffey said:

    here is the Short v4.1 BETA 2 test result.

    Looks good, thanks for sharing.

     

     

    6 hours ago, jbartlett said:

    My Beta2 short test.

    I see a couple NVMe hosts listed at the bottom, but the NVMe drives are missing.  They also weren't listed in the data you shared with me, like the egrep of /var/local/emhttp/disks.ini.

     

    The two NVMe hosts were listed in the lshw output you provided:

    /0/100/1.2/0                             storage     NVMe SSD Controller SM961/PM961
    /0/117/1.2/0                             storage     WD Black NVMe SSD

    I've looked around, and I can't find where you've posted the output of lsscsi -st, so can you do that for me?

  15. 49 minutes ago, StevenD said:

    <whew>

     

    Had a little scare after rebooting from safe mode.  None of my drives showed up.

     

    [Screenshot: the Unraid Main page with no drives showing]

     

     

    I reverted the disk settings to default before I rebooted again.  This time they came up.  Not sure if they are related.  I will have to play around with it.

    More likely that your "SAS3416 Fusion-MPT Tri-Mode I/O Controller Chip (IOC)" didn't initialize correctly on boot. 

  16. UTT v4.1 BETA 2 is attached.

     

    Same as with BETA 1, I'm primarily concerned about the SCSI Hosts and Disks report, so if I could get a few users to run this with a Short test and post the reports, that would be great.

     

    BETA 2 has more fixes for the SCSI Host Controllers and Connected Drives report (including a modified numerical sort on drive port #), and cosmetic tweaks to the server name that shows in the notifications.

     

    BETA 2 still has my debugging statements in the code, but they are all commented out.

     

    Here's the v4.1 changelog:

    # V4.1: Added a function to use the first result with 99.8% max speed for Pass 2
    #       Fixed Server Name in Notification messages (was hardcoded TOWER)
    #       Many fixes to the SCSI Host Controllers and Connected Drives report
    #       Added a function to check lsscsi version and optionally upgrade to v0.30
    #       Cosmetic menu tweaks - by Pauven 08/12/2019

     

    unraid6x-tunables-tester.sh.v4_1_BETA2.txt
