Pauven

Everything posted by Pauven

  1. Hmmm. Well, I guess the good news is that you are only getting a couple of notifications instead of hundreds, so it seems like it is mostly working. I'm not sure how a couple are slipping through. The one at the end is actually not that surprising: there can be a delay before Unraid sends out each parity check start/finished notification, and the UTT script might have already removed the block at the end of the script before that notification comes through, so it should almost be expected that the very last parity check finished notification slips through. But the one at the beginning has me stumped, since the block is put into place before any parity checks are started. I see you are on Unraid 6.7.x - perhaps something has changed related to notifications since Unraid 6.6.x. I did all my development on Unraid 6.6.6, and I refuse to use 6.7.x until the numerous SMB and SQLite issues have been resolved.
  2. That sounds artificially low. Agreed. Looking at your test results, I see a couple of things.

First, you have a mixture of drives: 8TB, 6TB and 4TB. This has an impact on max speeds. How? Imagine a foot race with the world's fastest man, Olympic champion Usain Bolt, your local high school's 40m track champion, a 5-year-old boy, and a surprisingly agile 92-year-old grandmother. I know you're thinking Usain will win, but wait... All four runners are on the same team, they are roped together, and the race requirement is that no one gets yanked down to the ground - everyone has to finish standing up. Now it seems a bit more obvious that no matter how fast Usain is, he and his teammates basically have to walk alongside the 92-year-old grandmother who is setting the pace for the race.

This is how Parity Checks work on Unraid. In my server, my 3TB 5400 RPM drives are the slowest, so they set the pace at 140 MB/s, even though my 8TB 7200 RPM drives can easily exceed 200 MB/s on their own. I'm not sure which drives are slowest in your system - your 4TB drives look like 7200 RPM units, so it might be the 6TB drives. But even though your drive mixture is slowing you down some, even your slowest drive should be good for 150+ MB/s. So something else is slowing your server down.

To determine what that bottleneck is, math is your friend. I see that you have 16 drives connected to your SAS2116 PCI-Express Fusion-MPT SAS-2 controller. To understand what kind of bandwidth that controller is seeing, simply multiply the max speed by the number of drives:

16 drives * 89.2 MB/s = 1,427 MB/s

But that is just the drive data throughput. SATA drives use an 8b/10b encoding which has a 20% overhead throughput penalty, so your realized bandwidth is only 80% of what the controller is seeing. So we need to add the overhead back into that number:

1,427 MB/s / 0.80 = 1,784 MB/s

We also need to factor in the PCI-Express overhead. While the 8b/10b protocol overhead in PCIe v1 and v2 is already factored into those speeds, there are additional overheads like TLP that further reduce the published speeds. You might only get at most 92% of published PCIe bandwidth numbers, possibly less:

1,784 MB/s / 0.92 = 1,939 MB/s being handled by your PCI-Express slot

1,939 MB/s is a very interesting number, as it is very close to 2,000 MB/s, which is equivalent to PCIe v1.0 x8 lanes or PCIe v2.0 x4 lanes.

So, long classroom lecture short, most likely what is happening is that your SAS controller is connecting to your system at PCIe 1.0 x8 or PCIe 2.0 x4. I'm not certain what controller you have, but based upon the driver I think the card has a PCIe 2.0 x8 max connection speed, which should be good for double what you are getting (perhaps around 182 MB/s for 16 drives). So you probably have plugged the controller into the wrong slot. On many motherboards, some of the x16 slots are only wired for x4, so while your PCIe 2.0 x8 card would fit in the x16 slot, the speed gets reduced to half-speed, PCIe 2.0 x4. Alternatively, you might have a really old system that only supports PCIe 1.0, which again would cut your speeds in half. Your signature doesn't specify your exact hardware, so I don't know which it would be.

One last tip: If you are doing Windows VMs with passthrough graphics, and you are putting your graphics card in the fastest PCIe slot hoping for max speed - that probably isn't needed. I did some testing a couple of years back, putting the video card in PCIe 3.0 x16 and PCIe 3.0 x4 slots, and in 3DMark the score was nearly the same. I know all the hardware review websites like to make a big deal about PCIe bandwidth and video cards, but the reality is that for gaming it really doesn't make much of a difference. On the other hand, 16 fast hard drives can easily saturate a PCIe 2.0 x8 connection, so it is very important to put your HD controller in the fastest available slot.

Paul
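As a rough sketch of that same back-of-the-envelope math (the drive count, per-drive speed, and overhead percentages are simply the example values from this post), something like this can be run from any shell with awk:

drives=16           # number of drives on the controller
per_drive=89.2      # observed max speed per drive, in MB/s
awk -v n="$drives" -v s="$per_drive" 'BEGIN {
    raw  = n * s          # combined drive data throughput
    sata = raw / 0.80     # add back the SATA 8b/10b encoding overhead (~20%)
    pcie = sata / 0.92    # add back approximate PCIe protocol overhead (~8%)
    printf "Drive data: %.0f MB/s   SATA link: %.0f MB/s   PCIe slot: %.0f MB/s\n", raw, sata, pcie
}'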
  3. UTT v4.x also sends out a test begin and a test end notification, instead of the hundreds of notifications you would get without the block. Any chance you're confusing the UTT notifications with the Unraid Parity Check notifications?
  4. Sorry, I should have tried the link that StevenD provided. I didn't realize it wasn't a direct link to the file, but rather to a web page from where you can start a download. This URL should work: http://mirrors.slackware.com/slackware/slackware-current/slackware/ap/screen-4.6.2-i586-2.txz I'll update my post above too.
  5. To expand on StevenD's answer:

Step 1 - Change into your UTT directory: cd /boot/utt
Step 2 - Download screen: wget http://mirrors.slackware.com/slackware/slackware-current/slackware/ap/screen-4.6.2-i586-2.txz
Step 3 - Install screen: upgradepkg --install-new screen-4.6.2-i586-2.txz
Step 4 - Run screen: screen

NOTE: You should only have to download screen once, and you can do this from your Windows PC (saving it to your \\<servername>\flash\utt directory) or via the wget command line above. Each time you reboot, screen is no longer installed since Unraid boots from a static image, so you would still need to do steps 1, 3 & 4, but you can skip step 2 since you already downloaded it. An optional way to automate the reinstall at boot is sketched below.
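The sketch mentioned above (my own suggestion, not part of the original steps): appending the install command to Unraid's /boot/config/go script makes screen reinstall itself on every boot, assuming the package stays in /boot/utt.

# Run this once; /boot/config/go executes at every boot
echo 'upgradepkg --install-new /boot/utt/screen-4.6.2-i586-2.txz' >> /boot/config/go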
  6. Yeah, you want to stop any and all access of your shares during the test, from any and all sources.
  7. LOL! So true! Hadn't thought of that... You really only need to use safe mode if you have a ton of stuff that's just too hard to disable individually. Instead, just make sure you stop all VMs and Dockers, plus any plugins that would be accessing your array disks. I haven't noticed any issues from the CacheDirs plug-in, since once it is running it's mainly pinging RAM to prevent disks from spinning up, but you can always stop that one too just to be safe. Other alternatives: You can run directly on your server's console instead of remote access, completely eliminating the need for screen. Using screen is optional, though recommended when running remotely. If you have confidence that your network connection is solid, that your PC won't sleep, shut down, or randomly update and reboot itself during the test, and that power brownouts/blackouts won't disrupt your connection, then screen really isn't needed. Screen is like insurance - many people get by without it. Though I am also curious - how can you run screen in safe mode?
  8. Sorry I missed this comment earlier. I'm not sure what to make of this. UTT performs tests of the Unraid Disk Tunables by running dozens or hundreds of non-correcting parity checks. But if you don't have parity disks... then how in the world are you even running UTT? I don't know if you can trust any of the results - I don't even know what the results mean anymore. If you don't have parity disks, then you shouldn't be able to check parity, and you shouldn't be able to use this tool to check parity check speeds with different tunables configured. That also might explain why negative md_sync_thresh values were responding well on your machine. Is there even a [CHECK] Parity button on your Unraid Main screen?
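To put that in perspective, a single UTT test pass boils down to something like this rough sketch using Unraid's mdcmd interface (my simplified illustration, not the actual UTT code - the tunable values and the 600-second duration are just placeholders):

mdcmd set md_sync_window 512           # apply one candidate tunable value (placeholder)
mdcmd set md_sync_thresh 448           # apply another candidate value (placeholder)
mdcmd check NOCORRECT                  # start a non-correcting parity check
start_pos=$(mdcmd status | grep 'mdResyncPos=' | cut -d= -f2)
sleep 600                              # let the check run for the test duration
end_pos=$(mdcmd status | grep 'mdResyncPos=' | cut -d= -f2)
mdcmd nocheck                          # cancel the check before the next pass
echo "Relative progress this pass: $(( end_pos - start_pos ))"   # bigger delta = faster check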
  9. I just posted UTT v4.1 final, in the first post. Everything you need should be in the first two posts. Perhaps @SpaceInvaderOne could do one of his great videos on using UTT...
  10. Wow. I've said it before and I'll say it again, every server is unique, some in very surprising ways.

I scanned through your results, and for repeated tests I see fairly large variances of up to +/- 2.3 MB/s, so keep that in mind when comparing results. The Long test, with a 10 minute duration for each test, should provide more accurate results.

Regarding consuming 0 MB, it's actually not 0. I'm rounding to the nearest MB, so anything under 0.5 MB would round down to 0 MB. Here's the formula and your actual result:

(( ( md_num_stripes * (2640 + (4096 * sbNumDisks)) ) / 1048576 )) = RAM Consumed (In Megabytes)

With your values:

(( ( 16 * (2640 + (4096 * 7)) ) / 1048576 )) = 0.477783203 MB

*NOTE: I've just added a new function to UTT v4.1 to show memory used in KB when it rounds down to 0 MB.

Regarding the negative md_sync_thresh values, I had to double-check the code to see if UTT was really setting negative values, and it is. While UTT is setting negative md_sync_thresh values, I'm not sure if Unraid is overriding the values when they are below a certain threshold. While I know how to read the currently 'configured' value, I don't know how to query the currently 'set' value. Does anyone know how to do this? I did go into the Unraid Disk settings, and manually set a negative value and applied it, and Unraid saved it! So best I can tell, the UTT script is setting negative md_sync_thresh values, Unraid is accepting them, and your server is responding better with them. Perhaps @limetech can share some insight.

Paul
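The same memory formula as a tiny shell sketch (the md_num_stripes and sbNumDisks values are just the example numbers above):

md_num_stripes=16      # example value from above
sbNumDisks=7           # example value from above
bytes=$(( md_num_stripes * (2640 + 4096 * sbNumDisks) ))
awk -v b="$bytes" 'BEGIN { printf "RAM consumed: %.3f MB (%.0f KB)\n", b/1048576, b/1024 }'
# Prints: RAM consumed: 0.478 MB (489 KB)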
  11. Fantastic! That settles it then, I'll release UTT v4.1 final today. Those results look perfect to me. Proof that, even as good as Unraid v6.x performs with stock settings on most servers, some servers still need tuning. Going from 141 MB/s stock to 164 MB/s tuned nets you a nice 16% bump in peak performance. I also find the Thriftiest settings very interesting. Only 22 MB of RAM consumed (16 MB less than stock Unraid), yet a solid 15 MB/s (11%) gain over stock performance. The consistency of your results for the repeated tests is +/- 0.1 MB/s, so you can trust the report accuracy on this server. I really appreciate you doing the Extra Long test. As I expected, the extra nr_requests tests only provided slower speeds once the other settings were tuned. I'm still curious if there will be a server out there that responds well to lower nr_requests values once tuned, but it seems less and less likely. Personally, I'd probably go with the Fastest values on your server. The Recommended values only save you 122 MB over the Fastest, and the Fastest are only consuming 366 MB. If you had a lot more drives, the memory consumption would go up proportionally and the lower Recommended values to save RAM would make more sense then.
  12. Thanks for confirming. I think UTT v4.1 BETA 3 is ready to make the jump to final.
  13. I was starting to feel a bit guilty for still rock'n the beastly 6.6.6, especially while trying to troubleshoot all these storage report issues for users running 6.7.x. Now I feel a bit vindicated for sticking with Damienraid, and happy I avoided all that SMB/SQLite nonsense. Hopefully my server hasn't sold its circuits to Beelzebub and won't be stuck on 6.6.6 forever in a journey to the bottomless pit... Perhaps I need to rename my server from Tower. Abaddon... Apollyon... Beelzebub... Belial... Dragon... I know, Leviathan!
  14. Small correction on what I wrote here. The mdcmd status output only has drives 0-29, which is predefined by Unraid to Parity and Data disks only. 54 is the flash drive, and 30 & 31 are cache drives (I'm sure there's other predefined assignments, but that is all I've mapped out). So I was getting myself confused as to how I was getting the flash and cache drives to show in the report, since they are not in the mdcmd status output. I finally realized that I am using both mdcmd status and the /var/local/emhttp/disks.ini file to build the DiskName2Num lookup. Looks like /var/local/emhttp/disks.ini has all array drives, up to 54, so it includes the flash and cache. (yes, that means I have an unnecessary, redundant operation using the mdcmd output to build the DiskName2Num lookup, but it doesn't hurt anything) Ultimately the story stays the same - non-array drives aren't in /var/local/emhttp/disks.ini either, so they still don't get in the report.
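As an illustration of that disks.ini-based lookup (a simplified sketch of the idea, not the actual UTT code - it assumes each disk section in /var/local/emhttp/disks.ini carries 'idx' and 'device' keys, which the grep filter used later in this thread also relies on):

declare -A DiskName2Num                  # maps a device name like "sdb" to its array slot
idx=""
while IFS='=' read -r k v; do
    v=${v//\"/}                          # values in disks.ini are quoted
    case $k in
        idx)    idx=$v ;;
        device) [[ -n $v && -n $idx ]] && DiskName2Num[$v]=$idx ;;
    esac
done < /var/local/emhttp/disks.ini
echo "sdb is array slot ${DiskName2Num[sdb]}"   # 'sdb' is just a hypothetical device name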
  15. Thanks @jbartlett! Any chance your two NVMe drives are non-array devices?
  16. My mistake, looks like you were right. I build a DiskName2Num lookup array, but it is based upon the data from mdcmd status, which of course only provides data on array devices. That means these unassigned disks don't get a Disk Name to Disk Number lookup entry, so it's not available for the final report. I'm a little conflicted on this. On the one hand, I wanted the report to be a complete picture of all controllers and attached drives, but on the other hand I guess having it only display array devices is nice too, since these are the only drives being tested and tuned. I don't think I would be able to include non-array drives without a significant rewrite of this report. So.... no. Not gonna happen.
  17. Right. The Short test omits Passes 2 & 3, to make it quicker, and never makes any recommendations - primarily because the 10 second tests are way too quick to be accurate and you get a lot of fake numbers. For some users, their server responds the same no matter what tunables are used. That's the point of the Short test, to save them 8+ hours of running the longer tests if it won't help them.
  18. Why? I'm still on 6.6.6 because I've seen too many issues reported in the 6.6.7 and 6.7.x branches that just never seemed to get any solutions. Looks like those two drives still aren't showing. I'll test again on my side. I got it working with Xaero's data, and assumed it would fix yours too.
  19. Looks good. I'd say your accuracy is +/- 0.2 MB/s. So even though the new fastest recommendation is ever so slightly faster, it's within the margin of error so you likely won't see a difference.
  20. Yours should be fixed in BETA 3 if you want to give the Short test a run.
  21. I'm thinking no. Here are his two NVMe drives next to your two:

[N:0:2:1] disk pcie 0x144d:0xa801 /dev/nvme0n1 500GB
[N:1:0:1] disk pcie 0x1b4b:0x1093 /dev/nvme1n1 256GB
[N:0:1:1] disk pcie 0x8086:0x390d /dev/nvme0n1 1.02TB
[N:1:1:1] disk pcie 0x8086:0x390d /dev/nvme1n1 1.02TB

Columns line up the same, and UTT 4.1 Beta 2+ already accounts for the extra column on NVMe drives.
  22. Looks like your two NVMe drives still aren't showing. I've got your lsscsi -st and lshw output, but I need more. Please provide the output from the following:

lsscsi -H
mdcmd status | egrep -i "rdevStatus|rdevName|diskSize"
df -h
egrep -i "\[|idx|name|type|device|color" /var/local/emhttp/disks.ini
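If it helps, the following (just a convenience suggestion, not required - the output filename is arbitrary) gathers all four outputs into a single file on the flash drive that can be attached here:

{
    echo '=== lsscsi -H ==='
    lsscsi -H
    echo '=== mdcmd status (filtered) ==='
    mdcmd status | egrep -i "rdevStatus|rdevName|diskSize"
    echo '=== df -h ==='
    df -h
    echo '=== disks.ini (filtered) ==='
    egrep -i "\[|idx|name|type|device|color" /var/local/emhttp/disks.ini
} > /boot/utt-diagnostics.txt 2>&1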
  23. Hopefully 3rd time's the charm: UTT 4.1 BETA 3 is attached. Only change is the fix for missing disks in the storage report. unraid6x-tunables-tester.sh.v4_1_BETA3.txt
  24. Thanks for the suggestion. That code is pretty complex, beyond my comfort level to use it. Instead I went this path, which seems to do the trick:

if [[ ${key[3]} == *"dev"* ]]; then
    DN=${key[3]/\/dev\//}    #path
    DSP=${key[4]}            #size
else
    DN=${key[2]/\/dev\//}    #path
    DSP=${key[3]}            #size
fi

I should have BETA 3 ready to test in a few...
  25. Okay, I think I've found the issue with the report not showing certain disks. For reference, here's @StevenD's lsscsi -st output:

[0:0:0:0] disk usb:1-1.1:1.0 /dev/sda 31.9GB
[0:0:0:1] disk usb:1-1.1:1.0 /dev/sdb -
[3:0:0:0] disk /dev/sdc 1.07GB
[4:0:0:0] disk /dev/sdd 960GB
[5:0:0:0] disk sas:0x300605b00e84f8bf /dev/sde 8.00TB
[5:0:1:0] enclosu sas:0x300705b00e84f8b0 - -
[5:0:2:0] disk sas:0x300605b00e84f8bb /dev/sdf 8.00TB
[5:0:3:0] disk sas:0x300605b00e84f8b3 /dev/sdg 8.00TB
[5:0:4:0] disk sas:0x300605b00e84f8b5 /dev/sdh 8.00TB
[5:0:5:0] disk sas:0x300605b00e84f8b9 /dev/sdi 8.00TB
[5:0:6:0] disk sas:0x300605b00e84f8bd /dev/sdj 8.00TB
[5:0:7:0] disk sas:0x300605b00e84f8b7 /dev/sdk 8.00TB
[5:0:8:0] disk sas:0x300605b00e84f8ba /dev/sdl 8.00TB
[5:0:9:0] disk sas:0x300605b00e84f8b4 /dev/sdm 8.00TB
[5:0:10:0] disk sas:0x300605b00e84f8b1 /dev/sdn 8.00TB
[5:0:11:0] disk sas:0x300605b00e84f8be /dev/sdo 8.00TB
[5:0:12:0] disk sas:0x300605b00e84f8bc /dev/sdp 8.00TB
[5:0:13:0] disk sas:0x300605b00e84f8b8 /dev/sdq 8.00TB
[5:0:14:0] disk sas:0x300605b00e84f8b0 /dev/sdr 8.00TB
[N:0:4:1] disk pcie 0x144d:0xa801 /dev/nvme0n1 512GB

Notice that rows 3 & 4 (disks sdc & sdd) don't have the 3rd column populated. Now, here is StevenD's storage report:

SCSI Host Controllers and Connected Drives
--------------------------------------------------
[0] scsi0 usb-storage -
    [0:0:0:0] flash sda 31.9GB
[1] scsi1 ata_piix -
[2] scsi2 ata_piix -
[3] scsi3 vmw_pvscsi - PVSCSI SCSI Controller
[4] scsi4 vmw_pvscsi - PVSCSI SCSI Controller
[5] scsi5 mpt3sas - SAS3416 Fusion-MPT Tri-Mode I/O Controller Chip (IOC)
    [5:0:0:0] disk3 sde 8.00TB
    [5:0:10:0] disk8 sdn 8.00TB
    [5:0:11:0] disk7 sdo 8.00TB
    [5:0:12:0] disk6 sdp 8.00TB
    [5:0:13:0] disk4 sdq 8.00TB
    [5:0:14:0] disk12 sdr 8.00TB
    [5:0:2:0] disk1 sdf 8.00TB
    [5:0:3:0] disk9 sdg 8.00TB
    [5:0:4:0] disk10 sdh 8.00TB
    [5:0:5:0] parity sdi 8.00TB
    [5:0:6:0] disk2 sdj 8.00TB
    [5:0:7:0] disk11 sdk 8.00TB
    [5:0:8:0] disk5 sdl 8.00TB
    [5:0:9:0] parity2 sdm 8.00TB
[N0] scsiN0 nvme0 - NVMe
    [N:0:4:1] cache nvme0n1 512GB

Notice that those two disks are missing. Now let's look at @Xaero's lsscsi -st output:

[0:0:0:0] disk usb:3-9:1.0 /dev/sda 62.7GB
[1:0:10:0] enclosu - -
[1:0:11:0] disk /dev/sdb 8.00TB
[1:0:12:0] disk /dev/sdc 8.00TB
[1:0:13:0] disk /dev/sdd 8.00TB
[1:0:14:0] disk /dev/sde 8.00TB
[1:0:15:0] disk /dev/sdf 8.00TB
[1:0:16:0] disk /dev/sdg 8.00TB
[1:0:17:0] disk /dev/sdh 8.00TB
[1:0:18:0] disk /dev/sdi 8.00TB
[1:0:19:0] disk /dev/sdj 8.00TB
[1:0:20:0] disk /dev/sdk 8.00TB
[1:0:21:0] disk /dev/sdl 8.00TB
[1:0:22:0] disk /dev/sdm 8.00TB
[1:0:23:0] disk /dev/sdn 8.00TB
[1:0:24:0] disk /dev/sdo 8.00TB
[1:0:25:0] disk /dev/sdp 8.00TB
[1:0:26:0] disk /dev/sdq 8.00TB
[1:0:27:0] disk /dev/sdr 8.00TB
[1:0:28:0] disk /dev/sds 8.00TB
[1:0:29:0] disk /dev/sdt 8.00TB
[1:0:30:0] disk /dev/sdu 8.00TB
[1:0:31:0] disk /dev/sdv 8.00TB
[1:0:32:0] disk /dev/sdw 8.00TB
[1:0:33:0] disk /dev/sdx 8.00TB
[1:0:34:0] disk /dev/sdy 8.00TB
[N:0:1:1] disk pcie 0x8086:0x390d /dev/nvme0n1 1.02TB
[N:1:1:1] disk pcie 0x8086:0x390d /dev/nvme1n1 1.02TB

Notice that the disks missing the 3rd column are the same ones missing from his storage report. What's happening is that the code that extracts data from the 4th column (grabbing the /dev/sd? value) breaks because column 3 is missing, so instead of grabbing the disk's sd? value, it grabs the disk's size (8.00TB). The solution is to make this code smarter to pull the correct column when column 3 is missing.
As a quick test, I hardcoded it to pull the previous column, and got good data for Xaero's MegaRAID drives:

[1] scsi1 megaraid_sas MegaRAID SAS 2008 [Falcon]
    [1:0:11:0] disk13 sdb 8.00TB WDC WD30EFRX-68A
    [1:0:12:0] disk5 sdc 8.00TB WDC WD30EFRX-68A
    [1:0:13:0] disk7 sdd 8.00TB WDC WD30EFRX-68A
    [1:0:14:0] disk2 sde 8.00TB WDC WD30EFRX-68A
    [1:0:15:0] disk3 sdf 8.00TB WDC WD30EFRX-68E
    [1:0:16:0] disk4 sdg 8.00TB WDC WD30EFRX-68A
    [1:0:17:0] disk10 sdh 8.00TB WDC WD30EFRX-68A
    [1:0:18:0] disk21 sdi 8.00TB
    [1:0:19:0] disk8 sdj 8.00TB WDC WD30EFRX-68A
    [1:0:20:0] disk12 sdk 8.00TB WDC WD30EFRX-68A
    [1:0:21:0] disk11 sdl 8.00TB WDC WD30EFRX-68A
    [1:0:22:0] disk15 sdm 8.00TB ST4000VN000-1H41
    [1:0:23:0] disk16 sdn 8.00TB ST4000VN000-1H41
    [1:0:24:0] disk19 sdo 8.00TB WDC WD30EFRX-68E
    [1:0:25:0] disk22 sdp 8.00TB
    [1:0:26:0] disk17 sdq 8.00TB WDC WD30EFRX-68A
    [1:0:27:0] disk18 sdr 8.00TB WDC WD30EFRX-68A
    [1:0:28:0] disk20 sds 8.00TB WDC WD30EFRX-68E
    [1:0:29:0] disk6 sdt 8.00TB WDC WD30EFRX-68A
    [1:0:30:0] disk9 sdu 8.00TB WDC WD30EFRX-68A
    [1:0:31:0] disk14 sdv 8.00TB WDC WD30EFRX-68E
    [1:0:32:0] disk1 sdw 8.00TB HGST HUH728080AL
    [1:0:33:0] parity2 sdx 8.00TB HGST HUH728080AL
    [1:0:34:0] parity sdy 8.00TB HGST HUH728080AL

So how do I detect the missing column and auto-adjust which field I'm extracting? Here's the code in question:

#Get Drives linked to SCSI Hosts
echo " Querying lsscsi for the HDD's connected to each SCSI Host"
while read -r line
do
    key=($line)
    scsistring=${key[0]//:/ }
    scsistring=`echo $scsistring | tr -d "[]"`
    scsistring=($scsistring)
    scsi=${scsistring[0]}    #${key[0]:1:1}
    if [ $scsi == "N" ]; then
        scsi=$scsi${scsistring[1]}
        x=${scsistring[3]}    #${key[0]:5:1}
        DN=${key[4]/\/dev\//}    #path    #<--This would also be affected by a missing column 3
        DSP=${key[5]}    #size    #<--This would also be affected by a missing column 3
    else
        x=${scsistring[2]}    #${key[0]:5:1}
        DN=${key[3]/\/dev\//}    #path    #<--This is the line pulling the wrong column when col 3 is missing
        DSP=${key[4]}    #size    #<--This ends up wrong too
    fi
    DK=${key[0]}
    Disk=${DiskName2Num[$DN]}    #<--This lookup fails when $DN has disk size instead of sd? drive letter
    if [ "$Disk" != "" ]; then    #<--Which in turn makes $Disk == "", so this section is skipped
        DiskSCSI[$Disk]="$DK"
        DiskSizePretty[$Disk]=$DSP    #size
        eval scsi${scsi}disks["Z$x"]=$Disk    #"${key[0]:1:-1}"    #scsi12[[2]=19
    fi
done < <( lsscsi -st; )
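For what it's worth, another way to sidestep the column counting entirely (a sketch only, not what ended up in UTT) is to scan the whitespace-split fields of the same key array for the one that starts with /dev/ and take the size from the field right after it:

DN=""; DSP="-"
for i in "${!key[@]}"; do
    if [[ ${key[$i]} == /dev/* ]]; then
        DN=${key[$i]#/dev/}            # device name, e.g. sdc or nvme0n1
        DSP=${key[$((i+1))]:-"-"}      # size is the next field; fall back to "-" if absent
        break
    fi
done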