Everything posted by Pauven

  1. Drats. So wait... I was actually too quick to create the new version of UTT? I could have sat on my tookus and let Limetech fix the issue for me? That's disappointing. But your tuned values are sky-high, probably among the highest I have ever seen shared here. I would say that you have a special needs controller. Definitely share this with Limetech. Very interesting, thanks for sharing. It took me a long time and a lot of effort to come up with a testing strategy for the v6.0-v6.7 tunables, and these changes with 6.8 pretty much throw all that out the window. If anyone sees any info regarding the new tunables, please repost here. And fingers crossed that Limetech makes UTT unnecessary, as I really really really don't want to do it all over yet again...
  2. In my experience, a rebuild should be similar in time to a parity check. The parity check reads from all drives simultaneously, while a rebuild writes to one and reads from all others. Total bandwidth is close to identical, as is parity calculation load on the CPU. As jbartlett advised, one of your drives could be running slow.
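If it helps to see why the totals come out the same, here is a trivial back-of-the-envelope comparison in bash (the drive count and per-drive speed are just example numbers, not anyone's actual server):
#!/bin/bash
# Rough bus-load comparison: parity check vs. rebuild on an N-drive array.
# Example numbers only -- substitute your own drive count and slowest-drive speed.
DRIVES=10          # total array drives including parity
SPEED=140          # MB/s sustained by the slowest drive

# Parity check: N simultaneous reads, paced by the slowest drive.
echo "Parity check bus load: $(( DRIVES * SPEED )) MB/s (all reads)"

# Rebuild: (N-1) reads plus 1 write, still N streams at the same pace,
# so total bandwidth (and elapsed time) comes out nearly identical.
echo "Rebuild bus load:      $(( (DRIVES - 1) * SPEED + SPEED )) MB/s ((N-1) reads + 1 write)"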
  3. Tom, any reason you are no longer posting that RC's are available in the Prerelease forum? The last one I see is "Unraid OS version 6.7.0-rc8 available". Paul
  4. I think that is a really interesting finding. In a disk-to-disk transfer, you're both reading from and writing to 3 disks simultaneously (4 if you had dual parity), which is a very different workload than just reading from all disks simultaneously. I'm guessing you set the memory values so low that disk-to-disk transfers were impacted. I'll have to do some testing on my server and see if that is something I can replicate. You have a server that responds very well to low values, at least as far as parity checks go. Actually, it seems to respond the same for almost any set of values, achieving around 141 MB/s across the board except for a few edge cases. For that type of server, you're probably best off just running stock Unraid tunables settings.
  5. Hi @DanielCoffey, thanks for lending a helping hand. Even though it has the same name, for some reason the file you posted has a different size than the original version. I think it would be wise if you remove the file you posted, just in case. Also, the original file is hosted on the Unraid forum, which has done a decent job of hosting files for years. Not sure why vekselstrom had an issue downloading, though it seems to have been a temporary issue. I think it would be best if we keep the download option centralized in the first post, which gives me control over updates.
  6. My monthly parity check completed in another record time for my server, dropping another 12 seconds (haha). Even though the UTT v4.1 enhancements resulted in slightly better peak numbers, my server was already well optimized so the additional performance was not impactful.
  7. UTT does not do any writes, only reads. Specifically, it applies a combination of tunables parameters, then initiates a non-correcting (read-only) parity check, lets it run for 5 or 10 minutes (depending upon the test length you chose), then aborts the parity check. It then tries the next set of values and repeats. I believe dalben's report might be the very first time a drive failure has been reported during testing. UTT v4 works the same basic way as the previous versions, so there's years of data behind that statement. In theory, the tests that UTT performs are no more strenuous than a regular parity check. But any time you spin up and use your hard drives, especially all of them at once generating max possible heat, you risk a drive failure - same as during a parity check. Some may feel that the stress is slightly harder than a parity check, as UTT keeps repeating the first 5/10 minutes of the parity check dozens of times (minimum 82 times, maximum 139 times), so it keeps all of your drives spinning at their fastest/hottest for the entire test period, unlike a true parity check that would allow smaller drives to complete and spin down as larger drives continue the check. But the stress should be less than hard drive benchmarking, especially tests that do random small file reads/writes and generate lots of head movement.
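For anyone curious what that loop looks like under the hood, here is a heavily simplified sketch of one test iteration. This is not the actual UTT code, and it assumes Unraid 6.x's mdcmd accepts the md_num_stripes/md_sync_window/md_sync_thresh settings and the check NOCORRECT / nocheck commands the way it did when UTT was written:
#!/bin/bash
# Simplified sketch of one UTT-style test iteration (NOT the real UTT script).
# Assumes Unraid 6.x with the array started and mdcmd at /usr/local/sbin/mdcmd.
MDCMD=/usr/local/sbin/mdcmd
DURATION=300   # 5-minute test; the Long/Xtra-Long tests use 600

run_test() {
    local stripes=$1 window=$2 thresh=$3

    # Apply one combination of tunables.
    $MDCMD set md_num_stripes "$stripes"
    $MDCMD set md_sync_window "$window"
    $MDCMD set md_sync_thresh "$thresh"

    # Start a NON-correcting (read-only) parity check and let it run.
    $MDCMD check NOCORRECT
    sleep "$DURATION"

    # Abort the check (some Unraid versions want 'nocheck CANCEL' here).
    # UTT derives MB/s from how far the check position advanced in DURATION seconds.
    $MDCMD nocheck
}

# Example: test one combination of values (illustrative numbers only).
run_test 4096 2048 2040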
  8. Array integrity comes first, you did the right thing. Unfortunately, the slightly different results in this run caused the script to test a lower range in Pass 2, so it didn't retest that magical 177 MB/s result from your earlier run. It would have been interesting to see it retested, to find out whether it consistently performs better.
  9. Uhmmmm...

--- TEST PASS 2 (10 Hrs - 49 Sample Points @ 10min Duration) ---
Tst | RAM | stri | win | req | thresh | MB/s
----------------------------------------------
  1 | 207 | 6144 | 3072 | 128 | 3064 | 148.0
  2 | 216 | 6400 | 3200 | 128 | 3192 | 176.9  <-- !!!!!!!!!!!!!!!!!!!!!!!!!!
  3 | 224 | 6656 | 3328 | 128 | 3320 | 146.6
  4 | 233 | 6912 | 3456 | 128 | 3448 | 148.5
  5 | 242 | 7168 | 3584 | 128 | 3576 | 148.3
  6 | 250 | 7424 | 3712 | 128 | 3704 | 148.3
  7 | 259 | 7680 | 3840 | 128 | 3832 | 148.8
  8 | 267 | 7936 | 3968 | 128 | 3960 | 148.6
  9 | 276 | 8192 | 4096 | 128 | 4088 | 146.9
 10 | 285 | 8448 | 4224 | 128 | 4216 | 149.0
 11 | 293 | 8704 | 4352 | 128 | 4344 | 148.3
 12 | 302 | 8960 | 4480 | 128 | 4472 | 149.1
 13 | 311 | 9216 | 4608 | 128 | 4600 | 108.6
 14 | 319 | 9472 | 4736 | 128 | 4728 | 148.5
 15 | 328 | 9728 | 4864 | 128 | 4856 | 145.8
 16 | 337 | 9984 | 4992 | 128 | 4984 | 149.0
 17 | 345 |10240 | 5120 | 128 | 5112 | 148.5

I think that has to be some kind of glitch, but I can't imagine how. I've never seen a 30 MB/s jump on a specific setting combo like that. Unless the whole time the wife was watching a movie, except for that one test.
  10. Definitely! I see a strong linear progression from 15 MB/s to 130 MB/s as the settings increase. I've never seen a range this large, or a speed that slow using Unraid default values! Fascinating! I'm very interested in seeing the Long results. Please include the CSV file too, I'll probably chart this one for everyone.
  11. There's only a handful of options to choose from; the menu has been greatly simplified.

Short Test
Run this to see if your system appears to respond to changing the Unraid disk tunables. If your results look mostly flat, then go on with life and forget about this tool - your server doesn't need it. Some servers behave the same no matter what tunables you use. But if you see dramatically different speeds from the Short Test, then your server appears to react to changing the tunables, and one of the real tests below could be worth the time. Sometimes you will even see the outline of a bell curve forming in the Short Test results, which is a very strong indicator that your server responds well to tuning. This test only takes a few minutes, so you don't have to waste much time to find out whether your server responds to tuning. Also, keep in mind that even if your server responds well to tuning, the fastest parameters might still be the Unraid stock values, so there's no guarantee that running the tests will discover values that make your server faster.

Normal Test
This is the quickest real test. It does not test the nr_requests values, and it uses a 5 minute duration for each test. Because the test adapts to how your HD controller responds to the tunables, it will optionally test some additional value ranges, so the run time varies from 8 to 10 hours.

Thorough Test
Same as the Normal Test, but includes the nr_requests tests, which add another 4 hours to the Normal Test duration. So far we have found that once all the other tunables have been optimized (by the normal tests), the nr_requests default value of 128 is best, making the nr_requests tests basically a waste of time. But there is always the possibility that your server might be different, so I make this optional if you want to check.

Long Test (Recommended)
This is exactly the same as the Normal Test, except each test duration is doubled from 5 minutes to 10 minutes. That means the test takes twice as long. Longer tests improve accuracy, making it easier to identify which settings work best. For example, if the Normal Test had an accuracy of +/- 1.0 MB/s, then the Long Test might improve that to +/- 0.5 MB/s or better. Because the test duration is doubled, the total test time also doubles, to 16-20 hours. I recommend this test because it has the increased accuracy of the 10 minute duration, without the extra 8 hours for the nr_requests tests that are probably a waste of time.

Xtra-Long Test
This is exactly the same as the Thorough Test, except each test duration is doubled from 5 minutes to 10 minutes, for the same reason as the Long Test. Another way to think of it is that this is the Long Test plus the nr_requests tests. Because the test duration is doubled, the nr_requests tests add 8 hours, bringing the total test length up to the 24-28 hour range.

FYI on Test Accuracy
Test accuracy is determined by looking at tests that get repeated in successive passes. For example, Pass 2 Test 25 is always a repeat of the test result chosen from Pass 1, and Pass 2 Test 1 is usually a repeat of another test in Pass 1 as well. The fastest test result from Passes 1 & 2 also gets repeated in Pass 3. Because the test points can vary by server, sometimes you will get several more repeated test points to compare. By comparing the reported speeds from one pass to the others for the exact same tests, you can determine the accuracy. The accuracy varies by server.

Some servers, like mine, produce an accuracy of +/- 0.1 MB/s every single time; it's incredibly consistent. Other servers might be +/- 2.5 MB/s, while a few servers are +/- 10 MB/s or worse. Note: if you are seeing large accuracy variances, that might mean you have processes running that are accessing the array, reading or writing data, which essentially makes the test results invalid.

When I look at the results and make an accuracy determination, I usually take the worst result (biggest variance) and use that as the accuracy for the entire test. So if the test chosen from Pass 1 was 140.5 MB/s, and Pass 2 Test 25 was 140.7 MB/s, then that is an accuracy of +/- 0.2 MB/s. But if another repeated test was 143.0 MB/s in one pass and 142.0 MB/s in another, then that indicates an accuracy of +/- 1.0 MB/s, so I say the entire test is +/- 1.0 MB/s.

It takes time for servers to 'settle down', so to speak, and produce accurate results. Modern hard drives have huge caches, and HD controllers often have caches too, all designed to improve short-term performance. System activity may temporarily affect throughput. The longer tests minimize these effects, improving accuracy.

Also, the longer tests simply provide better math. For example, consider a 10 second test versus a 10 minute (600 second) test. 2000 MB moved in 10 seconds = 200 MB/s, and 2060 MB moved in 10 seconds = 206 MB/s. 120,000 MB moved in 600 seconds is also 200 MB/s, but 120,060 MB moved in 600 seconds is 200.1 MB/s. In this example, the variance in both tests was just 60 MB, but the average speed accuracy improved from +/- 6.0 MB/s to +/- 0.1 MB/s, 60 times more accurate. This helps illustrate why the Short Test, which uses a 10 second duration, is not accurate enough for usable results.

Understanding the accuracy of your results is important when trying to determine which result is fastest. If your accuracy is +/- 1.0 MB/s, then for all intents and purposes, 162 MB/s is the same as 163 MB/s, and there's no reason to pick 163 over 162.
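If you want to play with that math yourself, here is the same 10-second vs. 600-second comparison as a couple of lines of bash (just arithmetic; awk handles the floating point):
#!/bin/bash
# Same accuracy illustration as above: a fixed 60 MB measurement wobble
# matters far less when spread across a longer test duration.
mbps() { awk -v mb="$1" -v sec="$2" 'BEGIN { printf "%.1f MB/s", mb/sec }'; }

echo "10s test:  $(mbps 2000 10)  vs  $(mbps 2060 10)"       # 200.0 vs 206.0 -> +/- 6.0 MB/s
echo "600s test: $(mbps 120000 600)  vs  $(mbps 120060 600)" # 200.0 vs 200.1 -> +/- 0.1 MB/s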
  12. Wow, you got some really good speeds for having 4TB drives in the mix - I'm guessing those are 7200 RPM units. Looks like the repeated tests are +/- 2.6 MB/s, maybe more, so there's a lot of variance in your run-to-run results. What that means is that, except for the handful of results in the 140's to low 150's, all of the results are essentially the same. So almost any combo of values would work fine. Also, the Unraid stock settings look marvelous on your server - I would use those, and in the process save yourself over half a gig of RAM.
  13. This might help a little, as the previous results are no longer valid. I'm surprised you got the increase you got without retesting the tunables. Also, your 16h43m run is not bad at all. 133 MB/s is the average speed, which sounds about right for a 5400 RPM 8TB drive, but low for a 7200 RPM 8TB drive. An 8TB 7200 RPM drive will provide over 200 MB/s at the beginning of the disk, gradually falling to around 90 MB/s at the end of the disk. All HDDs do this. Your average speed will be somewhere in the middle, i.e. around 160 MB/s for a 7200 RPM 8TB drive.
  14. Depending upon your motherboard's design, and which slot you have the card installed in, the card may be communicating with the CPU through the southbridge chipset (PCH), which might be the bottleneck. Often the PCH has a smaller pipe to the CPU, which is shared with all southbridge devices, commonly including SATA ports. If the southbridge connection is the limiting factor, then even moving a drive from the H310 to a motherboard SATA port might not make any difference if the motherboard SATA port is also going through the southbridge.

I followed the link you provided to your current Unraid server (hopefully it is still current) and downloaded the manual for your mainboard. I see that there is one x16 slot (electrically x8) and two x8 slots (one electrically x8 and the other electrically x4). It looks like the two electrically x8 slots both connect directly to the CPU, so as long as you are using either of those, I think you would be okay. The electrically x4 slot, furthest from the CPU, connects to the PCH - you should not be using this one.

Looking at the system block diagram, I see that all 6 SATA ports are connected through the PCH. If the PCH is the bottleneck, and if you have the H310 correctly installed in one of the two x8 PCIe slots connected to the CPU, then moving a drive from the H310 to the motherboard may actually further slow down speeds. In that case, you may want to try the opposite, and move an array drive from the motherboard to the H310, so that you have 8 drives connected directly to the CPU and only 4 drives connected through the PCH.

Lastly, the PCH connects to the CPU via a DMI v2.0 x4 link, which is good for 2GB/s. That should be more than sufficient for 4 array drives (I'm not counting your cache), but if you have the H310 installed in the PCH-connected PCIe slot, then you have 11 drives going over this link. 11 drives * 130 MB/s * 1.36 overhead = 1945 MB/s, which is suspiciously close to the 2000 MB/s limit of the DMI connection between the PCH and the CPU.
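If you want to verify which link the H310 actually negotiated, lspci can show it directly. This is just a generic check, assuming the card shows up under the LSI/Broadcom vendor ID 1000 (the H310 is SAS2008-based); adjust the filter if yours reports differently:
# Locate the LSI HBA, then compare what the card supports vs. what it negotiated.
lspci -d 1000:                                  # list LSI devices and their bus addresses
lspci -vv -d 1000: | grep -E 'LnkCap|LnkSta'
# LnkCap = what the card is capable of (e.g. 5GT/s, Width x8)
# LnkSta = what it actually negotiated in the current slot -- if this shows x4,
#          the slot (or a PCH-connected slot) is limiting the card.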
  15. The speeds seem artificially low. My 3TB 5400 RPM constrained array can hit 140 MB/s, and your 4TB drives should be marginally faster. While 130 MB/s is close, I think you have a bottleneck somewhere.

With 7 drives on your SAS 2008 controller, let's check and see if that could be the culprit. 7 * 130 * 1.36 (this is an easier version of the formula I detailed above) = 1237 MB/s going through your controller. PCIe 1.0 x8 and PCIe 2.0 x4 both support 2000 MB/s, and PCIe 1.0 x4 supports 1000 MB/s. None of that lines up with 1237 MB/s, so it doesn't seem like this is a PCIe bus related constraint. That doesn't rule out the SAS 2008 controller, though - maybe it is just slow... Perhaps you have something about your build that doesn't show up in the report. Expanders? Maybe when using all of the SATA ports on your motherboard (sdb, sdc, sdd, sde) you are hitting some kind of bus limit? 4 * 130 * 1.36 = 707 MB/s, which again doesn't really seem like a common bus limit. I think you should try @jbartlett's DiskSpeed testing tool.

Other thoughts: You have one of those servers that doesn't seem to react to changing the Unraid disk tunables. Except in extreme edge cases, you get basically the same speed no matter what. On the repeated tests, most seem to be within +/- 0.9 MB/s, which is a fairly large variation, and for that reason your fastest measured speed of 129.7 is essentially the same as anything else hitting 127+ MB/s. Also, on at least one repeated test (Pass 1_Low Test 2 @ Thresh 120 = 127.8, and Pass 2 Test 1 = 116.6), the speed variation was 11.2 MB/s, which is huge. Perhaps you had some process/PC accessing the array during one of those, bringing down the score. For that reason, I say pretty much every test result was essentially identical, and you probably won't notice much of any difference between any values. There's certainly no harm in using the Fastest values, as the memory utilization is so low there's no reason for you to chase more efficiency.

Keep in mind that if you use jbartlett's DiskSpeed test, find the bottleneck, and make changes to fix it, you would want to rerun UTT to see if the Fastest settings change.
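As a quick sanity check before (or alongside) DiskSpeed, you could also time a raw sequential read on each drive with hdparm. This is a rough sketch, assuming hdparm is available on your box and you substitute your own device list; it only reads, but run it while the array is otherwise idle:
#!/bin/bash
# Quick sequential-read check of each array disk (example device list -- edit to match yours).
for dev in /dev/sd{b,c,d,e,f,g,h,i}; do
    echo "== $dev =="
    hdparm -t "$dev" | grep 'Timing'   # prints the buffered disk read rate for each drive
done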
  16. I just checked in NerdPack too, and it looks like I have an update available. The version/name is a bit odd: on the Slackware repository, the version is screen-4.6.2-x86_64-2.txz, but this one in NerdPack has an 's', 4.6.2s. Not sure if that means anything special...
  17. root@Tower:/boot/utt# screen -version
Screen version 4.06.01 (GNU) 10-Jul-17

I found this on my server in the NerdPack packages folder, so definitely the 64-bit version. Since it is already downloaded by NerdPack, you could just install it from there:
\\<servername>\flash\config\plugins\NerdPack\packages\6.6\screen-4.6.1s-x86_64-1.txz
  18. Any chance that the screen problem is because we are installing a 32-bit version on a 64-bit server? Anyone know? Here's a URL to the 64-bit version: https://mirrors.slackware.com/slackware/slackware64-current/slackware64/ap/screen-4.6.2-x86_64-2.txz
  19. What was the output from step 3) Install screen: upgradepkg --install-new screen-4.6.2-i586-2.txz ?
  20. Hmmm. Well, I guess the good news is that you are only getting a couple notifications instead of hundreds, so it seems like it is mostly working. I'm not sure how a couple are slipping through. The one at the end is actually not that surprising, as there can be a delay for Unraid to send out each parity check start/finished notification, and the UTT script might have already removed the block at the end of the script before that notification comes through, so it should almost be expected that the very last parity check finished notification slips through. But the one at the beginning has me stumped, since the block is put into place before any parity checks are started. I see you are on Unraid 6.7.x - perhaps something has changed related to notifications since Unraid 6.6.x. I did all my development on Unraid 6.6.6, and I refuse to use 6.7.x until the numerous SMB and SQLite issues have been resolved.
  21. That sounds artificially low. Agreed. Looking at your test results, I see a couple of things.

First, you have a mixture of drives: 8TB, 6TB and 4TB. This has an impact on max speeds. How? Imagine a foot race with the world's fastest man, Olympic champion Usain Bolt, your local high school's 40m track champion, a 5-year-old boy, and a surprisingly agile 92-year-old grandmother. I know you're thinking Usain will win, but wait... All four runners are on the same team, they are roped together, and the race requirement is that no one gets yanked down to the ground - everyone has to finish standing up. Now it seems a bit more obvious that no matter how fast Usain is, he and his teammates basically have to walk alongside the 92-year-old grandmother, who is setting the pace for the race.

This is how parity checks work on Unraid. In my server, my 3TB 5400 RPM drives are the slowest, so they set the pace at 140 MB/s, even though my 8TB 7200 RPM drives can easily exceed 200 MB/s on their own. I'm not sure which drives are slowest in your system; your 4TB drives look like 7200 RPM units, so it might be the 6TB drives. But even though your drive mixture is slowing you down some, even your slowest drive should be good for 150+ MB/s. So something else is slowing your server down.

To determine what that bottleneck is, math is your friend. I see that you have 16 drives connected to your SAS2116 PCI-Express Fusion-MPT SAS-2 controller. To understand what kind of bandwidth that controller is seeing, simply multiply the max speed by the number of drives:

16 drives * 89.2 MB/s = 1,427 MB/s

But that is just the drive data throughput. SATA drives use an 8b/10b encoding which has a 20% throughput overhead, so your realized bandwidth is only 80% of what the controller is seeing. So we need to add the overhead back into that number:

1427 MB/s / 0.80 = 1784 MB/s

We also need to factor in the PCI-Express overhead. While the 8b/10b protocol overhead in PCIe v1 and v2 is already factored into those speeds, there are additional overheads like TLP that further reduce the published speeds. You might only get at most 92% of published PCIe bandwidth numbers, possibly less:

1784 MB/s / 0.92 = 1939 MB/s being handled by your PCI-Express slot.

1939 MB/s is a very interesting number, as it is very close to 2000 MB/s, which is equivalent to PCIe v1.0 x8 and PCIe v2.0 x4. So, long classroom lecture short, most likely what is happening is that your SAS controller is connecting to your system at PCIe 1.0 x8 or PCIe 2.0 x4. I'm not certain what controller you have, but based upon the driver I think the card has a PCIe 2.0 x8 max connection speed, which should be good for double what you are getting (perhaps around 182 MB/s for 16 drives). So you probably have the controller plugged into the wrong slot. On many motherboards, some of the x16 slots are only wired for x4, so while your PCIe 2.0 x8 card would fit in the x16 slot, the speed gets reduced to half-speed, PCIe 2.0 x4. Alternatively, you might have a really old system that only supports PCIe 1.0, which again would cut your speeds in half. Your signature doesn't specify your exact hardware, so I don't know which it would be.

One last tip: if you are doing Windows VMs with passthrough graphics, and you are putting your graphics card in the fastest PCIe slot hoping for max speed - that probably isn't needed. I did some testing a couple years back, putting the video card in PCIe 3.0 x16 and PCIe 3.0 x4 slots, and in 3DMark the score was nearly the same. I know all the hardware review websites like to make a big deal about PCIe bandwidth and video cards, but the reality is that for gaming it really doesn't make much of a difference. On the other hand, 16 fast hard drives can easily saturate a PCIe 2.0 x8 connection, so it is very important to put your HD controller in the fastest available slot.

Paul
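If anyone wants to redo that estimate with their own numbers, the whole calculation is a few lines of bash. The 0.80 SATA encoding factor and the ~92% PCIe efficiency are the same rough assumptions used above:
#!/bin/bash
# Rough controller-bandwidth estimate: drives x observed MB/s, then add back
# the SATA 8b/10b overhead (/0.80) and an assumed ~92% PCIe efficiency (/0.92).
DRIVES=16
SPEED=89.2   # observed parity-check MB/s

awk -v n="$DRIVES" -v s="$SPEED" 'BEGIN {
    raw  = n * s
    sata = raw / 0.80
    pcie = sata / 0.92
    printf "Drive data:        %.0f MB/s\n", raw
    printf "After SATA OH:     %.0f MB/s\n", sata
    printf "After PCIe OH:     %.0f MB/s  (compare to ~2000 MB/s for PCIe 1.0 x8 / 2.0 x4)\n", pcie
}'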
  22. UTT v4.x also sends out a test begin and a test end notification, instead of the hundreds of notifications you would get without the block. Any chance you're confusing the UTT notifications with the Unraid parity check notifications?
  23. Sorry, I should have tried the link that StevenD provided. I didn't realize it wasn't a direct link to the file, but rather to a web page from where you can start a download. This URL should work: http://mirrors.slackware.com/slackware/slackware-current/slackware/ap/screen-4.6.2-i586-2.txz I'll update my post above too.
  24. To expand on StevenD's answer:

1) Change into your UTT directory: cd /boot/utt
2) Download screen: wget http://mirrors.slackware.com/slackware/slackware-current/slackware/ap/screen-4.6.2-i586-2.txz
3) Install screen: upgradepkg --install-new screen-4.6.2-i586-2.txz
4) Run screen: screen

NOTE: You should only have to download screen once; you can do this from your Windows PC and save it to your \\<servername>\flash\utt directory, or via the wget command line above. Each time you reboot, screen is no longer installed, as Unraid boots from a static image, so you would still need to do steps 1, 3 & 4, but you can skip step 2 since you already downloaded it.
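One optional shortcut, not part of the steps above but just a sketch of a common approach: since /boot is the flash drive and persists across reboots, you can have Unraid reinstall screen automatically at boot by adding a line to the go file:
# Append to /boot/config/go (runs at every boot) after the package has been
# saved to /boot/utt; filename matches the 32-bit package from the steps above.
upgradepkg --install-new /boot/utt/screen-4.6.2-i586-2.txz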
  25. Yeah, you want to stop any and all access of your shares during the test, from any and all sources.