Everything posted by Pauven

  1. I do want to give a special credit to @itimpi for this solution. He showed me how to safely block these notifications. Also, my original concept was to block them for 24 hours (since a full-scale UTT test cycle can be around 20 hours), and he challenged me on this, and came up with the awesome methodology to block them for only a minute. Paul
  2. The end-user notification system just controls the emails and on-screen popups informing you of events. Whether or not you receive these notifications, the underlying events still occur. The solution I've implemented defaults to allowing all notifications; if a flag file is present, it omits notifications for 'Unraid Parity check' events only, and only for a maximum of 60 seconds after the timestamp of the flag file, before reverting to allowing all notifications. But just because you aren't notified that a parity check is running doesn't mean Unraid can't run one. After all, the Unraid Tunables Tester starts and stops over a hundred partial parity checks while the parity check notifications are blocked. UTT actually has to update the flag file hundreds of times during a test (right before every parity check start or stop) to set the current timestamp, in order to block the Unraid notification that a parity check has started or stopped.

Whether the notifications are shown or blocked, normal system events, like Unraid starting a parity check after a power failure, are simply not affected. I would also expect that if there were a power failure a split second after the flag file was updated to the current timestamp, the time for power to return, for the server to reboot, and for Unraid to automatically begin a parity check would exceed 1 minute, so not only would the auto parity check begin, but the flag file would have expired and you would receive the notifications too. Oh, and no, I haven't tested a power failure on my server. Never have and never will.

Sorry for the long answer. I presume many users may be concerned over the prospect of having any notifications blocked, so I thought it best to explain in a bit more detail how the safeguards work. I will also be making the notification blocking feature optional in UTT, so if a user is uncomfortable with this feature they can avoid it. Paul
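The expiring flag-file safeguard described above can be sketched in Bash roughly like this. The path and function names are my own illustration, not UTT's actual code, and it assumes GNU `stat`/`date`/`touch` as found on Unraid's Slackware base:

```shell
#!/bin/bash
# Hypothetical sketch of an expiring flag file that mutes notifications.
# A notification is suppressed only if the flag exists AND was touched
# within the last 60 seconds; a stale or missing flag allows everything.

FLAG=/tmp/utt_mute_parity_notifications   # illustrative path, not UTT's

should_suppress() {
    [ -f "$FLAG" ] || return 1                          # no flag: allow
    local age=$(( $(date +%s) - $(stat -c %Y "$FLAG") ))
    [ "$age" -le 60 ]                                   # fresh flag: suppress
}

touch "$FLAG"    # the tester would refresh this right before each start/stop
if should_suppress; then
    echo "parity check notification suppressed"
else
    echo "notification allowed"
fi
```

Because the check compares against the flag's modification time rather than its mere existence, an aborted test run can never mute notifications for more than a minute.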
  3. Thought I would give a status update on the development of UTT v4. @itimpi provided some great information, and I now have parity check notifications blocked for the duration of the tests. The blocking function has safeguards built in, so even if the script is aborted, parity check notifications are unblocked within one minute (I'm doing this with a flag file that expires after 1 minute). I'm also preventing most of the parity checks from being logged in the parity check history. I say most, because Unraid forcibly rewrites the status of the very last parity check to the log if you remove it. Still, much better to have just a single entry instead of hundreds.

This next beta is pretty close to a full rewrite, at least compared to the last v2.2 for Unraid v5. Tons of new functionality. UTT v2.2 essentially used a one-dimensional array of test values - that's all that was needed. For Unraid v6 and the new md_sync_thresh and nr_requests, the test results are now being logged in a pseudo-three-dimensional array. This is much more complicated, but I finally have it working. I'm still fleshing out some of the new tests and options, and hope to have something for public release next week. Fingers crossed that all this new logic will actually provide accurate tuning parameters for all types of machines... Paul
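Since Bash has no native multi-dimensional arrays, a pseudo-three-dimensional results table can be emulated with an associative array keyed on a composite index. This is only a sketch of the general technique (the names and values are made up, not UTT's internals), and it requires Bash 4+:

```shell
#!/bin/bash
# Emulate results[nr_requests][num_stripes][sync_thresh] with a single
# associative array keyed "i,j,k" (requires Bash 4+).
declare -A speed

record() {   # record <nr_requests> <num_stripes> <sync_thresh> <MB/s>
    speed["$1,$2,$3"]=$4
}

record 8   1280 192  141
record 8   1280 640  152
record 128 4096 2048 149

# Look up a single "cell" of the 3-D table:
echo "nr_requests=8 num_stripes=1280 sync_thresh=640 -> ${speed[8,1280,640]} MB/s"
```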
  4. The new logic is performing over 100 parity check start/stops. Nothing really new there, but with Unraid v6 there is the adverse effect that these actions are logged in the parity check history, and you get separate on-screen notifications for each of these events. Does anyone know of any way to temporarily disable the logging of these events to the parity check history, and/or temporarily disable the on-screen notifications? If you don't know the answer, but know who might, can you help bring them into this conversation? Paul
  5. Well, good job guys. The conversation has prompted me to find and review my testing documentation, which includes my strategy for the next test routine. And it just so happens that at this rare moment, I find myself with a bit of free time, and craving working on something different for a change of pace. So I think I'm going to try implementing the new testing strategy. I'll also take a look at Xaero's revision to see what best practices I need to apply.
  6. Almost 36 hours is indeed too long. My mixture of 5400 RPM 3TB & 4TB drives and 7200 RPM 8TB drives finishes in 18.5 hours. A pure 7200 RPM 8TB setup should complete in under 16.5 hours, and even slower 5400 RPM 8TB drives should finish in under 22 hours. I'm not sure that your math accounts for typical communication protocol overhead, nor for the inefficiency inherent in port expanders. I'm happy you did find a 10 MB/s improvement. This is exactly the reason I avoided port expanders. 60-70 MB/s sounds close to what I would expect.

This is absolutely correct, though I think many users struggle to understand it without a visual. This chart is from an HDD review, showing throughput speed (MB/s) over the course of the disk. The lime green line starts off high, around 200 MB/s, at the beginning (outside edge of the disk platter), then tapers off to around 95 MB/s at the end of the disk (inside edge of the platter). This is just a random sample and doesn't necessarily represent your drives; it's only meant to show the concept of what is going on. The average speed of the drive (and of the resulting parity check) would be around 155 MB/s, not the 200 MB/s peak. The Unraid Tunables Tester only tests the very beginning of the drive (i.e. the first 5-10%, where speeds are the highest), which is well above the average speed of the entire drive, beginning to end.

On this chart I drew three dashed lines. The green line at the top represents Unraid Tunables set to a value that doesn't limit performance at all - this is what we are trying to achieve with the Unraid Tunables Tester. The yellow line in the middle represents how a typical system (one without any real performance issue) might perform with stock Unraid Tunables. Notice that while peak performance is reduced from 200 MB/s to perhaps around 190 MB/s, this slight reduction only applies to the first 17% of the drive, beyond which performance is no longer limited. A 5% speed reduction for 17% of the drive reduces average throughput (for the entire drive) by less than 1%, so fixing this issue might only increase average throughput by 1-2 MB/s. Sure, it's an improvement, but a very small one. The red line at the bottom represents how some controllers have major performance issues with the stock Unraid Tunables - like my controller. In this case, the throughput is so constrained that over 90% of the drive performs slower than it is capable of. Fixing the Tunables on my system unleashes huge performance gains.

Hopefully that helps show why most systems see extremely little improvement from adjusting the Unraid Tunables - these systems are already performing so close to optimum that any speed increase will hardly make a dent in parity check times. It's only the systems that are misbehaving that truly benefit. Paul
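The back-of-envelope arithmetic behind that "less than 1%" claim, using the illustrative numbers above (a 5% slowdown over the first 17% of the disk):

```shell
# Weighted average of relative throughput across the disk:
# 17% of the disk at 95% speed, the remaining 83% at full speed.
awk 'BEGIN {
    avg = 0.17 * 0.95 + 0.83 * 1.00
    printf "relative average throughput: %.4f (a %.2f%% loss)\n", avg, (1 - avg) * 100
}'
# prints: relative average throughput: 0.9915 (a 0.85% loss)
```

On a drive averaging around 155 MB/s, a 0.85% loss works out to roughly 1.3 MB/s, which matches the 1-2 MB/s figure above.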
  7. While your comments are very interesting, they are really outside the scope of the tunables tester. Going back to the car analogy, the tunables tester is trying to identify the best gas and oil to use to get the manufacturer's rated horsepower. You bought a car with 300 horsepower, but for some reason you're only getting 150 hp, so we test all the variables to find out why you're not getting what you bought. Perhaps your car needs 93 octane and 5W-40, but you've been running 87 octane and 10W-30. Or maybe you need a tune-up with new spark plugs. Simple fixes to eliminate performance issues, and each car might have slightly different issues. What you're talking about is swapping out camshafts, porting and polishing, upgrading the fuel system, and adding NOS - trying to get the maximum amount of power from the engine and hoping it doesn't blow. The tunables tester should be limited to testing the user-settable tuning values that Limetech provides access to in the GUI. Hence the name - Unraid Tunables Tester. Everything else that you're talking about sounds like a discussion you should be having with Limetech, or like developing brand new tuning tools way beyond the scope of the tunables tester.

Years ago, when I upgraded from v4.x to v5.x, my parity check times immediately increased from 6.5 hours to 12+ hours. I was fine with 6.5 hours, but knew there was a problem with the nearly doubled parity check times. Thus the tunables tester was born, and I was able to get back to my 6.5 hour parity checks. My system does everything I ask of it, and while it's certainly not the fastest server in the world, performance is fine and I can watch movies while a parity check is running with no stuttering. I can simultaneously run 4 Windows Server 2016 VMs too. Not sure what else I could ask for.

I'm the type of person who is not interested in modifying hidden tuning parameters to eke out a bit of extra performance beyond what stock Unraid offers, and I would also be slow to upgrade to a new Unraid version that has made major changes to the I/O scheduler. My 67 TB of data, and my time not spent doing data recovery, are much more important to me than a bit more performance that I probably won't even notice. I'm still running 6.6.6 because I've seen things in 6.6.7 and 6.7.x that have me waiting for a more robust update.

Don't get me wrong, I think your ideas are great and I hope you pursue them. If you can get Limetech to implement these enhancements in core Unraid, that would benefit us all, and after a few months of letting everyone else beta test it for me, I would happily upgrade for the free performance boost. But elevating Unraid to a higher level of performance is not what the tunables tester is about - it's about fixing problematic settings that are dramatically hurting performance on certain machines.
  8. While the testing ideology isn't broken, due to the v6.x changes in tunables the script no longer provides a complete picture. Going from memory: nr_requests is a new tunable that affects performance, and I had planned to include it in my v6-compatible beta; in testing we saw some interesting results around 4 on some machines. And while md_write_limit is gone in v6, we have a new md_sync_thresh that needs to be tested. So instead of 3 tunables, with v6 there are really at least 4 - which only further complicates testing. You can find them on the Disk Settings page. Note that some of them wouldn't be part of this type of testing; md_write_method, for example, is beyond the scope of what this tunables tester is doing (and is irrelevant for parity checks anyway). I vaguely recall doing some type of testing with NCQ, but I don't think I ever automated that. I think that was more of a manual effort: test with it both on and off, and see what works for your machine. poll_attributes, I think, is a newer tunable added in a fairly recent version - I don't think it existed back when I was working on my v6.x beta tunables tester. I don't even know what it does.

Believe it or not, the v5 script does a lot of that. At runtime it provides multiple test types to choose from, and the quicker tests do exactly as you describe. All of the tests try to skip around, hunting for a value region that performs better, then focusing in on that region and testing more values. Older versions of the tunables tester made big jumps and tested much faster, but as I refined the tool I added more detailed testing that didn't skip around as much, because I've also found, from examining dozens upon dozens of user-submitted results from different Unraid builds, that it is a mistake to make any major assumptions about what values work best. There are some machines that get faster with smaller values (below 512). Performance is not always best at powers-of-two intervals - for some machines yes, but not all.

These tunables seem to control how Unraid communicates with the drive controller. And there are dozens of different drive controllers, each with unique behaviors and tuning requirements. Add in that many users have "Frankenstein" builds using different combinations of controller cards (some from the CPU/northbridge, some from the chipset/southbridge, and the rest from sometimes mismatched controller cards), and what you end up with is an entirely unpredictable set of tuning parameters to make that type of machine perform well. While I don't disagree with your sentiment, making a tunables tester that works equally well on every type of build in the real world doesn't align very well with expedited testing that skips around too much - what works great on one machine doesn't work at all on another. It was very frustrating trying to identify a testing pattern that worked well on any and all machines.

To me, the big picture is that it's better to spend 8 hours identifying a set of parameters that provide a real-world benefit, rather than wasting 1 hour to come up with parameters that aren't really that great. It's not like you run this thing every day - for most users it is run once and then never again. Trying to save time on something you run once, at the cost of accuracy, isn't ideal. Paul
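The "skip around, then focus in" approach described above can be sketched as a coarse-to-fine scan. Everything here is illustrative: the `bench` function is a made-up stand-in for timing a short parity-check sample, not UTT's real measurement, and the candidate values are arbitrary:

```shell
#!/bin/bash
# Coarse-to-fine parameter search sketch (illustrative, not UTT's code).
# bench() fakes a speed curve (MB/s) that peaks near a value of 1400.
bench() { echo $(( 200 - (($1 - 1400) * ($1 - 1400)) / 20000 )); }

best=0; best_val=0
for v in 512 1024 2048 4096 8192; do        # coarse pass: widely spaced values
    s=$(bench "$v")
    if [ "$s" -gt "$best" ]; then best=$s; best_val=$v; fi
done

lo=$(( best_val / 2 )); hi=$(( best_val * 2 ))   # fine pass: zoom in around
step=$(( (hi - lo) / 8 ))                        # the coarse winner
for (( v = lo; v <= hi; v += step )); do
    s=$(bench "$v")
    if [ "$s" -gt "$best" ]; then best=$s; best_val=$v; fi
done

echo "best value: $best_val ($best MB/s)"   # prints: best value: 1280 (200 MB/s)
```

The trade-off the post describes is visible even in this toy: the coarse pass alone would have settled on 1024, and only the finer scan finds the better region - and a real controller's speed curve is far less smooth than this fake one.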
  9. To add to jonathanm's answer, the script starts a series of partial, read-only, non-correcting parity checks, each with slightly tweaked parameters, and logs the performance of each combination of settings. Essentially, it is measuring the peak performance of reading from all disks simultaneously, and showing how that can be tweaked to improve peak performance. Improving peak performance is not the same thing as improving the time of a full parity check, as your parity check only spends a few minutes at the beginning of the drives, where peak performance has an impact; performance gradually tapers off from the beginning of your drive to the end. If your peak performance was abnormally slow (i.e. 50 MB/s), then that would affect a much larger percentage of the parity check, and improving it to 150 MB/s would make a huge improvement in parity check times. But increasing from 164 MB/s to 173 MB/s won't make much of a difference, since you were already close to max performance, and that small increase will only affect perhaps the first few percent of the drive space. In a similar way, I could improve the aerodynamics on my car to increase its top speed from 164 MPH to 173 MPH, but that won't help my work commute, where I'm limited to speeds below 65 MPH. But if for some reason my car couldn't go faster than 50 MPH, any increase at all would help my commute time.

There are a handful of drive controllers (like the one in my sig) that suffer extremely slow parity check speeds with stock Unraid settings, so I see a huge performance increase from tweaking the tunables. There is also some evidence that tweaking these tunables can help with multi-tasking (i.e. streaming a movie without stuttering during a parity check), and for some users this seems to be true. I know there are some users who have concerns that maximizing parity check speed takes away bandwidth for streaming, though I don't think we ever actually saw evidence of this.

That's a shame, as that is really what is needed to make this script compatible with 6.x. LT changed the tunables from 5.x to 6.x, and the original script needs updating to work properly with the 6.x tunables. Fixing a few obsolete code segments to make it run without errors on 6.x doesn't mean you will get usable results on 6.x. I had created a beta version for Unraid 6.x a while back, but testing showed it was not producing usable results. I documented a new tunables testing strategy based on those results, but never did get around to implementing it. It seems that finding good settings on 6.x is harder than it was for 5.x - possibly because 6.x just runs better and there are fewer issues to be resolved. I still have my documented changes for the next version around here somewhere...

That's another shame. It seems like you know what you're doing, more so than I do with regards to Linux. I'm a Windows developer, and my limited experience with Linux and Bash (that's what this script is, right?) is this script. For me to pick it up again, I have to essentially re-learn everything. I keep thinking a much stronger developer than I will pick this up someday. I'm not trying to convince users not to use this tool, and I certainly appreciate someone trying to keep it alive, but I did want to clarify that the logic needs improvement for Unraid 6.x, and you may not get accurate recommendations from this Unraid 5.x tunables tester. Paul
  10. What's the current status with 6.7.0 RC4? Are there still stability issues, or has the situation improved?
  11. At least they are still actively troubleshooting it after 17 months. [he says almost convincingly... clinging to hope]
  12. Simply awesome, thanks for doing this!
  13. In case anyone needs it, here's a link to the original discussion on the Ryzen lock issue: Most of the troubleshooting comes before that entry. Some notes: I never got a log entry, ever, in all my crashes - it just crashes too fast. On a few occasions I was able to see a crash dump on the console screen, but that was rare, and though I shared photos of it, no one was able to determine anything from it. I also configured my system to boot into Windows by default, then manually started Unraid - that way, when it crashed, the next boot was into Windows, and it was there that Windows reported the Machine Check Exceptions (MCEs) in the logs. These seem to get stored in the BIOS and reported to the system on the next boot, though Unraid doesn't show this info.
  14. Do NOT disable C6. I've recently seen this advice floating around the forums, and even saw it in someone's video. Not only will disabling C6 not help, it actually makes it worse. How do I know? I'm the guy who originally identified the solution nearly 2 years ago. I meticulously tested every BIOS setting, figuring out that disabling "Global C-state Control" is the solution. I even have a link to this in my signature (though for some reason our sigs don't show here in the bug report section). Disabling C6 is not the same thing as disabling Global C-state Control. Now, all that said, it seems there's something going on with 6.7 that even Global C-state Control isn't helping.
  15. I can't see the gallery you created, or any galleries at all. Is this section of the forum invite only?
  16. That's a great idea. That way it's either half stretched or half squished, so it will look more normal. I went ahead and created a 1080p version, and even stretched to 4K it looks okay, so for this one I'll just stick with 1080p. Since we seemed to be missing some AMD love in here:
  17. That's really disappointing. I keep hoping ASRock will add that feature to my BIOS, or maybe I'll get lucky and find it hiding somewhere I hadn't looked. But if it doesn't even work.... then I guess I don't need to email ASRock support to ask them to add this setting. What's your motherboard?
  18. I don't GUI boot, and rarely view the GUI on a mobile device (rare enough that I don't care how it looks there). 90% of the time, I'm viewing in Chrome or Firefox on a 4K monitor. Sometimes I'm viewing full screen, and sometimes I pin the browser to one half of the screen. So effectively I'm viewing at both 3840 and 1920 on the same monitor. Occasionally I view on my 1080p TV. To test, I just created a 4K banner (3840 x 90) and it looks perfect at fullscreen, but when I shrink the browser to half the screen width the banner gets smooshed. I also noticed that due to the smooshing/stretching, the server info at the right and Unraid logo at left cover different portions of the banner depending upon how smooshed/stretched the banner is. The banner I created is very dark, and needs a lighter box at both ends to make the text overlays readable, but the size of the necessary box changes depending upon the smoosh/stretch factor. Would be nice to be able to adjust the transparency of the server info background box instead - is that possible? It is too transparent. I'm still on 6.6.6 - not sure if 6.7 changes any of this behavior. Paul
  19. I've got a mixture of 1080p and 4K monitors. Is there a solution to one banner that displays correctly on 4K and 1080p?
  20. Those are definitely not the ones you want to touch! AMD BIOSes are typically very complex, with settings hiding where you wouldn't think to look. Try poking around, as Global C-state Control is a pretty universal option. Paul
  21. @TechnoBabble28, I don't think you disabled the correct setting. You do not want to adjust C6, as it will make the problem worse. Leave it at Auto. Instead, you need to disable "Global C-state Control", which is typically located here: Advanced --> AMD CBS --> Zen Common Options --> Global C-state Control Tom, sorry I'm not going to try 6.7.0-RC1, I need stability in my life right now. Paul
  22. I noticed from your screenshot that you are running a Ryzen 1700. These Ryzens have been famous for hard locks since day 1. I believe the core problem was traced down to idle power use being so low that it causes a mismatch between different power rails on many power supplies, causing things to get wonky until the system locks up. There have been multiple fixes, ranging from disabling C-states to Unraid config file parameters, with the best solution being enabling the power supply compensation setting in your BIOS (if your BIOS has it - mine does not). I am currently using the settings in my config file. Typically these Ryzen machines lock up only when idle, as that's when they can enter the low-power states that lead to the problem. Your usage matches this, and your time-frame to lockup is also very indicative of this issue. I have noticed that different Unraid releases seem more or less susceptible to this issue. Perhaps the latest 6.7 release with 4.19 is more susceptible - maybe the new code is so efficient it is idling better. The fixes for this are outside the scope of this thread; I just wanted to make sure you were aware - sorry if you already knew all this. With a fix in place, the hard lock is completely resolved. Of course, this may not be your issue at all, and like Tom said, you need to open a bug report. Paul
  23. I gotcha beat. I live in a gated community, and you have to dial 666 at the gate to ring us to let you in. Needless to say, we don't get many visitors. 😈