Posts posted by Pauven

  1. On 7/20/2019 at 10:42 PM, Xaero said:

    For me, it takes just shy of 1.5 days to complete a parity check on 24 8TB drives, totalling 176TB, with dual parity bringing the raw platters to 192TB.
    Obviously I have roughly 3 times your surface area. Which should come out to around 18-20 hours of parity check time, assuming similar speeds through and through.

     

    Almost 36 hours is indeed too long.  My mixture of 5400 RPM 3TB & 4TB drives, and 7200 RPM 8TB drives finishes in 18.5 hours.  A pure 7200 RPM 8TB setup should complete in under 16.5 hours.  Even slower 5400 RPM 8TB drives should finish in under 22 hours.

     

    On 7/20/2019 at 10:42 PM, Xaero said:

    Thanks to a port expander bottleneck ( Math found the answer - 2 * 4 links = 8 links * 3Gb/s = 24Gb/s / 24 drives = 1Gb/s per drive = ~125MB/s, give or take, maximum theoretical bi-directional. ) that is not the case.
    But even knowing that, my speeds are still slower than they should be - around 60MB/s for the majority of the parity check. Something else seems fishy here. Running the script yielded a ~10MB/s improvement for me, so I'll take what I can get until I can afford to toss money at the real problem.

     

    I'm not sure your math accounts for typical communication protocol overhead, or for the inefficiency inherent in port expanders.  I'm happy you did find a 10MB/s improvement.  This is exactly the reason I avoided port expanders.  60-70 MB/s sounds close to what I would expect.
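
    For anyone who wants to redo that math with overhead included, here's a rough back-of-the-envelope version (the big one is SATA II's 8b/10b encoding, which spends 20% of the raw line rate; expander and protocol overhead eat a bit more on top):

    # SATA II runs 3.0 Gb/s on the wire, but 8b/10b encoding spends 10 bits
    # per byte, so each link tops out near 300 MB/s of actual payload:
    echo "scale=0; 3000/10" | bc              # ~300 MB/s per link
    # 8 links shared across 24 drives, before any expander overhead:
    echo "scale=0; (8*3000/10)/24" | bc       # ~100 MB/s per drive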

     

    2 hours ago, tmchow said:

    After running this tunables script last night, it looks like I'm barely going to get any perf improvements, but the stated speeds (both Bang for buck and unthrottled) are way faster than what my parity speeds run at.   What's the reason for the disparity?

     

    2 hours ago, Squid said:

    The parity check times are an average, and ultimately include the much slower reads from the outer cylinders of the drives.  The tunable only runs for x number of minutes, and effectively only tests the fastest part of the drives.

     

    This is absolutely correct.  Though I think many users struggle to understand this without a visual:

     

    [Chart from an HDD review: throughput (MB/s) across the full surface of the disk]

     

    This chart is from an HDD review, showing throughput speed (MB/s) over the course of the disk.  The lime green line starts off high, around 200 MB/s, at the beginning (outside edge of the disk platter), then tapers off to around 95 MB/s at the end of the disk (inside edge of the platter).  This is just a random sample and doesn't necessarily represent your drives; it's only meant to show the concept of what is going on.

     

    The average speed of the drive (and the resulting Parity Check) would be around 155 MB/s, not the 200 MB/s peak.

     

    The Unraid Tunables Tester only tests the very beginning of the drive (i.e. the first 5-10%, where speeds are the highest).  This is way above the average speed of an entire drive, beginning to end.

     

    On this chart I drew three dashed lines. 

     

    The green line at the top represents Unraid Tunables set to a value that doesn't limit performance at all.  This is what we are trying to achieve with the Unraid Tunables Tester.

     

    The yellow line in the middle represents how a typical system (one without any real performance issue) might perform with stock Unraid Tunables.  Notice that while peak performance is reduced from 200 MB/s to perhaps around 190 MB/s, this slight reduction is only for the first 17% of the drive, beyond which the performance is no longer limited.  A 5% speed reduction for 17% of the drive only reduces average throughput (for the entire drive) by less than 1%, so fixing this issue might only increase average throughput for the entire drive by 1-2 MB/s.  Sure, it's an improvement, but a very small one.
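
    You can sanity-check that weighted average yourself - these numbers are just the hypothetical ones from the example above:

    # 17% of the drive at 95% speed, the remaining 83% unthrottled:
    echo "scale=4; 0.17*0.95 + 0.83*1.00" | bc    # 0.9915, i.e. <1% average loss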

     

    The red line at the bottom represents how some controllers have major performance issues when using the stock Unraid Tunables - like my controller.  In this case, the throughput is so constrained, over 90% of the drive performs slower than it is capable of performing.  Fixing the Tunables on my system unleashes huge performance gains.

     

    Hopefully that helps show why most systems see very little improvement from adjusting the Unraid Tunables - these systems are already performing so close to optimum that any speed increase will hardly make a dent in parity check times.  It's only the systems that are misbehaving that truly benefit.

     

    Paul

    • Upvote 1
  2. While your comments are very interesting, they are really outside the scope of the tunables tester.

     

    Going back to the car analogy, the tunables tester is trying to identify the best gas and oil to use to get the manufacturer's rated horsepower.  You bought a car with 300 horsepower, but for some reason you're only getting 150 hp, so we test all the variables to find out why you're not getting what you bought.  Perhaps your car needs 93 octane and 5w40, but you've been running 87 octane and 10w30.  Or maybe you need a tuneup with new spark plugs.  Simple fixes to eliminate performance issues, and each car might have slightly different issues.

     

    What you're talking about is swapping out camshafts, porting and polishing, upgrading the fuel system, and adding NOS, trying to get the maximum amount of power from the engine and hoping it doesn't blow.

     

    The tunables tester should be limited to testing the user settable tuning values that Limetech provides access to in the GUI.  Hence the name - Unraid Tunables Tester.

     

    Everything else that you're talking about sounds like a discussion you should be having with Limetech, or developing brand new tuning tools way beyond the scope of the tunables tester.

     

    Years ago, when I upgraded from v4.x to v5.x, my parity check times immediately increased from 6.5 hours to 12+ hours.  I was fine with 6.5 hours, but knew there was a problem behind the nearly doubled parity check times.  Thus the tunables tester was born, and I was able to get back to my 6.5 hour parity checks.  My system does everything I ask of it, and while it's certainly not the fastest server in the world, performance is fine and I can watch movies while a parity check is running with no stuttering.  I can simultaneously run 4 Windows Server 2016 VMs too.  Not sure what else I could ask for.

     

    I'm the type of person that is not interested in modifying hidden tuning parameters trying to eke out a bit of extra performance beyond what stock Unraid offers, and I would also be slow to upgrade to a new Unraid version that has made major changes to the I/O scheduler.  My 67 TB of data, and my time not spent doing data recovery, are much more important to me than a bit more performance that I probably won't even notice.  I'm still running 6.6.6 because I've seen things in 6.6.7 and 6.7.x that have me waiting for a more robust update.

     

    Don't get me wrong, I think your ideas are great and I hope you pursue them.  If you can get Limetech to implement these enhancements in core Unraid, that would benefit us all, and after a few months of letting everyone else beta test it for me, I would happily upgrade for the free performance boost.

     

    But elevating Unraid to a higher level of performance is not what the tunables tester is about - it's about fixing problematic settings that are dramatically hurting performance on certain machines.

     

  3. 9 minutes ago, Xaero said:

    Indeed, the point was to get something from the script usable with the existing code base and logic - not to create something new. The testing automation ideology implemented isn't fundamentally broken - just the amount of flexibility is. Ideally, one would create a much more in-depth test pattern for a finalized script for 6.x and Linux kernel versions in the 4.x->5.x family.

     

    While the testing ideology isn't broken, the v6.x changes to the tunables mean the script no longer provides a complete picture.  Going from memory, nr_requests is a new tunable that affects performance; I had planned to include it in my v6-compatible beta, and in testing we saw some interesting results with values around 4 on some machines. 

     

    And while md_write_limit is gone in v6, we have a new md_sync_thresh that needs to be tested.  So instead of 3 tunables, with v6 there are really at least 4 - which only further complicates testing.
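
    For reference, here's roughly how those values can be poked live on v6 - I'm going from memory on the names and syntax, so verify against your own system before trusting it:

    # Apply candidate values on the fly (they revert to the saved disk.cfg
    # settings on reboot) -- syntax from memory, check 'mdcmd status' output:
    mdcmd set md_num_stripes 4096
    mdcmd set md_sync_window 2048
    mdcmd set md_sync_thresh 2000
    # nr_requests is a standard Linux block-queue setting, set per drive:
    for q in /sys/block/sd[b-z]/queue/nr_requests; do echo 128 > "$q"; done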

     

     

    12 minutes ago, Xaero said:

    I actually don't know what tunables are available currently (It's not hard to dump them with mdcmd) or what their direct impact is on various workloads.

     

    You can find them on the Disk Settings page.  Note that some of them wouldn't be part of this type of testing; md_write_method, for example, is beyond the scope of what this tunables tester is doing (and is irrelevant for parity checks anyway).
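
    And as you noted, dumping the current values is easy enough - something like this (field names from memory):

    # Show the md driver's current tunable values:
    mdcmd status | egrep 'md_num_stripes|md_sync_window|md_sync_thresh|md_write_method'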

     

    I vaguely recall doing some type of testing with NCQ - but I don't think I ever automated that.  I think that was more of a manual effort, test with it both on and off, and see what works for your machine.

     

    I think poll_attributes is a newer tunable added in a fairly recent version - I don't think it existed back when I was working on my v6.x beta tunables tester, and I don't even know what it does.

     

    [Screenshot: the tunables section of the Disk Settings page]

     

    18 minutes ago, Xaero said:

    I think, looking at the behavior of this script in the real world, and using a bit of computer science, a lot of the "testing time" can be eliminated. For example, performance always seems to be best at direct power-of-two intervals. We usually see okay performance at 512, with a slight drop off at 640, trending downward until we hit 1024. Knowing this, we can make big jumps upward until we see a decline, and then make smaller jumps downward in value to reach a nominal value with less "work". Obviously this would be great for getting initial rough values quickly. Higher granularity could then be used to optimize further. There's probably some math that could be done with disk allocation logic (stripe, sector, cache size, et al.) for even further optimizing, but that's a pretty large amount of research that needs to be done. 

     

    Believe it or not, the v5 script does a lot of that.  At runtime it provides multiple test types to choose from, and the quicker tests do exactly as you describe.  All of the tests try to skip around, hunting for a value region that performs better, then focusing in on that region and testing more values.  Older versions of the tunables tester made big jumps and tested much faster, but as I refined the tool I added more detailed testing that didn't skip around as much.  From examining dozens upon dozens of user-submitted results from different Unraid builds, I've found that it is a mistake to make any major assumptions about what values work best.  There are some machines that get faster with smaller values (below 512).  Performance is not always best at power-of-two intervals - for some machines yes, but not all. 
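
    To illustrate the hunting idea, here's a bare-bones sketch - not the tester's actual code, and test_speed is a hypothetical stand-in for one timed partial parity check at a given tunable value:

    #!/bin/bash
    # Coarse pass: keep doubling the value while throughput improves.
    best_val=512
    best_spd=$(test_speed $best_val)    # hypothetical helper: prints MB/s
    val=$((best_val * 2))
    while spd=$(test_speed $val); [ "${spd%%.*}" -gt "${best_spd%%.*}" ]; do
        best_val=$val
        best_spd=$spd
        val=$((val * 2))
    done
    # Fine pass: probe smaller steps around the coarse winner.
    for step in 128 64 32; do
        for cand in $((best_val - step)) $((best_val + step)); do
            spd=$(test_speed $cand)
            if [ "${spd%%.*}" -gt "${best_spd%%.*}" ]; then
                best_val=$cand
                best_spd=$spd
            fi
        done
    done
    echo "Best value found: $best_val ($best_spd MB/s)"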

     

    These tunables seem to control how Unraid communicates with the drive controller.  And there are dozens of different drive controllers, each with unique behaviors and tuning requirements.  Add in that many users have "Frankenstein" builds using different combinations of controller cards (some from the CPU/northbridge, some from the chipset/southbridge, and the rest from sometimes mismatched controller cards), and what you end up with is an entirely unpredictable set of tuning parameters to make that type of machine perform well.

     

    While I don't disagree with your sentiment, making a tunables tester that works equally well on every type of build in the real world doesn't align very well with expedited testing that skips around too much - what works great on one machine doesn't work at all on another.  It was very frustrating trying to identify a testing pattern that worked well on any and all machines.  To me, the big picture is that it's better to spend 8 hours identifying a set of parameters that provide a real-world benefit, rather than wasting 1 hour to come up with parameters that aren't really that great.  It's not like you run this thing every day - for most users it is a run once and then never again.  Trying to save time for something you run once, at the cost of accuracy, isn't ideal.

     

    Paul

  4. 4 hours ago, wgstarks said:

    I was surprised to see that this script starts a parity check when you run it. Is this normal?

     

    Edit: Tried canceling the parity check, but it just restarted when the next test started.

     

    3 hours ago, jonathanm said:

    Yes, since the point of this is to optimize the tunables for parity checks.

     

    Side benefit is that it might improve regular performance as well, but it benchmarks parity check times.

     

    To add to jonathanm's answer, the script starts a series of partial, read-only, non-correcting parity checks, each with slightly tweaked parameters, and logs the performance of each combination of settings.  Essentially, it is measuring the peak performance of reading from all disks simultaneously, and showing how that can be tweaked to improve peak performance.
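
    In rough terms, each iteration boils down to something like this - simplified, and the command names are from memory, so treat it as a sketch rather than the script's actual sequence:

    mdcmd set md_sync_window 512       # apply one candidate value
    mdcmd check NOCORRECT              # start a read-only, non-correcting check
    sleep 300                          # run just long enough to measure speed
    mdcmd status | grep -i resync      # sample the progress counters for speed
    mdcmd nocheck                      # cancel the partial check before the next pass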

     

     

    6 hours ago, DanielCoffey said:

    It came in at 17h15 again so it shows that the default settings are at least in the right ballpark. The longer parity checks I had in the past tended to be ones where I had suffered an unclean shutdown or dropped a drive due to cable wiggle and it needed a good look at everything.

     

    Improving peak performance is not the same thing as improving the time of a full parity check.  Your parity check only spends a few minutes at the beginning of the drives, where peak performance has an impact, and performance gradually tapers off from the beginning of your drive to the end.  If your peak performance was abnormally slow (i.e. 50 MB/s), that would affect a much larger percentage of the parity check, and improving it to 150 MB/s would make a huge improvement in parity check times.  But increasing from 164 MB/s to 173 MB/s won't make much of a difference, since you were already close to max performance and that small increase only affects perhaps the first few percent of the drive space.

     

    In a similar way, I could improve aerodynamics on my car to increase top speed from 164 MPH to 173 MPH, but that won't necessarily help my work commute where I'm limited to speeds below 65 MPH.  But if for some reason my car couldn't go faster than 50 MPH, any increase at all would help my commute time.

     

    There are a handful of drive controllers (like the one in my sig) that suffer extremely slow parity check speeds with stock Unraid settings, so I see a huge performance increase from tweaking the tunables.

     

    There is also some evidence that tweaking these tunables can help with multi-tasking (i.e. streaming a movie without stuttering during a parity check), and for some users this seems to be true.  I know there are some users who have concerns that maximizing parity check speed takes away bandwidth for streaming, though I don't think we ever actually saw evidence of this.

     

     

    On 7/19/2019 at 12:11 AM, Xaero said:

    I didn't bother messing with the logic as I wanted the script to retain its original functionality, and just have more longevity. 


    That's a shame, as that is really what is needed to make this script compatible with 6.x.  LT changed the tunables from 5.x to 6.x, and the original script needs updating to work properly with the 6.x tunables.  Fixing a few obsolete code segments to make it run without errors on 6.x doesn't mean you will get usable results on 6.x.

     

    I had created a beta version for Unraid 6.x a while back, but testing showed it was not producing usable results.  I documented a new tunables testing strategy based on those results, but never did get around to implementing it.  It seems that finding good settings on 6.x is harder than it was for 5.x - possibly because 6.x just runs better and there are fewer issues to be resolved. 

     

    I still have my documented changes for the next version around here somewhere...

     

     

    On 7/19/2019 at 12:11 AM, Xaero said:

    I don't plan on adopting this project, just don't like seeing hard work lost or error messages on my terminal windows.

     

    That's another shame.  Seems like you know what you're doing, more so than I do with regard to Linux.  I'm a Windows developer, and my limited experience with Linux and Bash (that's what this script is, right?) is this script.  For me to pick it up again, I'd have to essentially re-learn everything.  I keep thinking a much stronger developer than I will pick this up someday.

     

     

    I'm not trying to convince users not to use this tool, and I certainly appreciate someone trying to keep it alive, but I did want to clarify that the logic needs improvement for Unraid 6.x, and you may not get accurate recommendations with this Unraid 5.x tunables tester.

     

    Paul

    • Like 1
    • Upvote 1
  5. 31 minutes ago, Frank1940 said:

    Try creating a banner with a horizontal width of (say) 3000 pixels.  This might give a banner that won't look too distorted at the extremes.

    That's a great idea.  That way it's either half stretched or half squished, so it will look more normal.

     

    I went ahead and created a 1080p version, and even stretched to 4K it looks okay, so for this one I'll just stick with 1080p.

     

    Since we seemed to be missing some AMD love in here:

     

    [Image: RyzenBannerExample.png]

     

    [Image: Banner_Ryzen_2K.png]

    • Like 3
    • Upvote 1
  6. I don't GUI boot, and rarely view the GUI on a mobile device (rare enough that I don't care how it looks there).

     

    90% of the time, I'm viewing in Chrome or Firefox on a 4K monitor.  Sometimes I'm viewing full screen, and sometimes I pin the browser to one half of the screen.  So effectively I'm viewing at both 3840 and 1920 on the same monitor.  Occasionally I view on my 1080p TV.

     

    To test, I just created a 4K banner (3840 x 90) and it looks perfect at fullscreen, but when I shrink the browser to half the screen width the banner gets smooshed.

     

    I also noticed that due to the smooshing/stretching, the server info at the right and Unraid logo at left cover different portions of the banner depending upon how smooshed/stretched the banner is.  The banner I created is very dark, and needs a lighter box at both ends to make the text overlays readable, but the size of the necessary box changes depending upon the smoosh/stretch factor.  Would be nice to be able to adjust the transparency of the server info background box instead - is that possible?  It is too transparent.

     

    I'm still on 6.6.6 - not sure if 6.7 changes any of this behavior.

     

    Paul

  7. On 9/28/2018 at 8:28 PM, Hoopster said:

    Depends on your monitor resolution.  The unRAID GUI used to be fixed at a 1280 pixel width regardless of monitor resolution and the recommended banner size was 1270x90.  I, and many others, have a 1920x1080/1200 resolution monitor.  With the GUI now using the full monitor width, "old" banners are stretched to fill.

     

    All the banners I create are 1920x90 since that is my monitor resolution.  For others, it may be different.

    I've got a mixture of 1080p and 4K monitors.  Is there a way to make one banner that displays correctly on both 4K and 1080p?

  8. 11 hours ago, trurl said:

    I haven't commented on the version number before this. Here is something somewhat related.

     

    When I first got a cellphone I decided to just take my wife's number and add 1. Hers ended in 6659, then mine was 6660.

     

    So I use those digits 666 quite a bit when asked for my NUMBER.

    I gotcha beat.  I live in a gated community, and you have to dial 666 at the gate to ring us to let you in.

     

    Needless to say, we don't get many visitors.  😈

    • Like 1
  9. 1 hour ago, david279 said:

    You could run the command at the start of the array using the scripts plugin or add it to your go file.

     

    At that point, it has already been loaded, and is no different than me unloading it from the command line.

     

    I want a way to prevent it from ever loading in the first place.
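
    For what it's worth, on a general Linux box the usual answer is a modprobe blacklist.  On unRAID the root filesystem is rebuilt in RAM at every boot, so something like this would have to be re-applied from the flash drive early in the boot process - untested on my end, so consider it a starting point:

    # Standard module blacklist -- stops modprobe/udev from auto-loading it:
    echo "blacklist acpi-cpufreq" > /etc/modprobe.d/acpi-cpufreq.conf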

  10. Interesting results.

     

    After loading the Linux scaling driver, does the behavior revert to the Ryzen PBoost curve when you unload the driver, or do you have to reboot?

     

    I don't see where my driver is getting loaded, which makes me think it is loaded automatically by the kernel.  Not sure how I would block it during boot.

     

    I just read that Ryzen microcode updates were finally added to the linux-firmware.git collection.  This makes me think my Ryzen 1800X has never received any microcode firmware updates.  Not sure if that would affect the boost or not, but regardless I posted a request for including the new microcode updates here:  https://lime-technology.com/bug-reports/stable-releases/652-amd-cpu-microcode-updates-r84/

     

  11. 36 minutes ago, react said:

    One way you guys can test this is to unload everything from unraid, and do a cpu stress test on unraid itself and check if 1 or 2T can boost to 4GHz (if unraid does not spread the load anyway). This would have to be done as Paul proceeded in removing the Linux CPU Freq scaling driver (as I believe it is useless with Ryzen). Also we would need a better way to benchmark instead of getting only instantaneous GHz readings with grep; it would be great to get session min/max/avg readings. Don't know if that command exists, but it can be created with the grep command in a script file running at, let's say, 10ms or even less if possible (it would be handy). If this is the case then it's actually a good sign Ryzen 1 users cannot reach 4GHz, meaning unraid is spreading the load evenly across multiple cores, opposite to windows ...

     

    The data that I get on a Ryzen 2700x, both on unraid and on bare-metal Windows 10, matches the previous graph perfectly, and I can even see on windows (using hardware monitor) a session max close to 4.35GHz on some cores with the stock cooler. On unraid the max I could spot using grep was around 4.25GHz.

     

    I feel that I have essentially already done this test.  With everything unloaded and nothing running, unRAID reports that all 16 cores are idle at 0%.  I can then run a command to load a single core, and it never exceeds 3.7GHz.  I have also repeatedly seen 14 cores idling at 2.2GHz, and the other 2 at 3.7GHz, which seems to corroborate that I am loading only a single core.

     

    I have also used other commands to test frequency, besides grepping /proc/cpuinfo.  So far, the other commands essentially matched the /proc/cpuinfo, and if anything they were slightly lower.  Sorry, I don't have all the commands in front of me, I found them during web searches (supposedly they were more accurate for Ryzen), gave them a try, and moved on.
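
    Along the lines of what react suggested, a quick-and-dirty min/max/avg sampler is easy to throw together - this is just an unpolished sketch; pin a load to core 0 from another shell while it runs:

    #!/bin/bash
    # In another shell, load exactly one core first, e.g.:
    #   taskset -c 0 sh -c 'while :; do :; done'
    # Then sample core 0's frequency ~100x/second and report min/max/avg:
    n=0; min=999999; max=0; sum=0
    while [ $n -lt 500 ]; do
        mhz=$(awk '/MHz/{print int($4); exit}' /proc/cpuinfo)
        [ $mhz -lt $min ] && min=$mhz
        [ $mhz -gt $max ] && max=$mhz
        sum=$((sum + mhz))
        n=$((n + 1))
        sleep 0.01
    done
    echo "samples=$n min=${min}MHz max=${max}MHz avg=$((sum / n))MHz"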

     

    I certainly agree that, based upon that graphic, the Ryzen 2xxx series will hit higher frequencies with more cores active, which is really awesome.  But that shouldn't change the fact that with only a single core loaded, a Ryzen 1800X should hit 4.0 GHz minimum, and 4.1 GHz with XFR.  Mine never goes over 3.7GHz, which is the same frequency that I can easily hit with all cores using the Performance governor.

     

    Perhaps I am mistaken, and that even when it appears that only a single core is loaded, in reality more cores are active.  In which case, I agree that the graphic perfectly explains what is going on.

     

    Paul

  12. 1 hour ago, david279 said:

    rmmod -f acpi-cpufreq to unload

     

    That worked, thanks!  Now I get this:

    root@Tower:~# cpufreq-info
    cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
    Report errors and bugs to cpufreq@vger.kernel.org, please.
    analyzing CPU 0:
      no or unknown cpufreq driver is active on this CPU
      maximum transition latency: 4294.55 ms.

    for all 16 CPUs.

     

    I also see "Driver: * no driver *" for CPU Frequency Scaling in Tips & Tweaks, and changing from On Demand to Performance no longer has any impact, where before it would push all cores to 3.7GHz.  So the behavior is now very similar to react's system.

     

    But I'm not seeing anything above 3.7GHz.  Top speeds look the same.

     

    I wonder if it has to boot that way.  Perhaps stripping the driver out of a running system doesn't have the same result as booting without a driver.  And I still need to look in my BIOS again to see if there is a setting related to giving the OS control over frequency scaling.

     

    Paul

  13. Very interesting!  Thanks react!

     

    I might be reading between the lines, but it looks like you don't have a software cpu frequency driver running at all, so it is defaulting to hardware frequency control. 

     

    I have the acpi-cpufreq driver controlling the frequencies, which overrides the hardware control.  Perhaps the problem is with the acpi-cpufreq driver.

     

    My guess is that you didn't see any change when switching from Conservative to Performance because you don't have a driver, so you aren't really controlling anything.

     

    Anyone know how I can unload the acpi-cpufreq driver?

     

    I also recall some settings in the BIOS that I think were related to allowing the operating system to control frequency - though I might be imagining that now.  I'll have to look again.  Maybe I can disable software/OS control.

     

    Paul

  14. 1 hour ago, david279 said:

    Funny thing, the all-core boost for the 1800x is 3.7, but it's the XFR that doesn't seem to translate over.

     

    Not sure I agree with your interpretation.

     

    The 1800X base clock is 3.6GHz, and the single-core boost is 4.0GHz.  XFR adds 0.1GHz to both of those numbers.  I'm getting 3.7GHz all-core, so I think XFR is working.

     

    To me, it seems the boost isn't working.  Core Performance Boost (CPB) allows individual cores to boost up to 4GHz (4.1GHz with XFR) on the 1800x.  This works on Windows.

     

    CPB support for early/first-release 1800Xs (which is what I have) was added to Linux kernel 4.14 (it was broken prior to that), and I've verified that the CPB flag is showing.

     

    I just did some more testing.  I upgraded my BIOS to the latest (which includes the new AGESA 1.0.0.1a), but that made no difference.  Also tried disabling SVM in the BIOS, as I read somewhere that solved another user's problem with no single-core boost in Linux.  Disabling SVM broke KVM (which I definitely need) and did not fix the boost issue anyway.

     

    My temps are fine, and my cooling is sufficient.  I could easily overclock all cores to 4.0GHz, but I choose not to.  Stability, longevity, and power conservation are more important to me.  All the same, this CPU should boost a single core up to 4.1GHz, and it just won't do it under Linux.

     

     

    react, since you have a 2700x, what do you get for this command?

     

    root@Tower:~# cpufreq-info
    cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
    Report errors and bugs to cpufreq@vger.kernel.org, please.
    analyzing CPU 0:
      driver: acpi-cpufreq
      CPUs which run at the same hardware frequency: 0
      CPUs which need to have their frequency coordinated by software: 0
      maximum transition latency: 4294.55 ms.
      hardware limits: 2.20 GHz - 3.60 GHz
      available frequency steps: 3.60 GHz, 3.20 GHz, 2.20 GHz
      available cpufreq governors: conservative, userspace, powersave, ondemand, performance, schedutil
      current policy: frequency should be within 2.20 GHz and 3.60 GHz.
                      The governor "ondemand" may decide which speed to use
                      within this range.
      current CPU frequency is 2.20 GHz (asserted by call to hardware).
      cpufreq stats: 3.60 GHz:6.43%, 3.20 GHz:4.04%, 2.20 GHz:89.53%  (6269)

     

     

    Paul

  15. I did a bit more testing.  Rebooting to complete the 6.5.2 installation, I double-checked my BIOS to make sure all the settings appeared correct.  While everything looks fine, I don't see an XFR setting at all in my BIOS.  I think it is safe to assume it is enabled.

     

    In Tips & Tweaks, I found that if I change "Enable Intel Turbo Boost?" to Yes (I had it on No before), then my max core frequency changes to 3.7GHz, so that's a small improvement.

     

    I tried changing the "CPU Scaling Governor", and on Performance all cores basically idled at 3.7GHz.  On Demand allows other cores to scale down, normally hitting around 2.1GHz, and I can see my 1 or 2 cores hitting 3.7GHz, but no higher.  Power Saver basically forces all cores down to about 2.2GHz.

     

    The odd part here is that the motherboard/cpu easily hits an 8/16 core frequency of 3.7GHz, which is perfect, but refuses to boost a single core any higher than 3.7GHz, even if only a single core is loaded.

     

    3 minutes ago, david279 said:

    This seems to be a Ryzen 1 issue so I just OC; glad it's fixed for Ryzen 2nd gen...

     

    I agree that I'm glad it is fixed, I'm just baffled that I'm not finding any additional info on the web regarding the problem or the solution.

     

    Paul

  16. 19 hours ago, react said:

    On the BIOS (latest version) I left it almost at default settings. XFR enabled ( mem running @ 3GHz ), AMD Cool'n'Quiet enabled ... So all ryzen firmware OC features are enabled and working well; the grep MHz command shows cpu core speed varying from 2.2GHz up to 4.25GHz :)

     

    [Screenshot: grep MHz output showing core speeds]

     

     

    I can't believe that there isn't more discussion on this, as it certainly caught my attention.

     

    I have an 1800X, and my individual cores max out at 3.6GHz with unRAID, even when only a single core is loaded.

     

    I always figured that this was just a limitation of Linux, as in earlier testing I saw 4.1GHz in Windows.

     

    Now react comes along and shows boosting working perfectly on a 2700X, what's up with that?

     

    While I know that the 2xxx series on the X470 motherboards adds Precision Boost 2 and XFR2, these simply permit more aggressive boosting when more cores are loaded, and shouldn't be required for boosting when only a single core is loaded.

     

    I never see over 3.6GHz (I don't manually overclock).  Is this the same experience of other Ryzen 1 users?

     

    Is there a change in Ryzen 2 or X470 that fixes this issue, or a system setting or BIOS setting I might have wrong?

     

    Paul

  17. Hey gang, sorry I've been absent.

     

    Luckily I've been healthy, so no concerns there.

     

    Instead, I've been sidetracked by work and other projects.  My biggest sidetrack has been a new program I wrote to replace the old My Movies Media Center plugin.  It may be of some interest to unRAID users like me who store their large movie collections on their server.  If you're interested, you can check it out here:  MM Browser

     

    MM Browser was supposed to be a quick little 2-4 week programming project, just for myself, but then I went crazy and decided to sell it online, which required a ton more programming and a website.  User support has been much more time consuming than I ever fathomed.  I've easily spent 6 months full time on MM Browser.

     

    MM Browser pulled me away from my other project, the Chameleon Pinball Engine.  I had planned to have it ready for the next big Southern Fried Gameroom Expo here in Atlanta.  Somehow the time slipped away.  The show is in 4 weeks, and I'm realizing that there's too much work left to make it to the show.  That's a big disappointment for me.

     

    I've also got a small enterprise software suite that I've worked on for the past decade, and I'm currently working on my first big sale.  Trying to sell enterprise software to, uhm, big enterprises, has been eye opening to say the least.  So many hurdles, and I'm spending more time doing documentation than anything else.  Right now this is my biggest priority.

     

    Plus I've got a full time consulting gig at the moment.

     

    Long story short, I just haven't had a moment to spare.

     

    I would release the private beta, but to be honest it just didn't work well, so that version was scrapped.

     

    I have documented plans for a new version that hopefully would fix the problems of the private beta.  Every so often I think about trying to knock it out, and I've come close to working on it a few times, but it just fell too low on my priority list.  There's always a chance I may get to it soon, but I can't make any promises.

     

    I know that this isn't the answer anyone was looking for.  Sorry.

     

    If anyone else wants to run with it, please feel free.  You have my blessing.

     

    Paul

     

    • Like 1
    • Upvote 1
  18. 10 hours ago, Jcloud said:

    If you have no trust for the tinyurl:

     

    It's like you're inside my head!

     

    I added mine to the survey.  Did you want us to include our Username?

     

    A couple notes on my build.  My Ryzen server has always been highly susceptible to the stability issues, and I always had to disable C-state Global Control in the BIOS to solve them.  For over a month now I've been running with "/usr/local/sbin/zenstates --c6-disable" in my go file, and C-state Global Control enabled, and I am running stable.
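
    For anyone wanting to replicate this, the relevant bit of my go file looks roughly like this (the zenstates path assumes you've put the script there yourself):

    #!/bin/bash
    # /boot/config/go (excerpt)
    # Disable the C6 state before starting the webGUI:
    /usr/local/sbin/zenstates --c6-disable
    # Start the Management Utility (stock go file line):
    /usr/local/sbin/emhttp &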

     

    I don't do GPU passthrough, but I am currently running 4 VMs.  I'm running 3 VMs with Windows Server 2016 Standard, each with 4 cores and 16GB of RAM.  These are running an enterprise web application that I am developing against.  I'm also running a single VM with Windows Server 2016 Datacenter and SQL Server 2016, also with 4 cores and 16GB.  This VM is the database server for the other 3 application servers.

     

    Surprisingly, even with 16 cores and 64GB concurrently allocated, the unRAID server is running like a champ.  I'm averaging about 37% CPU load and 57% memory usage.  The KVM technology is simply amazing.  And I can still stream Blu-ray movies.

     

    I'm hammering these servers with requests, approximately 24 XML API requests per second per server, 40 HTTP requests per second per server, and 72 database queries per second, or 6.2 million DB queries per day.  My current unRAID uptime is 32 days (when I installed 6.4.1), and the VM uptime is approaching 18 days during this stress test session.

     

    My only problem is that the database server ran out of disk storage as the audit table grew so large from all the DB requests...  hehehe.

     

    Paul

    • Like 1
  19. On 3/7/2018 at 11:13 AM, John_M said:

     

    The maximum rated memory controller speed for 1000-series Ryzens is 2667* MHz and for the 2000-series APUs 2933* MHz. Anything higher than this is an overclock and is not guaranteed. It is not true to say that all Ryzen chipsets have issues with RAM clocks.

     

    *Note: these are the effective speeds due to the DDR RAM. The actual speeds are 1333 and 1467 MHz, respectively.

     

    Sorry, couldn't resist.  What John stated is 100% correct, though some details were omitted.

     

    To build on John's statement, those maximum speeds are for Single-Rank DIMMs and typically only 2 populated slots.  The most likely configuration this would achieve is 16GB of installed RAM (2x 8GB Single-Rank). 

     

    If you are running Dual-Rank DIMMs (think 16GB DIMMs, though I believe some 8GB DIMMs can also be Dual-Rank), or have installed memory into all 4 DIMM slots, then the maximum supported frequency drops.

     

    And if you are running Dual-Rank DIMMs, AND have installed memory into all 4 DIMM slots, then the maximum supported frequency drops again.

     

    On my Ryzen 7 1800x build, I have installed 64GB, so that means I am running both Dual-Rank DIMMs and I have populated all 4 memory slots.  While my memory is only running at DDR4-2400, technically that is overclocked, as maximum supported speed in this configuration is just DDR4-2133.

     

    While my ASRock board runs my memory at DDR4-2400 without any special configuration, I am running overclocked according to AMD Ryzen specifications.

     

    Depending upon the installed memory, Killer IQ may be overclocking the integrated Ryzen memory controller by nearly 50%.

     

    While initially all Ryzen boards had issues with RAM clocks, the situation has greatly improved over the past year, so that Killer IQ's assertion is no longer valid.

     

    As G.I. Joe says, "Knowing is half the battle."

     

    Paul

    • Like 1
  20. 8 hours ago, Tuftuf said:

    @luisv I know you've kept c-states off a while now due to the issues. 

     

    I don't think anyone has reported C-State issues since the 'fix' was added for it, which was a while ago. I'm just wondering if it's time to try to figure out what else it could be.

     

    @everyone else: Has anyone else had C-State issues with the recent versions? 

     

    EDIT: Maybe I should have read @david279's post first.

     

    I was the original discoverer of the Ryzen stability issue and C-state solution.  My server is extremely susceptible to the C-state issue, typically crashing in 4-8 hours when the issue is present.

     

    I'm running 6.4.0-rc7a, with C-states enabled, and my uptime is 52 days.

     

    I have avoided all of the recent 'Really Close' releases since 7a, as the changes just seemed too scary for me to be a guinea pig.  I think it was the introduction of the block level device encryption.  I don't plan to use it, but I have nightmares thinking that a beta version could somehow misbehave and accidentally encrypt my precious data, so that I never get it back.  I know the odds of that happening are pretty much zilch, though if it could happen it would likely happen to me.  I'm waiting for the next stable public release.

     

    Anyway, perhaps something has changed since 7a that lost the fix for the Ryzen C-state issue.

     

    Paul

  21. 3 hours ago, Greygoose said:

    Pauven, I cannot offer any assistance, except to say thank you for continuing to get ryzen rolling sweet with Unraid. 

     

     

     

    Thanks Greygoose!

     

    I just reached 50+ hours uptime on 6.4.0-rc7a with C-states enabled.  Looks like Lime-Tech may have solved the stability issue, good job guys!

     

    With C-states enabled, idle wattage has dropped 10+ watts.  My UPS only reports in 10.5W increments (1% of its 1050W power rating), so actual savings are likely somewhere between 10.5 and 21 watts.  From earlier testing with a more accurate Kill-A-Watt, the actual delta between C-states enabled and disabled was between 12 and 18 watts.

     

    Idle temps have dropped 2-3 degrees C on both CPU (41C) and System (36C).  Not as much as I had hoped, but I think my expectations were off.  I did a lot of initial testing with the case cover off, and temps have unsurprisingly increased simply from closing the case, as the case fans are on their lowest speed (three 120mm fans, 1000 RPM @ 35% PWM) and have to suck air past the hard drives, so there's very little airflow at idle.  The CPU fan speed profile is set to 'Standard' in the BIOS.

     

    At max case fan speeds (2750 RPM), idle CPU temp easily drops to 35C and System to 30C, but the higher fan speeds consume an extra 10+ watts and make lots of noise.

     

    As a compromise, I just changed my minimum case fan speed to 1400 RPM @ 50% PWM, which is much more quiet and energy efficient than full blast, but still improves my idle temps a couple degrees over the slowest fan speeds:  39C CPU, 34C System.  I'll probably change the CPU fan profile from Standard to Performance in the BIOS to see if that drops the 5C delta over ambient a bit, but other than that I think I'm done.

     

    I'm happy to have idle temps back in the 30's, at reasonable fan speeds/noise, and with idle watts back to a more reasonable level.

     

    Paul

    • Upvote 1
  22. 13 minutes ago, HellDiverUK said:

    Paul, have you considered wiping your boot stick and starting again fresh?  

     

    I had some weirdness a while back with stuff not updating, drives going missing, etc, and in the end I wiped my boot stick and started again, and most of the problems went away.  The original stick was one I'd been using probably from 6beta days, so who knows what crap was lingering in the background.  

     

    Also, with your machine doing all those lockups and KPs, it seems to me there might be some corruption creeping in.

     

     

     

    That's not a bad idea.  I've been using the same USB stick for 8+ years, since the beginning when I started with 4.5 beta4 (with its brand new 20-disk limit).  How's that for a flashback?

     

    Though I've certainly wiped it on occasion over the years.  Most recently I think for the 6.1 branch.

     

    Now that I've finally got things settled, I'm gonna let it chill as-is.  If more problems crop up, this will be high on my troubleshooting list.

     

    As far as dealing with the potential corruption, I might just have to start from scratch and rebuild my configuration if I wipe the drive.  Otherwise, I'm simply restoring potentially corrupted files.

     

    Thanks.

     

    Paul

  23. Okay, multiple findings.

     

    First, when I checked on my server this morning, I found a Kernel Panic on the console screen, and the system was fully hung.  Here's a pic:

     

    [Photo: console screen showing the kernel panic]

     

    I restarted in Safe Mode again, started the array, and checked the share.cfg file.  shareCacheEnabled was still missing.

     

    I stopped the array and went to the Settings/Global Shares panel.  I couldn't directly apply "Yes" to "Use cache disk:", as it was already on "Yes" and wouldn't let me Apply it.  I set it to "No", Applied, then set back to "Yes" and Applied.

     

    Now the share.cfg file got updated with the shareCacheEnabled="Yes" line, plus what appears to be several additional lines that must have also been missing.  Here's the new file contents:

     

    # Generated settings:
    shareDisk="e"
    shareUser="e"
    shareUserInclude=""
    shareUserExclude=""
    shareSMBEnabled="yes"
    shareNFSEnabled="no"
    shareNFSFsid="100"
    shareAFPEnabled="no"
    shareInitialOwner="Administrator"
    shareInitialGroup="Domain Users"
    shareCacheEnabled="yes"
    shareCacheFloor="2000000"
    shareMoverSchedule="40 3 * * *"
    shareMoverLogging="yes"
    fuse_remember="330"
    fuse_directio="auto"
    shareAvahiEnabled="yes"
    shareAvahiSMBName="%h"
    shareAvahiSMBModel="Xserve"
    shareAvahiAFPName="%h-AFP"
    shareAvahiAFPModel="Xserve"

     

    Expecting Mover to now work again, I restarted into normal mode, as I wanted my temperature and fan plugins to keep my drives cool while the Mover got busy.

     

    On reboot, I confirmed that shareCacheEnabled="yes" was still in the share.cfg file.  I then manually started Mover.

     

    This time the logged message was "root: mover: started", and I can see disk activity so it appears that Mover really is working.

     

    So it appears that my Cache drive and Mover troubles are finally over - thank you Tom and all who helped.

     

    That said, assuming Lime-Tech is already here reading this, I'd like to take a moment to recount my experiences with the -rc6/-rc7a releases:

    • Experienced 1 Kernel Panic while in Safe Mode on -rc7a (above), and possibly another in Safe Mode on -rc6 (speculation)
    • The upgrade from 6.3.latest to 6.4.0-rc6 coincided with whacking some cache related configuration file parameters (can't rule out plug-ins as a contributing factor)
    • Could not assign the cache drive under -rc6, though -rc7a fixed this
    • Several -rc7a anomalies (cache drive showing unassigned even though it was assigned, multiple Stop/Starts/Restarts required to get system synced up & behaving correctly)
    • Currently, Mover is working but only at about 36 MB/s peak.  Never paid attention before, because Mover is normally running in the middle of the night, but this seems rather slow.  Possibly because data is being written to a drive that is 96% full, so this may be nothing.
    • Odd caching in new GUI under -rc7a (didn't notice on -rc6) in which sometimes I have to Shift-F5/Forced Refresh to get current data presented.
      • An easy example is the UPS Summary on the Dashboard, which kept reporting 157 watts for 30 minutes after I spun the drives down.  I finally forced a screen refresh, and the status updated to 84 watts.
      • Another example (plugin related) is the Dynamix System Temps ticker at the bottom of the screen, which doesn't seem to be updating.  I've got both Firefox and IE open on the Main screen, and the ticker has been frozen on both for 10+ minutes, and they don't match each other.  If I click around the menus, sometimes the ticker updates, and sometimes it just disappears.  The behavior seems worse on IE than Firefox.
    • On the plus side, the new 4.12 kernel includes some drivers that were missing, so that's pretty nice.  It will take a while to determine if the C-state issue is resolved.

     

    Thanks,

    Paul

     

     
