• [6.9.0 beta 30] Pre-Skylake Intel CPUs Stuck at Lowest pstate


    NNate
    • Minor

    Update: The slow parity check is a symptom of a 5.8 kernel bug in which pre-Skylake CPUs get stuck at the minimum pstate frequency (often 800MHz). Scroll down in this thread for links to the associated kernel bug and discussion. Otherwise, this link will jump you to my updated post in this thread.

     

     

    I've been noticing that 6.9.0 has been slower than 6.8.3 for disk access when using Docker containers (there is also a high volume of disk reads on my cache even after making the drives 1MB aligned, but that's another issue).

     

    I ran my first parity check on the beta and it's markedly slower than my historical runs. Normally a parity check completes in under 19 hours at 118 MB/s. The current run has been going for 9.5 hours and is only 30% done at 74 MB/s, with another 21 hours estimated to remain.

     

    If that estimate is true (and I know things slow down toward the end), that'll be over 30 hrs vs 19 hrs, more than 10 hours longer than history shows. I haven't made any hardware changes vs 6.8.3.

     

    I've attached my diagnostics as well.

     

    [Screenshot: current parity check]

     

     

    [Screenshot: parity check history]

     

     




    User Feedback

    Recommended Comments

    I'll keep waiting, but the check has now been running for 18hrs 50min, the time a full check has historically taken, and I'm only at 62% complete. Things have sped up slightly (82ish MB/s), but it's still a long way off previous runs.

    I can update when it completes, but it seems the original estimate will be pretty close.

    Edited by NNate
    Link to comment

    It's always possible it is much slower with certain hardware/config, but on my test server with 30 devices there's only a small slowdown: 149MB/s with v6.8.3 vs 142MB/s with -beta30.

    Link to comment

    OK, it finally finished after 1 day, 4hrs, 41min at 77.5MB/s, vs a consistent 18hrs, 50min at 118MB/s previously.

     

    That's about 10hrs slower (roughly 50% longer), which is crazy. I have no idea what's gone wrong.
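For anyone who wants to check the arithmetic, here is a rough sketch using only the figures quoted in this thread (18hrs 50min at 118MB/s before, 1 day 4hrs 41min, i.e. 28hrs 41min, at 77.5MB/s now). The function names are made up for illustration, and speeds are passed in tenths of MB/s so plain integer shell arithmetic is enough.

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking the parity check numbers.

# Convert hours and minutes to seconds.
to_seconds() {
    echo $(( $1 * 3600 + $2 * 60 ))
}

# How much longer the new run took, as a whole percentage (rounded to nearest).
percent_longer() {
    old=$1
    new=$2
    echo $(( ((new - old) * 100 + old / 2) / old ))
}

# Approximate data covered in decimal GB.
# Speed is given in tenths of MB/s (118 MB/s -> 1180, 77.5 MB/s -> 775).
approx_gb() {
    speed_tenths=$1
    secs=$2
    echo $(( speed_tenths * secs / 10000 ))
}

old_secs=$(to_seconds 18 50)   # historical run
new_secs=$(to_seconds 28 41)   # beta30 run (1 day, 4hrs, 41min)
extra=$(( new_secs - old_secs ))

echo "extra time: $(( extra / 3600 ))h $(( extra % 3600 / 60 ))m"            # extra time: 9h 51m
echo "percent longer: $(percent_longer "$old_secs" "$new_secs")%"            # percent longer: 52%
echo "data covered at old speed: $(approx_gb 1180 "$old_secs") GB"           # 8000 GB
echo "data covered at new speed: $(approx_gb 775 "$new_secs") GB"            # 8002 GB
```

Both runs work out to roughly the same amount of data covered (about 8 TB), which is consistent with the array being unchanged and only the throughput dropping.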

    Link to comment

    Diags show high CPU utilization, suggesting that is the problem, or at least part of it. Maybe the latest kernel vulnerability mitigations have an impact.

    Link to comment

    Yeah, one core (it hopped around to different cores along the way) was pegged at 100% during the entire check. I know it's not the fastest CPU out there, but I'd think an i5-4690K would have the muscle to power through. So that was certainly surprising, but I guess I never really paid attention to the CPU usage during a parity check in the past.

    Link to comment

    Parity checks are single threaded, so one core stuck at 100% will be a bottleneck. But yeah, that CPU should be enough for the array size, unless, as mentioned, some kernel mitigation is making a large difference.

    Link to comment

    I do have the "Disable Security Mitigations" plugin installed. I don't mind turning off the mitigations and running again. Currently rebooting for those to take effect and then will test again once my system has time to stabilize after startup.

    Edited by NNate
    Link to comment

    Those mitigations were already in v6.8.3. My thinking is that there could be some new mitigation or change specific to this kernel. It's also not a bad idea to retest with v6.8 if possible, to confirm it's really a beta issue.

    Link to comment

    This looks very suspect:
    https://bugzilla.kernel.org/show_bug.cgi?id=209085

    When I run `cat /proc/cpuinfo | grep "MHz"`, it basically only shows 800MHz.

    cat /sys/devices/system/cpu/intel_pstate/status = passive

    cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor = powersave

     

    Seems like these findings line up with the above linked bug reports/reddit page.

     

    Sounds like a very significant issue if you're running a pre-Skylake Intel CPU (i.e., older than 6th gen).
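In case it helps others check whether they are affected, the checks above can be rolled into one small script. This is only a sketch: `read_or_na` and `report_cpufreq` are names made up here, and it assumes the standard cpufreq sysfs files (`scaling_governor`, `scaling_cur_freq`, `cpuinfo_min_freq`) are exposed on your system.

```shell
#!/bin/sh
# CPU_ROOT can be overridden, e.g. to point at a copy of the sysfs tree.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# Print the contents of a file, or "n/a" if it is missing or unreadable.
read_or_na() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "n/a"
    fi
}

# Print the intel_pstate mode, then governor and frequencies (kHz) per CPU.
report_cpufreq() {
    root="${1:-$CPU_ROOT}"
    echo "intel_pstate status: $(read_or_na "$root/intel_pstate/status")"
    for dir in "$root"/cpu[0-9]*; do
        # Skip entries without a cpufreq directory (or an unmatched glob).
        [ -d "$dir/cpufreq" ] || continue
        gov=$(read_or_na "$dir/cpufreq/scaling_governor")
        cur=$(read_or_na "$dir/cpufreq/scaling_cur_freq")
        min=$(read_or_na "$dir/cpufreq/cpuinfo_min_freq")
        echo "${dir##*/}: governor=$gov cur_khz=$cur min_khz=$min"
    done
}

report_cpufreq
```

If the status is passive and cur_khz sits at min_khz on every core even under load, that matches the symptoms described in the linked bug report.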

     

    Edited by NNate
    Link to comment

    Oh yeah, wow "echo active > /sys/devices/system/cpu/intel_pstate/status" really gave my system a kick in the pants. HUGE difference

     

    This will hurt performance for a lot of people on pre-Skylake CPUs. Being stuck at the lowest supported pstate is painful.

     

    How can I make sure this change sticks post-reboot? Anything that can be done for others beyond manually making those changes?
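One possible way to make it stick, sketched under the assumption that Unraid runs the script at /boot/config/go on every boot: add the same echo line to that file once. `ensure_line` is a hypothetical helper written for this post, not an Unraid tool, and it only appends the line if an identical one is not already there.

```shell
#!/bin/sh
# Assumption: /boot/config/go is the boot-time script on the flash drive.
GO_FILE="${GO_FILE:-/boot/config/go}"
PSTATE_LINE='echo active > /sys/devices/system/cpu/intel_pstate/status'

# Append a line to a file unless an identical line already exists.
ensure_line() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        echo "already present in $file"
    else
        printf '%s\n' "$line" >> "$file"
        echo "added to $file"
    fi
}

# Usage (as root on the server; deliberately not executed here):
#   ensure_line "$GO_FILE" "$PSTATE_LINE"
```

Back up the file first, and make sure it ends with a newline so the appended line does not get glued onto the last existing command.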

    Edited by NNate
    Link to comment

    Do you have the Tips and Tweaks plugin installed? Check your "CPU Scaling Governor" setting. I'm guessing it is set to "Power Save", change it to "On Demand" and your CPUs should no longer be throttled.
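For reference, the same governor change can be sketched from the command line through the cpufreq sysfs files. I don't know how the Tips and Tweaks plugin applies the setting internally, so treat this as an illustration only: `set_governor` is a made-up helper, it only touches CPUs that list the requested governor in `scaling_available_governors`, and the change would not survive a reboot by itself.

```shell
#!/bin/sh
# CPU_ROOT can be overridden, e.g. to point at a copy of the sysfs tree.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# Set the given governor on every CPU that offers it.
# Prints the number of CPUs that were changed.
set_governor() {
    want=$1
    root="${2:-$CPU_ROOT}"
    changed=0
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without a writable governor file (or an unmatched glob).
        [ -w "$dir/scaling_governor" ] || continue
        # Only switch if this CPU actually lists the requested governor.
        if grep -qw -- "$want" "$dir/scaling_available_governors" 2>/dev/null; then
            echo "$want" > "$dir/scaling_governor"
            changed=$((changed + 1))
        fi
    done
    echo "$changed"
}

# Usage (as root; deliberately not executed here):
#   set_governor ondemand
```

A result of 0 means no CPU offered that governor, which can happen depending on which cpufreq driver and mode are in use.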

    Link to comment

    Yes, that would solve it as well. From what I've read, I don't think "On Demand" is as efficient with the pstates on Intel as "Power Save".

    Link to comment

    So I guess you're saying there is a bug in the 5.8 kernel that causes "powersave" mode to be too aggressive? That could be true, as I changed that setting a long time ago and it wasn't until recently that it started feeling sluggish.

     

    At least for me though, I'm more comfortable reverting to defaults (CPU Scaling Governor = On Demand) rather than forcing pstate to active.  i.e. undo a customization rather than add more customization

    Link to comment



