• [6.9.0 RC1 - RC2] - CPU usage stuck at 100% (2 core/thread)


    tjb_altf4
    • Minor

    OK, so the upgrade to 6.9.0-rc1 has been mostly smooth; however, I'm seeing 2 cores stuck at 100%.

    I noticed there are some call traces in the logs, which never appeared in 6.8.3.

     

    100% usage happens in safe mode also, so not related to plugins.

    The actual cores in use are rotating.

    I have rolled back and forward to ensure that it was a clean upgrade, tried uninstalling plugins etc with no change.

     

    The usage is the same whether or not the VM and Docker services are running, so it is something related to the core OS.

    Fresh diagnostics attached

     

     

    fortytwo-diagnostics-20201212-1456.zip





    35 minutes ago, Vr2Io said:

    Please check whether it is related to ACPI interrupts or not.

     

    grep EN -r /sys/firmware/acpi/interrupts/

    Output:

    root@fortytwo:~# grep EN -r /sys/firmware/acpi/interrupts/
    /sys/firmware/acpi/interrupts/ff_pwr_btn:       0  EN     enabled      unmasked
    /sys/firmware/acpi/interrupts/gpe08:       0  EN     enabled      unmasked
    /sys/firmware/acpi/interrupts/ff_gbl_lock:       0  EN     enabled      unmasked
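
All three enabled sources above report a count of 0, which suggests a runaway ACPI interrupt is not the cause here. For anyone repeating the check, a counter that is non-zero and growing quickly would be the thing to look for. A minimal sketch for ranking the counters, assuming the usual "path: count flags" layout shown above; the function name acpi_busy_interrupts and its threshold argument are made up for illustration and are not part of Unraid:

```shell
# Rank ACPI interrupt sources by their counter, busiest first.
# Reads "path: count flags..." lines on stdin, for example from:
#   grep . -r /sys/firmware/acpi/interrupts/
# acpi_busy_interrupts is a hypothetical helper, not an existing tool.
acpi_busy_interrupts() {
    threshold="${1:-0}"   # only show counters strictly above this value
    awk -v min="$threshold" '
        {
            name = $1
            sub(/:$/, "", name)              # drop the trailing colon
            if ($2 + 0 > min) print $2, name # $2 is the interrupt count
        }
    ' | sort -rn
}
```

Something like `grep . -r /sys/firmware/acpi/interrupts/ | acpi_busy_interrupts 1000` run twice, a few seconds apart, would show whether any counter is climbing.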

     

    Edited by tjb_altf4
    5 hours ago, ljm42 said:

    It looks like you have Tips and Tweaks installed, what is your "CPU Scaling Governor" set to? If it is "Power Save", try "On Demand". Made a big difference for me. There is more discussion here: https://forums.unraid.net/bug-reports/prereleases/690-beta-30-pre-skylake-intel-cpus-stuck-at-lowest-pstate-r1108/ 

     

    I do have Tips and Tweaks installed, however the Governor is already set to On Demand.
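
The active governor can also be confirmed directly from sysfs, independent of the plugin. A minimal sketch, assuming the standard cpufreq layout under /sys/devices/system/cpu; show_governors is a hypothetical helper name, and the kernel reports names such as ondemand or powersave rather than the plugin's "On Demand" / "Power Save" labels:

```shell
# Print each CPU's scaling governor as "cpuN governor", one per line.
# The optional first argument overrides the sysfs root, which is an
# illustrative convenience for testing against a copy of the tree.
show_governors() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue       # skip CPUs without cpufreq support
        cpu="${f#"$root"/}"           # strip the root prefix
        cpu="${cpu%%/*}"              # keep only "cpuN"
        printf '%s %s\n' "$cpu" "$(cat "$f")"
    done
}
```

Piping the result through `awk '{print $2}' | sort | uniq -c` gives a quick count of how many cores use each governor.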

    Edited by tjb_altf4

    Uncovered a couple of issues that may be impacting me....

     

    Looks like this might be a kernel bug introduced in kernel 5.6.8, when USB4 support was added.

    Unraid 6.8.3, which had kernel 4.19, did not have this issue.

     

    The description in the bug report matches the issue accurately: "Batch mode top showing high CPU usage by usb_hub_wq and pm"

     

    image.png.0f91b428ee9fae7c013f51ed6dbc9f62.png

     

    https://bugzilla.redhat.com/show_bug.cgi?id=1858291
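
To check for the same symptom on another machine, batch-mode top output can be filtered down to the relevant kernel workers. A rough sketch, assuming the worker names appear as kworker/N:M+usb_hub_wq or kworker/N:M+pm, where the "+" suffix appears to mark the workqueue the worker is currently servicing; usb_pm_kworkers is a made-up helper name:

```shell
# Keep only kworker threads attached to the usb_hub_wq or pm workqueues.
# Reads batch-mode top output on stdin; usb_pm_kworkers is a hypothetical
# helper, not an existing tool.
usb_pm_kworkers() {
    grep -E 'kworker/[^[:space:]]*[+-](usb_hub_wq|pm)([[:space:]]|$)'
}
```

A single snapshot would be `top -b -n 1 | usb_pm_kworkers`. Repeating it a few times shows whether the busy workers move between cores.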

     

    Also, looking at the output of dmesg, there are zenstates errors (I don't have zenstates added to the go file).

    Is zenstates part of the kernel or the Unraid OS now?

    [Tue Dec 15 08:18:54 2020] msr: Write to unrecognized MSR 0xc0010292 by zenstates
                               Please report to x86@kernel.org
    [the same two lines repeat ten times in total, all with the same timestamp]

    The same issue is reported here, against kernel 5.9.x:

    https://github.com/r4m0n/ZenStates-Linux/issues/16
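
Because the msr warning is logged many times over, collapsing the kernel log into counts makes it easier to see which messages dominate. A minimal sketch, assuming bracketed timestamps as printed by dmesg -T; count_repeated_messages is a hypothetical helper name:

```shell
# Collapse repeated kernel log messages into "count message" lines,
# most frequent first. The leading "[timestamp]" is stripped so that
# identical messages logged at different times group together.
count_repeated_messages() {
    sed -E 's/^[[:space:]]*\[[^]]*\][[:space:]]*//' |
        sort | uniq -c | sort -rn |
        awk '{ n = $1; sub(/^[[:space:]]*[0-9]+[[:space:]]+/, ""); print n, $0 }'
}
```

Usage would be along the lines of `dmesg -T | count_repeated_messages | head`. Continuation lines such as "Please report to ..." are counted as separate messages.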

    Edited by tjb_altf4
    1 hour ago, SimonF said:

    Could it be the bluetooth drivers? Do you use BT on your motherboard, have you tried to disable in BIOS if not in use?

    I haven't tried disabling it, and I'm not using it... Bluetooth drivers were present in 6.8.3, which didn't have this issue.

    That said, I'll try disabling Bluetooth and WiFi next time I have the chance to reboot the server, to see if it helps.

    14 hours ago, SimonF said:

    Could it be the bluetooth drivers? Do you use BT on your motherboard, have you tried to disable in BIOS if not in use?

    Disabled BT and WLAN, no change.

     

    If I didn't have so many cores up my sleeve, this would be making my system unstable.

    On 12/15/2020 at 3:26 PM, tjb_altf4 said:

    [Tue Dec 15 08:18:54 2020] msr: Write to unrecognized MSR 0xc0010292 by zenstates Please report to x86@kernel.org

    I don't have this maxed-out core problem on a 1920X. Did you try not executing zenstates in the GO file?

     

     

    On 12/12/2020 at 7:16 PM, tjb_altf4 said:

    OK, with the array stopped, wireguard and shfs are not in use but kworkers are still hitting 2 cores at 100%

    Then which process is maxing out the cores?

     

    Edited by Vr2Io
    46 minutes ago, Vr2Io said:

    Did you try not executing zenstates in the GO file?

    Well I didn't think I had zenstates in there... but sure enough I double checked and found it in my GO file.
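
A quick way to confirm this kind of leftover entry is to search the startup script for it. A minimal sketch, assuming the go file lives at /boot/config/go (the usual Unraid location, treated as an assumption here); find_zenstates_calls is a hypothetical helper name:

```shell
# Print non-comment lines that mention zenstates, with line numbers.
# The default path is an assumption; pass another file as the first
# argument to check a different script.
find_zenstates_calls() {
    file="${1:-/boot/config/go}"
    grep -n -i 'zenstates' "$file" | grep -v -E '^[0-9]+:[[:space:]]*#'
}
```

Commenting out any line it reports and rebooting would stop zenstates from running at boot.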

    Will reboot and test ASAP

    On 12/20/2020 at 11:36 AM, Vr2Io said:

    I don't have this maxed-out core problem on a 1920X. Did you try not executing zenstates in the GO file?

    Then which process is maxing out the cores?

     

    What motherboard do you have? The issue seems to be centered around USB.

    The processes maxing out are kworkers for usb_hub_wq and pm

     

    I have one trick left up my sleeve, which is moving to a newer BIOS, but I only want to try that as a last resort: my system is otherwise running well and I don't want to introduce new issues, particularly around virtualisation.

    Edited by tjb_altf4
    36 minutes ago, tjb_altf4 said:

    What motherboard do you have?

    GIGABYTE X399 AORUS PRO, also 6.9RC2.

     

    How about unplugging unnecessary USB devices? Or disabling the third-party onboard USB controller?

    No problem on 6.9 beta3x?

    Edited by Vr2Io
    On 12/22/2020 at 8:36 AM, Vr2Io said:

    GIGABYTE X399 AORUS PRO, also 6.9RC2.

    May come down to vendor differences between ASRock & Gigabyte, maybe BIOS, maybe hardware choices.

     

    On 12/22/2020 at 8:36 AM, Vr2Io said:

    No problem on 6.9 beta3x?

    I didn't run this server on the beta; however, I suspect it would be the same if my suspicions about it being related to the USB4 implementation in the kernel are correct.

    On 12/22/2020 at 8:36 AM, Vr2Io said:

    How about unplugging unnecessary USB devices? Or disabling the third-party onboard USB controller?

    There's not really anything connected other than the Unraid flash drive and keyboard/mouse. I think I need to try swapping the connected ports to see if this makes a difference.


    I moved the USB devices to the primary USB slots and noticed that the kworker/xx +pm and +usb_hub_wq threads, as well as the ksoftirqd threads, changed IDs for the first time.

    It really seems that something around USB in the kernel is not working right for my hardware.

     

     

    image.png.871f6ae736ed63949feaf9b29e3c4826.png

    5 hours ago, tjb_altf4 said:

    usb in the kernel

    This may not cover your specific issue, but there are a few USB fixes in the 5.10.2 kernel release.

     

    Alan Stern (1):
          USB: legotower: fix logical error in recent commit
    
    Alexander Sverdlin (1):
          serial: 8250_omap: Avoid FIFO corruption caused by MDR1 access
    
    Bui Quang Minh (1):
          USB: dummy-hcd: Fix uninitialized array use in init()
    
    Greg Kroah-Hartman (1):
          Linux 5.10.2
    
    Hans de Goede (1):
          xhci-pci: Allow host runtime PM as default for Intel Alpine Ridge LP
    
    Li Jun (1):
          xhci: Give USB2 ports time to enter U3 in bus suspend
    
    Mika Westerberg (1):
          xhci-pci: Allow host runtime PM as default for Intel Maple Ridge xHCI
    
    Oliver Neukum (2):
          USB: add RESET_RESUME quirk for Snapscan 1212
          USB: UAS: introduce a quirk to set no_write_same
    
    Peilin Ye (1):
          ptrace: Prevent kernel-infoleak in ptrace_get_syscall_info()
    
    Steven Rostedt (VMware) (2):
          ktest.pl: If size of log is too big to email, email error message
          ktest.pl: Fix the logic for truncating the size of the log file for email
    
    Takashi Iwai (3):
          ALSA: usb-audio: Fix potential out-of-bounds shift
          ALSA: usb-audio: Fix control 'access overflow' errors from chmap
          ALSA: pcm: oss: Fix potential out-of-bounds shift
    
    Tejas Joglekar (1):
          usb: xhci: Set quirk for XHCI_SG_TRB_CACHE_SIZE_QUIRK
    
    Thomas Gleixner (1):
          USB: sisusbvga: Make console support depend on BROKEN

     

    Edited by SimonF

     

    On 12/13/2020 at 9:06 PM, ljm42 said:

    It looks like you have Tips and Tweaks installed, what is your "CPU Scaling Governor" set to? If it is "Power Save", try "On Demand". Made a big difference for me. There is more discussion here: https://forums.unraid.net/bug-reports/prereleases/690-beta-30-pre-skylake-intel-cpus-stuck-at-lowest-pstate-r1108/ 

     

    Just to let you know that this issue is still present in 6.9.2 and this solution is still valid. My system is now 10 times faster than in "Power Save".

     

    Thank you very much!




