• [6.10.0-RC1] Unable to enable ASPM: it stays disabled even with policy and boot option (Ubuntu works, Unraid not)


    Falcosc
    • Minor

    Same issue like

    The Kernel Update didn't change the behavior.

     

    Ubuntu versions tested so far:

    • Ubuntu 20.04.2.0 LTS "Focal Fossa" - Release amd64 (20210209.1)
    • Ubuntu 21.04
    • Ubuntu 21.10 "Impish Indri" - Alpha amd64 (20210811)

     

    Because the kernel version of Ubuntu 21.10 is very close to Unraid's 5.13.8, the lspci outputs are identical except for the ASPM part.

     

    Steps to reproduce:

    1. Correctly configure the BIOS to support ASPM
    2. Don't install any unnecessary components: no hard drives connected, no add-in cards installed
    3. Prepare a fresh Ubuntu install and set the boot option intel_iommu=on to get the same lspci output as Unraid
    4. Check if ASPM is enabled
      lspci -vvvnnPPDq > ubunutu_lspci.txt
      lsmod > ubunutu_lsmod.txt
      modprobe -c > ubunutu_modprobe.txt
      cat /proc/modules > ubunutu_modules.txt

      ubunutu_lspci.txt  ubunutu_lsmod.txt  ubunutu_modprobe.txt  ubunutu_modules.txt

    5. Prepare Unraid: add pcie_aspm=force to /boot/syslinux/syslinux.cfg
    6. Check if ASPM is enabled
      lspci -vvvnnPPDq > lspci.txt
      lsmod > lsmod.txt
      modprobe -c > modprobe.txt
      cat /proc/modules > modules.txt

      modules.txt  modprobe.txt  lspci.txt  lsmod.txt
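
    The ASPM part of the two dumps can be compared mechanically. A minimal sketch, assuming both lspci dumps from the steps above are in the current directory; the helper name aspm_states is made up for illustration and is not part of pciutils:

```shell
# Print "<device> <ASPM state>" for every device that has a LnkCtl line.
# Reads `lspci -vv` style text on stdin.
aspm_states() {
  awk '
    /^[^ \t]/ { dev = $1 }                  # an unindented line starts a new device record
    /LnkCtl:/ && match($0, /ASPM [^;]*/) {  # LnkCtl holds the current state, LnkCap only the capability
      print dev, substr($0, RSTART + 5, RLENGTH - 5)
    }
  '
}

# Usage with the file names from the steps above:
#   aspm_states < ubunutu_lspci.txt > ubuntu_aspm.txt
#   aspm_states < lspci.txt         > unraid_aspm.txt
#   diff ubuntu_aspm.txt unraid_aspm.txt
```

    Any line in the diff is a device whose ASPM state differs between the two systems.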

    Expected Result:

    ASPM should be enabled, as it is in Ubuntu, which saves 4 W of power at idle

     

    Actual Result:

    It is still disabled

     

    tower-diagnostics-20210811-2142.zip

     

    I don't know which files would help to find out what prevents Unraid from enabling ASPM. I thought one of the kernel modules or configurations might differ, so I exported some of them. But lspci already shows that Ubuntu and Unraid are using the same kernel modules.

     

    Is there maybe a script somewhere in Unraid that disables ASPM to work around issues related to this option?

     

    Ubuntu Kernel config and Syslog:

    syslog  config-5.11.0-20-generic

     




    User Feedback


    Do you need the "pcie_aspm=force" kernel parameter?  There is this warning:

    WARNING: Forcing ASPM on may cause system lockups.

     

    Also it would be helpful to post the kernel .config file of your Ubuntu install.


    Thank you for the prompt response.

     

    No, the parameter is not needed. Ubuntu does enable ASPM without the parameter.

     

    Unraid does not enable ASPM with or without parameter.

     

    I was only comfortable trying this parameter because I know that Ubuntu enabled ASPM on all devices by default without any extra settings. But forcing it on Unraid didn't help at all. There are some more tricks, which I documented and tested in the old 6.9 topics. For 6.10 I only tested the force parameter.

     

    By the way, this issue is related to my chipset. If I connect PCIe devices directly to the CPU, they get ASPM enabled on Unraid as they should, even without force. So something related to the chipset may be causing ASPM to be disabled.

     

    @limetech What else do you need from Ubuntu?
    Would you like exports from all 3 Ubuntu versions, or only the 21.10 Alpha?

    Edited by Falcosc

     

    2 hours ago, Falcosc said:

    Would you like exports from all 3 Ubuntu versions, or only the 21.10 Alpha?

     

    Just the kernel .config file from whichever one works for you. I want to see how they defined the ASPM-related kernel config settings.

    Your syslog from diags reports that ASPM is enabled...

    6 hours ago, limetech said:

    Your syslog from diags reports that ASPM is enabled...

    Yes, I also confirmed via the syslog that the force parameter was correctly added to the boot parameters. And ASPM does work on devices that are directly connected to the CPU.

    So something either ignores the enablement or disables ASPM for selected devices.

    I also checked the power consumption in case lspci is wrong, but ASPM really is disabled on my chipset and NIC.
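
    Besides the syslog, the running kernel can be asked directly which boot parameter and which ASPM policy it ended up with. A small sketch, assuming the kernel was built with ASPM support so that /sys/module/pcie_aspm/parameters/policy exists; the helper name active_aspm_policy is hypothetical:

```shell
# The policy file lists every policy and marks the active one with brackets,
# e.g. "[default] performance powersave powersupersave".
# This helper prints only the bracketed (active) entry from stdin.
active_aspm_policy() {
  sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage on the running system:
#   grep -o 'pcie_aspm=[a-z]*' /proc/cmdline      # was the boot option really passed?
#   active_aspm_policy < /sys/module/pcie_aspm/parameters/policy
#   dmesg | grep -i aspm                          # what the kernel logged about ASPM
```

    If the boot option and policy look right here but lspci still reports "ASPM Disabled" for a device, the decision is being made per link rather than globally.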

     

    I will export the Ubuntu syslog as well. Maybe it is the other way around, and something inside Ubuntu manually enables ASPM after the kernel fails to do it? There are so many possibilities. The correct approach would be to debug the boot process to find out which driver, module, or script detects and enables ASPM on the chipset, but I don't know how to do this.

    Edited by Falcosc
    On 8/12/2021 at 1:15 AM, limetech said:

    Your syslog from diags reports that ASPM is enabled...

    I think the problem lies somewhere around this.

     

    One of my servers has "perfectly" working ASPM (without setting any additional Kernel options):

    lspci -vv | awk '/ASPM.*?abled/{print $0}' RS= | grep --color -P '(^[a-z0-9:.]+|ASPM.*?abled)'
    00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 07) (prog-if 00 [Normal decode])
                    LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
    00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #17 (rev f0) (prog-if 00 [Normal decode])
                    LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
    00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 (rev f0) (prog-if 00 [Normal decode])
                    LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
    00:1c.5 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0) (prog-if 00 [Normal decode])
                    LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
    00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 (rev f0) (prog-if 00 [Normal decode])
                    LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
    01:00.0 Ethernet controller: Aquantia Corp. AQC107 NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] (rev 02)
                    LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
    04:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
                    LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
    05:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 (prog-if 02 [NVM Express])
                    LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+

     

    But its syslog claims the opposite?! Why does FADT return the wrong ASPM capabilities?

    ACPI FADT declares the system doesn't support PCIe ASPM, so disable it

     

    And I have a different server that does not print this message in the syslog, but all its devices report "ASPM Disabled", although ASPM is enabled in the BIOS (and gets enabled when running Ubuntu). Even "pcie_aspm=force" doesn't change anything. Only setpci works.
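
    For reference, a rough sketch of the setpci approach. ASPM is controlled by bits 1:0 of the Link Control register, which sits at offset 0x10 in the PCI Express capability (00 = disabled, 01 = L0s, 10 = L1, 11 = L0s and L1). The device address below is only an example taken from the listing above, and the helper name aspm_decode is made up. Writing the register bypasses the kernel's own checks, so the state should only be set to something LnkCap advertises, and on both ends of the link:

```shell
# Decode the ASPM control bits from a Link Control value as printed by
# setpci (hex without prefix, e.g. "42").
aspm_decode() {
  case $(( 0x$1 & 3 )) in
    0) echo "Disabled" ;;
    1) echo "L0s Enabled" ;;
    2) echo "L1 Enabled" ;;
    3) echo "L0s L1 Enabled" ;;
  esac
}

# Usage as root (dev is an example address, adjust to your device):
#   dev=04:00.0
#   aspm_decode "$(setpci -s "$dev" CAP_EXP+10.b)"   # read the current state
#   setpci -s "$dev" CAP_EXP+10.b=2:3                # value 2 under mask 3: enable L1, keep other bits
```

    The value:mask form only changes the bits set in the mask, so the other Link Control bits such as CommClk stay as they were.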

     

    To me it sounds like a driver is missing?!

     

    @Falcosc

    Maybe this script helps to enable ASPM automatically:

    https://web.archive.org/web/20190301120327/http://drvbp1.linux-foundation.org/~mcgrof/scripts/enable-aspm


    Yes, I just noticed the same thing. I'm quite sure it worked before 6.9.2, because I had to manually add pcie_aspm=off to get rid of some PCIe Bus Error messages.


