• [6.11.0] All server fans shut off a few hours after startup.


    CriticalMach
    • Urgent

    I first noticed this problem a few weeks ago with 6.11.0 rc4. A few hours after rebooting into rc4, I started getting email alerts that multiple drives were overheating. I came downstairs and was met with complete silence: ALL of the fans in my server case were off. I rebooted and the fans spun up, but a few hours later it happened again. I reverted to rc3 and it did not happen again. When rc5 came out I upgraded to see if it would recur, and sure enough, a few hours later it did. I reverted to rc3 and had no issues for about a week. I upgraded to 6.11 final when it was released and the issue happened again. At that point I reverted to 6.10.3 and have been running it for the past week with zero issues. This morning I went into my BIOS to confirm that nothing had changed with my fan settings; my fans are set to run at 100% all the time. I then upgraded to 6.11 again and a few hours later the issue repeated. I am not running a fan controller, nor do I have my fans on any kind of curve. I can replicate the bug 100% of the time by upgrading to 6.11 and waiting a few hours.

    Diagnostics are attached. I received the overheat alert email at 18:55 on Sep 29th and immediately shut down the server. I would appreciate any help!

    tower-diagnostics-20220929-1855.zip




    User Feedback

    Recommended Comments

    This is rather strange; stock Unraid doesn't have any fan control. Please try booting in safe mode to rule out any plugins.

    Link to comment

    Well, it's made it all day without the fans stopping, so you might be onto something. Here's a list of all of my plugins, but nothing here looks like it would have anything to do with fans. Any thoughts, anyone?

    ca.backup2.plg - 2022.07.23  (Up to date)
    ca.mover.tuning.plg - 2022.04.13  (Up to date)
    ca.turbo.plg - 2022.09.16  (Up to date)
    ca.update.applications.plg - 2021.09.24  (Up to date)
    community.applications.plg - 2022.09.26  (Up to date)
    customtab.plg - 2021.03.10  (Up to date)
    disable.security.plg - 2021.03.10  (Up to date)
    disklocation-master.plg - 2022.06.10  (Up to date)
    docker.folder.plg - 2022.09.24  (Up to date)
    dynamix.active.streams.plg - 2020.06.17  (Up to date)
    dynamix.cache.dirs.plg - 2020.08.03  (Up to date)
    dynamix.file.manager.plg - 2022.09.07  (Up to date)
    dynamix.s3.sleep.plg - 2021.03.13  (Up to date)
    dynamix.system.buttons.plg - 2020.06.20  (Up to date)
    dynamix.system.info.plg - 2020.06.21  (Up to date)
    dynamix.system.stats.plg - 2022.05.20a  (Up to date)
    dynamix.unraid.net.plg - 2022.09.28.1258  (Up to date)
    enhanced.log.plg - 2022.08.19  (Up to date)
    file.activity.plg - 2022.08.19  (Up to date)
    fix.common.problems.plg - 2022.09.26  (Up to date)
    flash.remount.plg - 2021.09.06  (Up to date)
    gpustat.plg - 2022.02.22  (Up to date)
    gui-links.plg - 2022.05.29  (Up to date)
    gui.search.plg - 2022.02.12  (Up to date)
    nvidia-driver.plg - 2022.09.27a  (Up to date)
    open.files.plg - 2022.08.19  (Up to date)
    plexstreams.plg - 2022.08.31  (Up to date)
    rclone.plg - 2022.09.02  (Up to date)
    theme.engine.plg - 2020.01.16  (Up to date)
    tips.and.tweaks.plg - 2022.08.30  (Up to date)
    unassigned.devices.plg - 2022.09.16  (Up to date)
    unassigned.devices-plus.plg - 2022.08.19  (Up to date)
    unassigned.devices.preclear.plg - 2022.09.02  (Up to date)
    unbalance.plg - v2021.04.21  (Up to date)
    unlimited-width.plg - 2020.05.27  (Up to date)
    unRAIDServer.plg - 6.11.0
    usb_manager.plg - 2022.08.20  (Up to date)
    user.scripts.plg - 2022.08.01  (Up to date)
    wakeonlan.plg - 2019.12.30  (Up to date)

    Link to comment
    dynamix.s3.sleep.plg - 2021.03.13  (Up to date)

     

    From the ones installed, I would try uninstalling this one first. You can just rename the respective *.plg to, for example, *.bak in /boot/config/plugins to prevent the plugin from installing after a reboot. If it's not this one, I'm not sure which one it could be; I would remove a few at a time to drill down on the culprit.
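The rename described above could be scripted roughly as follows. This is only a sketch: it assumes Python 3 is available on the server (it may need to be installed separately), and the helper name `disable_plugin` is made up for illustration. The directory and the `.plg` to `.bak` rename come from the post.

```python
from pathlib import Path

# Directory named in the post; Unraid reads plugin definitions from here at boot.
PLUGIN_DIR = Path("/boot/config/plugins")


def disable_plugin(plg_name: str, plugin_dir: Path = PLUGIN_DIR) -> Path:
    """Rename <name>.plg to <name>.bak so the plugin is not installed on next boot.

    Hypothetical helper, not part of Unraid. Returns the new path.
    """
    src = plugin_dir / plg_name
    if src.suffix != ".plg":
        raise ValueError(f"expected a .plg file, got {plg_name!r}")
    if not src.is_file():
        raise FileNotFoundError(src)
    dst = src.with_suffix(".bak")
    if dst.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(dst)
    src.rename(dst)
    return dst


# Example (run on the server, then reboot):
#   disable_plugin("dynamix.s3.sleep.plg")
```

Renaming the `.bak` file back to `.plg` and rebooting should restore the plugin.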

    Link to comment

    This looks more like a hardware issue to me. You can try going back to the previous known-good release and see if it still happens; if it does, it's likely hardware related.

    Link to comment

    I've also been experiencing a similar problem. After moving to 6.11, the CPU cooler fan (be quiet!) would stop running and the server would overheat and crash. I thought it was the fan/cooler, so I replaced the cooler and case fans with an all-new Arctic AIO plus new be quiet! fans for the case.

    The fans no longer stop outright, but they'll work OK for a couple of days and then either freeze at very low RPM, eventually overheating the server, or freeze at high RPM (these be quiet! fans go up to 3000 RPM), getting quite noisy.

     

     

    Link to comment

    My server was still doing it, but I "solved" the issue by getting a couple of molex to 4 pin fan adapters and bypassing my motherboard fan headers completely. 

    Link to comment

    Cross posting from:

     

    I have noticed the same sort of issue on my end. The server has been running fine for years. However, in the last month or so, after X days of uptime my CPU fan will stop spinning and the rest of the case fans will slow down to almost nothing. This causes the HDD temps to exceed the warning limit of 45 C. A reboot always restores functionality until another X days pass.

     

    I have not made any hardware changes to the server for years, only upgrading Unraid versions as they release.

     

    The timing does coincide with my Unraid upgrade from 6.10.3 to 6.11.2 which was done on November 6. (Issue also occurred on 6.11.5).

     

    I will be trying to downgrade to Unraid 6.10.3 and monitor.

     

    For reference, this is an Asus X470-F motherboard and Ryzen 3700x CPU.

     

    I should add, I have never had any of the Dynamix plugins installed and the issue has happened 4-5 times in the last month or so.

    Link to comment

    Holy shit! I've been dealing with this for what feels like 2 months. At first I thought it was RAM, but I just watched the damn thing: all the fans stopped, it slowly overheated, then it killed itself via temperature protection.

    Link to comment

    Mine just started doing this 2 days ago.  Had no idea what happened until I was sitting in the room and all of a sudden it went quiet.  All the fans just stopped.  I watched as the temps rose until I decided to just reboot the system instead of letting it get too hot.

    Edited by Haldanite
    Link to comment

    Are we onto something here? I had this with 2 different motherboards too. My solution for now is a SATA to 4x 4-pin PWM adapter, running all fans directly from the power supply (they're all Noctua fans and not that loud even at full tilt).

    Edited by Marcel40625
    Link to comment
    20 hours ago, JorgeB said:

    So it looks like this issue is caused by a new Asus WMI sensor module and buggy Asus firmware on some boards. @OrneryTaurus found this:

     

    https://forums.unraid.net/topic/131340-unsolved-611x-bug-system-fans-stopped-working-after-a-few-days-while-running-multiple-times/?do=findComment&comment=1217166

     

     

    After 49 days uptime on Unraid 6.10.x (never was able to pass 10+ days on 6.11.x), I decided to give this workaround a try.

     

    For Unraid you have to modify/add the file "/boot/config/modprobe.d/disable-asus-wmi.conf" instead of what is mentioned in the Reddit link. On boot it will copy it to "/etc/modprobe.d".
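The post does not quote the file's contents. A minimal sketch, assuming the module in question is the mainline `asus_wmi_sensors` hwmon driver (the thread never names it, so check `lsmod` on the affected system first), would be a single `blacklist` line written to the path given above. The helper names below are made up for illustration, and Python 3 is assumed to be installed on the server.

```python
from pathlib import Path

# ASSUMPTION: the thread does not name the module. "asus_wmi_sensors" is the
# mainline hwmon driver for Asus WMI sensors; confirm with lsmod before use.
MODULE = "asus_wmi_sensors"

# Path taken from the post; Unraid copies it to /etc/modprobe.d on boot.
CONF_PATH = Path("/boot/config/modprobe.d/disable-asus-wmi.conf")


def blacklist_line(module: str) -> str:
    """Return a modprobe.d directive that stops the module from auto-loading."""
    return f"blacklist {module}\n"


def write_blacklist(conf_path: Path = CONF_PATH, module: str = MODULE) -> None:
    """Create the modprobe.d directory if needed and write the blacklist file."""
    conf_path.parent.mkdir(parents=True, exist_ok=True)
    conf_path.write_text(blacklist_line(module))


# Example (run on the server, then reboot so the file is picked up):
#   write_blacklist()
```

A `blacklist` entry only prevents automatic loading, so a reboot is needed for a module that is already loaded to go away.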

     

    I confirmed that this worked because I no longer see the fan speeds in the Unraid dashboard, indicating that the Asus module is disabled.
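A more direct check than the dashboard is to look for the module in `/proc/modules`. The sketch below assumes the module is named `asus_wmi_sensors` (not stated in the thread) and that Python 3 is available; the function names are hypothetical.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Return module names from /proc/modules content (first field of each line)."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_module_loaded(name: str, proc_modules: Path = Path("/proc/modules")) -> bool:
    """Report whether a kernel module is currently loaded."""
    # /proc/modules lists names with underscores, so normalise hyphens first.
    return name.replace("-", "_") in loaded_modules(proc_modules.read_text())


# Example (module name is an assumption, see above):
#   is_module_loaded("asus_wmi_sensors")
```

If this returns `False` after a reboot, the blacklist took effect.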

    Link to comment



