• 6.12.4 - Gets unresponsive (Have to Cut power & hard reboot)


    casperse
    • Urgent

    Hi all,

    Overnight the system becomes unresponsive (I can't even telnet into the server), so cutting the power is my only option.

    This time I set up an alarm to see when it would "go down":

    [screenshots attached]

     

    2023-09-05 23:55:42 - unresponsive

     

    I ran the diagnostics just after the hard reboot.

    I hope someone can help me find out what has been causing this since the upgrade.

    (FYI: I have replaced my PSU and my USB flash drive.)

    It's as if the network just "dies". Next time I will try to ping the server.
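One way to record when the server stops answering is to log ping results from a second machine. The following is a minimal sketch, not something taken from this report: the function name `log_reachability` is hypothetical, and the default probe assumes a Linux `ping` where `-c` is the packet count and `-W` is the timeout in seconds.

```shell
# Print one timestamped "up"/"down" line for a host.
# $1: host to probe; any further arguments replace the default probe command.
log_reachability() {
    host="$1"
    shift
    # Default probe (assumption): a single ping with a 2 second timeout.
    [ "$#" -gt 0 ] || set -- ping -c 1 -W 2
    if "$@" "$host" >/dev/null 2>&1; then
        state=up
    else
        state=down
    fi
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$host" "$state"
}

# Example loop (run on another machine; SERVER_IP is a placeholder):
# while true; do log_reachability SERVER_IP >> reachability.log; sleep 60; done
```

The last "up" line in such a log narrows down when the server stopped responding, which can then be compared against the syslog.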

    I really hope someone can help me. I miss the old days when a reboot happened once a month 🙂



     

    diagnostics-20230906-0645.zip




    User Feedback

    Recommended Comments



    I waited, and it just happened again: a total freeze of the Unraid server. Not even an SSH connection was possible.

    I cut the power and retrieved the syslog from the USB boot stick.

    (For some reason, the setting to save the syslog to the cache drive didn't seem to work.)

    Please let me know if you need any more information that I can retrieve for you.
     

    syslog

    Edited by casperse
    Link to comment
    Sep 10 10:17:40 PLEXZONE sshd[22102]: Connection closed by authenticating user root 192.168.0.92 port 57804 [preauth]
    Sep 10 10:20:17 PLEXZONE kernel: microcode: microcode updated early to revision 0xf0, date = 2021-11-15

    Was the crash at this time?
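To check for a gap like the one between the two lines above, one option is to print the last line logged before each boot. This is only a minimal sketch: the function name `last_line_before_boot` is hypothetical, and it assumes the `microcode updated early` kernel message from the excerpt appears at every boot on this hardware (a different marker can be passed as the second argument).

```shell
# Print the last syslog line before each boot marker, followed by the marker line.
# $1: syslog file; $2: optional boot marker text (default taken from the excerpt above).
last_line_before_boot() {
    awk -v marker="${2:-microcode updated early}" '
        # When a boot marker appears, show the line that came right before it.
        index($0, marker) > 0 && prev != "" { print prev; print $0 }
        { prev = $0 }
    ' "$1"
}
```

Running `last_line_before_boot syslog` against the recovered syslog file would print each pre-boot line together with the boot line that follows it, so the time of the last logged activity before the hang can be read off directly.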

    Link to comment

    According to UptimeRobot, the alerts I got were received at
    2023-09-10 05:20:06 and 2023-09-10 06:04:02; after that the server was unresponsive.

    Edited by casperse
    Link to comment
    17 hours ago, fateful-inbound8133 said:

    Could be related to this issue many of us are seeing on 6.12.3:

     

    But I am running 6.12.4, and I think it's resolved in this release.

    Link to comment

    I'm not seeing anything relevant logged. One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

    Link to comment

    What about this:
    [screenshot attached]

    I did a:

    /etc/rc.d/rc.nginx restart
    /etc/rc.d/rc.nginx reload


    My services are still running, but my UI is gone (no way to log in)!
    Any idea what command I can run to get it back?

    Edited by casperse
    Link to comment

    I ended up connecting over telnet/SSH and issuing a reboot command.

    Could this be related to "connect.myunraid"? I have it set up with custom ports and use it every day!
    It's the only change I can think of.

    Maybe I should try running Unraid without it and see if the problems disappear?

    Link to comment

    So the long errors above:

    "nchan: Out of shared memory while allocating message of size"

     

    It crashed again this morning.

    So could this be related to a Docker container? (I have now stopped everything except a few, to see if this makes a difference.)

    Link to comment
    16 minutes ago, casperse said:

    So the long errors above:

    "nchan: Out of shared memory while allocating message of size"

    It's difficult to say what's causing those, and there may be more than one reason. For some users it's leaving a browser window open to the GUI, especially if it goes to sleep.

    Link to comment

    I have a VM on the server that has a Chrome browser with an open session to the Unraid UI.
    Closing that stopped the log from showing more errors 🙂

    Link to comment

    Quite a few browsers have introduced the idea of putting idle tabs to sleep, so I think that if this option is active it can cause problems when tabs are left open to the Unraid GUI.

    Link to comment

    Update: Closing all browsers with a session open to Unraid, and always closing active windows, helped.

    I also stopped all Docker containers and have slowly started them again over the last few days.

    I think the "Nextcloud + DB" containers might be part of the problem?

    5 days (5 x 24 h) of uptime, a new record!

    [screenshot attached]

    Link to comment

    I keep searching for answers to the 6.12 issues and this is a new one. I have yet to get 6.12 to run longer than a day on any version, including 6.12.4. I do tend to leave a browser open on some device or another connected to my server.

     

    If leaving a browser open causes a server to lock up, the problem is the OS, not the browser, lol. 

     

    I guess I'll stick to 6.11.5 where I don't have to worry about my server crashing if I forget to close a browser tab (which I do all the time).

    Link to comment
    On 9/27/2023 at 7:17 AM, shaunvis said:

    I keep searching for answers to the 6.12 issues and this is a new one. I have yet to get 6.12 to run longer than a day on any version, including 6.12.4. I do tend to leave a browser open on some device or another connected to my server.

     

    If leaving a browser open causes a server to lock up, the problem is the OS, not the browser, lol. 

     

    I guess I'll stick to 6.11.5 where I don't have to worry about my server crashing if I forget to close a browser tab (which I do all the time).

    I had the same problem. There is something going on with 6.12 that causes hangs. I reverted to 6.11.5 and uptime has been great. You'll notice the same problem reported over and over again, so I will hold off until something forces me to upgrade.

    Link to comment
    On 9/13/2023 at 11:38 AM, casperse said:

    It's happening again? Closing the Unraid Connect window to see if it stops in the log.


    Attached diagnostic file.....

    [screenshot attached]

     

    diagnostics-20230913-1835.zip


    This is the same issue that others and I are having, in this thread:

     


    I have been counting the errors with this one-liner:

    grep -o 'Increase nchan_max_reserved_memory' /var/log/syslog | wc -l
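To see whether the errors build up steadily or arrive in bursts, the same match can be grouped by hour. This is a minimal sketch, assuming the standard syslog timestamp layout (`Sep 10 10:17:40 ...`) seen earlier in this thread; the function name `nchan_errors_per_hour` is hypothetical.

```shell
# Count 'Increase nchan_max_reserved_memory' messages per hour.
# $1: syslog file (defaults to /var/log/syslog, the path used in the one-liner above).
nchan_errors_per_hour() {
    grep 'Increase nchan_max_reserved_memory' "${1:-/var/log/syslog}" |
        # Keep month, day and hour only, e.g. "Sep 13 18:00".
        awk '{ print $1, $2, substr($3, 1, 2) ":00" }' |
        # Syslog is chronological, so identical hours are adjacent and uniq can count them.
        uniq -c
}
```

Each output line is a count followed by the hour it belongs to, which makes it easier to line the errors up with the time a browser tab was left open.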

     

    My server will crash if I accidentally leave a tab open for 18 hours.

     

    Please reopen this issue. 

    Diagnostics are available upon request, but I will have to recreate them, as I have been nuking the logs and staying logged out.

     

    Link to comment

    The problem is not resolved. I am now rebooting every week. Sometimes I am lucky and can do it with a telnet command, but mostly the only way is to cut the power. 😞
    (The syslog server is now enabled again.)

     

    Edited by casperse
    Link to comment

    I am having this issue as well and am currently monitoring my server. I downgraded all the way back to 6.10.3 and will see if the server crashes again.

    Here's all the troubleshooting that I did before getting to this ridiculous point!

     

    Troubleshooting:

     

    • Replaced SATA cables
    • Power supply
    • CMOS
    • Ran repair on USB flash
    • Applied new configs
    • Downclocked memory
    • Replaced hard drives; the Unraid server continues to crash after parity sync

     

    Not hopeful at this point at all.

     

     

    Link to comment




