• [6.11.2] Complete Unraid Server Freeze-up


    Kendel
    • Solved Urgent

    I upgraded from 6.10.3 to 6.11.1. After the upgrade everything appeared to be okay, but within a couple of days I found the server locked up. The GUI was gone and I couldn't connect to the server remotely. I went to the server physically and the screen was blank; no key presses would bring up a prompt or any characters. I was forced to hard reboot.

    I restarted the server and let it run the parity check. The parity check found some errors, which was expected after the hard reboot. Another 2-3 days after the parity check, same problem: frozen. I disabled and deleted some of the plugins and disabled my Docker containers. Again the same problem. Reading the syslog, I didn't see anything in the history that would cause the system to freeze completely. There are many rsyslog errors and a lot of GUI errors, so I started making sure to close the GUI when not in use. Same problem.

    After several 2-day-long parity checks and worrying about wear on the drives, I reverted to 6.10.3. The system ran fine for a few days and then I saw 6.11.2 was released. Hoping this would resolve my issues, I updated again. Same problems, so today I went back to 6.10.3 again.

    It's really unclear how to troubleshoot this, since the server locks up completely and I can't get any access to investigate. Every time it locks up I have to wait a couple of days for the parity check, with the wear and tear on my hard drives. The system has worked perfectly for a few years before now, so I hope the issue can be resolved.

     

    Thanks, Kendel

    kendelmedia-diagnostics-20221108-0000.zip

    syslog-192.168.1.191.log2.1




    User Feedback

    Recommended Comments

    Nothing relevant was logged, which usually suggests a hardware issue. I would start by downgrading to the previous known-good release to confirm it's not happening there now.

    Link to comment

    I had already reverted to 6.10.3 when I started this post, so I'm about 2-1/2 days in with no issues on the old version. I'll check back in a few days.

    Link to comment

    Now 6 days in with version 6.10.3. It appears to be an issue with the new version of Unraid and not a hardware problem (unless it is a compatibility issue).

    Link to comment

    It does suggest some compatibility issue with the newer kernel, but with nothing logged it's difficult to say for sure. Look for a BIOS update and/or try 6.11.3 or the next release.

    Link to comment

    OK, so I found a BIOS update and installed it. I restarted first with 6.10.3, which ran for a day or two with no issues, so I tried the new 6.11.3. The same problem happened again after about 2 days: a completely frozen server. I can't talk to the server, and my router shows the computer isn't even there. I was forced to hard reboot, and of course the parity check is running again for the next couple of days. This is really a pain. There is no way to troubleshoot, since no errors in the log appear to amount to anything.

    I would like to just stay with 6.10.3 for now, but what are the security risks? Are there separate security patches that can be installed on 6.10.3 in the meantime? I see the Unraid website shows 6.11.4, and my server now says 6.11.5 is available.

    This can't be good for all of my hard drives. I'm lost on how to proceed.

    Link to comment

    Let the parity check finish, rebooted, and disabled SMT. Less than a day later, complete server crash again. Trying to disable VT-x / VT-d now... not looking promising.

    Link to comment

    So I disabled everything noted above. Less than 6 hours after the parity check completed, the server is again frozen and unreachable. It does seem odd that the server keeps churning along while the parity check is in progress (which takes about 1-1/2 days) and then fails several hours later while the system is idle. Something going to sleep?? The last several times it has crashed I haven't seen any recent error in the system log.
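
    For anyone trying to pin down when a freeze like this happened from a mirrored syslog, here is a minimal Python sketch. It assumes the log uses the classic `Mon DD HH:MM:SS` timestamp prefix with no year, and it lists long silences between consecutive entries; the last line before each gap is the last thing logged before the lockup. The function names, the 30-minute threshold, and the year argument are illustrative assumptions, not anything provided by Unraid.

```python
import sys
from datetime import datetime, timedelta


def parse_stamp(line, year):
    """Parse a leading 'Mon DD HH:MM:SS' stamp; return None if the line has none."""
    parts = line.split(None, 3)
    if len(parts) < 3:
        return None
    try:
        # The year is supplied by the caller because this syslog format omits it.
        return datetime.strptime(
            f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
        )
    except ValueError:
        return None


def find_gaps(lines, year, min_gap=timedelta(minutes=30)):
    """Return (last_line_before, first_line_after, gap) for each long silence."""
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue  # skip lines without a recognizable timestamp
        if prev_time is not None and stamp - prev_time >= min_gap:
            gaps.append((prev_line.rstrip(), line.rstrip(), stamp - prev_time))
        prev_time, prev_line = stamp, line
    return gaps


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python find_gaps.py <syslog file> <year>
    with open(sys.argv[1], errors="replace") as fh:
        for before, after, gap in find_gaps(fh, int(sys.argv[2])):
            print(f"silent for {gap}:\n  last:  {before}\n  first: {after}")
```

    A quiet system can legitimately log nothing for a while, so a gap is only a hint; it is most useful when the first line after the gap is a boot message.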

    Link to comment
    9 hours ago, Kendel said:

    It does seem odd that the server keeps churning along while the parity check is in progress (which takes about 1-1/2 days) and then fails several hours later while the system is idle.  Something going to sleep??

     

    Almost the same symptom has been seen on 1st-gen Ryzen when the CPU enters idle (a low-power state), and it can be fixed by disabling C6 power management in the BIOS. You could try a similar setting in your BIOS.
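
    To see which idle states the kernel actually exposes and how often they are entered, here is a rough Python sketch that reads the standard Linux cpuidle sysfs directory. It assumes a cpuidle driver is loaded; the state names the kernel reports may not match the BIOS label "C6", and the helper name `read_cstates` is made up for illustration.

```python
from pathlib import Path


def read_cstates(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return a list of dicts describing each idle state found under cpuidle_dir."""
    base = Path(cpuidle_dir)
    if not base.is_dir():
        return []  # no cpuidle driver, or not running on Linux

    def read(state_dir, name):
        f = state_dir / name
        return f.read_text().strip() if f.is_file() else ""

    # Keep only directories named state<N> and order them numerically.
    dirs = [p for p in base.glob("state*") if p.name[5:].isdigit()]
    dirs.sort(key=lambda p: int(p.name[5:]))

    states = []
    for state_dir in dirs:
        states.append({
            "index": int(state_dir.name[5:]),
            "name": read(state_dir, "name"),
            "usage": int(read(state_dir, "usage") or 0),  # times the state was entered
            "disabled": read(state_dir, "disable") == "1",
        })
    return states


if __name__ == "__main__":
    for s in read_cstates():
        flag = " (disabled)" if s["disabled"] else ""
        print(f"state{s['index']}: {s['name']} entered {s['usage']} times{flag}")
```

    This only reports what the OS sees; the BIOS setting remains the place to turn the state off.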

     

    I have a 3rd-gen Intel system; maybe I can run some tests on it.

     

    Edit: No problems at idle after 9 hours of uptime (6.11.5, 3570K, no VT-d).

     

    image.png.cd8d3b70c9c66b941e58b713520ea7e5.png

    Edited by Vr2Io
    Link to comment

    Running clean so far on 6.11.5 after I disabled the C6 power state.   Been a little over a day now.... Crossing fingers.

    Link to comment

    Over 3 days now.... so looks like that fixed it!   Thanks for the help!  Seems weird that the prior version 6.10.3 of Unraid didn't have a problem with that setting, but working anyways!  Marking as resolved.

    Link to comment



