  • Server goes "unresponsive" every day


    martijndemulder
    • Urgent

    Since upgrading to 6.12.x (currently on 6.12.2), my server goes "unresponsive" every other day. Make that: whenever it feels like crashing, up to a couple of times a day!

    By "unresponsive" I mean: I'm still able to ping it (which is kind of strange), and judging by the activity LED it seems to be busy, but I just can't connect to it anymore. No management, no SSH, and all Docker containers and VMs are unreachable as well.

     

    What I checked already is:

    - Docker mode set to ipvlan

    - All network interfaces set to IPv4 only

    - Logging is clean (except for the Parity Check Tuning plugin, which kept flooding the log with a 255 message, so I uninstalled it)

     

    When the server becomes unresponsive, the only "solution" is to hard-reset it, which is obviously really undesirable.

     

    Attached are my syslog and diagnostics. Any ideas on how to approach this issue?

    syslog-10.10.5.3.log
    massivedump-diagnostics-20230706-1753.zip

     




    User Feedback

    Recommended Comments

    I'm seeing a lot of OOM errors. This is generally not a lack of RAM but RAM fragmentation; you can try limiting your containers/VMs or using, for example, the swap file plugin.
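    One way to check for fragmentation, rather than plain exhaustion, is /proc/buddyinfo on a Linux host: each row lists, per memory zone, how many free blocks of 2^order contiguous pages exist for order 0, 1, 2 and upward. Lots of free memory that exists only as low-order blocks is what fragmentation looks like. The following is a minimal sketch under that assumption; the helper names, the fallback sample and the order-3 threshold are illustrative choices, not anything Unraid ships.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical fallback sample, used only when /proc/buddyinfo is unavailable.
SAMPLE = "Node 0, zone   Normal   4096   2048   16   2   0   0\n"


def parse_buddyinfo(text: str) -> dict[tuple[str, str], list[int]]:
    """Map (node, zone) to free-block counts indexed by allocation order."""
    zones: dict[tuple[str, str], list[int]] = {}
    for line in text.splitlines():
        if "zone" not in line:
            continue
        left, _, rest = line.partition("zone")
        node = left.replace("Node", "").strip().rstrip(",")
        name, *counts = rest.split()
        zones[(node, name)] = [int(c) for c in counts]
    return zones


def free_fraction_at_or_above(counts: list[int], min_order: int) -> float:
    """Share of free pages that sit in blocks of at least 2**min_order pages."""
    total = sum(count << order for order, count in enumerate(counts))
    large = sum(count << order for order, count in enumerate(counts) if order >= min_order)
    return large / total if total else 0.0


if __name__ == "__main__":
    path = Path("/proc/buddyinfo")
    text = path.read_text() if path.exists() else SAMPLE
    for (node, zone), counts in parse_buddyinfo(text).items():
        share = free_fraction_at_or_above(counts, min_order=3)
        print(f"node {node} {zone:>8}: {share:.1%} of free pages in order>=3 blocks")
```

    A zone can report plenty of free pages overall while this share stays low; that combination is the pattern worth watching for.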

    5 minutes ago, JorgeB said:

    Seeing a lot of OOM errors, this is generally not lack of RAM but RAM fragmentation, you can try limiting your containers/VMs or using for example the swap file plugin.

    Where do you see this in the logs? I'm also not sure what would cause this. The VMs are limited; the Docker containers aren't, but as far as I can see they aren't consuming a lot of RAM.


    e.g.:

     

    Jun  7 10:11:01 MassiveDump kernel: syncthing invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
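    To answer "where do you see this in the logging": the kernel writes a line containing "invoked oom-killer" each time it runs the OOM killer, as in the example above. Below is a minimal Python sketch that scans a syslog for those lines and counts them per process. It assumes the line layout shown above and a default path of /var/log/syslog (an assumption; pass the attached syslog file as an argument instead). The order= field is the size of the allocation that failed, in 2^order pages, so order=0 is a single page; higher orders are the requests that fragmentation makes harder to satisfy.

```python
import re
import sys
from collections import Counter

# Matches lines like:
# Jun  7 10:11:01 MassiveDump kernel: syncthing invoked oom-killer: gfp_mask=..., order=0, ...
OOM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?:\[[^\]]*\]\s*)?"  # optional "[12345.678]" uptime stamp
    r"(?P<proc>.+?) invoked oom-killer:.*?\border=(?P<order>\d+)"
)


def find_oom_events(lines):
    """Return (timestamp, process, order) for every oom-killer invocation."""
    events = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            events.append((match["ts"], match["proc"], int(match["order"])))
    return events


def main(argv):
    # Assumption: /var/log/syslog as the default location; override via argv[1].
    path = argv[1] if len(argv) > 1 else "/var/log/syslog"
    try:
        with open(path, errors="replace") as fh:
            events = find_oom_events(fh)
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
        return
    for ts, proc, order in events:
        print(f"{ts}  {proc:<20} order={order}")
    print("invocations per process:", dict(Counter(proc for _, proc, _ in events)))


if __name__ == "__main__":
    main(sys.argv)
```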

     

    On 7/6/2023 at 6:46 PM, JorgeB said:

    e.g.:

     

    Jun  7 10:11:01 MassiveDump kernel: syncthing invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0

     

    During one of the latest crashes I was still logged in over SSH and noticed high CPU usage from "kswapd0", which more or less matches your indication that the system was swapping memory too much.

    Still, there's no obvious reason for it to do so: the memory isn't over-allocated at the moment.
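    Some context on the kswapd0 observation: kswapd is the kernel's background page-reclaim thread. High CPU there means the kernel is working to free memory, which is not necessarily the same as swapping. The sketch below samples a few /proc/vmstat counters twice and prints the difference. It assumes a Linux kernel that exposes pswpin/pswpout (pages swapped in/out) and pgscan_kswapd/pgsteal_kswapd (pages scanned/reclaimed by kswapd); the 2-second interval is arbitrary.

```python
import time
from pathlib import Path

# Counters of interest; any that are missing are treated as 0.
WATCHED = ("pswpin", "pswpout", "pgscan_kswapd", "pgsteal_kswapd")
INTERVAL_S = 2  # arbitrary sampling window


def parse_vmstat(text: str) -> dict:
    """Parse 'name value' lines from /proc/vmstat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def deltas(before: dict, after: dict, keys=WATCHED) -> dict:
    """Counter increase between two samples, per watched key."""
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}


if __name__ == "__main__":
    vmstat = Path("/proc/vmstat")
    if vmstat.exists():
        first = parse_vmstat(vmstat.read_text())
        time.sleep(INTERVAL_S)
        second = parse_vmstat(vmstat.read_text())
        for name, diff in deltas(first, second).items():
            print(f"{name:>15}: +{diff} pages in {INTERVAL_S}s")
    else:
        print("/proc/vmstat not found (not a Linux host?)")
```

    If pgscan_kswapd climbs while pswpin and pswpout stay flat, reclaim is happening without any swap traffic, for example because no swap is configured.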

     

    Since my last post, I've downgraded back to 6.11.5, with only minor issues. Just as the release notes state, you have to force-update all of your Docker containers and start them manually. Besides that, nothing special.

     

    The system has been running great ever since, which indicates to me that there's either something wrong with my combination of plugins/Docker containers and 6.12.2, or with the software in combination with my hardware.

    I'm leaning towards ruling out the latter, since there are a lot of complaints about the stability of this version.



  • Status Definitions

    Open = Under consideration.

    Solved = The issue has been resolved.

    Solved version = The issue has been resolved in the indicated release version.

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or that need more information. In this case, just add a comment and we will review it again.

    Retest = Please retest in the latest release.


    Priority Definitions

    Minor = Something not working correctly.

    Urgent = Server crash, data loss, or other showstopper.

    Annoyance = Doesn't affect functionality but should be fixed.

    Other = Announcement or other non-issue.
