6.12.8 crashing every day or every other day


    feyded1020
    Status: Retest | Priority: Urgent

    A couple of months ago I made the mistake of upgrading from 6.11.4 (I think it was .4) to 6.12.4 while I was not physically near my server, and since then I have been experiencing random system hangs and crashes.

    It starts with Uptime Kuma notifying me that my websites are down. When I navigate to the Unraid GUI, the page never loads and eventually reports that it timed out. When I check my KVM, the console is still up and waiting at the CLI login prompt.

    When I use the KVM to send a short power-button press to initiate a shutdown, the server responds but then hangs on 'Taking too long to shutdown, will shutdown in 90 seconds' or something to that effect. This inevitably leads to an unclean shutdown, because I have to cut power to the system to get it responsive again.

    Back when I was on 6.12.4, I switched my Docker network from macvlan to ipvlan, with no real change. I have since upgraded to 6.12.8 and it still seems to crash.

    I have attached my syslog and diagnostics. I'm just curious what the cause might be at this point, since I never had issues like this before. I just experienced another crash on 2 March 2024 at around 05:00 Eastern Standard Time. Also, for reference, I have just deleted the file that was not moving correctly from the cache to the array.

    Thank you for any help!

    Attachments: syslog-192.168.1.55.log, tower-diagnostics-20240302-0455.zip





    Recommended Comments

    There are multiple apps segfaulting, and btrfs is detecting data corruption. I suggest running memtest, but memtest is only definitive if it finds errors. If it doesn't, try running the server with just one stick of RAM; if the problem persists, try a different stick. That will basically rule out a RAM issue.
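    To check the "multiple apps segfaulting" pattern against your own syslog, here is a minimal Python sketch that tallies kernel segfault messages per process name. It assumes the usual Linux kernel wording ("<name>[<pid>]: segfault at <addr> ..."); the sample lines, process names and the helper names tally_segfaults/tally_file are hypothetical illustrations, not taken from the attached logs.

```python
import re
from collections import Counter

# Assumed kernel message shape for an unhandled SIGSEGV:
#   "<comm>[<pid>]: segfault at <addr> ip <ip> sp <sp> error <code> in <object>[...]"
SEGFAULT_RE = re.compile(r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-fA-F]+)")

# Hypothetical sample lines (not from the attached syslog).
SAMPLE_LINES = [
    "Mar  2 04:51:10 Tower kernel: python3[12345]: segfault at 0 ip 00007f1c2a3b4c5d sp 00007ffd1e2f3a40 error 4 in libc.so.6[7f1c2a300000+155000]",
    "Mar  2 04:52:33 Tower kernel: node[2222]: segfault at 18 ip 0000555555 sp 00007ffc error 6 in node[555555000000+1000]",
    "Mar  2 04:53:01 Tower kernel: python3[12399]: segfault at 8 ip 00007f1c2a3b4c5d sp 00007ffd1e2f3a40 error 4 in libc.so.6[7f1c2a300000+155000]",
    "Mar  2 04:54:00 Tower emhttpd: read SMART /dev/sdb",
]


def tally_segfaults(lines):
    """Count segfault messages per process name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            counts[match.group("comm")] += 1
    return counts


def tally_file(path):
    """Hypothetical helper: run tally_segfaults over a saved syslog file."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return tally_segfaults(fh)


if __name__ == "__main__":
    counts = tally_segfaults(SAMPLE_LINES)
    print(counts.most_common())
    # Crashes spread across several unrelated programs point away from a
    # single buggy app and toward shared hardware such as RAM.
    if len(counts) > 1:
        print("Segfaults in several unrelated programs: suspect hardware (e.g. RAM).")
```

    A single app segfaulting repeatedly is more likely a software bug in that app; the same tally spread across many unrelated names is the pattern described above.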


    Thank you for the incredibly fast response. So it sounds like something I'll need to take care of when I get home from overseas. Any idea where that data corruption resides with btrfs? I was going to run a file system check on each drive as well.

    From what I was reading, segfaults are related to issues with assigned blocks, i.e. incorrectly assigned blocks being used for writes while writing to RAM?

    I am just surprised it has manifested itself only now after being on 6.12+.

    8 minutes ago, feyded1020 said:

    Any idea where that data corruption resides with btrfs

    You can run a scrub, ideally only after fixing the issue, assuming it's really RAM.
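    Besides a scrub (for example `btrfs scrub start -B` on the pool's mount point, or the Scrub button on the pool's page in the GUI), the kernel log often already names the affected inodes. The following is a minimal Python sketch that pulls them out, assuming the "csum failed root <R> ino <I>" wording btrfs uses for data checksum errors (this varies between kernel versions). The sample lines, the /mnt/cache mount point and the function name find_corrupt_inodes are hypothetical. `btrfs inspect-internal inode-resolve` maps an inode number back to a path, but only relative to the subvolume that owns it.

```python
import re
from collections import defaultdict

# Assumed shape of a btrfs data-checksum failure in the kernel log, e.g.
#   "BTRFS warning (device sdb1): csum failed root 5 ino 257 off 0 csum 0x... expected csum 0x... mirror 1"
# Wording differs between kernel versions; treat this pattern as a starting point.
CSUM_RE = re.compile(
    r"BTRFS \w+ \(device (?P<dev>[^)]+)\): csum failed root (?P<root>\d+) ino (?P<ino>\d+)"
)

# Hypothetical sample lines (not from the attached diagnostics).
SAMPLE_LINES = [
    "Mar  2 04:40:01 Tower kernel: BTRFS warning (device nvme0n1p1): csum failed root 5 ino 4711 off 131072 csum 0x8941f998 expected csum 0x1d2c3b4a mirror 1",
    "Mar  2 04:40:01 Tower kernel: BTRFS warning (device nvme0n1p1): csum failed root 5 ino 4711 off 135168 csum 0x00000000 expected csum 0x2e3f4a5b mirror 1",
    "Mar  2 04:41:12 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0",
    "Mar  2 04:42:00 Tower kernel: BTRFS warning (device loop2): csum failed root 257 ino 88 off 0 csum 0x11111111 expected csum 0x22222222 mirror 1",
]


def find_corrupt_inodes(lines):
    """Map (device, root, inode) -> number of csum failures seen in the log lines."""
    hits = defaultdict(int)
    for line in lines:
        m = CSUM_RE.search(line)
        if m:
            hits[(m.group("dev"), int(m.group("root")), int(m.group("ino")))] += 1
    return dict(hits)


if __name__ == "__main__":
    MOUNT_POINT = "/mnt/cache"  # hypothetical: the pool (or subvolume) that logged the errors
    for (dev, root, ino), n in sorted(find_corrupt_inodes(SAMPLE_LINES).items()):
        print(f"{dev}: root {root} inode {ino} ({n} csum failures)")
        # Suggested follow-up; only meaningful if MOUNT_POINT is the subvolume that owns the inode.
        print(f"  btrfs inspect-internal inode-resolve {ino} {MOUNT_POINT}")
```

    Errors on a loop device (such as a Docker image file) point at that image rather than at files on the pool itself.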

     

    9 minutes ago, feyded1020 said:

    From what I was reading segfaults are related to issues with assigned blocks and incorrectly assigned blocks trying to be used to write while writing to RAM?

    Can be, and when multiple apps are having that issue it's almost always a hardware problem, most often bad RAM.

     

    10 minutes ago, feyded1020 said:

    I am just surprised it has manifested itself only now after being on 6.12+.

    Could have just coincided with the upgrade.





  Status Definitions

    Open = Under consideration.
    Solved = The issue has been resolved.
    Solved version = The issue has been resolved in the indicated release version.
    Closed = Feedback or opinion better posted on our forum for discussion. Also used for reports we cannot reproduce or that need more information. In this case, just add a comment and we will review it again.
    Retest = Please retest in the latest release.

  Priority Definitions

    Minor = Something not working correctly.
    Urgent = Server crash, data loss, or other showstopper.
    Annoyance = Doesn't affect functionality but should be fixed.
    Other = Announcement or other non-issue.