  • [UNRAID 6.10 RC2] Server Shuts Down or Hard Locks After ~2 days


    aceofskies05
    • Minor

    Server details: B450M + Ryzen 1700X, C-states disabled in boot config and BIOS

    Since updating to RC2, my server shuts down completely (powers off) roughly every 2 days. Sometimes, like today, the screen goes black and the server is still "on", but SSH and the web UI are unresponsive and the keyboard does nothing.

    I've attached logs.

    tower-diagnostics-20211115-1110.zip




    User Feedback

    Recommended Comments

    Not necessarily your problem, but all systems are far more stable when using the same memory sticks.  If you can't do that, then you really have to ensure that the CAS timing is identical between the two sets.

     

    Either way, you want to run a memtest for at least one full pass, if only to rule out a bad stick (it won't rule out the above, though).

     

    Now, if it actually shuts itself off then you're at either a power problem or an overheating problem.

    Link to comment

    Interestingly, this RAM has been in my system for 1-2 years. I did run a memtest last weekend and no errors were reported.

    I'm using Netdata and temps seem stable. I'm wondering if it's the power supply, but I'm curious why a PSU would fail only after a couple of days of use.

    Link to comment

    Why are you installing so many (all?) packages from DevTools and NerdPack? Do you actually use all of those? Do you even know what many of them are? I recommend not installing anything you don't use regularly.

     

     

    Link to comment
    1 minute ago, aceofskies05 said:

    That diagnostic log is from after the crash.

     

    40 minutes ago, trurl said:

    diagnostics only contains the current syslog since the last reboot

    so there is nothing there from before the crash. You need to get the syslog from wherever you told syslog server to save it and post that

    Link to comment
    5 hours ago, aceofskies05 said:

    Ah, I'm following... thanks. Attached is the log from before the crash.

    Maybe you understand, but you still attached a diagnostic instead of a syslog 😉

    Link to comment
    22 minutes ago, aceofskies05 said:

    Isn't the syslog in the diagnostic log?

    9 hours ago, trurl said:

    diagnostics only contains the current syslog since the last reboot

    23 minutes ago, aceofskies05 said:

    I've attached the syslog

    And that is also the current syslog since the last reboot. We want

    9 hours ago, trurl said:

    the syslog from wherever you told syslog server to save it

     

    Link to comment

    I'm getting random reboots now. I managed to catch one in the syslog. I logged in at 12:39:26, then it randomly rebooted right at 12:42:19 and became available again at 12:46:22.
     

    Dec  3 00:43:15 Tower sshd[27118]: pam_unix(sshd:session): session closed for user root
    Dec  3 12:39:26 Tower webGUI: Successful login user root from 192.168.1.170
    Dec  3 12:42:07 Tower kernel: veth4f47806: renamed from eth0
    Dec  3 12:42:07 Tower kernel: br-4d8c83721e50: port 3(veth69d1485) entered disabled state
    Dec  3 12:42:08 Tower kernel: br-4d8c83721e50: port 3(veth69d1485) entered disabled state
    Dec  3 12:42:08 Tower kernel: device veth69d1485 left promiscuous mode
    Dec  3 12:42:08 Tower kernel: br-4d8c83721e50: port 3(veth69d1485) entered disabled state
    Dec  3 12:42:16 Tower kernel: br-4d8c83721e50: port 3(veth6996f31) entered blocking state
    Dec  3 12:42:16 Tower kernel: br-4d8c83721e50: port 3(veth6996f31) entered disabled state
    Dec  3 12:42:16 Tower kernel: device veth6996f31 entered promiscuous mode
    Dec  3 12:42:16 Tower kernel: br-4d8c83721e50: port 3(veth6996f31) entered blocking state
    Dec  3 12:42:16 Tower kernel: br-4d8c83721e50: port 3(veth6996f31) entered forwarding state
    Dec  3 12:42:16 Tower kernel: br-4d8c83721e50: port 3(veth6996f31) entered disabled state
    Dec  3 12:42:19 Tower kernel: eth0: renamed from vethe963a49
    Dec  3 12:42:19 Tower kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth6996f31: link becomes ready
    Dec  3 12:42:19 Tower kernel: br-4d8c83721e50: port 3(veth6996f31) entered blocking state
    Dec  3 12:42:19 Tower kernel: br-4d8c83721e50: port 3(veth6996f31) entered forwarding state
    Dec  3 12:46:22 Tower kernel: Linux version 5.14.15-Unraid (root@Develop) (gcc (GCC) 11.2.0, GNU ld version 2.37-slack15) #1 SMP Thu Oct 28 09:56:33 PDT 2021
    Dec  3 12:46:22 Tower kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot pcie_acs_override=downstream,multifunction iommu=pt
    Dec  3 12:46:22 Tower kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
    Dec  3 12:46:22 Tower kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
    Dec  3 12:46:22 Tower kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
    Dec  3 12:46:22 Tower kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
    Dec  3 12:46:22 Tower kernel: x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
    Dec  3 12:46:22 Tower kernel: signal: max sigframe size: 1776
    Dec  3 12:46:22 Tower kernel: BIOS-provided physical RAM map:
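For anyone hunting the same kind of event in a persisted syslog, here is a rough Python sketch of what was done by eye above: find each kernel boot banner ("kernel: Linux version") and show the last few lines logged before it, plus how long the log was silent. The function names, the `context` parameter, and the assumed year of 2021 (syslog lines carry no year; 2021 is taken from the kernel build date in the excerpt) are illustrative choices, not anything Unraid provides.

```python
import re
from datetime import datetime

# Classic syslog timestamp such as "Dec  3 12:42:19" (the line has no year).
TS_RE = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s")
# The first kernel message after a boot, as seen in the excerpt above.
BOOT_MARKER = "kernel: Linux version"


def parse_ts(line, year=2021):
    """Parse the leading timestamp of a syslog line; None if there is none."""
    m = TS_RE.match(line)
    if not m:
        return None
    mon, day, clock = m.groups()
    return datetime.strptime(f"{year} {mon} {day} {clock}", "%Y %b %d %H:%M:%S")


def find_reboots(lines, context=5, year=2021):
    """Return (gap_seconds, last_lines_before_boot) for each boot banner."""
    reboots = []
    history = []  # (timestamp, line) for every timestamped line seen so far
    for raw in lines:
        line = raw.strip()
        ts = parse_ts(line, year) if line else None
        if ts is None:
            continue  # skip blank lines and lines without a timestamp
        if BOOT_MARKER in line and history:
            prev_ts = history[-1][0]
            gap = (ts - prev_ts).total_seconds()
            reboots.append((gap, [text for _, text in history[-context:]]))
        history.append((ts, line))
    return reboots


if __name__ == "__main__":
    sample = [
        "Dec  3 12:39:26 Tower webGUI: Successful login user root from 192.168.1.170",
        "Dec  3 12:42:19 Tower kernel: br-4d8c83721e50: port 3(veth6996f31) entered forwarding state",
        "Dec  3 12:46:22 Tower kernel: Linux version 5.14.15-Unraid (root@Develop)",
    ]
    for gap, before in find_reboots(sample, context=2):
        print(f"boot after {gap:.0f}s of silence")  # prints: boot after 243s of silence
        for text in before:
            print("   ", text)
```

To run it on a real file, pass the open file object (for example `find_reboots(open(path))`, where `path` is wherever your syslog server saves the log). A gap with no shutdown messages before the banner, as in the excerpt above, suggests the machine went down without the OS getting a chance to log anything.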

    Link to comment

