  • [UNRAID 6.10 RC2] Server Shuts Down or Hard Locks After ~2 days


    aceofskies05
    • Minor

    Server details: B450M + Ryzen 1700X, C-states disabled in the boot config and BIOS.

    Since updating to RC2, my server shuts down and powers off completely roughly every 2 days. Sometimes, like today, the screen goes black and the server is still "on", yet SSH, the web UI, and the keyboard are all unresponsive.

    I've attached logs.

    tower-diagnostics-20211115-1110.zip




    Squid

    Posted

    Not necessarily your problem, but all systems are far more stable when using the same memory sticks.  If you can't do that, then you really have to ensure that the CAS timing is identical between the two sets.

     

    Either way, you want to run a memtest for at least a pass if only to rule out a bad stick (won't rule out the above though)

     

    Now, if it actually shuts itself off then you're at either a power problem or an overheating problem.

    aceofskies05

    Posted

    Interestingly, this RAM has been in my system for 1-2 years. I did run a memtest last weekend and no errors were reported.

    I'm using Netdata and temps seem stable. I'm wondering if it's the power supply, but I'm curious why a PSU would fail only after a couple of days of use.

    aceofskies05

    Posted

    @Squid Is there some sort of stress test or CPU benchmark I can run in Unraid to see if a specific component is dying? I'm thinking of a Prime95-like tool...

    trurl

    Posted

    1 minute ago, aceofskies05 said:

    attached

    diagnostics only contains the current syslog since the last reboot, not the syslog from

    58 minutes ago, JorgeB said:

    the syslog server and post that log after a crash.

     

    trurl

    Posted

    Why are you installing so many (all?) packages from DevTools and NerdPack? Do you actually use all of those? Do you even know what many of them are? I recommend not installing anything you don't use regularly.

     

     

    aceofskies05

    Posted

    I've had the syslog server enabled for over a year, and that diagnostic log is from after the crash. Is there something else you need?

     

    trurl

    Posted

    1 minute ago, aceofskies05 said:

    that diagnostic log is after crash

     

    40 minutes ago, trurl said:

    diagnostics only contains the current syslog since the last reboot

    so there is nothing there from before the crash. You need to get the syslog from wherever you told syslog server to save it and post that

    aceofskies05

    Posted

    Ah, I'm following... thanks. Attached is the log from before the crash.

    Also, the Nerd Pack plugin has a "check all" button. I must have clicked it by accident one day and enabled all the packages. I fixed that.

    tower-diagnostics-20211115-1110.zip

    trurl

    Posted

    5 hours ago, aceofskies05 said:

    Ah, I'm following... thanks. Attached is the log from before the crash.

    Maybe you understand, but you still attached a diagnostic instead of a syslog😉

    aceofskies05

    Posted

    Isn't the syslog in the diagnostics? Well, anyhow, I've attached the syslog. Let me know if there's anything else you need. I appreciate the help.

    syslog.txt

    trurl

    Posted

    22 minutes ago, aceofskies05 said:

    Isn't the syslog in the diagnostics?

    9 hours ago, trurl said:

    diagnostics only contains the current syslog since the last reboot

    23 minutes ago, aceofskies05 said:

    I've attached the syslog

    And that is also the current syslog since the last reboot. We want

    9 hours ago, trurl said:

    the syslog from wherever you told syslog server to save it

     

    aceofskies05

    Posted

    Got it. Seventh time's a charm 😄. Here's the syslog that was mirrored to the flash drive.

    syslog

    bonienl

    Posted

    Please test your system in safe-mode (no plugins installed).

    aceofskies05

    Posted

    I'm getting random reboots now. I managed to catch one in the syslog. I logged in at 12:39:26, then it rebooted at random at 12:42:19 and became available again at 12:46:22.
     

    Dec  3 00:43:15 Tower sshd[27118]: pam_unix(sshd:session): session closed for user root
    Dec  3 12:39:26 Tower webGUI: Successful login user root from 192.168.1.170
    Dec  3 12:42:07 Tower kernel: veth4f47806: renamed from eth0
    Dec  3 12:42:07 Tower kernel: br-4d8c83721e50: port 3(veth69d1485) entered disabled state
    Dec  3 12:42:08 Tower kernel: br-4d8c83721e50: port 3(veth69d1485) entered disabled state
    Dec  3 12:42:08 Tower kernel: device veth69d1485 left promiscuous mode
    Dec  3 12:42:08 Tower kernel: br-4d8c83721e50: port 3(veth69d1485) entered disabled state
    Dec  3 12:42:16 Tower kernel: br-4d8c83721e50: port 3(veth6996f31) entered blocking state
    Dec  3 12:42:16 Tower kernel: br-4d8c83721e50: port 3(veth6996f31) entered disabled state
    Dec  3 12:42:16 Tower kernel: device veth6996f31 entered promiscuous mode
    Dec  3 12:42:16 Tower kernel: br-4d8c83721e50: port 3(veth6996f31) entered blocking state
    Dec  3 12:42:16 Tower kernel: br-4d8c83721e50: port 3(veth6996f31) entered forwarding state
    Dec  3 12:42:16 Tower kernel: br-4d8c83721e50: port 3(veth6996f31) entered disabled state
    Dec  3 12:42:19 Tower kernel: eth0: renamed from vethe963a49
    Dec  3 12:42:19 Tower kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth6996f31: link becomes ready
    Dec  3 12:42:19 Tower kernel: br-4d8c83721e50: port 3(veth6996f31) entered blocking state
    Dec  3 12:42:19 Tower kernel: br-4d8c83721e50: port 3(veth6996f31) entered forwarding state
    Dec  3 12:46:22 Tower kernel: Linux version 5.14.15-Unraid (root@Develop) (gcc (GCC) 11.2.0, GNU ld version 2.37-slack15) #1 SMP Thu Oct 28 09:56:33 PDT 2021
    Dec  3 12:46:22 Tower kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot pcie_acs_override=downstream,multifunction iommu=pt
    Dec  3 12:46:22 Tower kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
    Dec  3 12:46:22 Tower kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
    Dec  3 12:46:22 Tower kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
    Dec  3 12:46:22 Tower kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
    Dec  3 12:46:22 Tower kernel: x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
    Dec  3 12:46:22 Tower kernel: signal: max sigframe size: 1776
    Dec  3 12:46:22 Tower kernel: BIOS-provided physical RAM map:
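
    For anyone digging through a long mirrored syslog for the same thing, here is a rough Python sketch of how the reboot points could be located automatically. It treats the "kernel: Linux version" line as the boot marker (as in the excerpt above) and reports the last entry logged before each boot plus the gap in seconds. The function names are made up for illustration, the sample lines are abbreviated from the excerpt, and the year is an assumption because classic syslog timestamps don't include one.

```python
from datetime import datetime

# First kernel line written after a boot, as seen in the excerpt above.
BOOT_MARKER = "kernel: Linux version"


def parse_stamp(line, year=2021):
    # Classic syslog stamps ("Dec  3 12:42:19") carry no year, so one is assumed.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_reboots(lines):
    """Return (last_line_before_boot, boot_line, gap_seconds) for each boot marker."""
    reboots = []
    previous = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if BOOT_MARKER in line and previous is not None:
            gap = (parse_stamp(line) - parse_stamp(previous)).total_seconds()
            reboots.append((previous, line, gap))
        previous = line
    return reboots


# Sample lines abbreviated from the excerpt above; in practice, read the
# mirrored syslog file instead.
sample = """\
Dec  3 12:42:19 Tower kernel: br-4d8c83721e50: port 3(veth6996f31) entered forwarding state
Dec  3 12:46:22 Tower kernel: Linux version 5.14.15-Unraid (root@Develop)
Dec  3 12:46:22 Tower kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot
""".splitlines()

for before, boot, gap in find_reboots(sample):
    print(f"last entry before boot: {before}")
    print(f"gap until next boot:    {gap:.0f} s")
```

    If the last entries before the boot marker are ordinary messages with no shutdown sequence, as here, the restart was probably not a clean one. The gap calculation will be wrong for a log that crosses a year boundary.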

    aceofskies05

    Posted

    I've done a fresh install of Unraid with 6.9-rc2. I'm narrowing the error down to the kernel "Dirty" report.

    image.thumb.png.960122323f756019180fde457a78f1cf.png 

    syslog

    aceofskies05

    Posted

    The server has been stable overnight since adding "rcu_nocbs=0-15".

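
    For context, rcu_nocbs is a Linux kernel boot parameter that takes a CPU list, and 0-15 covers 16 logical CPUs, which matches the 8-core/16-thread Ryzen 1700X. The post doesn't say where the parameter was added; the sketch below assumes it ends up on the kernel command line (the "Command line:" entry in the syslog excerpt earlier in the thread) and shows one way to check whether it is present. The helper names are hypothetical, and only plain numbers and ranges are handled, not every CPU-list form the kernel accepts.

```python
def expand_cpu_list(spec):
    """Expand a simple kernel CPU list such as "0-3,8" into a sorted list of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def rcu_nocbs_cpus(cmdline):
    """Return the CPUs named by rcu_nocbs= in a kernel command line, or None if absent."""
    for token in cmdline.split():
        if token.startswith("rcu_nocbs="):
            return expand_cpu_list(token.split("=", 1)[1])
    return None


# Command line as logged in the syslog excerpt earlier in the thread (no rcu_nocbs yet).
before = "BOOT_IMAGE=/bzimage initrd=/bzroot pcie_acs_override=downstream,multifunction iommu=pt"
# Assumption: the same command line with the parameter appended.
after = before + " rcu_nocbs=0-15"

print(rcu_nocbs_cpus(before))      # None
print(len(rcu_nocbs_cpus(after)))  # 16
```

    On a running server the string could be read from /proc/cmdline instead of being hard-coded, to confirm the parameter actually took effect after a reboot.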




