  • Server becomes unresponsive i9 13900k


    Payyn
    • Urgent

    Hi, I've been having issues with my servers. I have two almost identical servers; the only differences are the motherboard and RAM. Both have very similar kernel issues. I'm not running any Docker containers, and I've run RAM tests twice with no issues. I've swapped out the USB thumb drives, power supply, RAM, cooler, and motherboard. At times the server will stay up for hours, but then it randomly becomes unresponsive and I have to manually reboot it.

     

     

    1st Server:

    MSI Pro Z790-A Max with latest BIOS

    13th Gen Intel i9-13900k

    Corsair iCUE H150i Elite CAPELLIX XT Desktop Liquid CPU Cooler

    Kingston FURY Beast RGB 64GB 5600MT/s DDR5 memory (4x32GB)

    be quiet! BN516 Straight Power 12-1000w

    2x NVIDIA Quadro P2000 GPUs

    LSI 9200-8e 6Gbps 8-lane external SAS HBA P20

    NetApp DS4246 Disk Array Shelf

     

     

     

    2nd Server:

    ASUS ProArt Z790-Creator with latest BIOS

    13th Gen Intel i9-13900k

    Cooler Master Hyper 622 Halo Black Dual Tower CPU Air Cooler

    CORSAIR DOMINATOR PLATINUM RGB DDR5 RAM 64GB (4x32GB) 5200MHz

    be quiet! BN516 Straight Power 12-1000w

    All drives are directly connected to the motherboard

     

    deathstar-diagnostics-20240410-0819.zip





    I'll definitely get that info for you. Both systems had different XMP profiles enabled. I started another memory test and there were some failures. I set the BIOS on both systems back to default settings and am currently running the latest memtest version on them.

     

    I was reading some forum posts about memory issues with DDR5, and people report that populating all 4 slots causes problems. However, I also read that some people had success when overclocking manually instead of using the XMP profiles.


    I'm not seeing anything logged that points to a software issue. One thing you can try is running the server with a single RAM stick; if the result is the same, try the other one. That will basically rule out a RAM problem. You can also boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
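
    As a rough aid for the log review described above, here is a minimal Python sketch that counts lines in a saved syslog matching a few common Linux kernel error markers. The marker list and the labels are assumptions chosen for illustration, not an exhaustive or official set, and a clean result does not rule out a hardware fault.

```python
import re
import sys
from collections import Counter

# Assumed markers: strings commonly seen in Linux kernel logs when something
# goes wrong. This list is illustrative and deliberately short.
PATTERNS = {
    "machine_check": re.compile(r"mce:|Machine check", re.IGNORECASE),
    "hardware_error": re.compile(r"\[Hardware Error\]"),
    "kernel_panic": re.compile(r"Kernel panic"),
    "oops_or_bug": re.compile(r"\bBUG:|\bOops\b|general protection fault"),
    "call_trace": re.compile(r"Call Trace:"),
}


def scan_lines(lines):
    """Return (counts per label, list of (line number, label, text)) for matching lines."""
    counts = Counter()
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                hits.append((number, label, line.rstrip()))
    return counts, hits


def scan_file(path):
    """Scan a log file, replacing undecodable bytes instead of failing."""
    with open(path, errors="replace") as handle:
        return scan_lines(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    counts, hits = scan_file(sys.argv[1])
    for number, label, text in hits:
        print(f"{number}: [{label}] {text}")
    print(dict(counts) if counts else "no matching lines found")
```

    Saved under a hypothetical name such as scan_syslog.py, it could be run against a log pulled from the server, for example `python scan_syslog.py syslog-192.168.1.47.log`. A line can match more than one marker, so it may be listed under several labels.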


    So I replaced the motherboard with one that is stated to be compatible with 4 sticks of my DDR5 RAM. However, now I'm getting another error and the server GUI is unresponsive. I've reformatted the USB flash drive and repaired it using the Windows chkdsk feature. The server boots up fine, but after an hour or two I get these errors.

     

     

     

    PXL_20240423_234856978.thumb.jpg.4164e75b4bd4b37b072ff4179eb27187.jpg

     

     

    System Specs:

    13th Gen Intel® Core™ i9-13900K

    MSI Z790 EDGE WIFI

    CORSAIR DOMINATOR PLATINUM RGB DDR5 RAM, 4 sticks of 32GB (CMT64GX5M2B5200C40)

     

     

    syslog-192.168.1.47.log deathstar-diagnostics-20240423-1915.zip


    Do you have XMP or any overclocking turned on? It is highly recommended that you turn off all RAM overclocking.


    Nope, I'm using default settings. I tried turning up the RAM voltage to 1.25 V, but didn't really see any change in stability, so I went back to default settings. Memtest and the Windows Memory Diagnostic test showed no errors. Stability has been better since I swapped out the motherboard, but the server still crashes randomly. I thought it might be a heat issue, so I put a fan on the RAM. I think I have two issues now: the server crashing, and the USB flash drive having problems, probably due to the crashes.


    I swapped out the RAM for 4 sticks of CORSAIR VENGEANCE DDR5 RAM and the server ran great for over 8 hours. Then it crashed while running the mover. The only part I haven't replaced is the CPU, so I'm going to do that now. The other server, with almost the same parts, has been stable for over a week and a half now.

    Edited by Payyn

    I replaced the CPU and went back to the original RAM. The server has been stable for almost 15 hours now, and I haven't seen any of the errors I was getting before.

    Edited by Payyn




