
Server crashing, Trying to run Memtest just resets the system



My server ran fine all last week, but on Friday and again on Sunday it randomly shut down every Docker container. When I try to restart them I get an Execution Error, error 403. I've attached my diagnostics zip from Sunday afternoon.

 

I have also tried to run Memtest from the boot menu options, but all it does is restart my system. The motherboard splash screen comes back up and then I get the Unraid boot choices again.

 

Do I just have bad RAM? It was bought brand new, but that is what I currently suspect.

 

Specs

ASUS Z590-P plus WiFi

Intel i3-10320

G.Skill DDR4-3600 2x8GB

3x 10TB Seagate Exos

1x 128GB IPSG SATA SSD

be quiet! Pure Power 11 500W

Capture.PNG

tower-diagnostics-20230219-1753.zip

Link to comment

My machine won't boot the USB stick with MemTest86 on it. From what I understand, I need to switch the Compatibility Support Module (CSM) from Legacy to UEFI, but that option is always greyed out in my BIOS. I've seen others mention that having a CPU with an iGPU causes this, but I don't have anything to swap it out with.

 

Any suggestions?

Link to comment

I dealt with memory issues a few weeks ago.  Spent hours (days?) of my life I'll never get back running memtest over and over.  Here are some suggestions...

 

1) In my experience, if there were going to be errors, it found them by Test 7.  If you have to run a bunch of tests for different scenarios (DIMMS, slots, speeds, timings, etc), I wouldn't run the entire test.  Stop after Test 7.  When you think you have a stable system, then you can run the entire test.  Even run multiple passes overnight.

2) Check your memory speed and timings in the BIOS, and set them to match the memory specs.  My ASUS Z790 board was not auto-populating timings correctly (I'm talking about the 4 main timings).  Setting the timings manually fixed the vast majority of the errors.

3) If you have another system that supports your memory, check your DIMMs in that system, one stick at a time, to determine whether the DIMMs are good.  Otherwise, if you can find a DIMM/slot combination that results in 0 errors, you can use that as a known-good baseline to identify which DIMMs or slots are causing errors.

4) I had different results from memtest for different slots on my server mobo.  Slots 1 and 3 threw errors every time, slots 2 and 4 had no errors (where slot 1 is closest to the CPU, slot 4 is furthest).

5) If you find another set of memory to test with, don't assume that kit is "good" just because you weren't having errors with it before.  I pulled my main desktop RAM to test in my server, and still got errors.  When I then ran memtest on my desktop, it turned out two of the DIMMs from that machine were bad, too.
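
If you end up juggling a lot of stick/slot combinations as in 3) and 4), it can help to write the results down and let a few lines of Python do the elimination. This is only a rough sketch, not anything memtest gives you: the function name `find_suspects`, the DIMM labels "A"/"B", and the error counts are all made up for illustration. The logic assumes single-stick runs and treats a DIMM or slot as cleared once it has been part of any 0-error run. A short clean run isn't proof (intermittent errors exist), so treat the output as a hint for what to retest.

```python
def find_suspects(results):
    """results: list of (dimm, slot, error_count) from single-stick memtest runs.

    A DIMM or slot is cleared as soon as it appears in any 0-error run,
    since that run shows both the stick and the slot can work.
    Whatever only ever shows up in failing runs stays on the suspect list.
    """
    all_dimms = {dimm for dimm, _, _ in results}
    all_slots = {slot for _, slot, _ in results}
    cleared_dimms = {dimm for dimm, _, errors in results if errors == 0}
    cleared_slots = {slot for _, slot, errors in results if errors == 0}
    return sorted(all_dimms - cleared_dimms), sorted(all_slots - cleared_slots)


# Hypothetical log: two sticks, slot 1 is closest to the CPU.
results = [
    ("A", 1, 12),
    ("A", 2, 0),
    ("A", 3, 5),
    ("A", 4, 0),
    ("B", 1, 7),
    ("B", 2, 0),
]

suspect_dimms, suspect_slots = find_suspects(results)
print(suspect_dimms, suspect_slots)  # prints: [] [1, 3]
```

With that made-up log, both sticks get cleared and slots 1 and 3 stay suspect, which is the same pattern I saw on my board.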

 

Ultimately I decided to switch motherboards and get ECC memory, because I saw the chaos that memory errors can cause in Unraid.

Edited by C4RBON
Updated numbering