Trying to get to the bottom of my consistent reboots



If it matters, I've recently upgraded all my hardware and kept the same install of Unraid. Before installing Unraid I ran a memory test from the BIOS.

 

There were a few lines in the logs about Samba sending a reboot, or something like that, right around the time it went down (9:40 pm), if that helps.
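For anyone wanting to pull out just the log lines around a crash time like the one mentioned above, here is a minimal sketch. It assumes the classic syslog layout ("Mon DD HH:MM:SS host ...") with the time in the third field; `lines_between` is a made-up helper name, not an Unraid tool.

```shell
# Print syslog lines whose timestamp falls inside a time window.
# ASSUMPTION: classic "Mon DD HH:MM:SS host ..." layout, so the time is
# the third whitespace-separated field.
# Usage: lines_between FILE START END   (times as zero-padded HH:MM, 24-hour)
lines_between() {
    awk -v lo="$2" -v hi="$3" '
        { t = substr($3, 1, 5) }       # HH:MM part of the timestamp
        t >= lo && t <= hi { print }   # string compare is safe for zero-padded times
    ' "$1"
}
```

Something like `lines_between syslog.log 21:35 21:45` would then show only the entries within five minutes of a 9:40 pm crash. The window does not handle ranges that cross midnight.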


It's not currently connected to a UPS. I've got it temporarily set up in my workshop after the build, but I've got a UPS in my server room. I wasn't going to move it until it was stable, but I can try if you suggest it.

 

I actually reused my case (Enthoo Pro) and my 2-year-old PSU (EVGA GQ 650W 80+G SM ATX PSU). The rest of the hardware is new: an Intel i9-9900K on an Asus Z390-P motherboard with two 8GB sticks of Crucial Ballistix DDR4 3200, two NVMe drives, and an LSI SAS controller holding 6 spinners.

The CPU is cooled by an Arctic Freezer II 240mm AIO liquid cooler. The motherboard and CPU temps are reporting quite low, and I'm pretty sure the CPU cooling is OK.

 

 


So I went ahead and moved the machine to my server room to hook it up to the UPS. It was up for about 3 hours, then became unresponsive, crashed, and rebooted. Attached is the latest log. I'm at a loss. I guess I can revert to my old server setup for stability, but I was looking forward to using this new hardware.

 

Brought the machine up for more testing. I've reset the BIOS to defaults, although I can't get Unraid to boot unless I enable CSM (Compatibility Support Module) in the BIOS. Is that normal? Could this be memory stick related?
 

syslog-192.168.1.4 (2).log

Edited by chalkdust

I just rebuilt mine with new hardware. I'm running AMD, but I saw in one of the guides that I had to enable CSM for it to work. Not sure whether that was just for Ryzen systems. I had an issue like yours after my previous upgrade: I was running dual Xeons on a Supermicro motherboard, and a couple of sticks of memory turned out to be the problem. It was running something like 96GB and failed pretty far into memtest. I finally isolated it to those sticks and replaced them, which resolved the issue.

 

Side note: I have also had some weird behavior like that when running a PCIe SATA expansion card. If I remember correctly, it didn't like having my drives over 2TB on it, and that caused some weird crashes.

26 minutes ago, chalkdust said:

Passed twice, and then it crashed again twice in a row, within 10 minutes of each other. I'm at a loss. Any other suggestions?

Are you saying that the system crashed while running memtest? If so, you definitely have a hardware problem that is not Unraid related :( The obvious candidates would be cooling and the power supply. The other option is something wrong at the motherboard/CPU/RAM level, but hopefully not, as that has no easy fix.


Quick update:  

In my effort to troubleshoot my Unraid crashes, I've installed a stress-tester Docker app (https://hub.docker.com/r/progrium/stress/). When I run it, my CPUs hit near 100% and my motherboard lets out a very long beep. It doesn't crash under the test settings (docker run --rm -it progrium/stress --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 10s), but the temp spikes from ~30deg to 68deg. I actually think I may have heard that same beep before a crash at some point recently.

My issue clearly is hardware based. Does anyone have any ideas for what to look at first? My cooler? My CPU?
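To see how temperatures move during a stress run like the one above, a rough sketch of a logger is below. It assumes the kernel exposes sensors under /sys/class/thermal (readings there are in millidegrees Celsius); whether a given board reports its CPU sensor that way is not guaranteed, and both function names are made up for illustration.

```shell
# Convert a sysfs reading in millidegrees Celsius to whole degrees.
millideg_to_c() {
    echo $(( $1 / 1000 ))
}

# Print one timestamped line per thermal zone found under the given
# directory (default: /sys/class/thermal).
# ASSUMPTION: the board exposes its sensors as thermal_zone*/temp files.
log_temps() {
    dir="${1:-/sys/class/thermal}"
    for zone in "$dir"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue   # skip zones without a readable sensor
        printf '%s %s %sC\n' "$(date +%T)" "$(basename "$zone")" \
            "$(millideg_to_c "$(cat "$zone/temp")")"
    done
}
```

Running `while sleep 1; do log_temps; done` in a second terminal while the stress container is going gives a per-second trace, so a reboot can be matched against the last temperature seen.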

 

EDIT: So it was not actually my motherboard sending out the long, loud beep; it was the UPS I had connected it to! Under load, my server took down every device connected to the UPS. Something seems up with my power draw. Should I swap in a higher-quality power supply, or does something seem wrong with the hardware?

Edited by chalkdust

It was connected to a CyberPower 425VA with a 27" iMac and a secondary monitor attached (only the Mac is on battery backup, and that lost power too). The server has crashed like this both in my server room (on a different power circuit) and in my office. I've moved it to its own plug now and will test again.


I've moved it off the UPS and ran docker run --rm -it progrium/stress --cpu 16 --io 1 --vm 2 --vm-bytes 128M --timeout 10s, which pegs all my CPU cores at 100% for 10 seconds. It did not crash. What does this mean? Do I need a higher-powered UPS to run this on? I'm not 100% sure this is the cause of my crashing, though, as I THINK I had it on its own plug and it still crashed yesterday.
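On the UPS sizing question, a back-of-the-envelope check can be sketched like this. A UPS's VA rating is not its watt rating; the 60% power factor used below is an assumption for small consumer units (the real figure is on the datasheet), and the load numbers in the usage note are placeholders, not measurements from this build. All function names are hypothetical.

```shell
# Approximate real-power capacity (watts) of a UPS from its VA rating.
# ASSUMPTION: power factor defaults to 60%; check the UPS datasheet.
ups_watts() {
    va="$1"; pf_percent="${2:-60}"
    echo $(( va * pf_percent / 100 ))
}

# Sum any number of load estimates given in watts.
total_load() {
    sum=0
    for w in "$@"; do sum=$(( sum + w )); done
    echo "$sum"
}

# Report whether the summed loads fit within the UPS capacity.
# Usage: check_ups CAPACITY_W LOAD_W [LOAD_W ...]
check_ups() {
    capacity="$1"; shift
    load=$(total_load "$@")
    if [ "$load" -le "$capacity" ]; then
        echo "ok: ${load}W of ${capacity}W"
    else
        echo "overloaded: ${load}W of ${capacity}W"
    fi
}
```

For example, `check_ups "$(ups_watts 425)" 200 100` compares two placeholder loads of 200W and 100W against a 425VA unit. Real numbers from a plug-in watt meter, taken while the stress test is running, would make the comparison meaningful.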

  • 2 months later...

I ran across this looking for something else, but out of curiosity, did you find the problem? It would appear the memory modules you are using (BL8G32C16U4B.M8FE) are not qualified on the MB. 
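If anyone wants to confirm which modules are actually installed before comparing against the motherboard's qualified list, a small sketch follows. It assumes `dmidecode -t memory` is available and run as root, and that its output contains "Part Number:" lines; `part_numbers` is a made-up helper that only filters that text.

```shell
# Extract the unique memory part numbers from `dmidecode -t memory` output
# read on stdin, trimming the padding that often follows the value.
# Empty slots may still show up with placeholder text from the firmware.
part_numbers() {
    sed -n 's/^[[:space:]]*Part Number:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' \
        | sort -u
}
```

Usage would be `dmidecode -t memory | part_numbers`, with the result checked by hand against the board vendor's memory support list.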

 

edit: Also, you are running a release candidate. I would suggest staying on stable versions. 

Edited by F3nris
