Server Randomly Reboots



Hi there,

 

I've been trying to solve an issue with my server that seems to baffle everyone I speak to, and at this point I'm running out of ideas. As the title says, my server randomly reboots for no apparent reason. I've captured logs via telnet and looked at them after each crash and found absolutely nothing: the log just stops, as if I had pulled the power plug. I've also checked my Docker containers, and there's no consistent action happening at the time of a reboot that I can point to. Below is a synopsis of my setup:

Intel i7-920
EVGA X58 motherboard
12GB of DDR3 RAM (6 x 2GB sticks)
4 HDDs (2 x 4TB and 2 x 2TB)
1 SSD (an older 128GB Sandisk)
600W EVGA 80+ Bronze PSU (brand new as I thought this might be the issue originally)

I live in an apartment, so I have a 1500VA APC UPS between the server and the wall.

As I said, I can't find any clear cause for the reboots, and the only way I know one has happened is that Plex is no longer available or I hear the beep from the POST succeeding. I have found some potential contributing factors, though:

1) With no Docker containers running, the server seems fairly stable: it once stayed up for 24 hours straight, whereas a reboot usually happens after about 6 hours (rarely less, though it has happened after only 2).
2) At about 6 containers, the web UI starts to lag and the server crashes shortly after.
3) SABnzbd seems to trigger it fairly quickly, even when it's the only container running.

During all this I'm watching the system stats on the dashboard. Only a few cores ever spike to 100% and memory never passes 40%, and I can't see any consistent relationship between CPU or RAM usage and the reboots.

Finally, I ran memtest86 for 2 straight days and it never found an error, so I've basically ruled out memory. I now also have an error with Community Applications, likely from corruption (I think it happened when the server rebooted in the middle of creating a container), but I was having the reboot issue from the start, back when CA was still working.

Any help is appreciated as this is basically making it unusable.

Edit: I have a telnet session into the server now to try and capture if/when the server reboots, this could take a while though.
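
Since the captured log just stops dead, one thing that might help is making sure every log line is forced to disk before the next one is read, so that whatever was logged last survives a hard reset. Below is a minimal sketch of that idea, not a tested tool. The function names are made up for illustration, and the paths in the usage note are assumptions about a typical unRAID box (syslog kept in RAM, flash drive mounted at /boot), so adjust them to your setup.

```python
import os
import time


def append_durably(path, line):
    """Append one line and force it to stable storage before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write it to the device


def follow(src_path, dst_path, poll_seconds=1.0, max_polls=None):
    """Copy lines from src_path to dst_path as they appear, syncing each one.

    max_polls stops the loop after that many idle polls (None = run forever).
    """
    polls = 0
    with open(src_path, "r", encoding="utf-8", errors="replace") as src:
        while max_polls is None or polls < max_polls:
            line = src.readline()
            if line:
                append_durably(dst_path, line)
            else:
                polls += 1          # nothing new yet, wait and retry
                time.sleep(poll_seconds)
```

Something like `follow("/var/log/syslog", "/boot/logs/syslog-copy.txt")` would be the intended use, assuming those paths exist on your system. Syncing on every line is slow and adds wear to a flash drive, so this is only meant for a short debugging window. If the copy still ends with nothing unusual, that points further toward a hardware-level reset rather than anything the OS had a chance to report.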

Edited by firrae

While probably very difficult to arrange, the more interesting information would be on the locally attached monitor (if you have one). It's a major PITA, as you'd basically need a video camera pointed at it for hours on end (unless you set up your phone or something to record and then run the 6+ containers that make it crash shortly after).

I have a monitor hooked up to it and happened to be looking at it once when it rebooted. There was no shutdown procedure, so it was a hard power off: the screen just went black and then showed the BIOS boot screen. That's where my original suspicion of the PSU came from.

No errors, just an up-and-reboot, would point to one of these:

  • Outdated BIOS (check for updates, especially if you're running VMs with passthrough)
  • Power Supply
  • Mismatched RAM
  • RAM not on the MB approved list
  • Cooling issues
  • Very excessive dust bunnies

But, realistically, try googling for an updated Memtest, as the one included with unRaid is rather dated and doesn't actually catch all errors (make a live Linux USB stick or something for Memtest).

1) It seems the BIOS is fully updated, but I'll check again.
2) The PSU is brand new and replaced the original one. I tested the new PSU in my main PC and it ran fine for 3 days.
3) All the RAM is identical and was purchased at the same time.
4) The RAM meets the mobo's requirements and is within its spec.
5) This is the one where I can't decide if it's the problem. I have 4 fans in the case: one intake over the HDDs, 1 more intake on the side, and 2 exhausts (back and top). The issue has happened both with all the fans in place and the case closed, and with the side fan removed and the case fully open.
6) I cleaned pretty much everything before putting it in there. I moved the old PC into a new case and took that opportunity to clean everything with compressed air.

I'll follow up on the BIOS, though. As for heat, I figured there'd be some sort of warning or error log somewhere, but I can't find anything that indicates overheating.
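
In case it helps with ruling heat in or out, here is a rough sketch of how temperatures could be sampled so there's a record of the last reading before a reboot. It reads the standard Linux hwmon files under /sys/class/hwmon, where each `temp*_input` file holds millidegrees Celsius. It assumes a sensor driver (for example coretemp) is actually loaded on the box, which I haven't verified for this hardware, and the function names are just made up for illustration.

```python
import glob
import os


def read_temps_c(hwmon_root="/sys/class/hwmon"):
    """Return {sensor_path: degrees_C} for every temp*_input under hwmon_root."""
    temps = {}
    pattern = os.path.join(hwmon_root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                # hwmon reports temperatures in millidegrees Celsius
                temps[path] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # unreadable or malformed sensor file, skip it
    return temps


def format_sample(temps, timestamp):
    """Build a one-line summary suitable for appending to a log file."""
    if not temps:
        return f"{timestamp} no temperature sensors found"
    hottest = max(temps.values())
    return f"{timestamp} hottest={hottest:.1f}C sensors={len(temps)}"
```

Calling `format_sample(read_temps_c(), time.strftime("%Y-%m-%dT%H:%M:%S"))` every few seconds (after `import time`) and writing the result somewhere that survives a reboot would show whether the readings were climbing right before the reset. If the last samples look normal, heat becomes a less likely suspect, though a sketch like this can't rule it out completely.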

  • Check for BIOS updates, especially if you're running VM's with passthrough

You may have found it. I thought I had updated it, and while I need to go into the BIOS to be sure, this is one of the fixes listed in the second-to-last update they released:
 

Quote

Fix cold reset when VT is enabled

 

I have VT enabled. If the server crashes again I'll apply these BIOS updates right then; otherwise I'll shut it down gracefully tomorrow night and apply them.
