Random Reboots


meestark


Diagnostics and syslog attached from FCP Troubleshooting mode.

 

My system has been getting random reboots for going on a year now.  This is not a lockup - it is a hard reboot, as if someone hit the reset switch.  Sometimes it lasts a few weeks between reboots, sometimes only a few days, and sometimes it reboots several times within a few days.  Nothing appears in the syslog at the time of reboot (I've been leaving an ssh session connected from another machine, tailing the syslog, for a while now, and nothing ever shows up, just like in the attached syslog).
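For anyone wanting to run the same check on a saved capture rather than watching a live tail, here is a minimal Python sketch. It assumes the capture file spans at least one reboot and that each boot starts with the kernel's "Linux version" banner line; the function name and the `context` parameter are made up for illustration.

```python
import re

# Assumption: each boot begins with the kernel banner, e.g.
# "Nov 12 20:31:02 ds9 kernel: Linux version 4.18...".
BOOT_MARKER = re.compile(r"kernel: .*Linux version ")


def lines_before_boots(lines, context=3):
    """For every boot banner that is not the first line of the capture,
    return the `context` lines logged immediately before it."""
    results = []
    for i, line in enumerate(lines):
        if i > 0 and BOOT_MARKER.search(line):
            results.append(lines[max(0, i - context):i])
    return results


def lines_before_boots_in_file(path, context=3):
    """Same check, reading the capture from a text file (hypothetical path)."""
    with open(path, errors="replace") as fh:
        return lines_before_boots(fh.read().splitlines(), context)
```

If the lines returned for each boot are just routine messages, that matches what I see: the machine resets without logging anything first.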

 

Things I've tested/swapped so far:

 

  • Ran Memtest for 24 hours
  • Tried new PSU
  • Bypassed SATA to Molex power adapters
  • Confirmed UPS power is working properly.
  • Installed telegraf/grafana to monitor CPU temperatures
  • Replaced CPU fan/heatsink

 

I have no idea what else to look at, so I'm hoping the diagnostics will help the community track down my issue.

 

About my system:

 

AsRock Z87 Extreme 6

Intel® Xeon® CPU E3-1241 v3 @ 3.50GHz

Rosewill RSV-L4412 4U Server Chassis

Seasonic X Series X650 Gold

10 attached HDDs (mix of 2, 3, and 4TB drives and one 500GB cache drive)

16GB Memory (2x8GB)

Dockers running: plex, sabnzbd, sonarr, radarr, deluge, tautulli, duckdns, hddtemp, homeassisant, telegraf, grafana, influxdb, ombi, unifi, nzbhydra2

 

 

ds9-diagnostics-20181112-2031.zip

FCPsyslog_tail.txt

4 hours ago, meestark said:

as if someone hit the reset switch. 

Is this even a remote possibility?  I seem to recall small children and cats being culprits in a few cases over the years.  (Cats are attracted by the nearby LED lights.)  I would suggest pulling the case's reset switch connector at the MB just in case you have a defective switch.  (Although, this would be a new one...)


I don't recall which plugin it is, but there is one that will record system logs to USB for this very reason. You appear to have something in place that does the same thing, though.

 

I had the same issue, and it turned out to be a bad motherboard. I discovered it when I would lose my USB stick while trying to write to it. In so many words, I was losing my USB header, and I guess unRAID doesn't like it when it can't read/write to the USB when it wants to.

 

I swapped out the motherboard and it worked well for another couple of years. I'm on a different setup now, but that's what I found the cause to be.
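If you want to watch for the same symptom, a rough Python sketch of a write check follows. It assumes Python is available on the server and that the flash drive is mounted at /boot; the function names, the interval, and the number of checks are made up for illustration, and a long interval keeps wear on the stick low.

```python
import os
import tempfile
import time


def can_write(directory):
    """Try to create, flush, and remove a small temp file in `directory`."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory, prefix="flashcheck_") as fh:
            fh.write(b"ok")
            fh.flush()
            os.fsync(fh.fileno())  # force the write out to the device
        return True
    except OSError:
        return False


def watch(directory, interval_seconds=3600, checks=24):
    """Run `checks` write tests, `interval_seconds` apart, and return
    the timestamps (seconds since the epoch) of any that failed."""
    failures = []
    for n in range(checks):
        if not can_write(directory):
            failures.append(time.time())
        if n < checks - 1:
            time.sleep(interval_seconds)
    return failures
```

Calling `watch("/boot")` would then test the stick once an hour for a day. Any failures it returns would point toward the USB port or header dropping out rather than a power problem.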

12 minutes ago, Frank1940 said:

Is this even a remote possibility?  I seem to recall small children and cats being culprits in a few cases over the years.  (Cats are attracted by the nearby LED lights.)  I would suggest pulling the case's reset switch connector at the MB just in case you have a defective switch.  (Although, this would be a new one...)

 

Someone physically pressing the switch is not a possibility (no pets, no kids), but I can definitely disconnect the Reset SW header and see!

 

8 minutes ago, Squid said:

Another remote possibility is dust bunnies.  They do conduct electricity.

 

So clean in there I could eat off it!

 

3 minutes ago, kizer said:

I don't recall which plugin it is, but there is one that will record system logs to USB for this very reason. You appear to have something in place that does the same thing, though.

 

I had the same issue, and it turned out to be a bad motherboard. I discovered it when I would lose my USB stick while trying to write to it. In so many words, I was losing my USB header, and I guess unRAID doesn't like it when it can't read/write to the USB when it wants to.

 

I swapped out the motherboard and it worked well for another couple of years. I'm on a different setup now, but that's what I found the cause to be.

 

Yeah, that's the Troubleshooting mode in the Fix Common Problems plugin.  A bad motherboard is definitely a possibility, but it's a tough one to really test!

 

1 minute ago, John_M said:

It would be useful to confirm that the BIOS is configured not to restart when power is restored after an outage, so as to eliminate a number of possibilities. In other words, with situations involving the loss of power it should stay off and not reboot.

 

The BIOS is set to not power on after a power outage.  I already confirmed this when I pulled power to the UPS - the server performed a graceful shutdown and stayed shut down.

1 hour ago, meestark said:

 

The BIOS is set to not power on after a power outage.  I already confirmed this when I pulled power to the UPS - the server performed a graceful shutdown and stayed shut down.

It's a moot point, as your BIOS is set correctly (and you have a UPS), but that test wouldn't have worked anyway, since the server shut down cleanly.  To test that BIOS setting, you have to pull the power cord from the server itself.
