shEiD

Need help: constant random server reboots


I started having this problem about 2 weeks ago. unRAID server randomly reboots. Sometimes it takes hours, sometimes just minutes. A couple of times, it did not reboot for a day or two, and I thought the problem went away, but no - it's still happening.

 

  • I have run a memtest86+ for more than 6 hours (almost 2 passes) - no errors. Memory seems OK. Should I run a very long 24+ hours test?
  • I have used Troubleshooting Mode of Fix Common Problems - the syslog shows no errors, when server reboots :/ I have added the diagnostics and FCPsyslog_tail files to this post, below.
  • Server is connected to a more than adequate UPS: CyberPower CP1500PFCLCD 1500VA/900W. The UPS seems to be working perfectly OK, and this server is the only thing connected to it.

 

My server details:

  • unRAID Version: 6.6.7
  • M/B: Supermicro - X9SCL/X9SCM
  • CPU: Intel® Xeon® CPU E3-1230 V2 @ 3.30GHz
  • Memory: 32 GB ECC
  • HBA:  LSI LSI00276 PCI-Express 2.0 x8 SATA / SAS 9201-16e Host Bus Adapter
  • 2x SGI Rackable SE3016 SATA SAS Expander 16 Hard Drive Bay
  • 1 cache drive - Samsung 850 EVO 1TB SSD
  • 28 data drives - all WD Reds - 7x3TB, 14x6TB and 7x10TB
  • NO parity drives - I do not use any parity and run unprotected.

Cache SSD is connected to the motherboard. All other drives are in those two 16-bay enclosures, connected to the LSI HBA with SFF-8088 mini-SAS cables.

  • I do not run VMs. I created a couple of VMs a long time ago, just to try it out, but I have not used them since.
  • I have 20 docker containers, but rarely use most of them. I do run 6 docker containers all the time: Plex, NZBGet, Transmission, Sonarr, Radarr, Tautulli. Plex is binhex-plexpass; the other 5 are all from linuxserver.

 

Please, help. I have no idea what is going on, and what is wrong with my server.

  1. How come there are no errors in the syslog, when unraid reboots?
  2. What other logs, if any, does unRAID have to help me figure this out?
  3. Basically, what do I do now?

 

Please bear in mind, I do not know Linux at all - the only Linux experience I have is unRAID. If the logs or diagnostics have some information that explains this, I most probably did not understand it and missed it.

Edited by shEiD
Unable to attach the files...


I would run for a few days in safe mode with all docker containers disabled. If it's stable, try turning the containers on one by one. If it's still unstable, it's likely a hardware problem, like the PSU, etc.


The PSU is not the problem, for sure:

I have 2 identical PSUs. I bought them both at the same time and only about 18 months ago. One was powering the unRAID server. The 2nd one was powering all my network (modem, router, switch) and my main rig, which is also running 24/7. And this 2nd PSU never had any problems.

After switching the PSUs, server had the same unexplained reboot in less than an hour 👎

 

Running my main unRAID server in safe mode for days is the very last resort, and I really do not want to do it. I cut the cable TV cord more than 10 years ago, way before it was "popular". Without their Plex, my family will drive me crazy. And how long do I run it in safe mode to be really sure? Like I said - the reboots seem totally random - sometimes it takes minutes, sometimes it takes days! There were "good" periods of 3-5 days without a reboot...

 

Questions:

  1. How and what logs can I use to try and figure out why this server is rebooting willy-nilly?
    If the PSU had been the problem, I understand how abruptly cutting the power to the machine would not give it time to write anything to any logs. But the PSU is not the problem. Hence, my thinking is, there should be logs of something going wrong - be it hardware or software. There must be, no?
  2. Does docker itself have logs? I mean not the individual containers but docker daemon itself? Where can I find it and can I make it persist after reboot? Can I make docker write log to the flash, or better still, to the cache SSD?
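
On question 2, a rough sketch of one way to keep log lines across a reboot. This is not an official unRAID feature, and everything named here is an assumption: that the live syslog sits at /var/log/syslog (in RAM, so it is lost on reboot), that /mnt/cache/syslog-mirror.log is a writable location on the cache SSD, and that a Python 3 interpreter is installed on the server. The function names are made up for illustration. The idea is to copy each new chunk of the syslog to persistent storage and force it to disk, so whatever was logged in the last few seconds before a reboot survives.

```python
import os
import time
from pathlib import Path


def mirror_new_bytes(src: Path, dst: Path, offset: int) -> int:
    """Append whatever was written to src after `offset` onto dst.

    Returns the new offset to pass in on the next call.
    """
    size = src.stat().st_size
    if size < offset:
        # The source shrank (rotated or truncated), so start from the top.
        offset = 0
    if size == offset:
        return offset  # nothing new
    with src.open("rb") as fin:
        fin.seek(offset)
        chunk = fin.read()
    with dst.open("ab") as fout:
        fout.write(chunk)
        fout.flush()
        os.fsync(fout.fileno())  # push the data to the device right away
    return offset + len(chunk)


def follow(src: Path, dst: Path, interval: float = 5.0) -> None:
    """Poll src forever, mirroring new data to dst every `interval` seconds."""
    offset = 0
    while True:
        try:
            offset = mirror_new_bytes(src, dst, offset)
        except FileNotFoundError:
            pass  # source not there yet; try again on the next pass
        time.sleep(interval)


if __name__ == "__main__":
    # Both paths are assumptions; adjust them to the actual system.
    follow(Path("/var/log/syslog"), Path("/mnt/cache/syslog-mirror.log"))
```

Keep in mind that if the machine resets instantly (power glitch, hardware fault, reset line), nothing gets a chance to log an error, so even a mirrored log may simply stop mid-stream. That by itself would point towards hardware rather than software.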

32 minutes ago, shEiD said:

The PSU is not the problem, for sure:

I have 2 identical PSUs. I bought them both at the same time and only about 18 months ago. One was powering the unRAID server. The 2nd one was powering all my network (modem, router, switch) and my main rig, which is also running 24/7. And this 2nd PSU never had any problems.

After switching the PSUs, server had the same unexplained reboot in less than an hour 👎

You seem to be referencing a pair of battery backup UPS's, not the computer power supply unit that others and I have said could be your issue.

43 minutes ago, jonathanm said:

You seem to be referencing a pair of battery backup UPS's, not the computer power supply unit that others and I have said could be your issue.

Holy crap, you're right. Thank you xD

I don't know what is wrong with me. This is not the first time I have mixed those two abbreviations up in my head.

 

So, the UPS is not a problem...

 

As for PSU, I think I have a brand new one somewhere, that I bought on Black Friday last year or the year before.

Fingers crossed...


The PSU is not the problem. For real this time.

I switched the PSU to a brand spanking new EVGA Supernova 750 G2, 80+ Gold 750W. The first reboot came less than 10 minutes after I started the server for the very first time with the new PSU. And it has kept randomly rebooting the whole week. Sometimes it takes minutes, sometimes hours, sometimes more than a day...

 

Help, please.

If power is not the problem, there must be some type of logs with some information about something going tits up, no?

I'm at a loss what to do now.


Have you seen it reboot with only the basic file sharing services running? No dockers, no VMs, no plugins?


Sticky button? Seems silly, but I had random reboots on my HTPC driving me nuts, turned out to be a dodgy button switch.


Edit: Never mind. I figured out that I was underpowered using a 550W PSU. Jumped to a 1000W and it has been flawless since.

 

This ever get solved?  I've been experiencing similar.  Starting to think it's not that I had a bad PSU like I initially thought but that my 550W is maybe underpowered.  Or in my case also potentially a bad UPS.  Oddly enough I have the same exact UPS as you.  I've switched UPSs since last night.  It is now sharing the same UPS as my desktop PC for the purposes of trying to isolate the failure point.  No reboots since but it's only been about 15 hours.  knock on wood.  I went ahead and put in an order last night for a 1000w EVGA G+ from B-stock sale to absolutely eliminate underpowering as a possible culprit.  New CPU arrives today.  I've already replaced 2 motherboards and RMAd the previous EVGA SuperNOVA G2 550w.  If none of this helps me solve wtf is going on I'm throwing in the towel and chalking it up to being a walking magnet.

Edited by DontWorryScro
