
Upgraded and now having major issues with server rebooting randomly


dbltap11


Apologies in advance for the terrible formatting. I upgraded to a new (from my gaming PC) motherboard for unraid last weekend. Everything was great until I started having random restarts. I would hear a double beep, as if unraid was receiving the reboot/shutdown command. I have checked as many BIOS settings as I know to check to make sure there is no longer an OC on anything, made sure it was the latest BIOS version, reseated the RAM into slots 1 and 3, and ran a memtest with no errors. The PC stayed on for 7 hours, so I ran a full parity check with no errors; the PC stayed on the whole 20 hours with no issues. Uptime was well over 48 hours before the issue started up again: the PC/unraid will randomly receive the shutdown command and do the double beep, then go into a reboot cycle, off and on, restarting at different times. All drives show green in SMART, with no temp issues other than during the parity check, and those weren't terribly high anyway.

 

SPECS:

Z170a ProGaming Aura

6600K

CORSAIR Vengeance LPX 16GB (2 x 8GB)

RM750x

No GPU for now

 

 

Posted the diag and the syslog for reference. Any help would be great.

tower-diagnostics-20181115-1710.zip

FCPsyslog_tail.txt

Link to comment

Make sure that the BIOS is not set to automatically restart on restoration of power.

 

Is it a reboot or a shutdown?  The two are different animals...

 

Are you getting a clean restart?  (A clean restart is without a parity check.)  If this is the case, then something is triggering that.   

 

What all is "new" and what is repurposed in this system?  MB, PS, RAM, CPU?

 

If you didn't change the case, did you relocate the MB standoffs and make sure that they were installed in the proper places?

 

Do you have a UPS installed on this system?

 

 

 

 

Link to comment

I did some more troubleshooting this morning and found that the flash drive was not responding or appearing in the BIOS. I had to repair it in Windows, then pulled everything off and reloaded everything onto a new flash drive, reassigned the license and all that, and I seem to have stability so far, at least a lot more than I was getting last night. I also replaced the CMOS battery this morning since I was having odd time issues with unraid and the BIOS.

 

The only things repurposed would be all the storage and the case. MB and CPU are from my gaming station (bought 2016); PSU and RAM were bought a week ago. Standoffs are all good. I do have a UPS but was troubleshooting from both wall power and the UPS last night. BIOS is set to not restart.


The clean restart is what's confusing me most. When the server is stable (I even watched it in the command line, sitting on the login prompt), it appears to get sent some command to reboot, the two beeps from the MB speaker happen, and it reboots. It does not ask for a parity check when it comes back up after that issue happens.

 

Link to comment

If the problem returns, my first suspicion would be the PSU, since it is new and you're seeing a 'clean' reboot.  PSUs used to be the really dependable portion of a computer setup.  But in the past few years, they have become much more complex, and there have been several instances of the PSU being the cause of random reboots and shutdowns of a system.  (In fact, I was one...)  Substitution seems to be the most dependable way to eliminate that as the source of the problem.

Link to comment

Issue started again. I will check the PSU. It has a quiet mode where the fan stops spinning if it doesn't need to run, and I'm not sure how to change that setting. I have the fan facing the bottom of the case, where the vent is.

 

Case: Corsair 200R

 

Posting more diag and logs again from this morning. The time change is because I was setting the time in the unraid GUI.

 

 

FCPsyslog_tail.txt

tower-diagnostics-20181116-0137.zip

tower-diagnostics-20181116-0207.zip

Link to comment

I am hoping someone with a lot more experience than I have will jump in here.  There is this entry in the FCPsyslog_tail.txt file:

 

Nov 16 02:09:29 Tower sshd[10741]: Received signal 15; terminating.

And right after that, the shutdown process appears to start.  It appears to be a 'normal' shutdown procedure from what I could see.  But it is unclear to me what is triggering it.
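One way to look for the trigger is to pull out whatever was logged immediately before that sshd line. The following is only a minimal sketch: the function name `context_before` is made up for illustration, and the two earlier entries in `sample` are placeholders, not lines from the real log.

```python
from collections import deque
from typing import Iterable, List


def context_before(lines: Iterable[str], marker: str, context: int = 5) -> List[List[str]]:
    """For each line containing `marker`, return the `context` lines seen just before it."""
    recent = deque(maxlen=context)  # rolling window of the most recent lines
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if marker in line:
            hits.append(list(recent))  # snapshot of what preceded this match
        recent.append(line)
    return hits


# Only the last entry is taken from the thread; the first two are placeholders.
sample = [
    "(hypothetical earlier entry 1)",
    "(hypothetical earlier entry 2)",
    "Nov 16 02:09:29 Tower sshd[10741]: Received signal 15; terminating.",
]

for before in context_before(sample, "Received signal 15", context=2):
    print("\n".join(before))
```

To try it on the real log, pass an open file handle for FCPsyslog_tail.txt in place of `sample`. Whatever shows up in the window just before the marker is a candidate for what requested the shutdown, though that is not guaranteed to be conclusive.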

 

I don't believe that is the correct orientation for the PS.  I did some googling and it appears that the fan should be on top.  You really want to keep that PS cool, and it really does not generate that much heat that has to be dissipated.  Remember most PS are 80-90% efficient these days.  (You really don't want to use preheated air to cool it.)  Plus, I did check my case and it does not even have holes in the bottom to blow air out of.  I believe that one of the reasons they decided to move the PS location to the bottom of the case is that it is cooler there, because warm air always wants to rise.
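To put a rough number on the heat: at efficiency η, a supply delivering P watts to the system draws P/η from the wall and turns the difference into heat. A quick sketch, assuming a purely hypothetical 100 W load (not a measurement from this server) and a made-up helper name:

```python
def psu_heat_watts(load_watts: float, efficiency: float) -> float:
    """Heat dissipated inside the PSU: wall draw minus power delivered to the system."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return load_watts / efficiency - load_watts


# Hypothetical 100 W load at the two ends of the 80-90% range quoted above.
for eff in (0.80, 0.90):
    print(f"{eff:.0%} efficient: {psu_heat_watts(100, eff):.1f} W of heat")
```

So at that assumed load the supply itself would only be shedding somewhere around 11 to 25 W.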

 

Link to comment

I'll double check about the PSU, and yeah, I understand, especially for something you want on 24x7. For most of the troubleshooting I've been doing, I have the case on its side and open so the PSU is getting airflow. Still the same symptoms. The only temp issues I have noticed during the whole process were when I was doing the parity check, and it was mostly just the parity drive that got high-ish temps. The CPU, even under load from transcoding, barely got above 55C. Now this issue is even happening while I was sitting in the BIOS just letting the system run, and it cut out, which points to something outside of the OS. Does anyone know if unraid shuts down if it detects a brownout symptom from the PSU or anything like that? Maybe that's why it's receiving the shutdown command.

Link to comment

You're correct.  At 2:09 the server began a normal shutdown.

But the syslog then continues on at 9:09 and continues the shutdown procedure.  There is also an entry that states that it's adjusting the clock accordingly.  (Is the time in the BIOS matching your current time?)

 

Best guess is that the s3 sleep plugin is set to shut down.  (Enable the debugging on the s3 plugin to confirm.)

Link to comment

Sounds good. I will check that next. Yes, I changed the BIOS time, and then changed the unraid OS time. BIOS time is now stable after replacing the CMOS battery, but I'm still having issues in unraid after it does the reboot: it will not accept the BIOS time as default, even when adding an NTP server with the correct US time set.

Link to comment

Read carefully about setting the BIOS time.  I seem to remember that the time in some BIOSes was to be set to UTC with a time zone modifier/offset, while in others it was set to local time with the proper time zone specified.  It can be a confusing situation.  Hopefully, the MB manual will be clear on how to set it.
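A jump of a whole number of hours in the log timestamps is the kind of thing a UTC/local mix-up would produce. As a rough check, here is a small sketch using only the two times quoted earlier in the thread (2:09 and 9:09); the function name is hypothetical, and this alone does not prove the BIOS clock is the cause.

```python
from datetime import datetime, timedelta


def clock_jump(before: str, after: str, fmt: str = "%H:%M") -> timedelta:
    """Difference between two same-day wall-clock times given as HH:MM strings."""
    return datetime.strptime(after, fmt) - datetime.strptime(before, fmt)


jump = clock_jump("2:09", "9:09")
hours = jump.total_seconds() / 3600
print(f"jump of {hours:g} hours; whole-hour offset: {hours.is_integer()}")
```

If the jump matches the offset between your local time zone and UTC, that would point toward the BIOS clock and the OS disagreeing about which one the hardware clock holds.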

Link to comment

Fixed the time settings with a new NTP server and all was good... for almost 48 hours. I did not turn on any VMs, thinking that might be a problem. I also turned all s3 settings off. Still the same symptoms as before: it gets a reboot command, then starts into its power on and off cycle. Once unraid comes back up, it does not ask for a parity check. Attaching logs and diag again just to see if anyone can figure out what could be happening.

FCPsyslog_tail.txt

tower-diagnostics-20181119-1457.zip

Link to comment

Any possibility a child or animal might be hitting the reboot switch?  (Cats have been involved in such incidents because they are fascinated by the blinking LEDs.)  Any chance that the wires attached to that switch might have gotten pinched during the swap?  You might try pulling the connector for the switch from its MB pins as a test.

Link to comment
  • 2 weeks later...

Sorry to open this back up. The PSU was dead. I replaced it with a brand new EVGA 850GQ (I know, overkill, but futureproofing). No boot. I tested everything individually and no dice. I even put the RAM into my gaming PC and it's running great, so the mobo is dead. I went back to the old setup (which ran without issues for 2+ years):

AMD 6350 and Gigabyte AM3+ AMD. Everything connected fine and was stable and working for just over 48 hours before the shutdown happened again, with the same symptoms in the logs as far as I can tell.

 

So to recap, the only 'new' items in this build as of Monday night would be the PSU and the flash drive (I just copied everything over to the new drive). The flash drive is in a USB 2.0 port.

PSU is not on eco mode so fan is consistently running this time.

 

Adding logs again for troubleshooting.

tower-diagnostics-20181128-1355.zip

tower-diagnostics-20181128-1425.zip

FCPsyslog_tail.txt

Link to comment

First off, it appears that the shutdown was initiated by you, an inadvertent power switch push, or a flaky switch.

 

But you should uninstall the advanced buttons plugin (it's incompatible, and FCP should have been complaining about it).

I'd also uninstall s3 sleep and then after powering back up, pull the cable for the power switch off the motherboard. 

 

Failing that, try the system in safe mode and see what happens.

Link to comment

Archived

This topic is now archived and is closed to further replies.
