
(SOLVED) Random Reboots



EDIT: (SOLVED) Replacing the CPU has stabilized the system and I am no longer having this issue.

 

 

I have had Unraid running smoothly for about 6 months with only minor issues, until about two days ago, after I upgraded to the latest Unraid. The system now reboots randomly. At first I thought Plex was causing the reboots, since they only seemed to happen when Plex was in use. I disabled Plex, which was the only Docker container running (I do not have any VMs), and still encountered the same issue. I downgraded to my previous Unraid build and I am still having the same issue. I was getting no error messages, so I installed 'fix common problems' and enabled troubleshooting mode to get an error log, which does not seem to show why it crashed (I attached the latest logs). I read several posts stating that it might be the PSU or memory. I have ECC memory, the BIOS does not show any memory errors, and memtest showed none either. I replaced the PSU and the same issue is still happening. Any help would be awesome.

 

Unraid Version - 6.5.3 (the issues started almost immediately after I upgraded to 6.6.1)

M/B: Supermicro - X8STi

CPU: Intel® Xeon® CPU X5660 @ 2.80GHz

Memory: 6x2GB   Hynix HMT125R7BFR8C-H9

FCPsyslog_tail.txt

tower-diagnostics-20181001-1943.zip

 

Edit: Also, it seems that after a reboot, as long as I have not started the array, it does not reboot randomly (it went 7 hours last night after a crash). I also have not detected any overheating; it is something I watch closely. The system crashes/reboots rather often (every 5-10 minutes, though sometimes it will stay up for an hour).

Edited by fleton
SOLVED

Software doesn't usually cause reboots; although I am not ruling it out, you are right to look at hardware first. How long can you leave memtest running before it reboots? If you boot the computer up and go into the BIOS and just stay there, will it reboot? Have you checked all your BIOS settings? Did anything change before you upgraded unRAID? In other words, between the six months when it was running stable and the unRAID upgrade after which it began rebooting, did anything change? Are you plugged into a UPS?

3 hours ago, ashman70 said:

Software doesn't usually cause reboots; although I am not ruling it out, you are right to look at hardware first. How long can you leave memtest running before it reboots? If you boot the computer up and go into the BIOS and just stay there, will it reboot? Have you checked all your BIOS settings? Did anything change before you upgraded unRAID? In other words, between the six months when it was running stable and the unRAID upgrade after which it began rebooting, did anything change? Are you plugged into a UPS?

I have not run into any reboots while in the BIOS (and all settings are good) or while running memtest, which I ran for 3 hours before exiting. About a month and a half ago I did upgrade my CPU, but it has been rock solid since then. When I upgraded to 6.6.1 it immediately started giving me this issue, and downgrading did not help. I even tried Unraid in safe mode and still had this issue. I don't know what else it could be. If I do not start the array it seems to run with no issues; it has been up for 6 hours now and is running an extended SMART test on all the drives.


Unfortunately, memtest will usually not find problems with ECC memory.  Make sure that the memory modules are fully seated in the MB slots.

 

If you don't find anything there, re-think how you did the CPU upgrade.  How much disassembly was required? Could you have a loose connector?

 

One more question.  Did you post the Diagnostics and FCPsyslog_tail.txt files right after a reboot?  From the times in the file, I can only assume that you did.  I would almost bet that the random reboot erased the old file, which might have contained some useful information.  @Squid, any comments?

Edited by Frank1940
37 minutes ago, Frank1940 said:

Unfortunately, memtest will usually not find problems with ECC memory.  Make sure that the memory modules are fully seated in the MB slots.

 

If you don't find anything there, re-think how you did the CPU upgrade.  How much disassembly was required? Could you have a loose connector?

 

One more question.  Did you post the Diagnostics and FCPsyslog_tail.txt files right after a reboot?  From the times in the file, I can only assume that you did.  I would almost bet that the random reboot erased the old file, which might have contained some useful information.  @Squid, any comments?

All the memory is seated properly, and the BIOS has logs for PCI and ECC errors, both of which were empty. The log files I posted were generated by the 'fix common problems' plugin in troubleshooting mode, and it did not register anything strange around the crash; it actively writes the log to the flash drive rather than the RAM disk, so it is not lost. I am going to swap the CPU later and see if stability changes, but it is strange that I had absolutely no issues until upgrading to 6.6.1. Also, my system has been up for 10 hours now with no problems. The only difference is that I have not started the array.

Edited by fleton

When I looked at the log files, I noticed that the syslog file time stamps seem to indicate that the Diagnostics file was obtained shortly after a reboot.  The final time in that file was Oct 1 19:43:21.  Now looking at the tail file, the first time in it was Oct 1 19:41:56 and the last time was Oct 1 19:43:40.  I am having trouble telling whether the tail file is actually a record of what was happening just before the reboot.  That is why I asked @Squid for his insight.
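
For anyone who wants to repeat that timestamp comparison, here is a rough Python sketch. It is only an illustration: the function names are made up, and the year 2018 is an assumption taken from the diagnostics filename, since syslog time stamps do not carry a year.

```python
from datetime import datetime

# Assumption: syslog lines omit the year; 2018 comes from the diagnostics filename.
ASSUMED_YEAR = 2018

def parse_syslog_time(stamp):
    """Parse a syslog-style stamp such as 'Oct 1 19:43:21' (extra spaces are tolerated)."""
    month, day, clock = stamp.split()
    return datetime.strptime(
        f"{ASSUMED_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
    )

def compare_logs(tail_first, tail_last, diag_last):
    """Compare the window covered by the tail file with the last diagnostics entry."""
    first, last, diag = (
        parse_syslog_time(s) for s in (tail_first, tail_last, diag_last)
    )
    return {
        # How many seconds of logging the tail file covers.
        "tail_span_seconds": int((last - first).total_seconds()),
        # True if the last diagnostics entry falls inside the tail window.
        "diag_end_inside_tail": first <= diag <= last,
        # How far the tail file runs past the last diagnostics entry.
        "tail_runs_past_diag_seconds": int((last - diag).total_seconds()),
    }

# Times quoted from the two attached files.
result = compare_logs("Oct 1 19:41:56", "Oct 1 19:43:40", "Oct 1 19:43:21")
print(result)
```

If the last diagnostics entry falls inside the tail window, the two files overlap in time, which would suggest the tail may not reach back to before the reboot.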


A few more questions.  Are you getting a parity check after these reboots?  If not, is there any possibility that a child or pet is pushing the reset button on the server case?  (This has happened more than once on this Forum!)  If that is not possible, is your network secure?  Do you have a secure password for the root login on your server?  You might want to consider changing the password...


I am getting a parity check after the reboot once I start the array. The server is currently in my office, and I watched this crash/reboot happen; that is the time frame covered by the log files. There were no prompts on my monitor; it just rebooted. The network is secured and my server has a complex password.

 

The 'fix common problems' plugin also states it was an unclean shutdown.

Edited by fleton

Is the CPU or RAM overclocked?  Is the CPU temperature on the high side?  (Googling showed that an overheated CPU has caused this type of problem.)  By the way, this type of problem is probably the most difficult to isolate, as the evidence is usually destroyed by the reboot itself!

