(SOLVED) Server Rebooting Multiple Times Daily


cowgill


So, a few weeks ago my server started crashing/rebooting every few minutes to a few hours, never lasting more than 6 hours or so at a time.

 

I do not suspect a hardware issue related to the CPU/MB/RAM/power supply: I've taken troubleshooting this as an opportunity to perform some upgrades, so these are all new now, yet the problem persists.

 

If anyone sees any clues as to what may be going on or what to try it would be greatly appreciated.

tower-diagnostics-20210131-0726.zip

 

Unraid Version 6.8.3

CPU: Intel Core i7-9700K Coffee Lake 3.6GHz Eight-Core

MB: Gigabyte Z390 Aorus Pro WiFi Intel LGA 1151 ATX

RAM: Crucial Ballistix Gaming 32GB (2 x 16GB) DDR4-3600 PC4-28800 (not running it anywhere near that speed...2666 I believe right now)

PSU: SEASONIC FOCUS+ 750W 80+G FM ATX

 

Hardware items which were not upgraded:

LSI Logic SAS9211-8I 8PORT Int 6GB Sata+SAS Pcie 2.0 (appears to have latest firmware installed, v20.00.07.00)

Old (fanless) ASUS Video Card for booting prior to setting up iGPU as default, but still installed

(2) 1TB SSD cache drives

(1) 8TB parity drive

(9) data drives (various sizes)

Edited by cowgill
3 hours ago, cowgill said:

I do not suspect a hardware issue related to the CPU/MB/RAM/power supply: I've taken troubleshooting this as an opportunity to perform some upgrades, so these are all new now, yet the problem persists.

Have you at least run a MEMTEST from the boot menu for at least a pass or two (ideally a day)?  Are you overclocking the CPU?  XMP enabled on the RAM?

20 hours ago, Squid said:

Have you at least run a MEMTEST from the boot menu for at least a pass or two (ideally a day)?  Are you overclocking the CPU?  XMP enabled on the RAM?

Appreciate the feedback! No overclocking. I did not have XMP enabled, and I manually lowered the RAM speed to 2666 from whatever it defaulted to. I did not touch any timings.

 

I had not run MEMTEST, as you may have suspected, due to my faulty logic above. I started MEMTEST, and about 5.5 hours later I got an email from my server saying it had started a parity check. I guess MEMTEST crashed and the system rebooted?

 

When MEMTEST fails in this fashion (at least I think that means it failed) does that typically imply memory issues? I am not too familiar with digging into RAM issues.

 

I checked the RAM settings in the motherboard BIOS and the voltage was at 1.20 V when Crucial recommends 1.35 V. I changed that, set all the timings to Crucial's recommended values, and am running MEMTEST again now. Is that a reasonable course of action, or should I loosen some of the timings while keeping the voltage at the recommended value in order to gain more stability?

 

Will report back on MEMTEST results.

9 hours ago, ChatNoir said:

Are you sure that you don't have power issues ?

Are you using a UPS ?

I have a UPS but disconnected it when I first started having issues. Your reply inspired me to check the PSU cables...the 2x2 motherboard power cable was not connected.  😭 Apparently that is optional unless you need extra juice though? Anyway I connected it.

Edited by cowgill

I found a bad stick of memory and made it through 24 hours of MEMTEST with no errors, but that doesn't solve the issue of the server rebooting itself while in Unraid. I have the same issue with two completely separate sets of PSU/MB/CPU/RAM. Both sets of hardware make it through MEMTEST fine now that I have eliminated the bad stick of RAM.

 

Unraid safe mode is not safe, still reboots randomly there as well.

 

Potential problems (things I haven't tried): SATA cables, LSI SAS card, hard drives, USB boot drive. Any ideas on how best to troubleshoot? The USB drive checked out fine in my Windows machine; is it worth just trying a new one? I would pull the SAS card, but I don't have enough SATA ports without it. I also don't have enough spare cables to swap them all. Is there anything else that I'm missing?

Edited by cowgill
  • 2 weeks later...
On 2/7/2021 at 11:25 AM, itimpi said:

Safe mode just stops plugins from running.    Have you tried disabling the Docker and VM services so that you are running as a basic NAS?

 

I have always wondered if those services should also be stopped from running in Safe Mode.

 

 

I had no VMs running and that service was disabled. I tried disabling Docker on your advice, but that did not make a difference for me. I have since tried replacing my flash boot drive, and that did not help either.

 

Since the start of the thread I've ditched all the new hardware and gone back to my old stuff (first-gen i7 with a Gigabyte GA-X58A-UD3R). As far as I know, the only things I have left to try are reseating or replacing the SATA cables and a new SATA expansion card. I currently have the SAS9211-8I. My motherboard has 10 onboard SATA ports and I could get away with a 4-port SATA expansion card, so I ordered an ASMedia ASM1064 (IO CREST SI-PEX40156). I figure even if that isn't the issue, that card is flexible enough for other uses since it fits a x1 slot. I guess I'll also order a bunch of SATA cables to swap out, just in case.
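For anyone checking the port math, here is a rough sketch of it. It assumes the drive counts from the hardware list in the first post are unchanged and that every drive needs exactly one SATA port (the USB boot drive does not count). The `spare_ports` helper and the variable names are made up for illustration only.

```python
# Drive counts from the hardware list in the first post (assumed unchanged).
drives = {"cache SSDs": 2, "parity": 1, "data": 9}

ONBOARD_SATA_PORTS = 10    # GA-X58A-UD3R onboard ports, as stated in the post
EXPANSION_CARD_PORTS = 4   # ASM1064-based 4-port card


def spare_ports(drive_counts, onboard, expansion):
    """Return ports left over after attaching every drive (negative = shortfall)."""
    return onboard + expansion - sum(drive_counts.values())


total_drives = sum(drives.values())
print(f"drives to attach: {total_drives}")
print(f"spare ports, onboard only: {spare_ports(drives, ONBOARD_SATA_PORTS, 0)}")
print(
    "spare ports, onboard + 4-port card: "
    f"{spare_ports(drives, ONBOARD_SATA_PORTS, EXPANSION_CARD_PORTS)}"
)
```

With these numbers there are 12 drives, so the onboard ports alone come up 2 short, while adding the 4-port card leaves 2 ports spare.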

 

If that doesn't work I'm out of ideas.

Edited by cowgill
  • 4 weeks later...

Okay...ready to mark this solved.

 

...

 

The reset button on my case was shorting out. When I disconnected it from the motherboard case header my issues were resolved. 🥴 

 

Other minor item: with the IO CREST SI-PEX40156 installed, the Gigabyte GA-X58A-UD3R would not get far enough into booting to even reach the BIOS. The card boots fine when installed in a modern (LGA 1200) motherboard. No idea what was going on there. I wanted to put that data point out there in case others have issues in the future, as I have learned that finding unresolved issue threads is beyond frustrating.

  • cowgill changed the title to (SOLVED) Server Rebooting Multiple Times Daily
