One Unraid server is crashing v6.5.3


Can0n

Hello everyone. Twice in two days I have noticed that the uptime of one of my two unRaid servers is resetting at night. The hardware monitor on the server, accessed through its IPMI port, shows no events at all, but unRaid and my VMs have reset. Of course, each time a parity check is being done...

 

Yesterday I did get an alarm from Fix Common Problems about a low memory error, and I saw memory usage was at 81%. To resolve it I started shutting down my VMs one at a time while watching memory usage, and when my Windows 10 VM shut down, usage went from 81% to 10%. Once I restarted the VMs everything was OK again until approximately 2 am Mountain time this morning, when unRaid services appeared to restart again, with no memory issues reported by Fix Common Problems. I have not manually rebooted since that low memory message popped up the other day. I will try that and hope the issues end there.
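For anyone wanting to log memory usage leading up to a crash, here is a minimal sketch that computes a "percent used" figure from the text of `/proc/meminfo`. It assumes usage is defined as (MemTotal - MemAvailable) / MemTotal; whether the unRaid dashboard or Fix Common Problems calculates its percentage the same way is an assumption, and the function name is made up for illustration.

```python
def memory_used_percent(meminfo_text):
    """Percent of RAM in use, taken as (MemTotal - MemAvailable) / MemTotal.

    This definition is an assumption for illustration; it may not match
    the figure shown by the unRaid dashboard.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # The first token is the numeric value (kB for memory fields).
            fields[key.strip()] = int(parts[0])
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total
```

On the server itself this could be fed with `open("/proc/meminfo").read()` from a scheduled script, appending the result to a file that survives a reboot.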

Here is my setup for this server:

Dual Xeon E5-2620s (6 cores each, 24 threads total)

56GB ECC DDR3 RAM

2x 256GB SSDs for cache

3x 1TB 2.5" drives, one used for parity

 

unRaid is running v6.5.3 with the following VMs and Dockers:

 

Docker:

Homebridge

Tautulli

Unifi Video

 

VMs:

Windows 10 Pro (running on cache only), assigned 6 cores and 16GB RAM

Fedora (running off the array), 4 cores and 20GB RAM

Ubuntu Server running PiHole DNS ad blocker (running off the array), 2 cores and 2GB RAM

 

In theory I have used 12 of the 24 threads and 38GB of the 56GB of RAM, which should leave plenty for unRaid and the Dockers.
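The tally above can be checked with a quick sketch. The figures come straight from the VM list in this post; the variable and function names are just for illustration, and any per-VM overhead beyond the assigned RAM is not counted.

```python
# Host totals and per-VM assignments as listed in the post.
HOST_THREADS = 24
HOST_RAM_GB = 56

vms = {
    "Windows 10 Pro": {"cores": 6, "ram_gb": 16},
    "Fedora": {"cores": 4, "ram_gb": 20},
    "Ubuntu Server (PiHole)": {"cores": 2, "ram_gb": 2},
}


def tally(vms):
    """Return (total cores, total RAM in GB) assigned across all VMs."""
    cores = sum(vm["cores"] for vm in vms.values())
    ram_gb = sum(vm["ram_gb"] for vm in vms.values())
    return cores, ram_gb


used_cores, used_ram_gb = tally(vms)
print(f"Threads assigned: {used_cores} of {HOST_THREADS}")
print(f"RAM assigned: {used_ram_gb} GB of {HOST_RAM_GB} GB, "
      f"{HOST_RAM_GB - used_ram_gb} GB left for unRaid and the Dockers")
```

This only adds up what is assigned to the VMs, so it says nothing about what the Dockers or unRaid itself actually consume at runtime.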

 

I have included my diagnostics. Can anyone assist?

sif-diagnostics-20180706-0828.zip

Link to comment
On 7/6/2018 at 9:30 AM, DoItMyselfToo said:

What did you change on this server two days ago?

Nothing, from what I can tell.

 

 

It just crashed again 29 minutes ago. The server's IPMI logs show nothing hardware-wise. The only thing I did was stop and start the array because of some odd Yoda icon that was showing up next to my server name.

logs.png

Link to comment

3 days of uptime and it crashed again... looking for the diag files from troubleshooting mode.

 

Again the server's IPMI info showed no hardware faults of any kind. I can't ping it, and my switch is reporting disconnected ports (using LAG).

Link to comment

It just crashed again and I can't ping the server. My LAG ports are showing as disconnected on my switch and my VMs went down. I have a monitor attached to it, but I have no idea what it's saying because I'm at work. I am avoiding a hard reset of the server through its IPMI connection so I can see whether the attached monitor shows any useful data.

Link to comment

OK, thanks.

44 minutes ago, DoItMyselfToo said:

I just saw this.  It's late.  I'll look at everything in the morning.  Not sure if I can be helpful at all.

 

 

No worries, thanks for your assistance... I'm working the graveyard shift, which is why I'm up.

Link to comment
18 hours ago, DoItMyselfToo said:

I just saw this.  It's late.  I'll look at everything in the morning.  Not sure if I can be helpful at all.

And it crashed and self-recovered while I was sleeping again. Current "uptime" is just under 7 hours.

Link to comment

So ultimately both servers started crashing: the smaller one (with the dual Xeons) was recovering on its own, while the bigger one with 75TB started hard locking up. I found a major fan failure in the big one, so I replaced the fan, and then it started crashing like the smaller one. I downgraded to 6.5.2 and have had no issues yet, so I'm starting to think there is something wrong with the kernel for my two distinct hardware configurations.

Link to comment

Just a little update: the big server was locking up due to overheating from a failed fan (the blades came right off the frame). The smaller server ran more stably on 6.5.2 but then started freezing again. I checked the IPMI logs and they kept saying "DIMMC1 asserted". I looked it up on the manufacturer's website and it means the RAM is likely failing. I pulled the stick and ordered some more, and everything is running great.

A new issue appeared on both 6.5.2 and 6.5.3: my larger server was pinning the CPU at idle while a parity check was running. top was only showing a few things, but it looks like the File Integrity plugin was causing my CPU to pin for days at a time, making most things unresponsive. As soon as I uninstalled it, things went back to normal.

Link to comment
  • 3 weeks later...

Well, my crashing server issue has an update. I finally caught the issue while I was home and awake. As soon as my web GUI became unresponsive I ran to the room the server is in and saw it was in the process of booting back up. "Ah," I thought, "it is hardware." But while I was getting ready to troubleshoot, I shut down one unneeded VM (Ubuntu 18.04 LTS, which was being used as a secondary PiHole DNS ad blocker) and also moved one Docker (Tautulli) over to my main server. Since doing those two things the rebooting has stopped for over 24 hours now (1 day, 9 minutes at the time of writing). Tautulli is not causing my other system to reboot, so I'm thinking it was all Ubuntu doing this.

 

So unless anything else happens, I consider this issue resolved.

Link to comment
