My Server keeps half crashing

DannyG · November 25, 2020

This issue has been happening for about a week now. The server needs to be rebooted every 1 to 3 days.
All the VMs and Docker containers die,
but I can still log in to the host via the web GUI, and I can also navigate to some menus.

The Dashboard won't load at all. I can go to Settings > Diagnostics, but unfortunately the diagnostics file never downloads.

I can load the System Log in Settings, though (attached).

I'm not sure what to do.

The only major change I've made recently was replacing my router and adding VLANs.
What should I do to resolve this?

tower-syslog-20201125-0327.zip

DannyG · November 26, 2020

It keeps happening every day...
Today is the first day the Dashboard has been available, and I can see that some of the CPUs are pinned (this never happens).

Also, on the Main page, usually only the drives show up; today, though, I can see the "Array Operation" section. I tried to stop the array, but I don't think it'll work.

I'm leaning towards this being a software issue, but I would really appreciate some help determining the cause.

DannyG · November 30, 2020

My server keeps crashing every 24-48 hours.
I shut down all my VMs and my server stayed up for almost 3 days, but it crashed the same way.

I'm attaching my diagnostics (tower-diagnostics-20201130-1201.zip), but I'm not seeing much in there.

JorgeB · December 1, 2020

You can try this and post that log after a crash.

DannyG · December 2, 2020

On 12/1/2020 at 5:08 AM, JorgeB said:

You can try this and post that log after a crash.

Thank you!

I configured the syslog last night, and my Unraid server crashed this morning. I have attached the logs.

syslog-10.0.10.10.log

DannyG · December 2, 2020

I'm noticing errors relating to "192.168.16.10"; this was its old IP.

The only major change I made before my Unraid server started crashing like that was to my network.
I now have multiple VLANs, and this server's local IP is now 10.0.10.10.

I didn't think it was related.

JorgeB · December 2, 2020

Network-related crash; simplify your LAN config as much as possible.

DannyG · December 3, 2020

What in the logs makes you think it's a network issue?

Here are my configs.

JorgeB · December 4, 2020

The call trace references network-related modules and the NIC driver.
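If you want to check a syslog yourself for the kind of call trace described above, here is a minimal sketch that counts which kernel modules appear in "Call Trace:" blocks. It assumes the usual Linux kernel frame layout (`symbol+0xOFF/0xLEN [module]`) and that each trace ends with a `---[ end trace` marker; frames from built-in kernel code carry no module name and are skipped. The script name `trace_modules.py` is just a placeholder.

```python
import re
import sys
from collections import Counter

# Assumed frame layout: "func+0x1a/0x60 [module]". Built-in kernel
# functions have no bracketed module name, so they are not counted.
FRAME_MODULE = re.compile(r"\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]")


def call_trace_modules(lines):
    """Count module names seen inside kernel 'Call Trace:' blocks."""
    counts = Counter()
    in_trace = False
    for line in lines:
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace and "---[ end trace" in line:
            in_trace = False
            continue
        if in_trace:
            # Only frame-shaped text matches, so <IRQ>/<EOI> markers
            # and register dumps inside the block are ignored.
            for module in FRAME_MODULE.findall(line):
                counts[module] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 trace_modules.py syslog-10.0.10.10.log
    with open(sys.argv[1], errors="replace") as f:
        for module, n in call_trace_modules(f).most_common():
            print(f"{n:4d}  {module}")
```

If networking modules and the NIC driver dominate the counts, that points the same way as the diagnosis above. An empty result doesn't rule anything out, since a hard freeze may never get a trace written to the log.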

DannyG · December 6, 2020

Alright, I got rid of the bond and trunks.
Let's see if I can stay up for more than 2 days.
Thanks for helping me out.

DannyG · December 10, 2020

UPDATE!

I redid my network configuration and simplified it, as suggested. I basically removed the bond on my Unraid server and reconfigured the two ports on my switch as two single ports, each with its own native VLAN, instead of two ports in a trunk group.

Instead of the usual 24-48 hours, I was up for 3 days! I wanted to wait for a 4th day before posting back with a success story.

Unfortunately, I never got to the 4th day.
My server was frozen this morning, which wasn't what was happening before.

I have included the syslog files.

syslog-10.0.10.10.log

Edited December 10, 2020 by DannyG

JorgeB · December 10, 2020

Do you know what time it crashed? I can't find anything crash-related in the log.

DannyG · December 11, 2020

Yes, I do!

I happened to be logged in to GUI mode, and because the server was frozen, so was the clock.
It said 2:30 pm.

I think that clock is UTC, which is 4 hours behind my time, and the logs are in UTC as well.
I had already checked before posting, though, and it looks like the logs stop at 1:30 pm that day.
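To compare the frozen clock with the point where logging stopped, a small sketch like this can pull out the last timestamp in a syslog file. It assumes classic BSD-style stamps such as `Dec 10 13:30:01` at the start of each line; that format carries no year, so the year is passed in by hand. Lines with other timestamp formats are skipped, and the script name `last_stamp.py` is only a placeholder.

```python
import sys
from datetime import datetime

STAMP_LEN = 15  # "Dec 10 13:30:01" is 15 characters long


def last_log_time(lines, year):
    """Return the datetime of the last line that starts with a
    BSD-style syslog stamp, or None if no line parses."""
    last = None
    for line in lines:
        stamp = line[:STAMP_LEN]
        try:
            # The stamp has no year, so prepend one; a space in the
            # format also matches the padded day in "Dec  2".
            last = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # continuation line or unrecognized format
    return last


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python3 last_stamp.py syslog-10.0.10.10.log 2020
    with open(sys.argv[1], errors="replace") as f:
        print(last_log_time(f, int(sys.argv[2])))
```

The script only reports when the log went quiet. Whether that matches the crash still depends on both clocks using the same time zone.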

JorgeB · December 11, 2020

When there's nothing in the log, it's usually a hardware issue. One thing you can try is to boot the server in safe mode with Docker and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

DannyG · January 14, 2021

So I just wanted to circle back on this issue.

I tried running Unraid in safe mode with all VMs off and 99% of my Docker containers off, and it still half crashes after 25-30 hours. It's very consistent.

I'm not 100% convinced it's a hardware issue...

But here's my work around for now.

I scripted a daily reboot that runs at 5 am.

This way, Plex is always up and running at 6 am for the kiddos, and everything works for the rest of the day. Everything reboots daily, which isn't as bad as I thought: a clean reboot doesn't make my large array rescan all of its drives every time, so I'm back to more or less normal.

I really would like to find the root cause, but I'm sure that will surface on its own in due time.

Thanks for the help.

Edited January 19, 2021 by DannyG
spelling
