
Unraid crashing and taking down network


Haas360


Hello,

 

I have an Unraid system that crashes about once a week at random intervals. When this happens it actually takes my entire network down. I am an IT administrator, so I know it could be related to a broadcast storm, but the fact that Unraid also freezes locally on the monitor leads me to believe we have a bigger issue. I am about to purchase the product but won't unless we can fix this bug. Thanks!

Link to comment

It seems to me that I have heard of a couple of other cases of this happening. What happens if you disconnect the Cat 5 cable from the server after the network lockup? Will this unlock the server?

 

You should probably also install the 'Fix Common Problems' plugin. Then turn on Troubleshooting Mode. Hopefully, it will catch something that provides some insight into what is going on.

Link to comment
45 minutes ago, Frank1940 said:

It seems to me that I have heard of a couple of other cases of this happening. What happens if you disconnect the Cat 5 cable from the server after the network lockup? Will this unlock the server?

It will unfreeze my entire network instantly, however the console on Unraid will NOT come back. 

 

I have also disabled C states to see if that fixes anything. No difference so far. 

 

Something I have noticed: I have been running for 4 days with my W10 VM disabled, and there have been no crashes so far. Might be related?

Link to comment

I just had this exact same thing happen to me the other day. I've been running this Unraid server for 4 or 5 years without incident (extremely stable). For whatever reason, it became non-responsive and took out my network. I didn't realize the two were connected until after I got it back up and running. I ended up having to take it down hard and did a parity check (8 hours), and everything seems fine now.

Link to comment
8 hours ago, jonp said:

Please obtain your system diagnostics from the tools > diagnostics page in the webgui and post them here. This will give us more insight into your system configuration.

 

 

 

You can check this out while you're at it. I have had the same issue for a while now. I'm not convinced it's a crash, because that wouldn't flood the network. So I suspect it's some kind of issue with a process taking over all the resources to the point that nothing else will work.

 

I have limited the Docker container resources and updated the motherboard BIOS a number of times, and it's finally been more stable. But I don't believe it's perfect yet.

 

I have tried tailing the syslog locally to catch it and there is nothing.

 

Here is a previous thread

 

 

I've had this issue too.

 

 

mediaserver-diagnostics-20170803-2147.zip

Link to comment
  • 3 weeks later...

Have had this a few times myself. Thought it was a VM or Docker, so I tried disabling different ones, to no effect. It completely locks up the server. There are a few things that I've noticed looking at the logs: http://prntscr.com/gbhgis shows a bunch of enables/disables and promiscuous/blocking messages on boot up. Maybe after a while something gets out of whack and locks a port?
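To check whether those port messages only show up at boot or keep repeating afterwards, a quick count over the syslog can help. This is a rough sketch in Python (assuming Python 3 is available on the server). The message wording in the patterns is assumed from typical Linux kernel bridge and NIC log lines, not read from the screenshot, and `count_port_events` is just a name I made up.

```python
import re
from collections import Counter

# Assumed message shapes (typical Linux bridge/NIC kernel log wording).
# Adjust these patterns to match what your own syslog actually shows.
PATTERNS = {
    "promiscuous": re.compile(r"entered promiscuous mode"),
    "blocking": re.compile(r"entered blocking state"),
    "forwarding": re.compile(r"entered forwarding state"),
    "disabled": re.compile(r"entered disabled state"),
}


def count_port_events(lines):
    """Count how many lines match each kind of port state message."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Example use against the live log:
# with open("/var/log/syslog", errors="replace") as log:
#     for name, total in count_port_events(log).most_common():
#         print(f"{name}: {total}")
```

If the totals keep climbing long after boot, that would at least suggest the ports are flapping during normal operation. It would not prove that this is what causes the lockup.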

 

Also I've noticed FTP keeps turning to Enabled on every reboot. Even after changing it to disabled.

cloud-diagnostics-20170822-0047.zip

Link to comment

Hi all, this seems to be a common theme, as I am having the same problems (P.S. I am really new to Unraid 6 and to this forum stuff). In my case I am loading a 6TB stack with movies and videos from a Patriot Valkyrie and other drives. The network link drops out for no apparent reason. I was getting call trace errors, but these appear to have stopped and have been replaced with wrong CSRF token errors. I recently loaded cAdvisor into Docker and noticed that although the Dashboard indicated <5% memory usage, the dial indicator in cAdvisor zoomed up to 95% (I have 16GiB). When it hits 99%, the network drops out, with a 50:50 chance of the server freezing (see the zip of the Word doc with screenshots, which also has a system overview). This also happens when watching 1080p video through Plex, but it is OK for 720p or lower. I am running Plex, Dolphin and cAdvisor in Docker.

 

I'd appreciate any advice or solutions.
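One possible explanation for the Dashboard and cAdvisor disagreeing (an assumption, not something confirmed by the attached diagnostics) is that the two count the Linux page cache differently. Copying large files fills the cache, which can look like "used" memory in one tool and not in the other. The sketch below, in Python, splits the numbers from /proc/meminfo into memory in use excluding reclaimable cache and memory held as cache. The function names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_summary(text):
    """Return percentages of RAM in use (excluding cache) and held as cache."""
    info = parse_meminfo(text)
    total = info["MemTotal"]
    available = info["MemAvailable"]
    cache = info.get("Cached", 0) + info.get("Buffers", 0)
    return {
        "used_excluding_cache_pct": round(100 * (total - available) / total, 1),
        "cache_pct": round(100 * cache / total, 1),
    }


# Example use on the server itself:
# with open("/proc/meminfo") as f:
#     print(memory_summary(f.read()))
```

If the cache percentage is high while the other figure stays low during a copy, the 95% dial is probably mostly cache. That would not by itself explain the network dropping out.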

Erratum

They say when all else fails, read the instructions. My Docker virtual disk is at "/mnt/user/docker.img", which the manual says is not a valid location. Can I drag and drop it to the cache disk with Dolphin, or do I start from scratch by disabling Docker and setting it up again from the start?

 

 

tower-syslog-20170822-1606 (1).zip

tower-diagnostics-20170822-1607.zip

Unraid 6 issues.zip

Link to comment

Well, I really mucked that up. Without waiting for advice, I disabled Docker, reset the Docker path to /mnt/cache/docker/docker.img, and deleted the previous image. Now I get this message:

 

Warning: stream_socket_client(): unable to connect to unix:///var/run/docker.sock (Connection refused) in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 673
Couldn't create socket: [111] Connection refused
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 853

 

And any attempt to install Plex, Dolphin or cAdvisor fails.
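A "Connection refused" on /var/run/docker.sock generally means nothing is listening on that socket, that is, the Docker service is not running, rather than a problem with an individual container. As a minimal sketch (Python, with a helper name I made up), this checks whether anything accepts connections on a Unix socket path:

```python
import socket


def unix_socket_accepts(path, timeout=2.0):
    """Return True if something is listening on the Unix socket at `path`."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.settimeout(timeout)
        try:
            client.connect(path)
        except OSError:
            # Covers a missing file, a refused connection and a timeout.
            return False
        return True


# Example use:
# print(unix_socket_accepts("/var/run/docker.sock"))
```

If this prints False after Docker has been re-enabled in the settings, the service itself failed to start, and the syslog around the time Docker was enabled is the place to look for the reason.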

Link to comment
  • 2 weeks later...
On 8/3/2017 at 11:46 AM, jonp said:

Please obtain your system diagnostics from the tools > diagnostics page in the webgui and post them here. This will give us more insight into your system configuration.

 

Hello sir. Can we get an update?

Link to comment
On 8/30/2017 at 0:11 PM, Haas360 said:

Hello sir. Can we get an update?

 

Just reviewed your logs and I can't seem to find any smoking gun. My guess is that these diagnostics are coming from a time when unRAID has booted, but hasn't crashed. A few things on this issue:

 

#1:  We have not been able to reproduce this internally.

This is by far the biggest problem preventing us from further troubleshooting / diagnosis. If we can't recreate it, then we can't fix it. I think the reason we can't recreate it is that it's hardware-specific to either the network equipment or the server gear (or maybe both) being used.

 

#2:  Could be an ARP storm.  Log in to the webGui and check your network settings (click Help if need be).

There is a note about ARP storms in the Help on the network settings page and I believe that may be the cause of your issue.  Try toggling this setting and see if that helps.

 

#3:  If all else fails, attach a monitor and keyboard directly to the server and type "tail /var/log/syslog -f" and when it crashes, take a picture of what's on the screen and post it here.

This is the only way we can see what happened right before a complete server crash.
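For a record that survives the reboot, the same idea as tailing the syslog can be sketched in Python (assuming Python 3 is available on the server; this is not an official Unraid tool). It copies whatever was appended to the log into a second file and forces it to disk with fsync. The function name and the destination path in the comments are assumptions. /boot is the flash drive on Unraid, and frequent writes will wear it, so treat this as temporary troubleshooting only.

```python
import os


def copy_new_lines(src_path, dst_path, offset=0):
    """Append bytes written to src_path after `offset` to dst_path.

    Returns the new offset. The copy is flushed and fsync'ed so it is on
    disk even if the machine locks up right afterwards.
    """
    # If the log was rotated or truncated, start again from the beginning.
    if os.path.getsize(src_path) < offset:
        offset = 0
    with open(src_path, "rb") as src:
        src.seek(offset)
        data = src.read()
    if data:
        with open(dst_path, "ab") as dst:
            dst.write(data)
            dst.flush()
            os.fsync(dst.fileno())
    return offset + len(data)


# Example loop (the destination file name is hypothetical):
# import time
# offset = 0
# while True:
#     offset = copy_new_lines("/var/log/syslog", "/boot/syslog-copy.txt", offset)
#     time.sleep(1)
```

This can only save what the kernel managed to log. If the machine hangs without writing anything, the copy will end at the same place the on-screen tail does.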

Link to comment
9 hours ago, jonp said:

 

#2:  Could be an ARP storm.  Log in to the webGui and check your network settings (click Help if need be).

#3:  If all else fails, attach a monitor and keyboard directly to the server and type "tail /var/log/syslog -f" and when it crashes, take a picture of what's on the screen and post it here.

 

 

Well, after a server crashes and doesn't respond it's rather hard to get the syslog.

 

#2 - I don't see any ARP storm settings on the Network settings page. Is this new in the beta versions???

 

#3 - For me this didn't produce any results. No new lines appeared before the crash/lockup.

 

Link to comment
16 hours ago, lionelhutz said:

 

 

#2 - I don't see any ARP storm settings on the Network settings page. Is this new in the beta versions???

 

 

It looks like I misspoke about the setting.  I don't see it either after having reviewed a few test systems.

 

4 hours ago, brando56894 said:

I've noticed multiple times that tail stops following the log after a few hundred lines and has to be restarted.

 

I've never had this problem tailing a syslog locally using a monitor and keyboard.

Link to comment
  • 2 years later...

Archived

This topic is now archived and is closed to further replies.
