unRAID network stops responding


Recently I had a drive failure, so I took the opportunity to upgrade to a bigger drive and add a new drive. After a few issues that turned out to be caused by a dead CMOS battery, I got everything up and running. The system then ran for over a day without errors, rebuilt the replacement drive, and added the new drive, so I ran a parity check, which also came back clean. I then ran a program that manages the artwork and info files for my video files, and almost immediately the system stopped responding on the network.

 

Everything stopped responding, including the web GUI, Telnet, ping, and Samba. I accessed the system directly and could not find anything wrong: CPU, RAM, and temperatures were all normal, ifconfig showed no errors, and I could ping the loopback address. There were no obvious errors in the logs apart from some minor SAS errors during boot, and nothing at the time of the incident. I rebooted the system and everything looked fine again, so I thought it might have been a fluke. I re-ran the software and the system instantly stopped responding on the network again. I am at a loss, as there are no clear errors anywhere.

 

My system is running unRAID 6.1.4 and has 11 ReiserFS drives and 1 XFS drive. One drive outside the array is managed by SNAP for PLEX. I have one PLEX docker (needo/plex:latest), but it is not running. I am using the following up-to-date plugins:

 

"SNAP non-array drive mount and share"

"NTFS-3G Package"

"Dynamix System Temperature"

"Dynamix System Statistics"

"Dynamix System Information"

"Dynamix Active Streams"

"Nerd Tools"

 

and of course:

 

"Dynamix webGui"

"unRAID Server OS"

 

 

The software I am using is "Ember Media Manager" (http://forum.kodi.tv/showthread.php?tid=191781). However, I do not believe it is the cause of the issue; I believe it must be either a hardware issue or a Samba issue.

 

Any help would be appreciated, as I am at a loss. Please find my syslog attached.

syslog.txt

OK, I will try removing those plugins and report how it goes. My issue with this, though, is that if SNAP is somehow crashing or blocking network-related processes, why is there nothing in the logs indicating this?

 

EDIT:

 

I have replaced SNAP with "Unassigned Devices" and everything works fine now. I would still love to know what SNAP did to crash the network without leaving any trace in the logs, but at least the system is working now.

 

Thanks itimpi (Y)


It appears I spoke too soon. After getting the system into what I thought was a functional state, I started doing some housekeeping with the files and noticed unstable behaviour: the web GUI was slow to load, files copied to the server timed out, and attempts to watch movies stored on the server timed out as well.

 

I still have not been able to find anything resembling an error, so I decided to try checking parity again. I stopped the PLEX docker, started a parity check, and went to bed. I just woke up and the system is in the same state as before: it will not respond to HTTP, Telnet, or SMB traffic. Accessing the system directly shows nothing wrong; CPU, memory, and so on are fine.

 

I have not yet checked the logs again this morning, but based on last time I do not think they will show anything different from what is in the logs I have already posted.

 

Any ideas what is going on? I'm willing to believe it is a hardware issue, but if it is, how can I narrow it down?

 

EDIT:

 

Some more information:

 

- I have verified valid ARP and route information in my router and my main system.

- I have verified that the PLEX docker is not running with "docker ps"

- I have verified CPU and memory usage with top and free -m.

- I have stopped and restarted the network with "/etc/rc.d/rc.inet1 stop" and "/etc/rc.d/rc.inet1 start". After that I was still able to ping the local interface, but I got no response from HTTP or Telnet.
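
For anyone repeating these checks from a second machine, here is a minimal sketch of a per-service TCP probe. It is not part of the original troubleshooting; it assumes Python 3 is available on the client, and the port numbers (80, 23, 445) are only the conventional defaults for HTTP, Telnet, and SMB, so adjust them if your server differs. The function names are hypothetical. The idea is that "refused" usually means the server's TCP stack answered but nothing is listening on that port, while "timeout" means no answer came back at all, which may help separate a dead daemon from a lower-level network problem.

```python
import socket

# Conventional default ports for the services named in the post (an assumption;
# change these if your server uses non-standard ports).
SERVICES = {"http": 80, "telnet": 23, "smb": 445}


def probe(host, port, timeout=3.0):
    """Try one TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # three-way handshake completed
    except ConnectionRefusedError:
        return "refused"           # host replied, but no listener on the port
    except socket.timeout:
        return "timeout"           # no reply within the timeout
    except OSError as exc:
        return f"error: {exc}"     # e.g. no route to host


def probe_all(host, services=SERVICES, timeout=3.0):
    """Probe every named service and return {name: outcome}."""
    return {name: probe(host, port, timeout) for name, port in services.items()}
```

Calling `probe_all("192.168.1.10")` (a placeholder address) returns a dictionary with one outcome per service, which can be compared between a healthy boot and the hung state.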

 

The only other thing I can think of is to run tcpdump to verify that traffic is hitting the interface, but I have already verified traffic via the interface RX/TX counts.
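
As a rough illustration of the RX/TX counter check, the sketch below parses two snapshots of /proc/net/dev (the standard Linux per-interface counter file) and reports how much each counter moved between them. It assumes Python 3 is installed on the server, which stock unRAID may not provide, and the helper names are hypothetical. If RX packets keep climbing while TX packets stay flat during a ping from another machine, that would suggest requests are arriving but replies are not leaving, though that reading is only a heuristic.

```python
def parse_net_dev(text):
    """Parse the contents of /proc/net/dev into {iface: {counter: int}}."""
    stats = {}
    for line in text.splitlines()[2:]:      # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = [int(x) for x in data.split()]
        if len(fields) < 16:                # 8 receive + 8 transmit columns
            continue
        stats[name.strip()] = {
            "rx_packets": fields[1], "rx_errs": fields[2], "rx_drop": fields[3],
            "tx_packets": fields[9], "tx_errs": fields[10], "tx_drop": fields[11],
        }
    return stats


def delta(before, after):
    """Per-interface difference between two parsed snapshots."""
    return {
        iface: {key: after[iface][key] - before[iface][key] for key in after[iface]}
        for iface in after
        if iface in before
    }
```

In practice you would read the file once, wait a few seconds while another machine pings or connects to the server, read it again, and pass both parsed snapshots to `delta`.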

 

I hope someone else has an idea about what could be going on because I am stumped.
