morikaweb Posted November 30, 2015

Recently I had a drive failure, so I took the opportunity to upgrade to a bigger drive and add a new drive. After a few issues that turned out to be caused by a dead CMOS battery, I got everything up and running fine. At this point the system had been running for over a day without errors and had already rebuilt the replacement drive and added the new drive, so I ran a parity check and everything came back fine as well.

I then ran a program that manages the artwork and info files for my video files, and almost immediately the system stopped responding to the network. Everything, including the Web GUI, Telnet, ping, and Samba, stopped responding. I accessed the system directly and was unable to find anything wrong. CPU, RAM, and temperatures were all normal. ifconfig showed no errors, and I could ping the loopback. There were no obvious errors in the logs except some minor SAS errors during boot, and nothing at the time of the incident.

I rebooted the system and everything looked fine again, so I thought it might have been a fluke. I re-ran the software and the system instantly stopped responding to the network again. I am at a loss, as there are no clear errors anywhere.

My system is running 6.1.4 and has 11 reiserfs drives and 1 xfs drive. I have 1 drive being managed by SNAP for PLEX. I have 1 PLEX (needo/plex:latest) docker, but it is not running.

I am using the following up-to-date plugins:
"SNAP non-array drive mount and share"
"NTFS-3G Package"
"Dynamix System Temperature"
"Dynamix System Statistics"
"Dynamix System Information"
"Dynamix Active Streams"
"Nerd Tools"
and of course:
"Dynamix webGui"
"unRAID Server OS"

The software I am using is "Ember Media Manager" http://forum.kodi.tv/showthread.php?tid=191781. However, I do not believe it is the cause of the issue. I believe it must be either a hardware issue or a Samba issue.

Any help would be appreciated, as I am at a loss. Please find my syslog attached.
syslog.txt
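For anyone trying to reproduce the symptom described above (Web GUI, Telnet and Samba all going silent at once), here is a minimal sketch of a probe that can be run from another machine on the LAN to record which TCP services still accept connections. The port numbers are the usual defaults for HTTP, Telnet and SMB, and the hostname "tower" is only a placeholder; neither comes from the original post, so adjust both to match the actual server.

```python
import socket

# Assumed default ports for the services named in the post;
# change these if the server listens elsewhere.
SERVICES = {"http": 80, "telnet": 23, "smb": 445}


def probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def probe_all(host, services=SERVICES, timeout=2.0):
    """Map each service name to whether its port accepted a connection."""
    return {name: probe(host, port, timeout) for name, port in services.items()}


if __name__ == "__main__":
    # "tower" is a placeholder hostname, not taken from the thread.
    for name, ok in probe_all("tower").items():
        print(f"{name}: {'open' if ok else 'no response'}")
```

Running this in a loop while starting the media manager would give a timestamped record of when each service drops out. That can then be lined up against the syslog.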
itimpi Posted November 30, 2015

I am not sure that SNAP is compatible with the latest unRAID release. It has been superseded by the Unassigned Devices plugin. That plugin also includes NTFS-3G support, so the NTFS-3G package does not need to be installed separately.
morikaweb Posted November 30, 2015 Author

OK, I will try removing those plugins and report how it goes. My issue with this, though, is that if SNAP is somehow crashing or blocking network-related processes, why is there nothing in the logs indicating this?

EDIT: I have replaced SNAP with "Unassigned Devices" and everything works fine now. I would still love to know what SNAP did to crash the network without leaving any trace in the logs, but at least the system is working now. Thanks itimpi (Y)
morikaweb Posted December 2, 2015 Author

It appears I spoke too soon. After getting the system into what I thought was a functional state, I started doing some housekeeping with the files and noticed unstable behaviour. The web GUI was slow to load, files copied to the server timed out, and attempts to watch movies stored on the server resulted in timeouts as well. I have still not been able to find anything resembling an error, so I decided to try checking parity again. I stopped the PLEX docker, started a parity check and went to bed.

I just woke up and the system is in the same state as before. It will not respond to HTTP, Telnet, or SMB traffic. Accessing the system directly shows nothing wrong; CPU, memory and so on are fine. I have not yet checked the logs again this morning, but based on last time I do not think there will be anything different from what is in the logs I have already posted.

Any ideas what is going on? I'm willing to believe it is a hardware issue, but if it is, how can I narrow it down?

EDIT: Some more information:
- I have verified valid ARP and route information in my router and my main system.
- I have verified that the PLEX docker is not running with "docker ps".
- I verified CPU and memory via top and free /mem.
- I have stopped and restarted the network with "/etc/rc.d/rc.inet1 stop/start". After that I was still able to ping the local interface, but I could get no response from HTTP or Telnet.

The only thing I can think of doing is to run tcpdump to verify the traffic is hitting the interface, but I have already verified traffic via the interface RX/TX counts. I hope someone else has an idea about what could be going on, because I am stumped.
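The RX/TX check mentioned above can be made a little more systematic. Below is a rough sketch that parses the standard Linux /proc/net/dev counters and reports how many packets, errors and drops an interface accumulated between two readings. The interface name "eth0" and the five-second interval are assumptions for illustration, and the script needs a Python interpreter on the server, which may not be installed by default.

```python
import time


def parse_net_dev(text):
    """Parse /proc/net/dev content into {iface: {counter: int}}."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = [int(x) for x in data.split()]
        # Columns 0-7 are receive counters, 8-15 are transmit counters.
        stats[name.strip()] = {
            "rx_packets": fields[1], "rx_errs": fields[2], "rx_drop": fields[3],
            "tx_packets": fields[9], "tx_errs": fields[10], "tx_drop": fields[11],
        }
    return stats


def delta(before, after, iface):
    """Return per-counter differences for one interface between two readings."""
    return {k: after[iface][k] - before[iface][k] for k in after[iface]}


def read_net_dev(path="/proc/net/dev"):
    with open(path) as f:
        return parse_net_dev(f.read())


if __name__ == "__main__":
    iface = "eth0"  # assumed interface name; check ifconfig for the real one
    first = read_net_dev()
    time.sleep(5)
    second = read_net_dev()
    print(iface, delta(first, second, iface))
```

If rx_packets keeps climbing while HTTP and Telnet stay unreachable, that would suggest traffic is arriving at the interface and the fault lies higher up the stack. Rising rx_errs or rx_drop would point more towards the NIC, cable or driver.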
morikaweb Posted December 2, 2015 Author

Unfortunately I was too busy today to do any troubleshooting, so I do not have an update. However, I have updated unRAID to 6.1.6, hoping that might somehow fix the issue. A copy of the syslog from this morning's event is here: http://pastebin.com/PpRfeLuk