TedatTNT, posted December 29, 2015:
I've no idea how to read the syslog, or what to search for. Can anyone tell me if there is anything obvious in it that might be causing the server to become unresponsive, through either the GUI or telnet?

Here is a short history:
About 6 months ago, Unraid was becoming unresponsive every few minutes. I replaced the power supply (which was a crappy one) and things got better.
About 3 months ago, Unraid was becoming unresponsive every few minutes again. Someone pointed out to me that my router was pinging and doing DHCP stuff about every 6 seconds (a router firmware issue). I changed that, and now the server talks to the router once an hour. They also mentioned a possible issue with the NIC, so I moved to the other NIC on the board. It was reliable for months.
Last week, I connected to a share from a computer that plays music. The player wasn't working well, so I mapped a network drive to the share, launched Windows Media Player, and let it build its library from that share. Since then, Unraid becomes unresponsive every few minutes. I've restarted it 20 or more times; it works for a few minutes, then there's no GUI and no telnet connection.

Any help is GREATLY appreciated, because I'm in the middle of a video project for work, all my files are stored on the Unraid server, and I can no longer access them.
Attachment: unraid-syslog-20151229-0835.zip
itimpi, posted December 29, 2015:
There is nothing obvious in the syslog that I can see, but the log supplied only covers the boot-up sequence. Ideally you would obtain the diagnostics (via Tools->Diagnostics, or by running the 'diagnostics' command from a command-line session), since they provide much more information about your setup than the syslog alone, and you would capture them at a point where the problem is occurring. However, if you are losing all access, that is obviously not possible.
Do you have a monitor/keyboard attached? Your description sounds rather like a hardware issue that is causing the server to crash. If so, something might be displayed on an attached monitor.
It could be worth booting into the memory test option and letting it run for some hours, as failing RAM can have all sorts of unpredictable side effects.
TedatTNT, posted December 29, 2015:
I've attached the diagnostics file. After disconnecting the mapped drive, removing the link to the share, and avoiding browsing or using the files, the server has kept running for hours.
I DID see, right before one crash, that Disk 3 was showing as running hot. I've ordered a couple of spare drives in case that is the issue, and also because a couple of the drives are pretty full.
I do have a monitor/keyboard attached. The next time it goes down, I'll try the memory test.
Attachment: unraid-diagnostics-20151229-1357.zip
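For anyone wanting to check a "running hot" drive themselves: a minimal Python sketch that scans the text output of smartctl -A (the SMART attribute table) for temperature and bad-sector attributes. The 45 C limit and the choice of attribute IDs (194/190 for temperature; 5, 197, 198 for reallocated, pending, and offline-uncorrectable sectors) are assumptions for illustration, not Unraid's own thresholds.

```python
# Sketch: flag a hot or failing disk from `smartctl -A` text output.
# Assumes the standard ATA attribute table layout printed by smartctl:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

TEMP_IDS = {194, 190}        # Temperature_Celsius, Airflow_Temperature_Cel
HEALTH_IDS = {5, 197, 198}   # Reallocated, Current_Pending, Offline_Uncorrectable

HOT_LIMIT_C = 45             # hypothetical warning threshold; adjust per drive


def parse_smart_attributes(text):
    """Return {attribute_id: (name, raw_value_int)} from smartctl -A output."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        # Raw values like "36 (Min/Max 18/45)" start with the number we want.
        if not parts[9].isdigit():
            continue
        attrs[int(parts[0])] = (parts[1], int(parts[9]))
    return attrs


def smart_warnings(text, hot_limit=HOT_LIMIT_C):
    """List human-readable warnings for temperature and sector health."""
    attrs = parse_smart_attributes(text)
    warnings = []
    for attr_id in sorted(TEMP_IDS & set(attrs)):
        name, raw = attrs[attr_id]
        if raw >= hot_limit:
            warnings.append(f"{name} is {raw} C (limit {hot_limit} C)")
    for attr_id in sorted(HEALTH_IDS & set(attrs)):
        name, raw = attrs[attr_id]
        if raw > 0:
            warnings.append(f"{name} raw value is {raw}")
    return warnings


if __name__ == "__main__":
    # Made-up sample rows, only to show the expected input shape.
    sample = "\n".join([
        "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE",
        "  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8",
        "194 Temperature_Celsius     0x0022   049   060   000    Old_age   Always       -       51 (0 14 0 0 0)",
        "197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0",
    ])
    for warning in smart_warnings(sample):
        print(warning)
```

Nonzero reallocated or pending sector counts, together with high temperatures, would be consistent with the errors later reported on Disk 3, though they say nothing by themselves about the network drop-outs.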
TedatTNT, posted December 30, 2015:
Any thoughts?
TedatTNT, posted January 4, 2016:
Okay, this is the follow-up...
It appeared that Disk 3 was showing some errors and was overheating regularly. I ordered some new drives and installed one over the weekend in place of Disk 3. The system rebuilt onto the new drive and parity was restored. Although this MAY have been an issue, it was not THE issue that has been causing my problems...
I then navigated to a share from my PC, opened a folder, then another, then another, which contains about 40 photos from Christmas. I opened a photo in the MS Photos application and began scrolling through them quickly. Before I got through them all, I lost my connection to the share, the GUI wouldn't load, and telnet was useless.
I used the console with a local keyboard to view the current syslog using the tail command. It reported the following:
Jan 4 12:30:07 UNRAID ntpd[1342]: new interface(s) found: waking up resolver
Jan 4 12:31:25 UNRAID kernel: sky2 0000:02:00.0 eth0: tx timeout
Jan 4 12:31:25 UNRAID kernel: sky2 0000:02:00.0 eth0: transmit ring 1 .. 26 report=1 done=1
Jan 4 12:31:27 UNRAID ntpd[1342]: Deleting interface #20 eth0, 192.168.1.150#123, interface stats: received=0, sent=1, dropped=0, active_time=80 secs
Jan 4 12:31:27 UNRAID ntpd[1342]: 198.55.111.50 local addre 192.168.1.150 -> <null>
Jan 4 12:31:28 UNRAID kernel: sky2 0000:02:00.0 eth0: Link is up at 1000 Mbps, full duplex, flow control rx
Jan 4 12:31:30 UNRAID ntpd[1342]: Listen normally on 21 eth0 192.168.1.150:123
Jan 4 12:31:30 UNRAID ntpd[1342]: new interface(s) found: waking up resolver
Then I sent a poweroff command... It logged the following:
- eth0 tx timeout
- transmit ring 1 .. 26 report=1, done=1
- Deleting interface #21 eth0, 192.168.1.150#123, received 0, sent 2, dropped 0, active time 75 secs
- 198.55.11.50 local address 192.168.1.150 -> <null>
- eth0: Link is up at 1000 Mbps, full duplex, flow control rx
- Listen normally on 22 eth0 192.168.1.150:123
- new interface found - waking up resolver
- eth0 tx timeout
...and so on. It seems to happen every minute. This may be DHCP-related; if so, I may assign a static IP after all. Any other thoughts?
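The "tx timeout" lines come from the sky2 kernel driver, and the open question in this post is whether they recur on a fixed schedule. A minimal Python sketch, assuming classic syslog lines in the format shown above (month, day, time, host, message) and a caller-supplied year, since those timestamps omit it, pulls out every matching line and reports the seconds between consecutive events:

```python
# Sketch: measure how often a pattern (default "tx timeout") appears in a syslog.
# Assumes BSD-style lines such as
#   "Jan 4 12:31:25 UNRAID kernel: sky2 0000:02:00.0 eth0: tx timeout"
# Those timestamps carry no year, so the caller supplies one (assumption).
from datetime import datetime


def event_times(lines, pattern="tx timeout", year=2016):
    """Return a datetime for every line that contains `pattern`."""
    times = []
    for line in lines:
        if pattern not in line:
            continue
        parts = line.split()  # also copes with the padded form "Jan  4"
        if len(parts) < 3:
            continue
        month, day, clock = parts[:3]
        try:
            times.append(datetime.strptime(f"{year} {month} {day} {clock}",
                                           "%Y %b %d %H:%M:%S"))
        except ValueError:
            continue  # skip lines whose first fields are not a timestamp
    return times


def intervals_seconds(times):
    """Seconds between consecutive events (empty list if fewer than two)."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
```

On the server you might pass it the open syslog file (assumed here to be /var/log/syslog). A steady run of intervals in the 60 to 80 second range would match both the "every minute" observation and the ntpd active_time values of 75 to 80 secs, which points toward the NIC or its driver rather than lease renewal, assuming default DHCP timers.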
TedatTNT, posted January 4, 2016:
Memtest completed a pass with no errors found.
TedatTNT, posted January 4, 2016:
Latest diagnostics attached. Any suggestions?
Attachment: unraid-diagnostics-20160104-1634.zip
UncleDirtNap, posted January 4, 2016 (quoting TedatTNT's original post):
Just out of curiosity, what version are you running, and have you upgraded recently? I'm one of a couple of people who've posted a similar problem: we had been running version 5 with no problems whatsoever for an extended period of time, then after upgrading to 6 our servers started becoming unresponsive. In my case it happens less frequently than yours; after approximately three days of uptime with no errors or problems, it just stops responding. The last time it happened I had access to the console for a short time before it completely locked up, and I noted that a process had pegged my CPU at 100% for an extended period before it went completely dark.
I know I can resolve this problem by rolling back to version 5, but I'd sure hate to, now that I have Docker running SABnzbd and Sonarr.
TedatTNT, posted January 4, 2016:
I'm on version 6.1.6, upgraded several weeks ago from 6.1.?. As long as I don't access the data, mine seems to run forever. So I suppose one fix would be to take it off the network; then it will keep my data secure.
althoralthor, posted January 4, 2016:
Hi TedatTNT, I think I read that you are using DHCP for your server? I would probably recommend a static IP address anyway. I'm not sure this would be causing the issues you describe, but at least it would eliminate that variable.
TedatTNT, posted January 5, 2016:
Quote: "I think I read that you are using DHCP for your server? I would probably recommend a static IP address anyway."
Can you tell me why? My IP address doesn't change; it is reserved by the router using the MAC address. Perhaps it's my networking background, but I RARELY like to use static IPs and much prefer reserving addresses on the router/switch upstream, in case of network changes that I make. If there REALLY is a benefit to having a static IP, I'll change it, but I'd rather not change it if the only concern is that the server could be assigned a different IP; my reservation on the router will prevent that.
Thanks for any clarification on this.
Ted
CHBMB, posted January 5, 2016 (quoting TedatTNT's question above):
Ted, what you're doing is just fine. With your setup you've essentially got a static IP, albeit assigned router-side rather than server-side. FWIW I do exactly the same as you. I much prefer handling IP addresses all in one location at the router rather than changing individual machines one by one...
trurl, posted January 5, 2016:
I always reserve IPs by MAC address on my router too, and have never had a problem. What does your router have for the lease time?
TedatTNT, posted January 5, 2016:
Lease time WAS 1 day; I just changed it to 10 days.
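To put those lease times in perspective: RFC 2131 says a DHCP client by default tries to renew at T1 = 0.5 x lease time and to rebind at T2 = 0.875 x lease time, although a server may send other values. A minimal Python sketch of that arithmetic for the old and new lease lengths:

```python
# Sketch: default DHCP renewal (T1) and rebinding (T2) times for a lease,
# using the RFC 2131 defaults T1 = 0.5 * lease and T2 = 0.875 * lease.
# A router may hand out different T1/T2 values; these are only defaults.
from datetime import timedelta


def renewal_times(lease):
    """Return (t1, t2) timedeltas for a lease length given as a timedelta."""
    return lease * 0.5, lease * 0.875


if __name__ == "__main__":
    for days in (1, 10):  # the old and new lease lengths from this thread
        t1, t2 = renewal_times(timedelta(days=days))
        print(f"{days}-day lease: renew after {t1}, rebind after {t2}")
    # 1-day lease: renew after 12:00:00, rebind after 21:00:00
    # 10-day lease: renew after 5 days, 0:00:00, rebind after 8 days, 18:00:00
```

With default timers, a 1-day lease renews roughly every 12 hours and a 10-day lease every 5 days, so neither would by itself explain tx timeouts that recur about once a minute.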
althoralthor, posted January 5, 2016:
I missed that you had it reserved on your DHCP server. That's what I do as well.
TedatTNT, posted January 5, 2016:
Okay, everything is up and running, the new drive is in place, and parity is established again. Memtest found no errors. The server has been running overnight, and this is my latest diagnostics file.
I really don't understand what I'm looking at. Does anyone know how to read these, and do they indicate any issue? I'm still certain that I can crash Unraid by quickly scrolling through pictures in a folder from my computer.
Oh, and just to restate: by crash, what I mean is that it becomes inaccessible. No shares on the network, no drives on the network, no GUI, and no telnet access. I can still interact with the console, but that is all.
Ted
Attachment: unraid-diagnostics-20160105-0836.zip
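On the "how do I read these" question, one low-tech approach is to search every file in the diagnostics zip for suspicious keywords. A minimal Python sketch, assuming only that the zip contains plain-text files (no particular folder layout is relied on) and using a hypothetical keyword list you would tune yourself:

```python
# Sketch: search every text file inside a diagnostics zip for keywords.
# Assumption: the zip holds plain-text files (syslog, SMART reports, etc.);
# no particular folder layout is relied on. KEYWORDS is a hypothetical,
# illustrative list, not an official catalogue of Unraid error messages.
import sys
import zipfile

KEYWORDS = ("tx timeout", "error", "fail", "call trace")


def scan_zip(path, keywords=KEYWORDS):
    """Yield (member_name, line_number, line) for each line matching a keyword."""
    lowered = [k.lower() for k in keywords]
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            text = zf.read(name).decode("utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if any(k in line.lower() for k in lowered):
                    yield name, lineno, line.strip()


if __name__ == "__main__":
    # Usage (script name is hypothetical):
    #   python scan_diag.py unraid-diagnostics-20160105-0836.zip
    for zip_path in sys.argv[1:]:
        for member, lineno, line in scan_zip(zip_path):
            print(f"{member}:{lineno}: {line}")
```

Keyword hits are only leads: words like "error" also appear in harmless boot messages, so repeated hits with timestamps near a drop-out are the ones worth posting.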
trurl, posted January 5, 2016:
Quote (TedatTNT): "...Oh, and just to restate, by crash, what I mean is that it becomes inaccessible -- no shares on the network, no drives on the network, no GUI, and no Telnet access. I can still interact with the console, but that is all."
Have you tried network access from a different computer when this happens?
grither, posted January 5, 2016:
I have something a bit related. Every few days the server becomes unresponsive; this started after I upgraded to 6.1.6. Interestingly, I can access SickBeard and Plex (running in Docker containers) but can't access SAB, the Unraid webGUI, or any shares. Very frustrating: I have to power down, as that's the only way to restore the connection.
HellDiverUK, posted January 5, 2016:
To me, it looks like the NIC is crapping out and restarting. It's not a Realtek by any chance? What hardware are you running (particularly the motherboard), TedatTNT?
TedatTNT, posted January 5, 2016:
My board is the Asus P5P-DH Deluxe, with dual Gigabit LAN: 2 x Marvell 88E8053 Gigabit LAN controllers, both featuring AI NET2.
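The driver named in the syslog lines earlier in this thread is sky2, the Linux driver for Marvell Yukon 2 controllers such as the 88E8053. To confirm which driver and link state each port has, a minimal Python sketch can read the standard Linux sysfs entries /sys/class/net/<iface>/device/driver (a symlink to the bound driver) and /sys/class/net/<iface>/operstate. The sysfs_net parameter exists only so the function can be pointed at a test directory.

```python
# Sketch: report the kernel driver and link state of each network interface.
# Uses standard Linux sysfs entries:
#   /sys/class/net/<iface>/device/driver  -> symlink to the bound driver
#   /sys/class/net/<iface>/operstate      -> "up", "down", "unknown", ...
import os


def nic_drivers(sysfs_net="/sys/class/net"):
    """Return {iface: (driver_name or None, operstate)}; {} if sysfs is absent."""
    result = {}
    if not os.path.isdir(sysfs_net):
        return result
    for iface in sorted(os.listdir(sysfs_net)):
        base = os.path.join(sysfs_net, iface)
        if not os.path.isdir(base):
            continue  # e.g. a plain file such as bonding_masters
        driver_link = os.path.join(base, "device", "driver")
        # Virtual interfaces (lo, bridges, docker0) have no device/driver link.
        driver = (os.path.basename(os.readlink(driver_link))
                  if os.path.islink(driver_link) else None)
        try:
            with open(os.path.join(base, "operstate")) as f:
                state = f.read().strip()
        except OSError:
            state = "unknown"
        result[iface] = (driver, state)
    return result


if __name__ == "__main__":
    for iface, (driver, state) in nic_drivers().items():
        print(f"{iface}: driver={driver or '-'} state={state}")
```

On a board with two identical ports, both would report the same driver, so this mainly helps confirm which port is actually carrying traffic after a switch like the one described later in the thread.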
TedatTNT, posted January 5, 2016:
trurl: yes, when I lose the shares/drives and the web GUI, I lose them from every PC that can normally access the server.
TedatTNT, posted January 7, 2016:
Okay, I have tried several tests, accessing (or attempting to access) the server from the console, telnet, and the web GUI from multiple computers. I've checked through logs, I've monitored the drives, and I have performed MANY restarts. I am thinking that my problem is between the onboard NIC (dual Gigabit Marvell 88E8053, approved hardware) and Unraid.
I checked the BIOS. I am doing no overclocking or AI stuff. I'd originally disabled most extra features or components that I'm not using. I just set DRAM ECC to Auto instead of Disabled (in case it helps), but I'm guessing that since I'm not using server-class RAM, this won't do anything.
I've switched to the other NIC, and I now have both enabled in the BIOS (previously, only the one I was using was enabled). FYI, I'm running a Core2 Quad 6600 processor and 8 GB of RAM in this build.
As of right now, the method I'd used a dozen times to test reliability, which always cost me my connection to the server, is not working. I can't break it. So the only things that are different are the ECC change and the move to a different NIC. Fingers crossed...
TedatTNT, posted January 9, 2016:
Follow-up: days later, still working well...
snowmangoh, posted December 1, 2016 (quoting grither's post above):
Did this ever resolve for you? My issue is similar. I can connect to the apps that started, but my Docker containers never start, and when the server hangs the log just says "ntpd[1577]: new interface(s) found: waking up resolver" and never moves past that. Everything was working fine, and then this started happening randomly!