My Unraid is Unreliable? [Maybe Solved??]


TedatTNT


I've no idea how to read the syslog, or what to search for. Can anyone tell me if there is anything obvious in it that might be causing it to become unresponsive either by the GUI or by telnet? Here is a short history:

 

About 6 mos. ago, Unraid was becoming unresponsive every few minutes. I replaced the power supply (which was a crappy one) and things were better.

 

About 3 mos. ago, Unraid was becoming unresponsive every few minutes. Someone pointed out to me that my router was pinging and doing DHCP stuff about every 6 seconds -- router firmware issue. I changed it and now it talks to the router each hour. Also, they mentioned a possible issue with the NIC. I moved to the other NIC on the board. It was reliable for months.

 

Last week, I connected to a share from a computer that plays music. The player was not working great, and so I mapped a network drive to the share and launched Windows Media Player and let it build the library based on that share. Since then, Unraid becomes unresponsive every few minutes. I've restarted it about 20 times or more and it works for a few minutes, then no GUI and no telnet connection.

 

Any help is GREATLY appreciated, because I'm in the middle of a video project for work and all my files are stored on the Unraid server and I can no longer access them.

unraid-syslog-20151229-0835.zip


There is nothing obvious from the syslog that I can see, but the log supplied just covers the boot up sequence.  Ideally you would obtain the diagnostics (via Tools->Diagnostics or by running the 'diagnostics' command from a command line session) as this provides much more information on your setup than just the syslog and get it at a point where a problem is occurring.  However if you are losing all access this is obviously not possible.

 

Do you have a monitor/keyboard attached?  Your description sounds rather like a hardware issue that is causing the server to crash.  If so, something might be displayed on an attached monitor.  It could be worth booting into the memory test option and letting that run for some hours, as failing RAM can have all sorts of unpredictable side-effects.


I've attached the diagnostics file.

 

After disconnecting the mapped drive, removing the link to the share, and avoiding browsing/using the files, it has kept running for hours. I DID see, right before one crash, that Disk 3 showed that it was running hot. I've ordered a couple of spare drives in case that is the issue -- also because a couple of the drives are pretty full.

 

I do have a monitor/keyboard attached. The next time it goes down, I'll try the memory test.

unraid-diagnostics-20151229-1357.zip


Okay, this is the follow-up...

 

It appeared that Disk 3 was showing some errors and was overheating regularly. I ordered some new drives and installed one over the weekend in place of Disk 3. The system rebuilt to the new drive and parity was restored. Although this MAY have been an issue, it was not THE issue that has been causing me issues...

 

I then navigated to a share from my PC, opened a folder, then another, then another which contains about 40 photos from Christmas. I opened a photo in the MS Photos application and began scrolling through them - quickly. Before I got through them all, I lost my connection to the share, the GUI wouldn't load, and Telnet was useless.

 

I used the console with a local keyboard to obtain the current syslog using the tail command. It reported the following:

 

Jan 4 12:30:07 UNRAID ntpd[1342]: new interface(s) found: waking up resolver

Jan 4 12:31:25 UNRAID kernel: sky2 0000:02:00.0 eth0: tx timeout

Jan 4 12:31:25 UNRAID kernel: sky2 0000:02:00.0 eth0: transmit ring 1 .. 26 report=1 done=1

Jan 4 12:31:27 UNRAID ntpd[1342]: Deleting interface #20 eth0, 192.168.1.150#123, interface stats: received=0, sent=1, dropped=0, active_time=80 secs

Jan 4 12:31:27 UNRAID ntpd[1342]: 198.55.111.50 local addre 192.168.1.150 -> <null>

Jan 4 12:31:28 UNRAID kernel: sky2 0000:02:00.0 eth0: Link is up at 1000 Mbps, full duplex, flow control rx

Jan 4 12:31:30 UNRAID ntpd[1342]: Listen normally on 21 eth0 192.168.1.150:123

Jan 4 12:31:30 UNRAID ntpd[1342]: new interface(s) found: waking up resolver

 

 

 

-Then I sent a poweroff command...

 

It logged the following:

-eth0 tx timeout

-transmit ring 1 .. 26 report=1, done=1

-Deleting interface #21 eth0, 192.168.1.150#123, received 0, sent 2, dropped 0, active time 75secs

-198.55.111.50 local address 192.168.1.150 -> <null>

-eth0: Link is up at 1000Mbps, full duplex, flow control rx

-Listen normally on 22 eth0 192.168.1.150:123

-new interface found - waking up resolver

-eth0 tx timeout

 

...and so on. Seems to happen every minute. This may be DHCP - if so, I may assign a static IP after all.
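
For anyone who wants to check how regular those resets really are, here is a minimal Python sketch that pulls the `tx timeout` lines out of a saved syslog and measures the gaps between them. It is only an illustration: the function names (`tx_timeouts`, `gaps_in_seconds`), the regex, and the assumed year are my own choices, not part of any Unraid tool. Syslog lines carry no year, so one has to be supplied.

```python
import re
from datetime import datetime

# Matches kernel lines such as:
#   "Jan 4 12:31:25 UNRAID kernel: sky2 0000:02:00.0 eth0: tx timeout"
TX_TIMEOUT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:"
    r".*\b(?P<iface>eth\d+): tx timeout"
)


def tx_timeouts(lines, year=2016):
    """Return a list of (timestamp, interface) for every 'tx timeout' line.

    `year` is an assumption; the syslog format shown above omits it.
    """
    events = []
    for line in lines:
        match = TX_TIMEOUT.match(line.strip())
        if match:
            # Collapse padded days ("Jan  4") to a single space before parsing.
            stamp = " ".join(match.group("stamp").split())
            when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
            events.append((when, match.group("iface")))
    return events


def gaps_in_seconds(events):
    """Return the number of seconds between consecutive timeout events."""
    return [
        (later[0] - earlier[0]).total_seconds()
        for earlier, later in zip(events, events[1:])
    ]


if __name__ == "__main__":
    # Two lines copied from the syslog excerpt above; only the first is a timeout.
    sample = [
        "Jan 4 12:31:25 UNRAID kernel: sky2 0000:02:00.0 eth0: tx timeout",
        "Jan 4 12:31:28 UNRAID kernel: sky2 0000:02:00.0 eth0: "
        "Link is up at 1000 Mbps, full duplex, flow control rx",
    ]
    events = tx_timeouts(sample)
    for when, iface in events:
        print(when.isoformat(), iface)
    print("gaps (s):", gaps_in_seconds(events))
```

Feeding it the full syslog (for example `open("syslog").readlines()`, where the file name is whatever you saved it as) would show whether the timeouts really recur about once a minute or only under load.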

 

Any other thoughts?


 

Just out of curiosity, what version are you running, and have you upgraded recently?

 

I'm one of a couple of people who've posted a similar problem, wherein we'd been running version 5 with no problems whatsoever for an extended period of time, then after upgrading to 6 started having problems with our servers becoming unresponsive.  In my case it's not as frequent as yours: after approximately three days of being up with no errors or problems, it just stops responding.  The last time it happened I had access to the console for a short time before it completely locked up, and noted a process had pegged my CPU at 100% for an extended period before it went completely dark.

 

I know I can resolve this problem by rolling back to version 5 but I sure hate to now that I have the docker running SABNZBd and Sonarr. 

I think I read that you are using DHCP for your server?  I would probably recommend a static IP address anyway.

 

Can you tell me why?

 

My IP address doesn't change -- it is reserved by the router using the MAC address. Perhaps it is my networking background, but I RARELY like to use static IPs and much prefer reserving them on the router/switch upstream from them in case of network changes that I make.

 

If there REALLY is a benefit to having a static IP, I'll change it, but I'd rather not change it if the only concern is that the server could be assigned a different IP -- my reservation on the router will prevent that.

 

Thanks for any clarification on this.

 

Ted


 

Ted, what you're doing is just fine. You've essentially got a static IP with the setup you've got, albeit router side rather than server side.

 

FWIW I do exactly the same as you.  I much prefer handling IP addresses all in one location at the router than changing individual machines one by one...


Okay, everything is up and running, new drive is in place and parity is again established. Memtest found no errors. It has been running overnight and this is my latest diagnostic file.

 

I really don't understand what I'm looking at -- does anyone understand how to read these, and does it indicate any issue? I'm still certain that I can crash Unraid by quickly scrolling through pictures in a folder from my computer.

 

Oh, and just to restate, by crash, what I mean is that it becomes inaccessible -- no shares on the network, no drives on the network, no GUI, and no Telnet access. I can still interact with the console, but that is all.

 

Ted

unraid-diagnostics-20160105-0836.zip


Have you tried network access from a different computer when this happens?
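
If it helps, that check can be scripted from a second machine. Below is a rough Python sketch that tries a plain TCP connection to each service and reports which ones answer. The port numbers are assumptions based on the usual defaults (telnet on 23, web GUI on 80, SMB shares on 445), the address is the one that appears in the syslog excerpt earlier in the thread, and `port_open` / `probe` are just illustrative names.

```python
import socket

# Assumed default ports; adjust if the server is configured differently.
SERVICES = {"telnet": 23, "web GUI": 80, "SMB shares": 445}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def probe(host, services=SERVICES):
    """Return {service name: reachable?} for each service on `host`."""
    return {name: port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    # 192.168.1.150 is the server address shown in the posted syslog.
    for name, ok in probe("192.168.1.150").items():
        print(f"{name}: {'reachable' if ok else 'no response'}")
```

If every port fails from every computer while the local console still works, that points at the network path (NIC, driver, cable, switch) rather than at one particular service.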

I have something a bit related... every few days, the server becomes unresponsive after I upgraded to 6.1.6.  Interestingly, I can access sickbeard and plex (running from dockers) but can't access sab, or the unraid webgui, or any shares.  Very frustrating, I have to power down, that's the only way to restore connection.


Okay, I have tried several tests, accessing (or, attempting to access) the server from the console, telnet, and the web GUI from multiple computers. I've checked through logs, I've monitored the drives, and I have performed MANY restarts.

 

I am thinking that my problem is between the onboard NIC (dual GB Marvell 88E8053 - approved hardware) and Unraid.

 

I checked the BIOS. I am doing no overclocking or AI stuff. I'd originally disabled most extra features or components that I'm not using. I just set DRAM ECC to auto instead of disabled (in case it helps), but I'm guessing since I'm not using server-class RAM, this won't do anything. I've switched to the other NIC, and I have both enabled in BIOS (previously, only the other was enabled).

 

FYI, I'm running a Core2 Quad 6600 processor and 8 GB of RAM on this build.

 

As of right now, the method that I'd used a dozen times to test reliability, only to lose connection to the server, is not working. I can't break it.

 

So, the only thing different is the ECC change and moving to a different NIC.

 

Fingers crossed... :-\

 

 

  • 10 months later...


 

Did this ever resolve for you? My issue is similar. I can connect to the apps that started, but my dockers never start, and when I hang up the log just says, "ntpd[1577]: new interface(s) found: waking up resolver" and never moves past that... everything was working fine, and then this started randomly happening!
