[SOLVED] Server repeatedly becomes unresponsive



My server ran smoothly for almost a year.  A few months ago it started becoming unresponsive every once in a while: I could not connect to shares, could not bring up the standard menu or unMENU in a browser using the name or IP, could not ping the IP, and the server showed no signs of activity.  A hard power cycle was the only way to bring it back up.  Over the past few weeks this has happened with increasing frequency, and it sets in sooner after each boot.  I can't even get through a resync before it dies again.

 

Hardware specs:

Case: Antec 900.

Power Supply:  Enermax Modu 82+ 525W, Modular.

Motherboard: Gigabyte GA-MA74GM-S2

Processor: AMD Athlon 64 LE-1640

Ram: Transcend JETRAM 2GB 2 x 1GB DDR2 800

Drive Controller: Adaptec 1430SA PCI-E x4

Drives: 6 total, mix of 1 and 2 TB WDEARS plus a 1TB Maxtor and an older 160GB cache

Flash drive: 2GB Cruzer

 

Software:

Version 4.5.6

unMenu with about half the available plugins enabled

TwonkyMedia

 

go file:

#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &

cd /boot/packages && find . -name '*.auto_install' -type f -print | sort | xargs -n1 sh -c 

# Directory caching
# sysctl vm.vfs_cache_pressure=1 off, until find out whether should run alongside cache_dirs
/boot/cache_dirs -d 10 -w -e "Backup"

#####################################
###  Wait for the array to start  ###
#####################################
# (before installing any packages that may expect the array to be fully started)
until `cat /proc/mdcmd 2>/dev/null | grep -q -a "STARTED" ` ; do echo ">>>waiting..." ; sleep 1 ; done ; echo ">>>STARTED."

#unMenu autostart cmd
/boot/unmenu/uu

# Twonky
/boot/custom/twonkymedia-i386-glibc-2.2.5-5.1/twonkymedia -inifile /boot/custom/twonkymedia-i386-glibc-2.2.5-5.1/twonkymedia-server.ini

 

 

I've read through the FAQ, Troubleshooting guide, and done my best to search the forum.  I'd appreciate any advice anyone has to offer.  I've attached logs from top and tail operations I left running up to the point of the latest crash, a syslog generated by powerdown at the last successful clean shutdown (20110629), and a syslog from a little after boot but before lockup (20110710).  If there's any more info I can provide, please let me know.

 

Should I go ahead and upgrade to the latest stable version?  I considered this and thought it might be better to get my server stabilized before attempting it, but if others advise otherwise, I can do that right away.

 

Thanks!

Mike

 

MK-unraid-log.zip


Looks like networking problems to me.

 

Jun 27 15:23:27 Unraid kernel: r8169: eth0: link down

Jun 27 15:23:27 Unraid ifplugd(eth0)[1309]: Link beat lost.

Jun 27 15:23:32 Unraid kernel: r8169: eth0: link up

Jun 27 15:23:33 Unraid ifplugd(eth0)[1309]: Link beat detected.

Jun 27 15:23:35 Unraid kernel: r8169: eth0: link down

Jun 27 15:23:35 Unraid ifplugd(eth0)[1309]: Link beat lost.

Jun 27 15:23:37 Unraid kernel: r8169: eth0: link up

Jun 27 15:23:38 Unraid ifplugd(eth0)[1309]: Link beat detected.

 

Plenty of alignment errors, too. Incompatibilities usually look much worse than this though, so perhaps check the cable, the switch port it's plugged into, or the switch/router itself.
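
To get a feel for how often the link is flapping, something like the following could be run against the syslog. This is only a rough sketch: the function name is made up for illustration, and it assumes the kernel lines look like the r8169 excerpt above.

```shell
#!/bin/bash
# Sketch: count how many times the kernel reported a link drop.
# Assumes log lines shaped like the excerpt above, e.g.
#   "Jun 27 15:23:27 Unraid kernel: r8169: eth0: link down"

count_link_drops() {
    # $1: path to a syslog file
    # "n + 0" forces a numeric 0 when there are no matches
    awk '/kernel: .*: link down/ { n++ } END { print n + 0 }' "$1"
}

# Example (assumes the usual syslog location; adjust if yours differs):
#   count_link_drops /var/log/syslog
```

A count that keeps climbing between runs would point at the cable, switch port, or NIC rather than a one-off glitch.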


Really? Could networking issues lock up the machine entirely?  Even on a local monitor and keyboard the thing is unresponsive.

unRAID keeps the system log in RAM.  If the log fills up all available memory (with those error messages), Linux will terminate processes in an attempt to make more memory available.  It tends to terminate the processes that have been idle the longest, which typically ends up killing the processes used to log in and to supply the management interface.

 

So yes, network errors can fill the system log and use all available memory, and subsequently lock up the machine.
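
One way to watch for this is to record free memory and syslog size side by side. The snippet below is a minimal sketch, not a tested unRAID tool: the function name and the output file in the example are hypothetical, and it assumes /proc/meminfo has the standard "MemFree:" line in kB.

```shell
#!/bin/bash
# Sketch: print one line with free memory and syslog size.
# Assumes a standard Linux /proc/meminfo with a "MemFree: <n> kB" line.

mem_snapshot() {
    # $1: path to meminfo, $2: path to syslog
    local free_kb log_bytes
    free_kb=$(awk '/^MemFree:/ { print $2 }' "$1")
    # $(( )) strips any padding some wc implementations add
    log_bytes=$(( $(wc -c < "$2") ))
    echo "memfree_kb=${free_kb} syslog_bytes=${log_bytes}"
}

# Example: append a timestamped snapshot to the flash drive once a minute
# so the record survives a lockup (output file name is hypothetical):
#   while true; do
#       echo "$(date '+%F %T') $(mem_snapshot /proc/meminfo /var/log/syslog)" >> /boot/mem_watch.log
#       sleep 60
#   done
```

If the syslog size grows steadily while free memory shrinks, that supports the theory. After a lockup-free boot, `dmesg | grep -i "out of memory"` should also show whether the kernel has already started killing processes.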


Really? Could networking issues lock up the machine entirely?  Even on a local monitor and keyboard the thing is unresponsive.

 

I have in the past had motherboards lock up on me when the on-board NIC was failing. The NIC had not completely failed, but it was dying, and it would lock up my computer (not my unRAID server, just another computer). It was somewhat frustrating, but eventually I figured it out, disabled the on-board NIC, and added a PCI NIC I had lying around. That fixed it right up, and I have had no problems from it since.

 

So, yes, networking issues can sometimes lock up a computer.

 

Bruce


I'd try the PCI NIC.  Adding more RAM isn't going to fix the problem; it might delay the onset of the issue, but eventually you'll run out of space.

 

The analogy here is a leaking roof. If your roof were leaking, would you fix the roof (fix the NIC) or would you just put a bigger bucket in the house to catch the water (add more RAM)?


I tried to cancel my D-Link order and get an Intel, but it was too late.  Oh well.  Here's something I wish had occurred to me earlier: couldn't I test this theory of network errors causing syslog overrun by unplugging the Ethernet cable entirely, then booting and letting the array finish its resync before I ever plug it back in?  I'm giving that a try today; I've got nothing else to do until the NIC gets here tomorrow anyway.


Tried that out.  I unplugged the Ethernet cable, booted unRAID, and left it alone for 24 hours.  I just plugged the cable back in and checked on it, and it looks to be running smoothly.  Now I've started a parity check.  I'm going to unplug the network cable again and check on it tonight.  My new NIC should be here today, so I'll install that ASAP.

 

Thanks for the help everyone.

