unRAID still running, web & network not



Hi - bit of a noob question, but...

 

I'm copying lots of files to my new unRAID server from various machines on the network.  Twice now I appear to have overloaded it, to the point where I can't reach the server from the web interface or the network browser.  However, I can still ping and telnet onto the box.

 

The first time I turned it off and on again, which obviously sparked off a parity sync.  It's now sitting there again, and I'm wondering whether there is a clean way I can restart it?

 

Be gentle - I know nothing about linux!

Link to comment

Can you access the browser/http interface?

If so, there is a STOP button, and after that there will be a REBOOT or POWERDOWN button.

 

However, there is something else going on in your network if the server stops responding in the middle of a transfer.

See the wiki about posting a syslog.

Link to comment

Odds are pretty good you've managed to run out of memory and Linux is killing off processes to make room.
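If you can still get a shell, one way to test the out-of-memory theory is to search the log for the kernel's out-of-memory reports. A minimal sketch, assuming those reports contain the phrase "out of memory" (the exact wording differs between kernel versions, so the match is case-insensitive); `count_oom_events` is a hypothetical name, not an unRAID command:

```shell
#!/bin/sh
# Hypothetical helper: count out-of-memory (OOM killer) reports in a log file.
# Usage: count_oom_events [LOGFILE]   (defaults to /var/log/syslog)
count_oom_events() {
    log="${1:-/var/log/syslog}"
    # grep -c prints the number of matching lines; -i because the kernel's
    # wording ("Out of memory", "Out of Memory") varies between versions.
    # grep exits 1 when nothing matches (after printing 0), so ignore that status.
    grep -ci "out of memory" "$log" || true
}
```

A count above zero points toward memory exhaustion; `dmesg` shows the same kernel messages if the syslog copy is not available.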

 

So, if you can telnet in, log in and then

 

1. Save a copy of the syslog to the flash drive.  (This is important as it will provide the clues needed to advise you on how to proceed.)

Attach the syslog to the next post you make in response to this thread.

 

To capture a syslog, type:

cp  /var/log/syslog  /boot/syslog_09-05-2008.txt

chmod  a-x  /boot/syslog_09-05-2008.txt
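Typing the date by hand is easy to get wrong when you end up capturing several logs. A minimal sketch that builds the file name with `date` instead, assuming the flash drive is mounted at /boot as in the commands above; `save_syslog` is a hypothetical name, and the paths are parameters so it can be tried somewhere harmless first:

```shell
#!/bin/sh
# Hypothetical helper: copy the syslog to a date-stamped file and print its name.
# Usage: save_syslog [SOURCE] [DEST_DIR]   (defaults: /var/log/syslog, /boot)
save_syslog() {
    src="${1:-/var/log/syslog}"
    destdir="${2:-/boot}"
    # Name the copy after the current date and time, so several captures
    # on the same day do not overwrite each other.
    dest="$destdir/syslog_$(date +%m-%d-%Y_%H%M%S).txt"
    cp "$src" "$dest" || return 1
    # Clear the execute bits, as the chmod command above does.
    chmod a-x "$dest"
    # Flush the copy out to the flash drive now, in case the server hangs soon after.
    sync
    echo "$dest"
}
```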

 

Next, to see if the web-interface is still running, type:

ps -ef

 

This will show you a process list.  One of the processes should be "/usr/local/sbin/emhttp"

 

If it exists, it should be able to serve up a web page.  If it does not exist, you can restart it by typing:

/usr/local/sbin/emhttp &
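The check-and-restart steps above can be rolled into one command. A minimal sketch, assuming emhttp lives at /usr/local/sbin/emhttp as above; instead of reading `ps -ef` output by eye, it looks in /proc, which is where ps gets its process list. `is_running` and `ensure_running` are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if a process whose program is NAME exists.
# NAME may be a full path (/usr/local/sbin/emhttp) or a bare name (emhttp);
# only the last path component is compared.
is_running() {
    want=${1##*/}
    for f in /proc/[0-9]*/cmdline; do
        # cmdline is NUL-separated; its first field is the program as it was started.
        prog=$(tr '\0' '\n' 2>/dev/null < "$f" | head -n 1)
        [ -n "$prog" ] || continue   # kernel threads have an empty cmdline
        [ "${prog##*/}" = "$want" ] && return 0
    done
    return 1
}

# Hypothetical helper: start PROGRAM in the background unless it is already running.
# Usage: ensure_running /usr/local/sbin/emhttp
ensure_running() {
    if is_running "$1"; then
        echo "$1 is already running"
    else
        echo "starting $1"
        "$1" &
    fi
}
```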

 

Once started, you can use the web-interface to stop the array and then reboot.

 

If you still cannot get to the web interface to shut down and reboot, type these commands to stop the array as cleanly as possible:

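/root/samba stop
for i in /dev/md*
do
  # fuser prints the PIDs of processes using the file system on $i;
  # kill them so that $i can be unmounted
  kill `fuser -c $i 2>/dev/null`
  umount $i
done
/root/mdcmd stop
reboot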
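Before running `/root/mdcmd stop`, it may be worth confirming that the loop really unmounted everything, since `umount` refuses to unmount a file system that is still busy. A minimal sketch, assuming the array devices show up in /proc/mounts with names starting with /dev/md, as in the loop above; `md_still_mounted` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: list any /dev/md* devices that are still mounted.
# Exits 0 (true) if at least one is still mounted, 1 otherwise.
# Usage: md_still_mounted [MOUNTS_FILE]   (defaults to /proc/mounts)
md_still_mounted() {
    # Field 1 of /proc/mounts is the device, field 2 the mount point.
    awk '$1 ~ /^\/dev\/md/ { print $1, "on", $2; found = 1 }
         END { exit !found }' "${1:-/proc/mounts}"
}
```

For example, `md_still_mounted && echo "still mounted, do not stop the array yet"` prints whatever is left before you go any further.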

 

Let us know how you do with re-gaining access via the web-interface.  And post the syslog, it is the only way to figure out what is happening.

 

Joe L.

Link to comment

Thanks for that, guys - very helpful.  I performed a manual restart on this occasion, but I'll post a syslog next time.  Something's not right, since it's happening quite often.  Hope it's not memory, since there's 2x1GB in there.

 

IIRC, if you enable user shares and put a ton of files at the top level, you will run out of memory.

 

Try turning user shares off, copying the files over, cleaning up your directories, and then turning user shares back on.

 

 

Bill

Link to comment

To test the memory, you can run Memtest from the boot menu when the server first starts booting from the flash drive.

 

You can set up the box to reboot cleanly when the power button is pressed. Look here for some help: http://lime-technology.com/forum/index.php?topic=2068.0

 

 

IIRC, if you enable user shares and put a ton of files at the top level, you will run out of memory.

 

You're talking about files in the root directory of the disk, correct? i.e. lots of files in \\tower\disk1

Or do you mean in the first level of a user share? i.e. lots of files in, say, \\tower\disk1\Movies

 

Peter

 

Link to comment

Thanks for that, guys.  Apologies for the delay in getting back to you; the unRAID server is in a rack under my stairs with no keyboard or screen, so I've had to do a lot of shifting about.

 

I've disabled user shares.  I'm only copying from one location at a time.  I'm copying to either \\tower\disk1\dvds or \\tower\disk2\dvds.  I ran memtest and it passed fine.

 

It's still going down around once a day, and when I looked at the screen last night, it was locked up completely (see attached photo - sorry, it's a bit of a mess!).  Haven't been able to telnet into it since the original post, but now I've got a screen/keyboard there I can keep an eye on it.

 

Kit is...

Gigabyte GA-G33-DS3R motherboard

Intel Pentium Dual-Core, E2180 processor

2GB (2x1GB) Corsair TwinX XMS2 DDR2 memory

3 x 1TB (1000 GB) Western Digital WD10E (green power) drives

 

GAH!  Thanks :)

Link to comment

Thank you for including a screenshot of your kernel panic and some detail of your hardware setup.  There's one thing that is more important, though, and that is to capture and post your syslog, something already mentioned by 2 others.  Please see JoeL's instructions, and do read the Troubleshooting link in my sig.  Try to capture one as soon as possible, so you will have at least one syslog saved, but it would be very helpful if you could then carry on with your normal file operations and copying for a while, so we can see any corresponding errors that get logged.  A syslog that captures the errors occurring just before a kernel panic would be the most helpful.

 

If you monitor the syslog with the tail command (discussed in the Viewing the System Log page), you should be able to tell when the status of your server is degrading.  Capture it promptly when that happens, because a kernel panic halts the system.

Link to comment

Thanks - I've read that, and I'm currently waiting to get a syslog.  Now I've got a screen and keyboard on the machine it'll be much easier.  I had meant to put that in the previous post, but forgot - sorry.

 

However, I'm assuming that taking a syslog after a power cycle or before the problem starts will not be of much use?  I'm copying over files and trying to catch the system in a crashed-but-not-frozen state.  Unfortunately, the last couple of times have been whilst at work or in bed.

 

Have I understood correctly - apologies if not.  I've gone round and round reading so much information recently.

Link to comment


We think the reason for your system crash is that you are running out of memory.  That can happen if you use the cache drive and it writes thousands and thousands of entries to the syslog as it moves the files.  It can also happen if errors keep being logged until the syslog grows large enough to run the OS out of memory.

 

In that respect, and since it is usually log entries in the syslog that run unRAID out of memory, taking a copy now, while you are copying files and before a crash, might be helpful.  RobJ might just spot the cause of the entries and suggest a fix.
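Since the syslog lives in RAM, it can help to note its size alongside the free memory each time you check on the server during a copy. A minimal sketch, assuming the standard Linux /proc/meminfo `MemFree` line; `log_mem_status` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print the syslog size and free memory, both in kB.
# Usage: log_mem_status [LOGFILE]   (defaults to /var/log/syslog)
log_mem_status() {
    log="${1:-/var/log/syslog}"
    # wc -c counts bytes; integer division gives whole kilobytes.
    size_kb=$(( $(wc -c < "$log") / 1024 ))
    # MemFree in /proc/meminfo is already reported in kB.
    free_kb=$(awk '/^MemFree:/ { print $2 }' /proc/meminfo)
    echo "syslog: ${size_kb} kB, MemFree: ${free_kb} kB"
}
```

Running it every few minutes (for example `while true; do log_mem_status; sleep 300; done`) shows whether the log is ballooning.  A syslog that grows by megabytes during a copy is the warning sign; MemFree is only a rough indicator, because Linux also uses spare RAM as disk cache during large copies.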

 

As you already described, the syslog is in RAM, and once you have a kernel crash, it is gone.   You can use the "tail -f /var/log/syslog" command as described in the wiki to watch it on the console as it grows.   Then, the picture of the failure will have the activity leading up to the crash.

 

Joe L.

Link to comment

You could also do

tail -f /var/log/syslog > /boot/syslog

to keep a copy on the flash drive as the log grows, rather than only in RAM.

(Or modify the /etc/syslog.conf file directly.)

 

My /etc/syslog.conf has the following:

 

# Everything to syslog:
*.*                                                     -/var/log/syslog
*.*                                                     -/dev/tty12
*.*                                                     [email protected]

 

So it could just as well be changed to write to /boot/syslog (only for crash capture, though).
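A sketch of what that change might look like, assuming the sysklogd-style format shown above (the remote-logging line is left out here, not removed).  In this format a leading "-" on the file name tells syslogd not to sync the file after each message, so leaving it off for the flash copy trades extra flash wear for a better chance that the last messages reach the drive before a panic:

```
# Everything to syslog:
*.*                                                     -/var/log/syslog
*.*                                                     -/dev/tty12
# Temporary crash capture: no leading "-", so each message is synced to flash
*.*                                                     /boot/syslog
```

syslogd rereads its configuration when it receives SIGHUP (for example `killall -HUP syslogd`); remove the extra line again once the crash has been caught.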

 

Hmmm, I wonder if we should create syslogon/syslogoff scripts to do this.
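The syslogon/syslogoff idea could look something like the following.  This is a minimal sketch, not an existing unRAID script: the function names, the PID file location and the default paths are all assumptions.  It uses `tail -n +1 -f` so the copy starts from the first line of the log (plain `tail -f` starts with only the last ten lines), and `nohup` so the capture keeps running after the telnet session closes.  Lines written just before a panic may still be sitting in the kernel's write cache, so the very end of the copy can be missing:

```shell
#!/bin/sh
# Hypothetical syslogon/syslogoff helpers (not part of unRAID).
# syslogon copies the whole syslog to flash and keeps appending new lines;
# syslogoff stops the copy. The PID file location is an assumption.
PIDFILE="${PIDFILE:-/var/run/syslog_capture.pid}"

syslogon() {
    src="${1:-/var/log/syslog}"
    dest="${2:-/boot/syslog}"
    # Refuse to start a second capture if the recorded one is still alive.
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo "capture already running"
        return 1
    fi
    # -n +1 starts at line 1; -f keeps following; nohup survives logout.
    nohup tail -n +1 -f "$src" > "$dest" 2>/dev/null &
    echo $! > "$PIDFILE"
}

syslogoff() {
    [ -f "$PIDFILE" ] || { echo "no capture running"; return 1; }
    kill "$(cat "$PIDFILE")" 2>/dev/null
    rm -f "$PIDFILE"
    # Push whatever tail wrote out to the flash drive.
    sync
}
```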

 

 

Link to comment

Your syslog looks perfectly normal.  It is not super-huge, so at least at this point, it is not showing anything as a clue to me.

 

I would suggest the tail -f /var/log/syslog command to see if you can capture more of what caused the kernel crash.  As I said earlier, it is usually caused by running out of memory, but it can also be a problem with the RAM itself, or with just about any hardware.  The kernel drivers used in unRAID are pretty stable... in fact, I don't remember them causing a crash in any past report.  (Poor performance on a specific network chipset family, but not a crash.)

 

Joe L.

Link to comment

I am following this with care, as you have the same motherboard family as mine (EP35-DS3R) and the same kernel panic message as me under heavy load, with the same normal syslog. I just have a few more drives (15 at first, but I couldn't make that work without a kernel panic; now 12, which seems to work better).

Link to comment

It seems to have been stable for the past 24 hours, which it hasn't managed before.  Over the weekend I'm going to see if I can delve in a bit deeper, since I'd just like to be able to throw stuff at it and have it cope (or at least not fall on its back).  Adding two more drives next week too.

 

I was following your threads too, yodine, for exactly the same reasons.

Link to comment

After a disastrous weekend, I'm still none the wiser (disastrous because I got two windows mixed up and emptied one of the drives ::) Anyway...)

 

- I've been keeping the tail command open in a telnet window, and it's not reporting anything, right up until a kernel panic.  Should it show anything?

- I had it copying files for a whole day without failure (albeit rebooting several times, and only copying from one location)

- On other occasions, it can crash straight into a kernel panic halfway through the first DVD copy

- When I'm copying it's always off a USB drive on a networked machine, and it's only ever a couple of DVDs at a time (max 7).  Sometimes it'll copy the lot, sometimes it'll stop at any point in between.

- When it works, it's working fine.  When it crashes, it's going straight to kernel panic without letting me create a syslog

 

I'm not sure whether this is of use to anybody - I'm partly thinking out loud here :)  Should I copy as much as I dare, then create a syslog?

Link to comment

You should log in via telnet and then type:

tail -f /var/log/syslog

 

It will keep sending the syslog contents to your screen.  Do not log off from that window; leave it open.  That way, when the panic occurs, the last screenful of messages written to the syslog will be available for you to copy and paste into this thread.

 

Kernel panics are almost always caused by hardware, usually memory, but almost anything can be the culprit.  It can easily be the power supply... It may have plenty of amperage, but noise on some of the voltages may induce memory errors when the disks are accessed heavily, or all at once in spikes.

 

Make sure you have all the motherboard screws installed and secure to the case.  Often they provide for shielding and if missing, you might be susceptible to power supply noise.

 

Joe L.

Link to comment

Thanks again Joe.  Yes, that's the tail command I've left running in the background, and as far as I can remember it hasn't listed anything after a ROOT LOGIN line.

 

It's running again now though, so I'll see what happens later.

 

All mounting screws were used when building, but I can't vouch for the quality of the PSU - one that came with the case.

Link to comment


By the way, are you the same yodine from the MyMovies forum?

Link to comment

I tried the shutdown script below and got messages saying 'unmount' wasn't a known command.

Do I need to be in a specific directory before this will work?

(Sorry for being stupid, but I'm new to Linux)

 

/root/samba stop

for i in /dev/md*

do

killall `fuser -c $i`

umount $i

done

/root/mdcmd stop

reboot

 

Link to comment


 

That might be because the command is "umount", not "unmount" (notice that the correctly spelled command is missing the first "n").

 

The issue is not that you are stupid, quite the opposite... you are smart, and you know how to spell. ;D 

You do not need to be in any special directory to use the command.

 

Joe L.

Link to comment

I have done exactly the same thing, and wondered what was going on.

 

Oddly enough my machine has been up for a few days now without problem, although I'm only copying files via an old machine with pre-USB2 interface.

 

Hopefully over the weekend I'll lift the lid and see whether all cables are snug.  I'm a bit worried that it's a general incompatibility with the motherboard (although I did clear it with another user on here, so I'll check what memory/power supply they have).

Link to comment
  • 2 weeks later...

Installed two more drives in the machine yesterday.  Whilst it was clearing, it crashed, so I re-ran the clearing with the tail -f /var/log/syslog > /boot/syslog command running.

 

Find attached the results.  Does it say anything?

 

Fortunately, the machine is now at work, so it's easier to get up on the bench and work on.

Link to comment
