
unRAID unresponsive intermittently


JP


I've been using an unRAID server for years without much in the way of issues.  However, over the past week the server has become intermittently unresponsive 3 different times.  Upon reboot the first time there were ~500 parity errors, but they were all repaired and all the drives were seemingly good (green ball).

 

The process to get the server back online each time is the same.  For whatever reason when the server becomes unresponsive I can't even do a hard shutdown via the power button on the server.  It does nothing.  I have to switch the power supply off.  Then when I hit the power supply back on the server automatically comes to life without me having to hit the power button.  This may be due to how I have it configured to work with a UPS.  Anyway, from here the server is again unresponsive, but power is ON and fans are spinning.  There appears to be no hdd activity.  Then I have to do a hard shutdown via the power button and now it works?  Upon reboot for the second time it boots up just fine, does a parity check (once with errors, twice with no errors), and works as it should until the next "freeze."

 

Admittedly, I don't know a thing about Linux commands.  I am running unRAID version 5.0.5.  I've never run into something like this before.  Could anyone offer some guidance on what I should be looking at to understand where the problem might reside?

Link to comment

I forgot to mention that for this third "freeze" I've brought the system back up using the same process reflected in post above, however, I'm not going to start the array.  I am trying to understand what next steps might be best for me to resolve the problem before doing so.  Again, any help is appreciated.

Link to comment

I would do a couple of things

 

First off would be to restart the system in safe mode and see if the problem continues.  If it doesn't, then odds are it's related to a plugin.

 

It also wouldn't be a bad idea to run memtest on the system (at least one pass)

 

Assuming you have a monitor connected to the server (which you need to do any of the above), next time it freezes, see if there's anything on the screen and post a picture of it.

Link to comment

Thanks.  I don't have a monitor connected, but can certainly do that.  Just so I'm clear, I can't do a memtest or run the Server in safe mode via telnet?  I've never tried either before so it is new to me.

Link to comment

You select memtest or SAFE mode from the boot menu before unRAID starts.
Link to comment

The server turning on 'automatically' after a hard power-down comes from your BIOS settings.

If it doesn't boot properly that way, it would be better to change the setting so the machine stays off after a power failure (the option is usually named something along those lines).

More to the point: when the server appears to hang, are the shares still available? Can you ping it from the LAN?

Are you still able to log in at the console directly and/or by telnet?
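
If it helps, those checks can be scripted from another machine on the LAN. This is only a rough sketch: it tests whether TCP ports accept a connection, which is not the same thing as an ICMP ping, and the address and port list are assumptions you would need to replace with your own (telnet on 23, the web interface on 80, SMB shares on 445).

```python
import socket

# Hypothetical address; replace with your server's IP or hostname.
SERVER = "192.168.1.50"

# Ports assumed for illustration: telnet, web interface, and SMB shares.
PORTS = {"telnet": 23, "web interface": 80, "SMB shares": 445}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def probe(host: str, ports: dict[str, int]) -> dict[str, bool]:
    """Check each named port and report which ones accept connections."""
    return {name: port_open(host, port) for name, port in ports.items()}

# Example call during a "freeze":
#   for name, ok in probe(SERVER, PORTS).items():
#       print(name, "reachable" if ok else "no response")
```

If every port fails while the fans are still spinning, the whole system has probably stopped; if only the web interface fails, the problem may be limited to emhttp.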

Link to comment

Thanks for the help.  I just powered down cleanly from the web interface to hook up a monitor to the server.  When I powered back up I did notice the memtest so I thought I would cleanly power down again and try to run it on the next reboot.  However, this time (and this is new behavior) when I attempted to power down clean it got hung again.  The monitor screenshot is below:

 

screenshot2.jpg

 

Fans are spinning and power is on, but I can't telnet in to it.  I don't know of a clean way to power it down from here so I guess I'll have to force it again to try and run the memtest and safe mode to see what information that might provide.

Link to comment

I tried to do a Memtest, but until I read about it I wasn't aware that you might need to have it run for 24 hours.  I had it run for a little while and everything apparently passed.  See below:

 

memtest.jpg

 

Also, this time, when I tried to go in to the BIOS and cleanly shut down the server it did just fine.  Confusing.  Anyway, here are the ACPI settings in the BIOS:

 

acpi.jpg

Link to comment

Strange.  From the screen shot Linux has finished closing down (which is why the console and/or telnet will not work), but for some reason the machine has not been powered off.

I got exactly the same behaviour when emhttp hung and I still had the "stock" powerdown.

Installing the Powerdown plugin v2.16 allowed me to power down cleanly (from the CLI, with CTRL+ALT+DEL, or with the power button) even with emhttp hung.

Rebooting from the CLI works fine too, and none of these will make unRAID run a parity check on reboot.

Having a "strong" reboot command is very important to me, since my unRAID system is often more than 25 km away from me...  :P

Link to comment

Drat...I just realized I posted this in the unRAID 6.0 forum.  I am using 5.0.5.  I hope an admin can move this to the correct forum.

 

I booted back up and it is doing a parity check and finding a ton of sync errors: 1875 at the moment.  Again, a couple of days ago it was around 500.  It seemingly fixes the sync errors, but I don't remember ever getting this many sync errors before.  If the system is freezing abruptly, could it be that something is failing and the sync errors are a byproduct, or does this point to the parity drive itself being the issue?

 

Now I just noticed that something appears to have spun down data drive #2?  It now has a blinking green ball in the web interface, but there aren't any errors reflected.  Is this likely the problem?

 

The latest log file is attached in case it helps.  Most of the data in the syslog files is like Greek to me.  You'll see where unRAID spun down data drive #2 for some reason.

syslog2.txt

Link to comment

Your disk2 appears to be smaller than the others. When unRAID has finished checking parity against a drive it will spin down but other larger drives will still be involved in the parity check.
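
To put rough numbers on that, here is a small sketch. The disk sizes are hypothetical (I am not reading them from your syslog), and the fraction is by position on the parity drive, not by elapsed time.

```python
def finish_fraction(disk_tb: float, parity_tb: float) -> float:
    """Fraction of the parity check (by position) after which a data disk
    is no longer read, assuming all disks are read from the start in parallel."""
    return min(disk_tb, parity_tb) / parity_tb


# Hypothetical sizes in TB, for illustration only.
PARITY_TB = 2.0
DATA_DISKS_TB = {"disk1": 2.0, "disk2": 1.0, "disk3": 2.0}

for name, size in DATA_DISKS_TB.items():
    pct = finish_fraction(size, PARITY_TB) * 100
    print(f"{name}: no longer needed after {pct:.0f}% of the check")
```

With these made-up sizes, disk2 would be finished halfway through and free to spin down while the larger drives keep going.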

 

You have a lot of plugins, many of which are very old and no longer supported.

 

After the parity sync is complete boot in SAFE mode and see if you still have any problems. You won't have any plugins running but that may be part of your problem. Yes, I know they have been doing fine up till now.

Link to comment

Thanks trurl.  Happy to hear the data drive spinning down is normal.

 

I didn't really think I had that many plug-ins.  I really only use Control Panel, Sabnzbd, Sickbeard, Plex, Plex Server Updater, and APCUPS.

 

This thing is a mess and I really can't put my finger on where the issue is.  It just finished a parity check and reported the same number of corrected sync errors as I mentioned before (1875), so at least there weren't any more.  I let the server sit idle for a while and then copied a 2 GB file to the server.  Boom, the same problem resurfaced.  During the transfer the server froze up and now I'm left with forcing a hard shutdown again.

 

I'll try to boot it in to safe mode and see what happens there. 

 

 

Link to comment

Could someone walk me through how to boot into safe mode with unRAID 5.0.5?  I thought it might be listed on the blue boot screen where memtest is listed, but it isn't; the only entries are memtest and unRAID OS.

Link to comment

That is probably because you have only updated by copying across the bzroot and bzimage files and have not got the latest syslinux and associated configuration file.    The downloadable release now includes a syslinux folder and if you look at the syslinux.cfg file in there you will see there is an entry for running in Safe mode.
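
If you want to check what boot entries a given syslinux.cfg actually offers before copying anything, something like the following would do it. Treat it as a sketch: the sample text is made up to mirror the two entries you described, not copied from a real release, and the "safe" match is just a guess at how the entry is named.

```python
def boot_labels(cfg_text: str) -> list[str]:
    """Return the names of all 'label' entries in a syslinux.cfg text."""
    labels = []
    for line in cfg_text.splitlines():
        parts = line.strip().split(None, 1)
        # Only lines that begin with the keyword 'label' define a menu entry.
        if len(parts) == 2 and parts[0].lower() == "label":
            labels.append(parts[1].strip())
    return labels


def has_safe_mode(cfg_text: str) -> bool:
    """Guess whether any entry looks like a Safe mode option (assumed naming)."""
    return any("safe" in label.lower() for label in boot_labels(cfg_text))


# Hypothetical config resembling a flash drive with no Safe mode entry.
SAMPLE = """\
default menu.c32
label unRAID OS
  kernel bzimage
  append initrd=bzroot
label Memtest86+
  kernel memtest
"""

print(boot_labels(SAMPLE))
print("Safe mode entry present:", has_safe_mode(SAMPLE))
```

To use it on real files, read your flash drive's syslinux.cfg and the one from the downloaded zip with `open(...).read()` and compare the two label lists.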

Link to comment

Thank you, and I'm sorry for asking what I suspect is a simple question for most, but my knowledge of Linux is very weak. So what exactly do I need to do to get a safe mode option to show up?  Can I simply get the new release and copy the syslinux folder onto the flash drive?

Link to comment

You should be able to get the syslinux files from the latest v5 download. If you open up the zip you can see where they belong on your flash. You will probably also have to copy the make_bootable file and run it.

 

The point of running in SAFE mode is to skip installation of all addons. You can accomplish the same thing by simply renaming the extra, plugins, and config/plugins folders. If any of these don't exist that's OK. Also rename the config/go file and use the one from the zip file instead.
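
If you would rather script the renames with the flash drive plugged into your PC, a sketch along these lines should work. The `.disabled` suffix and the drive path are my own assumptions, and it does not copy the stock go file from the zip; you would still do that by hand.

```python
from pathlib import Path

# Items to rename, relative to the root of the flash drive (from the advice above).
ADDON_PATHS = ["extra", "plugins", "config/plugins", "config/go"]


def disable_addons(flash_root: Path, suffix: str = ".disabled") -> list[Path]:
    """Rename addon folders and the go file so they are skipped at boot.

    Missing items are ignored. Returns the new paths so the renames
    can be reversed later.
    """
    renamed = []
    for rel in ADDON_PATHS:
        src = flash_root / rel
        if not src.exists():
            continue  # a missing folder is fine
        dst = src.with_name(src.name + suffix)
        if dst.exists():
            raise FileExistsError(f"{dst} already exists; not overwriting")
        src.rename(dst)
        renamed.append(dst)
    return renamed

# Example call, with a hypothetical drive letter for the flash on Windows:
#   disable_addons(Path("F:/"))
```

Renaming rather than deleting means everything can be put back once you know whether a plugin is to blame.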

 

Link to comment

Thanks trurl.  Please bear with me as I ask what I know are probably very basic questions:

 

- I'm at work and haven't actually looked at my own flash drive yet, but should I simply copy and paste the new syslinux (from unRAID's site) folder over the old syslinux folder?  I guess I could just rename the older folder just in case something doesn't work right.

 

- Also, I'm assuming the reason I need to run make_bootable again is that it does something to the files in the syslinux folder...correct?

 

- I just run the make_bootable batch file from windows in the root directory on the flash...correct?     

Link to comment

Yes all correct.
Link to comment

Hey trurl.  I followed the steps, but got a "not a com32r image" error.  Should I have possibly used the original make_bootable file already on the flash?  I did make a backup of the flash before changing anything.  Also, there was no syslinux folder on the flash when I copied the new one to the flash.  There was a syslinux file in the root directory of the flash.

 

notacom32.jpg

Link to comment

That did it.  I deleted syslinux and syslinux.cfg from the root directory of the flash.  I did copy and paste the make_bootable file and ran it as an admin this time, but I'm already an admin on this machine so I'm not sure if that was the issue.  The resulting batch commands ran the same as before.

 

So this is good news.  I'll at least be able to test things away from the plugins in hopes of narrowing down what is causing the issue.  All the drives look good (green solid ball).  I guess I should just allow the parity check to run and then test things or is there a better recommendation before I wait on the parity check?

Link to comment
