JP Posted June 7, 2015

I've been using an unRAID server for years without much in the way of issues. However, over the past week the server has become unresponsive 3 different times. After the first reboot there were ~500 parity errors, but they were all repaired and all the drives appeared to be good (green ball).

The process to get the server back online is the same each time. When the server becomes unresponsive I can't even do a hard shutdown via the power button on the server; it does nothing. I have to switch the power supply off. When I switch the power supply back on, the server comes to life automatically without me pressing the power button. This may be due to how I have it configured to work with a UPS. At this point the server is again unresponsive, but the power is on and the fans are spinning. There appears to be no HDD activity. I then have to do a hard shutdown via the power button, which now works. On this second boot it comes up just fine, does a parity check (once with errors, twice with no errors), and works as it should until the next "freeze."

Admittedly, I don't know a thing about Linux commands. I am running unRAID version 5.0.5. I've never run into something like this before. Could anyone offer some guidance on what I should be looking at to understand where the problem might reside?
JP Posted June 7, 2015
I forgot to mention that after this third "freeze" I've brought the system back up using the same process described in the post above; however, I'm not going to start the array yet. I'd like to understand what next steps would be best for resolving the problem before doing so. Again, any help is appreciated.
Squid Posted June 7, 2015
You'll need to post your syslog: http://lime-technology.com/wiki/index.php/Troubleshooting#Capturing_your_syslog
JP Posted June 7, 2015
Thanks. The syslog is attached.
Attachment: unraid_sys_log.txt
Squid Posted June 7, 2015
I would do a couple of things. First, restart the system in safe mode and see if the problem continues. If it doesn't, then odds are it's related to a plugin. It also wouldn't be a bad idea to run memtest on the system (at least one pass). Assuming you have a monitor connected to the server (which you need in order to do any of the above), next time it freezes, see if there's anything on the screen and post a picture of it.
JP Posted June 7, 2015
Thanks. I don't have a monitor connected, but can certainly do that. Just so I'm clear, I can't do a memtest or run the server in safe mode via telnet? I've never tried either before, so it is new to me.
trurl Posted June 7, 2015
JP wrote: "Thanks. I don't have a monitor connected, but can certainly do that. Just so I'm clear, I can't do a memtest or run the Server in safe mode via telnet? I've never tried either before so it is new to me."
You select memtest or SAFE mode from the boot menu before unRAID starts.
enetec Posted June 7, 2015
The behaviour of turning on 'automatically' after a hard power-off is controlled by your BIOS settings. If the server doesn't boot properly that way, it would be better to change the setting so that it stays powered off after a power failure (that is usually what the setting is called). Also: when the server apparently hangs, are the shares still available? Can you ping it from the LAN? Are you still able to log in at the console directly and/or via telnet?
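The questions above (are shares, telnet, and the web interface still answering during a hang?) can be checked from another machine on the LAN. The following is only a minimal sketch: it swaps ICMP ping for plain TCP connection attempts, and the port numbers (23 for telnet, 80 for the web interface, 445 for SMB shares) are the conventional defaults, assumed here rather than taken from this thread. The function names are hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def probe_server(host: str, ports: dict[str, int], timeout: float = 3.0) -> dict[str, bool]:
    """Check each named service port and report which ones accept connections."""
    return {name: port_is_open(host, port, timeout) for name, port in ports.items()}


# Assumed conventional ports; adjust to match the actual server configuration.
DEFAULT_SERVICES = {"telnet": 23, "web": 80, "smb": 445}
```

Calling `probe_server("tower", DEFAULT_SERVICES)` during a freeze (with `tower` replaced by the server's real hostname or IP address) would show whether only some services have died or the whole machine has stopped answering, which helps separate a hung web interface from a full system lockup.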
JP Posted June 7, 2015
Thanks for the help. I just powered down cleanly from the web interface to hook up a monitor to the server. When I powered back up I noticed the memtest option, so I thought I would cleanly power down again and run it on the next boot. However, this time (and this is new behavior) the server hung when I attempted a clean power-down. The monitor screenshot is below:
[Image: monitor screenshot]
The fans are spinning and the power is on, but I can't telnet into it. I don't know of a clean way to power it down from here, so I guess I'll have to force it again and then try memtest and safe mode to see what information they provide.
itimpi Posted June 7, 2015
Strange. From the screenshot, Linux has finished closing down (which is why the console and/or telnet will not work), but for some reason the machine has not been powered off.
JP Posted June 7, 2015
I tried to run memtest, but until I read about it I wasn't aware that you might need to let it run for 24 hours. I had it run for a little while and everything apparently passed. See below:
[Image: memtest screenshot]
Also, this time, when I went into the BIOS and cleanly shut down the server, it did so just fine. Confusing. Anyway, here are the ACPI settings in the BIOS:
[Image: BIOS ACPI settings screenshot]
enetec Posted June 7, 2015
itimpi wrote: "Strange. From the screen shot Linux has finished closing down (which is why the console and/or telnet will not work), but for some reason the machine has not been powered off."
I got exactly the same behaviour when emhttp hung and I still had the "stock" powerdown. Installing the Powerdown plugin v2.16 allowed me to power down cleanly (by CLI, CTRL+ALT+DEL, or the power button) even with emhttp hung. Reboot from the CLI works fine too, and none of these cause unRAID to run a parity check on reboot. Having a "strong" reboot command is very important for me, since my unRAID system is often more than 25 km away from me...
JP Posted June 7, 2015
Drat... I just realized I posted this in the unRAID 6.0 forum. I am using 5.0.5. I hope an admin can move this to the correct forum.

I booted back up and it is doing a parity check and finding a ton of sync errors: 1875 at the moment. Again, a couple of days ago it was around 500. It seemingly fixes the sync errors, but I've got to say I don't remember ever getting this many sync errors before. If the system is freezing abruptly, could it be because something is failing and the sync errors are a byproduct, or does this point to the parity drive itself being the issue?

Now I just noticed that something appears to have spun down data drive #2. It now has a blinking green ball in the web interface, but there aren't any errors reported. Is this likely the problem? The latest log file is attached in case it helps. Most of the data in the syslog files is like Greek to me. You'll see where unRAID spun down data drive #2 for some reason.
Attachment: syslog2.txt
trurl Posted June 7, 2015
Your disk2 appears to be smaller than the others. When unRAID has finished checking parity against a drive, that drive will spin down, but the other, larger drives will still be involved in the parity check.

You have a lot of plugins, many of which are very old and no longer supported. After the parity check is complete, boot in SAFE mode and see if you still have any problems. You won't have any plugins running, but they may be part of your problem. Yes, I know they have been doing fine up till now.
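The spin-down behaviour described above follows from the parity check walking from the start of the disks to the end of the largest one: once the check position passes the end of a smaller disk, that disk has nothing left to read. A minimal sketch of that reasoning, using made-up disk names and sizes (none of these values come from the thread or the attached syslog):

```python
def disks_still_being_read(disk_sizes_gb: dict[str, int], position_gb: int) -> list[str]:
    """Return the disks whose capacity extends past the current check position."""
    return sorted(name for name, size in disk_sizes_gb.items() if size > position_gb)


# Hypothetical array: disk2 is smaller than the parity disk and disk1.
example_array = {"parity": 3000, "disk1": 3000, "disk2": 1500}

# Before the 1500 GB mark all three disks take part in the check.
early = disks_still_being_read(example_array, 1000)

# Past the 1500 GB mark disk2 is finished and is free to spin down.
late = disks_still_being_read(example_array, 2000)
```

With these example numbers, `early` contains all three disks while `late` contains only `disk1` and `parity`, which matches a smaller data drive going idle partway through a check without any error being reported.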
JP Posted June 8, 2015
Thanks trurl. Happy to hear the data drive spinning down is normal. I didn't really think I had that many plugins. I really only use Control Panel, Sabnzbd, Sickbeard, Plex, Plex Server Updater, and APCUPS.

This thing is a mess and I really can't put my finger on where the issue is. It just finished a parity check and reported the same number of corrected sync errors as I mentioned before (1875), so at least there weren't any more. I let the server sit idle for a while and then copied a 2 GB file to the server. Boom, the same problem resurfaced. During the transfer the server froze up, and now I'm left with forcing a hard shutdown again. I'll try to boot it into safe mode and see what happens there.
JP Posted June 8, 2015
Could someone walk me through how to boot into safe mode with unRAID 5.0.5? I thought it might be listed on the blue boot screen where memtest is listed, but it isn't; the only options are memtest and unRAID OS.
itimpi Posted June 8, 2015
JP wrote: "Could someone walk me through how to boot in to safe mode with os 5.0.5? I thought it might be listed on the blue screen when you boot where memtest is listed, but it isn't, only memtest and unraid os."
That is probably because you have only updated by copying across the bzroot and bzimage files and have not got the latest syslinux and its associated configuration file. The downloadable release now includes a syslinux folder, and if you look at the syslinux.cfg file in there you will see there is an entry for running in Safe mode.
JP Posted June 8, 2015
itimpi wrote: "That is probably because you have only updated by copying across the bzroot and bzimage files and have not got the latest syslinux and associated configuration file. The downloadable release now includes a syslinux folder and if you look at the syslinux.cfg file in there you will see there is an entry for running in Safe mode."
Thank you, and I'm sorry for asking what I suspect is a simple question for most, but my knowledge of Linux is very weak. What exactly do I need to do to get a safe mode option to show up? Can I simply get the new release and copy the syslinux folder onto the flash drive?
trurl Posted June 8, 2015
You should be able to get the syslinux files from the latest v5 download. If you open up the zip you can see where they belong on your flash drive. You will probably also have to copy make_bootable and run it.

The point of running in SAFE mode is to skip installation of all addons. You can accomplish the same thing by simply renaming the extra, plugins, and config/plugins folders. If any of these don't exist, that's OK. Also rename the config/go file and use the one from the zip file instead.
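The renaming described above can be done by hand in a file manager; for anyone who prefers to script it, here is a minimal sketch. The four paths are the ones named in the post, but the function name, the `.disabled` suffix, and the idea of running it against the flash drive mounted on another computer are assumptions made for illustration. It does not copy the stock `go` file from the zip; that step still has to be done separately.

```python
from pathlib import Path

# Paths named in the post, relative to the root of the flash drive.
ADDON_PATHS = ["extra", "plugins", "config/plugins", "config/go"]


def disable_addons(flash_root: Path, suffix: str = ".disabled") -> list[Path]:
    """Rename each add-on path that exists by appending suffix; return the new paths."""
    renamed = []
    for rel in ADDON_PATHS:
        src = flash_root / rel
        if not src.exists():
            continue  # The post notes that missing folders are fine.
        dst = src.with_name(src.name + suffix)
        if dst.exists():
            # Refuse to clobber an earlier backup.
            raise FileExistsError(f"{dst} already exists; not overwriting")
        src.rename(dst)
        renamed.append(dst)
    return renamed
```

Renaming instead of deleting keeps everything recoverable: removing the suffix from each path restores the original setup once testing without add-ons is finished.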
JP Posted June 8, 2015
Thanks trurl. Please bear with me as I ask what I know are probably very basic questions:
- I'm at work and haven't actually looked at my own flash drive yet, but should I simply copy and paste the new syslinux folder (from unRAID's site) over the old syslinux folder? I guess I could just rename the older folder in case something doesn't work right.
- I'm assuming the reason I need to run make_bootable again is that it does something to the files in the syslinux folder, correct?
- I just run the make_bootable batch file from Windows in the root directory of the flash drive, correct?
trurl Posted June 8, 2015
JP wrote: "Thanks trurl. Please bear with me as I ask what I know are probably very basic questions:
- I'm at work and haven't actually looked at my own flash drive yet, but should I simply copy and paste the new syslinux (from unRAID's site) folder over the old syslinux folder? I guess I could just rename the older folder just in case something doesn't work right.
- Also, I'm assuming you are saying why I need to run the make_bootable again is because it does something to the files in the syslinux folder...correct?
- I just run the make_bootable batch file from windows in the root directory on the flash...correct?"
Yes, all correct.
JP Posted June 8, 2015
Hey trurl. I followed the steps, but got a "not a com32r image" error. Should I have used the original make_bootable file that was already on the flash drive? I did make a backup of the flash drive before changing anything. Also, there was no syslinux folder on the flash drive when I copied the new one over; there was only a syslinux file in the root directory.
trurl Posted June 8, 2015
Did you run make_bootable.bat as administrator? Try deleting the syslinux file in the root directory and running make_bootable again.
JP Posted June 8, 2015
That did it. I deleted syslinux and syslinux.cfg from the root directory of the flash drive. I also copied over the new make_bootable file and ran it as an admin this time, but I'm already an admin on this machine, so I'm not sure that was the issue. The resulting batch commands ran the same as before.

So this is good news. I'll at least be able to test things without the plugins in hopes of narrowing down what is causing the issue. All the drives look good (solid green ball). Should I just allow the parity check to run and then test things, or is there something better to do before waiting on the parity check?
This topic is now archived and is closed to further replies.