How do I force reboot an unRAID server?



Hello folks,

 

I upgraded to 6.0-beta14b the other day and since then I've been having a nightmare.

 

Slowly things seem to be hanging up: first Sickbeard would hang when looking at certain programs, then it stopped responding altogether, and no matter what I did I couldn't get the container to die.

 

If I cd into /dev/user/TV and do an ls it just hangs.

 

Plex continued to run so I ignored it, but after a couple of hours Plex stopped working too, and now the main GUI will not respond at all either.

 

I can still SSH in, so I tried rebooting. The server didn't reboot.

Fired up another SSH session; reboot -f didn't reboot.

Fired up another SSH session; reboot -n didn't reboot.

halt didn't bring it down either.

Tried kill -9 on init..

 

I.e.:

root@NAS-server:/etc/rc.d# kill -9 1

root@NAS-server:/etc/rc.d# kill -9 1

root@NAS-server:/etc/rc.d# kill -9 2

root@NAS-server:/etc/rc.d# kill -9 2

root@NAS-server:/etc/rc.d# kill -9 2

root@NAS-server:/etc/rc.d# kill -9 31373

root@NAS-server:/etc/rc.d# kill -9 31373

root@NAS-server:/etc/rc.d# kill -9 31373

root@NAS-server:/etc/rc.d# kill -9 31373

root@NAS-server:/etc/rc.d# ps -ef | grep reboot

root      7663 16402  0 Mar13 pts/4    00:00:00 reboot -f

root    16733 14190  0 00:20 pts/6    00:00:00 grep reboot

 

This thing won't die. Something is stopping it from killing processes and I can't get it to restart, and the server is in the loft so I don't want to go up there and just pull the power.

 

Is there some way of forcing a crash dump from an SSH session on this box, or getting any kind of useful information for the dev team before I go pulling the power?

 

Or is there a way to force it to reboot? In the old Solaris days I'd have done a uadmin 2 0, but that doesn't exist on Linux :)
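For the record, Linux does have a rough equivalent: the Magic SysRq interface, which talks to the kernel directly and bypasses init entirely. A sketch, assuming the kernel was built with SysRq support; the destructive triggers are left commented out so nothing fires by accident:

```shell
# Magic SysRq: kernel-level commands that work even when userspace is wedged.
SYSRQ_TRIGGER=/proc/sysrq-trigger

# Enable all SysRq functions first (needs root):
#   echo 1 > /proc/sys/kernel/sysrq

# Best-effort sync, remount read-only, then reboot -- no init involved:
#   echo s > "$SYSRQ_TRIGGER"
#   echo u > "$SYSRQ_TRIGGER"
#   echo b > "$SYSRQ_TRIGGER"

# Or crash the kernel deliberately to get a dump (needs kdump configured):
#   echo c > "$SYSRQ_TRIGGER"

echo "triggers would be written to $SYSRQ_TRIGGER"
```

Note that echo b reboots instantly without syncing or unmounting, so expect a parity check afterwards, same as pulling the power.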

 

Any advice welcome.

 

Thanks Steve


nope :)

 

root@NAS-server:~# shutdown now -r

 

Broadcast message from root@NAS-server (pts/6) (Sat Mar 14 09:58:01 2015):

The system is going down for reboot NOW!

 

root@NAS-server:~# ps -ef | grep shutdown                                                                                 

root    10652    1  0 00:00 ?        00:00:00 shutdown -r 0 w

root    10955    1  0 00:00 ?        00:00:00 shutdown -r 0 w

root    14060    1  0 00:06 ?        00:00:00 shutdown -h 0 w

root    22270 22247  0 09:58 pts/6    00:00:00 shutdown 0 w -r

root    22356 22322  0 09:58 pts/7    00:00:00 grep shutdown

root@NAS-server:~#

 

It's impossible to kill this thing :)

 

 


Stop writing to the server as much as you can (so stop Sickbeard/CouchPotato and the like as much as possible).

 

If you can still log on to the console, copy the syslog for checking later on:

 

cp /var/log/syslog /mnt/disk1/
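A slightly fuller version of that grab, also saving dmesg (the kernel BUG trace usually lands there); the destination path is just an example, any disk that survives the reboot will do:

```shell
# Save logs somewhere that survives a hard reboot (array disk or flash);
# override DEST when trying this elsewhere.
DEST="${DEST:-/mnt/disk1}"
STAMP="$(date +%Y%m%d-%H%M%S)"
cp /var/log/syslog "$DEST/syslog.$STAMP" 2>/dev/null || echo "could not copy syslog"
dmesg > "$DEST/dmesg.$STAMP" 2>/dev/null || echo "could not capture dmesg"
```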

 

Then press the power button on the server and hold it; it will forcefully go down. You run a risk of parity errors, so the system will force a parity check to start after reboot.

 

This is not something you like to do, but I think everyone has had to do it one time or another.

 

As for your experience... from what release did you upgrade?


Hello, I upgraded from 6.0-beta12b to 6.0-beta14b, not a massive jump.

 

I know what the problem was, if it helps, or at least I suspect so.

 

During boot I noticed this:

 

kernel BUG at fs/btrfs/inode.c:3123!

invalid opcode: 0000 [#1] SMP

Modules linked in: md_mod it87 hwmon_vid k10temp r8169 mii sata_sil i2c_piix4 ahci libahci

CPU: 2 PID: 7466 Comm: btrfs-cleaner Not tainted 3.18.5-unRAID #3

 

If I tried to cd into the /mnt/cache directory and do anything in there, the command would hang.

 

Which explained why trying to stop the array was hanging too: the sync command gets stuck and you can't kill -9 it.
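That matches the classic symptom: a process blocked on I/O inside the kernel sits in uninterruptible sleep (state D in ps) and ignores every signal, SIGKILL included. You can spot the stuck ones like this:

```shell
# Show processes in uninterruptible sleep ("D" state) -- stuck in a kernel
# I/O path, immune to kill -9. The header row is kept for readability.
ps -eo pid,stat,cmd | awk 'NR == 1 || $2 ~ /^D/'
```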

 

I also couldn't get the mover to work, or access any of the shares that used the cache device.

 

So I edited /boot/config/disks.cfg, changed the autostart from yes to no, removed all the lines pertaining to the cache, and rebooted.
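For anyone following along, the edit looks roughly like this; the startArray key name is an assumption from memory, so check it against your own disks.cfg before running anything:

```shell
# Turn off array autostart in the unRAID boot config.
CFG="${CFG:-/boot/config/disks.cfg}"
if [ -f "$CFG" ]; then
    sed -i 's/^startArray="yes"/startArray="no"/' "$CFG"
    grep '^startArray' "$CFG"
else
    echo "no $CFG on this machine"
fi
```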

 

The server came back with no errors.

 

So I put the cache disk back, left the filesystem at auto, and it chose btrfs. As soon as I started to access the cache disk I got the stack trace from btrfs and I was back in the same position.

 

Rebooted again with autostart still set to no, removed the cache disk and added it back, this time as XFS, and it now works a treat.

 

The strange thing is I have 4 smaller SSDs lumped into a stripe with btrfs on them; this is where I keep all the Docker programs etc., and even though that's btrfs it works fine.
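Since that pool is still btrfs, it might be worth scrubbing it to make sure it isn't quietly hitting the same bug. A sketch, assuming btrfs-progs is installed; the mount point is a placeholder, substitute wherever the SSD stripe actually lives:

```shell
# Verify checksums on a mounted btrfs pool without taking it offline.
# POOL is a placeholder -- point it at the SSD stripe's real mount point.
POOL="${POOL:-/mnt/disks/ssdpool}"
if command -v btrfs >/dev/null && mountpoint -q "$POOL"; then
    btrfs scrub start -B "$POOL"   # -B: run in the foreground until done
    btrfs scrub status "$POOL"
else
    echo "btrfs-progs missing or $POOL not mounted"
fi
```

An offline `btrfs check --readonly /dev/sdX1` on the unmounted device is the more thorough option, but the scrub runs without downtime.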

 

Thanks Steve

 

 

