stevoith Posted March 14, 2015 Share Posted March 14, 2015

Hello folks,

I upgraded to 6.0-beta14b the other day and since then I've been having a nightmare. Things have slowly been hanging up: first Sickbeard would hang when looking at certain programs, then it stopped responding altogether, and no matter what I did I couldn't get the container to die. If I cd into /dev/user/TV and do an ls, it just hangs. Plex continued to run so I ignored it, but after a couple of hours Plex stopped working too, and now the main GUI won't respond at all either.

I can still ssh in, so I tried to reboot. The server didn't reboot. I fired up another ssh session; reboot -f didn't reboot it. Another session; reboot -n didn't reboot it either. Neither did halt. I even tried kill -9 on init, i.e.:

root@NAS-server:/etc/rc.d# kill -9 1
root@NAS-server:/etc/rc.d# kill -9 1
root@NAS-server:/etc/rc.d# kill -9 2
root@NAS-server:/etc/rc.d# kill -9 2
root@NAS-server:/etc/rc.d# kill -9 2
root@NAS-server:/etc/rc.d# kill -9 31373
root@NAS-server:/etc/rc.d# kill -9 31373
root@NAS-server:/etc/rc.d# kill -9 31373
root@NAS-server:/etc/rc.d# kill -9 31373
root@NAS-server:/etc/rc.d# ps -ef | grep reboot
root      7663 16402  0 Mar13 pts/4    00:00:00 reboot -f
root     16733 14190  0 00:20 pts/6    00:00:00 grep reboot

This thing won't die. Something is stopping it from killing processes and I can't get it to restart. The server is in the loft, so I don't want to go up there and just pull the power.

Is there some way of forcing a crash dump from an ssh session on this box, or of getting any kind of useful information for the dev team before I go pulling the power? Or is there a way to force it to reboot? In the old Solaris days I'd have done a uadmin 2 0, but that doesn't exist on Linux.

Any advice welcomed.

Thanks
Steve
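For the record, the closest Linux analogue of Solaris's uadmin 2 0 is the magic SysRq interface, which talks to the kernel directly and so works even when init and shutdown are wedged. The sketch below assumes a kernel built with CONFIG_MAGIC_SYSRQ (stock distro kernels usually are); DRY_RUN=1 (the default here) only prints each command, because the real sequence reboots the box on the spot.

```shell
# Magic SysRq sketch -- assumes CONFIG_MAGIC_SYSRQ is enabled in the kernel.
# DRY_RUN=1 (default) prints each command instead of executing it,
# since the final step reboots the machine immediately.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "WOULD RUN: $*"
    else
        sh -c "$*"
    fi
}

# Enable all SysRq functions for this boot.
run 'echo 1 > /proc/sys/kernel/sysrq'
# 't' dumps every task's stack into the kernel log -- useful data for the devs.
run 'echo t > /proc/sysrq-trigger'
# 's' syncs dirty data, 'u' remounts filesystems read-only, 'b' reboots
# immediately, bypassing init entirely (so a wedged PID 1 doesn't matter).
run 'echo s > /proc/sysrq-trigger'
run 'echo u > /proc/sysrq-trigger'
run 'echo b > /proc/sysrq-trigger'
```

Writing 'c' instead of 'b' forces a kernel panic (and a crash dump, if kdump is configured), which would answer the crash-dump question as well.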
PCRx Posted March 14, 2015 Share Posted March 14, 2015

Try rebooting with: shutdown now -r
stevoith Posted March 14, 2015 Author Share Posted March 14, 2015

Nope:

root@NAS-server:~# shutdown now -r

Broadcast message from root@NAS-server (pts/6) (Sat Mar 14 09:58:01 2015):

The system is going down for reboot NOW!

root@NAS-server:~# ps -ef | grep shutdown
root     10652     1  0 00:00 ?        00:00:00 shutdown -r 0 w
root     10955     1  0 00:00 ?        00:00:00 shutdown -r 0 w
root     14060     1  0 00:06 ?        00:00:00 shutdown -h 0 w
root     22270 22247  0 09:58 pts/6    00:00:00 shutdown 0 w -r
root     22356 22322  0 09:58 pts/7    00:00:00 grep shutdown
root@NAS-server:~#

It's impossible to kill this thing.
Helmonder Posted March 14, 2015 Share Posted March 14, 2015

Stop writing to the server as much as you can (so stop Sickbeard/CouchPotato and the like as far as possible).

If you can still log on to the console, copy the syslog so you can check it later on:

cp /var/log/syslog /mnt/disk1/

Then press the power button on the server and hold it; it will forcefully go down. You run a risk of parity errors, so the system will force a parity check to start after the reboot. This is not something you like doing, but I think everyone has had to do it at one time or another.

As for your experience: from what release did you upgrade?
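A slightly fuller version of that advice, sketched as a shell function. The destination path is an assumption: point it at the flash drive or an array disk, anywhere that survives the hard reset.

```shell
# Sketch: salvage diagnostics before holding the power button.
# The destination is hypothetical -- on unRAID, /mnt/disk1 or the
# flash drive at /boot are sensible choices.
capture_logs() {
    dest="$1"
    mkdir -p "$dest" || return 1
    stamp=$(date +%s)
    # syslog, if it exists and is still readable
    [ -f /var/log/syslog ] && cp /var/log/syslog "$dest/syslog.$stamp"
    # the kernel ring buffer often still holds the stack trace even
    # when syslogd itself is wedged
    dmesg > "$dest/dmesg.$stamp" 2>/dev/null
    return 0
}

# Example invocation; set DEST=/mnt/disk1/crashlogs on the actual server.
capture_logs "${DEST:-/tmp/crashlogs}"
```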
stevoith Posted March 15, 2015 Author Share Posted March 15, 2015

Hello,

I upgraded from 6.0-beta12b to beta14b, not a massive jump. I know (or at least suspect) what the problem was, if it helps. During boot I noticed this:

kernel BUG at fs/btrfs/inode.c:3123!
invalid opcode: 0000 [#1] SMP
Modules linked in: md_mod it87 hwmon_vid k10temp r8169 mii sata_sil i2c_piix4 ahci libahci
CPU: 2 PID: 7466 Comm: btrfs-cleaner Not tainted 3.18.5-unRAID #3

If I tried to cd into the /mnt/cache directory and do anything in there, the command would hang, which explained why trying to stop the array was hanging too: the sync command gets stuck and you can't kill -9 it. I also couldn't get the mover to work, or access any of the shares that used the cache device.

So I edited /boot/config/disks.cfg, changed autostart from yes to no, removed all the lines pertaining to the cache, and rebooted. The server came back with no errors. Then I put the cache disk back, left it at auto, and it chose btrfs. As soon as I started to access the cache disk I got the stack trace from btrfs again and I was back in the same position.

I rebooted again with autostart still set to no, removed the cache disk again, and added it back, this time as xfs, and it now works a treat.

The strange thing is that I have four smaller SSDs lumped into a stripe, and they have btrfs on them; this is where I save all the docker programs etc., and even though that's btrfs it works fine.

Thanks
Steve
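For anyone hitting the same btrfs BUG, the recovery steps above can be sketched like this. The device name /dev/sdX1 is a placeholder, not the actual cache device, and DRY_RUN=1 (the default) only prints the commands, since the mkfs step destroys the cache contents.

```shell
# Dry-run sketch of the recovery described above. /dev/sdX1 is a
# placeholder for the cache device; substitute your own. DRY_RUN=1
# (default) prints each step instead of running it, because mkfs.xfs
# wipes the disk.
DRY_RUN="${DRY_RUN:-1}"

step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "WOULD RUN: $*"
    else
        sh -c "$*"
    fi
}

# 1. With the array stopped and the cache unmounted, sanity-check the
#    suspect btrfs filesystem without modifying it.
step 'btrfs check --readonly /dev/sdX1'
# 2. If it is damaged beyond repair, reformat the cache as XFS (this
#    destroys its contents) and let unRAID pick up the new filesystem.
step 'mkfs.xfs -f /dev/sdX1'
```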