bfeist Posted November 8, 2009

Hi all,

I've noticed that, seemingly intermittently, unRAID refuses to stop. The main console gets stuck unmounting one of the drives, and it's not always the same drive that gets stuck. I've included a tail of the syslog below (this time it was my cache drive that wouldn't unmount). It just keeps repeating the same thing. If I issue a shutdown -h now via telnet, then it does reboot. This has happened with 4.5beta7 and 4.5beta8. Anyone know what's happening?

My motherboard is a GA-MA74GM-S2, which has previously been given a full-compatibility green light here.

One interesting note: if I unplug the ethernet cable from the unRAID server and then plug it back in, the unmount is successful. Perhaps Samba is stuck with a file being left open?

Thanks for any suggestions,
Ben

syslog snippet:

Nov 8 08:24:58 Tower emhttp: Retry unmounting disk share(s)...
Nov 8 08:24:59 Tower emhttp: shcmd (234): umount /mnt/cache >/dev/null 2>&1
Nov 8 08:24:59 Tower emhttp: _shcmd: shcmd (234): exit status: 1
Nov 8 08:24:59 Tower emhttp: shcmd (235): rmdir /mnt/cache >/dev/null 2>&1
Nov 8 08:24:59 Tower emhttp: _shcmd: shcmd (235): exit status: 1
Nov 8 08:25:00 Tower emhttp: Retry unmounting disk share(s)...
Nov 8 08:25:01 Tower emhttp: shcmd (236): umount /mnt/cache >/dev/null 2>&1
Nov 8 08:25:01 Tower emhttp: _shcmd: shcmd (236): exit status: 1
Nov 8 08:25:01 Tower emhttp: shcmd (237): rmdir /mnt/cache >/dev/null 2>&1
Nov 8 08:25:01 Tower emhttp: _shcmd: shcmd (237): exit status: 1
Nov 8 08:25:02 Tower emhttp: Retry unmounting disk share(s)...
Nov 8 08:25:03 Tower emhttp: shcmd (238): umount /mnt/cache >/dev/null 2>&1
Nov 8 08:25:03 Tower emhttp: _shcmd: shcmd (238): exit status: 1
Nov 8 08:25:03 Tower emhttp: shcmd (239): rmdir /mnt/cache >/dev/null 2>&1
Nov 8 08:25:03 Tower emhttp: _shcmd: shcmd (239): exit status: 1
Nov 8 08:25:04 Tower emhttp: Retry unmounting disk share(s)...
Nov 8 08:25:05 Tower emhttp: shcmd (240): umount /mnt/cache >/dev/null 2>&1
Nov 8 08:25:05 Tower emhttp: _shcmd: shcmd (240): exit status: 1
Nov 8 08:25:05 Tower emhttp: shcmd (241): rmdir /mnt/cache >/dev/null 2>&1
Nov 8 08:25:05 Tower emhttp: _shcmd: shcmd (241): exit status: 1
Nov 8 08:25:06 Tower emhttp: Retry unmounting disk share(s)...
Nov 8 08:25:07 Tower emhttp: shcmd (242): umount /mnt/cache >/dev/null 2>&1
Nov 8 08:25:07 Tower emhttp: _shcmd: shcmd (242): exit status: 1
Nov 8 08:25:07 Tower emhttp: shcmd (243): rmdir /mnt/cache >/dev/null 2>&1
Nov 8 08:25:07 Tower emhttp: _shcmd: shcmd (243): exit status: 1
Nov 8 08:25:08 Tower emhttp: Retry unmounting disk share(s)...
Nov 8 08:25:09 Tower emhttp: shcmd (244): umount /mnt/cache >/dev/null 2>&1
Nov 8 08:25:09 Tower emhttp: _shcmd: shcmd (244): exit status: 1
Nov 8 08:25:09 Tower emhttp: shcmd (245): rmdir /mnt/cache >/dev/null 2>&1
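For anyone who wants to confirm from the syslog which mount point emhttp keeps failing on, a small filter along these lines may help. It is only a sketch and assumes the log format shown in the snippet above (a "umount <path>" line followed by an "exit status: N" line); failed_umounts is a made-up name, not part of unRAID.

```shell
#!/bin/sh
# Sketch: count failed umount attempts per mount point in an emhttp syslog.
# Reads the files given as arguments, or standard input if none are given.
failed_umounts() {
    awk '
    {
        if ($0 ~ /exit status: [0-9]+$/) {
            # A status line: charge a non-zero status to the umount just seen.
            if (target != "" && $NF != 0) fails[target]++
            target = ""
        } else if (match($0, /: umount [^ ]+/)) {
            # Remember the path being unmounted (skip the 9 chars of ": umount ").
            target = substr($0, RSTART + 9, RLENGTH - 9)
        } else {
            # Any other line (rmdir, retry message, ...) clears the pending target.
            target = ""
        }
    }
    END { for (t in fails) print fails[t], t }
    ' "$@"
}
```

Run as failed_umounts /var/log/syslog (assuming that is where the syslog lives); it prints one line per failing mount point, with the number of failed attempts followed by the path.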
wholly Posted November 8, 2009

Use the lsof tool to see whether you have files open on a share. The latest versions of unRAID will block, waiting for the file to be closed (for safety reasons), before shutting down. A search will turn up many more threads here about this situation.

Rob
Joe L. Posted November 16, 2009

In that syslog extract it appears as if you have either an open file on the cache drive being read or written, OR you have a process that you started when your current directory was /mnt/cache, OR you have changed directory to /mnt/cache and it is your current directory.

To find the open files and/or processes keeping the cache drive "busy" and unable to be unmounted, type:

lsof /mnt/cache

or

lsof /dev/sdX

(where sdX is your cache drive device)
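As a rough illustration of the check above, here is a minimal shell sketch that wraps it in a function. Note that lsof only lists open files; it does not close or unmount anything. report_busy and the LSOF override variable are hypothetical names invented for this example; only the lsof <path> call itself comes from the post.

```shell
#!/bin/sh
# Sketch: report what lsof finds on a mount point.
# LSOF is a hypothetical knob for this example, to point at another lsof binary.
report_busy() {
    mnt=$1
    tool=${LSOF:-lsof}
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool not found" >&2
        return 2
    fi
    # lsof prints one line per open file. It exits non-zero when it finds
    # nothing (or the path does not exist), so only its output is inspected.
    out=$("$tool" "$mnt" 2>/dev/null) || true
    if [ -n "$out" ]; then
        printf '%s is busy:\n%s\n' "$mnt" "$out"
        return 1
    fi
    printf '%s has no open files\n' "$mnt"
}
```

Calling report_busy /mnt/cache before pressing Stop would show any process still holding the cache drive. If it reports nothing, the drive may still be busy for a reason lsof cannot see.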
BW Posted December 2, 2009

Hi,

I have zero knowledge of Linux and have some questions regarding "can't stop the server".

Will this script only list the open files, or will it also unmount the server? I tried typing the commands while some files from the server were open, but it says "no such file or directory".

I could not stop the server once, and found out that my PC with W7 was open. Is this the problem?

Is there a way to stop the server manually from the telnet console in case the one from the menu is not working?

Thanks!
EdgarWallace Posted April 30, 2010

Hi, same here: it's always the same drive that is stuck on "UNMOUNT". I'm running 4.5.3. What do I need to do if I discover (using lsof) that there is a file open?

It's always the same story (most probably since I activated the cache drive): I try to stop and shut down via the web interface, and I have to hard-reset the server after the above message shows up. If I switch off the server, it runs a parity check on every reboot. Last time my users no longer existed, so I wasn't able to get into the system via AFP, and with that my Time Machine isn't working.

Where should I start? It's kind of frustrating.
Joe L. Posted April 30, 2010

Basically, the array will not stop if a disk is busy.

1. A disk is busy if a file on it is in use, or
2. A disk is busy if it is the "current directory" for any process, or
3. A disk is busy if another disk has been mounted on a mount point (a directory) on it.

The lsof command will not detect the third situation (since no open files exist).

Situation #2 could even be your own login. If you type "cd /mnt/disk2", you will then have disk2 as your current directory and it cannot be unmounted.

You can type

fuser -cu /dev/md1
fuser -cu /dev/md2
fuser -cu /dev/md3
fuser -cu /dev/md4
fuser -cu /dev/md5
fuser -cu /dev/md6

for each of your disks in turn to identify the process holding a disk busy.

The basic method is to stop any add-on processes you have running, since they might be keeping a disk busy.

If you are trying to shut down the array, you might install WeeboTech's "powerdown" add-on. It will check for and terminate processes holding disks busy prior to cleanly stopping the array and powering down. You would invoke it as

/sbin/powerdown

Joe L.
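The per-disk fuser checks above can be wrapped in a loop, and the third case (another disk mounted on a directory of the busy disk, which lsof will not show) can be spotted by looking through the mount table. The following is a sketch only: check_md_devices and nested_mounts are made-up names, and it assumes a Linux system where the mount table is readable at /proc/mounts.

```shell
#!/bin/sh
# Sketch: run the fuser check from the post against every /dev/mdN that exists.
check_md_devices() {
    for dev in /dev/md[0-9]*; do
        [ -e "$dev" ] || continue          # glob matched nothing
        echo "== $dev"
        # fuser exits non-zero when no process is using the device.
        fuser -cu "$dev" || echo "   (no processes found)"
    done
}

# Sketch: list mount points that sit below a given mount point (situation 3).
# $1: mount point to inspect; $2: mount table to read (defaults to /proc/mounts).
nested_mounts() {
    awk -v top="${1%/}/" '
        # Field 2 of each line is the mount point. Print it when it lies
        # below "top" but is not "top" itself.
        index($2, top) == 1 && $2 != top { print $2 }
    ' "${2:-/proc/mounts}"
}
```

For example, nested_mounts /mnt/disk1 prints any mount points found beneath /mnt/disk1; no output means nothing else is mounted on that disk.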
EdgarWallace Posted April 30, 2010

Joe, thank you very much. Actually, I used WeeboTech's "powerdown" add-on in my go script and everything went fine. But it switched off my server at 11pm, and that wasn't always a good choice. So I removed it from the go script, and since then I have had the issues.

By the way, some music files were open, and here is the log:

root@Tower:~# /sbin/powerdown
Capturing information to syslog. Please wait...
version[4065]: Linux version 2.6.32.9-unRAID (root@Develop) (gcc version 4.2.3) #1 SMP Fri Feb 26 19:35:20 MST 2010
ls: cannot access /dev/hd[a-z]: No such file or directory
ls: cannot access /dev/hd[a-z]: No such file or directory
/etc/rc.d/rc.unRAID: line 84: ${FILE}: ambiguous redirect
/etc/rc.d/rc.unRAID: line 84: ${FILE}: ambiguous redirect
status[4170]: State: STARTED
status[4170]: D# Model / Serial Status Device
status[4170]: 0 WDC WD15EARS-00 / WD-WCAVY2530562 DISK_OK sda
status[4170]: 1 WDC WD15EARS-00 / WD-WCAVY2657059 DISK_OK sdb
status[4170]: 2 WDC WD2500JS-40 / WD-WCANY1940426 DISK_OK sdc
status[4170]: SMART overall health assessment
ls: cannot access /dev/hd[a-z]: No such file or directory
status[4170]: /dev/sda: Device is in STANDBY mode, exit(2)
status[4170]: /dev/sdb: Device is in STANDBY mode, exit(2)
status[4170]: /dev/sdc: Device is in STANDBY mode, exit(2)
status[4170]: /dev/sdd: SMART Health Status: OK
status[4170]: /dev/sde: SMART overall-health self-assessment test result: PASSED
status[4170]: ACTIVE PIDS on the array
status[4170]: root 3644 3641 0 07:16 ? 00:00:00 /boot/mediatomb/usr/bin/mediatomb -m /boot/mediatomb -f config
status[4170]: root 3652 3644 0 07:16 ? 00:00:00 /boot/mediatomb/usr/bin/mediatomb -m /boot/mediatomb -f config
status[4170]: root 3653 3652 0 07:16 ? 00:00:07 /boot/mediatomb/usr/bin/mediatomb -m /boot/mediatomb -f config
status[4170]: root 3658 3652 0 07:16 ? 00:00:00 /boot/mediatomb/usr/bin/mediatomb -m /boot/mediatomb -f config
status[4170]: root 3660 3652 0 07:16 ? 00:00:00 /boot/mediatomb/usr/bin/mediatomb -m /boot/mediatomb -f config
status[4170]: root 3662 3652 0 07:16 ? 00:00:00 /boot/mediatomb/usr/bin/mediatomb -m /boot/mediatomb -f config
status[4170]: root 3664 3652 0 07:16 ? 00:00:00 /boot/mediatomb/usr/bin/mediatomb -m /boot/mediatomb -f config
status[4170]: root 3669 3652 0 07:16 ? 00:00:00 /boot/mediatomb/usr/bin/mediatomb -m /boot/mediatomb -f config
status[4170]: root 3670 3652 0 07:16 ? 00:00:00 /boot/mediatomb/usr/bin/mediatomb -m /boot/mediatomb -f config
status[4170]: root 3891 3652 0 13:01 ? 00:00:00 /boot/mediatomb/usr/bin/mediatomb -m /boot/mediatomb -f config
Removing old syslog: /boot/logs/syslog-20100425-030435.txt
Saving current syslog: /boot/logs/syslog-20100430-152901.txt
-rwxrwxrwx 1 root root 149266 Apr 30 15:29 /boot/logs/syslog-20100430-152901.txt
adding: syslog.txt (deflated 84%)
Broadcast message from root (pts/0) (Fri Apr 30 15:29:01 2010):
The system is going down for system halt NOW!
root@Tower:~# Connection to 192.168.0.1 closed by remote host.
Connection to 192.168.0.1 closed.

The only issue now is that I can't see any user (except root) in the web console, but this might be another subject and off topic for this thread.

Thanks again.

[Edit] Guide to adding the execution of the above command as a menu entry in unMENU: http://lime-technology.com/forum/index.php?topic=5475.15 (Reply #27)