bubbaQ Posted August 7, 2010 If you try to stop the array and it can't stop, say because of some open files, it ties up the UI, so you can't get to any other tabs... which is where the utilities are (or will be) to close/kill processes so you can have a clean shutdown. Not sure how to get around that, but it's something to think about. In the meantime, the next bubbaTools adds a small widget to the Array Status page that indicates whether any files are open. Seeing open files indicated there will at least give me fair warning that if I press Stop, the array won't shut down.
limetech Posted August 10, 2010 Yes, this is a tough problem. I can add some code to grey out the Stop button if a callout script returns false. Then we can create a list of conditions where the array won't stop. Usually it's because of open files (which can be detected), or some process with its cwd set to a mounted disk's path (which can be detected). The hardest thing to detect is if a file in a mounted disk path is itself mounted via loopback. This issue has been around a while... The reason the code just 'loops' is to keep the user from losing data as a result of emhttp killing processes or forcibly unmounting.
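A minimal sketch of the two detectable conditions mentioned above (open files on a mounted disk, and a process whose cwd is under a mounted disk's path). It is not emhttp code. The process snapshot format (dicts with `pid`, `cwd`, `open_files`) and the function names are assumptions made for illustration; gathering the snapshot (for example from /proc or lsof) is left out, and the loopback case is not handled.

```python
import os


def is_under(path, mount_point):
    """Return True if path is the mount point itself or lies inside it."""
    path = os.path.normpath(path)
    mount_point = os.path.normpath(mount_point)
    return path == mount_point or path.startswith(mount_point + os.sep)


def find_blockers(processes, mount_points):
    """List (pid, reason, path) for every process that would keep a disk busy.

    `processes` is a hypothetical snapshot: an iterable of dicts with the
    keys 'pid', 'cwd' and 'open_files'.
    """
    blockers = []
    for proc in processes:
        for mount_point in mount_points:
            if is_under(proc["cwd"], mount_point):
                blockers.append((proc["pid"], "cwd", proc["cwd"]))
            for open_file in proc["open_files"]:
                if is_under(open_file, mount_point):
                    blockers.append((proc["pid"], "open file", open_file))
    return blockers


def can_stop_array(processes, mount_points):
    """What a 'grey out the Stop button' callout might return."""
    return not find_blockers(processes, mount_points)
```

Comparing against the mount point plus a trailing separator keeps /mnt/disk10 from being mistaken for something under /mnt/disk1.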
bubbaQ Posted August 10, 2010 "This issue has been around a while... The reason code just 'loops' is to keep user from losing data as a result of emhttp killing processes or forcibly unmounting." Understood. Perhaps when shutting down and having to loop, you could time it out, or provide a "stop trying" button, so we could have the UI back... particularly since there can be plugin utilities to close open files so you can get a clean shutdown.
limetech Posted August 10, 2010 What makes this "messy" is that by the time the stop code gets around to unmounting the disks, it has already shut down all the network services (nfs, samba, etc). Also, unmounting takes place in parallel, so if one disk won't unmount, this can obviously be detected, but now we're left with a state where all the 'other' disks are unmounted and all the network services are shut down. In other words, we're in-between states. So do I bring everything back up? I was trying to avoid writing a lot more PHP code that handles proper operation and display of pages when in "in-between" states.
bubbaQ Posted August 10, 2010 "In other words, we're in-between states. So do I bring everything back up? I was trying to avoid writing a lot more PHP code that handles proper operation and display of pages when "in-between" states" Understood... but being stuck in an infinite loop that locks you out of the tools you would use to let the loop terminate is a bit Kafkaesque.
limetech Posted August 10, 2010 Ok, you made me look up Kafkaesque... LOL, you don't like my solution? The reason it's like this is because most users are using their server as an appliance (i.e., not logging in and mucking around with bash, and not running add-ons) and hence never see this problem. For those of you who have made customizations and forget to shut them down, well, this is your punishment. Seriously... yes, this needs to change.
Joe L. Posted August 10, 2010 "What makes this "messy" is that by the time stop code gets around to unmounting the disks, it has already shutdown all the network services (nfs, samba, etc). Also, unmounting takes place in parallel, so if one disk won't unmount, this can obviously be detected, but now we're left with a state where all the 'other' disks are unmounted and all the network services are shut down. In other words, we're in-between states. So do I bring everything back up? I was trying to avoid writing a lot more PHP code that handles proper operation and display of pages when "in-between" states" I think all you need is port 80 (or whatever port emhttp is listening on); nfs and samba are not needed. One possibility is to present a unique page to handle the stop/kill/unmount, etc. of processes that do not terminate in a timely manner.
bubbaQ Posted August 12, 2010 How about an event callout when emhttp has to retry unmounting? Particularly if you can count the loops and do a callout at some interval, such as after every 10 retries.
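The retry-counting idea above could look roughly like this. It is a sketch only, not how emhttp works: `try_unmount` and `callout` are hypothetical callables supplied by the caller, and the optional `max_retries` cap (which corresponds to the earlier "time it out" suggestion) is an added assumption.

```python
import time


def unmount_with_callout(try_unmount, callout, every=10, max_retries=None,
                         delay=1.0, sleep=time.sleep):
    """Retry try_unmount() until it returns True.

    After every `every` failed attempts, callout(retries) is invoked so a
    plugin can react (for example by closing open files). Returns True once
    the unmount succeeds, or False if max_retries is set and reached.
    """
    retries = 0
    while not try_unmount():
        retries += 1
        if retries % every == 0:
            callout(retries)  # hypothetical event hook
        if max_retries is not None and retries >= max_retries:
            return False  # give up and hand control back to the UI
        sleep(delay)
    return True
```

Leaving `max_retries` as None keeps the existing behaviour of looping until the unmount succeeds, with the callout as the only change.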
bubbaQ Posted August 13, 2010 Here's another idea... In the stop callout, I could launch a watchdog program in the background that has been configured with:
- initial timeout period
- list of programs that are safe to kill
- secondary timeout period
- list of programs to never kill
After it is launched in the background, it will:
- Wait the initial timeout period.
- Kill "safe to kill" processes that have open files on the array and no children.
It could stop there... or optionally it could continue:
- Wait the secondary timeout period.
- Kill other processes not listed in the "never kill" list with open files on the array (checking for children).
- umount any loopbacks on the array.
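The two-phase watchdog described above can be sketched as follows. Everything here is an assumption for illustration: the snapshot format (dicts with `pid`, `name`, `has_array_files`, `children`), the config keys, and the injected `snapshot`, `kill` and `sleep` callables. "Checking for children" is read as applying the same no-children rule in the second phase, and the loopback umount step is omitted.

```python
import time


def kill_candidates(processes, safe_to_kill, never_kill, phase):
    """Return the PIDs the watchdog would kill in the given phase (1 or 2).

    Phase 1: only programs named in safe_to_kill.
    Phase 2: any program not named in never_kill.
    Both phases require open files on the array and no child processes.
    """
    pids = []
    for proc in processes:
        if not proc["has_array_files"] or proc["children"]:
            continue
        if proc["name"] in never_kill:
            continue
        if phase == 1 and proc["name"] not in safe_to_kill:
            continue
        pids.append(proc["pid"])
    return pids


def run_watchdog(snapshot, kill, config, two_phase=True, sleep=time.sleep):
    """Wait, kill the 'safe' set, then optionally wait again and widen the net."""
    sleep(config["initial_timeout"])
    for pid in kill_candidates(snapshot(), config["safe_to_kill"],
                               config["never_kill"], phase=1):
        kill(pid)
    if not two_phase:
        return
    sleep(config["secondary_timeout"])
    for pid in kill_candidates(snapshot(), config["safe_to_kill"],
                               config["never_kill"], phase=2):
        kill(pid)
```

Taking a fresh snapshot before each phase means processes that exited on their own during the wait are not signalled.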
limetech Posted August 13, 2010 The correct solution to this is going to be well-designed plugins.
BRiT Posted August 13, 2010 Does a well-designed plugin prevent the user from doing a "cd /mnt/disk1/" at a shell level?
bubbaQ Posted August 13, 2010 "The correct solution to this is going to be well-designed plugins." Well, that's a theory vs practice thing. I agree with the theory, and in large part in application, but it is not a 100% solution. For example, open terminal sessions... "user errors". Many applications have stock scripts that record the PID of the running process when starting and use that pid file to stop the app, but are pretty stupid if the pid file is lost. The "stop" code in an application's start|stop|restart script needs to make SURE the app stops, such as testing for the existence of the pid file, testing whether the process is still alive after a "nice" stop command, and escalating from a sighup to a sigkill. Devs need to closely examine, and recode if needed, the start|stop|restart scripts for their plugins... the stock script that comes with the application may well not be sufficient for unRAID. And just to be clear, I'm not looking for unRAID to do this on its own... I was just thinking out loud about possible approaches for a "Shutdown Watchdog" plugin. About half the time I go to shut down unRAID, it fails and I have to go hunting for what the problem is (i.e. user error)... so I'm using a script to kill stuff with open files now.
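A sketch of the "make SURE the app stops" logic described above: tolerate a missing pid file, check that the process is actually gone after each signal, and escalate. The signal order (SIGHUP, then SIGTERM, then SIGKILL), the grace period and the function names are assumptions, not anything a particular plugin ships with; SIGHUP is POSIX-only.

```python
import os
import signal
import time


def read_pidfile(path):
    """Return the PID stored in path, or None if the file is missing or garbage."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def pid_alive(pid):
    """Probe a PID with signal 0, which checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def stop_with_escalation(pid, signals=(signal.SIGHUP, signal.SIGTERM, signal.SIGKILL),
                         grace=5.0, poll=0.5, send=os.kill, alive=pid_alive,
                         sleep=time.sleep):
    """Send each signal in turn, waiting up to `grace` seconds for the exit.

    Returns True if the process is gone, False if it survived every signal.
    """
    for sig in signals:
        if not alive(pid):
            return True
        try:
            send(pid, sig)
        except ProcessLookupError:
            return True  # exited between the check and the signal
        waited = 0.0
        while waited < grace:
            if not alive(pid):
                return True
            sleep(poll)
            waited += poll
    return not alive(pid)
```

When `read_pidfile` returns None, a stop script would need some other way to find the process (by name, for instance) rather than silently reporting success.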
limetech Posted August 13, 2010 "Does a well designed plugin prevent the user from doing a "cd /mnt/disk1/" at a shell level?" It does if the telnet app is a plugin. But yes, I see your point. I want to propose a slight shift in thinking. If you think of an unRAID server as an "appliance" instead of a "general purpose server", things get simplified. For example, with the correct set of plugins, you should never have to telnet into the server (or use the console), and thus 'cd /mnt/disk1' should never happen. My goal with 5.0 is exactly that: an extensible NAS appliance.
bubbaQ Posted August 13, 2010 "If you think of an unRAID server as an "appliance" instead of a "general purpose server", things get simplified." I'll buy that... you could even go as far as the PCH folks did, and remove telnet from the appliance. But I think I failed in communicating my goal -- I am doing a "Shutdown Watchdog" plugin for *me* because I'm a dev... just like Lundman developed a procedure to get telnet going on the PCH. There are going to be plugins that, whether unRAID is an appliance or not, are aimed at making a dev's life easier, and are not necessarily something for a generic "appliance" user. I posted a couple of ideas so other devs could kick the tires. I have long encouraged the "appliance" model for unRAID, just like the other NAS boxes -- but this idea was just for a dev tool.
WeeboTech Posted August 13, 2010 "'If you think of an unRAID server as an "appliance" instead of a "general purpose server", things get simplified.' I'll buy that... you could even go as far as the PCH folks did, and remove telnet from the appliance. But I think I failed in communicating my goal -- I am doing a "Shutdown Watchdog" plugin for *me* because I'm a dev.... just like Lundman developed a procedure to get telnet going on the pch. There are going to be plugins that, whether unRAID is an appliance or not, are aimed at making a dev's life easier, and are not necessarily something for a generic "appliance" user. I posted a couple of ideas so other devs could kick the tires. I have long-encouraged the "appliance' model for unRAID, just like the other NAS boxes -- but this idea was just for a dev tool." In this case, the shutdown watchdog would be useful. I'm sure that if there are plugins that hang the server, a community member will report it and it will be fixed. But for those who are hacking away, this could be a useful helper.
Kaygee Posted August 15, 2010 "you could even go as far as the PCH (Syabas) folks did, and remove telnet from the appliance." Only for some nice whitehat (Lundman) to hack the kernel and add it back... Cheers, Mr Lundman, BTW!