[SOLVED] Stopping array, disk1 never finished unmounting, webserver crashed



Hope this doesn't appear twice; I ran into the attachment size limit and had to redo the post.

Everything seemed to be working fine until I tried to stop the array this evening. Disk1 never finished unmounting, and eventually the web server stopped responding. I can't access the shares on the network, but I can still telnet in, and disk1 is in fact still mounted.

Even though I can't access the unRAID shares on the network, from telnet I was able to smbmount a share on a different, non-unRAID NAS and copy syslog, syslog.1, and a file from disk1 (just to check it) to that NAS. I haven't tried anything else so far, so that is the current state of my unRAID server.

syslog.1 was very large because I copied a lot of music that day. There seemed to be a lot of rsync errors logged the first time mover ran; I don't know if that is normal. I checked one of the files it logged an error for, and it seems to have been copied to disk1 OK.

I had to omit part of that log because of the attachment size limit. Only the first and last minute of the mover run are included, but the missing entries were very similar. My hardware is here, unRAID 4.7; logs are zipped and attached.

 

TIA

syslog.zip


Anything else I can do to help diagnose the problem? I haven't done anything with the server since capturing the logs last night, in case anyone had any additional ideas about how to proceed.

 

What should I do to get this back up? Just reboot? Try to umount disk1? Try to run shutdown?

 

Thanks for your attention.

 


unRAID cannot unmount a disk if it is busy. A disk is busy if a file on it is open, or if it is the current directory of a process. (If you "cd" there, it is your login shell's current directory; if you start a process while cd'd there, it is that process's current directory.) If you've mounted something on a directory on that disk, the disk holds a mount point and cannot be unmounted. If the "mover" script is writing a file to that disk, the disk is busy until the file is written.
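Both "busy" conditions above (an open file, or a process's current directory) are visible in /proc on Linux. The sketch below is a hypothetical helper (`procs_using` is not part of unRAID) that walks /proc and prints every process whose current directory or open files fall under a given path. It assumes a Linux /proc and the `readlink` command, should be run as root to see other users' processes, and does not detect mount points or memory-mapped files the way fuser does.

```shell
#!/bin/sh
# procs_using DIR: hypothetical helper, not part of unRAID.
# Prints "PID cwd PATH" or "PID fd PATH" for each process whose current
# directory, or one of whose open files, lies under DIR.
procs_using() {
  dir=${1%/}
  for p in /proc/[0-9]*; do
    pid=${p#/proc/}
    # Condition 1: the process's current directory is DIR or below it.
    cwd=$(readlink "$p/cwd" 2>/dev/null)
    case "$cwd" in
      "$dir" | "$dir"/*) echo "$pid cwd $cwd"; continue ;;
    esac
    # Condition 2: the process has a file open under DIR.
    # (fd entries of other users' processes are unreadable unless run as root.)
    for fd in "$p"/fd/*; do
      target=$(readlink "$fd" 2>/dev/null) || continue
      case "$target" in
        "$dir"/*) echo "$pid fd $target"; break ;;
      esac
    done
  done
}
```

For example, `procs_using /mnt/disk1` run from a telnet session that is not itself cd'd into /mnt/disk1.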

 

You can use the "fuser" command to learn what is keeping a disk busy.

/usr/bin/fuser -mv /mnt/disk* /mnt/user/*
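Because fuser exits with status 0 when it finds at least one process accessing the named file system (and `-s` suppresses its output), a small loop can report which disk is busy without reading the syslog. This is a minimal sketch assuming the psmisc `fuser` used above; `busy_disks` is a hypothetical name. Note that `-m` looks at whatever file system contains the path, so the answer is only meaningful while the disk is actually mounted there; on an unmounted /mnt/diskN it would report on the root file system instead.

```shell
#!/bin/sh
# busy_disks PATH...: hypothetical helper. Prints each PATH whose file system
# fuser reports as in use. Relies on fuser's exit status: 0 if at least one
# process is accessing the file system, non-zero otherwise.
busy_disks() {
  for d in "$@"; do
    [ -d "$d" ] || continue           # skip unmatched globs / missing paths
    if fuser -sm "$d" 2>/dev/null; then
      echo "$d"
    fi
  done
}
```

For example, `busy_disks /mnt/disk* /mnt/cache`, then `fuser -mv` on each path it prints to see which process is holding it.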

 

Now, while the disk is "busy", unRAID will loop every few seconds, checking whether it is still busy. Eventually the log entries it writes will use up all available memory, and you'll see other processes killed off as the Linux kernel tries to free up memory. It typically kills the processes it thinks have been idle the longest; usually one of those is emhttp, and the others are the samba processes.
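To watch the effect described above, you can compare how much space the logs take with how much memory is left. This is a minimal sketch, assuming (as the post implies) that /var/log lives in RAM on unRAID and that the kernel exposes /proc/meminfo; `log_pressure` is a hypothetical name, and MemFree ignores reclaimable cache, so treat it as a rough indicator only.

```shell
#!/bin/sh
# log_pressure [LOGDIR]: hypothetical helper. Prints the size of LOGDIR
# (default /var/log) and the kernel's MemFree figure, both in KB.
log_pressure() {
  logdir=${1:-/var/log}
  # du -sk prints "<KB><tab><dir>"; keep only the number.
  log_kb=$(du -sk "$logdir" 2>/dev/null | awk '{print $1}')
  # The MemFree line in /proc/meminfo looks like "MemFree:  123456 kB".
  free_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
  echo "log_kb=$log_kb free_kb=$free_kb"
}
```

Running `while :; do log_pressure; sleep 10; done` while the array is trying to stop would show log_kb climbing and free_kb falling if the unmount loop is what is eating memory.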

 

The easiest fix is to kill the process holding the disk busy, then stop the array. If "emhttp" was killed, you can restart it to get back to the management interface by typing:

nohup /usr/local/sbin/emhttp &
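Before restarting it, it may be worth confirming that emhttp really is gone, so you don't start a second copy. This is a minimal sketch, assuming a Linux /proc; `is_running` is a hypothetical helper that matches on the program name (the basename of argv[0]) rather than on the whole command line.

```shell
#!/bin/sh
# is_running NAME: hypothetical helper. Succeeds if some process was started
# as NAME (or /some/path/NAME), judged from /proc/PID/cmdline.
is_running() {
  name=$1
  for p in /proc/[0-9]*; do
    # cmdline is NUL-separated; turn it into space-separated words.
    cmd=$(tr '\000' ' ' 2>/dev/null < "$p/cmdline") || continue
    argv0=${cmd%% *}
    case "$argv0" in
      "$name" | */"$name") return 0 ;;
    esac
  done
  return 1
}
```

For example: `if ! is_running emhttp; then nohup /usr/local/sbin/emhttp & fi`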

 

Then, stop the array and reboot.

 

Joe L.

 


Thanks.

 

Not only did this work, but your explanation helped me understand what was going on.

 

The process that was preventing unmount was mysqld.

 

So, do I need to manually kill mysqld every time I want to stop the array, or is there a better way?

 


Would the powerdown package add-on in unmenu solve this problem?

It would kill the mysql program, but only if you use it. (The stock management interface would not use it.)

It is up to you if that is how you want to stop it. (For some programs it is better to stop them gracefully than to just kill their process.)
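The graceful-versus-kill distinction can be scripted: send SIGTERM first, which gives a daemon such as mysqld a chance to shut down cleanly, wait a while, and only then fall back to SIGKILL. This is a minimal sketch under those assumptions; `stop_gracefully` and `pid_alive` are hypothetical helpers, the timeout is arbitrary, it should be run as root, and for mysqld specifically its own shutdown command (for example `mysqladmin shutdown`) may be the better first choice.

```shell
#!/bin/sh
# pid_alive PID: succeeds if PID exists and is not a zombie.
# (kill -0 also succeeds for zombies, so check the State line in /proc.)
pid_alive() {
  kill -0 "$1" 2>/dev/null || return 1
  ! grep -q '^State:[[:space:]]*Z' "/proc/$1/status" 2>/dev/null
}

# stop_gracefully TIMEOUT PID...: hypothetical helper.
# Sends SIGTERM, waits up to TIMEOUT seconds, then SIGKILLs any survivors.
# Returns 0 if everything exited after SIGTERM, 1 if SIGKILL was needed.
stop_gracefully() {
  timeout=$1
  shift
  kill -TERM "$@" 2>/dev/null || true
  i=0
  while :; do
    alive=""
    for pid in "$@"; do
      pid_alive "$pid" && alive="$alive $pid"
    done
    [ -z "$alive" ] && return 0
    [ "$i" -ge "$timeout" ] && break
    sleep 1
    i=$((i + 1))
  done
  # Still running after TIMEOUT seconds: force them.
  kill -KILL $alive 2>/dev/null || true
  return 1
}
```

Combined with fuser, something like `stop_gracefully 30 $(fuser -m /mnt/disk1 2>/dev/null)` would ask whatever is holding disk1 to exit before the array is stopped (fuser prints the PIDs on stdout and the labels on stderr).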

 

Eventually, the best approach will be to use the events built into unRAID 5.0, but they are VERY VERY crude right now... and as far as I know, nobody is using them yet.

 

For now, stop the process first, or have it running on a disk not in the protected array/cache.

 

Joe L.

  • 1 month later...

Hello all,

 

I'm having the same problem. Yesterday, I added a disk to preclear and started it. No problem.

 

This morning, around 22 hours in, I tried to move over some pictures (1.3 MB), and it took 20 (twenty) minutes.

 

I didn't want to interrupt the preclear, so I went out and did some stuff. I got an email saying the preclear was done, great. At this point I can't even browse the server, but telnet seems fine.

 

I stopped the array, which stopped normally. Then I shut it down. I then woke it up using the router's WOL. Fine.

 

I used the web GUI to try to stop the array so I could add the new disk. It won't unmount the disks, and it's filling the syslog about it...

 

er, ok. As I was typing this, it finally stopped, but it dumped about 2600 lines into the syslog, and it's usually more like 700-900.

 

I read the previous posts, but I don't know how to tell from the syslog which drive is causing the issue, so that I can use the fuser command.

 

Is there any way to identify what's causing the problem? The only thing I've added recently is SNAP. I can live without it, if that is indeed the root cause.

 

thank you!

syslog-2011-05-21.txt
