[Solved] Unraid 6.01 - webgui crashes - Unraid 6.1 Also


Limetech,

 

No Dockers or plugins are running...completely stock unraid 6.01.

I am not sure how to define the "crash" other than that the webgui becomes unresponsive. If I browse to tower, the browser's wait icon just spins.

 

Look at the attached htop screenshot (you may need to download and zoom in to read)  https://www.dropbox.com/s/wuyb33vm4zgr1kt/unraidhtop.jpg?dl=0

 

Memory and CPU both show high utilization. Maybe that is keeping the webgui from running properly?

 

Check out the attached unraid_crash_ps-A.txt file, which shows the output of ps -A and ps -ef. For the first time I counted: there are 255 instances of smbd running. Why would this many processes be running?

 

Thanks again for your assistance!

 

Dan

 

Yes, a large number of smbd processes does not seem right. What is using the storage? Meaning, I'm guessing you have a Windows PC or other OS accessing unRAID shares. What exactly is it?

Link to comment

Limetech,

 

I have a single Windows 7 Professional 64-bit PC connecting to unraid via samba. That Windows 7 PC runs SageTV, which records TV shows to unraid, and Plex, which I use to play back media on several devices: tablets, smart TVs, and a couple of Roku players.

 

I am utilizing a cache drive, and the mover script moves things at 3:00am.

 

Any suggestions on how to troubleshoot this?

 

 

 

 

Link to comment

One thing to check first.  Please run a 'File system check' on your devices.  Stop array and then Start in Maintenance mode.

 

You can then click each device on the Main page and click the 'Check' button under the Check Filesystem Status section.

 

Assuming all file systems are good - next thing would be to have a terminal window open with htop running.  Then fire up the applications you're running on your windows PC and see if you can observe any correlation between something started and the number of 'smbd' processes created.
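
One way to make that correlation easier to spot is to count the smbd processes at regular intervals instead of eyeballing htop. The following is only a minimal sketch, assuming a POSIX shell with awk on the server; the function name count_smbd is made up for illustration and is not an unRAID tool.

```shell
#!/bin/sh
# count_smbd: read `ps -A` style output on stdin and print how many
# lines have "smbd" as the command name (the last column).
# The function name is illustrative only.
count_smbd() {
    awk '$NF == "smbd" { n++ } END { print n + 0 }'
}

# Example: print a timestamped count once a minute while starting
# applications on the Windows PC one at a time.
#   while true; do
#       echo "$(date '+%Y-%m-%d %H:%M:%S') smbd=$(ps -A | count_smbd)"
#       sleep 60
#   done
```

Because the function reads from stdin, it can also be pointed at a saved ps listing, for example count_smbd < unraid_crash_ps-A.txt.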

Link to comment

Limetech,

 

I checked each disk in the array and the cache drive and the file system check reported no corruptions found for all of the drives.

 

Earlier you asked what was connecting to unraid. I thought of two other applications: Lightroom, which I use to import photos directly to a photo share on unraid, and MCEBuddy, which I use to convert my SageTV recordings from MPEG-2 to MPEG-4.

 

Please keep in mind that all of these applications, including Plex and SageTV mentioned earlier, were present and working under unraid 5 for well over a year. The crashes started as soon as I upgraded to unraid 6.01.

 

Is there any other way to log what is spawning the smbd processes? I can try what you are suggesting, but the crashes seem random and I don't know what sets them off. I can tell you that all but one of them occurred during the night after the mover script ran, as I always see the mover in my syslog just before the GUI becomes unresponsive.

 

Dan

Link to comment

Limetech,

 

I may have caught my server on its way to a crash. Normal memory usage (2 GB on my server) is about 10 to 15% in htop. When I got home from work, htop was showing 40% and slowly climbing (overall memory usage went from 500 MB to 760 MB in an hour or so). There were a bunch of smbd processes listed in htop. The webgui was very sluggish, but after a minute or so it refreshed. During that hour I closed Plex, SageTV, MCEBuddy and Lightroom on the Windows 7 PC and ended any associated services to ensure that it was not using any samba connections, and the memory usage still climbed. I then shut down the Windows 7 PC for two hours or so; the memory usage did not climb further and held at 760 MB overall. Ultimately I tried to shut down unraid via the webgui, but it did not shut down, so I had to do a manual reboot.

 

Is there any way to use smbtree to debug the samba connection? Does unraid kill unused processes after a certain amount of time? (My Win 7 PC was off for 2 hours and no processes were killed.) Does unraid limit the number of smbd processes? If not, can it, until we understand what is causing them to spawn?
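
On the question of a limit: Samba itself has a global smb.conf option, max smbd processes, that caps how many smbd processes may run at once (the default of 0 means no limit), and a deadtime option that drops connections that have been idle for the given number of minutes with no files open. The sketch below shows one way such a setting could be added as a stop-gap. It assumes that unRAID pulls extra Samba settings from /boot/config/smb-extra.conf, which should be verified on the release in use; the helper name set_smb_option and the values 50 and 15 are arbitrary choices for illustration.

```shell
#!/bin/sh
# set_smb_option FILE KEY VALUE
# Append "KEY = VALUE" to FILE unless KEY is already assigned there.
# Returns 1 (and changes nothing) when the key is already present.
# Helper name is hypothetical; it is not part of Samba or unRAID.
set_smb_option() {
    conf=$1
    key=$2
    value=$3
    if [ -f "$conf" ] && grep -qiE "^[[:space:]]*${key}[[:space:]]*=" "$conf"; then
        echo "already set in $conf: $key" >&2
        return 1
    fi
    printf '%s = %s\n' "$key" "$value" >> "$conf"
}

# Example (path and values are assumptions, adjust before use):
#   set_smb_option /boot/config/smb-extra.conf "max smbd processes" 50
#   set_smb_option /boot/config/smb-extra.conf "deadtime" 15
# Samba has to reload its configuration before the change takes effect.
```

A cap like this only limits the symptom; clients are refused once the limit is reached, so it does not explain why the processes pile up.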

 

Thanks,

Dan

Link to comment

Update: the latest crash shows two processes in htop that are each using 100% of CPU. Based on their start times, they began around the time the mover script runs. Both processes have the same command:

 

/usr/local/sbin/shfs /mnt/user0 -disks 494 -o noatime,big_writes,allow_other

 

There are also 100+ smbd processes running.

 

As usual any suggestions are appreciated!

 

Dan

Link to comment

Still crashing. It took less than 24 hours this time, and nowhere near the mover script's run time. It's crashed now: memory is only at 325 MB used, but htop keeps showing the following line as using 200% CPU. The command line is a little different than a few posts ago.

 

/usr/local/sbin/shfs /mnt/user0 -disks 495 2048000000 -o noatime,big_writes,allow_other -o remember=330

 

There are also 50+ smbd processes running; more instances of smbd keep showing up and memory usage keeps increasing. When I rebooted my Windows 7 machine the memory freed up, but the same process above still shows 200% CPU.

 

Will manual reboot again so I can get the diagnostics file.  Again...this never happened in Unraid 5...I am really frustrated.

 

Dan

Link to comment

Thanks Tom...I will need to do a manual reboot...which will kick off a parity check... I assume that I should let this complete before performing the upgrade?  Unless I hear otherwise, that is what I will do.  I didn't realize 6.1 stable was out.

 

Dan

 

In general to shut things down from the command line this should do it:

 

(logged in at console or telnet session)

 

/etc/rc.d/rc.docker stop

samba stop

sync

umount /mnt/*

mdcmd stop

 

If that doesn't do it, you might have other programs hanging onto file handles, or VMs or plugins that are not shut down.
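
To see what is hanging onto file handles before the umount, tools such as lsof or fuser (where installed) can list processes with files open under a mount point. Roughly the same information can be read straight from /proc. This is a minimal sketch, assuming Linux's /proc layout and a root login; open_under is a made-up helper name, and it only reports open file descriptors, not processes whose working directory sits on the mount.

```shell
#!/bin/sh
# open_under DIR
# Print "PID PATH" for every open file descriptor that points below DIR,
# by reading the symlinks in /proc/<pid>/fd (Linux only; run as root to
# see other users' processes). Helper name is illustrative only.
open_under() {
    dir=${1%/}
    for fd in /proc/[0-9]*/fd/*; do
        # Processes can exit while we scan; skip links that vanish.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$dir"/*)
                pid=${fd#/proc/}
                echo "${pid%%/*} $target"
                ;;
        esac
    done | sort -u
}

# Example: open_under /mnt/user
```

Any PID it prints can then be looked up with ps to decide whether the program should be stopped before retrying the umount.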

 

Link to comment

Limetech,

 

After 8 hours Unraid 6.1 crashed.  Same exact symptoms as described in this thread for unraid 6.01. 

 

I attempted the manual shutdown you described. I typed the first command (/etc/rc.d/rc.docker stop) in a telnet console; telnet locked up and the command never finished.

 

I kept a log window open to see if it captured anything; attached is a screen capture of that window. At the end of the screen capture you will see three instances of telnet. Instance #1 is the rc.docker stop above that never completed. Instance #2 is where I typed diagnostics and got the message "Starting diagnostics collection", but it never completed. Instance #3 is my telnet login this morning, just to see if I could connect to unraid and find out what was going on, since instances 1 and 2 locked up my telnet sessions. It's interesting to note that unraid is still functioning in some manner, as the third instance this morning shows up in the log window. I will also include a .jpg screenshot of htop showing the 200% CPU usage.

 

https://www.dropbox.com/s/vjo0bs6bcjnhahz/unraid_logwindow.jpg?dl=0

https://www.dropbox.com/s/wuyb33vm4zgr1kt/unraidhtop.jpg?dl=0

 

Please note that prior to instance #1 above, I did run diagnostics in a telnet console and it completed, so I should have a diagnostics file for this crash.  I have not manually rebooted yet (which I need to do to get this diagnostics file...unless someone can tell me an easy way to grab it or send it somewhere via command line).  I am always hesitant to manually reboot in case there is something that someone on the forums would like me to try in this crashed state.

 

Again any help is appreciated!

 

 

 

 

 

Link to comment

Interesting development...I left my unraid 6.1 server in the crashed state with a log window open and additional items were written to the log...many hours after the original crash.  I am hoping this will provide additional details to work towards a solution and/or better understanding of what is happening.

 

Look at the log starting at 9:20 this morning...a lot going on...starts with the line below:

 

Sep 3 09:20:20 Tower kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
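
Lines like this one are easy to lose in a long syslog. Below is a small sketch for pulling kernel fault reports out of a saved log, assuming a plain-text syslog file such as the one attached; the function name kernel_faults and the list of patterns are illustrative and not exhaustive.

```shell
#!/bin/sh
# kernel_faults FILE
# Print, with line numbers, the syslog lines in FILE where the kernel
# reported a BUG, an Oops, or a general protection fault.
# Function name and pattern list are illustrative only.
kernel_faults() {
    grep -nE 'kernel: .*(BUG:|Oops|general protection fault)' "$1"
}

# Example: kernel_faults /var/log/syslog
```

The line numbers make it simple to open the log at the first fault and read the call trace that normally follows it.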

 

My telnet session stopped responding at 13:08. I had htop running, and CPU usage was no longer at 100%, more like 28%. I tried to close htop and it would not return to the command prompt. I attempted to open a new telnet session and it no longer connects. I have left unraid in this state.

 

Dan

 

 

 

unraid_logwindow9_3_15.txt

Link to comment

I have been having issues with the web UI hanging and not responding as well. I can always telnet in, though. My shares DO go offline when the web UI hangs, so I cannot access any data at all. I also cannot get into any docker apps that are running when the web UI hangs; they just spin in the browser and eventually time out. Plex seems to die too and become inaccessible. I will bookmark this thread and keep watching it. I have to force an unclean shutdown anywhere from every 8 hours to every 4 or 5 days as well. I will post diagnostic files in this thread when/if it happens again. I will also be certain to install updates as soon as they come out from Limetech. I run quite a bit on my server as far as dockers and VMs go, but I would be willing to stop almost all of them for debugging and finding the root cause of this issue. My wife would probably scream at me if I left the Plex docker off, as this is our only form of watching TV now that I cut the cord. I am sure I could leech off a few friends' servers while we resolve this issue, though. Best of luck to us all!

 

 

Link to comment

Have you done the upgrade to 6.1.2? It includes a fix for GUI hangs that has been getting positive feedback. Submitting any reports for an earlier release would almost certainly result in being told to upgrade to 6.1.2 to see if it still happens.
Link to comment

No, but this will be done as soon as I get home from work. I'm still running 6.1 right now. Thanks!

Link to comment