theone Posted October 22, 2012

Hello,

During the day I found that the WEBUI was inaccessible. I restarted the server from the command line with "powerdown" and took a look at the saved syslog after reboot. This is what I found that's related to emhttp:

Before (nothing wrong visible):

Oct 21 16:00:01 Tower YAMJ_HTML: YAMJ HTML creation/update - Started...
Oct 21 16:03:55 Tower YAMJ_HTML: YAMJ HTML creation/update - Finished 1/6
Oct 21 16:11:46 Tower YAMJ_HTML: YAMJ HTML creation/update - Finished 2/6
Oct 21 16:16:26 Tower YAMJ_HTML: YAMJ HTML creation/update - Finished 3/6
Oct 21 16:19:09 Tower YAMJ_HTML: YAMJ HTML creation/update - Finished 4/6
Oct 21 16:22:12 Tower YAMJ_HTML: YAMJ HTML creation/update - Finished 5/6
Oct 21 16:22:50 Tower YAMJ_HTML: YAMJ HTML creation/update - Finished 6/6
Oct 21 16:23:00 Tower YAMJ_FLASH: YAMJ FLASH creation/update - PCH Livingroom not connected. No Jukebox created.
Oct 21 19:01:42 Tower kernel: mdcmd (75): spindown 3
Oct 21 19:21:16 Tower kernel: mdcmd (76): spindown 2
Oct 21 19:22:27 Tower kernel: mdcmd (77): spindown 4
Oct 21 20:53:16 Tower in.telnetd[1891]: connect from 192.168.2.100 (192.168.2.100)
Oct 21 20:53:18 Tower login[1892]: ROOT LOGIN on '/dev/pts/0' from '192.168.2.100'
Oct 21 20:53:41 Tower YAMJ_FLASH: YAMJ FLASH creation/update - Started...
Oct 21 21:52:30 Tower YAMJ_FLASH: YAMJ FLASH creation/update - Finished 1/1
Oct 21 22:36:32 Tower kernel: mdcmd (78): spindown 3
Oct 21 23:30:57 Tower kernel: scsi_verify_blk_ioctl: 36 callbacks suppressed
Oct 21 23:30:57 Tower kernel: hdparm: sending ioctl 2285 to a partition!
Oct 21 23:31:00 Tower last message repeated 5 times
Oct 21 23:31:00 Tower kernel: smartctl: sending ioctl 2285 to a partition!
Oct 21 23:31:00 Tower last message repeated 3 times
Oct 21 23:52:51 Tower kernel: mdcmd (79): spindown 0
Oct 22 02:31:09 Tower kernel: mdcmd (80): spindown 3
Oct 22 02:57:24 Tower kernel: mdcmd (81): spindown 1
Oct 22 02:57:25 Tower kernel: mdcmd (82): spindown 4
Oct 22 03:32:52 Tower kernel: mdcmd (83): spindown 2
Oct 22 11:31:00 Tower kernel: scsi_verify_blk_ioctl: 36 callbacks suppressed
Oct 22 11:31:00 Tower kernel: hdparm: sending ioctl 2285 to a partition!
Oct 22 11:31:01 Tower last message repeated 5 times
Oct 22 11:31:01 Tower kernel: smartctl: sending ioctl 2285 to a partition!
Oct 22 11:31:01 Tower last message repeated 3 times
Oct 22 15:51:08 Tower in.telnetd[512]: connect from 192.168.2.106 (192.168.2.106)
Oct 22 15:51:11 Tower login[513]: ROOT LOGIN on '/dev/pts/0' from '192.168.2.106'

Suspected failure:

Oct 22 15:51:19 Tower emhttp: unRAID System Management Utility version 5.0-rc8a
Oct 22 15:51:19 Tower emhttp: Copyright (C) 2005-2012, Lime Technology, LLC
Oct 22 15:51:19 Tower emhttp: Pro key detected, GUID: 13FE-3100-07A8-1108F1480299
Oct 22 15:51:19 Tower emhttp: get_config_idx: fopen /boot/config/flash.cfg: No such file or directory - assigning defaults
Oct 22 15:51:20 Tower emhttp: rdevName.25 not found
Oct 22 15:51:20 Tower emhttp: diskFsStatus.1 not found
Oct 22 15:51:20 Tower kernel: emhttp[545]: segfault at 0 ip b7530760 sp bfafb4d0 error 4 in libc-2.11.1.so[b74b7000+15c000]

After (things are still running fine):

Oct 22 16:00:01 Tower YAMJ_HTML: YAMJ HTML creation/update - Started...
Oct 22 16:04:24 Tower YAMJ_HTML: YAMJ HTML creation/update - Finished 1/6
Oct 22 16:11:49 Tower YAMJ_HTML: YAMJ HTML creation/update - Finished 2/6
Oct 22 16:16:23 Tower YAMJ_HTML: YAMJ HTML creation/update - Finished 3/6
Oct 22 16:18:45 Tower YAMJ_HTML: YAMJ HTML creation/update - Finished 4/6
Oct 22 16:21:45 Tower YAMJ_HTML: YAMJ HTML creation/update - Finished 5/6
Oct 22 16:22:21 Tower YAMJ_HTML: YAMJ HTML creation/update - Finished 6/6
Oct 22 16:22:21 Tower YAMJ_FLASH: YAMJ FLASH creation/update - Started...
Oct 22 17:31:46 Tower YAMJ_FLASH: YAMJ FLASH creation/update - Finished 1/1

Everything else was functioning OK even though the unRAID WEBUI could not be accessed (utserver, plex, YAMJ cron job, etc.). What caused this?
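For anyone trying to read the kernel's segfault line in the excerpt above, here is a minimal Python sketch that pulls the fields apart. It assumes the usual x86 page-fault error-code bits (bit 0: protection violation rather than a missing page, bit 1: write rather than read, bit 2: user mode); the function name `decode_segfault` and the returned field names are made up for illustration. It only decodes what the log line already says and does not explain why emhttp crashed.

```python
import re

# Matches lines such as:
#   emhttp[545]: segfault at 0 ip b7530760 sp bfafb4d0 error 4 in libc-2.11.1.so[b74b7000+15c000]
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r"(?: in (?P<module>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Return a dict describing a kernel segfault log line, or None if it is not one."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    info = {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        # x86 page-fault error-code bits (assumed layout, see lead-in)
        "cause": "protection violation" if err & 1 else "page not present",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
        "module": m.group("module"),
        "module_offset": None,
    }
    if m.group("base"):
        # Offset of the faulting instruction inside the mapped module
        info["module_offset"] = int(m.group("ip"), 16) - int(m.group("base"), 16)
    return info


if __name__ == "__main__":
    line = ("Oct 22 15:51:20 Tower kernel: emhttp[545]: segfault at 0 ip b7530760 "
            "sp bfafb4d0 error 4 in libc-2.11.1.so[b74b7000+15c000]")
    print(decode_segfault(line))
```

Read this way, the line describes a user-mode read of address 0 (a NULL pointer read) at offset 0x79760 inside libc. Mapping that offset to a libc function would need the symbols of that exact libc build, which this sketch does not attempt.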
joyless Posted October 22, 2012

Did you copy large amounts of data to unRAID prior to emhttp segfaulting?
theone Posted October 22, 2012

How much is a lot? I have copied large amounts of data in the past and this never happened. As you can see, this happened at 15:51, when I was at work and nobody was home. It seems from the log that something in unRAID reset or restarted, because the USB license and GUID were rediscovered.

Anyone else?
dgaschk Posted October 23, 2012

See here for shutdown instructions: http://lime-technology.com/wiki/index.php?title=Console#To_cleanly_Stop_the_array_from_the_command_line
theone Posted October 23, 2012

dgaschk wrote: "See here for shutdown instructions: http://lime-technology.com/wiki/index.php?title=Console#To_cleanly_Stop_the_array_from_the_command_line"

I had no problem cleanly shutting down my server. I am asking because I would like to know why the WEBUI crashed.
int13h Posted October 23, 2012

The webUI *may* have been killed by the OOM killer; see here for further information: http://lime-technology.com/forum/index.php?topic=22971.msg203593#msg203593
theone Posted October 23, 2012

int13h wrote: "The webUI *may* have been killed by the OOM killer; see here for further information: http://lime-technology.com/forum/index.php?topic=22971.msg203593#msg203593"

I thought about it but found no evidence of this in the syslog.
bcbgboy13 Posted October 23, 2012

theone wrote: "I had no problem cleanly shutting down my server. I am asking because I would like to know why the WEBUI crashed."

I have a theory, but it is not widely shared here. However, there is a reason why commercial servers use ECC memory and are powered through a UPS. unRAID is loaded into RAM, and a single bit flip in that area (cosmic rays, a momentary power glitch) can cause a "crash" in a module. The consequences will differ, but they can be observed in one way or another.

Some reading:
http://perspectives.mvdirona.com/2009/10/07/YouReallyDONeedECCMemory.aspx
www.amd.com/us/documents/47644a_ecc_embedded.pdf
theone Posted October 26, 2012

Well, it happened again: no access to the WEBUI. I can still access the server using PuTTY, and shares, utserver, etc. are OK. This time there is no evidence in the syslog that anything happened. emhttp is still running in the process list.
theone Posted October 30, 2012

It happened yet again: no access to the WEBUI. I can still access the server using PuTTY, and shares, utserver, etc. are OK. Again there is no evidence in the syslog that anything happened. emhttp is still running in the process list. It seems to be happening every ~3 days.

Please help... I have not restarted my server, so any help or suggestions would be appreciated.
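Since emhttp shows up in the process list but the page does not load, one way to tell "port closed" apart from "accepts the connection but never answers" is a small probe with a timeout. The sketch below is a generic TCP/HTTP check in Python, not anything unRAID-specific. The host name "Tower" comes from the syslog above, while port 80 and the function name `probe_http` are assumptions.

```python
import socket


def probe_http(host, port=80, timeout=5.0):
    """Classify how a web server answers: 'responding', 'timeout', 'refused', ..."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            request = "GET / HTTP/1.0\r\nHost: {}\r\n\r\n".format(host)
            sock.sendall(request.encode("ascii"))
            data = sock.recv(64)  # the start of the status line is enough
    except socket.timeout:
        # Connection accepted (or pending) but no reply: the process looks hung
        return "timeout"
    except ConnectionRefusedError:
        # Nothing is listening on the port: the process is probably gone
        return "refused"
    except OSError as exc:
        return "error: {}".format(exc)
    return "responding" if data.startswith(b"HTTP/") else "unexpected reply"


if __name__ == "__main__":
    # "Tower" is the host name seen in the syslog; port 80 is an assumption.
    print(probe_http("Tower", 80))
```

Run from another machine on the LAN (or from cron with the result written to a file), a "timeout" result with emhttp still in the process list would match the hang described in this thread, whereas "refused" would point to the process having exited.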
dgaschk Posted October 30, 2012

Attach a syslog taken after the problem occurs.
theone Posted October 30, 2012

The OP contains excerpts from the syslog. Attached are the full syslog from the first time it happened (syslog_OP_2012.10.22.txt) and the current syslog (syslog_2012.10.30.txt), taken from "/var/log/syslog" while the server is still running (I haven't restarted it yet).

syslog_OP_2012.10.22.txt.zip
syslog_2012.10.30.txt.zip
whiteatom Posted October 31, 2012

I have this problem as well: I constantly lose access to the web UI (every 2-3 days) with nothing in the syslog. I have seen emhttp segfault, but not most of the time. I had narrowed it down to some blocking process causing emhttp to stop responding to external requests. Sometimes I can catch an event that has screwed things up (CouchPotato was a common culprit on the array_started event), but now that's fixed, it seems to be spin-up related.

One thing I have noticed is that the SimpleFeatures Ajax features seem to lock up emhttp more often. I have taken out all the pieces that actively poll the system.

My latest approach is to telnet in and use the command line unraid control to spin up all the disks, then go to the web UI. I'd say this is 90% effective.

When emhttp does lock up, it can't be restarted with the array running... It segfaults immediately. I have found this to work: I kill it, stop the array with the command line control, then restart emhttp in the background, go to the web GUI and start the array back up.

This unexplained instability drives me crazy. It's the only thing that makes me think about other "raid" products. It's also frustrating that the deeper I get into resolving it, the less help I get here. Nothing against the wonderful unRAID community, but it just seems no one else can explain it either. Try the usual: take out all your plugins and it will probably be fine... Too bad plugins were one of my main selling points for unRAID.

whiteatom
whiteatom Posted October 31, 2012

P.S.: I am using ECC RAM and a UPS. Here are some of my efforts: http://lime-technology.com/forum/index.php?topic=23069
theone Posted October 31, 2012

How do you accomplish this from the command line?

whiteatom wrote: "My latest approach is to telnet in and use the command line unraid control to spin up all the disks"

And this?

whiteatom wrote: "I have found this to work: I kill it, stop the array with the command line control, then restart emhttp in the background"
joelones Posted October 31, 2012

I'd be interested in this as well. I experienced the same as the OP, but in my case the web UI became unresponsive after a parity check, with nothing in the syslog explaining what happened. emhttp was still running in the process list. I'm running a fair number of plugins, and I'm on 5.0-rc8a as well.
whiteatom Posted October 31, 2012

Hmm... This I'd love to tell you, but I can't remember. I have an "unraid" command line executable, but I don't remember where I found it. Check the add-ons page, and if you don't find it I'll do some searching when I get home from work.
bcbgboy13 Posted October 31, 2012

whiteatom wrote: "P.S.: I am using ECC RAM and a UPS. Here are some of my efforts: http://lime-technology.com/forum/index.php?topic=23069"

No, you are not using the ECC functionality (though you do have real ECC memory and an ECC-capable motherboard). To use it you would need a Xeon rather than the crippled i3 (the memory controller is built into the CPU, and Intel charges a pretty penny for ECC).
whiteatom Posted October 31, 2012

Hmm... well, anyway. I think this is the script I'm using: http://lime-technology.com/wiki/index.php/Manage_from_Telnet but mine is called "unraid"; I may have just renamed it.
theone Posted November 28, 2012

Again it happened... twice in ~4 days. WEBUI not responsive, emhttp still running, syslog shows nothing out of the ordinary. Any ideas? (500)
dgaschk Posted November 28, 2012

Disable all add-ons. Re-enable them one at a time until you determine the culprit.
theone Posted November 28, 2012

dgaschk wrote: "Disable all add-ons. Re-enable them one at a time until you determine the culprit."

I would like to, but some are too essential to shut down for a whole week just to see if MAYBE something happens. I would say it started mainly after upgrading to RC8a (from 4.7). I will try disabling SimpleFeatures Core, as that will hopefully have the least effect on how the server is used around the house.
dbezerra Posted November 28, 2012

theone wrote: "I would like to, but some are too essential to shut down for a whole week just to see if MAYBE something happens. I would say it started mainly after upgrading to RC8a (from 4.7). I will try disabling SimpleFeatures Core, as that will hopefully have the least effect on how the server is used around the house."

The same problem is happening to me, and only after migrating to RC8a. The only plug-in I have installed is the UPS one. I also have UnMenu running.
theone Posted November 29, 2012

dbezerra wrote: "The same problem is happening to me, and only after migrating to RC8a. The only plug-in I have installed is the UPS one. I also have UnMenu running."

Are you running the SimpleFeatures GUI? Is there anything in your syslog that suggests OOM (out of memory) or segfaults occurring?
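For checking a syslog for those two things without reading it line by line, a rough Python sketch follows. The search strings are assumptions based on how Linux kernels commonly word these messages ("oom-killer" / "Out of memory" for the OOM killer, "segfault at" for crashes), so an empty result is a hint and not proof that nothing happened. The function name `scan_syslog` is made up; the path /var/log/syslog is the one mentioned earlier in the thread.

```python
import re

# Assumed wording of the kernel messages of interest (see lead-in)
PATTERNS = {
    "oom": re.compile(r"oom-killer|out of memory", re.IGNORECASE),
    "segfault": re.compile(r"segfault at", re.IGNORECASE),
}


def scan_syslog(lines):
    """Return {'oom': [...], 'segfault': [...]} with the matching log lines."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # errors="replace" keeps the scan going if the log contains odd bytes
    with open("/var/log/syslog", errors="replace") as log:
        for kind, matches in scan_syslog(log).items():
            print("{}: {} line(s)".format(kind, len(matches)))
            for entry in matches:
                print("  " + entry)
```

Running it against both a saved syslog from a crash and the live one makes it easier to compare the segfault case from the first post with the later silent hangs.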
theone Posted December 5, 2012

Everything has been OK for the last 7 days. I will now enable the SF CORE plugin without the WEB SERVER plugin and let it run for another period of ~7 days.