Unraid web console unreachable? "Transport endpoint is not connected"? Read this.



This thread describes errors where the unRAID web console dies and "transport endpoint is not connected" errors appear. This first post collects the combined tips for solving these issues; how we arrived at them can be found in the posts below.

 

Add the following commands to your go script:

 

syntax for 4.7:

pgrep -f "/usr/local/sbin/emhttp" | while read PID; do echo -17 > /proc/$PID/oom_adj; done
pgrep -f "/usr/sbin/smbd" | while read PID; do echo -17 > /proc/$PID/oom_adj; done

 

(I will only maintain a set of commands for the 5.x branch from now on.)

 

The following set of commands will make sure your needed base functionality does not get killed off in case of an out-of-memory error:

 

on 5.*: (thanks to int13h)

pgrep -f "/usr/local/sbin/emhttp" | while read PID; do echo -1000 > /proc/$PID/oom_score_adj; done
pgrep -f "/usr/sbin/smbd" | while read PID; do echo -1000 > /proc/$PID/oom_score_adj; done
pgrep -f "in.telnetd" | while read PID; do echo -1000 > /proc/$PID/oom_score_adj; done

 

Make SABnzbd likely to get killed

pgrep -f "/usr/local/sabnzbd/SABnzbd.py" | while read PID; do echo 1000 > /proc/$PID/oom_score_adj; done

 

The SABnzbd setting actually helps more than the other ones: it makes sure that whenever a process needs to be sacrificed, SABnzbd will be the one killed off. The good thing is that you can just disable/enable it from the web interface and it will run again; no need to go into the console or reboot.

 

Please take note of the thread below for the how and why of these additions.

 

-----

Original first post follows:

-----

 

Just noticed I could no longer access my user shares. Through the console I get the following error if I try to cd to one:

 

-bash: cd: user: Transport endpoint is not connected

 

This looks like a bug... And it caused the loss of some files for me; nothing unsolvable, but still...

 

I cannot attach a syslog now since I also cannot reach the flash drive from my system; it looks like SMB has crashed...

 

I will now reboot; this will undoubtedly solve the issue, but I hope it does not happen again...

 

The array refuses to shut down, even though none of the processes that normally cause that are running. I have saved the output of ps -elf.

 

I have now rebooted and can access the syslog I saved on the flashdrive, it is in the attached zip together with the ps -elf output.

 

The system is back up, but (of course) a parity check is now running.

syslog.zip

-----

We need Joe on this :-)

 

I did some digging and I think I have found what is going on. Most probably the server ran out of memory, which triggers Linux to kill off processes that are not used very often. The downside is that emhttp (which gives you the web interface for administering unRAID) and SMB (which actually provides your drive shares) are among the first to go.

 

I recently had the web interface die on me, and now SMB; both happened after I started using SABnzbd/CouchPotato/Sickbeard. All are used a lot recently and might even have memory leaks.

 

The issue is that I can hardly believe this is Linux behaviour that cannot be altered... Core system processes (and for unRAID, SMB and the web interface are exactly that) should be protected against this behaviour... On the other hand, if that were possible it would most probably already have been done...

 

Fortunately memory is quite cheap these days, so yesterday I ordered another 8 GB; my box will then run on 16 GB, which I believe is basically total nonsense, but if it helps me survive a couple of days longer it is worth the 50 euros to me...

-----

Did some digging...

 

Now we really need some Linux expertise:

 

http://lwn.net/Articles/317814/

 

Apparently Linux uses something called an "OOM killer" (out-of-memory killer).

 

The OOM killer is used to make sure a server will not crash: if memory becomes desperately low, it will kill off processes according to a predefined rule set in an effort to keep the server afloat.

 

The article behind the above link describes ways to (quote) "SAVE IMPORTANT PROCESSES FROM BEING KILLED".

 

Now that is exactly what we need, right?

 

I have no problem with the system killing CouchPotato, Sickbeard, SABnzbd, AirVideo... But the server should keep itself running, and ITSELF for an unRAID box includes emhttp and smb (and possibly also the daemons for NFS etc.).

 

The way to do this is to give processes a score that defines the order in which they are available for "killing": give an application a score of -17 and it will not be considered a candidate for killing at all.

 

Sounds good, eh?

 

The problem is there is no trace of these options within the unRAID distro (at least not to my untrained eye). It could be we need to compile something in, or possibly the OOM killer is specific to one distro and unRAID uses something else...

 

So... linux savvy guys out there... get your hack on ?  This would be a very valuable thing to have imho..

-----

Ehm....

 

Someone needs to help me and tell me I am talking crap because I think I actually found the solution...

 

Someone PLEASE verify...:

 

STEPS:

 

1) Log in to your server.

2) Run ps -elf | grep emhttp

 

This gives you the process ID of the emhttp process.

 

3) Go to /proc/<the process ID you just found>

 

Here you will find a whole lot of stuff that I do not understand, but you will also find a file called oom_adj. If you look into it you will find one number: 0. That is the default every process gets. If we set this to -17, the emhttp process will no longer get killed.

 

For me the process ID for emhttp is 14736.

 

So I gave the following command:

 

echo -17 > /proc/14736/oom_adj

 

The file now contains -17, and the process should no longer be targeted by the OOM killer...

 

Now, this is done on a per-process-ID basis, and the ID changes when you reboot, so we need something that can go into the go file. I think we have that too:

 

pgrep -f "/usr/local/sbin/emhttp" | while read PID; do echo -17 > /proc/$PID/oom_adj; done

Before anyone tries this: someone with Linux knowledge needs to confirm... I am not afraid to experiment, but I found this after 15 minutes of googling, so it might be that this totally does not work or is somehow dangerous...

 

 

 

 

-----

I've never used it, but I've read about it before... It should be safe as long as you're setting emhttp and smb to not be killed; it could get dangerous if you started setting other apps to not be killed. emhttp and smb are relatively low-resource, so "not" killing them shouldn't make a lot of difference.

 

If your scores are like mine, there is a whole slew of applications that would be killed before smb and emhttp, though. Checking right now, these would be killed before smb for me:

Sabnzbd

sickbeard

mylar

couchpotato

sickbeard (second instance)

headphones

telnet

+2 others I'm forgetting (9 total)

-----

 

That would be the case if only the oom_adj score were used to select the process to be killed, but as I understand it there is a whole formula behind it that decides on the process to be sacrificed by looking at the recent usage of the process as well as its oom_adj.

 

Therefore it is perfectly possible that emhttp gets killed off even while CouchPotato has an oom_adj score that offers it up sooner...

 

Personally I would never make sab/couch/sick unkillable, if only because they can easily be restarted... emhttp can no longer be restarted in the v5 release, making it necessary to perform a full reboot and possibly sit through a parity check at boot-up...

 

Sent from my iPad using Tapatalk HD

-----

The list I was referring to was the list of oom_score values. From my understanding (and I just went back and read up on it again to make sure I wasn't completely off base), the oom_score is the result of the formula you're speaking of, "shifted" according to the value set in oom_adj. Since I haven't set any oom_adj values, they are all currently "0". In this case, the list I made would be the order the OOM killer would go in to free memory.

 

That being said, it would definitely be possible that emhttp is killed before other apps, depending on that formula. My list is just an example.

 

If it does happen again, print the output of

egrep -i 'killed process' /var/log/syslog

That will at least let you know what was killed.

-----

That helps, thanks!

 

Imho the whole thing is tightly related to the fact that processes that cannot be restarted are the ones getting killed...

 

Also:

 

With 8 GB in my box there is no -real- reason there should be out-of-memory situations; those are (imho) applications behaving badly, and they should be punished (i.e. killed). Should it occur more often (I will monitor my system closely), I could also create a small cron job (daily, for example) that checks whether sab/sick/couch are still running and, if not, restarts them. That probably is not too difficult to make, and it would make sure that badly behaving applications get killed and, after some time, restarted with fresh opportunities... That would keep the ecosystem running up until the point where the latest updates of these apps have made them behave more nicely...
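A rough sketch of what such a watchdog could look like as a cron script. Note that the app paths and start commands below are my assumptions, not verified install locations; adjust them to your own setup:

```shell
#!/bin/bash
# Hypothetical watchdog sketch: restart an app if the OOM killer took it out.
# All patterns and start commands below are assumptions -- adjust to your setup.

restart_if_dead() {
  local pattern="$1"; shift
  if pgrep -f "$pattern" > /dev/null 2>&1; then
    echo "ok: $pattern is running"
  else
    echo "restarting: $pattern"
    # Launch the supplied start command in the background.
    "$@" > /dev/null 2>&1 &
  fi
}

# Example entries (hypothetical paths):
restart_if_dead "SABnzbd.py"     python /usr/local/sabnzbd/SABnzbd.py -d
restart_if_dead "SickBeard.py"   python /usr/local/sickbeard/SickBeard.py -d
restart_if_dead "CouchPotato.py" python /usr/local/couchpotato/CouchPotato.py -d
```

Dropped into /etc/cron.daily/ (or invoked from the go file via cron), this would give the apps their "fresh opportunities" automatically.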

 

 

-----

It's true, with 8 GB you shouldn't see those situations often, but they may happen under the right circumstances. If you were to start seeing them often, I'd run memtest to rule out RAM going bad.

 

Sab has been known to be a resource hog, but it's never been bad enough for me to have issues with it.

 

The problem isn't so much running out of memory, it's letting applications overcommit. If Sab asks for 1 GB of memory but only 500 MB is available, Linux will still grant it. Sab most likely will not use it all, so the 500 MB is fine until Sab actually does need it. Then Linux tries to kill off other processes so Sab can use the memory, since Linux has already told Sab it can have it.

 

The problem here is: if Linux has to kill off processes, what will it kill? And if it doesn't free enough in time, Sab will try to write to memory it can't, causing a segfault.

 

Thanks for the thread; it made me reread the topic. It's been a while since I've thought about how it works, other than acknowledging it's there.

-----

And I do feel that we have stumbled upon a solution for emhttp getting killed.

 

A lot of memory could even make it more likely that issues occur, if the cause is a lack of low memory.

 

Since low memory is used to address high memory, adding more gigabytes will leave less low memory available. That will then cause the OOM killer to run more frequently...

 

Should unRAID become 64-bit, this issue would not occur...
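The low-memory squeeze can actually be watched from the console: on a 32-bit kernel, /proc/meminfo exposes the low zone directly (these lines are absent on 64-bit kernels, where the split does not matter):

```shell
# LowTotal/LowFree only exist on 32-bit kernels with highmem:
grep -E 'LowTotal|LowFree' /proc/meminfo || echo "no Low* lines (64-bit kernel)"

# Overall figures for comparison -- MemFree alone does not tell the whole
# story on a 32-bit box:
grep -E '^(MemTotal|MemFree)' /proc/meminfo
```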

-----

I really hate to be one of those guys but here goes:

 

This issue only happened to me with RC8. I went back to RC5 and it's been flawless (as it was before I upgraded).

 

My setup has 3 GB of memory + a 2 GB swap file.

 

So the bottom line is: if it's a memory issue, what is different in RC8 that causes the memory to fill up in a matter of a day?

 

-----

Thanks Helmonder for bringing this up. I too have been having this issue since installing SAB/SB/CP onto unRAID, and I too have 8 GB of memory.

See this post http://lime-technology.com/forum/index.php?topic=20013.msg200189#msg200189

 

If you watch top closely while SAB is downloading, you will see that it eats memory at the same rate that it downloads (that is what it looks like, anyway; no calculations to back up that statement). It seems to hold onto that memory in cache, but then when the post-processor starts, the OOM killer starts killing off processes. This, to me, looks like an issue with SAB.

 

I have resolved my issue by setting SAB to pause the queue while post-processing. I haven't run out of memory once since doing that; however, I do prefer your solution. Being able to manage this via the GUI or via a conf file would be excellent.

-----

Keep in mind: processes that access large amounts of memory or large file counts tend to cause these OOM issues.

I had issues in the past where a large rsync job caused OOM problems. I had to drop the cache before and after I ran the job to prevent them.

 

I wonder if the unRAID md drive was re-tuned in RC8.

Can anyone validate differences between the two?

 

Also, anything that uses /tmp or /var/tmp can cause the system to run out of memory, since those live on the root fs, which cannot be swapped out.

If you have a swap partition, you can mount a tmpfs on /var/tmp and /tmp to allow swapping to assist. (I've verified that tmpfs filesystems can be swapped out, whereas rootfs will not be.)
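A sketch of what that could look like; the sizes here are my guesses, the mounts need root, and I have not tested this on unRAID, so the actual mount commands are shown commented out:

```shell
# Check what /var/tmp is currently backed by (filesystem type column):
df -T /var/tmp | awk 'NR==2 {print $2}'

# Hedged go-script sketch: back /tmp and /var/tmp with tmpfs so their contents
# can be pushed out to swap under memory pressure (unlike rootfs). Needs root,
# and anything already in those directories would be shadowed by the mount:
# mount -t tmpfs -o size=512m tmpfs /tmp
# mount -t tmpfs -o size=512m tmpfs /var/tmp
```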

 

Another point to consider: if you are running the cache_dirs script, this could cause OOM conditions too. It did on mine, but I have drives with enormous numbers of files, so I cannot use it without causing crashes.

 

The issue boils down to how much low memory is available for the kernel. 2 GB, 4 GB, 8 GB doesn't matter; it's low memory that's the issue.

If we use tmpfs for the root filesystem one day, I think the low-memory pressure would be better, since tmpfs can be swapped out in an emergency.

 

I did find a way of remounting root onto a tmpfs filesystem, but it requires some new tools and startup changes in unRAID that have to occur at the boot level.

-----

After upgrading to RC8 from 4.7 I have the same issue. I can't run SAB, Sickbeard, and the CouchPotato server at the same time; they will all crash within a minute or so of starting up and running for a bit. I'm going to try RC5, so we'll see. Just thought I'd throw my hat in so you're not alone.

-----

I think you guys are running into issues similar to mine; see my post "Lost connectivity". I feel that my problems are related to SAB/SickBeard/CouchPotato on unRAID 4.7. My warning to everyone is that prior to performing a disk upgrade or parity check, they should reboot to a vanilla unRAID installation, i.e. no unMENU or other extraneous apps running. I have been down for 4 days now and I'm still unsure I will be whole again :-(

 

Sent from my ASUS Transformer Pad TF700T using Tapatalk 2

 

 

-----

My syslog says:

 

/proc/1245/oom_adj is deprecated, please use /proc/1245/oom_score_adj instead

 

OK, now, if I write a value to the oom_score_adj file, do I have to use the same scoring system (from -17 to +15)?

Or is it expecting a different kind of value?

 

 

Autoreply: actually, the "new" scoring system uses the range [-1000, +1000].

So I'll add to my GO file these three lines :

 

# OOM daemon trick : do not kill unRaid WebUI and SMB shares too easily.
pgrep -f "/usr/local/sbin/emhttp" | while read PID; do echo -1000 > /proc/$PID/oom_score_adj; done
pgrep -f "/usr/local/sbin/smbd" | while read PID; do echo -1000 > /proc/$PID/oom_score_adj; done

 

I'll see how it goes.  :)

-----

It is described in the following knowledgebase article:

 

http://www.dbasquare.com/kb/how-to-adjust-oom-score-for-a-process/

 

/proc/[pid]/oom_score_adj, for kernels 2.6.36 and newer, takes a value between -1000 and 1000

/proc/[pid]/oom_adj, for older kernels, takes a value between -17 and 15

 

unRAID is now on kernel 3.4.11, so indeed, the -1000 to 1000 range should be used!
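Either way, you can read the files back to verify what value took effect. A small sketch, using the current shell's PID as a stand-in for the emhttp PID (substitute the PID found via pgrep on a real box):

```shell
# Use this shell's PID as a stand-in for the emhttp PID:
PID=$$

# The adjustment currently set for the process (-1000..1000 on newer kernels):
cat /proc/$PID/oom_score_adj

# The kernel's resulting "badness" score for this process -- higher means
# more likely to be chosen by the OOM killer:
cat /proc/$PID/oom_score
```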

 

 

 

-----

I think I have the same problem: everything works (SABnzbd, Sickbeard, and SMB), but the Tower web interface will stop working. I didn't have this problem before. In Chrome I get:

Error 324 (net::ERR_EMPTY_RESPONSE): The server closed the connection without sending any data.

-----

Using oom_adj:

 

Oct 24 18:49:26 Tower emhttp: unRAID System Management Utility version 5.0-rc8a
Oct 24 18:49:26 Tower emhttp: Copyright (C) 2005-2012, Lime Technology, LLC
Oct 24 18:49:26 Tower emhttp: Plus key detected, GUID: 090C-6300-AA00-000000168261
Oct 24 18:49:26 Tower kernel: go (11365): /proc/11363/oom_adj is deprecated, please use /proc/11363/oom_score_adj instead.

 

Using oom_score_adj:

 

Oct 25 22:50:49 Tower emhttp: unRAID System Management Utility version 5.0-rc8a
Oct 25 22:50:49 Tower emhttp: Copyright (C) 2005-2012, Lime Technology, LLC
Oct 25 22:50:49 Tower emhttp: Plus key detected, GUID: 090C-6300-AA00-000000168261

 

Better  ;)

 

So we should use oom_score_adj, at least for the unRAID 5.0 branch.

-----
