cofin Posted November 4, 2012

I'm on the latest 5 rc (rc8a) and the output of 'which smbd' points to /usr/sbin/smbd. Perhaps I am missing something, but shouldn't the pgrep line be adjusted for the 5.* smbd?

Linux 3.4.11-unRAID.
root@SATURN:~# which smbd
/usr/sbin/smbd

Quote:
Add the following lines to your GO script:

on 4.7:
pgrep -f "/usr/local/sbin/emhttp" | while read PID; do echo -17 > /proc/$PID/oom_adj; done
pgrep -f "/usr/sbin/smbd" | while read PID; do echo -17 > /proc/$PID/oom_adj; done

on 5.*: (thanks to int13h)
pgrep -f "/usr/local/sbin/emhttp" | while read PID; do echo -1000 > /proc/$PID/oom_score_adj; done
pgrep -f "/usr/local/sbin/smbd" | while read PID; do echo -1000 > /proc/$PID/oom_score_adj; done

Please take note of the thread below with respect to the how and why of these additions.
htpcnewbie Posted December 13, 2012

Have folks found a solution to this? I have a new unRAID system set up. I did not see this issue when I tested it for a few weeks before migrating all my data to the Pro server. I am running v5rc8 and the syslog points to an emhttp error.

===========
Dec 13 08:57:45 Tower emhttp: get_filesystem_status: statfs: /mnt/user/Downloads Transport endpoint is not connected
===========

The strange thing is that I was accessing the user shares from a Windows 7 machine until a minute before. When I tried to write a file to the "Downloads" share (a cache-only share), the share went offline. I connected through PuTTY and noticed the above error message, followed by a whole bunch of "Transport endpoint is not connected" messages. I scrolled through the syslog and didn't find anything else that stood out. Browser access and cache drive access (SABnzbd downloads are still working) seem to be unaffected.

I added the following code, following the discussion, before my previous reboot. Looks like it did not help.

====================
on 5.*: (thanks to int13h)
pgrep -f "/usr/local/sbin/emhttp" | while read PID; do echo -1000 > /proc/$PID/oom_score_adj; done
pgrep -f "/usr/local/sbin/smbd" | while read PID; do echo -1000 > /proc/$PID/oom_score_adj; done
====================

The system used to run for a few days before I had to reboot due to this issue; the last reboot lasted less than a couple of days before this message occurred.

I am running v5rc8 unRAID on an Intel Q6600 with 4GB of RAM. The plugins I use extensively are slimserver, SABnzbd and utserver. I installed couchpotato but haven't configured it yet.
moose Posted December 13, 2012

Just a random thought, but could this memory issue be caused by the same memory issue that is referenced in this thread? (See reply #70 and link provided)
http://lime-technology.com/forum/index.php?topic=22675.0
htpcnewbie Posted December 13, 2012

I ran memtest for a day and stress tested the system thoroughly before installing unRAID. I highly doubt the memory is corrupted, but I am willing to debug that scenario if the obvious choices are ruled out. I am unfamiliar with the unRAID setup and need suggestions on what I can try. Thanks for the suggestion.
dgaschk Posted December 13, 2012

Quote (htpcnewbie):
Added the following code following the discussion before my previous reboot. Looks like it did not help.
====================
on 5.*: (thanks to int13h)
pgrep -f "/usr/local/sbin/emhttp" | while read PID; do echo -1000 > /proc/$PID/oom_score_adj; done
pgrep -f "/usr/local/sbin/smbd" | while read PID; do echo -1000 > /proc/$PID/oom_score_adj; done
====================

As cofin mentioned, "/usr/local/sbin/smbd" is incorrect and the correct location is "/usr/sbin/smbd".
htpcnewbie Posted December 13, 2012

^^ Thanks for the pointer. I missed that message. I will reboot the server after adjusting the smbd path. Thanks!
Helmonder (Author) Posted December 13, 2012

Quote (dgaschk):
Quote (htpcnewbie):
Added the following code following the discussion before my previous reboot. Looks like it did not help.
====================
on 5.*: (thanks to int13h)
pgrep -f "/usr/local/sbin/emhttp" | while read PID; do echo -1000 > /proc/$PID/oom_score_adj; done
pgrep -f "/usr/local/sbin/smbd" | while read PID; do echo -1000 > /proc/$PID/oom_score_adj; done
====================
As cofin mentioned, "/usr/local/sbin/smbd" is incorrect and the correct location is "/usr/sbin/smbd".

Or remove the path from the pattern and it always works:

pgrep -f "smbd"
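The go-script lines discussed above could be folded into one small helper that does not depend on the install path at all. This is only a sketch, not the thread's tested fix: it assumes the process names are exactly emhttp and smbd, it uses pgrep -x (exact name match) instead of pgrep -f, and PROC_ROOT is a made-up variable that exists purely so the function can be tried against a scratch directory.

```shell
# Sketch for the go script: shield key daemons from the kernel OOM killer.
# PROC_ROOT is a hypothetical override for dry runs; it defaults to /proc.
PROC_ROOT="${PROC_ROOT:-/proc}"

# protect_pid PID: write the "never kill" value to whichever knob exists.
protect_pid() {
    if [ -e "$PROC_ROOT/$1/oom_score_adj" ]; then
        # Interface used in the 5.* lines above
        printf '%s\n' -1000 > "$PROC_ROOT/$1/oom_score_adj"
    elif [ -e "$PROC_ROOT/$1/oom_adj" ]; then
        # Older interface used in the 4.7 lines above
        printf '%s\n' -17 > "$PROC_ROOT/$1/oom_adj"
    else
        # Process already gone, or no OOM knob present
        return 1
    fi
}

# protect_daemon NAME: apply protect_pid to every process with that exact name.
protect_daemon() {
    pgrep -x "$1" | while read -r pid; do
        protect_pid "$pid"
    done
}

# In the go script, after emhttp has been started:
# protect_daemon emhttp
# protect_daemon smbd
```

Checking for oom_score_adj first means the same lines should work on both 4.7 and 5.*, and matching on the process name sidesteps the /usr/local/sbin versus /usr/sbin confusion. Note that smbd forks a new process per client connection, so processes started after the go script runs are not covered unless they inherit the setting.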
htpcnewbie Posted December 18, 2012

I tried both ways, using the full path and using just the executable name, as suggested by dgaschk and Helmonder. The system runs for a day and then I get the 'Transport endpoint is not connected' error. It is frustrating to see an unstable system. Any further thoughts on this?

Here are the last few lines from the log just before it crashed:

======
Dec 18 04:31:34 Tower logger: .d..t...... WORK/
Dec 18 04:31:34 Tower logger: skipping temp/
Dec 18 04:31:34 Tower logger: skipping utserver/
Dec 18 04:31:34 Tower logger: mover finished
Dec 18 05:02:01 Tower kernel: mdcmd (102): spindown 2
Dec 18 05:02:14 Tower kernel: mdcmd (103): spindown 0
Dec 18 05:03:15 Tower kernel: mdcmd (104): spindown 3
Dec 18 05:03:17 Tower kernel: mdcmd (105): spindown 4
Dec 18 05:03:18 Tower kernel: mdcmd (106): spindown 5
Dec 18 05:03:19 Tower kernel: mdcmd (107): spindown 6
Dec 18 05:03:59 Tower kernel: mdcmd (108): spindown 1
Dec 18 06:47:03 Tower sSMTP[21473]: Creating SSL connection to host
Dec 18 06:47:03 Tower sSMTP[21473]: SSL connection using RC4-SHA
Dec 18 06:47:07 Tower sSMTP[21473]: Sent mail for root@[email protected] (221 2.0.0 closing connection di16sm1310564vdb.11) uid=0 username=root outbytes=743
Dec 18 07:19:45 Tower in.telnetd[22401]: connect from 10.10.1.110 (10.10.1.110)
Dec 18 07:19:49 Tower login[22404]: ROOT LOGIN on '/dev/pts/0' from 'yyyy'
Dec 18 08:07:33 Tower emhttp: get_filesystem_status: statfs: /mnt/user/Downloads Transport endpoint is not connected
===============
Helmonder (Author) Posted December 18, 2012

Search your syslog for messages concerning OOM or Out Of Memory. If you do not see such messages then something else is going on, and smbd is dying off for some other reason.
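The syslog search suggested above can be done with a single grep. A minimal sketch, assuming the syslog lives at /var/log/syslog and that the kernel uses its usual phrasings ("Out of memory", "invoked oom-killer"); the exact wording differs between kernel versions, so an empty result is a strong hint rather than proof.

```shell
# oom_events [FILE]: print syslog lines that look like OOM-killer activity.
# The patterns are typical kernel phrasings and may need extending.
oom_events() {
    grep -iE 'out of memory|oom-killer|oom_kill' "${1:-/var/log/syslog}"
}

# Usage on the server:
# oom_events                      # search the live syslog
# oom_events /boot/syslog-copy    # hypothetical path to a saved copy
```

If this prints nothing around the time the shares dropped, the OOM workaround is unlikely to be the relevant fix.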
htpcnewbie Posted December 18, 2012

^^^ Thanks Helmonder. Will check and report back tonight.
RussellinSacto Posted January 10, 2013

Any updates on this? I'm getting a lot of these "Transport endpoint not connected" errors in my logs, and my user shares then lock out: permission denied, but still visible. (Disk shares are unaffected, web interface unaffected.) Running CrashPlan and Unmenu, not really anything else. (Using Unmenu to stop/restart samba doesn't help.)

Any ideas?

Thanks,
Russell
BBoYTuRBo Posted January 10, 2013

I also just started getting this "Transport endpoint is not connected" error. And I just started using CrashPlan on my UnRAID box the other day. Going to try adding those lines to my go script to see if that helps.

In the meantime, is there a way to get the user shares working again with a terminal command or something? I'm already in the middle of a parity check from the last time I lost connectivity and had to restart the box, and I'd like to let it finish.
Helmonder (Author) Posted January 10, 2013

Quote (RussellinSacto):
Any updates on this? I'm getting a lot of these "Transport endpoint not connected" errors in my logs, and my user shares then lock out: permission denied, but still visible. (Disk shares are unaffected, web interface unaffected.) Running CrashPlan and Unmenu, not really anything else. (Using Unmenu to stop/restart samba doesn't help.) Any ideas? Thanks, Russell

There is not really an update needed...

The situation is clear: the system runs out of memory.
The solution is clear: run fewer plugins.
The workaround is also clear: set the oom_score parameters.

To know if this pertains to you, check whether your syslog shows OOM events.
Helmonder (Author) Posted January 10, 2013

Quote (BBoYTuRBo):
I also just started getting this "Transport endpoint is not connected" error. And I just started using CrashPlan on my UnRAID box the other day. Going to try adding those lines to my go script to see if that helps. In the meantime, is there a way to get the user shares working again with a terminal command or something? I'm already in the middle of a parity check from the last time I lost connectivity and had to restart the box, and I'd like to let it finish.

Good question... You would need to restart smbd... No idea if that will work...
limetech Posted January 10, 2013

This problem might be related to a bug in a Linux component called "fuse":
http://thread.gmane.org/gmane.comp.file-systems.fuse.devel/11922

I have built the latest fuse release, 2.9.2, which has the fix for the above bug. This will show up in the next build, -rc9b I guess. I can't guarantee this will solve this particular issue.
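To see whether a given build already carries fuse 2.9.2 or later, the userland version can be compared against that number. This is a rough sketch with two assumptions not taken from the thread: that fusermount is on the PATH and prints its version as the last word of `fusermount -V`, and that sort supports -V (GNU coreutils does).

```shell
# version_ge A B: succeed when version A >= version B.
# Relies on sort -V (version sort): the smaller version sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# installed_fuse_version: last word of the fusermount -V output (assumed format).
installed_fuse_version() {
    fusermount -V 2>/dev/null | awk '{ print $NF }'
}

# Usage on the server:
# if version_ge "$(installed_fuse_version)" 2.9.2; then
#     echo "fuse has the fix mentioned above"
# else
#     echo "fuse is older than 2.9.2"
# fi
```

This only checks the userland tools; as noted above, a newer fuse is not guaranteed to resolve this particular issue.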
RussellinSacto Posted January 11, 2013

Thanks Tom, I hope that's it...

Helmonder, my total plugin count: UnMenu, CrashPlan (which I guess loads SSH, but not sure), and PowerDown. I'm a pretty plugin-free guy. :-)

Both my UnRaids are crashing like this, and both are 100% Lime-Tech spec (the physical boxes they sell, except maybe different cases; I can't remember that part).

Thanks,
Russell
BBoYTuRBo Posted January 15, 2013

Quote (Helmonder):
The situation is clear: the system runs out of memory. The solution is clear: run fewer plugins. The workaround is also clear: set the oom_score parameters. To know if this pertains to you, check whether your syslog shows OOM events.

After the latest incident of my user shares ceasing to function, I checked my syslog and found no OOM events, or errors of any kind. The user shares just stopped working.

This is what shows up if I try to access /mnt/user from the terminal:

Jan 14 21:54:21 Biggie unmenu[15844]: df: `/mnt/user': Transport endpoint is not connected

However, I did just discover something curious. From the terminal, I noticed there is a /mnt/user0 directory, and all of my user shares show up there and seem to be working just fine. Hopefully that bit of information can be useful somehow.
Helmonder (Author) Posted January 15, 2013

This means it is not an OOM event. The user0 directory is supposed to be there.
trurl Posted January 15, 2013

Quote (BBoYTuRBo):
...I noticed there is a /mnt/user0 directory, and all of my user shares show up there, and seem to be working just fine...

When things are working normally you will have a /mnt/user0 directory also.

/mnt/user is the user shares including any files that are still on the cache drive.
/mnt/user0 is the user shares excluding any files that are still on the cache drive.
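Since /mnt/user and /mnt/user0 are separate mount points, a quick way to tell which one has dropped is to try to stat each. A minimal sketch; the function name share_state is made up here, and it only reports whether the path can be reached, not why it cannot.

```shell
# share_state PATH: print "ok" if PATH can be stat'ed, "unreachable" otherwise.
# A mount in the "Transport endpoint is not connected" state fails the stat.
share_state() {
    if stat "$1" >/dev/null 2>&1; then
        echo ok
    else
        echo unreachable
    fi
}

# Report on both user-share mount points.
for share in /mnt/user /mnt/user0; do
    echo "$share: $(share_state "$share")"
done
```

Run from cron or by hand, this shows at a glance whether only /mnt/user is affected, as in the report above, or both.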
BBoYTuRBo Posted January 16, 2013

Oh, okay. Good to know.

I've now updated to version 5.0-rc10, and Windows Explorer seems to be a lot snappier when browsing my user shares (I guess it's either the updated fuse module, or placebo :-P).

I'll post again if I get the transport endpoint error on this version. So far so good though!
bobbintb Posted March 5, 2013

recently i was getting a lot of weird behavior and thought it was my plugins. i concluded it wasn't and just did a clean install of unraid. all was going fine until yesterday when i started getting these "transport endpoint" errors. the webui was accessible but my shares were not, though they appeared to be mounted. i restarted safely and it only got worse. now the webui won't start and the flash is the only drive that will mount.

i even put in another 4gb of ram (total of 12) and it is still acting up. i do have a number of plugins but 12gb (or even 8gb) ought to be enough. i see no OOM errors in the log anyway.

i tried the fix on the first page and it might have worked. maybe just a coincidence. it was late and i haven't had time to check into it. i really hope this issue gets fixed because it renders my system, and everything that relies on it, completely unusable.
chrisbirkinshaw Posted March 5, 2013

Quote (Helmonder):
This means it is not an oom event. The user0 is supposed to be there.

I have the same issue, with UnRAID 5.0-rc8a. No OOM error messages in my logs at all.

Syslog:

Mar 5 16:11:23 Unraid shfs/user: duplicate object: /mnt/disk5/Music/iTunes/iTunes Media/Music/Ministry/Early Trax/._08 He's Angry.m4a
Mar 5 16:11:23 Unraid shfs/user: duplicate object: /mnt/disk5/Music/iTunes/iTunes Media/Music/Ministry/Early Trax/._03 All Day.m4a
Mar 5 16:11:23 Unraid shfs/user: duplicate object: /mnt/disk5/Music/iTunes/iTunes Media/Music/Ministry/Early Trax/04 All Day Remix.mp3
Mar 5 16:11:23 Unraid shfs/user: duplicate object: /mnt/disk5/Music/iTunes/iTunes Media/Music/Ministry/Early Trax/._09 Move (Original Mix).m4a
Mar 5 16:11:23 Unraid shfs/user: duplicate object: /mnt/disk5/Music/iTunes/iTunes Media/Music/Ministry/Early Trax/09 Move (Original Mix).mp3
Mar 5 16:11:23 Unraid shfs/user: duplicate object: /mnt/disk5/Music/iTunes/iTunes Media/Music/Ministry/Early Trax/._02 Halloween (Remix).m4a
Mar 5 16:11:23 Unraid shfs/user: duplicate object: /mnt/disk5/Music/iTunes/iTunes Media/Music/Ministry/Early Trax/10 I'm Falling.mp3
Mar 5 16:15:06 Unraid sshd[12166]: Accepted password for root from 172.16.0.13 port 56009 ssh2
Mar 5 16:15:15 Unraid sshd[12181]: Accepted password for root from 172.16.0.13 port 56017 ssh2
Mar 5 16:15:15 Unraid sshd[12185]: lastlog_openseek: Couldn't stat /var/log/lastlog: No such file or directory
Mar 5 16:15:15 Unraid sshd[12185]: lastlog_openseek: Couldn't stat /var/log/lastlog: No such file or directory
Mar 5 16:24:32 Unraid sshd[12166]: Received disconnect from 172.16.0.13: 11: disconnected by user
Mar 5 16:29:01 Unraid crond[1216]: failed parsing crontab for user root: cron=""
Mar 5 16:47:01 Unraid sudo: root : TTY=unknown ; PWD=/root ; USER=nobody ; COMMAND=/usr/local/bin/plex.sh
Mar 5 17:23:48 Unraid kernel: mdcmd (54): spindown 0
Mar 5 17:23:50 Unraid kernel: mdcmd (55): spindown 3
Mar 5 17:23:51 Unraid kernel: mdcmd (56): spindown 5
Mar 5 17:28:01 Unraid crond[1216]: failed parsing crontab for user root: cron=""

I know that the shares were lost between 16:11 and 17:28, as I was running an rsync job on my Mac at this time and it failed while transferring data to UnRAID.

Rsync client log:

Music/The Prodigy/Music For The Jilted Generation/08 Poison.m4a
Music/The Prodigy/Music For The Jilted Generation/10 One Love (Edit).m4a
Music/The Prodigy/Music For The Jilted Generation/13 Ckaustrophobic Sting.m4a
Music/The Prodigy/No Good (Start The Dance) [xls51cd]/
rsync: rename "/mnt/user/Music/iTunes/iTunes Media/Music/The Orb/The Dream/.07 Phantom Of Ukraine.m4a.lrdR7q" -> "Music/The Orb/The Dream/07 Phantom Of Ukraine.m4a": Software caused connection abort (103)
Music/The Prodigy/No Good (Start The Dance) [xls51cd]/01 No Good [Edit].m4a
rsync: recv_generator: failed to stat "/mnt/user/Music/iTunes/iTunes Media/Music/The Prodigy/No Good (Start The Dance) [xls51cd]/02 No Good [bad For You].m4a": Transport endpoint is not connected (107)
rsync: recv_generator: failed to stat "/mnt/user/Music/iTunes/iTunes Media/Music/The Prodigy/No Good (Start The Dance) [xls51cd]/03 No Good [CJ Bollands Museum Rmx].m4a": Transport endpoint is not connected (107)
rsync: recv_generator: failed to stat "/mnt/user/Music/iTunes/iTunes Media/Music/The Prodigy/No Good (Start The Dance) [xls51cd]/04 No Good [Original Rmx].m4a": Transport endpoint is not connected (107)

I have managed to cause this issue 3 times now by doing an rsync transfer. Is it possible that rsync is using all the memory and causing processes to quit, but without anything being logged in syslog?

The other weird thing is that samba still appears to be running:

root@Unraid:~# ps aux | grep smb
root 4539 0.0 0.0 17804 3924 ? S 08:25 0:00 /usr/sbin/smbd -D
root 6494 0.0 0.1 17940 4656 ? S 08:50 0:00 /usr/sbin/smbd -D
root 14292 0.0 0.0 15808 3820 ? Ss Mar03 0:00 /usr/sbin/smbd -D
root 14297 0.0 0.0 15812 1896 ? S Mar03 0:00 /usr/sbin/smbd -D
root 30756 0.0 0.0 2452 584 pts/0 R+ 18:42 0:00 grep smb

It seems to be suggested in this thread that samba is responsible for mounting the /mnt/user/share endpoint. Does anyone know any more about this? How is it configured, how can I check it, restart it, etc.?

Thanks,
Chris
madburg Posted March 5, 2013

Quote (chrisbirkinshaw):
I know that the shares were lost between 16:11 and 17:28, as I was running an rsync job on my Mac at this time and it failed while transferring data to UnRAID. I have managed to cause this issue 3 times now by doing an rsync transfer. Is it possible rsync can be using all the memory and causing processes to quit, but not logging anything in syslog?

You're using AFP, right, and rsync and/or mover. The X (extended) attribute is the culprit (it should not be) and thus drops your 'user0'.

Read these two posts:
http://lime-technology.com/forum/index.php?topic=26085.0
http://lime-technology.com/forum/index.php?topic=25689.0
chrisbirkinshaw Posted March 5, 2013

I'm just using "rsync -auv" on the Mac, and have AFP disabled. Thanks for the suggestions though.

I did however repeat the rsync with Plex disabled and now it works! Still perplexed (LOL) as to why I didn't see any logging about memory. The only thing in the syslog is what looks like Plex restarting.
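When the syslog shows nothing about memory, one way to gather evidence is to log free memory at intervals while the rsync job runs. This is a sketch only: the "memwatch" tag is invented for illustration, and MemFree alone understates what is really available because it ignores buffers and cache, so read the trend rather than the absolute number.

```shell
# mem_free_kb [FILE]: print the MemFree value (in kB) from a meminfo-style file.
mem_free_kb() {
    awk '/^MemFree:/ { print $2 }' "${1:-/proc/meminfo}"
}

# log_memory: write one sample to syslog under the (made-up) tag "memwatch".
log_memory() {
    logger -t memwatch "MemFree=$(mem_free_kb) kB"
}

# Usage: sample once a minute in the background during the transfer,
# then stop the loop with kill once the job is done.
# while sleep 60; do log_memory; done &
```

If the samples fall steadily until the shares drop, memory pressure is a plausible cause even without an explicit OOM message; if they stay flat, the cause probably lies elsewhere.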
bobbintb Posted March 5, 2013

another thing i noticed with my box is that i never seem to see this happen during normal operation but only during an initial startup. i have never noticed unraid running fine for any amount of time and then this issue suddenly happens. this issue only seems to occur when i first turn the system on. could be coincidence.