RedXon

SHFS Memory Leak


Hi everyone.

 

First of all, I want to say that 6.4, in all the RC versions I've tried so far, has fixed the CPU issue I had with SHFS. Before that, SHFS would randomly hog the CPU at 100% and never come back down until I reset the server. The server was obviously unresponsive when that happened, and it never recovered on its own.

 

The problem is, I now have a new issue with SHFS, and so far I haven't found any other post describing what I'm experiencing, which is why I'm starting a new thread.

Basically, SHFS seems to have some sort of memory leak: after I restart my server, within about 3 days my 16GB of memory is completely full and never empties again.

I could see that it is indeed one, or multiple, instance(s) of SHFS causing the issue. Everything starts off fine, with everything running and my memory usage sitting somewhere between 3 and 12 GB.

But as time progresses, SHFS progressively fills up the memory (or something else does, but in htop SHFS is the prime offender). The SHFS tasks each sit at 42.8% MEM at the moment, and that figure keeps climbing, never dropping.
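(For anyone who wants to confirm this kind of growth outside of htop, here is a rough sketch of logging shfs memory over SSH; it is not from this thread. It assumes the processes are literally named shfs, as they appear in htop, and the log path /boot/shfs-mem.log is made up for the example; /boot is normally the flash drive, so the log survives a crash or reboot.)

# append a timestamped snapshot of every shfs process (RSS/VSZ in KiB) every 10 minutes
while true; do
  date >> /boot/shfs-mem.log
  ps -C shfs -o pid,rss,vsz,etime,args >> /boot/shfs-mem.log
  sleep 600
done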

 

After about 3 days, as I said, the 16GB (15.7 actual) is completely full and nothing else has enough memory to start. Plex no longer works, the WebUI is unresponsive if it works at all, and none of my other Dockers are accessible. I then have to restart the server over SSH and everything is fine again, until 3 days later.

 

Does anyone recognize this issue, or have an idea what the cause could be? It is very unnerving, to be honest.

56 minutes ago, RedXon said:

Just happened again, and these are the diagnostics after the reboot:

We need diagnostics captured after the problem happens but before the reboot.


I will try to capture it next time it happens. Unfortunately, the WebUI often becomes unresponsive when this happens, so I might not be able to, but I'll try.

5 minutes ago, RedXon said:

I will try to capture it next time it happens. Unfortunately, the WebUI often becomes unresponsive when this happens, so I might not be able to, but I'll try.

Type diagnostics in your ssh session and wait for it to complete before issuing the reboot command.
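(Roughly, the SSH session would look like the sketch below. I believe the CLI diagnostics command writes its zip under /logs on the flash drive, i.e. /boot/logs, but treat the exact path and file name as an assumption and check for the file before rebooting.)

diagnostics            # collects system info and logs; wait until it finishes
ls -l /boot/logs/      # the <hostname>-diagnostics-<date>.zip should show up here
reboot                 # only reboot once the zip is safely on the flash drive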

14 minutes ago, RedXon said:

Oh great, I didn't know that, thanks. The logs are then stored on the USB drive, right?

Hopefully.


You're getting constant out-of-memory errors. I would update to v6.4 stable and then run the server for a few days without any Dockers/VMs; if it's stable, start enabling them one at a time. This one is probably my number one suspect:

 

Quote

Jan 16 17:42:45 Azeroth kernel: sabnzbdplus invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=0

 

Edited by johnnie.black
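(As a side note, not from the original post: a quick way to see every out-of-memory event the kernel has logged, assuming the standard Unraid syslog location, is something like the line below.)

# list all OOM-killer events recorded in the current syslog
grep -iE 'oom-killer|out of memory' /var/log/syslog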


Okay, thanks. I just upgraded to the stable release; somehow my server never gives me notifications about new releases, so I missed the final one.

 

I deactivated the Dockers; however, Plex is still active, as I can't just turn it off without my family revolting (lol).

I hope this is not the culprit, and if it is, well, I'll have to figure something out. As you pointed out, sabnzbd could be the problem, right?

 

I'll report back in 3-4 days if anything happens, because that is when it normally runs out of memory.

 

One more thing: I'm guessing adding more memory to the server won't fix the problem, just allow it to run longer before the same thing happens, right?

Just now, RedXon said:

One more thing: I'm guessing adding more memory to the server won't fix the problem, just allow it to run longer before the same thing happens, right?

Most likely


I don't use any VMs, I just run these Dockers:

Plex

Transmission

Sonarr

Radarr

SabNzbd

CrashplanPro

Headphones

Jackett

Plexpy

Netdata


I'm having the same issue. shfs keeps using more and more memory until it is force-terminated, which makes the shares disappear, which in turn makes the Dockers stop working until I restart. I've tried turning off every Docker, and I've uninstalled almost every plugin.


Okay, I found the issue for me:

It was the disk cache plugin.

I've now set only the specific folders I actually need; CPU went from 30%+ down to 0-10%, RAM usage is lower, and no shfs errors so far.

Before, I had only set one folder to be excluded. So: only add the directories you REALLY need to be cached.

Edited by nuhll

1 minute ago, nuhll said:

Okay, I found the issue for me:

It was the disk cache plugin.

I've now set only the specific folders I actually need; CPU went from 30%+ down to 0-10%, RAM usage is lower, and no shfs errors so far.

 

Strange, I had that plugin and it was one of the first ones I uninstalled. Maybe the changes it made persist even when uninstalled? 


I guess you need to restart after changing that plugin.

Just add a few directories and you shouldn't have a problem. I set mine to include ALL... that was a mistake, it seems.

Edited by nuhll


I did restart and still had the issue. I just reinstalled the plugin to confirm the caching is disabled and it is.


Then I have no idea.

But you have errors in your Docker log,

and in your syslog:

Jan 22 12:19:52 Tower sshd[25199]: Bad protocol version identification 'GET /etc/passwd HTTP/1.1' from 83.35.180.39 port 51582
Jan 22 12:19:52 Tower sshd[25200]: Bad protocol version identification 'GET /.htpasswd HTTP/1.1' from 83.35.180.39 port 51584
Jan 22 12:19:52 Tower sshd[25201]: Bad protocol version identification 'GET /wp-admin/admin-ajax.php?action=revslider_show_image&img=../wp-config.php HTTP/1.1' from 83.35.180.39 port 51568
Jan 22 12:19:52 Tower sshd[25202]: Bad protocol version identification 'GET /config.php HTTP/1.1' from 83.35.180.39 port 51588
Jan 22 12:19:52 Tower sshd[25203]: Bad protocol version identification 'GET /server-status HTTP/1.1' from 83.35.180.39 port 51566
Jan 22 12:19:52 Tower sshd[25204]: Bad protocol version identification 'GET /passwd.bak HTTP/1.1' from 83.35.180.39 port 51578
Jan 22 12:19:52 Tower sshd[25205]: Bad protocol version identification 'GET / HTTP/1.1' from 83.35.180.39 port 51572
Jan 22 12:19:52 Tower sshd[25206]: Bad protocol version identification 'GET /passwd.bak HTTP/1.1' from 83.35.180.39 port 51576
Jan 22 12:19:52 Tower sshd[25207]: Bad protocol version identification 'GET /passwd HTTP/1.1' from 83.35.180.39 port 51580
Jan 22 12:19:52 Tower sshd[25208]: Bad protocol version identification 'GET /database.sql HTTP/1.1' from 83.35.180.39 port 51570
Jan 22 12:19:52 Tower sshd[25209]: Bad protocol version identification 'GET / HTTP/1.1' from 83.35.180.39 port 51574
Jan 22 12:19:52 Tower sshd[25210]: Bad protocol version identification 'GET /.htpasswd.bak HTTP/1.1' from 83.35.180.39 port 51586

 

Is your server open to the internet? (Don't do that!)

 

Do you mean these errors:
Jan 21 17:59:27 Tower shfs: error: shfs_rmdir, 1517: Directory not empty (39): rmdir: /mnt/cache/temp/plextranscode/Transcode/Sessions/plex-transcode-suyey4mp4gta9k9t06g3p44v-aa1dd9ae-7fca-4532-98c3-5b8dc4d261d6
Jan 21 17:59:28 Tower shfs: error: shfs_rmdir, 1517: Directory not empty (39): rmdir: /mnt/cache/temp/plextranscode/Transcode/Sessions/plex-transcode-suyey4mp4gta9k9t06g3p44v-aa1dd9ae-7fca-4532-98c3-5b8dc4d261d6

 

Nothing to worry about, those are just temp files that get deleted. Normal.

Edited by nuhll


The issue I am referring to is the high RAM usage of shfs that leads to it eventually being force-killed, requiring a restart to bring my shares back.

The errors you posted are not related, as you say.

 

1 hour ago, nuhll said:

Is your server open to the internet? (Don't do that!)

+1

1 hour ago, ffiarpg said:

The errors you posted are not related, as you say.

No, but if you have ports open to Unraid's native services (not Dockers, Unraid itself), then there's a good chance that the repeated hammering from hackers is using up RAM, whether or not they succeed in getting in. I've seen similar situations on this forum before.

5 minutes ago, jonathanm said:

+1

No, but if you have ports open to Unraid's native services (not Dockers, Unraid itself), then there's a good chance that the repeated hammering from hackers is using up RAM, whether or not they succeed in getting in. I've seen similar situations on this forum before.



I had months of uptime prior to 6.4, with restarts only for my own purposes, and now I have to restart every 3 or so days.

Why would attempts to access the system increase RAM usage in shfs? Why would it grow without bound? My external use of sshd is a red herring.


Because the system needs to reserve RAM and CPU for each request, and since Unraid has no protection against this as far as I know, they could try to get into Unraid 1000 times a second if they wanted (brute force).

It may not be related, but you are exposing your network to a crazy security hole. Unraid is currently designed for safe (LAN) networks.

As far as I know, shfs could be anything; it creates the user shares you use for your Dockers, plugins, system, and so on. High utilization of it means it is doing something heavy.
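(To put a rough number on that hammering, and purely as an illustration based on the syslog snippet quoted above, something like this counts the bogus connection attempts per source IP.)

# count sshd "Bad protocol version identification" hits, grouped by source address
grep 'Bad protocol version identification' /var/log/syslog | grep -o 'from [0-9.]*' | sort | uniq -c | sort -rn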

Edited by nuhll

