Djoss

[6.4.0, 6.4.1] shfs taking a lot of memory (~8GB now)


Memory usage of my system is high.  There is not enough memory to start my VM:

2018-02-03T16:03:09.010195Z qemu-system-x86_64: cannot set up guest memory 'pc.ram': Cannot allocate memory

So after looking at the system to see what was consuming the memory, I found that shfs is using a lot of it:

 3454 root      20   0 9973492 8.620g    768 S   0.0 55.8 202:50.03 shfs

Also, its memory consumption seems to increase over time.  Two days ago it was:

 3454 root      20   0 8531520 7.349g    768 S   5.9 47.6 167:15.73 shfs

Never had such an issue with 6.3.5 (running the same things).

 

Tell me if there is any additional information I can provide.

homeserver-diagnostics-20180204-1232.zip


Same here. SHFS memory leaks... It began right after the upgrade from 6.3.5 to 6.4.0. Installing 6.4.1 didn't fix the problem.

My 16 GB of RAM is enough for about 24 hours. Then "out of memory" errors appear and all user shares disappear. After a reboot, everything is OK for another 24 hours.

home-diagnostics-20180205-2251.zip

Edited by Jeronyson
added problem desc


Still increasing...

 3454 root      20   0 10.136g 9.223g    768 S   1.3 59.7 229:46.60 shfs

 


The same.

 

In the report I have:

3557 root      20   0 9796.7m 8.680g    792 S   0.0 56.2  34:31.67 shfs

 

And 3 hours later:

3557 root      20   0 10.755g 9.992g    792 S   2.0 64.7  40:00.93 shfs

 

So, +1G of RAM...


Rolled back to 6.3.5

Already 34 hours of uptime with small SHFS memory consumption.

It's definitely a bug in the 6.4 update, because I just downgraded and didn't change anything else (settings, dockers, etc.).

3395 root      20   0  1.376g 0.008g 0.001g S   2.3  0.1  34:35.92 shfs

 

P.S.: 6.3.5 diagnostics attached.

home-diagnostics-20180209-1042.zip

UPD: 8 days uptime on 6.3.5:

3395 root      20   0  1.891g 0.011g 0.001g S   3.6  0.1 266:24.14 shfs

 

Edited by Jeronyson
update new stats


I have been able to reproduce this with just the Transmission container, the VM engine disabled and no plugins; the docker image is loaded from an SSD outside of the array, manually mounted before array start.

 

The test was:

All plugins removed, VM engine disabled, server rebooted to clear shfs from the previous test, SSD manually mounted, array started, Transmission started, a few torrents seeding and/or downloading. In just a few minutes it was clear that shfs memory was growing fast, but to be sure I waited 2+ hours and checked again: RAM usage was over 200 MB and not going down, even after stopping Transmission.

 

The exact same test with Deluge instead of Transmission never went over 30 MB after 15 hours and loads of torrents.

 

So, to recap: the leak(?) looks to be triggered by the Transmission container (linuxserver.io version) and needs a reboot to clear it (if the container is stopped and Deluge is started, the RAM usage continues to grow).

 

What the exact problem is, I have no idea at the moment; as Jeronyson noted, it's an issue not present on unRAID 6.3.5.

 

Now that I have isolated the issue I will validate it on the main server, changing Transmission to Deluge while I think about what further tests to do.


Could be related to sparse files and how transmission is configured, possibly prepopulating/preallocating the torrents.  Post up your transmission configuration settings.
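A quick way to check the sparse-file theory (the path below is just an example, not from the thread) is to compare a download's apparent size with the space actually allocated for it; a sparsely preallocated torrent shows a large apparent size but few allocated blocks:

# Apparent (logical) size vs. space actually allocated on disk
du -h --apparent-size /mnt/user/downloads/complete/example.iso
du -h /mnt/user/downloads/complete/example.iso
# Or both at once: %s = logical size in bytes, %b = 512-byte blocks allocated
stat -c 'size=%s  blocks=%b' /mnt/user/downloads/complete/example.iso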


I don't think so; the issue persists after Transmission is killed, and it also happens when only seeding files. I think it was also causing the bizarre behavior on my openHAB VM, which now looks like it's working fine.

 

the settings are:

{
    "alt-speed-down": 50,
    "alt-speed-enabled": false,
    "alt-speed-time-begin": 540,
    "alt-speed-time-day": 127,
    "alt-speed-time-enabled": false,
    "alt-speed-time-end": 1020,
    "alt-speed-up": 50,
    "bind-address-ipv4": "0.0.0.0",
    "bind-address-ipv6": "::",
    "blocklist-enabled": false,
    "blocklist-url": "http://www.example.com/blocklist",
    "cache-size-mb": 4,
    "dht-enabled": true,
    "download-dir": "/downloads/complete",
    "download-queue-enabled": true,
    "download-queue-size": 5,
    "encryption": 1,
    "idle-seeding-limit": 30,
    "idle-seeding-limit-enabled": false,
    "incomplete-dir": "/downloads/incomplete",
    "incomplete-dir-enabled": true,
    "lpd-enabled": false,
    "message-level": 2,
    "peer-congestion-algorithm": "",
    "peer-id-ttl-hours": 6,
    "peer-limit-global": 200,
    "peer-limit-per-torrent": 50,
    "peer-port": 51413,
    "peer-port-random-high": 65535,
    "peer-port-random-low": 49152,
    "peer-port-random-on-start": false,
    "peer-socket-tos": "default",
    "pex-enabled": true,
    "port-forwarding-enabled": true,
    "preallocation": 1,
    "prefetch-enabled": true,
    "queue-stalled-enabled": true,
    "queue-stalled-minutes": 30,
    "ratio-limit": 3,
    "ratio-limit-enabled": true,
    "rename-partial-files": true,
    "rpc-authentication-required": false,
    "rpc-bind-address": "0.0.0.0",
    "rpc-enabled": true,
    "rpc-host-whitelist": "",
    "rpc-host-whitelist-enabled": true,
    "rpc-password": "{1ddd3f1f6a71d655cde7767242a23a575b44c909n5YuRT.f",
    "rpc-port": 9091,
    "rpc-url": "/transmission/",
    "rpc-username": "",
    "rpc-whitelist": "127.0.0.1",
    "rpc-whitelist-enabled": false,
    "scrape-paused-torrents-enabled": true,
    "script-torrent-done-enabled": false,
    "script-torrent-done-filename": "",
    "seed-queue-enabled": false,
    "seed-queue-size": 10,
    "speed-limit-down": 100,
    "speed-limit-down-enabled": false,
    "speed-limit-up": 100,
    "speed-limit-up-enabled": false,
    "start-added-torrents": true,
    "trash-original-torrent-files": false,
    "umask": 2,
    "upload-slots-per-torrent": 14,
    "utp-enabled": true,
    "watch-dir": "/watch",
    "watch-dir-enabled": true
}

 


12 days after having upgraded to 6.4.1:

 4188 root      20   0 7334352 6.381g    800 S   1.0 41.3 238:10.15 shfs

 


@cferrero make sure to edit your post in this thread to either change or remove the rpc-password field, just in case someone can get to your server address. I was unaware it would have included that field.

 

Now as to what preallocation is configured as ...

0 - None - No preallocation, just let the file grow whenever a new packet comes in

1 - Sparse - Preallocate by writing just the final block in the file

2 - Full - Preallocate by writing zeroes to the entire file

 

The Sparse method should be fine; however, I have mine set to "2". I would try setting it to "2" and doing a restart to put everything on a clean slate, and see where it goes from there.
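If anyone wants to try that, something along these lines should do it (the appdata path and container name are assumptions, adjust to your own mapping, and stop the container first so Transmission doesn't rewrite settings.json on exit):

docker stop transmission
# switch preallocation from 1 (sparse) to 2 (full) in the container's settings.json
sed -i 's/"preallocation": 1,/"preallocation": 2,/' /mnt/user/appdata/transmission/settings.json
docker start transmission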

 

For reference from a 6.3.5 system uptime of 78 days:


USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     10342  0.0  0.0 153296   596 ?        Ssl   2017   0:00 /usr/local/sbin/shfs /mnt/user0 -disks 14 -o noatime,big_writes,allow_other

root     10352  0.1  0.0 1514560 19240 ?       Ssl   2017 157:46 /usr/local/sbin/shfs /mnt/user -disks 15 2048000000 -o noatime,big_writes,allow_other -o remember=0


For testing purposes I did this:

  1. Upgraded back to 6.4.1
  2. Installed Deluge (linuxserver) and moved all seeding torrents to Deluge
  3. Stopped the Transmission docker and removed it along with its image.
  4. Rebooted the server

After the start:

3536 root      20   0  1.116g 0.025g 0.001g S   1.3  0.2   0:14.46 shfs

 

Will report soon with the results. 

15 hours ago, BRiT said:

@cferrero make sure to edit your post in this thread to either change or remove the rpc-password field, just in case someone can get to your server address. I was unaware it would have included that field.

 

I didn't check either; it was a clean test install with auth disabled and there is no outside access, but I edited and removed it just in case.

 

15 hours ago, BRiT said:

 

Now as to what preallocation is configured as ...

0 - None - No preallocation, just let the file grow whenever a new packet comes in

1 - Sparse - Preallocate by writing just the final block in the file

2 - Full - Preallocate by writing zeroes to the entire file

 

The Sparse method should be fine; however, I have mine set to "2". I would try setting it to "2" and doing a restart to put everything on a clean slate, and see where it goes from there.

 

I will test it after a reboot just to be sure, but it also happened when just seeding files.

 

15 hours ago, BRiT said:

For reference from a 6.3.5 system uptime of 78 days:

 


USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     10342  0.0  0.0 153296   596 ?        Ssl   2017   0:00 /usr/local/sbin/shfs /mnt/user0 -disks 14 -o noatime,big_writes,allow_other

root     10352  0.1  0.0 1514560 19240 ?       Ssl   2017 157:46 /usr/local/sbin/shfs /mnt/user -disks 15 2048000000 -o noatime,big_writes,allow_other -o remember=0

 

I think 6.3.5 is free of this; I didn't notice any problems. But for reference, from my test:

around 500 bytes in standby, before starting Transmission

around 300 MB after 4 hours with 5-8 torrents (test)

around 5 GB after 4 days (observed)

 

The main server was using 10 GB after 9 days of uptime...

 

 

 


Testing results:

 

After reboot: 25 MB

After 5 hours: 900 MB

After 7 hours: 1.22 GB

After 9 hours: 1.67 GB

After 24 hours: 4.56 GB

 

During testing I had only seeding torrents. Total 89 torrents, avg speed was 10 MB/s (disk activity was 15-25 MB/s)

So the issue is not specific to the Transmission docker container. I think the issue is related to active I/O usage.

 

After restart: 3536 root      20   0  1.116g 0.025g 0.001g S   1.3  0.2   0:14.46 shfs
After 5 hours: 3536 root      20   0  1.436g 0.885g 0.001g S   1.7  5.7   6:47.03 shfs
After 7 hours: 3536 root      20   0  2.126g 1.220g 0.001g S   2.8  7.9   9:17.33 shfs
After 9 hours: 3536 root      20   0  2.564g 1.678g 0.001g S   3.0 10.9  16:53.78 shfs
After 24 hours: 3536 root      20   0  5.765g 4.562g 0.001g S  18.3 29.5  36:34.77 shfs

 

UPDATE: tried to generate heavy I/O load using an SMB share; SHFS memory consumption stayed normal. Also tried generating heavy I/O load using the Plex docker container (transcoding, syncing to devices) and SHFS was OK again. So the issue may be related to how torrent clients (Transmission, Deluge) read data, such that they somehow cause SHFS to leak memory.
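One way to chase that theory (the file path is only an example) is to hammer the user-share FUSE mount with many small reads at random offsets, roughly what a seeding client does, and watch whether shfs RSS climbs:

# Read 16 KB chunks at random offsets from a large file on /mnt/user
FILE=/mnt/user/downloads/complete/bigfile.bin   # example path, pick any large file
SIZE=$(stat -c %s "$FILE")
CHUNKS=$(( SIZE / 16384 ))
for i in $(seq 1 10000); do
    dd if="$FILE" of=/dev/null bs=16k count=1 skip=$(( (RANDOM * 32768 + RANDOM) % CHUNKS )) 2>/dev/null
done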

Edited by Jeronyson
additional testing


Unfortunately, I confirm SHFS memory leaks when using torrents (Transmission, in my case).

I usually seed torrents to a ratio of 2, and then they stop seeding.

But yesterday I removed the restrictions and began to seed; the average speed was 1-2 MB/s, and after 5-6 hours I was at 85% memory usage (at the beginning of the test it was 75%).

I have all the symptoms described above.

@limetech, this is worth paying attention to.

Edited by vanes


Also experiencing this issue. About once a week now shfs will have consumed everything. I'm running two instances of Transmission, perma-seeding, ~2400 torrents with ~30 active at any given time. Out of 16 GB of RAM, shfs consumes around 80 MB an hour, currently at 20.9% with 40 hours of uptime. Next time I have to reboot, I'll hold off on starting the Transmission dockers to see if shfs still leaks.


Same issue here. Memory increases quickly with Transmission open and barely increases at all with it disabled. It does still seem to increase and will eventually require a restart, but the Transmission container should be a great control for developers troubleshooting the bug in shfs.

17 hours ago, ffiarpg said:

Same issue here. Memory increases quickly with Transmission open and barely increases at all with it disabled. It does still seem to increase and will eventually require a restart, but the Transmission container should be a great control for developers troubleshooting the bug in shfs.

 

How are you monitoring shfs memory usage?

41 minutes ago, limetech said:

 

How are you monitoring shfs memory usage?

 

With top:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                         
 4188 root      20   0 7861252 6.791g    800 S   1.0 44.0 339:39.00 shfs

With proc status:

# cat /proc/4188/status 
Name:    shfs
Umask:    0000
State:    S (sleeping)
Tgid:    4188
Ngid:    0
Pid:    4188
PPid:    1
TracerPid:    0
Uid:    0    0    0    0
Gid:    0    0    0    0
FDSize:    512
Groups:     
NStgid:    4188
NSpid:    4188
NSpgid:    4188
NSsid:    4188
VmPeak:     7927068 kB
VmSize:     7861252 kB
VmLck:           0 kB
VmPin:           0 kB
VmHWM:     7120836 kB
VmRSS:     7120836 kB
RssAnon:     7120036 kB
RssFile:           4 kB
RssShmem:         796 kB
VmData:     7165892 kB
VmStk:         132 kB
VmExe:          60 kB
VmLib:        4568 kB
VmPTE:       14104 kB
VmPMD:          48 kB
VmSwap:           0 kB
Threads:    11
SigQ:    0/62689
SigPnd:    0000000000000000
ShdPnd:    0000000000000000
SigBlk:    0000000000000000
SigIgn:    0000000000001006
SigCgt:    0000000180004001
CapInh:    0000000000000000
CapPrm:    0000003fffffffff
CapEff:    0000003fffffffff
CapBnd:    0000003fffffffff
CapAmb:    0000000000000000
NoNewPrivs:    0
Seccomp:    0
Cpus_allowed:    ff
Cpus_allowed_list:    0-7
Mems_allowed:    00000000,00000001
Mems_allowed_list:    0
voluntary_ctxt_switches:    1
nonvoluntary_ctxt_switches:    0
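To track the growth over time rather than spot-checking, a small sketch like this (nothing shipped with unRAID, just a loop over /proc) can append the RSS of every shfs process to a log every five minutes:

# Log the resident memory of each shfs process every 5 minutes
while true; do
    for PID in $(pgrep -x shfs); do
        echo "$(date '+%F %T') pid=$PID $(grep VmRSS /proc/$PID/status)" >> /var/log/shfs-rss.log
    done
    sleep 300
done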

 

 

1 hour ago, Djoss said:

With top:

 

Ok, please try this. Create a file in the 'config' directory on the USB flash called 'extra.cfg', with this single line in it:

 

shfsExtra=-logging 2

 

Next, unfortunately you have to Stop the array and then Start it again. This will cause 'shfs' to start dumping debug info to syslog. Start up whatever app you think is triggering this. Depending on how much I/O the app is generating, syslog will grow very rapidly. Let it run a while, hopefully long enough to observe the memory leakage. But syslog might grow and consume RAM before you get to that point 9_9  Anyway, before all memory gets exhausted, please capture diagnostics, which will include the syslog. I want to see a trace of what kinds of operations are being done.

 

To stop the logging, Stop array and delete the extra.cfg file.
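From the console that boils down to something like this (on unRAID the flash drive is mounted at /boot, so the 'config' directory should be /boot/config):

# enable shfs debug logging, then Stop/Start the array from the webGui
echo 'shfsExtra=-logging 2' > /boot/config/extra.cfg

# when finished: Stop the array, then remove the file
rm /boot/config/extra.cfg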

 

22 hours ago, limetech said:

 

Ok, please try this. Create a file in the 'config' directory on the USB flash called 'extra.cfg', with this single line in it:

 


shfsExtra=-logging 2

 

Next, unfortunately you have to Stop the array and then Start it again. This will cause 'shfs' to start dumping debug info to syslog. Start up whatever app you think is triggering this. Depending on how much I/O the app is generating, syslog will grow very rapidly. Let it run a while, hopefully long enough to observe the memory leakage. But syslog might grow and consume RAM before you get to that point 9_9  Anyway, before all memory gets exhausted, please capture diagnostics, which will include the syslog. I want to see a trace of what kinds of operations are being done.

 

To stop the logging, Stop array and delete the extra.cfg file.

 

 

Here you go, I sent you my unaltered diagnostics via private message. I was able to run the shfs debugging for only a couple of minutes before memory got exhausted...

Hope it will help!


 

On 26/02/2018 at 2:37 PM, limetech said:

Yes, looking into this.

 

+1

 

I have the same issue; after a few days SHFS takes up to 4-5 GB.

