[6.4.0, 6.4.1] shfs taking a lot of memory (~8GB now)



Memory usage of my system is high.  There is not enough memory to start my VM:

2018-02-03T16:03:09.010195Z qemu-system-x86_64: cannot set up guest memory 'pc.ram': Cannot allocate memory

So after looking at the system to see what was consuming the memory, I found that shfs is using a lot of it:

 3454 root      20   0 9973492 8.620g    768 S   0.0 55.8 202:50.03 shfs

Also, its memory consumption seems to increase over time.  Two days ago it was:

 3454 root      20   0 8531520 7.349g    768 S   5.9 47.6 167:15.73 shfs

I never had such an issue with 6.3.5 (running the same things).
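
For anyone who wants to check the same thing on their own system, this is roughly how to spot the biggest memory consumers (a minimal sketch using standard procps tools, nothing unRAID-specific):

ps aux --sort=-rss | head -n 5            # processes sorted by resident memory, largest first
ps -o pid,vsz,rss,etime,comm -C shfs      # just the shfs processes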

 

Tell me if there is any additional information I can provide.

homeserver-diagnostics-20180204-1232.zip


Rolled back to 6.3.5.

Already 34 hours of uptime with small shfs memory consumption.

It's definitely a bug in the 6.4 update, because I just downgraded and didn't change anything else (settings, dockers, etc.):

3395 root      20   0  1.376g 0.008g 0.001g S   2.3  0.1  34:35.92 shfs

 

P.S.: 6.3.5 diagnostics attached.

home-diagnostics-20180209-1042.zip

UPD: 8 days uptime on 6.3.5:

3395 root      20   0  1.891g 0.011g 0.001g S   3.6  0.1 266:24.14 shfs

 

  • 2 weeks later...

I have been able to reproduce this with just the Transmission container: VM engine disabled, no plugins, and the docker image loaded from an SSD outside the array, manually mounted before array start.

 

The test was:

All plugins removed, VM engine disabled, server rebooted to clear shfs from the previous test, SSD manually mounted, array started, Transmission started, with a few torrents seeding and/or downloading. In just a few minutes it was clear that shfs memory was growing fast, but to be sure I waited 2+ hours and checked again: RAM usage was over 200 MB and did not drop even after stopping Transmission.
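
For anyone repeating the test, a simple way to watch the growth live is something like this (a sketch; plain procps tools, refreshing the shfs figures every minute):

watch -n 60 'ps -o pid,vsz,rss,etime,comm -C shfs'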

 

The exact same test with Deluge instead of Transmission never went over 30 MB after 15 hours and loads of torrents.

 

So, to recap: the (apparent) leak looks like it is triggered by the Transmission container (linuxserver.io version) and needs a reboot to clear it (if the container is stopped and Deluge is started, the RAM usage continues to grow).

 

What the exact problem is, I have no idea at the moment; as Jeronyson noted, it's an issue not present on unRAID 6.3.5.

 

Now that I have isolated the issue, I will validate it on the main server by changing Transmission to Deluge while I think about what further tests to do.


I don't think so; the issue persists after Transmission is killed, and it also happens when only seeding files. I think it was causing the bizarre behavior on my openHAB VM, which now looks like it's working fine.

 

The settings are:

{
    "alt-speed-down": 50,
    "alt-speed-enabled": false,
    "alt-speed-time-begin": 540,
    "alt-speed-time-day": 127,
    "alt-speed-time-enabled": false,
    "alt-speed-time-end": 1020,
    "alt-speed-up": 50,
    "bind-address-ipv4": "0.0.0.0",
    "bind-address-ipv6": "::",
    "blocklist-enabled": false,
    "blocklist-url": "http://www.example.com/blocklist",
    "cache-size-mb": 4,
    "dht-enabled": true,
    "download-dir": "/downloads/complete",
    "download-queue-enabled": true,
    "download-queue-size": 5,
    "encryption": 1,
    "idle-seeding-limit": 30,
    "idle-seeding-limit-enabled": false,
    "incomplete-dir": "/downloads/incomplete",
    "incomplete-dir-enabled": true,
    "lpd-enabled": false,
    "message-level": 2,
    "peer-congestion-algorithm": "",
    "peer-id-ttl-hours": 6,
    "peer-limit-global": 200,
    "peer-limit-per-torrent": 50,
    "peer-port": 51413,
    "peer-port-random-high": 65535,
    "peer-port-random-low": 49152,
    "peer-port-random-on-start": false,
    "peer-socket-tos": "default",
    "pex-enabled": true,
    "port-forwarding-enabled": true,
    "preallocation": 1,
    "prefetch-enabled": true,
    "queue-stalled-enabled": true,
    "queue-stalled-minutes": 30,
    "ratio-limit": 3,
    "ratio-limit-enabled": true,
    "rename-partial-files": true,
    "rpc-authentication-required": false,
    "rpc-bind-address": "0.0.0.0",
    "rpc-enabled": true,
    "rpc-host-whitelist": "",
    "rpc-host-whitelist-enabled": true,
    "rpc-password": "{1ddd3f1f6a71d655cde7767242a23a575b44c909n5YuRT.f",
    "rpc-port": 9091,
    "rpc-url": "/transmission/",
    "rpc-username": "",
    "rpc-whitelist": "127.0.0.1",
    "rpc-whitelist-enabled": false,
    "scrape-paused-torrents-enabled": true,
    "script-torrent-done-enabled": false,
    "script-torrent-done-filename": "",
    "seed-queue-enabled": false,
    "seed-queue-size": 10,
    "speed-limit-down": 100,
    "speed-limit-down-enabled": false,
    "speed-limit-up": 100,
    "speed-limit-up-enabled": false,
    "start-added-torrents": true,
    "trash-original-torrent-files": false,
    "umask": 2,
    "upload-slots-per-torrent": 14,
    "utp-enabled": true,
    "watch-dir": "/watch",
    "watch-dir-enabled": true
}

 


@cferrero make sure to edit your post in this thread to either change or remove the rpc-password field, just in case someone can get to your server address. I was unaware it would have included that field.

 

Now as to what preallocation is configured as ...

0 - None - No preallocation, just let the file grow whenever a new packet comes in

1 - Sparse - Preallocate by writing just the final block in the file

2 - Full - Preallocate by writing zeroes to the entire file

 

A method of Sparse should be fine; however, I have mine set to "2". I would try setting it to "2", do a restart to start from a clean slate, and see where it goes from there.
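
To illustrate the difference between the two modes (a rough sketch of what they mean at the filesystem level, not Transmission's actual code; the /mnt/user/test path is made up):

# "Sparse": set the file length by writing only the very last byte
dd if=/dev/zero of=/mnt/user/test/sparse.bin bs=1 count=1 seek=$((1024*1024*1024 - 1))
# "Full": write zeroes over the entire length up front
dd if=/dev/zero of=/mnt/user/test/full.bin bs=1M count=1024
# Both report a 1 GiB apparent size, but only the second has every block allocated
ls -lh /mnt/user/test/*.bin
du -h /mnt/user/test/*.bin

In Transmission itself this is the "preallocation" key in settings.json (0 = none, 1 = sparse, 2 = full); Transmission normally rewrites settings.json on shutdown, so edit it while the container is stopped.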

 

For reference, from a 6.3.5 system with 78 days of uptime:


USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     10342  0.0  0.0 153296   596 ?        Ssl   2017   0:00 /usr/local/sbin/shfs /mnt/user0 -disks 14 -o noatime,big_writes,allow_other

root     10352  0.1  0.0 1514560 19240 ?       Ssl   2017 157:46 /usr/local/sbin/shfs /mnt/user -disks 15 2048000000 -o noatime,big_writes,allow_other -o remember=0


For testing purposes I did this:

  1. Upgraded back to 6.4.1
  2. Installed Deluge (linuxserver) and moved all seeding torrents to Deluge
  3. Stopped the Transmission docker and removed it along with its image.
  4. Rebooted the server

After the start:

3536 root      20   0  1.116g 0.025g 0.001g S   1.3  0.2   0:14.46 shfs

 

Will report soon with the results. 

15 hours ago, BRiT said:

@cferrero make sure to edit your post in this thread to either change or remove the rpc-password field, just in case someone can get to your server address. I was unaware it would have included that field.

 

I didn't check either; it was a clean test install with auth disabled and there is no outside access, but I edited the post and removed it just in case.

 

15 hours ago, BRiT said:

 

Now as to what preallocation is configured as ...

0 - None - No preallocation, just let the file grow whenever a new packet comes in

1 - Sparse - Preallocate by writing just the final block in the file

2 - Full - Preallocate by writing zeroes to the entire file

 

A method of Sparse should be fine; however, I have mine set to "2". I would try setting it to "2", do a restart to start from a clean slate, and see where it goes from there.

 

I will test it after a reboot just to be sure, but it also happened when just seeding files.

 

15 hours ago, BRiT said:

For reference, from a 6.3.5 system with 78 days of uptime:

 


USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     10342  0.0  0.0 153296   596 ?        Ssl   2017   0:00 /usr/local/sbin/shfs /mnt/user0 -disks 14 -o noatime,big_writes,allow_other

root     10352  0.1  0.0 1514560 19240 ?       Ssl   2017 157:46 /usr/local/sbin/shfs /mnt/user -disks 15 2048000000 -o noatime,big_writes,allow_other -o remember=0

 

I think 6.3.5 is free of this; I didn't notice any problems. But for reference, in my test:

around 500 bytes in standby, before starting Transmission

around 300 MB after 4 hours with 5-8 torrents (test)

around 5 GB after 4 days (observed)

 

The main server was using 10 GB after 9 days of uptime ...

 

 

 


Testing results:

 

After reboot: 25 MB

After 5 hours: 900 MB

After 7 hours: 1.22 GB

After 9 hours: 1.67 GB

After 24 hours: 4.56 GB

 

During testing I had only seeding torrents: 89 torrents in total, average speed 10 MB/s (disk activity 15-25 MB/s).

So the issue is not tied to the Transmission docker container specifically. I think the issue is related to active I/O usage.

 

After restart: 3536 root      20   0  1.116g 0.025g 0.001g S   1.3  0.2   0:14.46 shfs
After 5 hours: 3536 root      20   0  1.436g 0.885g 0.001g S   1.7  5.7   6:47.03 shfs
After 7 hours: 3536 root      20   0  2.126g 1.220g 0.001g S   2.8  7.9   9:17.33 shfs
After 9 hours: 3536 root      20   0  2.564g 1.678g 0.001g S   3.0 10.9  16:53.78 shfs
After 24 hours: 3536 root      20   0  5.765g 4.562g 0.001g S  18.3 29.5  36:34.77 shfs

 

UPDATE: I tried generating a heavy I/O load using an SMB share, and shfs memory consumption stayed normal. I also tried generating a heavy I/O load using the Plex docker container (transcoding, syncing to devices) and shfs was fine again. So the issue may be related to how torrent clients (Transmission, Deluge) read data, somehow causing shfs to leak memory.
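
If anyone wants to reproduce the comparison, a heavy sequential load through the user share can be generated with something like this (just a sketch; the /mnt/user/test path is only an example):

# Sequential write through shfs (the user share)
dd if=/dev/zero of=/mnt/user/test/loadtest.bin bs=1M count=20480
# Sequential read back through shfs
dd if=/mnt/user/test/loadtest.bin of=/dev/null bs=1M
# Check shfs memory while the load runs
ps -o pid,vsz,rss,comm -C shfs

That kind of streaming access is quite different from the many small, scattered reads a seeding torrent client makes across lots of files, which may be why only the torrent clients show the problem.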


Unfortunately, I can confirm that shfs leaks memory when using torrents (Transmission, in my case).

I usually seed torrents to a ratio of 2, after which they stop seeding.

But yesterday I removed the restrictions and began to seed; the average speed was 1-2 MB/s, and after 5-6 hours I was at 85% memory usage (at the beginning of the test it was 75%).

I have all the symptoms described above.

@limetech, this is worth paying attention to.


Also experiencing this issue. About once a week now, shfs will have consumed everything. I'm running two instances of Transmission, perma-seeding, ~2400 torrents with ~30 active at any given time. Out of 16 GB of RAM, shfs consumes around 80 MB an hour, currently at 20.9% with 40 hours of uptime. Next time I have to reboot, I'll hold off on starting the Transmission dockers to see if shfs still leaks.


Same issue here. Memory increases quickly with Transmission running and barely increases at all with it disabled. It does still seem to increase and will eventually require a restart, but the Transmission container should be a great control for developers to troubleshoot the bug in shfs.

17 hours ago, ffiarpg said:

Same issue here. Memory increases quickly with Transmission running and barely increases at all with it disabled. It does still seem to increase and will eventually require a restart, but the Transmission container should be a great control for developers to troubleshoot the bug in shfs.

 

How are you monitoring shfs memory usage?

41 minutes ago, limetech said:

 

How are you monitoring shfs memory usage?

 

With top:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                         
 4188 root      20   0 7861252 6.791g    800 S   1.0 44.0 339:39.00 shfs

With /proc status:

# cat /proc/4188/status 
Name:    shfs
Umask:    0000
State:    S (sleeping)
Tgid:    4188
Ngid:    0
Pid:    4188
PPid:    1
TracerPid:    0
Uid:    0    0    0    0
Gid:    0    0    0    0
FDSize:    512
Groups:     
NStgid:    4188
NSpid:    4188
NSpgid:    4188
NSsid:    4188
VmPeak:     7927068 kB
VmSize:     7861252 kB
VmLck:           0 kB
VmPin:           0 kB
VmHWM:     7120836 kB
VmRSS:     7120836 kB
RssAnon:     7120036 kB
RssFile:           4 kB
RssShmem:         796 kB
VmData:     7165892 kB
VmStk:         132 kB
VmExe:          60 kB
VmLib:        4568 kB
VmPTE:       14104 kB
VmPMD:          48 kB
VmSwap:           0 kB
Threads:    11
SigQ:    0/62689
SigPnd:    0000000000000000
ShdPnd:    0000000000000000
SigBlk:    0000000000000000
SigIgn:    0000000000001006
SigCgt:    0000000180004001
CapInh:    0000000000000000
CapPrm:    0000003fffffffff
CapEff:    0000003fffffffff
CapBnd:    0000003fffffffff
CapAmb:    0000000000000000
NoNewPrivs:    0
Seccomp:    0
Cpus_allowed:    ff
Cpus_allowed_list:    0-7
Mems_allowed:    00000000,00000001
Mems_allowed_list:    0
voluntary_ctxt_switches:    1
nonvoluntary_ctxt_switches:    0
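
To track the growth over time rather than taking one-off snapshots, a small loop like this also works (a sketch; there can be more than one shfs process, one per user-share mount, and the log path is just an example):

while sleep 300; do
    for pid in $(pidof shfs); do
        echo "$(date '+%F %T') pid=$pid $(grep VmRSS /proc/$pid/status)"
    done
done >> /tmp/shfs-rss.log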

 

 

1 hour ago, Djoss said:

With top:

 

Ok, please try this. Create a file in the 'config' directory on the USB flash called 'extra.cfg', with this single line in it:

 

shfsExtra=-logging 2

 

Next, unfortunately you have to Stop the array and then Start it again. This will cause 'shfs' to start dumping debug info to the syslog. Start up whatever app you think is triggering this. Depending on how much I/O the app is generating, the syslog will grow very rapidly. Let it run a while, hopefully long enough to observe the memory leakage. But the syslog might grow and consume RAM before you get to that point 9_9  Anyway, before all memory gets exhausted, please capture diagnostics, which will include the syslog. I want to see a trace of what kinds of operations are being done.

 

To stop the logging, Stop the array and delete the extra.cfg file.
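
From a console that amounts to something like this (assuming the flash device is mounted at /boot as usual):

echo 'shfsExtra=-logging 2' > /boot/config/extra.cfg    # create the file with the single line
# ... Stop/Start the array, reproduce the leak, capture diagnostics ...
rm /boot/config/extra.cfg                                # then Stop the array and remove the file to turn logging off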

 

22 hours ago, limetech said:

 

Ok, please try this. Create a file in the 'config' directory on the USB flash called 'extra.cfg', with this single line in it:

 


shfsExtra=-logging 2

 

Next, unfortunately you have to Stop the array and then Start it again. This will cause 'shfs' to start dumping debug info to the syslog. Start up whatever app you think is triggering this. Depending on how much I/O the app is generating, the syslog will grow very rapidly. Let it run a while, hopefully long enough to observe the memory leakage. But the syslog might grow and consume RAM before you get to that point 9_9  Anyway, before all memory gets exhausted, please capture diagnostics, which will include the syslog. I want to see a trace of what kinds of operations are being done.

 

To stop the logging, Stop the array and delete the extra.cfg file.

 

 

Here you go; I sent you my unaltered diagnostics via private message. I was able to run the shfs debugging for only a couple of minutes before memory was exhausted...

Hope it helps!

