  • OOM (out of memory), maybe Docker related?


    alturismo
    • Solved Minor

    This is now the second OOM crash in the past few weeks.

     

    I noticed this morning that several Docker containers had been killed; I had already noticed last night during Emby playback that Emby shut down, but I wasn't motivated to look into it right away ;)

     

    Personal assumption: it happens the longer the server runs, with something "hanging on" to memory (shared memory) instead of freeing it up.

     

    I checked this morning and saw several containers had been killed:

     

    Suche "oom" (47 Treffer in 1 Dateien von 1 gesucht) [Normal]
      C:\Users\alturismo\AppData\Local\Temp\632114af-2f42-41fc-94d5-446b0c47dbcf_alsserverii-diagnostics-20240814-0434.zip.bcf\alsserverii-diagnostics-20240814-0434\logs\syslog.1.txt (47 Treffer)
    	Zeile 4024: Aug 13 00:24:45 AlsServerII kernel: rcloneorig invoked oom-killer: gfp_mask=0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO), order=0, oom_score_adj=0
    	Zeile 4031: Aug 13 00:24:45 AlsServerII kernel: oom_kill_process+0x7d/0x184
    	Zeile 4081: Aug 13 00:24:45 AlsServerII kernel: [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
    	Zeile 4527: Aug 13 00:24:45 AlsServerII kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/docker/8688c7367a96c96da6996ec4e082782508d92c47b43ab9cd38619f59d9bf30c1,task=EmbyServer,pid=1500523,uid=2
    	Zeile 4528: Aug 13 00:24:45 AlsServerII kernel: Out of memory: Killed process 1500523 (EmbyServer) total-vm:275401408kB, anon-rss:515100kB, file-rss:52kB, shmem-rss:97920kB, UID:2 pgtables:2424kB oom_score_adj:0
    	Zeile 4543: Aug 13 01:05:03 AlsServerII kernel: smbd-notifyd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
    	Zeile 4550: Aug 13 01:05:03 AlsServerII kernel: oom_kill_process+0x7d/0x184
    	Zeile 4604: Aug 13 01:05:03 AlsServerII kernel: [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
    	Zeile 4990: Aug 13 01:05:03 AlsServerII kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/docker/54852d7f57308b4b8c4335610f98215fd422eeab221b9294b09ea4840e02ab8c,task=python3,pid=3596017,uid=99
    	Zeile 4991: Aug 13 01:05:03 AlsServerII kernel: Out of memory: Killed process 3596017 (python3) total-vm:1331040kB, anon-rss:583548kB, file-rss:180kB, shmem-rss:0kB, UID:99 pgtables:1784kB oom_score_adj:0
    	Zeile 4992: Aug 13 01:05:03 AlsServerII kernel: oom_reaper: reaped process 3596017 (python3), now anon-rss:240kB, file-rss:180kB, shmem-rss:0kB
    	Zeile 4993: Aug 13 01:09:04 AlsServerII kernel: containerd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
    	Zeile 5000: Aug 13 01:09:04 AlsServerII kernel: oom_kill_process+0x7d/0x184
    	Zeile 5055: Aug 13 01:09:04 AlsServerII kernel: [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
    	Zeile 5450: Aug 13 01:09:04 AlsServerII kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/docker/b01c49c8be7955574befc3997a4c7fb7b83e7218d0e2a21d2b7a4d96437e4800,task=java,pid=20153,uid=99
    	Zeile 5451: Aug 13 01:09:04 AlsServerII kernel: Out of memory: Killed process 20153 (java) total-vm:10040192kB, anon-rss:470360kB, file-rss:80kB, shmem-rss:0kB, UID:99 pgtables:1244kB oom_score_adj:0
    	Zeile 5452: Aug 13 01:09:04 AlsServerII kernel: oom_reaper: reaped process 20153 (java), now anon-rss:44kB, file-rss:80kB, shmem-rss:0kB
    	Zeile 5453: Aug 13 01:10:05 AlsServerII kernel: containerd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
    	Zeile 5460: Aug 13 01:10:05 AlsServerII kernel: oom_kill_process+0x7d/0x184
    	Zeile 5514: Aug 13 01:10:05 AlsServerII kernel: [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
    	Zeile 5910: Aug 13 01:10:05 AlsServerII kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/docker/fb88ff9a1ebef5b72af29302b403b3071e02c6237a43ce2cb37f2b5ad58a792f,task=xteve,pid=24857,uid=0
    	Zeile 5911: Aug 13 01:10:05 AlsServerII kernel: Out of memory: Killed process 24857 (xteve) total-vm:1743692kB, anon-rss:426236kB, file-rss:140kB, shmem-rss:0kB, UID:0 pgtables:2140kB oom_score_adj:0
    	Zeile 5912: Aug 13 01:10:05 AlsServerII kernel: oom_reaper: reaped process 24857 (xteve), now anon-rss:88kB, file-rss:140kB, shmem-rss:0kB
    	Zeile 5914: Aug 13 01:12:41 AlsServerII kernel: containerd-shim invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=1
    	Zeile 5921: Aug 13 01:12:41 AlsServerII kernel: oom_kill_process+0x7d/0x184
    	Zeile 5976: Aug 13 01:12:41 AlsServerII kernel: [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
    	Zeile 6370: Aug 13 01:12:41 AlsServerII kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/docker/74cfda3ab6f968004873a2918cdeabe9704ba259cbb43bfed2dc7e7ca3eafb61,task=java,pid=2078583,uid=99
    	Zeile 6371: Aug 13 01:12:41 AlsServerII kernel: Out of memory: Killed process 2078583 (java) total-vm:9939208kB, anon-rss:386024kB, file-rss:192kB, shmem-rss:228kB, UID:99 pgtables:1240kB oom_score_adj:0
    	Zeile 6372: Aug 13 01:12:41 AlsServerII kernel: oom_reaper: reaped process 2078583 (java), now anon-rss:160kB, file-rss:192kB, shmem-rss:228kB
    	Zeile 6378: Aug 13 01:15:08 AlsServerII kernel: rcloneorig invoked oom-killer: gfp_mask=0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO), order=0, oom_score_adj=0
    	Zeile 6385: Aug 13 01:15:09 AlsServerII kernel: oom_kill_process+0x7d/0x184
    	Zeile 6435: Aug 13 01:15:09 AlsServerII kernel: [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
    	Zeile 6805: Aug 13 01:15:09 AlsServerII kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/docker/87f3d71cc2fa14f91b0c68492bf0a5967451d7e689b47be564f44883d2c7ff21,task=Radarr,pid=25456,uid=99
    	Zeile 6806: Aug 13 01:15:09 AlsServerII kernel: Out of memory: Killed process 25456 (Radarr) total-vm:4578508kB, anon-rss:358644kB, file-rss:160kB, shmem-rss:0kB, UID:99 pgtables:2976kB oom_score_adj:0
    	Zeile 6807: Aug 13 01:15:09 AlsServerII kernel: oom_reaper: reaped process 25456 (Radarr), now anon-rss:0kB, file-rss:160kB, shmem-rss:0kB
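
    (As a side note, these OOM events can also be pulled straight from the running server instead of searching the diagnostics zip; a minimal sketch, assuming the default Unraid syslog location:)

        # list OOM killer events and the processes that were killed
        grep -iE 'invoked oom-killer|Out of memory' /var/log/syslog*
        # the kernel ring buffer usually carries the same entries
        dmesg | grep -i 'out of memory'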

     

    None of these were actually doing much: python3 was Home Assistant (active as always), java (JDownloader) for example was idle, xteve ... also idle ... so I would assume the problem isn't directly tied to these running containers.

     

    What really puzzled me then: I turned off all services.

     

    VM service > no change, as there is no active VM running > expected

    Docker service > ~20% down > expected

     

    With all services off, and even after stopping the array, unmounting the rclone shares, and unmounting the UAD SMB shares, I still had almost 50% RAM usage.

     

    [screenshot: ~50% RAM usage with all services off]

     

    I also tried sync followed by dropping the caches (echoing 1, 2, 3 into drop_caches) ... that made about a 1% difference overall, so not really helpful ...
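
    (For clarity, this is the usual cache-dropping procedure meant above; note that it only frees reclaimable page cache, dentries and inodes, not tmpfs or shared memory:)

        # flush dirty pages first
        sync
        # 1 = page cache, 2 = dentries and inodes, 3 = both
        echo 1 > /proc/sys/vm/drop_caches
        echo 2 > /proc/sys/vm/drop_caches
        echo 3 > /proc/sys/vm/drop_caches
        # this does not touch tmpfs or shared memory (Shmem), so "stuck" shared memory stays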

     

    What really puzzled me then was the docker folder usage shown while the Docker service was off:

     

    [screenshot: docker folder view with the Docker service off]

     

    My docker folder is on the cache drive and usually ~50-60% filled, so it shouldn't show this; I looked closer.

     

    The 86% corresponds to 13.3 GB ... from what? I assume this is the culprit ...

     

    [screenshot: docker folder usage detail showing 13.3 GB used]

     

    Even with the Docker service off it still keeps 13+ GB "somewhere"; maybe that is the shared RAM usage mentioned above?

    Something like 86% of a 16 GB filesystem, which would be ~50% of my RAM ... ?
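
    (The numbers would at least fit a tmpfs: by default a tmpfs is sized at half of physical RAM, so 86% of a 16 GB tmpfs is roughly 13.3 GB. A minimal sketch to see which RAM-backed filesystems exist and how full they are; which mount actually holds the data is still the open question here:)

        # list RAM-backed filesystems with size and usage (on Unraid the rootfs also lives in RAM)
        df -h | grep -E 'Filesystem|tmpfs|rootfs'
        # largest top-level directories on the root filesystem, staying on one filesystem
        du -xh -d1 / 2>/dev/null | sort -h | tail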

     

    My cache drive looks like this (roughly equal to the docker folder usage when the Docker service is turned on):

     

    [screenshot: cache drive usage]

     

    So these 13+ GB really puzzle me. I made a diagnostics (attached) and rebooted again without the Docker and VM services.

     

    [screenshot: memory usage after the reboot without Docker/VM services]

     

    Now I started all services again; this is the starting point with all services up and running:

     

    ~20% RAM overall > expected

    ~57% docker folder > expected

     

    [screenshot: dashboard after starting all services again]

     

    So, finally, my assumption is that all those docker updates etc. are slowly (but surely) filling up the RAM; maybe something stays "stuck" in the mounted /var/lib/docker overlay and doesn't get freed even when the Docker service is turned off completely?
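
    (One way to check that assumption, sketched under the assumption that the docker folder sits at the default /var/lib/docker path: compare what Docker itself accounts for against what is actually on disk, and prune leftovers from old updates:)

        # with the Docker service running: what Docker accounts for
        docker system df
        # remove unused images/containers/networks left over from updates (asks for confirmation)
        docker system prune
        # what is actually stored in the docker folder on disk
        du -sh /var/lib/docker/* 2>/dev/null | sort -h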

     

    I'm actually out of ideas here.

     

    Sadly I can't say whether the first OOM error I had here was already on 7.0 beta1 or still on 6.12 ...

     

    Thanks in advance for taking a look; syslog.1 is where it all happens.

     

    alsserverii-diagnostics-20240814-0434.zip





    Recommended Comments

    This should probably be in the general support forum. Emby was using a lot of RAM, but since there were more events after that, it may not be the only issue; still, check its configuration or limit its RAM usage and retest.
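
    (As a sketch of what limiting a container's RAM can look like; in Unraid this would typically go into the container template's Extra Parameters field, and the values and container name here are only examples:)

        # cap the container at 2 GB (example value); add to Extra Parameters in the template
        --memory=2g --memory-swap=2g
        # or apply to an already running container
        docker update --memory=2g --memory-swap=2g EmbyServer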

    8 hours ago, JorgeB said:

    still, check its configuration or limit its RAM usage and retest.

    Configuration is checked, and Emby is barely running here (and if so, only direct play); for remote playback with transcoding I use Plex, as it's the better choice for me there.

     

    I'll keep an eye on it and try to follow along as it grows and grows ... but my concern remains: even after shutting all containers down, the RAM didn't get freed up, which is why I still assume something is wrong "under the hood".

     

    This is the shut-down state ... no containers, no services ... so what are those 13.3 GB used for? Somewhere in a tmpfs ...

    Overall still at 48% usage, so shutting down the Docker service after the machine had been running for 14 days freed those ~17%, but coming from ...

     

    [screenshot: ~48% RAM usage with everything shut down]

     

    Freshly rebooted, I start at ~20% with all services up and running.

     

    I'll keep watching it now and report back here.

     

    Maybe you have some ideas what could cause these 13+ GB to stay somewhere ...

    8 minutes ago, alturismo said:

    and emby barely running here

    It was using a lot of RAM for something barely running, so I would recommend limiting its RAM usage.

     

    9 minutes ago, alturismo said:

    and those 13.3 GB used for?

    That's not RAM usage, it's the size of the docker folder.

     

    If you update to 7.0.0 it will show the total RAM used by the docker service:

     

    [screenshot: Unraid 7.0.0 showing the total RAM used by the Docker service]
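
    (On earlier versions a rough per-container view is also available from the command line; a minimal sketch:)

        # per-container memory usage snapshot
        docker stats --no-stream --format 'table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}'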

     

    51 minutes ago, JorgeB said:

    That's not RAM usage, it's the size of the docker folder.

     

    Sorry, but nope; my docker folder is much larger, since I'm using a docker directory, and with the Docker service running I see the proper value:

     

    [screenshot: docker folder usage with the Docker service running]

     

    The 13.3 GB mentioned above come from a tmpfs "somewhere" ... while the Docker service was off ;) nothing was running anymore.

     

    And as mentioned, 86% (13.3 GB) of ~100% (16 GB) would be ~50% of my total RAM (32 GB),

    used while no containers are running and the Docker service is turned off.
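
    (One way to settle where those 13.3 GB actually live would be to check, with the Docker service off, which filesystem backs the path the GUI reports; a sketch assuming the default /var/lib/docker path:)

        # which filesystem actually backs the docker folder path right now?
        findmnt --target /var/lib/docker
        df -h /var/lib/docker
        # if it falls back to the RAM-backed rootfs while the service is off,
        # the 86% would be root filesystem (RAM) usage rather than the cache drive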

     

    I'll keep checking to figure out what is growing in the background ... and I'll report back.


    @JorgeB as a note, from yesterday to today the RAM filled up fast.

     

    From 30% to 76% now; more info will follow.

     

    [screenshot: dashboard RAM usage at 76%]

     

    While I can say, AFAIK, the containers didn't grow accordingly (monitored separately).

     

    Upper chart is Plex and Emby, actually freed up ;) lower chart is overall Docker usage.

     

    [screenshot: container memory charts; Plex/Emby on top, overall Docker usage below]

     

    htop now again shows this shared memory locked:

     

    [screenshot: htop showing large shared memory usage]
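
    (The shared figure in htop should correspond to Shmem in /proc/meminfo, i.e. tmpfs plus shared memory; a quick sketch to see how big it is and where the data might sit, the paths being just the usual tmpfs candidates:)

        # shared / tmpfs-backed memory as the kernel counts it
        grep -E 'Shmem:|MemAvailable:' /proc/meminfo
        # biggest entries in the usual tmpfs-backed locations
        du -sh /dev/shm/* /tmp/* 2>/dev/null | sort -h | tail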

     

    I'll try to nail it down further; just as a note.

    On 8/14/2024 at 6:47 PM, JorgeB said:

    It was using a lot of RAM for something barely running, so I would recommend limiting its RAM usage.

     

    For now I'd say I can exclude Docker usage.

     

    As seen before, this is the usage now with all systems off:

     

    [screenshots: memory usage with all systems off]

     

    [screenshot]

     

    Also, all remote shares are offline:

     

    [screenshot: remote shares offline]

     

    What I can say is that I have a cloud backup running every Monday to Tuesday night, which would fit the issues that have been happening for a little while now.

     

    So I'll investigate this further and report back.


    OK @JorgeB, it really looks like it's related to the cloud backups onto mounted rclone remotes.

     

    Tested now with and without the Realtek drivers: starting a sync to a mount will immediately fill up the RAM,

     

    [screenshot: RAM usage rising during the sync]

     

    which is never given back ...
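
    (A sketch for catching this in real time during a sync; the cache path below is only rclone's default location and an assumption here:)

        # watch overall RAM and the rclone cache directory grow while the sync runs
        watch -n 10 'free -h; du -sh /root/.cache/rclone 2>/dev/null'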

     

    I can say I've been using this kind of sync for over a year and never experienced any RAM issues.

     

    What I changed is the source: it was local before and is now a remote SSH source, so I'll test this too once I have some more time ...

    For now I've stopped the cloud sync backups here.

     

    At least I guess I have one answer now ;) thanks @ich777 for pointing me in this direction.


    After a few tests I can say that all copies to remote mounted shares (rclone) end up in this scenario:

     

    shared memory gets locked here

     

    It was one purple stripe; after two files to two different targets with two different mount types, it went from one stripe to seven, 25% > 36% RAM usage.

     

    [screenshot: memory bar, one purple stripe grown to seven]

     

    as info


    OK, confirmed it's related to the rclone VFS cache; this can be closed here. Either a kernel compatibility issue or rclone ... @JorgeB

     

    The VFS cache is not cleaned up as expected; cleaning it manually works.
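
    (For anyone hitting the same thing, a minimal sketch of bounding the VFS cache or moving it off the RAM-backed root; the remote name and paths are examples, the flags are standard rclone options:)

        # mount with a bounded VFS cache kept on disk instead of under /root
        rclone mount remote: /mnt/remotes/cloud \
            --vfs-cache-mode writes \
            --cache-dir /mnt/cache/rclone \
            --vfs-cache-max-size 5G \
            --vfs-cache-max-age 1h &
        # manual cleanup of the default cache location (only while no mount is using it)
        rm -rf /root/.cache/rclone/vfs/*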




