• [6.12.1] Docker RAM usage increase


    Shagon
    • Minor

    It seems that, compared to 6.11.5, Docker is using more RAM overall.

    Looking at the last 7 days, usage hovered around 2GB. After installing 6.12.1 and rebooting it jumped to 6GB; after manually restarting each container it now hovers around 3GB.

     

    110515940_Screenshot2023-06-23at11_54_40AM.thumb.png.7adc2c38316b6ed3298318c5a1f57863.png

     

     

    Taking a closer look at the last 12 hours:

    410629850_Screenshot2023-06-23at11_54_56AM.thumb.png.0d23c42d62c2ba56e40380ab34162a3e.png

     

    We can clearly see the jump in RAM usage and the point where I restarted the containers. Overall usage is simply higher than with the Docker version shipped in 6.11.5.
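    For anyone wanting a single number to compare before/after an upgrade, here is a rough sketch that totals per-container memory from `docker stats` output. The `sum_mem` helper name and the sample input are my own, not from this thread:

```shell
# Sum the "MEM USAGE" column of `docker stats` output into one total.
# On a live system, pipe in:
#   docker stats --no-stream --format '{{.Name}}\t{{.MemUsage}}'
sum_mem() {
  awk -F'\t' '{
    split($2, a, " / ")                     # "107.1MiB / 15.49GiB" -> usage part
    v = a[1]
    if (v ~ /GiB/)      { sub(/GiB/, "", v); total += v * 1024 }
    else if (v ~ /MiB/) { sub(/MiB/, "", v); total += v }
  } END { printf "%.1f MiB\n", total }'
}

# Demo with made-up sample values:
printf 'adguard\t107.1MiB / 15.49GiB\nplex\t1.2GiB / 15.49GiB\n' | sum_mem
# -> 1335.9 MiB
```

    Running this before and after an OS upgrade gives a comparable baseline, independent of what the dashboard graphs report.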

    flareon-diagnostics-20230623-1157.zip




    Recommended Comments

    Additional issue: the machine crashed for no apparent reason. Rolling back to 6.11.5, as I had zero issues on that version for months.

    Link to comment

    I would agree that Docker RAM seems to have increased a lot since 6.12. I have no issues with the server though.

    (I made no change to my containers)

     

    image.thumb.png.0f06e1fc32b723c8140d8f2034fd1164.png

    Link to comment

    I don't have before/after graphs, but I have 32GB RAM and 60 containers running and am at ~50% RAM usage, which is about the same as before.

    Edited by Kilrah
    Link to comment

    RAM usage with Docker 20.10.21 is fine. My initial thought was that a large change in Docker introduced this; it might be a memory leak introduced in Docker version 23, but I can't pinpoint the cause. Either way, it's a significant side effect: usage seems to jump 50% compared to the previous version, even though no changes were made to the containers.

     

    The main issue I have now is that 6.12.1 is unstable, crashing the OS after several hours.

    I've reverted to 6.11.5 as it is much more stable; I ran it for months without issues.

     

    Looking at posts on Reddit as well as here, it seems I'm not the only one. Narrowing down the issue, my guess is that the newer kernel's support for Intel CPUs (I have an i3-10100) isn't as solid (10th or 11th gen issues?).

     

    Either way, I'm happy being on the previous version for a few more months until everything is resolved :)

     

     

    Link to comment

    The question is:

    • is it an Unraid problem?
    • or a Linux + Docker problem (i.e. new kernel + new Docker version)?

     

    If it's the second, there is not much that can be done here.

    Link to comment

    I'm not sure I see a significant improvement with 6.12.2.

    It seems that some containers are more affected than others (Plex and Jellyfin, for example, in my case).

     

    image.thumb.png.8f6098bc3aad63a232c5fc76fcbd0cd0.png

     

    On the other hand, I am wondering whether it is an actual RAM usage problem or a reporting issue from Docker and/or the kernel's interpretation (my table above uses 'docker_container_mem').

     

    Over the same time period as the image above (15 days), my system RAM usage does not seem to fluctuate much, while Docker is supposed to have gone from an average of 8-9GB on 6.11 to ~27GB.

    My total RAM is 64GB, so going from 9GB to 27GB should be clearly visible on the system.

     

    image.png.21c65430f97b396c21170885e394dcf3.png

     

    The Unraid Dashboard is consistent with my Grafana dashboard.

    image.png.b4e8a9d22bb232cc9a86b3d1507198f5.png

    Link to comment

    Getting back to the crashes, I narrowed them down to ipvlan; for some reason it crashes more often than macvlan for me. The macvlan setting is more stable, at least on my 16GB DDR4, B460M-K, i3-10100 system: with it I managed 6+ months without a single crash, while with ipvlan the system crashed 3 times in 48 hours (twice on 6.12.1 and once on 6.11.5). My only conclusion is that ipvlan doesn't play well with this system.

     

    That being said, I am hesitant to upgrade to 6.12.2 given the RAM usage and the crashes above. I can live with the RAM usage, since I can just buy more RAM, but crashes are a big issue: I run DNS for the household as well as Plex, which _requires_ 100% uptime as family members do not use Netflix or other subscription services 😄

     

    Is there any way for me to troubleshoot why ipvlan crashes the system? Or better yet, are there any plans to deprecate macvlan and remove it from future versions of Unraid?

    I'm basically looking for a stable solution whatever it might be.

    Link to comment

    In my setup I narrowed the Docker crash down to a container that interacts with the network layer. I'm using 6.12.1 with ipvlan.

     

    https://github.com/thrnz/docker-wireguard-pia

     

    The idea is to tell other containers to use this one for their networking. The container acts as a VPN router that redirects traffic to Private Internet Access.

     

    As soon as I removed it, my server was fully stable again, so my guess is there are some stability issues in the networking stack.

    To be clear, I already had these issues on 6.11.5, so I think there is an underlying issue in Docker networking.

     

    Weirdly enough, if I use a container with built-in WireGuard capabilities, it just works and the server stays stable.

    Link to comment

    Have you tried switching the network driver (Settings > Docker) from macvlan to ipvlan?

    Link to comment

    I tried ipvlan, but the system seems to crash with it sometimes. Ever since I rolled back to 6.11.5 I've had only one issue (which I already reported, though it seems nothing can be done about it).

    Considering the increased RAM usage and everything else, I am tempted to leave the system alone until 6.12.5 or a later version, as daily reboots are not something my family will tolerate (it's a media server for 4 people).

     

    I really need to make sure the system is stable before updating. macvlan seems stable in my usage: I've had zero reboots because of it until today. I changed to ipvlan and had issues on 6.12.2, with at least 3 reboots in 2 days.

     

    If there's anything I need to do to make 6.12.x work properly on my machine, please let me know. Otherwise, I can't have daily crashes that might corrupt the data on disk, or have the server be down, since multiple people in my household use it (several services, mostly Plex and book reading).

    Link to comment

    Even with the above, I can take the increased RAM usage; it's the system stability that's critical. Trying out ipvlan vs macvlan and tweaking things is something I could do if I were alone, but with a family that _depends_ on the media on the server, I can't change things on the fly without knowing whether it will crash while I'm away :(

     

    Again, I am willing to do anything to ensure system stability. I don't even run VMs on this thing, just Docker containers. Unraid, for me at least, is a better alternative than other software offering the same thing: a stable OS with upgrade support and a friendly, helpful community.

     

    As a family person, I am done with the days when I could crimp my own networking cables and fine-tune the system to my liking; now I just need things to work :)

    Link to comment

    I've performed the following changes today:

     

    - Updated OS to 6.12.3 (`6.11.5` => `6.12.3`)

    - Updated BIOS to latest version (was a few versions behind)

    - Removed 5 containers

    - Added `--log-driver none --no-healthcheck` to all containers

     

    Docker custom network type is still set to `macvlan`; I'll monitor for any crashes.
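    For context, these flags go into each container's "Extra Parameters" field (Advanced View) in Unraid, or directly on a `docker run` line. A minimal sketch; the container name and image below are hypothetical, not the ones from my setup:

```shell
# Hypothetical example of the same flags on a plain `docker run`:
#   --log-driver none  -> discard container stdout/stderr instead of storing it
#   --no-healthcheck   -> skip the image's HEALTHCHECK (each check forks a process)
docker run -d --name yarr \
  --log-driver none \
  --no-healthcheck \
  ghcr.io/nkanaev/yarr:latest
```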

     

    Link to comment

    Not an hour later I see the following via `dmesg`:

     


    [Mon Jul 24 14:40:18 2023] ------------[ cut here ]------------
    [Mon Jul 24 14:40:18 2023] WARNING: CPU: 2 PID: 80 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
    [Mon Jul 24 14:40:18 2023] Modules linked in: af_packet xt_mark veth xt_nat xt_tcpudp macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat xt_addrtype br_netfilter xfs ip6table_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp iosf_mbi drm_buddy i2c_algo_bit coretemp ttm crct10dif_pclmul crc32_pclmul drm_display_helper crc32c_intel joydev ghash_clmulni_intel input_leds sha512_ssse3 mei_hdcp mei_pxp drm_kms_helper wmi_bmof mxm_wmi aesni_intel drm crypto_simd cryptd intel_gtt rapl intel_cstate r8169 i2c_i801 ahci mei_me agpgart i2c_smbus syscopyarea sysfillrect hid_apple sysimgblt tpm_crb intel_uncore i2c_core mei libahci realtek led_class fb_sys_fops thermal fan tpm_tis
    [Mon Jul 24 14:40:18 2023]  video tpm_tis_core wmi tpm backlight intel_pmc_core acpi_pad acpi_tad button unix
    [Mon Jul 24 14:40:18 2023] CPU: 2 PID: 80 Comm: kworker/u16:3 Tainted: P           O       6.1.38-Unraid #2
    [Mon Jul 24 14:40:18 2023] Hardware name: ASUS System Product Name/PRIME B460M-K, BIOS 1620 07/09/2021
    [Mon Jul 24 14:40:18 2023] Workqueue: events_unbound macvlan_process_broadcast [macvlan]
    [Mon Jul 24 14:40:18 2023] RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
    [Mon Jul 24 14:40:18 2023] Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01
    [Mon Jul 24 14:40:18 2023] RSP: 0018:ffffc900001a8d98 EFLAGS: 00010202
    [Mon Jul 24 14:40:18 2023] RAX: 0000000000000001 RBX: ffff888327e34600 RCX: 09c939e45870cc48
    [Mon Jul 24 14:40:18 2023] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff888327e34600
    [Mon Jul 24 14:40:18 2023] RBP: 0000000000000001 R08: a1f12dc52477d268 R09: 0f8457e5fca31fda
    [Mon Jul 24 14:40:18 2023] R10: fa2f08dcb877b495 R11: ffffc900001a8d60 R12: ffffffff82a11d00
    [Mon Jul 24 14:40:18 2023] R13: 000000000002667b R14: ffff8883205b8e00 R15: 0000000000000000
    [Mon Jul 24 14:40:18 2023] FS:  0000000000000000(0000) GS:ffff88845f480000(0000) knlGS:0000000000000000
    [Mon Jul 24 14:40:18 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [Mon Jul 24 14:40:18 2023] CR2: 00001480549716da CR3: 0000000161f58001 CR4: 00000000003706e0
    [Mon Jul 24 14:40:18 2023] Call Trace:
    [Mon Jul 24 14:40:18 2023]  <IRQ>
    [Mon Jul 24 14:40:18 2023]  ? __warn+0xab/0x122
    [Mon Jul 24 14:40:18 2023]  ? report_bug+0x109/0x17e
    [Mon Jul 24 14:40:18 2023]  ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
    [Mon Jul 24 14:40:18 2023]  ? handle_bug+0x41/0x6f
    [Mon Jul 24 14:40:18 2023]  ? exc_invalid_op+0x13/0x60
    [Mon Jul 24 14:40:18 2023]  ? asm_exc_invalid_op+0x16/0x20
    [Mon Jul 24 14:40:18 2023]  ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
    [Mon Jul 24 14:40:18 2023]  ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]
    [Mon Jul 24 14:40:18 2023]  ? nf_nat_inet_fn+0x60/0x1a8 [nf_nat]
    [Mon Jul 24 14:40:18 2023]  nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
    [Mon Jul 24 14:40:18 2023]  nf_hook_slow+0x3a/0x96
    [Mon Jul 24 14:40:18 2023]  ? ip_protocol_deliver_rcu+0x164/0x164
    [Mon Jul 24 14:40:18 2023]  NF_HOOK.constprop.0+0x79/0xd9
    [Mon Jul 24 14:40:18 2023]  ? ip_protocol_deliver_rcu+0x164/0x164
    [Mon Jul 24 14:40:18 2023]  __netif_receive_skb_one_core+0x77/0x9c
    [Mon Jul 24 14:40:18 2023]  process_backlog+0x8c/0x116
    [Mon Jul 24 14:40:18 2023]  __napi_poll.constprop.0+0x28/0x124
    [Mon Jul 24 14:40:18 2023]  net_rx_action+0x159/0x24f
    [Mon Jul 24 14:40:18 2023]  __do_softirq+0x126/0x288
    [Mon Jul 24 14:40:18 2023]  do_softirq+0x7f/0xab
    [Mon Jul 24 14:40:18 2023]  </IRQ>
    [Mon Jul 24 14:40:18 2023]  <TASK>
    [Mon Jul 24 14:40:18 2023]  __local_bh_enable_ip+0x4c/0x6b
    [Mon Jul 24 14:40:18 2023]  netif_rx+0x52/0x5a
    [Mon Jul 24 14:40:18 2023]  macvlan_broadcast+0x10a/0x150 [macvlan]
    [Mon Jul 24 14:40:18 2023]  ? _raw_spin_unlock+0x14/0x29
    [Mon Jul 24 14:40:18 2023]  macvlan_process_broadcast+0xbc/0x12f [macvlan]
    [Mon Jul 24 14:40:18 2023]  process_one_work+0x1a8/0x295
    [Mon Jul 24 14:40:18 2023]  worker_thread+0x18b/0x244
    [Mon Jul 24 14:40:18 2023]  ? rescuer_thread+0x281/0x281
    [Mon Jul 24 14:40:18 2023]  kthread+0xe4/0xef
    [Mon Jul 24 14:40:18 2023]  ? kthread_complete_and_exit+0x1b/0x1b
    [Mon Jul 24 14:40:18 2023]  ret_from_fork+0x1f/0x30
    [Mon Jul 24 14:40:18 2023]  </TASK>
    [Mon Jul 24 14:40:18 2023] ---[ end trace 0000000000000000 ]---

     

    Changing "Docker custom network type" to "ipvlan" and monitoring for issues.

     

    Decided to change "Host access to custom networks" to "disabled" as well; I wasn't sure why it was enabled.

    Edit: figured that one out. I route my containers through AdGuard for ad blocking/DNS, so without this setting it doesn't work :(

    I then figured out how to get containers to resolve DNS from AdGuard anyway: I put them on a custom Docker network, and in Unraid's own network settings I just use a static IP in the 172.18.0.0/16 range.

    Edited by Shagon
    Link to comment

    Hello @ChatNoir 👋 Just wanted to keep you updated regarding this case and have everything condensed into one post.

     

    6 hours ago I made the following changes on Unraid:

     

    - Updated OS to 6.12.3 (`6.11.5` => `6.12.3`)

    - Updated BIOS to latest version (was a few versions behind)

    - Removed 5 containers

    - Added `--log-driver none --no-healthcheck` to all containers

     

    With that, "Docker custom network type" was set to "macvlan". 30 minutes later I got the nasty kernel bug - see comment - comment-with-bug.

     

    I've changed the Docker custom network type to "ipvlan" and disabled "Host access to custom networks", since the latter uses a macvlan method to expose routes.

     

    After a while I noticed something wasn't working: it turns out "Host access to custom networks" is rather useful if you want containers communicating with other containers, particularly the AdGuard container I use. Normal communication between containers, e.g. sending a request with curl, worked, but DNS did not.

    Adding every container to a custom network and using Settings > Network Settings > IPv4 default gateway allowed me to set a default DNS for both Unraid and Docker, because I gave AdGuard a static IP.

    For bonus points, I added a static IP for every container and then created a client in AdGuard to track which containers send which requests.
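    The setup described above could be sketched roughly like this; the network name, image tags, and addresses are illustrative examples, not the exact values used here:

```shell
# Hypothetical sketch: a user-defined Docker network with a fixed subnet,
# AdGuard pinned to a static IP on it, and other containers pointed at
# that IP for DNS.
docker network create --subnet=172.18.0.0/16 dockernet

docker run -d --name adguard \
  --network dockernet --ip 172.18.0.53 \
  adguard/adguardhome:latest

# Any other container joins the same network, gets its own static IP,
# and resolves DNS through AdGuard:
docker run -d --name sonarr \
  --network dockernet --ip 172.18.0.10 \
  --dns 172.18.0.53 \
  lscr.io/linuxserver/sonarr:latest
```

    Giving each container its own `--ip` is what makes the per-client tracking in AdGuard possible.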

     

    Getting back to the issue at hand: to test stability, I watched a movie, had my family use the server normally, and observed logs via `dmesg -T --follow`. I am happy to report that after 6 hours I see no errors in the dmesg log. However, working in IT, I know it might take more time to manifest, so I will monitor for the next week or so and report my findings back.
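    A small filter makes that monitoring easier by showing only the relevant kernel messages. The `watch_macvlan` helper name is my own for this sketch:

```shell
# Filter a kernel log stream down to the macvlan/conntrack lines from the
# trace above. On a live system you would pipe `dmesg -T --follow` into it.
watch_macvlan() {
  grep -iE 'macvlan|nf_conntrack'
}

# Demo on two sample lines (only the first should match):
printf '%s\n' \
  'WARNING: ... __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]' \
  'usb 1-1: new high-speed USB device' | watch_macvlan
```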

     

    The reason I tagged you is that I think people using "Host access to custom networks" may be hitting the macvlan errors in dmesg, and it might be a good idea to add that information to the documentation.

     

    Hopefully this is the end of the stability issues I reported. I'll also continue monitoring RAM usage, but that is a minor issue compared to crashes caused by the Docker network driver.

    Link to comment
    2 minutes ago, Shagon said:

    Hello @ChatNoir 👋 Just wanted to keep you updated regarding this case and have everything condensed into one post.

    Thanks for that, but I do not see any macvlan or other issues; I was just responding to your thread title about the RAM increase.

     

    And on that topic, it is still higher (though not problematic for my use case). Here is the view from the last 90 days.

    image.png.ca9943bf997832368c8d806920847d8c.png

    The main culprit seems to be Plex, going from around 300MB previously to 1-11GB now. Maybe the --no-healthcheck option could help; I'll look into it (maybe) when I have some vacation time at home.

    Link to comment

    24 hours later, the system has no errors in the dmesg log. I believe the root cause was macvlan combined with host access to custom networks.

     

    In terms of Docker usage:

     

    # docker stats --no-stream --format='table {{.Name}}\t{{.MemUsage}}' | (sed -u 1q; sort)
    NAME             MEM USAGE / LIMIT
    adguard          107.1MiB / 15.49GiB
    calibre-web      142.7MiB / 15.49GiB
    filebrowser      19.52MiB / 15.49GiB
    home-assistant   274.4MiB / 15.49GiB
    homepage         115.2MiB / 15.49GiB
    plex             507.2MiB / 15.49GiB
    prowlarr         156MiB / 15.49GiB
    qbittorrent      45.89MiB / 15.49GiB
    radarr           275.7MiB / 15.49GiB
    sonarr           355.4MiB / 15.49GiB
    tautulli         77.74MiB / 15.49GiB
    yarr             29.61MiB / 15.49GiB

     

    Mine went back to somewhat normal usage. Every container has "--no-healthcheck --log-driver none", except Plex, which has "--device=/dev/dri --log-driver none --no-healthcheck" as I need Quick Sync 😄

     

    Overall, after 24 hours I'm satisfied with the system stability. I will continue to monitor until the end of the week (July 30th) and report back then, or earlier if I see any issues.

    Link to comment

    Almost a week later: no issues. macvlan was to blame for the freezes and reboots. As for Docker RAM usage, I've had no time to troubleshoot it (20.10.24 on 6.12.3 seems about the same as on 6.11.5, I think). If a future Docker upgrade makes usage jump 50% again, I'll just buy a 16GB stick to replace the 8GB one.

    Link to comment



