trajpar Posted January 31, 2022

I've been having out-of-memory errors over this past weekend. I have 16GB available; yesterday I was doing an incremental backup with borg and noticed my memory usage was at 99% and my CPU usage was at 100%. I'm wondering if I have a misconfigured borg script. Logs are attached.

alexandria-diagnostics-20220131-0752.zip
Squid Posted January 31, 2022

See if this helps diagnose
trajpar Posted January 31, 2022

Thanks! I'll use this next time I'm having issues.
trajpar Posted February 1, 2022

On 1/31/2022 at 11:40 AM, Squid said:
See if this helps diagnose

My system had this issue again today, and I was able to run the script. Also, I noticed the syslog was spammed with this:

Feb 1 12:42:08 Alexandria kernel: caller _nv000649rm+0x1ad/0x200 [nvidia] mapping multiple BARs
Feb 1 12:42:10 Alexandria kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]

plugin: installing: https://raw.githubusercontent.com/Squidly271/misc-stuff/master/memorystorage.plg
plugin: downloading https://raw.githubusercontent.com/Squidly271/misc-stuff/master/memorystorage.plg
plugin: downloading: https://raw.githubusercontent.com/Squidly271/misc-stuff/master/memorystorage.plg ... done

This script may take a few minutes to run, especially if you are manually mounting a remote share outside of /mnt/disks or /mnt/remotes

/usr/bin/du --exclude=/mnt/user --exclude=/mnt/user0 --exclude=/mnt/disks --exclude=/proc --exclude=/sys --exclude=/var/lib/docker --exclude=/boot --exclude=/mnt -h -d2 / 2>/dev/null | grep -v 0$' '

132M /.cache/borg
132M /.cache
16K /.config/borg
16K /.config
4.0K /tmp/ca_notices
23M /tmp/user.scripts
9.2M /tmp/fix.common.problems
24K /tmp/unassigned.devices
11M /tmp/community.applications
112K /tmp/notifications
528K /tmp/plugins
4.0K /tmp/emhttp
44M /tmp
8.0K /etc/vulkan
4.0K /etc/OpenCL
4.0K /etc/nvidia-container-runtime
8.0K /etc/docker
4.0K /etc/netatalk
556K /etc/libvirt
260K /etc/libvirt-
4.0K /etc/pkcs11
136K /etc/lvm
8.0K /etc/libnl
8.0K /etc/ssmtp
24K /etc/samba
4.0K /etc/rsyslog.d
40K /etc/php-fpm.d
16K /etc/php-fpm
8.0K /etc/php
40K /etc/nginx
2.0M /etc/file
24K /etc/avahi
48K /etc/apcupsd
4.0K /etc/sysctl.d
48K /etc/security
232K /etc/ssl
608K /etc/ssh
100K /etc/pam.d
4.0K /etc/openldap
88K /etc/mc
36K /etc/logrotate.d
4.0K /etc/sensors.d
36K /etc/iproute2
36K /etc/modprobe.d
7.2M /etc/udev
4.0K /etc/cron.monthly
4.0K /etc/cron.hourly
12K /etc/cron.daily
4.0K /etc/cron.d
4.0K /etc/cron.weekly
12K /etc/dbus-1
4.0K /etc/sasl2
68K /etc/profile.d
56K /etc/default
304K /etc/rc.d
8.0K /etc/acpi
13M /etc
20K /usr/info
1.3M /usr/include
1.2M /usr/man
22M /usr/doc
4.0K /usr/systemtap
21M /usr/libexec
4.0M /usr/src
158M /usr/local
872M /usr/lib64
43M /usr/share
82M /usr/sbin
1.7M /usr/lib
372M /usr/bin
1.6G /usr
4.0K /lib64/xfsprogs
4.0K /lib64/e2fsprogs
972K /lib64/security
24M /lib64
21M /sbin
61M /lib/modules
4.0K /lib/systemd
76K /lib/modprobe.d
36K /lib/dhcpcd
6.5M /lib/udev
115M /lib/firmware
182M /lib
8.0K /run/blkid
4.0K /run/avahi-daemon
1004K /run/udev
1.0M /run
11M /bin
82M /var/sa
89M /var/local
4.0K /var/kerberos
24K /var/state
2.5M /var/cache
4.0K /var/lock
28K /var/tmp
20K /var/spool
1.6M /var/run
2.3M /var/log
3.5M /var/lib
181M /var
16K /root
2.2G /
0 /mnt

Finished.

NOTE: If there is any subdirectory from /mnt appearing in this list, then that means that you have (most likely) a docker app which is directly referencing a non-existent disk or cache pool

script: memorystorage.plg executed
DONE
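[Editorial note, not part of the original post: since Unraid's root filesystem and directories like /tmp are memory-backed, df reports their usage directly, which gives a quicker first look than the full du walk above. The filesystem-type assumptions here are general Linux conventions, not something stated in the thread.]

```shell
# Quick check of how much RAM the memory-backed filesystems hold.
# On Unraid "/" is rootfs and /tmp, /run, /var/log are tmpfs-style
# mounts, so their "Used" column is RAM, not disk.
df -h -t tmpfs -t rootfs 2>/dev/null

# Sizes of the usual suspects:
du -sh /tmp /run /var/log 2>/dev/null
```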
Xaero Posted February 1, 2022

How much data are you piping through borg? I see above that it's configured to use /.cache/borg as its cache directory. On Unraid, "/" lives in memory, so borg is going to cache everything in RAM while it does its work. If you have individual files larger than your remaining memory, this is likely to be the cause.

As for the high CPU utilization: that's more or less expected with borg. It calculates compressed, hashed backups of files and uses the hashes to make sure they aren't duplicates of one another before compressing. Both hashing and compression are computationally expensive, so sustained CPU load is normal.

As for the BAR messages in the log: those aren't related. It looks like the nvidia driver is asking the kernel to reserve more memory than the bus window provides, which isn't completely outside of reason, but also isn't how things *should* be written according to the Linux kernel; it's mostly an informational/debugging message.

Edited February 1, 2022 by Xaero
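[Editorial sketch, not from the thread: borg reads its cache location from the BORG_CACHE_DIR environment variable (and its security metadata location from BORG_SECURITY_DIR), so pointing both at persistent storage keeps the chunk cache out of the RAM-backed root filesystem. All /mnt/user/... paths below are hypothetical placeholders; substitute a real share or pool.]

```shell
#!/bin/sh
# Sketch: keep borg's cache and security metadata on persistent
# storage instead of the RAM-backed "/" on Unraid.
# The paths below are hypothetical -- adjust to your system.
export BORG_REPO=/mnt/user/backups/borg-repo
export BORG_CACHE_DIR=/mnt/user/appdata/borg/cache
export BORG_SECURITY_DIR=/mnt/user/appdata/borg/security

mkdir -p "$BORG_CACHE_DIR" "$BORG_SECURITY_DIR"

# With the environment set, run the backup as usual, e.g.:
# borg create --stats "::{hostname}-{now}" /mnt/user/appdata
```

If the backup script runs from cron or the User Scripts plugin, the exports need to be inside that script, since each run starts with a fresh environment.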
trajpar Posted February 1, 2022

This past weekend borg was working through a fair bit of data; I added my appdata backups, so 60GB? Today, borg was not running. The kids were watching Plex when the stream froze. I checked Unraid and the CPU and memory utilization were at 100%; that is when I ran the script to see what was using all the memory. Thank you for the clarification on the BAR messages.
Xaero Posted February 1, 2022 (Solution)

1 hour ago, trajpar said:
This past weekend borg was working through a fair bit of data; I added my appdata backups, so 60GB? Today, borg was not running. The kids were watching Plex when the stream froze. I checked Unraid and the CPU and memory utilization were at 100%; that is when I ran the script to see what was using all the memory. Thank you for the clarification on the BAR messages.

The stream froze because ffmpeg got killed as a high memory consumer when the OOM procedure ran:

Jan 30 01:12:57 Alexandria kernel: Lidarr invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
...
Jan 30 01:12:57 Alexandria kernel: Tasks state (memory values in pages):
Jan 30 01:12:57 Alexandria kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Jan 30 01:12:57 Alexandria kernel: [ 19532] 0 19532 1466611 1447233 11796480 0 0 ffmpeg
...
Jan 30 01:12:57 Alexandria kernel: Out of memory: Killed process 19532 (ffmpeg) total-vm:5866444kB, anon-rss:5789584kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:11520kB oom_score_adj:0
Jan 30 01:12:58 Alexandria kernel: oom_reaper: reaped process 19532 (ffmpeg), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

That just tells us what got killed as a result of the OOM state, not necessarily what was consuming the most memory. The largest process by virtual size (total_vm) at the time ffmpeg was killed was this java process:

Jan 30 01:12:57 Alexandria kernel: Tasks state (memory values in pages):
Jan 30 01:12:57 Alexandria kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Jan 30 01:12:57 Alexandria kernel: [ 21823] 99 21823 3000629 145993 2207744 0 0 java

But this table only keeps track of running processes, not files stored in tmpfs, for example. From what I can see, tmpfs doesn't appear to be using that much memory.

Edited February 1, 2022 by Xaero
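[Editorial sketch, not from the thread: the rss column in the kernel's OOM task dump is counted in 4 KiB pages, so a quick conversion makes the table above concrete, and ps can produce the same ranking on a live system without waiting for an OOM. The arithmetic uses the numbers quoted from the log.]

```shell
# The kernel's OOM task dump reports rss in 4 KiB pages.
# Converting the two entries quoted above to MiB:
echo $((1447233 * 4 / 1024))   # ffmpeg: ~5653 MiB resident
echo $((145993 * 4 / 1024))    # java:   ~570 MiB resident

# On a live system, the same ranking (RSS in KiB, largest first):
ps -eo pid,rss,comm --sort=-rss | head -n 10
```

Note that by resident memory ffmpeg itself was the largest consumer at kill time; the java process was larger only in virtual size (total_vm).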
trajpar Posted February 2, 2022

Thanks @Xaero for looking into this. I'll be on the lookout for any java processes running.