Out of memory errors


trajpar
Solved by Xaero


On 1/31/2022 at 11:40 AM, Squid said:

See if this helps diagnose

My system had this issue again today, and I was able to run the script.

Also, I noticed the syslog was spammed with this:

Feb 1 12:42:08 Alexandria kernel: caller _nv000649rm+0x1ad/0x200 [nvidia] mapping multiple BARs
Feb 1 12:42:10 Alexandria kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]

plugin: installing: https://raw.githubusercontent.com/Squidly271/misc-stuff/master/memorystorage.plg
plugin: downloading https://raw.githubusercontent.com/Squidly271/misc-stuff/master/memorystorage.plg
plugin: downloading: https://raw.githubusercontent.com/Squidly271/misc-stuff/master/memorystorage.plg ... done


This script may take a few minutes to run, especially if you are manually mounting a remote share outside of /mnt/disks or /mnt/remotes

/usr/bin/du --exclude=/mnt/user --exclude=/mnt/user0 --exclude=/mnt/disks --exclude=/proc --exclude=/sys --exclude=/var/lib/docker --exclude=/boot --exclude=/mnt -h -d2 / 2>/dev/null | grep -v 0$' '
132M /.cache/borg
132M /.cache
16K /.config/borg
16K /.config
4.0K /tmp/ca_notices
23M /tmp/user.scripts
9.2M /tmp/fix.common.problems
24K /tmp/unassigned.devices
11M /tmp/community.applications
112K /tmp/notifications
528K /tmp/plugins
4.0K /tmp/emhttp
44M /tmp
8.0K /etc/vulkan
4.0K /etc/OpenCL
4.0K /etc/nvidia-container-runtime
8.0K /etc/docker
4.0K /etc/netatalk
556K /etc/libvirt
260K /etc/libvirt-
4.0K /etc/pkcs11
136K /etc/lvm
8.0K /etc/libnl
8.0K /etc/ssmtp
24K /etc/samba
4.0K /etc/rsyslog.d
40K /etc/php-fpm.d
16K /etc/php-fpm
8.0K /etc/php
40K /etc/nginx
2.0M /etc/file
24K /etc/avahi
48K /etc/apcupsd
4.0K /etc/sysctl.d
48K /etc/security
232K /etc/ssl
608K /etc/ssh
100K /etc/pam.d
4.0K /etc/openldap
88K /etc/mc
36K /etc/logrotate.d
4.0K /etc/sensors.d
36K /etc/iproute2
36K /etc/modprobe.d
7.2M /etc/udev
4.0K /etc/cron.monthly
4.0K /etc/cron.hourly
12K /etc/cron.daily
4.0K /etc/cron.d
4.0K /etc/cron.weekly
12K /etc/dbus-1
4.0K /etc/sasl2
68K /etc/profile.d
56K /etc/default
304K /etc/rc.d
8.0K /etc/acpi
13M /etc
20K /usr/info
1.3M /usr/include
1.2M /usr/man
22M /usr/doc
4.0K /usr/systemtap
21M /usr/libexec
4.0M /usr/src
158M /usr/local
872M /usr/lib64
43M /usr/share
82M /usr/sbin
1.7M /usr/lib
372M /usr/bin
1.6G /usr
4.0K /lib64/xfsprogs
4.0K /lib64/e2fsprogs
972K /lib64/security
24M /lib64
21M /sbin
61M /lib/modules
4.0K /lib/systemd
76K /lib/modprobe.d
36K /lib/dhcpcd
6.5M /lib/udev
115M /lib/firmware
182M /lib
8.0K /run/blkid
4.0K /run/avahi-daemon
1004K /run/udev
1.0M /run
11M /bin
82M /var/sa
89M /var/local
4.0K /var/kerberos
24K /var/state
2.5M /var/cache
4.0K /var/lock
28K /var/tmp
20K /var/spool
1.6M /var/run
2.3M /var/log
3.5M /var/lib
181M /var
16K /root
2.2G /
0 /mnt


Finished.
NOTE: If there is any subdirectory from /mnt appearing in this list, then that (most likely) means you have a docker app which is directly referencing a non-existent disk or cache pool

script: memorystorage.plg executed
DONE


How much data are you piping through Borg? I see above that it's configured to use /.cache/borg as its cache directory. On Unraid, "/" lives in RAM, so Borg is going to cache everything in memory while it does its work. If you have individual files larger than your remaining memory, this is likely the cause.
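If the Borg cache is the culprit, one option is to point it at disk-backed storage instead. A minimal sketch using Borg's standard environment variables; the /mnt/user/appdata/borg path is a hypothetical example, so substitute a real array or pool share on your system:

# Relocate Borg's cache (default ~/.cache/borg) off the RAM-backed rootfs.
# /mnt/user/appdata/borg is a hypothetical path - use your own disk-backed share.
export BORG_CACHE_DIR=/mnt/user/appdata/borg/cache
export BORG_CONFIG_DIR=/mnt/user/appdata/borg/config
mkdir -p "$BORG_CACHE_DIR" "$BORG_CONFIG_DIR"
# Any borg command run from this shell now keeps its cache on disk.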

As far as the high CPU utilization goes, that's more or less expected with Borg. It calculates hashes of file chunks, uses those hashes to make sure chunks aren't duplicates of one another, and then compresses the chunks that remain. Hashing and compression are both computationally expensive, so high CPU usage is expected.
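If the CPU load itself becomes a problem, Borg's --compression flag lets you trade compression ratio for speed. A sketch, where the repository path is a hypothetical example:

# lz4 is Borg's fastest algorithm; zstd gives better ratios at more CPU cost.
# /mnt/disks/backup/repo is a hypothetical repository path.
borg create --compression lz4 --stats \
    /mnt/disks/backup/repo::'appdata-{now}' /mnt/user/appdata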

As far as the BAR messages in the log, those aren't related. It seems the nvidia driver is asking the kernel to map more memory than the bus window provides, which isn't completely outside of reason, but it also isn't how things *should* be written according to the Linux kernel; it's mostly an informational/debugging message.


This past weekend borg was working through a bit of data; I added my appdata backups, so roughly 60GB.

Today, borg was not running. Kids were watching Plex when the stream froze. I checked Unraid, and CPU and memory utilization were at 100%; that is when I ran the script to see what was using all the memory.

Thank you for the clarification on the BAR messages.

  • Solution
1 hour ago, trajpar said:

This past weekend borg was working through a bit of data; I added my appdata backups, so roughly 60GB.

Today, borg was not running. Kids were watching Plex when the stream froze. I checked Unraid, and CPU and memory utilization were at 100%; that is when I ran the script to see what was using all the memory.

Thank you for the clarification on the BAR messages.



The stream froze because ffmpeg got killed as a high memory consumer when the OOM procedure ran:

Jan 30 01:12:57 Alexandria kernel: Lidarr invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
...
...
Jan 30 01:12:57 Alexandria kernel: Tasks state (memory values in pages):
Jan 30 01:12:57 Alexandria kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jan 30 01:12:57 Alexandria kernel: [  19532]     0 19532  1466611  1447233 11796480        0             0 ffmpeg
...
...
Jan 30 01:12:57 Alexandria kernel: Out of memory: Killed process 19532 (ffmpeg) total-vm:5866444kB, anon-rss:5789584kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:11520kB oom_score_adj:0
Jan 30 01:12:58 Alexandria kernel: oom_reaper: reaped process 19532 (ffmpeg), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB



That just tells us what got killed as a result of the OOM state, not what was consuming the most memory.
The biggest memory consumer (by total virtual memory) at the time ffmpeg was killed was this java process:

Jan 30 01:12:57 Alexandria kernel: Tasks state (memory values in pages):
Jan 30 01:12:57 Alexandria kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jan 30 01:12:57 Alexandria kernel: [  21823]    99 21823  3000629   145993  2207744        0             0 java
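The memory values in that table are 4 KiB pages, so they can be converted by hand; for example:

# Convert the task-dump page counts above to MiB (4 KiB pages):
echo $(( 3000629 * 4 / 1024 ))   # java total_vm  ~ 11721 MiB (virtual)
echo $(( 145993 * 4 / 1024 ))    # java rss       ~   570 MiB (resident)
echo $(( 1447233 * 4 / 1024 ))   # ffmpeg rss     ~  5653 MiB (resident)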




But this table only tracks running processes; it doesn't account for files stored in tmpfs, for example. From what I can see, tmpfs doesn't appear to be using that much memory.
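For reference, tmpfs usage and the top resident consumers can be spot-checked directly instead of waiting for an OOM dump:

# Show the RAM-backed root and tmpfs mounts and how full they are:
df -h /
df -h -t tmpfs

# Top resident memory consumers (RSS in KiB):
ps axo rss,vsz,pid,comm --sort=-rss | head -n 10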

