February 29, 2024
I recently started seeing an out-of-memory (OOM) error that kills one of my VMs at various times during the day. I have 128 GB of total memory, and the single VM I run is using 64 GB. I recently upgraded the server CPU, but I put the previous CPU back in and the issue still occurs. I also tried reducing the VM to 16 GB, and the OOM error still occurs, but less frequently. Thank you for any assistance you can provide.
Edited March 12, 2024 by JudasD
February 29, 2024 (Community Expert)
Possibly something else is causing the issue. The problem with OOM errors is that we can only see what got killed, not what caused it. It's also usually not just a matter of too little RAM but of fragmented RAM, and what gets killed is usually whatever is using the most RAM. Do you have any other VMs or Docker containers running?
March 2, 2024 (Author)
I have narrowed down the timing: the issue occurs when the daily script schedule runs (MyServersGetJS.sh, fix.common.problems.sh, logrotate, tailscale-daily, user.script.start.daily.sh, etc.). I only have a single VM using 32 GB of memory, in a system with 128 GB total. I do have Docker containers running, but the issue still occurs even with all of them shut down.
March 3, 2024 (Community Expert)
Do you have any user scripts running at that time? The other ones you mentioned should not be a problem, but I guess it could still be one of them.
March 3, 2024 (Community Expert)
On 2/29/2024 at 12:53 PM, JudasD said: "Thank you for any assistance you can provide"
mprime is probably your culprit. As soon as it started, it took down the system. Your VM was just the first thing the reaper saw; after that, it killed the mprime process itself because it was using 132 GB of RAM:
Feb 28 21:50:13 Server kernel: process '/downloads/prime95/mprime' started with executable stack
Feb 28 21:50:32 Server kernel: mprime invoked oom-killer: gfp_mask=0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO), order=0, oom_score_adj=0
Feb 28 21:50:40 Server kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=55a7482a9a2d896c5ce337890b0711b65664b2c735e2fb64aabe326efab7bb4e,mems_allowed=0-1,global_oom,task_memcg=/,task=mprime,pid=31853,uid=0
Feb 28 21:50:40 Server kernel: Out of memory: Killed process 31853 (mprime) total-vm:132210072kB, anon-rss:128343568kB, file-rss:0kB, shmem-rss:2352kB, UID:0 pgtables:251728kB oom_score_adj:0
Feb 28 21:50:44 Server kernel: oom_reaper: reaped process 31853 (mprime), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
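Because the kernel log only records what got killed, one way to check whether the OOM events really line up with the daily schedule is to list every kill with its timestamp. The following is a minimal Python sketch, not an existing tool. It assumes the "Out of memory: Killed process ..." line format quoted above (the regex is built from those lines only, and other kernel versions may word them differently). The script name oom_kills.py and the idea of passing it the syslog path (for example /var/log/syslog) are hypothetical.

```python
import re
import sys
from dataclasses import dataclass
from typing import Iterable, List

# Built from the kernel lines quoted in this thread, e.g.
# "Feb 28 21:50:40 Server kernel: Out of memory: Killed process 31853 (mprime) total-vm:...kB, anon-rss:...kB"
OOM_KILL_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]*)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


@dataclass
class OomKill:
    timestamp: str  # syslog timestamp as written (no year)
    pid: int
    name: str
    total_vm_kb: int
    anon_rss_kb: int


def parse_oom_kills(lines: Iterable[str]) -> List[OomKill]:
    """Return one OomKill per 'Out of memory: Killed process' line; other lines are ignored."""
    kills = []
    for line in lines:
        m = OOM_KILL_RE.search(line)
        if m:
            kills.append(OomKill(
                timestamp=m["ts"],
                pid=int(m["pid"]),
                name=m["name"],
                total_vm_kb=int(m["total_vm"]),
                anon_rss_kb=int(m["anon_rss"]),
            ))
    return kills


# Hypothetical usage: python oom_kills.py /var/log/syslog
# Only runs when a log path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        for k in parse_oom_kills(f):
            gib = k.anon_rss_kb / 1024**2  # kB -> GiB
            print(f"{k.timestamp}  killed {k.name} (pid {k.pid}), anon-rss {gib:.1f} GiB")
```

If every reported kill falls at the time the daily schedule runs, that supports the timing theory. If the only kills are the mprime ones, the log does not show any other OOM event.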
March 3, 2024 (Author)
I ran mprime manually earlier that day as an attempt to test my memory, etc. The OOM instance I am reporting happened well after mprime was run. I do not have any other scripts running at the same time as the "daily" schedule. If I move the daily schedule, the OOM issue follows the execution time. It is the Fix Common Problems plugin that reports the issue, because it is part of the daily script.
March 4, 2024 (Community Expert)
14 hours ago, JudasD said: "mprime was manually run earlier in that day. This was my attempt at trying to test my memory, etc. The OOM instance i am reporting was well after mprime was executed. I do not have any other scripts running at the same time as the "daily" schedule. If i move the daily schedule, the OOM issue follows the execution time. It is the fix common problems plugin that reports the issue because it is part of the daily script."
Do you have any other logs? The only OOM instances in your attached diagnostics were mprime-related.
March 12, 2024 (Author, marked as Solution)
The issue was traced to a bent pin in the CPU #1 socket on the motherboard. All is well now; thank you, everyone, for your input. Initially I believed the Fix Common Problems plugin was complaining about OOM and exiting. I now realize the plugin was alerting me to an OOM event in the logs. That had me chasing my tail for a bit, but I now understand the scope of the plugin better.
Edited March 12, 2024 by JudasD