High RAM usage, diagnosing the culprit



So I've had a few issues recently where the server would crash, seemingly because it ran out of RAM (when I managed to log in, most Docker containers had stopped with errors and RAM usage was at almost 100% of 32GB).

I've noticed my RAM usage seems to be stable-ish at roughly 70%, but that's way more than usual.

How can I diagnose what is using all the RAM, and potentially fix it? I've looked at the advanced view of the Docker tab with per-container RAM/CPU usage and nothing stands out.
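
For anyone else trying to narrow this down, these are the standard Linux commands I'd assume apply on Unraid too (nothing here is Unraid-specific, and the paths are just the usual suspects):

```
# Overall memory picture, including cache/buffers
free -h

# Unraid's OS lives in RAM, so a nearly full rootfs shows up as used memory
df -h /

# Biggest RAM-backed directories; /mnt is real storage so it's left out
du -shx /tmp /var/log /run 2>/dev/null | sort -h
```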

 

Edit: Completely stopping the Docker service only drops usage to 60%. The only other thing running is a pfSense VM with 3GB allocated.

skynet-diagnostics-20210420-2341.zip

Edited by Boo-urns
New info

 

14 hours ago, trurl said:

Diagnostics shows rootfs 95% used. Something must be writing to RAM. The OS is in RAM, so anything that specifies a path that isn't to actual storage (some subfolder of /mnt) is a path in RAM. Check each Host Path for each of your containers.

So I found my Plex transcoding directory was mapped to /tmp, so that was probably the cause. I've changed it now. Would a restart clear the RAM usage, or do I need to manually remove the files?
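
In case it helps anyone else auditing their containers, the mappings and the /tmp usage can also be checked from the terminal; the container name below is just an example:

```
# Show host path -> container path mappings for one container (name is an example)
docker inspect --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' plex

# See how much of /tmp (i.e. RAM) the old transcode mapping had filled;
# since rootfs is in RAM, a reboot clears this anyway
du -sh /tmp/* 2>/dev/null | sort -h
```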

1 hour ago, trurl said:

reboot

So I've performed a reboot and now the server UI isn't accessible at all. The device doesn't show up in the router's DHCP leases either.

The boot screen (black screen/white text) gets to the tower login prompt, but the GUI doesn't load. I did notice it mentioned eth0 not found during the boot sequence; I'm not sure whether that's related.

Not sure how this broke it, but any suggestions for troubleshooting? I've tried safe mode both with and without the GUI, with no luck.
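
If anyone is diagnosing the same thing from the local console, these standard commands should at least show whether the NIC is being detected (again, nothing Unraid-specific):

```
# Is the onboard ethernet controller visible on the PCI bus?
lspci | grep -i ethernet

# Which interfaces exist and what state are they in?
ip link show

# Any driver or link errors during boot?
dmesg | grep -iE 'eth|link is'

# Negotiated speed / link detected on the onboard port
ethtool eth0
```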

12 minutes ago, trurl said:

Get us new diagnostics from command line as explained here

 

OK, after a few reboots it randomly showed up in the router's device list again and I could log in to the webGUI. I thought perhaps my pfSense VM could be contributing, so I disabled VMs and set PCIe ACS Override back to disabled. After rebooting, the network was down during boot again and the server was unreachable.

I did notice that the network activity light on the motherboard LAN port intermittently turned off completely, where it would otherwise stay flashing to indicate a gigabit connection. So I thought perhaps a faulty cable? I replaced the cable with no change. A faulty LAN port, maybe?

 

I restored a previous flash backup, which didn't actually change anything because it had already been updated to Unraid.net during the brief period the server was working, but somehow it booted.

I created a bond between my motherboard LAN port (eth0) and my 4-port PCIe NIC (eth1) in redundant/active-backup mode (mode 1), so I could test whether it was the motherboard LAN port, but since then it hasn't failed.
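
For keeping an eye on which port the bond is actually using, the Linux bonding driver exposes its state under /proc; this assumes the bond came up with the default name bond0:

```
# Shows the bonding mode (active-backup), the currently active slave,
# and per-port link status and failure counts
cat /proc/net/bonding/bond0

# Quick one-line status of the bond and its member interfaces
ip -br link show
```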

 

Attached are diagnostics from when Unraid wasn't booting (network down), and from now that it's working.
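
(For the not-booting case I had to grab them without the webGUI; assuming the built-in diagnostics command behaves as the linked instructions describe, it's just run from the console and the zip lands on the flash drive:)

```
# Run from the local console (or SSH if reachable); the dated zip should end up
# under /boot/logs on the flash drive, so it can be copied off with the GUI down
diagnostics
ls /boot/logs/
```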

 

The only thing I can think of is that the motherboard Ethernet port was intermittently failing, in which case I'd be looking to RMA it as it's only a few months old. However, if it's an Unraid issue I'd love to know what was causing it so I can avoid this in the future.

 

If you can make sense of the logs it would be appreciated, and massive thanks for your help so far (my RAM usage is back down to 15%, hooray!).

skynet-diagnostics-20210422-2304 (not booting).zip skynet-diagnostics-20210424-1103 (working).zip

