High RAM usage, diagnosing the culprit



So I've had a few issues recently where the server would crash, seemingly because it ran out of RAM (when I managed to log in, most Docker containers had stopped with errors and RAM usage was at almost 100% of 32GB).

I've noticed my RAM usage seems to be stable-ish at roughly 70%, but that's way more than usual.

How can I diagnose what is using all the RAM, and potentially fix it? I've looked at the advanced view of the Docker tab with per-container RAM/CPU usage and nothing stands out.
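
For anyone else trying to narrow this down, these are the standard Linux commands I'd assume apply on Unraid too (nothing here is Unraid-specific, and the paths are just the usual suspects):

```
# Overall memory picture, including cache/buffers
free -h

# Unraid's OS lives in RAM, so a nearly full rootfs shows up as used memory
df -h /

# Biggest RAM-backed directories; /mnt is real storage so it's left out
du -shx /tmp /var/log /run 2>/dev/null | sort -h
```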

 

Edit: Completely stopping the Docker service only drops usage to 60%. The only other thing running is a pfSense VM with 3GB allocated.

skynet-diagnostics-20210420-2341.zip

Edited by Boo-urns
New info

 

14 hours ago, trurl said:

Diagnostics shows rootfs 95% used. Something must be writing to RAM. The OS is in RAM, so anything that specifies a path that isn't to actual storage (some subfolder of /mnt) is a path in RAM. Check each Host Path for each of your containers.

So I found my Plex transcoding directory was mapped to /tmp, so that was probably the cause. I've changed it now. Would a restart clear the RAM usage, or do I need to manually remove the files?
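
In case it helps anyone else auditing their containers, the mappings and the /tmp usage can also be checked from the terminal; the container name below is just an example:

```
# Show host path -> container path mappings for one container (name is an example)
docker inspect --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' plex

# See how much of /tmp (i.e. RAM) the old transcode mapping had filled;
# since rootfs is in RAM, a reboot clears this anyway
du -sh /tmp/* 2>/dev/null | sort -h
```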

1 hour ago, trurl said:

reboot

So I've performed a reboot and now the server UI isn't accessible at all. The device doesn't show up in the router's DHCP leases either.

The boot screen (black screen/white text) gets to the tower login prompt, but the GUI doesn't load. I did notice it mentioned eth0 not found during the boot sequence; I'm not sure whether that's related.

Not sure how this broke it, but any suggestions for troubleshooting? I've tried safe mode both with and without the GUI, with no luck.
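
If anyone is diagnosing the same thing from the local console, these standard commands should at least show whether the NIC is being detected (again, nothing Unraid-specific):

```
# Is the onboard ethernet controller visible on the PCI bus?
lspci | grep -i ethernet

# Which interfaces exist and what state are they in?
ip link show

# Any driver or link errors during boot?
dmesg | grep -iE 'eth|link is'

# Negotiated speed / link detected on the onboard port
ethtool eth0
```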

12 minutes ago, trurl said:

Get us new diagnostics from command line as explained here

 

OK, after a few reboots it randomly showed up in the router's device list again and I could log in to the webGUI. I thought perhaps my pfSense VM could be contributing, so I disabled VMs and set PCIe ACS Override back to disabled. After rebooting, the network was down during boot again and the server was unreachable.

I did notice that the network activity light on the motherboard LAN port intermittently turned off completely, where it would otherwise stay flashing to indicate a gigabit connection. So I thought perhaps a faulty cable? I replaced the cable with no change. A faulty LAN port, maybe?

 

I restored a previous flash backup, which didn't actually change anything because it had already been updated to Unraid.net during the brief period the server was working, but somehow it booted.

I created a bond between my motherboard LAN port (eth0) and my 4-port PCIe NIC (eth1) in redundant/active-backup mode (mode 1), so I could test whether it was the motherboard LAN port, but since then it hasn't failed.
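
For keeping an eye on which port the bond is actually using, the Linux bonding driver exposes its state under /proc; this assumes the bond came up with the default name bond0:

```
# Shows the bonding mode (active-backup), the currently active slave,
# and per-port link status and failure counts
cat /proc/net/bonding/bond0

# Quick one-line status of the bond and its member interfaces
ip -br link show
```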

 

Attached are diagnostics from when Unraid wasn't booting (network down), and from now that it's working.
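
(For the not-booting case I had to grab them without the webGUI; assuming the built-in diagnostics command behaves as the linked instructions describe, it's just run from the console and the zip lands on the flash drive:)

```
# Run from the local console (or SSH if reachable); the dated zip should end up
# under /boot/logs on the flash drive, so it can be copied off with the GUI down
diagnostics
ls /boot/logs/
```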

 

The only thing I can think of is that the motherboard Ethernet port was intermittently failing, in which case I'd be looking to RMA it as it's only a few months old. However, if it's an Unraid issue I'd love to know what was causing it so I can avoid this in the future.

 

If you can make sense of the logs it would be appreciated, and massive thanks for your help so far (my RAM usage is back down to 15%, hooray!).

skynet-diagnostics-20210422-2304 (not booting).zip skynet-diagnostics-20210424-1103 (working).zip

