
Unusual docker behaviour, eventually the whole machine becomes unresponsive



Hi! I have an i7 4790 desktop PC running Unraid with a bunch of storage and docker containers (normal stuff like Plex/Seafile, but also VS Code and JupyterLab as a remote workspace). It's been rock solid for about a year, but recently (roughly since the 6.12 update, though I'm not sure there is a direct correlation), I've been running into some very strange behaviour:

 - After Unraid starts, docker starts as well without any issues. However, once I disable it, I cannot turn it back on ("Docker service failed to start"), but I don't see anything relevant in the system log. I have to reboot, at which point it again starts without any problems.
 - After running for some non-deterministic amount of time (usually more than 2 days and less than 2 weeks), the server becomes unresponsive to anything except ping. My smart plug reports unusually high power usage for an idle machine while it is in this state.
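
A minimal sketch of one way to pin down when a freeze like this happens: a heartbeat script that appends a timestamp, the load average, and a few memory figures to a file on persistent storage, so the last line written before the hang survives the reboot. It assumes Python 3 is available on the server (or in a container with the log directory mounted); the function names and the log path in the usage note are hypothetical, not anything Unraid provides.

```python
import os
import time
from datetime import datetime, timezone


def read_meminfo(path="/proc/meminfo"):
    """Return a few fields from /proc/meminfo as a dict of kB values."""
    wanted = {"MemTotal", "MemAvailable", "SwapFree"}
    values = {}
    with open(path) as fh:
        for line in fh:
            key, _, rest = line.partition(":")
            if key in wanted:
                values[key] = int(rest.split()[0])
    return values


def heartbeat_line(meminfo, loadavg, now=None):
    """Format one log line from a meminfo dict and a load-average tuple."""
    now = now or datetime.now(timezone.utc)
    mem = " ".join(f"{k}={v}kB" for k, v in sorted(meminfo.items()))
    load = " ".join(f"{x:.2f}" for x in loadavg)
    return f"{now.isoformat()} load={load} {mem}"


def run(log_path, interval_s=60, max_beats=None):
    """Append a heartbeat line every interval_s seconds.

    Each line is flushed and fsynced so it reaches the disk even if the
    machine locks up shortly afterwards. max_beats=None means run forever.
    """
    beats = 0
    while max_beats is None or beats < max_beats:
        line = heartbeat_line(read_meminfo(), os.getloadavg())
        with open(log_path, "a") as fh:
            fh.write(line + "\n")
            fh.flush()
            os.fsync(fh.fileno())
        beats += 1
        if max_beats is None or beats < max_beats:
            time.sleep(interval_s)
```

Calling `run("/mnt/user/heartbeat/heartbeat.log")` (hypothetical path) would leave a trail whose final entry shows roughly when the machine stopped responding and whether memory or load was climbing beforehand.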

Other relevant information:
 - It also responds to ping over Tailscale, so it's not just the kernel answering pings: the VPN is still running too.
 - Since the behaviour started, I've more or less recreated all the docker containers from scratch, but the underlying images and configuration are the same.
 - Originally, I thought it was related to OOM errors, since I've been running some memory intensive computations in one of the docker containers. I've added quite aggressive memory limits to all containers, but this did not solve it.
 - I'm running "macvlan" instead of "ipvlan" because my basic router is really not happy with "ipvlan" (it drops the connection randomly for several minutes and some containers can't connect at all). I've seen warnings about "macvlan" causing kernel panics. Could this be it?
 - A year ago, before setting it up, I ran memtest and a bunch of stability tests and it all seemed good. Right now, it's physically in a place where connecting screen/keyboard is very inconvenient, so I'd like to avoid doing that if possible (ofc, I can reboot it just fine).
 - At some point, there was a GPU in the PC that was used by a Windows VM. The GPU is now out and the VM is deleted. I don't see why this should be related, yet the problems started roughly when the GPU was removed.
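
For the memory limits mentioned above, a rough sketch of how to confirm that every running container really has one applied. It relies on `docker ps -q` and `docker inspect`, whose JSON output reports the limit in bytes under `HostConfig.Memory`, with `0` meaning no limit. The helper names are made up for illustration.

```python
import json
import subprocess


def unlimited_containers(inspect_data):
    """Return names of containers whose HostConfig.Memory is 0 (no limit).

    inspect_data is the parsed JSON list produced by `docker inspect`.
    """
    names = []
    for entry in inspect_data:
        limit = entry.get("HostConfig", {}).get("Memory", 0)
        if limit == 0:
            # docker inspect reports names with a leading slash
            names.append(entry.get("Name", "").lstrip("/"))
    return names


def inspect_running():
    """Run `docker inspect` on all running containers and parse the JSON."""
    ids = subprocess.run(
        ["docker", "ps", "-q"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    if not ids:
        return []
    out = subprocess.run(
        ["docker", "inspect", *ids],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)
```

Running `print(unlimited_containers(inspect_running()))` on the server would list any container that slipped through without a limit. An empty list would support the conclusion that OOM inside a container is not the cause.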

What's the best course of action here? I'm quite used to basic management of linux systems, but I'm relatively new to Unraid.

I am more or less open to just reinstalling the whole thing. I can safely nuke the docker configuration (I keep all config for that in a separate GitHub repo), but I am a bit anxious about messing up my array in the process (I have plans for an offsite backup, but that's not going to be online for a few months still, and the freezing is driving me crazy since I actually need to work on the machine remotely from time to time).

Edited by daemontus