Docker Fails to Start; After Resetting Docker, Apps Fail to Download


ct3426


I was using my server when some of my docker containers started crashing on me. I thought it was a specific app and tried restarting it, but it hung on restart. I couldn't shut the server down cleanly because docker couldn't stop, so I had to hard reset.

 

Docker is on a 4 drive ZFS cache with 64GB RAM allocated

 

- On server startup, hangs on docker start

- Disabling docker allows for server startup

- Turning docker on, hangs again

- With docker off, I changed the docker storage from 'directory' to a new clean directory - the server is able to start with docker enabled, but when I go to add a previous app, it gets stuck on "please wait...." with several of my CPU cores pinned at 100%

- Tried using both kinds of disk based docker shares, same issue - it will start up if there are no apps, but if I attempt to install an app, it hangs and then the server cannot be stopped (sometimes it does successfully kill the docker process when I try to shut down).
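For anyone trying to tell real CPU load apart from processes that are stuck waiting on storage (some dashboards count I/O wait as busy CPU), one option is to look for tasks in uninterruptible sleep, which Linux reports as state `D` in `/proc/<pid>/stat`. The following is a minimal sketch, assuming a Linux host with `/proc` mounted; the function names `parse_stat_line` and `blocked_processes` are made up for illustration and are not part of Unraid or Docker.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx].strip())
    comm = line[open_idx + 1:close_idx]
    state = line[close_idx + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for processes currently in state 'D'."""
    blocked = []
    if not os.path.isdir(proc_root):
        return blocked
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            # The process exited mid-scan or the line was unreadable.
            continue
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

If the same docker-related processes keep showing up in this list across several runs, that suggests they are waiting on disk I/O rather than doing CPU work. A single snapshot proves little, since processes pass through state `D` briefly during normal operation.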

 

Doing a ZFS scrub right now just in case, but it's going very slowly.

 

At first I assumed that it was a specific docker image, but at this point I've tried all the flavors of docker storage on fresh directories, with different apps both previous and new. I also tried updating to Unraid 6.12.4 from 6.12.3 just in case - the upgrade went fine.

 

Edited by ct3426
Added Diagnostics. Removed diagnostics once issue gone.
Link to comment

Posting in case anyone runs into this:

- I found that putting my docker folder on an array-only drive worked

- My ZFS scrub completed, didn't find anything wrong

- I tried switching back to my original ZFS cache pool docker folder... and this time it worked, everything ran fine.

 

So even though the scrub didn't find any issues, it seemed to resolve the problem - or maybe just waiting overnight fixed something somehow.

Link to comment

Adding for posterity - the problem was not solved; it just took slightly longer for my docker images to start failing on me when I used BTRFS.

 

I replaced the cache drives entirely, and now everything is working fine. Turns out my consumer grade (and low end consumer grade at that, with no DRAM) SSDs did not play nice with ZFS RAIDZ. It worked great for a while, but it put a high level of wear on the drives - not enough to start seeing SMART errors or any direct signs that anything was wrong, but enough that they slowed down to the point of hanging anything trying to work with them.

 

So if anyone runs into trouble even just downloading docker images, and things hang on docker start, keep in mind that this can happen if your cache drives are SSDs that have reached the "really exceptionally slow on write" failure point.
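A rough way to check whether a pool has reached that point is to time a handful of synced writes to it. This is only a sketch, not a proper benchmark: the function name `fsync_write_probe`, the block size, and the block count are arbitrary choices for illustration, and the path in the usage note is an assumption about where the pool is mounted.

```python
import os
import tempfile
import time


def fsync_write_probe(directory, block_size=1024 * 1024, blocks=16):
    """Write `blocks` chunks to a temp file in `directory`, fsync each one,
    and return simple timing statistics."""
    # Random data, so filesystem compression cannot shrink the writes.
    payload = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="write_probe_")
    try:
        with os.fdopen(fd, "wb") as fh:
            for _ in range(blocks):
                start = time.perf_counter()
                fh.write(payload)
                fh.flush()
                os.fsync(fh.fileno())  # force the data out to the device
                latencies.append(time.perf_counter() - start)
    finally:
        os.unlink(path)  # always clean up the probe file
    total = sum(latencies)
    mib_written = block_size * blocks / (1024 * 1024)
    return {
        "mib_per_s": mib_written / total if total > 0 else float("inf"),
        "median_s": sorted(latencies)[len(latencies) // 2],
        "worst_s": max(latencies),
    }


if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(fsync_write_probe(target))
```

Run it with the pool's mount point as the argument (for example a directory under `/mnt/cache`, if that is where your pool lives) and compare against a drive you know is healthy. Individual synced writes that take seconds rather than milliseconds would be consistent with the behavior described above, although a busy pool or a scrub in progress can also inflate the numbers.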

Link to comment
