Docker Disk Usage 100%; Service Won't Start



Hey all,

I noticed my Docker disk usage was at 100% and all of my containers had stopped.

I've read quite a few threads suggesting this can happen when a container is set up incorrectly, so that the data it downloads goes to the wrong folder.

I have been unable to figure out what is responsible.

I increased the "Docker vDisk size:" setting from 40 GB to 50 GB, but the Docker service still reports "Docker Service failed to start."

Any advice is appreciated!

Attached is my diagnostics, as I've seen many people ask for this data.

tower-diagnostics-20210124-1449.zip

Link to comment

Ah okay that makes sense, I remember one of my two cache disks disconnecting. I did not realize that would cause an issue.

I ran `btrfs dev stats /mnt/cache` and got the following output:

 

[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 0
[/dev/sdb1].generation_errs 0
[/dev/sdc1].write_io_errs 1507246927
[/dev/sdc1].read_io_errs 137577961
[/dev/sdc1].flush_io_errs 19733411
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0


That matches what you showed me from the diagnostics output above.
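Incidentally, that output can be scanned without reading every line. `check_dev_stats` below is a made-up helper (not a btrfs or Unraid command) that just sums the counters, assuming the standard two-column `btrfs dev stats` format:

```shell
# Hypothetical helper: read `btrfs dev stats` output on stdin and
# sum the final column; zero means no device has logged errors.
check_dev_stats() {
    awk '{ total += $NF } END { print (total == 0 ? "clean" : total " errors logged") }'
}

# Usage:
#   btrfs dev stats /mnt/cache | check_dev_stats
```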

So I'm a bit confused by your link above, and I'm trying to be extra careful to avoid any data loss, as I have 50+ containers configured.

I've re-seated the cable to the cache that disconnected and believe that is resolved. I've also reset the btrfs dev stats. I'm now at the point where I want to:
 

Quote

Finally run a scrub, make sure there are no uncorrectable errors...



As I want to be able to bring the Docker containers back online. Any advice?

Edited by GrehgyHils
Link to comment

Apologies, I just reread what I wrote and realized it wasn't clear. What I'm trying to express is that I don't actually know which command one runs to perform the scrub. I ran

 

btrfs dev stats -z /mnt/cache

 

and the output now shows no errors:

 

Quote

btrfs dev stats -z /mnt/cache
[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  0
[/dev/sdb1].generation_errs  0
[/dev/sdc1].write_io_errs    0
[/dev/sdc1].read_io_errs     0
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  0
[/dev/sdc1].generation_errs  0

 

If `btrfs dev stats` is not the correct way to perform a scrub, can you help me understand what is? I've googled this with respect to Unraid and have not been able to piece it together.

Thank you for your patience

Link to comment

I apologize but I'm still not seeing this.

If I navigate to the page for the cache drive (sdc1), I see sections like:
 

  • Cache 2 Settings
  • SMART Settings
  • Self-Test
  • Attributes
  • Capabilities
  • Identity

I see the SMART tests I could run but I do not see, nor did CTRL + F find, anything named scrub.

Am I misunderstanding something?

Link to comment

Ah! That was absolutely my problem here.

Okay! I began a scrub with "repair corrupted blocks" selected. Since I have two 500 GB SSDs and a slow CPU, I imagine it might take a while.

I'll let the scrub run, then rerun the stats command above to ensure no more errors appear.
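For later readers: the GUI scrub button drives the `btrfs scrub` subcommands. A minimal sketch, wrapped in made-up helper names, assuming the pool is mounted at `/mnt/cache`:

```shell
# Hypothetical wrappers around the btrfs scrub subcommands.
run_scrub() {
    btrfs scrub start "$1"   # begins the scrub in the background
}

scrub_progress() {
    btrfs scrub status "$1"  # progress, rate, and error summary
}

# Usage:
#   run_scrub /mnt/cache
#   scrub_progress /mnt/cache
```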

From there I'll learn what "recreate the docker image" means in this context and give that a go.

Thanks for your help so far! :)

Link to comment

Recreate docker.img as only 20G. That should be more than enough. Making it larger won't fix the problem of filling it; it will only make it take longer to fill.

 

When you use Previous Apps to reinstall your dockers, do them one at a time and see if you can figure out what is causing you to fill it. Any application that writes much data is a suspect. Each application must be configured to only write to a path that corresponds to a container path in the mappings. Linux is case-sensitive so make sure you get that correct.

Link to comment

Okay so it looks like the scrub process finished successfully:

 

UUID:             some-uuid
Scrub started:    Mon Jan 25 08:41:42 2021
Status:           finished
Duration:         1:08:11
Total to scrub:   318.25GiB
Rate:             79.66MiB/s
Error summary:    verify=18593 csum=1805142
  Corrected:      1823735
  Uncorrectable:  0
  Unverified:     0
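A summary like that can also be checked in a script. `scrub_ok` below is a hypothetical helper that parses the `Status:` and `Uncorrectable:` fields of `btrfs scrub status` output (the same fields shown above):

```shell
# Hypothetical helper: read `btrfs scrub status` output on stdin;
# succeed only when the scrub finished with zero uncorrectable errors.
scrub_ok() {
    awk '
        /^Status:/       { finished = ($2 == "finished") }
        /Uncorrectable:/ { unc = $2 }
        END { exit !(finished && unc == 0) }
    '
}

# Usage:
#   btrfs scrub status /mnt/cache | scrub_ok && echo "safe to proceed"
```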


Everything is back to working as expected! So a big thank you to JorgeB and Trurl. I'm going to document what happened and what I did so that the next person can hopefully have less panic than I experienced.

What happened:

  • One of my two cache drives, which are pooled together, disconnected at some point
  • I reconnected the cache drive. This caused errors when reading or writing data that had been updated on the other disk while it was offline
  • Unrelated, but my `docker.img` disk usage was climbing and I ignored it, until it hit 100% and all my Docker containers stopped, as did the Docker daemon


What I did to resolve the problem:

  • Stopped the Docker service
  • Ran a cache scrub by selecting the first disk in the pool (with "repair corrupted blocks" selected)
  • Verified the errors were cleared by running `btrfs dev stats /mnt/cache`
  • Backed up my `docker.img` file just in case (this might not be needed)
  • Deleted the original `docker.img` file
  • Moved the `docker.img` location to `/mnt/cache/docker.img`, as opposed to the original location of `/mnt/user/system/docker/docker.img`
  • Lowered the `docker.img` file size from 60 GB to 40 GB, as an experiment to try to fix the issue
  • Turned Docker back on, which created a new `docker.img`
  • Went to the Apps tab and used "Previous Apps", which let me batch-install all my old Docker containers with their original templates already selected

What I have not yet figured out or resolved:

  • Figure out which container was the original culprit in filling my `docker.img`. Lots of forum posts, and replies above, suggest I have a misconfigured container that is writing incorrectly into `docker.img`. If anyone has tips on how to debug this, it'd be appreciated!
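One way to start narrowing that down (a sketch, not Unraid-specific; the helper name is made up): `docker ps --size` reports each running container's writable-layer size, and a large first number, as opposed to the "virtual" size, usually points at a container writing inside `docker.img` instead of to a mapped path.

```shell
# Hypothetical helper: list each running container with its
# writable-layer size ("SIZE (virtual ...)"); the first number is
# data written inside docker.img, not the shared image layers.
container_write_sizes() {
    docker ps --size --format '{{.Names}}\t{{.Size}}'
}
```

Run it once, let the containers do their normal work for a while, then run it again and watch which SIZE keeps growing.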

Thanks again everyone

Link to comment
