GrehgyHils Posted January 24, 2021
Hey all, I noticed my Docker disk space was at 100% and all my containers had stopped. I've read quite a few threads suggesting a container may be set up incorrectly so that the data it downloads goes to the wrong folder, but I have been unable to figure out what is responsible. I raised the Docker "Docker vDisk size:" setting from 40 GB to 50 GB, and the Docker service still reports "Docker Service failed to start." Any advice is appreciated! Attached is my diagnostics, as I've seen many people ask for this data. tower-diagnostics-20210124-1449.zip
JorgeB Posted January 25, 2021
This shows that one of the cache devices dropped offline in the past:

Jan 24 14:47:07 tower kernel: BTRFS info (device sdb1): bdev /dev/sdc1 errs: wr 1507246927, rd 137577961, flush 19733411, corrupt 0, gen 0

See here for what to do and how to better monitor a pool, then recreate the docker image.
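For readers who want to monitor a pool from the command line, here is a minimal sketch. The `check_stats` helper is illustrative (not part of the original posts), and the sample counters are the ones from this thread so the logic is visible without a live pool:

```shell
# Flag any nonzero error counter in `btrfs dev stats` output.
# On a live pool you would run:  btrfs dev stats /mnt/cache | check_stats
check_stats() {
  awk '$2 > 0 { print "WARNING: " $1 " = " $2 }'
}

# Sample counters (taken from this thread) stand in for real output here:
printf '%s\n' \
  '[/dev/sdb1].write_io_errs 0' \
  '[/dev/sdc1].write_io_errs 1507246927' \
  '[/dev/sdc1].read_io_errs 137577961' | check_stats
# prints a WARNING line for each of the two nonzero sdc1 counters
```

Something like this could be dropped into a scheduled job so a dropped device gets noticed before docker.img fills or data is lost.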
GrehgyHils Posted January 25, 2021 (edited)
Ah okay, that makes sense; I remember one of my two cache disks disconnecting, but I did not realize it would cause an issue. I ran `btrfs dev stats /mnt/cache` and got this output:

[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 0
[/dev/sdb1].generation_errs 0
[/dev/sdc1].write_io_errs 1507246927
[/dev/sdc1].read_io_errs 137577961
[/dev/sdc1].flush_io_errs 19733411
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0

which matches what you quoted from the diagnostics above. I'm a bit confused by your link, and I'm trying to be extra careful not to cause any data loss, as I have 50+ containers configured. I've re-seated the cable to the cache disk that disconnected and believe that is resolved, and I've also reset the btrfs dev stats. I'm now at the point where I want to:

Quote: "Finally run a scrub, make sure there are no uncorrectable errors..."

so that I can bring the Docker containers back online. Any advice?
Edited January 25, 2021 by GrehgyHils
JorgeB Posted January 25, 2021
Did you run the scrub and confirm all errors were corrected?
GrehgyHils Posted January 25, 2021 Author Share Posted January 25, 2021 Apologies, I just reread what I wrote and realized what I wrote wasn't clear. I'm trying to express that I don't actually follow what command one runs to perform the scrub.I ran btrfs dev stats -z /mnt/cache and the output now shows no errors: Quote btrfs dev stats -z /mnt/cache [/dev/sdb1].write_io_errs 0 [/dev/sdb1].read_io_errs 0 [/dev/sdb1].flush_io_errs 0 [/dev/sdb1].corruption_errs 0 [/dev/sdb1].generation_errs 0 [/dev/sdc1].write_io_errs 0 [/dev/sdc1].read_io_errs 0 [/dev/sdc1].flush_io_errs 0 [/dev/sdc1].corruption_errs 0 [/dev/sdc1].generation_errs 0 If the btrfs dev command was not the correct way to perform a scrub, can you help me understand that? I've googled this, with respect to unraid, and have not been able to piece that together. Thank you for your patience Quote Link to comment
JorgeB Posted January 25, 2021
You can run the scrub by clicking on cache on the main page, then scrolling down to the scrub section.
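For anyone who prefers a terminal over the GUI, the standard btrfs-progs equivalents look like this (the `/mnt/cache` mount point assumes Unraid's usual cache pool path):

```shell
# Start a scrub on the cache pool (it runs in the background):
btrfs scrub start /mnt/cache

# Check progress and results while it runs, or after it finishes:
btrfs scrub status /mnt/cache

# Once the scrub completes, re-check the per-device error counters:
btrfs dev stats /mnt/cache
```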
GrehgyHils Posted January 25, 2021
I apologize, but I'm still not seeing this. If I navigate to the cache drive (sdc1)'s page, I see sections like:

Cache 2 Settings
SMART Settings
Self-Test
Attributes
Capabilities
Identity

I see the SMART tests I could run, but I do not see (nor did Ctrl+F find) anything named scrub. Am I misunderstanding something?
JorgeB Posted January 25, 2021 Share Posted January 25, 2021 Need to click on cache1, then: Quote Link to comment
GrehgyHils Posted January 25, 2021
Ah! That was absolutely my problem. Okay, I began a scrub with "repair corrupted blocks" selected. Since I have two 500 GB SSDs, I imagine my slow CPU might take a while. I'll let it run, then rerun the command above to make sure no more errors appear. From there I'll learn what "recreate the docker image" means in this context and give that a go. Thanks for your help so far!
JorgeB Posted January 25, 2021
4 minutes ago, GrehgyHils said: "I'll learn what 'recreate the docker image' means in this context and give that a go."
https://forums.unraid.net/topic/57181-docker-faq/?do=findComment&comment=564309
trurl Posted January 25, 2021
Recreate docker.img at only 20G; that should be more than enough. Making it larger won't fix the problem of filling it, it will only make it take longer to fill. When you use Previous Apps to reinstall your dockers, do them one at a time and see if you can figure out which one is filling it. Any application that writes much data is a suspect. Each application must be configured to write only to a path that corresponds to a container path in its mappings. Linux is case-sensitive, so make sure you get that correct.
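To make the mapping point concrete, here is a hypothetical example (the container name, image, and paths are purely illustrative, not from this thread):

```shell
# Hypothetical volume mapping: the host share /mnt/user/downloads appears
# inside the container as /downloads.
docker run -d --name myapp -v /mnt/user/downloads:/downloads myimage

# Inside the application's own settings, the download path must then be set
# to /downloads exactly. Because Linux is case-sensitive, /Downloads
# (capital D) is a different, unmapped path, so anything written there
# lands in the container's writable layer inside docker.img and fills it.
```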
GrehgyHils Posted January 25, 2021
Okay, it looks like the scrub process finished successfully:

UUID: some-uuid
Scrub started: Mon Jan 25 08:41:42 2021
Status: finished
Duration: 1:08:11
Total to scrub: 318.25GiB
Rate: 79.66MiB/s
Error summary: verify=18593 csum=1805142
Corrected: 1823735
Uncorrectable: 0
Unverified: 0

Everything is back to working as expected, so a big thank you to JorgeB and trurl. I'm going to document what happened and what I did, so that the next person can hopefully have less panic than I experienced.

What happened:
- One of my two cache drives, which are in a pool together, disconnected at some point. I did not realize this would cause an issue.
- I reconnected the cache drive. This caused problems when reading or writing sections that had been updated on the first disk in the meantime.
- Separately, my `docker.img` disk use was climbing and I ignored it, until it hit 100% and all my Docker containers stopped, as did the Docker daemon.

What I did to resolve the problem:
- Stopped the Docker service.
- Ran a cache scrub by selecting the first disk in the pool (with "repair corrupted blocks" selected).
- Verified the corruptions were fixed by running `btrfs dev stats /mnt/cache`.
- Backed up my `docker.img` file just in case (this might not be needed).
- Deleted the original `docker.img` file.
- Moved the `docker.img` location to `/mnt/cache/docker.img`, as opposed to the original location of `/mnt/user/system/docker/docker.img`.
- Lowered the `docker.img` size from 60 GB to 40 GB; this was an experiment I had been performing to try to fix the issue.
- Turned Docker back on, which created a new image file.
- Went to the Apps tab and used "Previous Apps", which let me batch-install all my old Docker containers with their original templates already selected.

What I have not figured out or resolved yet:
- Which container was the original culprit in filling my `docker.img`. Lots of forum posts, and the replies above, suggest I have a misconfigured container that is writing incorrectly into `docker.img`. If anyone has tips on how to debug this, it'd be appreciated!

Thanks again, everyone.
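One way to hunt for the culprit is to sort containers by the size of their writable layer, since writable-layer growth is exactly what fills docker.img. This is a sketch: the container names below are made-up sample data, and real `docker ps` sizes include a `(virtual …)` suffix you may need to strip before sorting:

```shell
# Live usage (assumes GNU sort with -h for human-readable sizes):
#   docker ps -as --format '{{.Size}}\t{{.Names}}' | sort -rh | head
# Sample lines stand in for real `docker ps -as` output here:
printf '%s\n' \
  '12.3M plex' \
  '4.1G misbehaving-app' \
  '850K nginx' | sort -rh | head -n 1
# prints: 4.1G misbehaving-app
```

Reinstalling containers one at a time, as suggested above, and re-running a check like this after each one would narrow down which application is writing inside the image.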