October 29, 2014

So, for me at least, if the partition the docker image sits on fills up, docker breaks even though the loopback image itself has loads of free space. Not ideal, but it's a good lesson in how to debug BTRFS. I know I can recreate the image, but that requires a bunch of downloads, and I was also curious how to debug this. Here are sample error messages:

Oct 29 10:42:41 TOWER kernel: BTRFS info (device loop8): csum failed ino 25868 off 633421824 csum 3607886504 expected csum 3916759364
Oct 29 10:42:41 TOWER kernel: BTRFS info (device loop8): csum failed ino 25868 off 633421824 csum 3607886504 expected csum 3916759364
Oct 29 10:43:39 TOWER kernel: BTRFS info (device loop8): csum failed ino 23768 off 633708544 csum 1897023266 expected csum 856457585
Oct 29 10:43:39 TOWER kernel: BTRFS info (device loop8): csum failed ino 23768 off 633708544 csum 1897023266 expected csum 856457585
Oct 29 10:43:41 TOWER kernel: BTRFS info (device loop8): csum failed ino 42396146 off 23777280 csum 697967187 expected csum 3067279347
Oct 29 10:43:41 TOWER kernel: BTRFS info (device loop8): csum failed ino 42396146 off 23777280 csum 697967187 expected csum 3067279347
Oct 29 10:43:41 TOWER kernel: BTRFS info (device loop8): csum failed ino 25868 off 633421824 csum 3607886504 expected csum 3916759364
Oct 29 10:43:41 TOWER kernel: BTRFS info (device loop8): csum failed ino 25868 off 633421824 csum 3607886504 expected csum 3916759364

If I pick one of these, I can locate the file containing the corruption:

find /var/lib/docker -inum 23768 -ls
23768 4 -rw-r--r-- 1 root root 1841 Aug 10 02:15 /var/lib/docker/btrfs/subvolumes/17798373a5bc71cc6c69e885d3ace356980b4f90585127859919fe83796f965a/usr/lib/python2.7/encodings/iso2022_jp_ext.pyc
23768 4 -rw-r--r-- 1 root root 1841 Aug 10 02:15 /var/lib/docker/btrfs/subvolumes/9b260cd5942ef9971872d6ef5f03a783e5490ab2e2d0f4631db97f15fb8355c6/usr/lib/python2.7/encodings/iso2022_jp_ext.pyc
23768 4 -rw-r--r-- 1 root root 1841 Aug 10 02:15 /var/lib/docker/btrfs/subvolumes/05b608d5e024a731864a097abd5b510056801d9129c621221436c60727f7fbb8/usr/lib/python2.7/encodings/iso2022_jp_ext.pyc
...

You can run a repair on the image once it's unmounted:

btrfsck --repair --check-data-csum /mnt/cache/apps/docker_1.img

It returns a bunch of output but doesn't actually fix anything, so I am now at a loss. This ultra-resilient, newfangled BTRFS seems a bit fragile to me, and repairing and debugging it is not trivial. Surely there has to be a way?
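The `find` output above lists the same inode number under three different subvolumes. On btrfs, inode numbers are only unique within a subvolume, and a snapshot keeps the inode numbers of its source (unmodified files also keep sharing the same data extents), so one bad extent can plausibly surface in every Docker layer snapshotted from it. A minimal bash sketch for collecting the affected subvolume IDs, assuming the `/var/lib/docker/btrfs/subvolumes/<id>/...` layout shown above; `subvol_ids` and `inode_subvols` are hypothetical helper names, not part of Docker or btrfs-progs:

```shell
#!/usr/bin/env bash
# Sketch: which Docker btrfs subvolumes contain a given inode number?
# Assumes the /var/lib/docker/btrfs/subvolumes/<id>/... layout from the thread.

# Read paths or `find -ls` lines on stdin; print each subvolume directory name once.
subvol_ids() {
    grep -o '/btrfs/subvolumes/[^/]*' | sed 's|.*/||' | sort -u
}

# $1: inode number, $2: Docker root (defaults to /var/lib/docker).
# Inode numbers repeat across subvolumes, so one number can match many layers.
inode_subvols() {
    local ino=$1 root=${2:-/var/lib/docker}
    find "$root/btrfs/subvolumes" -inum "$ino" 2>/dev/null | subvol_ids
}
```

For example, `inode_subvols 23768` should print the subvolume IDs from the listing above (plus any hidden behind the `...`), one per line.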
October 29, 2014

Shouldn't the command be:

btrfs check --repair --check-data-csum /mnt/cache/apps/docker_1.img

https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-check
October 29, 2014 (Author)

Indeed, it looks like btrfsck is deprecated. I wonder why it's still included. Will try that and post back.
October 31, 2014 (Author)

So here we go again. First, let's scrub to be safe:

btrfs scrub start /var/lib/docker -B -R -d -r 2>&1
scrub device /dev/loop8 (id 1) done
scrub started at Fri Oct 31 08:26:09 2014 and finished after 25 seconds
data_extents_scrubbed: 298753
tree_extents_scrubbed: 160002
data_bytes_scrubbed: 4836069376
tree_bytes_scrubbed: 2621472768
read_errors: 0
csum_errors: 3
verify_errors: 0
no_csum: 560
csum_discards: 303591
super_errors: 0
malloc_errors: 0
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 0
last_physical: 10737418240

Now let's make sure we are still seeing errors:

Oct 31 04:41:43 TOWER kernel: BTRFS info (device loop8): csum failed ino 42396146 off 23777280 csum 697967187 expected csum 3067279347
Oct 31 04:41:43 TOWER kernel: BTRFS info (device loop8): csum failed ino 42396146 off 23777280 csum 697967187 expected csum 3067279347

OK, so let's do some ugly bash to list the unique inodes with csum errors:

grep csum /var/log/syslog | cut -d " " -f13 | sort | uniq
23768
25868
42396146

Three; that's a good sign, as it matches what scrub is saying ("csum_errors: 3").

Right, let's do a dry run (no --repair) using the new check command:

btrfs check --check-data-csum /mnt/cache/apps/docker_1.img
Checking filesystem on /mnt/cache/apps/docker_1.img
UUID: a64c0b52-f437-4526-bfa0-69841c0b28ae
checking extents
checking free space cache
checking fs roots
checking csums
mirror 0 bytenr 5023473664 csum 3607886504 expected csum 3916759364
mirror 0 bytenr 6124892160 csum 1897023266 expected csum 856457585
mirror 0 bytenr 6124896256 csum 697967187 expected csum 3067279347
checking root refs
found 3580535602 bytes used err is 0
total csum bytes: 4720824
total tree bytes: 1310670848
total fs tree bytes: 1243414528
total extent tree bytes: 56541184
btree space waste bytes: 237721390
file data blocks allocated: 26266181632
referenced 25847861248
Btrfs v3.16.1

So we are seeing three "mirror 0 bytenr" errors. Their csum/expected csum pairs are identical to the ones in the syslog messages (for example 3607886504/3916759364 for inode 25868), so these are our three problem inodes.
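The `cut -d " " -f13` pipeline above depends on a fixed field position, which shifts on single-digit days because syslog pads the date with an extra space (`Oct  1`). A sketch in bash that matches on the message text instead, plus a join of the kernel log against the `btrfs check` output on the (csum, expected csum) pair, so that each `bytenr` is tied to an inode. The function names and the two-file interface are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Sketch: list failing inodes from kernel log lines and tie each
# `btrfs check` "bytenr" line to an inode via its csum pair.

# Read kernel log lines on stdin; print each distinct inode from
# "csum failed ino N" messages. Matching on the message text instead of a
# fixed field number survives the extra space syslog puts in "Oct  1".
csum_failed_inodes() {
    sed -n 's/.*csum failed ino \([0-9]*\) .*/\1/p' | sort -un
}

# $1: file of kernel log lines, $2: file of `btrfs check --check-data-csum` output.
# Prints "bytenr inode" for each check error whose (csum, expected csum) pair
# also appears in the kernel log.
bytenr_to_inode() {
    awk '
        # Set got/want to the actual and expected checksums on the current line.
        function parse_csums(   i) {
            got = want = ""
            for (i = 1; i < NF; i++) {
                if ($i == "csum" && $(i + 1) ~ /^[0-9]+$/ && $(i - 1) != "expected") got = $(i + 1)
                if ($i == "expected" && $(i + 1) == "csum") want = $(i + 2)
            }
        }
        NR == FNR {                        # first file: kernel log
            if ($0 ~ /csum failed ino/) {
                parse_csums()
                ino = ""
                for (i = 1; i < NF; i++) if ($i == "ino") ino = $(i + 1)
                inode[got " " want] = ino
            }
            next
        }
        / bytenr / {                       # second file: btrfs check output
            parse_csums()
            bytenr = ""
            for (i = 1; i < NF; i++) if ($i == "bytenr") bytenr = $(i + 1)
            key = got " " want
            if (key in inode) print bytenr, inode[key]
        }
    ' "$1" "$2"
}
```

Something like `bytenr_to_inode /var/log/syslog check.txt`, where `check.txt` is a hypothetical file holding the saved `btrfs check --check-data-csum` output, should print the three bytenr/inode pairs. btrfs-progs also ships `btrfs inspect-internal logical-resolve <logical> <path>`, which resolves a logical address to file paths on a mounted filesystem; whether the `bytenr` values from `btrfs check` can be fed to it directly is untested here.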
So let's try and fix them this time:

btrfs check --repair --check-data-csum /mnt/cache/apps/docker_1.img
enabling repair mode
Checking filesystem on /mnt/cache/apps/docker_1.img
UUID: a64c0b52-f437-4526-bfa0-69841c0b28ae
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
mirror 0 bytenr 5023473664 csum 3607886504 expected csum 3916759364
mirror 0 bytenr 6124892160 csum 1897023266 expected csum 856457585
mirror 0 bytenr 6124896256 csum 697967187 expected csum 3067279347
checking root refs
found 3580535602 bytes used err is 0
total csum bytes: 4720824
total tree bytes: 1310670848
total fs tree bytes: 1243414528
total extent tree bytes: 56541184
btree space waste bytes: 237721390
file data blocks allocated: 26266181632
referenced 25847861248
Btrfs v3.16.1

I won't bother posting more noise, but suffice to say this all did absolutely nothing. I am changing my focus now: I am less interested in fixing these csums directly and more interested in locating the corrupt files, working out which docker container they are part of, and then just grabbing the container again. Not as easy as it sounds. Sure, locating the file is easy:

find /var/lib/docker -inum 42396146 -ls
42396146 23224 -rw-r----- 1 root adm 23780974 Oct 31 08:42 /var/lib/docker/btrfs/subvolumes/546dddbc4cf47a96847721980eed8a597640c693abb16f94f42d1a9550bbb67e/var/log/auth.log

But which container is this?
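One way to answer that, sketched under an assumption worth checking first: with the btrfs storage driver in Docker releases of this era, each directory under `btrfs/subvolumes` appears to be named after a full container or image ID, and containers also appear to get an extra `<id>-init` layer. If that holds, the subvolume ID can be looked up in the full-length ID lists that `docker ps` and `docker images` print with `--no-trunc`. `owner_of_subvol` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: say whether a btrfs subvolume under /var/lib/docker belongs to a
# container or an image.
# ASSUMPTION: subvolume directory names equal full container/image IDs, as
# appears to be the case for the btrfs driver in Docker releases of this era;
# newer releases use separate layer IDs, so check before relying on this.

# $1: subvolume directory name from btrfs/subvolumes.
# Prints "container <id> <name>", "image <id>" or "unknown <id>".
owner_of_subvol() {
    # ASSUMPTION: "<id>-init" layers belong to container <id>; strip the suffix.
    local id=${1%-init}
    if docker ps -a -q --no-trunc | grep -qx "$id"; then
        echo "container $id $(docker inspect --format '{{.Name}}' "$id")"
    elif docker images -a -q --no-trunc | grep -qx "$id"; then
        echo "image $id"
    else
        echo "unknown $id"
        return 1
    fi
}
```

For the `auth.log` above, `owner_of_subvol 546dddbc4cf47a96847721980eed8a597640c693abb16f94f42d1a9550bbb67e` would report whether that layer is a container (recreate it) or an image (remove it and pull it again). A log file still being written at 08:42 hints at a running container's writable layer rather than an image layer, but the lookup is the better check.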