whwunraid Posted February 16, 2018
Updated to 6.4.1 today and all went well. The server was up and running, and the Dockers and VMs were accessible, except one: Dolphin. It would not start and pegged the CPUs on the server. I force-stopped it and then proceeded to remove it. I confirmed that I wanted to remove it and left it alone for a bit. When I came back the server was hung and I had to do a hard shutdown. I restarted and could get to the GUI, but no dockers are listed; when I open the Docker tab it tells me "Docker Service Could Not Be Started". I looked through the system log and got this, so I need some help interpreting what happened. I looked at the cache pool disks and they appear to be OK; SMART status shows no issues. Thanks in advance...

Feb 16 17:14:33 unRAID1 emhttpd: Starting services...
Feb 16 17:14:33 unRAID1 emhttpd: shcmd (98): /usr/local/sbin/mount_image '/mnt/user/system/docker/docker.img' /var/lib/docker 20
Feb 16 17:14:33 unRAID1 kernel: BTRFS: device fsid e1484a10-2809-4021-96af-166109f50e44 devid 1 transid 260848 /dev/loop2
Feb 16 17:14:33 unRAID1 kernel: BTRFS info (device loop2): disk space caching is enabled
Feb 16 17:14:33 unRAID1 kernel: BTRFS info (device loop2): has skinny extents
Feb 16 17:14:33 unRAID1 kernel: BTRFS critical (device loop2): corrupt node, bad key order: block=895942656, root=1, slot=173
Feb 16 17:14:33 unRAID1 kernel: BTRFS critical (device loop2): corrupt node, bad key order: block=895942656, root=1, slot=173
Feb 16 17:14:33 unRAID1 root: mount: /var/lib/docker: wrong fs type, bad option, bad superblock on /dev/loop2, missing codepage or helper program, or other error.
Feb 16 17:14:33 unRAID1 kernel: BTRFS error (device loop2): open_ctree failed
Feb 16 17:14:33 unRAID1 root: mount error
Feb 16 17:14:33 unRAID1 emhttpd: shcmd (98): exit status: 1
Feb 16 17:14:33 unRAID1 emhttpd: shcmd (111): /usr/local/sbin/mount_image '/mnt/user/system/libvirt/libvirt.img' /etc/libvirt 1
Feb 16 17:14:33 unRAID1 kernel: BTRFS: device fsid b5c92450-601f-44d5-a1c3-fc6a7df45a25 devid 1 transid 563 /dev/loop2
Feb 16 17:14:33 unRAID1 kernel: BTRFS info (device loop2): disk space caching is enabled
Feb 16 17:14:33 unRAID1 kernel: BTRFS info (device loop2): has skinny extents
Feb 16 17:14:33 unRAID1 root: Resize '/etc/libvirt' of 'max'
Feb 16 17:14:33 unRAID1 kernel: BTRFS info (device loop2): new size for /dev/loop2 is 1073741824
Feb 16 17:14:33 unRAID1 emhttpd: shcmd (113): /etc/rc.d/rc.libvirt start
Feb 16 17:14:33 unRAID1 root: Starting virtlockd...
Feb 16 17:14:33 unRAID1 root: Starting virtlogd...
Feb 16 17:14:33 unRAID1 root: Starting libvirtd...
Feb 16 17:14:33 unRAID1 kernel: tun: Universal TUN/TAP device driver, 1.6
Feb 16 17:14:33 unRAID1 emhttpd: nothing to sync
Feb 16 17:14:33 unRAID1 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Feb 16 17:14:33 unRAID1 kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
Feb 16 17:14:33 unRAID1 kernel: Ebtables v2.0 registered
Feb 16 17:14:34 unRAID1 kernel: virbr0: port 1(virbr0-nic) entered blocking state
Feb 16 17:14:34 unRAID1 kernel: virbr0: port 1(virbr0-nic) entered disabled state
Feb 16 17:14:34 unRAID1 kernel: device virbr0-nic entered promiscuous mode
Feb 16 17:14:35 unRAID1 avahi-daemon[3601]: Joining mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
Feb 16 17:14:35 unRAID1 avahi-daemon[3601]: New relevant interface virbr0.IPv4 for mDNS.
Feb 16 17:14:35 unRAID1 avahi-daemon[3601]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
Feb 16 17:14:35 unRAID1 kernel: virbr0: port 1(virbr0-nic) entered blocking state
Feb 16 17:14:35 unRAID1 kernel: virbr0: port 1(virbr0-nic) entered listening state
Feb 16 17:14:35 unRAID1 dnsmasq[4594]: started, version 2.78 cachesize 150
Feb 16 17:14:35 unRAID1 dnsmasq[4594]: compile time options: IPv6 GNU-getopt no-DBus i18n IDN2 DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
Feb 16 17:14:35 unRAID1 dnsmasq-dhcp[4594]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
Feb 16 17:14:35 unRAID1 dnsmasq-dhcp[4594]: DHCP, sockets bound exclusively to interface virbr0
Feb 16 17:14:35 unRAID1 dnsmasq[4594]: reading /etc/resolv.conf
Feb 16 17:14:35 unRAID1 dnsmasq[4594]: using nameserver 8.8.8.8#53
Feb 16 17:14:35 unRAID1 dnsmasq[4594]: using nameserver 8.8.4.4#53
Feb 16 17:14:35 unRAID1 dnsmasq[4594]: read /etc/hosts - 2 addresses
Feb 16 17:14:35 unRAID1 dnsmasq[4594]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Feb 16 17:14:35 unRAID1 dnsmasq-dhcp[4594]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Feb 16 17:14:35 unRAID1 kernel: virbr0: port 1(virbr0-nic) entered disabled state
Feb 16 17:15:03 unRAID1 sSMTP[6384]: Creating SSL connection to host
Feb 16 17:15:03 unRAID1 sSMTP[6384]: SSL connection using ECDHE-RSA-AES128-GCM-SHA256
Feb 16 17:15:06 unRAID1 sSMTP[6384]: Sent mail for [email protected] (221 2.0.0 closing connection l63sm17053653ita.44 - gsmtp) uid=0 username=root outbytes=671
Feb 16 17:16:06 unRAID1 emhttpd: req (1): csrf_token=****************&title=System+Log&cmd=%2FwebGui%2Fscripts%2Ftail_log&arg1=syslog
Feb 16 17:16:06 unRAID1 emhttpd: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
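[Editor's note] The tell-tale lines above are "corrupt node, bad key order" followed by "open_ctree failed" against /dev/loop2, i.e. the btrfs filesystem *inside* docker.img failed to mount. A hedged sketch of how one might confirm the damage is inside the image rather than on the cache device itself, using standard losetup and btrfs-progs commands (the /mnt/test mount point is arbitrary; the image path is taken from the log):

```shell
# Scratch mount point for the test; name is arbitrary.
mkdir -p /mnt/test

# Try a read-only loop mount of the image. With the 'bad key order' damage
# in the log this is expected to fail the same way emhttpd's mount did.
mount -o loop,ro /mnt/user/system/docker/docker.img /mnt/test \
    || echo "image will not mount"

# btrfs check is read-only by default, so it can report the damage in the
# image without touching the cache filesystem underneath it.
LOOPDEV=$(losetup --show -f /mnt/user/system/docker/docker.img)
btrfs check "$LOOPDEV"
losetup -d "$LOOPDEV"
```

If `btrfs check` reports errors here while a scrub of /mnt/cache comes back clean, the corruption is confined to the image and deleting/recreating it is enough.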
Squid Posted February 16, 2018
docker.img is corrupted for some reason. It could be an unclean shutdown, a bad block on the cache drive, etc. Either way your only recourse is to delete it:
Settings - Docker - Advanced View - Disable the Service - Delete the image - Re-enable the Service
Then reinstall your apps: Apps Tab, Previous Apps section. Check off all of your previous apps, then hit Install Multi. A couple of minutes later it'll be like nothing happened.
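[Editor's note] For reference, a rough CLI equivalent of the GUI steps above, assuming a stock unRAID layout (the docker.img path and the /etc/rc.d service-script convention both appear in the syslog earlier in this thread). This is a sketch, not the supported route; the GUI method Squid describes is safer:

```shell
# Stop the Docker service (equivalent of disabling it in Settings - Docker).
/etc/rc.d/rc.docker stop

# Delete the corrupt image; path as shown in the mount_image line of the log.
rm /mnt/user/system/docker/docker.img

# Restart the service; with Docker enabled, unRAID should create a fresh
# empty image at the configured path on startup.
/etc/rc.d/rc.docker start
```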
whwunraid (Author) Posted February 16, 2018
Here is what I got when I checked the cache disk:

root@unRAID1:~# fsck /dev/sdj
fsck from util-linux 2.30.2
e2fsck 1.43.6 (29-Aug-2017)
ext2fs_open2: Bad magic number in super-block
/sbin/e2fsck: Superblock invalid, trying backup blocks...
/sbin/e2fsck: Bad magic number in super-block while trying to open /dev/sdj

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

/dev/sdj contains `DOS/MBR boot sector; partition 1 : ID=0x83, start-CHS (0x0,0,0), end-CHS (0x0,0,0), startsector 64, 250069616 sectors, extended partition table (last)' data
Squid Posted February 16, 2018
Wrong command; you're looking for either xfs_repair, or scrub for btrfs. You should be doing filesystem checks via the GUI with the array in Maintenance Mode. Personally, I doubt there's underlying corruption on the cache (but it doesn't hurt to check) and suspect it's just the contents of the image that are messed up. Regardless, recreating the image is easy and painless.
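[Editor's note] A sketch of what a btrfs cache check might look like from the CLI, for readers who can't use the GUI route Squid recommends. Note two problems with the fsck attempt above: it was aimed at the whole disk (/dev/sdj) rather than the partition holding the filesystem (/dev/sdj1), and plain fsck falls back to e2fsck, which only understands ext2/3/4 — it never checks btrfs at all:

```shell
# Scrub the mounted cache pool. -B runs in the foreground and prints a
# summary when finished instead of returning immediately.
btrfs scrub start -B /mnt/cache

# Progress / results of the scrub, per device.
btrfs scrub status /mnt/cache

# Cumulative per-device error counters (write, read, flush, corruption,
# generation) -- nonzero values here point at real device-level trouble.
btrfs dev stats /mnt/cache
```

A scrub verifies every checksummed block; on a single-device pool it can detect corruption but has no second copy to repair from, which is why backup/format/restore is the usual fallback.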
whwunraid (Author) Posted February 17, 2018
It's never going to stop messing with me. Now I'm getting tons of disk errors from the cache drive, which is reporting that it is Read Only.

Feb 16 20:23:02 unRAID1 kernel: BTRFS warning (device sdj1): csum failed root 5 ino 31277 off 131944448 csum 0xf5a0e085 expected csum 0x00000000 mirror 1
Feb 16 20:23:02 unRAID1 kernel: BTRFS critical (device sdj1): corrupt leaf, bad key order: block=379185201152, root=1, slot=187
Feb 16 20:23:02 unRAID1 kernel: BTRFS info (device sdj1): no csum found for inode 31277 start 41873408
Feb 16 20:23:02 unRAID1 kernel: BTRFS warning (device sdj1): csum failed root 5 ino 31277 off 41873408 csum 0x844d252a expected csum 0x00000000 mirror 1
Feb 16 20:23:02 unRAID1 kernel: BTRFS critical (device sdj1): corrupt leaf, bad key order: block=379185201152, root=1, slot=187
Feb 16 20:23:02 unRAID1 kernel: BTRFS info (device sdj1): no csum found for inode 31277 start 49410048
Feb 16 20:23:02 unRAID1 kernel: BTRFS warning (device sdj1): csum failed root 5 ino 31277 off 49410048 csum 0xeda1d89a expected csum 0x00000000 mirror 1
Feb 16 20:23:02 unRAID1 kernel: BTRFS critical (device sdj1): corrupt leaf, bad key order: block=379185201152, root=1, slot=187
Feb 16 20:23:02 unRAID1 kernel: BTRFS info (device sdj1): no csum found for inode 31277 start 49410048
Feb 16 20:23:02 unRAID1 kernel: BTRFS warning (device sdj1): csum failed root 5 ino 31277 off 49410048 csum 0xeda1d89a expected csum 0x00000000 mirror 1
Feb 16 20:23:43 unRAID1 kernel: BTRFS critical (device sdj1): corrupt leaf, bad key order: block=379185201152, root=1, slot=187
Feb 16 20:23:43 unRAID1 kernel: BTRFS info (device sdj1): no csum found for inode 31277 start 3104768
Feb 16 20:23:43 unRAID1 kernel: BTRFS warning (device sdj1): csum failed root 5 ino 31277 off 3104768 csum 0x321c030a expected csum 0x00000000 mirror 1
Feb 16 20:23:43 unRAID1 kernel: BTRFS critical (device sdj1): corrupt leaf, bad key order: block=379185201152, root=1, slot=187
Feb 16 20:23:43 unRAID1 kernel: BTRFS info (device sdj1): no csum found for inode 31277 start 41873408
Feb 16 20:23:43 unRAID1 kernel: BTRFS warning (device sdj1): csum failed root 5 ino 31277 off 41873408 csum 0x844d252a expected csum 0x00000000 mirror 1
Feb 16 20:23:43 unRAID1 kernel: BTRFS critical (device sdj1): corrupt leaf, bad key order: block=379185201152, root=1, slot=187
Feb 16 20:23:43 unRAID1 kernel: BTRFS info (device sdj1): no csum found for inode 31277 start 3104768
Feb 16 20:23:43 unRAID1 kernel: BTRFS warning (device sdj1): csum failed root 5 ino 31277 off 3104768 csum 0x321c030a expected csum 0x00000000 mirror 1
Feb 16 20:23:43 unRAID1 kernel: BTRFS critical (device sdj1): corrupt leaf, bad key order: block=379185201152, root=1, slot=187
Feb 16 20:23:43 unRAID1 kernel: BTRFS info (device sdj1): no csum found for inode 31277 start 41873408
Feb 16 20:23:43 unRAID1 kernel: BTRFS warning (device sdj1): csum failed root 5 ino 31277 off 41873408 csum 0x844d252a expected csum 0x00000000 mirror 1
Feb 16 20:23:54 unRAID1 kernel: lo_write_bvec: 9 callbacks suppressed
Feb 16 20:23:54 unRAID1 kernel: loop: Write error at byte offset 3869896704, length 4096.
Feb 16 20:23:54 unRAID1 kernel: print_req_error: 9 callbacks suppressed
Feb 16 20:23:54 unRAID1 kernel: print_req_error: I/O error, dev loop2, sector 7558392
Feb 16 20:23:54 unRAID1 kernel: btrfs_dev_stat_print_on_error: 9 callbacks suppressed
Feb 16 20:23:54 unRAID1 kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 20, rd 0, flush 0, corrupt 0, gen 0
Feb 16 20:27:00 unRAID1 root: Fix Common Problems Version 2018.02.16
Feb 16 20:27:06 unRAID1 root: Fix Common Problems: Error: Unable to write to cache
Feb 16 20:27:06 unRAID1 root: Fix Common Problems: Error: Unable to write to Docker Image
Feb 16 20:27:06 unRAID1 root: Fix Common Problems: Error: unclean shutdown detected of your server ** Ignored
Feb 16 20:27:09 unRAID1 root: Fix Common Problems: Warning: Template URL for docker application binhex-delugevpn is missing.
Feb 16 20:27:10 unRAID1 sSMTP[24828]: Creating SSL connection to host
Feb 16 20:27:10 unRAID1 sSMTP[24828]: SSL connection using ECDHE-RSA-AES128-GCM-SHA256
Feb 16 20:27:13 unRAID1 sSMTP[24828]: Sent mail for [email protected] (221 2.0.0 closing connection f63sm11927568ioj.74 - gsmtp) uid=0 username=root outbytes=863
Feb 16 20:37:23 unRAID1 ool www[11230]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/mnt/cache' ''
Feb 16 20:38:33 unRAID1 login[14662]: ROOT LOGIN on '/dev/pts/0'
Squid Posted February 17, 2018
The expert here on this is @johnnie.black. It wouldn't be a bad idea to also post your diagnostics.
whwunraid (Author) Posted February 17, 2018
Here are the diagnostics. I opened the case and re-seated the SATA connectors on the card and SSDs. It shouldn't be bad wires; they're less than a year old.
unraid1-diagnostics-20180216-2158.zip
whwunraid (Author) Posted February 17, 2018
SMART finally came back with pre-fails, so it looks like the disk. Let's see if the diags say the same... thanks for the assist.
Squid Posted February 17, 2018
Seeing "pre-fail" in a SMART report doesn't mean the drive is about to fail. That wording means the attribute is one which, once its values increase, indicates the drive might fail. Everything in a SMART report is classified as either pre-fail or old-age. And since you're running a btrfs cache pool, corruption on the pool is fairly common in the case of unclean shutdowns.
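[Editor's note] Squid's point can be seen directly in smartctl output: the TYPE column labels every attribute Pre-fail or Old_age regardless of the drive's health. A short sketch using smartctl from smartmontools (substitute your own device for /dev/sdj):

```shell
# Dump the vendor attribute table. What matters is not the Pre-fail/Old_age
# label in TYPE, but whether the normalized VALUE has fallen toward THRESH
# and whether raw counters such as Reallocated_Sector_Ct or
# Current_Pending_Sector are growing over time.
smartctl -A /dev/sdj

# Kick off an extended (long) self-test, then read its result afterwards
# from the self-test log.
smartctl -t long /dev/sdj
smartctl -l selftest /dev/sdj
```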
JorgeB Posted February 17, 2018
Run a scrub on the pool; if that doesn't fix it, it's better to backup/format/restore.
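[Editor's note] A sketch of the backup/format/restore route JorgeB mentions, with the reformat itself done from the GUI. Paths are illustrative (here the backup lands on disk1, which is assumed to have room), and Docker/VM services should be stopped first so nothing writes to the pool mid-copy:

```shell
# Copy everything off the cache pool, preserving permissions and times.
rsync -avh /mnt/cache/ /mnt/disk1/cache_backup/

# Now reformat the pool via the GUI: stop the array, (re)set the cache
# filesystem, start the array, and format the cache device when prompted.

# Copy everything back onto the freshly formatted pool.
rsync -avh /mnt/disk1/cache_backup/ /mnt/cache/
```

Trailing slashes matter to rsync: `/mnt/cache/` copies the directory's contents, while `/mnt/cache` would create a `cache/` subdirectory inside the destination.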
whwunraid (Author) Posted February 17, 2018
Yeah, I think so too. I picked up a new, larger PCIe SSD for the cache... thanks for all your help.
whwunraid (Author) Posted February 18, 2018
So there is still some underlying issue: the GUI freezes up, and then even a POWERDOWN command will not restart the box. Dockers install fine, then freeze the GUI once you try to start them; recreating the docker image has no effect. I just want to install 6.4.1 to a fresh new USB thumb drive (the current one is easily 10 years old). So at this point I just want to preserve the disk and share config and port that over to the 6.4.1 USB drive. Can the files in \FLASH\CONFIG\ and \FLASH\CONFIG\SHARES relating to shares and disks just be copied over, and is anything else necessary? I do not want to take docker, appdata, domains, etc. to the new install; something in there is totally messed up. I don't want to lose data (likely not going to happen), but I'm pretty frustrated with this right now. Diags attached for good measure...
unraid1-diagnostics-20180218-1559.zip
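[Editor's note] A hedged sketch of carrying just the disk and share config to a fresh flash drive. File names assume a stock unRAID 6 /boot layout (the \FLASH\CONFIG\ paths in the post map to /boot/config on the server), so verify against your own flash before trusting it. The .key licence file is tied to the old flash drive's GUID, so a new flash drive also needs a key replacement from Lime Technology:

```shell
# Assumed mount point of the freshly prepared USB stick; adjust to taste.
NEWFLASH=/mnt/newflash

# super.dat holds the array drive assignments -- this is the file that
# preserves which disk is which.
cp /boot/config/super.dat "$NEWFLASH/config/"

# Array/disk-level settings and the per-share .cfg files.
cp /boot/config/disk.cfg "$NEWFLASH/config/"
cp -r /boot/config/shares "$NEWFLASH/config/"

# Deliberately NOT copying docker.cfg, plugin configs, or anything pointing
# at appdata/domains, since something in the old setup is suspect.
```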
JorgeB Posted February 18, 2018
The docker image is still corrupt from the initial boot:
Feb 18 13:34:31 unRAID1 kernel: BTRFS critical (device loop0): corrupt leaf, slot offset bad: block=307478528, root=1, slot=32
Feb 18 13:34:31 unRAID1 kernel: BTRFS critical (device loop0): corrupt leaf, slot offset bad: block=307478528, root=1, slot=32
whwunraid (Author) Posted February 18, 2018
Yes, I replaced the cache drive, rebuilt the docker image, and reinstalled a docker... so something else has to be wrong.
Unthred Posted July 13, 2018
On 2/16/2018 at 11:34 PM, Squid said: docker.img is corrupted for some reason. Could be unclean shutdown, bad block on the cache drive, etc. Either way your only recourse is to delete it: Settings - Docker - Advanced View - Disable the Service - Delete the image - Re-enable the Service. Then reinstall your apps: Apps Tab, Previous Apps section. Check off all of your previous apps, then hit Install Multi. A couple of minutes later it'll be like nothing happened.
I was really rather worried about my server... I knew my cache drive was giving some errors and just hadn't gotten around to replacing it, until a reboot broke docker! I was dreading all the hassle and reconfiguration it might entail! I followed this to replace the cache drive http://lime-technology.com/wiki/Replace_A_Cache_Drive and your post... honestly it couldn't have been easier... a few clicks and everything sprang into life! It was around 30 dockers, too! Thanks so very much for the easy instructions, Squid! Super glad I bought unRaid now.