whwunraid Posted February 16, 2018
Updated to 6.4.1 today and all went well. The server was up and running, and the Dockers and VMs were accessible, except one: Dolphin. It would not start and pegged the CPUs on the server. I force-stopped it and then proceeded to remove it. I confirmed that I wanted to remove it and left it alone for a bit. When I came back the server was hung and I had to do a hard shutdown. I restarted and could get to the GUI, but no dockers are listed; when I open the Docker tab it tells me "Docker Service Could Not Be Started". I looked through the system log and got this, so I need some help interpreting what happened. I looked at the cache pool disks and they appear to be OK; SMART status shows no issues. Thanks in advance...

Feb 16 17:14:33 unRAID1 emhttpd: Starting services...
Feb 16 17:14:33 unRAID1 emhttpd: shcmd (98): /usr/local/sbin/mount_image '/mnt/user/system/docker/docker.img' /var/lib/docker 20
Feb 16 17:14:33 unRAID1 kernel: BTRFS: device fsid e1484a10-2809-4021-96af-166109f50e44 devid 1 transid 260848 /dev/loop2
Feb 16 17:14:33 unRAID1 kernel: BTRFS info (device loop2): disk space caching is enabled
Feb 16 17:14:33 unRAID1 kernel: BTRFS info (device loop2): has skinny extents
Feb 16 17:14:33 unRAID1 kernel: BTRFS critical (device loop2): corrupt node, bad key order: block=895942656, root=1, slot=173
Feb 16 17:14:33 unRAID1 kernel: BTRFS critical (device loop2): corrupt node, bad key order: block=895942656, root=1, slot=173
Feb 16 17:14:33 unRAID1 root: mount: /var/lib/docker: wrong fs type, bad option, bad superblock on /dev/loop2, missing codepage or helper program, or other error.
Feb 16 17:14:33 unRAID1 kernel: BTRFS error (device loop2): open_ctree failed
Feb 16 17:14:33 unRAID1 root: mount error
Feb 16 17:14:33 unRAID1 emhttpd: shcmd (98): exit status: 1
Feb 16 17:14:33 unRAID1 emhttpd: shcmd (111): /usr/local/sbin/mount_image '/mnt/user/system/libvirt/libvirt.img' /etc/libvirt 1
Feb 16 17:14:33 unRAID1 kernel: BTRFS: device fsid b5c92450-601f-44d5-a1c3-fc6a7df45a25 devid 1 transid 563 /dev/loop2
Feb 16 17:14:33 unRAID1 kernel: BTRFS info (device loop2): disk space caching is enabled
Feb 16 17:14:33 unRAID1 kernel: BTRFS info (device loop2): has skinny extents
Feb 16 17:14:33 unRAID1 root: Resize '/etc/libvirt' of 'max'
Feb 16 17:14:33 unRAID1 kernel: BTRFS info (device loop2): new size for /dev/loop2 is 1073741824
Feb 16 17:14:33 unRAID1 emhttpd: shcmd (113): /etc/rc.d/rc.libvirt start
Feb 16 17:14:33 unRAID1 root: Starting virtlockd...
Feb 16 17:14:33 unRAID1 root: Starting virtlogd...
Feb 16 17:14:33 unRAID1 root: Starting libvirtd...
Feb 16 17:14:33 unRAID1 kernel: tun: Universal TUN/TAP device driver, 1.6
Feb 16 17:14:33 unRAID1 emhttpd: nothing to sync
Feb 16 17:14:33 unRAID1 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Feb 16 17:14:33 unRAID1 kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
Feb 16 17:14:33 unRAID1 kernel: Ebtables v2.0 registered
Feb 16 17:14:34 unRAID1 kernel: virbr0: port 1(virbr0-nic) entered blocking state
Feb 16 17:14:34 unRAID1 kernel: virbr0: port 1(virbr0-nic) entered disabled state
Feb 16 17:14:34 unRAID1 kernel: device virbr0-nic entered promiscuous mode
Feb 16 17:14:35 unRAID1 avahi-daemon[3601]: Joining mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
Feb 16 17:14:35 unRAID1 avahi-daemon[3601]: New relevant interface virbr0.IPv4 for mDNS.
Feb 16 17:14:35 unRAID1 avahi-daemon[3601]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
Feb 16 17:14:35 unRAID1 kernel: virbr0: port 1(virbr0-nic) entered blocking state
Feb 16 17:14:35 unRAID1 kernel: virbr0: port 1(virbr0-nic) entered listening state
Feb 16 17:14:35 unRAID1 dnsmasq[4594]: started, version 2.78 cachesize 150
Feb 16 17:14:35 unRAID1 dnsmasq[4594]: compile time options: IPv6 GNU-getopt no-DBus i18n IDN2 DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
Feb 16 17:14:35 unRAID1 dnsmasq-dhcp[4594]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
Feb 16 17:14:35 unRAID1 dnsmasq-dhcp[4594]: DHCP, sockets bound exclusively to interface virbr0
Feb 16 17:14:35 unRAID1 dnsmasq[4594]: reading /etc/resolv.conf
Feb 16 17:14:35 unRAID1 dnsmasq[4594]: using nameserver 8.8.8.8#53
Feb 16 17:14:35 unRAID1 dnsmasq[4594]: using nameserver 8.8.4.4#53
Feb 16 17:14:35 unRAID1 dnsmasq[4594]: read /etc/hosts - 2 addresses
Feb 16 17:14:35 unRAID1 dnsmasq[4594]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Feb 16 17:14:35 unRAID1 dnsmasq-dhcp[4594]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Feb 16 17:14:35 unRAID1 kernel: virbr0: port 1(virbr0-nic) entered disabled state
Feb 16 17:15:03 unRAID1 sSMTP[6384]: Creating SSL connection to host
Feb 16 17:15:03 unRAID1 sSMTP[6384]: SSL connection using ECDHE-RSA-AES128-GCM-SHA256
Feb 16 17:15:06 unRAID1 sSMTP[6384]: Sent mail for [email protected] (221 2.0.0 closing connection l63sm17053653ita.44 - gsmtp) uid=0 username=root outbytes=671
Feb 16 17:16:06 unRAID1 emhttpd: req (1): csrf_token=****************&title=System+Log&cmd=%2FwebGui%2Fscripts%2Ftail_log&arg1=syslog
Feb 16 17:16:06 unRAID1 emhttpd: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
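[Editor's note] The tell-tale lines above are "corrupt node, bad key order" followed by "open_ctree failed" against /dev/loop2, i.e. the btrfs filesystem *inside* docker.img failed to mount. A hedged sketch of how one might confirm the damage is inside the image rather than on the cache device itself, using standard losetup and btrfs-progs commands (the /mnt/test mount point is arbitrary; the image path is taken from the log):

```shell
# Scratch mount point for the test; name is arbitrary.
mkdir -p /mnt/test

# Try a read-only loop mount of the image. With the 'bad key order' damage
# in the log this is expected to fail the same way emhttpd's mount did.
mount -o loop,ro /mnt/user/system/docker/docker.img /mnt/test \
    || echo "image will not mount"

# btrfs check is read-only by default, so it can report the damage in the
# image without touching the cache filesystem underneath it.
LOOPDEV=$(losetup --show -f /mnt/user/system/docker/docker.img)
btrfs check "$LOOPDEV"
losetup -d "$LOOPDEV"
```

If `btrfs check` reports errors here while a scrub of /mnt/cache comes back clean, the corruption is confined to the image and deleting/recreating it is enough.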
Squid Posted February 16, 2018
docker.img is corrupted for some reason. It could be an unclean shutdown, a bad block on the cache drive, etc. Either way your only recourse is to delete it:
Settings - Docker - Advanced View - Disable the Service - Delete the image - Re-enable the Service
Then reinstall your apps: Apps Tab, Previous Apps section. Check off all of your previous apps, then hit Install Multi. A couple of minutes later it'll be like nothing happened.
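[Editor's note] For reference, a rough CLI equivalent of the GUI steps above, assuming a stock unRAID layout (the docker.img path and the /etc/rc.d service-script convention both appear in the syslog earlier in this thread). This is a sketch, not the supported route; the GUI method Squid describes is safer:

```shell
# Stop the Docker service (equivalent of disabling it in Settings - Docker).
/etc/rc.d/rc.docker stop

# Delete the corrupt image; path as shown in the mount_image line of the log.
rm /mnt/user/system/docker/docker.img

# Restart the service; with Docker enabled, unRAID should create a fresh
# empty image at the configured path on startup.
/etc/rc.d/rc.docker start
```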
whwunraid (Author) Posted February 16, 2018
Here is what I got when I checked the cache disk:

root@unRAID1:~# fsck /dev/sdj
fsck from util-linux 2.30.2
e2fsck 1.43.6 (29-Aug-2017)
ext2fs_open2: Bad magic number in super-block
/sbin/e2fsck: Superblock invalid, trying backup blocks...
/sbin/e2fsck: Bad magic number in super-block while trying to open /dev/sdj

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

/dev/sdj contains `DOS/MBR boot sector; partition 1 : ID=0x83, start-CHS (0x0,0,0), end-CHS (0x0,0,0), startsector 64, 250069616 sectors, extended partition table (last)' data
Squid Posted February 16, 2018
Wrong command; you're looking for either xfs_repair, or scrub for btrfs. You should be doing filesystem checks via the GUI with the array in Maintenance Mode. Personally, I doubt there's underlying corruption on the cache (but it doesn't hurt to check) and suspect it's just the contents of the image that are messed up. Regardless, recreating the image is easy and painless.
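[Editor's note] A sketch of what a btrfs cache check might look like from the CLI, for readers who can't use the GUI route Squid recommends. Note two problems with the fsck attempt above: it was aimed at the whole disk (/dev/sdj) rather than the partition holding the filesystem (/dev/sdj1), and plain fsck falls back to e2fsck, which only understands ext2/3/4 — it never checks btrfs at all:

```shell
# Scrub the mounted cache pool. -B runs in the foreground and prints a
# summary when finished instead of returning immediately.
btrfs scrub start -B /mnt/cache

# Progress / results of the scrub, per device.
btrfs scrub status /mnt/cache

# Cumulative per-device error counters (write, read, flush, corruption,
# generation) -- nonzero values here point at real device-level trouble.
btrfs dev stats /mnt/cache
```

A scrub verifies every checksummed block; on a single-device pool it can detect corruption but has no second copy to repair from, which is why backup/format/restore is the usual fallback.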
whwunraid (Author) Posted February 17, 2018
It's never going to stop messing with me. Now I'm getting tons of disk errors from the cache drive, which is reporting that it is Read Only.

Feb 16 20:23:02 unRAID1 kernel: BTRFS warning (device sdj1): csum failed root 5 ino 31277 off 131944448 csum 0xf5a0e085 expected csum 0x00000000 mirror 1
Feb 16 20:23:02 unRAID1 kernel: BTRFS critical (device sdj1): corrupt leaf, bad key order: block=379185201152, root=1, slot=187
Feb 16 20:23:02 unRAID1 kernel: BTRFS info (device sdj1): no csum found for inode 31277 start 41873408
Feb 16 20:23:02 unRAID1 kernel: BTRFS warning (device sdj1): csum failed root 5 ino 31277 off 41873408 csum 0x844d252a expected csum 0x00000000 mirror 1
Feb 16 20:23:02 unRAID1 kernel: BTRFS critical (device sdj1): corrupt leaf, bad key order: block=379185201152, root=1, slot=187
Feb 16 20:23:02 unRAID1 kernel: BTRFS info (device sdj1): no csum found for inode 31277 start 49410048
Feb 16 20:23:02 unRAID1 kernel: BTRFS warning (device sdj1): csum failed root 5 ino 31277 off 49410048 csum 0xeda1d89a expected csum 0x00000000 mirror 1
Feb 16 20:23:02 unRAID1 kernel: BTRFS critical (device sdj1): corrupt leaf, bad key order: block=379185201152, root=1, slot=187
Feb 16 20:23:02 unRAID1 kernel: BTRFS info (device sdj1): no csum found for inode 31277 start 49410048
Feb 16 20:23:02 unRAID1 kernel: BTRFS warning (device sdj1): csum failed root 5 ino 31277 off 49410048 csum 0xeda1d89a expected csum 0x00000000 mirror 1
Feb 16 20:23:43 unRAID1 kernel: BTRFS critical (device sdj1): corrupt leaf, bad key order: block=379185201152, root=1, slot=187
Feb 16 20:23:43 unRAID1 kernel: BTRFS info (device sdj1): no csum found for inode 31277 start 3104768
Feb 16 20:23:43 unRAID1 kernel: BTRFS warning (device sdj1): csum failed root 5 ino 31277 off 3104768 csum 0x321c030a expected csum 0x00000000 mirror 1
Feb 16 20:23:43 unRAID1 kernel: BTRFS critical (device sdj1): corrupt leaf, bad key order: block=379185201152, root=1, slot=187
Feb 16 20:23:43 unRAID1 kernel: BTRFS info (device sdj1): no csum found for inode 31277 start 41873408
Feb 16 20:23:43 unRAID1 kernel: BTRFS warning (device sdj1): csum failed root 5 ino 31277 off 41873408 csum 0x844d252a expected csum 0x00000000 mirror 1
Feb 16 20:23:43 unRAID1 kernel: BTRFS critical (device sdj1): corrupt leaf, bad key order: block=379185201152, root=1, slot=187
Feb 16 20:23:43 unRAID1 kernel: BTRFS info (device sdj1): no csum found for inode 31277 start 3104768
Feb 16 20:23:43 unRAID1 kernel: BTRFS warning (device sdj1): csum failed root 5 ino 31277 off 3104768 csum 0x321c030a expected csum 0x00000000 mirror 1
Feb 16 20:23:43 unRAID1 kernel: BTRFS critical (device sdj1): corrupt leaf, bad key order: block=379185201152, root=1, slot=187
Feb 16 20:23:43 unRAID1 kernel: BTRFS info (device sdj1): no csum found for inode 31277 start 41873408
Feb 16 20:23:43 unRAID1 kernel: BTRFS warning (device sdj1): csum failed root 5 ino 31277 off 41873408 csum 0x844d252a expected csum 0x00000000 mirror 1
Feb 16 20:23:54 unRAID1 kernel: lo_write_bvec: 9 callbacks suppressed
Feb 16 20:23:54 unRAID1 kernel: loop: Write error at byte offset 3869896704, length 4096.
Feb 16 20:23:54 unRAID1 kernel: print_req_error: 9 callbacks suppressed
Feb 16 20:23:54 unRAID1 kernel: print_req_error: I/O error, dev loop2, sector 7558392
Feb 16 20:23:54 unRAID1 kernel: btrfs_dev_stat_print_on_error: 9 callbacks suppressed
Feb 16 20:23:54 unRAID1 kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 20, rd 0, flush 0, corrupt 0, gen 0
Feb 16 20:27:00 unRAID1 root: Fix Common Problems Version 2018.02.16
Feb 16 20:27:06 unRAID1 root: Fix Common Problems: Error: Unable to write to cache
Feb 16 20:27:06 unRAID1 root: Fix Common Problems: Error: Unable to write to Docker Image
Feb 16 20:27:06 unRAID1 root: Fix Common Problems: Error: unclean shutdown detected of your server ** Ignored
Feb 16 20:27:09 unRAID1 root: Fix Common Problems: Warning: Template URL for docker application binhex-delugevpn is missing.
Feb 16 20:27:10 unRAID1 sSMTP[24828]: Creating SSL connection to host
Feb 16 20:27:10 unRAID1 sSMTP[24828]: SSL connection using ECDHE-RSA-AES128-GCM-SHA256
Feb 16 20:27:13 unRAID1 sSMTP[24828]: Sent mail for [email protected] (221 2.0.0 closing connection f63sm11927568ioj.74 - gsmtp) uid=0 username=root outbytes=863
Feb 16 20:37:23 unRAID1 ool www[11230]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/mnt/cache' ''
Feb 16 20:38:33 unRAID1 login[14662]: ROOT LOGIN on '/dev/pts/0'
Squid Posted February 17, 2018
The expert here on this is @johnnie.black. It wouldn't be a bad idea to also post your diagnostics.
whwunraid (Author) Posted February 17, 2018
Here are the diagnostics. I opened the case and re-seated the SATA connectors on the card and SSDs. It shouldn't be bad wires; they're less than a year old.
unraid1-diagnostics-20180216-2158.zip
whwunraid (Author) Posted February 17, 2018
SMART finally came back with pre-fails, so it looks like the disk. Let's see if the diags say the same... thanks for the assist.
Squid Posted February 17, 2018
Seeing "pre-fail" in a SMART report doesn't mean the drive is about to fail. That wording means the attribute is one which, once its values increase, indicates the drive might fail. Everything in a SMART report is classified as either pre-fail or old-age. And since you're running a btrfs cache pool, corruption on the pool is fairly common in the case of unclean shutdowns.
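[Editor's note] Squid's point can be seen directly in smartctl output: the TYPE column labels every attribute Pre-fail or Old_age regardless of the drive's health. A short sketch using smartctl from smartmontools (substitute your own device for /dev/sdj):

```shell
# Dump the vendor attribute table. What matters is not the Pre-fail/Old_age
# label in TYPE, but whether the normalized VALUE has fallen toward THRESH
# and whether raw counters such as Reallocated_Sector_Ct or
# Current_Pending_Sector are growing over time.
smartctl -A /dev/sdj

# Kick off an extended (long) self-test, then read its result afterwards
# from the self-test log.
smartctl -t long /dev/sdj
smartctl -l selftest /dev/sdj
```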
JorgeB Posted February 17, 2018
Run a scrub on the pool; if that doesn't fix it, it's better to backup/format/restore.
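[Editor's note] A sketch of the backup/format/restore route JorgeB mentions, with the reformat itself done from the GUI. Paths are illustrative (here the backup lands on disk1, which is assumed to have room), and Docker/VM services should be stopped first so nothing writes to the pool mid-copy:

```shell
# Copy everything off the cache pool, preserving permissions and times.
rsync -avh /mnt/cache/ /mnt/disk1/cache_backup/

# Now reformat the pool via the GUI: stop the array, (re)set the cache
# filesystem, start the array, and format the cache device when prompted.

# Copy everything back onto the freshly formatted pool.
rsync -avh /mnt/disk1/cache_backup/ /mnt/cache/
```

Trailing slashes matter to rsync: `/mnt/cache/` copies the directory's contents, while `/mnt/cache` would create a `cache/` subdirectory inside the destination.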
whwunraid (Author) Posted February 17, 2018
Yeah, I think so too. I picked up a new, larger PCIe SSD for the cache... thanks for all your help.
whwunraid (Author) Posted February 18, 2018
So there is still some underlying issue: the GUI freezes up, and then even a POWERDOWN command will not restart the box. Dockers install fine, then freeze the GUI once you try to start them; recreating the docker image has no effect. I just want to install 6.4.1 to a fresh new USB thumb drive (the current one is easily 10 years old). So at this point I just want to preserve the disk and share config and port that over to the 6.4.1 USB drive. Can the files in \FLASH\CONFIG\ and \FLASH\CONFIG\SHARES relating to shares and disks just be copied over, and is anything else necessary? I do not want to take docker, appdata, domains, etc. to the new install; something in there is totally messed up. I don't want to lose data (likely not going to happen), but I'm pretty frustrated with this right now. Diags attached for good measure...
unraid1-diagnostics-20180218-1559.zip
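[Editor's note] A hedged sketch of carrying just the disk and share config to a fresh flash drive. File names assume a stock unRAID 6 /boot layout (the \FLASH\CONFIG\ paths in the post map to /boot/config on the server), so verify against your own flash before trusting it. The .key licence file is tied to the old flash drive's GUID, so a new flash drive also needs a key replacement from Lime Technology:

```shell
# Assumed mount point of the freshly prepared USB stick; adjust to taste.
NEWFLASH=/mnt/newflash

# super.dat holds the array drive assignments -- this is the file that
# preserves which disk is which.
cp /boot/config/super.dat "$NEWFLASH/config/"

# Array/disk-level settings and the per-share .cfg files.
cp /boot/config/disk.cfg "$NEWFLASH/config/"
cp -r /boot/config/shares "$NEWFLASH/config/"

# Deliberately NOT copying docker.cfg, plugin configs, or anything pointing
# at appdata/domains, since something in the old setup is suspect.
```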
JorgeB Posted February 18, 2018
The docker image is still corrupt from the initial boot:
Feb 18 13:34:31 unRAID1 kernel: BTRFS critical (device loop0): corrupt leaf, slot offset bad: block=307478528, root=1, slot=32
Feb 18 13:34:31 unRAID1 kernel: BTRFS critical (device loop0): corrupt leaf, slot offset bad: block=307478528, root=1, slot=32
whwunraid (Author) Posted February 18, 2018
Yes, I replaced the cache drive, rebuilt the docker image, and reinstalled a docker... so something else has to be wrong.
Unthred Posted July 13, 2018
On 2/16/2018 at 11:34 PM, Squid said: docker.img is corrupted for some reason. Could be unclean shutdown, bad block on the cache drive, etc. Either way your only recourse is to delete it: Settings - Docker - Advanced View - Disable the Service - Delete the image - Re-enable the Service. Then reinstall your apps: Apps Tab, Previous Apps section. Check off all of your previous apps, then hit Install Multi. A couple of minutes later it'll be like nothing happened.
I was really rather worried about my server... I knew my cache drive was giving some errors and just hadn't gotten around to replacing it, until a reboot broke docker! I was dreading all the hassle and reconfiguration it might entail! I followed this to replace the cache drive http://lime-technology.com/wiki/Replace_A_Cache_Drive and your post... honestly it couldn't have been easier... a few clicks and everything sprang into life! It was around 30 dockers, too! Thanks so very much for the easy instructions, Squid! Super glad I bought unRaid now.