Docker Image corrupt?

andyjayh · March 24, 2019

Hi all, my UniFi Docker container has just failed for some reason. I noticed this yesterday, along with a few other containers not responding, so I rebooted my server. Everything came back OK apart from the UniFi Controller container: although it had a green light against it, it only showed 'server starting' when I connected to the WebUI.

When I look in the syslog, I see the following error repeating the whole time the controller container is running:

Mar 23 21:35:03 Media kernel: BTRFS warning (device loop2): csum failed root 1796 ino 10101 off 41115648 csum 0xb3cf73fa expected csum 0x293aeffb mirror 1

This morning I got a message from Fix Common Problems saying the Docker image was read-only. I stopped Docker and increased the image size to see if that was the issue, but to no avail. I stopped Docker again and ran a scrub:

scrub status for 266b6e65-1004-4619-8039-7a234e50f0f1
scrub started at Sun Mar 24 18:14:34 2019 and finished after 00:00:03
total bytes scrubbed: 3.65GiB with 2 errors
error details: csum=2
corrected errors: 0, uncorrectable errors: 2, unverified errors: 0

All other containers run fine; I only see the csum errors appear in the syslog when I start this one. The UniFi Controller never starts, and all I get is a 'server starting' message when I try to connect to the WebUI.

I assume I need to remove the Docker image file and restore from a CA backup? Obviously this will take all my containers back to the last backup; will this be a problem? I run Plex, Sonarr, Radarr, nzbget, UniFi-Video and Tautulli.
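The `uncorrectable errors: 2` line in the scrub output above is the key number. The Docker image is a single-device btrfs filesystem, so for its data a scrub most likely has no second copy to repair from, which fits with nothing being corrected. As a minimal sketch (the function name is hypothetical, not an Unraid tool), that count can be pulled out of `btrfs scrub status` text so it can be checked from a script:

```shell
#!/bin/bash
# Hypothetical helper: read `btrfs scrub status` output on stdin and print the
# uncorrectable error count, or 0 if no such line is present.
scrub_uncorrectable() {
    local n
    # Capture the number after "uncorrectable errors: " on the first match.
    n=$(sed -n 's/.*uncorrectable errors: \([0-9][0-9]*\).*/\1/p' | head -n 1)
    echo "${n:-0}"
}
```

Usage, assuming the image is mounted at `/var/lib/docker` as it normally is on Unraid: `btrfs scrub status /var/lib/docker | scrub_uncorrectable`. A non-zero result means the affected files cannot be repaired in place, which is consistent with the advice below to recreate the image.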

Squid · March 24, 2019

2 minutes ago, andyjayh said:

I assume I need to remove the Docker image file and restore from a CA backup? Obviously this will take all my containers back to the last backup; will this be a problem? I run Plex, Sonarr, Radarr, nzbget, UniFi-Video and Tautulli.

Nope. All the appdata is stored outside of the docker.img. Remove the Docker image, then go to Apps -> Previous Apps, check off what you want, and you're back in business.

andyjayh · March 24, 2019

43 minutes ago, Squid said:

Nope. All the appdata is stored outside of the docker.img. Remove the Docker image, then go to Apps -> Previous Apps, check off what you want, and you're back in business.

Ah, I see. So leave the appdata well alone in this case. Will give this a go, thanks.

andyjayh · March 24, 2019

Hmm, not great. I did the above, and what I am seeing in the syslog now is:

Mar 24 21:54:59 Media kernel: BTRFS warning (device sdl1): csum failed root 5 ino 17093788 off 4571136 csum 0x3235bd95 expected csum 0x00000000 mirror 2
Mar 24 21:54:59 Media kernel: BTRFS critical (device sdl1): corrupt node: root=7 block=93667393536 slot=86, bad key order, current (18446744073709551606 128 986823262208) next (18446744073707454454 128 986826604544)

When I go to the WebUI, it cycles between a message about repairing the database and the server startup message.

Squid · March 24, 2019

1 hour ago, andyjayh said:

Hmm, not great. I did the above, and what I am seeing in the syslog now is:

Mar 24 21:54:59 Media kernel: BTRFS warning (device sdl1): csum failed root 5 ino 17093788 off 4571136 csum 0x3235bd95 expected csum 0x00000000 mirror 2
Mar 24 21:54:59 Media kernel: BTRFS critical (device sdl1): corrupt node: root=7 block=93667393536 slot=86, bad key order, current (18446744073709551606 128 986823262208) next (18446744073707454454 128 986826604544)

When I go to the WebUI, it cycles between a message about repairing the database and the server startup message.

That means the Docker image corruption was caused by an underlying file system problem on the cache drive. I'm not in a position to assist right now, but if @johnnie.black is around, he can help, no problem.

Sent via telekinesis

Edited March 24, 2019 by Squid
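Squid's conclusion rests on which device the errors are reported against. The first errors named `loop2`, a loop device backed by an image file (most likely docker.img on this system), while the new ones name `sdl1`, a partition on a physical cache disk. A rough sketch for tallying that from the syslog; both function names are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print the device named in BTRFS kernel log lines read
# on stdin, e.g. "BTRFS warning (device loop2): ..." -> "loop2".
btrfs_log_device() {
    sed -n 's/.*BTRFS [a-z]* (device \([^)]*\)).*/\1/p'
}

# Hypothetical helper: loopN devices are backed by image files (such as
# docker.img); anything else is assumed to be a real disk or partition.
classify_device() {
    case "$1" in
        loop*) echo "loop device (image file such as docker.img)" ;;
        *)     echo "physical device (e.g. cache pool member)" ;;
    esac
}
```

Usage: `grep BTRFS /var/log/syslog | btrfs_log_device | sort | uniq -c` counts BTRFS messages per device; once they name `sdX` devices rather than a loop device, the problem sits below the Docker image.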

andyjayh · March 24, 2019

Eek, sounds bad. Thanks for your help though.

Squid · March 24, 2019

1 hour ago, andyjayh said:

Eek, sounds bad. Thanks for your help though.

Probably not. He's just the resident expert. Wouldn't be a bad idea to post your diagnostics though...

JorgeB · March 25, 2019

10 hours ago, andyjayh said:

Mar 24 21:54:59 Media kernel: BTRFS warning (device sdl1): csum failed root 5 ino 17093788 off 4571136 csum 0x3235bd95 expected csum 0x00000000 mirror 2

These are checksum errors, possibly the result of one device dropping offline. Please post the diagnostics: Tools -> Diagnostics

andyjayh · March 25, 2019

Squid said:

Probably not. He's just the resident expert. Wouldn't be a bad idea to post your diagnostics though...

Thanks, I hope not.

Good idea, and I'll sort that when I get home.

Sent from my iPhone using Tapatalk Pro

andyjayh · March 25, 2019

johnnie.black said:

These are checksum errors, possibly the result of one device dropping offline. Please post the diagnostics: Tools -> Diagnostics

Brilliant, I'll do that as soon as I get back home.

andyjayh · March 26, 2019

23 hours ago, johnnie.black said:

These are checksum errors, possibly the result of one device dropping offline. Please post the diagnostics: Tools -> Diagnostics

Apologies for not doing this yesterday, but here is my diagnostics report.

I've also just noticed that my Dashboard shows the second of the three 'Flash Log Docker' graphs at 100%! I'm not sure where that is located, but none of my drives are full or near full, including the flash drive.

media-diagnostics-20190326-0843.zip
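To see which filesystem a Dashboard bar is reporting, the same figures can be read with `df`. This sketch assumes GNU coreutils and the usual Unraid mount points: `/boot` for the flash drive, `/var/log` for the log filesystem (usually a small RAM-backed tmpfs, separate from the array and cache) and `/var/lib/docker` for the Docker image. Both helper names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print how full the filesystem holding PATH is, as a
# bare integer percentage (needs GNU df for --output).
usage_pct() {
    df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
    echo
}

# Hypothetical helper: report usage for the assumed Flash / Log / Docker
# mount points, skipping any that do not exist on this machine.
show_dashboard_usage() {
    local p
    for p in /boot /var/log /var/lib/docker; do
        if [ -e "$p" ]; then
            echo "$p $(usage_pct "$p")%"
        fi
    done
}
```

If `show_dashboard_usage` shows `/var/log` near 100%, the space is being taken by log files in RAM, which would explain why none of the drives look full.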

JorgeB · March 26, 2019

The syslog has rotated, so I can't see the beginning of the problem, but it does look like a cache device dropped offline. Reboot, then after a few minutes of array usage, grab and post new diags.

andyjayh · March 26, 2019

18 minutes ago, johnnie.black said:

The syslog has rotated, so I can't see the beginning of the problem, but it does look like a cache device dropped offline. Reboot, then after a few minutes of array usage, grab and post new diags.

Ok, thanks for looking. I've restarted the server and uploaded a new diagnostics file.

It's a learning curve, but I now understand that what I was seeing in the Dashboard was my Docker log filling up. I suspect this is causing part of the issue, but I don't know whether it could have taken one of the cache drives offline and so caused the corruption I'm now seeing? I need to understand why the log is filling up so quickly now, when in the past my server has been up for months at a time without issue. UniFi Controller and UniFi Video are the new containers, so I suspect one of them is writing a large volume of log entries?

media-diagnostics-20190326-0944.zip
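One way to check which container is logging the most: with Docker's default `json-file` logging driver, each container's stdout/stderr goes to `/var/lib/docker/containers/<id>/<id>-json.log`, which on Unraid sits inside docker.img. A minimal sketch (hypothetical function name, GNU `find` assumed) that lists the largest of those files:

```shell
#!/bin/bash
# Hypothetical helper: list the N largest json-file container logs under DIR
# (default /var/lib/docker/containers), largest first, as "<bytes> <path>".
# Assumes GNU find (for -printf) and Docker's json-file logging driver.
largest_container_logs() {
    local dir=${1:-/var/lib/docker/containers}
    local n=${2:-5}
    find "$dir" -type f -name '*-json.log' -printf '%s %p\n' 2>/dev/null \
        | sort -rn | head -n "$n"
}
```

The directory name in each path is the full container ID, which `docker ps --no-trunc` maps to a container name. If one container turns out to be the culprit, the `json-file` driver's `max-size` and `max-file` options (for example `--log-opt max-size=50m --log-opt max-file=1`) cap how large its log can grow.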

JorgeB · March 26, 2019

At some point in the past there were errors writing to both cache devices. These are hardware errors:

Mar 26 09:36:26 Media kernel: BTRFS info (device sdl1): bdev /dev/sdl1 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
Mar 26 09:36:26 Media kernel: BTRFS info (device sdl1): bdev /dev/sdk1 errs: wr 9, rd 0, flush 0, corrupt 0, gen 0

See here for more info on what to do:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=700582
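The `wr 7` and `wr 9` figures in those lines are the per-device counters that `btrfs dev stats` reports. They are stored on disk and keep accumulating until explicitly reset, so a non-zero value may date from an old incident rather than a current one. As a sketch, a small filter (hypothetical name) can cut that output down to the counters that are actually non-zero:

```shell
#!/bin/bash
# Hypothetical helper: read `btrfs dev stats` output on stdin and print only
# the lines whose counter (the last field) is non-zero. Blank lines are skipped.
nonzero_dev_stats() {
    awk 'NF >= 2 && ($NF + 0) != 0 { print }'
}
```

Usage: `btrfs dev stats /mnt/cache | nonzero_dev_stats` prints nothing when every counter on the pool is zero.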

andyjayh · March 26, 2019

OK, thanks for this. I've created the following script, as suggested in the linked post:

#!/bin/bash
btrfs dev stats -c /mnt/cache  # -c: exit non-zero if any error counter is non-zero
if [[ $? -ne 0 ]]; then /usr/local/emhttp/webGui/scripts/notify -i warning -s "ERRORS on cache pool"; fi  # raise an Unraid warning notification

I ran it, with the following output:

[/dev/sdl1].write_io_errs 7
[/dev/sdl1].read_io_errs 0
[/dev/sdl1].flush_io_errs 0
[/dev/sdl1].corruption_errs 0
[/dev/sdl1].generation_errs 0
[/dev/sdk1].write_io_errs 9
[/dev/sdk1].read_io_errs 0
[/dev/sdk1].flush_io_errs 0
[/dev/sdk1].corruption_errs 0
[/dev/sdk1].generation_errs 0

/tmp/user.scripts/tmpScripts/btrfs check/script: line 4: syntax error: unexpected end of file

Not quite sure about the syntax error?

As expected, there are some write errors. As suggested in your linked post, this is most likely a cable fault, so until I can buy another pair of cables I should try reseating them? Once I've done this, is the only thing left to do a scrub of the cache pool? I can do this by selecting the primary cache drive and running the scrub tool? I've done this and it's reporting no errors.
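The sequence discussed here (fix the cabling, reset the counters, scrub, check again) might be scripted roughly as below. This is a sketch, not the FAQ's exact procedure: it assumes the pool is mounted at `/mnt/cache` with the default raid1 profile, and the `run` wrapper and its `DRY_RUN` variable are hypothetical additions so the commands can be reviewed before anything is changed.

```shell
#!/bin/bash
# Sketch of a post-fix check for a btrfs cache pool (mount point assumed).
POOL=${POOL:-/mnt/cache}

# Hypothetical wrapper: with DRY_RUN set, print the command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recheck_pool() {
    # 1. Print the persistent per-device error counters, then reset them to zero.
    run btrfs dev stats -z "$POOL"
    # 2. Scrub in the foreground (-B); on a raid1 pool, bad copies can be
    #    rewritten from the good mirror.
    run btrfs scrub start -B "$POOL"
    # 3. Exit non-zero if any counter has become non-zero again.
    run btrfs dev stats -c "$POOL"
}
```

`DRY_RUN=1 recheck_pool` only prints the three commands; plain `recheck_pool` runs them. Because `-z` zeroes the counters first, any errors reported by the final check happened after the fix.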

JorgeB · March 26, 2019

57 minutes ago, andyjayh said:

Not quite sure about the syntax error?

Sometimes copying from the forum inserts extra characters; copy/paste into Notepad first.
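One likely culprit for `unexpected end of file` is Windows-style line endings: a line ending in `fi` plus a carriage return is read as the word `fi\r`, so the `if` is never closed. Non-breaking spaces pasted from a web page cause similar trouble because bash does not treat them as whitespace. As an alternative to the Notepad round trip, a minimal sketch (hypothetical function name; GNU `sed` assumed) that strips both and then syntax-checks the file with `bash -n`:

```shell
#!/bin/bash
# Hypothetical helper: clean a script pasted from a web page, then check its
# syntax without running it. Requires GNU sed for -i and the \r / \xHH escapes.
clean_pasted_script() {
    local f=$1
    # Drop a trailing carriage return on every line and turn UTF-8
    # non-breaking spaces (bytes C2 A0) into ordinary spaces. LC_ALL=C makes
    # sed match the raw bytes regardless of the current locale.
    LC_ALL=C sed -i -e 's/\r$//' -e 's/\xC2\xA0/ /g' "$f"
    # bash -n parses the file and returns non-zero on a syntax error.
    bash -n "$f"
}
```

Running `cat -A` on the file beforehand shows any carriage returns as `^M` at the ends of lines; after cleaning, `clean_pasted_script` returns 0 once the file parses.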

andyjayh · April 16, 2019

I'll start a new thread for this, as it's clearly an issue with my cache drives; so far I haven't been able to resolve it and have got completely stuck on how to fix the issues on the drives.
