openam Posted December 19, 2023 (edited)

My docker just stopped working. I did try adding a new docker using the TRaSH guide. Not sure if something I did in there could have caused the problems. There are some BTRFS errors in the syslog:

Dec 18 03:18:05 palazzo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Dec 18 03:18:05 palazzo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0

There was a similar issue the other day. I restarted it, and it seemed to be working fine. I think I may try restarting and disabling the 2 new docker containers from the TRaSH guide.

palazzo-diagnostics-20231218-1929.zip
palazzo-diagnostics-20231215-1819.zip

Edited December 23, 2023 by openam
JorgeB Posted December 19, 2023

Pool is detecting data corruption, good idea to run memtest. Also change docker network to ipvlan.
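For anyone reading along: the `wr`/`rd`/`corrupt` counters in those `errs:` lines are cumulative per device. A minimal sketch of pulling them out of such a line, using a sample line copied from the post above (on a live server you would grep `/var/log/syslog`, or query the device directly; the mount point in the comment is assumed):

```shell
# The counters in a BTRFS "errs:" line are cumulative per device.
# Sample line copied from the syslog above; on the server you would
# read /var/log/syslog, or query the device directly with e.g.
#   btrfs device stats /mnt/cache    (mount point assumed)
line='Dec 18 03:18:05 palazzo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0'
# Strip everything up to "errs: " to leave just the counters.
echo "$line" | sed -n 's/.*errs: //p'
# prints: wr 3, rd 0, flush 0, corrupt 0, gen 0
```

A nonzero `wr` count means failed writes to the device, which fits the docker.img (loop2) problems described above.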
openam Posted December 20, 2023 (Author)

Attached picture of memtest run. Did 4+ passes with 0 errors. I have also rebooted and set the network to ipvlan. It's running a parity check again because of an unclean shutdown. I'm guessing that's because of the errors that were present. It looks like some of the errors in the syslog had to do with btrfs. The only btrfs drive I have is the cache drive. Is it possible that there is just something corrupt in the docker.img? Would blowing that away and rebuilding it be of any use? If so, are there detailed instructions for that?
itimpi Posted December 20, 2023

The diagnostics show that there appears to be corruption on the cache drive itself, and internally within the docker.img file (the loop device, which is presumably on the same drive). You need to fix the cache level before attempting to fix the docker.img level. Probably best to start with a scrub of the cache drive to see how that goes. Instructions for recreating/repopulating the docker.img file are in the online documentation, accessible via the Manual link at the bottom of the Unraid GUI.
openam Posted December 20, 2023 (Author)

What do you mean by a scrub of the cache drive? How do I go about that?
JorgeB Posted December 20, 2023

Click on the pool, then scroll down to the scrub section.
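The GUI scrub button wraps `btrfs scrub`. A sketch of the command-line equivalent (the `/mnt/cache` mount point is assumed from this thread, and the status text below is a fabricated sample in the shape btrfs-progs prints, included only so the parsing step runs anywhere):

```shell
# GUI equivalent on the command line (mount point assumed):
#   btrfs scrub start /mnt/cache     # kick off a scrub
#   btrfs scrub status /mnt/cache    # check progress / error summary
# Fabricated sample of the error summary, in the usual btrfs-progs
# shape, so the check below is runnable anywhere:
status='Error summary:    csum=20
  Corrected:      0
  Uncorrectable:  20
  Unverified:     0'
# Pull out the count of errors the scrub could not repair.
echo "$status" | sed -n 's/.*Uncorrectable:[[:space:]]*//p'
# prints: 20
```

On a single-device pool, csum errors are usually uncorrectable because there is no redundant copy to repair from, which matters for the next few posts.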
openam Posted December 20, 2023 (Author)

It finished the scrub and didn't find anything. I did, however, see this in the syslog. I'm guessing I might just be able to delete that file?

Dec 19 21:59:25 palazzo webGUI: Successful login user root from 10.0.0.127
Dec 19 23:31:36 palazzo monitor: Stop running nchan processes
Dec 20 01:41:50 palazzo kernel: BTRFS warning (device sdk1): csum failed root 5 ino 207444295 off 572157952 csum 0x6d807861 expected csum 0xbda7158d mirror 1
Dec 20 01:41:50 palazzo kernel: BTRFS error (device sdk1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 56, gen 0
Dec 20 01:41:50 palazzo shfs: copy_file: /mnt/cache/tv/parents/Show Name/Season 01/Show Name.01x05.Episode Name.mkv /mnt/disk5/tv/parents/Show Name/Season 01/Show Name.01x05.Episode Name.mkv.partial (5) Input/output error
Dec 20 01:41:50 palazzo kernel: BTRFS warning (device sdk1): csum failed root 5 ino 207444295 off 572157952 csum 0x6d807861 expected csum 0xbda7158d mirror 1
Dec 20 01:41:50 palazzo kernel: BTRFS error (device sdk1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 57, gen 0
Dec 20 01:41:50 palazzo kernel: BTRFS warning (device sdk1): csum failed root 5 ino 207444295 off 572157952 csum 0x6d807861 expected csum 0xbda7158d mirror 1
Dec 20 01:41:50 palazzo kernel: BTRFS error (device sdk1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 58, gen 0
Dec 20 01:41:50 palazzo kernel: BTRFS warning (device sdk1): csum failed root 5 ino 207444295 off 572157952 csum 0x6d807861 expected csum 0xbda7158d mirror 1
Dec 20 01:41:50 palazzo kernel: BTRFS error (device sdk1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 59, gen 0
Dec 20 01:41:50 palazzo kernel: BTRFS warning (device sdk1): csum failed root 5 ino 207444295 off 572157952 csum 0x6d807861 expected csum 0xbda7158d mirror 1
Dec 20 01:41:50 palazzo kernel: BTRFS error (device sdk1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 60, gen 0
Dec 20 02:20:03 palazzo kernel: PMS LoudnessCmd[971]: segfault at 0 ip 000014ad5e111080 sp 000014ad595240c8 error 4 in libswresample.so.4[14ad5e109000+18000] likely on CPU 6 (core 2, socket 0)
Dec 20 02:20:03 palazzo kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 02 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
Dec 20 02:21:36 palazzo kernel: PMS LoudnessCmd[4247]: segfault at 0 ip 000014ef176be080 sp 000014ef120790c8 error 4 in libswresample.so.4[14ef176b6000+18000] likely on CPU 7 (core 3, socket 0)
Dec 20 02:21:36 palazzo kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 02 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
JorgeB Posted December 20, 2023

14 minutes ago, openam said: "and didn't find anything"

It did find 20 checksum errors. Run a correcting scrub and post the output of that.
openam Posted December 20, 2023 (Author)

You're right, but it says they're uncorrectable. Apparently I don't know how to read that summary.
JorgeB Posted December 20, 2023

The corrupt files should be listed in the syslog. Delete/restore them, then re-run a scrub.
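The "csum failed" warnings carry the inode number of each corrupt file, so they can be extracted and deduplicated. A sketch, using sample lines copied from the syslog earlier in the thread (on the server you would grep `/var/log/syslog` instead; the mount point in the final comment is assumed):

```shell
# Extract the inode numbers from "csum failed" warnings so the
# corrupt files can be identified. Sample lines copied from the
# syslog above; on the server, grep /var/log/syslog instead.
log='Dec 20 01:41:50 palazzo kernel: BTRFS warning (device sdk1): csum failed root 5 ino 207444295 off 572157952 csum 0x6d807861 expected csum 0xbda7158d mirror 1
Dec 20 01:41:50 palazzo kernel: BTRFS warning (device sdk1): csum failed root 5 ino 207444295 off 572157952 csum 0x6d807861 expected csum 0xbda7158d mirror 1'
# Keep only csum-failure lines, capture the number after "ino ",
# and collapse repeats of the same inode.
echo "$log" | grep 'csum failed' | sed -n 's/.*ino \([0-9]*\) .*/\1/p' | sort -u
# prints: 207444295
# On the live pool, map each inode back to a path (mount point assumed):
#   btrfs inspect-internal inode-resolve 207444295 /mnt/cache
```

Resolving against the pool mount point (not `/mnt/user`) matters here, which is exactly the mistake that comes up a couple of posts below.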
openam Posted December 20, 2023 (Author)

I was down to just 6, but deleted them and ran again, and now it's showing 71. Do I just keep playing whack-a-mole?
openam Posted December 20, 2023 (Author)

Oh man. I've been deleting them out of /mnt/user instead of /mnt/cache.
trurl Posted December 20, 2023

/mnt/cache is included in /mnt/user.
openam Posted December 20, 2023 (Author)

Every time I delete some and re-run, it appears to get worse.
trurl Posted December 20, 2023

I know you did memtest, but btrfs csum errors are often caused by bad RAM. Did you have memory problems in the past?
openam Posted December 20, 2023 (Author)

I do not remember having memory problems in the past (which may mean I personally have them 😁). Another thing I did recently (about 1 week ago) was upgrade from 6.9.x to 6.12.x, which makes me wonder if it's something like this guy was seeing. I have heard of people running memtest for many more hours than I did. Should I let it run all night?
JorgeB Posted December 21, 2023

A couple of hours is usually enough to detect a serious issue.
openam Posted December 21, 2023 (Author)

I ended up letting it run all night, and it still shows no errors. I did order some cheap replacement memory sticks that should be here later tonight, though. Would it be possible to just switch out my entire cache drive for another SSD? I have a smaller one sitting around that I could format and throw in, then try to copy over the important things from the old device using Unassigned Devices, if it'll let me. Where is the configuration for the dockers stored? By that I mean the template configurations, not the appdata or volume mappings.
JorgeB Posted December 21, 2023 (Solution)

Copy what you can from the current pool and then reformat or use a different device to see if it no longer happens.
openam Posted December 21, 2023 (Author)

What's the best way to copy from the current pool, just targeting `/mnt/cache` and rsync to a new drive?
JorgeB Posted December 21, 2023

Yes, rsync is one way; you can also use Dynamix File Manager or Midnight Commander.
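For the rsync route, a minimal sketch. The real mount points (`/mnt/cache` as the source, and whatever the new pool is mounted as) are assumed from this thread; the snippet below runs against temporary directories so it is testable anywhere:

```shell
# Copying pool contents with rsync. On the server this would be e.g.
#   rsync -avh /mnt/cache/ /mnt/newcache/    (mount points assumed)
# Demonstrated on temp directories so the snippet runs anywhere.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/appdata" && echo hello > "$src/appdata/test.txt"
# -a preserves permissions/times/symlinks, -v lists files, -h prints
# human-readable sizes; the trailing slash on the source copies its
# contents rather than the directory itself.
rsync -avh "$src/" "$dst/"
cat "$dst/appdata/test.txt"   # prints: hello
rm -rf "$src" "$dst"
```

The `-a` flag matters here because appdata contains files whose ownership and permissions the containers depend on.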
dirkinthedark Posted December 21, 2023

I think I'm having the same issue.
openam Posted December 21, 2023 (Author)

So after I copy everything to a new cache, do I just update the shares' references to use the new cache? Also, I got these warnings, but that kind of makes sense since I'm copying from one to the other.
trurl Posted December 21, 2023

5 hours ago, dirkinthedark said: "I think Im having the same issue"

Start your own thread with a description of your problems and your diagnostics.
dirkinthedark Posted December 22, 2023

I have. I'm simply letting this user know that many are having the same issue currently.