Docker randomly failing and then failing to start


So this issue has been happening on and off for the past few months.

 

What seems to happen is that, all of a sudden, my Docker containers become unresponsive and stop working. Stopping or restarting the containers shows an error message in the UI along the lines of 'Service failed to start', with no other information.

 

Sometimes a reboot of the server fixes this; other times I need to disable Docker, delete the `docker.img` and reinstall the containers.

 

Today this has happened again, and it seems to be becoming more frequent, so I'm hoping someone can point me in the right direction on how to resolve this.

 

Server diagnostics are attached. Currently the UI shows `Docker Service failed to start.` At this point I'm probably going to have to delete the docker image and start again.

 

If there is any other information I can provide please let me know.

server-diagnostics-20220614-1334.zip


Seems like there are errors on your cache drive and consequently with your docker image.

 

Jun 14 14:01:06 Server kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 189, rd 9526, flush 1, corrupt 0, gen 0
Jun 14 14:01:06 Server kernel: BTRFS warning (device sdg1): direct IO failed ino 16006543 rw 0,0 sector 0xb9db660 len 0 err no 10
Jun 14 14:01:06 Server kernel: blk_update_request: I/O error, dev loop2, sector 1614912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jun 14 14:01:06 Server kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 16, rd 4697, flush 0, corrupt 0, gen 0
Jun 14 14:01:06 Server kernel: sd 7:0:0:0: [sdg] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 cmd_age=0s
Jun 14 14:01:06 Server kernel: sd 7:0:0:0: [sdg] tag#12 CDB: opcode=0x28 28 00 0a 97 da e0 00 00 08 00
Jun 14 14:01:06 Server kernel: blk_update_request: I/O error, dev sdg, sector 177724128 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
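For anyone comparing diagnostics over time, here is a minimal sketch (not an official tool) that pulls the BTRFS per-device error counters out of syslog lines like the ones above. The regular expression is only an assumption based on the format of these particular lines, and the function name is hypothetical.

```python
import re

# Matches the per-device counter summary as it appears in the log excerpt above.
# The exact format is assumed from those lines and may differ between kernels.
COUNTER_RE = re.compile(
    r"BTRFS error \(device (?P<dev>\S+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

def parse_btrfs_counters(lines):
    """Return {bdev: counters}, keeping the last counter line seen per device."""
    latest = {}
    for line in lines:
        m = COUNTER_RE.search(line)
        if m:
            latest[m.group("bdev")] = {
                key: int(m.group(key))
                for key in ("wr", "rd", "flush", "corrupt", "gen")
            }
    return latest

# Two of the lines quoted above, used as sample input.
sample = [
    "Jun 14 14:01:06 Server kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 189, rd 9526, flush 1, corrupt 0, gen 0",
    "Jun 14 14:01:06 Server kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 16, rd 4697, flush 0, corrupt 0, gen 0",
]

counters = parse_btrfs_counters(sample)
for bdev, values in counters.items():
    print(bdev, values)
```

If the read and write counts keep climbing between two sets of diagnostics, the device is still having I/O problems.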

 

9 minutes ago, JorgeB said:
Jun 14 08:00:08 Server kernel: ata7.00: disabled

 

Cache device dropped offline, check/replace cables and if it comes back post new diags after array start.

Just replaced the cable, started back up and things are running for now. But like I mentioned, it keeps happening.

Update: Actually, things are behaving quite strangely, as if nothing can write to the cache now.

server-diagnostics-20220614-1452.zip

Edited by tidusjar
11 minutes ago, JorgeB said:
Jun 14 14:41:57 Server kernel: ata7.00: disabled

It dropped again, did you replace both cables? Power and SATA. If that doesn't help try a different SATA port or replace the device.

I only replaced the SATA cable. I've now switched to a different SATA port and a different power cable. It now seems to be working; I'll check over the next few days.
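To keep an eye on it over those days, something like the following rough sketch could count how often a device gets disabled in the syslog. It assumes the drop always shows up as an `ataN.NN: disabled` kernel line like the ones quoted in this thread; the function name is hypothetical and the sample input is just those two lines.

```python
import re
from collections import Counter

# Assumed pattern, based only on the "ata7.00: disabled" lines quoted in this thread.
DROP_RE = re.compile(r"kernel: (ata\d+\.\d+): disabled")

def count_drops(lines):
    """Count 'disabled' events per ATA device in the given syslog lines."""
    drops = Counter()
    for line in lines:
        m = DROP_RE.search(line)
        if m:
            drops[m.group(1)] += 1
    return drops

# The two drop events quoted in this thread, used as sample input.
sample = [
    "Jun 14 08:00:08 Server kernel: ata7.00: disabled",
    "Jun 14 14:41:57 Server kernel: ata7.00: disabled",
]

drops = count_drops(sample)
for device, count in drops.items():
    print(device, "dropped", count, "time(s)")
```

In practice the lines would come from the syslog in the diagnostics zip rather than a hard-coded list.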


The diagnostics show that the cache drive appears to be playing up and that the docker.img file is corrupt.

 

Looking at the SMART information for the cache drive I see:

199 UDMA_CRC_Error_Count    -O--CK   100   100   050    -    1
202 Percent_Lifetime_Remain ----CK   000   000   001    NOW  0

I'm not sure how significant the Remaining Lifetime attribute really is in practice, but you could try running an extended SMART test on the drive to see if it completes without error.
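As a rough illustration of how to read those two rows, the sketch below splits each line into fields and flags any attribute whose normalized value is at or below its threshold. The column order (ID, name, flags, value, worst, threshold, fail, raw) is assumed from the rows quoted above, and the helper names are hypothetical.

```python
def parse_smart_line(line):
    """Split one SMART attribute row into named fields (column order assumed)."""
    parts = line.split()
    return {
        "id": int(parts[0]),
        "name": parts[1],
        "flags": parts[2],
        "value": int(parts[3]),   # normalized current value
        "worst": int(parts[4]),   # worst normalized value recorded
        "thresh": int(parts[5]),  # failure threshold
        "fail": parts[6],         # "-" or e.g. "NOW"
        "raw": parts[7],
    }

def at_or_below_threshold(attr):
    """True when the normalized value has reached the threshold.

    A threshold of 0 is skipped because such an attribute can never trip.
    """
    return attr["thresh"] > 0 and attr["value"] <= attr["thresh"]

# The two rows quoted above, used as sample input.
rows = [
    "199 UDMA_CRC_Error_Count    -O--CK   100   100   050    -    1",
    "202 Percent_Lifetime_Remain ----CK   000   000   001    NOW  0",
]

attrs = [parse_smart_line(row) for row in rows]
flagged = [a["name"] for a in attrs if at_or_below_threshold(a)]
print(flagged)
```

On these rows only the lifetime attribute is flagged, which matches the `NOW` in its fail column. The CRC error count has a raw value of 1 but is still well above its threshold.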

 

 

 


Docker image went read-only for an apparent lack of space; you should try balancing the cache filesystem.

 

Cache device didn't drop again, but it's still showing several ATA errors. It still looks like a cable/connection problem; I would suggest trying it on a different controller, or swapping it with another disk if you don't have other free ports.
