
Docker unexpectedly went down and won't restart


Go to solution Solved by trurl,


Plex wasn't working. I upgraded Unassigned Devices and the OS, rebooted, and now I can't get the Docker service to start. Something tells me my cache pool somehow went to SH!T. The server wasn't even being used, to my knowledge.

 

Thoughts?

 

Diagnostics attached.

 

 

Syslog displays:


Feb 22 19:03:01 Tower avahi-daemon[14887]: Service "Tower" (/services/sftp-ssh.service) successfully established.
Feb 22 19:03:05 Tower kernel: loop: Write error at byte offset 307298304, length 4096.
Feb 22 19:03:05 Tower kernel: I/O error, dev loop2, sector 600192 op 0x1:(WRITE) flags 0x1800 phys_seg 4 prio class 2
Feb 22 19:03:05 Tower kernel: loop: Write error at byte offset 38862848, length 4096.
Feb 22 19:03:05 Tower kernel: I/O error, dev loop2, sector 75904 op 0x1:(WRITE) flags 0x1800 phys_seg 4 prio class 2
Feb 22 19:03:05 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 248, rd 0, flush 0, corrupt 0, gen 0
Feb 22 19:03:05 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 249, rd 0, flush 0, corrupt 0, gen 0
Feb 22 19:03:05 Tower kernel: loop: Write error at byte offset 307429376, length 4096.
Feb 22 19:03:05 Tower kernel: I/O error, dev loop2, sector 600448 op 0x1:(WRITE) flags 0x1800 phys_seg 4 prio class 2
Feb 22 19:03:05 Tower kernel: loop: Write error at byte offset 38993920, length 4096.
Feb 22 19:03:05 Tower kernel: I/O error, dev loop2, sector 76160 op 0x1:(WRITE) flags 0x1800 phys_seg 4 prio class 2
Feb 22 19:03:05 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 250, rd 0, flush 0, corrupt 0, gen 0
Feb 22 19:03:05 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 251, rd 0, flush 0, corrupt 0, gen 0
Feb 22 19:03:05 Tower kernel: loop: Write error at byte offset 308150272, length 4096.
Feb 22 19:03:05 Tower kernel: I/O error, dev loop2, sector 601856 op 0x1:(WRITE) flags 0x1800 phys_seg 23 prio class 2
Feb 22 19:03:05 Tower kernel: loop: Write error at byte offset 39714816, length 4096.
Feb 22 19:03:05 Tower kernel: I/O error, dev loop2, sector 77568 op 0x1:(WRITE) flags 0x1800 phys_seg 23 prio class 2
Feb 22 19:03:05 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 252, rd 0, flush 0, corrupt 0, gen 0
Feb 22 19:03:05 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 253, rd 0, flush 0, corrupt 0, gen 0
Feb 22 19:03:05 Tower kernel: loop: Write error at byte offset 308101120, length 4096.
Feb 22 19:03:05 Tower kernel: I/O error, dev loop2, sector 601760 op 0x1:(WRITE) flags 0x1800 phys_seg 2 prio class 2
Feb 22 19:03:05 Tower kernel: loop: Write error at byte offset 39665664, length 4096.
Feb 22 19:03:05 Tower kernel: I/O error, dev loop2, sector 77472 op 0x1:(WRITE) flags 0x1800 phys_seg 2 prio class 2
Feb 22 19:03:05 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 254, rd 0, flush 0, corrupt 0, gen 0
Feb 22 19:03:05 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 255, rd 0, flush 0, corrupt 0, gen 0
Feb 22 19:03:05 Tower kernel: loop: Write error at byte offset 307937280, length 4096.
Feb 22 19:03:05 Tower kernel: I/O error, dev loop2, sector 601440 op 0x1:(WRITE) flags 0x1800 phys_seg 2 prio class 2
Feb 22 19:03:05 Tower kernel: loop: Write error at byte offset 39501824, length 4096.
Feb 22 19:03:05 Tower kernel: I/O error, dev loop2, sector 77152 op 0x1:(WRITE) flags 0x1800 phys_seg 2 prio class 2
Feb 22 19:03:05 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 256, rd 0, flush 0, corrupt 0, gen 0
Feb 22 19:03:05 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 257, rd 0, flush 0, corrupt 0, gen 0
Feb 22 19:03:05 Tower kernel: BTRFS: error (device loop2) in btrfs_commit_transaction:2494: errno=-5 IO failure (Error while writing out transaction)
Feb 22 19:03:05 Tower kernel: BTRFS info (device loop2: state E): forced readonly
Feb 22 19:03:05 Tower kernel: BTRFS warning (device loop2: state E): Skipping commit of aborted transaction.
Feb 22 19:03:05 Tower kernel: BTRFS: error (device loop2: state EA) in cleanup_transaction:1992: errno=-5 IO failure
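
For anyone reading along, a log like the one above can be summarized with a short script. This is only a minimal sketch, not an Unraid tool: the function names are made up for illustration, and the 512-byte sector size is an assumption that happens to match the offset/sector pairs logged here (sector 600192 x 512 = 307298304).

```python
import re

# Assumption: the kernel reports sectors in 512-byte units, which matches
# the offset/sector pairs in the syslog excerpt above.
SECTOR_SIZE = 512

# Matches lines such as:
# BTRFS error (device loop2): bdev /dev/loop2 errs: wr 248, rd 0, flush 0, corrupt 0, gen 0
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def sector_to_offset(sector):
    """Convert a sector number to a byte offset."""
    return sector * SECTOR_SIZE


def latest_error_counters(log_text):
    """Return {device: counters} using the last 'errs:' line seen per device."""
    latest = {}
    for line in log_text.splitlines():
        m = ERRS_RE.search(line)
        if m:
            counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
            latest[m.group("dev")] = counters
    return latest


def went_readonly(log_text):
    """True if the log contains the btrfs 'forced readonly' message."""
    return "forced readonly" in log_text


# Three lines copied from the syslog excerpt above.
sample = """\
Feb 22 19:03:05 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 256, rd 0, flush 0, corrupt 0, gen 0
Feb 22 19:03:05 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 257, rd 0, flush 0, corrupt 0, gen 0
Feb 22 19:03:05 Tower kernel: BTRFS info (device loop2: state E): forced readonly
"""

print(latest_error_counters(sample))
print(went_readonly(sample))
print(sector_to_offset(600192))
```

In this excerpt only the write counter (wr) climbs while rd, flush, corrupt and gen stay at 0, which fits a filesystem that can still be read but can no longer be written.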

 

tower-diagnostics-20240222-1931.zip


Looks like your shares are configured OK. appdata has files on several array disks but can't get those moved to cache until cache has some space.

 

Why do you have a 100G docker.img? The default 20G is often enough, maybe a little more if you have a lot of dockers. But usage shouldn't grow. The main reason docker.img fills up is an application writing to a path that isn't mapped.
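
To illustrate the unmapped-path point: anything a container writes outside its mapped container paths stays in the container's own filesystem, which lives inside docker.img. Here is a rough sketch of that check. The helper name and the example paths are hypothetical; this is not part of Docker or Unraid.

```python
from pathlib import PurePosixPath


def is_mapped(write_path, mapped_container_paths):
    """Return True if write_path is at or below any mapped container path."""
    target = PurePosixPath(write_path)
    for mapped in mapped_container_paths:
        base = PurePosixPath(mapped)
        if target == base or base in target.parents:
            return True
    return False


# Hypothetical container-side paths from a container's volume mappings.
mappings = ["/config", "/data"]

# Under /config, so the write goes to the mapped host storage.
print(is_mapped("/config/Library/db.sqlite", mappings))

# Not under any mapping, so the write would land inside docker.img.
print(is_mapped("/downloads/file.mkv", mappings))
```

Comparing whole path components (rather than string prefixes) avoids treating something like /configs as if it were under /config.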

 

You are going to have to recreate it now anyway since it is corrupt. We can get to that later after you make some space on cache.

 

You have a couple of shares set to move to the array. Temporarily set appdata to Secondary storage: none so mover will skip it, then run mover to see if you can get those other shares moved to the array.


Scratch that, it's trying to move Plex metadata to disk8 but telling me "Read-only file system."

 

Feb 22 20:09:40 Tower move: file: /mnt/disk8/appdata/Plex-Media-Server/Library/Application Support/Plex Media Server/Library/Application Support/Plex Media Server/Metadata/TV Shows/3/b65ba2804fa57d28ae7019fde0b9506857aabd7.bundle/Contents/_combined/posters/tv.plex.agents.series_72a5c9fda1932c1043859891f249a38b918dc4c7

 

So should I manually run "chmod -R 777 /mnt/disk8/appdata/Plex-Media-Server"?
 


So it's correctly set to none like you said.  I've stopped the mover.  Went to data share and manually started mover.

 

Secondary storage for appdata is set to none.

Secondary storage for data is set to array, but I'm seeing:

file: /mnt/cache/data/media/music/Luke Combs/Luke Combs - What You See Is What You Get [2019]/15. All Over Again.mp3
create_parent: /mnt/cache/data/media/music/Luke Combs/Luke Combs - What You See Is What You Get [2019] error: Read-only file system

 

 

Edited by jongregory75

In an attempt to free up just a few GB, I tried deleting /mnt/cache/data/trashcan, but I can't even do that.

Syslog continues to say anything on the cache pool is "Read-only file system".  

When I try running chmod anywhere on the cache pool, it tells me it's a "read-only file system".

 

Could the fact that I have mismatched cache pool drives be the cause? One is 500GB and one is 1TB.

 


Steps:

Stop array

Remove devices from cache pool

Unmount

Remove Partitions

Format (BTRFS)

Mount

Assign disks to cache pool

Start array

 

Still received the same failure.  I erased the cache pool itself and created a new one.

Assigned cache pool disks

Started array

Now it's running OK.

 

What would have corrupted the cache pool itself? Last night I couldn't move or delete files; nothing was working.

 

  • Solution
16 hours ago, jongregory75 said:

cache pool is "Read-only file system"

Which explains why Mover can't work. It can't make any changes to cache, which is required when you move files off cache.
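
If you want to confirm which mounts are read-only before running Mover again, the mount table can be checked. A minimal sketch, assuming the text is in the Linux /proc/mounts format; the function name and the sample lines (device names, options) are made up for illustration.

```python
def readonly_mounts(mounts_text):
    """Return mount points whose options include 'ro' (/proc/mounts format)."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


# Made-up sample in /proc/mounts format; devices and options are illustrative.
sample = """\
/dev/sdb1 /mnt/cache btrfs ro,noatime,space_cache=v2 0 0
/dev/md1p1 /mnt/disk1 xfs rw,noatime 0 0
"""

print(readonly_mounts(sample))
```

On a live Linux system the same function could be fed the contents of /proc/mounts. A pool that btrfs has forced read-only would be expected to show up with the ro option there, though that is worth verifying against the syslog.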

 

5 hours ago, jongregory75 said:

What would have corrupted the cache pool itself?

btrfs seems a little fragile when you fill it up.
