Multiple BTRFS and syslog errors, docker containers stop working



I recently started encountering lots of BTRFS and syslog errors.

Normally I would not be bothered, but recently every few days I wake up to find that my docker containers are basically not working. An array restart doesn't help; only an Unraid restart does. But the issue comes back a few days later. What I usually see in the log prior to the restart is below.

Mar 18 04:55:22 Tower rsyslogd: action 'action-3-builtin:omfile' (module 'builtin:omfile') message lost, could not be processed. Check for additional error messages before this one. [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
Mar 18 04:55:22 Tower rsyslogd: file '/mnt/user/appdata/syslog-192.168.50.5.log'[2] write error - see https://www.rsyslog.com/solving-rsyslog-write-errors/ for help OS error: No space left on device [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
Mar 21 08:24:10 Tower rsyslogd: omfwd/udp: socket 5: sendto() error: Network is unreachable [v8.2102.0 try https://www.rsyslog.com/e/2354 ]
Mar 21 08:24:10 Tower rsyslogd: omfwd: socket 5: error 101 sending via udp: Network is unreachable [v8.2102.0 try https://www.rsyslog.com/e/2354 ]

Mar 15 15:54:31 Tower kernel: BTRFS error (device loop2: state EA): bad tree block start, mirror 1 want 11436949504 have 0

 

My array disks, cache disk and flash drive show 0 errors. I have included the old log in the zip file (syslog-old.txt); maybe it will help someone work out what's wrong with my server. I tried googling both issues but found nothing helpful. Anyway, thanks in advance to anyone who has any idea what's going on.
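For anyone triaging a similar log, here is a minimal Python sketch that tallies the three kinds of messages quoted above per day. The patterns are copied from the log lines in this post; the category names and the `tally` function are made up for illustration and are not part of Unraid or rsyslog.

```python
import re
from collections import Counter

# Patterns taken from the messages quoted in this thread.
# The category names on the left are hypothetical labels.
PATTERNS = {
    "no_space": re.compile(r"No space left on device"),
    "net_unreachable": re.compile(r"Network is unreachable"),
    "btrfs_bad_tree_block": re.compile(r"BTRFS error .*bad tree block start"),
}


def tally(lines):
    """Count matching lines, keyed by (day, category)."""
    counts = Counter()
    for line in lines:
        # The first two fields of a classic syslog line are month and day, e.g. "Mar 18".
        day = " ".join(line.split()[:2])
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(day, name)] += 1
    return counts
```

It could be run as `tally(open("syslog-old.txt", errors="replace"))` and the result printed with `sorted(counts.items())`. Seeing which day each category first appears on may help separate the "No space left on device" writes from the later BTRFS errors.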

tower-diagnostics-20240321-1753.zip

3 hours ago, JorgeB said:

Only checked the other one, but the old one is missing the start of the problem, in any case, those errors come from the docker image, so start by recreating it:

 

https://docs.unraid.net/unraid-os/manual/docker-management/#re-create-the-docker-image-file

Also see below if you have any custom docker networks:

https://docs.unraid.net/unraid-os/manual/docker-management/#docker-custom-networks

 

 

Thanks! I'll do that and report back on how it went. I don't think I have set up any custom docker networks; every container's network type is 'host' or 'bridge'.


I have rebuilt docker and restarted the machine.

The syslog errors came back right away:

Mar 23 14:10:10 Tower rsyslogd: omfwd: socket 1: error 101 sending via udp: Network is unreachable [v8.2102.0 try https://www.rsyslog.com/e/2354 ]
Mar 23 14:10:10 Tower rsyslogd: omfwd/udp: socket 1: sendto() error: Network is unreachable [v8.2102.0 try https://www.rsyslog.com/e/2354 ]

I wonder if this has anything to do with the network type setting in docker? Currently it is set up as follows:

 

Docker custom network type: macvlan

Could changing this have an impact on these errors?

 

For the btrfs errors I will probably need to wait a day or two, as these usually happen during the night. Hopefully the docker rebuild helped solve this.

 


I don't need macvlan. I'm not sure why it was set up like this in the first place. Anyway, the docker rebuild didn't help. I woke up this morning to find that not only the containers but also the Unraid web interface were unavailable. One hard reset later, I rebuilt docker again, this time with ipvlan, and it seems to work so far. At least there are no syslog errors in the log. I will monitor those btrfs errors now. Thanks for the hint about macvlan ;)

 

EDIT: The happiness didn't last long. An hour after the docker rebuild, the whole system crashed and only a reboot helped. New btrfs entry from the log:

Mar 24 20:19:25 Tower kernel: BTRFS info (device loop4): using crc32c (crc32c-intel) checksum algorithm

Is my cache drive dying? It still shows 0 errors.


How about this one:

Mar 26 08:58:40 Tower kernel: BTRFS error (device loop2: state EA): bad tree block start, mirror 1 want 20224475136 have 0
Mar 26 08:58:40 Tower kernel: BTRFS error (device loop2: state EA): bad tree block start, mirror 1 want 20224524288 have 0
Mar 26 08:58:45 Tower kernel: BTRFS error (device loop2: state EA): bad tree block start, mirror 1 want 20224507904 have 0
Mar 26 08:58:45 Tower kernel: BTRFS error (device loop2: state EA): bad tree block start, mirror 1 want 20224475136 have 0
Mar 26 08:58:45 Tower kernel: BTRFS error (device loop2: state EA): bad tree block start, mirror 1 want 20224524288 have 0
Mar 26 08:58:45 Tower kernel: BTRFS error (device loop2: state EA): bad tree block start, mirror 2 want 20224524288 have 0
Mar 26 08:58:45 Tower kernel: BTRFS error (device loop2: state EA): bad tree block start, mirror 1 want 20224507904 have 0
Mar 26 08:58:45 Tower kernel: BTRFS error (device loop2: state EA): bad tree block start, mirror 1 want 20224475136 have 0
Mar 26 08:58:45 Tower kernel: BTRFS error (device loop2: state EA): bad tree block start, mirror 1 want 20224524288 have 0
Mar 26 08:58:45 Tower kernel: BTRFS error (device loop2: state EA): bad tree block start, mirror 2 want 20224524288 have 0
Mar 26 08:58:45 Tower kernel: BTRFS error (device loop2: state EA): bad tree block start, mirror 1 want 20224507904 have 0
Mar 26 08:58:45 Tower kernel: BTRFS error (device loop2: state EA): bad tree block start, mirror 1 want 20224475136 have 0
Mar 26 09:12:57 Tower unraid-api[8774]: ⚠️ Caught exception: EIO: i/o error, scandir '/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c765a904e781a4ced957e91ad602ca741043834bba888d3de0d59fca040f5b0/work'
Mar 26 09:13:04 Tower unraid-api[11408]: ⚠️ Caught exception: EIO: i/o error, scandir '/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c765a904e781a4ced957e91ad602ca741043834bba888d3de0d59fca040f5b0/work'
Mar 26 09:13:11 Tower unraid-api[14508]: ⚠️ Caught exception: EIO: i/o error, scandir '/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c765a904e781a4ced957e91ad602ca741043834bba888d3de0d59fca040f5b0/work'
Mar 26 09:13:18 Tower unraid-api[17221]: ⚠️ Caught exception: EIO: i/o error, scandir '/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c765a904e781a4ced957e91ad602ca741043834bba888d3de0d59fca040f5b0/work'
Mar 26 09:13:25 Tower unraid-api[20009]: ⚠️ Caught exception: EIO: i/o error, scandir '/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c765a904e781a4ced957e91ad602ca741043834bba888d3de0d59fca040f5b0/work'
Mar 26 09:13:32 Tower unraid-api[22508]: ⚠️ Caught exception: EIO: i/o error, scandir '/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c765a904e781a4ced957e91ad602ca741043834bba888d3de0d59fca040f5b0/work'
Mar 26 09:13:39 Tower unraid-api[25349]: ⚠️ Caught exception: EIO: i/o error, scandir '/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c765a904e781a4ced957e91ad602ca741043834bba888d3de0d59fca040f5b0/work'
Mar 26 09:13:46 Tower unraid-api[27752]: ⚠️ Caught exception: EIO: i/o error, scandir '/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c765a904e781a4ced957e91ad602ca741043834bba888d3de0d59fca040f5b0/work'
Mar 26 09:13:53 Tower unraid-api[30200]: ⚠️ Caught exception: EIO: i/o error, scandir '/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c765a904e781a4ced957e91ad602ca741043834bba888d3de0d59fca040f5b0/work'
Mar 26 09:14:00 Tower unraid-api[32595]: ⚠️ Caught exception: EIO: i/o error, scandir '/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c765a904e781a4ced957e91ad602ca741043834bba888d3de0d59fca040f5b0/work'
Mar 26 09:14:07 Tower unraid-api[2590]: ⚠️ Caught exception: EIO: i/o error, scandir '/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c765a904e781a4ced957e91ad602ca741043834bba888d3de0d59fca040f5b0/work'
Mar 26 09:14:14 Tower unraid-api[5324]: ⚠️ Caught exception: EIO: i/o error, scandir '/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c765a904e781a4ced957e91ad602ca741043834bba888d3de0d59fca040f5b0/work'
Mar 26 09:14:21 Tower unraid-api[7854]: ⚠️ Caught exception: EIO: i/o error, scandir '/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c765a904e781a4ced957e91ad602ca741043834bba888d3de0d59fca040f5b0/work'
Mar 26 09:14:28 Tower unraid-api[9611]: ⚠️ Caught exception: EIO: i/o error, scandir '/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c765a904e781a4ced957e91ad602ca741043834bba888d3de0d59fca040f5b0/work'
Mar 26 09:14:35 Tower unraid-api[11448]: ⚠️ Caught exception: EIO: i/o error, scandir '/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c765a904e781a4ced957e91ad602ca741043834bba888d3de0d59fca040f5b0/work'
Mar 26 09:14:42 Tower unraid-api[13675]: ⚠️ Caught exception: EIO: i/o error, scandir '/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c765a904e781a4ced957e91ad602ca741043834bba888d3de0d59fca040f5b0/work'

It's getting ridiculous now. Some containers have started to behave really weirdly. Nginx won't generate new SSL certs. Plex web does not open. I tried to remove the container, but an 'Execution error Server error' pop-up appears and I am unable to remove it. I will try restarting Unraid, but this is becoming a chore and a far cry from the rock-solid experience I had with the machine for the last few years.

2 hours ago, Januszmirek said:
Mar 26 08:58:40 Tower kernel: BTRFS error (device loop2: state EA): bad tree block start, mirror 1 want 20224475136 have 0
Mar 26 08:58:40 Tower kernel: BTRFS error (device loop2: state EA): bad tree block start, mirror 1 want 20224524288 have 0

These indicate a corrupt docker image; recreate it:

https://docs.unraid.net/unraid-os/manual/docker-management/#re-create-the-docker-image-file

Also see below if you have any custom docker networks:

https://docs.unraid.net/unraid-os/manual/docker-management/#docker-custom-networks

  • 2 weeks later...
  • Solution

So I finally solved the issue. It turns out it wasn't a corrupt docker image, a problem with docker networks, or anything else I initially suspected. For months I had been seeing another issue that I somehow did not connect with this one: every night I got notifications that the cache disk was filling up (to 100%). I had no idea what was causing it, because by morning everything was fine and the cache disk was maybe 60% full. I had forgotten that I created a VM of about 600 GB on my array, since the space it needed was too big for my cache. What I forgot to do after creating the VM was to change the settings so that it would not be moved to the cache, as shown below:

1414891903_Zrzutekranu2024-04-6o20_08_51.thumb.png.914afaf91dd82ba32fa22393444786c8.png

 

I really did not need this VM anymore, so I deleted it and boom! All the problems magically disappeared altogether. No more btrfs or syslog errors.
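For anyone who wants to check for the same situation before deleting anything, here is a rough Python sketch that compares the largest file in a share with the free space on the cache. The example paths `/mnt/user/domains` and `/mnt/cache` are assumptions about a typical Unraid layout, not something taken from my diagnostics, and the function names are made up for illustration.

```python
import os
import shutil


def largest_file(root):
    """Return (size_in_bytes, path) of the largest regular file under root."""
    best = (0, None)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)  # apparent size, not blocks actually used
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size > best[0]:
                best = (size, path)
    return best


def fits_on_pool(share_root, pool_mount):
    """True if the largest file under share_root is smaller than the free space on pool_mount."""
    size, _path = largest_file(share_root)
    return size < shutil.disk_usage(pool_mount).free
```

Calling `fits_on_pool("/mnt/user/domains", "/mnt/cache")` (assumed paths) and getting `False` would suggest that any attempt to move that share to the cache will fill it up. Note that `os.path.getsize` reports the apparent size, so a sparse VM disk image may look larger than the space it actually occupies.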

Doubt this would help anyone, but just wanted to let you know the issue is resolved.
