Server Crashing Every Couple of Days - BTRFS Error in syslogs


Solved by jdiacobbo


Starting about a week ago, my server began crashing every few days. I was able to capture the logs from the most recent crash and noticed many lines referencing a BTRFS issue. Any ideas what's causing this and how to resolve it?

 

syslog errors:

Dec 17 05:26:00 Tower kernel: BTRFS critical (device loop2): corrupt leaf: block=48136192 slot=140 extent bytenr=1060425728 len=8192 invalid data ref offset, have 4294936705 expect aligned to 4096
Dec 17 05:26:00 Tower kernel: BTRFS info (device loop2): leaf 48136192 gen 2207042 total ptrs 148 free space 4594 owner 2
Dec 17 05:26:00 Tower kernel: #011item 0 key (1059401728 168 16384) itemoff 16230 itemsize 53
Dec 17 05:26:00 Tower kernel: #011#011extent refs 1 gen 2084816 flags 1
Dec 17 05:26:00 Tower kernel: #011#011ref#0: extent data backref root 10841 objectid 12509 offset 37146624 count 1
Dec 17 05:26:00 Tower kernel: #011item 1 key (1059418112 168 4096) itemoff 16177 itemsize 53
Dec 17 05:26:00 Tower kernel: #011#011extent refs 1 gen 1407781 flags 1
Dec 17 05:26:00 Tower kernel: #011#011ref#0: extent data backref root 7995 objectid 17690 offset 0 count 1
Dec 17 05:26:00 Tower kernel: #011item 2 key (1059422208 168 4096) itemoff 16124 itemsize 53
...
Dec 17 05:26:00 Tower kernel: BTRFS error (device loop2): block=48136192 write time tree block corruption detected
Dec 17 05:26:00 Tower kernel: BTRFS: error (device loop2) in btrfs_commit_transaction:2460: errno=-5 IO failure (Error while writing out transaction)
Dec 17 05:26:00 Tower kernel: BTRFS info (device loop2: state E): forced readonly
Dec 17 05:26:00 Tower kernel: BTRFS warning (device loop2: state E): Skipping commit of aborted transaction.
Dec 17 05:26:00 Tower kernel: BTRFS: error (device loop2: state EA) in cleanup_transaction:1958: errno=-5 IO failure
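
The first line of that excerpt complains that a data ref offset is not aligned to 4096. As a quick arithmetic check, here is a minimal Python sketch using only the two numbers from that log line. The names `SECTOR_SIZE` and `is_aligned` are illustrative and are not taken from btrfs itself.

```python
# Values copied from the "corrupt leaf" syslog line above.
SECTOR_SIZE = 4096      # alignment the kernel says it expects
offset = 4294936705     # the value after "have" in the log line


def is_aligned(value: int, alignment: int) -> bool:
    """Return True when value is an exact multiple of alignment."""
    return value % alignment == 0


remainder = offset % SECTOR_SIZE

print(is_aligned(offset, SECTOR_SIZE))  # False
print(remainder)                        # 2177
print(hex(offset))                      # 0xffff8881
```

The value leaves a remainder of 2177, so it fails the alignment check and the kernel refuses to write the block out. The check only shows that the value is bad, not what produced it.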

 

PXL_20231217_133017090.MP.jpg

unraid_syslog_crash.txt tower-diagnostics-20231215-1623.zip

Link to comment

I just started getting this error as well after upgrading to 6.12.6 this morning. Plex was running all day and stopped responding this evening. When I bring up Docker, it has no stats for any container. Here is the syslog:

 

Dec 17 18:37:00 laffy kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
Dec 17 18:37:00 laffy kernel: I/O error, dev loop2, sector 2556800 op 0x1:(WRITE) flags 0x1800 phys_seg 4 prio class 2
Dec 17 18:37:00 laffy kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
Dec 17 18:37:00 laffy kernel: BTRFS: error (device loop2) in btrfs_commit_transaction:2494: errno=-5 IO failure (Error while writing out transaction)
Dec 17 18:37:00 laffy kernel: BTRFS info (device loop2: state E): forced readonly
Dec 17 18:37:00 laffy kernel: BTRFS warning (device loop2: state E): Skipping commit of aborted transaction.
Dec 17 18:37:00 laffy kernel: BTRFS: error (device loop2: state EA) in cleanup_transaction:1992: errno=-5 IO failure
Dec 17 18:37:04 laffy kernel: docker0: port 2(vethf809026) entered disabled state
Dec 17 18:37:04 laffy kernel: veth3f981c1: renamed from eth0
Dec 17 18:37:09 laffy kernel: lo_write_bvec: 25 callbacks suppressed
Dec 17 18:37:09 laffy kernel: loop: Write error at byte offset 14684160, length 4096.
Dec 17 18:37:09 laffy kernel: loop: Write error at byte offset 2864652288, length 4096.
Dec 17 18:37:09 laffy kernel: blk_print_req_error: 25 callbacks suppressed
Dec 17 18:37:09 laffy kernel: I/O error, dev loop2, sector 28680 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 2
Dec 17 18:37:09 laffy kernel: loop: Write error at byte offset 2728972288, length 4096.
Dec 17 18:37:09 laffy kernel: loop: Write error at byte offset 2728976384, length 4096.
Dec 17 18:37:09 laffy kernel: btrfs_dev_stat_inc_and_print: 25 callbacks suppressed
Dec 17 18:37:09 laffy kernel: I/O error, dev loop2, sector 5330024 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 2
Dec 17 18:37:09 laffy kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 36, rd 0, flush 0, corrupt 0, gen 0
Dec 17 18:37:09 laffy kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 37, rd 0, flush 0, corrupt 0, gen 0
Dec 17 18:37:09 laffy kernel: I/O error, dev loop2, sector 5330032 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 2
Dec 17 18:37:09 laffy kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 38, rd 0, flush 0, corrupt 0, gen 0
Dec 17 18:37:09 laffy kernel: I/O error, dev loop2, sector 5595024 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 2
Dec 17 18:37:09 laffy kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 39, rd 0, flush 0, corrupt 0, gen 0
Dec 17 18:37:30 laffy kernel: loop: Write error at byte offset 2541936640, length 4096.
Dec 17 18:37:30 laffy kernel: loop: Write error at byte offset 2542256128, length 4096.
Dec 17 18:37:30 laffy kernel: loop: Write error at byte offset 2542260224, length 4096.
Dec 17 18:37:30 laffy kernel: I/O error, dev loop2, sector 4964720 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 2
Dec 17 18:37:30 laffy kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 40, rd 0, flush 0, corrupt 0, gen 0
Dec 17 18:37:30 laffy kernel: I/O error, dev loop2, sector 4965344 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 2
Dec 17 18:37:30 laffy kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 41, rd 0, flush 0, corrupt 0, gen 0
Dec 17 18:37:30 laffy kernel: I/O error, dev loop2, sector 4965352 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 2
Dec 17 18:37:30 laffy kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 42, rd 0, flush 0, corrupt 0, gen 0
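
To read that excerpt more easily, here is a rough Python sketch that pulls out the BTRFS per-device error counters and checks whether an `I/O error ... sector N` line refers to the same location as a `loop: Write error at byte offset` line. It assumes the block-layer sector numbers are in 512-byte units. The regular expressions and function names are illustrative, and the sample lines are copied from the log above.

```python
import re

# Sample lines copied from the syslog excerpt above.
LOG = """\
Dec 17 18:37:09 laffy kernel: loop: Write error at byte offset 2728972288, length 4096.
Dec 17 18:37:09 laffy kernel: I/O error, dev loop2, sector 5330024 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 2
Dec 17 18:37:09 laffy kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 36, rd 0, flush 0, corrupt 0, gen 0
"""

KERNEL_SECTOR_BYTES = 512  # assumption: sector numbers are 512-byte units

STATS_RE = re.compile(
    r"bdev (\S+) errs: wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)"
)
SECTOR_RE = re.compile(r"I/O error, dev \S+ sector (\d+)")
OFFSET_RE = re.compile(r"loop: Write error at byte offset (\d+)")


def parse_dev_stats(line):
    """Return (device, counters) for a 'bdev ... errs:' line, else None."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    names = ("wr", "rd", "flush", "corrupt", "gen")
    counters = dict(zip(names, (int(v) for v in match.groups()[1:])))
    return match.group(1), counters


stats = {}            # latest counters seen per device
sector_offsets = []   # byte offsets derived from sector numbers
loop_offsets = []     # byte offsets reported by the loop driver

for line in LOG.splitlines():
    parsed = parse_dev_stats(line)
    if parsed is not None:
        device, counters = parsed
        stats[device] = counters
    sector = SECTOR_RE.search(line)
    if sector is not None:
        sector_offsets.append(int(sector.group(1)) * KERNEL_SECTOR_BYTES)
    loop = OFFSET_RE.search(line)
    if loop is not None:
        loop_offsets.append(int(loop.group(1)))

print(stats)
print(sector_offsets == loop_offsets)
```

In these lines, only the write counter (`wr`) is non-zero, and sector 5330024 maps to the same byte offset the loop driver reports. That suggests the two messages describe the same failed write, seen at different layers.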

 

I've never seen BTRFS errors before.

Link to comment

After recreating the docker image, I woke up this morning to find most (but not all) of my Docker containers stopped. I tried to restart them and got:

Execution error

Error code 403

 

Looking at the syslogs I see

BTRFS error (device nvme0n1p1): block=7008911360 write time tree block corruption detected

I've attached the full syslogs from the last 4 days.
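
Since the errors now name `nvme0n1p1` rather than `loop2`, it can help to tally which devices the BTRFS messages mention. Below is a minimal Python sketch of that idea. The regular expression and names are illustrative assumptions, and the sample lines are copied from the logs quoted in this thread.

```python
import re
from collections import Counter

# Sample lines copied from the syslog excerpts quoted in this thread.
LINES = [
    "Dec 17 05:26:00 Tower kernel: BTRFS error (device loop2): block=48136192 write time tree block corruption detected",
    "Dec 17 05:26:00 Tower kernel: BTRFS info (device loop2: state E): forced readonly",
    "Dec 17 05:26:00 Tower kernel: BTRFS: error (device loop2: state EA) in cleanup_transaction:1958: errno=-5 IO failure",
    "BTRFS error (device nvme0n1p1): block=7008911360 write time tree block corruption detected",
]

# Matches both "BTRFS error (device X)" and "BTRFS: error (device X: state E)".
BTRFS_RE = re.compile(r"BTRFS:? (\w+) \(device ([^:)\s]+)(?:: state (\w+))?\)")

# Count messages per (device, severity) pair.
counts = Counter()
for line in LINES:
    match = BTRFS_RE.search(line)
    if match is not None:
        level, device, _state = match.groups()
        counts[(device, level)] += 1

devices = sorted({device for device, _level in counts})

print(devices)
for (device, level), n in sorted(counts.items()):
    print(f"{device:12} {level:8} {n}")
```

If the same kind of corruption message shows up on more than one filesystem, as it does here, that points away from a single damaged image file and toward something shared by both, though the tally alone cannot say what.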

 

*EDIT*

Trying to restart Docker gives me this error

Docker Service failed to start.

syslog-192.168.0.150.log

Edited by jdiacobbo
Link to comment

After a full server reboot, Docker was able to come back up, so I'm guessing the docker image is OK. Interestingly, before the reboot I wanted to run a scrub on the docker image, but the whole docker volume section in the Docker settings was missing. After the reboot I am able to see it again, and running a scrub produced no errors.

 

Also, before the reboot I tried to run a scrub on the cache drive and it aborted every time without running. After the reboot I am able to run the scrub on the cache drive (no errors detected).

 

From what I was reading in other posts, this might be either a memory issue or a macvlan issue. Any ideas on what it could actually be?

 

Attaching updated logs and diagnostics.

syslog-192.168.0.150_reboot.log tower-diagnostics-20231218-0945.zip

Link to comment
10 minutes ago, itimpi said:

If you want to continue using macvlan then you should make sure bridging is disabled on eth0.

Thank you for taking the time to help out. Networking is admittedly my weakest skill when it comes to home-labbing. Could you help me figure out how to make sure bridging is disabled on eth0? Is this a BIOS setting?

 

*EDIT*

 

Would it be this setting here?

image.png.6590fc9fdc5c485ff2331ab9a9470622.png

 

Also, is there anything else I would need to know about making this change? What else might this affect?

Edited by jdiacobbo
Link to comment

I happen to have a spare set of RAM recently pulled from a working PC. I'll try swapping that in and see if I still have issues. Thanks for the support!

 

*EDIT*

 

Just swapped the RAM. I'll let it run for a few days to see if the issues still appear and provide an update. Thanks again to everyone for the help. Fingers crossed!

Edited by jdiacobbo
Update
Link to comment

The issue happened again last night. I've attached the syslogs and a photo of what was on the screen when I found it down. I don't see anything in the logs this time around. It doesn't appear to be the RAM causing it. Any ideas on next steps?

 

I'm going to try changing from macvlan to ipvlan to see if that might be the cause. It's the only thing I can see in the screenshot that might be the issue.

PXL_20231219_124329556.MP.jpg

syslog-192.168.0.150-20231219.log

Edited by jdiacobbo
Link to comment
3 hours ago, itimpi said:

That looks like a Macvlan related crash. If you are still using Macvlan for docker networking have you disabled bridging for eth0 under Settings->Network Settings and rebooted to make sure it is activated?

I attempted this yesterday but lost network access to my Home Assistant VM and couldn't figure out how to properly reconfigure the VM to get network access back.

  • Is it recommended to use macvlan with bridging disabled over ipvlan?
  • What would the pros/cons be for each setup?
  • Is there anything that I should have done when switching from macvlan to ipvlan in the Docker settings? I made the switch without changing anything else and everything appears to be working, but I don't know if I'm sitting on an issue waiting to appear because I missed a step. I tried to check the resources linked in the help text for that field in the settings, but they were general Docker docs and I'm not sure how to relate the information to my setup.
  • If I were to go with macvlan with bridging disabled is there a resource you could point me to on how to get proper networking in a VM with that configuration?

Sorry for all the questions, but networking is my weakest skill when it comes to server stuff. I'm hoping to come out of this issue better educated and more informed, and I appreciate all the assistance.

Link to comment
