BTRFS critical corrupt leaf


Solved by JorgeB


So I've been doing a ton of:

 

docker build 
docker run
docker stop
docker rm

 

Today, while trying to get my docker container running, I noticed that after docker build there were several dockers shown by UnRaid that I did not create.

 

So I've been deleting them as I go. It's kind of annoying, and after enough of them pile up you get a warning that docker.img is 71% full.
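If the stray entries are leftovers from the build/run/stop/rm loop (for example, intermediate containers left behind when a build fails with the legacy builder, or dangling <none> images from rebuilds), a cleanup along these lines may keep docker.img from filling up. This is a minimal sketch and not what was done in this thread: the function name cleanup_build_leftovers is hypothetical, and it only lists stopped containers instead of pruning them, because on Unraid a stopped container may be one you still want.

```shell
# Hypothetical helper: reclaim space inside docker.img during a
# build/run/stop/rm development loop. Assumes the Docker CLI is on PATH.
cleanup_build_leftovers() {
  # Show what is using space (images, containers, build cache).
  docker system df

  # List exited or never-started containers for manual review only;
  # they are NOT removed here.
  docker ps -a --filter status=exited --filter status=created

  # Remove dangling (<none>:<none>) images left behind by rebuilds.
  docker image prune -f

  # Remove dangling build cache (mostly relevant when BuildKit is used).
  docker builder prune -f
}
```

With the legacy builder, passing --force-rm to docker build removes intermediate containers even when the build fails, which may stop the stray containers from appearing in the first place.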

 

So I was removing several of the abandoned containers when things stopped working.

 

On my console I see "BTRFS critical corrupt leaf" errors for sdh1 and loop2.

 

There is now nothing in /mnt/*.

 

The web page is no longer working, but I do have a command prompt.

 

Suggestions on how to proceed?

 

thanks

 


Since the web page hung and I could only see the base files (nothing under /mnt or /boot), I rebooted.

 

I then ran btrfs check on the drive that reported issues:

root@tower:~# btrfs check /dev/sdh1  
Opening filesystem to check...
Checking filesystem on /dev/sdh1
UUID: 10cf35ee-3e74-4215-a481-d7012316918c
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
block group 677561499648 has wrong amount of free space, free space cache has 475136 block group has 491520
failed to load free space cache for block group 677561499648
block group 853655158784 has wrong amount of free space, free space cache has 2248704 block group has 2625536
failed to load free space cache for block group 853655158784
...
block group 2235560886272 has wrong amount of free space, free space cache has 696348672 block group has 753799168
failed to load free space cache for block group 2235560886272
block group 2236634628096 has wrong amount of free space, free space cache has 794693632 block group has 826208256
failed to load free space cache for block group 2236634628096
block group 2238782111744 has wrong amount of free space, free space cache has 798904320 block group has 841105408
failed to load free space cache for block group 2238782111744
block group 2239855853568 has wrong amount of free space, free space cache has 782303232 block group has 834203648
failed to load free space cache for block group 2239855853568
block group 2240929595392 has wrong amount of free space, free space cache has 793395200 block group has 864063488
failed to load free space cache for block group 2240929595392
block group 2242003337216 has wrong amount of free space, free space cache has 832372736 block group has 900947968
failed to load free space cache for block group 2242003337216
[4/7] checking fs roots
root 5 inode 77786 errors 200, dir isize wrong
root 5 inode 3812802 errors 1, no inode item
        unresolved ref dir 77786 index 705843 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812803 errors 1, no inode item
        unresolved ref dir 77786 index 705845 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812804 errors 1, no inode item
        unresolved ref dir 77786 index 705847 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812806 errors 1, no inode item
        unresolved ref dir 77786 index 705849 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812807 errors 1, no inode item
        unresolved ref dir 77786 index 705851 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812808 errors 1, no inode item
        unresolved ref dir 77786 index 705853 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812809 errors 1, no inode item
        unresolved ref dir 77786 index 705855 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812810 errors 1, no inode item
        unresolved ref dir 77786 index 705857 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812811 errors 1, no inode item
        unresolved ref dir 77786 index 705859 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
ERROR: errors found in fs roots
found 1629647908864 bytes used, error(s) found
total csum bytes: 1560507564
total tree bytes: 3358048256
total fs tree bytes: 875036672
total extent tree bytes: 483098624
btree space waste bytes: 676815308
file data blocks allocated: 8606666534912
 referenced 1605392433152

 

This is an unassigned drive. It holds I/O-heavy containers, docker.img, and local backups.

 

Should I try to repair this?

 

tower-diagnostics-20221208-0647.zip

Edited by lovingHDTV

It is a bit better; the free space errors are gone:

 

root@tower:~# btrfs check /dev/sdh1
Opening filesystem to check...
Checking filesystem on /dev/sdh1
UUID: 10cf35ee-3e74-4215-a481-d7012316918c
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
root 5 inode 77786 errors 200, dir isize wrong
root 5 inode 3812802 errors 1, no inode item
        unresolved ref dir 77786 index 705843 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812803 errors 1, no inode item
        unresolved ref dir 77786 index 705845 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812804 errors 1, no inode item
        unresolved ref dir 77786 index 705847 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812806 errors 1, no inode item
        unresolved ref dir 77786 index 705849 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812807 errors 1, no inode item
        unresolved ref dir 77786 index 705851 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812808 errors 1, no inode item
        unresolved ref dir 77786 index 705853 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812809 errors 1, no inode item
        unresolved ref dir 77786 index 705855 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812810 errors 1, no inode item
        unresolved ref dir 77786 index 705857 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812811 errors 1, no inode item
        unresolved ref dir 77786 index 705859 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
ERROR: errors found in fs roots
found 1629164007424 bytes used, error(s) found
total csum bytes: 1560507564
total tree bytes: 3357343744
total fs tree bytes: 875036672
total extent tree bytes: 483115008
btree space waste bytes: 676869574
file data blocks allocated: 8606183337984
 referenced 1604909236224
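To compare two btrfs check runs like the ones above, it can help to save each run's output and count the error classes that appear. This is a minimal sketch assuming the output was saved to a file (for example with btrfs check /dev/sdh1 2>&1 | tee check.log against the unmounted device); the function name summarize_btrfs_check and the patterns it matches are hypothetical, taken only from the message text shown in this thread, and it is not a complete catalogue of btrfs check errors.

```shell
# Hypothetical helper: count the error classes seen in a saved
# `btrfs check` log (read on stdin) so two runs can be compared.
summarize_btrfs_check() {
  awk '
    /failed to load free space cache/ { free_space++ }   # step [3/7]
    /errors 1, no inode item/         { no_inode++ }     # step [4/7]
    /dir isize wrong/                 { isize++ }        # step [4/7]
    /unresolved ref/                  { unresolved++ }   # step [4/7]
    END {
      printf "free_space_cache=%d no_inode_item=%d dir_isize_wrong=%d unresolved_ref=%d\n",
        free_space, no_inode, isize, unresolved
    }
  '
}
```

For the second run above this would report free_space_cache=0 no_inode_item=9 dir_isize_wrong=1 unresolved_ref=9, which makes it easy to see that only the free space cache errors went away while the fs-root errors remained.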

 


I was able to get things back up and running.

 

There were a few files I couldn't copy off the corrupted drive; fortunately they were debug logs.

 

I then replaced the drive with two 4TB drives, created a cache pool, and managed to get all my dockers set up on the new cache pool. Everything is up and running.

 

The good news is that I had been meaning to fix all this anyway, since my setup dates from before UnRaid had cache pools. So now everything should work better.

 

thanks


I am still getting BTRFS issues. Last night I noticed the new cache pool had problems, and I had also seen some CRC errors earlier. In the past, CRC errors for me were caused by power issues. I also noticed that several dockers had stopped: the cache_hdd pool holds all the dockers, and the BTRFS errors had set it read-only. The array stopped, but I couldn't shut down because one of the containers was still running even though the Docker service had stopped, so I had to do a hard shutdown.

 

I moved the two newly added cache_hdd drives off the SAS port and onto SATA ports on the motherboard. I also put them on their own power connection.

 

This morning I see BTRFS errors on loop2, which is now read-only, and the dockers are broken. The errors are only on loop2, not on the cache_hdd devices. I also see that two dockers are marked as unhealthy; I noticed one marked unhealthy yesterday as well. I'm beginning to think docker.img is corrupted?

 

Here are the diags: tower-diagnostics-20221213-0616.zip


I recreated docker.img and it reports no errors. However, the container that was marked unhealthy is still marked unhealthy. Here are the new diagnostics. I also moved docker.img from cache_hdd to cache_nvme, and wow, the performance gain is crazy.

 

Is switching to ipvlan a simple change in the Docker settings, or do I have to do something additional? I manually set all my container IPs.

 

thanks

david

tower-diagnostics-20221213-0721.zip

Edited by lovingHDTV
