BTRFS cache unmountable ERROR: errors found in extent allocation tree or chunk allocation (probably because it filled up?)


sorano


Greetings.

 

Last night my cache array stopped working as normal.

What I suspect happened is that some docker container filled the free space.

 

I'm unsure on how to continue to resolve this.

 

So far I have started the array in maintenance mode and run btrfs check --readonly, which shows this at the beginning:

[1/7] checking root items
[2/7] checking extents
bad key ordering 42 43
bad key ordering 42 43
bad key ordering 42 43
bad key ordering 42 43
bad block 16614422200320
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
there is no free space entry for 16470947741696-16470947749888
there is no free space entry for 16470947741696-16473475514368
cache appears valid but isn't 16468106805248
[4/7] checking fs roots
bad key ordering 42 43

Then it continues with a lot of

bad key ordering 42 43

 

and after that lots of files like

unresolved ref dir 6983009 index 30 namelen 15 name ClamWinPortable filetype 0 errors 3, no dir item, no dir index root 5 inode 75545 errors 2000, link count wrong
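
With this much repeated output it can help to save the check log to a file and count each message type before deciding what to do next. Below is a minimal sketch, assuming the output of the read-only check was redirected to a file. The function name `summarize_check_log` and the log path are hypothetical, and the patterns are just the messages quoted in this post.

```shell
#!/bin/sh
# Count how often each message type from this post appears in a saved
# check log. The log file is assumed to have been created beforehand,
# for example by redirecting the read-only check output to it.
summarize_check_log() {
    log="$1"
    for pattern in "bad key ordering" "bad block" \
                   "there is no free space entry" "unresolved ref dir"; do
        # grep -c prints 0 and exits non-zero when nothing matches,
        # so "|| true" keeps the loop going in that case.
        count=$(grep -c -- "$pattern" "$log" || true)
        printf '%s: %s\n' "$pattern" "$count"
    done
}

# Example (hypothetical path):
#   summarize_check_log /boot/btrfs-check.log
```

The counts only give a rough picture of how widespread the damage is; they do not say which errors are repairable.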

 

What gives me some hope is the:

cache appears valid but isn't 16468106805248
 

So, hopefully some btrfs magic can be made to get it back online.

 

Edit: I never saw the cache array being near full in the webui, so I'm thinking it might be something like inodes or allocation being full instead?

 

Any help is appreciated. I've attached diagnostics.

 

tower-diagnostics-20210124-1049.zip

Edited by sorano
Explain "full" better
Link to comment

Appreciate the help!

 

I managed to mount the cache array with 

mount -o ro,notreelog,nologreplay /dev/sdl1 /pleasework

 

and it's currently copying the data to the main array.
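
For anyone in the same situation, the rescue can be sketched roughly as follows. The mount options are the ones used above. The destination path and the use of rsync (instead of a file manager) are assumptions for illustration, and `rescue_copy` is a hypothetical helper name. Setting `DRY_RUN=1` only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Mount a damaged btrfs device read-only and copy its contents elsewhere.
# With DRY_RUN set to a non-empty value, commands are printed, not run.
rescue_copy() {
    dev="$1"    # e.g. /dev/sdl1 (from this post)
    mnt="$2"    # e.g. /pleasework (from this post)
    dest="$3"   # hypothetical destination on the main array
    run=${DRY_RUN:+echo}

    $run mkdir -p "$mnt" "$dest" || return 1
    # Read-only mount without replaying the log tree, as in the post.
    $run mount -o ro,notreelog,nologreplay "$dev" "$mnt" || return 1
    # rsync keeps a log file, so files that fail to read can be found later.
    $run rsync -a --log-file="$dest/rsync.log" "$mnt"/ "$dest"/
}

# Example (destination path is an assumption):
#   DRY_RUN=1 rescue_copy /dev/sdl1 /pleasework /mnt/disk1/cache_rescue
```

Keeping a log of the copy makes it easier to list exactly which files could not be read.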

 

Is there any risk that the files that are currently being copied have become corrupted?

Is there anything I should pay extra attention to in order to prevent this from happening again?

And can I trust the Unraid webui when it says this disk is the cause of the problem? It seems that whenever there are issues with the cache array, the error always points at the "first" disk in the array.

image.png.026e61e771c20cc7d02747b6c6eb667b.png

 

After it's finished copying the data, is there any point in trying to run btrfs check --repair?

 

I was planning to re-design the array after 6.9 stable, but sometimes things don't go according to plan :P.

Link to comment
3 hours ago, John_M said:

 

I'd try all the available options in the linked article before giving up. Have you tried option 2 (btrfs restore)?

 

I had been running memtest86 v8.4 for a couple of hours in order to rule out bad RAM but it was not showing any errors.

So I decided it could be worth trying out btrfs restore. Unfortunately, the outcome was pretty similar to mounting read-only and copying: some files restored fine, but the big img files that are of interest would just keep looping.

I'm going back to memtest and will leave it running overnight to get a more reliable result.
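
For reference, a restore attempt along these lines can be sketched as below. This is only an illustration: the destination path is an assumption, `btrfs_restore_plan` is a hypothetical helper that prints the commands instead of running them, and the flags shown (`-D` dry run, `-i` ignore errors, `-v` verbose) are standard `btrfs restore` options, not necessarily the exact ones used here.

```shell
#!/bin/sh
# Print a two-step btrfs restore plan for review: first a dry run that
# only lists what would be restored, then the real run that keeps going
# past errors. Nothing is executed by this function.
btrfs_restore_plan() {
    dev="$1"    # unmounted source device, e.g. /dev/sdl1
    dest="$2"   # hypothetical destination directory on another disk
    printf 'btrfs restore -D -v %s %s\n' "$dev" "$dest"
    printf 'btrfs restore -i -v %s %s\n' "$dev" "$dest"
}

# Example (destination path is an assumption):
#   btrfs_restore_plan /dev/sdl1 /mnt/disk1/restore
```

The dry run is a cheap way to see whether the files of interest are reachable at all before committing to a long restore.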

Link to comment
20 hours ago, sorano said:

Is there any risk that the files that are currently being copied have become corrupted?

Files that copy without errors can be assumed OK (as long as data checksums are not disabled).
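
If you want an extra check on top of that, you can compare the copy against the read-only mount while it is still mounted. Below is a minimal sketch, assuming GNU `sha256sum` is available. `verify_copy` is a hypothetical helper name and the paths in the example are assumptions. It only confirms that the destination matches what the source returns on read; a source file that fails to read shows up as an error.

```shell
#!/bin/sh
# Compare every regular file under a source directory with the copy at
# the same relative path under a destination directory.
# Returns 0 only if all checksums match.
verify_copy() {
    src="$1"
    dst="$2"
    # Hash all files relative to the source root, then check those hashes
    # against the files at the same relative paths in the destination.
    ( cd "$src" && find . -type f -exec sha256sum {} + ) |
        ( cd "$dst" && sha256sum --quiet -c - )
}

# Example (destination path is an assumption):
#   verify_copy /pleasework /mnt/disk1/cache_rescue && echo "copy matches"
```

With `--quiet`, only files that fail the comparison are printed.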

 

19 hours ago, sorano said:

I checked up on how the copying was going and mc had stopped with:

I would need to see the syslog to confirm, but most likely that file is corrupt. You should be able to recover it with btrfs restore, but the file will still be corrupt.

 
