
[SOLVED] BTRFS errors, cannot run dockers with raid0 cache



Posted

Hi all,

 

I could really use some help to get back the possibility to run my dockers.

 

Here is the situation:

 

My unraid 6.8.3 server had a happily functioning array of various disks and a 250 GB SSD as cache drive (xfs). My VMs and docker image are on the cache drive by default for performance reasons.

 

I happened to get hold of another 250 GB SSD and thought I would use that second drive together with the new 6.8.3 support for BTRFS pools to set the two up as raid0.

 

So here I go, stopping my array, extending the cache to two slots and adding the extra SSD. I format the first drive as BTRFS, then use the balance option to convert to raid0.
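
(For anyone curious, my understanding is that the GUI balance button boils down to something like the commands below; the /mnt/cache path and the raid1 metadata profile are my assumptions rather than anything taken from the Unraid code.)

# convert the data profile to raid0, keep metadata mirrored across the two SSDs
btrfs balance start -dconvert=raid0 -mconvert=raid1 /mnt/cache
# check on progress until the balance finishes
btrfs balance status /mnt/cache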

 

Everything goes well and after some time I get what looks like a functioning cache. I had my dockers running, one VM too. So here I go, copying back to the cache all the files which I had previously backed up on a separate disk in the array.

 

And then I restarted the whole unraid server, to make sure everything would come back up properly if it ever reboots while I'm not around.

 

And now I'm a bit stuck... the docker service fails to start, and when I look at the system log, I see this bit I didn't have before:

 

May 25 13:40:02 unraid kernel: BTRFS error (device loop3): failed to read chunk root
May 25 13:40:02 unraid root: mount: /var/lib/docker: wrong fs type, bad option, bad superblock on /dev/loop3, missing codepage or helper program, or other error.
May 25 13:40:02 unraid kernel: BTRFS error (device loop3): open_ctree failed
May 25 13:40:02 unraid root: mount error
May 25 13:40:02 unraid emhttpd: shcmd (167): exit status: 1
May 25 13:40:37 unraid root: error: /webGui/include/Notify.php: wrong csrf_token
May 25 13:40:57 unraid emhttpd: error: cmd: wrong csrf_token
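
From what I can tell, loop3 is the loopback device behind the docker image, so it looks like the damaged filesystem is the image file itself rather than the cache pool. Something along these lines should confirm which file backs the loop device and check the filesystem inside it (the docker.img path below is the default location, so adjust if yours differs):

# show which file is attached to /dev/loop3
losetup -l /dev/loop3
# read-only check of the BTRFS filesystem inside the docker image
btrfs check /mnt/user/system/docker/docker.img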

 

I have tried googling for that, and it seems others have had a similar problem in the past, but I couldn't find any working solution. I have attached here the anonymized diagnostics file in case that can help.

 

The second line seems to indicate a wrong "fs" (filesystem?) type, and I'm wondering if it's related to the change from xfs to btrfs. I have rerun the balance operation, tried the chunk verification with error correction, and restarted multiple times, but nothing seems to help.
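
For reference, the chunk verification with error correction I'm referring to is the btrfs rescue chunk-recover tool from btrfs-progs; a rough sketch, with the device path a placeholder (it's slow and meant as a last resort):

# scan the device and try to rebuild the chunk tree from copies found on disk
btrfs rescue chunk-recover /dev/loopX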

 

If anyone could point me to the right direction here, it would be much appreciated.

 

unraid-diagnostics-20200525-1241.zip

Posted (edited)

Some more info. I saw there is a BTRFS check feature when the array is in maintenance mode, so I tried that, but it can't find any errors:

Opening filesystem to check...
Checking filesystem on /dev/sdl1
UUID: ***removed***
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 206969012224 bytes used, no error found
total csum bytes: 149267328
total tree bytes: 381583360
total fs tree bytes: 214286336
total extent tree bytes: 12845056
btree space waste bytes: 49397585
file data blocks allocated: 531302301696
 referenced 206132273152

Edited by denishay
Posted

Your docker image is corrupt, and the fact you had 50G allocated to the docker image likely means you have been filling it. 20G should be more than enough. Making it larger won't help anything; it will just take longer to fill. You need to figure out what is writing into the docker image instead of to mapped storage.
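
If it helps, a couple of standard Docker commands will usually show where the space inside the image is going (nothing Unraid-specific here):

# break down docker disk usage by images, containers, build cache and volumes
docker system df -v
# per-container writable-layer size; a large SIZE means a container is writing inside the image
docker ps -s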

 

Your system share has files on the array instead of all on cache. Your domains share likewise. Do you have any VMs?

 

Go to Settings - Docker, disable, delete docker image.

Go to Settings - VM Manager, disable.

Set system and domains share to cache-prefer, run mover, wait for it to complete.

 

Then post new diagnostics

Posted

Hi turl,

 

The shares on the array were a temporary thing, as I wanted to try and sort out the current issues before setting them back to use the cache.

 

I had tried to delete the docker image and install my dockers again, but it didn't work at first.

 

So I went a bit brutal and stopped the array completely, removed the two cache drives, and left them in Unassigned Devices where I removed their respective partitions. Then I added them back as cache, let unraid format them again, and clicked on Balance > Convert to raid0 once more.
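
(I removed the partitions through the Unassigned Devices GUI; for anyone who prefers the command line, the rough equivalent is wipefs, with sdX and sdY as placeholders for the two SSDs, so double-check the device names before running anything like this.)

# clear old filesystem signatures and the partition table from both SSDs
wipefs -a /dev/sdX
wipefs -a /dev/sdY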

 

This time everything went well. I could delete the docker image once more, and this time it let me install my previous dockers. I am so glad we have this saving of templates in appdata!

 

So all in all, everything is working as it did before. I think my issue was probably some initial corruption during the first conversion to raid0, which in turn corrupted the docker image file.

 

Anyway, thanks for the help! I'll now set back the shares which should be on the cache.

  • JorgeB changed the title to [SOLVED] BTRFS errors, cannot run dockers with raid0 cache
Posted
11 hours ago, denishay said:

everything is working as it did before

But why did you have a 50G docker image? If it is working as before, it is likely just a matter of time before it corrupts again.
