lovingHDTV Posted December 7, 2022
So I've been doing a ton of:
    docker build
    docker run
    docker stop
    docker rm
today as I try to get my docker container running. I've noticed that after docker build there are several dockers shown by UnRaid that are not anything that I did, so I've been deleting them as I go. Kind of annoying, but after you get enough of them you get a warning that docker.img is 71% full. So I was removing several of the abandoned containers when things stopped working. On my console I see BTRFS critical "corrupt leaf" errors for sdh1 and loop2, and there is now nothing in /mnt/*. The web page is no longer working, but I do have a command prompt. Suggestions on how to proceed? Thanks
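For anyone who wants to watch the image fill level from the command prompt rather than waiting for the 71% warning, here is a minimal Python sketch. It assumes docker.img is loop-mounted at /var/lib/docker (an assumption about this setup, not something stated in the post), and the 70% threshold is only an illustrative value. The figure is used/total as reported by the filesystem, so it may differ slightly from what the web UI shows.

```python
import os
import shutil


def percent_used(path: str) -> float:
    """Return how full the filesystem containing `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def over_threshold(path: str, threshold: float = 70.0) -> bool:
    """True when the filesystem holding `path` is at or above `threshold` percent."""
    return percent_used(path) >= threshold


if __name__ == "__main__":
    # ASSUMPTION: the docker image is mounted here; adjust for your system.
    mount_point = "/var/lib/docker"
    if os.path.isdir(mount_point):
        print(f"{mount_point}: {percent_used(mount_point):.1f}% used")
        if over_threshold(mount_point):
            print("Consider removing unused containers and images.")
    else:
        print(f"{mount_point} not found; is the docker service running?")
```

Running this before and after a batch of docker build runs gives a rough idea of how quickly leftover containers and images eat into the image file.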
JorgeB Posted December 8, 2022
Please post the diagnostics.
lovingHDTV Posted December 8, 2022
As the web page hung and I could only see base files (nothing under /mnt or /boot), I rebooted. I then ran btrfs check on the drive that reported issues:
    root@tower:~# btrfs check /dev/sdh1
    Opening filesystem to check...
    Checking filesystem on /dev/sdh1
    UUID: 10cf35ee-3e74-4215-a481-d7012316918c
    [1/7] checking root items
    [2/7] checking extents
    [3/7] checking free space cache
    block group 677561499648 has wrong amount of free space, free space cache has 475136 block group has 491520
    failed to load free space cache for block group 677561499648
    block group 853655158784 has wrong amount of free space, free space cache has 2248704 block group has 2625536
    failed to load free space cache for block group 853655158784
    ...
    block group 2235560886272 has wrong amount of free space, free space cache has 696348672 block group has 753799168
    failed to load free space cache for block group 2235560886272
    block group 2236634628096 has wrong amount of free space, free space cache has 794693632 block group has 826208256
    failed to load free space cache for block group 2236634628096
    block group 2238782111744 has wrong amount of free space, free space cache has 798904320 block group has 841105408
    failed to load free space cache for block group 2238782111744
    block group 2239855853568 has wrong amount of free space, free space cache has 782303232 block group has 834203648
    failed to load free space cache for block group 2239855853568
    block group 2240929595392 has wrong amount of free space, free space cache has 793395200 block group has 864063488
    failed to load free space cache for block group 2240929595392
    block group 2242003337216 has wrong amount of free space, free space cache has 832372736 block group has 900947968
    failed to load free space cache for block group 2242003337216
    [4/7] checking fs roots
    root 5 inode 77786 errors 200, dir isize wrong
    root 5 inode 3812802 errors 1, no inode item
        unresolved ref dir 77786 index 705843 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812803 errors 1, no inode item
        unresolved ref dir 77786 index 705845 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812804 errors 1, no inode item
        unresolved ref dir 77786 index 705847 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812806 errors 1, no inode item
        unresolved ref dir 77786 index 705849 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812807 errors 1, no inode item
        unresolved ref dir 77786 index 705851 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812808 errors 1, no inode item
        unresolved ref dir 77786 index 705853 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812809 errors 1, no inode item
        unresolved ref dir 77786 index 705855 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812810 errors 1, no inode item
        unresolved ref dir 77786 index 705857 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812811 errors 1, no inode item
        unresolved ref dir 77786 index 705859 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    ERROR: errors found in fs roots
    found 1629647908864 bytes used, error(s) found
    total csum bytes: 1560507564
    total tree bytes: 3358048256
    total fs tree bytes: 875036672
    total extent tree bytes: 483098624
    btree space waste bytes: 676815308
    file data blocks allocated: 8606666534912
     referenced 1605392433152
This is a drive that is unassigned. It contains IO-heavy containers, docker.img, and local backups. Should I try to repair this?
Attachment: tower-diagnostics-20221208-0647.zip
Edited December 8, 2022 by lovingHDTV
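A check log like the one above is easier to compare between runs once it is reduced to counts. The following Python sketch shows one way to do that. The function name summarize_check and the regular expressions are hypothetical helpers written against the line shapes visible in this thread, so other btrfs-progs versions may print lines it does not recognize. SAMPLE is a shortened excerpt of the output posted above.

```python
import re
from collections import Counter

# Patterns written against the output shown in this thread (an assumption;
# other btrfs-progs versions may format these lines differently).
FREE_SPACE_RE = re.compile(r"^block group (\d+) has wrong amount of free space")
INODE_RE = re.compile(r"^root (\d+) inode (\d+) errors ([0-9a-f]+), (.+)$")
UNRESOLVED_RE = re.compile(r"unresolved ref dir (\d+) index (\d+) namelen \d+ name (\S+)")


def summarize_check(output: str) -> dict:
    """Reduce pasted `btrfs check` output to simple counts per error kind."""
    free_space_groups = []
    inode_errors = Counter()
    unresolved_names = Counter()
    for raw in output.splitlines():
        line = raw.strip()
        match = FREE_SPACE_RE.match(line)
        if match:
            free_space_groups.append(int(match.group(1)))
            continue
        match = INODE_RE.match(line)
        if match:
            # Group 4 is the human-readable description, e.g. "no inode item".
            inode_errors[match.group(4)] += 1
            continue
        match = UNRESOLVED_RE.search(line)
        if match:
            unresolved_names[match.group(3)] += 1
    return {
        "free_space_block_groups": free_space_groups,
        "inode_errors": dict(inode_errors),
        "unresolved_names": dict(unresolved_names),
    }


# Shortened excerpt of the output posted in this thread.
SAMPLE = """\
[3/7] checking free space cache
block group 677561499648 has wrong amount of free space, free space cache has 475136 block group has 491520
failed to load free space cache for block group 677561499648
[4/7] checking fs roots
root 5 inode 77786 errors 200, dir isize wrong
root 5 inode 3812802 errors 1, no inode item
    unresolved ref dir 77786 index 705843 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
root 5 inode 3812803 errors 1, no inode item
    unresolved ref dir 77786 index 705845 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
"""

if __name__ == "__main__":
    for key, value in summarize_check(SAMPLE).items():
        print(key, value)
```

Feeding it the output of two successive runs makes it obvious which error classes a repair step actually removed and which remain.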
JorgeB Posted December 8, 2022 (Solution)
At least some of the issues appear to be caused by free space cache v1. On the console type:
    btrfs check --clear-space-cache v1 /dev/sdh1
then post the new output of btrfs check.
lovingHDTV Posted December 8, 2022
It is a bit better, the free space errors are gone:
    root@tower:~# btrfs check /dev/sdh1
    Opening filesystem to check...
    Checking filesystem on /dev/sdh1
    UUID: 10cf35ee-3e74-4215-a481-d7012316918c
    [1/7] checking root items
    [2/7] checking extents
    [3/7] checking free space cache
    [4/7] checking fs roots
    root 5 inode 77786 errors 200, dir isize wrong
    root 5 inode 3812802 errors 1, no inode item
        unresolved ref dir 77786 index 705843 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812803 errors 1, no inode item
        unresolved ref dir 77786 index 705845 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812804 errors 1, no inode item
        unresolved ref dir 77786 index 705847 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812806 errors 1, no inode item
        unresolved ref dir 77786 index 705849 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812807 errors 1, no inode item
        unresolved ref dir 77786 index 705851 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812808 errors 1, no inode item
        unresolved ref dir 77786 index 705853 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812809 errors 1, no inode item
        unresolved ref dir 77786 index 705855 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812810 errors 1, no inode item
        unresolved ref dir 77786 index 705857 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812811 errors 1, no inode item
        unresolved ref dir 77786 index 705859 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    ERROR: errors found in fs roots
    found 1629164007424 bytes used, error(s) found
    total csum bytes: 1560507564
    total tree bytes: 3357343744
    total fs tree bytes: 875036672
    total extent tree bytes: 483115008
    btree space waste bytes: 676869574
    file data blocks allocated: 8606183337984
     referenced 1604909236224
JorgeB Posted December 8, 2022
Since the other errors are still there, I would recommend a backup and re-format. I also recommend using it in an Unraid pool, since Unraid uses space cache v2, which is currently the better option; UD will also use that, but only in a future update. If you have issues mounting the device see here for some recovery options.
lovingHDTV Posted December 8, 2022
It does mount. I had to turn off dockers to unmount it. So: copy everything off, figure out how to add it to a cache pool, copy everything back. It is a spinning disk, so I put my download containers on it instead of my nvme drive.
lovingHDTV Posted December 9, 2022
I was able to get things back up and running. There were a few files I couldn't copy off the corrupted drive; fortunately they were debug logs. I then replaced the drive with two 4TB drives and created a cache pool, managed to get all my dockers set up using the new cache pool, and everything is up and running. The good news is that I was meaning to fix all this, as my setup is based on UnRaid prior to cache pools. So now everything should work better. Thanks
lovingHDTV Posted December 13, 2022
I am still getting BTRFS issues. Last night I noticed the new cache pool had issues. I had also seen some CRC issues earlier. In the past when I saw CRC errors it was caused by power issues. I also noticed that several dockers had stopped, as the cache_hdd pool has all the dockers and it was set read-only by the BTRFS errors. The array stopped, but I couldn't shut down because one of the dockers was still running even though docker had shut down. So I had to do a hard shutdown.
I moved the two newly added cache_hdd drives to SATA ports on the motherboard and off the SAS port. I also put them on their own power connection.
This AM, I see that there are BTRFS errors on loop2, which is now read-only, and dockers are screwed up. The errors are not on the cache_hdd devices, just loop2. I also see that two dockers are marked as unhealthy. I did notice that one was marked unhealthy yesterday. I'm beginning to think that docker.img is screwed up?
Here are the diags.
Attachment: tower-diagnostics-20221213-0616.zip
lovingHDTV Posted December 13, 2022
Ran scrub on the docker.img and it has 4 unrecoverable errors. I guess it is time to create a new docker img.
JorgeB Posted December 13, 2022
    write time tree block corruption detected
This usually indicates bad RAM or other kernel memory corruption. Since it's the docker image you can just recreate it, but if issues persist there's likely an underlying hardware problem. Unrelated: you should change the docker network from macvlan to ipvlan.
lovingHDTV Posted December 13, 2022
I recreated the docker.img and it reports no errors. However, the container that was marked as unhealthy is still marked unhealthy. Here is the new diagnostics. I did move the docker.img from cache_hdd to cache_nvme. Wow, the performance gain is crazy.
Is the change to ipvlan a simple change in the docker settings or do I have to do something additional? I manually set all my container IPs.
thanks
david
Attachment: tower-diagnostics-20221213-0721.zip
Edited December 13, 2022 by lovingHDTV
JorgeB Posted December 13, 2022
lovingHDTV said: "Is the change to ipvlan a simple change in the docker settings or do I have to do something additional?"
Should be a simple change. There were macvlan-related crashes in the last diags, and it's a known issue.
lovingHDTV Posted December 13, 2022
OK, moved to ipvlan. Hopefully this resolves the random system hangs and BTRFS issues. Thanks for your help