sorano Posted January 24, 2021 (edited)

Greetings. Last night my cache pool stopped working normally. I suspect some Docker container filled up the free space, but I'm unsure how to proceed.

So far I have started the array in maintenance mode and run btrfs check --readonly, which shows this at the beginning:

    [1/7] checking root items
    [2/7] checking extents
    bad key ordering 42 43
    bad key ordering 42 43
    bad key ordering 42 43
    bad key ordering 42 43
    bad block 16614422200320
    ERROR: errors found in extent allocation tree or chunk allocation
    [3/7] checking free space cache
    there is no free space entry for 16470947741696-16470947749888
    there is no free space entry for 16470947741696-16473475514368
    cache appears valid but isn't 16468106805248
    [4/7] checking fs roots
    bad key ordering 42 43

It then continues with many more "bad key ordering 42 43" lines, followed by lots of file entries like:

    unresolved ref dir 6983009 index 30 namelen 15 name ClamWinPortable filetype 0 errors 3, no dir item, no dir index
    root 5 inode 75545 errors 2000, link count wrong

What gives me some hope is the line "cache appears valid but isn't 16468106805248". Hopefully some btrfs magic can get the pool back online.

Edit: I never saw the cache pool anywhere near full in the web UI, so I'm wondering whether something else filled up, such as inodes or space allocation?

Any help is appreciated. I've attached diagnostics: tower-diagnostics-20210124-1049.zip

Edited January 24, 2021 by sorano: Explain "full" better
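The btrfs check --readonly output above repeats a handful of messages many times over. To see at a glance which kinds of errors dominate, a small script can tally the log lines by message type. This is a minimal Python sketch: the entries in `PATTERNS` are hypothetical choices based only on the messages quoted in this post, not a complete list of what btrfs check can print, and anything unrecognised is counted as "other".

```python
import re
from collections import Counter

# Message types taken from the output quoted in this post; illustrative only.
# btrfs check can print many other messages, which are counted as "other".
PATTERNS = {
    "check phase": re.compile(r"^\[\d+/\d+\] "),
    "bad key ordering": re.compile(r"^bad key ordering \d+ \d+$"),
    "missing free space entry": re.compile(r"^there is no free space entry for \d+-\d+$"),
    "unresolved ref": re.compile(r"^unresolved ref dir \d+ "),
    "link count wrong": re.compile(r"errors \d+, link count wrong$"),
}


def summarize(log_text: str) -> Counter:
    """Count the lines of a btrfs check log that match each known message type."""
    counts: Counter = Counter()
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                break  # first matching pattern wins
        else:
            counts["other"] += 1
    return counts
```

To use it, capture both stdout and stderr of the check into a file (for example a hypothetical check.log), then call `summarize(open("check.log").read()).most_common()`.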
JorgeB Posted January 24, 2021

See here for some recovery options, then re-format the pool.
sorano (author) Posted January 24, 2021

Appreciate the help! I managed to mount the cache pool with

    mount -o ro,notreelog,nologreplay /dev/sdl1 /pleasework

and it's currently copying the data to the main array.

Is there any risk that the files currently being copied are corrupted? Is there anything I should pay extra attention to, to prevent this from happening again? And can I trust the Unraid web UI when it says this disk is the cause of the problem? Whenever there are issues with the cache pool, the error always seems to point at the "first" disk in the pool.

After it has finished copying the data, is there any point in trying btrfs check --repair?

I was planning to redesign the array after 6.9 stable, but sometimes things don't go according to plan :P
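When copying data off a damaged pool mounted read-only, it helps to know exactly which files failed. On btrfs with data checksums enabled, a block whose checksum does not match (and that has no good copy on another pool member) is returned to the reader as an I/O error rather than as bad data, so a copy loop that records each failing file and keeps going produces a list of the files to treat as lost. A minimal Python sketch, assuming the pool is mounted at /pleasework as above; `rescue_copy` is a hypothetical helper, not an Unraid or btrfs tool.

```python
import contextlib
import os
import shutil
from pathlib import Path


def rescue_copy(src_root: str, dst_root: str) -> tuple[list[str], list[tuple[str, str]]]:
    """Copy every file under src_root to dst_root, continuing past errors.

    Returns (copied, failed): relative paths that copied cleanly, and
    (relative path, error message) pairs for files that raised OSError
    (on btrfs, a checksum mismatch surfaces here as an I/O error).
    """
    src, dst = Path(src_root), Path(dst_root)
    copied: list[str] = []
    failed: list[tuple[str, str]] = []
    for dirpath, _dirnames, filenames in os.walk(src):
        for name in filenames:
            s = Path(dirpath) / name
            rel = s.relative_to(src)
            d = dst / rel
            try:
                d.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(s, d)  # follows symlinks, copies data and timestamps
                copied.append(str(rel))
            except OSError as exc:
                failed.append((str(rel), str(exc)))
                # Remove a partially written copy so it is not mistaken for a good one.
                with contextlib.suppress(OSError):
                    d.unlink(missing_ok=True)
    return sorted(copied), sorted(failed)
```

For example, `copied, failed = rescue_copy("/pleasework", "/mnt/disk1/cache_rescue")`, where the destination folder is a hypothetical location on the array. Each entry in `failed` names a file that could not be read in full.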
sorano (author) Posted January 24, 2021

I checked on how the copying was going, and mc had stopped with an error.

Should I try mounting with other options, like degraded,usebackuproot? Or just accept that those files are lost?
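The options mentioned here are real btrfs mount options: degraded allows mounting a multi-device pool with a member missing, and usebackuproot tells btrfs to try older copies of the tree root if the current one is unreadable. Per the btrfs mount documentation, a plain ro mount can still replay the tree log, so nologreplay (which requires ro) is worth keeping in every attempt. The sketch below only builds the command lines to try by hand, one at a time; the order and combinations in `CANDIDATE_OPTIONS` are assumptions, not advice given in the thread.

```python
import shlex

# Candidate read-only option sets, in the order one might try them by hand.
# "nologreplay" is included everywhere because a plain "ro" mount can still
# replay the btrfs tree log; it must be combined with "ro".
CANDIDATE_OPTIONS = [
    "ro,notreelog,nologreplay",                # the combination used in this thread
    "ro,nologreplay,usebackuproot",            # also try older copies of the tree root
    "ro,nologreplay,degraded,usebackuproot",   # additionally allow a missing pool member
]


def mount_commands(device: str, mountpoint: str) -> list[str]:
    """Build (but do not run) one mount command line per candidate option set."""
    return [shlex.join(["mount", "-o", opts, device, mountpoint]) for opts in CANDIDATE_OPTIONS]
```

Printing `mount_commands("/dev/sdl1", "/pleasework")` gives the three commands; if one mounts but the files are still unreadable, unmount before trying the next.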
John_M Posted January 24, 2021

3 hours ago, sorano said:
> Should I try to mount with other options like degraded,usebackuproot? Or just accept that those files have been lost.

I'd try all the available options in the linked article before giving up. Have you tried option 2 (btrfs restore)?
sorano (author) Posted January 24, 2021

3 hours ago, John_M said:
> I'd try all the available options in the linked article before giving up. Have you tried option 2 (btrfs restore)?

I had been running memtest86 v8.4 for a couple of hours to rule out bad RAM, and it was not showing any errors, so I decided it could be worth trying btrfs restore. Unfortunately the outcome was pretty similar to mounting read-only and copying: some files restored fine, but the big .img files I actually care about would just keep looping. I'm going back to memtest and will leave it running overnight to get a more reliable result.
JorgeB Posted January 25, 2021

20 hours ago, sorano said:
> Is there any risk that the files that are currently being copied have become corrupted?

Files that copy without errors can be assumed OK (as long as data checksums are not disabled).

19 hours ago, sorano said:
> I checked up on how the copying was going and mc had stopped with an error.

I would need to see the syslog to confirm, but most likely that file is corrupt. You should be able to recover it with btrfs restore, but the file will still be corrupt.
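The caveat about disabled checksums matters for exactly the kind of files mentioned earlier: on btrfs, files marked No_COW (the uppercase C attribute shown by lsattr) are stored without data checksums, so they can copy without any error and still contain damaged data. Large VM disk and Docker image files are sometimes created with this attribute. A minimal Python sketch that picks No_COW files out of lsattr output; it assumes the usual lsattr line layout of a flag field followed by the path, and the example paths in the test are hypothetical.

```python
import re

# lsattr prints a field of attribute letters (or '-') followed by the path.
# Anything else, such as the "/dir:" headers of "lsattr -R", is skipped.
FLAG_FIELD = re.compile(r"^[-A-Za-z]+$")


def nocow_files(lsattr_output: str) -> list[str]:
    """Return the paths whose lsattr flag field contains 'C' (No_COW)."""
    paths = []
    for line in lsattr_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2 or not FLAG_FIELD.match(parts[0]):
            continue
        flags, path = parts
        if "C" in flags:  # uppercase C = No_COW; lowercase c means compressed
            paths.append(path)
    return paths
```

Feeding it the output of lsattr -R run against the read-only mount (for example /pleasework) lists the files whose clean copy does not prove they are intact.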