ldrax Posted March 29, 2020

I have 4 SSD drives in the cache pool. I trusted my itchy hands to do some cabling work, and it turned out the power cables to 2 of the drives were not seated firmly, so those drives dropped out of and back into the pool over a short period (less than 5 minutes). I quickly stopped the array and powered down the system.

Now it's all back up. I started the array in maintenance mode and ran a read-only filesystem check on the cache:

[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
btrfs: csum mismatch on free space cache
failed to load free space cache for block group 4040680275968
btrfs: space cache generation (126118) does not match inode (126182)
failed to load free space cache for block group 4044975243264
btrfs: csum mismatch on free space cache
failed to load free space cache for block group 4561445060608
btrfs: space cache generation (126113) does not match inode (126155)
<------------------ truncated, there are about 200 lines of this error --------->
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
Opening filesystem to check...
Checking filesystem on /dev/sdh1
UUID: 3c12a05c-3bba-493e-98e5-d2d3a2c7e107
found 1430313537536 bytes used, no error found
total csum bytes: 993257632
total tree bytes: 2197733376
total fs tree bytes: 572276736
total extent tree bytes: 238682112
btree space waste bytes: 415841311
file data blocks allocated: 67854097842176
 referenced 1405220114432

I saw on another post that @johnnie.black mentioned the 'csum mismatch' is just a warning, nothing to worry about. Can you advise on what to do from here? While I'm glad to see the line 'found 1430313537536 bytes used, no error found', I hope nothing serious happened. Do I restart the array in normal mode and then run a [repairing] scrub? (I have disabled Docker and VMs for the time being.) Thanks!
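For anyone finding this thread later: the check above is the equivalent of running this from the console with the array in maintenance mode (device name is from my system, substitute any member of your own pool):

```shell
# Read-only btrfs check: reports problems, never writes to the device.
# Run against any one device of the pool while it is unmounted
# (array in maintenance mode).
btrfs check --readonly /dev/sdh1
```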
JorgeB Posted March 29, 2020

That will usually clear by itself, but if you want, the clear-space-cache option is considered safe to use with btrfs check. Check the man page for more info: https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-check
ldrax Posted March 29, 2020 (Author)

Thank you @johnnie.black as always! Do I run it with v1 or v2? The descriptions are there, but to be honest I don't really grasp the concept of the free space cache here.

--clear-space-cache v1|v2
    completely wipe all free space cache of given type

    For free space cache v1, the clear_cache kernel mount option only rebuilds the free space cache for block groups that are modified while the filesystem is mounted with that option. Thus, using this option with v1 makes it possible to actually clear the entire free space cache.

    For free space cache v2, the clear_cache kernel mount option destroys the entire free space cache. This option, with v2, provides an alternative method of clearing the free space cache that doesn't require mounting the filesystem.
JorgeB Posted March 29, 2020

Default is v1, and like mentioned it's considered safe to clear, but make sure backups are up to date before doing it.
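For the default v1 cache, the command would be something like this (a sketch, using the same /dev/sdh1 device from the check output above; the filesystem must be unmounted, i.e. array in maintenance mode):

```shell
# Wipe the v1 free space cache; it is rebuilt automatically on the
# next mount, so no data is touched. Run while the pool is unmounted.
btrfs check --clear-space-cache v1 /dev/sdh1
```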
ldrax Posted March 29, 2020 (Author)

Thank you @johnnie.black, I'll do that shortly and update again here. You've been very helpful each time.
ldrax Posted March 30, 2020 (Author)

So before running the --clear-space-cache command, I started the array in normal mode to back up some selected files from the cache pool. While doing this, there were a lot of error messages in the syslog, along with messages about the space cache being rebuilt:

Mar 30 15:18:56 gpt760t kernel: BTRFS warning (device sdh1): failed to load free space cache for block group 4907223482368, rebuilding it now
Mar 30 15:18:56 gpt760t kernel: BTRFS warning (device sdh1): failed to load free space cache for block group 4158791876608, rebuilding it now
Mar 30 15:18:56 gpt760t kernel: BTRFS warning (device sdh1): failed to load free space cache for block group 4512052936704, rebuilding it now
Mar 30 15:18:56 gpt760t kernel: BTRFS error (device sdh1): csum mismatch on free space cache
Mar 30 15:18:56 gpt760t kernel: BTRFS warning (device sdh1): failed to load free space cache for block group 4999565279232, rebuilding it now
Mar 30 15:19:19 gpt760t kernel: io_ctl_check_generation: 21 callbacks suppressed
Mar 30 15:19:19 gpt760t kernel: BTRFS error (device sdh1): space cache generation (126117) does not match inode (126155)
Mar 30 15:19:19 gpt760t kernel: BTRFS warning (device sdh1): failed to load free space cache for block group 4959836831744, rebuilding it now
Mar 30 15:19:19 gpt760t kernel: BTRFS error (device sdh1): space cache generation (126115) does not match inode (126182)
Mar 30 15:19:19 gpt760t kernel: BTRFS warning (device sdh1): failed to load free space cache for block group 4998491537408, rebuilding it now
--- truncated, hundreds of these same messages ----

Once the backup was completed, I restarted the array in maintenance mode and ran a check --readonly again, just to verify.
All previous errors are now gone:

[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
Opening filesystem to check...
Checking filesystem on /dev/sdh1
UUID: 3c12a05c-3bba-493e-98e5-d2d3a2c7e107
found 1030725935104 bytes used, no error found
total csum bytes: 603511636
total tree bytes: 1761673216
total fs tree bytes: 533708800
total extent tree bytes: 218890240
btree space waste bytes: 481730316
file data blocks allocated: 67454937907200
 referenced 1007469522944

I guess I don't have to run btrfs check --clear-space-cache then? Thanks @johnnie.black!
ldrax Posted March 30, 2020 (Author)

A btrfs scrub (read-only, not repairing yet), however, shows a lot of errors found:

scrub status for 3c12a05c-3bba-493e-98e5-d2d3a2c7e107
	scrub started at Mon Mar 30 15:52:07 2020, running for 00:01:21
	total bytes scrubbed: 121.53GiB with 14191 errors
	error details: csum=14191
	corrected errors: 0, uncorrectable errors: 0, unverified errors: 0
(in progress)
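For reference, the read-only scrub above can be started like this (the mount point is my cache pool, adjust to your own):

```shell
# -r = read-only scrub: verify all data/metadata checksums but
# do not write any corrections yet.
btrfs scrub start -r /mnt/cache

# Check progress and error counts while it runs (or after it finishes).
btrfs scrub status /mnt/cache
```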
JorgeB Posted March 30, 2020

23 minutes ago, ldrax said:
    I guess I don't have to run btrfs check --clear-space-cache then?

Yes, like mentioned, that issue tends to get fixed on its own.

18 minutes ago, ldrax said:
    however, shows a lot of errors found:

That's a different issue, and unlike the previous one this one is important: those checksum errors are in the data/metadata. Run a correcting scrub and check that all errors are corrected.
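A correcting scrub is simply a scrub without the read-only flag (sketch, assuming the pool is mounted at /mnt/cache and is a redundant profile such as raid1, so a good copy exists to repair from):

```shell
# Without -r, scrub rewrites any block whose checksum fails using
# the good copy from another device in the redundant pool.
btrfs scrub start /mnt/cache

# Afterwards, confirm "corrected errors" equals the total error count
# and "uncorrectable errors" is 0.
btrfs scrub status /mnt/cache
```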
ldrax Posted March 30, 2020 (Author)

Done, looks like all errors were corrected. Thanks!

scrub status for 3c12a05c-3bba-493e-98e5-d2d3a2c7e107
	scrub started at Mon Mar 30 16:21:50 2020 and finished after 00:26:03
	total bytes scrubbed: 1.87TiB with 300965 errors
	error details: csum=300965
	corrected errors: 300965, uncorrectable errors: 0, unverified errors: 0
JorgeB Posted March 30, 2020

Make sure to check this for better pool monitoring; one of the most common reasons for those errors in a pool is one of the devices dropping offline and then coming back online.
ldrax Posted March 30, 2020 (Author)

Thanks, I'll check it out!