spall Posted February 23, 2023

Hi all,

Had a power outage last night. While shutting the server down I noticed my log was full of btrfs errors. Once power was back I brought the server online, and after some searching and investigation I assumed my docker.img was corrupt. So I deleted it, but as soon as I added my first container back, the log shenanigans started up again. So here I am asking for guidance.

Sample of log output:

Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1639137280 wanted 436518 found 430380
Feb 23 14:24:23 spock kernel: repair_io_failure: 186 callbacks suppressed
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1639137280 (dev /dev/sdb1 sector 3201440)
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1639141376 (dev /dev/sdb1 sector 3201448)
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1639145472 (dev /dev/sdb1 sector 3201456)
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1639149568 (dev /dev/sdb1 sector 3201464)
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1618395136 wanted 436296 found 430364
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1618395136 (dev /dev/sdb1 sector 3160928)
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1618399232 (dev /dev/sdb1 sector 3160936)
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1618403328 (dev /dev/sdb1 sector 3160944)
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1618407424 (dev /dev/sdb1 sector 3160952)
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1620000768 wanted 436514 found 430365
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1620000768 (dev /dev/sdb1 sector 3164064)
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1620004864 (dev /dev/sdb1 sector 3164072)
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1172520960 wanted 432916 found 430993
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1130659840 wanted 432426 found 430772
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1176584192 wanted 431017 found 430995
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1158103040 wanted 436526 found 430856
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1529888768 wanted 436527 found 430299
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1876803584 wanted 434185 found 430562
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1669185536 wanted 436520 found 430402
Feb 23 14:24:28 spock kernel: verify_parent_transid: 52 callbacks suppressed

Diagnostics attached. Any help appreciated. Thanks!

Edited February 24, 2023 by spall
JorgeB Posted February 23, 2023 (Solution)

One of your cache devices dropped offline in the past. Run a correcting scrub and check that there are no uncorrectable errors; also see here for how to reset the error counters and set up better pool monitoring.
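For reference, the command sequence behind that advice might look like the following. This is a sketch, not a definitive procedure: it assumes the cache pool is mounted at /mnt/cache (Unraid's usual location; adjust the path for your system) and must be run as root.

```shell
# Zero the persistent per-device error counters so new errors stand out
btrfs device stats -z /mnt/cache

# Start a scrub and wait for it to finish (-B runs in the foreground).
# On a read-write mount a scrub is correcting by default: in a raid1
# pool, blocks that fail checksum are rewritten from the good copy.
btrfs scrub start -B /mnt/cache

# Confirm the scrub finished with no uncorrectable errors,
# then re-check the counters
btrfs scrub status /mnt/cache
btrfs device stats /mnt/cache
```

Note that `btrfs scrub start -r` would run a read-only scrub, which reports errors but fixes nothing; leaving `-r` off is what makes the scrub correcting.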
spall Posted February 23, 2023 (Author)

@JorgeB Awesome, thanks for the quick response! So, I followed the link, did the things, and even made the script to head this off in the future. However, things still don't seem happy.

btrfs dev stats before:

[/dev/sdb1].write_io_errs 2591402
[/dev/sdb1].read_io_errs 62573
[/dev/sdb1].flush_io_errs 111764
[/dev/sdb1].corruption_errs 18
[/dev/sdb1].generation_errs 0
[/dev/sdc1].write_io_errs 0
[/dev/sdc1].read_io_errs 0
[/dev/sdc1].flush_io_errs 0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0

btrfs dev stats after clear:

[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 0
[/dev/sdb1].generation_errs 0
[/dev/sdc1].write_io_errs 0
[/dev/sdc1].read_io_errs 0
[/dev/sdc1].flush_io_errs 0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0

btrfs dev stats after scrub:

[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 35802
[/dev/sdb1].generation_errs 700
[/dev/sdc1].write_io_errs 0
[/dev/sdc1].read_io_errs 0
[/dev/sdc1].flush_io_errs 0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0

and the log is still angry when I try to add containers back in. Thanks!
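The monitoring script mentioned above isn't shown in the thread. A minimal sketch of what such a check could look like is below; the names `check_stats` and `SAMPLE` are hypothetical, and a canned copy of the stats output above stands in for a live `btrfs dev stats /mnt/cache` call so the logic can be exercised anywhere.

```shell
#!/bin/sh
# Hypothetical monitor sketch: flag any nonzero btrfs device error counter.
# In real use you would pipe in `btrfs dev stats /mnt/cache` and hook the
# failure branch up to a notification; here a canned sample is used instead.

check_stats() {
    # Print every counter line whose value is nonzero; exit 1 if any found
    awk '$2 != 0 { print; bad = 1 } END { exit bad }'
}

# Canned sample taken from the "after scrub" output in this thread
SAMPLE='[/dev/sdb1].write_io_errs 0
[/dev/sdb1].corruption_errs 35802
[/dev/sdb1].generation_errs 700
[/dev/sdc1].write_io_errs 0'

if printf '%s\n' "$SAMPLE" | check_stats; then
    echo "pool clean"
else
    echo "errors detected - notify and investigate"
fi
```

Run against the sample above, this prints the two nonzero counter lines followed by "errors detected - notify and investigate".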
spall Posted February 24, 2023 (Author)

So, uh... it's been a long couple of days of power outages. I completely missed the part about a _correcting_ scrub. All is well now; I will mark this as solved. Thanks again!