spall Posted February 23, 2023

Hi all,

Had a power outage last night. While shutting the server down I noticed my log was full of btrfs errors. Once power was back I brought the server online, and after some searching and investigation I assumed my docker.img was corrupt. So I deleted it, but as soon as I added my first container back, the log shenanigans started up again. So here I am asking for guidance.

Sample of log output:

Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1639137280 wanted 436518 found 430380
Feb 23 14:24:23 spock kernel: repair_io_failure: 186 callbacks suppressed
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1639137280 (dev /dev/sdb1 sector 3201440)
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1639141376 (dev /dev/sdb1 sector 3201448)
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1639145472 (dev /dev/sdb1 sector 3201456)
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1639149568 (dev /dev/sdb1 sector 3201464)
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1618395136 wanted 436296 found 430364
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1618395136 (dev /dev/sdb1 sector 3160928)
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1618399232 (dev /dev/sdb1 sector 3160936)
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1618403328 (dev /dev/sdb1 sector 3160944)
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1618407424 (dev /dev/sdb1 sector 3160952)
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1620000768 wanted 436514 found 430365
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1620000768 (dev /dev/sdb1 sector 3164064)
Feb 23 14:24:23 spock kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 1620004864 (dev /dev/sdb1 sector 3164072)
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1172520960 wanted 432916 found 430993
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1130659840 wanted 432426 found 430772
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1176584192 wanted 431017 found 430995
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1158103040 wanted 436526 found 430856
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1529888768 wanted 436527 found 430299
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1876803584 wanted 434185 found 430562
Feb 23 14:24:23 spock kernel: BTRFS error (device sdc1): parent transid verify failed on 1669185536 wanted 436520 found 430402
Feb 23 14:24:28 spock kernel: verify_parent_transid: 52 callbacks suppressed

Diagnostics attached. Any help appreciated. Thanks!

Edited February 24, 2023 by spall
JorgeB Posted February 23, 2023 (Solution)

One of your cache devices dropped offline in the past. Run a correcting scrub and check that there are no uncorrectable errors; also see here for how to reset the error counters and set up better pool monitoring.
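For reference, the command sequence behind that advice might look like the following. This is a sketch, not a definitive procedure: it assumes the cache pool is mounted at /mnt/cache (Unraid's usual location; adjust the path for your system) and must be run as root.

```shell
# Zero the persistent per-device error counters so new errors stand out
btrfs device stats -z /mnt/cache

# Start a scrub and wait for it to finish (-B runs in the foreground).
# On a read-write mount a scrub is correcting by default: in a raid1
# pool, blocks that fail checksum are rewritten from the good copy.
btrfs scrub start -B /mnt/cache

# Confirm the scrub finished with no uncorrectable errors,
# then re-check the counters
btrfs scrub status /mnt/cache
btrfs device stats /mnt/cache
```

Note that `btrfs scrub start -r` would run a read-only scrub, which reports errors but fixes nothing; leaving `-r` off is what makes the scrub correcting.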
spall Posted February 23, 2023 (Author)

@JorgeB Awesome, thanks for the quick response! So, I followed the link, did the things, and even made the script to head this off in the future. However, things still don't seem happy.

btrfs dev stats before:

[/dev/sdb1].write_io_errs 2591402
[/dev/sdb1].read_io_errs 62573
[/dev/sdb1].flush_io_errs 111764
[/dev/sdb1].corruption_errs 18
[/dev/sdb1].generation_errs 0
[/dev/sdc1].write_io_errs 0
[/dev/sdc1].read_io_errs 0
[/dev/sdc1].flush_io_errs 0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0

btrfs dev stats after clear:

[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 0
[/dev/sdb1].generation_errs 0
[/dev/sdc1].write_io_errs 0
[/dev/sdc1].read_io_errs 0
[/dev/sdc1].flush_io_errs 0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0

btrfs dev stats after scrub:

[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 35802
[/dev/sdb1].generation_errs 700
[/dev/sdc1].write_io_errs 0
[/dev/sdc1].read_io_errs 0
[/dev/sdc1].flush_io_errs 0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0

and the log is still angry when I try to add containers back in. Thanks!
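The monitoring script mentioned above isn't shown in the thread. A minimal sketch of what such a check could look like is below; the names `check_stats` and `SAMPLE` are hypothetical, and a canned copy of the stats output above stands in for a live `btrfs dev stats /mnt/cache` call so the logic can be exercised anywhere.

```shell
#!/bin/sh
# Hypothetical monitor sketch: flag any nonzero btrfs device error counter.
# In real use you would pipe in `btrfs dev stats /mnt/cache` and hook the
# failure branch up to a notification; here a canned sample is used instead.

check_stats() {
    # Print every counter line whose value is nonzero; exit 1 if any found
    awk '$2 != 0 { print; bad = 1 } END { exit bad }'
}

# Canned sample taken from the "after scrub" output in this thread
SAMPLE='[/dev/sdb1].write_io_errs 0
[/dev/sdb1].corruption_errs 35802
[/dev/sdb1].generation_errs 700
[/dev/sdc1].write_io_errs 0'

if printf '%s\n' "$SAMPLE" | check_stats; then
    echo "pool clean"
else
    echo "errors detected - notify and investigate"
fi
```

Run against the sample above, this prints the two nonzero counter lines followed by "errors detected - notify and investigate".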
spall Posted February 24, 2023 (Author)

So, uh... it's been a long couple of days of power outages. I completely missed the part about a _correcting_ scrub. All is well now; I will mark this as solved. Thanks again!