Simom Posted October 26, 2022

Hey, I have a BTRFS-related problem and I am unsure what to do, so any help is appreciated!

I noticed some of my Docker containers weren't running properly, so I looked into the log and found these lines, repeated about every five seconds:

BTRFS error (device loop2): bdev /dev/loop2 errs: wr 39, rd 0, flush 0, corrupt 20124274, gen 0
Oct 26 03:10:23 Turing kernel: BTRFS warning (device loop2): csum hole found for disk bytenr range [412303360, 412307456)
Oct 26 03:10:23 Turing kernel: BTRFS warning (device loop2): csum failed root 1370 ino 1038 off 0 csum 0x42b31ff3 expected csum 0x00000000 mirror 1

Since my Docker image is stored on a BTRFS RAID1 cache pool called "cache" (two 1 TB NVMe drives), I am guessing this is somehow the root of the problem. I also ran "btrfs dev stats /mnt/cache", with the following result:

root@Turing:~# btrfs dev stats /mnt/cache
[/dev/nvme1n1p1].write_io_errs 364810
[/dev/nvme1n1p1].read_io_errs 272
[/dev/nvme1n1p1].flush_io_errs 32498
[/dev/nvme1n1p1].corruption_errs 115
[/dev/nvme1n1p1].generation_errs 0
[/dev/nvme0n1p1].write_io_errs 0
[/dev/nvme0n1p1].read_io_errs 0
[/dev/nvme0n1p1].flush_io_errs 0
[/dev/nvme0n1p1].corruption_errs 0
[/dev/nvme0n1p1].generation_errs 0

I read in the FAQ that all of these counters should be zero.
As these are NVMe drives, cables are out of the equation, so I just tried to start a scrub. It aborts immediately, and the following lines appear in the log:

Oct 26 03:45:18 Turing ool www[20751]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/mnt/cache' '-r'
Oct 26 03:45:18 Turing kernel: BTRFS info (device nvme1n1p1): scrub: started on devid 2
Oct 26 03:45:18 Turing kernel: BTRFS info (device nvme1n1p1): scrub: not finished on devid 2 with status: -30
Oct 26 03:45:18 Turing kernel: BTRFS info (device nvme1n1p1): scrub: started on devid 3
Oct 26 03:45:18 Turing kernel: BTRFS info (device nvme1n1p1): scrub: not finished on devid 3 with status: -30

Unfortunately, I am at a dead end; if you have any ideas, please let me know! (I have also attached the diagnostics.)

turing-diagnostics-20221026-0310.zip
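A side note on the "status: -30" in the scrub log: kernel functions report failures as negative errno values, so the code can be decoded with a quick lookup. This is a sketch, not part of the original thread; the `errno_name` helper is a hypothetical name, and python3 is only used for its errno table:

```shell
# Decode a negative kernel status code into its errno name.
# (Hypothetical helper for illustration.)
errno_name() {
  python3 -c "import errno; print(errno.errorcode.get(abs(int('$1')), 'unknown'))"
}

errno_name -30   # EROFS: the scrub aborted because the filesystem had gone read-only
```

EROFS here is consistent with btrfs flipping the filesystem read-only after detecting the errors, which is why the scrub could not proceed until the underlying problem was addressed.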
Solution JorgeB Posted October 26, 2022

The write errors suggest one of the cache devices dropped offline at some point; use the script in the FAQ to monitor the pool going forward. But there are other issues:

write time tree block corruption detected

This usually indicates a RAM problem or other kernel memory corruption. You are running the RAM above the officially supported speeds, and that is known to corrupt data; see here and adjust accordingly.
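The monitoring idea boils down to periodically checking `btrfs dev stats` and alerting on any non-zero counter. A minimal sketch of that check (this is not the actual FAQ script; the pool path is an assumption, and any notification command would be added where the output is produced):

```shell
# Minimal sketch of a pool-error check, not the actual FAQ script.
# Prints only the btrfs device-stats counters that are non-zero.
check_btrfs_stats() {
  # expects `btrfs dev stats <pool>` output on stdin;
  # field 1 is the counter name, field 2 is its value
  awk '$2 != 0 { print $1, $2 }'
}

# Normally run from cron against the live pool, e.g.:
#   btrfs dev stats /mnt/cache | check_btrfs_stats
# Demo with two lines of captured output from the post above:
printf '%s\n' \
  '[/dev/nvme1n1p1].write_io_errs 364810' \
  '[/dev/nvme0n1p1].write_io_errs 0' | check_btrfs_stats
```

An empty result means the pool is healthy; any printed line is a counter worth investigating (and `btrfs dev stats -z` can reset the counters once the cause is fixed, so new errors stand out).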
Simom Posted October 26, 2022

Thank you for the reply! I rebooted and dropped down to the default DDR4 speeds, and the scrub finished with some errors that were corrected. I was thinking of switching to an Intel-based system; is there a similar list of supported RAM configurations, or is it mainly a Ryzen issue?
JorgeB Posted October 26, 2022

11 minutes ago, Simom said:

I was thinking of switching to an Intel-based system, is there a similar list for supported RAM configurations or is it mainly a Ryzen issue?

Intel usually specifies the max speed with all sockets populated, so for example CPUs that support DDR4-3200 support it with 4 DIMMs. For Alder Lake with DDR5 this is no longer true: the supported speed drops depending on how many DIMMs (and ranks per DIMM) are installed. I cannot find the equivalent table for Raptor Lake, but I would assume there will also be limitations, since the max speed with DDR5 is now 5600 MT/s.
Simom Posted October 26, 2022

Thanks for the help, I really appreciate it!