Hollandex Posted May 3, 2022 (edited)

I posted a while back about my BTRFS cache pool getting corrupted. At the time, the solution was to reformat the drives and set them up again. I did that, and all was well for a couple of months. Then, corruption again. So I did my usual thing: format the drives, set them up as a pool again, use the CA Backup plugin to restore my appdata, and I was off to the races. I also added a User Script that checks the BTRFS pool hourly (as suggested by JorgeB), and within a few hours I started getting corruption errors on both NVMe drives.

root@Sanctuary:~# btrfs dev stats /mnt/cache
[/dev/nvme0n1p1].write_io_errs    0
[/dev/nvme0n1p1].read_io_errs     0
[/dev/nvme0n1p1].flush_io_errs    0
[/dev/nvme0n1p1].corruption_errs  4
[/dev/nvme0n1p1].generation_errs  0
[/dev/nvme1n1p1].write_io_errs    0
[/dev/nvme1n1p1].read_io_errs     0
[/dev/nvme1n1p1].flush_io_errs    0
[/dev/nvme1n1p1].corruption_errs  2
[/dev/nvme1n1p1].generation_errs  0

root@Sanctuary:~# btrfs fi usage -T /mnt/cache
Overall:
    Device size:           3.64TiB
    Device allocated:    310.06GiB
    Device unallocated:    3.33TiB
    Device missing:          0.00B
    Used:                 95.50GiB
    Free (estimated):      1.77TiB  (min: 1.77TiB)
    Free (statfs, df):     1.77TiB
    Data ratio:               2.00
    Metadata ratio:           2.00
    Global reserve:      173.05MiB  (used: 64.00KiB)
    Multiple profiles:          no

                   Data       Metadata   System
Id Path            RAID1      RAID1      RAID1     Unallocated
-- --------------  ---------  ---------  --------  -----------
 1 /dev/nvme0n1p1  153.00GiB    2.00GiB  32.00MiB      1.67TiB
 2 /dev/nvme1n1p1  153.00GiB    2.00GiB  32.00MiB      1.67TiB
-- --------------  ---------  ---------  --------  -----------
   Total           153.00GiB    2.00GiB  32.00MiB      3.33TiB
   Used             47.56GiB  195.88MiB  48.00KiB

I read somewhere that a "scrub" might be in order. I assume that's "btrfs scrub /mnt/cache"? I didn't want to start blindly typing commands, though. Should I run a scrub? And if so, do I scrub the cache pool or an individual drive?

I realized there's an option to scrub in the cache settings. I did that, and no errors were found. But the size doesn't look right at all. My cache is two 2TB drives, and the scrub only covered about 96GiB. Shouldn't it have scrubbed nearly the full 2TB?

UUID:             f0eb0645-ca4a-418e-bc12-95393fa57c50
Scrub started:    Tue May 3 13:45:16 2022
Status:           finished
Duration:         0:00:18
Total to scrub:   95.76GiB
Rate:             5.32GiB/s
Error summary:    no errors found

Any other suggestions? Or any other output that might be helpful? Thanks!
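For context, here is a minimal sketch of what an hourly check and a manual scrub can look like. The script body and the notification helper path are assumptions for illustration, not taken from the original post. Note that scrub runs against the mounted pool rather than an individual device, and it only reads the data and metadata actually in use, which is why "Total to scrub" is far smaller than the raw 2TB capacity.

#!/bin/bash
# Hourly check sketch: -c makes `btrfs device stats` exit non-zero
# when any error counter is above zero.
stats=$(btrfs device stats -c /mnt/cache) || \
    # Hypothetical notification; the notify script path can differ between Unraid versions.
    /usr/local/emhttp/webGui/scripts/notify -i warning \
        -s "BTRFS errors on cache pool" \
        -d "$(echo "$stats" | grep -v ' 0$')"

# Manual scrub of the whole pool, then check progress and results:
btrfs scrub start /mnt/cache
btrfs scrub status /mnt/cache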
Hollandex Posted May 3, 2022 (Author)

I wanted to mention that, last time this happened, I ran MemTest for about 12 hours with no errors, and I ran extended SMART tests on both NVMe drives. No errors there either.

Is there any way to get RAID functionality out of cache drives without BTRFS? I'd be curious to see if this issue is specific to BTRFS. If not, I may go to a single XFS cache drive with nightly backups.
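If the pool does get swapped for a single XFS cache drive, the nightly-backup half of that plan can be a scheduled rsync job. A minimal sketch, assuming hypothetical share names and that the destination share does not itself live on the cache:

#!/bin/bash
# Nightly backup sketch: mirror appdata from the cache drive to an array share.
# Source and destination paths are illustrative examples only.
rsync -a --delete /mnt/cache/appdata/ /mnt/user/backups/appdata/

As the reply below points out, this only protects against drive failure; silently corrupted files would be copied into the backup without any warning.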
JorgeB Posted May 4, 2022

RAM would still be the main suspect. According to the last diags you were overclocking the RAM, so I would start by stopping that. If it still happens, try with one DIMM at a time (without overclocking). If it's not RAM, it's likely another hardware issue, but my money is still on the RAM.

You can change to XFS, but if there's data corruption it will continue; you just won't be warned about it.
Hollandex Posted May 4, 2022 (Author)

I'll pull back the XMP profile on the RAM and see if that does the trick. Thanks for your help!