BTRFS Cache Pool "corruption_errs" - advice?

plasmaball · February 23, 2020

Hello, I am on Unraid 6.6.7 with 4 SSDs in my cache pool.

One of my devices has corruption_errs, and I would like advice on what I should do. Do I need to replace the drive?

root@tower:~# btrfs dev stats /mnt/cache
[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  67
[/dev/sdb1].generation_errs  0
[/dev/sdc1].write_io_errs    0
...

Some additional information about my cache pool:

root@tower:~# btrfs fi show /mnt/cache
Label: none  uuid: f58a539b-5b41-4497-a98f-16eb6fbf48fc
        Total devices 4 FS bytes used 734.32GiB
        devid    1 size 476.94GiB used 197.02GiB path /dev/sdb1
        devid    2 size 223.57GiB used 0.00B path /dev/sdc1
        devid    3 size 953.87GiB used 673.00GiB path /dev/sde1
        devid    4 size 953.87GiB used 673.03GiB path /dev/sdd1

root@tower:~# btrfs fi df /mnt/cache
Data, RAID1: total=676.00GiB, used=671.49GiB
Data, single: total=188.01GiB, used=61.84GiB
System, single: total=32.00MiB, used=144.00KiB
Metadata, single: total=3.01GiB, used=1012.19MiB
GlobalReserve, single: total=512.00MiB, used=0.00B

root@tower:~# btrfs fi usage /mnt/cache
Overall:
    Device size:                   2.55TiB
    Device allocated:              1.51TiB
    Device unallocated:            1.04TiB
    Device missing:                  0.00B
    Used:                          1.37TiB
    Free (estimated):            673.47GiB      (min: 608.45GiB)
    Data ratio:                       1.78
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:188.01GiB, Used:61.84GiB
   /dev/sdb1     188.01GiB

Data,RAID1: Size:676.00GiB, Used:671.49GiB
   /dev/sdb1       7.00GiB
   /dev/sdd1     673.00GiB
   /dev/sde1     672.00GiB

Metadata,single: Size:3.01GiB, Used:1012.19MiB
   /dev/sdb1       2.01GiB
   /dev/sde1       1.00GiB

System,single: Size:32.00MiB, Used:144.00KiB
   /dev/sdd1      32.00MiB

Unallocated:
   /dev/sdb1     279.92GiB
   /dev/sdc1     223.57GiB
   /dev/sdd1     280.84GiB
   /dev/sde1     280.87GiB

I found this after reading this post, which prompted me to run the "btrfs dev stats" command.

Here is a screenshot from the WebGUI.

Any suggestions would be greatly appreciated!

Edited February 23, 2020 by plasmaball
Updated for clarity

JorgeB · February 23, 2020

The pool was likely created on v6.7.x, where a bug left it without redundancy: the metadata isn't raid1. There are other issues too, since data is using two profiles, single and raid1.
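Both problems named here (a non-redundant profile, and one block-group type spread over two profiles) can be spotted mechanically in `btrfs fi df` output. The following is a minimal sketch, assuming the report format pasted earlier in this thread. `check_profiles` is a hypothetical helper name, and it only flags `single`; other non-redundant profiles are not handled.

```shell
#!/bin/sh
# Flag "single" and mixed block-group profiles in `btrfs fi df` output.
# Reads the report on stdin, so it can be tried on saved output first.
check_profiles() {
    awk -F'[,:]' '
        # GlobalReserve is an internal reservation, not a block-group type.
        $1 == "GlobalReserve" || NF < 3 { next }
        {
            type = $1
            profile = $2
            gsub(/ /, "", profile)
            count[type]++
            if (profile == "single")
                printf "%s: single profile, no redundancy\n", type
        }
        END {
            # More than one line per type means a conversion is incomplete.
            for (t in count)
                if (count[t] > 1)
                    printf "%s: %d profiles in use, conversion incomplete\n", t, count[t]
        }
    '
}
```

On a live system this would be fed with `btrfs fi df /mnt/cache | check_profiles`. A pool that is fully raid1 should produce no output.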

Start by running a scrub. If all errors are corrected, balance the pool to raid1 (both data and metadata). If the corruption is in the single data profile or in the metadata, it can't be fixed: you'll need to delete the affected files (if it's data) or the entire pool (if it's the metadata).
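The order of operations matters here, so it may help to see it written out. This is a sketch under assumptions, not a tested procedure: `POOL`, `DRY_RUN`, `run`, `scrub_pool` and `convert_pool` are illustrative names, and the mount point is the one used in this thread. With `DRY_RUN=1` (the default below) the commands are only printed.

```shell
#!/bin/sh
# Sketch of the repair order: scrub first, review, then convert to raid1.
# POOL and DRY_RUN are illustrative names, not Unraid settings.
POOL="${POOL:-/mnt/cache}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: scrub the pool and show the result for manual review.
scrub_pool() {
    # -B keeps the scrub in the foreground so the status call waits for it.
    run btrfs scrub start -B "$POOL" || return 1
    run btrfs scrub status "$POOL"
}

# Step 2: run only after the scrub reported no uncorrectable errors.
convert_pool() {
    run btrfs balance start -dconvert=raid1 -mconvert=raid1 "$POOL"
}
```

The two steps are kept as separate functions on purpose: the scrub result should be read by a person before any conversion is started. Call `scrub_pool`, inspect the output, then call `convert_pool`, setting `DRY_RUN=0` only once the printed commands look right.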

plasmaball · February 23, 2020

Thanks for the quick reply. My Metadata and System were "single"; I have rebalanced them to RAID1 now.

root@tower:~# btrfs scrub status /dev/sdb1
scrub status for f58a539b-5b41-4497-a98f-16eb6fbf48fc
        no stats available
        total bytes scrubbed: 0.00B with 0 errors

root@tower:~# btrfs fi usage -T /mnt/cache
Overall:
...<snipped>...
             Data      Data      Metadata   System
Id Path      single    RAID1     single     single    Unallocated

root@tower:~# btrfs balance start -mconvert=raid1 /mnt/cache
Done, had to relocate 5 out of 870 chunks

root@tower:~# btrfs fi usage -T /mnt/cache
Overall:
...<snipped>...
             Data      Data      Metadata   System
Id Path      single    RAID1     RAID1      RAID1     Unallocated

My corruption_errs count is unchanged unfortunately.

root@tower:~# btrfs dev stats /mnt/cache
[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  67
[/dev/sdb1].generation_errs  0

I'll plan on re-creating the cache and see if that helps.

Thanks again, I didn't even realize my Metadata and System weren't in RAID1. That could have been a catastrophe!

Edited February 23, 2020 by plasmaball

JorgeB · February 24, 2020

12 hours ago, plasmaball said:

My corruption_errs count is unchanged unfortunately.

You need to reset the errors (it's in the link above):

btrfs dev stats -z /mnt/cache

Also, you still have dual data profiles; convert with:

btrfs balance start -dconvert=raid1 /mnt/cache

After that, as long as the scrub didn't find any uncorrectable errors, you're fine.
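Because the device counters persist until they are reset, a scrub or balance does not lower them. Once they have been zeroed with `-z`, any non-zero value indicates a new error. A small filter makes that easy to check; this is a sketch that assumes the `btrfs dev stats` output format shown in this thread, and `nonzero_stats` is a hypothetical helper name.

```shell
#!/bin/sh
# List only the non-zero counters from `btrfs dev stats` output on stdin.
nonzero_stats() {
    # Keep lines of the form "[device].counter  N" where N is above zero.
    awk 'NF == 2 && $2 ~ /^[0-9]+$/ && $2 + 0 > 0 { print $1, $2 }'
}
```

On a live system this would be used as `btrfs dev stats /mnt/cache | nonzero_stats`. Empty output means every counter is zero.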
