BTRFS Cache Pool "corruption_errs" - advice?



Hello, I am on Unraid 6.6.7 with 4 SSDs in my cache pool.

 

One of my devices has corruption_errs, and I would like advice on what I should do. Do I need to replace the drive?

root@tower:~# btrfs dev stats /mnt/cache
[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  67
[/dev/sdb1].generation_errs  0
[/dev/sdc1].write_io_errs    0
...

 

Some additional information about my cache pool:

root@tower:~# btrfs fi show /mnt/cache
Label: none  uuid: f58a539b-5b41-4497-a98f-16eb6fbf48fc
        Total devices 4 FS bytes used 734.32GiB
        devid    1 size 476.94GiB used 197.02GiB path /dev/sdb1
        devid    2 size 223.57GiB used 0.00B path /dev/sdc1
        devid    3 size 953.87GiB used 673.00GiB path /dev/sde1
        devid    4 size 953.87GiB used 673.03GiB path /dev/sdd1
root@tower:~# btrfs fi df /mnt/cache
Data, RAID1: total=676.00GiB, used=671.49GiB
Data, single: total=188.01GiB, used=61.84GiB
System, single: total=32.00MiB, used=144.00KiB
Metadata, single: total=3.01GiB, used=1012.19MiB
GlobalReserve, single: total=512.00MiB, used=0.00B
root@tower:~# btrfs fi usage /mnt/cache
Overall:
    Device size:                   2.55TiB
    Device allocated:              1.51TiB
    Device unallocated:            1.04TiB
    Device missing:                  0.00B
    Used:                          1.37TiB
    Free (estimated):            673.47GiB      (min: 608.45GiB)
    Data ratio:                       1.78
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:188.01GiB, Used:61.84GiB
   /dev/sdb1     188.01GiB

Data,RAID1: Size:676.00GiB, Used:671.49GiB
   /dev/sdb1       7.00GiB
   /dev/sdd1     673.00GiB
   /dev/sde1     672.00GiB

Metadata,single: Size:3.01GiB, Used:1012.19MiB
   /dev/sdb1       2.01GiB
   /dev/sde1       1.00GiB

System,single: Size:32.00MiB, Used:144.00KiB
   /dev/sdd1      32.00MiB

Unallocated:
   /dev/sdb1     279.92GiB
   /dev/sdc1     223.57GiB
   /dev/sdd1     280.84GiB
   /dev/sde1     280.87GiB

 

I found this while working through this post, which prompted me to run the "btrfs dev stats" command.

 

Here is a screenshot from the WebGUI.

[Screenshot: 2020-02-23 07_21_04-vostro430_Main.png]

 

Any suggestions would be greatly appreciated!

Edited by plasmaball
Updated for clarity

The pool was likely created on v6.7.x and, because of a bug, it's not redundant: metadata isn't raid1. There are other issues too, since data is using two profiles, single and raid1.

 

Start by running a scrub. If all errors are corrected, balance the pool to raid1 (both data and metadata). If the corruption is on the single data profile or the metadata, it can't be fixed: you'll need to delete the affected files (if it's data) or the entire pool (if it's the metadata).
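
For reference, the scrub step could be sketched roughly like this. It is only an illustration: the function name and the BTRFS variable are made up for this example (BTRFS lets the sequence be dry-run with BTRFS=echo), and /mnt/cache is the mount point used in this thread.

```shell
# Hypothetical override: set BTRFS=echo to print the commands instead of running them.
BTRFS=${BTRFS:-btrfs}

# Run a foreground scrub on a pool, then show the per-device error counters.
scrub_then_stats() {
    pool=$1
    # -B keeps the scrub in the foreground and prints a summary when it finishes.
    $BTRFS scrub start -B "$pool"
    status=$?
    # Show the counters even if the scrub reported a problem.
    $BTRFS dev stats "$pool"
    # Hand the scrub's exit status back to the caller.
    return "$status"
}

# Usage (as root): scrub_then_stats /mnt/cache
```

The stats are printed regardless of the scrub result, so the counters can be compared with the scrub summary in one go.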


Thanks for the quick reply. My Metadata and System were "single"; I have now rebalanced them to RAID1.

root@tower:~# btrfs scrub status /dev/sdb1
scrub status for f58a539b-5b41-4497-a98f-16eb6fbf48fc
        no stats available
        total bytes scrubbed: 0.00B with 0 errors
root@tower:~# btrfs fi usage -T /mnt/cache
Overall:
...<snipped>...
             Data      Data      Metadata   System
Id Path      single    RAID1     single     single    Unallocated
root@tower:~# btrfs balance start -mconvert=raid1 /mnt/cache
Done, had to relocate 5 out of 870 chunks
root@tower:~# btrfs fi usage -T /mnt/cache
Overall:
...<snipped>...
             Data      Data      Metadata   System
Id Path      single    RAID1     RAID1      RAID1     Unallocated

My corruption_errs count is unchanged unfortunately.

root@tower:~# btrfs dev stats /mnt/cache
[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  67
[/dev/sdb1].generation_errs  0

I'll plan to re-create the cache and see if that helps.

 

Thanks again, I didn't even realize my Metadata and System weren't in RAID1. That could have been a catastrophe!

Edited by plasmaball
12 hours ago, plasmaball said:

My corruption_errs count is unchanged unfortunately.

You need to reset the errors (it's in the link above):

btrfs dev stats -z /mnt/cache
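
Since -z prints the counters and then zeroes them, it may be worth noting which ones were non-zero first. A minimal sketch, assuming the output format shown earlier in this thread (the function name is just for illustration):

```shell
# Print only the counters from "btrfs dev stats" output that are above zero.
# Expects lines such as: [/dev/sdb1].corruption_errs  67
nonzero_stats() {
    awk '$2 ~ /^[0-9]+$/ && $2 > 0 { print $1, $2 }'
}

# Usage (as root): btrfs dev stats /mnt/cache | nonzero_stats
```

If this prints anything again after a reset, new errors are being logged and the device deserves a closer look.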

 

Also, you still have dual data profiles; convert with:

 

btrfs balance start -dconvert=raid1 /mnt/cache

 

After that, as long as the scrub didn't find any uncorrectable errors, you're fine.
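
To confirm the conversion took, something like the following could list any block group type that is still not raid1. It is a rough sketch that assumes the "btrfs fi df" line format shown earlier in the thread; the function name is made up, and GlobalReserve is skipped because it is always reported as single.

```shell
# List block group types from "btrfs fi df" output whose profile is not RAID1.
# Expects lines such as: Data, single: total=188.01GiB, used=61.84GiB
non_raid1_profiles() {
    awk -F'[,:] *' 'NF >= 2 && $1 != "GlobalReserve" && toupper($2) != "RAID1" { print $1, $2 }'
}

# Usage: btrfs fi df /mnt/cache | non_raid1_profiles
```

No output means data, metadata and system are all raid1. Fed the "btrfs fi df" output from the first post, it would list Data, System and Metadata as single.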

