• Issue with cache/btrfs pool, errors on check, not sure what the problem is


    benyaki
    • Closed

    I was trying to copy some data to my cache drive and noticed that I was unable to write to it. It is a cache pool of two 500GB WD Black NVMe drives in a mirrored pool.

    I looked in the logs and saw that the cache was trying to write but was getting errors:

     

    Dec  9 20:41:15 Tower kernel: BTRFS error (device nvme0n1p1: state EA): parent transid verify failed on logical 1399619584 mirror 1 wanted 2502028 found 2502026
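    As a rough aid to reading that line: btrfs stamps every tree block with the transaction generation that wrote it, and the parent node records the generation it expects for each child. "wanted 2502028 found 2502026" is commonly read as the parent expecting generation 2502028 while the block actually on disk carries the older 2502026, i.e. a metadata write that should have landed apparently did not. The sketch below is a hypothetical helper (not part of Unraid or btrfs-progs) that pulls the fields out of such a line and computes that gap.

```python
import re

# Pattern for the "parent transid verify failed" kernel message; the group
# names mirror the words used in the log line itself.
TRANSID_RE = re.compile(
    r"parent transid verify failed on logical (?P<logical>\d+) "
    r"mirror (?P<mirror>\d+) wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def parse_transid(line: str):
    """Return the numeric fields as a dict, or None if the line does not match."""
    m = TRANSID_RE.search(line)
    if m is None:
        return None
    fields = {k: int(v) for k, v in m.groupdict().items()}
    # How many transaction generations older the on-disk block is than expected.
    fields["generations_behind"] = fields["wanted"] - fields["found"]
    return fields


if __name__ == "__main__":
    sample = ("Dec  9 20:41:15 Tower kernel: BTRFS error (device nvme0n1p1: state EA): "
              "parent transid verify failed on logical 1399619584 mirror 1 "
              "wanted 2502028 found 2502026")
    print(parse_transid(sample))
```

    For the line above it reports `generations_behind` as 2.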

     

    I decided to do a clean reboot, but the logs showed the server waiting for the cache to unmount, which it never did ("target is busy", error 32).
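    "Target is busy" means some process still holds an open file, working directory, or similar handle on the mount. A minimal sketch for finding the culprit on Linux, in the same spirit as `fuser -vm /mnt/cache` or `lsof`, assuming the pool is mounted at /mnt/cache (adjust as needed). `busy_holders` is a hypothetical helper name, and it needs root to inspect other users' processes.

```python
import os
from pathlib import Path


def busy_holders(mount_point: str, proc_root: str = "/proc") -> list:
    """Return sorted (pid, comm, path) tuples for processes whose cwd, root,
    or open file descriptors resolve to a path under mount_point."""
    mount = os.path.normpath(mount_point)
    hits = set()
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # only numeric /proc entries are processes
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            comm = "?"
        links = [entry / "cwd", entry / "root"]
        try:
            links.extend((entry / "fd").iterdir())
        except OSError:
            pass  # process exited, or we lack permission to list its fds
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue
            # Match the mount point itself or anything beneath it
            # (so /mnt/cachefoo is not mistaken for /mnt/cache).
            if target == mount or target.startswith(mount + os.sep):
                hits.add((int(entry.name), comm, target))
    return sorted(hits)


if __name__ == "__main__":
    # /mnt/cache is an assumed mount point for the pool.
    for pid, comm, path in busy_holders("/mnt/cache"):
        print(f"{pid:>7} {comm:<16} {path}")
```

    Stopping (or killing) the listed processes should let the unmount proceed.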

     

    I restarted in safe mode, started the array in maintenance mode, and ran a btrfs check, which gave the following output:

    Looking for some guidance on what steps to take.

     

    
    [1/7] checking root items
    [2/7] checking extents
    tree extent[1096040448, 16384] root 18446612686404689040 has no tree block found
    tree extent[1096040448, 16384] root 22180 has no backref item in extent tree
    incorrect global backref count on 1096040448 found 3 wanted 2
    backpointer mismatch on [1096040448 16384]
    ref mismatch on [2257813504 32768] extent item 9836206978716663809, found 1
    data extent[6303264768, 24576] bytenr mimsmatch, extent item bytenr 6303264768 file item bytenr 0
    data extent[6303264768, 24576] referencer count mismatch (root 17843261723641926475 owner 281472971913761 offset 0) wanted 1 have 0
    data extent[6303264768, 24576] referencer count mismatch (root 21323 owner 2055 offset 0) wanted 0 have 1
    backpointer mismatch on [6303264768 24576]
    ERROR: errors found in extent allocation tree or chunk allocation
    [3/7] checking free space tree
    free space info recorded 10473 extents, counted 10488
    wanted offset 714981376, found 714960896
    cache appears valid but isn't 22020096
    free space info recorded 10546 extents, counted 10552
    wanted bytes 688128, found 323584 for off 3533160448
    cache appears valid but isn't 3276800000
    [4/7] checking fs roots
    [5/7] checking only csums items (without verifying data)
    [6/7] checking root refs
    [7/7] checking quota groups skipped (not enabled on this FS)
    Opening filesystem to check...
    Checking filesystem on /dev/nvme0n1p1
    UUID: dc3c46b8-ae6c-403a-a905-84c2f0209c73
    found 172622249984 bytes used, error(s) found
    total csum bytes: 138954748
    total tree bytes: 1306017792
    total fs tree bytes: 1067319296
    total extent tree bytes: 70434816
    btree space waste bytes: 286745898
    file data blocks allocated: 1380010901504
     referenced 162174189568
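    The check output is easier to triage once it is grouped by stage: here only stage 2 (extents) and stage 3 (free space tree) print findings. The "Opening filesystem" banner probably lands after stage 7 only because stdout and stderr were interleaved when the output was captured. A small sketch that does the grouping, assuming the output is piped in as text; the script and function names are hypothetical.

```python
import re
import sys

STAGE_RE = re.compile(r"^\[(\d+)/(\d+)\]\s+(.*)$")
# Banner and closing-summary lines, which should not be counted as findings.
SUMMARY_RE = re.compile(
    r"^(Opening filesystem|Checking filesystem|UUID:|found \d+ bytes used|"
    r"total |btree space waste|file data blocks|referenced)"
)


def stage_findings(report: str) -> dict:
    """Group every non-summary line under the most recent '[n/N]' stage header."""
    findings = {}
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = STAGE_RE.match(line)
        if header:
            current = header.group(3)
            findings[current] = []
        elif SUMMARY_RE.match(line):
            current = None  # summary reached; stop attributing lines to a stage
        elif current is not None:
            findings[current].append(line)
    return findings


def bytes_used(report: str):
    """Return the 'found N bytes used' figure, or None if it is absent."""
    m = re.search(r"found (\d+) bytes used", report)
    return int(m.group(1)) if m else None


if __name__ == "__main__":
    text = sys.stdin.read()
    for stage, lines in stage_findings(text).items():
        print(f"{len(lines):>3} line(s)  {stage}")
    used = bytes_used(text)
    if used is not None:
        print(f"used: {used / 2**30:.2f} GiB")
```

    Usage would be along the lines of `btrfs check /dev/nvme0n1p1 2>&1 | python3 check_summary.py`; `btrfs check` is read-only unless `--repair` is given and expects the filesystem to be unmounted (e.g. in maintenance mode).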

     

    tower-diagnostics-20231209-2117.zip

     

    EDIT: While trying to copy data off the cache drive, I was watching the log and this showed up:

     

    Dec  9 21:45:38 Tower kernel: BTRFS warning (device nvme0n1p1: state EA): csum failed root 5 ino 10748423 off 0 csum 0x8941f998 expected csum 0x99384b54 mirror 2
    Dec  9 21:45:38 Tower kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
    Dec  9 21:45:38 Tower kernel: BTRFS warning (device nvme0n1p1: state EA): csum failed root 5 ino 10748423 off 4096 csum 0x8941f998 expected csum 0x154bd2df mirror 2
    Dec  9 21:45:38 Tower kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
    Dec  9 21:45:38 Tower kernel: BTRFS warning (device nvme0n1p1: state EA): csum failed root 5 ino 10748423 off 8192 csum 0x8941f998 expected csum 0xb6cae9b5 mirror 2
    Dec  9 21:45:38 Tower kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 11, gen 0
    Dec  9 21:45:38 Tower kernel: BTRFS warning (device nvme0n1p1: state EA): csum failed root 5 ino 10748423 off 12288 csum 0x8941f998 expected csum 0xaef79424 mirror 2
    Dec  9 21:45:38 Tower kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 12, gen 0
    Dec  9 21:45:38 Tower kernel: BTRFS warning (device nvme0n1p1: state EA): csum failed root 5 ino 10748423 off 16384 csum 0x8941f998 expected csum 0xb8e34627 mirror 2
    Dec  9 21:45:38 Tower kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 13, gen 0
    Dec  9 21:45:38 Tower kernel: BTRFS warning (device nvme0n1p1: state EA): csum failed root 5 ino 10748423 off 20480 csum 0x8941f998 expected csum 0x05fedd1d mirror 2
    Dec  9 21:45:38 Tower kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 14, gen 0
    Dec  9 21:45:38 Tower kernel: BTRFS warning (device nvme0n1p1: state EA): csum failed root 5 ino 10748423 off 0 csum 0x8941f998 expected csum 0x99384b54 mirror 1
    Dec  9 21:45:38 Tower kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
    Dec  9 21:45:38 Tower kernel: BTRFS warning (device nvme0n1p1: state EA): csum failed root 5 ino 10748423 off 4096 csum 0x8941f998 expected csum 0x154bd2df mirror 1
    Dec  9 21:45:38 Tower kernel: BTRFS warning (device nvme0n1p1: state EA): csum failed root 5 ino 10748423 off 8192 csum 0x8941f998 expected csum 0xb6cae9b5 mirror 1
    Dec  9 21:45:38 Tower kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
    Dec  9 21:45:38 Tower kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 11, gen 0
    Dec  9 21:45:38 Tower kernel: BTRFS warning (device nvme0n1p1: state EA): csum failed root 5 ino 10748423 off 12288 csum 0x8941f998 expected csum 0xaef79424 mirror 1
    Dec  9 21:45:38 Tower kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 12, gen 0
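    In these csum warnings, root 5 is the filesystem's top-level subvolume, and the same inode/offset pairs fail on both mirror 1 and mirror 2. That means neither copy in the mirrored pool is good for those blocks, so the read ends in an I/O error instead of being repaired from the other device. A sketch for spotting such blocks in a saved log, assuming a two-copy mirror; the helper names are hypothetical.

```python
import re
import sys
from collections import defaultdict

# Matches the "csum failed" kernel warnings quoted above.
CSUM_RE = re.compile(
    r"csum failed root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<got>0x[0-9a-f]+) expected csum (?P<want>0x[0-9a-f]+) "
    r"mirror (?P<mirror>\d+)"
)


def failed_mirrors(log: str) -> dict:
    """Map (root, ino, off) to the set of mirror numbers that failed checksum."""
    result = defaultdict(set)
    for m in CSUM_RE.finditer(log):
        key = (int(m["root"]), int(m["ino"]), int(m["off"]))
        result[key].add(int(m["mirror"]))
    return dict(result)


def unrecoverable(log: str, copies: int = 2) -> list:
    """Blocks that failed on every copy (2 copies assumed for a two-device mirror)."""
    return sorted(key for key, mirrors in failed_mirrors(log).items()
                  if len(mirrors) >= copies)


if __name__ == "__main__":
    text = sys.stdin.read()
    for root, ino, off in unrecoverable(text):
        print(f"root {root} ino {ino} off {off}: no good copy")
```

    On the excerpt above this reports offsets 0, 4096, 8192 and 12288 of inode 10748423; the excerpt is cut off, so offsets 16384 and 20480 only show a mirror 2 failure here. `btrfs inspect-internal inode-resolve 10748423 /mnt/cache` can map the inode number to a file path (mount point assumed).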

     




    User Feedback

    Recommended Comments

    My apologies, I posted here because I found a similar type of problem in this forum.

    I ended up backing up the data from the drive (with input/output errors on multiple files, all of which appeared to be log files).
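    For a copy like that, where some files return input/output errors, it helps to keep going and record what failed instead of aborting on the first bad file. A minimal standard-library sketch; `salvage_copy` is a hypothetical name, symlinks and special files are skipped, and a partially written destination file may be left behind when a read fails.

```python
import os
import shutil
import sys
from pathlib import Path


def salvage_copy(src: str, dst: str) -> list:
    """Copy regular files under src into dst, continuing past read errors.

    Returns (path, reason) pairs for every file that could not be copied.
    """
    failures = []
    src_root, dst_root = Path(src), Path(dst)
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = Path(dirpath).relative_to(src_root)
        (dst_root / rel).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            source = Path(dirpath) / name
            if source.is_symlink() or not source.is_file():
                continue  # skip symlinks, sockets, device nodes, etc.
            try:
                # copy2 also preserves timestamps and permission bits.
                shutil.copy2(source, dst_root / rel / name)
            except OSError as exc:
                # A btrfs checksum failure with no good copy surfaces here as EIO.
                failures.append((str(source), exc.strerror or str(exc)))
    return failures


if __name__ == "__main__":
    # Usage (paths hypothetical): python3 salvage.py /mnt/cache/appdata /mnt/disk1/backup
    for path, reason in salvage_copy(sys.argv[1], sys.argv[2]):
        print(f"FAILED {path}: {reason}")
```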

    After copying the data back, I am having trouble getting a few Docker containers to work properly: mainly, my whole Plex library is not working and nginxproxymanager is not running properly.

    Attached are new diagnostics. As far as I can tell, the filesystem on the cache is working OK now, with no errors on check.

    tower-diagnostics-20231210-1002.zip


    Diags look normal other than a macvlan call trace; I suggest changing the Docker network type to ipvlan.

     

    17 hours ago, benyaki said:

    I am having trouble getting a few dockers to work properly, mainly my whole plex library not working and nginxproxymanager not running properly.

    I suggest posting in the appropriate container support threads to get help with those. If you click on the container, you should see the support link.





  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.