  • [6.12.8] BTRFS corruption after Unraid upgrade


    lman30
    • Closed

    My server rebooted unexpectedly; this morning I found I had to log in and start the array. I suspect this may also have something to do with the Appdata Backup plugin, which I have scheduled to run updates every Monday morning. This was the first time that scheduled job would have triggered since I updated to Unraid 6.12.8, and the Appdata Backup logs show that it tried (and failed) to back up/update the Docker images.

     

    I'm not able to delete containers: the attempt fails with a "Server Error" stating "BTRFS: error (device loop2: state EA) in btrfs_run_delayed_refs:2149: errno=-5 IO failure" and "BTRFS error (device loop2: state EA): parent transid verify failed on logical 143147008 mirror 1 wanted 140737494598749 found 6243421".

     

    Googling suggests one of my BTRFS drive partitions may be shot. I'm trying to downgrade now to avoid having to redo any of my drives.

     

    Edit: Downgrading didn't help anything...

     

    Edit 2: It appears to be my Static appdata SSD that is giving errors. Here are the results of the BTRFS filesystem check:

     

    [1/7] checking root items
    [2/7] checking extents
    ref mismatch on [498957393920 81920] extent item 32769, found 1
    ERROR: errors found in extent allocation tree or chunk allocation
    [3/7] checking free space tree
    [4/7] checking fs roots
    [5/7] checking only csums items (without verifying data)
    [6/7] checking root refs
    [7/7] checking quota groups skipped (not enabled on this FS)
    Opening filesystem to check...
    Checking filesystem on /dev/mapper/sdc1
    UUID: 016155b8-43a7-4a75-961c-b02d770f1a0d
    found 54666518528 bytes used, error(s) found
    total csum bytes: 28461568
    total tree bytes: 230047744
    total fs tree bytes: 154550272
    total extent tree bytes: 35176448
    btree space waste bytes: 68406632
    file data blocks allocated: 234118766592
    referenced 51469131776
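    Output like this is easier to triage when the error lines are pulled out programmatically. Below is a minimal Python sketch, assuming the text format shown in this post (it also works if the line breaks were lost, as in a pasted GUI log). `summarize_check` is a hypothetical helper, not part of btrfs-progs or Unraid, and it assumes the "ref mismatch" line reports the reference count stored in the extent item followed by the number of references the check actually found.

```python
import re

# Hypothetical helper (not part of btrfs-progs or Unraid): summarizes the text
# printed by `btrfs check`, whether or not the original line breaks survived.
REF_MISMATCH = re.compile(
    r"ref mismatch on \[(\d+) (\d+)\] extent item (\d+), found (\d+)"
)
ERROR_LINE = re.compile(r"ERROR: [^\n]*")
BYTES_USED = re.compile(r"found (\d+) bytes used")


def summarize_check(output: str) -> dict:
    mismatches = [
        {
            "bytenr": int(bytenr),          # logical start of the extent
            "num_bytes": int(num_bytes),    # extent length in bytes
            "item_refs": int(item_refs),    # refs recorded in the extent item (assumed)
            "found_refs": int(found_refs),  # refs the check actually found (assumed)
        }
        for bytenr, num_bytes, item_refs, found_refs in REF_MISMATCH.findall(output)
    ]
    used = BYTES_USED.search(output)
    return {
        "errors": ERROR_LINE.findall(output),
        "mismatches": mismatches,
        "bytes_used": int(used.group(1)) if used else None,
        # The summary line ends in "error(s) found" when problems were detected.
        "has_errors": "error(s) found" in output,
    }
```

    Run against the output above, it reports one extent at bytenr 498957393920 whose item claims 32769 references while only 1 was found, plus the single ERROR line.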

     

    Edit 3: Fixed by wiping the affected drive and switching to xfs-encrypted.

    bk-diagnostics-20240304_1127.zip





    Recommended Comments

    Mar  4 15:17:11 BK kernel: BTRFS info (device dm-4): bdev /dev/mapper/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 2728, gen 0
    Mar  4 15:17:11 BK kernel: BTRFS info (device dm-5): bdev /dev/mapper/sdc1 errs: wr 0, rd 0, flush 0, corrupt 1723141, gen 0

     

    Good that it's resolved, but btrfs was detecting data corruption on both pools. I would recommend running memtest.
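    The `errs:` lines in the log are the per-device error counters btrfs keeps on disk (the same counters `btrfs device stats` reports); a non-zero `corrupt` value means checksum mismatches were seen on that device. A minimal Python sketch for pulling them out of a syslog, assuming exactly the line format quoted above; `parse_device_stats` and `devices_with_corruption` are hypothetical helper names.

```python
import re

# Matches the kernel's per-device counter line, e.g.
# "bdev /dev/mapper/sdc1 errs: wr 0, rd 0, flush 0, corrupt 1723141, gen 0"
STATS_RE = re.compile(
    r"bdev (\S+) errs: wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)"
)
FIELDS = ("wr", "rd", "flush", "corrupt", "gen")


def parse_device_stats(log: str) -> dict[str, dict[str, int]]:
    """Map each block device to its error counters (last occurrence wins)."""
    stats: dict[str, dict[str, int]] = {}
    for match in STATS_RE.finditer(log):
        device, *counts = match.groups()
        stats[device] = dict(zip(FIELDS, map(int, counts)))
    return stats


def devices_with_corruption(stats: dict[str, dict[str, int]]) -> list[str]:
    """Devices whose 'corrupt' (checksum failure) counter is non-zero."""
    return sorted(dev for dev, counters in stats.items() if counters["corrupt"] > 0)
```

    Both pools in this thread would be flagged, which is why corruption on two unrelated devices points at a shared component such as RAM rather than a single failing SSD.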

    6 hours ago, JorgeB said:
    Mar  4 15:17:11 BK kernel: BTRFS info (device dm-4): bdev /dev/mapper/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 2728, gen 0
    Mar  4 15:17:11 BK kernel: BTRFS info (device dm-5): bdev /dev/mapper/sdc1 errs: wr 0, rd 0, flush 0, corrupt 1723141, gen 0

     

    Good that it's resolved, but btrfs was detecting data corruption on both pools. I would recommend running memtest.

    Oh, good catch. I'll do that.

     

    I tried fixing the corruption with a scrub, but it couldn't repair it. Is there an easy way to fix that corruption? Unfortunately, it's on my boot drive...


    Scrub can't fix data on a single-device filesystem, since there's no second copy to repair from. Look for the list of corrupt files in the syslog and delete them or restore them from a backup.
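    Collecting that file list by hand gets tedious when there are thousands of errors. A minimal Python sketch, assuming the scrub warnings in the syslog end with `(path: <file>)`, as in "checksum error at logical ... (path: some/file)"; the exact kernel wording can vary between versions, and read-time `csum failed` messages only give an inode number, so verify against your own log. `corrupt_paths` is a hypothetical helper, and paths are relative to the subvolume root.

```python
import re

# Assumed format of a btrfs scrub checksum warning, e.g.
# "... checksum error at logical 1000 on dev /dev/mapper/sdc1, ... (path: appdata/app/db.sqlite)"
PATH_RE = re.compile(r"checksum error at .*?\(path: (.+)\)\s*$")


def corrupt_paths(syslog: str) -> list[str]:
    """Return each reported file once, in the order it first appears."""
    found: dict[str, None] = {}  # dict preserves insertion order, acts as an ordered set
    for line in syslog.splitlines():
        match = PATH_RE.search(line)
        if match:
            found.setdefault(match.group(1), None)
    return list(found)
```

    The resulting list is what you would delete or restore from backup.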

    11 minutes ago, JorgeB said:

    Scrub can't fix data on a single-device filesystem, since there's no second copy to repair from. Look for the list of corrupt files in the syslog and delete them or restore them from a backup.

    OK, will do.

     

    Memtest immediately showed errors, so thank you for your advice!


    Once you fix the memory issues, scrub may return fewer errors, or possibly none at all, since they can just be the result of the bad RAM while the data itself is still OK, or at least still matches its checksums. That also depends on whether the memory was already bad when the data was written: in that case the data can be corrupt without scrub detecting anything, since the checksum may match the corrupt data as written. That's the problem with bad RAM; it's very difficult to say how far the damage extends.
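    The timing argument above can be illustrated with a toy model: a checksum only catches corruption that happens after it was computed. A minimal Python sketch, assuming a single bit flip stands in for a bad-RAM error; btrfs defaults to crc32c, and `zlib.crc32` (plain CRC-32) is used here only as a stand-in from the standard library. The names `flip_bit` and `checksum` are hypothetical.

```python
import zlib


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate a bad-RAM error by flipping one bit."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def checksum(data: bytes) -> int:
    # Stand-in for btrfs's default crc32c; any CRC catches all single-bit flips.
    return zlib.crc32(data)


original = b"appdata block contents"

# Case 1: data and checksum written correctly, bit flipped later while reading.
stored, stored_csum = original, checksum(original)
read_back = flip_bit(stored, 5)
detected_on_read = checksum(read_back) != stored_csum          # error reported
detected_after_ram_fix = checksum(stored) != stored_csum       # clean once RAM is good

# Case 2: bit flipped in RAM before the checksum was computed on write.
corrupted = flip_bit(original, 5)
corrupt_csum = checksum(corrupted)                             # describes the bad data
detected_corrupt_write = checksum(corrupted) != corrupt_csum   # scrub sees a match

print(detected_on_read, detected_after_ram_fix, detected_corrupt_write)
```

    Case 1 is why errors can disappear after the RAM is replaced; case 2 is the silent corruption that no scrub can find.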





