• [6.7.0-rc1] kernel: BTRFS error


    Outcasst
    • 6.7.0-rc2 Solved Minor

    kernel: BTRFS error (device md1): bdev /dev/md1

    btrfs_dev_stat_print_on_error: 2 callbacks suppressed

     

    It seems to be a random error; I can't trigger it on purpose.
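
    The per-device counters in these messages can also be queried on demand, which makes it easier to tell whether they keep climbing between occurrences. A minimal sketch, assuming the filesystem is mounted at /mnt/disk1 (adjust the path for the disk being checked):

    # Print the wr/rd/flush/corrupt/gen error counters for the filesystem at /mnt/disk1.
    # These are the same fields the kernel logs in the "errs:" messages.
    btrfs device stats /mnt/disk1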

    storage-diagnostics-20190122-2304.zip




    User Feedback

    Recommended Comments

    I'm having a ton of BTRFS errors as well that started after installing RC1. Might as well throw my diagnostics in here too.

     

    Lots of these errors on disks with no SMART errors; the disks are on separate controllers and different cables.

     

    Jan 23 09:36:05 VADERV3 kernel: BTRFS error (device md2): bdev /dev/md2 errs: wr 0, rd 259, flush 0, corrupt 0, gen 0
    Jan 23 09:36:08 VADERV3 kernel: BTRFS error (device md2): bdev /dev/md2 errs: wr 0, rd 260, flush 0, corrupt 0, gen 0
    Jan 23 09:36:08 VADERV3 kernel: BTRFS error (device md2): bdev /dev/md2 errs: wr 0, rd 261, flush 0, corrupt 0, gen 0
    Jan 23 09:36:09 VADERV3 kernel: BTRFS error (device md2): bdev /dev/md2 errs: wr 0, rd 262, flush 0, corrupt 0, gen 0

     

    vaderv3-diagnostics-20190123-0935.zip


    I am having this issue as well. Diagnostics attached. Mine is only on a single disk though....

    tower-diagnostics-20190122-1629.zip

     

    Edit: I spoke too soon, it's affecting all my BTRFS disks (I only have a few left... most have been converted to XFS).

    Edited by clowrym
    added in additional disks.

    I am having this issue as well.

     

    It happens when files are read or copied, and it affects writing to the array.

    The files appear to be fine.

     

    At mount time the counters are reported as info and are not flagged as errors:

     

    Jan 22 06:57:17 Galactica emhttpd: shcmd (605): mount -t btrfs -o noatime,nodiratime /dev/md1 /mnt/disk1
    Jan 22 06:57:17 Galactica kernel: BTRFS info (device md1): disk space caching is enabled
    Jan 22 06:57:17 Galactica kernel: BTRFS info (device md1): has skinny extents
    Jan 22 06:57:17 Galactica kernel: BTRFS info (device md1): bdev /dev/md1 errs: wr 0, rd 41884, flush 0, corrupt 0, gen 0
    Jan 22 06:57:19 Galactica emhttpd: shcmd (606): btrfs filesystem resize max /mnt/disk1

     

    In addition, they are not reported as read errors.

     

    galactica-diagnostics-20190122-1757.zip

     

    Edit: some more diagnostics after writing to the array.

    galactica-diagnostics-20190122-1931.zip

    Edited by starbetrayer

    Exactly the same here :) 

     

    Jan 23 04:17:25 Raptor kernel: BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 3761, flush 0, corrupt 0, gen 0
    Jan 23 04:17:30 Raptor kernel: BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 3762, flush 0, corrupt 0, gen 0
    Jan 23 04:17:30 Raptor kernel: BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 3763, flush 0, corrupt 0, gen 0
    Jan 23 04:17:30 Raptor kernel: BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 3764, flush 0, corrupt 0, gen 0
    Jan 23 04:17:30 Raptor kernel: BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 3765, flush 0, corrupt 0, gen 0
    Jan 23 04:17:30 Raptor kernel: BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 3766, flush 0, corrupt 0, gen 0
    Jan 23 04:17:30 Raptor kernel: BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 3767, flush 0, corrupt 0, gen 0
    Jan 23 04:17:30 Raptor kernel: BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 3768, flush 0, corrupt 0, gen 0
    Jan 23 04:17:30 Raptor kernel: BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 3769, flush 0, corrupt 0, gen 0

     

    raptor-diagnostics-20190123-0839.zip

     

    Edited by nexusmaniac

    Maybe some helpful information: my array drives are XFS formatted, and only the cache drive and some unassigned devices use BTRFS. Reading/writing to those drives doesn't show these errors for me. It looks like only the array is affected.


    I have the exact same errors going mental in my logs.

    Jan 23 02:06:06 Loki kernel: btrfs_dev_stat_print_on_error: 37 callbacks suppressed
    Jan 23 02:06:06 Loki kernel: BTRFS error (device md4): bdev /dev/md4 errs: wr 0, rd 284, flush 0, corrupt 0, gen 0
    Jan 23 02:06:06 Loki kernel: BTRFS error (device md4): bdev /dev/md4 errs: wr 0, rd 285, flush 0, corrupt 0, gen 0
    Jan 23 02:06:07 Loki kernel: BTRFS error (device md4): bdev /dev/md4 errs: wr 0, rd 286, flush 0, corrupt 0, gen 0

     

    Diagnostics attached too

    loki-diagnostics-20190123-0922.zip

    Edited by SavellM

    I can reproduce this on one of my servers. It looks like something is making the read error stats increase without an actual error. I doubt it's a btrfs problem, since it only happens with array disks, and at least for me only when copying from one array disk to another; if I copy to a Windows desktop there are no errors.
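
    A quick way to confirm this, sketched with illustrative paths only (not taken from the diagnostics): snapshot the counters, copy a file from one array disk to another, and compare.

    # Note the read_io_errs value, run a disk-to-disk copy, then check again.
    # The counter climbs even though the copy itself completes without error.
    btrfs device stats /mnt/disk1 | grep read_io_errs
    cp /mnt/disk1/path/to/large.file /mnt/disk2/
    btrfs device stats /mnt/disk1 | grep read_io_errs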


    I had this issue spamming my logs as well. What's more concerning, running a scrub on one of the reported disks showed checksum errors on recently accessed files (Plex generating thumbnails). Yikes. Reverted back to 6.6.6.
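
    For reference, a scrub of a single data disk can be run and checked from the command line roughly like this (the mount point is an example only):

    # Scrub the filesystem mounted at /mnt/disk4; -B runs it in the foreground
    # and prints the error summary when it finishes.
    btrfs scrub start -B /mnt/disk4
    # If started without -B, poll progress and results with:
    btrfs scrub status /mnt/disk4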


    Update: this is due to a one-line patch that went into the Linux 4.19 kernel.

     

    tldr: the error messages are harmless, other than filling your system log

     

    The messages are generated as a result of failing read-ahead operations. It used to be that btrfs didn't issue explicit read-ahead, but with the above patch it now does. It's also legitimate for a driver to fail a read-ahead by simply reporting "I/O error". Other file systems respond to this with a "meh", but btrfs does not distinguish the case and goes ahead and reports the error.

     

    Looking at the kernel source all the way up to the current 5.0-rc3, it looks to me like this annoyance is still present. So, I'm still pondering how to solve this for our kernel....
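
    In the meantime, since these stats are persistent, anyone bothered by large accumulated counters can print and zero them once the spurious reads stop; a minimal sketch, with the mount point as an example only:

    # Print the device stats for /mnt/disk1 and reset the counters to zero.
    btrfs device stats -z /mnt/disk1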

     

    4 hours ago, d2dyno said:

    running scrub on one of the reported disks, showed checksum errors on recently accessed files

    I think this is unrelated.


    Since the upgrade from 6.7.2 to 6.8.3 I have exactly the same problems as described here.
    My two cache drives (SSDs) are affected. Since the two SSDs come from different manufacturers and were purchased at different times, I think it is almost impossible that they would both fail at the same time, i.e. have real defects.
    Any help is welcome.

    microserver-diagnostics-20200421-1302.zip

    Edited by JoergHH
    1 hour ago, JoergHH said:

    Any help is welcome.

    Completely unrelated to this report, one of your cache devices is dropping offline:

    Apr 20 21:45:20 microserver kernel: ata6: EH complete
    Apr 20 21:45:22 microserver kernel: ata6: limiting SATA link speed to 1.5 Gbps
    Apr 20 21:45:22 microserver kernel: ata6.00: exception Emask 0x10 SAct 0x8000000 SErr 0x280100 action 0x6 frozen
    Apr 20 21:45:22 microserver kernel: ata6.00: irq_stat 0x08000000, interface fatal error
    Apr 20 21:45:22 microserver kernel: ata6: SError: { UnrecovData 10B8B BadCRC }
    Apr 20 21:45:22 microserver kernel: ata6.00: failed command: READ FPDMA QUEUED
    Apr 20 21:45:22 microserver kernel: ata6.00: cmd 60/00:d8:98:3e:32/01:00:01:00:00/40 tag 27 ncq dma 131072 in
    Apr 20 21:45:22 microserver kernel:         res 40/00:dc:98:3e:32/00:00:01:00:00/40 Emask 0x10 (ATA bus error)
    Apr 20 21:45:22 microserver kernel: ata6.00: status: { DRDY }
    Apr 20 21:45:22 microserver kernel: ata6: hard resetting link
    Apr 20 21:45:24 microserver kernel: ata6: SATA link down (SStatus 1 SControl 310)
    Apr 20 21:45:24 microserver kernel: ata6: hard resetting link
    Apr 20 21:45:26 microserver kernel: ata6: SATA link down (SStatus 1 SControl 310)
    Apr 20 21:45:26 microserver kernel: ata6: hard resetting link
    Apr 20 21:45:29 microserver kernel: ata6: SATA link down (SStatus 1 SControl 310)
    Apr 20 21:45:29 microserver kernel: ata6.00: disabled

     

    ATA6 is MTFDDAK256MAM-1K12_1318093908F1. It wasn't even assigned at server start, but since it's part of the pool it was still being used. This looks more like a cable/connection issue; see here for more info and for how to better monitor the pool for errors.
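
    As a rough illustration of that kind of monitoring (not the linked guide; the pool mount point is an example), the pool's error counters can be checked periodically and flagged when any are non-zero:

    # -c makes btrfs return a non-zero exit status if any error counter is non-zero,
    # so this line can be dropped into a cron job or notification script.
    btrfs device stats -c /mnt/cache || echo "btrfs device errors detected on the cache pool"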

     

    If more help is needed please start a thread on the general support forum.





  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.