
Sporadic BTRFS errors


mlody11


I'm having sporadic issues with the server. Initially I had numerous BTRFS errors. I replaced the memory and the errors persisted. I replaced the power supply and the errors persisted. I wiped the cache pools, all of which were BTRFS file systems, and converted one to XFS (that one stopped being an issue), but I need the others to stay BTRFS because I need a pool of drives. The errors persist; a sample is below. Any thoughts on what to troubleshoot next?

 

I also removed a drive that I thought might be causing this issue, but now a different drive is complaining. In fact, it seems like two of them are.

 

Machine is an i7 6700K

 

<2>Sep 19 16:10:54 Tower kernel: BTRFS: error (device sdi1) in write_all_supers:4369: errno=-5 IO failure (errors while submitting device barriers.)

<3>Sep 19 16:10:54 Tower kernel: ata14.02: status: { DRDY DF ERR }

<3>Sep 19 16:10:54 Tower kernel: ata14.02: status: { DRDY DF ERR }

<3>Sep 19 16:10:54 Tower kernel: ata14.02: status: { DRDY DF ERR }

<2>Sep 19 16:10:54 Tower kernel: BTRFS: error (device sdi1: state EA) in cleanup_transaction:1982: errno=-5 IO failure

<3>Sep 19 16:10:54 Tower kernel: BTRFS error (device sdi1): bdev /dev/sdl1 errs: wr 0, rd 0, flush 1, corrupt 0, gen 0

<3>Sep 19 16:10:54 Tower kernel: I/O error, dev sdl, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0

<3>Sep 19 16:10:54 Tower kernel: ata14.02: error: { ABRT }

<3>Sep 19 16:10:54 Tower kernel:         res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)

<3>Sep 19 16:10:54 Tower kernel: ata14.02: error: { ABRT }

<3>Sep 19 16:10:54 Tower kernel:         res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)

<3>Sep 19 16:10:54 Tower kernel: ata14.02: error: { ABRT }

<3>Sep 19 16:10:54 Tower kernel:         res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)

<4>Sep 19 16:10:54 Tower kernel: ata14.02: NCQ disabled due to excessive errors

<3>Sep 19 16:10:53 Tower kernel: ata14.02: status: { DRDY DF ERR }

<3>Sep 19 16:10:53 Tower kernel: ata14.02: status: { DRDY DF ERR }

<3>Sep 19 16:10:53 Tower kernel: ata14.02: status: { DRDY DF ERR }

<3>Sep 19 16:10:53 Tower kernel: ata14.02: error: { ABRT }

<3>Sep 19 16:10:53 Tower kernel:         res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)

<3>Sep 19 16:10:53 Tower kernel: ata14.02: error: { ABRT }

<3>Sep 19 16:10:53 Tower kernel:         res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)

<3>Sep 19 16:10:53 Tower kernel: ata14.02: error: { ABRT }

<3>Sep 19 16:10:53 Tower kernel:         res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)

<13>Sep 19 16:04:25 Tower root: Error response from daemon: error while removing network: network br0 id feb81cce98f28c20f684eaa33e288035dcede3e461c88eb536ae334b3f1e40f0 has active endpoints

<2>Sep 19 12:21:40 Tower kernel: BTRFS: error (device sdi1: state EA) in cleanup_transaction:1982: errno=-5 IO failure

<3>Sep 19 12:21:40 Tower kernel: ata14.02: status: { DRDY DF ERR }

<3>Sep 19 12:21:40 Tower kernel: ata14.02: status: { DRDY DF ERR }

<3>Sep 19 12:21:40 Tower kernel: ata14.02: error: { ABRT }

<3>Sep 19 12:21:40 Tower kernel:         res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)

<3>Sep 19 12:21:40 Tower kernel: ata14.02: error: { ABRT }

<3>Sep 19 12:21:40 Tower kernel:         res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)

<3>Sep 19 12:21:40 Tower kernel: ata14.02: error: { ABRT }

<3>Sep 19 12:21:40 Tower kernel: ata14.02: error: { ABRT }

<2>Sep 19 12:21:40 Tower kernel: BTRFS: error (device sdi1) in write_all_supers:4369: errno=-5 IO failure (errors while submitting device barriers.)

<3>Sep 19 12:21:40 Tower kernel: BTRFS error (device sdi1): bdev /dev/sdl1 errs: wr 0, rd 0, flush 1, corrupt 0, gen 0

<3>Sep 19 12:21:40 Tower kernel: I/O error, dev sdl, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0

<3>Sep 19 12:21:40 Tower kernel: ata14.02: error: { ABRT }

<3>Sep 19 12:21:40 Tower kernel:         res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)

<3>Sep 19 12:21:40 Tower kernel: ata14.02: error: { ABRT }

<3>Sep 19 12:21:40 Tower kernel:         res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)

<3>Sep 19 12:21:40 Tower kernel: ata14.02: status: { DRDY DF ERR }

<4>Sep 19 12:21:40 Tower kernel: ata14.02: NCQ disabled due to excessive errors

<3>Sep 19 12:21:40 Tower kernel: ata14.02: status: { DRDY DF ERR }

<3>Sep 19 12:21:40 Tower kernel: ata14.02: status: { DRDY DF ERR }

<3>Sep 19 12:21:40 Tower kernel:         res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)

<3>Sep 19 12:21:40 Tower kernel: ata14.02: status: { DRDY DF ERR }

<3>Sep 19 12:21:40 Tower kernel:         res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)
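For anyone reading the `res 71/04:...` lines above: the first two hex bytes are the ATA status and error registers. Here is a minimal Python sketch of decoding them, assuming the bit names used in the Linux libata headers; the helper names `decode_bits` and `decode_res` are made up for illustration and are not part of any tool.

```python
import re

# Bit names as defined in the Linux libata headers (assumption: include/linux/ata.h).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "SENSE", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "TRK0NF", 0x01: "AMNF"}

# Matches the "status/error" pair at the start of a libata "res" field.
RES_RE = re.compile(r"res ([0-9a-fA-F]{2})/([0-9a-fA-F]{2})")


def decode_bits(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in names.items() if value & bit]


def decode_res(line):
    """Decode the status and error bytes of a 'res' log line, or return None."""
    match = RES_RE.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    error = int(match.group(2), 16)
    return decode_bits(status, STATUS_BITS), decode_bits(error, ERROR_BITS)


line = ("<3>Sep 19 12:21:40 Tower kernel:         "
        "res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)")
print(decode_res(line))
```

The decoded status should agree with the kernel's own `{ DRDY DF ERR }` summary, plus the legacy DSC bit that the summary apparently leaves out, and the error byte `04` corresponds to `{ ABRT }`.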

Link to comment
Sep 19 16:10:54 Tower kernel: sd 14:2:0:0: [sdl] tag#25 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=0s
Sep 19 16:10:54 Tower kernel: sd 14:2:0:0: [sdl] tag#25 Sense Key : 0x5 [current]
Sep 19 16:10:54 Tower kernel: sd 14:2:0:0: [sdl] tag#25 ASC=0x21 ASCQ=0x4
Sep 19 16:10:54 Tower kernel: sd 14:2:0:0: [sdl] tag#25 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
Sep 19 16:10:54 Tower kernel: I/O error, dev sdl, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
Sep 19 16:10:54 Tower kernel: ata14: EH complete
Sep 19 16:10:54 Tower kernel: BTRFS error (device sdi1): bdev /dev/sdl1 errs: wr 0, rd 0, flush 1, corrupt 0, gen 0
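The `errs:` line above carries btrfs's per-device error counters, and here only `flush` is non-zero. That lines up with the failed command, since CDB opcode 0x35 is SYNCHRONIZE CACHE (10) in the SCSI command set. A small Python sketch of pulling those counters out of a log line follows; `parse_btrfs_errs` is a hypothetical helper name, and the pattern is only assumed to match lines shaped like the one in this log.

```python
import re

# Pattern modelled on the "bdev ... errs:" line quoted in this thread.
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return (device, counters) for a btrfs 'errs:' line, or None if absent."""
    match = ERRS_RE.search(line)
    if match is None:
        return None
    counters = {key: int(value)
                for key, value in match.groupdict().items()
                if key != "dev"}
    return match.group("dev"), counters


line = ("Sep 19 16:10:54 Tower kernel: BTRFS error (device sdi1): "
        "bdev /dev/sdl1 errs: wr 0, rd 0, flush 1, corrupt 0, gen 0")
device, counters = parse_btrfs_errs(line)
print(device, counters)
```

Note that the device that failed the flush is `/dev/sdl1`, even though btrfs labels the pool by `sdi1`.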

 

This is a hardware issue. The btrfs errors are a consequence: they appear because writes to the device fail. Start by replacing or swapping cables.
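To see whether the errors follow one link or move around after a cable swap, it can help to tally the ATA error lines per link. This is a rough Python sketch, assuming the syslog has been saved as text lines; `count_ata_errors` is a made-up helper, and the pattern only covers the `status:` and `error:` lines that appear in this thread.

```python
import re
from collections import Counter

# Captures the link name (e.g. "ata14.02") from status/error report lines.
ATA_RE = re.compile(r"kernel: (ata\d+(?:\.\d+)?): (?:status|error): \{")


def count_ata_errors(lines):
    """Count status/error report lines per ATA link name."""
    counts = Counter()
    for line in lines:
        match = ATA_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "<3>Sep 19 16:10:54 Tower kernel: ata14.02: status: { DRDY DF ERR }",
    "<3>Sep 19 16:10:54 Tower kernel: ata14.02: error: { ABRT }",
    "<4>Sep 19 16:10:54 Tower kernel: ata14.02: NCQ disabled due to excessive errors",
    "Sep 19 16:10:54 Tower kernel: ata14: EH complete",
]
print(count_ata_errors(sample))
```

If the same link keeps showing up with a different drive attached, that points away from the drive and toward the cable, port, or whatever sits between them.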

 

P.S. You also need to check the filesystem on the device below:

 

Sep 19 16:10:04 Tower kernel: XFS (nvme0n1p1): Unmount and run xfs_repair

 

Link to comment

Understood on the port multiplier. This thing has been running like this, with the port multiplier, for about 2 years, so unless something changed to cause an issue with it now, it seems like that wouldn't be it.

 

nvme0n1p1 is an NVMe drive attached to the M.2 connector on the motherboard. If that is causing an issue, would the motherboard be the problem? There are no cables and no port multipliers on that port.
