September 19, 2022 Having sporadic issues with the server. I had numerous issues initially with BTRFS errors. I replaced the memory and the errors persisted. I replaced the power supply and the errors persisted. I wiped the cache pools, all of which were BTRFS file systems... turned one into XFS (this one stopped being an issue), but I need the others to be BTRFS because I need a pool of drives. The errors persist; a sample is below. Any thoughts on what to troubleshoot next? I also removed a drive that I thought might be causing this issue, but now a different drive is complaining. In fact, it seems like 2 of them. Machine is an i7 6700K.

<2>Sep 19 16:10:54 Tower kernel: BTRFS: error (device sdi1) in write_all_supers:4369: errno=-5 IO failure (errors while submitting device barriers.)
<3>Sep 19 16:10:54 Tower kernel: ata14.02: status: { DRDY DF ERR }
<3>Sep 19 16:10:54 Tower kernel: ata14.02: status: { DRDY DF ERR }
<3>Sep 19 16:10:54 Tower kernel: ata14.02: status: { DRDY DF ERR }
<2>Sep 19 16:10:54 Tower kernel: BTRFS: error (device sdi1: state EA) in cleanup_transaction:1982: errno=-5 IO failure
<3>Sep 19 16:10:54 Tower kernel: BTRFS error (device sdi1): bdev /dev/sdl1 errs: wr 0, rd 0, flush 1, corrupt 0, gen 0
<3>Sep 19 16:10:54 Tower kernel: I/O error, dev sdl, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
<3>Sep 19 16:10:54 Tower kernel: ata14.02: error: { ABRT }
<3>Sep 19 16:10:54 Tower kernel: res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)
<3>Sep 19 16:10:54 Tower kernel: ata14.02: error: { ABRT }
<3>Sep 19 16:10:54 Tower kernel: res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)
<3>Sep 19 16:10:54 Tower kernel: ata14.02: error: { ABRT }
<3>Sep 19 16:10:54 Tower kernel: res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)
<4>Sep 19 16:10:54 Tower kernel: ata14.02: NCQ disabled due to excessive errors
<3>Sep 19 16:10:53 Tower kernel: ata14.02: status: { DRDY DF ERR }
<3>Sep 19 16:10:53 Tower kernel: ata14.02: status: { DRDY DF ERR }
<3>Sep 19 16:10:53 Tower kernel: ata14.02: status: { DRDY DF ERR }
<3>Sep 19 16:10:53 Tower kernel: ata14.02: error: { ABRT }
<3>Sep 19 16:10:53 Tower kernel: res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)
<3>Sep 19 16:10:53 Tower kernel: ata14.02: error: { ABRT }
<3>Sep 19 16:10:53 Tower kernel: res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)
<3>Sep 19 16:10:53 Tower kernel: ata14.02: error: { ABRT }
<3>Sep 19 16:10:53 Tower kernel: res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)
<13>Sep 19 16:04:25 Tower root: Error response from daemon: error while removing network: network br0 id feb81cce98f28c20f684eaa33e288035dcede3e461c88eb536ae334b3f1e40f0 has active endpoints
<2>Sep 19 12:21:40 Tower kernel: BTRFS: error (device sdi1: state EA) in cleanup_transaction:1982: errno=-5 IO failure
<3>Sep 19 12:21:40 Tower kernel: ata14.02: status: { DRDY DF ERR }
<3>Sep 19 12:21:40 Tower kernel: ata14.02: status: { DRDY DF ERR }
<3>Sep 19 12:21:40 Tower kernel: ata14.02: error: { ABRT }
<3>Sep 19 12:21:40 Tower kernel: res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)
<3>Sep 19 12:21:40 Tower kernel: ata14.02: error: { ABRT }
<3>Sep 19 12:21:40 Tower kernel: res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)
<3>Sep 19 12:21:40 Tower kernel: ata14.02: error: { ABRT }
<3>Sep 19 12:21:40 Tower kernel: ata14.02: error: { ABRT }
<2>Sep 19 12:21:40 Tower kernel: BTRFS: error (device sdi1) in write_all_supers:4369: errno=-5 IO failure (errors while submitting device barriers.)
<3>Sep 19 12:21:40 Tower kernel: BTRFS error (device sdi1): bdev /dev/sdl1 errs: wr 0, rd 0, flush 1, corrupt 0, gen 0
<3>Sep 19 12:21:40 Tower kernel: I/O error, dev sdl, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
<3>Sep 19 12:21:40 Tower kernel: ata14.02: error: { ABRT }
<3>Sep 19 12:21:40 Tower kernel: res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)
<3>Sep 19 12:21:40 Tower kernel: ata14.02: error: { ABRT }
<3>Sep 19 12:21:40 Tower kernel: res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)
<3>Sep 19 12:21:40 Tower kernel: ata14.02: status: { DRDY DF ERR }
<4>Sep 19 12:21:40 Tower kernel: ata14.02: NCQ disabled due to excessive errors
<3>Sep 19 12:21:40 Tower kernel: ata14.02: status: { DRDY DF ERR }
<3>Sep 19 12:21:40 Tower kernel: ata14.02: status: { DRDY DF ERR }
<3>Sep 19 12:21:40 Tower kernel: res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)
<3>Sep 19 12:21:40 Tower kernel: ata14.02: status: { DRDY DF ERR }
<3>Sep 19 12:21:40 Tower kernel: res 71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)
September 20, 2022 You should attach your diagnostics to your next post. You said that you changed your memory; did you run memtest with the new one?
September 20, 2022 Author I have not done a memtest on my new memory. I guess it's possible to have 2 bad batches of memory, but I didn't think it would be likely. I can do that... Attached are the diagnostics. tower-diagnostics-20220920-0916.zip
September 20, 2022 Community Expert

Sep 19 16:10:54 Tower kernel: sd 14:2:0:0: [sdl] tag#25 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=0s
Sep 19 16:10:54 Tower kernel: sd 14:2:0:0: [sdl] tag#25 Sense Key : 0x5 [current]
Sep 19 16:10:54 Tower kernel: sd 14:2:0:0: [sdl] tag#25 ASC=0x21 ASCQ=0x4
Sep 19 16:10:54 Tower kernel: sd 14:2:0:0: [sdl] tag#25 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
Sep 19 16:10:54 Tower kernel: I/O error, dev sdl, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
Sep 19 16:10:54 Tower kernel: ata14: EH complete
Sep 19 16:10:54 Tower kernel: BTRFS error (device sdi1): bdev /dev/sdl1 errs: wr 0, rd 0, flush 1, corrupt 0, gen 0

This is a hardware issue. The btrfs errors come afterwards, because the filesystem can't write to the device. Start by replacing/swapping cables.

P.S. You also need to check the filesystem on the device below:

Sep 19 16:10:04 Tower kernel: XFS (nvme0n1p1): Unmount and run xfs_repair
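For anyone reading along, the numeric fields in these log lines can be decoded by hand. Below is a minimal Python sketch, not anything Unraid ships: the lookup tables are small excerpts of the standard SCSI sense keys and opcodes and the ATA status/error register bits (bit names vary a little between ATA revisions), and the function names `decode_res` and `decode_scsi` are made up for illustration. It takes one `res 71/04:...` line from the first post and the `sd` lines quoted above.

```python
import re

# Excerpt of SCSI sense keys and of the CDB opcodes relevant to this log.
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
              0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION",
              0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND"}
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)", 0x35: "SYNCHRONIZE CACHE(10)"}

# ATA status and error register bits, highest bit first.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "AMNF"}


def decode_bits(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in table.items() if value & mask]


def decode_res(line):
    """Decode the first two bytes (status/error) of a libata 'res' line."""
    m = re.search(r"\bres ([0-9a-f]{2})/([0-9a-f]{2}):", line)
    if m is None:
        return None
    status, error = int(m.group(1), 16), int(m.group(2), 16)
    return decode_bits(status, STATUS_BITS), decode_bits(error, ERROR_BITS)


def decode_scsi(lines):
    """Collect sense key, ASC/ASCQ and opcode from 'sd ...' kernel lines."""
    out = {}
    for line in lines:
        if m := re.search(r"Sense Key : (0x[0-9a-fA-F]+)", line):
            out["sense_key"] = SENSE_KEYS.get(int(m.group(1), 16), "unknown")
        if m := re.search(r"ASC=(0x[0-9a-fA-F]+) ASCQ=(0x[0-9a-fA-F]+)", line):
            out["asc_ascq"] = (int(m.group(1), 16), int(m.group(2), 16))
        if m := re.search(r"opcode=(0x[0-9a-fA-F]+)", line):
            out["opcode"] = OPCODES.get(int(m.group(1), 16), "unknown")
    return out


RES_LINE = ("Sep 19 16:10:54 Tower kernel: res "
            "71/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)")
SD_LINES = [
    "Sep 19 16:10:54 Tower kernel: sd 14:2:0:0: [sdl] tag#25 Sense Key : 0x5 [current]",
    "Sep 19 16:10:54 Tower kernel: sd 14:2:0:0: [sdl] tag#25 ASC=0x21 ASCQ=0x4",
    "Sep 19 16:10:54 Tower kernel: sd 14:2:0:0: [sdl] tag#25 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00",
]

if __name__ == "__main__":
    print(decode_res(RES_LINE))
    print(decode_scsi(SD_LINES))
```

Status 0x71 decodes to DRDY, DF, DSC and ERR (the kernel's `{ DRDY DF ERR }` summary leaves DSC out) and error 0x04 is ABRT, which matches the `status:` and `error:` lines in the log. Opcode 0x35 is a cache flush, which lines up with the `flush 1` counter and the "errors while submitting device barriers" message from btrfs. As far as I know, ASC/ASCQ 0x21/0x04 is listed as "unaligned write command", but it shows up as a generic translation of ATA errors, so I would not read much into it.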
September 20, 2022 Community Expert Forgot to mention: that disk is connected to a SATA port multiplier. Those are not recommended, and that could be the problem.
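One way to see which link the errors come from is to tally the `ataN.MM` prefixes in the syslog. The sketch below is only an illustration: the function names are hypothetical, and the rule that a device number of 2 or higher points to a port multiplier is an assumption based on how libata usually numbers links (a directly attached SATA disk normally shows up as `.00`), so treat the flag as a hint rather than proof.

```python
import re
from collections import Counter

# Matches the libata device prefix, e.g. "ata14.02:" -> port 14, device 2.
LINK_RE = re.compile(r"\bata(\d+)\.(\d{2}):")


def tally_links(lines):
    """Count log lines per (port, device) pair."""
    counts = Counter()
    for line in lines:
        m = LINK_RE.search(line)
        if m:
            counts[(int(m.group(1)), int(m.group(2)))] += 1
    return counts


def likely_behind_multiplier(device_no):
    """Assumption: device numbers >= 2 only occur on port-multiplier links."""
    return device_no >= 2


SAMPLE = [
    "<3>Sep 19 16:10:54 Tower kernel: ata14.02: status: { DRDY DF ERR }",
    "<3>Sep 19 16:10:54 Tower kernel: ata14.02: error: { ABRT }",
    "<4>Sep 19 16:10:54 Tower kernel: ata14.02: NCQ disabled due to excessive errors",
    "Sep 19 16:10:54 Tower kernel: ata14: EH complete",
]

if __name__ == "__main__":
    for (port, dev), n in tally_links(SAMPLE).items():
        hint = "port multiplier?" if likely_behind_multiplier(dev) else "direct?"
        print(f"ata{port}.{dev:02d}: {n} lines ({hint})")
```

If every error in the full syslog lands on links of the same port, that points at the multiplier, its cable or its controller port rather than at a single disk.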
September 20, 2022 Author Understood on the port multiplier. This thing has been running like this, with the port multiplier, for about 2 years, so unless something changed to now cause an issue with it, it seems like that wouldn't be it. nvme0n1p1 is on an NVMe drive attached to the M.2 connector on the motherboard. If that is causing an issue... motherboard would be the issue? Because there are no cables and no port multipliers on that port.
September 20, 2022 Community Expert 5 minutes ago, mlody11 said: "would be the issue?" For now it's just a filesystem corruption problem; check the filesystem.
September 20, 2022 Author But something must have caused the corruption, since it's a very fresh install, done after replacing the memory and power supply. Would a filesystem check reveal the source of the corruption?
September 20, 2022 Community Expert Bad RAM is usually the main suspect for unexpected filesystem corruption.