I/O ERROR | BTRFS ERROR


Solved by JorgeB

Hi,

 

I'm having issues with what I suspect is the cache drive. My server needs to be rebooted daily, and several of my dockers are having problems.

 

Any idea where to start?

 

Quote

Mar 27 09:19:18 Tower kernel: BTRFS error (device nvme0n1p1): parent transid verify failed on 533586477056 wanted 17184512195 found 4643011
Mar 27 09:19:18 Tower kernel: BTRFS: error (device nvme0n1p1: state A) in btrfs_finish_ordered_io:3329: errno=-5 IO failure
Mar 27 09:19:18 Tower kernel: BTRFS info (device nvme0n1p1: state EA): forced readonly
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 8919176 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 1, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 7077072 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 2, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 7351048 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 3, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 8056712 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 4, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 7814560 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 5, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 8919176 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 6, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 7624720 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 7, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 8919176 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 8, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 6841008 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 9, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 8919176 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 10, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:19 Tower kernel: BTRFS: error (device loop2) in btrfs_commit_transaction:2418: errno=-5 IO failure (Error while writing out transaction)
Mar 27 09:19:19 Tower kernel: BTRFS info (device loop2: state E): forced readonly
Mar 27 09:19:19 Tower kernel: BTRFS warning (device loop2: state E): Skipping commit of aborted transaction.
Mar 27 09:19:19 Tower kernel: BTRFS: error (device loop2: state EA) in cleanup_transaction:1982: errno=-5 IO failure
Mar 27 09:19:29 Tower kernel: docker0: port 1(veth062d5aa) entered disabled state
Mar 27 09:19:29 Tower kernel: veth5dbbaf8: renamed from eth0
Mar 27 09:19:39 Tower kernel: docker0: port 2(veth3debe34) entered disabled state
Mar 27 09:19:39 Tower kernel: vethb1ea0cc: renamed from eth0
Mar 27 09:19:48 Tower kernel: blk_print_req_error: 2347 callbacks suppressed
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 8068200 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Mar 27 09:19:48 Tower kernel: btrfs_dev_stat_print_on_error: 2339 callbacks suppressed
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2350, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 8994880 op 0x1:(WRITE) flags 0x100000 phys_seg 3 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2351, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 8994904 op 0x1:(WRITE) flags 0x100000 phys_seg 2 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2352, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 9000840 op 0x1:(WRITE) flags 0x100000 phys_seg 7 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2353, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 9002568 op 0x1:(WRITE) flags 0x100000 phys_seg 6 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2354, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 9005240 op 0x1:(WRITE) flags 0x100000 phys_seg 4 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2355, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 8994920 op 0x1:(WRITE) flags 0x100000 phys_seg 3 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2356, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 9005272 op 0x1:(WRITE) flags 0x100000 phys_seg 4 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2357, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 9001224 op 0x1:(WRITE) flags 0x100000 phys_seg 3 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2358, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 7351200 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2359, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:49 Tower kernel: docker0: port 4(veth0e52554) entered disabled state
Mar 27 09:19:49 Tower kernel: veth3819d5d: renamed from eth0
Mar 27 09:19:54 Tower kernel: blk_print_req_error: 50 callbacks suppressed
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 8994880 op 0x1:(WRITE) flags 0x100000 phys_seg 3 prio class 0
Mar 27 09:19:54 Tower kernel: btrfs_dev_stat_print_on_error: 50 callbacks suppressed
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2410, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 8994904 op 0x1:(WRITE) flags 0x100000 phys_seg 4 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2411, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 8995784 op 0x1:(WRITE) flags 0x100000 phys_seg 2 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2412, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 9000840 op 0x1:(WRITE) flags 0x100000 phys_seg 3 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2413, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 9016176 op 0x1:(WRITE) flags 0x100000 phys_seg 18 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2414, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 7077072 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2415, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 7815688 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2416, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 7814840 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2417, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 8994880 op 0x1:(WRITE) flags 0x100000 phys_seg 3 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2418, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 8994904 op 0x1:(WRITE) flags 0x100000 phys_seg 3 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2419, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:20:25 Tower  ntpd[1217]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
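For anyone skimming a log like this, here is a minimal Python sketch that pulls the per-device error counters out of the `bdev ... errs:` lines and keeps the latest values seen. The two sample lines are copied from the quote above; the regex and the function name `latest_counters` are my own assumptions for illustration, not part of any Btrfs tool.

```python
import re

# Two lines copied from the syslog quote above (first and last counter lines).
SAMPLE_LOG = """\
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 1, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2419, rd 0, flush 0, corrupt 25, gen 0
"""

# Assumed pattern, based only on the line format visible in this log.
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

def latest_counters(log_text):
    """Return {device: {counter: value}} using the last matching line per device."""
    latest = {}
    for line in log_text.splitlines():
        m = ERRS_RE.search(line)
        if m:
            fields = m.groupdict()
            dev = fields.pop("dev")
            latest[dev] = {name: int(value) for name, value in fields.items()}
    return latest

counters = latest_counters(SAMPLE_LOG)
for dev, values in counters.items():
    print(dev, values)
```

On these sample lines it reports that only the write counter on /dev/loop2 is climbing, while the corruption count stays at 25.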

 

 

Edited by xCrossOne
  • Solution

Btrfs is finding a lot of data corruption. Start by running memtest; you can clearly see one example of a bit flip:

 

Mar 27 09:19:18 Tower kernel: BTRFS error (device nvme0n1p1): parent transid verify failed on 533586477056 wanted 17184512195 found 4643011


 

17184512195 = 010000000000010001101101100011000011
4643011     =             010001101101100011000011

 

The two values differ by a single flipped bit, so RAM issues for sure.
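To double-check the comparison, here is a minimal Python sketch that XORs the "wanted" and "found" values from the log line and lists the bit positions where they differ. The variable names are just illustrative.

```python
# Values taken from the "parent transid verify failed" log line.
wanted = 17184512195
found = 4643011

# XOR leaves a 1 only in the bit positions where the two values differ.
diff = wanted ^ found
flipped_bits = [i for i in range(diff.bit_length()) if (diff >> i) & 1]

print(f"wanted = {wanted:036b}")
print(f"found  = {found:036b}")
print(f"differing bit positions: {flipped_bits}")  # -> [34]
```

Only bit 34 differs (counting from 0 at the least significant bit), which matches the binary comparison shown.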


My cache drive stopped working a few days ago (unmountable: wrong or no file system). After failing to fix the drive, I replaced it in a desperate attempt and started fresh with new appdata (reinstalled my dockers from scratch).

 

I have also ordered new RAM to replace the old, given the suspicion that the memory is probably at fault, but I haven't received it yet. I was hoping it was a leak that reinstalling my dockers would solve, but no such luck. Memtest is difficult to run without a screen.

 

I kept the log open while performing different actions on the server to find a trigger, and the errors seem to appear when I watch a show on Plex, specifically when reaching a certain point in a specific episode. Can it be related to a very low and seemingly static Reallocated Sector Count on one of my drives?

 

Any advice on the next step?

tower-diagnostics-20230402-1219.zip

Edited by xCrossOne
  • 2 weeks later...

Thanks again - when I ran the command I got the following output on my brand new cache drive:

 

Quote

btrfs dev stats -c /mnt/cache
[/dev/nvme0n1p1].write_io_errs    0
[/dev/nvme0n1p1].read_io_errs     0
[/dev/nvme0n1p1].flush_io_errs    0
[/dev/nvme0n1p1].corruption_errs  94
[/dev/nvme0n1p1].generation_errs  0

I'm unsure why it shows 94 corruption errors, and I don't know what to do about them. Also, how do I run a similar check on my data drives, which are all XFS?
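In case it helps anyone reading output like this, here is a rough Python sketch that parses the `btrfs dev stats` text and lists only the counters that are not zero. The sample text is the output pasted above; the regex and the helper names `parse_dev_stats` and `nonzero_counters` are assumptions for illustration, not part of btrfs-progs.

```python
import re

# Output copied from the `btrfs dev stats -c /mnt/cache` run quoted above.
SAMPLE = """\
[/dev/nvme0n1p1].write_io_errs    0
[/dev/nvme0n1p1].read_io_errs     0
[/dev/nvme0n1p1].flush_io_errs    0
[/dev/nvme0n1p1].corruption_errs  94
[/dev/nvme0n1p1].generation_errs  0
"""

# Assumed line shape: [device].counter_name <whitespace> integer
LINE_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")

def parse_dev_stats(text):
    """Return {device: {counter: value}} for every line matching LINE_RE."""
    stats = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            stats.setdefault(m["dev"], {})[m["counter"]] = int(m["value"])
    return stats

def nonzero_counters(stats):
    """Return (device, counter, value) tuples for counters above zero."""
    return [
        (dev, name, value)
        for dev, counters in stats.items()
        for name, value in counters.items()
        if value
    ]

problems = nonzero_counters(parse_dev_stats(SAMPLE))
for dev, name, value in problems:
    print(f"{dev}: {name} = {value}")
```

With the output above, the only counter it flags is corruption_errs at 94.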

Edited by xCrossOne