xCrossOne Posted March 27, 2023 (edited)
Hi, I'm having issues with what I suspect is the cache drive. My server needs to be rebooted on a daily basis, and I'm having issues with several dockers. Any idea where to start?
Mar 27 09:19:18 Tower kernel: BTRFS error (device nvme0n1p1): parent transid verify failed on 533586477056 wanted 17184512195 found 4643011
Mar 27 09:19:18 Tower kernel: BTRFS: error (device nvme0n1p1: state A) in btrfs_finish_ordered_io:3329: errno=-5 IO failure
Mar 27 09:19:18 Tower kernel: BTRFS info (device nvme0n1p1: state EA): forced readonly
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 8919176 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 1, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 7077072 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 2, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 7351048 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 3, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 8056712 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 4, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 7814560 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 5, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 8919176 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 6, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 7624720 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 7, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 8919176 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 8, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 6841008 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 9, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:18 Tower kernel: I/O error, dev loop2, sector 8919176 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 10, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:19 Tower kernel: BTRFS: error (device loop2) in btrfs_commit_transaction:2418: errno=-5 IO failure (Error while writing out transaction)
Mar 27 09:19:19 Tower kernel: BTRFS info (device loop2: state E): forced readonly
Mar 27 09:19:19 Tower kernel: BTRFS warning (device loop2: state E): Skipping commit of aborted transaction.
Mar 27 09:19:19 Tower kernel: BTRFS: error (device loop2: state EA) in cleanup_transaction:1982: errno=-5 IO failure
Mar 27 09:19:29 Tower kernel: docker0: port 1(veth062d5aa) entered disabled state
Mar 27 09:19:29 Tower kernel: veth5dbbaf8: renamed from eth0
Mar 27 09:19:39 Tower kernel: docker0: port 2(veth3debe34) entered disabled state
Mar 27 09:19:39 Tower kernel: vethb1ea0cc: renamed from eth0
Mar 27 09:19:48 Tower kernel: blk_print_req_error: 2347 callbacks suppressed
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 8068200 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Mar 27 09:19:48 Tower kernel: btrfs_dev_stat_print_on_error: 2339 callbacks suppressed
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2350, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 8994880 op 0x1:(WRITE) flags 0x100000 phys_seg 3 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2351, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 8994904 op 0x1:(WRITE) flags 0x100000 phys_seg 2 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2352, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 9000840 op 0x1:(WRITE) flags 0x100000 phys_seg 7 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2353, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 9002568 op 0x1:(WRITE) flags 0x100000 phys_seg 6 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2354, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 9005240 op 0x1:(WRITE) flags 0x100000 phys_seg 4 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2355, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 8994920 op 0x1:(WRITE) flags 0x100000 phys_seg 3 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2356, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 9005272 op 0x1:(WRITE) flags 0x100000 phys_seg 4 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2357, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 9001224 op 0x1:(WRITE) flags 0x100000 phys_seg 3 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2358, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:48 Tower kernel: I/O error, dev loop2, sector 7351200 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2359, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:49 Tower kernel: docker0: port 4(veth0e52554) entered disabled state
Mar 27 09:19:49 Tower kernel: veth3819d5d: renamed from eth0
Mar 27 09:19:54 Tower kernel: blk_print_req_error: 50 callbacks suppressed
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 8994880 op 0x1:(WRITE) flags 0x100000 phys_seg 3 prio class 0
Mar 27 09:19:54 Tower kernel: btrfs_dev_stat_print_on_error: 50 callbacks suppressed
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2410, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 8994904 op 0x1:(WRITE) flags 0x100000 phys_seg 4 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2411, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 8995784 op 0x1:(WRITE) flags 0x100000 phys_seg 2 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2412, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 9000840 op 0x1:(WRITE) flags 0x100000 phys_seg 3 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2413, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 9016176 op 0x1:(WRITE) flags 0x100000 phys_seg 18 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2414, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 7077072 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2415, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 7815688 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2416, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 7814840 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2417, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 8994880 op 0x1:(WRITE) flags 0x100000 phys_seg 3 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2418, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:19:54 Tower kernel: I/O error, dev loop2, sector 8994904 op 0x1:(WRITE) flags 0x100000 phys_seg 3 prio class 0
Mar 27 09:19:54 Tower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 2419, rd 0, flush 0, corrupt 25, gen 0
Mar 27 09:20:25 Tower ntpd[1217]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
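For readers who want to summarise a log like the one above rather than scan it by eye, here is a minimal Python sketch that pulls the per-device error counters out of the `bdev ... errs:` lines. The function name `error_counters` and the two-line sample are illustrative assumptions; the sample lines are copied from the log in this post.

```python
import re

# Matches the counter summary that btrfs prints after each failed I/O,
# e.g. "bdev /dev/loop2 errs: wr 1, rd 0, flush 0, corrupt 25, gen 0".
ERRS = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def error_counters(log_text):
    """Return one dict per 'errs:' line, in the order they appear."""
    rows = []
    for m in ERRS.finditer(log_text):
        row = {"dev": m["dev"]}
        for name in ("wr", "rd", "flush", "corrupt", "gen"):
            row[name] = int(m[name])
        rows.append(row)
    return rows


# Two lines taken from the log above (first and last counter of the 09:19:48 burst).
sample = (
    "Mar 27 09:19:18 Tower kernel: BTRFS error (device loop2): "
    "bdev /dev/loop2 errs: wr 1, rd 0, flush 0, corrupt 25, gen 0\n"
    "Mar 27 09:19:48 Tower kernel: BTRFS error (device loop2: state EA): "
    "bdev /dev/loop2 errs: wr 2359, rd 0, flush 0, corrupt 25, gen 0\n"
)

rows = error_counters(sample)
print("write errors:", rows[0]["wr"], "->", rows[-1]["wr"])
print("corruption errors:", rows[0]["corrupt"], "->", rows[-1]["corrupt"])
```

In this log the write counter climbs quickly while the corruption counter stays at 25. That is consistent with the filesystem having gone read-only, so every later write fails, but treat that reading as an interpretation rather than a diagnosis.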
JorgeB Posted March 27, 2023
Please post the diagnostics.
xCrossOne Posted March 27, 2023 (Author)
Diagnostics attached - any assistance is greatly appreciated.
tower-diagnostics-20230327-1857.zip
JorgeB Posted March 27, 2023 (Solution)
Btrfs is finding a lot of data corruption. Start by running memtest; you can clearly see one example of a bit flip:
Mar 27 09:19:18 Tower kernel: BTRFS error (device nvme0n1p1): parent transid verify failed on 533586477056 wanted 17184512195 found 4643011
17184512195 = 010000000000010001101101100011000011
4643011 = 010001101101100011000011
So RAM issues for sure.
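The bit-flip comparison above can be checked with a few lines of Python. This is only a sketch to make the arithmetic visible; the helper name `flipped_bits` is an assumption for illustration, and the two numbers are the `wanted` and `found` values from the quoted log line.

```python
def flipped_bits(wanted, found):
    """Return the positions (0 = least significant) where the two values differ."""
    diff = wanted ^ found  # XOR leaves a 1 only where the bits disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


wanted = 17184512195  # transid btrfs expected
found = 4643011       # transid btrfs actually read

# Print both values padded to the same width so the differing bit lines up.
width = max(wanted.bit_length(), found.bit_length())
print(f"{wanted:0{width}b}")
print(f"{found:0{width}b}")
print(flipped_bits(wanted, found))  # → [34]
```

The two values differ in exactly one bit (bit 34, i.e. by 2^34), which is why the mismatch looks like a single flipped bit rather than random corruption.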
xCrossOne Posted March 27, 2023 (Author)
Apologies for my ignorance, but how do I run the memtest?
JorgeB Posted March 27, 2023
It's an option on the Unraid boot menu. Note that it only works with CSM/legacy boot; if your board only supports UEFI boot, Google Passmark memtest, it's free.
macmanluke Posted March 28, 2023
I think something is funky in Unraid/btrfs - far too many people (including me) seem to be having issues with this for it to be "memory" issues or some other common underlying issue.
JorgeB Posted March 28, 2023
5 hours ago, macmanluke said: far too many people (including me) seem to be having issues with this for it to be "memory" issues
Most people don't, including me. And did you see the post above? There's a clear bit flip, and that's a memory problem for sure.
xCrossOne Posted April 2, 2023 (Author, edited)
So my cache drive stopped working a few days ago (unmountable: wrong or no file system). After not being able to fix the drive, I replaced it in a desperate attempt and started fresh with new appdata (reinstalled my dockers from scratch). Given the suspicion that something is wrong with the memory, I have also ordered new RAM to replace the old, but I haven't received it yet. I was hoping it was a leak that would be solved by reinstalling my dockers, but no such luck. Memtest is difficult to run without a screen.
I had the log running while executing different actions on the server to see if I could find a trigger, and the errors seem to appear when I watch a show on Plex, specifically when reaching a certain point in a specific episode. Can it be related to a very low and seemingly static reallocated sector count on one of my drives? Any advice on the next step?
tower-diagnostics-20230402-1219.zip
JorgeB Posted April 2, 2023
IMHO there's no point in trying to run the server with suspected bad RAM; everything goes through RAM, so it can cause all sorts of issues.
xCrossOne Posted April 11, 2023 (Author)
After a very slow delivery, I finally received the new RAM this morning and have replaced the old. As stated above, I replaced the cache drive a couple of weeks ago as well. All looked good throughout the day; however, looking at the logs, the same error seems to be back. Any clue what to do next?
tower-diagnostics-20230411-2101.zip
JorgeB Posted April 12, 2023
You need to run a scrub to see if there are any uncorrectable errors. If there are, replace/delete all affected files, then reset the stats and see if any new errors appear. More info here.
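As a rough illustration of the scrub, check, reset sequence described above, here is a Python sketch that assembles the corresponding `btrfs` command lines and, by default, only prints them. The function names `scrub_workflow` and `run` are hypothetical, and `/mnt/cache` is assumed to be the pool mount point because that is the path used elsewhere in this thread. The flags used are `-B` for `btrfs scrub start` (stay in the foreground until the scrub finishes) and `-z` for `btrfs device stats` (print the counters, then reset them). Replacing or deleting the affected files is a manual step between the second and third command and is not automated here.

```python
import shlex
import subprocess


def scrub_workflow(mount_point):
    """Return the btrfs commands for: scrub, inspect counters, reset counters."""
    return [
        # 1. Scrub in the foreground so the summary is printed when it finishes.
        ["btrfs", "scrub", "start", "-B", mount_point],
        # 2. Show the per-device error counters.
        ["btrfs", "device", "stats", mount_point],
        # 3. After fixing affected files by hand: print the counters and reset them.
        ["btrfs", "device", "stats", "-z", mount_point],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it as well only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=False: a scrub that finds errors may exit non-zero,
            # and the remaining commands should still be shown and run.
            subprocess.run(cmd, check=False)


commands = scrub_workflow("/mnt/cache")  # assumed mount point
run(commands)  # dry run: prints the commands without executing them
```

Keeping the default as a dry run is deliberate: resetting the counters discards the evidence, so it makes sense to read the stats and deal with the affected files before running the last command for real.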
xCrossOne Posted April 12, 2023 (Author, edited)
Thanks again. When I ran the command, I got the following output on my brand new cache drive:
btrfs dev stats -c /mnt/cache
[/dev/nvme0n1p1].write_io_errs 0
[/dev/nvme0n1p1].read_io_errs 0
[/dev/nvme0n1p1].flush_io_errs 0
[/dev/nvme0n1p1].corruption_errs 94
[/dev/nvme0n1p1].generation_errs 0
I'm unsure why I have 94 corruption errors on it, and I don't know what to do about it either. And how do I run a similar test on my data drives, which are all XFS?
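To keep an eye on these counters over time, the `btrfs dev stats` output shown above can be parsed and reduced to the non-zero entries. The following Python sketch assumes the output format exactly as pasted in this post; `parse_dev_stats` and `nonzero_counters` are illustrative names, not part of any btrfs tooling.

```python
import re

# One counter per entry, e.g. "[/dev/nvme0n1p1].corruption_errs 94".
STAT_LINE = re.compile(r"\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)")


def parse_dev_stats(text):
    """Map device -> {counter name -> value} from 'btrfs dev stats' output."""
    stats = {}
    for m in STAT_LINE.finditer(text):
        stats.setdefault(m["dev"], {})[m["name"]] = int(m["value"])
    return stats


def nonzero_counters(stats):
    """Keep only devices and counters whose value is not zero."""
    result = {}
    for dev, counters in stats.items():
        bad = {name: value for name, value in counters.items() if value}
        if bad:
            result[dev] = bad
    return result


# Output copied from the post above.
output = """\
[/dev/nvme0n1p1].write_io_errs 0
[/dev/nvme0n1p1].read_io_errs 0
[/dev/nvme0n1p1].flush_io_errs 0
[/dev/nvme0n1p1].corruption_errs 94
[/dev/nvme0n1p1].generation_errs 0
"""

problems = nonzero_counters(parse_dev_stats(output))
print(problems)  # → {'/dev/nvme0n1p1': {'corruption_errs': 94}}
```

Running the same check again after a reset shows whether new errors are still accumulating, which is the signal that matters once the old counts have been cleared.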
JorgeB Posted April 13, 2023
If you reset the errors, keep monitoring for any new ones.