Struck Posted August 29, 2021 (edited)
This morning the Docker service crashed. One Docker container was running and it had filled the log, so I tried to restart. Now it seems that my cache drive won't mount. The btrfs seems to be unmountable. The cache drive log says this:
Quote
Aug 29 14:24:02 ChiaTower kernel: ata8.00: configured for UDMA/133
Aug 29 14:24:02 ChiaTower kernel: ata8.00: Enabling discard_zeroes_data
Aug 29 14:24:02 ChiaTower kernel: sd 10:0:0:0: [sdm] 937703088 512-byte logical blocks: (480 GB/447 GiB)
Aug 29 14:24:02 ChiaTower kernel: sd 10:0:0:0: [sdm] 4096-byte physical blocks
Aug 29 14:24:02 ChiaTower kernel: sd 10:0:0:0: [sdm] Write Protect is off
Aug 29 14:24:02 ChiaTower kernel: sd 10:0:0:0: [sdm] Mode Sense: 00 3a 00 00
Aug 29 14:24:02 ChiaTower kernel: sd 10:0:0:0: [sdm] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 29 14:24:02 ChiaTower kernel: sdm: sdm1
Aug 29 14:24:02 ChiaTower kernel: ata8.00: Enabling discard_zeroes_data
Aug 29 14:24:02 ChiaTower kernel: sd 10:0:0:0: [sdm] Attached SCSI disk
Aug 29 14:24:02 ChiaTower kernel: BTRFS: device fsid 70a02f4b-af7b-4685-863f-3a5c13160d86 devid 1 transid 5841 /dev/sdm1 scanned by udevd (2214)
Aug 29 14:25:31 ChiaTower emhttpd: INTEL_SSDSC2BB480G4R_PHWL501600ME480QGN (sdm) 512 937703088
Aug 29 14:25:31 ChiaTower emhttpd: import 30 cache device: (sdm) INTEL_SSDSC2BB480G4R_PHWL501600ME480QGN
Aug 29 14:25:31 ChiaTower emhttpd: read SMART /dev/sdm
Aug 29 14:40:22 ChiaTower emhttpd: shcmd (969): mount -t btrfs -o noatime,space_cache=v2 /dev/sdm1 /mnt/cache
Aug 29 14:40:22 ChiaTower kernel: BTRFS info (device sdm1): using free space tree
Aug 29 14:40:22 ChiaTower kernel: BTRFS info (device sdm1): has skinny extents
Aug 29 14:40:22 ChiaTower kernel: BTRFS info (device sdm1): enabling ssd optimizations
Aug 29 14:40:22 ChiaTower kernel: BTRFS info (device sdm1): start tree-log replay
Aug 29 14:40:25 ChiaTower kernel: BTRFS info (device sdm1): leaf 129368064 gen 5842 total ptrs 207 free space 137 owner 2
Aug 29 14:40:25 ChiaTower kernel: BTRFS error (device sdm1): unable to find ref byte nr 51010932736 parent 0 root 5 owner 29588 offset 12409917440
Aug 29 14:40:25 ChiaTower kernel: BTRFS: error (device sdm1) in __btrfs_free_extent:3092: errno=-2 No such entry
Aug 29 14:40:25 ChiaTower kernel: BTRFS: error (device sdm1) in btrfs_run_delayed_refs:2144: errno=-2 No such entry
Aug 29 14:40:25 ChiaTower kernel: BTRFS: error (device sdm1) in btrfs_replay_log:2279: errno=-2 No such entry (Failed to recover log tree)
Aug 29 14:40:25 ChiaTower kernel: BTRFS error (device sdm1): open_ctree failed
How do I fix this problem? The cache drive was added less than one week ago.
Edited August 29, 2021 by Struck
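A note for readers of the log above: the mount is aborted while btrfs replays its log tree, and the three `BTRFS: error ... in <function>:<line>` lines record which kernel functions reported the failure, in log order. The following is a minimal Python sketch, assuming the syslog text is already available as a string. The names `ABORT_RE`, `failure_chain`, `log_replay_failed`, and `SAMPLE` are hypothetical helpers made up for illustration; they are not part of Unraid or btrfs-progs.

```python
import re

# Matches kernel abort lines such as:
#   BTRFS: error (device sdm1) in btrfs_replay_log:2279: errno=-2 No such entry
# Note the colon after "BTRFS": plain "BTRFS error (device ...)" lines are not matched.
ABORT_RE = re.compile(
    r"BTRFS: error \(device (?P<dev>\w+)\) in (?P<func>\w+):(?P<line>\d+): "
    r"errno=(?P<errno>-?\d+) (?P<msg>.*)"
)


def failure_chain(log_text):
    """Return (device, function, errno, message) for each abort line, in log order."""
    chain = []
    for line in log_text.splitlines():
        m = ABORT_RE.search(line)
        if m:
            chain.append((m["dev"], m["func"], int(m["errno"]), m["msg"].strip()))
    return chain


def log_replay_failed(log_text):
    """True if any abort line was reported from btrfs_replay_log."""
    return any(func == "btrfs_replay_log" for _, func, _, _ in failure_chain(log_text))


# Lines copied from the post above.
SAMPLE = """\
Aug 29 14:40:22 ChiaTower kernel: BTRFS info (device sdm1): start tree-log replay
Aug 29 14:40:25 ChiaTower kernel: BTRFS error (device sdm1): unable to find ref byte nr 51010932736 parent 0 root 5 owner 29588 offset 12409917440
Aug 29 14:40:25 ChiaTower kernel: BTRFS: error (device sdm1) in __btrfs_free_extent:3092: errno=-2 No such entry
Aug 29 14:40:25 ChiaTower kernel: BTRFS: error (device sdm1) in btrfs_run_delayed_refs:2144: errno=-2 No such entry
Aug 29 14:40:25 ChiaTower kernel: BTRFS: error (device sdm1) in btrfs_replay_log:2279: errno=-2 No such entry (Failed to recover log tree)
Aug 29 14:40:25 ChiaTower kernel: BTRFS error (device sdm1): open_ctree failed
"""

if __name__ == "__main__":
    for dev, func, errno, msg in failure_chain(SAMPLE):
        print(f"{dev}: {func} errno={errno} ({msg})")
    print("log replay failed:", log_replay_failed(SAMPLE))
```

The sketch only classifies what the kernel already logged; it does not repair anything, and which recovery step is appropriate is a separate question for the thread.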
Struck (Author) Posted August 29, 2021
Diagnostics attached. The restart also triggered a parity sync. I don’t know why this is, since the array seems to be unaffected by this problem.
chiatower-diagnostics-20210829-1504.zip
trurl Posted August 29, 2021
4 hours ago, Struck said: "the restart also triggered a parity sync. I don’t know why this is"
Aug 29 14:25:28 ChiaTower emhttpd: unclean shutdown detected
You will always get a parity check after an unclean shutdown.
4 hours ago, Struck said: "btrfs seems to be unmountable"
Struck (Author) Posted August 31, 2021 (edited)
I used the instructions to restore the data, formatted the drive, and copied the data back afterwards. It worked for less than three days. Now the issue is the same. The log is filled with stuff like this:
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Aug 31 04:29:37 ChiaTower kernel: BTRFS warning (device sdm1): csum failed root 5 ino 12305 off 15759626240 csum 0x21417709 expected csum 0x00000000 mirror 1
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): parent transid verify failed on 8343076864 wanted 3304 found 3233
Aug 31 04:29:37 ChiaTower kernel: BTRFS info (device sdm1): no csum found for inode 12305 start 15759699968
Aug 31 04:29:37 ChiaTower kernel: BTRFS warning (device sdm1): csum failed root 5 ino 12305 off 15759699968 csum 0x108cc45f expected csum 0x00000000 mirror 1
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): parent transid verify failed on 8343076864 wanted 3304 found 3233
Aug 31 04:29:37 ChiaTower kernel: BTRFS info (device sdm1): no csum found for inode 12305 start 15759708160
Aug 31 04:29:37 ChiaTower kernel: BTRFS warning (device sdm1): csum failed root 5 ino 12305 off 15759708160 csum 0x7d0b155f expected csum 0x00000000 mirror 1
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): parent transid verify failed on 8343076864 wanted 3304 found 3233
Aug 31 04:29:37 ChiaTower kernel: BTRFS info (device sdm1): no csum found for inode 12305 start 15759736832
Aug 31 04:29:37 ChiaTower kernel: BTRFS warning (device sdm1): csum failed root 5 ino 12305 off 15759736832 csum 0xabb5631a expected csum 0x00000000 mirror 1
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): parent transid verify failed on 8343076864 wanted 3304 found 3233
Aug 31 04:29:37 ChiaTower kernel: BTRFS info (device sdm1): no csum found for inode 12305 start 15760031744
Aug 31 04:29:37 ChiaTower kernel: BTRFS warning (device sdm1): csum failed root 5 ino 12305 off 15760031744 csum 0xb842b40e expected csum 0x00000000 mirror 1
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): parent transid verify failed on 8343076864 wanted 3304 found 3233
Aug 31 04:29:37 ChiaTower kernel: BTRFS info (device sdm1): no csum found for inode 12305 start 15759298560
Aug 31 04:29:37 ChiaTower kernel: BTRFS warning (device sdm1): csum failed root 5 ino 12305 off 15759298560 csum 0xff2de314 expected csum 0x00000000 mirror 1
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
Aug 31 04:30:13 ChiaTower kernel: verify_parent_transid: 10 callbacks suppressed
Aug 31 04:30:13 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:30:13 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:30:44 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:30:44 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:31:15 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:31:15 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:31:46 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:31:46 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:32:18 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:32:18 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:32:49 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:32:49 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:33:20 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:33:20 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:33:51 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
My guess is that if I try to reboot the machine, the cache drive partition will fail to mount, even though I can access the cache drive fine before the reboot. Is the drive bad? I have several of these drives, so I can try replacing it. Would I have fewer issues if I ran several of them in the cache pool?
chiatower-diagnostics-20210831-1815.zip
Edited August 31, 2021 by Struck (diagnostics added)
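A note for readers of the log above: two kinds of lines carry most of the information. The `bdev ... errs:` lines are running per-device counters (the same five categories that `btrfs device stats` reports: write, read, flush, corruption, generation), and the `parent transid verify failed` lines say which generation a metadata block was expected to have and which one was actually found. Below is a minimal Python sketch, assuming the syslog text is available as a string, that keeps the latest counters per device and the wanted/found pair per block. `STATS_RE`, `TRANSID_RE`, `summarize`, and `SAMPLE` are hypothetical names used only for this illustration.

```python
import re

# "bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0"
STATS_RE = re.compile(
    r"bdev (?P<bdev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

# "(device sdm1): parent transid verify failed on 8343076864 wanted 3304 found 3233"
TRANSID_RE = re.compile(
    r"\(device (?P<dev>\w+)\): parent transid verify failed on (?P<block>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)

COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def summarize(log_text):
    """Return (latest counters per bdev, (wanted, found) per (device, block))."""
    latest = {}
    stale = {}
    for line in log_text.splitlines():
        m = STATS_RE.search(line)
        if m:
            # Counters are cumulative, so the last line seen wins.
            latest[m["bdev"]] = {name: int(m[name]) for name in COUNTERS}
            continue
        m = TRANSID_RE.search(line)
        if m:
            stale[(m["dev"], int(m["block"]))] = (int(m["wanted"]), int(m["found"]))
    return latest, stale


# Lines copied from the post above.
SAMPLE = """\
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): parent transid verify failed on 8343076864 wanted 3304 found 3233
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
Aug 31 04:30:13 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
"""

if __name__ == "__main__":
    latest, stale = summarize(SAMPLE)
    for bdev, counters in latest.items():
        print(bdev, counters)
    for (dev, block), (wanted, found) in stale.items():
        print(f"{dev} block {block}: {wanted - found} generations behind")
```

On the sample lines this reports a corruption count of 10 with zero read and write errors for `/dev/sdm1`, and blocks that are 71 (sdm1) and 592 (loop2) generations behind. The sketch reads nothing beyond what the kernel logged and draws no conclusion about the cause.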
trurl Posted August 31, 2021
You could run an extended SMART test on that SSD. My guess is some other problem is causing corruption. Have you done memtest?
JorgeB Posted August 31, 2021
16 minutes ago, trurl said: "Have you done memtest?"
This.
Struck (Author) Posted August 31, 2021 (edited)
18 minutes ago, trurl said: "You could run an extended SMART test on that SSD. My guess is some other problem is causing corruption. Have you done memtest?"
I will run an extended SMART test after a reboot. I have now inserted a new SSD that is supposed to replace the one I currently use. I will try memtest later, but I didn't have any problems before I installed the SSD. The array is unaffected by this problem, it seems.
Edited August 31, 2021 by Struck
JorgeB Posted August 31, 2021
3 minutes ago, Struck said: "The array is unaffected by this problem, it seems"
Array is XFS; btrfs is much more sensitive to bad RAM, though if that is the problem you'll also get data corruption on the array, just undetected.
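The "just undetected" point can be illustrated with a toy model. This is a simplified Python sketch, not how btrfs or XFS is implemented: a block is stored together with a checksum, one bit is flipped afterwards, and the block is read back once with verification and once without. btrfs's default data checksum is CRC32C; `zlib.crc32` (a different CRC-32 polynomial) is used here only as a stand-in because it ships with Python. `write_block`, `read_block`, and `flip_bit` are hypothetical names for this illustration.

```python
import zlib


def write_block(data):
    """Store data together with a checksum computed at write time."""
    return {"data": bytearray(data), "csum": zlib.crc32(data)}


def read_block(block, verify):
    """Return the stored data; with verify=True, fail if the checksum no longer matches."""
    payload = bytes(block["data"])
    if verify and zlib.crc32(payload) != block["csum"]:
        raise OSError("csum failed")
    return payload


def flip_bit(block, byte_index, bit):
    """Simulate a single-bit corruption of the stored data."""
    block["data"][byte_index] ^= 1 << bit


if __name__ == "__main__":
    original = b"plot data written to the cache pool"
    block = write_block(original)
    flip_bit(block, byte_index=3, bit=0)

    # Without verification the damaged data is returned as if nothing happened.
    silently_wrong = read_block(block, verify=False)
    print("unverified read differs from original:", silently_wrong != original)

    # With verification the same damage is reported as an error.
    try:
        read_block(block, verify=True)
    except OSError as exc:
        print("verified read:", exc)
```

In this model the corruption happens after the checksum was computed. Real memory faults can strike at other points too, so the sketch shows why a checksumming read path notices damage that a non-checksumming one passes through, not that every fault is caught.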
Struck (Author) Posted August 31, 2021
4 minutes ago, JorgeB said: "Array is XFS; btrfs is much more sensitive to bad RAM, though if that is the problem you'll also get data corruption on the array, just undetected."
Okay, thanks. I will try it after the extended SMART test is done. As a side note, the cache disk mounted normally after a reboot.
Struck (Author) Posted September 2, 2021
Memtest didn't find anything. The extended SMART test did not find any issues either. I have not tried replacing the drive yet. I will do that after the weekend, I guess.
trurl Posted September 2, 2021
2 hours ago, Struck said: "Memtest didn't find anything."
How long did you let it run?
Struck (Author) Posted September 2, 2021
6 hours ago, trurl said: "How long did you let it run?"
Not long enough. 2 passes, like 4 hours.