Cache Btrfs corrupt



This morning the Docker service crashed with one container running.

It had filled the log, so I tried to restart it.

Now my cache drive won't mount.

The btrfs filesystem seems to be unmountable.

The cache drive log says this:


Aug 29 14:24:02 ChiaTower kernel: ata8.00: configured for UDMA/133
Aug 29 14:24:02 ChiaTower kernel: ata8.00: Enabling discard_zeroes_data
Aug 29 14:24:02 ChiaTower kernel: sd 10:0:0:0: [sdm] 937703088 512-byte logical blocks: (480 GB/447 GiB)
Aug 29 14:24:02 ChiaTower kernel: sd 10:0:0:0: [sdm] 4096-byte physical blocks
Aug 29 14:24:02 ChiaTower kernel: sd 10:0:0:0: [sdm] Write Protect is off
Aug 29 14:24:02 ChiaTower kernel: sd 10:0:0:0: [sdm] Mode Sense: 00 3a 00 00
Aug 29 14:24:02 ChiaTower kernel: sd 10:0:0:0: [sdm] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 29 14:24:02 ChiaTower kernel: sdm: sdm1
Aug 29 14:24:02 ChiaTower kernel: ata8.00: Enabling discard_zeroes_data
Aug 29 14:24:02 ChiaTower kernel: sd 10:0:0:0: [sdm] Attached SCSI disk
Aug 29 14:24:02 ChiaTower kernel: BTRFS: device fsid 70a02f4b-af7b-4685-863f-3a5c13160d86 devid 1 transid 5841 /dev/sdm1 scanned by udevd (2214)
Aug 29 14:25:31 ChiaTower emhttpd: INTEL_SSDSC2BB480G4R_PHWL501600ME480QGN (sdm) 512 937703088
Aug 29 14:25:31 ChiaTower emhttpd: import 30 cache device: (sdm) INTEL_SSDSC2BB480G4R_PHWL501600ME480QGN
Aug 29 14:25:31 ChiaTower emhttpd: read SMART /dev/sdm
Aug 29 14:40:22 ChiaTower emhttpd: shcmd (969): mount -t btrfs -o noatime,space_cache=v2 /dev/sdm1 /mnt/cache
Aug 29 14:40:22 ChiaTower kernel: BTRFS info (device sdm1): using free space tree
Aug 29 14:40:22 ChiaTower kernel: BTRFS info (device sdm1): has skinny extents
Aug 29 14:40:22 ChiaTower kernel: BTRFS info (device sdm1): enabling ssd optimizations
Aug 29 14:40:22 ChiaTower kernel: BTRFS info (device sdm1): start tree-log replay
Aug 29 14:40:25 ChiaTower kernel: BTRFS info (device sdm1): leaf 129368064 gen 5842 total ptrs 207 free space 137 owner 2
Aug 29 14:40:25 ChiaTower kernel: BTRFS error (device sdm1): unable to find ref byte nr 51010932736 parent 0 root 5 owner 29588 offset 12409917440
Aug 29 14:40:25 ChiaTower kernel: BTRFS: error (device sdm1) in __btrfs_free_extent:3092: errno=-2 No such entry
Aug 29 14:40:25 ChiaTower kernel: BTRFS: error (device sdm1) in btrfs_run_delayed_refs:2144: errno=-2 No such entry
Aug 29 14:40:25 ChiaTower kernel: BTRFS: error (device sdm1) in btrfs_replay_log:2279: errno=-2 No such entry (Failed to recover log tree)
Aug 29 14:40:25 ChiaTower kernel: BTRFS error (device sdm1): open_ctree failed

How do I fix this problem?

The cache drive was added less than a week ago.

Edited by Struck

I followed the instructions to restore the data, formatted the drive, and copied the data back afterwards.

It worked for less than three days.

Now the same issue is back.

The log is filled with entries like this:

Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Aug 31 04:29:37 ChiaTower kernel: BTRFS warning (device sdm1): csum failed root 5 ino 12305 off 15759626240 csum 0x21417709 expected csum 0x00000000 mirror 1
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): parent transid verify failed on 8343076864 wanted 3304 found 3233
Aug 31 04:29:37 ChiaTower kernel: BTRFS info (device sdm1): no csum found for inode 12305 start 15759699968
Aug 31 04:29:37 ChiaTower kernel: BTRFS warning (device sdm1): csum failed root 5 ino 12305 off 15759699968 csum 0x108cc45f expected csum 0x00000000 mirror 1
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): parent transid verify failed on 8343076864 wanted 3304 found 3233
Aug 31 04:29:37 ChiaTower kernel: BTRFS info (device sdm1): no csum found for inode 12305 start 15759708160
Aug 31 04:29:37 ChiaTower kernel: BTRFS warning (device sdm1): csum failed root 5 ino 12305 off 15759708160 csum 0x7d0b155f expected csum 0x00000000 mirror 1
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): parent transid verify failed on 8343076864 wanted 3304 found 3233
Aug 31 04:29:37 ChiaTower kernel: BTRFS info (device sdm1): no csum found for inode 12305 start 15759736832
Aug 31 04:29:37 ChiaTower kernel: BTRFS warning (device sdm1): csum failed root 5 ino 12305 off 15759736832 csum 0xabb5631a expected csum 0x00000000 mirror 1
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): parent transid verify failed on 8343076864 wanted 3304 found 3233
Aug 31 04:29:37 ChiaTower kernel: BTRFS info (device sdm1): no csum found for inode 12305 start 15760031744
Aug 31 04:29:37 ChiaTower kernel: BTRFS warning (device sdm1): csum failed root 5 ino 12305 off 15760031744 csum 0xb842b40e expected csum 0x00000000 mirror 1
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): parent transid verify failed on 8343076864 wanted 3304 found 3233
Aug 31 04:29:37 ChiaTower kernel: BTRFS info (device sdm1): no csum found for inode 12305 start 15759298560
Aug 31 04:29:37 ChiaTower kernel: BTRFS warning (device sdm1): csum failed root 5 ino 12305 off 15759298560 csum 0xff2de314 expected csum 0x00000000 mirror 1
Aug 31 04:29:37 ChiaTower kernel: BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
Aug 31 04:30:13 ChiaTower kernel: verify_parent_transid: 10 callbacks suppressed
Aug 31 04:30:13 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:30:13 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:30:44 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:30:44 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:31:15 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:31:15 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:31:46 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:31:46 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:32:18 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:32:18 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:32:49 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:32:49 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:33:20 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:33:20 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
Aug 31 04:33:51 ChiaTower kernel: BTRFS error (device loop2): parent transid verify failed on 4708515840 wanted 203356 found 202764
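To see at a glance whether things are getting worse, a small script can pull the cumulative `errs:` counters and the `parent transid verify failed` lines out of a log like the one above. This is a hypothetical helper (`summarize` is not an Unraid or btrfs tool) and it assumes the kernel message format exactly as pasted here. In this log the `corrupt` counter on sdm1 climbs from 4 to 10, and the blocks on disk are older generations than their parents expect (wanted 3304, found 3233 on sdm1; wanted 203356, found 202764 on loop2).

```python
import re
from collections import defaultdict

# Per-device counters, e.g.
# "BTRFS error (device sdm1): bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0"
ERRS_RE = re.compile(
    r"BTRFS error \(device (?P<dev>\S+)\): bdev \S+ errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)
# "parent transid verify failed on <bytenr> wanted <W> found <F>"
TRANSID_RE = re.compile(
    r"BTRFS error \(device (?P<dev>\S+)\): parent transid verify failed "
    r"on (?P<bytenr>\d+) wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def summarize(lines):
    """Return (latest error counters per device, transid failures per device)."""
    counters = {}
    transid = defaultdict(list)
    for line in lines:
        m = ERRS_RE.search(line)
        if m:
            # The counters are cumulative, so the last matching line is the current total.
            counters[m["dev"]] = {
                k: int(v) for k, v in m.groupdict().items() if k != "dev"
            }
            continue
        m = TRANSID_RE.search(line)
        if m:
            wanted, found = int(m["wanted"]), int(m["found"])
            # Gap = how many transactions older the block on disk is than expected.
            transid[m["dev"]].append((int(m["bytenr"]), wanted, found, wanted - found))
    return counters, dict(transid)


if __name__ == "__main__":
    import sys

    counters, transid = summarize(sys.stdin)
    for dev, c in counters.items():
        print(dev, c)
    for dev, hits in transid.items():
        gaps = sorted({h[3] for h in hits})
        print(f"{dev}: {len(hits)} transid failures, generation gaps {gaps}")
```

For example, save it under a name of your choice and pipe the syslog into it (`python3 btrfs_log_summary.py < /var/log/syslog`). If the `corrupt` number keeps rising between runs, the problem is still active.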

My guess is that if I reboot the machine, the cache drive partition will fail to mount again, even though I can still access the cache drive fine until then.

Is the drive bad? I have several of these drives, so I can try replacing it.

Would I have fewer issues if I ran several of them in the cache pool?

chiatower-diagnostics-20210831-1815.zip

18 minutes ago, trurl said:

You could run an extended SMART test on that SSD. My guess is some other problem is causing corruption. Have you done memtest?

I will run an extended SMART test after reboot.
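For reference, an extended SMART self-test is started with `smartctl -t long` and its result read back later with `smartctl -l selftest`. The sketch below only builds and prints those two commands; `smart_commands` is a hypothetical name, and `/dev/sdm` is taken from the log above, so check the current device name first because it can change after a reboot.

```python
import shlex


def smart_commands(device):
    """Build the smartctl invocations for an extended (long) self-test."""
    return [
        ["smartctl", "-t", "long", device],      # start the extended self-test in the background
        ["smartctl", "-l", "selftest", device],  # run later to read the self-test log and result
    ]


if __name__ == "__main__":
    # Print rather than execute, so nothing touches a disk by accident.
    for cmd in smart_commands("/dev/sdm"):
        print(shlex.join(cmd))
```

The drive keeps working while the test runs; the second command is only useful once the test has finished.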

I have now installed a new SSD that is meant to replace the one I currently use.

I will try memtest later, but I hadn't had any problems before I installed this SSD. The array seems unaffected by this problem.

4 minutes ago, JorgeB said:

Array is XFS, btrfs is much more sensitive to bad RAM, though if that is the problem you'll also get data corruption on the array, just undetected.
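The reason btrfs "notices" bad RAM is that it stores a checksum for every data block and verifies it on read, which is where the `csum failed` lines above come from. XFS only checksums its metadata, so a flipped bit in file data is returned silently. A minimal illustration, assuming zlib's CRC-32 as a stand-in for btrfs's default crc32c (different polynomial, same single-bit detection); `checksum` and `verify` are hypothetical names:

```python
import zlib


def checksum(block: bytes) -> int:
    # Stand-in for crc32c: zlib.crc32 is plain CRC-32, but any CRC catches a single flipped bit.
    return zlib.crc32(block)


def verify(block: bytes, stored: int) -> bool:
    # What a checksumming filesystem does on every read of the block.
    return checksum(block) == stored


if __name__ == "__main__":
    data = bytearray(b"chia plot data " * 256)  # 3840 bytes of arbitrary content
    stored = checksum(bytes(data))              # computed when the block is written
    data[1000] ^= 0x04                          # simulate one bit flipped after the write
    print("csum ok" if verify(bytes(data), stored) else "csum failed")
```

Note that this only catches corruption that happens after the checksum was computed; if the data was already damaged in RAM before the write, the checksum matches the bad data and nothing is detected.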

Okay, thanks. I will try it after the extended SMART test is done.

As a side note, the cache disk mounted normally after a reboot.
