
Multiple BTRFS cache drive errors.


colev14
Solved by trurl


Hello,

 

Came home to a bunch of errors on 2 of my 3 cache drives. I have tried running a scrub, but it seems to complete in 2 seconds and nothing happens. I've attached my logs. My docker image does not seem to be full, and neither is the cache drive, so I'm not really sure what's going on. Any help would be appreciated.

 

Edit: Fix Common Problems says: unable to write to cache, and unable to write to docker image.

 

Edit2: On the docker tab there are a lot of these errors: 

Warning: file_put_contents(/var/lib/docker/unraid/images/grocy-icon.png): failed to open stream: Read-only file system in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 93

 

 

Dec  2 02:00:03 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 150, gen 0
Dec  2 02:00:03 Unraid kernel: BTRFS warning (device nvme0n1p1): csum failed root 5 ino 17790576 off 1602236416 csum 0x7b6aa7d1 expected csum 0x9007ed7c mirror 1
Dec  1 23:34:33 Unraid kernel: BTRFS error (device nvme2n1p1): block=1715306496 write time tree block corruption detected
Dec  1 23:34:33 Unraid kernel: BTRFS: error (device nvme2n1p1) in btrfs_commit_transaction:2418: errno=-5 IO failure (Error while writing out transaction)
Dec  1 23:34:33 Unraid kernel: BTRFS info (device nvme2n1p1: state E): forced readonly
Dec  1 23:34:33 Unraid kernel: BTRFS warning (device nvme2n1p1: state E): Skipping commit of aborted transaction.
Dec  1 23:34:33 Unraid kernel: BTRFS: error (device nvme2n1p1: state EA) in cleanup_transaction:1982: errno=-5 IO failure
Dec  1 23:34:33 Unraid kernel: BTRFS: error (device nvme2n1p1: state EA) in btrfs_sync_log:3332: errno=-5 IO failure
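As a quick annotation on the log above: two different pool members are reporting problems (csum failures on nvme0n1p1, a write-time tree block corruption on nvme2n1p1 that forced the filesystem read-only, which is why Docker complains about a read-only filesystem). A rough, illustrative way to see which devices are involved is to count BTRFS kernel messages per device; the sample file below just embeds a few lines copied from this log, and the paths are specific to this thread:

```shell
#!/bin/sh
# Illustrative sketch: tally BTRFS kernel error/warning lines per device.
# The sample lines are copied from the syslog excerpt above; on a live
# system you would grep /var/log/syslog instead.
cat > /tmp/btrfs-sample.log <<'EOF'
Dec  2 02:00:03 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 150, gen 0
Dec  2 02:00:03 Unraid kernel: BTRFS warning (device nvme0n1p1): csum failed root 5 ino 17790576 off 1602236416 csum 0x7b6aa7d1 expected csum 0x9007ed7c mirror 1
Dec  1 23:34:33 Unraid kernel: BTRFS error (device nvme2n1p1): block=1715306496 write time tree block corruption detected
EOF

# Count BTRFS messages per device (matches the "(device ...)" tag only).
grep -oE 'device nvme[0-9]+n1p1' /tmp/btrfs-sample.log | sort | uniq -c
```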

 

unraid-syslog-20221202-2058.zip

Edited by colev14
additional details
On 12/3/2022 at 2:52 AM, JorgeB said:

Syslog is just spammed with XFS filesystem corruption detected on disk 6; please post the complete diagnostics. But this:

suggests bad RAM or other kernel memory corruption. Assuming no ECC RAM, start by running memtest.
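For context on this diagnosis: in the `errs: wr 0, rd 0, flush 0, corrupt 150, gen 0` line, the corruption counter is climbing while the read/write I/O counters stay at zero, which typically points at data being damaged in memory (bad RAM, unstable XMP overclock) rather than at a failing drive. A rough sketch of separating the two cases from `btrfs device stats` output; the counter values below are an invented sample, not taken from this system:

```shell
#!/bin/sh
# Hypothetical sample of `btrfs device stats /mnt/cache` output; on a real
# system you would run that command against the mounted pool. Values here
# are invented for illustration.
cat > /tmp/devstats.txt <<'EOF'
[/dev/nvme0n1p1].write_io_errs    0
[/dev/nvme0n1p1].read_io_errs     0
[/dev/nvme0n1p1].flush_io_errs    0
[/dev/nvme0n1p1].corruption_errs  150
[/dev/nvme0n1p1].generation_errs  0
EOF

# Sum the I/O error counters and the corruption counter separately.
io=$(awk '/write_io_errs|read_io_errs/ {sum += $2} END {print sum+0}' /tmp/devstats.txt)
corrupt=$(awk '/corruption_errs/ {sum += $2} END {print sum+0}' /tmp/devstats.txt)

# Corruption with zero I/O errors suggests an in-memory corruption source
# (RAM, CPU, controller) rather than a failing device.
if [ "$corrupt" -gt 0 ] && [ "$io" -eq 0 ]; then
    echo "corruption without I/O errors: suspect RAM/memory path"
fi
```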

Earlier, I turned off XMP, ran all my RAM at 2666 MHz, and cleared the cache drive, and everything was fine for a week. Now I'm getting similar errors, so I will run a memtest and see what happens. I've attached diagnostics for the errors I'm getting now.

unraid-diagnostics-20221209-1428.zip

  • 2 weeks later...
On 12/10/2022 at 3:55 AM, JorgeB said:

You should, and after that check filesystem, on disk6.

I ran that and got this output: 

Phase 1 - find and verify superblock...
        - block cache size set to 1446096 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 3856388 tail block 3856388
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad CRC for inode 712810574
bad CRC for inode 712810574, would rewrite
would have cleared inode 712810574
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 9
        - agno = 7
        - agno = 0
        - agno = 3
        - agno = 8
        - agno = 5
        - agno = 6
        - agno = 10
        - agno = 4
        - agno = 2
bad CRC for inode 712810574, would rewrite
would have cleared inode 712810574
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
Metadata corruption detected at 0x46e010, inode 0x2a7ca04e dinode
couldn't map inode 712810574, err = 117
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected dir inode 2207801806, would move to lost+found
Phase 7 - verify link counts...
Metadata corruption detected at 0x46e010, inode 0x2a7ca04e dinode
couldn't map inode 712810574, err = 117, can't compare link counts
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Mon Dec 19 09:20:07 2022

Phase		Start		End		Duration
Phase 1:	12/19 09:19:38	12/19 09:19:38
Phase 2:	12/19 09:19:38	12/19 09:19:39	1 second
Phase 3:	12/19 09:19:39	12/19 09:19:58	19 seconds
Phase 4:	12/19 09:19:58	12/19 09:19:58
Phase 5:	Skipped
Phase 6:	12/19 09:19:58	12/19 09:20:07	9 seconds
Phase 7:	12/19 09:20:07	12/19 09:20:07

Total run time: 29 seconds
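Worth noting about the run above: the `No modify flag set` lines mean this was a read-only check (`xfs_repair -n`), so nothing on disk6 was actually repaired yet; the bad-CRC inode and the disconnected directory inode were only reported. A quick, illustrative way to tell a captured dry run apart from a real repair, using sample lines from the output above:

```shell
#!/bin/sh
# Sketch: detect whether a saved xfs_repair log came from a dry run (-n)
# or from a real repair. Sample lines are taken from the output above.
cat > /tmp/xfs-check.log <<'EOF'
No modify flag set, skipping phase 5
No modify flag set, skipping filesystem flush and exiting.
EOF

if grep -q 'No modify flag set' /tmp/xfs-check.log; then
    echo "dry run: filesystem was NOT modified; rerun without -n to repair"
else
    echo "repair run: changes were written"
fi
```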

 

unraid-diagnostics-20221219-0921.zip

25 minutes ago, trurl said:

Be sure to capture the output of repair so you can post it.

Phase 1 - find and verify superblock...
        - block cache size set to 1446096 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 3856388 tail block 3856388
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad CRC for inode 712810574
bad CRC for inode 712810574, will rewrite
cleared inode 712810574
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
        - agno = 2
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 3
        - agno = 9
        - agno = 4
        - agno = 10
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Mon Dec 19 09:47:03 2022

Phase		Start		End		Duration
Phase 1:	12/19 09:46:31	12/19 09:46:31
Phase 2:	12/19 09:46:31	12/19 09:46:32	1 second
Phase 3:	12/19 09:46:32	12/19 09:46:51	19 seconds
Phase 4:	12/19 09:46:51	12/19 09:46:51
Phase 5:	12/19 09:46:51	12/19 09:46:53	2 seconds
Phase 6:	12/19 09:46:53	12/19 09:47:02	9 seconds
Phase 7:	12/19 09:47:02	12/19 09:47:02

Total run time: 31 seconds
done

I ran it with -V and got this output. Is this fixed now? I don't really have a great understanding of the issue. 
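On the "is this fixed now" question: this second run shows the bad-CRC inode being rewritten and then cleared, Phase 5 actually rebuilding the AG headers, and Phase 7 completing with no remaining complaints, which generally indicates the repair succeeded; anything that was disconnected may have been moved into a `lost+found` directory on disk6, worth checking afterwards. A rough way to scan a saved repair log for leftover problems (the keyword list is my guess at common failure phrases, not exhaustive):

```shell
#!/bin/sh
# Sketch: scan a saved xfs_repair log for signs of unresolved trouble.
# Sample lines are from the repair output above; the keyword list is
# illustrative only.
cat > /tmp/repair.log <<'EOF'
bad CRC for inode 712810574, will rewrite
cleared inode 712810574
Phase 7 - verify and correct link counts...
done
EOF

# "will rewrite"/"cleared" are repair actions, not leftover errors; flag
# only messages indicating the repair could not finish cleanly.
if grep -qE 'Metadata corruption|could not|unable to|fatal' /tmp/repair.log; then
    echo "repair log still reports problems"
else
    echo "no unresolved errors found in log"
fi
```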

Edited by colev14
