
Multiple BTRFS cache drive errors.


colev14
Solved by trurl


Hello,

 

Came home to a bunch of errors on 2 of my 3 cache drives. I have tried running a scrub, but it seems to complete in 2 seconds and nothing happens. I've attached my logs. My docker image does not seem to be full, and neither is the cache drive, so I'm not really sure what's going on. Any help would be appreciated.

 

Edit: Fix Common Problems says: unable to write to cache, and unable to write to docker image.

 

Edit2: On the docker tab there are a lot of these errors: 

Warning: file_put_contents(/var/lib/docker/unraid/images/grocy-icon.png): failed to open stream: Read-only file system in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 93

 

 

Dec  2 02:00:03 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 150, gen 0
Dec  2 02:00:03 Unraid kernel: BTRFS warning (device nvme0n1p1): csum failed root 5 ino 17790576 off 1602236416 csum 0x7b6aa7d1 expected csum 0x9007ed7c mirror 1
Dec  1 23:34:33 Unraid kernel: BTRFS error (device nvme2n1p1): block=1715306496 write time tree block corruption detected
Dec  1 23:34:33 Unraid kernel: BTRFS: error (device nvme2n1p1) in btrfs_commit_transaction:2418: errno=-5 IO failure (Error while writing out transaction)
Dec  1 23:34:33 Unraid kernel: BTRFS info (device nvme2n1p1: state E): forced readonly
Dec  1 23:34:33 Unraid kernel: BTRFS warning (device nvme2n1p1: state E): Skipping commit of aborted transaction.
Dec  1 23:34:33 Unraid kernel: BTRFS: error (device nvme2n1p1: state EA) in cleanup_transaction:1982: errno=-5 IO failure
Dec  1 23:34:33 Unraid kernel: BTRFS: error (device nvme2n1p1: state EA) in btrfs_sync_log:3332: errno=-5 IO failure
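As a quick annotation on the log above: two different pool members are reporting problems (csum failures on nvme0n1p1, a write-time tree block corruption on nvme2n1p1 that forced the filesystem read-only, which is why Docker complains about a read-only filesystem). A rough, illustrative way to see which devices are involved is to count BTRFS kernel messages per device; the sample file below just embeds a few lines copied from this log, and the paths are specific to this thread:

```shell
#!/bin/sh
# Illustrative sketch: tally BTRFS kernel error/warning lines per device.
# The sample lines are copied from the syslog excerpt above; on a live
# system you would grep /var/log/syslog instead.
cat > /tmp/btrfs-sample.log <<'EOF'
Dec  2 02:00:03 Unraid kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 150, gen 0
Dec  2 02:00:03 Unraid kernel: BTRFS warning (device nvme0n1p1): csum failed root 5 ino 17790576 off 1602236416 csum 0x7b6aa7d1 expected csum 0x9007ed7c mirror 1
Dec  1 23:34:33 Unraid kernel: BTRFS error (device nvme2n1p1): block=1715306496 write time tree block corruption detected
EOF

# Count BTRFS messages per device (matches the "(device ...)" tag only).
grep -oE 'device nvme[0-9]+n1p1' /tmp/btrfs-sample.log | sort | uniq -c
```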

 

unraid-syslog-20221202-2058.zip

Edited by colev14
additional details
On 12/3/2022 at 2:52 AM, JorgeB said:

Syslog is just spammed with XFS filesystem corruption detected on disk 6; please post the complete diagnostics. But this:

suggests bad RAM or other kernel memory corruption. Assuming no ECC RAM, start by running memtest.
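For context on this diagnosis: in the `errs: wr 0, rd 0, flush 0, corrupt 150, gen 0` line, the corruption counter is climbing while the read/write I/O counters stay at zero, which typically points at data being damaged in memory (bad RAM, unstable XMP overclock) rather than at a failing drive. A rough sketch of separating the two cases from `btrfs device stats` output; the counter values below are an invented sample, not taken from this system:

```shell
#!/bin/sh
# Hypothetical sample of `btrfs device stats /mnt/cache` output; on a real
# system you would run that command against the mounted pool. Values here
# are invented for illustration.
cat > /tmp/devstats.txt <<'EOF'
[/dev/nvme0n1p1].write_io_errs    0
[/dev/nvme0n1p1].read_io_errs     0
[/dev/nvme0n1p1].flush_io_errs    0
[/dev/nvme0n1p1].corruption_errs  150
[/dev/nvme0n1p1].generation_errs  0
EOF

# Sum the I/O error counters and the corruption counter separately.
io=$(awk '/write_io_errs|read_io_errs/ {sum += $2} END {print sum+0}' /tmp/devstats.txt)
corrupt=$(awk '/corruption_errs/ {sum += $2} END {print sum+0}' /tmp/devstats.txt)

# Corruption with zero I/O errors suggests an in-memory corruption source
# (RAM, CPU, controller) rather than a failing device.
if [ "$corrupt" -gt 0 ] && [ "$io" -eq 0 ]; then
    echo "corruption without I/O errors: suspect RAM/memory path"
fi
```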

Earlier, I turned off XMP, ran all my RAM at 2666 MHz, and cleared the cache drive, and everything was fine for a week. Now I'm getting similar errors, so I will run a memtest and see what happens. I've attached diagnostics for the errors I'm getting now.

unraid-diagnostics-20221209-1428.zip

  • 2 weeks later...
On 12/10/2022 at 3:55 AM, JorgeB said:

You should, and after that check filesystem, on disk6.

I ran that and got this output: 

Phase 1 - find and verify superblock...
        - block cache size set to 1446096 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 3856388 tail block 3856388
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad CRC for inode 712810574
bad CRC for inode 712810574, would rewrite
would have cleared inode 712810574
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 9
        - agno = 7
        - agno = 0
        - agno = 3
        - agno = 8
        - agno = 5
        - agno = 6
        - agno = 10
        - agno = 4
        - agno = 2
bad CRC for inode 712810574, would rewrite
would have cleared inode 712810574
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
Metadata corruption detected at 0x46e010, inode 0x2a7ca04e dinode
couldn't map inode 712810574, err = 117
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected dir inode 2207801806, would move to lost+found
Phase 7 - verify link counts...
Metadata corruption detected at 0x46e010, inode 0x2a7ca04e dinode
couldn't map inode 712810574, err = 117, can't compare link counts
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Mon Dec 19 09:20:07 2022

Phase		Start		End		Duration
Phase 1:	12/19 09:19:38	12/19 09:19:38
Phase 2:	12/19 09:19:38	12/19 09:19:39	1 second
Phase 3:	12/19 09:19:39	12/19 09:19:58	19 seconds
Phase 4:	12/19 09:19:58	12/19 09:19:58
Phase 5:	Skipped
Phase 6:	12/19 09:19:58	12/19 09:20:07	9 seconds
Phase 7:	12/19 09:20:07	12/19 09:20:07

Total run time: 29 seconds
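Worth noting about the run above: the `No modify flag set` lines mean this was a read-only check (`xfs_repair -n`), so nothing on disk6 was actually repaired yet; the bad-CRC inode and the disconnected directory inode were only reported. A quick, illustrative way to tell a captured dry run apart from a real repair, using sample lines from the output above:

```shell
#!/bin/sh
# Sketch: detect whether a saved xfs_repair log came from a dry run (-n)
# or from a real repair. Sample lines are taken from the output above.
cat > /tmp/xfs-check.log <<'EOF'
No modify flag set, skipping phase 5
No modify flag set, skipping filesystem flush and exiting.
EOF

if grep -q 'No modify flag set' /tmp/xfs-check.log; then
    echo "dry run: filesystem was NOT modified; rerun without -n to repair"
else
    echo "repair run: changes were written"
fi
```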

 

unraid-diagnostics-20221219-0921.zip

25 minutes ago, trurl said:

Be sure to capture the output of repair so you can post it.

Phase 1 - find and verify superblock...
        - block cache size set to 1446096 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 3856388 tail block 3856388
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad CRC for inode 712810574
bad CRC for inode 712810574, will rewrite
cleared inode 712810574
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
        - agno = 2
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 3
        - agno = 9
        - agno = 4
        - agno = 10
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Mon Dec 19 09:47:03 2022

Phase		Start		End		Duration
Phase 1:	12/19 09:46:31	12/19 09:46:31
Phase 2:	12/19 09:46:31	12/19 09:46:32	1 second
Phase 3:	12/19 09:46:32	12/19 09:46:51	19 seconds
Phase 4:	12/19 09:46:51	12/19 09:46:51
Phase 5:	12/19 09:46:51	12/19 09:46:53	2 seconds
Phase 6:	12/19 09:46:53	12/19 09:47:02	9 seconds
Phase 7:	12/19 09:47:02	12/19 09:47:02

Total run time: 31 seconds
done

I ran it with -V and got this output. Is this fixed now? I don't really have a great understanding of the issue. 
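On the "is this fixed now" question: this second run shows the bad-CRC inode being rewritten and then cleared, Phase 5 actually rebuilding the AG headers, and Phase 7 completing with no remaining complaints, which generally indicates the repair succeeded; anything that was disconnected may have been moved into a `lost+found` directory on disk6, worth checking afterwards. A rough way to scan a saved repair log for leftover problems (the keyword list is my guess at common failure phrases, not exhaustive):

```shell
#!/bin/sh
# Sketch: scan a saved xfs_repair log for signs of unresolved trouble.
# Sample lines are from the repair output above; the keyword list is
# illustrative only.
cat > /tmp/repair.log <<'EOF'
bad CRC for inode 712810574, will rewrite
cleared inode 712810574
Phase 7 - verify and correct link counts...
done
EOF

# "will rewrite"/"cleared" are repair actions, not leftover errors; flag
# only messages indicating the repair could not finish cleanly.
if grep -qE 'Metadata corruption|could not|unable to|fatal' /tmp/repair.log; then
    echo "repair log still reports problems"
else
    echo "no unresolved errors found in log"
fi
```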

Edited by colev14
