
BTRFS Errors after cache upgrade


Solved by jackfalveyiv


I've been experiencing a litany of issues with my server, seemingly centered on cache drive corruption. I formatted the drive several times before buying a new one and installing it. After two days I'm starting to see the errors shown in the screenshot below. What is happening? I don't see an issue with my applications yet; I'm just trying to get ahead of whatever this may be.

Screen Shot 2023-03-08 at 2.17.27 PM.png

trescommas-diagnostics-20230308-1415.zip

Mar  8 10:12:21 TresCommas kernel: ata6.00: exception Emask 0x10 SAct 0xc0000 SErr 0x4890000 action 0xe frozen
Mar  8 10:12:21 TresCommas kernel: ata6.00: irq_stat 0x08400040, interface fatal error, connection status changed
Mar  8 10:12:21 TresCommas kernel: ata6: SError: { PHYRdyChg 10B8B LinkSeq DevExch }
Mar  8 10:12:21 TresCommas kernel: ata6.00: failed command: READ FPDMA QUEUED
Mar  8 10:12:21 TresCommas kernel: ata6.00: cmd 60/98:90:b0:4a:0b/00:00:68:03:00/40 tag 18 ncq dma 77824 in
Mar  8 10:12:21 TresCommas kernel:         res 40/00:00:e0:c5:29/00:00:d5:00:00/40 Emask 0x10 (ATA bus error)
Mar  8 10:12:21 TresCommas kernel: ata6.00: status: { DRDY }
Mar  8 10:12:21 TresCommas kernel: ata6.00: failed command: READ FPDMA QUEUED
Mar  8 10:12:21 TresCommas kernel: ata6.00: cmd 60/80:98:78:3c:7f/00:00:68:03:00/40 tag 19 ncq dma 65536 in
Mar  8 10:12:21 TresCommas kernel:         res 40/00:00:e0:c5:29/00:00:d5:00:00/40 Emask 0x10 (ATA bus error)
Mar  8 10:12:21 TresCommas kernel: ata6.00: status: { DRDY }
Mar  8 10:12:21 TresCommas kernel: ata6: hard resetting link
Mar  8 10:12:27 TresCommas kernel: ata6: link is slow to respond, please be patient (ready=0)
Mar  8 10:12:31 TresCommas kernel: ata6: COMRESET failed (errno=-16)
Mar  8 10:12:31 TresCommas kernel: ata6: hard resetting link
Mar  8 10:12:35 TresCommas kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar  8 10:12:36 TresCommas kernel: ata6.00: supports DRM functions and may not be fully accessible

 

There are issues with disk3; check/replace the cables. It would also be a good idea to recreate the docker image on the cache instead of disk3.
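If it helps to confirm where the docker image currently lives and whether the ATA errors are still occurring, something along these lines can be run from the Unraid terminal. This is only a rough sketch: the /mnt/disk*/system path is the common default for the docker image, and ata6 is taken from the log above; both may differ on your setup.

# Check which disk actually holds the docker image (default location is usually
# under the "system" share; adjust the path if yours differs)
ls -lh /mnt/disk*/system/docker/docker.img /mnt/cache/system/docker/docker.img 2>/dev/null

# Watch for further ATA/bus errors against ata6 in the syslog
grep -E 'ata6' /var/log/syslog | tail -n 20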


Booted into maintenance mode, ran Check Filesystem Status with -nv, and got the following:

 


Phase 1 - find and verify superblock...
        - block cache size set to 1404320 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 1197993 tail block 1197987
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used.  Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 2
        - agno = 5
        - agno = 3
        - agno = 9
        - agno = 15
        - agno = 4
        - agno = 13
        - agno = 7
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 0
        - agno = 14
        - agno = 16
        - agno = 6
        - agno = 8
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (4:1198031) is ahead of log (4:1197993).
Would format log to cycle 7.
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Wed Mar  8 15:27:15 2023

Phase		Start		End		Duration
Phase 1:	03/08 15:27:06	03/08 15:27:06
Phase 2:	03/08 15:27:06	03/08 15:27:07	1 second
Phase 3:	03/08 15:27:07	03/08 15:27:11	4 seconds
Phase 4:	03/08 15:27:11	03/08 15:27:11
Phase 5:	Skipped
Phase 6:	03/08 15:27:11	03/08 15:27:15	4 seconds
Phase 7:	03/08 15:27:15	03/08 15:27:15

Total run time: 9 seconds
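For reference, the GUI's Check Filesystem Status with -nv corresponds to running xfs_repair in no-modify, verbose mode against the array disk's md device while the array is started in maintenance mode. The device name below is an assumption for disk3 and should be checked against your own system before running anything.

# No-modify check: reports problems but writes nothing to the disk
xfs_repair -nv /dev/md3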

 


Thank you.  Here's the output from running with the -L option:

 


Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 5
        - agno = 8
        - agno = 13
        - agno = 6
        - agno = 7
        - agno = 1
        - agno = 10
        - agno = 11
        - agno = 14
        - agno = 12
        - agno = 16
        - agno = 15
        - agno = 3
        - agno = 4
        - agno = 9
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (4:1198044) is ahead of log (1:2).
Format log to cycle 7.
done
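Both runs include a "moving disconnected inodes to lost+found" phase, so once the array is restarted normally it can be worth checking whether anything was actually placed there. A minimal check, assuming disk3 mounts at the usual /mnt/disk3:

# Anything the repair could not reattach ends up here; no output means nothing was orphaned
ls -la /mnt/disk3/lost+found 2>/dev/null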

 

  • Solution

My system is back up and running. To summarize: when migrating data off the cache for the upgrade and then back again, it looks like my System share was still on disk3 when I started the docker service. That appears to be what caused the btrfs errors that eventually crashed the disk and made it unmountable. Thanks JorgeB and itimpi for your suggestions and for getting me to the correct solution.
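A minimal way to double-check this kind of thing before re-enabling the Docker service, assuming the share is named system and the standard Unraid mount points:

# The system share should only exist on the cache pool before Docker starts
ls -d /mnt/disk*/system 2>/dev/null   # should return nothing
ls -d /mnt/cache/system               # should list the share on the cache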

11 hours ago, jackfalveyiv said:

This looks like it caused the btrfs errors that eventually crashed the disk and made it unmountable.

The ATA errors mentioned above are what likely caused the problems with both the docker image and the disk filesystem, so make sure you check/replace the cables.
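One quick way to tell whether errors like these are cable or connection related is to watch the drive's UDMA CRC error counter, which typically increments with a bad SATA cable or connector. /dev/sdX below is a placeholder for the affected drive:

# A CRC count that keeps climbing usually points at the cable/connection rather than the drive
smartctl -a /dev/sdX | grep -i crc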
