March 8, 2023
I've been experiencing a litany of issues with my server, seemingly around cache drive corruption. I formatted the drive several times before buying a new one and installing it. After two days, I'm starting to get the errors seen in the screenshot below. What is happening? I don't see an issue with my applications yet; I'm just trying to get ahead of whatever this may be.
trescommas-diagnostics-20230308-1415.zip
March 8, 2023 Community Expert
Mar 8 10:12:21 TresCommas kernel: ata6.00: exception Emask 0x10 SAct 0xc0000 SErr 0x4890000 action 0xe frozen
Mar 8 10:12:21 TresCommas kernel: ata6.00: irq_stat 0x08400040, interface fatal error, connection status changed
Mar 8 10:12:21 TresCommas kernel: ata6: SError: { PHYRdyChg 10B8B LinkSeq DevExch }
Mar 8 10:12:21 TresCommas kernel: ata6.00: failed command: READ FPDMA QUEUED
Mar 8 10:12:21 TresCommas kernel: ata6.00: cmd 60/98:90:b0:4a:0b/00:00:68:03:00/40 tag 18 ncq dma 77824 in
Mar 8 10:12:21 TresCommas kernel: res 40/00:00:e0:c5:29/00:00:d5:00:00/40 Emask 0x10 (ATA bus error)
Mar 8 10:12:21 TresCommas kernel: ata6.00: status: { DRDY }
Mar 8 10:12:21 TresCommas kernel: ata6.00: failed command: READ FPDMA QUEUED
Mar 8 10:12:21 TresCommas kernel: ata6.00: cmd 60/80:98:78:3c:7f/00:00:68:03:00/40 tag 19 ncq dma 65536 in
Mar 8 10:12:21 TresCommas kernel: res 40/00:00:e0:c5:29/00:00:d5:00:00/40 Emask 0x10 (ATA bus error)
Mar 8 10:12:21 TresCommas kernel: ata6.00: status: { DRDY }
Mar 8 10:12:21 TresCommas kernel: ata6: hard resetting link
Mar 8 10:12:27 TresCommas kernel: ata6: link is slow to respond, please be patient (ready=0)
Mar 8 10:12:31 TresCommas kernel: ata6: COMRESET failed (errno=-16)
Mar 8 10:12:31 TresCommas kernel: ata6: hard resetting link
Mar 8 10:12:35 TresCommas kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 8 10:12:36 TresCommas kernel: ata6.00: supports DRM functions and may not be fully accessible

Issues with disk3: check/replace cables. It's also a good idea to recreate the docker image on cache instead of disk3.
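For anyone who wants to scan a longer syslog for the same pattern, here is a minimal sketch that counts error-looking libata messages per port. The marker strings are taken only from the excerpt above and are an assumption, not a complete list of kernel messages; the function name `count_ata_errors` is made up for illustration. It does not map a port such as ata6 to a disk slot; that still has to be read from the diagnostics.

```python
import re
from collections import Counter

# Matches the port prefix on libata kernel messages, e.g. "ata6:" or "ata6.00:".
ATA_PORT = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")

# Substrings copied from the log excerpt in this thread (assumed, not exhaustive).
ERROR_MARKERS = (
    "exception Emask",
    "interface fatal error",
    "failed command",
    "hard resetting link",
    "COMRESET failed",
)

def count_ata_errors(lines):
    """Count error-looking libata lines per ATA port (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = ATA_PORT.search(line)
        if match and any(marker in match.group(2) for marker in ERROR_MARKERS):
            counts[match.group(1)] += 1
    return counts

sample = [
    "Mar 8 10:12:21 TresCommas kernel: ata6.00: exception Emask 0x10 SAct 0xc0000 SErr 0x4890000 action 0xe frozen",
    "Mar 8 10:12:21 TresCommas kernel: ata6.00: failed command: READ FPDMA QUEUED",
    "Mar 8 10:12:21 TresCommas kernel: ata6: hard resetting link",
    "Mar 8 10:12:35 TresCommas kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)",
]

print(count_ata_errors(sample))
```

A port that keeps accumulating resets and failed commands after a cable swap would suggest looking beyond the cable, for example at the controller port or the drive itself.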
March 8, 2023 Author
I turned the array off to rebuild the docker image, but I didn't have an option to delete the vdisk and create a new docker img. When I turned the array back on, disk3 was reporting as unmountable. What's my next logical move here?
Edited March 8, 2023 by jackfalveyiv
March 8, 2023 Author
Booted to maintenance mode, ran Check Filesystem Status with -nv, and got the following:

Phase 1 - find and verify superblock...
        - block cache size set to 1404320 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 1197993 tail block 1197987
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used. Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 2
        - agno = 5
        - agno = 3
        - agno = 9
        - agno = 15
        - agno = 4
        - agno = 13
        - agno = 7
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 0
        - agno = 14
        - agno = 16
        - agno = 6
        - agno = 8
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (4:1198031) is ahead of log (4:1197993).
Would format log to cycle 7.
No modify flag set, skipping filesystem flush and exiting.
XFS_REPAIR Summary    Wed Mar 8 15:27:15 2023

Phase       Start             End               Duration
Phase 1:    03/08 15:27:06    03/08 15:27:06
Phase 2:    03/08 15:27:06    03/08 15:27:07    1 second
Phase 3:    03/08 15:27:07    03/08 15:27:11    4 seconds
Phase 4:    03/08 15:27:11    03/08 15:27:11
Phase 5:    Skipped
Phase 6:    03/08 15:27:11    03/08 15:27:15    4 seconds
Phase 7:    03/08 15:27:15    03/08 15:27:15

Total run time: 9 seconds
March 8, 2023 Community Expert
That is quite standard. You need to rerun it with the -L option added (and without -n, so that changes are actually made).
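The decision described here can be sketched as a small check on the dry-run report. This is only an illustration: the function name `suggest_repair_options` is hypothetical, and the rule it encodes (add -L when the -n report shows the log alert) reflects the advice in this thread, where the disk could not be mounted to replay the log. As the alert text itself says, -L discards the pending metadata changes in the log, so mounting to replay the log is the gentler route when the filesystem will mount.

```python
# Phrase copied from the xfs_repair -n report quoted in this thread.
LOG_ALERT = "valuable metadata changes in a log"

def suggest_repair_options(check_output: str) -> list[str]:
    """Suggest xfs_repair options from a dry-run (-n) report (hypothetical helper).

    Returns ["-L"] when the report warns about unreplayed log changes,
    otherwise an empty list (a plain repair run, still without -n).
    """
    # Collapse whitespace so the phrase is found even if the report wraps lines.
    normalized = " ".join(check_output.split())
    return ["-L"] if LOG_ALERT in normalized else []

report = """Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used."""

print(suggest_repair_options(report))
```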
March 8, 2023 Author
Thank you. Here's the output from running with the -L option:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 5
        - agno = 8
        - agno = 13
        - agno = 6
        - agno = 7
        - agno = 1
        - agno = 10
        - agno = 11
        - agno = 14
        - agno = 12
        - agno = 16
        - agno = 15
        - agno = 3
        - agno = 4
        - agno = 9
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (4:1198044) is ahead of log (1:2).
Format log to cycle 7.
done
March 8, 2023 Community Expert
Have you tried restarting the array in normal mode? I would expect the drive to now mount OK.
March 8, 2023 Author
I'll give that a try. The disk is still displaying as unmountable in the Main screen, but I'll report back after the next startup attempt.
March 8, 2023 Community Expert
1 minute ago, jackfalveyiv said:
"The disk is still displaying as unmountable in the Main screen,"
It will if you have not restarted the array in normal mode.
March 8, 2023 Author
It did start up and mount. I'm about to rebuild the docker image and I'll report back.
March 8, 2023 Author Solution
My system is back up and running. To summarize: when migrating data off the cache for an upgrade and then back again, it looks like my System share was still on disk3 when I started up the docker service. This looks like it caused the btrfs errors that eventually crashed the disk and made it unmountable. Thanks JorgeB and itimpi for your suggestions and for getting me to the correct solution.
March 9, 2023 Community Expert
11 hours ago, jackfalveyiv said:
"This looks like it caused the btrfs errors that eventually crashed the disk and made it unmountable."
What most likely caused the problems with both the docker image and the disk filesystem were the ATA errors I mentioned above, so make sure you check/replace the cables.
March 9, 2023 Author
Noted. I'll be replacing the cables in the coming day or two. I also received a read error this morning; fresh diagnostics are posted below. Is this the beginning of a full HD failure?
trescommas-diagnostics-20230309-0743.zip
March 9, 2023 Author
OK, new cables arrive tomorrow and everything will get swapped then. Will update at that point. Thanks.
March 9, 2023 Author
Question: if I have the system turned on but the array unmounted, am I safe to unplug/plug in a drive? I'm realizing I need to label my drives somehow so that I know which is which the next time I need to do some troubleshooting. Thanks in advance.
March 9, 2023 Community Expert
Usually yes, but if the hardware doesn't fully support hot plug it can cause issues.