September 5, 2024

I keep having shares disappear within about an hour of starting the array. The same thing happens after a reboot. After a period of time my disk10 shows the following instead of shares. It's the only disk that does this. It was replaced about a month ago, but this has only started happening in the last few weeks.

My troubleshooting:
- I've disabled all of my Docker containers except 2 that I've literally been using for years.
- I've removed privileged access from the containers where it was enabled.
- I've run a check disk via Maintenance Mode.
- I've run multiple parity checks. The first fixed ~3300 errors, and the subsequent ones have not found any.
- A reboot solves it temporarily.
- Stopping and starting the array solves it temporarily.

I'm getting a replacement drive just in case, but I thought it would be silly not to explore all problem points while I wait for it to arrive.

kong-diagnostics-20240905-1215.zip
September 5, 2024, Author

This is the result, almost immediately:

    Phase 1 - find and verify superblock...
    Phase 2 - using internal log
            - zero log...
    ERROR: The filesystem has valuable metadata changes in a log which needs to
    be replayed. Mount the filesystem to replay the log, and unmount it before
    re-running xfs_repair. If you are unable to mount the filesystem, then use
    the -L option to destroy the log and attempt a repair.
    Note that destroying the log may cause corruption -- please attempt a mount
    of the filesystem before doing this.

SMART error log shows: No Errors Logged

SMART short test returned:

    Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline     Completed without error  00%        213              -
    # 2  Short offline     Completed without error  00%        158              -

Edited September 5, 2024 by Nomar1245
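The order of operations in that xfs_repair message (try a mount first, fall back to -L only if the mount fails) can be written down as a small decision helper. This is only a rough sketch: `next_step` and `mount_failed` are hypothetical names made up for illustration, not part of xfs_repair or Unraid, and the function only inspects text that you paste into it.

```python
# Hypothetical helper: suggests a next action from xfs_repair's text output.
# It does not run xfs_repair or touch any disk.

REPLAY_MARKER = "valuable metadata changes in a log which needs to be replayed"


def next_step(repair_output: str, mount_failed: bool = False) -> str:
    """Return a suggested next action based on the pasted xfs_repair output.

    mount_failed is an assumption supplied by the caller: set it to True only
    after an actual mount attempt has failed.
    """
    # Collapse line wraps and repeated spaces so the marker matches even if
    # the message was wrapped or flattened when it was copied.
    text = " ".join(repair_output.split())

    if REPLAY_MARKER not in text:
        return "no log replay needed; read the rest of the report"
    if mount_failed:
        return ("last resort: re-run xfs_repair with -L "
                "(destroys the log, may cause corruption)")
    return ("mount the filesystem to replay the log, unmount it, "
            "then re-run xfs_repair")


if __name__ == "__main__":
    sample = (
        "ERROR: The filesystem has valuable metadata changes in a log which "
        "needs to\nbe replayed.  Mount the filesystem to replay the log"
    )
    print(next_step(sample))
    print(next_step(sample, mount_failed=True))
```

The point of the ordering is that a successful mount replays the log and keeps the pending metadata changes, while -L discards them.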
September 5, 2024, Author

Running now with -L instead:

    Phase 1 - find and verify superblock...
    Phase 2 - using internal log
            - zero log...
    ALERT: The filesystem has valuable metadata changes in a log which is being
    destroyed because the -L option was used.
            - scan filesystem freespace and inode maps...
    clearing needsrepair flag and regenerating metadata
    agi_freecount 285, counted 286 in ag 3
    agi_freecount 285, counted 287 in ag 3 finobt
    sb_ifree 2988, counted 2989
    sb_fdblocks 2269434669, counted 2296675626
            - found root inode chunk
    Phase 3 - for each AG...
            - scan and clear agi unlinked lists...
            - process known inodes and perform inode discovery...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - agno = 8
            - agno = 9
            - agno = 10
            - agno = 11
            - agno = 12
            - agno = 13
            - agno = 14
            - process newly discovered inodes...
    Phase 4 - check for duplicate blocks...
            - setting up duplicate extent list...
            - check for inodes claiming duplicate blocks...
            - agno = 0
            - agno = 5
            - agno = 2
            - agno = 11
            - agno = 4
            - agno = 6
            - agno = 8
            - agno = 7
            - agno = 10
            - agno = 9
            - agno = 3
            - agno = 1
            - agno = 12
            - agno = 13
            - agno = 14
    Phase 5 - rebuild AG headers and trees...
            - reset superblock...
    Phase 6 - check inode connectivity...
            - resetting contents of realtime bitmap and summary inodes
            - traversing filesystem ...
            - traversal finished ...
            - moving disconnected inodes to lost+found ...
    Phase 7 - verify and correct link counts...
    Maximum metadata LSN (1:2155061) is ahead of log (1:2).
    Format log to cycle 4.
    done

It's been ~30 minutes and everything looks to be good. I'll follow up later this evening to confirm all is well. Thanks.

Edited September 5, 2024 by Nomar1245
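For anyone wanting to see how far off the counters in the -L run were, the "X, counted Y" lines can be pulled out and differenced. This is a minimal sketch, not an official tool: `counter_mismatches` is a hypothetical name, and the 4096-byte block size used for the size estimate is an assumption (a common XFS default) that the output itself does not confirm.

```python
import re

# Matches report lines such as "sb_fdblocks 2269434669, counted 2296675626".
MISMATCH = re.compile(r"\b(\w+) (\d+), counted (\d+)")

# ASSUMPTION: 4 KiB filesystem blocks. Check the real value for your disk
# before trusting the byte estimate below.
ASSUMED_BLOCK_SIZE = 4096


def counter_mismatches(repair_output: str):
    """Return (name, recorded, counted, delta) for each mismatch reported."""
    rows = []
    for name, recorded, counted in MISMATCH.findall(repair_output):
        recorded, counted = int(recorded), int(counted)
        rows.append((name, recorded, counted, counted - recorded))
    return rows


if __name__ == "__main__":
    sample = (
        "agi_freecount 285, counted 286 in ag 3 "
        "agi_freecount 285, counted 287 in ag 3 finobt "
        "sb_ifree 2988, counted 2989 "
        "sb_fdblocks 2269434669, counted 2296675626"
    )
    for name, recorded, counted, delta in counter_mismatches(sample):
        print(f"{name}: recorded {recorded}, counted {counted}, delta {delta}")
        if name == "sb_fdblocks":
            gib = delta * ASSUMED_BLOCK_SIZE / 2**30
            print(f"  roughly {gib:.1f} GiB if blocks are {ASSUMED_BLOCK_SIZE} bytes")
```

In this run the free-block counter in the superblock was 27,240,957 blocks lower than what xfs_repair counted, which would be consistent with counters going stale when the log was discarded rather than replayed.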
September 5, 2024, Community Expert

The disk should be mounting now?

BTW: The short SMART test is not a good indicator of a disk's health (although if it fails, the disk definitely needs replacing).
September 6, 2024, Author

Everything has been fine for about 3 hours now, which is the longest it has been working in about 2 weeks. I think this has been solved. Thank you.