Nomar1245 Posted September 5

I keep having shares disappear within about an hour of starting the array. The same thing happens after a reboot. After a period of time my disk10 shows the following instead of shares. It's the only disk that does this; it was replaced about a month ago, but this has only started happening in the last few weeks.

My troubleshooting so far:
- I've disabled all of my Docker containers except 2 that I've literally been using for years.
- I've removed privileged access from the containers where it was enabled.
- I've run a filesystem check via Maintenance Mode.
- I've run multiple parity checks. The first fixed ~3300 errors, and the subsequent ones have not found any.
- A reboot solves it temporarily.
- Stopping and starting the array solves it temporarily.

I'm getting a replacement drive just in case, but I thought it would be silly not to explore all problem points while I wait for it to arrive.

kong-diagnostics-20240905-1215.zip
JorgeB Posted September 5 (Solution)

Check filesystem on disk10, then run it without -n.
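In Unraid this check is normally run from the disk's page in the GUI with the array in maintenance mode. A rough command-line sketch of the same sequence is below; the /dev/md10 device path and the DRY_RUN wrapper are assumptions for illustration, not details from the thread (on newer Unraid releases the partition device may be /dev/md10p1):

```shell
# Sketch of the suggested check, assuming disk10 maps to /dev/md10.
# DRY_RUN=1 (the default here) only prints each command instead of running it,
# since the real commands need root and the array in maintenance mode.
DRY_RUN=${DRY_RUN:-1}
DEV=/dev/md10

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"          # show what would be executed
  else
    "$@"                 # really execute
  fi
}

run xfs_repair -n "$DEV"   # read-only check: report problems, change nothing
run xfs_repair "$DEV"      # actual repair, i.e. "run it without -n"
```

The -n (no-modify) pass is safe to run first because it only reports what it would fix.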
Nomar1245 Posted September 5 (edited)

This is the result, almost immediately:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

The SMART error log shows: No Errors Logged

The SMART short test returned:
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        213              -
# 2  Short offline     Completed without error  00%        158              -

Edited September 5 by Nomar1245
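The error above is xfs_repair asking for the XFS metadata log to be replayed first; the usual sequence is to try a mount/unmount before reaching for -L. A sketch of that order of operations, again assuming /dev/md10 and a hypothetical temporary mount point (neither is stated in the thread):

```shell
# Sketch only: the commands are collected in a variable and printed rather
# than executed, because they need root and the real array device.
DEV=/dev/md10
MNT=/mnt/xfs-check   # hypothetical temporary mount point

CMDS="mkdir -p $MNT
mount $DEV $MNT      # mounting replays the metadata log
umount $MNT
xfs_repair -n $DEV   # re-check; reach for -L only if the mount fails"
echo "$CMDS"
```

If the mount itself fails, `xfs_repair -L` is the last resort, as the error text warns: destroying the log can lose the most recent metadata changes.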
Nomar1245 Posted September 5 (edited)

Running now with -L instead:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
agi_freecount 285, counted 286 in ag 3
agi_freecount 285, counted 287 in ag 3 finobt
sb_ifree 2988, counted 2989
sb_fdblocks 2269434669, counted 2296675626
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 5
        - agno = 2
        - agno = 11
        - agno = 4
        - agno = 6
        - agno = 8
        - agno = 7
        - agno = 10
        - agno = 9
        - agno = 3
        - agno = 1
        - agno = 12
        - agno = 13
        - agno = 14
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:2155061) is ahead of log (1:2).
Format log to cycle 4.
done

It's been ~30 minutes and everything looks to be good. I'll follow up later this evening to confirm all is well. Thanks.

Edited September 5 by Nomar1245
itimpi Posted September 5

Is the disk mounting now?

BTW: the short SMART test is not a good indicator of a disk's health (although if it fails, the disk definitely needs replacing).
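For a more meaningful health check than the short self-test, smartctl can queue an extended (long) self-test. A sketch, where /dev/sdX is a placeholder for disk10's real device:

```shell
# Sketch only: commands are printed, not executed, since /dev/sdX is a
# placeholder and smartctl needs root access to the real drive.
DEV=/dev/sdX

CMDS="smartctl -t long $DEV   # queue the extended self-test (runs in the background)
smartctl -a $DEV              # later: view attributes and the self-test log"
echo "$CMDS"
```

The extended test reads the whole surface, so it can take several hours on a large drive; the result shows up in the same self-test log quoted earlier in the thread.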
Nomar1245 Posted September 6

Everything has been fine for about 3 hours now, which is the longest it has worked in about 2 weeks. I think this has been solved. Thank you.