KptnKMan Posted October 1, 2020

Hi all,

Something strange is happening with my array and I'm not sure if I should be very worried or what to do.

Edit: I'm running 6.8.3, the latest stable; no changes for some time now.

So today I ran a monthly parity check on my 34TB array. A little over 18 hours in, I checked the status and saw that there were 450 errors listed. The current log showed that one of my disks (Disk 4 / ata7) was playing up and having an issue.

I stopped the parity check, stopped the array, and checked the log of the disk. It looked like the disk was having some kind of initialisation error, but I foolishly didn't take a screenshot or a note.

I brought the array back online and saw that the reported usage was the same, with Disk 4 still having issues. When accessing the array over the LAN, I noticed many files missing, and my VMs wouldn't start. Many files and directories appeared to be missing, despite the reported array size being correct. The VMs would not start because files like the GPU BIOS and the virtio-win-0.1.173-2.iso image were missing.

At this point I decided to completely shut down the system, leave it for a little bit, then start up clean. Now the array is mounted, but Disk 4 is showing "Unmountable: No file system", with the option to format the disk available further down. The missing files were still missing at first, but after a short time seemed to have reappeared; I haven't verified everything. The array usage now seems to report what looks like an incorrect total.

Any advice on what I should or can do? Thanks for any help.

blaster-diagnostics-20201001-2003.zip

Edited October 1, 2020 by KptnKMan
JorgeB Posted October 1, 2020

Please post the diagnostics: Tools -> Diagnostics
KptnKMan Posted October 1, 2020 (Author)

Yeah, sorry, I forgot to attach it. Generated a new one and added it to the original post.
JorgeB Posted October 1, 2020

Check the filesystem on disk4.
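For reference, the same check the Unraid GUI performs can be run from the console. A minimal sketch, assuming the array is started in Maintenance mode and Disk 4 maps to /dev/md4 (on Unraid the mdX number matches the disk slot; your device may differ):

```shell
# Read-only check first: -n = no modify, -v = verbose.
# Use the /dev/mdX device, not /dev/sdX, so that any later
# repairs keep parity in sync.
xfs_repair -nv /dev/md4
```

Running with -n first is the safe move: it reports what would be fixed without touching the disk.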
KptnKMan Posted October 1, 2020 (Author)

Ran the check in Maintenance mode, with the -nv options. Results:

Phase 1 - find and verify superblock...
        - block cache size set to 3043288 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 1126679 tail block 1126656
ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
sb_fdblocks 278251709, counted 279719268
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
data fork in ino 1567417 claims free block 195322
data fork in ino 1567417 claims free block 195323
data fork in ino 1567419 claims free block 250450
data fork in ino 1567419 claims free block 250451
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
data fork in ino 12884902030 claims free block 1610613854
data fork in ino 12884902030 claims free block 1610613855
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 7
        - agno = 6
        - agno = 0
        - agno = 1
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (3:1137604) is ahead of log (3:1126679).
Would format log to cycle 6.
No modify flag set, skipping filesystem flush and exiting.
XFS_REPAIR Summary    Thu Oct  1 20:31:14 2020

Phase           Start           End             Duration
Phase 1:        10/01 20:30:17  10/01 20:30:18  1 second
Phase 2:        10/01 20:30:18  10/01 20:30:18
Phase 3:        10/01 20:30:18  10/01 20:30:50  32 seconds
Phase 4:        10/01 20:30:50  10/01 20:30:50
Phase 5:        Skipped
Phase 6:        10/01 20:30:50  10/01 20:31:14  24 seconds
Phase 7:        10/01 20:31:14  10/01 20:31:14

Total run time: 57 seconds
JorgeB Posted October 1, 2020

You need to run it without -n or nothing will be done, and if it asks for -L, use it.
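Spelled out, the sequence being suggested is roughly the following. This is a sketch, not a definitive procedure; the /dev/md4 device and /mnt/disk4 mount point are assumptions for illustration:

```shell
# 1. Attempt the real repair (no -n, so changes are written):
xfs_repair -v /dev/md4

# 2. If it refuses because of a dirty log, mounting the
#    filesystem once replays the log; then unmount and repair:
#      mount /dev/md4 /mnt/disk4 && umount /mnt/disk4
#      xfs_repair -v /dev/md4

# 3. Only if the mount itself fails, zero the log as a last
#    resort. This discards in-flight metadata updates and can
#    cause some corruption, which is why it is the last step:
#      xfs_repair -Lv /dev/md4
```

Steps 2 and 3 are commented out deliberately; -L should only ever be used when a mount is impossible, exactly as the xfs_repair error message itself warns.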
KptnKMan Posted October 1, 2020 (Author)

Ok thanks, running without any options produced this response. I'll try again with -L, as advised and as suggested in the response:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
KptnKMan Posted October 1, 2020 (Author)

Check complete using -L. Results:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
sb_fdblocks 278251709, counted 279719268
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
data fork in ino 1567417 claims free block 195322
data fork in ino 1567417 claims free block 195323
data fork in ino 1567419 claims free block 250450
data fork in ino 1567419 claims free block 250451
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
data fork in ino 12884902030 claims free block 1610613854
data fork in ino 12884902030 claims free block 1610613855
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 1
        - agno = 5
        - agno = 4
        - agno = 7
        - agno = 3
        - agno = 6
        - agno = 0
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (3:1137604) is ahead of log (1:2).
Format log to cycle 6.
done
KptnKMan Posted October 1, 2020 (Author)

Well, I ran the check with -nv again, as recommended by the documentation. Result before I start the array normally:

Phase 1 - find and verify superblock...
        - block cache size set to 3043288 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 0 tail block 0
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 1
        - agno = 3
        - agno = 4
        - agno = 7
        - agno = 5
        - agno = 6
        - agno = 0
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

XFS_REPAIR Summary    Thu Oct  1 20:45:37 2020

Phase           Start           End             Duration
Phase 1:        10/01 20:44:38  10/01 20:44:40  2 seconds
Phase 2:        10/01 20:44:40  10/01 20:44:40
Phase 3:        10/01 20:44:40  10/01 20:45:13  33 seconds
Phase 4:        10/01 20:45:13  10/01 20:45:13
Phase 5:        Skipped
Phase 6:        10/01 20:45:13  10/01 20:45:37  24 seconds
Phase 7:        10/01 20:45:37  10/01 20:45:37

Total run time: 59 seconds
KptnKMan Posted October 1, 2020 (Author)

Thanks @JorgeB, looks like the array started back up correctly. I tried to browse for lost+found and couldn't see the dir, but I'll check with the CLI. For now it looks like it's working. Thanks so much for your help.
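Checking for orphaned files from the console might look like this (a sketch; the /mnt/disk4 path assumes Unraid's usual per-disk share layout):

```shell
# xfs_repair only creates lost+found when it actually had to
# move disconnected inodes there, so its absence is a good sign.
ls -la /mnt/disk4/lost+found 2>/dev/null \
  || echo "no lost+found - nothing was orphaned"
```

If the directory does exist, the files in it are recovered inodes named by inode number, and they have to be identified and moved back by hand.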