lankanmon Posted March 19, 2019

Note to Mods: I posted this here first: https://ipstest.lime-technology.com/forums/topic/70755-two-disks-are-unmountable-after-a-single-drive-failure-unmountable-disks-present/ (is this a test forum?)

Hi all, I've got a bit of a problem... One of my disks (Disk 5) failed last week. After reseating the SATA cable, trying a new one, and using a different power connection, I decided to use a backup drive that I had already precleared (and rebuild from parity). The rebuild seemed to be going well, but I may have knocked the SATA cable of Disk 3, and it became unavailable (and was emulated as well). I have two parity drives. I noticed that the parity rebuild sped up by much when this happened. I still let it finish. With Disk 5 rebuilt and Disk 3's SATA cable reseated correctly, I also rebuilt Disk 3. Now my parity is valid (all green).

My issue now is that when I mount the array, I get a message that says unmountable disks are present. I mounted it in maintenance mode and ran "Check Filesystem Status" for each disk. Here are the results:

Disk 3 xfs_repair status:
Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.

Disk 5:
Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.

I am not entirely sure what this means or what is going on. I am really concerned about my data right now... Is there any way to fix this without losing data?
Any help will be much appreciated!
JorgeB Posted March 19, 2019

Do you have the diagnostics from the rebuild? This part makes me suspect it didn't complete successfully:

7 hours ago, lankanmon said: I noticed that the parity rebuild sped up by much when this happened.

In any case you'll need to run xfs_repair without -n, but superblock corruption is not a very good sign.
lankanmon Posted March 19, 2019

I have attached the diagnostics. I did find that strange too. Is there any way to determine if data is corrupted? And how well does xfs repair work?

lknserver-diagnostics-20190319-0451.zip
JorgeB Posted March 19, 2019

32 minutes ago, lankanmon said: I have attached the diagnostics.

Those are from after rebooting, so not much help.

32 minutes ago, lankanmon said: Is there any way to determine if data is corrupted?

Not easily without the diags from the rebuild and/or checksums from all files.

33 minutes ago, lankanmon said: And how well does xfs repair work?

It usually works well, but it can't do miracles. If the rebuilds are incomplete, there will be data loss.
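The "checksums from all files" mentioned above only help if they were recorded before the failure. As a minimal sketch of what such a record could look like, assuming plain SHA-256 digests kept in a Python dictionary (the function names are hypothetical and not part of Unraid or any plugin):

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def changed_files(old, new):
    """Return paths from the old manifest that are missing or differ in the new one."""
    return sorted(path for path, digest in old.items() if new.get(path) != digest)
```

A manifest built while the data is known to be good can later be compared against a fresh one with `changed_files(old, new)`; any path it returns is either missing or has different content.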
lankanmon Posted March 19, 2019

Okay, I will run xfs_repair without -n on each drive and report back. Thanks!
lankanmon Posted March 19, 2019

Update: when I try to run it on Disk 3, it gives this error:

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap ino pointer to 97
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary ino pointer to 98
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

So, do I run it with the -L option? Is there any other way to mount the filesystem?
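The three kinds of xfs_repair output seen in this thread (the -n dry run, the log-replay error, and a finished repair) each point to a different next step. The sketch below is a hypothetical helper, not an xfs_repair feature: it only matches phrases that appear verbatim in the output quoted in this thread, and the suggested actions paraphrase the tool's own messages and the advice given here.

```python
def next_step(output):
    """Suggest a next step from captured xfs_repair text (hypothetical helper)."""
    # Phrase from the ERROR block: the log still holds unreplayed changes.
    if "needs to be replayed" in output:
        return "try mounting to replay the log; use -L only if the mount fails"
    # Phrase from the -n (no modify) dry run: nothing was written to disk.
    if "no_modify mode" in output:
        return "dry run only; rerun without -n to apply repairs"
    # A completed repair in this thread ends with the word "done".
    if output.rstrip().endswith("done"):
        return "repair finished; restart the array and check the disk"
    return "unrecognized output; read it manually"
```

The check deliberately looks for "needs to be replayed" rather than "use the -L option", because the latter also occurs inside the later ALERT text "because the -L option was used".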
JorgeB Posted March 19, 2019

41 minutes ago, lankanmon said: So, do I run it with the -L option?

Yep.
lankanmon Posted March 19, 2019

Okay, I have finished running xfs_repair on both drives:

Disk 3:
Phase 1 - find and verify superblock...
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap ino pointer to 97
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary ino pointer to 98
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 210816
sb_ifree 0, counted 6594
sb_fdblocks 976277683, counted 182759326
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 2
        - agno = 1
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (34:652511) is ahead of log (1:2).
Format log to cycle 37.
done

Disk 5 (took much longer):
Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap ino pointer to 97
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary ino pointer to 98
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 55660224
sb_ifree 0, counted 142
sb_fdblocks 976277683, counted 343482943
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Note - stripe unit (0) and width (0) were copied from a backup superblock.
Please reset with mount -o sunit=,swidth= if necessary
Maximum metadata LSN (59:518163) is ahead of log (1:2).
Format log to cycle 62.
done

The drives still show "Unmountable: No file system". Do I need to restart, or unmount and remount, to get the drives to mount? Also, there is a note at the bottom of the Disk 5 log. Do I need to run any of those commands?
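The Phase 2 lines of the form "sb_icount 0, counted 210816" pair the value stored in the superblock with the value xfs_repair counted during its scan. A small sketch for pulling those pairs out of captured output and converting the free-block figures to bytes follows; the 4096-byte block size is an assumption (a common XFS default) that the thread does not confirm, and the helper names are hypothetical.

```python
import re

# Matches lines such as "sb_fdblocks 976277683, counted 182759326".
COUNTER = re.compile(r"(sb_\w+) (\d+), counted (\d+)")

# Assumption: 4096-byte filesystem blocks; not stated anywhere in the thread.
BLOCK_SIZE = 4096


def parse_counters(output):
    """Map each counter name to a (stored, counted) pair of integers."""
    return {
        name: (int(stored), int(counted))
        for name, stored, counted in COUNTER.findall(output)
    }


def free_bytes(counters, block_size=BLOCK_SIZE):
    """Return (stored, counted) free space in bytes from the sb_fdblocks pair."""
    stored, counted = counters["sb_fdblocks"]
    return stored * block_size, counted * block_size
```

Under that block-size assumption, `free_bytes(parse_counters(log_text))` gives a rough idea of how much free space the scan found on each disk, which can be compared with what the disk was expected to hold.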
JorgeB Posted March 19, 2019

4 minutes ago, lankanmon said: The drives still show "Unmountable: No file system"

That's weird, post new diags after starting the array.
lankanmon Posted March 19, 2019

I had not restarted before. I have now restarted and mounted the array (not in maintenance mode), and the disks now appear to show up as normal (xfs).

I would also like to know where I go from here. Is there any way to determine what files may have been corrupted? I noticed a mention of "lost+found" in the logs above... Is that something that I can actually access? Also, would my parity be valid right now? Should I run a parity check (and if so, should I write corrections to parity)?

Please let me know... I really appreciate your help! Thank you so much!
JorgeB Posted March 19, 2019

19 minutes ago, lankanmon said: Is there any way to determine what files may have been corrupted?

As mentioned, only if you have checksums for all files, or backups to compare to.

19 minutes ago, lankanmon said: I noticed a mention of "lost+found" in the logs above... Is that something that I can actually access?

Check if that folder exists on both disks, and for any data there.

20 minutes ago, lankanmon said: Should I run a parity check (and if so, should I write corrections to parity)?

Since there are no diags from the rebuild, it's a good idea.
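Checking for a lost+found folder on each repaired disk can be scripted. This is a minimal sketch with a hypothetical helper name; the mount points in the usage note are assumptions based on the usual Unraid layout and should be replaced with wherever the disks are actually mounted.

```python
import os


def lost_and_found_entries(mount_points):
    """Map each mount point to the sorted entries of its lost+found, or None if absent."""
    report = {}
    for mount in mount_points:
        folder = os.path.join(mount, "lost+found")
        if os.path.isdir(folder):
            report[mount] = sorted(os.listdir(folder))
        else:
            report[mount] = None
    return report
```

For example, `lost_and_found_entries(["/mnt/disk3", "/mnt/disk5"])` (assumed paths) returns `None` for a disk with no lost+found folder and a list of entry names otherwise. An empty list means the folder exists but holds nothing.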
lankanmon Posted March 20, 2019

I ran a parity check and it completed successfully with 0 errors, so I hope everything is well. I do have some backups (although not of the entire server), and will see if I can verify data integrity. Thanks for all of your help!
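Verifying data integrity against a partial backup, as described above, amounts to a byte-for-byte comparison of every file that exists in the backup. A minimal sketch using Python's standard `filecmp` module follows; the function name and both directory arguments are hypothetical.

```python
import filecmp
import os


def compare_with_backup(backup_root, live_root):
    """Return (missing, different) lists of paths relative to backup_root."""
    missing, different = [], []
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        for name in filenames:
            backup_file = os.path.join(dirpath, name)
            rel = os.path.relpath(backup_file, backup_root)
            live_file = os.path.join(live_root, rel)
            if not os.path.isfile(live_file):
                missing.append(rel)
            # shallow=False forces a content comparison, not just size and mtime.
            elif not filecmp.cmp(backup_file, live_file, shallow=False):
                different.append(rel)
    return sorted(missing), sorted(different)
```

A file reported as different is not necessarily corrupted: it may simply have been edited after the backup was taken, so each hit still needs a manual look. Files that were never backed up are not covered at all.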