grphx Posted July 1, 2017
I'm unable to mount my 4 disks (3+1). Mounting seems to take a while, and then the web GUI stops responding. I can still SSH into the server, but some functions (such as reboot) won't respond either. I've attached my diagnostics zip file. If you find out what's causing the disks to fail to mount, I'd appreciate it if you could tell me where in the diagnostics you looked. I'm all about learning.
tower-diagnostics-20170701-1528.zip
EDIT: Disk1 had filesystem corruption. I ran a repair on /dev/md1 and now I'm back up and running!
JorgeB Posted July 1, 2017
Probably an XFS disk with filesystem corruption. Grab the diagnostics from the console/SSH after starting the array by typing diagnostics, or start the array in maintenance mode and check the filesystem on all XFS disks: https://wiki.lime-technology.com/Check_Disk_Filesystems#Drives_formatted_with_XFS
grphx Posted July 1, 2017
18 minutes ago, johnnie.black said: Probably an XFS disk with filesystem corruption. Grab the diagnostics from the console/SSH after starting the array by typing diagnostics, or start the array in maintenance mode and check the filesystem on all XFS disks: https://wiki.lime-technology.com/Check_Disk_Filesystems#Drives_formatted_with_XFS

Is there a way to run a check on all disks with one command, or do I need to run it against each one separately?
JorgeB Posted July 1, 2017
One by one, XFS data disks only.
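Editor's note: the "one by one" check can still be scripted. The following is a minimal sketch, not an official Unraid tool. It assumes the array was started in maintenance mode and that the XFS data disks are the /dev/mdN devices passed as arguments. The function name check_xfs_disks and the XFS_REPAIR override are hypothetical names introduced for illustration; the override exists only so the loop can be exercised without touching real disks. The sketch relies on the documented behaviour that xfs_repair -n (no-modify mode) changes nothing and returns a non-zero status when it detects corruption.

```shell
#!/bin/sh
# Sketch: read-only XFS check of several devices, one at a time.
# XFS_REPAIR is a hypothetical override for dry runs of this script itself;
# it defaults to the real xfs_repair binary.

check_xfs_disks() {
    status=0
    for dev in "$@"; do
        echo "=== checking $dev (read-only, -n) ==="
        # -n = no-modify mode: report problems, write nothing to the disk.
        if "${XFS_REPAIR:-xfs_repair}" -n "$dev"; then
            echo "$dev: no corruption reported"
        else
            echo "$dev: corruption reported or check failed"
            status=1
        fi
    done
    # Non-zero if any device reported a problem.
    return "$status"
}

# Example call (device names are assumptions; list only XFS data disks):
# check_xfs_disks /dev/md1 /dev/md2 /dev/md3
```

The function only reports; any actual repair is still run by hand on the one disk that needs it, as in the rest of this thread.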
grphx Posted July 1, 2017
Quote
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
sb_fdblocks 105557487, counted 106538526
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 3
        - agno = 0
        - agno = 2
        - agno = 1
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (28:1117792) is ahead of log (22:3417256).
Would format log to cycle 31.
No modify flag set, skipping filesystem flush and exiting.

I ran a check on a random disk. I'm unsure whether this is the problematic drive. What would it say if it had corruption?
JorgeB Posted July 1, 2017
Or post the diagnostics taken after starting the array, and the problem disk should be visible.
grphx Posted July 1, 2017
I'm guessing disk1 is the problem, but please take a look yourself.
tower-diagnostics-20170701-1624.zip
grphx Posted July 1, 2017
Quote
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
Metadata corruption detected at xfs_agf block 0x15d508ec9/0x200
flfirst 118 in agf 3 too large (max = 118)
agf 118 freelist blocks bad, skipping freelist scan
agi unlinked bucket 6 is 14870406 in ag 3 (inode=6457321350)
agi unlinked bucket 8 is 14806472 in ag 3 (inode=6457257416)
agi unlinked bucket 19 is 98076627 in ag 3 (inode=6540527571)
agi unlinked bucket 60 is 98046204 in ag 3 (inode=6540497148)
agi unlinked bucket 63 is 14884799 in ag 3 (inode=6457335743)
sb_ifree 54216, counted 50595
sb_fdblocks 77913191, counted 78974531
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 6457257416, would move to lost+found
disconnected inode 6457321350, would move to lost+found
disconnected inode 6457335743, would move to lost+found
disconnected inode 6540497148, would move to lost+found
disconnected inode 6540527571, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 6457257416 nlinks from 0 to 1
would have reset inode 6457321350 nlinks from 0 to 1
would have reset inode 6457335743 nlinks from 0 to 1
would have reset inode 6540497148 nlinks from 0 to 1
would have reset inode 6540527571 nlinks from 0 to 1
No modify flag set, skipping filesystem flush and exiting.

Oh yeah, this looks scarier than the other disk.
Should I run xfs_repair /dev/md1 as is, or should I add any flags to the command?
JorgeB Posted July 1, 2017
You can use -v and, if it asks for it, -L.
grphx Posted July 1, 2017
Quote
xfs_repair /dev/md1 -v
Phase 1 - find and verify superblock...
        - block cache size set to 1469720 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 2984339 tail block 2918920
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

I'm assuming this is "asking for -L" since I can't mount the disks.
JorgeB Posted July 1, 2017
Yes, it's normal in these cases and usually there's no data loss.
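Editor's note: the "if it asks for it" step can be sketched as a small check on saved xfs_repair output. This is only an illustration under stated assumptions: the function name log_needs_replay and the file name repair.log are hypothetical, and the matched phrase is taken from the error text quoted above, so other xfs_repair versions may word or wrap it differently. The sketch deliberately does not run -L itself, because the quoted message warns that destroying the log may cause corruption.

```shell
#!/bin/sh
# Sketch: detect whether xfs_repair refused to continue because of a dirty log.
# Reads xfs_repair output on stdin; succeeds (status 0) if the
# "valuable metadata changes in a log" error is present.

log_needs_replay() {
    grep -q 'valuable metadata changes in a log'
}

# Example (device and log file names are assumptions):
# xfs_repair -v /dev/md1 > repair.log 2>&1
# if log_needs_replay < repair.log; then
#     echo "xfs_repair asked for a mount or -L; try mounting the disk first."
#     echo "Only if mounting is impossible: xfs_repair -vL /dev/md1"
# fi
```

Keeping the -L decision manual mirrors the thread: the option was used only after the disk could not be mounted to replay the log.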
grphx Posted July 1, 2017
Success! The repair completed and my system is back up and running! Thanks a lot!