matt_webb Posted September 21, 2016

Hi all,

Sorry, there's a bit of a story here; I don't know how much of it is relevant, but just in case...

I've got an issue with my server. A few weeks ago I received an email saying an "unclean shutdown was detected". I'm on a UPS, but it's an Eaton, so my NUT settings must be wrong.

Otherwise the server was running happily until I upgraded to 6.2 (though I think this may be coincidence). I had a few server freezes, especially when streaming from Plex, which required me to hard-power-off and reboot the server. This happened a few times; even the console was frozen.

I then uninstalled all plugins and dockers. That restored the stability I had been used to, though I did get a few parity warnings after a parity check.

So a few days ago I started putting a few dockers back in. The Plex server seemed happy streaming for two days, until today. When I got home Plex wasn't running, so I had a look, and the disk holding a number of files and the docker is being reported as "Unmountable".

I'm not totally sure what to do next, so please point me in the right direction. Attached are the logs.

Thanks in advance.

Cheers, Matt.

familyserver-diagnostics-20160921-1939.zip
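(For the UPS side of this: most USB-connected Eaton units work with NUT's generic usbhid-ups driver. A minimal ups.conf sketch; the section name and description are placeholders, and this assumes a USB model — serial or network-card models need a different driver.)

```ini
; /etc/nut/ups.conf - minimal entry for a USB-attached Eaton UPS
[eaton]
    driver = usbhid-ups   ; generic USB HID driver; covers most Eaton USB models
    port = auto           ; usbhid-ups locates the device itself
    desc = "Eaton UPS"
```

After editing, `upsdrvctl start` followed by `upsc eaton` should report battery status if the driver matches the unit.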
itimpi Posted September 21, 2016

Looking at the syslog I think you have file-system-level corruption on disk1. To check for this and correct it, stop the array, restart it in Maintenance mode, and then click on disk1 to get to the dialog for running the file system check.
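(For reference, the GUI check is equivalent to running xfs_repair in read-only mode from the console. A sketch, assuming disk1 is an XFS array disk; unRAID exposes array disks as /dev/md1, /dev/md2, ... so that parity stays in sync, and running against the raw /dev/sdX device instead would invalidate parity.)

```shell
# Read-only check of disk1's file system (array must be started
# in Maintenance mode so the device exists but is not mounted).
xfs_repair -n /dev/md1   # -n = no modify: report problems, change nothing
```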
matt_webb Posted September 21, 2016 (Author)

Hi,

Thanks for your analysis and guidance.

- Stopped array
- Checked "Maintenance mode"
- Clicked Start (now says "Started - Maintenance Mode")

The "Check Filesystem Status" section looks like the attached pic. Should I go ahead and click "Check"?

Thanks again.

Cheers, Matt.
matt_webb Posted September 21, 2016 (Author)

OK, noticed that's a read-only task and your reply says it needs a FS check, so went ahead. Here's the output. Thanks.

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
Metadata corruption detected at xfs_agf block 0x1/0x200
flfirst 118 in agf 0 too large (max = 118)
agf 118 freelist blocks bad, skipping freelist scan
agi unlinked bucket 9 is 362701321 in ag 0 (inode=362701321)
sb_ifree 12196, counted 11894
sb_fdblocks 41756354, counted 41734366
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 362701321, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 362701321 nlinks from 0 to 1
No modify flag set, skipping filesystem flush and exiting.
itimpi Posted September 21, 2016

That confirms there is some corruption. If you remove the -n option and try again, it should fix the issue.
matt_webb Posted September 21, 2016 (Author)

Thanks - this was the output:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which
needs to be replayed. Mount the filesystem to replay the log, and
unmount it before re-running xfs_repair. If you are unable to mount
the filesystem, then use the -L option to destroy the log and attempt
a repair. Note that destroying the log may cause corruption -- please
attempt a mount of the filesystem before doing this.
itimpi Posted September 21, 2016

Since you are unable to mount the drive, you will need to run with the -L option. That means there is a faint chance of the last few files written being lost, although in my experience that usually does not happen. When the repair completes, check whether a lost+found folder has been created containing any files that could not be properly identified.
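(The sequence xfs_repair expects, sketched for disk1; the device path assumes an unRAID array disk, and -L should only be used after a mount attempt has failed, since zeroing the log discards any un-replayed metadata updates.)

```shell
# 1. Preferred: mount the file system so the journal is replayed, then
#    unmount cleanly. In unRAID, starting the array in normal mode
#    attempts exactly this.
# 2. If the mount fails, zero the log and repair:
xfs_repair -L /dev/md1   # -L: destroy the log; last in-flight changes may be lost
```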
matt_webb Posted September 21, 2016 (Author)

Thanks again. This is the output from the repair with -L:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is
being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
Metadata corruption detected at xfs_agf block 0x1/0x200
flfirst 118 in agf 0 too large (max = 118)
agi unlinked bucket 9 is 362701321 in ag 0 (inode=362701321)
sb_ifree 12196, counted 11894
sb_fdblocks 41756354, counted 41734372
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 362701321, moving to lost+found
Phase 7 - verify and correct link counts...
Maximum metadata LSN (15:862229) is ahead of log (1:2).
Format log to cycle 18.
done

And this was the output when I then re-ran the read-only check (-n) to confirm:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

So I'm guessing I can take the array out of maintenance mode and bring it back up? Thanks again for your awesome help.

Cheers, Matt.
itimpi Posted September 21, 2016

All you need to do is stop the array and then restart it in normal mode. The disk should now mount just fine. Because you did the repair in Maintenance mode, parity will have been maintained.
matt_webb Posted September 21, 2016 (Author)

Looks like it's all back up. Went to the lost+found share and there's only one 0-byte file. I might go into my backups and see what file(s) were modified on that date. I'll also take the array offline to figure out whether the UPS is working and configured properly.

Thanks again itimpi!
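(A quick way to audit a lost+found folder after a repair is to list its zero-byte entries, since those usually carry no recoverable data. A sketch; the function name and the /mnt/disk1 path are illustrative, so adjust the disk number for your array.)

```shell
# List zero-byte files (likely unrecoverable orphans) under a lost+found dir.
scan_lostfound() {
    find "$1" -type f -size 0 -print 2>/dev/null
}

# Example on an unRAID array disk (path is an assumption):
scan_lostfound /mnt/disk1/lost+found || true
```

Non-empty entries can be identified by content with `file` and then matched back against backups by name or checksum.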