matt_webb Posted September 21, 2016

Hi all,

Sorry, there's a bit of a story here; I don't know how much of it is relevant, but just in case...

I've got an issue with my server. A few weeks ago I received an email saying an "unclean shutdown was detected". I'm on a UPS, but it's an Eaton, so my NUT settings must be wrong.

Otherwise the server was running happily until I upgraded to 6.2 (though I think this may be coincidence). I had a few server freezes, especially when streaming from Plex, which required me to hard-power-off and reboot the server. This happened a few times; even the console was frozen.

I then uninstalled all plugins and dockers. That restored the stability I had been used to, though I did get a few parity warnings after a parity check.

So a few days ago I started putting a few dockers back in. The Plex server seemed happy streaming for two days, until today. When I got home Plex wasn't running, so I had a look, and the disk holding a number of files and the docker is being reported as "Unmountable".

I'm not totally sure what to do next, so please point me in the right direction. Attached are the logs.

Thanks in advance.

Cheers, Matt.

familyserver-diagnostics-20160921-1939.zip
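(For the UPS side of this: most USB-connected Eaton units work with NUT's generic usbhid-ups driver. A minimal ups.conf sketch; the section name and description are placeholders, and this assumes a USB model — serial or network-card models need a different driver.)

```ini
; /etc/nut/ups.conf - minimal entry for a USB-attached Eaton UPS
[eaton]
    driver = usbhid-ups   ; generic USB HID driver; covers most Eaton USB models
    port = auto           ; usbhid-ups locates the device itself
    desc = "Eaton UPS"
```

After editing, `upsdrvctl start` followed by `upsc eaton` should report battery status if the driver matches the unit.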
itimpi Posted September 21, 2016

Looking at the syslog I think you have file-system-level corruption on disk1. To check for this and correct it, stop the array, restart it in Maintenance mode, and then click on disk1 to get to the dialog for running the file system check.
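(For reference, the GUI check is equivalent to running xfs_repair in read-only mode from the console. A sketch, assuming disk1 is an XFS array disk; unRAID exposes array disks as /dev/md1, /dev/md2, ... so that parity stays in sync, and running against the raw /dev/sdX device instead would invalidate parity.)

```shell
# Read-only check of disk1's file system (array must be started
# in Maintenance mode so the device exists but is not mounted).
xfs_repair -n /dev/md1   # -n = no modify: report problems, change nothing
```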
matt_webb Posted September 21, 2016 (Author)

Hi,

Thanks for your analysis and guidance.

- Stopped array
- Checked "Maintenance mode"
- Clicked Start (now says "Started - Maintenance Mode")

The "Check Filesystem Status" section looks like the attached pic. Should I go ahead and click "Check"?

Thanks again.

Cheers, Matt.
matt_webb Posted September 21, 2016 (Author)

OK, noticed that's a read-only task and your reply says it needs a FS check, so went ahead. Here's the output. Thanks.

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
Metadata corruption detected at xfs_agf block 0x1/0x200
flfirst 118 in agf 0 too large (max = 118)
agf 118 freelist blocks bad, skipping freelist scan
agi unlinked bucket 9 is 362701321 in ag 0 (inode=362701321)
sb_ifree 12196, counted 11894
sb_fdblocks 41756354, counted 41734366
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 362701321, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 362701321 nlinks from 0 to 1
No modify flag set, skipping filesystem flush and exiting.
itimpi Posted September 21, 2016

That confirms there is some corruption. If you remove the -n option and try again, it should fix the issue.
matt_webb Posted September 21, 2016 (Author)

Thanks - this was the output:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which
needs to be replayed. Mount the filesystem to replay the log, and
unmount it before re-running xfs_repair. If you are unable to mount
the filesystem, then use the -L option to destroy the log and attempt
a repair. Note that destroying the log may cause corruption -- please
attempt a mount of the filesystem before doing this.
itimpi Posted September 21, 2016

Since you are unable to mount the drive, you will need to run with the -L option. That means there is a faint chance of the last few files written being lost, although in my experience that usually does not happen. When the repair completes, check whether a lost+found folder has been created containing any files that could not be properly identified.
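(The sequence xfs_repair expects, sketched for disk1; the device path assumes an unRAID array disk, and -L should only be used after a mount attempt has failed, since zeroing the log discards any un-replayed metadata updates.)

```shell
# 1. Preferred: mount the file system so the journal is replayed, then
#    unmount cleanly. In unRAID, starting the array in normal mode
#    attempts exactly this.
# 2. If the mount fails, zero the log and repair:
xfs_repair -L /dev/md1   # -L: destroy the log; last in-flight changes may be lost
```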
matt_webb Posted September 21, 2016 (Author)

Thanks again. This is the output from the repair with -L:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is
being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
Metadata corruption detected at xfs_agf block 0x1/0x200
flfirst 118 in agf 0 too large (max = 118)
agi unlinked bucket 9 is 362701321 in ag 0 (inode=362701321)
sb_ifree 12196, counted 11894
sb_fdblocks 41756354, counted 41734372
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 362701321, moving to lost+found
Phase 7 - verify and correct link counts...
Maximum metadata LSN (15:862229) is ahead of log (1:2).
Format log to cycle 18.
done

And this was the output when I then re-ran the read-only check (-n) to confirm:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

So I'm guessing I can take the array out of maintenance mode and bring it back up? Thanks again for your awesome help.

Cheers, Matt.
itimpi Posted September 21, 2016

All you need to do is stop the array and then restart it in normal mode. The disk should now mount just fine. Because you did the repair in Maintenance mode, parity will have been maintained.
matt_webb Posted September 21, 2016 (Author)

Looks like it's all back up. Went to the lost+found share and there's only one 0-byte file. I might go into my backups and see what file(s) were modified on that date. I'll also take the array offline to figure out whether the UPS is working and configured properly.

Thanks again itimpi!
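(A quick way to audit a lost+found folder after a repair is to list its zero-byte entries, since those usually carry no recoverable data. A sketch; the function name and the /mnt/disk1 path are illustrative, so adjust the disk number for your array.)

```shell
# List zero-byte files (likely unrecoverable orphans) under a lost+found dir.
scan_lostfound() {
    find "$1" -type f -size 0 -print 2>/dev/null
}

# Example on an unRAID array disk (path is an assumption):
scan_lostfound /mnt/disk1/lost+found || true
```

Non-empty entries can be identified by content with `file` and then matched back against backups by name or checksum.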