unmountable XFS data disk

ShaneH · January 22, 2019

OK, this does not look fun. The parity sync completed with 0 errors, but the 8 TB disk now has an unmountable file system.

Here we go:

--------------------------

Event: Unraid Disk 2 message
Subject: Notice [TOWER] - Disk 2 returned to normal operation
Description: ST8000AS0002-1NA17Z_Z840E3TF (sdd)

--------------------------

Event: Unraid Parity sync / Data rebuild
Subject: Notice [TOWER] - Parity sync / Data rebuild finished (0 errors)
Description: Duration: 14 hours, 40 minutes, 10 seconds. Average speed: 189.4 MB/s

--------------------------

trurl · January 22, 2019

Looks good. You will have to repair the filesystem on disk2 as expected.

Click on Disk 2 to get to its page. You should see a section called Check Filesystem Status. The button will be disabled, with a note telling you the array must be in Maintenance mode.

Stop the array, start it in Maintenance mode and go back to that page and click the button. Post your results.

ShaneH · January 22, 2019

I started the array in Maintenance mode.
I clicked the Check Filesystem button for disk2.

Results are:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used. Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
sb_fdblocks 1530287853, counted 1532948218
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
would have corrected directory 99 size from 95 to 89
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
would have corrected directory 99 size from 95 to 89
        - agno = 1
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
Metadata corruption detected at 0x44f20d, inode 0x63 data fork
couldn't map inode 99, err = 117
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected dir inode 2147483744, would move to lost+found
disconnected dir inode 8657675550, would move to lost+found
disconnected dir inode 10737418336, would move to lost+found
disconnected dir inode 10737856698, would move to lost+found
Phase 7 - verify link counts...
Metadata corruption detected at 0x44f20d, inode 0x63 data fork
couldn't map inode 99, err = 117, can't compare link counts
No modify flag set, skipping filesystem flush and exiting.
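
For anyone reading a dry-run report like the one above: inode 0x63 in the "Metadata corruption detected" lines is hexadecimal for 99, so those lines refer to the same directory inode as "couldn't map inode 99". As a rough aid, here is a minimal Python sketch that pulls the proposed changes out of pasted `xfs_repair -n` output. The function name `summarize_dry_run` is made up for illustration, and it only matches message wording that appears in this thread.

```python
import re

def summarize_dry_run(output):
    """Collect what an `xfs_repair -n` report says it would change.

    Hypothetical helper: it only recognises message wording seen in
    the output posted in this thread, not every xfs_repair message.
    """
    summary = {"would_fix": [], "disconnected_inodes": [], "log_dirty": False}
    for raw in output.splitlines():
        line = raw.strip()
        # Phases 3 and 4 can print the same "would have ..." message twice.
        if line.startswith("would have ") and line not in summary["would_fix"]:
            summary["would_fix"].append(line)
        match = re.match(
            r"disconnected dir inode (\d+), would move to lost\+found", line
        )
        if match:
            summary["disconnected_inodes"].append(int(match.group(1)))
        # The ALERT about the log means the journal has unreplayed changes.
        if "valuable metadata changes in a log" in line:
            summary["log_dirty"] = True
    return summary
```

Pasting the report into a string and calling `summarize_dry_run(report)` gives a short list of proposed fixes and the inodes that would be moved to lost+found, which is easier to compare against a later real repair run.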

JonathanM · January 23, 2019

I seem to remember some issues with the xfs_repair procedure in prior versions of Unraid. Since you are on an older version (6.6.3), maybe @johnnie.black or @trurl has a better memory of which versions were affected. I would hold tight where you are right now until somebody confirms it's safe to continue the repair on that specific Unraid version, or perhaps it would be better to upgrade Unraid first.

trurl · January 23, 2019

This bug report, "[6.6.6] XFSPROGS 4.16.X VERSION OF XFS_REPAIR HAS BUG IN PHASE6.C", is for 6.6.6, but the bug might have been in earlier versions as well. That report says it was solved in 6.7.0-rc1.

The linked threads in that report seem similar to this xfs_repair result we just got here.

So it looks like we have come full circle on this. To recap:

OP started this thread after he had removed the disk and repaired it in another system, thus invalidating parity. Then, through a misunderstanding, partially my fault, he rebuilt the disk, returning it to its original state. Now he has attempted the repair in Unraid and encountered what may be a bug in his version of Unraid that prevents the repair from completing.

I just reviewed this thread and the rest of his post history and found no indication that he repaired the disk on another system purposely so as to avoid this bug. But it looks like that might have been a valid approach after all. Of course then a parity sync would have been needed and that was mentioned early in this thread.

I guess at this point the way forward is to upgrade Unraid. Either that or go back and do it all again the "wrong way" on another system like he did before and resync parity.

I hesitate to make any firm recommendations at this point without other opinions. I'm just going to tag @johnnie.black again and see if he has other ideas.

JorgeB · January 23, 2019

8 hours ago, ShaneH said:

No modify flag set, skipping filesystem flush and exiting.

First thing would be to run xfs_repair without -n.
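
For reference, the GUI "options" box corresponds to the flags passed to xfs_repair. A minimal Python sketch of the two invocations discussed so far is below. The device name `/dev/md2` is an assumption for disk 2 with the array started in Maintenance mode, and `xfs_repair_cmd` is a made-up helper name; the sketch only builds and prints the command lines, it does not run them.

```python
import shlex

def xfs_repair_cmd(device, modify=False):
    """Build an xfs_repair command line as a list of arguments.

    modify=False keeps -n (no-modify mode: report problems only).
    modify=True drops -n so xfs_repair actually writes its fixes.
    """
    flags = [] if modify else ["-n"]
    return ["xfs_repair", *flags, device]

if __name__ == "__main__":
    device = "/dev/md2"  # assumption: disk 2 in Maintenance mode
    print(shlex.join(xfs_repair_cmd(device)))               # check only
    print(shlex.join(xfs_repair_cmd(device, modify=True)))  # real repair
```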

trurl · January 23, 2019

OK. I was afraid maybe this part in phase 6

13 hours ago, ShaneH said:

Phase 6 - check inode connectivity...
- traversing filesystem ...
Metadata corruption detected at 0x44f20d, inode 0x63 data fork
couldn't map inode 99, err = 117

was related to the bug

@ShaneH proceed

5 hours ago, johnnie.black said:

First thing would be to run xfs_repair without -n.

JorgeB · January 23, 2019

16 minutes ago, trurl said:

was related to the bug

Doesn't look like it is. If xfs_repair without -n fails, then the OP can try again after upgrading xfsprogs or Unraid.

ShaneH · January 23, 2019

OK, I went to Disk 2 and cleared the "-n" from the "options" box after the check.

I then ran the check again.

Output:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

itimpi · January 23, 2019

3 minutes ago, ShaneH said:

OK, I went to Disk 2 and cleared the "-n" from the "options" box after the check.

I then ran the check again.

Output:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

That is not at all unusual! You can run with the -L option. In the vast majority of cases there is no data loss at all, and even if there is, it is only likely to affect the last file written.
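
The choice described in that ERROR message can be written down as a small decision helper. This is only a sketch of the reasoning in the message itself, in Python; `suggest_next_step` and its return strings are invented for illustration, and `can_mount` is something you have to determine yourself (here the disk was unmountable, which is why -L is on the table).

```python
def suggest_next_step(repair_output, can_mount):
    """Suggest what to do after xfs_repair (run without -n) stops at the log.

    repair_output: text printed by xfs_repair.
    can_mount: whether the filesystem can currently be mounted.
    """
    # Collapse whitespace so the check works whether or not the
    # message was wrapped across several lines.
    text = " ".join(repair_output.split())
    if "log which needs to be replayed" not in text:
        return "no dirty-log error reported"
    if can_mount:
        # Mounting replays the log, which is the safe route.
        return "mount to replay the log, unmount, then re-run xfs_repair"
    # Last resort: -L destroys the log, so recent changes may be lost.
    return "re-run xfs_repair with -L"
```

For the output quoted above and a disk that will not mount, the helper returns the -L suggestion, matching the advice in this post.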

ShaneH · January 24, 2019

Hello, @trurl

Are we going to proceed with using the -L option? I don't want to discount itimpi but you have been a strong voice in this topic.

Thanks

trurl · January 24, 2019

1 hour ago, ShaneH said:

Are we going to proceed with using the -L option?

Yes

ShaneH · January 24, 2019

What is next?

Output:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
sb_fdblocks 1530287853, counted 1532948218
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
corrected directory 99 size, was 95, now 89
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 3
        - agno = 2
        - agno = 0
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:1995504) is ahead of log (1:2).
Format log to cycle 4.
done

trurl · January 24, 2019

Looks like it completed. Start the array in normal mode and see if the drive is mountable now.

ShaneH · January 25, 2019

The array is up. Disk 2 is mounted and things are looking way better.

Do we have any more disk integrity checks to perform?

What are the next steps?

trurl · January 25, 2019

Check the lost+found folder.

ShaneH · January 25, 2019

I went to the "main" page, then clicked on the folder icon (on the right) for disk2. I also checked the Unraid terminal.

I do not see a lost+found folder.

trurl · January 25, 2019

Nothing else to be done then. Have you got all your files back?

itimpi · January 25, 2019

Just now, ShaneH said:

I went to the "main" page, then clicked on the folder icon (on the right) for disk2. I also checked the Unraid terminal.

I do not see a lost+found folder.

That is probably a good sign! The lost+found folder is only created if the repair process found files whose names it could not correctly identify.
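
If you would rather confirm this from a script than by browsing, a minimal Python sketch follows. The mount point `/mnt/disk2` is an assumption for where disk 2 is mounted, and `lost_and_found_entries` is a made-up helper name.

```python
from pathlib import Path

def lost_and_found_entries(mount_point):
    """Return the names inside <mount_point>/lost+found.

    Returns None when the folder does not exist, which is the
    "good sign" case: the repair had nothing to reconnect.
    """
    folder = Path(mount_point) / "lost+found"
    if not folder.is_dir():
        return None
    return sorted(entry.name for entry in folder.iterdir())

if __name__ == "__main__":
    entries = lost_and_found_entries("/mnt/disk2")  # assumed mount point
    if entries is None:
        print("no lost+found folder")
    else:
        print(f"{len(entries)} recovered item(s): {entries}")
```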

ShaneH · January 25, 2019

Hello. All the apps are gone from "Dashboard". Dockers and VMs are gone. I can rebuild.

I am going to do a file compare to look for missing files. It is looking good at a quick glance.

I need to do a better check. (I was not really using the array until we were finished.)
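
One way such a file compare could be done is sketched below in Python. It lists files that exist under a backup folder but not under the matching array share, comparing relative paths only (not file contents or sizes). The function names and the example paths are hypothetical.

```python
import os

def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found

def missing_from_array(backup_root, array_root):
    """List files present in the backup but absent from the array share."""
    return sorted(relative_files(backup_root) - relative_files(array_root))

if __name__ == "__main__":
    # Hypothetical paths: adjust to the real backup and share locations.
    for path in missing_from_array("/mnt/backup/pictures", "/mnt/user/pictures"):
        print(path)
```

Because only names are compared, a file that exists on the array but was damaged would not be flagged; a checksum comparison would be needed for that.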

trurl · January 25, 2019

6 minutes ago, ShaneH said:

All the apps are gone from "Dashboard". Dockers and VMs are gone.

Looked at your diagnostics again. Unfortunately I can't tell from the diagnostics with that "older" version of Unraid exactly which disk(s) your system share was on. It is cache-prefer, but if you set these up before adding cache then probably they never got moved to cache and maybe they didn't survive the repair of disk2.

You can reinstall your dockers exactly as they were before using the Previous Apps feature on the Apps page.

ShaneH · January 28, 2019

Hello. The video files seem to have returned since the rebuild. I copied all pictures from a backup on top of the Unraid pictures share. Very few pictures were actually copied to the array. I am going to look at the documents folder, but that will be a slow process.
Things are looking very good with the array. I am going to upgrade my backup process.

Thank you very much for all of the help and your time.

trurl · January 28, 2019

👍
