paululibro Posted November 27, 2021

So one of my drives became unmounted and was showing "Unmountable: not mounted". I followed this guide and was able to mount the drive again. But now some of its data is missing and my server shows about 600GB more free space than before. Is it possible to rebuild that data from parity, or was it overwritten by xfs_repair?
trurl Posted November 28, 2021

Attach diagnostics to your NEXT post in this thread.
paululibro Posted November 28, 2021

Attachments: orion2-diagnostics-20211128-1028.zip
paululibro Posted November 28, 2021

So should I try to rebuild from parity? Is it even possible?
trurl Posted November 28, 2021

Haven't had a chance to look at the diagnostics yet, but thought I should respond before you do what you are thinking about doing.

18 minutes ago, paululibro said:
"So should I try to rebuild from parity? Is it even possible?"

No.
trurl Posted November 28, 2021

Disk3 seems to be disabled and unassigned, but I can't tell whether or not the emulated disk is unmountable since the array isn't started. Also, I can't tell what happened, since you rebooted before getting diagnostics and so the syslog was reset. Post a screenshot of Main - Array Devices.
trurl Posted November 28, 2021

2 minutes ago, trurl said:
"Disk3 seems to be disabled and unassigned"

Did you unassign disk3 yourself?
trurl Posted November 28, 2021

OK, after looking more closely at your syslog, it seems you had already started to rebuild disk3. It looks like emulated disk3 was mounting, though. Start the array with disk3 unassigned and post new diagnostics.
paululibro Posted November 29, 2021

Here is what happened:
1. My Disk 3 was showing up as "Unmountable: not mounted".
2. I ran xfs_repair on it and it was mounted again, but was missing over 600GB of data.
3. I unplugged it and replaced it with a brand new disk.
4. After boot, rebuilding from parity started automatically.
5. After it finished, the new Disk 3 is mounted but with zero data on it.

Now:
- I plugged the old Disk 3 back in, with the new Disk 3 still connected, and mounted the old Disk 3 with the "Unassigned Devices" plugin.
- When the old Disk 3 is mounted, it shows zero data on it. BUT when the old Disk 3 is mounted, the new Disk 3 shows up as "Unmountable: not mounted". When I unplugged the old Disk 3 and rebooted, the new Disk 3 is mounted again, but with zero data.

I'm confused, because the old Disk 3 should still have the remaining data. Should I start the array with the old or the new Disk 3 unassigned?
trurl Posted November 29, 2021

1 hour ago, paululibro said:
"new Disk 3 is mounted but with zero data on it"

Are you sure you didn't format?

1 hour ago, paululibro said:
"Should I start array with old or new Disk 3 unassigned?"

No disk assigned as disk3.
paululibro Posted November 29, 2021

6 hours ago, trurl said:
"Are you sure you didn't format?"

It was a brand new empty disk. When you said it was not possible to rebuild the data lost to xfs_repair, I was at least hoping it would have the rest of the data from the old Disk 3.

Here are diagnostics with the old Disk 3 (the one that should have data but shows empty) plugged in but unassigned, and the new Disk 3 not connected at all.

orion2-diagnostics-20211129-1104.zip
paululibro Posted November 29, 2021

Also, when I click on "File System Check" next to the unassigned old Disk 3, I'm getting these logs:

FS: xfs
Executing file system check: /sbin/xfs_repair -n /dev/sdg1 2>&1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x37fffffd0/0x1000
btree block 7/1 is suspect, error -74
bad magic # 0xa202020 in btbno block 7/1
Metadata corruption detected at 0x43cc88, xfs_cntbt block 0x37fffffd8/0x1000
btree block 7/2 is suspect, error -117
bad magic # 0x49414233 in btcnt block 7/2
agf_freeblks 74458438, counted 0 in ag 7
agf_longest 74458438, counted 0 in ag 7
Metadata CRC error detected at 0x46b78d, xfs_inobt block 0x37fffffe0/0x1000
btree block 7/3 is suspect, error -74
bad magic # 0x58444233 in inobt block 7/3
Metadata corruption detected at 0x4536d0, xfs_bnobt block 0x80000000/0x1000
btree block 1/1 is suspect, error -117
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0xfffffff8/0x1000
btree block 2/1 is suspect, error -74
bad magic # 0x64383a61 in btbno block 2/1
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x27fffffe0/0x1000
Metadata corruption detected at 0x4536d0, xfs_bnobt block 0x1ffffffe8/0x1000
btree block 4/1 is suspect, error -117
btree block 5/1 is suspect, error -74
bad magic # 0x205f5f5f in btbno block 5/1
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x2ffffffd8/0x1000
btree block 6/1 is suspect, error -74
bad magic # 0xa202020 in btbno block 6/1
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x17ffffff0/0x1000
btree block 3/1 is suspect, error -74
bad magic # 0x51380c72 in btbno block 3/1
Metadata corruption detected at 0x4536d0, xfs_cntbt block 0x80000008/0x1000
btree block 1/2 is suspect, error -117
Metadata CRC error detected at 0x43cfad, xfs_cntbt block 0x100000000/0x1000
btree block 2/2 is suspect, error -74
bad magic # 0xe12ea68c in btcnt block 2/2
Metadata corruption detected at 0x4536d0, xfs_cntbt block 0x1fffffff0/0x1000
btree block 4/2 is suspect, error -117
Metadata corruption detected at 0x4536d0, xfs_cntbt block 0x27fffffe8/0x1000
btree block 5/2 is suspect, error -117
Metadata CRC error detected at 0x43cfad, xfs_cntbt block 0x2ffffffe0/0x1000
btree block 6/2 is suspect, error -74
bad magic # 0x54686973 in btcnt block 6/2
Metadata CRC error detected at 0x43cfad, xfs_cntbt block 0x17ffffff8/0x1000
btree block 3/2 is suspect, error -74
bad magic # 0x17b63fbe in btcnt block 3/2
agf_freeblks 268435445, counted 750 in ag 1
agf_freeblks 268435445, counted 0 in ag 2
agf_longest 268435445, counted 0 in ag 2
agf_longest 268435445, counted 5 in ag 1
agf_freeblks 268435445, counted 567 in ag 5
agf_longest 268435445, counted 4 in ag 5
agf_freeblks 267913717, counted 1320 in ag 4
agf_longest 267913717, counted 9 in ag 4
agf_freeblks 268435445, counted 0 in ag 6
agf_longest 268435445, counted 0 in ag 6
agf_freeblks 268435445, counted 0 in ag 3
agf_longest 268435445, counted 0 in ag 3
Metadata corruption detected at 0x4536d0, xfs_inobt block 0x80000010/0x1000
btree block 1/3 is suspect, error -117
Metadata CRC error detected at 0x46b78d, xfs_inobt block 0x100000008/0x1000
Metadata corruption detected at 0x4536d0, xfs_inobt block 0x1fffffff8/0x1000
btree block 2/3 is suspect, error -74
bad magic # 0x5829dbd5 in inobt block 2/3
btree block 4/3 is suspect, error -117
Metadata corruption detected at 0x4536d0, xfs_inobt block 0x27ffffff0/0x1000
btree block 5/3 is suspect, error -117
Metadata CRC error detected at 0x46b78d, xfs_inobt block 0x180000000/0x1000
btree block 3/3 is suspect, error -74
bad magic # 0xd3a6ff15 in inobt block 3/3
Metadata CRC error detected at 0x46b78d, xfs_inobt block 0x2ffffffe8/0x1000
btree block 6/3 is suspect, error -74
bad magic # 0x546f7272 in inobt block 6/3
agi_count 0, counted 1728 in ag 1
agi_freecount 0, counted 191 in ag 1
agi_count 0, counted 2016 in ag 4
agi_freecount 0, counted 134 in ag 4
agi_count 0, counted 2336 in ag 5
agi_freecount 0, counted 102 in ag 5
sb_icount 64, counted 6144
sb_ifree 61, counted 488
sb_fdblocks 1952984849, counted 268438106
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
found inodes not in the inode allocation tree
found inodes not in the inode allocation tree
found inodes not in the inode allocation tree
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
free space (1,2952230-2952262) only seen by one free space btree
free space (1,2952264-2952332) only seen by one free space btree
free space (1,2952334-2952418) only seen by one free space btree
[[[ IT GOES FOR THE NEXT ~1400 lines but with different values ]]]
- check for inodes claiming duplicate blocks...
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 0
No modify flag set, skipping phase 5
Inode allocation btrees are too corrupted, skipping phases 6 and 7
Maximum metadata LSN (1610016023:-369629521) is ahead of log (1:58).
Would format log to cycle 1610016026.
No modify flag set, skipping filesystem flush and exiting.
paululibro Posted November 29, 2021

I've read multiple posts related to the "Unmountable: not mounted" error and I can't find anything similar to my issue. For example:
- What happened to the data on the original Disk 3? Is it still there but inaccessible due to filesystem errors?
- Why wasn't the content of the original Disk 3 rebuilt onto the new Disk 3, even though parity was valid and the rebuild finished with 0 errors?
Squid Posted November 29, 2021

4 minutes ago, paululibro said:
"What happened to the data on the original Disk 3? Is it still there but inaccessible due to filesystem errors?"

Yes.

6 minutes ago, paululibro said:
"Why wasn't the content of the original Disk 3 rebuilt onto the new Disk 3, even though parity was valid and the rebuild finished with 0 errors?"

Based upon the last set of diagnostics you posted, it really looks like at some point during all of this, when you started the array and the disk came up as unmountable, you hit the checkbox for format and acknowledged the pop-up stating that formatting is never part of a rebuild operation.

/dev/md3 7.3T 52G 7.3T 1% /mnt/disk3

So, in a nutshell, the system did what was asked and formatted the drive (or the emulated version of it), and subsequently rebuilt a blank filesystem. Your option right now is to fix the errors on the old disk 3 and then copy the data back into the array.
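[For readers following this thread: once the old disk's filesystem is repaired and it mounts with data visible, the "copy back into the array" step above can be sketched as below. This is a hedged sketch, not an official Unraid procedure: the source path is the Unassigned Devices mount point that appears later in this thread, and the RUN=echo guard turns it into a dry run that only prints the command. Clear RUN to actually copy.]

```shell
# Sketch of the copy-back step. Assumptions: the old disk 3 is mounted by
# Unassigned Devices at SRC (path taken from this thread's df output), and
# the rebuilt, empty disk 3 is mounted at /mnt/disk3.
SRC=/mnt/disks/WDC_WD80EZAZ-11TDBA0_2SG425ZW/   # old disk 3 (UD mount)
DST=/mnt/disk3/                                 # rebuilt array disk 3

RUN=echo   # dry run: prefixing with echo prints the command instead of running it

# -a preserves permissions/ownership/times, -v lists files, -X keeps xattrs.
# The trailing slash on SRC copies the *contents* of the mount, not the dir itself.
COPY_CMD=$($RUN rsync -avX "$SRC" "$DST")
echo "$COPY_CMD"
```

With RUN=echo this only prints `rsync -avX /mnt/disks/WDC_WD80EZAZ-11TDBA0_2SG425ZW/ /mnt/disk3/`; remove the guard once you have verified both paths.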
trurl Posted November 29, 2021

12 hours ago, trurl said:
"Are you sure you didn't format?"

6 hours ago, paululibro said:
"It was a brand new empty disk."

You didn't answer my question.
paululibro Posted November 29, 2021

Quote: "Are you sure you didn't format?"

I'm not 100% sure, but I don't think so. I connected the new disk, assigned it to Disk 3, and started the array. I got a message that a replacement disk was found, that disk 3 was not ready, and that a Parity-Sync/Data-Rebuild was in progress.
paululibro Posted November 29, 2021

1 hour ago, Squid said:
"Your option right now is to fix the errors on the old disk 3 and then copy the data back into the array."

So how do I proceed? Currently the new Disk 3 is mounted and empty, and the original Disk 3 is unassigned and mounted at /dev/sdg. Checking the file system gives the same logs as posted before. Do I just use xfs_repair but point it at /dev/sdg1?
trurl Posted November 29, 2021

3 minutes ago, paululibro said:
"Checking the file system gives the same logs as posted before. Do I just use xfs_repair but point it at /dev/sdg1?"

The check results you posted before were already using /dev/sdg1.

6 hours ago, paululibro said:
"No modify flag set"

You just need to remove the -n (no modify) flag.
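[For readers following this thread: the check-then-repair sequence described above can be sketched as below. The device path is the one from this thread's diagnostics. Since xfs_repair writes to the device once -n is dropped, the RUN=echo guard keeps this as a dry run that only prints the commands; clear RUN to execute them for real (with the filesystem unmounted).]

```shell
DEV=/dev/sdg1   # old disk 3's partition, per the diagnostics in this thread
RUN=echo        # dry run: print the commands instead of executing them

# Read-only check: -n reports problems but modifies nothing on disk.
CHECK_CMD=$($RUN xfs_repair -n "$DEV")

# Actual repair: the same command without -n. Files whose directory
# entries were lost end up in lost+found at the root of the filesystem.
REPAIR_CMD=$($RUN xfs_repair "$DEV")

echo "$CHECK_CMD"
echo "$REPAIR_CMD"
```

If the real run refuses because of a dirty log, xfs_repair suggests -L, which zeroes the log at the cost of any in-flight metadata updates.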
paululibro Posted November 30, 2021

Quoting Squid: "Your option right now is to fix the errors on the old disk 3 and then copy the data back into the array."

I ran xfs_repair:

root@Orion2:~# xfs_repair /dev/sdg1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
Metadata corruption detected at 0x4536d0, xfs_bnobt block 0x80000000/0x1000
btree block 1/1 is suspect, error -117
Metadata corruption detected at 0x4536d0, xfs_cntbt block 0x80000008/0x1000
btree block 1/2 is suspect, error -117
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0xfffffff8/0x1000
btree block 2/1 is suspect, error -74
bad magic # 0x64383a61 in btbno block 2/1
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x17ffffff0/0x1000
btree block 3/1 is suspect, error -74
bad magic # 0x51380c72 in btbno block 3/1
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x37fffffd0/0x1000
btree block 7/1 is suspect, error -74
bad magic # 0xa202020 in btbno block 7/1
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x27fffffe0/0x1000
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x2ffffffd8/0x1000
btree block 6/1 is suspect, error -74
bad magic # 0xa202020 in btbno block 6/1
btree block 5/1 is suspect, error -74
Metadata corruption detected at 0x4536d0, xfs_bnobt block 0x1ffffffe8/0x1000
bad magic # 0x205f5f5f in btbno block 5/1
btree block 4/1 is suspect, error -117
agf_freeblks 268435445, counted 750 in ag 1
agf_longest 268435445, counted 5 in ag 1
Metadata CRC error detected at 0x43cfad, xfs_cntbt block 0x100000000/0x1000
btree block 2/2 is suspect, error -74
bad magic # 0xe12ea68c in btcnt block 2/2
Metadata corruption detected at 0x4536d0, xfs_cntbt block 0x1fffffff0/0x1000
btree block 4/2 is suspect, error -117
Metadata CRC error detected at 0x43cfad, xfs_cntbt block 0x17ffffff8/0x1000
btree block 3/2 is suspect, error -74
bad magic # 0x17b63fbe in btcnt block 3/2
Metadata CRC error detected at 0x43cfad, xfs_cntbt block 0x2ffffffe0/0x1000
btree block 6/2 is suspect, error -74
bad magic # 0x54686973 in btcnt block 6/2
Metadata corruption detected at 0x4536d0, xfs_cntbt block 0x27fffffe8/0x1000
btree block 5/2 is suspect, error -117
Metadata corruption detected at 0x43cc88, xfs_cntbt block 0x37fffffd8/0x1000
btree block 7/2 is suspect, error -117
bad magic # 0x49414233 in btcnt block 7/2
Metadata corruption detected at 0x4536d0, xfs_inobt block 0x80000010/0x1000
btree block 1/3 is suspect, error -117
agf_freeblks 268435445, counted 0 in ag 2
agf_longest 268435445, counted 0 in ag 2
agf_freeblks 268435445, counted 0 in ag 3
agf_longest 268435445, counted 0 in ag 3
agf_freeblks 267913717, counted 1320 in ag 4
agf_longest 267913717, counted 9 in ag 4
agf_freeblks 268435445, counted 567 in ag 5
agf_longest 268435445, counted 4 in ag 5
agf_freeblks 268435445, counted 0 in ag 6
agf_longest 268435445, counted 0 in ag 6
agf_freeblks 74458438, counted 0 in ag 7
agf_longest 74458438, counted 0 in ag 7
Metadata CRC error detected at 0x46b78d, xfs_inobt block 0x100000008/0x1000
btree block 2/3 is suspect, error -74
bad magic # 0x5829dbd5 in inobt block 2/3
Metadata corruption detected at 0x4536d0, xfs_inobt block 0x27ffffff0/0x1000
Metadata corruption detected at 0x4536d0, xfs_inobt block 0x1fffffff8/0x1000
btree block 4/3 is suspect, error -117
btree block 5/3 is suspect, error -117
Metadata CRC error detected at 0x46b78d, xfs_inobt block 0x180000000/0x1000
btree block 3/3 is suspect, error -74
bad magic # 0xd3a6ff15 in inobt block 3/3
Metadata CRC error detected at 0x46b78d, xfs_inobt block 0x37fffffe0/0x1000
Metadata CRC error detected at 0x46b78d, xfs_inobt block 0x2ffffffe8/0x1000
btree block 7/3 is suspect, error -74
btree block 6/3 is suspect, error -74
bad magic # 0x58444233 in inobt block 7/3
bad magic # 0x546f7272 in inobt block 6/3
agi_count 0, counted 1728 in ag 1
agi_freecount 0, counted 191 in ag 1
agi_count 0, counted 2016 in ag 4
agi_freecount 0, counted 134 in ag 4
agi_count 0, counted 2336 in ag 5
agi_freecount 0, counted 102 in ag 5
sb_icount 64, counted 6144
sb_ifree 61, counted 488
sb_fdblocks 1952984849, counted 268438106
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
found inodes not in the inode allocation tree
found inodes not in the inode allocation tree
found inodes not in the inode allocation tree
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 2
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1610016023:-369629521) is ahead of log (1:64).
Format log to cycle 1610016026.
done

But the original drive still shows up as empty.

Also, when the old disk is mounted, the new disk is unmountable.

And now there is also a "Format" button next to the old drive. If it's the one we talked about earlier, then I'm 100% sure I didn't format.

I started the array in maintenance mode and checked both drives.

Original drive:

FS: xfs
Executing file system check: /sbin/xfs_repair -n /dev/sdg1 2>&1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 6
- agno = 7
- agno = 4
- agno = 5
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

New drive:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

So what do I do now?

orion2-diagnostics-20211130-1115.zip
trurl Posted November 30, 2021 Share Posted November 30, 2021 Nov 29 17:54:22 Orion2 kernel: XFS (sdg1): Filesystem has duplicate UUID e2cc6cf0-db52-44ea-8bef-9974b89a834f - can't mount Nov 30 10:29:35 Orion2 kernel: XFS (md3): Filesystem has duplicate UUID e2cc6cf0-db52-44ea-8bef-9974b89a834f - can't mount The unassigned disk and disk3 have the same uuid. You will have to change the uuid of the unassigned disk. Click on the Settings icon for the unassigned disk. Quote Link to comment
paululibro Posted November 30, 2021

I'm trying to generate a new UUID, but I'm getting a timeout error in the syslog:

Nov 30 22:24:29 Orion2 unassigned.devices: Error: shell_exec(/usr/sbin/xfs_admin -U generate '/dev/sdg1') took longer than 20s!
Nov 30 22:24:29 Orion2 unassigned.devices: Changed partition UUID on '/dev/sdg1' with result: command timed out

After clicking "Change UUID" again, I'm getting this:

Nov 30 22:24:47 Orion2 unassigned.devices: Changed partition UUID on '/dev/sdg1' with result: ERROR: cannot find log head/tail, run xfs_repair

So I ran xfs_repair with the required -L flag and tried to generate again, but it's back to the timeout error and the head/tail error. I'm doing this in maintenance mode.

orion2-diagnostics-20211130-2231.zip
paululibro Posted November 30, 2021

I tried to generate it manually:

root@Orion2:~# xfs_admin -U generate /dev/sdg1
Clearing log and setting UUID
writing all SBs
new UUID = ab1f65dd-e188-4c67-afc6-9f89fc139e93

And now both disks are mounted, but the original disk still shows up as empty:

root@Orion2:/mnt/disks/WDC_WD80EZAZ-11TDBA0_2SG425ZW# df
Filesystem     1K-blocks       Used  Available Use% Mounted on
/dev/sdg1     7811939620   54499088 7757440532   1% /mnt/disks/WDC_WD80EZAZ-11TDBA0_2SG425ZW
/dev/md1      7811939620 7807708212    4231408 100% /mnt/disk1
/dev/md2      7811939620 7802174388    9765232 100% /mnt/disk2
/dev/md3      7811939620   54499088 7757440532   1% /mnt/disk3
/dev/md4      7811939620 7799822864   12116756 100% /mnt/disk4
/dev/md5      7811939620 7791383328   20556292 100% /mnt/disk5
/dev/sdc1      500107576  126153784  372420104  26% /mnt/cache
trurl Posted December 1, 2021

The first diagnostics you posted were after a reboot, so I can't see what you might have done before that, but it really seems like you must have formatted the original disk before replacing it.
paululibro Posted December 2, 2021

Well, that's unfortunate. Thanks for all your help anyway.