Unmountable drive fixed with xfs_repair. Now over 600GB of data is missing



Here is what happened:

1. My Disk 3 was showing up as "Unmountable: not mounted"

2. I ran xfs_repair on it and it mounted again, but over 600GB of data was missing

3. I unplugged it and replaced it with a brand new disk

4. After boot, rebuilding from parity started automatically

5. After it finished, the new Disk 3 was mounted but with zero data on it

 

Now:

- I plugged the old Disk 3 back in, with the new Disk 3 still connected, and mounted the old Disk 3 with the "Unassigned Devices" plugin

- When the old Disk 3 is mounted it shows zero data on it, BUT while it is mounted the new Disk 3 shows up as "Unmountable: not mounted". When I unplug the old Disk 3 and reboot, the new Disk 3 mounts again, but with zero data

 

I'm confused, because the old Disk 3 should still have the remaining data.

 

Should I start the array with the old or the new Disk 3 unassigned?

Edited by paululibro
Link to comment
6 hours ago, trurl said:

Are you sure you didn't format?

It was a brand new, empty disk. When you said it was not possible to rebuild the data lost during the xfs_repair, I was at least hoping it would have the rest of the data from the old Disk 3.

 

Here are diagnostics with the old Disk 3 (the one that should have data but shows empty) plugged in but unassigned, and the new Disk 3 not connected at all.

orion2-diagnostics-20211129-1104.zip

Link to comment

Also, when I click on "File System Check" next to the unassigned old Disk 3:

 

[screenshot]

 

I'm getting these logs:

 

Spoiler
FS: xfs

Executing file system check: /sbin/xfs_repair -n /dev/sdg1 2>&1

Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x37fffffd0/0x1000
btree block 7/1 is suspect, error -74
bad magic # 0xa202020 in btbno block 7/1
Metadata corruption detected at 0x43cc88, xfs_cntbt block 0x37fffffd8/0x1000
btree block 7/2 is suspect, error -117
bad magic # 0x49414233 in btcnt block 7/2
agf_freeblks 74458438, counted 0 in ag 7
agf_longest 74458438, counted 0 in ag 7
Metadata CRC error detected at 0x46b78d, xfs_inobt block 0x37fffffe0/0x1000
btree block 7/3 is suspect, error -74
bad magic # 0x58444233 in inobt block 7/3
Metadata corruption detected at 0x4536d0, xfs_bnobt block 0x80000000/0x1000
btree block 1/1 is suspect, error -117
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0xfffffff8/0x1000
btree block 2/1 is suspect, error -74
bad magic # 0x64383a61 in btbno block 2/1
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x27fffffe0/0x1000
Metadata corruption detected at 0x4536d0, xfs_bnobt block 0x1ffffffe8/0x1000
btree block 4/1 is suspect, error -117

btree block 5/1 is suspect, error -74
bad magic # 0x205f5f5f in btbno block 5/1
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x2ffffffd8/0x1000
btree block 6/1 is suspect, error -74
bad magic # 0xa202020 in btbno block 6/1
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x17ffffff0/0x1000
btree block 3/1 is suspect, error -74
bad magic # 0x51380c72 in btbno block 3/1
Metadata corruption detected at 0x4536d0, xfs_cntbt block 0x80000008/0x1000
btree block 1/2 is suspect, error -117
Metadata CRC error detected at 0x43cfad, xfs_cntbt block 0x100000000/0x1000
btree block 2/2 is suspect, error -74
bad magic # 0xe12ea68c in btcnt block 2/2
Metadata corruption detected at 0x4536d0, xfs_cntbt block 0x1fffffff0/0x1000
btree block 4/2 is suspect, error -117
Metadata corruption detected at 0x4536d0, xfs_cntbt block 0x27fffffe8/0x1000
btree block 5/2 is suspect, error -117
Metadata CRC error detected at 0x43cfad, xfs_cntbt block 0x2ffffffe0/0x1000
btree block 6/2 is suspect, error -74
bad magic # 0x54686973 in btcnt block 6/2
Metadata CRC error detected at 0x43cfad, xfs_cntbt block 0x17ffffff8/0x1000
btree block 3/2 is suspect, error -74
bad magic # 0x17b63fbe in btcnt block 3/2
agf_freeblks 268435445, counted 750 in ag 1
agf_freeblks 268435445, counted 0 in ag 2
agf_longest 268435445, counted 0 in ag 2
agf_longest 268435445, counted 5 in ag 1
agf_freeblks 268435445, counted 567 in ag 5
agf_longest 268435445, counted 4 in ag 5
agf_freeblks 267913717, counted 1320 in ag 4
agf_longest 267913717, counted 9 in ag 4
agf_freeblks 268435445, counted 0 in ag 6
agf_longest 268435445, counted 0 in ag 6
agf_freeblks 268435445, counted 0 in ag 3
agf_longest 268435445, counted 0 in ag 3
Metadata corruption detected at 0x4536d0, xfs_inobt block 0x80000010/0x1000
btree block 1/3 is suspect, error -117
Metadata CRC error detected at 0x46b78d, xfs_inobt block 0x100000008/0x1000
Metadata corruption detected at 0x4536d0, xfs_inobt block 0x1fffffff8/0x1000
btree block 2/3 is suspect, error -74
bad magic # 0x5829dbd5 in inobt block 2/3

btree block 4/3 is suspect, error -117
Metadata corruption detected at 0x4536d0, xfs_inobt block 0x27ffffff0/0x1000
btree block 5/3 is suspect, error -117
Metadata CRC error detected at 0x46b78d, xfs_inobt block 0x180000000/0x1000
btree block 3/3 is suspect, error -74
bad magic # 0xd3a6ff15 in inobt block 3/3
Metadata CRC error detected at 0x46b78d, xfs_inobt block 0x2ffffffe8/0x1000
btree block 6/3 is suspect, error -74
bad magic # 0x546f7272 in inobt block 6/3
agi_count 0, counted 1728 in ag 1
agi_freecount 0, counted 191 in ag 1
agi_count 0, counted 2016 in ag 4
agi_freecount 0, counted 134 in ag 4
agi_count 0, counted 2336 in ag 5
agi_freecount 0, counted 102 in ag 5
sb_icount 64, counted 6144
sb_ifree 61, counted 488
sb_fdblocks 1952984849, counted 268438106
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
found inodes not in the inode allocation tree
found inodes not in the inode allocation tree
found inodes not in the inode allocation tree
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
free space (1,2952230-2952262) only seen by one free space btree
free space (1,2952264-2952332) only seen by one free space btree
free space (1,2952334-2952418) only seen by one free space btree
[[[ IT GOES FOR THE NEXT ~1400 lines but with different values ]]]

- check for inodes claiming duplicate blocks...
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 0
No modify flag set, skipping phase 5
Inode allocation btrees are too corrupted, skipping phases 6 and 7
Maximum metadata LSN (1610016023:-369629521) is ahead of log (1:58).
Would format log to cycle 1610016026.
No modify flag set, skipping filesystem flush and exiting.

 

 

Edited by paululibro
Link to comment

I read multiple posts related to the "Unmountable: not mounted" error and I can't find anything similar to my issue. For example:

- what happened to the data on the original Disk 3? Is it still there but can’t be accessed due to fs errors?

- and why wasn't the content of the original Disk 3 rebuilt onto the new Disk 3, even though parity was valid and the rebuild finished with 0 errors?

Link to comment
4 minutes ago, paululibro said:

what happened to the data on the original Disk 3? Is it still there but can't be accessed due to fs errors?

Yes

6 minutes ago, paululibro said:

- and why wasn't the content of the original Disk 3 rebuilt onto the new Disk 3, even though parity was valid and the rebuild finished with 0 errors?

Based upon the last set of diagnostics you posted, it really looks like at some point during all of this, when you started the array and the disk came up as unmountable, you hit the check box for format and acknowledged the pop-up stating that formatting is never part of a rebuild operation:

/dev/md3        7.3T   52G  7.3T   1% /mnt/disk3

So, in a nutshell, the system did what was asked and formatted the drive (or the emulated version of it), and the rebuild subsequently produced a blank filesystem.

 

So, your option right now is to fix the errors on the old Disk 3 and then copy its data back into the array.
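
Roughly, once the old disk mounts under Unassigned Devices, something along these lines (the source mount point below is only an example, use whatever mount point UD actually gives that disk):

# copy everything from the old disk back onto the rebuilt disk 3
rsync -av /mnt/disks/OLD_DISK3/ /mnt/disk3/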

Link to comment
Quote (trurl):

Are you sure you didn’t format?


I'm not 100% sure, but I don't think so. I connected the new disk, assigned it to Disk 3, and started the array. I got a message that a replacement disk was found, that Disk 3 was not ready, and that a Parity-Sync/Data-Rebuild was in progress.

Edited by paululibro
Link to comment
1 hour ago, Squid said:

So, your option right now is to fix the errors on the old Disk 3 and then copy its data back into the array.


So how do I proceed? Currently the new Disk 3 is mounted and empty, and the original Disk 3 is unassigned and shows up as /dev/sdg. Checking the file system gives the same logs as posted before. Do I just run xfs_repair but point it at /dev/sdg1?
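
Something like this, I assume (with /dev/sdg1 being the old disk's partition):

# read-only check first, then the actual repair on the unassigned old disk
xfs_repair -n /dev/sdg1
xfs_repair /dev/sdg1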

Link to comment
Quote (Squid):

Quote (paululibro): what happened to the data on the original Disk 3? Is it still there but can't be accessed due to fs errors?

Yes

So, your option right now is to fix the errors on the old Disk 3 and then copy its data back into the array.

 

 

I ran xfs_repair:

Spoiler
root@Orion2:~# xfs_repair /dev/sdg1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
Metadata corruption detected at 0x4536d0, xfs_bnobt block 0x80000000/0x1000
btree block 1/1 is suspect, error -117
Metadata corruption detected at 0x4536d0, xfs_cntbt block 0x80000008/0x1000
btree block 1/2 is suspect, error -117
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0xfffffff8/0x1000
btree block 2/1 is suspect, error -74
bad magic # 0x64383a61 in btbno block 2/1
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x17ffffff0/0x1000
btree block 3/1 is suspect, error -74
bad magic # 0x51380c72 in btbno block 3/1
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x37fffffd0/0x1000
btree block 7/1 is suspect, error -74
bad magic # 0xa202020 in btbno block 7/1
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x27fffffe0/0x1000
Metadata CRC error detected at 0x43cfad, xfs_bnobt block 0x2ffffffd8/0x1000
btree block 6/1 is suspect, error -74
bad magic # 0xa202020 in btbno block 6/1
btree block 5/1 is suspect, error -74
Metadata corruption detected at 0x4536d0, xfs_bnobt block 0x1ffffffe8/0x1000
bad magic # 0x205f5f5f in btbno block 5/1
btree block 4/1 is suspect, error -117
agf_freeblks 268435445, counted 750 in ag 1
agf_longest 268435445, counted 5 in ag 1
Metadata CRC error detected at 0x43cfad, xfs_cntbt block 0x100000000/0x1000
btree block 2/2 is suspect, error -74
bad magic # 0xe12ea68c in btcnt block 2/2
Metadata corruption detected at 0x4536d0, xfs_cntbt block 0x1fffffff0/0x1000
btree block 4/2 is suspect, error -117
Metadata CRC error detected at 0x43cfad, xfs_cntbt block 0x17ffffff8/0x1000
btree block 3/2 is suspect, error -74
bad magic # 0x17b63fbe in btcnt block 3/2
Metadata CRC error detected at 0x43cfad, xfs_cntbt block 0x2ffffffe0/0x1000
btree block 6/2 is suspect, error -74
bad magic # 0x54686973 in btcnt block 6/2
Metadata corruption detected at 0x4536d0, xfs_cntbt block 0x27fffffe8/0x1000
btree block 5/2 is suspect, error -117
Metadata corruption detected at 0x43cc88, xfs_cntbt block 0x37fffffd8/0x1000
btree block 7/2 is suspect, error -117
bad magic # 0x49414233 in btcnt block 7/2
Metadata corruption detected at 0x4536d0, xfs_inobt block 0x80000010/0x1000
btree block 1/3 is suspect, error -117
agf_freeblks 268435445, counted 0 in ag 2
agf_longest 268435445, counted 0 in ag 2
agf_freeblks 268435445, counted 0 in ag 3
agf_longest 268435445, counted 0 in ag 3
agf_freeblks 267913717, counted 1320 in ag 4
agf_longest 267913717, counted 9 in ag 4
agf_freeblks 268435445, counted 567 in ag 5
agf_longest 268435445, counted 4 in ag 5
agf_freeblks 268435445, counted 0 in ag 6
agf_longest 268435445, counted 0 in ag 6
agf_freeblks 74458438, counted 0 in ag 7
agf_longest 74458438, counted 0 in ag 7
Metadata CRC error detected at 0x46b78d, xfs_inobt block 0x100000008/0x1000
btree block 2/3 is suspect, error -74
bad magic # 0x5829dbd5 in inobt block 2/3
Metadata corruption detected at 0x4536d0, xfs_inobt block 0x27ffffff0/0x1000
Metadata corruption detected at 0x4536d0, xfs_inobt block 0x1fffffff8/0x1000
btree block 4/3 is suspect, error -117
btree block 5/3 is suspect, error -117
Metadata CRC error detected at 0x46b78d, xfs_inobt block 0x180000000/0x1000
btree block 3/3 is suspect, error -74
bad magic # 0xd3a6ff15 in inobt block 3/3
Metadata CRC error detected at 0x46b78d, xfs_inobt block 0x37fffffe0/0x1000
Metadata CRC error detected at 0x46b78d, xfs_inobt block 0x2ffffffe8/0x1000
btree block 7/3 is suspect, error -74
btree block 6/3 is suspect, error -74
bad magic # 0x58444233 in inobt block 7/3
bad magic # 0x546f7272 in inobt block 6/3
agi_count 0, counted 1728 in ag 1
agi_freecount 0, counted 191 in ag 1
agi_count 0, counted 2016 in ag 4
agi_freecount 0, counted 134 in ag 4
agi_count 0, counted 2336 in ag 5
agi_freecount 0, counted 102 in ag 5
sb_icount 64, counted 6144
sb_ifree 61, counted 488
sb_fdblocks 1952984849, counted 268438106
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
found inodes not in the inode allocation tree
found inodes not in the inode allocation tree
found inodes not in the inode allocation tree
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 2
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1610016023:-369629521) is ahead of log (1:64).
Format log to cycle 1610016026.
done

 

But the original drive still shows up empty:

[screenshot]

 

Also, when the old disk is mounted, the new disk is unmountable:

[screenshot]

 

And now there is also a "Format" button next to the old drive. If that's the one we talked about earlier, then I'm 100% sure I didn't format:

[screenshot]

 

I started the array in maintenance mode and checked both drives:

Original drive:

Spoiler
FS: xfs

Executing file system check: /sbin/xfs_repair -n /dev/sdg1 2>&1

Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 6
- agno = 7
- agno = 4
- agno = 5
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

 

New drive:

Spoiler
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

 

So what do I do now?

orion2-diagnostics-20211130-1115.zip

Edited by paululibro
Link to comment
Nov 29 17:54:22 Orion2 kernel: XFS (sdg1): Filesystem has duplicate UUID e2cc6cf0-db52-44ea-8bef-9974b89a834f - can't mount
Nov 30 10:29:35 Orion2 kernel: XFS (md3): Filesystem has duplicate UUID e2cc6cf0-db52-44ea-8bef-9974b89a834f - can't mount

The unassigned disk and disk3 have the same UUID. You will have to change the UUID of the unassigned disk. Click on the Settings icon for the unassigned disk.
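
If you prefer the command line, the equivalent should be roughly this (device names taken from the syslog lines above):

# confirm both filesystems currently report the same UUID
blkid /dev/sdg1 /dev/md3
# write a fresh random UUID to the unassigned (old) disk only
xfs_admin -U generate /dev/sdg1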

Link to comment

I'm trying to generate a new UUID but I'm getting a timeout error in the syslog:

Nov 30 22:24:29 Orion2 unassigned.devices: Error: shell_exec(/usr/sbin/xfs_admin -U generate '/dev/sdg1') took longer than 20s!
Nov 30 22:24:29 Orion2 unassigned.devices: Changed partition UUID on '/dev/sdg1' with result: command timed out

 

After clicking "Change UUID" again I'm getting this:

Nov 30 22:24:47 Orion2 unassigned.devices: Changed partition UUID on '/dev/sdg1' with result: ERROR: cannot find log head/tail, run xfs_repair 

 

So I ran xfs_repair with the required -L flag and tried to generate the UUID again, but it's back to the timeout error and the head/tail error. I'm doing this in maintenance mode.
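
For reference, this is roughly what I ran before retrying the UUID change (same device as before):

# clear the dirty log so the UUID can be rewritten, then retry "Change UUID"
xfs_repair -L /dev/sdg1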

orion2-diagnostics-20211130-2231.zip

Link to comment

I tried to generate it manually:

root@Orion2:~# xfs_admin -U generate /dev/sdg1
Clearing log and setting UUID
writing all SBs
new UUID = ab1f65dd-e188-4c67-afc6-9f89fc139e93

 

And now both disks are mounted, but the original disk still shows up as empty:

[screenshot]

 

root@Orion2:/mnt/disks/WDC_WD80EZAZ-11TDBA0_2SG425ZW# df
Filesystem       1K-blocks        Used  Available Use% Mounted on
/dev/sdg1       7811939620    54499088 7757440532   1% /mnt/disks/WDC_WD80EZAZ-11TDBA0_2SG425ZW
/dev/md1        7811939620  7807708212    4231408 100% /mnt/disk1
/dev/md2        7811939620  7802174388    9765232 100% /mnt/disk2
/dev/md3        7811939620    54499088 7757440532   1% /mnt/disk3
/dev/md4        7811939620  7799822864   12116756 100% /mnt/disk4
/dev/md5        7811939620  7791383328   20556292 100% /mnt/disk5
/dev/sdc1        500107576   126153784  372420104  26% /mnt/cache
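
I guess I should also check whether xfs_repair left anything in lost+found on the old disk (the repair log mentioned moving disconnected inodes there), something like:

# look for anything the repair recovered as orphaned files on the old disk
ls -la /mnt/disks/WDC_WD80EZAZ-11TDBA0_2SG425ZW/lost+found
du -sh /mnt/disks/WDC_WD80EZAZ-11TDBA0_2SG425ZW/*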

 

Link to comment
