Cannot mount XFS array - Device to be encrypted


Tuur

Hi guys

 

While attempting to back up my NAS with Duplicati over USB (via Unassigned Devices) over the past week, one of the disks (XFS) in my array seems to have become corrupted (great timing 😟). After almost 2 years with no issues, I got my first parity error last night. When I rebooted the server this morning, Disk 1 started showing up as "Device to be encrypted".

(Tangent: strangely enough, the same thing kept happening to my backups after ~8h/~1TB of writes. The disk would suddenly become unmountable, regardless of how it was formatted.)

(Tangent 2: could the parity correction have caused this?)

 

I have tried to repair the disk with the included "Check Filesystem Status" tool: first with the -n parameter, then without it, then with -L added, as the log output prompted me to. When this didn't work I got scared and cloned the device using an HDD dock.
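
For reference, I believe those webUI runs correspond to roughly the following console commands (the /dev/md1 device name for Disk 1 is my assumption):

# 1. Check only, modify nothing (-n = no-modify flag)
xfs_repair -n /dev/md1

# 2. Actual repair attempt
xfs_repair /dev/md1

# 3. Zero the metadata log first, as the output prompted me to
#    (-L discards any pending log transactions)
xfs_repair -L /dev/md1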

(Tangent 3: I probably should've made that clone before adding the -L option? I'm not sure, as I can't find a clear answer on what -L actually does.)

 

Unfortunately I do not have another 3TB disk on hand (only the 4TB one I was trying to back up to), which means I cannot attempt to repair the filesystem on the cloned disk in the array, as the array won't start with a data disk larger than the parity disk.

When I remove the disk and try to access my data in emulated mode, Disk 2 also shows up as "Device to be encrypted" and I am unable to enter the passphrase. This clears when I reinsert the corrupt Disk 1, but all files from Disk 1 are still missing when I browse the mounted array.

 

I've included 3 diagnostics zips:

  1. before the issue occurred (I downloaded this zip to debug the backup issue; that issue can be ignored for now) nas-diagnostics-20210814-0052.zip
  2. directly after the issue occurred nas-diagnostics-20210815-1239.zip
  3. after I unplugged the corrupt disk nas-diagnostics-20210815-1944.zip

 

How can I go about fixing this?

  • Do I spend another €80 on a 3TB disk I have no further use for, just to attempt a repair that way?
  • Can I somehow fix the corrupt disk with the parity disk? (I read in a few posts this isn't possible, but it's not clear to me why.)
  • Perhaps I can fix and mount the cloned corrupted drive somehow and access everything that way? (Roughly the sketch below.)
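
For that last option, the rough idea would be something like this (device names and mount point are placeholders, and I'm assuming GNU ddrescue is installed):

# Clone the failing disk to a spare, skipping over unreadable areas
# (the map file lets the copy resume where it left off)
ddrescue -f /dev/sdX /dev/sdY /root/rescue.map

# Then attempt the repair on the clone instead of the original...
xfs_repair /dev/sdY1

# ...and mount it read-only to copy the data off
mkdir -p /mnt/recovery
mount -o ro /dev/sdY1 /mnt/recovery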

 


Should have asked before doing anything.

 

49 minutes ago, Tuur said:

(Tangent 2: could the parity correction have caused this?)

Parity has none of your data. Non-correcting parity check doesn't change any disk. Correcting parity check, or even parity rebuild, only writes parity, and will not affect any of your data disks.

 

None of your attachments are working for some reason (I think I have seen reports that "drag and drop" isn't working at the moment).

 

Attach them to your NEXT post in this thread and wait on further advice.

 

 


To make sure you get as good a picture as possible, I also ran the "Check Filesystem Status" tool again (with the no-modify flag), since its output wasn't in the diagnostics package:
 

Phase 1 - find and verify superblock...
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
would reset superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
would reset superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
would reset superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 32
sb_ifree 0, counted 29
sb_fdblocks 732208911, counted 732208907
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 2
        - agno = 0
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

 


I did indeed run it from the webUI. I had already executed it without the no-modify flag before I made this post. Doing so reduced the number of errors, but now whenever I try I get the following output:

 

Phase 1 - find and verify superblock...
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 32
sb_ifree 0, counted 29
sb_fdblocks 732208911, counted 732208907
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
SB summary counter sanity check failed
Metadata corruption detected at 0x47518b, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x200
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!

fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.
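
Side note: if I'm reading it right, err=117 should be Linux's EUCLEAN ("Structure needs cleaning"), which seems to be the errno XFS reports for filesystem corruption; I looked it up with:

# Look up errno 117 in Python's errno table
python3 -c 'import errno, os; print(errno.errorcode[117], "-", os.strerror(117))'
# prints: EUCLEAN - Structure needs cleaning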

 


Small update: I also tried the same procedure on the clone I made, so xfs_repair isn't failing because of a hardware issue.

This might come off as ungrateful, but I'm a bit scared for my data.

 

Are there other avenues I could take in case I don't get any replies (I'm not sure how active the forum is for these kinds of issues)? I can definitely wait a few more days, but knowing there are more options would ease my mind.

For example: can I somehow fix the corrupt disk with the parity disk (read in a few posts this isn't possible, but it's not clear to me why)?

2 hours ago, Tuur said:

also tried the same procedure on the clone I made, so xfs_repair isn't failing because of a hardware issue.

If it ended with the same error message I am wondering if it might be a hardware issue.

 

Post new diagnostics.

 

2 hours ago, Tuur said:

scared for my data

Data on your other disks should be fine. Do you have backups of everything important and irreplaceable?

 

2 hours ago, Tuur said:

can I somehow fix the corrupt disk with the parity disk (read in a few posts this isn't possible, but it's not clear to me why)?

As mentioned, parity contains none of your data. Parity, wherever it is used in computers, is just an extra bit that allows a missing bit to be calculated from all the other bits. The parity disk allows the data for a missing disk to be calculated from all the other disks.
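
If it helps, here is a toy illustration of that, using example byte values; it's just the XOR arithmetic that single parity is built on, nothing Unraid-specific:

# Parity is the XOR of the corresponding bits on every data disk
d1=0xA5; d2=0x3C; d3=0xF0
parity=$(( d1 ^ d2 ^ d3 ))

# A missing disk can be recomputed from parity plus all the others:
printf 'recovered d2 = 0x%X\n' $(( parity ^ d1 ^ d3 ))    # prints: recovered d2 = 0x3C

# But parity by itself says nothing about what d2 contained, and it
# agrees with whatever the disks currently hold, corruption included.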

 

Typically parity will be in sync with all the array disks, which means it agrees with the contents of all the disks, including corrupt filesystems, and so rebuilding from parity would almost certainly produce the same result you have now.

 

I will ping some others to see if they have any ideas.

 

@JorgeB @itimpi @JonathanM

10 minutes ago, trurl said:

If it ended with the same error message I am wondering if it might be a hardware issue.

 

To clarify: I ran the second xfs_repair check on a second cloned drive (newly bought), mounted with UD, and it produced the same result, so I believe it's not a hardware issue. Though I guess the initial corruption might indeed have been caused by a hardware issue.

 

10 minutes ago, trurl said:

As mentioned, parity contains none of your data. Parity, wherever it is used in computers, is just an extra bit that allows a missing bit to be calculated from all the other bits. The parity disk allows the data for a missing disk to be calculated from all the other disks.

 

 

Right, I should've realized that, thanks for clarifying. The parity check is likely what caused the issue, as it reported (and auto-corrected) an error a few hours before this started:

[screenshot: parity check report]

Meaning the parity will only help to undo me running "xfs_repair -L" and not the initial issue 🤔

Let's hope someone can chime in then, as I ...drumroll... don't have backups for the data on this drive (I was actually in the process of creating them).

 

20 minutes ago, Tuur said:

I ran the second xfs_repair check on a second cloned drive (newly bought), mounted with UD, and it produced the same result, so I believe it's not a hardware issue.

If that new disk was connected to the same computer then I don't see how you can rule out hardware.

 

21 minutes ago, Tuur said:

parity check is likely what caused the issue

no

On 8/15/2021 at 3:07 PM, trurl said:

Parity has none of your data. Non-correcting parity check doesn't change any disk. Correcting parity check, or even parity rebuild, only writes parity, and will not affect any of your data disks.

 

22 minutes ago, Tuur said:

parity will only help to undo me running "xfs_repair -L"

If you did the repair on an array disk using the webUI, then it repaired the md device, which updates parity so that it stays in sync with the repair. A repair on an Unassigned disk will not affect parity, of course.
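
In other words (device names are just examples):

# Repair via the md device, as the webUI does: every write goes
# through the parity layer, so parity stays in sync with the repair
xfs_repair /dev/md1

# Repair against the raw partition bypasses that layer, so parity
# would no longer match this disk afterwards
xfs_repair /dev/sdb1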


Yes, I'm keeping my system turned off for now as I don't want to make things worse by mounting the array.

The output was identical:

 

Phase 1 - find and verify superblock...
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 32
sb_ifree 0, counted 29
sb_fdblocks 732208911, counted 732208907
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
SB summary counter sanity check failed
Metadata corruption detected at 0x47518b, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x200
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!

fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.

 


There's nothing in the log that points to a hardware issue; xfs_repair should always finish (with more or less success). You can try again after updating to v6.10, since it includes newer xfsprogs. If that still fails, you'd need to ask for help on the xfs mailing list, or restore from backups if available.
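
To confirm which xfsprogs version you are on before and after the update:

# Print the installed xfs_repair (xfsprogs) version
xfs_repair -V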
