Cannot mount XFS array - Device to be encrypted


Tuur

Hi guys

 

While attempting to back up my NAS with Duplicati over USB (via Unassigned Devices) over the past week, one of the disks (XFS) in my array seems to have become corrupted (great timing 😟). After almost 2 years with no issues, I got my first parity error last night. When I rebooted the server this morning, Disk 1 started showing up as "Device to be encrypted".

(Tangent: strangely enough, the same thing kept happening to my backups after ~8h/~1TB of writes. The disk would suddenly become unmountable, regardless of how it was formatted.)

(Tangent 2: could the parity correction have caused this?)

 

I have tried to repair the disk with the included "Check Filesystem Status" tool: first with the -n parameter, then without it, then with -L added, as the log output prompted me to. When this didn't work I got scared and cloned the device using an HDD dock.
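
For reference, I believe those webUI runs correspond to roughly the following console commands (the /dev/md1 device name for Disk 1 is my assumption):

# 1. Check only, modify nothing (-n = no-modify flag)
xfs_repair -n /dev/md1

# 2. Actual repair attempt
xfs_repair /dev/md1

# 3. Zero the metadata log first, as the output prompted me to
#    (-L discards any pending log transactions)
xfs_repair -L /dev/md1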

(Tangent 3: I probably should've made that clone before adding the -L option? I'm not sure, as I can't find a clear answer on what -L actually does.)

 

Unfortunately I do not have another 3TB disk on hand (only the 4TB one I was trying to back up to), which means I cannot attempt to repair the filesystem on the cloned disk in the array, as the array won't start with a data disk larger than the parity disk.

When I remove the disk and try to access my data in emulated mode, Disk 2 also shows up as "Device to be encrypted" and I am unable to enter the passphrase. This clears when I reinsert the corrupt Disk 1, but all files from Disk 1 are still missing when I browse the mounted array.

 

I've included 3 diagnostics zips:

  1. before the issue occurred (I downloaded this zip to debug the backup issue; that issue can be ignored for now) nas-diagnostics-20210814-0052.zip
  2. directly after the issue occurred nas-diagnostics-20210815-1239.zip
  3. after I unplugged the corrupt disk nas-diagnostics-20210815-1944.zip

 

How can I go about fixing this?

  • Do I spend another €80 on a 3TB disk I have no further use for, just to attempt a repair that way?
  • Can I somehow fix the corrupt disk with the parity disk? (I read in a few posts this isn't possible, but it's not clear to me why.)
  • Perhaps I can fix and mount the cloned corrupted drive somehow and access everything that way? (Roughly the sketch below.)
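
For that last option, the rough idea would be something like this (device names and mount point are placeholders, and I'm assuming GNU ddrescue is installed):

# Clone the failing disk to a spare, skipping over unreadable areas
# (the map file lets the copy resume where it left off)
ddrescue -f /dev/sdX /dev/sdY /root/rescue.map

# Then attempt the repair on the clone instead of the original...
xfs_repair /dev/sdY1

# ...and mount it read-only to copy the data off
mkdir -p /mnt/recovery
mount -o ro /dev/sdY1 /mnt/recovery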

 


Should have asked before doing anything.

 

49 minutes ago, Tuur said:

(Tangent 2: could the parity correction have caused this?)

Parity has none of your data. Non-correcting parity check doesn't change any disk. Correcting parity check, or even parity rebuild, only writes parity, and will not affect any of your data disks.

 

None of your attachments are working for some reason (I think I have seen reports that "drag and drop" isn't working at the moment).

 

Attach them to your NEXT post in this thread and wait on further advice.

 

 


To make sure you get as good a picture as possible, I also ran the "Check Filesystem Status" tool again (with the no-modify flag), since its output wasn't in the diagnostics package:
 

Phase 1 - find and verify superblock...
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
would reset superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
would reset superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
would reset superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 32
sb_ifree 0, counted 29
sb_fdblocks 732208911, counted 732208907
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 2
        - agno = 0
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

 


I did indeed run it from the webUI. I had already executed it without the no-modify flag before I made this post. Doing so reduced the number of errors, but now whenever I try I get the following output:

 

Phase 1 - find and verify superblock...
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 32
sb_ifree 0, counted 29
sb_fdblocks 732208911, counted 732208907
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
SB summary counter sanity check failed
Metadata corruption detected at 0x47518b, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x200
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!

fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.
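
Side note: if I'm reading it right, err=117 should be Linux's EUCLEAN ("Structure needs cleaning"), which seems to be the errno XFS reports for filesystem corruption; I looked it up with:

# Look up errno 117 in Python's errno table
python3 -c 'import errno, os; print(errno.errorcode[117], "-", os.strerror(117))'
# prints: EUCLEAN - Structure needs cleaning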

 


Small update: I also tried the same procedure on the clone I made, so xfs_repair isn't failing because of a hardware issue.

This might come off as ungrateful, but I'm a bit scared for my data.

 

Are there other avenues I could take in case I don't get any replies (I'm not sure how active the forum is for these kinds of issues)? I can definitely wait a few more days, but knowing there are more options would ease my mind.

For example: can I somehow fix the corrupt disk with the parity disk (read in a few posts this isn't possible, but it's not clear to me why)?

2 hours ago, Tuur said:

also tried the same procedure on the clone I made, so xfs_repair isn't failing because of a hardware issue.

If it ended with the same error message I am wondering if it might be a hardware issue.

 

Post new diagnostics.

 

2 hours ago, Tuur said:

scared for my data

Data on your other disks should be fine. Do you have backups of everything important and irreplaceable?

 

2 hours ago, Tuur said:

can I somehow fix the corrupt disk with the parity disk (read in a few posts this isn't possible, but it's not clear to me why)?

As mentioned, parity contains none of your data. Parity, wherever it is used in computers, is just an extra bit that allows a missing bit to be calculated from all the other bits. The parity disk allows the data for a missing disk to be calculated from all the other disks.
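
If it helps, here is a toy illustration of that, using example byte values; it's just the XOR arithmetic that single parity is built on, nothing Unraid-specific:

# Parity is the XOR of the corresponding bits on every data disk
d1=0xA5; d2=0x3C; d3=0xF0
parity=$(( d1 ^ d2 ^ d3 ))

# A missing disk can be recomputed from parity plus all the others:
printf 'recovered d2 = 0x%X\n' $(( parity ^ d1 ^ d3 ))    # prints: recovered d2 = 0x3C

# But parity by itself says nothing about what d2 contained, and it
# agrees with whatever the disks currently hold, corruption included.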

 

Typically parity will be in sync with all the array disks, which means it agrees with the contents of all the disks, including corrupt filesystems, and so rebuilding from parity would almost certainly produce the same result you have now.

 

I will ping some others to see if they have any ideas.

 

@JorgeB @itimpi @JonathanM

10 minutes ago, trurl said:

If it ended with the same error message I am wondering if it might be a hardware issue.

 

To clarify: I ran the second xfs_repair check on a second cloned drive (newly bought), mounted with UD, and it produced the same result, so I believe it's not a hardware issue. Though I guess the initial corruption might indeed have been caused by a hardware issue.

 

10 minutes ago, trurl said:

As mentioned, parity contains none of your data. Parity, wherever it is used in computers, is just an extra bit that allows a missing bit to be calculated from all the other bits. The parity disk allows the data for a missing disk to be calculated from all the other disks.

 

 

Right, I should've realized that, thanks for clarifying. The parity check is likely what caused the issue, as it reported (and auto-corrected) an error a few hours before this started:

[screenshot: parity check report]

Meaning the parity will only help to undo me running "xfs_repair -L" and not the initial issue 🤔

Let's hope someone can chime in then, as I ...drumroll... don't have backups for the data on this drive (I was actually in the process of creating them).

 

20 minutes ago, Tuur said:

I ran the second xfs_repair check on a second cloned drive (newly bought), mounted with UD, and it produced the same result, so I believe it's not a hardware issue.

If that new disk was connected to the same computer then I don't see how you can rule out hardware.

 

21 minutes ago, Tuur said:

parity check is likely what caused the issue

no

On 8/15/2021 at 3:07 PM, trurl said:

Parity has none of your data. Non-correcting parity check doesn't change any disk. Correcting parity check, or even parity rebuild, only writes parity, and will not affect any of your data disks.

 

22 minutes ago, Tuur said:

parity will only help to undo me running "xfs_repair -L"

If you did the repair on an array disk using the webUI, then it repaired the md device, which updates parity so that it stays in sync with the repair. A repair on an Unassigned disk will not affect parity, of course.
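
In other words (device names are just examples):

# Repair via the md device, as the webUI does: every write goes
# through the parity layer, so parity stays in sync with the repair
xfs_repair /dev/md1

# Repair against the raw partition bypasses that layer, so parity
# would no longer match this disk afterwards
xfs_repair /dev/sdb1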


Yes, I'm keeping my system turned off for now as I don't want to make things worse by mounting the array.

The output was identical:

 

Phase 1 - find and verify superblock...
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 32
sb_ifree 0, counted 29
sb_fdblocks 732208911, counted 732208907
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
SB summary counter sanity check failed
Metadata corruption detected at 0x47518b, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x200
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!

fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.

 


There's nothing in the log that points to a hardware issue; xfs_repair should always finish (with more or less success). You can try again after updating to v6.10, since it includes newer xfsprogs. If that still fails, you'd need to ask for help on the xfs mailing list, or restore from backups if available.
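
To confirm which xfsprogs version you are on before and after the update:

# Print the installed xfs_repair (xfsprogs) version
xfs_repair -V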
