Tuur Posted August 15, 2021

Hi guys

While in the middle of attempting to back up my NAS with Duplicati over USB (with Unassigned Devices) for the past week, one of the disks (xfs) in my array seems to have become corrupted (great timing 😟). After almost 2 years of no issues, I got my first parity error last night. When I rebooted the server this morning, Disk 1 started showing up as "Device to be encrypted".

(Tangent: strangely enough, the same thing was happening to my backups after 8h/~1TB of writes. Suddenly the disk became unmountable, independent of formatting.)

(Tangent 2: could the parity correction have caused this?)

I have tried to repair the disk with the included "Check Filesystem Status" tool: first with, then without the -n parameter, then adding -L as the logs prompted me to. When this didn't work I got scared and cloned the device using an HDD dock.

(Tangent 3: I probably should've done this before adding the -L option? Not sure, as I cannot find a clear answer on what it actually does.)

Unfortunately I do not have another 3TB disk on hand (only the 4TB I was trying to back up to), which means I cannot attempt to repair the filesystem on the cloned disk, as the array won't let me start with a data disk larger than the parity disk.

When I try to remove the disk and access my data via "emulated mode", Disk 2 also shows up as "Device to be encrypted" and I am unable to fill in the passphrase. This clears when I reinsert the corrupt Disk 1. However, all files from Disk 1 are still missing when I browse the mounted array.

I've included 3 diagnostics zips:

- before the issue occurred (downloaded this zip to debug the backup issue; that issue can be ignored for now): nas-diagnostics-20210814-0052.zip
- directly after the issue occurred: nas-diagnostics-20210815-1239.zip
- when I unplug the corrupt disk: nas-diagnostics-20210815-1944.zip

How can I go about fixing this?

- Do I spend another €80 on a 3TB disk that I have no further use for to attempt a repair that way?
- Can I somehow fix the corrupt disk with the parity disk? (I read in a few posts this isn't possible, but it's not clear to me why.)
- Perhaps I can fix and mount the cloned corrupted drive somehow and access everything that way?
trurl Posted August 15, 2021

Should have asked before doing anything.

49 minutes ago, Tuur said:
(Tangent 2: could the parity correction have caused this?)

Parity has none of your data. A non-correcting parity check doesn't change any disk. A correcting parity check, or even a parity rebuild, only writes parity, and will not affect any of your data disks.

None of your attachments are working for some reason (I think I have seen reports that "drag and drop" isn't working at the moment). Attach them to your NEXT post in this thread and wait on further advice.
Tuur Posted August 15, 2021

Alright, thanks for taking a look! 🙂 I've uploaded them via the dialog window this time.

nas-diagnostics-20210814-0052.zip
nas-diagnostics-20210815-1239.zip
nas-diagnostics-20210815-1944.zip
trurl Posted August 15, 2021

Are you sure disk1 is encrypted? Now that you have removed it, Unraid will probably want to rebuild it if you put it back in, so don't do that yet. Can you start the array and post new diagnostics?
Tuur Posted August 15, 2021

Yes, I'm 100% certain both disks were encrypted. Since I started questioning it myself, I found some screenshots from early 2020 that show they were.

Also something I forgot to mention in my first post: I swapped SATA cables (same port on the mobo), but that didn't change anything.

nas-diagnostics-20210815-2220.zip
Tuur Posted August 15, 2021

To make sure you get as good a picture as possible, I also ran the "Check Filesystem Status" tool again (with the no-modify flag), since I didn't see it in the diagnostics package:

Phase 1 - find and verify superblock...
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
would reset superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
would reset superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
would reset superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 32
sb_ifree 0, counted 29
sb_fdblocks 732208911, counted 732208907
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 2
        - agno = 0
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
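[Editor's note] For context on the value 18446744073709551615 repeated in the output above: in XFS, NULLFSINO is the all-ones 64-bit value used as a "no inode" sentinel, so a superblock whose root inode field holds it has effectively lost its root inode pointer. A quick sanity check in plain Python, just to show the printed number is exactly 2^64 − 1:

```python
# NULLFSINO in XFS is the all-ones 64-bit value, meaning "no inode set".
NULLFSINO = 0xFFFFFFFFFFFFFFFF

# The value xfs_repair prints is exactly this sentinel.
print(NULLFSINO == 2**64 - 1)             # True
print(NULLFSINO == 18446744073709551615)  # True
```

That is why xfs_repair offers to "reset" the pointers to the recalculated values 128–130: the fields are blank, not pointing somewhere wrong.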
trurl Posted August 16, 2021

Post a screenshot of Main - Array Devices.
Tuur Posted August 16, 2021

Hi trurl

Here you go, before starting the array:

[screenshot]

And after:

[screenshot]
trurl Posted August 16, 2021

Did you do that filesystem check from the webUI? I don't know how well that works on encrypted disks either, but it looks like it might do something if you really did the repair. I don't have any experience with encrypted disks, so maybe someone else will chime in.
Tuur Posted August 16, 2021

I did indeed do it from the webUI. I already executed it without the no-modify flag before I made this post. Doing so lessened the number of errors, but now whenever I try I get the following output:

Phase 1 - find and verify superblock...
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 32
sb_ifree 0, counted 29
sb_fdblocks 732208911, counted 732208907
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
SB summary counter sanity check failed
Metadata corruption detected at 0x47518b, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x200
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!

fatal error -- File system metadata writeout failed, err=117. Re-run xfs_repair.
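[Editor's note] The err=117 in the fatal error above is, assuming it is a standard Linux errno value (which the xfs_repair source would confirm), EUCLEAN, reported as "Structure needs cleaning" — the kernel's generic "filesystem corruption detected" code. The mapping can be checked with Python on a Linux box:

```python
import errno
import os

# errno 117 on Linux is EUCLEAN ("Structure needs cleaning"),
# the code the kernel and xfs tools use for detected FS corruption.
print(errno.EUCLEAN)     # 117
print(os.strerror(117))  # Structure needs cleaning
```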
Tuur Posted August 17, 2021

Small update: I also tried the same instruction on the clone I took, so xfs_repair isn't failing because of a hardware issue.

This might come off as ungrateful, but I'm a bit scared for my data. Are there other avenues I could take in case I don't get any replies (not sure how active the forum is for these kinds of issues)? I can definitely wait a few more days, but knowing there are more options would ease my mind. For example: can I somehow fix the corrupt disk with the parity disk (I read in a few posts this isn't possible, but it's not clear to me why)?
trurl Posted August 17, 2021

2 hours ago, Tuur said:
also tried the same instruction on the clone I took, so xfs_repair isn't failing because of a hardware issue.

If it ended with the same error message, I am wondering if it might be a hardware issue. Post new diagnostics.

2 hours ago, Tuur said:
scared for my data

Data on your other disk should be fine. Do you have backups of everything important and irreplaceable?

2 hours ago, Tuur said:
can I somehow fix the corrupt disk with the parity disk (read in a few posts this isn't possible, but it's not clear to me why)?

As mentioned, parity contains none of your data. Parity, wherever it is used in computers, is just an extra bit that allows a missing bit to be calculated from all the other bits. The parity disk allows the data for a missing disk to be calculated from all the other disks. Typically parity will be in sync with all the array disks, which means it agrees with the contents of all the disks, including corrupt filesystems, and so rebuilding from parity would almost certainly produce the same result you have now.

I will ping some others to see if they have any ideas. @JorgeB @itimpi @JonathanM
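[Editor's note] trurl's point can be illustrated with a toy example. Unraid's single parity is a bitwise XOR across the data disks: any one missing disk can be recomputed from parity plus the remaining disks, but parity knows nothing about which bits form files. A minimal Python sketch with hypothetical few-byte "disks":

```python
from functools import reduce

# Three hypothetical data "disks", each just a few bytes long.
disk1 = bytes([0x12, 0x34, 0x56, 0x78])
disk2 = bytes([0xAB, 0xCD, 0xEF, 0x01])
disk3 = bytes([0xDE, 0xAD, 0xBE, 0xEF])

def xor_bytes(*blocks):
    """Bytewise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Parity is the XOR of all data disks.
parity = xor_bytes(disk1, disk2, disk3)

# If disk1 goes missing, its bits are rebuilt from parity + the other disks.
rebuilt = xor_bytes(parity, disk2, disk3)
print(rebuilt == disk1)  # True
```

Note that reconstruction reproduces whatever bits were on the disk — corruption included — which is exactly why rebuilding from parity cannot "fix" a corrupt filesystem.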
Tuur Posted August 17, 2021

10 minutes ago, trurl said:
If it ended with the same error message I am wondering if it might be a hardware issue.

To clarify: I did the second xfs_repair check on a second cloned drive (newly bought), mounted with UD, and it produced the same result, so I believe it's not a hardware issue. Though I guess the initial corruption might have indeed been caused by a hardware issue.

10 minutes ago, trurl said:
As mentioned, parity contains none of your data. Parity, wherever it is used in computers, is just an extra bit that allows a missing bit to be calculated from all the other bits. Parity disk allows the data for a missing disk to be calculated from all the other disks.

Right, I should've realized that, thanks for clarifying. The parity check is likely what caused the issue, as it reported (and auto-corrected) an error a few hours before it started:

[screenshot]

Meaning the parity will only help to undo me running "xfs_repair -L" and not the initial issue 🤔

Let's hope someone can chime in then, as I ...drumroll... don't have backups for the data on this drive (I was actually in the process of creating them).
trurl Posted August 17, 2021

20 minutes ago, Tuur said:
I did the second xfs_repair check on a second cloned drive (newly bought), mounted with UD and it produced the same result, so I believe it's not a hardware issue.

If that new disk was connected to the same computer, then I don't see how you can rule out hardware.

21 minutes ago, Tuur said:
parity check is likely what caused the issue

No.

On 8/15/2021 at 3:07 PM, trurl said:
Parity has none of your data. Non-correcting parity check doesn't change any disk. Correcting parity check, or even parity rebuild, only writes parity, and will not affect any of your data disks.

22 minutes ago, Tuur said:
parity will only help to undo me running "xfs_repair -L"

If you did the repair on an array disk using the webUI, then it would have repaired the md device, which updates parity so it stays in sync with the repair. Repair on an Unassigned disk will not affect parity, of course.
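[Editor's note] The "repair via the md device keeps parity in sync" behaviour follows from how single parity is updated on a write: XOR out the old data and XOR in the new data, and the result equals parity recomputed from scratch, without touching any other disk. A toy Python sketch with hypothetical single-byte values:

```python
# Hypothetical single-byte values for one position on each disk.
old_data = 0x5A  # byte being overwritten by the repair
new_data = 0x3C  # byte the repair writes
other    = 0x77  # the same position on the only other data disk

old_parity = old_data ^ other

# Read-modify-write parity update: XOR out the old byte, XOR in the new one.
new_parity = old_parity ^ old_data ^ new_data

# The updated parity matches recomputing it from scratch.
print(new_parity == (new_data ^ other))  # True
```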
JorgeB Posted August 18, 2021

On 8/16/2021 at 4:58 PM, Tuur said:
xfs_repair: Lost a write to the data device!

This suggests a hardware issue; please post diags after running xfs_repair.
Tuur Posted August 18, 2021

Hi JorgeB, thanks for taking a look.

The output of xfs_repair (without a flag) was identical to the one in this post:

Here's the diagnostic file; keep in mind that I also plugged in the clone of the broken disk (WDC_WD40EZAZ...).

nas-diagnostics-20210818-0845.zip
JorgeB Posted August 18, 2021

1 hour ago, JorgeB said:
diags after running xfs_repair.

Those appear to be from just after a reboot; also post the new xfs_repair output.
Tuur Posted August 18, 2021

Yes, I'm keeping my system turned off for now, as I don't want to make things worse by mounting the array. The output was identical:

Phase 1 - find and verify superblock...
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 32
sb_ifree 0, counted 29
sb_fdblocks 732208911, counted 732208907
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
SB summary counter sanity check failed
Metadata corruption detected at 0x47518b, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x200
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!

fatal error -- File system metadata writeout failed, err=117. Re-run xfs_repair.
JorgeB Posted August 18, 2021

1 hour ago, JorgeB said:
diags after running xfs_repair.
Tuur Posted August 18, 2021

nas-diagnostics-20210818-1228.zip
JorgeB Posted August 18, 2021

There's nothing in the log that points to a hardware issue, and xfs_repair should always finish (with more or less success). You can try again after updating to v6.10, since it includes newer xfsprogs; if that still fails, you'd need to ask for help on the xfs mailing list, or restore from backups if available.
Tuur Posted August 18, 2021

Alright, thanks for checking. I'll update this post if I find a solution.