Matthew Kent Posted July 2, 2020

Hi, I've had a cascade of things go sideways on my Unraid server. I'm freaking out a little, because while trying to fix one problem, one of my encrypted XFS drives suddenly won't mount anymore. Unraid says it's unmountable. I tried to run a check on it, but it's no longer being seen as an XFS drive. While removing it and re-adding it to the array, I forgot to check the box "Parity is already valid", and after I started everything up, a parity check/write started automatically. Did I just lose the parity of my array?! I thought I should stop everything and check here before taking any further steps. I have a ton of important data on this drive that I don't currently have anywhere else.
Matthew Kent Posted July 3, 2020

So from what I can tell, I think the drive just needs an XFS repair, but because it's encrypted I don't think I can do it via the command line. When I try to access the XFS repair via the GUI, the option is not present. The system is treating the drive as if it's a new disk ready to be formatted. If anyone who knows this stuff sees this, please help, I'm desperately in need... I can't lose this data.
JorgeB Posted July 3, 2020

3 hours ago, Matthew Kent said: but because it's encrypted I don't think I can do it via command line.

You can, but you need to specify the correct device, e.g.:

xfs_repair -v /dev/mapper/mdX

You can also use the GUI: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
Matthew Kent Posted July 3, 2020

Thank you for the link. I was able to find the cryptsetup command to use on the command line to unlock the drive, and then had to run xfs_repair with -L since the log data seemed to have been lost. As far as I can tell so far, there are no lost files, or at least a lost+found directory was not created. I think I'm going to retire this drive, though, and replace it with something newer.
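In case it helps anyone landing here later, the rough sequence described above looks something like this. This is a sketch only: the partition path /dev/sdX1 and the mapper name "disk3crypt" are placeholders, and -L discards pending log metadata, so it should only be used after a plain repair refuses to run.

```shell
# Unlock the LUKS container so the filesystem is reachable
# (/dev/sdX1 and the mapper name "disk3crypt" are placeholders)
cryptsetup luksOpen /dev/sdX1 disk3crypt

# Dry run first: report what would be repaired without writing anything
xfs_repair -n /dev/mapper/disk3crypt

# Only if the log can't be replayed: -L zeroes the log before repairing,
# which can lose the most recent metadata changes
xfs_repair -L /dev/mapper/disk3crypt
```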
JorgeB Posted July 3, 2020

File system corruption is usually not a device problem, though it can be; posting the diags might give some clues.
Matthew Kent Posted July 7, 2020

Thanks, will take a look. Where would I find the diags that would have this information?
Matthew Kent Posted July 24, 2020

*Sigh* it happened again, pretty sure on the same drive. Luckily, this time I'm pretty sure my parity drive is intact. Looking at this repair output, does it look like my drive is dying?

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
agf_freeblks 151784069, counted 151783569 in ag 2
agi_freecount 259, counted 222 in ag 2
agi_freecount 259, counted 222 in ag 2 finobt
sb_ifree 1582, counted 1545
sb_fdblocks 453078759, counted 461719860
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
imap claims a free inode 4792090963 is in use, correcting imap and clearing inode
cleared inode 4792090963
imap claims a free inode 4792090964 is in use, correcting imap and clearing inode
cleared inode 4792090964
imap claims a free inode 4792090965 is in use, correcting imap and clearing inode
cleared inode 4792090965
imap claims a free inode 4792090966 is in use, correcting imap and clearing inode
cleared inode 4792090966
imap claims a free inode 4792090967 is in use, correcting imap and clearing inode
cleared inode 4792090967
imap claims a free inode 4792090968 is in use, correcting imap and clearing inode
cleared inode 4792090968
imap claims a free inode 4792090969 is in use, correcting imap and clearing inode
cleared inode 4792090969
imap claims a free inode 4792090970 is in use, correcting imap and clearing inode
cleared inode 4792090970
imap claims a free inode 4792090971 is in use, correcting imap and clearing inode
cleared inode 4792090971
imap claims a free inode 4792090972 is in use, correcting imap and clearing inode
cleared inode 4792090972
imap claims a free inode 4792090973 is in use, correcting imap and clearing inode
cleared inode 4792090973
imap claims a free inode 4792090974 is in use, correcting imap and clearing inode
cleared inode 4792090974
imap claims a free inode 4792090975 is in use, correcting imap and clearing inode
cleared inode 4792090975
        - agno = 3
Metadata CRC error detected at 0x4598a9, xfs_dir3_block block 0x1d3505428/0x1000
bad directory block magic # 0x36323800 in block 0 for directory inode 7840224254
corrupt block 0 in directory inode 7840224254
        will junk block
no . entry for directory 7840224254
no .. entry for directory 7840224254
problem with directory contents in inode 7840224254
cleared inode 7840224254
correcting imap
        - agno = 4
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 4
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 4792090976, moving to lost+found
disconnected inode 4792090977, moving to lost+found
disconnected inode 4792090978, moving to lost+found
disconnected inode 4792090979, moving to lost+found
disconnected inode 4792090980, moving to lost+found
disconnected inode 4792090981, moving to lost+found
disconnected inode 4792090982, moving to lost+found
disconnected inode 4792090983, moving to lost+found
disconnected inode 4792090984, moving to lost+found
disconnected inode 4792090985, moving to lost+found
disconnected inode 4792090986, moving to lost+found
disconnected inode 4792090987, moving to lost+found
disconnected inode 4792090988, moving to lost+found
disconnected inode 4792090989, moving to lost+found
disconnected inode 4792090990, moving to lost+found
disconnected inode 4792090991, moving to lost+found
disconnected inode 4792090992, moving to lost+found
disconnected inode 4792090993, moving to lost+found
disconnected inode 4792090994, moving to lost+found
disconnected inode 4792090995, moving to lost+found
disconnected inode 4792090996, moving to lost+found
disconnected inode 4792090997, moving to lost+found
disconnected inode 4792090998, moving to lost+found
disconnected inode 4792090999, moving to lost+found
disconnected inode 7840224255, moving to lost+found
Phase 7 - verify and correct link counts...
Maximum metadata LSN (4:4009) is ahead of log (3:32768).
Format log to cycle 7.

It's been on phase 7 for about 45 minutes now. Exporting my diagnostics data now.
Matthew Kent Posted July 24, 2020

nas-diagnostics-20200723-1844.zip
JorgeB Posted July 24, 2020

There are constant ATA errors on disk3; there isn't even a full SMART report. Check/replace the cables and post new diags.
Matthew Kent Posted July 24, 2020

The XFS repair finished, but I went ahead and shut down and checked over the cabling. I swapped the cable to the troubled drive. On reboot the drive was missing, and then it suddenly appeared. Anyway, attached are my new diagnostics. Thanks.

nas-diagnostics-20200723-2325.zip
JorgeB Posted July 24, 2020

Still many errors. If you replaced both cables (power + SATA), the disk is likely dying; if you didn't, do it now.
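For what it's worth, one way to check whether the drive itself (rather than the cabling) is failing is to pull its SMART report from the console. This is a sketch, with /dev/sdX standing in for disk3's actual device node:

```shell
# Full SMART report (health assessment, error log, attributes)
smartctl -a /dev/sdX

# Attributes that typically climb on a dying disk:
#   Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable

# Optionally kick off a short self-test, then read the result a few minutes later
smartctl -t short /dev/sdX
smartctl -l selftest /dev/sdX
```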
Matthew Kent Posted July 24, 2020

So the drive ended up failing completely on me tonight. Not long after, while checking on the cabling, the boot USB with Unraid also failed on me *palm to face*... Soooo, I've gone ahead and ordered a replacement drive that should arrive by next week. Now that I'm starting with a new Unraid install, the option for marking the parity drive as valid shows up. I should check this when starting up, yes? Aside from assigning the new drive to the slot of the old drive, do I need to do anything in particular to get the recovery of the old drive going?
JorgeB Posted July 24, 2020

1 hour ago, Matthew Kent said: with a new Unraid install

1 hour ago, Matthew Kent said: Do I need to do anything in particular to get the recovery of the old drive going?

A new install won't permit rebuilding a failed drive (without going through the invalid slot procedure); you should use the old install and just do a standard disk replacement.
Matthew Kent Posted July 24, 2020

How do I use the old installation with a bad USB?
JorgeB Posted July 24, 2020

With a flash backup. If you don't have one, you need to use the invalid slot command. I need more info for the instructions: what Unraid version, single or dual parity, and the disk # you want to rebuild.
Matthew Kent Posted July 24, 2020

I was running 6.8.3 with the LinuxServer Nvidia addon, in single parity. I'm trying to rebuild drive #3.
JorgeB Posted July 24, 2020

This will only work if parity was valid. Also make sure you follow the instructions carefully; if in any doubt, ask.

- Assign all disks (including the new disk3) and check all assignments; especially make sure parity is correctly assigned.
- Important - after checking the assignments, leave the browser on that page, the "Main" page.
- Open an SSH session or use the console and type (don't copy/paste directly from the forum, as it can sometimes insert extra characters):

mdcmd set invalidslot 3 29

- Back in the GUI, and without refreshing the page, just start the array. Do not check the "Parity is already valid" box. The GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the invalid slot command, and parity won't actually be overwritten as long as the procedure was done correctly. Disk3 will start rebuilding. The disk should mount immediately, but if it's unmountable, don't format it; wait for the rebuild to finish and then run a filesystem check.
Matthew Kent Posted July 24, 2020

Ok... I'll have to wait for the drive to arrive, will write back with my results when it does. Thank you so much for your assistance!
Matthew Kent Posted July 25, 2020

Ok... I couldn't wait 2 weeks for a replacement drive, so I went to Costco and got a 2.5" eSATA drive to temporarily stand in for the dead drive. I followed your instructions, and the status at the bottom of the screen says it's doing a data rebuild. I'm assuming at this point I could refresh my browser or leave the page, yes? Thank you again for your help. There was almost 4TB of data on the drive, so this might take a while. Will report back, hopefully soon.
JorgeB Posted July 25, 2020

4 hours ago, Matthew Kent said: I'm assuming at this point I could refresh my browser or leave the page yes?

Yes, after the array starts you can change the page or close it.
Matthew Kent Posted July 25, 2020

The rebuild finished last night. The volume didn't mount, so I went ahead and stopped the array and started it in maintenance mode to see if I could do an xfs_repair. It indicated I needed to rebuild the log data, so I ran xfs_repair -L. It went through the repair fairly quickly. On stopping and starting the array (in maintenance mode), it still says it's not mountable. Am I missing something? Also, I think my last parity check was last Wednesday. How does the rebuild work if data on the array has changed since the time of the last check?
Matthew Kent Posted July 25, 2020

Also, I don't know if it matters, but the original drive was encrypted, and the replacement drive has popped up showing no encryption. Did I need to set this before the rebuild?
itimpi Posted July 25, 2020

3 hours ago, Matthew Kent said: The rebuild finished last night. The volume didn't mount, so I went ahead and stopped the array and started it in maintenance mode to see if I could do an xfs_repair. It indicated I needed to rebuild the log data, so I ran xfs_repair -L. It went through the repair fairly quickly. On stopping and starting the array (in maintenance mode), it still says it's not mountable. Am I missing something? Also, I think my last parity check was last Wednesday. How does the rebuild work if data on the array has changed since the time of the last check?

The rebuilt drive should be identical to what the previous one was. Have you tried starting the array in non-maintenance mode to see if the disk now mounts?
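On the earlier question about how the rebuild can work at all: with single parity, the parity disk is updated on every write (not just during parity checks), and each bit of the missing disk is reconstructed as the XOR of the parity bit and the corresponding bits of all surviving data disks. A toy sketch of that arithmetic, with one byte standing in for each "disk":

```shell
# Three data "disks" each holding one byte
d1=0xA5; d2=0x3C; d3=0x0F

# The parity disk stores the XOR of all data bytes,
# kept current on every write to the array
parity=$(( d1 ^ d2 ^ d3 ))

# If disk 2 fails, its byte is recovered from parity and the survivors
rebuilt=$(( parity ^ d1 ^ d3 ))

echo "$rebuilt"   # prints 60 (0x3C), the lost byte
```

This is also why the rebuild only comes out right if parity was kept valid the whole time, which is what the invalid slot procedure above preserves.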
Matthew Kent Posted July 25, 2020

I have, but it just shows an unmountable filesystem.