Did I just lose my parity/data?!



Hi,

 

I've had a cascade of a bunch of things go sideways on my Unraid server.

 

I'm freaking out a little because while trying to fix one problem, one of my encrypted XFS drives suddenly won't mount anymore. Unraid says it's unmountable. I tried to run a check on it, but it's no longer being seen as an XFS drive. While removing it and re-adding it to the array, I forgot to check the "Parity is already valid" box, and after I started everything up, the parity began auto checking/writing. Did I just lose the parity for my array?!

 

I thought I should stop everything and check here before I take any further steps. I have a ton of important stuff on this drive that I don't currently have anywhere else.


So from what I can tell, I think the drive just needs an XFS repair done on it, but because it's encrypted I don't think I can do it via the command line. When I try to access the XFS repair via the GUI, the option is not present. The system is treating the drive as if it's a new disk ready to be formatted.

 

If anyone who knows this stuff sees this, please help, I'm desperately in need... I can't lose this data.


Thank you for the link. I was able to find the cryptsetup command to use on the command line to decrypt the drive, and then had to run the repair with -L on the drive since the log data seemed to have been lost. As far as I can tell so far, there are no lost files, or at least a lost+found directory was not created.

 

I think I'm going to retire this drive though. Replace it with something newer.

  • 3 weeks later...

*Sigh* it happened again, pretty sure on the same drive. Luckily this time I'm pretty sure my parity drive is intact.

Looking at this repair, does it look like my drive is going?

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
agf_freeblks 151784069, counted 151783569 in ag 2
agi_freecount 259, counted 222 in ag 2
agi_freecount 259, counted 222 in ag 2 finobt
sb_ifree 1582, counted 1545
sb_fdblocks 453078759, counted 461719860
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
imap claims a free inode 4792090963 is in use, correcting imap and clearing inode
cleared inode 4792090963
imap claims a free inode 4792090964 is in use, correcting imap and clearing inode
cleared inode 4792090964
imap claims a free inode 4792090965 is in use, correcting imap and clearing inode
cleared inode 4792090965
imap claims a free inode 4792090966 is in use, correcting imap and clearing inode
cleared inode 4792090966
imap claims a free inode 4792090967 is in use, correcting imap and clearing inode
cleared inode 4792090967
imap claims a free inode 4792090968 is in use, correcting imap and clearing inode
cleared inode 4792090968
imap claims a free inode 4792090969 is in use, correcting imap and clearing inode
cleared inode 4792090969
imap claims a free inode 4792090970 is in use, correcting imap and clearing inode
cleared inode 4792090970
imap claims a free inode 4792090971 is in use, correcting imap and clearing inode
cleared inode 4792090971
imap claims a free inode 4792090972 is in use, correcting imap and clearing inode
cleared inode 4792090972
imap claims a free inode 4792090973 is in use, correcting imap and clearing inode
cleared inode 4792090973
imap claims a free inode 4792090974 is in use, correcting imap and clearing inode
cleared inode 4792090974
imap claims a free inode 4792090975 is in use, correcting imap and clearing inode
cleared inode 4792090975
        - agno = 3
Metadata CRC error detected at 0x4598a9, xfs_dir3_block block 0x1d3505428/0x1000
bad directory block magic # 0x36323800 in block 0 for directory inode 7840224254
corrupt block 0 in directory inode 7840224254
	will junk block
no . entry for directory 7840224254
no .. entry for directory 7840224254
problem with directory contents in inode 7840224254
cleared inode 7840224254
correcting imap
        - agno = 4
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 4
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 4792090976, moving to lost+found
disconnected inode 4792090977, moving to lost+found
disconnected inode 4792090978, moving to lost+found
disconnected inode 4792090979, moving to lost+found
disconnected inode 4792090980, moving to lost+found
disconnected inode 4792090981, moving to lost+found
disconnected inode 4792090982, moving to lost+found
disconnected inode 4792090983, moving to lost+found
disconnected inode 4792090984, moving to lost+found
disconnected inode 4792090985, moving to lost+found
disconnected inode 4792090986, moving to lost+found
disconnected inode 4792090987, moving to lost+found
disconnected inode 4792090988, moving to lost+found
disconnected inode 4792090989, moving to lost+found
disconnected inode 4792090990, moving to lost+found
disconnected inode 4792090991, moving to lost+found
disconnected inode 4792090992, moving to lost+found
disconnected inode 4792090993, moving to lost+found
disconnected inode 4792090994, moving to lost+found
disconnected inode 4792090995, moving to lost+found
disconnected inode 4792090996, moving to lost+found
disconnected inode 4792090997, moving to lost+found
disconnected inode 4792090998, moving to lost+found
disconnected inode 4792090999, moving to lost+found
disconnected inode 7840224255, moving to lost+found
Phase 7 - verify and correct link counts...
Maximum metadata LSN (4:4009) is ahead of log (3:32768).
Format log to cycle 7.

It's been on phase 7 for about 45 minutes now. Exporting my diagnostics data now.
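
A repair log this long is easier to judge once the repeated messages are tallied. The sketch below is only an illustration, not an official tool: `summarize_repair_log` is a hypothetical helper, and the patterns it matches are just the message shapes that appear in the output pasted above, so other xfs_repair versions may need different patterns.

```python
import re
from collections import Counter

# Message shapes copied from the xfs_repair output above (an assumption:
# other versions of the tool may word these lines differently).
PATTERNS = {
    "cleared_inodes": re.compile(r"^cleared inode \d+"),
    "moved_to_lost_found": re.compile(r"^disconnected inode \d+, moving to lost\+found"),
    "crc_errors": re.compile(r"^Metadata CRC error detected"),
    "log_zeroed": re.compile(r"^ALERT: The filesystem has valuable metadata changes"),
}


def summarize_repair_log(text):
    """Count how many log lines match each known message shape."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        for name, pattern in PATTERNS.items():
            if pattern.match(line):
                counts[name] += 1
    # Report every category, including those that never appeared.
    return {name: counts[name] for name in PATTERNS}


# A few lines lifted from the log above, used as sample input.
sample = """\
ALERT: The filesystem has valuable metadata changes in a log which is being
cleared inode 4792090963
cleared inode 7840224254
Metadata CRC error detected at 0x4598a9, xfs_dir3_block block 0x1d3505428/0x1000
disconnected inode 4792090976, moving to lost+found
"""

print(summarize_repair_log(sample))
```

The counts only describe how much filesystem damage the repair dealt with; on their own they don't say whether the hardware is failing.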

Edited by Matthew Kent

So the drive ended up failing completely on me tonight. Not long after, while I was checking the cabling, the boot USB with Unraid also failed on me *palm to face*...
 

Soooo, I’ve gone ahead and ordered a replacement drive that should arrive by next week. 
 

Now that I’m starting with a new Unraid install, the option for marking the parity drive as valid shows up. I should check this when starting up, yes? Aside from assigning the new drive to the slot of the old drive, do I need to do anything in particular to get the recovery of the old drive going?

1 hour ago, Matthew Kent said:

with a new Unraid install

 

1 hour ago, Matthew Kent said:

Do I need to do anything in particular to get the recovery of the old drive going?

A new install won't permit rebuilding a failed drive (without going through the invalid slot procedure); you should use the old install and just do a standard disk replacement.


This will only work if parity was valid. Also, make sure you follow the instructions carefully; if in any doubt, ask.

 

-Assign all disks (including new disk3) and check all assignments, especially make sure parity is correctly assigned.
-Important - After checking the assignments leave the browser on that page, the "Main" page.

-Open an SSH session/use the console and type (don't copy/paste directly from the forum, as sometimes it can insert extra characters):

mdcmd set invalidslot 3 29

-Back in the GUI, without refreshing the page, just start the array. Do not check the "parity is already valid" box. The GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the invalid slot command, but they won't be overwritten as long as the procedure was done correctly. Disk3 will start rebuilding. The disk should mount immediately, but if it's unmountable don't format it; wait for the rebuild to finish and then run a filesystem check.

 

 


k... I couldn't wait 2 weeks for a replacement drive. I went to Costco and got a 2.5" eSATA drive to temporarily work in place of the dead drive. I followed your instructions and the status at the bottom of the screen says it's doing a data rebuild. I'm assuming at this point I could refresh my browser or leave the page, yes?

Thank you again for your help. There was almost 4TB of data on the drive, so this might take a while. Will report back, hopefully soon.

Edited by Matthew Kent

The rebuild finished last night. 

 

The volume didn't mount so I went ahead and stopped the array and started it in maintenance mode to see if I could do an xfs_repair. 

It indicated I needed to rebuild the log data, so I ran xfs_repair -L. It went through the repair fairly quickly. On stopping and starting the array (in maintenance), it still says it's not mountable. Am I missing something?

 

Also, I think my last parity check was Wednesday of last week. How does the rebuild work if data on the array has changed since the last check?

3 hours ago, Matthew Kent said:

The rebuild finished last night. 

 

The volume didn't mount so I went ahead and stopped the array and started it in maintenance mode to see if I could do an xfs_repair. 

It indicated I needed to rebuild the log data, so I ran xfs_repair -L. It went through the repair fairly quickly. On stopping and starting the array (in maintenance), it still says it's not mountable. Am I missing something?

 

Also, I think my last parity check was Wednesday of last week. How does the rebuild work if data on the array has changed since the last check?

The rebuilt drive should be identical to what the previous one was. Have you tried starting the array in non-maintenance mode to see if the disk now mounts?
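
On the question of data changing after the last parity check: Unraid updates parity on every write, not only during checks, so a rebuild reflects the latest state of the array. Here is a minimal sketch of the idea, assuming plain XOR single parity over equal-sized blocks. The function names and the tiny byte strings are made up for illustration; this is not Unraid's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def update_parity(parity, old_data, new_data):
    """Parity after one disk's block changes: cancel the old data, fold in the new."""
    return xor_blocks([parity, old_data, new_data])


def rebuild_missing(parity, surviving):
    """Reconstruct the one missing disk's block from parity and the surviving disks."""
    return xor_blocks([parity, *surviving])


# Three hypothetical data disks, two bytes each, plus their parity.
disk1, disk2, disk3 = b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"
parity = xor_blocks([disk1, disk2, disk3])

# A write to disk3 after the last parity check still updates parity right away.
new_disk3 = b"\x01\x02"
parity = update_parity(parity, disk3, new_disk3)

# disk3 then fails; the rebuild recovers its most recent contents.
rebuilt = rebuild_missing(parity, [disk1, disk2])
print(rebuilt == new_disk3)  # prints True
```

The same arithmetic also means the rebuild reproduces whatever was on the disk bit for bit, filesystem corruption included, which is why a rebuilt disk can still need a filesystem repair.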

