Matthew Kent Posted July 2, 2020

Hi, I've had a cascade of things go sideways on my Unraid server. I'm freaking out a little, because while trying to fix one problem, one of my encrypted XFS drives suddenly won't mount anymore. Unraid says it's unmountable. I tried to run a check on it, but it's no longer being seen as an XFS drive. While removing it and re-adding it to the array, I forgot to check the box "Parity is already valid", and after I started everything up, a parity check/write started automatically. Did I just lose the parity of my array?! I thought I should stop everything and check here before taking any further steps. I have a ton of important data on this drive that I don't currently have anywhere else.
Matthew Kent Posted July 3, 2020

So from what I can tell, I think the drive just needs an XFS repair, but because it's encrypted I don't think I can do it via the command line. When I try to access the XFS repair via the GUI, the option is not present. The system is treating the drive as if it's a new disk ready to be formatted. If anyone who knows this stuff sees this, please help, I'm desperately in need... I can't lose this data.
JorgeB Posted July 3, 2020

3 hours ago, Matthew Kent said: but because it's encrypted I don't think I can do it via command line.

You can, but you need to specify the correct device, e.g.:

xfs_repair -v /dev/mapper/mdX

You can also use the GUI: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
Matthew Kent Posted July 3, 2020

Thank you for the link. I was able to find the cryptsetup command to use on the command line to unlock the drive, and then had to run xfs_repair with -L since the log data seemed to have been lost. As far as I can tell so far, there are no lost files, or at least a lost+found directory was not created. I think I'm going to retire this drive, though, and replace it with something newer.
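In case it helps anyone landing here later, the rough sequence described above looks something like this. This is a sketch only: the partition path /dev/sdX1 and the mapper name "disk3crypt" are placeholders, and -L discards pending log metadata, so it should only be used after a plain repair refuses to run.

```shell
# Unlock the LUKS container so the filesystem is reachable
# (/dev/sdX1 and the mapper name "disk3crypt" are placeholders)
cryptsetup luksOpen /dev/sdX1 disk3crypt

# Dry run first: report what would be repaired without writing anything
xfs_repair -n /dev/mapper/disk3crypt

# Only if the log can't be replayed: -L zeroes the log before repairing,
# which can lose the most recent metadata changes
xfs_repair -L /dev/mapper/disk3crypt
```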
JorgeB Posted July 3, 2020

File system corruption is usually not a device problem, though it can be; posting the diags might give some clues.
Matthew Kent Posted July 7, 2020

Thanks, will take a look. Where would I find the diags that would have this information?
Matthew Kent Posted July 24, 2020

*Sigh* it happened again, pretty sure on the same drive. Luckily, this time I'm pretty sure my parity drive is intact. Looking at this repair output, does it look like my drive is dying?

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
agf_freeblks 151784069, counted 151783569 in ag 2
agi_freecount 259, counted 222 in ag 2
agi_freecount 259, counted 222 in ag 2 finobt
sb_ifree 1582, counted 1545
sb_fdblocks 453078759, counted 461719860
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
imap claims a free inode 4792090963 is in use, correcting imap and clearing inode
cleared inode 4792090963
imap claims a free inode 4792090964 is in use, correcting imap and clearing inode
cleared inode 4792090964
imap claims a free inode 4792090965 is in use, correcting imap and clearing inode
cleared inode 4792090965
imap claims a free inode 4792090966 is in use, correcting imap and clearing inode
cleared inode 4792090966
imap claims a free inode 4792090967 is in use, correcting imap and clearing inode
cleared inode 4792090967
imap claims a free inode 4792090968 is in use, correcting imap and clearing inode
cleared inode 4792090968
imap claims a free inode 4792090969 is in use, correcting imap and clearing inode
cleared inode 4792090969
imap claims a free inode 4792090970 is in use, correcting imap and clearing inode
cleared inode 4792090970
imap claims a free inode 4792090971 is in use, correcting imap and clearing inode
cleared inode 4792090971
imap claims a free inode 4792090972 is in use, correcting imap and clearing inode
cleared inode 4792090972
imap claims a free inode 4792090973 is in use, correcting imap and clearing inode
cleared inode 4792090973
imap claims a free inode 4792090974 is in use, correcting imap and clearing inode
cleared inode 4792090974
imap claims a free inode 4792090975 is in use, correcting imap and clearing inode
cleared inode 4792090975
        - agno = 3
Metadata CRC error detected at 0x4598a9, xfs_dir3_block block 0x1d3505428/0x1000
bad directory block magic # 0x36323800 in block 0 for directory inode 7840224254
corrupt block 0 in directory inode 7840224254
        will junk block
no . entry for directory 7840224254
no .. entry for directory 7840224254
problem with directory contents in inode 7840224254
cleared inode 7840224254
correcting imap
        - agno = 4
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 4
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 4792090976, moving to lost+found
disconnected inode 4792090977, moving to lost+found
disconnected inode 4792090978, moving to lost+found
disconnected inode 4792090979, moving to lost+found
disconnected inode 4792090980, moving to lost+found
disconnected inode 4792090981, moving to lost+found
disconnected inode 4792090982, moving to lost+found
disconnected inode 4792090983, moving to lost+found
disconnected inode 4792090984, moving to lost+found
disconnected inode 4792090985, moving to lost+found
disconnected inode 4792090986, moving to lost+found
disconnected inode 4792090987, moving to lost+found
disconnected inode 4792090988, moving to lost+found
disconnected inode 4792090989, moving to lost+found
disconnected inode 4792090990, moving to lost+found
disconnected inode 4792090991, moving to lost+found
disconnected inode 4792090992, moving to lost+found
disconnected inode 4792090993, moving to lost+found
disconnected inode 4792090994, moving to lost+found
disconnected inode 4792090995, moving to lost+found
disconnected inode 4792090996, moving to lost+found
disconnected inode 4792090997, moving to lost+found
disconnected inode 4792090998, moving to lost+found
disconnected inode 4792090999, moving to lost+found
disconnected inode 7840224255, moving to lost+found
Phase 7 - verify and correct link counts...
Maximum metadata LSN (4:4009) is ahead of log (3:32768).
Format log to cycle 7.

It's been on phase 7 for about 45 minutes now. Exporting my diagnostics data now.
Matthew Kent Posted July 24, 2020

nas-diagnostics-20200723-1844.zip
JorgeB Posted July 24, 2020

There are constant ATA errors on disk3; there isn't even a full SMART report. Check/replace the cables and post new diags.
Matthew Kent Posted July 24, 2020

The XFS repair finished, but I went ahead and shut down and checked over the cabling. I swapped the cable to the troubled drive. On reboot the drive was missing, and then it suddenly appeared. Anyway, attached are my new diagnostics. Thanks.

nas-diagnostics-20200723-2325.zip
JorgeB Posted July 24, 2020

Still many errors. If you replaced both cables (power + SATA), the disk is likely dying; if you didn't, do it now.
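For what it's worth, one way to check whether the drive itself (rather than the cabling) is failing is to pull its SMART report from the console. This is a sketch, with /dev/sdX standing in for disk3's actual device node:

```shell
# Full SMART report (health assessment, error log, attributes)
smartctl -a /dev/sdX

# Attributes that typically climb on a dying disk:
#   Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable

# Optionally kick off a short self-test, then read the result a few minutes later
smartctl -t short /dev/sdX
smartctl -l selftest /dev/sdX
```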
Matthew Kent Posted July 24, 2020

So the drive ended up failing completely on me tonight. Not long after, while checking on the cabling, the boot USB with Unraid also failed on me *palm to face*... Soooo, I've gone ahead and ordered a replacement drive that should arrive by next week. Now that I'm starting with a new Unraid install, the option for marking the parity drive as valid shows up. I should check this when starting up, yes? Aside from assigning the new drive to the slot of the old drive, do I need to do anything in particular to get the recovery of the old drive going?
JorgeB Posted July 24, 2020

1 hour ago, Matthew Kent said: with a new Unraid install

1 hour ago, Matthew Kent said: Do I need to do anything in particular to get the recovery of the old drive going?

A new install won't permit rebuilding a failed drive (without going through the invalid slot procedure); you should use the old install and just do a standard disk replacement.
Matthew Kent Posted July 24, 2020

How do I use the old installation with a bad USB?
JorgeB Posted July 24, 2020

With a flash backup. If you don't have one, you need to use the invalid slot command. I need more info for the instructions: what Unraid version, single or dual parity, and the disk # you want to rebuild.
Matthew Kent Posted July 24, 2020

I was running 6.8.3 with the LinuxServer Nvidia addon, in single parity. I'm trying to rebuild drive #3.
JorgeB Posted July 24, 2020

This will only work if parity was valid. Also make sure you follow the instructions carefully; if in any doubt, ask.

- Assign all disks (including the new disk3) and check all assignments; especially make sure parity is correctly assigned.
- Important - after checking the assignments, leave the browser on that page, the "Main" page.
- Open an SSH session or use the console and type (don't copy/paste directly from the forum, as it can sometimes insert extra characters):

mdcmd set invalidslot 3 29

- Back in the GUI, and without refreshing the page, just start the array. Do not check the "Parity is already valid" box. The GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the invalid slot command, and parity won't actually be overwritten as long as the procedure was done correctly. Disk3 will start rebuilding. The disk should mount immediately, but if it's unmountable, don't format it; wait for the rebuild to finish and then run a filesystem check.
Matthew Kent Posted July 24, 2020

Ok... I'll have to wait for the drive to arrive, will write back with my results when it does. Thank you so much for your assistance!
Matthew Kent Posted July 25, 2020

Ok... I couldn't wait 2 weeks for a replacement drive, so I went to Costco and got a 2.5" eSATA drive to temporarily stand in for the dead drive. I followed your instructions, and the status at the bottom of the screen says it's doing a data rebuild. I'm assuming at this point I could refresh my browser or leave the page, yes? Thank you again for your help. There was almost 4TB of data on the drive, so this might take a while. Will report back, hopefully soon.
JorgeB Posted July 25, 2020

4 hours ago, Matthew Kent said: I'm assuming at this point I could refresh my browser or leave the page yes?

Yes, after the array starts you can change the page or close it.
Matthew Kent Posted July 25, 2020

The rebuild finished last night. The volume didn't mount, so I went ahead and stopped the array and started it in maintenance mode to see if I could do an xfs_repair. It indicated I needed to rebuild the log data, so I ran xfs_repair -L. It went through the repair fairly quickly. On stopping and starting the array (in maintenance mode), it still says it's not mountable. Am I missing something? Also, I think my last parity check was last Wednesday. How does the rebuild work if data on the array has changed since the time of the last check?
Matthew Kent Posted July 25, 2020

Also, I don't know if it matters, but the original drive was encrypted, and the replacement drive has popped up showing no encryption. Did I need to set this before the rebuild?
itimpi Posted July 25, 2020

3 hours ago, Matthew Kent said: The rebuild finished last night. The volume didn't mount, so I went ahead and stopped the array and started it in maintenance mode to see if I could do an xfs_repair. It indicated I needed to rebuild the log data, so I ran xfs_repair -L. It went through the repair fairly quickly. On stopping and starting the array (in maintenance mode), it still says it's not mountable. Am I missing something? Also, I think my last parity check was last Wednesday. How does the rebuild work if data on the array has changed since the time of the last check?

The rebuilt drive should be identical to what the previous one was. Have you tried starting the array in non-maintenance mode to see if the disk now mounts?
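On the earlier question about how the rebuild can work at all: with single parity, the parity disk is updated on every write (not just during parity checks), and each bit of the missing disk is reconstructed as the XOR of the parity bit and the corresponding bits of all surviving data disks. A toy sketch of that arithmetic, with one byte standing in for each "disk":

```shell
# Three data "disks" each holding one byte
d1=0xA5; d2=0x3C; d3=0x0F

# The parity disk stores the XOR of all data bytes,
# kept current on every write to the array
parity=$(( d1 ^ d2 ^ d3 ))

# If disk 2 fails, its byte is recovered from parity and the survivors
rebuilt=$(( parity ^ d1 ^ d3 ))

echo "$rebuilt"   # prints 60 (0x3C), the lost byte
```

This is also why the rebuild only comes out right if parity was kept valid the whole time, which is what the invalid slot procedure above preserves.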
Matthew Kent Posted July 25, 2020

I have, but it just shows an unmountable filesystem.