[SOLVED] Disk disappeared, then reappeared empty. How I recovered my data (XFS)


noel

Sharing a recent experience that might help others.

The tl;dr

- Don't panic. Unless your drive is clickety-clunking and you don't have offline/offsite backups of the important stuff.

- Don't use 8yr old drives.

- Spend the couple of minutes it takes to configure notifications

- Knowing how to use dd or ddrescue, mount and xfs_repair can save the day.

 

Had an old (8yr) 750GB drive disappear from its slot in the array. I power cycled; it was still not there, but the disk appeared in Unassigned Devices. I mounted it from there to check its contents. Empty. SMART report showed a bunch of bad sectors. :(

I unmounted it, stopped the array and tried to re-add the drive. The drive was allocated to the slot but unRAID couldn't mount it and wanted to format it, stating doing so would erase the contents. I noticed that some files from recent work had disappeared from the array, so I knew unRAID wasn't emulating the contents of the failed disk. I can only suspect that the disk, in its final moments, somehow lost its contents and unRAID dutifully updated the parity drive to reflect the 'deleted' files.

 

I have offline/offsite backups of the critical stuff, so I knew I'd only lost data I could re-obtain from other sources. However, I didn't have a firm understanding of what had been lost, so I set about getting the data back.

 

I replaced the 750GB drive with a new 3TB drive. Once it was running I created a dd image file of the failed disk on the new 3TB disk, as it had enough free space for a 750GB file.

dd if=/dev/sdm of=/mnt/disk2/harddrive.img

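I left this copy overnight. Luckily for me dd worked fine. ddrescue is usually the better tool for a failing disk, since it skips over and later retries unreadable areas instead of stopping at the first read error, so use it if you have it on your system.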
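
For a disk with bad sectors, plain dd aborts at the first read error unless told otherwise. A minimal sketch of a more forgiving imaging step follows. The helper name image_disk, the block size and the device/path placeholders are illustrative assumptions, not what was actually run here:

```shell
#!/bin/sh
# image_disk SRC DST: copy SRC to DST, continuing past read errors.
# Hypothetical helper for illustration; not the command used in this post.
image_disk() {
    src=$1
    dst=$2
    # conv=noerror keeps going after a read error;
    # conv=sync pads each short or failed read with zeros so that
    # later data stays at the correct offset in the image.
    dd if="$src" of="$dst" bs=65536 conv=noerror,sync
}

# Example (device and path are placeholders):
#   image_disk /dev/sdX /mnt/disk2/harddrive.img
#
# With GNU ddrescue installed, the equivalent is:
#   ddrescue /dev/sdX /mnt/disk2/harddrive.img /mnt/disk2/harddrive.map
# The third argument is a map file that lets an interrupted run resume.
```

A larger bs is faster, but each failed read zero-fills a whole block, so a smaller bs loses less data around each bad sector.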

I tried mounting the image file:

mkdir /mnt/loop
mount -o ro,loop,offset=32256 harddrive.img /mnt/loop

But mount failed, saying the structure needed cleaning first.

So then I made a loop device:

losetup --offset 32256 /dev/loop2 harddrive.img

(thanks to https://major.io/2010/12/14/mounting-a-raw-partition-file-made-with-dd-or-dd_rescue-in-linux/)
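
The 32256 in the mount and losetup commands is the byte offset of the first partition inside the whole-disk image: 63 sectors of 512 bytes, the traditional MBR starting sector. A small sketch of that arithmetic, assuming 512-byte logical sectors. partition_offset is a hypothetical helper name, and the start sector for a given image should be read from its partition table (for example with fdisk -l harddrive.img) rather than assumed:

```shell
#!/bin/sh
# partition_offset START_SECTOR [SECTOR_SIZE]
# Prints the byte offset to pass to mount -o offset= or losetup --offset.
partition_offset() {
    start=$1
    size=${2:-512}   # assumed default: 512-byte logical sectors
    echo $((start * size))
}

partition_offset 63      # prints 32256, the value used in this post
partition_offset 64      # prints 32768, for a partition starting at sector 64
```

If the offset is wrong, mount and xfs_repair will not find a valid XFS superblock, so it is worth checking before assuming the filesystem is damaged.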

Tried to repair this

xfs_repair /dev/loop2

This failed, saying the log contained changes that should first be replayed by mounting the filesystem, and that if the mount fails, the -L option can be used to purge the log. Since mounting had already failed, I went with -L:

xfs_repair -L /dev/loop2

Log purged, XFS repair process completed, some errors but it did complete.
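
Because -L throws away the log, and any changes recorded only there, it is usually treated as a last resort. One way to sketch the escalation, assuming a loop device backed by an image: check with -n (no-modify mode) first, try a normal repair, and zero the log only when explicitly asked. The function repair_escalating, its force-log argument and the XFS_REPAIR override variable are hypothetical names for this sketch; only the -n and -L flags come from xfs_repair itself:

```shell
#!/bin/sh
# XFS_REPAIR can be overridden (e.g. XFS_REPAIR=echo) for a dry run.
XFS_REPAIR=${XFS_REPAIR:-xfs_repair}

# repair_escalating DEVICE [force-log]
repair_escalating() {
    dev=$1
    mode=${2:-}
    # Step 1: report problems without writing anything to the device.
    "$XFS_REPAIR" -n "$dev" || echo "check reported problems on $dev" >&2
    # Step 2: normal repair; xfs_repair refuses to run if the log is dirty.
    if "$XFS_REPAIR" "$dev"; then
        return 0
    fi
    # Step 3: zero the log only when the caller explicitly opts in.
    if [ "$mode" = "force-log" ]; then
        "$XFS_REPAIR" -L "$dev"
    else
        echo "repair failed; rerun with force-log to use -L" >&2
        return 1
    fi
}

# Example (placeholder device):
#   repair_escalating /dev/loop2 force-log
```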

Finally I could mount this image

mount /dev/loop2 /mnt/loop

The good news - the bulk of my data was there :) Almost my entire directory structure was back, with a bunch of random things in 'lost+found' I can sort through on a rainy day.

 

The above procedure could (and probably should) be done on a separate system with a Linux boot USB. Unfortunately I didn't have one available with enough space, so I had to use my unRAID box.

 

I've now correctly configured notifications to alert me of disk warnings, and will be proactively replacing old (>5yr) disks. Lesson learned.

My experience may be a testimony to the caveat of using inexpensive (desktop) disks, particularly old ones. But hey that's why there's an I in unRAID ;)

 

 

Link to comment

Great report!  I'm afraid much of it is going to be beyond the capabilities of most unRAID users, but not all of us, so thank you.

 

One nice thing to see is how xfs_repair is finally becoming useful!  Prior to the recent versions, it had not inspired much confidence, at all.  With its upgrades, included in unRAID 6.2 and up, there are now a number of very successful reports.  Good to see, shores up the one weakness of XFS we had seen.  From your report, it sounds like it may now be as good as reiserfsck.

Link to comment

Questions from one who wants to understand how things work. 

 

First, why was it necessary to create the image file of the 'bad' disk?

 

Second, why not do the xfs_repair directly on the hard disk with the problem?  I can see real issues/problems with having enough free space to be able to create an image file for large hard drives. 

Link to comment

Making an image file is not necessary, but it's good if you can do it. Rule of thumb for professional recovery, first do no more harm. Meaning, DON'T ALTER THE SOURCE. If the repair on the original drive doesn't work, you have seriously messed up your chances of using other, possibly more intensive and expensive tools to get your customer's data back. If you make an image, then you can copy the image and try all sorts of different recovery tools and techniques without digging yourself into a hole.

 

If you don't particularly care about how much you recover, then by all means, recover in place and hope your first shot is successful. If it's not your data, or it's valuable data with no backup, then follow best practices to give yourself and possibly other professionals the best chance at recovery.
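
As a rough illustration of the "don't alter the source" rule applied to an image file: keep the original image untouched and point the repair tools at a verified duplicate. The helper name working_copy and the .work suffix are assumptions made for this sketch:

```shell
#!/bin/sh
# working_copy IMAGE: duplicate IMAGE to IMAGE.work, verify the copy
# byte for byte, and print the path of the duplicate on success.
working_copy() {
    img=$1
    cp "$img" "$img.work" && cmp "$img" "$img.work" && echo "$img.work"
}

# Example (paths are placeholders):
#   work=$(working_copy /mnt/disk2/harddrive.img)
#   losetup --offset 32256 /dev/loop2 "$work"
```

If a repair attempt makes things worse, the duplicate can be deleted and recreated from the pristine image, at the cost of needing twice the free space.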

Link to comment

THANK YOU!  I thought that might be the reason but I wasn't sure.  And I do agree that it is better to work on a copy rather than the original when one is not sure of success.

 

So basically, running xfs_repair directly on the disk would have performed the same operations and restored it, so it could have been put back into the array and the recovered files and directory structure copied off. The 'lost+found' stuff is exactly that, and the user can decide whether it is worth the effort to see if anything useful is recoverable.
