Controller died; caused parity corruption+drive redball. How to trust data disk?



I got a notification last night that two of my array drives were missing.  I looked and noticed they're both on the same 2-port PCIe controller, so I figure maybe the controller wiggled loose or something.

 

I shut down, re-seat the controller, and boot back up.  All the drives show up again, and of course unRAID launches into a correcting parity check (argh! If only there were something like my feature request for a safe read-only option that persists across reboots... unRAID v5's "maintenance mode" does not appear to persist across reboots in a way that would satisfy this requirement).

 

I stop the correcting check as soon as I can, though it's already "fixed" 30+ "incorrect" parity locations.  I start a new non-correcting check, but both drives go missing about an hour into the check.

 

Diagnosis at this point: likely dead controller, and parity is probably corrupted.

 

 

I shut down, pull the bad controller, and replace it with another from the closet.  When I reboot, one of the drives on the controller is red-balled (sdb/disk5, a data drive), the other (the parity drive, FWIW) is green.

 

Status at this point: a red-balled data drive that is likely functional and holds mostly good data (I can mount it manually); a "green" parity drive whose contents are probably wrong because of the automatic correcting parity check; and a new drive controller that still isn't 100% trusted.

 

I have backups of everything "critical", but would rather not lose the rest of the data on the drive.

 

What I (think I) want to do:

1) Run a couple non-correcting parity checks to make sure the new controller is solid

1a) this will generate a list of mismatching blocks

2) Force disk5 to be trusted.

3) Rebuild parity

4) If possible, use the blocks from (1a) to map block->filename on disk5.  I know this is possible on ext* filesystems, but the last time I researched it I couldn't find a similar tool for ReiserFS.  Then manually check these files for integrity and/or restore them from backups.

 

Thoughts? Advice? How (in particular) can I accomplish step 2?


I found instructions on 'trust the array' for v5, which boil down to:

* Use the "new config" utility

* Set up the drives

* Run: "mdcmd set invalidslot 99"

* Click "Start" on the array management page without loading/reloading the page (aside: there was also a 'parity is correct' checkbox.  I checked that too)

 

I was on unRAID v5-RC8a, so I had to upgrade to v5-RC11 before this would work  :o

 

So my array is back up/"trusted", and I'm running the stability parity checks mentioned in step#1.

 

I'd still be very interested in mapping block numbers to files if anyone has ideas how this can be accomplished.


I think I figured out a somewhat roundabout way to accomplish this using the FIBMAP ioctl; example code given here: http://lists.debian.org/debian-mips/2002/04/msg00059.html

 

This will generate a list of FS blocks for a given filename.  Do this for every file on the drive, save the results to a file.  Search that file for the block you're looking for.
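
Here is a rough Python sketch of that idea, not the code from the linked post. It assumes Linux, where FIBMAP and FIGETBSZ are ioctl numbers 1 and 2 in <linux/fs.h>, and it normally has to run as root. The function names and the "block<TAB>path" output format are my own choices for illustration. I have not confirmed how ReiserFS reports tail-packed small files, so treat a block value of 0 as "not mapped to its own block".

```python
import fcntl
import os
import struct
import sys

FIBMAP = 1    # _IO(0x00, 1) in <linux/fs.h>
FIGETBSZ = 2  # _IO(0x00, 2) in <linux/fs.h>


def logical_block_count(size_bytes, block_size):
    """Number of filesystem blocks needed to hold size_bytes (ceiling division)."""
    return (size_bytes + block_size - 1) // block_size


def file_blocks(path):
    """Return (block_size, [physical block numbers]) for one regular file.

    A physical block of 0 means a hole or data without its own block.
    FIBMAP normally requires root privileges.
    """
    with open(path, "rb") as f:
        fd = f.fileno()
        raw = fcntl.ioctl(fd, FIGETBSZ, struct.pack("i", 0))
        block_size = struct.unpack("i", raw)[0]
        count = logical_block_count(os.fstat(fd).st_size, block_size)
        blocks = []
        for logical in range(count):
            # The kernel replaces the logical index with the physical block.
            raw = fcntl.ioctl(fd, FIBMAP, struct.pack("i", logical))
            blocks.append(struct.unpack("i", raw)[0])
        return block_size, blocks


def dump_tree(root, out):
    """Write one 'block<TAB>path' line per mapped block of every file under root."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                _block_size, blocks = file_blocks(path)
            except OSError as exc:
                print(f"skip {path}: {exc}", file=sys.stderr)
                continue
            for block in blocks:
                if block:
                    out.write(f"{block}\t{path}\n")


# Only runs when a directory is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    dump_tree(sys.argv[1], sys.stdout)
```

Saved under a hypothetical name such as blocklist.py and run with the manually mounted disk's mount point as its only argument, with stdout redirected to a file, it produces a list you can grep for a given block number.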

 

Hopefully the 'blocks' returned by that ioctl correspond directly to the number returned when there's a parity mismatch.

 

Does anyone know whether parity mismatches are reported in units of 4kB or 512B 'blocks'?  IIRC, it's 512B blocks.
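
If it does turn out to be 512B units, the conversion to filesystem blocks is simple arithmetic. A small Python sketch, under assumptions I have not verified: mismatch positions are 512-byte sectors, the filesystem block size is 4096 bytes (FIGETBSZ will report the real value), and positions are counted from the start of the partition that holds the filesystem. If they are counted from the start of the raw disk instead, pass the partition's start sector as partition_start_sector. All names here are made up for illustration.

```python
SECTOR_SIZE = 512  # assumption: mismatch positions are 512-byte sectors


def sector_to_fs_block(sector, fs_block_size=4096, partition_start_sector=0):
    """Filesystem block that contains the given sector."""
    relative = sector - partition_start_sector
    if relative < 0:
        raise ValueError("sector lies before the start of the partition")
    return (relative * SECTOR_SIZE) // fs_block_size


def fs_block_to_sectors(block, fs_block_size=4096, partition_start_sector=0):
    """Range of sectors covered by one filesystem block."""
    per_block = fs_block_size // SECTOR_SIZE
    first = partition_start_sector + block * per_block
    return range(first, first + per_block)
```

With 4096-byte blocks, eight consecutive sectors land in the same filesystem block, so several nearby mismatches may all point at one block of one file.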

 

 

Generating my blocklist now, I'll post more detailed/streamlined instructions if this ends up working for me.
