New Configuration from Existing Parity and 4 of 5 data HDDs?



What a mess. I have learned a lot. But still have some questions.

 

I have been running UNRAID for about 4 years now, and I run it strictly as a data storage solution, so no Plex or other app-type things installed. I have a general understanding of how the server functions and have swapped out failed drives and upgraded drives a number of times, but I do nothing outside of the GUI. I run an 8 TB parity drive and 5 data drives of around 2-4 TB each.

 

Recently my parity drive began acting up and started to show errors. Since all of my data drives showed healthy with no errors, the first thing I did was try to rebuild parity. In retrospect, maybe not the best solution? But it would not complete a new parity rebuild; it kept getting about 30% of the way through and hanging. So I made the assumption that the parity drive was bad... and maybe that wasn't the case. Regardless, I ordered a new parity drive, popped it in, and began to build parity with all 5 of my data drives functioning perfectly. I also packaged up my old parity drive to be shipped back to the vendor, as it is still under warranty. I was excited to wake up and check on the parity build on day 2 of the process, only to find that one of my disks had millions of read errors and parity had not completed. I tried pulling the bad drive and swapping in a new one just to see what would happen, and found that I had none of the data from the failed data drive.

 

So I focused on trying to pull data directly from the failed/failing drive. I was able to pull quite a bit from it, although not some very important files. I estimate about 40-50% data loss. Believing I had no accurate parity drive AND a failed drive, I thought I had no choice but to do a new configuration with the good data drives, build new parity, and then add in a new drive and place the files I was able to recover on it. So I checked the box and did a new configuration with my 4 good data drives. Parity is almost done building (which will give me a huge sense of relief).

 

But it occurred to me that I never pulled the old parity drive (ready to go to the manufacturer for return) from the shipping box and tried it in the old array before nuking that configuration. That leads to my question... is there any way I can use my 4 good data drives and the old parity drive, which may or may not be complete and may or may not work, to try to recover the contents of the failed data drive?

 

Please let me know if I'm leaving out any needed info, and thank you to anyone who takes the time to read and weigh in on this.

30 minutes ago, [email protected] said:

is there any way I can use my 4 good data drives and the old parity drive, which may or may not be complete and may or may not work, to try to recover the contents of the failed data drive?

Depends on whether there were any writes to those drives. I'm not just talking about you intentionally adding data; ANY writes, including deletions or metadata updates from Windows reading a folder, anything that changed on ANY of the drives in question, will affect the emulated drive.

 

Although, it really can't hurt anything to try it and see at this stage. I just wouldn't expect a miraculous recovery. If at any point after the first failure you wrote to those drives, it's probably a lost cause.


Thank you, JonathanM. There may actually be a chance that nothing was written to any of those drives. So I may see if I can give it a go if there is a way to do it. What I don't understand is how do I use the old parity drive and working 4 data drives now that I have opted to do a New Configuration for my UnRaid array? Can you shed some light on what steps I may need to take?


Here is a link to a thread from February of this year. It's a more complex situation, but it gives an overview of the how, what, and where.

If you are technically savvy, you may be able to infer from that thread the exact commands needed for you, but if not, I'd wait for more input into your specific situation.


I can wait I think :)

 

Although now I am dealing with an issue with my new config.

 

File transfers go from right on par with what I'm used to, to a snail's pace, to totally stopped with the network connection lost. And the GUI is sometimes perfectly responsive, and will sometimes spin for minutes and minutes. Maybe I should start a new thread, but can someone shed some light on what I may be facing there?

 

Thank you again


Did you remove ALL the drives that you expect to be able to recover data from?

On 7/30/2018 at 3:30 PM, [email protected] said:

is there any way I can use my 4 good data drives and the old parity drive, which may or may not be complete and may or may not work, to try to recover the contents of the failed data drive?

If you are messing with any of these drives, you can kiss your chances of recovery goodbye.


Oh gosh. Um, yeah, I have those four data drives still in the array (new config). All four are still working and I'm not writing anything to them (that I know of). My understanding was that if you are just building parity, it doesn't write to those drives? They are important if I am to rebuild the failed drive (uninstalled) with what may or may not be a viable parity drive (uninstalled). Sounds like I've further damaged my chances of using those drives with the old parity drive for recovery?

 

Additionally, it sounds like I have a flawed understanding of how parity works. I made the assumption that when the array attempts to emulate or restore a drive from intact data drives and parity, it goes through the entirety of the drives comparing their 1s and 0s at every "sector"... so if you have parity issues, it may mess up a few hundred files on your emulated drive, but not scrap the whole thing. Is my understanding wrong? 

 

Thank you

 

-noob

27 minutes ago, [email protected] said:

I made the assumption that when the array attempts to emulate or restore a drive from intact data drives and parity, it goes through the entirety of the drives comparing their 1s and 0s at every "sector"... so if you have parity issues, it may mess up a few hundred files on your emulated drive, but not scrap the whole thing. Is my understanding wrong? 

Your understanding of parity is correct, but your understanding of what a file is may be flawed.

 

Think of it this way. Your hard drive contains what looks like random strings of 1's and 0's. It's only by using the table of contents that you can be sure of what section of data belongs to what file. That proportionally small piece of data where the table of contents resides is in a common "column" of addresses, so if the table of contents is modified on any of the drives, you will be corrupting the TOC for your parity emulated drive. No valid TOC, no files, at least not without forensic recovery tools.

 

When you delete a file or move a file or anything like that, it updates the TOC for the drive. So, yeah, it's pretty easy to scrap the whole thing. Sometimes you can get the filesystem repair tools to rebuild the TOC, sometimes it's just too far gone.

 

Building parity doesn't write to the data drives, but there are various background processes that can access and write to the drives.
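A toy sketch of how single (XOR) parity behaves may help make the positional nature of the corruption concrete. The byte values below are invented for illustration; this is only a model of the concept, not how unRAID actually reads disks:

```python
from functools import reduce

def xor_blocks(*blocks):
    """XOR equal-length blocks byte by byte (how single parity is computed)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Three hypothetical 8-byte "disks" (made-up values).
disk1 = bytes([0x10, 0x22, 0x3A, 0x4F, 0x00, 0x11, 0xFE, 0x07])
disk2 = bytes([0xA0, 0x01, 0x55, 0x10, 0xFF, 0x20, 0x31, 0x42])
disk3 = bytes([0x09, 0x77, 0x80, 0x65, 0x12, 0x03, 0x2C, 0x9D])

parity = xor_blocks(disk1, disk2, disk3)

# Emulating a missing disk: XOR parity with all surviving disks.
assert xor_blocks(parity, disk1, disk3) == disk2  # perfect reconstruction

# Now one byte changes on a surviving disk WITHOUT parity being updated
# (e.g. a filesystem metadata write while the old parity drive sat in a box):
disk3_stale = bytes([disk3[0] ^ 0xFF]) + disk3[1:]
emulated = xor_blocks(parity, disk1, disk3_stale)
assert emulated != disk2          # the emulated drive is now wrong...
assert emulated[1:] == disk2[1:]  # ...but only in the changed "column"
```

So corruption really is positional; the catch is that if the wrong column holds filesystem metadata, far more than a few files can become unreachable.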

 

I was hoping when you said you were content to wait, that you were doing just that, waiting, not continuing to use the drives you wanted to recover with.

29 minutes ago, [email protected] said:

My understanding was that if you are just building parity it doesn't write to those drives?

 

When you build new parity, you aren't writing to these four disks.

 

But just mounting the four drives introduces small writes to them, and those writes aren't mirrored on the old parity drive.


But the really big question is what is happening now. If any Docker container or VM makes writes, or any machine performs writes to any of the shares, then you are toast.

 

Since the repair steps are so critical, and there are a huge number of ways people can accidentally go wrong, everyone should have a post-it note on the unRAID machine:

 

In case of emergency:

  1. Plan first
  2. Review plan
  3. Request second opinion from forum
  4. Double-check that any suggestions were understood
  5. Review again
  6. Start repair actions
  7. Post progress feedback

Freakin love this...

 

8 minutes ago, pwm said:

 

In case of emergency:

  1. Plan first
  2. Review plan
  3. Request second opinion from forum
  4. Double-check that any suggestions were understood
  5. Review again
  6. Start repair actions
  7. Post progress feedback

 

I've learned some lessons this go 'round for sure. It is looking like I may just be resigning myself to loss of data on that drive and that's all there is to it.

 

Depending on the complexity of the process to attempt the rebuild, I MAY still try it, so I guess I hold a spark of hope yet.

 

I may have to start a new thread as I am seeing stall-outs in the new configuration.

 

Oh boy

 

Thanks again

On 7/30/2018 at 9:30 PM, [email protected] said:

So I focused on trying to pull data directly from the failed/failing drive. I was able to pull quite a bit from it, although not some very important files. I estimate about 40-50% data loss.


Just curious. What tools did you use for this step? Just normal copy commands?


And what was the reason for failing to retrieve more data? That some files gave read errors? Or that you couldn't even locate the directories? Something else?

 

One thing I wish unRAID would have is a manual "compute and give me block x for disk y" command.

Then a user could read out as many blocks as possible from a bad disk.

But it could make use of the remaining data disks + parity to manually request the recomputation of the blocks that the bad disk fails to read.

 

That would give users an option of making as much use as possible of a damaged disk in a situation where they have accidentally corrupted some of the parity data. The chances are reasonably good that the blocks that are unreadable on the damaged disk do not align with the corrupted parity.
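For what it's worth, the core of such a command would be fairly simple. The sketch below is purely hypothetical: `BLOCK_SIZE`, the paths, and the function names are all invented, no such tool ships with unRAID, and nobody should point ad-hoc scripts at array devices they still hope to recover from:

```python
from functools import reduce

BLOCK_SIZE = 4096  # assumed block size for this illustration

def read_block(path, block_no):
    """Read one block at the given offset from a device or image file."""
    with open(path, "rb") as f:
        f.seek(block_no * BLOCK_SIZE)
        return f.read(BLOCK_SIZE)

def reconstruct_block(surviving_paths, parity_path, block_no):
    """XOR the same block from parity and every surviving data disk to
    recompute that block of the missing/failed disk (single parity only)."""
    blocks = [read_block(p, block_no) for p in surviving_paths + [parity_path]]
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Hypothetical usage (device names invented):
# missing = reconstruct_block(
#     ["/dev/sdb1", "/dev/sdc1", "/dev/sdd1", "/dev/sde1"], "/dev/sdf1", 12345)
```

The user would read the good blocks straight off the damaged disk and call something like this only for the block numbers that return read errors.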

 

One competing product to unRAID - SnapRAID - can even perform repairs by retrieving partial data from a backup. But that's because SnapRAID works with checksummed file blocks and not with raw disk sectors. The disadvantage is that SnapRAID doesn't maintain real-time parity; any changes have to be committed. There are always tradeoffs between different designs.


For my recovery efforts, I put the failed data drive into an external SATA case connected to my Windows machine, then used Raise Data Recovery to pull everything that was still available off the original file system. A lot of directories weren't there at all, and some files within the directories that were present were not readable from the source. I have yet to do a deep-dive recovery to see what is beneath the file system, but plan to do that as well.

 

Thanks for the heads up on the alternative product also... interesting.


Yes, please, @johnnie.black. Sorry, I've been out of town.

 

A bit of an update if it is at all helpful...

 

In case I didn't mention it, I am running v5.0.6

 

And as I mentioned before, I kept 4 of the 5 data disks in the array and made a new configuration after installing a new parity drive. I was able to create parity without issue, so I moved on to adding a new drive to the array to begin copying over what I could recover from the failed data drive. I then noticed slowing/stalling issues when trying to write to that drive and when accessing the GUI. I decided I'd better run a correcting parity check to make sure all was going well... and it didn't complete. In fact, it did the exact same thing as the last parity drive. I'm now thinking my original parity drive didn't actually fail and that I am dealing with some other issue??

 

I mention that info not necessarily to address it here, but so that you are aware of it in case it changes anything in my attempt to reinstall the old parity drive with the intact data disks.

 

Here is the new thread 

 

 


I'm usually pretty good about figuring things out and resolving issues on my own. But I feel like I have botched this one pretty good, so I'm doing my best to be patient and get tips on exactly what to do from the experts. So I won't run another parity check or try anything else to get a valid syslog unless someone who knows a lot better than me tells me to :)

 

Thanks again


Ha. Not really, truthfully. I've kind of just relied on the unraid box for my redundancy. But the more I research, the more I see that this was bound to happen at some point in the raid array's life. If not this, something like it. Anyway, I am backing up everything, working out a plan for an additional layer of protection, plus online storage for those items that are irreplaceable. So in short, no, not yet... but working on it.

