
Potential Double Loss, but I think it might be savable... Any advice?



I've got a unique problem here, and hopefully some big brains here can help out. I'm having problems with Disk 2 and Disk 7. I know this is typically a "data's lost, move on" situation. Fortunately, I believe all of the critical data is backed up to the cloud, so this is mostly an effort to avoid a massive download.

 

Two days ago, Disk 7 disappeared. It looked like a drive disconnect: I reseated the drive, it came back online, and it rebuilt perfectly with no errors. Unfortunately, today, while I was writing some data to the array, Disk 2 started throwing errors ("Current_Pending_Sector" errors). My take is that this drive is probably toast, and while I was trying to figure things out, Disk 7 disappeared again. I got it reconnected, but now I'm hesitating...

 

I'm pretty certain that the contents of Disk 7 are stable and correct. I had temporarily removed it as a target from the shares, so I don't think anything on that drive has changed. The screenshot below shows the current state.

 

My fear is that if I start rebuilding the array and try to rebuild on top of Disk 7 from parity, I'm going to be really hosed.

 

I have 2 disks coming. Assuming we can figure out a way to force Disk 7 back in, here is the plan for those 2 drives. I plan to use one to replace Disk 2; I think that's the first priority. The other replaces Disk 7 (and I may connect it somewhere else). The third step is to take the drive that *was* Disk 7 and really run it through its paces to make sure it's good. If it's good, it goes back in as Parity 2.

 

So, questions:

1) Is there a way to confirm that there was no writing to Disk 7 (or the emulated Disk 7) after we started hitting the errors on Disk 2? Does that even matter?

2) Is there a way to confirm that Disk 7 was up when I removed it from the "share target" (I'm pretty sure this was the case...)

3) Is there a way to force Disk 7 in without rebuilding it?

4) Does my plan for moving the drives around make sense?

 

Thanks so much for the help!

 

[Screenshot of the current state] [Attachment: whitenas-diagnostics-20230414-1950.zip]


Maybe it would be better to put a new "cleared" drive in place of Disk 7 and "try" the rebuild first? I don't know. I'm going to be out of town for the week, so I won't be able to physically touch things for a bit...

 

Also, here's the setting I changed before Disk 7 went weird; I assume it would prevent writes to the disk. (I don't have any Docker containers that would write to it either; they were all disabled.)

 

[Screenshot of the share setting]

7 hours ago, jrhamilt said:

1) Is there a way to confirm that there was no writing to Disk 7 (or the emulated Disk 7) after we started hitting the errors on Disk 2? Does that even matter?

If there were any writes to the emulated disk7 after it got disabled, parity will no longer be in sync if it's force-enabled. It will never be 100% in sync anyway, due to filesystem mount/unmount, but if those are the only changes it's usually recoverable. The problem is that you don't know whether there were writes. Was the disk disabled for long? If not, it might still be worth a try.


The disk wasn't disabled for too long, and I was watching everything: a couple of "quiescent" hours. But it's a lot of data and there are other users, though typically they wouldn't be writing anything there. All of the Docker containers were disabled, and the mover was disabled. I suppose I did turn the array back on, and it took me a little while (again, a couple of hours) to realize that Disk 7 was offline...

 

Thinking it through: if there were any changes to the emulated Disk 7 and we wind up rebuilding Disk 2, then I've effectively corrupted Disk 2 wherever the rebuild touches those areas (because the physical Disk 7 no longer matches what parity says it contains).
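To make that reasoning concrete, here is a toy model of single parity. It assumes, as I understand Unraid's first parity disk to work, that parity is a plain byte-wise XOR across the same sector of every data disk. The disk contents are made up, and real parity spans whole drives rather than 4 bytes.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks, i.e. a single-parity calculation."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Made-up 4-byte "sectors" at the same offset on three data disks.
disk2 = bytes([0x11, 0x22, 0x33, 0x44])
disk5 = bytes([0xA0, 0xB0, 0xC0, 0xD0])
disk7 = bytes([0x01, 0x02, 0x03, 0x04])  # what is physically on the old Disk 7
parity = xor_blocks([disk2, disk5, disk7])

# Disk 7 is disabled, so a write to the *emulated* Disk 7 only updates parity.
new_disk7 = bytes([0xFF, 0x02, 0x03, 0x04])
parity = xor_blocks([parity, disk7, new_disk7])  # remove old contribution, add new

# The old Disk 7 is forced back in as-is, then Disk 2 is rebuilt from parity.
rebuilt_disk2 = xor_blocks([parity, disk5, disk7])
mismatched_bytes = [i for i, (a, b) in enumerate(zip(rebuilt_disk2, disk2)) if a != b]
print(rebuilt_disk2 == disk2)  # False
print(mismatched_bytes)        # [0]: only where the emulated Disk 7 was written
```

The damage is positional: only the offsets that were written on the emulated disk come out wrong on the rebuilt disk, which is why a few metadata-only changes are usually survivable while real file writes are not.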

 

I could remove Disk 7 and check file modification times, but that doesn't help by itself: I need to check the modification times on the "emulated" Disk 7 and compare them to the real Disk 7. That would require starting the array and reading the contents of the emulated Disk 7 (enough to get the file properties). Can I start the array "read-only" to check that? If the file modification times on the "emulated" Disk 7 and the "actual" Disk 7 match, is that sufficient?
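A rough sketch of that comparison, assuming both copies can be mounted at the same time (the emulated disk from the started array, the old physical disk mounted read-only somewhere else). It walks both trees and reports files that exist on only one side or whose size or modification time differ.

```python
import os
from pathlib import Path

def snapshot(root):
    """Map each file's path (relative to root) to its (size, mtime in ns)."""
    root = Path(root)
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            st = path.lstat()  # lstat: don't follow symlinks
            result[str(path.relative_to(root))] = (st.st_size, st.st_mtime_ns)
    return result

def compare_trees(emulated_root, actual_root):
    """Return (only_in_emulated, only_in_actual, size_or_mtime_differs), each sorted."""
    emulated = snapshot(emulated_root)
    actual = snapshot(actual_root)
    only_emulated = sorted(emulated.keys() - actual.keys())
    only_actual = sorted(actual.keys() - emulated.keys())
    differs = sorted(k for k in emulated.keys() & actual.keys() if emulated[k] != actual[k])
    return only_emulated, only_actual, differs
```

Usage would be something like `compare_trees("/mnt/disk7", "/mnt/disks/old_disk7")`, where both paths are hypothetical mount points. Matching sizes and timestamps are a good sign but not proof: this does not see filesystem-internal metadata changes (which, as noted above, happen on every mount/unmount anyway), and it does not compare file contents.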

 

 


I've removed the "potentially good" Disk 7 and put a new disk in its place. I'm starting the array in maintenance mode and trying to rebuild Disk 7...

 

Disk 2 is starting to throw read errors... Not sure what to do. Should I just let it rebuild? (I still have the old Disk 7, which I think / hope is good...)

 

Need help!

 

[Screenshot of the rebuild in progress]


Since this is getting so many errors (up to 18,000), I'm thinking I'll let it finish.

 

Then, is there a way for me to put the old "probably good" drive back in the array, force Unraid to treat that drive and parity as correct, and rebuild onto Disk 2?

 

Then I'll have a rebuilt Disk 2, a rebuilt Disk 7, the old Disk 2 (if it still runs), the old Disk 7, and my online backup, and from there it is what it is. When I find failed files, I'll have several options for trying to recover them...
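One way to find which files on an old physical disk are actually unreadable is to read every file end to end and record the ones that fail. This is a sketch that assumes the old disk is mounted read-only at some path of your choosing; it only catches errors the drive reports as I/O failures.

```python
import os

def unreadable_files(root, chunk_size=1 << 20):
    """Read every regular file under root end to end.

    Returns a list of (path, error message) for files whose read raised OSError,
    which is how a media error on a failing disk typically surfaces to programs.
    """
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device nodes, etc.
            try:
                with open(path, "rb") as f:
                    while f.read(chunk_size):
                        pass
            except OSError as exc:
                failures.append((path, str(exc)))
    return failures
```

For example, `unreadable_files("/mnt/disks/old_disk2")`, where the path is a hypothetical mount point. Two caveats: a full read stresses an already failing drive further, and it will not flag files on a *rebuilt* disk that were reconstructed with wrong data, since those read back without any error.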

3 hours ago, JorgeB said:

You could have tried to force-enable disk7. You still can, though parity will be more out of sync; I can post instructions if you want to try it.

I would like to try it, and would appreciate your instructions. At the end of my previous rebuild, it noted that Disk 7 is invalid. I believe that, with it marked invalid, there is no way I can successfully start the array, and I don't think a "check disk" on Disk 2 is going to do anything good.

 

I think the steps are something like: put the other disk back in (the old, probably good Disk 7) and force Unraid to treat the previous configuration (current bad Disk 2, old probably good Disk 7) as good (New Config? a "parity is correct" checkbox? start in maintenance mode? something like that). Then shut down, pull Disk 2 out, and start the array. Assign the new (blank) 4 TB drive as Disk 2, start in maintenance mode, and sync. I think that series of steps will be as good as I can get it, but again, I would like instructions from the professional. I don't need to use the array while we do any of this; it's probably better that it's not used. (I've downloaded the super-critical data from my backups, so having this down for a week isn't horrible.)

 

[Screenshot]

 

Here is the final result from the main page.

[Screenshot of the Main page]


This will only work if parity is still valid:

 

-Tools -> New Config -> Retain current configuration: All -> Apply
-Check all assignments and assign any missing disk(s) if needed, including old disk7 and a new disk2 (or use the old one for now to see if it can still be emulated); the replacement disk should be the same size as or larger than the old one
-IMPORTANT - Check both "parity is already valid" and "maintenance mode" and start the array (note that the GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the checkbox, but parity won't be overwritten as long as the box is checked)
-Stop array
-Unassign disk2
-Start array (in normal mode now), and post new diags


Hmm, if I unassign Disk 2 and then start the array, we're not done, right? That will just emulate Disk 2 for now, correct? Then I have to rebuild onto Disk 2? Is the point to check the diags before we commit to rebuilding Disk 2?

 

Here's the screen after the start in maintenance...

[Screenshot after starting in maintenance mode]


So, thinking that not much could be lost by trying a sync, I assigned a new drive to the Disk 2 slot and started a rebuild. All of that looked pretty normal, but I did get some read errors from the "good Disk 7". The rebuild only wrote to Disk 2, not to Disk 7, so Disk 7 should be unmolested, but I'm realizing at this point that both drives really are most likely bad... I don't have a good feel for why the reads failed...

 

I can't get diagnostics right now, but this is where things stand:

[Screenshot_20230416_183109_Chrome] [Screenshot_20230416_183138_Chrome]

 


I can't do that from maintenance mode, right? I need to actually mount the drives with a normal start, right?

 

I did finish the rebuild of Disk 2 (with the 1151 read errors from Disk 7 noted above). Does that change anything? Do I just run it on the filesystem itself, since the disk isn't emulated?
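Why the Disk 7 read errors matter for Disk 2: single parity is positional, so sector N of the rebuilt Disk 2 is computed from sector N of parity and of every other data disk. Wherever Disk 7 could not be read at sector N, sector N of the rebuilt Disk 2 was not computed from valid inputs and should be treated as suspect. The sketch below assumes you can pull the failing sector numbers out of the syslog in the diagnostics and that they are 512-byte sectors; it only merges them into byte ranges, and mapping those ranges to files would need filesystem-specific tools not shown here.

```python
def suspect_ranges(failed_sectors, sector_size=512):
    """Merge failed sector numbers into contiguous [start_byte, end_byte) ranges.

    Offsets use the same origin as the logged sector numbers (an assumption:
    check whether they are relative to the disk or to the partition).
    """
    ranges = []
    for sector in sorted(set(failed_sectors)):
        start = sector * sector_size
        if ranges and ranges[-1][1] == start:
            # Adjacent to the previous range: extend it.
            ranges[-1] = (ranges[-1][0], start + sector_size)
        else:
            ranges.append((start, start + sector_size))
    return ranges
```

For example, hypothetical failed sectors `[100, 101, 102, 500]` collapse to two suspect regions, `(51200, 52736)` and `(256000, 256512)`, on the rebuilt Disk 2.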

 


It seems like, even with the read errors from the original Disk 7 and the potential for bad sectors on Disk 2 from the rebuild, this is the best path forward, right?

 

Now, what's to be done with Disk 7?

 

When I bring the disk online, I plan to have the backup service restore all the files it backed up in place, overwriting things as it goes. Then I'll do some other checks on the data that wasn't backed up.

 

I also plan to add a second parity disk. Thoughts on when I should do that?

 

 
