
Disk 3 error - Disk 3 rebuilt from parity - Disk 3 missing almost everything :(


Living Legend


This is my first hard drive failure, so it's the first time I've had to replace a drive.  Apparently I did something very wrong, as my disk has been "rebuilt" but shows only 114 GB used rather than 2.45 TB.  I've likely already killed off any lifelines by formatting the original disk 3 that had the error.

 

This was my order of operations.

 

1. I woke up to notifications that disk 3 wasn't working.  I got the red x:

 

[Image: screenshot showing the red x on disk 3]

 

I posted on the forums and the recommended action was to replace the disk, though it was mentioned that I could attempt a rebuild onto the same disk even though it was unlikely to work.  I decided to give it a go.  I stopped the array and removed disk 3 from the array in the GUI.  Then I started the array, added disk 3 (my original, broken disk) and selected the option to rebuild.  After the rebuild, I still had the red x.  Bummer.  I ordered two new hard drives: one to replace disk 3, and one to expand the array.

 

I received the hard drives, popped them in and precleared both.  Everything looked good.  I then stopped the array and swapped the broken disk 3 for my new disk ending in RDHU, another WD 3TB disk.  It took a good 8 hours, but I finally got a message that "Parity sync/Data Rebuild finished".

 

Except for one problem.  Disk 3 only has 114 GB of data rather than the 2.45 TB it used to have.

 

[Image: screenshot of the array after the rebuild]

 

While all this was going on, I continued to use the array, as I've never read of any harm in doing so.  I continued to read from and write to the array.  All writing is done to shares, none of which are restricted to specific disks, apart from a few that are cache-only.

 

Where did I go wrong in my order of operation, and am I SOL on recovering any of my disk 3 data?

 


I'm also now realizing that because I use shares, which allocate the files across disks, I really have no clue what I'm actually missing or how to find out.

 

Bigger issue than I initially anticipated.  :(

 

One thing I'm noticing that I don't understand.  Why does it say that the last parity check was completed the same time the rebuild ended and found no errors?  Does this mean that somehow the parity disk was modified before the new disk was rebuilt?


Writing to the array during a rebuild... I think that was a bad idea - NEVER do this :(

If you rebuild a disk, "all" disks are involved, including the parity disk. If you write data during a rebuild, the parity changes, but parity is needed for the rebuild.

That may explain why you now have only 114 GB instead of 2.45 TB :(

9 minutes ago, Zonediver said:

Writing to the array during a rebuild... I think that was a bad idea - NEVER do this :(

 

o.O

 

I can't remember for certain, but I thought the array was up and live during the rebuild.  Had it spun down the disks and stopped the array, I certainly wouldn't have tried to overrule that.  But I believe the array was active, so I didn't see an issue, and I don't remember reading anything that said not to.  If writes were forbidden while the array stayed running, I would have had to shut down every single docker, VM, plugin and user script manually to make sure no modifications were made to the array.  That seems like a lot for the user to have to remember to do manually if it were truly critical during the rebuild process.

 

I'm not going to say anything negative about unRAID as it's far more likely the error was on my end, but it'd sure be a shame if in my first disk failure something out of my control went wrong on the rebuild process.

13 minutes ago, Living Legend said:

 

o.O

 

I can't remember for certain, but I thought the array was up and live during the rebuild.  Had it spun down the disks and stopped the array, I certainly wouldn't have tried to overrule that.  But I believe the array was active, so I didn't see an issue, and I don't remember reading anything that said not to.  Had it said so, I would have had to shut down every single docker, VM, plugin and user script manually to make sure no modifications were made to the array.  That seems like a lot for the user to have to remember to do if it's truly that critical.

 

I'm not going to say anything negative about unRAID as it's far more likely the error was on my end, but it'd sure be a shame if in my first disk failure something out of my control went wrong on the rebuild process.

 

Dockers and VMs normally run on the cache SSD/HDD, which is "not" part of the array, unless you don't use a cache drive... also bad, sorry :(

Just now, Zonediver said:

 

Dockers and VMs run on the cache SSD, which is "not" part of the array, unless you don't use a cache drive... also bad, sorry :(

 

Hmm, not sure I follow.  I am using a Cache drive.  It doesn't have to be, but it is an SSD.  VMs don't have to be on a cache drive.  Mine are actually located on an unassigned device outside of the array, but they do have access to the array through SMB.  Dockers are on my cache drive, but also interact quite a bit with the array per usual...

 

5 minutes ago, Living Legend said:

 

Hmm, not sure I follow.  I am using a Cache drive.  It doesn't have to be, but it is an SSD.  VMs don't have to be on a cache drive.  Mine are actually located on an unassigned device outside of the array, but they do have access to the array through SMB.  Dockers are on my cache drive, but also interact quite a bit with the array per usual...

 

 

Unassigned devices are OK, but you have to be sure that nothing writes to the array during a rebuild, because that changes the parity, which is needed for the rebuild.

I don't use VMs, but when I do a rebuild I deactivate all my dockers. It's just a precaution, and I only have three of them.

1 hour ago, Zonediver said:

Unassigned devices are OK, but you have to be sure that nothing writes to the array during a rebuild, because that changes the parity, which is needed for the rebuild.

It is perfectly OK to write to the array during a rebuild.    It does update parity, but in a way that is consistent with the rebuild.  

 

The reason that this is normally recommended against is that it can badly affect performance, and also if something goes wrong with the rebuild then (depending on whether you were writing to the disk being rebuilt) it might make it harder to work out what data might have gone missing.
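
To illustrate why a write during a rebuild can stay consistent, here is a minimal sketch assuming single XOR parity and a toy array of small integer "blocks". It is not unRAID's md driver; the function names, disk contents and block values are all made up for the example. A write to a healthy disk folds the change into parity, so the block later reconstructed for the missing disk is unchanged.

```python
from functools import reduce
from operator import xor


def compute_parity(disks):
    """XOR the blocks at each position across all data disks."""
    return [reduce(xor, column) for column in zip(*disks)]


def emulate(disks, parity, missing, pos):
    """Reconstruct one block of the missing disk from parity and the other disks."""
    value = parity[pos]
    for index, disk in enumerate(disks):
        if index != missing:
            value ^= disk[pos]
    return value


def write(disks, parity, target, pos, new_value):
    """Write a block to a healthy disk and fold the change into parity."""
    parity[pos] ^= disks[target][pos] ^ new_value
    disks[target][pos] = new_value


def simulate_rebuild_with_write():
    original_data = [9, 10, 11, 12]  # hypothetical contents of the failed disk
    disks = [[1, 2, 3, 4], [5, 6, 7, 8], list(original_data)]
    parity = compute_parity(disks)

    missing = 2
    disks[missing] = [0, 0, 0, 0]  # blank replacement disk

    for pos in range(len(parity)):
        if pos == 1:
            # A write lands on another disk mid-rebuild,
            # at a position the rebuild has not reached yet.
            write(disks, parity, target=0, pos=3, new_value=99)
        disks[missing][pos] = emulate(disks, parity, missing, pos)

    return disks[missing], original_data, disks, parity


rebuilt, expected, disks, parity = simulate_rebuild_with_write()
print(rebuilt == expected)               # rebuilt disk matches the lost data
print(compute_parity(disks) == parity)   # parity is still valid afterwards
```

In this toy model the rebuilt disk matches the original even though disk 0 changed partway through, which is the sense in which the parity update is "consistent with the rebuild".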


Is there something specific I need to do to get the diagnostics from the rebuild?  Does that differ from current diagnostics?

 

unraid-diagnostics-20171024-0253.zip

 

I'm currently running a parity check with writing corrections disabled.  I'm naively hoping that maybe the 3rd disk was somehow built incorrectly and that my parity disk is still capable of assisting in the rebuild of disk 3 the way it's supposed to be :S

 

Fortunately it's mostly media, so it's of minimal consequence.  I do however have a few generic data shares, and really have no idea how to even tell what's missing.
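
One way to tell what is missing, sketched here under assumptions: if a file listing of the disk from before the failure exists (for example from a backup or a saved manifest), it can be compared against what is on the disk now. The helper names below are hypothetical, the `/mnt/disk3` path assumes the usual per-disk mount point, and the `before` set is a placeholder. Without an earlier listing this only helps for next time.

```python
import os


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.add(os.path.relpath(full_path, root))
    return found


def missing_files(before, after):
    """Files present in the earlier listing but absent from the current one."""
    return sorted(before - after)


if __name__ == "__main__":
    # Placeholder: load an earlier listing of the disk here, if one exists.
    before = set()
    after = list_files("/mnt/disk3")  # assumed mount point of the rebuilt disk
    for path in missing_files(before, after):
        print(path)
```

Saving the output of `list_files` for each data disk on a schedule would make it possible to answer "what was on disk 3?" after a future failure.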

1 minute ago, johnnie.black said:

You can try running a file recovery util on both the formatted (if it was just a quick format) and rebuilt disks, like this one:

 

http://www.ufsexplorer.com/download_stdr.php

 

Two questions.

 

1. Why would I need to run this on the rebuilt disks?    

2. I assume I need to remove the disk from my server and place it in a PC running this software?


The rebuild looks normal. The only thing that's not normal is preclear resuming on the disk currently being rebuilt, but IIRC that was just a bug and nothing was actually being written to the disk, though I still don't like seeing that in the log.
 

Oct 23 13:11:07 unraid kernel: md: import disk3: (sde) WDC_WD30EZRZ-00GXCB0_WD-WCC7K1XJRDHU size: 2930266532 erased
Oct 23 13:11:07 unraid kernel: md: import_slot: 3 replaced
...
Oct 23 13:13:54 unraid kernel: md: recovery thread: recon D3 ...
Oct 23 13:13:54 unraid kernel: md: using 1536k window, over a total of 2930266532 blocks.
...
Oct 23 13:13:56 unraid preclear.disk: Resuming preclear of disk 'sdg'
Oct 23 13:13:56 unraid preclear.disk: Resuming preclear of disk 'sde'

 

sde is the new disk3
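
As a quick sanity check on those log lines, the sketch below converts the reported size to terabytes and confirms the rebuild covered the same number of blocks as the disk. It assumes the md driver reports sizes in 1 KiB blocks, which is an assumption and not something stated in the log; `first_int` is a hypothetical helper.

```python
import re

LOG_LINES = [
    "Oct 23 13:11:07 unraid kernel: md: import disk3: (sde) "
    "WDC_WD30EZRZ-00GXCB0_WD-WCC7K1XJRDHU size: 2930266532 erased",
    "Oct 23 13:13:54 unraid kernel: md: using 1536k window, "
    "over a total of 2930266532 blocks.",
]

BLOCK_BYTES = 1024  # assumption: sizes are reported in 1 KiB blocks


def first_int(pattern, lines):
    """Return the first integer captured by pattern in any of the lines."""
    for line in lines:
        match = re.search(pattern, line)
        if match:
            return int(match.group(1))
    raise ValueError(f"pattern not found: {pattern}")


disk_blocks = first_int(r"size: (\d+)", LOG_LINES)
rebuild_blocks = first_int(r"total of (\d+) blocks", LOG_LINES)

disk_tb = disk_blocks * BLOCK_BYTES / 1e12
print(f"disk3 size: {disk_tb:.2f} TB")
print(f"rebuild covers whole disk: {rebuild_blocks == disk_blocks}")
```

Under that assumption the size works out to about 3 TB, matching the WD 3TB replacement, and the rebuild spans the full disk.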

 

When did you notice the missing data? Right after starting the rebuild, during or after it finished?


I just had exactly the same experience as Living Legend. My drive 12 (4TB) was disabled during a parity check which had more than 2000 errors. I suspect a cable that was bent too sharply. The SMART report on the disk is as clean as it can be. So I reinstalled the disk and precleared it once. No problems. Then I reinserted it into the array as a 'new' disk. The rebuild took about 12 hours and finished without any errors. Before, the disk had at least 3TB of data on it. Now it has only 100GB, which probably came from a backup job that ran after it was rebuilt. The array was running during the rebuild.

 

I tried to look at the syslog but the syslog page was unresponsive. The parts that I could get to were from after the rebuild and didn't show any problems. I rebooted but disk 12 is still empty. So I lost 3TB!

 

What is going on here? This severely shatters my trust in unRAID!


Archived

This topic is now archived and is closed to further replies.
