Help with parity sync/data rebuild


I recently upgraded from version 4.7 to the latest version.  All seems to have gone fine with the upgrade on the flash drive, but I need some help with what I should do about a parity sync/data rebuild issue.  I've attached a picture of where it's at right now.  Obviously there's an issue with that first drive.  Just not sure what my next course of action should be.  As you can see, its "estimated finish time" is 434 days lol.  Am I stuck, or can I cancel the process now and replace the drive?

 

Server-X2.jpg

The parity sync is clearly not going to complete successfully, so I'd just cancel it.

 

Did you do a parity check BEFORE you shut down v4.7 to confirm all was well before starting the upgrade?  THAT was the time to replace the drive, as you could have done a rebuild onto a new drive and wouldn't have lost any data.  At this point, all of the data on the "unmountable" drive is likely lost -- although you CAN make some attempt to recover data from that drive independently later.

 

But for now, I'd just start over with a New Config, assigning your good drives and the parity drive (be CERTAIN you assign the correct drive as parity) to the system, and then letting it do a parity sync.    You might also check that both the SATA and power cables are securely fastened to the problem drive -- just in case this is simply a loose cable issue [disconnect them and reconnect them -- being certain they're firmly connected at both ends].    If you re-seat the cables, you can do a New Config with all of the drives -- but if the drive still shows "unmountable" I would then do another New Config without it, and worry about data recovery at a later time.

 

SMART for parity drive looks OK. As you can see from your screenshot, you are having read errors on disk1. Not surprising since that disk needs to be replaced ASAP.

Serial Number:    WD-WCAVY7455519
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       5
196 Reallocated_Event_Count 0x0032   197   197   000    Old_age   Always       -       3
197 Current_Pending_Sector  0x0032   196   196   000    Old_age   Always       -       1460
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       98

Disk3 also has problems but that will have to wait.

Serial Number:    S2H7J1CZA03013
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       18
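
For anyone who wants to screen SMART reports like the two above without reading every row, here is a minimal Python sketch. It assumes the usual ten-column attribute layout that smartctl prints, and the watch list of attribute IDs (5, 196, 197, 198) is only my choice for illustration, not an official failure rule. The function names are hypothetical.

```python
# Attribute IDs whose nonzero raw value is commonly treated as a warning sign.
# This watch list is an assumption for illustration, not an official rule.
WATCHED_IDS = {5, 196, 197, 198}

# The disk1 attribute rows quoted in this thread.
DISK1_REPORT = """\
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       5
196 Reallocated_Event_Count 0x0032   197   197   000    Old_age   Always       -       3
197 Current_Pending_Sector  0x0032   196   196   000    Old_age   Always       -       1460
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       98
"""


def parse_attributes(report):
    """Return {attribute_id: (name, raw_value)} for each attribute row."""
    rows = {}
    for line in report.splitlines():
        fields = line.split()
        # Expect: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
        if len(fields) < 10 or not fields[0].isdigit() or not fields[9].isdigit():
            continue
        rows[int(fields[0])] = (fields[1], int(fields[9]))
    return rows


def smart_warnings(rows):
    """List watched attributes whose raw value is above zero."""
    return [
        f"{name} = {raw}"
        for attr_id, (name, raw) in sorted(rows.items())
        if attr_id in WATCHED_IDS and raw > 0
    ]


if __name__ == "__main__":
    for warning in smart_warnings(parse_attributes(DISK1_REPORT)):
        print("WARNING:", warning)
```

Run against the disk1 rows, all four attributes are flagged. The single disk3 row (18 pending sectors) would be flagged the same way.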

Why were you doing a parity sync? Would have been a much better idea to rebuild disk1 to a new drive. Hope your parity is still good.

As you can also see from your screenshot, disk1 is unmountable. That typically means filesystem issues.

 

There are going to be some detailed steps you will need to follow to attempt to recover the files from disk1, and no guarantee of success.

 

Do you have backups? Restoring from backup might be the simplest and most reliable solution.

You absolutely MUST have backups of any irreplaceable files. unRAID parity is not and never has been a substitute for backups. You should start considering a backup plan immediately. Lesson learned perhaps too late.

 

Since that disk is currently unmountable, you won't be able to read any of its files even if you stop the parity sync. We will have to try to repair the filesystem.

 

Since we don't know whether parity is good or not it might be best to try more than one way to recover from this, one with parity and one without. But as I said, detailed steps and no guarantee of success.

 

Hopefully someone else will join this thread since two heads are sometimes better than one.

 

Do you have any new disks you can use to rebuild disk1?

 

Basically, the plan will be to get unRAID to trust your current parity so you can rebuild disk1 to a new drive. Then try to repair the filesystem on the new disk1. And if necessary we could also try to repair the filesystem on the old drive separate from the parity array to see if anything additional could be recovered that way.

 

Let us know when you have the new disk and are ready to proceed.

When I get home, do I need to put the new drive in a new slot or swap the bad drive out with the new one?

Since the idea is to rebuild you must assign it to the same slot the bad drive is currently assigned to. Whether you actually physically replace it doesn't matter.

 

*EDIT* In fact it might be easier if you have the old drive and the new drive both in the machine if you have space and ports for that in case we want to get at the old drive later.

 

But that is getting quite a bit ahead of the game. Before doing anything else we must make unRAID trust the parity disk.

 

It would be a good idea to go to Settings - Disk Settings and set Enable auto start: No until we get done with everything.

 

Ideally we would test the new drive by preclearing it, but since it doesn't need to be clear you could test it on another system with another method. At the very least it would be a good idea to check the SMART of the new drive before attempting the rebuild.

Basically, the plan will be to get unRAID to trust your current parity so you can rebuild disk1 to a new drive.

 

NO => this isn't a good option at this point, since a parity sync has already been running (for over a day) on the v6 setup.  There's NO chance that parity is valid -- using that option would simply result in a known-bad parity disk which would not be in a reasonable condition to rebuild the failed disk.    The new parity sync has already corrupted at least the first 1% of the disk, so any rebuild would be very unlikely to have any reasonable chance of recovery.
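
To make the point about partly overwritten parity concrete, here is a toy Python sketch of single-parity XOR. It is a simplified model, not unRAID's actual code: the three "disks" are 16-byte strings, and it assumes the aborted sync rewrote the first 4 parity bytes as if disk1 contributed only zeros. The real sync may have written something different, so treat that as an assumption.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy 16-byte "disks" (made-up contents for illustration).
disk1 = b"family photos!!!"
disk2 = b"movies and music"
disk3 = b"backups of stuff"

# Parity as it stood before the new sync started.
valid_parity = xor_blocks([disk1, disk2, disk3])

# With valid parity, a missing disk1 is recovered exactly.
rebuilt_from_valid = xor_blocks([valid_parity, disk2, disk3])

# ASSUMPTION: the aborted sync overwrote the first 4 parity bytes using
# zeros in place of disk1's real data.
overwritten = 4
partial_parity = xor_blocks([disk2, disk3])[:overwritten] + valid_parity[overwritten:]

# Rebuilding from that parity loses the overwritten region only.
rebuilt_from_partial = xor_blocks([partial_parity, disk2, disk3])

if __name__ == "__main__":
    print(rebuilt_from_valid)
    print(rebuilt_from_partial)
```

In this model the rebuilt disk is wrong only in the region the new sync had already written. Everything past that point still matches the original, which is why a rebuild from the damaged parity may still hold some recoverable data.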

 

IF you want to try this anyway, buy TWO new disks => one to try this with, and one to actually put in the array.  You'll need to do this FIRST ... before you do ANY writes to the array at all.  Basically you would:

1. Do a New Config with all of the current disks (including the bad one), with the "parity is already valid" box checked.
2. Start the array, then Stop the array.
3. Unassign the bad disk.
4. Start the array (so the disk shows as "missing"), then Stop the array.
5. Assign a NEW disk to that slot.
6. Start the array and let it do a rebuild.
7. When that finishes, Stop the array and do a New Config -- this time assigning the OTHER new disk to that slot and NOT checking the "parity is already valid" box.
8. Start the array and let it do a new parity sync.
9. After that finishes, format the new drive.

You'll then have a good, parity-protected array.  Meanwhile, you'll have both the original and the "rebuilt-but-known-bad" copy that you can attempt to do some recovery from "outside" of the array.  There IS a chance that the "rebuilt-but-known-bad" copy may have some recoverable data in the area past where the aborted parity sync had already written ... but realistically this simply isn't a good option at this point.

 

...  try to repair the filesystem on the old drive separate from the parity array

 

This is indeed what you need to do for any recovery attempt -- reiserfsck is VERY good at recovering data from corrupted disks, so there IS a chance.  But at this point you may as well get a good v6 array running and ready to copy any recovered data to.

 

Did this failure happen after you'd shut down v4.7, or did you not run a parity check in 4.7 before shutting it down to upgrade it?    As I noted earlier (too late now), THAT would have been the time to do a rebuild ... BEFORE you upgraded to the new version.  But at this point it's simply not an option.

 

One other thought:  Did you check the cables as I suggested earlier?  IF the issue is simply a loose cable, then a New Config with all of the original drives assigned will let you do a good parity sync and everything will be fine with no need to replace the drive.  I'd definitely unplug, then replug both the SATA and power cables to the problem drive and try it before doing anything else.  If it mounts, you're good to go  :)

 

Alright, I've got a new drive.  While I'm in there, I will try checking the cables of the old drive and see if that helps.

 

In the meantime, if that doesn't help, what are the steps I need to follow to try to recover the data separately from the array?

Checking connections on all drives is always a good idea, but the drive is telling us it is bad; this is not a connection problem.

 

garycase has given you another option, and it does have the benefit of getting the rest of your array protected before we try to recover the disk1 data. There are several steps along any path that will take a significant amount of time, so getting the rest of the array protected is worth considering. And there are a significant number of pending sectors on disk3 which could cause problems with the rebuild.

 

As he outlined it you would need an additional disk to put in the disk1 slot, which would become an empty disk, while we work on the reconstructed disk (and possibly the original disk) outside the array. Then you could copy any recovered files to the new empty disk.

 

Another possibility would be to do the rebuild to a new disk, then remove disk1 and leave that slot empty and rebuild parity. That would get the rest of the array protected, but unfortunately that won't really leave us any room on the rest of your array to copy the data from the repaired disk(s).

 

Any other opinions out there?

None of the options look very good to me, but the rebuilt disk seems the best. Yes, there's more than a little damage to the parity disk, but disk1 is ReiserFS, so with some luck most of it can be recovered.

 

The old disk1 looks very damaged. reiserfsck can't fix a disk with bad sectors; you'd need to run badblocks first to create a list of them for reiserfsck to use. It's worth trying, but I don't expect much success.
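
A rough sketch of that two-step sequence, written as a small Python helper that only builds and prints the command lines and does not run anything. The device path and list file are hypothetical placeholders. The 4096-byte block size is an assumption: the badblocks block size has to match the filesystem's block size for the list to be meaningful. Check the man pages for your versions before running either tool.

```python
import shlex


def recovery_commands(device, badblocks_file, block_size=4096):
    """Build (but do not run) the badblocks scan and the reiserfsck rebuild.

    device and badblocks_file are placeholders supplied by the caller.
    block_size is assumed to equal the filesystem block size.
    """
    # Read-only scan that writes the list of unreadable blocks to a file.
    scan = ["badblocks", "-b", str(block_size), "-o", badblocks_file, device]
    # Tree rebuild that is told which blocks to avoid. This step rewrites
    # the filesystem, so working on a copy of the disk is the safer choice.
    rebuild = ["reiserfsck", "--rebuild-tree", "--badblocks", badblocks_file, device]
    return [scan, rebuild]


if __name__ == "__main__":
    # /dev/sdX1 and /tmp/disk1-badblocks.txt are hypothetical examples.
    for command in recovery_commands("/dev/sdX1", "/tmp/disk1-badblocks.txt"):
        print(shlex.join(command))
```

Printing the commands first gives you a chance to double-check the device name before anything touches the disk.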

 

The option to get the array protected first is always a good idea, and you still don't know if the rebuild will be 100% successful because of disk3 (or any other surprise).

And another plus for the "2 new disks" plan of garycase would be you would already have another disk not part of the array (the new rebuilt disk1 that we recovered files from) that you could use to replace disk3 when you're done.

 

Then you could try to clear the pending sectors on the old disk3 by preclearing it and if that worked you would still have an extra disk that you could use for backups (remember those?) or whatever (parity2?).

 

Probably sounds pretty confusing by now, doesn't it?
