While rebuilding parity on a new parity drive, a data drive failed. What now?



With that said, worth another try? Questions:

- if I get the same problem again (while the parity drive is being rebuilt), the correct course of action is to stop the rebuild ASAP, right?

- is there anything I should do to 'protect' the parity drive, given the problem with the previous try? e.g. cloning it?

 

In the meantime, I am trying to recover data from the old disk2. I believe the file system might have been wiped out by a temporary short on the PCB (I didn't fix it properly and it may have touched a screw on the casing - not my finest hour). The data should all still be there. I'm using Raise Data Recovery for this (I don't have a Linux PC and it works on Windows).


@JorgeB I tried the original way one more time - no luck; the invalidslot command doesn't seem to work here for some reason. I stopped immediately, obviously, to minimise incorrect writes to the parity disk.

 

So I followed the other way you suggested, and now we are up and running - see below.

 

You mention "filesystem check", "rebuild the superblock", "--rebuild-tree" - I assume these are all things I will have to do on disk2 later on to fix the partial 'damage' on the parity disk, right? So I assume the process now is:

- finish data-rebuild

- fix disk2 with the commands above (I will look them up but if confused, I might have to come back to you and ask for help - sorry!)

- run a parity check

 

Thanks!

 


On 11/3/2020 at 12:11 PM, JorgeB said:

Yes, first finish the rebuild; when that's done, restart the array in maintenance mode and post the output of:

 


reiserfsck --check /dev/md2

 

@JorgeB , Here you are:

 

reiserfs_open: the reiserfs superblock cannot be found on /dev/md2.
Failed to open the filesystem.

If the partition table has not been changed, and the partition is
valid  and  it really  contains  a reiserfs  partition,  then the
superblock  is corrupted and you need to run this utility with
--rebuild-sb.

 

I guess that means I need to rebuild the superblock?

 

Thank you!

 

On 11/3/2020 at 10:49 AM, riccume said:

In the meantime, I am trying to recover data from the old disk2. I believe the file system might have been wiped out by a temporary short on the PCB (I didn't fix it properly and it may have touched a screw on the casing - not my finest hour). The data should all still be there. I'm using Raise Data Recovery for this (I don't have a Linux PC and it works on Windows).

Also @JorgeB, Raise Data Recovery seems to have found most if not all of the data on the old disk2 that mysteriously lost its file system (I suspect a temporary short on the PCB due to a screw on the case).

If we assume the issue with this hard drive (the old disk2) is only with the file system and not with the data on the disk, would it make sense to use the reiserfsck command on this drive too to rebuild the file system?

I might end up with all of the original data back, whereas I assume the current rebuild of disk2, based on a parity disk that has been 'damaged' a little, will lead to some data loss?

 

Thanks!

50 minutes ago, JorgeB said:

Yep, instructions here:

https://wiki.unraid.net/Check_Disk_Filesystems#Rebuilding_the_superblock

 

And when done you'll need to run:


reiserfsck --rebuild-tree --scan-whole-partition /dev/md2

 

Done. This is what I got. Anything else I should do?

 

root@Server:~# reiserfsck --rebuild-tree --scan-whole-partition /dev/md2
reiserfsck 3.6.24

*************************************************************
** Do not  run  the  program  with  --rebuild-tree  unless **
** something is broken and MAKE A BACKUP  before using it. **
** If you have bad sectors on a drive  it is usually a bad **
** idea to continue using it. Then you probably should get **
** a working hard drive, copy the file system from the bad **
** drive  to the good one -- dd_rescue is  a good tool for **
** that -- and only then run this program.                 **
*************************************************************

Will rebuild the filesystem (/dev/md2) tree
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
Replaying journal: No transactions found
###########
reiserfsck --rebuild-tree started at Wed Nov  4 19:03:30 2020
###########

Pass 0:
####### Pass 0 #######
The whole partition (976754624 blocks) is to be scanned
Skipping 38019 blocks (super block, journal, bitmaps) 976716605 blocks will be read
0%Killed                                             left 976631497, 28369 /sec

root@Server:~#

48 minutes ago, JorgeB said:

Yes, if you can access the old disk it's always good to make a comparison with the data recovered on this one - assuming reiserfs works, and it should.

Thanks, I'm hopeful this might further minimise data loss.

I assume I would start with the same command we used here, reiserfsck --check /dev/md2, right? (assuming I reinstall the old disk2 in the disk2 slot) And then, would you mind if I check with you for the next steps?
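If it helps, one hedged way to do that comparison later is an rsync checksum dry run - nothing is written, it just reports files that differ or are missing on one side. This is only a sketch: the temp directories below stand in for the real mount points of the rebuilt disk2 and the recovered copy, which would be something under /mnt/ on the actual system.

```shell
# Temp dirs stand in for the rebuilt disk2 and the Raise-recovered copy
# (placeholders - the real paths depend on how the disks are mounted).
rebuilt=$(mktemp -d); recovered=$(mktemp -d)
echo "same"     > "$rebuilt/a.txt"
echo "same"     > "$recovered/a.txt"
echo "original" > "$recovered/b.txt"   # present only in the recovered copy

# -r recurse, -n dry run (no writes), -c compare by checksum,
# -i itemise changes: files missing or different on the rebuilt side
# get listed; identical files produce no output.
rsync -rnci "$recovered/" "$rebuilt/" > /tmp/compare_report.txt
cat /tmp/compare_report.txt
```

Files that only exist in the recovered copy (like b.txt above) show up in the report, so a non-empty report flags exactly what the rebuild lost.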

13 minutes ago, riccume said:

Anything else I should do?

Just wait, it will take several hours.

 

8 minutes ago, riccume said:

I assume I would start with the same command we used here, reiserfsck --check /dev/md2, right? (assuming I reinstall the old disk2 in the disk2 slot).

You can first see if it mounts, but there's no harm in running --check; after that, it depends on the output of --check.

2 hours ago, JorgeB said:

Just the wait, it will take several hours.

@JorgeB, I found another post that uses the command reiserfsck --rebuild-tree --scan-whole-partition /dev/mdX

 

and based on the typical run that you posted there, it looks to me like my command was killed. See the last line I got:

0%Killed                                             left 976631497, 28369 /sec

 

and then I got the cursor back:

root@Server:~#

 

In that post you say there were issues with reiserfsck on releases before unRAID 6.3.3; could that be the issue here, given that I am running 5.0.6? Time to upgrade before proceeding any further?

 

Also, I only have 1GB of RAM in this old system and the hard drive is 4TB. I am not sure I can do much about this though...
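For what it's worth, a quick generic way to confirm how much RAM the box actually sees (this reads the standard Linux /proc/meminfo, so it should work from the unRAID console too):

```shell
# MemTotal in /proc/meminfo is reported in kB; convert to MB for readability
mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "Total RAM: $((mem_total_kb / 1024)) MB"
```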

 

Thanks.

1 hour ago, JorgeB said:

Most likely that's the issue - you need about 1GB of RAM per TB of filesystem.

Thanks @JorgeB. It would seem to me that we have gone as far as we can with this old rig. The good news:

- we have an old disk2 (the original one) that currently shows 'unformatted' but we might be able to fix with reiserfsck. Given that Raise Data Recovery was able to identify 3.6 TB of files on it (it's a 4TB drive), there is hope that all or nearly all of the original data will still be there once we run reiserfsck

- we have a rebuilt disk2 that we should be able to fix with reiserfsck. There might be some data loss due to the partially overwritten old parity disk - but hopefully it will be minimal (the data rebuilds that overwrote the parity disk were both stopped after one minute)

- Raise Data Recovery is currently extracting data from old disk2 to a separate HDD, so hopefully we will have a third copy of the original data

- last but not least, I had backed up irreplaceable personal data a couple of weeks ago so worst case scenario only replaceable media will be lost

 

As a plan forward, would the one below work:

- upgrade to v6 so that I can format the new drives in XFS

- forget about getting the old disk2 data back on the old rig; it seems to be on its last legs, so the less I fuss with it the lower the risk. Instead, start the array in Maintenance mode (disk2 will show 'unformatted') and use the command rsync -avX /mnt/diskX/ /mnt/diskY to manually copy data from the other old data drives to the two new ones (12TB and 14TB), obviously triple-checking that I am copying from and to the correct drives

- move the new data drives to the new system as disk1 and disk2. Add the parity drive (14TB) and let it rebuild parity

- once all is up and running, try to recover the disk2 data by using reiserfsck on the old disk2. Assuming it works, copy the data to one of the new data drives (using rsync?). If it doesn't work, I can try with the data recovered by Raise Data Recovery or by running reiserfsck on the rebuilt disk2

 

Thoughts? Thank you!

24 minutes ago, riccume said:

start the array in Maintenance mode (disk2 will show 'unformatted') and use the command rsync -avX /mnt/diskX/ /mnt/diskY to manually copy data from the other old data drives to the two new ones (12TB and 14TB) obviously triple-checking that I am copying from and to the correct drives

You need to start in normal mode, or the drives won't mount.

 

 

1GB is not much for v6, but just to copy the data it should be OK.

6 minutes ago, JorgeB said:

You need to start in normal mode, or the drives won't mount.

Thanks. When I start in normal mode, how can I make sure the system doesn't start a full rebuild? I want to make sure there are no additional changes to the old parity drive, 'just in case'. Should I just take out the parity drive?

4 minutes ago, JorgeB said:

If the rebuild completed before, it won't start another - i.e., if all disks have a green ball.

Thanks. I forgot to mention that I am out of drive slots, so in order to make space for the new 'destination' drive I need to remove one of the old drives (specifically disk2). So if I just start the array after doing that, I think the system will start a parity rebuild.

 

If this is correct, these are the steps for a new config here, right?

- stop the array and take note of all the current assignments

- Utils -> New Config -> Yes I want to do this -> Apply

- back on the main page, assign all the disks as they were plus the new drive in disk2; double-check all assignments are correct

- check "parity is already valid" and start the array

- copy data from the old drives to the new drive using rsync

 

Am I getting it right? Thanks.

19 minutes ago, JorgeB said:

You can't remove/add devices to the array without a new config, and if you're going to change the array config then parity won't be valid, so it's best to just not assign one: do the new config normally, assign all the data disks you need, and don't assign parity.

Thanks @JorgeB. Translating into fool-proof steps (where I am the fool!):

- stop the array and take note of all the current assignments

- replace the newly rebuilt disk2 with a fresh 12TB drive

- Utils -> New Config -> Yes I want to do this -> Apply

- back on the main page, assign data disks 1 and 3-6 as they were and the new 12TB to disk2; *do not* assign the parity disk

- start the array

- copy data from the old drives to the new drive using rsync

 

Am I getting it right? Sorry for being slow but I'm trying to minimise the risk of mistakes!

