HELP: Can't rebuild array to disk, only see part of my data



I seem to have lost control of my unRAID (v4.6rc5) array.  After I replaced a power supply (due to faulty connectors), unRAID read 3 of the 4 1.5TB disks in my array just fine.  The fourth gave me all sorts of issues, and with reiserfsck I wasn't able to get past block 2 (like others in other threads I read).  All disks seem to check out fine with smartctl.

 

I had an extra 2TB disk that I put in its place, and got unRAID to copy the parity disk to the newer, larger disk, but after that it was unable to format it to expand the array to it.  Right now, all I see through my SMB share is a partial listing of the data I had on there.  Syslog is here:

Jan 30 23:04:43 Tower kernel: Buffer I/O error on device md3, logical block 0
Jan 30 23:04:43 Tower kernel: Buffer I/O error on device md3, logical block 1
Jan 30 23:04:43 Tower kernel: Buffer I/O error on device md3, logical block 2
Jan 30 23:04:43 Tower kernel: Buffer I/O error on device md3, logical block 3
Jan 30 23:04:43 Tower kernel: Buffer I/O error on device md3, logical block 4
Jan 30 23:04:43 Tower kernel: Buffer I/O error on device md3, logical block 5
Jan 30 23:04:43 Tower kernel: Buffer I/O error on device md3, logical block 6
Jan 30 23:04:43 Tower kernel: Buffer I/O error on device md3, logical block 7
Jan 30 23:04:43 Tower kernel: Buffer I/O error on device md3, logical block 8
Jan 30 23:04:43 Tower kernel: Buffer I/O error on device md3, logical block 9
Jan 30 23:04:43 Tower emhttp: shcmd (215): set -o pipefail ; mount -t reiserfs -o noacl,nouser_xattr,noatime,nodiratime /dev/md3 /mnt/disk3 2>&1 | logger
Jan 30 23:04:43 Tower logger: mount: wrong fs type, bad option, bad superblock on /dev/md3,
Jan 30 23:04:43 Tower logger:        missing codepage or helper program, or other error
Jan 30 23:04:43 Tower logger:        In some cases useful info is found in syslog - try
Jan 30 23:04:43 Tower logger:        dmesg | tail  or so
Jan 30 23:04:43 Tower logger: 
Jan 30 23:04:43 Tower emhttp: _shcmd: shcmd (215): exit status: 32
Jan 30 23:04:43 Tower emhttp: disk3 mount error: 32
Jan 30 23:04:43 Tower emhttp: shcmd (216): rmdir /mnt/disk3
Jan 30 23:04:43 Tower kernel: REISERFS warning (device md3): sh-2006 read_super_block: bread failed (dev md3, block 2, size 4096)
Jan 30 23:04:43 Tower kernel: REISERFS warning (device md3): sh-2006 read_super_block: bread failed (dev md3, block 16, size 4096)
Jan 30 23:04:43 Tower kernel: REISERFS warning (device md3): sh-2021 reiserfs_fill_super: can not find reiserfs on md3
Jan 30 23:04:44 Tower emhttp: shcmd (217): rm /etc/samba/smb-shares.conf >/dev/null 2>&1
Jan 30 23:04:44 Tower emhttp: shcmd (218): cp /etc/exports- /etc/exports
Jan 30 23:04:44 Tower emhttp: get_config_idx: fopen /boot/config/shares/torrents.cfg: No such file or directory - assigning defaults
Jan 30 23:04:44 Tower emhttp: shcmd (219): killall -HUP smbd
Jan 30 23:04:44 Tower emhttp: shcmd (220): /etc/rc.d/rc.nfsd restart | logger
Jan 30 23:06:30 Tower kernel:  sde: sde1
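
A log like the one above is easier to read once the repeated kernel errors are grouped by device. The following is only a minimal Python sketch, assuming the "Buffer I/O error on device ..., logical block ..." wording shown in this syslog; the function name summarize_io_errors is made up for illustration.

```python
import re

# Matches kernel lines such as:
#   "Buffer I/O error on device md3, logical block 2"
IO_ERROR = re.compile(r"Buffer I/O error on device (\w+), logical block (\d+)")


def summarize_io_errors(lines):
    """Return {device: [failed logical blocks]} for the given syslog lines."""
    blocks_by_device = {}
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            device, block = match.group(1), int(match.group(2))
            blocks_by_device.setdefault(device, []).append(block)
    return blocks_by_device


# A few lines copied from the syslog in this post.
sample = [
    "Jan 30 23:04:43 Tower kernel: Buffer I/O error on device md3, logical block 0",
    "Jan 30 23:04:43 Tower kernel: Buffer I/O error on device md3, logical block 1",
    "Jan 30 23:04:43 Tower kernel: Buffer I/O error on device md3, logical block 2",
    "Jan 30 23:04:43 Tower emhttp: disk3 mount error: 32",
]
summary = summarize_io_errors(sample)
print(summary)  # {'md3': [0, 1, 2]}
```

Here every failed read is on md3, starting at block 0, which matches the failed superblock reads reported by the REISERFS warnings.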

 

Right now, I'm running the preclear script on another 1.5TB drive that I'm going to try to rebuild the array with, but I'm at a loss as to what to do if this doesn't work, or even why it didn't work in the first place.  I thought unRAID was supposed to seamlessly take care of disk replacements for me, but this isn't turning out to be the case.  Does anyone know what's going on here?

 

 

I'm also concerned because when I bring up my disk array without disk3 configured, just running off parity, it tells me that disk3 is actually unformatted.  Why would this be?

 

I think I'm slowly losing my data, and I'm pulling my hair out trying to figure out why, as no disk has failed, so unRAID shouldn't have gotten into this state to begin with.

Screen_shot_2011-01-30_at_11_20.22_PM.png.83d90bd33e4410d1b7c8c589cd27dc5f.png

Link to comment

Before you do anything more, and before you reboot, post a syslog.

 

Joe L.

 

Attached.  I've rebooted a couple times since the initial incident due to troubleshooting steps.  The system is currently running "reiserfsck --scan-whole-partition --rebuild-tree /dev/md1"

 

/dev/md3 is my trouble disk though, and with disk1, disk2, and parity up, I can't see anything that would've been on disk3.  When I try to check or mount disk3, I just get the "Cannot read the block (2): (Input/output error)" message.

syslog.zip

Link to comment

I just tried to add a pre-cleared disk and rebuild the array, and it still doesn't let me format it.  Can anyone help?

Disks that are being re-constructed via parity and the remaining data disks are NEVER formatted.  The formatting from the original disk (as simulated from parity and the other data disks) is re-created on the replacement when it is rebuilt.
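
To illustrate why no format step is needed, here is a minimal Python sketch. It assumes plain XOR parity across the data disks, uses tiny made-up 8-byte "disks", and a hypothetical file system signature; none of these values come from the real array.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical disk contents. disk3 begins with a made-up signature that
# stands in for the file system metadata written when it was first formatted.
disk1 = b"data-one"
disk2 = b"data-two"
disk3 = b"FSMAGIC!"

# Parity is computed over every data disk while the array is healthy.
parity = xor_blocks([disk1, disk2, disk3])

# disk3 is lost: rebuild it from parity plus the surviving data disks.
rebuilt = xor_blocks([parity, disk1, disk2])
print(rebuilt)  # b'FSMAGIC!'
```

The rebuilt bytes include the signature, so the replacement already carries the original file system once the rebuild finishes.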

 

Joe L.

Link to comment

Is this  - SAMSUNG HD154UI  S1XWJ1KS924407 - the original disk3?

 

I am not sure what steps you are taking, but it seems like you are assigning and removing disks randomly here. The latest syslog says the parity is not the correct disk and there are 4 1.5T Samsung drives, yet your earlier post shows a 2T Samsung drive in there.

 

Maybe you could explain all the different steps you have tried to take instead of making us guess here.

 

The fact that the array is running with a bad parity drive and a missing data drive leads me to believe you did something very odd. Did you use the initconfig command?

 

Peter

Link to comment

Disk 3 (number 687, 1.5TB) started showing as unformatted.  I went and replaced it with a 2TB disk, and after stopping and starting the array several times, unRAID finally gave me the option to move the parity to the larger disk, which it said it completed.  When it finished, disk 3 (now the old parity disk) still showed up as unformatted, and my parity disk showed up as orange.  Since then, I've been trying, unsuccessfully, to get the array to rebuild disk 3.  It seems like /dev/md3 in the array became unformatted somehow, and that the array doesn't think there is a proper disk 3 to rebuild anymore.  I'm so frustrated with this as I've already spent 2 days straight trying to correct it.

 

I didn't run the initconfig command, nor did I hit the restore button if/when it ever came up.

Link to comment

By re-starting the array as you did, you might have lost the ability to recover from parity. It looks like disk3 died while you were trying to re-construct parity.  At that point you had two bad drives...
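
A rough Python sketch of why two bad drives defeat single parity, assuming plain XOR parity and made-up 2-byte disk contents: once both disk3 and parity are unknown, any guess for disk3 can be paired with a parity value that looks perfectly consistent.

```python
def xor_all(blocks):
    """XOR equal-length byte strings together."""
    result = bytes(len(blocks[0]))
    for block in blocks:
        result = bytes(a ^ b for a, b in zip(result, block))
    return result


# The two disks that can still be read (hypothetical contents).
disk1 = b"\x12\x34"
disk2 = b"\xab\xcd"

# disk3 and parity are both unknown, so try two different guesses for disk3.
guess_a = b"\x00\x00"
guess_b = b"\xff\x10"
parity_a = xor_all([disk1, disk2, guess_a])
parity_b = xor_all([disk1, disk2, guess_b])

# The parity equation (XOR of all members is zero) holds for both pairs,
# so nothing in the surviving disks says which guess is the real disk3.
zero = bytes(2)
consistent_a = xor_all([disk1, disk2, guess_a, parity_a]) == zero
consistent_b = xor_all([disk1, disk2, guess_b, parity_b]) == zero
print(consistent_a, consistent_b)  # True True
```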

 

Have you tried to run a file system check on the old disk3?  You might find it can be read.

 

Joe L.

Link to comment

Did you ever go to the devices page and set the old parity to the disk3 slot?

 

Right now, I'm thinking you have a hardware problem with the port you are trying to use. You are saying that 2 different disks both show up as unformatted and just plain won't work right in that slot.

 

Your description still doesn't explain how you ended up with 4 1.5T drives again. You replaced the disk3 with a larger 2T drive. Then, unRAID attempted to copy the parity to the new 2T disk as part of the swap-disable process. It would copy the parity to the larger disk and use the old parity disk as the replacement. You should have done these steps:

 

1. Stop the array.

2. Power down the unit.

3. Replace the parity hard disk with a new bigger one.

4. Replace the failed hard disk with your old parity disk.

5. Power up the unit.

6. Start the array.

 

When you start the array, the system will first copy the parity information to the new parity disk, and then reconstruct the contents of the failed disk.

 

In the above, I believe you could just stick the new disk in place of the failed one and use the devices page to assign them correctly instead of physically moving both around. Still, this is what you tried to achieve and according to your last syslog there is a 1.5T in the parity slot.

 

I would put the 2T in the parity slot and the old parity back into the disk3 slot. Possibly try a new port for the disk3. See if unRAID will allow you to start the array and rebuild the disk3.

 

If not, then try to access the old disk 3. See if you can connect it and run any tests on it. If so, try to run reiserfsck on it. You might recover it that way. If not, it may still be possible to copy that data to another disk (doing it at a lower level, sector by sector) and then try to recover the data on the copied disk.
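
The sector-by-sector idea can be sketched in Python as follows. This is only an illustration, similar in spirit to what tools such as dd with conv=noerror,sync or GNU ddrescue do; the function name rescue_copy, the 4-byte block size, and the simulated failing block are all assumptions made for the example.

```python
import io
from typing import BinaryIO, Callable, List


def rescue_copy(read_block: Callable[[int], bytes], num_blocks: int,
                block_size: int, out: BinaryIO) -> List[int]:
    """Copy blocks in order, zero-filling any block whose read raises OSError.

    Returns the indexes of the blocks that could not be read.
    """
    bad_blocks = []
    for index in range(num_blocks):
        try:
            data = read_block(index)
        except OSError:
            data = b""
            bad_blocks.append(index)
        # Pad so every later block keeps its original offset on the copy.
        out.write(data.ljust(block_size, b"\x00"))
    return bad_blocks


# Hypothetical 4-block "disk" whose block 2 cannot be read.
BLOCK_SIZE = 4
source = b"AAAABBBBCCCCDDDD"


def flaky_read(index: int) -> bytes:
    if index == 2:
        raise OSError("Input/output error")
    start = index * BLOCK_SIZE
    return source[start:start + BLOCK_SIZE]


image = io.BytesIO()
bad = rescue_copy(flaky_read, 4, BLOCK_SIZE, image)
print(bad)               # [2]
print(image.getvalue())  # b'AAAABBBB\x00\x00\x00\x00DDDD'
```

Keeping the offsets intact matters because a file system check on the copy expects each structure at the same position it had on the original disk.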

 

If still not, then there's one other thing we could try.

 

Peter

 

Link to comment

Apologies for the poor description earlier.  I've had very little sleep in the past couple days, and mixed in with work, I'm just trying to bring my NAS back up.

 

This is essentially what I did, and it copied the parity to the 2TB drive, however putting the 1.5TB (old parity) drive in the spot of disk 3 doesn't work.  After I start the array and check the box to rebuild it, it says it's unformatted.  When I try to format it, it shows the error you see in the syslog.

 

I'm copying what data is on the other two drives off right now so that I have the option of tearing the whole thing down and starting fresh.

 

What's the one other thing you're thinking about trying?

Link to comment

You have posted that you acquired another 1.5T drive, so this is what you can try. There are no guarantees this will work, but it might get back most of your data. Follow these steps:

 

Upgrade to unRAID 4.7.

Put the original parity back in place and leave the disk3 slot empty.

Assign the new drive to the cache slot, start the array, and format the disk. Do not let a parity check run here. If it doesn't want to start, then skip this step.

Stop the array and assign the new drive to the disk3 slot.

Log in and run initconfig, and answer Yes.

Use the trust my parity procedure except use 3 instead of 99 for the invalid slot - "mdcmd set invalidslot 3". You should get a response "cmdOper=set" and "cmdResult=ok".

Refresh the web interface. It should show the array is ready to start so press start.

Refresh the main unRAID page. You should see disk3 having writes and all others with reads.

Run reiserfsck on the disk3 and see if it can fix it up and what data you get back.
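
If the mdcmd step in the list above is being scripted rather than typed by hand, its response can be checked before going any further. This is a minimal Python sketch; it assumes the response is made of whitespace-separated key=value tokens, which is a guess based on the two strings quoted above, and mdcmd_succeeded is a made-up helper name.

```python
def mdcmd_succeeded(response: str) -> bool:
    """Return True if the response reports cmdOper=set and cmdResult=ok.

    Assumes whitespace-separated key=value tokens (an assumption, not a
    documented format).
    """
    tokens = dict(part.split("=", 1) for part in response.split() if "=" in part)
    return tokens.get("cmdOper") == "set" and tokens.get("cmdResult") == "ok"


ok_response = "cmdOper=set\ncmdResult=ok"
bad_response = "cmdOper=set\ncmdResult=error"
print(mdcmd_succeeded(ok_response))   # True
print(mdcmd_succeeded(bad_response))  # False
```

If the check fails, stop there rather than starting the array.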

 

http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems

 

Peter

 

Link to comment
