Red-balled drive replaced with new pre-cleared 2TB drive, but it shows as unformatted



So I had a WD 2TB drive redball on me. I went out and bought a new 2TB WD EARX. I pre-cleared it in another computer. Pre-clear passed.

 

I pulled the bad drive out of the norco 5-in-3 and put the new one in its place. (The array was stopped at this point).

 

I assigned the new drive to disk3 (which replaces the bad disk3). It started the data rebuild, which took 7 hours or so overnight. When I checked it this morning, the drive still showed as unformatted, but it's green.

 

What do I do next? I really don't want to lose that data.

 

SPECS:

 

UNRAID B14 Pro

ASUS RAMPAGE III EXTREME

8GB ddr3 1600mhz

Supermicro SASLP PCI-e 4x card

2tb parity seagate green

2tbx3 storage

1.5tbx2 storage

Corsair tx750

 

*normally this setup is run in ESXi, but since that drive failed I have been running unraid directly from the flash drive*

syslog-10-4-12.txt

unraid-unformatted.jpg.7bf8ed7b21b592bca2dc2579bb1b30ff.jpg

Link to comment

Unfortunately, unraid presently doesn't really differentiate on the status screen between genuinely unformatted and just unmounted. It's possible that if you just do a clean shutdown and restart the array, it might come up ok.

 

First make sure any add-ons are disabled, capture the syslog now, do a clean shutdown, start the machine, wait for it to completely finish starting, capture another syslog, and post both syslogs here.

Link to comment

Okay so the before reboot Syslog --- http://pastebin.com/5hHdUnj4  I had to use pastebin as it was much too large to attach here.

 

The after syslog is attached.

 

It looks like it is getting error 32 on disk3.

 

 

-----

 

SNIP

 

Oct  4 09:48:11 Vault kernel: REISERFS (device sdf1): Using r5 hash to sort names

Oct  4 09:48:11 Vault kernel: REISERFS (device md5): Using r5 hash to sort names

Oct  4 09:48:11 Vault emhttp: shcmd (15264): chmod 770 '/mnt/cache'

Oct  4 09:48:11 Vault emhttp: shcmd (15265): chown nobody:users '/mnt/cache'

Oct  4 09:48:11 Vault logger: mount: wrong fs type, bad option, bad superblock on /dev/md3,

Oct  4 09:48:11 Vault logger:        missing codepage or helper program, or other error

Oct  4 09:48:11 Vault logger:        In some cases useful info is found in syslog - try

Oct  4 09:48:11 Vault logger:        dmesg | tail  or so

Oct  4 09:48:11 Vault logger:

Oct  4 09:48:11 Vault emhttp: _shcmd: shcmd (15262): exit status: 32

Oct  4 09:48:11 Vault emhttp: disk3 mount error: 32

Oct  4 09:48:11 Vault emhttp: shcmd (15266): rmdir /mnt/disk3

Oct  4 09:48:11 Vault emhttp: shcmd (15267): chmod 770 '/mnt/disk4'

Oct  4 09:48:11 Vault emhttp: shcmd (15268): chmod 770 '/mnt/disk1'

Oct  4 09:48:11 Vault emhttp: shcmd (15269): chmod 770 '/mnt/disk2'

Oct  4 09:48:11 Vault emhttp: shcmd (15270): chmod 770 '/mnt/disk5'

Oct  4 09:48:11 Vault emhttp: shcmd (15271): chown nobody:users '/mnt/disk4'

Oct  4 09:48:11 Vault emhttp: shcmd (15272): chown nobody:users '/mnt/disk1'

Oct  4 09:48:11 Vault emhttp: shcmd (15273): chown nobody:users '/mnt/disk2'

 

SNIP

syslog-2012-10-04-AFTER.txt

Link to comment

That just indicates the disk could not be mounted.
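For anyone digging through a syslog as large as the one above, here is a rough sketch for pulling out only the mount failures. The function name `mount_errors` is made up for illustration, and the pattern simply matches the `diskN mount error` lines emhttp writes in the excerpt above. As far as I know, exit status 32 is the general "mount failure" code documented for mount(8), so it does not say much about the cause by itself.

```shell
# Hypothetical helper: list emhttp mount failures recorded in a syslog file.
mount_errors() {
    # $1 = path to the syslog (defaults to the live log)
    grep -E 'emhttp: disk[0-9]+ mount error' "${1:-/var/log/syslog}"
}
```

Running `mount_errors /var/log/syslog` on the log above should show the single `disk3 mount error: 32` line.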

 

You need to first verify the file-system is valid on the partition you are attempting to mount.

reiserfsck --check /dev/md3

 

If it complains about a missing superblock, you need to see where /dev/md3 is pointing on the disk.

(It might be pointing to the wrong starting sector)  edit: According to your syslog segment, it will complain about a missing superblock.

 

If it says the file-system needs repair with a rebuild-tree, then odds are it just needs to be repaired.

 

You can look at this thread for what to look for:

http://lime-technology.com/forum/index.php?topic=15385.0

more examples here:

http://lime-technology.com/forum/index.php?topic=18550.msg165748;topicseen#msg165748

 

You need to use

fdisk -lu /dev/sdX

to determine where the partition currently is pointing to...

and the "dd" command as described in the other thread to see where the file system actually resides.

Obviously, you need to use the correct /dev/sdX 

(If using reiserfsck --check on the raw filesystem on the device /dev/sdX1    <- note the trailing 1 indicating the first partition)

 

Link to comment

Okay so I ran "reiserfsck --check /dev/md3"

 

The results seemed to show no corruption. I've attached a screenshot of the completed results.

 

(during the scan towards the end it was listing the filenames for the files on drive so hopefully that means my data is still there)

 

 

I also ran a "fdisk -lu /dev/sde"

 

where sde == md3

 

I have also attached a screenshot of that. I'm not sure how to proceed from here. I just don't want to lose data.

 

 

EDIT I have also run the  dd if=/dev/sde count=195 | od -c -A d |  sed  30q

 

Attached is the screenshot for that now too.

 

From what I can see there may be a variance from 63 vs 64 sectors for the starting point. I'm not sure how to proceed and really don't want to lose data.

reiserfsck-check.png.d0d538cab028383221216662de89fe82.png

fdisk-lu.png.f2d69cb28aab20427eae7678e754e331.png

dd_if_dev_sde.png.59cf00c3fd3fc4056080ec605c9eb2ba.png

Link to comment

Everything looks fine.  Your disk was partitioned 4k-aligned, with the partition starting on sector 64.  (512 bytes further on than if partitioned for sector 63)
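The sector 63 versus sector 64 difference can be checked with a little arithmetic. This is only a sketch, assuming 512-byte sectors as reported by fdisk -lu; the function name `is_4k_aligned` is mine, not something unRAID provides.

```shell
# Does a partition starting at this 512-byte sector sit on a 4 KiB boundary?
is_4k_aligned() {
    [ $(( $1 * 512 % 4096 )) -eq 0 ]
}

for start in 63 64; do
    if is_4k_aligned "$start"; then
        echo "sector $start: byte offset $(( start * 512 )), 4k-aligned"
    else
        echo "sector $start: byte offset $(( start * 512 )), not 4k-aligned"
    fi
done
```

Sector 64 begins at byte 32768, which is a multiple of 4096; sector 63 begins at byte 32256, which is not.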

 

Now, you can try to simply reboot.  Or, you can try

mkdir /mnt/disk3

mount -t reiserfs -o noacl,nouser_xattr,noatime,nodiratime /dev/md3 /mnt/disk3

 

and then browse around /mnt/disk3 with "mc"

 

Link to comment

I rebooted and it still shows that the drive is not formatted. I tried running the command and got the result of the attached screenshot.

 

Do I have to do something to the MBR?

No, the MBR looked fine.  I'm a bit confused why the disk did not mount, since it looks like it passed a reiserfsck check just fine.

 

Are you running any plugins?  Something that might be attempting to open a file on disk3 perhaps?  (if running any plugins, please disable them for the time being)

 

Please post the newest syslog.  Tom from lime-tech should be back from his vacation by now, and perhaps he'll offer some guidance. 

 

Joe L.

Link to comment

try

mkdir /mnt/disk3

mount -t reiserfs -o noacl,nouser_xattr,noatime,nodiratime,hash=r5  /dev/md3 /mnt/disk3

 

Then see if it mounts.  If it does, then browse around /mnt/disk3 with "mc"

 

Note: I added ",hash=r5" to the end of the "-o" option to force the use of a specific directory hash function.

 

I found the clue at this site:

http://tnt.aufbix.org/linux/reiserfs

 

Don't go messing with a hex editor just yet.  I don't think that will be needed, besides, don't want to mess up.  According to your prior "dd" dump of the superblock, the hash function byte is set to zero.  It needs to be set to "3"(r5 hash = 3)

 

I don't yet know the best way to set it correctly, but that will be the next step if the file system can be mounted.

 

Edit: I found that adding hash=r5 did not work for me...  The drive would still not mount.  See below for what did work...

Link to comment

Now I am about to freak out. I went to login to my array this morning and now my MD4 is red balled and my syslog is like 2GB+ of data talking about how it can't read from disk.

 

I rebooted hoping it was a fluke, now MD3 has a green ball which I doubt is correct and my MD4 says "The replacement disk must be as big or bigger than the original"

 

I can't even start the array in maint mode or anything.

 

EDIT: an unmenu shows it as a new disk.

Link to comment

Do not panic.

 

First, I did some tests here...  on a spare disk.

 

I formatted it with a reiserfs file system, then un-set the hash function byte in the superblock.  I could then not mount it. (just like yours)

 

I then ran a reiserfsck --check.  It passed just fine.  (the disk is completely empty, so I have nothing to lose. It is also only 8Gig, so no operation takes a long time)

 

I tried a

reiserfsck --rebuild-sb /dev/XXX

It too found nothing wrong, and just exited.  Before it exited, one of the lines in its output was:

Hash function used to sort names: not set

 

That is the problem.  (remember, I forced the byte to be zero, just like yours. zero=unset)

 

I then ran

reiserfsck --rebuild-tree /dev/XXX

and it repaired the hash function byte!  (it set it back to a "3" as it should have been)

 

I could then mount the drive once more.

 

so... please run

reiserfsck --rebuild-tree /dev/md3

 

Respond with "Yes"  (capital "Y", lower case "es")

then reboot once more.

 

Joe L.

 

Link to comment

With md4's apparent failure  this morning and it not letting me start the array even in maintenance mode I can't run the command:

 

reiserfsck --rebuild-tree /dev/md3

 

 

EDIT: could I run: "reiserfsck --rebuild-tree /dev/sde1" instead?

 

 

 

Thank you so much for your help with this, I really appreciate it.

Link to comment

I know you cannot attach the new syslog in its entirety, but perhaps you can try

sed 10000q  </var/log/syslog >/boot/syslog10000.txt

 

It will put the first 10000 lines in the file syslog10000.txt.
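If the interesting part of a huge log might be at the end rather than the beginning, something along these lines grabs both ends. It is a sketch only; the function name `split_log_ends` and the output file naming are made up for illustration, and it relies on nothing beyond the standard head and tail utilities.

```shell
# Copy the first and last N lines of a large log into two smaller files.
split_log_ends() {
    # $1 = log file, $2 = number of lines, $3 = output file prefix
    head -n "$2" "$1" > "$3-head.txt"
    tail -n "$2" "$1" > "$3-tail.txt"
}
```

For example, `split_log_ends /var/log/syslog 10000 /boot/syslog` would leave `/boot/syslog-head.txt` and `/boot/syslog-tail.txt` on the flash drive.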

 

Link to comment

could I run: "reiserfsck --rebuild-tree /dev/sde1" instead?

Yes, you could, but since it would possibly make parity bad for that byte (and possibly others), try this instead: (easier to un-do if needed)

 

First, read the existing byte.  I'm pretty sure it was zero on your previous screen shot

dd if=/dev/sde count=1 bs=1 skip=98368 | od -c -A d |  sed  30q

Important, this is for a MBR partitioned device with the partition starting on sector 64.  The address for a partition starting on sector 63 is different.  Your disk partition was confirmed as starting on sector 64 previously.
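For reference, here is how the address 98368 appears to break down, and what it would be for a sector-63 partition. Treat this as a sketch under stated assumptions rather than gospel: it assumes 512-byte sectors, the reiserfs superblock sitting 64 KiB into the partition, and the hash-function field 64 bytes into the superblock. Those figures reproduce the 98368 used here, but read the byte back before writing anything. The function name `hash_byte_offset` is made up for illustration.

```shell
# Byte address of the reiserfs hash-function field on the raw disk.
# Assumptions (not stated in the thread): 512-byte sectors, superblock
# 64 KiB (65536 bytes) into the partition, hash code 64 bytes into it.
hash_byte_offset() {
    # $1 = partition start sector as reported by fdisk -lu
    echo $(( $1 * 512 + 65536 + 64 ))
}

hash_byte_offset 64   # prints 98368, the address used in this thread
hash_byte_offset 63   # prints 97856
```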

 

The output will look like this:

0000000 000

0000001

1+0 records in

1+0 records out

1 byte (1 B) copied, 0.000492822 s, 2.0 kB/s

 

Note the value after the address on the first line (000) is zero.

 

Now, we can set that same byte to a "3" by running this command. 

BE ABSOLUTELY CERTAIN YOU TYPE THIS ACCURATELY AND CORRECTLY. 

It will write one byte at a specific address.

echo -ne "\0003" | dd bs=1 count=1 seek=98368 of=/dev/sde

 

repeating the command from above should verify it is now set to a "3"

dd if=/dev/sde count=1 bs=1 skip=98368 | od -c -A d |  sed  30q

The output will look like this:

0000000 003

0000001

1+0 records in

1+0 records out

1 byte (1 B) copied, 0.000492822 s, 2.0 kB/s

 

 

You should then be able to mount the disk. (assuming everything else is right)
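A more cautious variant of the same one-byte fix, as a sketch: save a copy of the first 128 KiB (which covers the MBR and the byte being changed), write the byte, then read it back. The function name `patch_hash_byte` and its arguments are hypothetical, and you must supply the correct device and address yourself. The `conv=notrunc` option only matters when practicing on an image file instead of a real device.

```shell
# Sketch only: back up the first 128 KiB, patch one byte, read it back.
# $1 = device or image file, $2 = byte offset, $3 = backup file
patch_hash_byte() {
    dev=$1; off=$2; backup=$3
    # Keep a copy of the region holding the MBR and superblock for restoring later.
    dd if="$dev" of="$backup" bs=512 count=256 2>/dev/null || return 1
    # Write a single byte with value 3 (r5 hash) at the given offset.
    printf '\003' | dd of="$dev" bs=1 count=1 seek="$off" conv=notrunc 2>/dev/null || return 1
    # Print the byte as a decimal number to confirm the write.
    dd if="$dev" bs=1 count=1 skip="$off" 2>/dev/null | od -An -tu1 | tr -d ' \n'
    echo
}
```

Trying it first on a scratch image, for example `patch_hash_byte /tmp/test.img 98368 /tmp/test.bak`, is a cheap way to check your typing before pointing it at a real disk.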

 

Joe L.

Link to comment

Just to note, you should be able to replace disk4 and rebuild it just fine. disk3 does not need to be mounted to rebuild the other disk. unRAID will rebuild using the raw data from each partition on the disks. Since the partition does exist on disk3, it can be used to rebuild disk4.
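To illustrate why mounting does not matter for a rebuild, here is a toy example using one byte per "disk". It assumes single parity computed as a plain XOR across the data disks, which is my understanding of how unRAID parity works; the byte values are invented for the example.

```shell
# Toy single-parity example: one byte per "disk" (values are made up).
d1=$(( 0x5A )); d3=$(( 0x00 )); d4=$(( 0xC3 ))

# Parity is the XOR of the raw bytes on every data disk.
parity=$(( d1 ^ d3 ^ d4 ))

# Rebuild disk4's byte from parity plus the surviving disks.
# Only raw bytes are used; no file system is ever consulted.
rebuilt=$(( parity ^ d1 ^ d3 ))
echo "$rebuilt"
```

The rebuilt value equals the original d4 byte. The same arithmetic shows why two missing disks cannot both be recovered from a single parity disk: one equation cannot determine two unknowns.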

 

You may want to hold-out on messing with disk3 any further until you rebuild disk4.

 

In the bigger picture, something may be causing all these disk issues, such as a marginal power supply in your server.

 

What size does it think disk4 should be? Got a screenshot or something?

 

Link to comment

I thought it couldn't rebuild parity on 1 drive if another was down?

 

If disk3 is bad how can it rebuild disk4?

 

Disk 4 is a 1.5TB and it shows 1.5TB so idk what is wrong. I am at work currently. I will get some screenshots in a few hours.

 

Can I plug any of the drives into a USB enclosure and mount them read only in windows or mac to copy the data off? I really don't want to lose any of this.

 

I have a corsair TX750 PSU which is total overkill for this setup. It's attached to a APC battery backup. 450 I think, I'll double check when I am home.

 

 

Link to comment

I thought it couldn't rebuild parity on 1 drive if another was down?

Correct... but we are not really sure at this time what is really "down" and what is functional.  I know for sure that on disk3 the one byte that should define the hashing method is not set correctly.  Your prior "dd" command proved that.  (I have no idea how it got set to zero, but that is a different issue)

If disk3 is bad how can it rebuild disk4?

Because disk3 is not really bad, just not able to be mounted because that one byte is not set as it is supposed to be set.  There is a very big difference.

Disk 4 is a 1.5TB and it shows 1.5TB so idk what is wrong.

Neither do we...  not yet anyways...  The screen-shot and leading part of the syslog should help us to figure it out.

I am at work currently. I will get some screenshots in a few hours.

Later this evening I'll be at a dance with my wife, but I'll check for your posts when I get home afterwards.  (got to have a social life, you know...  ;))

Can I plug any of the drives into a USB enclosure and mount them read only in windows or mac to copy the data off? I really don't want to lose any of this.

You could, there are reiserfs drivers you can install on windows...  Windows will not know how to read it natively.

I have a corsair TX750 PSU which is total overkill for this setup. It's attached to a APC battery backup. 450 I think, I'll double check when I am home.

The power supply sounds fine. 

 

Joe L.

Link to comment

My findings thus far:

 

1.) YAReG - is a great free tool to read and copy from reiserFS in windows.

 

2.) I have started copying the 1TB of data I had on MD3 (2TB WD Green drive) onto some spare 500GB drives I had lying around.

The 2TB drive does not seem to copy slowly or make grinding noises, so I am wondering if the redball was a fluke. Once the data is safe and secure I will run the WD lifeguard tools extended test on it.

 

3.) I have also started copying the 483GB of data I had on MD4 (1.5TB WD Green drive) onto some spare 500GB drives I had lying around.

The 1.5TB drive does seem to copy slowly and is making very loud grinding noises, especially on spinup. Once the data is safe and secure I will run the WD lifeguard tools extended test on it. It is under warranty until Nov 2012 :)

 

4.) I think I am going to give unraid its own box from now on and take it out of my ESXi config just for the sake of simplicity of troubleshooting in the future. I have an "old" AMD tri-core phenom, and asrock 785 mobo, with 4 or maybe 8GB ddr2 I could drop into it.

 

I'd be willing to try any troubleshooting you'd like after I get my data secured, Joe. And again, I really do appreciate your time and effort on this. I know mods on forums don't get the thanks they deserve at times (first-hand experience). So again, thank you.

 

Link to comment

I just wanted to point out that kwiksilver is on Beta 14, and running unmenu.  I know you don't want to be doing an upgrade when solving an issue, but shouldn't he be on RC8?  Could these problems be caused because he is on an older beta and not on the RC?

unMENU itself, to my knowledge, is not an issue unless you are installing add-on packages that replace shared library files or running out of memory.    To simplify the upgrade problems, it is strongly recommended to disable them...   

 

Beta14 does have issues with some drive controllers...  It could be an issue to deal with.  An upgrade is probably in order, once we get the data safe.  5.0-rc8a is the latest version, and it apparently also needs the newer SAMBA version added as described in the release thread. 

 

The "red ball" issues might be fixed once the spin-up issues are resolved.  (if it indeed  was a time-out from a spin-up that caused it)

 

Joe L.

Link to comment

All data has been backed up onto spare drives. (which hopefully hold out until I can copy them back to the array at some point  :P )

 

Currently I am running WD Data Lifeguard tools Extended Test on the 2TB (MD3) and the 1.5TB (MD4) on two separate computers.

If there is a failure on the 2TB I'd be surprised. If there is a failure on the 1.5TB I'd be less surprised.

 

I am hoping the WD tools give me some kind of detailed SMART report after the test.

 

If both drives pass, since the data is all backed up, what would be the best way to proceed? Just pre-clear them and re-add them as new drives to the array?

Then just copy the backed up data over to the array via gigabit?

 

RE: Upgrade to RC8A

The best way to disable all addons would be to replace my "go" file with a vanilla one and make sure nothing is in my "extra" folder right?

 

 

Link to comment

All data has been backed up onto spare drives. (which hopefully hold out until I can copy them back to the array at some point  :P )

Good.

Currently I am running WD Data Lifeguard tools Extended Test on the 2TB (MD3) and the 1.5TB (MD4) on two separate computers.

If there is a failure on the 2TB I'd be surprised. If there is a failure on the 1.5TB I'd be less surprised.

Obviously, they are tools you are comfortable with.  I just hope they are not writing to your disks.

I am hoping the WD tools give me some kind of detailed SMART report after the test.

It will be no different than the ones in unRAID, but again, if it makes you happy, fine.

If both drives pass, since the data is all backed up, what would be the best way to proceed? Just pre-clear them and re-add them as new drives to the array?

I would have preferred you fix the one drive and then proceed with fixing the second.  Obviously, you are more comfortable with just re-doing everything.  Your choice, but I feel like I am losing a chance to learn how to help others in your situation...  I would at least like you to try to change the one byte I researched and duplicated here... It might save you a lot of aggravation and save you a lot of time re-loading everything...  I'm pretty sure the disk will mount once you fix that one byte.

 

Just be careful when you put the disks back in the unRAID server in case the device names change if you install the disks on different ports.  It will not matter, but you need to run the command on the correct disk.  Now, it will not do anything to fix the drive making funny noises... that sounds like a mechanical issue...

Then just copy the backed up data over to the array via gigabit?

That will work.

RE: Upgrade to RC8A

The best way to disable all addons would be to replace my "go" file with a vanilla one and make sure nothing is in my "extra" folder right?

Yes.  or in the plugins folder (if you have one)

 

Joe L.

Link to comment
