Please Help - Disk showing Unformatted



Joe,

 

I ran the following commands you suggested and attached the results:

 

unraid_partition_disk.sh  /dev/sde

unraid_partition_disk.sh  -p /dev/sde

reiserfsck /dev/md1

reiserfsck /dev/sde1

 

Both reiserfsck commands reported a failure to open the device (not sure if that is the same as a missing superblock), so I stopped there before doing the rebuild.  If the attached output is what you expected, then I will run the rebuild command next:

 

reiserfsck --rebuild-sb /dev/sde1

report.txt


> I ran the following commands you suggested and attached the results:
>
> unraid_partition_disk.sh  /dev/sde
> unraid_partition_disk.sh  -p /dev/sde

The above two commands worked as expected.  The output of the "fdisk -l" command the script performs now shows a single, full-size partition.

 

Old:

Disk /dev/sde: 1000.2 GB, 1000204886016 bytes

1 heads, 63 sectors/track, 31008336 cylinders, total 1953525168 sectors

Units = sectors of 1 * 512 = 512 bytes

Disk identifier: 0x247daecd

 

  Device Boot      Start        End      Blocks  Id  System

/dev/sde1              63      65133      32535+  83  Linux

 

New:

Disk /dev/sde: 1000.2 GB, 1000204886016 bytes

1 heads, 63 sectors/track, 31008336 cylinders, total 1953525168 sectors

Units = sectors of 1 * 512 = 512 bytes

Disk identifier: 0x00000000

 

  Device Boot      Start        End      Blocks  Id  System

/dev/sde1              63  1953525167  976762552+  83  Linux

 

 

> reiserfsck /dev/md1
> reiserfsck /dev/sde1

Neither of these was able to open the device, so neither could do anything.  There is no sense going onward until we know we have the correct device name.  It is possible that when the udev daemon was re-started by the command, the devices were re-assigned...

 

Please reboot the server and then attach a fresh syslog.  Then we'll re-try the reiserfsck md1 command.

We need it to tell us the superblock needs correcting. 
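
A quick way to double-check which device names actually exist after the reboot (a sketch; the names may differ if udev re-assigned them):

# list every block device and partition the kernel currently sees
cat /proc/partitions

# verify that md1 and sde1 exist at all
ls -l /dev/md1 /dev/sde1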

 

> So if the output attached is what you expected then I will run the rebuild command next:
> reiserfsck --rebuild-sb /dev/sde1

It was not as expected, so don't run the rebuild command just yet, not until we can open the device with the basic reiserfsck command.  As I said, reboot, then attach a fresh syslog.
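
In other words, the safe order (assuming /dev/sde1 turns out to be the correct device) is roughly:

# read-only check first; reiserfsck's default --check mode makes no changes
reiserfsck --check /dev/sde1

# only if that check itself reports a bad superblock:
reiserfsck --rebuild-sb /dev/sde1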

 

Joe L.


Here is the fresh syslog after rebooting.

 

I tried running reiserfsck /dev/md1 and it said:

 

Failed to open the device '/dev/md1': No such file or directory

 

So I ran reiserfsck /dev/sde1 and here is the output.

 

###########

Replaying journal..

Reiserfs journal '/dev/sde1' in blocks [18..8211]: 0 transactions replayed

Checking internal tree..finished

Comparing bitmaps..finished

Checking Semantic tree:

finished

No corruptions found

There are on the filesystem:

        Leaves 147642

        Internal nodes 938

        Directories 12869

        Other files 100565

        Data block pointers 142621763 (0 of them are zero)

        Safe links 0

###########

syslog3.txt


Your disk size changed (and that is a very good thing), so unRAID is complaining that it is the "wrong" disk: it expected to see the same (tiny/invalid) disk size that was there when you re-initialized the configuration the last time you pressed the "restore" button.  The last reiserfsck output you sent via PM showed no corruptions, so the file system is sound.

 

Now, I think you just need to go through the "trust-parity" process once more.   You may find parity errors, but you can ignore them.  Let the parity check run to completion.  In the interim, I think you'll be able to get to your files on the disk that was mangled by your MB BIOS.  

 

I'll keep my fingers crossed.

 

Joe L.  


I ran the "trust-parity" process and it worked.  Disk1 showed up with all my files and I was able to copy them to other media in case something like this would happen in the future.  Unfortunately before the parity check was able to complete Disk2 lost connection and stopped the parity check.  I though I fixed this issue by using a new sata cable and also by switching what sata port it was connected to on the MB, but evidently that was not the problem.  So it's likely either the power connection or the hard drive.  I attached a copy of the syslog and hopefully someone might be able to determine how it failed.  (Sorry I had to use a zip file because it was larger then the maximum attachment size allowed of 128 KB)

 

I did notice the temperatures seem higher than normal, and I am wondering if that is because I have the sides of the case off while I have been trying different cables and ports.  I would think having the sides off would allow more hot air to escape, but maybe it changes the airflow of the fans.  Either way, I am going to add 2 more fans right in front of the hard drives to get those temperatures down.

 

I just wanted to give a huge public thanks to Joe.  You saved the day and I cannot thank you enough.  I never would have been able to recover those files without you.  You are an asset to this Lime Tech community, and I know you have helped out a lot of other people besides myself.  It's nice to know there are people out there like you who are willing to spend their time helping others.

errror2.zip

screen.JPG


I've read that some disks stop working when their temperature gets above 50 degrees C.  I'd seriously put some fans on that box.  When disk2 failed, it just plain stopped responding.  It could be losing power, or it might be going into hard-disk-self-preservation mode (I made that term up).  Or, it could be that something in the box, when heated to 50 degrees (or more), expands and loses contact.  50 C = 122 F ... ouch... too hot to touch for very long.
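
If you want to watch the temperatures yourself, the drives report them in their SMART data.  A sketch, assuming smartctl is installed and the drive is /dev/sde:

# print the SMART attribute table; attribute 194 is the drive temperature in Celsius
smartctl -A /dev/sde | grep -i temperature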

 

Glad you got the files off that you needed.  If you are lucky, you fixed enough of the parity that was originally clobbered to be able to get the data off the "simulated" disk2 that is currently failed.

 

This thread is the first where we were able to reset an HPA using the hdparm command, and the first where we had to fix a clobbered partition table.   I hope I never see another instance like this... but if I do, I'll remember the utility attached to this thread to fix the partition table.
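
For the record, the hdparm side of it amounts to something like the following sketch.  The sector count is the native 1953525168 from the fdisk output above; double-check the number hdparm reports for your own drive before setting anything:

# compare current max sectors with native max sectors; they differ when an HPA is set
hdparm -N /dev/sde

# restore the full native capacity; the leading "p" makes the change permanent
hdparm -N p1953525168 /dev/sde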

 

Joe L.


> Did you have a chance to look at your BIOS, and disable HPA?

 

> Yes, I looked all over in the advanced BIOS and could not find anything about disabling HPA.  It is a slightly older board, so maybe that was before they started having that option.

 

You mean, before they started having the option to disable HPA?

 

Fact is, something did mangle your disks with HPA. 

So, if you don't get to the bottom of it, chances are you'll run into the same problem again. 

Do some research and find if it would be possible to update your BIOS. 

 

And, get Gigabyte on the phone, and talk to them about their mother.

They need to hear all about the disaster they caused!

 

Purko

 

 

> So, if you don't get to the bottom of it, chances are you'll run into the same problem again.

 

I think what happened was that the HPAs were created when I initially built the server.  I believe the Gigabyte MB I used then had the ability to enable/disable HPA.  It does not say so directly in the BIOS, but you can pick the on-chip SATA mode (IDE, RAID, AHCI).  Some people have mentioned on this forum that putting it in AHCI mode stops it from creating HPAs.  By default my MB was set to IDE, which I am now thinking created the HPAs when I built the server with 3 disks.  If I had known about HPAs then, I would have put it in AHCI mode from the start.  A few months ago I installed an older Gigabyte MB I had been using in my HTPC and also expanded the unRAID server to 5 disks.  That might explain why the previous 3 drives had HPAs and the two newer drives did not.

 

 

> You mean, before they started having the option to disable HPA?

 

Yes, I meant that maybe this MB was made before they started offering the option to disable HPA.  Before I started using this MB I updated it to the most recent BIOS revision, so it is up to date.  I am still going to call Gigabyte and see what they have to say about HPAs and that MB.

 

Unfortunately for me, all 4 of the computers in my house have Gigabyte MBs with AMD processors.  That makes for easy swapping of parts, and I have always had good luck with their MBs in my other applications.  However, if I ever run into this HPA issue again, I will likely just buy a new MB that allows you to disable HPA.  Some of the ones recommended on this forum are only $50, and that is well worth it to stop the headaches HPA can create.


The HPA on disk1 was added while it was connected to the current motherboard; that board added it.  It is not something I've ever heard of a disk doing on its own.  I'd suspect your current BIOS, regardless of whether you think your other MB added the HPAs on the other disks.

 

No matter what, no BIOS should ever make a 1TB disk look like a 34Meg disk.  Apparently, after the huge HPA was added and unRAID saw the change in size, it re-sized the partition down to the tiny capacity the HPA left visible.  That, to me, might be a logic flaw in how unRAID handles a re-size of the disks when you press "restore" on a disk that is new to it.  It did the best it could, as the partition could not be bigger with the HPA occupying the balance of the disk.

 

I don't like ANY BIOS writing to ANY disk.  It is not just that they create an HPA, they also write to the disk in other ways, even in "legacy" mode.  


> ....if I ever run into this HPA issue again then I will likely just buy a new MB

 

If you keep your server on the same motherboard/BIOS version... that "if" above is rather a "when".

 

And don't doubt it, when it does it again, you'll probably have some irreplaceable files on the disks.

 

So don't let a low-level tech rep brush you off.  Insist that they give you a solution to disable the HPA.

 

 



I agree; pursue getting a BIOS update.  Also, to help prevent corruption in a power failure, consider getting a small UPS.  Even a small one will power the server through a short power blip.  I use an APC-brand "Back-UPS ES 750" with a power rating of about 450 watts.  It gives me about an 8-minute run-time in a power failure.

 

Joe L.


A UPS is a great thing to have!  I got one after 2 power blips at my apartment, one of which caused a little trouble like yours did.  I run my system off a Back-UPS ES 650 and have it configured to shut down the server almost immediately.  My motherboard BIOS is set to bring the computer back online once power is restored, so that everything is more "seamless."

 

The UPS has saved me twice this week, when I had power blips of about 3 seconds each.  They were enough to restart a few things in my house, but the server stayed up the whole time and was just fine.
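
For anyone wanting to set up the same thing: on APC units this shutdown policy usually lives in apcupsd's configuration file.  A minimal sketch; the values below are illustrative assumptions, not my exact settings:

# /etc/apcupsd/apcupsd.conf (excerpt)
UPSCABLE usb
UPSTYPE usb
# leave DEVICE blank so a USB Back-UPS is auto-detected
DEVICE
# shut down 30 seconds after going on battery ("almost immediately")
TIMEOUT 30
# ...or when charge falls to 5%, or only 3 minutes of runtime remain
BATTERYLEVEL 5
MINUTES 3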


Just thought I would post an update to bring closure to this thread.  Thanks to Joe, I was able to recover all my files on disk1 and run a full parity check, so now my data is restored and also backed up.

 

The cause of disk2 kicking out was an intermittent power connection to the drive.  At first I thought it was the data cable, but after replacing it and trying a new SATA port without success, I figured it had to be either the drive or the power cable.  As I was looking around my PC to see how many power connectors I had free for some extra fans, I accidentally bumped the disk2 power connector, and immediately afterward I heard the disk turn on and start to spin.  When I held down on the connector, the disk kept spinning; when I let go, it stopped.  Whether it was the cable or the connector on the hard drive, I do not know.  I could have plugged a different molex cable into the hard drive to prove which was bad, but I was sick of the drive kicking out and ruining my parity check.  Since this drive accepts both a standard 4-pin molex and a SATA connector for power, I just used one of the SATA power connectors instead of the molex one.  So far it has been running for almost a week without any hiccups, so it's pretty safe to say the power connection was the problem.

 

While disk2 was not the drive I almost lost all my data on, it did cause me to keep restarting my server every time it kicked out, since unRAID would put a red dot next to it.  This constant rebooting somehow caused the BIOS to make disk1 look like a 34Meg disk by creating the HPA.  Luckily, Joe was able to help me fix this issue with all the steps mentioned in the previous posts.

 

I also added two 120mm fans in the front of my Antec 300 case to blow air on the hard drives, since heat might have been the cause of the faulty power connection.  When only one or two drives were spinning, the temperatures were not too hot, but when they all were spinning for a parity check the temperatures were in the high 40s and would sometimes even crack 50 degrees.  Those high temperatures might have caused the connection or cable to loosen up or fail.  After adding the two fans my temperatures dropped drastically; now they are between 31 and 32 degrees.  That is a drop of nearly 20 degrees, and I was shocked.  The fans are running on low speed; if I kick them up to high, the temperatures drop to 30 degrees.  But I decided to keep them on low, since the temperature gain is not significant compared to how much quieter they sound running low.

 

I still need to contact Gigabyte and see if I can somehow disable the HPA function on this MB.  I have just been busy the past week focusing on fixing the intermittent power problem and trying to bring the temperatures down so I could run a parity check to completion.

 

While I hope this problem never happens to anyone else, if it does, at least the steps are well documented and this thread can be used to help someone out.  As Joe mentioned earlier, this thread is the first where he was able to reset an HPA using the hdparm command, and the first where he fixed a clobbered partition table with his utility.

 

Thanks to everyone for their help and support.

  • 4 months later...

I'm glad this thread is still around.

My love for solid-cap Gigabyte boards has led me here.  All my Seagate drives were "infected" with the HPA issue.  (For some reason the Hitachi and Samsung drives were immune.)

 

Thanks for this thread, and thanks to Joe L. for his unraid_partition_disk.sh file.  It was the only thing that could re-partition the fixed drives to unRAID standards after I had disabled the HPAs using HDAT2.

And not sacrificing my TBs of data was a true blessing!

 

  • 3 weeks later...

I just had the same problem with my server and a Gigabyte mobo that didn't have an option to turn off the backup-to-disk feature.  If anyone has the same problem, I have the solution: go to Gigabyte's site and ask their support for a specially built BIOS with that option.

