
Help with disk problem - weird behavior


seanant


Posted

All,

 

I lost my power supply right around the San Diego blackout a couple of months back.  Since then my unRAID has been down.  For the last couple of weeks I've been testing and troubleshooting using the FAQ and reading the posts.  Now I need to turn to the experts for some assistance.  My syslog file is attached.  If you need anything from me, please let me know.  I thank all of you ahead of time for reading my post and helping me recover from this.

 

The gist of the problem is that disk #5 starts out blue with the array in the stopped state (stop; initiate drive).  Then I click Start, and the parity disk and #5 both turn red.  I've run all the smartctl tests on the drives with no errors.  The drives on my 3ware 9600 are not replying to the smartmon queries, but the parity disk does.  Here's an example of the reply:

root@Tower:/# smartctl -T permissive /dev/twa0 -d 3ware,1

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

 

WARNING - NO DEVICE FOUND ON 3WARE CONTROLLER (disk 1)

Note: /dev/sdX many need to be replaced with /dev/tweN or /dev/twaN

Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)

 

SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 82-83 don't show if SMART supported.

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

root@Tower:/# smartctl -T verypermissive /dev/twa0 -d 3ware,1

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

 

WARNING - NO DEVICE FOUND ON 3WARE CONTROLLER (disk 1)

Note: /dev/sdX many need to be replaced with /dev/tweN or /dev/twaN

Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)

 

SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 82-83 don't show if SMART supported.

                  Checking for SMART support by trying SMART ENABLE command.

Error SMART Enable failed: Input/output error

                  SMART ENABLE failed - this establishes that this device lacks SMART functionality.

 

SMART Disabled. Use option -s with argument 'on' to enable it.

#######################################

 

I tried enabling SMART and got this:

 

SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 82-83 don't show if SMART supported.

                  Checking for SMART support by trying SMART ENABLE command.

Error SMART Enable failed: Input/output error

                  SMART ENABLE failed - this establishes that this device lacks SMART functionality.

#######################################

Then I forced an offline test:

 

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===

Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".

Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.

Testing has begun.

Please wait 152 minutes for test to complete.

Test will complete after Fri Dec  9 03:42:27 2011

#######################################

 

What are my options at this point?  I'm happy to lose one drive, recover the other 7, and move on.

 

R/

 

Chuck

syslog.txt

Posted

Disk 5 is definitely having problems, somewhat unusual ones.  I suspect it may have been electrically zapped, damaging or scrambling the firmware and DMA capabilities.  It has no problems establishing SATA link communications, or identifying itself, but seems unable to establish any data channels, obviously rather important for reading and writing data.  There is a remote chance that the problem is actually on the motherboard, so you might want to try this drive on a different SATA port, that you know works.  Otherwise, the drive needs to be replaced.

 

You have a third drive on the 3Ware controller, that appears unformatted, indicates 'unknown partition table'.  Unfortunately, it can NOT be used to replace your Disk 5, the Seagate 400GB drive, because it has slightly fewer sectors, and replacement requires a drive of equal or greater number of sectors.
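The size rule above can be sketched as a simple check. This is only an illustration: the function names are made up, and reading the count from `/sys/block/<name>/size` assumes a Linux system where that file reports the size in 512-byte sectors.

```python
def can_replace(failed_sectors, candidate_sectors):
    """A replacement needs an equal or greater number of sectors."""
    return candidate_sectors >= failed_sectors


def sectors_from_sysfs(name, sys_root="/sys/block"):
    """Read a drive's size in 512-byte sectors from sysfs (Linux only).

    `name` is a kernel device name such as "sdb" (hypothetical here).
    """
    with open(f"{sys_root}/{name}/size") as fh:
        return int(fh.read().strip())
```

A drive that is even one sector short fails the check, which is why two drives sold with the same nominal capacity are not always interchangeable.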

 

I'm a bit surprised you got the 3Ware controller to work at all.  It is not at all surprising that it does not pass SMART queries!
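If you want to see which ports on the card answer at all, something like the sketch below could loop over them. It assumes smartctl's `-d 3ware,N` device type and the `/dev/twa0` node from the original post, and it treats the "NO DEVICE FOUND" warning from that output as the sign of an empty port. The helper names are invented for illustration, and whether this controller passes anything through is not guaranteed.

```python
import subprocess


def smartctl_cmd(port, device="/dev/twa0"):
    """Build a smartctl identity query for one 3ware port."""
    return ["smartctl", "-i", "-d", f"3ware,{port}", device]


def probe_ports(ports, device="/dev/twa0", runner=subprocess.run):
    """Return {port: bool}, True when no "NO DEVICE FOUND" warning appears.

    `runner` is injectable so the logic can be exercised without hardware.
    """
    found = {}
    for port in ports:
        result = runner(smartctl_cmd(port, device),
                        capture_output=True, text=True)
        output = (result.stdout or "") + (result.stderr or "")
        found[port] = "NO DEVICE FOUND" not in output
    return found


# Example call on the server itself (needs smartctl installed):
# print(probe_ports(range(8)))
```

A port that reports True here only means the controller saw a device there; it does not mean the SMART data itself is readable.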

Posted

OK, I used WinHex to look at all the drives.  The two drives are not going to work; both are history.

 

How should I proceed?  I don't have any drives to replace these bad drives with.  Do I have a chance to save my array?

Posted

You only lost what was on the failed disks.  It appears one was the parity?  If so, then you only lost the data on the one other drive.  Not too bad for such a bad event.  If it had been a RAID5 array, you'd have lost everything!

Posted


Your post leaves me with a number of questions...  WinHEX is a great tool, but I don't think it is the correct tool here, and I don't know what sort of tests you tried.  As far as I can see, you only have one bad drive - Disk 5.  The parity disk looks fine, and the third disk on the 3Ware controller just looks empty, unformatted, which is not a problem here as it was not assigned to anything.  You really really need another drive to replace Disk 5, but even without one, the array should start fine (although unprotected), and the files on Disk 5 should be retrievable.  I would definitely disconnect Disk 5 as soon as possible, to stop it spewing so many error messages and slowing the system down.  I'll have more comments in a bit ...

Posted

I checked the file system on all drives and tried to recover the data on Disk 5.  I used WinHex for the data recovery.

 

Here's the problem with the drive that wasn't formatted.  I'm stumped and frustrated.  I need some direction.

 

Thank You

Posted

From your syslog, your Parity drive looked good.  You don't want to examine the Parity drive with WinHex or any file or partition recovery tool, because it does NOT have a file system on it.  It has an MBR with a valid partition table, with one entry, a single partition taking up almost all of the drive.  If you examine the contents of that partition, you will see nothing but garbage, because it is just binary parity info, and completely unreadable (by anything but UnRAID).
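To illustrate why the parity partition looks like garbage yet can still stand in for a missing disk, here is a toy sketch. It assumes single parity computed as a bytewise XOR across the data disks, and the 4-byte "disks" are made-up values, not anything from this array.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, 4 bytes each
disks = [b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]

# The parity disk holds the XOR of all data disks: no file system, just bits
parity = xor_blocks(disks)

# Lose disk index 1, then rebuild it from the survivors plus parity
survivors = [d for i, d in enumerate(disks) if i != 1]
rebuilt = xor_blocks(survivors + [parity])
```

Here `rebuilt` equals the lost disk's bytes. The same arithmetic shows why one parity disk covers only a single failed drive: with two disks missing, there is not enough information left to solve for both.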

 

You mentioned that unformatted drive - were you expecting to see data on it?  The drive seems physically fine, although it would be good to see a SMART report for it, and the only way to get that is to connect it to a motherboard SATA port, instead of accessing it through the 3Ware card.  If you were expecting to see files on it, then that is a problem, but for me it had seemed an irrelevant drive because it is not assigned at all to your array, not even as a Cache drive.  With WinHex, did you try to write a valid MBR to it?  Then you could try to recover partitions on it, if they had previously existed.

 

By the way, is WinHex capable of reading Reiser file systems?

 

Again, unless there is something I am not understanding, I don't see that you have lost any data at all so far.  I still strongly recommend obtaining another drive and installing it in the place of Disk 5, and letting UnRAID rebuild the contents of Disk 5 onto it.  If that is not possible, here is a suggested course of action, based on my current and limited (possibly faulty) view of your system:

 

1. Shut down the system and remove Disk 5 (the physical disk Seagate ST3400832AS).

2. Power back up and start the array.  (Disk 5 will be simulated, will appear to be present, but access will be slower than normal.)

3. Determine what data is most critical on your array, especially any that is located on Disk 5, and copy it to a safe place.  There will be less risk to your array if you copy this data to an external machine, not elsewhere on the array.

4. Copy all remaining data from Disk 5 to safe locations.

5. Now remove the Disk 5 assignment from UnRAID - see How do I remove a hard disk that I do not plan on replacing?  (this will be a long procedure, completely rebuilding parity)

 

That should do it, leaving you with a smaller array of course, until you can add storage to it.  Hopefully others will correct any errors I may have made, or have better ideas...

Posted

Thanks again for your help with this problem.  I think we can mark as SOLVED.

 

I used WinHex because of its advanced disk tools.  Yes, it handles Reiser, plus MBR search and manipulation, disk cloning, snapshots, and a few more options.

 

Thank you again for your help.  Your direction got me on track, and now I have 2367 minutes until the parity sync is complete.

Posted

Everything is working great now with the exception of the parity disk.  Now it's red and there are errors in the log.  I've attached my latest syslog.  Thanks

 

Chuck

Archived

This topic is now archived and is closed to further replies.
