Solved: Array reports unformatted drive

May 2, 2013

Please read this thread http://lime-technology.com/forum/index.php?topic=27247.0

The situation has now gone from bad to worse.

I had what appeared to be a failing 2 TB WD drive. As it only had a few GB on it, I copied the data off the drive.

I then made a mistake: I took the drive out of the array (following the instructions in the wiki) and started a parity check, as there was now one less drive. I left it running while I went to work. Halfway through the day I got an email from the tower warning of an overheating HDD (47 degrees C), which seemed to heat up the adjacent drives as well.

When I got home, the 3rd drive, one of those that got hot, was showing errors and taking ages to mount. I tried to copy the data off it, to no avail. The drive now shows as unformatted.

As a parity check was in progress at the time, I now have no valid parity either.

I have tried the reiserfsck command, but it tells me there is a hardware fault and will not carry on. I have looked up some other threads, and some people in similar situations have recovered their drives by moving the drive from one controller to another and trying reiserfsck again. When I do this, reiserfsck again tells me there is a hardware fault and will not carry on.

"The problem has occurred looks like a hardware problem. If you have

bad blocks, we advise you to get a new hard drive, because once you

get one bad block that the disk drive internals cannot hide from

your sight,the chances of getting more are generally said to become

much higher (precise statistics are unknown to us), and this disk

drive is probably not expensive enough for you to you to risk your

time and data on it. If you don't want to follow that follow that

advice then if you have just a few bad blocks, try writing to the

bad blocks and see if the drive remaps the bad blocks (that means

it takes a block it has in reserve and allocates it for use for

of that block number). If it cannot remap the block, use badblock

option (-B) with reiserfs utils to handle this block correctly.

bread: Cannot read the block (2): (Input/output error)."

I have ordered a couple of new drives, but as there is no valid parity, replacing the drive will not trigger a valid rebuild.

Any ideas about what I can try as a last-ditch attempt to get anything back from the drive?

Someone on another thread suggested copying zeros to the superblock(?). If this is a valid idea as a last-ditch effort, how do I do it?


May 3, 2013


Unless the data is worthless, do NOT write ANYTHING to the drive until all recovery efforts are exhausted.

If you write zeros to the drive, it will be erased. You will lose your data with no chance of recovery.

If it has bad blocks, the first step is to get a SMART report on the drive. Then, based on the output of the SMART report, you can see what might be possible.

Joe L.
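
For readers following along, getting a SMART report comes down to a few smartctl invocations. The sketch below only prints them, so nothing touches the disk until a line is copied and run as root. The helper name smart_cmds and the /dev/sdh device are illustrative assumptions, not something from the posts above.

```shell
# smart_cmds (hypothetical helper): print, but do not run, the smartctl
# commands for one drive.
smart_cmds() {
    dev="$1"    # whole-drive device, e.g. /dev/sdh (not the partition)
    printf 'smartctl -a %s\n' "$dev"           # full report: identity, attributes, logs
    printf 'smartctl -t short %s\n' "$dev"     # start the short self-test
    printf 'smartctl -t long %s\n' "$dev"      # start the extended self-test
    printf 'smartctl -l selftest %s\n' "$dev"  # read the self-test log afterwards
}

smart_cmds /dev/sdh
```

All four commands only read from the drive or ask it to test itself, which fits the advice not to write anything to it.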


May 4, 2013

Author

OK, I have run a SHORT SMART test and report (the long one is running at the moment).

At first I could not get the SMART report to run: even though the drive was shown in the UNRAID console as sdh, smartctl reported no such device when I tried to run the report. At the time UNRAID was also reporting a temperature of 0 degrees C.

After a reboot, UNRAID reports a more sensible temperature and the SMART report ran. This drive does not look good...

****

smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)

=== START OF INFORMATION SECTION ===

Model Family: Seagate Barracuda 7200.11 family

Device Model: ST31500341AS

Serial Number: 6VS0EY9K

Firmware Version: CC3H

User Capacity: 1,500,301,910,016 bytes

Device is: In smartctl database [for details use: -P show]

ATA Version is: 8

ATA Standard is: ATA-8-ACS revision 4

Local Time is: Sat May 4 10:57:14 2013 Local time zone must be set--see zic m

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

See vendor-specific Attribute list for marginal Attributes.

General SMART Values:

Offline data collection status: (0x82) Offline data collection activity

was completed without error.

Auto Offline Data Collection: Enabled.

Self-test execution status: ( 0) The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: ( 642) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities: (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability: (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: ( 1) minutes.

Extended self-test routine

recommended polling time: ( 255) minutes.

Conveyance self-test routine

recommended polling time: ( 2) minutes.

SCT capabilities: (0x103f) SCT Status supported.

SCT Feature Control supported.

SCT Data Table supported.

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

1 Raw_Read_Error_Rate 0x000f 119 099 006 Pre-fail Always - 226275730

3 Spin_Up_Time 0x0003 100 089 000 Pre-fail Always - 0

4 Start_Stop_Count 0x0032 091 091 020 Old_age Always - 9408

5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 2

7 Seek_Error_Rate 0x000f 072 060 030 Pre-fail Always - 14779277

9 Power_On_Hours 0x0032 064 064 000 Old_age Always - 32024

10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 1

12 Power_Cycle_Count 0x0032 093 093 020 Old_age Always - 7297

184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0

187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0

188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0

189 High_Fly_Writes 0x003a 066 066 000 Old_age Always - 34

190 Airflow_Temperature_Cel 0x0022 078 045 045 Old_age Always In_the_past 22 (Lifetime Min/Max 22/22)

194 Temperature_Celsius 0x0022 022 055 000 Old_age Always - 22 (0 6 0 0)

195 Hardware_ECC_Recovered 0x001a 044 026 000 Old_age Always - 226275730

197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0

198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0

199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0

240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 111441516437485

241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 1853856889

242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 2988060042

SMART Error Log Version: 1

No Errors Logged

SMART Self-test log structure revision number 1

Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error

# 1 Short offline Completed without error 00% 32024 -

SMART Selective self-test log data structure revision number 1

SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS

1 0 0 Not_testing

2 0 0 Not_testing

3 0 0 Not_testing

4 0 0 Not_testing

5 0 0 Not_testing

Selective self-test flags (0x0):

After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

*******

Any ideas?


May 4, 2013

Author

It is proving "difficult" to run the long test, as I think the drive keeps going offline. Certainly the drive in UNRAID seems to cycle between "*" (i.e. not spun up), 0 degrees C and 22 degrees C, and SMART reports the extended offline test as "Aborted by Host".

The drive is also "clicking".

Am I right in thinking that this is hopeless?

If so, does UNRAID cache the directory information anywhere? As the drive was one of the shares, I do not know exactly what was on this drive.

(I know that it is planned, but what would I give for a dual parity drive system right now.)


May 4, 2013

It is proving "difficult" to run the long test, as I think the drive keeps going offline. Certainly the drive in UNRAID seems to cycle between "*" (i.e. not spun up), 0 degrees C and 22 degrees C, and SMART reports the extended offline test as "Aborted by Host".

The extended SMART test will abort if the drive is spun down. DISABLE unRAID's spin-down feature while you run it.

The drive is also "clicking".

That is not a really good sign, but it does not indicate all is lost. (grinding sounds and ear-piercing screeches are far worse)

Am I right in thinking that this is hopeless?

No, not entirely.

If so does UNRAID cache the directory information anywhere? As the drive was one of the shares I do not know exactly what was on this drive.

No, it does not contain a separate listing of files.

(I know that it is planned, but what would I give for a dual parity drive system right now.)

Dual parity would not help you when your array overheats, or when you take disks out of the array (as you described in your first post).

Your SMART report only shows 2 reallocated sectors, and none pending reallocation. That's actually pretty decent.
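
The figures mentioned here are attributes 5 and 197 in the attribute table of the report. As a rough sketch, the sector-related rows can be pulled out of `smartctl -A` style output with awk. The helper name smart_sector_summary is made up for illustration, and the column layout is assumed to match the report posted earlier in this thread.

```shell
# smart_sector_summary (hypothetical helper): reads `smartctl -A` style text
# on stdin and prints NAME=RAW_VALUE for attributes 5, 197 and 198.
# The NF and name checks skip other numbered tables in a full report.
smart_sector_summary() {
    awk 'NF >= 10 && $2 ~ /^[A-Za-z]/ && ($1 == 5 || $1 == 197 || $1 == 198) {
        print $2 "=" $10
    }'
}

# Sample rows copied from the report earlier in this thread.
smart_sector_summary <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       2
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
EOF
```

On a live system the input would come from something like `smartctl -A /dev/sdh | smart_sector_summary`, with the device name adjusted to suit.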

The reason the reiserfsck failed originally was because the disk was not responding. Now that it is, it might work.

Just be certain you run reiserfsck on the first partition, not on the raw drive. If the disk is still assigned to the array, you must run it on the /dev/mdX device; if it is not assigned to the array, use the /dev/sdX1 device (note the trailing "1" designating the first partition).
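
The device rule above can be written down as a small helper. This is only a sketch: the function name fsck_target and its two keywords ("array" and "unassigned") are invented for illustration.

```shell
# fsck_target (hypothetical helper): print the device reiserfsck should be
# pointed at, following the rule described above.
#   fsck_target array 3         disk3 is still assigned to the array
#   fsck_target unassigned sdh  drive sdh is not assigned to the array
fsck_target() {
    case "$1" in
        array)      printf '/dev/md%s\n' "$2" ;;
        unassigned) printf '/dev/%s1\n' "$2" ;;
        *)          echo "usage: fsck_target array N | unassigned sdX" >&2
                    return 1 ;;
    esac
}

fsck_target array 3         # prints /dev/md3
fsck_target unassigned sdh  # prints /dev/sdh1
```

In neither case is the raw /dev/sdX device the target.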


May 4, 2013

Author

reiserfsck seems to be working.

It recommends running --rebuild-sb

*****

If the partition table has not been changed, and the partition is

valid and it really contains a reiserfs partition, then the

superblock is corrupted and you need to run this utility with

--rebuild-sb.

****

However when I do that it asks me

reiserfs_open: the reiserfs superblock cannot be found on /dev/sdh1.

what the version of ReiserFS do you use[1-4]

(1) 3.6.x

(2) >=3.5.9 (introduced in the middle of 1999) (if you use linux 2.2, choose this one)

(3) < 3.5.9 converted to new format (don't choose if unsure)

(4) < 3.5.9 (this is very old format, don't choose if unsure)

(X) exit

What is the correct version to try or should I not be trying this?


May 5, 2013

Author

I managed to get the reiserfsck --check command to run, and it reported as follows.

****

root@Tower:~# reiserfsck --check /dev/sdh1

reiserfsck 3.6.21 (2009 www.namesys.com)

*************************************************************

** If you are using the latest reiserfsprogs and it fails **

** please email bug reports to [email protected], **

** providing as much information as possible -- your **

** hardware, kernel, patches, settings, all reiserfsck **

** messages (including version), the reiserfsck logfile, **

** check the syslog file for any related information. **

** If you would like advice on using this program, support **

** is available for $25 at www.namesys.com/support.html. **

*************************************************************

Will read-only check consistency of the filesystem on /dev/sdh1

Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

###########

reiserfsck --check started at Sat May 4 23:42:38 2013

###########

Replaying journal: Done.

Reiserfs journal '/dev/sdh1' in blocks [18..8211]: 0 transactions replayed

Checking internal tree.. finished

Comparing bitmaps..finished

Checking Semantic tree:

finished

No corruptions found

There are on the filesystem:

Leaves 312933

Internal nodes 1926

Directories 1210

Other files 17902

Data block pointers 315054145 (15925 of them are zero)

Safe links 0

###########

reiserfsck finished at Sun May 5 00:02:42 2013

###########

**************

It seems to say there are no corruptions, but UNRAID still shows the drive as unformatted.

Any ideas?

If all else fails, is there a way I can save the output of the reiserfsck --check command, as it seemed to list all the files on the drive as it ran?


May 5, 2013

Can you mount the drive via the command line?


May 5, 2013

Author

How would I do that?

I have tried using the mount command, but I am probably doing it wrong: "mount /dev/md3 /mnt/disk3" comes back with "mount point /mnt/disk3 does not exist".

I have managed to run reiserfsck --check /dev/sdh1 > /boot/reiser.txt, which has produced a file that appears to list all the files that are/were on the drive, all 17902 of them :-(

Does anyone have any ideas about what I can do to try and bring this drive back? reiserfsck seems to think the data is there, but UNRAID is still showing the drive as unformatted.
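
Redirecting with ">" hides the output while the command runs. A common alternative is to pipe through tee, which shows the output and saves a copy at the same time. The sketch below wraps that in a helper; the name log_run is hypothetical, and the demonstration uses echo rather than reiserfsck so it is harmless to try.

```shell
# log_run (hypothetical helper): run a command, show its output on screen
# and keep a copy in a log file. stderr is merged into the same stream.
log_run() {
    logfile="$1"
    shift
    "$@" 2>&1 | tee "$logfile"
}

# Harmless demonstration; on the server the call might look like
#   log_run /boot/reiser.txt reiserfsck --check /dev/sdh1
log_run "${TMPDIR:-/tmp}/log_run_demo.txt" echo "demo output"
```

reiserfsck still asks for its "Yes" confirmation when run this way, and because the output passes through a pipe the prompt may show up later than expected.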


May 5, 2013

Author

I have attached a syslog; here is an excerpt from it.

*****

May 5 10:54:10 Tower logger: mount: wrong fs type, bad option, bad superblock on /dev/md3,

May 5 10:54:10 Tower logger: missing codepage or helper program, or other error

May 5 10:54:10 Tower logger: In some cases useful info is found in syslog - try

May 5 10:54:10 Tower logger: dmesg | tail or so

May 5 10:54:10 Tower logger:

May 5 10:54:10 Tower emhttp: _shcmd: shcmd (356): exit status: 32

May 5 10:54:10 Tower emhttp: disk3 mount error: 32

May 5 10:54:10 Tower emhttp: shcmd (357): rmdir /mnt/disk3

May 5 10:54:10 Tower kernel: REISERFS warning (device md3): sh-2006 read_super_block: bread failed (dev md3, block 2, size 4096)

May 5 10:54:10 Tower kernel: REISERFS warning (device md3): sh-2006 read_super_block: bread failed (dev md3, block 16, size 4096)

May 5 10:54:10 Tower kernel: REISERFS warning (device md3): sh-2021 reiserfs_fill_super: can not find reiserfs on md3

***********

How come reiserfsck reports no errors when I run reiserfsck --check /dev/sdh1 (this is disk 3), yet UNRAID is unable to mount the drive?

Or am I running the command wrong?

Should I be running reiserfsck with the --rebuild-sb or --rebuild-tree options?

syslog.zip
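
For anyone who finds a full syslog hard to read, a grep filter can reduce it to the lines that matter for a mount failure like the one above. This is a minimal sketch: the helper name fs_mount_errors and the pattern are illustrative choices, built only from the message formats visible in the excerpt.

```shell
# fs_mount_errors (hypothetical helper): keep only the syslog lines that
# mention a ReiserFS kernel warning/error or an emhttp mount error.
fs_mount_errors() {
    grep -E 'REISERFS (warning|error)|mount error'
}

# Sample lines copied from the excerpt above; on the server the input
# would more likely be /var/log/syslog (an assumption about the log path).
fs_mount_errors <<'EOF'
May 5 10:54:10 Tower emhttp: disk3 mount error: 32
May 5 10:54:10 Tower emhttp: shcmd (357): rmdir /mnt/disk3
May 5 10:54:10 Tower kernel: REISERFS warning (device md3): sh-2006 read_super_block: bread failed (dev md3, block 2, size 4096)
EOF
```

Of the three sample lines, the filter keeps the first and the third and drops the rmdir line.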


May 5, 2013


You should slow down... and wait for some advice. You might have already taken steps you cannot undo.

By now you've figured out that you must use reiserfsck on either the /dev/mdX device (which keeps parity updated) or the /dev/sdX1 device (which does NOT keep parity updated, so you must re-sync parity after fixing the file system).

You should always start with

reiserfsck --check

followed by its instructions.

If the drive does not mount subsequently, you can usually get it to mount by running

reiserfsck --rebuild-tree

That would be your next step.

reiserfsck --rebuild-tree /dev/sdh1

Do NOT use the -S (--scan-whole-partition) option unless you want to recover old deleted files.

Then, to mount it:

mkdir /mnt/ridley

mount -t reiserfs /dev/sdh1 /mnt/ridley

or perhaps

mkdir /mnt/disk3

mount -t reiserfs /dev/md3 /mnt/disk3

Joe L.
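
The order of operations in this post can be laid out as a dry-run sketch. The helper below only prints the commands for a given partition and mount point and executes nothing. The name recovery_plan is hypothetical, and the example arguments are the ones used in the post.

```shell
# recovery_plan (hypothetical helper): print, in order, the commands from the
# post above for one partition and mount point. Nothing is executed.
recovery_plan() {
    dev="$1"   # e.g. /dev/sdh1 (the partition, not the raw drive)
    mnt="$2"   # e.g. /mnt/ridley
    printf 'reiserfsck --check %s\n' "$dev"
    printf 'reiserfsck --rebuild-tree %s\n' "$dev"
    printf 'mkdir %s\n' "$mnt"
    printf 'mount -t reiserfs %s %s\n' "$dev" "$mnt"
}

recovery_plan /dev/sdh1 /mnt/ridley
```

The second step is only the fallback for when the drive still will not mount after --check. --rebuild-tree writes to the disk, so it belongs after every read-only option has been tried.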


May 5, 2013

Author

Thanks for the reply.

So, as reiserfsck --check finishes and reports no problems, and so does not recommend anything further, but UNRAID still will not mount the drive, I should go ahead and run the reiserfsck --rebuild-tree /dev/sdh1 command?


May 5, 2013


Yes. After you run the rebuild-tree, reboot the server and see if the drive will mount. You MUST then run a correcting parity check/sync once the disks are mounted again. It will probably find some differences representing your corrections.


May 6, 2013

Author

I ran it, but the drive still will not mount.

****

root@Tower:~# reiserfsck --rebuild-tree /dev/sdh1

reiserfsck 3.6.21 (2009 www.namesys.com)

*************************************************************

** Do not run the program with --rebuild-tree unless **

** something is broken and MAKE A BACKUP before using it. **

** If you have bad sectors on a drive it is usually a bad **

** idea to continue using it. Then you probably should get **

** a working hard drive, copy the file system from the bad **

** drive to the good one -- dd_rescue is a good tool for **

** that -- and only then run this program. **

** If you are using the latest reiserfsprogs and it fails **

** please email bug reports to [email protected], **

** providing as much information as possible -- your **

** hardware, kernel, patches, settings, all reiserfsck **

** messages (including version), the reiserfsck logfile, **

** check the syslog file for any related information. **

** If you would like advice on using this program, support **

** is available for $25 at www.namesys.com/support.html. **

*************************************************************

Will rebuild the filesystem (/dev/sdh1) tree

Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

Replaying journal: Done.

Reiserfs journal '/dev/sdh1' in blocks [18..8211]: 0 transactions replayed

###########

reiserfsck --rebuild-tree started at Sun May 5 19:52:06 2013

###########

Pass 0:

####### Pass 0 #######

Loading on-disk bitmap .. ok, 315372468 blocks marked used

Skipping 19389 blocks (super block, journal, bitmaps) 315353079 blocks will be read

0%... left 272378554, 27233 /s

left 0, 23598 /secc

19111 directory entries were hashed with "r5" hash.

"r5" hash is selected

Flushing..finished

Read blocks (but not data blocks) 315353079

Leaves among those 312933

Objectids found 19152

Pass 1 (will try to insert 312933 leaves):

####### Pass 1 #######

Looking for allocable blocks .. finished

0%....20%....40%....60%....80%....100% left 0, 324 /sec

Flushing..finished

312933 leaves read

312905 inserted

28 not inserted

####### Pass 2 #######

Pass 2:

0%....20%....40%....60%....80%....100% left 0, 18 /sec

Flushing..finished

Leaves inserted item by item 28

Pass 3 (semantic):

####### Pass 3 #########

Flushing..finished

Files found: 17902

Directories found: 1211

Pass 3a (looking for lost dir/files):

####### Pass 3a (lost+found pass) #########

Looking for lost directories:

Flushing..finished2, 230 /sec

Pass 4 - finisheddone 309831, 166 /sec

Flushing..finished

Syncing..finished

###########

reiserfsck finished at Mon May 6 00:24:47 2013

###########

root@Tower:~#

************

Any ideas as to what to do now?


May 6, 2013

Author

Syslog added

syslog_0605130115.zip


May 7, 2013

With all the error messages in the syslog, I'd try a different SATA port.

Joe L.


May 7, 2013

Author

Yesterday I downloaded YAREG and installed it on my Windows PC. When I connected the drive that UNRAID reports as unformatted to the PC, YAREG could see the drive and the data!

I have been copying the data from the drive ever since. YAREG might have an incredibly slow transfer rate, but it is copying data, and I am prepared to wait if I get the data back.

When the problems started, I moved the drive from an on-motherboard controller to a port on the Supermicro MV8 controller.

I do not suppose you could be more specific about the errors? I am a noob when it comes to Unix and find the syslog difficult to make head or tail of.


May 8, 2013

Author

I think I have now recovered all the data from the drive.

I am now going to attempt to add a replacement drive to the array and allow a parity rebuild. When that is complete, I will copy the data back to the replacement drive.

Fingers crossed.

Then I will have to upgrade UNRAID so I can integrate these 3 TB drives I bought.


May 9, 2013

Author

I have replaced the drive, got the data back and rebuilt parity, so at the moment (fingers crossed) things are looking better, a lot better, than they were.

Thank you for all the help.
