Another red-balled disk - superblock corruption?

October 13, 2011

Another disk came up red-balled tonight. Earlier I had an event where I couldn't get the server to shut down, so I had to forcibly power it off and reboot. The SMART report came back as PASSED, although there do seem to be some pending sectors. However, when I ran reiserfsck --check against the disk I got this error:

reiserfs_open: the reiserfs superblock cannot be found on /dev/sdg.
Failed to open the filesystem.

If the partition table has not been changed, and the partition is
valid  and  it really  contains  a reiserfs  partition,  then the
superblock  is corrupted and you need to run this utility with
--rebuild-sb.

I just want to be sure that this is a safe procedure before I proceed. After fixing the superblock I plan to rebuild the disk. Are there any other things I should be doing first? I'm as sure as I can be that the data on the disk is OK.

SMART_11.12.11.txt

syslog-20111012-215229.txt.zip

October 13, 2011

reiserfsck should be run against the first partition on the drive (note the 1):

reiserfsck --check /dev/sdX1
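As a rough sketch of the point above, the snippet below turns a whole-disk device name into its first-partition name and prints the read-only check command for review. The helper names `first_partition` and `check_command` are hypothetical, and the sketch assumes traditional sdX naming, where the first partition is the disk name followed by 1. Nothing is run against the disk.

```shell
# Hypothetical helper: derive the first-partition device from a whole-disk
# device. Assumes sdX-style naming (first partition = disk name + "1").
first_partition() {
    printf '%s1\n' "$1"
}

# Hypothetical helper: print (do not run) the read-only check command so it
# can be reviewed before being typed in by hand.
check_command() {
    printf 'reiserfsck --check %s\n' "$(first_partition "$1")"
}

# /dev/sdg is the device named in this thread.
check_command /dev/sdg
```

The printed line targets /dev/sdg1 rather than /dev/sdg, which is the difference between the failed attempt in the first post and the suggested command.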

October 13, 2011

Author

Well, this doesn't look good:

###########
reiserfsck --check started at Wed Oct 12 23:00:00 2011
###########
Replaying journal: Trans replayed: mountid 152, transid 47341, desc 5188, len 1, commit 5190, next trans offset 5173
Trans replayed: mountid 152, transid 47342, desc 5191, len 1, commit 5193, next trans offset 5176
Replaying journal: Done.                                                        
Reiserfs journal '/dev/sdg1' in blocks [18..8211]: 2 transactions replayed
Checking internal tree.. \/  6 (of  16//131 (of 170|/ 39 (of 170\
The problem has occurred looks like a hardware problem. If you have
bad blocks, we advise you to get a new hard drive, because once you
get one bad block  that the disk  drive internals  cannot hide from
your sight,the chances of getting more are generally said to become
much higher  (precise statistics are unknown to us), and  this disk
drive is probably not expensive enough  for you to you to risk your
time and  data on it.  If you don't want to follow that follow that
advice then  if you have just a few bad blocks,  try writing to the
bad blocks  and see if the drive remaps  the bad blocks (that means
it takes a block  it has  in reserve  and allocates  it for use for
of that block number).  If it cannot remap the block,  use badblock
option (-B) with  reiserfs utils to handle this block correctly.

bread: Cannot read the block (263314195): (Input/output error).

Aborted

and now I can't get a SMART report:

root@Tower:~# smartctl  -a  -d  ata  /dev/sdg
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

October 13, 2011

Author

A bit of an update: I restarted the array and was able to get a SMART report from that disk. I ran the long SMART test overnight, and it aborted about 10% of the way through with many read errors. I am currently copying all data off the drive, with the plan to remove it from the array, then re-add it and rebuild it. Does that seem like a good strategy, or is there something I should be doing first?

SMART report below

SMART_101113.txt

October 13, 2011

I would not add the drive back into the array first. I would run a preclear cycle on it and see whether it manages to pass.

October 13, 2011

Author

That's a good idea, I'll do that. Thanks.

October 13, 2011

The read errors are not expected, but if your spin-down timer spins a drive down, the long test will abort.

Make sure you disable any spin down timer when doing a "long" test.
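A minimal sketch of running and checking a long test with smartctl follows. It only prints the commands. The helper name `long_test_plan` is hypothetical, and `-d ata` and /dev/sdg are carried over from the earlier smartctl invocation in this thread. Disabling spin-down is not handled here and still has to be done separately in the unRAID settings.

```shell
# Hypothetical helper: print the commands for a long SMART self-test.
# Nothing is executed; review the output and run the lines by hand.
long_test_plan() {
    dev="$1"
    # Start the extended (long) self-test; it runs inside the drive itself.
    printf 'smartctl -d ata -t long %s\n' "$dev"
    # Afterwards, read the self-test log to see whether it completed or aborted.
    printf 'smartctl -d ata -l selftest %s\n' "$dev"
}

long_test_plan /dev/sdg
```

The self-test log is where an aborted run shows up, so checking it helps tell a spin-down interruption apart from a test that stopped on read failures.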

October 13, 2011

I hope you have the array running and simulating the disk and you're copying that data. It's quite pointless to be copying the data off of the failed disk directly, since you won't know what data failed to copy to the disk and which files have been corrupted by the bad sectors.

Peter

October 13, 2011

Author

The read errors are not expected, but if your spin-down timer spins a drive down, the long test will abort.
Make sure you disable any spin down timer when doing a "long" test.

I thought I had done so, but now I'm second-guessing myself. I assume the individual disk spin-down settings override the general array spin-down setting, correct? I had the bad disk set to never spin down, and the array was left at a 30-minute spin-down. Hope that was right.

October 13, 2011

Author

I hope you have the array running and simulating the disk and you're copying that data. It's quite pointless to be copying the data off of the failed disk directly, since you won't know what data failed to copy to the disk and which files have been corrupted by the bad sectors.

Peter

Yes, the array is started, and I'm using rsync to copy all files from /mnt/disk3 (the red-balled one) to /mnt/disk10. Please tell me this is using the simulated data and I haven't been wasting my time all day. It seems like it must be correct, as the unRAID main page shows only a handful of reads on disk3 compared to well over 2 million writes to disk10.
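For reference, a hedged sketch of what such a copy and a follow-up verification pass might look like. The exact rsync options used in this post are not stated, so `-av` and `-avnc` are assumptions, and the helper names `copy_command` and `verify_command` are hypothetical. The snippet only prints the commands.

```shell
# Hypothetical helper: print an rsync copy command. -a preserves attributes
# and recurses, -v lists files. The trailing slashes copy the contents of
# the source directory rather than the directory itself.
copy_command() {
    printf 'rsync -av %s/ %s/\n' "$1" "$2"
}

# Hypothetical helper: print a verification pass. -n makes it a dry run and
# -c compares file checksums, so differing files are listed and nothing is
# written.
verify_command() {
    printf 'rsync -avnc %s/ %s/\n' "$1" "$2"
}

# Paths are the ones named in this post.
copy_command /mnt/disk3 /mnt/disk10
verify_command /mnt/disk3 /mnt/disk10
```

A checksum dry run reads every file on both sides, so on a large disk it can take about as long as the copy itself.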

October 14, 2011

Author

Any comments about my last 2 posts? Sorry for the multiposting but I just want to be sure that I'm doing things the right way.

October 14, 2011

Not sure about the spin-down, but /mnt/disk3 is the correct simulated disk to copy the data from.

Peter

October 14, 2011

Author

Thanks. I thought so, but wanted to be sure. Probably another 12 hours to go for the copy.
