1 Red Ball disk, 1 Unmountable disk

AndrewT · June 6, 2017

My issue is related to this post but I might need a different solution:

I haven't written anything to the array since this happened, but I've been trying to fix it without any luck for a couple of weeks now. Attached is a syslog taken after rebooting.

Disk2 red-balled on me, but its contents are being emulated: I was able to fully rsync the disk to my cache disk as a backup (rsync /mnt/disk2 /mnt/cache/bak.disk2). I've tried twice to rebuild the red disk onto a fresh precleared disk, and both times the rebuild stopped after accumulating hundreds of write errors. I initially thought I just had an unlucky bad disk, but two in a row seems unlikely, especially since both passed preclear before I added them.

Disk4 reads as Unmountable - No file system (32), and when I plugged it into a Linux desktop, mounting it truly doesn't seem to be possible; it fails with the error "dsk image failed : can't read superblock"

What's curious to me is that disk4 is green, not red, yet it remains unmountable, so I can't read the array files on it. What should I do to save the data from disk2 (backed up on cache) and disk4 (?) and get the array running happily again? I'm particularly unsure how I might save the data from disk4.

Ironically, I have plenty of precleared disks that I'm preparing to build a backup tower.

tower-diagnostics-20170606-1840.zip

Edited May 5, 2018 by AndrewT
removed diagnostics ZIP file

JorgeB · June 6, 2017

Post the complete diags: Tools -> Diagnostics.

AndrewT · June 6, 2017

I updated the attachment from only the sys log to the entire diagnostics dump.

JorgeB · June 6, 2017

Check filesystem on disk4 (md4):

https://wiki.lime-technology.com/Check_Disk_Filesystems#Drives_formatted_with_ReiserFS_using_unRAID_v5_or_later

JorgeB · June 6, 2017

Forgot to mention: this deals with the first part of your problem, the unmountable disk, which should also be the easiest part. After that we'll take care of the rebuild, but with multiple failed rebuilds on a server with two SASLP controllers, it may not be that simple.

It's getting past my bedtime, so I can't look at all the SMART reports, but if reiserfsck fixes disk4 and you want to attempt another rebuild, grab the diags before rebooting if something goes wrong.

AndrewT · June 7, 2017

Thanks, I am running `reiserfsck --rebuild-tree /dev/md4` after `reiserfsck --check /dev/md4` suggested it:

Checking internal tree..  block 471171107: The level of the node (52700) is not correct, (4) expected
 the problem in the internal node occured (471171107), whole subtree is skipped
finished
Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Bad nodes were found, Semantic pass skipped
1 found corruptions can be fixed only when running with --rebuild-tree

AndrewT · June 8, 2017

Weird, so `reiserfsck --rebuild-tree /dev/md4` failed with this printed out:

block 97091595: The number of items (1) is incorrect, should be (0) - corrected
block 97091595: The free space (32768) is incorrect, should be (4072) - corrected
20%                                            left 390393334, 14207 /sec
The problem has occurred looks like a hardware problem. If you have
bad blocks, we advise you to get a new hard drive, because once you
get one bad block  that the disk  drive internals  cannot hide from
your sight,the chances of getting more are generally said to become
much higher  (precise statistics are unknown to us), and  this disk
drive is probably not expensive enough  for you to you to risk your
time and  data on it.  If you don't want to follow that follow that
advice then  if you have just a few bad blocks,  try writing to the
bad blocks  and see if the drive remaps  the bad blocks (that means
it takes a block  it has  in reserve  and allocates  it for use for
of that block number).  If it cannot remap the block,  use badblock
option (-B) with  reiserfs utils to handle this block correctly.
bread: Cannot read the block (98002942): (Input/output error).
Aborted

I looked for `badblock`, but tab-completion indicates no such command exists, only `badblocks`, and its help lacks a -B option. I then tried `reiserfsck --help` and saw that the -B option exists there, to specify a file with a list of all bad blocks on the fs. Does this mean I should somehow re-run reiserfsck in such a way that it lists files with bad blocks, redirect stderr (or stdout?) to a file, and then give that file to -B <file>? There also seems to be a warning not to run --rebuild-sb twice or data will be lost; is that also true for just --rebuild-tree (source: https://www.cyberciti.biz/tips/repairing-reiserfs-file-system-with-reiserfsck.html )?

Edited June 8, 2017 by AndrewT
add link/ref

JorgeB · June 8, 2017

You can try running badblocks to create a list of bad sectors and then use it with reiserfsck. To do that, with the array stopped:

badblocks -v -o /boot/badblocks.log /dev/sdX1

then start in maintenance mode:

reiserfsck --rebuild-tree -B /boot/badblocks.log /dev/mdX

Another option would be to clone that disk to a spare one using dd and then run reiserfsck there.

This also means you're going to have problems rebuilding the disabled disk.
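One thing worth double-checking before feeding the list to reiserfsck: badblocks counts in 1024-byte blocks unless -b is given, while the bad-block list is normally expected in filesystem blocks (the mkreiserfs man page suggests generating it with `badblocks -b <block-size>`). The simplest route is probably to rerun badblocks with `-b 4096`. If a list already exists in 1 KiB units, here is a rough sketch of a conversion. `to_fs_blocks` is a made-up helper name, and the 4096-byte filesystem block size is an assumption (it is the usual ReiserFS default), so verify it for your disk first.

```shell
#!/bin/sh
# Hypothetical helper, not part of any tool: converts block numbers reported
# in 1024-byte units (the badblocks default) into filesystem-block numbers.
# Assumption: the filesystem block size is 4096 bytes unless given as $1.
to_fs_blocks() {
    fs_bs="${1:-4096}"
    ratio=$((fs_bs / 1024))
    while read -r blk; do
        # Skip blank or non-numeric lines.
        case "$blk" in
            ''|*[!0-9]*) continue ;;
        esac
        echo $((blk / ratio))
    done | sort -n -u   # several 1 KiB blocks map to the same fs block
}
```

Usage would look something like `to_fs_blocks 4096 < /boot/badblocks.log > /boot/badblocks.4k.log`, where the second file name is only an example.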

AndrewT · June 13, 2017

The first time I ran badblocks, I put the log file on my flash drive and noticed the output was over 2GB and still growing. I worried the file would fill up my USB drive, so I killed it (and foolishly deleted the log). The second time, I wrote the log to my cache disk with `badblocks -v -o /mnt/cache/md4.badblocks.log /dev/sdo1`, but now the log file is empty; the run finished and printed:

Checking blocks 0 to 3907018534
Checking for bad blocks (read-only test):
done
Pass completed, 0 bad blocks found. (0/0/0 errors)

Running it the first time wouldn't have fixed it, so since I don't have the first log listing the bad blocks and now it won't give any, is the disk4 just toast now and I have to rely on a ~80-90% backed up disk version of it?
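As a sanity check on the units in that output: if badblocks was using its default 1024-byte blocks, the range 0 to 3907018534 should add up to roughly the size of the partition. A quick sketch of the arithmetic follows. The `blocks_to_bytes` name is made up here, and the 1024-byte default comes from the badblocks man page rather than from this output.

```shell
#!/bin/sh
# Hypothetical helper: bytes covered by a badblocks range "0 to <last_block>".
# $1 = last block number, $2 = block size in bytes (assumed 1024, the default).
# Needs a shell with 64-bit arithmetic, which is the norm on 64-bit Linux.
blocks_to_bytes() {
    echo $(( ($1 + 1) * ${2:-1024} ))
}

blocks_to_bytes 3907018534   # prints 4000786979840, i.e. about 4.0 TB
```

That figure would fit a 4 TB drive, so the block numbers in any badblocks log from this run would be in 1 KiB units.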

JorgeB · June 13, 2017

Try running reiserfsck again; if you still get the bad sectors error, the best bet is cloning the disk to another one with dd, like this:

dd if=/dev/sdX of=/dev/sdY bs=4k conv=noerror,sync status=progress

X=source

Y=destination (use a disk outside the array)

Then run reiserfsck on the clone.

AndrewT · June 13, 2017

I have disks that are precleared but not formatted. When you add a disk to the array, which formats it properly, doesn't that modify the parity disk as well because the array expands? I can see the precleared disks have "device" IDs assigned under "Unassigned Devices" but is there a way to mount it and format a disk without adding it to the array?

JorgeB · June 13, 2017

To clone the disk with dd you don't need to format it first.

AndrewT · June 13, 2017

But how can I mount sdk then?

JorgeB · June 13, 2017

And what disk is sdk?

AndrewT · June 13, 2017

Precleared disk listed in Unassigned Devices, but the "Mount" GUI button is greyed out, which I assumed was because it lacks a filesystem(?)

JorgeB · June 13, 2017

And what are you trying to do with that disk?

AndrewT · June 13, 2017

Mount it, so it has a path to give dd of=<path>

JorgeB · June 13, 2017

You don't need to mount it, path will be:

/dev/sdk

AndrewT · June 13, 2017

Gosh, sorry, yes, `dd if=/dev/sdo of=/dev/sdk bs=4k conv=noerror,sync status=progress` is working as expected to copy the disk. If it finishes without error, do I then pull the sdo drive out of the "Disk 4" slot on the array and restart the array with sdk in its place?

JorgeB · June 13, 2017

You could do a new config with the cloned disk if it's the same size as the original, but it's somewhat involved, and I think it would be easier to just run reiserfsck with the disk unassigned. You'll need to use --rebuild-tree since the previous run didn't finish:

reiserfsck --rebuild-tree /dev/sdk1

Note the 1 at the end.

If it works, you can mount the disk with the UD plugin and copy the data to the array or another computer. Note, however, that some files will be corrupt, since dd will skip any read errors and fill the destination with zeros.

You then still need to rebuild the disabled disk; again, there will be some corruption because of these errors, and it doesn't matter whether you use the original disk or the clone.
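To illustrate the zero-filling that `conv=noerror,sync` does, here is a small sketch that runs dd on a scratch file instead of a disk. The temporary files and the `padded_size` name are only for the demo. Real read errors can't be simulated this way, so the short final read stands in for them: any input block that comes back short is padded with zeros up to bs.

```shell
#!/bin/sh
# Demo on scratch files only; nothing here touches a real disk.
# padded_size: copy a file of $1 bytes with bs=4k conv=noerror,sync and
# print the size of the copy in bytes.
padded_size() {
    src="$(mktemp)"
    dst="$(mktemp)"
    # Create a source file of exactly $1 bytes.
    head -c "$1" /dev/zero > "$src"
    dd if="$src" of="$dst" bs=4k conv=noerror,sync 2>/dev/null
    wc -c < "$dst" | tr -d ' '
    rm -f "$src" "$dst"
}

padded_size 5000   # 4096 + 904 bytes read, second block padded: prints 8192
```

With bs=4k, each failed read should cost roughly one 4 KiB block of zeros in the clone; a larger bs copies faster but would zero more data per error.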
