Failed drive report during replacement of another drive

JorgeB · August 1, 2017

Stop the array, start it in maintenance mode again, run reiserfsck on the emulated disk9, and post the output:

reiserfsck /dev/md9

talmania · August 1, 2017

OK, I got to the console and saw the attached picture1; this came up after one of the initial steps, I believe. So I connected with PuTTY, was able to log in that way, and got the output shown in picture2.

Edit: I should add that the console in the top picture didn't respond to anything; I could not get a login prompt, etc.

Edited August 1, 2017 by talmania
Clarification

JorgeB · August 1, 2017

The superblock is damaged. This is not normal; it could be the result of the problems with the original disk9, or there were changes to the array after the disk13 upgrade attempt. Any change (dockers writing to the array, etc.) will cause problems.

You can still rebuild the superblock and see how it goes, follow the instructions from here:

https://wiki.lime-technology.com/Check_Disk_Filesystems#Rebuilding_the_superblock

talmania · August 1, 2017

OK, thanks johnnie. I'm following the wiki and crossing my fingers. I didn't change the array status at all; hope that was correct. Current status:

root@Deed:~# reiserfsck --rebuild-sb /dev/md9
reiserfsck 3.6.24

Will check superblock and rebuild it if needed
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

Did you use resizer(y/n)[n]: n
rebuild-sb: wrong block count occured (854657433), fixed (488378624)
rebuild-sb: wrong bitmap number occured (26083), fixed (14905)
rebuild-sb: wrong free block count occured (791198636), zeroed
Reiserfs super block in block 16 on 0x909 of format 3.6 with standard journal
Count of blocks on the device: 488378624
Number of bitmaps: 14905
Blocksize: 4096
Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 0
Root block: 130410427
Filesystem is clean
Tree height: 5
Hash function used to sort names: "r5"
Objectid map size 8, max 972
Journal parameters:
        Device [0x0]
        Magic [0x63a76705]
        Size 8193 blocks (including 1 for journal header) (first block 18)
        Max transaction length 1024 blocks
        Max batch size 900 blocks
        Max commit age 30
Blocks reserved by journal: 0
Fs state field: 0x1:
         some corruptions exist.
sb_version: 2
inode generation number: 2232
UUID: a150194f-d4db-4f12-8220-a6300f0b8386
LABEL:
Set flags in SB:
        ATTRIBUTES CLEAN
Mount count: 205
Maximum mount count: 30
Last fsck run: Fri Nov 5 23:24:38 2010
Check interval in days: 180
Is this ok ? (y/n)[n]: y
The fs may still be unconsistent. Run reiserfsck --check.

root@Deed:~# reiserfsck --check
Usage: reiserfsck [mode] [options] device

Modes:
--check                       consistency checking (default)
--fix-fixable                 fix corruptions which can be fixed without
                                --rebuild-tree
--rebuild-sb                  super block checking and rebuilding if needed
                                (may require --rebuild-tree afterwards)
--rebuild-tree                force fsck to rebuild filesystem from scratch
                                (takes a long time)
--clean-attributes            clean garbage in reserved fields in StatDatas
Options:
-j | --journal device         specify journal if relocated
-B | --badblocks file         file with list of all bad blocks on the fs
-l | --logfile file           make fsck to complain to specifed file
-n | --nolog                  make fsck to not complain
-z | --adjust-size            fix file sizes to real size
-q | --quiet                  no speed info
-y | --yes                    no confirmations
-f | --force          force checking even if the file system is marked clean
-V                            prints version and exits
-a and -p                     some light-weight auto checks for bootup
-r                    ignored
Expert options:
--no-journal-available        do not open nor replay journal
-S | --scan-whole-partition   build tree of all blocks of the device

root@Deed:~# reiserfsck --check /dev/md9
reiserfsck 3.6.24

Will read-only check consistency of the filesystem on /dev/md9
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Tue Aug 1 14:18:50 2017
###########
Replaying journal: Done.
Reiserfs journal '/dev/md9' in blocks [18..8211]: 0 transactions replayed

EDIT: It's continuing to run. I was worried that it had hung at "0 transactions replayed".

Edited August 1, 2017 by talmania
More info
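As a sanity check on the rebuild-sb output above, the corrected values are consistent with each other. Assuming each ReiserFS bitmap block holds one bit per filesystem block (so one bitmap covers blocksize × 8 blocks), a 4096-byte block size gives 32768 blocks per bitmap, and 488378624 blocks need ceil(488378624 / 32768) = 14905 bitmaps, on a device of 488378624 × 4096 bytes, about 2.0 TB. The same formula also reproduces the damaged pair (854657433 blocks, 26083 bitmaps). A minimal sketch of that arithmetic; the helper names are hypothetical and not part of reiserfsck:

```python
BITS_PER_BYTE = 8


def expected_bitmaps(block_count: int, blocksize: int) -> int:
    """Bitmap blocks needed, assuming one bit per filesystem block (integer ceiling)."""
    blocks_per_bitmap = blocksize * BITS_PER_BYTE
    return (block_count + blocks_per_bitmap - 1) // blocks_per_bitmap


def device_bytes(block_count: int, blocksize: int) -> int:
    """Size covered by the filesystem, in bytes."""
    return block_count * blocksize


if __name__ == "__main__":
    # Values copied from the rebuild-sb output in this thread.
    blocksize = 4096
    for label, blocks in (("damaged", 854657433), ("fixed", 488378624)):
        print(label, blocks, "blocks ->", expected_bitmaps(blocks, blocksize), "bitmaps,",
              f"{device_bytes(blocks, blocksize) / 1e12:.2f} TB")
    # damaged 854657433 blocks -> 26083 bitmaps, 3.50 TB
    # fixed 488378624 blocks -> 14905 bitmaps, 2.00 TB
```

Only the "fixed" line matches a plausible 2 TB drive, which is why rebuild-sb replaced the superblock's block count and bitmap number.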

JorgeB · August 1, 2017

Wait. Since it's running on the emulated disk, it will be slower than normal; as long as there's disk activity, it should still be running.

talmania · August 1, 2017

Just now, johnnie.black said:

Wait. Since it's running on the emulated disk, it will be slower than normal; as long as there's disk activity, it should still be running.

It is now; I can see the counters incrementing: 2 (of 19)/52 (of 92)/107 (of 170), etc.

talmania · August 1, 2017

Looking further ahead: when this process completes, do I simply complete the remaining steps in the original directive, johnnie?

JorgeB · August 1, 2017

reiserfsck will probably still find errors, but they should be fixable. Since the replacement disk is new (so you won't overwrite anything), you might as well rebuild first and then finish fixing the filesystem, especially if it needs --rebuild-tree, since that would take much longer on the emulated disk.

Keep the old disk9 intact for now; some data (or, with some luck, most of it) should still be salvageable if needed.

talmania · August 2, 2017

Evening update: I came back home from the office and found the following summary at the end of the log:


Replaying journal: Done.
Reiserfs journal '/dev/md9' in blocks [18..8211]: 0 transactions replayed

Zero bit found in on-disk bitmap after the last valid bit.
Checking internal tree.. / 1 (of 19)/ 1 (of 93)/ 1 (of 87)block 8211: The number of items (6) is incorrect, should be (1)
the problem in the internal node occured (8211), whol/ 36 (of 93)/142 (of 170)block 195496755: The level of the node (40014) is not correct, (1) expected
the problem in the internal node occured/ 9 (of 19)/130 (of 130)/114 (of 115)bad_stat_data: The objectid (841) is marked free, but used by an object [834 841 0x0 SD (0)] finished
Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Bad nodes were found, Semantic pass skipped
2 found corruptions can be fixed only when running with --rebuild-tree
###########
reiserfsck finished at Tue Aug 1 16:08:26 2017
###########
root@Deed:~#
root@Deed:~#

Then I unmounted the array, assigned the new disk9, brought the array online, and now I see the following:

talmania · August 2, 2017

Should I be concerned that it states "unmountable"? It appears to be rebuilding, but I've never been here before. Thanks!

Frank1940 · August 2, 2017

I would wait until it finishes and then see what it says. (As I recall, it will say that until the disk has been rebuilt.) If it still shows as unmountable, @johnnie.black will probably be around to give you further advice. (I think he is somewhere in the British Isles, so it is the middle of the night there.)

talmania · August 2, 2017

Many thanks Frank! You both have been incredible!

JorgeB · August 2, 2017

3 hours ago, talmania said:

Should I be concerned that it states "unmountable"? It appears to be rebuilding, but I've never been here before. Thanks!

Wait until the rebuild finishes, then run reiserfsck --rebuild-tree. That will take several hours, but it should fix it.
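The choice of --rebuild-tree follows from the summary at the end of the --check run posted earlier: "2 found corruptions can be fixed only when running with --rebuild-tree". As a rough sketch, a script could pick the mode to suggest by scanning those summary lines. The function name is hypothetical, and the exact wording of a --fix-fixable summary line is an assumption modeled on the --rebuild-tree line quoted in this thread; check your own log rather than trusting the pattern blindly.

```python
import re
from typing import Optional

# Matches summary lines such as
# "2 found corruptions can be fixed only when running with --rebuild-tree".
# The optional "only " is an assumption so the pattern also accepts a
# --fix-fixable variant of the same sentence.
SUMMARY = re.compile(
    r"(\d+) found corruptions? can be fixed (?:only )?when running with (--[a-z-]+)"
)

# Most thorough mode first: if anything needs --rebuild-tree, that wins.
SEVERITY = ["--rebuild-tree", "--fix-fixable"]


def suggested_mode(log_text: str) -> Optional[str]:
    """Return the most thorough reiserfsck mode the log summary asks for, or None."""
    found = {mode for count, mode in SUMMARY.findall(log_text) if int(count) > 0}
    for mode in SEVERITY:
        if mode in found:
            return mode
    return None


if __name__ == "__main__":
    log = (
        "Bad nodes were found, Semantic pass skipped\n"
        "2 found corruptions can be fixed only when running with --rebuild-tree\n"
    )
    print(suggested_mode(log))  # --rebuild-tree
```

Ordering by severity reflects the usage text in the thread: --fix-fixable handles corruptions "which can be fixed without --rebuild-tree", so a single --rebuild-tree finding outranks any number of fixable ones.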

talmania · August 3, 2017

17 hours ago, johnnie.black said:

Wait until the rebuild finishes, then run reiserfsck --rebuild-tree. That will take several hours, but it should fix it.

OK, I ran it and it completed, then I had to stop the array and restart it. It allowed disk9 to be mounted, and I can now see the disk share, but nothing is there except "lost+found". I can't browse the share from Windows, but I can under the GUI. There are tons of folders in there, and files as well, of roughly the correct sizes, I presume. I assume I have to open them up to read them, etc.? They are named with simple numeric sequences; see the attached picture.

Attached is the output of the --rebuild-tree.

rebuild-tree.txt

talmania · August 3, 2017

And more diagnostics too:

deed-diagnostics-20170802-1745.zip

talmania · August 3, 2017

And I suck: I was trying to access it via disk9 and NOT the user share lost+found. Buried under those original directories are my ACTUAL directories and files! I tried a couple and they seem to work perfectly. I think we're done here; time to check and move!

JonathanM · August 3, 2017

20 minutes ago, talmania said:

I think we're done here; time to check and move!

Are you familiar with the user share copy "bug"?

talmania · August 3, 2017

3 minutes ago, jonathanm said:

Are you familiar with the user share copy "bug"?

Nope...sounds like I need to be!

JonathanM · August 3, 2017

2 minutes ago, talmania said:

Nope...sounds like I need to be!

Yep. I'd advise researching a little, but in a nutshell, don't copy between /mnt/diskX locations and /mnt/user locations or vice-versa. Use either disk only or user only locations in copy operations, don't mix them.

I only piped up because you mentioned accessing disk9 directly. If you want to copy using disk9 as a source, make sure your destination is also disk9 but another folder, or another diskX, not a user share.

It doesn't affect all operations, but until you understand why it happens and what exactly causes your files to disappear if you do it wrong, just don't do it.
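The rule of thumb above can be expressed as a small pre-copy check. This is a minimal sketch, assuming array disks are mounted at /mnt/diskN and user shares at /mnt/user, as described in the post; the function names are hypothetical. It only encodes "don't mix the two" and does not detect or explain the underlying cause of the data loss.

```python
import re
from pathlib import PurePosixPath

# Assumed mount layout, per the advice above: /mnt/diskN and /mnt/user.
DISK_ROOT = re.compile(r"^disk\d+$")


def location_kind(path: str) -> str:
    """Classify an absolute path as 'disk', 'user', or 'other'."""
    parts = PurePosixPath(path).parts  # e.g. ('/', 'mnt', 'disk9', 'lost+found')
    if len(parts) >= 3 and parts[0] == "/" and parts[1] == "mnt":
        if DISK_ROOT.match(parts[2]):
            return "disk"
        if parts[2] == "user":
            return "user"
    return "other"


def mixes_disk_and_user(src: str, dst: str) -> bool:
    """True when one side is a /mnt/diskN path and the other a /mnt/user path."""
    return {location_kind(src), location_kind(dst)} == {"disk", "user"}


if __name__ == "__main__":
    print(mixes_disk_and_user("/mnt/disk9/lost+found/a", "/mnt/user/Movies/a"))  # True: don't
    print(mixes_disk_and_user("/mnt/disk9/lost+found/a", "/mnt/disk9/Movies/a"))  # False
    print(mixes_disk_and_user("/mnt/user/lost+found/a", "/mnt/user/Movies/a"))    # False
```

The last case is the user-share-to-user-share move discussed next in the thread, which the check allows.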

talmania · August 3, 2017

Actually, Windows gave me a permissions error, and when I started poking around I realized I was under disk9/lost+found and not \\tower\lost+found. I'm assuming all I have to do is move the files from \\tower\lost+found to their respective \\tower\sharename and I'll be set, no? Or is a more complex move needed? User share to user share, if I'm not mistaken.

JonathanM · August 3, 2017

3 minutes ago, talmania said:

all I have to do is move the files from \\tower\lost+found to their respective \\tower\sharename and I'll be set

Yep.
