
[SOLVED] Drive Error Red Ball After Parity Drive Upgraded: can't read superblock



I started by running a parity check on my existing 4TB parity drive. Then I followed the steps described on the forum to upgrade the parity drive from 4TB to 5TB, and I left it building parity. After a day or so I checked on the status. The build was complete, but disk13 had a red ball with 200K+ errors reported in the Dynamix web GUI. I tried accessing the drive and was able to list and open files. When I attempted to get the system log I got a memory error. It was getting late, so I decided to shut down the server and pick it up in the morning. I turned the server on this morning and disk13 is showing as unformatted. I'm not sure how to interpret the mdcmd results below, but I don't think they're good. :( Please let me know if the mdcmd status is OK and whether I can rebuild disk13 using the new 5TB parity drive. If not, what other options do I have to recover the data? Also, I still have the old 4TB parity drive.

 

Below is a link to the system log and SMART report. The SMART report is from a prior test; I haven't run a SMART test since this happened, as I wasn't sure if it would make the situation worse.

 

Thanks,

Christopher

 

Unraid Version: 5.0.6

 

System log and smart report:

https://drive.google.com/file/d/0B00Diiihkv_qSmY5YXpHaFM4VkU/view?usp=sharing

 

/root/mdcmd status | egrep "mdResync|mdState|sbSync"

 

sbSynced=0

sbSyncErrs=0

mdState=STARTED

mdResync=0

mdResyncCorr=1

mdResyncPos=0

mdResyncDt=0

mdResyncDb=0
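
 

Aside, in case it helps anyone reading along: mdResync=0 with mdState=STARTED appears to mean that no sync is currently running, rather than saying anything about whether the last sync succeeded. A convenience one-liner to keep watching these same fields during a sync:

 

watch -n 60 '/root/mdcmd status | egrep "mdResync|mdState|sbSync"'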

 

This looks like the error in the system log:

 

Jun 18 14:43:52 FileServer02 emhttp: shcmd (10217): mkdir /mnt/disk13

Jun 18 14:43:52 FileServer02 emhttp: shcmd (10218): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md13 /mnt/disk13 |& logger

Jun 18 14:43:52 FileServer02 kernel: REISERFS (device md13): found reiserfs format "3.6" with standard journal

Jun 18 14:43:52 FileServer02 kernel: REISERFS (device md13): using ordered data mode

Jun 18 14:43:52 FileServer02 kernel: reiserfs: using flush barriers

Jun 18 14:43:52 FileServer02 kernel: REISERFS (device md13): journal params: device md13, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30

Jun 18 14:43:52 FileServer02 kernel: REISERFS (device md13): checking transaction log (md13)

Jun 18 14:43:52 FileServer02 kernel: REISERFS (device md13): replayed 2 transactions in 0 seconds

Jun 18 14:43:53 FileServer02 kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 28661 does not match to the expected one 4

Jun 18 14:43:53 FileServer02 kernel: REISERFS error (device md13): vs-5150 search_by_key: invalid format found in block 602112004. Fsck?

Jun 18 14:43:53 FileServer02 kernel: REISERFS (device md13): Remounting filesystem read-only

Jun 18 14:43:53 FileServer02 kernel: REISERFS error (device md13): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1 2 0x0 SD]

Jun 18 14:43:53 FileServer02 kernel: REISERFS (device md13): Using r5 hash to sort names

Jun 18 14:43:53 FileServer02 logger: mount: /dev/md13: can't read superblock

Jun 18 14:43:53 FileServer02 emhttp: _shcmd: shcmd (10218): exit status: 32

Jun 18 14:43:53 FileServer02 emhttp: disk13 mount error: 32

Jun 18 14:43:53 FileServer02 emhttp: shcmd (10219): rmdir /mnt/disk13
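
 

For reference, the REISERFS lines above can be pulled straight from the live log with something like the following (assuming the stock /var/log/syslog location):

 

grep -i reiserfs /var/log/syslog | tail -20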

 

Link to comment

I ran reiserfsck --check /dev/sdr and here's the output:

 

reiserfs_open: the reiserfs superblock cannot be found on /dev/sdr.

Failed to open the filesystem.

 

If the partition table has not been changed, and the partition is

valid  and  it really  contains  a reiserfs  partition,  then the

superblock  is corrupted and you need to run this utility with

--rebuild-sb.

 

 

Link to comment

I ran reiserfsck --check /dev/sdr and here's the output:

 

reiserfs_open: the reiserfs superblock cannot be found on /dev/sdr.

Failed to open the filesystem.

The filesystem that you need to be working on is at /dev/md13.

NEVER operate on the raw /dev/sd? device, and only mess with the /dev/sd?1 location if you are OK with invalidating parity and know exactly what you are doing and why.

http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems
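
 

In outline, the check procedure from that wiki page looks like this; this is a sketch, so verify the details against the page itself for your version:

 

# 1. Stop the array from Main -> Array Operations
# 2. Restart it in Maintenance mode (so nothing is mounted)
# 3. From a console session:
reiserfsck --check /dev/md13
# 4. Only run a repair option such as --rebuild-sb or --rebuild-tree
#    if --check explicitly tells you that it is needed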

Link to comment

Here's the result of "reiserfsck --check /dev/md13":

 

###########

reiserfsck --check started at Fri Jun 19 00:57:26 2015

###########

Replaying journal: Done.

Reiserfs journal '/dev/md13' in blocks [18..8211]: 0 transactions replayed

Checking internal tree.. \block 602112004: The level of the node (28661) is not correct, (4) expected

the problem in the internal node occured (602112004), whole subtree is skipped

finished Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.

Bad nodes were found, Semantic pass skipped

1 found corruptions can be fixed only when running with --rebuild-tree

###########

reiserfsck finished at Fri Jun 19 01:18:49 2015

###########

 

Should I proceed and run "reiserfsck --rebuild-tree /dev/md13"?

 

Thanks,

Christopher

Link to comment

Should I proceed and run "reiserfsck --rebuild-tree /dev/md13"?

Yes.  It looks as if that is what is required.
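
 

Since a rebuild-tree on a disk this size can run for a day or more, it is worth capturing the output to a file as it runs. A minimal sketch (reiserfsck will still prompt you to type Yes before it starts):

 

reiserfsck --rebuild-tree /dev/md13 2>&1 | tee /boot/reiserfsck-disk13.log

 

Writing the log to /boot puts it on the flash drive, where it survives a reboot. If possible run it from the local console, so a dropped SSH session cannot kill the process.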
Link to comment

This is going to take a long time.  :-\ Is there any way to disable parity during the reiserfsck process? I can rebuild the parity drive afterwards. I don't think my parity is correct anyway, based on the "mdcmd status" results in my original post above. Thanks.

 

Current reiserfsck Status:

 

Replaying journal: Done.

Reiserfs journal '/dev/md13' in blocks [18..8211]: 0 transactions replayed

###########

reiserfsck --rebuild-tree started at Fri Jun 19 13:16:07 2015

###########

 

Pass 0:

####### Pass 0 #######

Loading on-disk bitmap .. ok, 678797287 blocks marked used

Skipping 30567 blocks (super block, journal, bitmaps) 678766720 blocks will be read

0%.                                                    left 586061335, 4043 /sec
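
 

A rough ETA from those numbers: 678,766,720 blocks to read at ~4,043 blocks/sec is about 167,900 seconds, i.e. roughly 46 hours, so on the order of two days (the rate typically changes as the pass progresses).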

Link to comment

The reiserfsck process just finished, and the end of the report is shown below. The web front end still shows the drive as unformatted. The wiki says I should restart the array after reiserfsck completes. Please confirm whether there's anything else I should do before restarting. Also, what exactly do "Deleted unreachable items 40581" and "Empty lost dirs removed 2" in the report mean?  Thanks!

 

Flushing..finished

        Objects without names 1329

        Empty lost dirs removed 2

        Dirs linked to /lost+found: 17

                Dirs without stat data found 8

        Files linked to /lost+found 1312

Pass 4 - finished done 501821, 11 /sec

        Deleted unreachable items 40581

Flushing..finished

Syncing..finished

 

###########

reiserfsck finished at Sat Jun 20 23:34:12 2015

Link to comment

The web front end still shows the drive as unformatted. The wiki says I should restart the array after reiserfsck completes. Please confirm whether there's anything else I should do before restarting.

The array should have been running in Maintenance mode while you were doing this?  Are you saying that stopping the array from running in Maintenance mode and restarting it in normal mode is still showing the disk as unformatted (it should not)?
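
 

On the lost+found question: as I understand reiserfsck's report, "Files linked to /lost+found 1312" means 1312 orphaned files were reattached under lost+found on that disk, named by object id rather than by their original names, while "Deleted unreachable items 40581" counts items that could not be reattached anywhere and were discarded. Once the disk mounts again, something like this (assuming it comes back at /mnt/disk13) will show what landed there:

 

ls -la /mnt/disk13/lost+found | head -20
du -sh /mnt/disk13/lost+found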
Link to comment

The array should have been running in Maintenance mode while you were doing this?  Are you saying that stopping the array from running in Maintenance mode and restarting it in normal mode is still showing the disk as unformatted (it should not)?

 

No. I have not restarted the array since reiserfsck completed. I'm asking whether there's anything else I should do before restarting. Also, I did not run reiserfsck in Maintenance mode. My mistake. What are the negative effects of this?

Link to comment

The array should have been running in Maintenance mode while you were doing this?  Are you saying that stopping the array from running in Maintenance mode and restarting it in normal mode is still showing the disk as unformatted (it should not)?

 

No. I have not restarted the array since reiserfsck completed. I'm asking whether there's anything else I should do before restarting.

The normal procedure would be to stop the array (which was expected to be in Maintenance mode) and restart it in normal mode.  That is probably what you need to do  (but see the next point).

Also, I did not run reiserfsck in Maintenance mode. My mistake. What are the negative effects of this?

I thought that reiserfsck would refuse to run against a mounted file system, so I'm not quite sure how you managed to run it against an 'md' type device if the array was not in Maintenance mode.  If you did not run it against an 'md' type device, then perhaps you had better tell us what actually happened before proceeding.
Link to comment

The normal procedure would be to stop the array (which was expected to be in Maintenance mode) and restart it in normal mode.  That is probably what you need to do  (but see the next point).

 

Perhaps I was already in Maintenance mode and didn't know it? Can Maintenance mode be enabled automatically by a red ball, a parity build, or some failure event? I don't see anything regarding Maintenance mode in the Dynamix web GUI. Is there a command I can run in the terminal to show whether it's in Maintenance mode or not?

 

I thought that reiserfsck would refuse to run against a mounted file system, so I'm not quite sure how you managed to run it against an 'md' type device if the array was not in Maintenance mode.  If you did not run it against an 'md' type device, then perhaps you had better tell us what actually happened before proceeding.

 

I did run reiserfsck against the md device, md13, as shown below. reiserfsck didn't refuse to run or mention anything about Maintenance mode. What's your recommendation moving forward?

 

Replaying journal: Done.

 

Reiserfs journal '/dev/md13' in blocks [18..8211]: 0 transactions replayed

###########

reiserfsck --rebuild-tree started at Fri Jun 19 13:16:07 2015

###########

 

Link to comment

The normal procedure would be to stop the array (which was expected to be in Maintenance mode) and restart it in normal mode.  That is probably what you need to do  (but see the next point).

 

Perhaps I was already in Maintenance mode and didn't know it? Can Maintenance mode be enabled automatically by a red ball, a parity build, or some failure event? I don't see anything regarding Maintenance mode in the Dynamix web GUI. Is there a command I can run in the terminal to show whether it's in Maintenance mode or not?

On Main -> Array Operations there is a checkbox, below the button for starting the array, to start it in Maintenance mode.  Perhaps you checked it and do not remember?

 

In a terminal session, use the 'df' command.  In Maintenance mode you will not see any of the array disks mounted; in normal mode you will see all the disks mounted.

Link to comment

On Main -> Array Operations there is a checkbox, below the button for starting the array, to start it in Maintenance mode.  Perhaps you checked it and do not remember?

 

I'm pretty sure I didn't check it but you never know.  :D

 

In a terminal session, use the 'df' command.  In Maintenance mode you will not see any of the array disks mounted; in normal mode you will see all the disks mounted.

 

Here's the df output; md13 is the only one not mounted. I read in another thread that the "unformatted" label in the web GUI should really read "unmounted", so the array is in normal mode with all disks mounted except md13. Perhaps reiserfsck didn't complain because the device it was operating on was already unmounted? I've attached a few web GUI screenshots. Should I stop the array and then restart the server?

 

root@FileServer02:~# df

Filesystem          1K-blocks      Used Available Use% Mounted on

tmpfs                  131072      600    130472  1% /var/log

/dev/sda1              1957600    633120  1324480  33% /boot

/dev/md1            2930177100 2891592192  38584908  99% /mnt/disk1

/dev/md2            2930177100 2918796832  11380268 100% /mnt/disk2

/dev/md3            2930177100 2912435808  17741292 100% /mnt/disk3

/dev/md4            2930177100 2792846216 137330884  96% /mnt/disk4

/dev/md5            2930177100 2859854476  70322624  98% /mnt/disk5

/dev/md6            2930177100 2906808036  23369064 100% /mnt/disk6

/dev/md7            2930177100 2922271496  7905604 100% /mnt/disk7

/dev/md8            2930177100 2898554888  31622212  99% /mnt/disk8

/dev/md9            2930177100 2930177100        0 100% /mnt/disk9

/dev/md10            2930177100 2898371724  31805376  99% /mnt/disk10

/dev/md11            2930177100 2923205360  6971740 100% /mnt/disk11

/dev/md12            2930177100 2930177100        0 100% /mnt/disk12

/dev/md14            3906899292 3906899292        0 100% /mnt/disk14

/dev/md15            2930177100 2624423100 305754000  90% /mnt/disk15

/dev/md16            3906899292 3524213932 382685360  91% /mnt/disk16

/dev/sdo1            234423872 189317232  45106640  81% /mnt/cache

shfs                45906100884 44840627552 1065473332  98% /mnt/user0

shfs                46140524756 45029944784 1110579972  98% /mnt/user

root@FileServer02:~#

 

Thanks,

Christopher

[Attached screenshots: Unraid_01.jpg, Unraid_02.jpg]

Link to comment

I went ahead and restarted the server, and now disk13 is no longer showing as unformatted but still has a red ball. The reiserfsck tool recovered about 2/3 of my files. I've copied the recovered files to a drive on my PC using TeraCopy.

 

I want to attempt to rebuild disk13 by putting my old parity drive back in. I know that my new parity is not correct. To do this I need to clear the disk13 red ball, since Unraid will not let me put my old parity drive back while disk13 has a red ball. The reason I didn't do a rebuild initially was that the events leading up to the disk13 failure were unclear in my mind, so I didn't trust my old parity drive.

 

The reason I want to do a rebuild from the old parity after the reiserfsck recovery: I want to compare md5 checksums of the reiserfsck-recovered files against the files rebuilt from the old parity. If the md5s match for all or most files, that gives me confidence in the rebuilt disk13 data.
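
 

A minimal sketch of that comparison, assuming the rebuilt disk is at /mnt/disk13 and the reiserfsck-recovered copy ends up somewhere reachable such as /mnt/recovered (both paths hypothetical, adjust to your layout):

 

cd /mnt/disk13 && find . -type f -exec md5sum {} \; > /boot/disk13-rebuilt.md5
cd /mnt/recovered && md5sum -c /boot/disk13-rebuilt.md5 | grep -v ': OK$'

 

The grep leaves only mismatched or missing files on screen.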

 

I found the information below, but I wanted to confirm with you before moving forward. Also, I have Unraid version 5.0.6. What's the best way to remove the red ball and put my old parity drive back?

 

http://lime-technology.com/wiki/index.php/Make_unRAID_Trust_the_Parity_Drive,_Avoid_Rebuilding_Parity_Unnecessarily

 

Thanks,

Christopher

Link to comment

Archived

This topic is now archived and is closed to further replies.
