Christopher Posted June 18, 2015

I started by running a parity check on the existing 4TB parity drive. Then I followed the steps described on the forum to upgrade my parity drive from 4TB to 5TB and left it building parity. After a day or so I checked the status: the build was complete, but disk13 had a red ball with 200K+ errors reported in the Dynamix web GUI. I tried accessing the drive and was able to list and open files. When I attempted to get the system log I got some memory error. It was getting late, so I decided to shut down the server and pick up in the morning.

I turned on the server this morning and disk13 is showing as unformatted. I'm not sure how to interpret the mdcmd results below, but I don't think it's good. Please let me know if the mdcmd status is OK and whether I can rebuild disk13 using the new 5TB parity drive. If not, what other options do I have to recover the data? I also still have the old 4TB parity drive.

Below is a link to the system log and SMART report. The SMART report is from a prior test; I didn't run a new SMART test after this happened, since I wasn't sure if it would make the situation worse.

Thanks,
Christopher

Unraid Version: 5.0.6
System log and SMART report: https://drive.google.com/file/d/0B00Diiihkv_qSmY5YXpHaFM4VkU/view?usp=sharing

/root/mdcmd status | egrep "mdResync|mdState|sbSync"
sbSynced=0
sbSyncErrs=0
mdState=STARTED
mdResync=0
mdResyncCorr=1
mdResyncPos=0
mdResyncDt=0
mdResyncDb=0

This looks like the error in the system log:

Jun 18 14:43:52 FileServer02 emhttp: shcmd (10217): mkdir /mnt/disk13
Jun 18 14:43:52 FileServer02 emhttp: shcmd (10218): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md13 /mnt/disk13 |& logger
Jun 18 14:43:52 FileServer02 kernel: REISERFS (device md13): found reiserfs format "3.6" with standard journal
Jun 18 14:43:52 FileServer02 kernel: REISERFS (device md13): using ordered data mode
Jun 18 14:43:52 FileServer02 kernel: reiserfs: using flush barriers
Jun 18 14:43:52 FileServer02 kernel: REISERFS (device md13): journal params: device md13, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Jun 18 14:43:52 FileServer02 kernel: REISERFS (device md13): checking transaction log (md13)
Jun 18 14:43:52 FileServer02 kernel: REISERFS (device md13): replayed 2 transactions in 0 seconds
Jun 18 14:43:53 FileServer02 kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 28661 does not match to the expected one 4
Jun 18 14:43:53 FileServer02 kernel: REISERFS error (device md13): vs-5150 search_by_key: invalid format found in block 602112004. Fsck?
Jun 18 14:43:53 FileServer02 kernel: REISERFS (device md13): Remounting filesystem read-only
Jun 18 14:43:53 FileServer02 kernel: REISERFS error (device md13): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1 2 0x0 SD]
Jun 18 14:43:53 FileServer02 kernel: REISERFS (device md13): Using r5 hash to sort names
Jun 18 14:43:53 FileServer02 logger: mount: /dev/md13: can't read superblock
Jun 18 14:43:53 FileServer02 emhttp: _shcmd: shcmd (10218): exit status: 32
Jun 18 14:43:53 FileServer02 emhttp: disk13 mount error: 32
Jun 18 14:43:53 FileServer02 emhttp: shcmd (10219): rmdir /mnt/disk13
Christopher Posted June 19, 2015

I ran reiserfsck --check /dev/sdr and here's the output:

reiserfs_open: the reiserfs superblock cannot be found on /dev/sdr.
Failed to open the filesystem.

If the partition table has not been changed, and the partition is valid and it really contains a reiserfs partition, then the superblock is corrupted and you need to run this utility with --rebuild-sb.
JonathanM Posted June 19, 2015

> I ran reiserfsck --check /dev/sdr and here's the output: reiserfs_open: the reiserfs superblock cannot be found on /dev/sdr. Failed to open the filesystem.

The filesystem that you need to be working on is at /dev/md13. NEVER operate on the raw /dev/sd? device, and only mess with the /dev/sd?1 location if you are OK with invalidating parity and know exactly what you are doing and why.

http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems
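For reference, a minimal sketch of the procedure that wiki page describes (my summary, assuming unRAID 5.x, where the array is started in Maintenance mode from Main -> Array Operations, and using this thread's disk13):

# With the array started in Maintenance mode, the /dev/mdN devices exist
# but no filesystems are mounted, so the check does not invalidate parity.
reiserfsck --check /dev/md13        # read-only check; reports what a repair would need

# Only run the repair options if --check explicitly asks for them:
# reiserfsck --rebuild-sb /dev/md13
# reiserfsck --rebuild-tree /dev/md13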
Christopher Posted June 19, 2015

I'm running the check command as "reiserfsck --check /dev/md13" and will report the results in the morning. Thanks.
Christopher Posted June 19, 2015

Here's the result of "reiserfsck --check /dev/md13":

########### reiserfsck --check started at Fri Jun 19 00:57:26 2015 ###########
Replaying journal: Done.
Reiserfs journal '/dev/md13' in blocks [18..8211]: 0 transactions replayed
Checking internal tree.. \block 602112004: The level of the node (28661) is not correct, (4) expected
the problem in the internal node occured (602112004), whole subtree is skipped
finished
Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Bad nodes were found, Semantic pass skipped
1 found corruptions can be fixed only when running with --rebuild-tree
########### reiserfsck finished at Fri Jun 19 01:18:49 2015 ###########

Should I proceed and run "reiserfsck --rebuild-tree /dev/md13"?

Thanks,
Christopher
itimpi Posted June 19, 2015

> Should I proceed and run "reiserfsck --rebuild-tree /dev/md13"?

Yes. It looks as if that is what is required.
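One precaution worth noting before --rebuild-tree (not raised in this thread, but the reiserfsck man page recommends it): back up the whole partition first, since an interrupted or failed rebuild can leave the filesystem in a worse state. A sketch, assuming a spare disk with at least the disk's full capacity free is mounted at /mnt/backup (a hypothetical path):

# Best-effort image of the md device; noerror,sync pads unreadable blocks
# so the copy keeps going past bad sectors.
dd if=/dev/md13 of=/mnt/backup/md13.img bs=1M conv=noerror,sync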
Christopher Posted June 19, 2015

This is going to take a long time. Is there any way to disable parity during the reiserfsck process? I can rebuild the parity drive afterwards. I don't think my parity is correct anyway, based on the "mdcmd status" results in my original post above. Thanks.

Current reiserfsck status:

Replaying journal: Done.
Reiserfs journal '/dev/md13' in blocks [18..8211]: 0 transactions replayed
########### reiserfsck --rebuild-tree started at Fri Jun 19 13:16:07 2015 ###########
Pass 0:
####### Pass 0 #######
Loading on-disk bitmap .. ok, 678797287 blocks marked used
Skipping 30567 blocks (super block, journal, bitmaps) 678766720 blocks will be read
0%. left 586061335, 4043 /sec
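Because --rebuild-tree on a 3TB disk can run for a day or more, one sensible precaution (an assumption on my part, not something unRAID requires) is to run it inside a detachable screen session with the output logged, so the repair survives a dropped SSH connection. A sketch, assuming screen is available on the system:

screen -S fsck                        # start a named, detachable session
reiserfsck --rebuild-tree /dev/md13 2>&1 | tee /boot/reiserfsck-md13.log
# Detach with Ctrl-A d; reattach later with:
screen -r fsck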
Christopher Posted June 21, 2015

The reiserfsck process just finished, and the end of the report is shown below. The web front end still shows the drive as unformatted. The wiki says I should restart the array after reiserfsck completes. Please confirm whether there's anything else I should do before a restart. Also, what exactly do "Deleted unreachable items 40581" and "Empty lost dirs removed 2" mean in the report? Thanks!

Flushing..finished
Objects without names 1329
Empty lost dirs removed 2
Dirs linked to /lost+found: 17
Dirs without stat data found 8
Files linked to /lost+found 1312
Pass 4 - finished done 501821, 11 /sec
Deleted unreachable items 40581
Flushing..finished
Syncing..finished
########### reiserfsck finished at Sat Jun 20 23:34:12 2015
itimpi Posted June 21, 2015

> The reiserfsck process just finished ... The web front end still shows the drive as unformatted.

The array should have been running in Maintenance mode while you were doing this? Are you saying that stopping the array from running in Maintenance mode and restarting it in normal mode is still showing the disk as unformatted (it should not)?
Christopher Posted June 21, 2015

> Are you saying that stopping the array from running in Maintenance mode and restarting it in normal mode is still showing the disk as unformatted (it should not)?

No, I have not restarted the array yet since reiserfsck completed. I'm asking if there's anything else I should do before a restart.

Also, I did not run reiserfsck in Maintenance mode. My mistake. What are the negative effects of this?
itimpi Posted June 21, 2015

> No, I have not restarted the array yet since reiserfsck completed. I'm asking if there's anything else I should do before a restart.

The normal procedure would be to stop the array (which was expected to be in Maintenance mode) and restart it in normal mode. That is probably what you need to do (but see the next point).

> Also, I did not run reiserfsck in Maintenance mode. My mistake. What are the negative effects of this?

I thought that reiserfsck would refuse to run against a mounted filesystem, so I'm not quite sure how you managed to run it against an 'md' type device if the array was not in Maintenance mode. If you did not run it against an 'md' type device, then perhaps you had better tell us what actually happened before proceeding.
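A quick way to settle that question is to check whether the filesystem is actually mounted before running any repair (a generic Linux check, nothing unRAID-specific):

# Prints the mount entry if disk13 is mounted; prints nothing otherwise.
grep ' /mnt/disk13 ' /proc/mounts && echo "mounted - stop the array first" || echo "not mounted - safe to fsck"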
Christopher Posted June 21, 2015

> The normal procedure would be to stop the array (which was expected to be in Maintenance mode) and restart it in normal mode.

Perhaps I was already in Maintenance mode and didn't know it? Can Maintenance mode be automatically enabled by a red ball, a parity build, or some failure event? I don't see anything regarding Maintenance mode in the Dynamix web GUI. Is there a command I can run in the terminal to show whether it's in Maintenance mode or not?

> I thought that reiserfsck would refuse to run against a mounted filesystem ...

I did run reiserfsck against md device md13, as shown below. reiserfsck didn't refuse to run or mention anything about Maintenance mode. What's your recommendation moving forward?

Replaying journal: Done.
Reiserfs journal '/dev/md13' in blocks [18..8211]: 0 transactions replayed
########### reiserfsck --rebuild-tree started at Fri Jun 19 13:16:07 2015 ###########
Pass 0:
####### Pass 0 #######
Loading on-disk bitmap .. ok, 678797287 blocks marked used
Skipping 30567 blocks (super block, journal, bitmaps) 678766720 blocks will be read
0%. left 586061335, 4043 /sec
itimpi Posted June 21, 2015

> Is there a command I can run in the terminal to show whether it's in Maintenance mode or not?

On Main -> Array Operations there is a checkbox, under the button for starting the array, to start in Maintenance mode. Perhaps you checked it and do not remember?

In a terminal session, use the 'df' command. If in Maintenance mode you will not see any disks mounted. If running in normal mode you will see that all the disks are mounted.
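A one-liner version of that test (assuming the standard /mnt/diskN mount points):

df | grep -c '/mnt/disk'    # prints 0 in Maintenance mode; one count per mounted data disk otherwise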
Christopher Posted June 21, 2015

> Perhaps you checked it and do not remember?

I'm pretty sure I didn't check it, but you never know.

> In a terminal session, use the 'df' command. If in Maintenance mode you will not see any disks mounted.

Here's the df output. md13 is the only one not mounted. I read in another thread that the "unformatted" label in the web GUI should really read "unmounted". So the array is in normal mode with all disks mounted except md13. Perhaps reiserfsck didn't complain because the device it was operating on was already unmounted? I've attached a few web GUI screenshots. Should I stop the array and then restart the server?

root@FileServer02:~# df
Filesystem      1K-blocks        Used   Available Use% Mounted on
tmpfs              131072         600      130472   1% /var/log
/dev/sda1         1957600      633120     1324480  33% /boot
/dev/md1       2930177100  2891592192    38584908  99% /mnt/disk1
/dev/md2       2930177100  2918796832    11380268 100% /mnt/disk2
/dev/md3       2930177100  2912435808    17741292 100% /mnt/disk3
/dev/md4       2930177100  2792846216   137330884  96% /mnt/disk4
/dev/md5       2930177100  2859854476    70322624  98% /mnt/disk5
/dev/md6       2930177100  2906808036    23369064 100% /mnt/disk6
/dev/md7       2930177100  2922271496     7905604 100% /mnt/disk7
/dev/md8       2930177100  2898554888    31622212  99% /mnt/disk8
/dev/md9       2930177100  2930177100           0 100% /mnt/disk9
/dev/md10      2930177100  2898371724    31805376  99% /mnt/disk10
/dev/md11      2930177100  2923205360     6971740 100% /mnt/disk11
/dev/md12      2930177100  2930177100           0 100% /mnt/disk12
/dev/md14      3906899292  3906899292           0 100% /mnt/disk14
/dev/md15      2930177100  2624423100   305754000  90% /mnt/disk15
/dev/md16      3906899292  3524213932   382685360  91% /mnt/disk16
/dev/sdo1       234423872   189317232    45106640  81% /mnt/cache
shfs          45906100884 44840627552  1065473332  98% /mnt/user0
shfs          46140524756 45029944784  1110579972  98% /mnt/user
root@FileServer02:~#

Thanks,
Christopher
Christopher Posted June 23, 2015

I went ahead and restarted the server, and disk13 is no longer showing as unformatted but still has a red ball. The reiserfsck tool recovered about 2/3 of my files, and I've copied the recovered files to a drive on my PC using TeraCopy.

I now want to attempt to rebuild disk13 by putting my old parity drive back in. I know that my new parity is not correct. To do this I need to clear the disk13 red ball; Unraid will not let me put my old parity drive back while there is a red ball. The reason I didn't do a rebuild initially was that the events leading up to the disk13 failure were unclear in my mind, so I didn't trust my old parity drive.

The reason I want to do a rebuild from the old parity after the reiserfsck recovery: I want to compare MD5 checksums of the reiserfsck-recovered files against the files on the parity-rebuilt disk13. If the MD5s match for all or most files, that gives me confidence in the rebuilt disk13 data.

I found the information below, but I wanted to confirm with you before moving forward. I have Unraid version 5.0.6. What's the best way to clear the red ball and put my old parity drive back?

http://lime-technology.com/wiki/index.php/Make_unRAID_Trust_the_Parity_Drive,_Avoid_Rebuilding_Parity_Unnecessarily

Thanks,
Christopher
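A minimal sketch of that MD5 comparison (the paths are placeholders: it assumes the recovered copy is reachable from the server at /mnt/recovered and the rebuilt disk mounts at /mnt/disk13):

# 1. Checksum the reiserfsck-recovered copy (do this before the rebuild).
cd /mnt/recovered && find . -type f -exec md5sum {} + > /boot/recovered.md5

# 2. After rebuilding disk13 from the old parity drive, verify against it.
cd /mnt/disk13 && md5sum -c /boot/recovered.md5 | grep -v ': OK$'
# No output means every recovered file matched its rebuilt counterpart;
# any line printed names a file that differs or is missing.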