crakhed Posted March 8, 2013

unRAID 5.0rc11, no plugins currently
mobo - Asus NAOS
cpu - amd am2 3.7ghz
mem - 2gb ddr2 800mhz
syslog attached (zipped due to size)

So I've followed probably every single thread having anything to do with superblock finding/repairing/rebuilding or MBR errors, and now I'm stuck...

Shortly (about a week) after upgrading from 5rc10 to 5rc11 and installing a few SimpleFeatures plugins (which I've now removed in trying to sort this out), while copying a file to one of my user shares, the server stopped responding in Explorer. Switching to my browser, I found the webui non-responsive as well, but showing errors on disk2, 3, and 5. I tried to telnet to the tower but got no response, so I hard-powered down. After a cigarette/bathroom break, I booted the server to find those same 3 disks unformatted (entire array reports SMART = pass).

So I came here. Two hours of reading later, I used suggestions from several threads: stopping/starting the server got 1 drive back, and Joe L's nifty partitioning script recovered a 2nd. Now this 3rd drive is confusing the hell out of me, and it seems to be a unique problem. The filesystem seems to be on sector 65...

Linux 3.4.26-unRAID.
root@Tower:~# dd if=/dev/sdf count=195 | od -c -A d | sed 30q
0000000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0000448  \0  \0 203  \0  \0  \0   @  \0  \0  \0   p 210 340 350  \0  \0
0000464  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0000496  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0   U 252
0000512  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0098816 016 021 034 035   4 016 235  \0  \f  \0 267 005 022  \0  \0  \0
0098832  \0  \0  \0  \0  \0      \0  \0  \0 004  \0  \0 342 250   8   q
0098848 204 003  \0  \0 036  \0  \0  \0  \0  \0  \0  \0  \0 020 314 003
0098864 312 003 002  \0   R   e   I   s   E   r   2   F   s  \0  \0  \0
0098880 003  \0  \0  \0 005  \0   9   : 002  \0  \0  \0 336 216  \0  \0
0098896 001  \0  \0  \0 031   Z 360 314 260 250   G   w 214 257 314 001
0098912 345 002 225   S  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0098928  \0  \0  \0  \0 253  \0 036  \0 024   m   G   O  \0   N 355  \0
0098944  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0099008  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 001  \0  \0  \0
0099024 370   8  \0  \0 371   8  \0  \0 373   8  \0  \0 374   8  \0  \0
0099040 005   9  \0  \0 006   9  \0  \0   H   9  \0  \0   I   9  \0  \0
0099056   i   9  \0  \0   j   9  \0  \0   n   9  \0  \0   o   9  \0  \0
0099072   r   9  \0  \0   s   9  \0  \0   x   9  \0  \0   y   9  \0  \0
0099088   B   >  \0  \0   C   >  \0  \0   4   @  \0  \0   5   @  \0  \0
0099104 272   A  \0  \0 273   A  \0  \0  \v   C  \0  \0  \f   C  \0  \0
0099120 021   C  \0  \0 022   C  \0  \0 030   C  \0  \0 031   C  \0  \0
0099136   L   D  \0  \0   M   D  \0  \0   k   D  \0  \0   l   D  \0  \0
0099152 267   K  \0  \0 270   K  \0  \0 275   K  \0  \0 276   K  \0  \0
0099168   D   O  \0  \0   E   O  \0  \0 336   Q  \0  \0 337   Q  \0  \0
0099184 342   Q  \0  \0 343   Q  \0  \0 346   Q  \0  \0 347   Q  \0  \0
195+0 records in
195+0 records out
99840 bytes (100 kB) copied, 0.0196374 s, 5.1 MB/s
root@Tower:~#

In this thread: http://lime-technology.com/forum/index.php?topic=26153.msg229132#msg229132, I tried checking if unraid would emulate the missing drive, but I got the same result as nmotion96 had, but my reiserfsck output afterwards wasn't the same as his.
Elsewhere, I saw a suggestion that unassigning->starting->stopping->reassigning->starting tricks unraid into thinking the drive is new and rebuilding data. Since I don't have access to a blank drive for cloning, and assuming my parity hadn't been modified since the drive went down, I tried that. No dice. Of course, thinking back now, if the emulated drive was corrupt and I rebuilt disk2 with that data, I may have made things worse...

reiserfsck /dev/sdf1 still returned no superblock found and suggested rebuild-sb, so I hunted down answers for the prompts and ran it. Then it suggested rebuild-tree. After seeing others having success at this stage, I ran that also. Started the server: still unformatted, even though I saw all my files being discovered during the rebuild.

BUT the output of Joe L's script to check the drive is weird:

########################################################################
Model Family:     Hitachi Deskstar 7K2000
Device Model:     Hitachi HDS722020ALA330
Serial Number:    JK1130YAHAMDWT
Firmware Version: JKAOA28A
User Capacity:    2,000,398,934,016 bytes

Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
1 heads, 63 sectors/track, 62016336 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1              64  3907029167  1953514552   83  Linux
Partition 1 does not end on cylinder boundary.
########################################################################
============================================================================
==
== DISK /dev/sdf IS partitioned for unRAID properly
== expected start = 64, actual start = 64
== expected size = 3907029104, actual size = 3907029104
==
============================================================================
root@Tower:/boot#

It says the partition is fine, and that the filesystem starts on sector 64.
Is my math right that the ReiserFS signature found at byte offset 0098864 in the dump puts the filesystem at sector 65?

Following instructions in this thread: http://lime-technology.com/forum/index.php?topic=5072.0, I get this:

root@Tower:/boot# sfdisk -g /dev/sdf
/dev/sdf: 243201 cylinders, 255 heads, 63 sectors/track
root@Tower:/boot# blockdev --getsz /dev/sda
1953525168
root@Tower:/boot# fdisk -l -u /dev/sdf

Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
1 heads, 63 sectors/track, 62016336 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1              64  3907029167  1953514552   83  Linux
Partition 1 does not end on cylinder boundary.
root@Tower:/boot# od -x -A d /dev/sdf | head
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000448 0000 0083 0000 0040 0000 8870 e8e0 0000
0000464 0000 0000 0000 0000 0000 0000 0000 0000
*
0000496 0000 0000 0000 0000 0000 0000 0000 aa55
0000512 0000 0000 0000 0000 0000 0000 0000 0000
*
0098816 110e 1d1c 0e34 009d 000c 05b7 0012 0000
0098832 0000 0000 2000 0000 0400 0000 a8e2 7138
root@Tower:/boot#

If it matters, 4 of these 6 drives were in a Windows machine until my friend hipped me to unRAID about a year ago. I bought a couple of new drives, set up the server, then moved each drive over to the tower as I emptied it from the Windows machine, and it's been pretty sweet until now.

So that's where I'm at right now. Any ideas? Thanks in advance.

syslog-2013-03-08.zip
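For anyone checking the MBR by hand, the partition entry in the od -x dump above can be decoded with a little shell arithmetic. This is only a minimal sketch: it assumes a little-endian host (so od -x prints each 16-bit word with its two bytes swapped) and the standard MBR layout, where the first partition entry keeps its starting sector in bytes 454-457 and its sector count in bytes 458-461. The helper name words_to_u32 is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: combine two 16-bit words as printed by `od -x`
# (low word first, then high word) into one unsigned 32-bit value.
words_to_u32() {
    local low=$1 high=$2
    echo $(( (16#$high << 16) | 16#$low ))
}

# Words copied from the "0000448" line of the od -x dump:
#   0000 0083 0000 0040 0000 8870 e8e0 0000
# bytes 454-457 are the words "0040 0000" (starting sector)
# bytes 458-461 are the words "8870 e8e0" (number of sectors)
start=$(words_to_u32 0040 0000)
count=$(words_to_u32 8870 e8e0)

echo "start sector : $start"
echo "sector count : $count"
echo "last sector  : $(( start + count - 1 ))"
```

The three values come out as 64, 3907029104 and 3907029167, which agree with the fdisk listing and with the "expected start" and "expected size" figures from Joe L's script, so the partition table itself looks consistent.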
Joe L. Posted March 8, 2013

It looks to me like your analysis is correct. The partition is starting on sector 64, but the file system seems to be at sector 65.

crakhed wrote: "reiserfsck /dev/sdf1 still returned no superblock found and suggests rebuild-sb so I hunted down answers for the prompts and ran it,"

What EXACT set of answers did you supply to the prompts when you ran --rebuild-sb? This is the only thing I can think of that would have moved the superblock of the reiser file system.

Joe L.
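The arithmetic behind the "sector 65" conclusion can be sketched as follows. It assumes the usual ReiserFS 3.6 on-disk layout (superblock 64 KiB into the filesystem, magic string 52 bytes into the superblock) and the 512-byte sectors that fdisk reports; the function name fs_start_sector is made up for illustration.

```shell
#!/bin/bash
SECTOR_SIZE=512         # logical sector size reported by fdisk
SB_OFFSET_BYTES=65536   # assumed: ReiserFS 3.6 superblock sits 64 KiB into the filesystem
MAGIC_OFFSET=52         # assumed: "ReIsEr2Fs" begins 52 bytes into the superblock

# Hypothetical helper: given the byte offset on the raw disk where the
# magic string begins, print the sector where the filesystem must start.
fs_start_sector() {
    local magic_pos=$1
    local sb_start=$(( magic_pos - MAGIC_OFFSET ))
    echo $(( (sb_start - SB_OFFSET_BYTES) / SECTOR_SIZE ))
}

# In the od -c dump, the line at offset 0098864 has "R" as its fifth byte,
# so the magic begins at 98864 + 4. Leading zeros are dropped on purpose:
# bash would otherwise try to read the number as octal.
fs_start_sector $(( 98864 + 4 ))
```

This prints 65: the superblock begins at byte 98816 (sector 193), and backing off the 128 sectors that precede a ReiserFS superblock leaves sector 65, one sector past the partition start of 64.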
crakhed Posted March 8, 2013

I got the answers from this thread, http://lime-technology.com/forum/index.php?topic=1483, but I answered 1 at the first prompt for version. I figured that since I had originally set up the server with a 5.0beta (14, I think), it would be the most recent version of the file system. I'm pretty sure I answered the rest just as bjp999 suggested.

Maybe I shouldn't have been so hasty in trying to fix it myself? I'm bad at asking for help, lol.
Joe L. Posted March 8, 2013

What "device" did you run the repair on? /dev/????

Joe L.
crakhed Posted March 8, 2013

I MAY have run rebuild-sb at disk level, as in sdf, not sdf1. Would that have made a difference? It's been days now and I honestly can't remember. I had to run rebuild-sb on a disk a few months back, but I can't remember if it was the same drive. I didn't have any issues that time, though.
Joe L. Posted March 8, 2013

It should have been run on the /dev/mdX device. (That way parity is kept in sync; otherwise, you've clobbered valid parity.)

All I can think of at this time is to run --rebuild-sb once more, but this time on /dev/mdX (disk1 = /dev/md1, disk2 = /dev/md2, etc...)
crakhed Posted March 8, 2013

I don't mind resyncing parity as long as I can get the data drive fixed. Before I start, can you ascertain from the info I gave already whether or not the MBR is correct? Should I use your partition script to set sector 63, or leave it as 64?

And in using 'md2' as the device, I don't need to specify the partition with the trailing '1', right?

Also, what specific answers do you suggest for my scenario, in case I picked the wrong ones before?
Joe L. Posted March 8, 2013

crakhed wrote: "I don't mind resyncing parity as long as I can get the data drive fixed. Before I start, can you ascertain from the info I gave already whether or not the mbr is correct? Should I use your partition script to set sector 63? or leave it as 64?"

Leave it where it is.

crakhed wrote: "And in using 'md2' as the device, I don't need to specify the partition with the trailing '1', right?"

No, the "md" device is already connected to the correct partition. For disk9 you would use /dev/md9, for disk17 you would use /dev/md17.

crakhed wrote: "Also, what specific answers do you suggest for my scenario in case I picked the wrong ones before?"

1. I would choose the 3.6.X for the file system version.
2. Block size is 4096 (default)
3. "No journal device was specified. (If journal is not available, re-run with --no-journal-available option specified). Is journal default?" (Answer Y)
4. "Do you use resizer?" (Answer N)
5. It tells you that a new uuid has been generated.
6. "rebuild-sb: You either have a corrupted journal or have just changed the start of the partition with some partition table editor. If you are sure that the start of the partition is ok, rebuild the journal header. Do you want to rebuild the journal header?" (Answer Y)
crakhed Posted March 8, 2013

Okay, thanks, I'll fire it up and post back later after it finishes.
crakhed Posted March 8, 2013

I ran --rebuild-sb, then --check as it suggested, and got the bad root block 0 error... run --rebuild-tree?

Tower login: root
Linux 3.4.26-unRAID.
root@Tower:~# reiserfsck --rebuild-sb /dev/md2
reiserfsck 3.6.21 (2009 www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at www.namesys.com/support.html. **
*************************************************************

Will check superblock and rebuild it if needed
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
Failed to open the device '/dev/md2': No such file or directory

root@Tower:~# reiserfsck --rebuild-sb /dev/md2
reiserfsck 3.6.21 (2009 www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at www.namesys.com/support.html. **
*************************************************************

Will check superblock and rebuild it if needed
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

reiserfs_open: the reiserfs superblock cannot be found on /dev/md2.

what the version of ReiserFS do you use[1-4]
        (1)   3.6.x
        (2) >=3.5.9 (introduced in the middle of 1999) (if you use linux 2.2, choose this one)
        (3) < 3.5.9 converted to new format (don't choose if unsure)
        (4) < 3.5.9 (this is very old format, don't choose if unsure)
        (X)   exit
1

Enter block size [4096]:

No journal device was specified. (If journal is not available, re-run with --no-journal-available option specified).
Is journal default? (y/n)[y]: y

Did you use resizer(y/n)[n]: n
rebuild-sb: no uuid found, a new uuid was generated (c82f131e-5f7b-4555-99f7-ca9a086801bc)

rebuild-sb: You either have a corrupted journal or have just changed
the start of the partition with some partition table editor. If you are
sure that the start of the partition is ok, rebuild the journal header.
Do you want to rebuild the journal header? (y/n)[n]: y
Reiserfs super block in block 16 on 0x902 of format 3.6 with standard journal
Count of blocks on the device: 488378624
Number of bitmaps: 14905
Blocksize: 4096
Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 0
Root block: 0
Filesystem is NOT clean
Tree height: 0
Hash function used to sort names: not set
Objectid map size 0, max 972
Journal parameters:
        Device [0x0]
        Magic [0x0]
        Size 8193 blocks (including 1 for journal header) (first block 18)
        Max transaction length 1024 blocks
        Max batch size 900 blocks
        Max commit age 30
Blocks reserved by journal: 0
Fs state field: 0x1: some corruptions exist.
sb_version: 2
inode generation number: 0
UUID: c82f131e-5f7b-4555-99f7-ca9a086801bc
LABEL:
Set flags in SB:
Mount count: 1
Maximum mount count: 30
Last fsck run: Fri Mar 8 11:26:55 2013
Check interval in days: 180
Is this ok ? (y/n)[n]: y
The fs may still be unconsistent. Run reiserfsck --check.

root@Tower:~# reiserfsck --check /dev/md2
reiserfsck 3.6.21 (2009 www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/md2
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Fri Mar 8 11:28:54 2013
###########
Replaying journal: No transactions found
Checking internal tree..

Bad root block 0. (--rebuild-tree did not complete)

Aborted
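As a rough sanity check on the rebuilt superblock, the figures reiserfsck printed can be cross-checked against each other. The sketch below assumes the usual ReiserFS bitmap layout, in which every bitmap block tracks blocksize * 8 blocks (one bit per block); the variable names are just for illustration.

```shell
#!/bin/bash
# Figures copied from the rebuilt superblock printed by reiserfsck.
blocks=488378624
blocksize=4096

# Assumption: one bitmap block covers blocksize * 8 blocks (one bit each).
blocks_per_bitmap=$(( blocksize * 8 ))
bitmaps=$(( (blocks + blocks_per_bitmap - 1) / blocks_per_bitmap ))   # round up

echo "bitmaps expected : $bitmaps"
echo "filesystem bytes : $(( blocks * blocksize ))"
echo "superblock offset: $(( 16 * blocksize )) bytes"
```

The bitmap count works out to 14905, matching "Number of bitmaps: 14905", and the filesystem size of 2000398843904 bytes fits just inside the drive's 2,000,398,934,016-byte capacity. The superblock in block 16 sits 65536 bytes into the md device. So the geometry of the new superblock is plausible; what is missing is the tree (root block 0), which is what --check complains about.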
itimpi Posted March 8, 2013

That looks correct. It is quite normal to have to run with --rebuild-tree after having run with --rebuild-sb.
Joe L. Posted March 8, 2013

Exactly as he said. Run --rebuild-tree next.
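To keep the order of operations straight, the sequence used in this thread can be written down as a dry run. The sketch below only prints the commands and runs nothing, because --rebuild-sb and --rebuild-tree modify the disk. The helper names md_device and print_repair_steps are made up for illustration, and the sequence reflects this particular case rather than a general recipe.

```shell
#!/bin/bash
# Hypothetical helper: map an unRAID disk number to its md device,
# following the disk1 = /dev/md1, disk2 = /dev/md2 convention.
md_device() {
    echo "/dev/md$1"
}

# Dry run only: print the commands in the order they were used here.
print_repair_steps() {
    local dev
    dev=$(md_device "$1")
    echo "reiserfsck --rebuild-sb $dev"
    echo "reiserfsck --check $dev"
    echo "reiserfsck --rebuild-tree $dev"
}

print_repair_steps 2
```

Working on the md device rather than /dev/sdf or /dev/sdf1 is the point made earlier in the thread: writes through /dev/mdX keep parity in sync. The device node has to exist first, otherwise reiserfsck fails with "No such file or directory" as in the first attempt above.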
crakhed Posted March 8, 2013

Running now, about 10-12 hours to go, I'm guessing. I work 11p-11a tonight, so it'll be tomorrow afternoon before I check back. Thanks again, guys.
crakhed Posted March 9, 2013

Well, I just got up to get ready for work and it finished. Stopped/started the server and it mounts properly now. However, when I browse to \\TOWER\disk2 the only directory is lost+found, and I get a permissions error trying to open it. It has a size of 0 (probably due to the permission error), even though the webui shows disk2 as 90% full.

Should I reboot? Fix permissions? I did notice a lot of what looked like permission corrections/adjustments during the --rebuild-tree process.
itimpi Posted March 9, 2013

You need to run the newperms command on the lost+found folder before you will be able to access it via the network. The reiserfsck command will almost certainly have created it and all the files it contains as owned by 'root'.
crakhed Posted March 9, 2013

It's like I don't have permission to the disk at all. None of the user shares appear on disk2, ONLY the lost+found, which I can't access. The rest of the server is fully functional at disk level, but each top-level share folder is still missing the files that are specifically on disk2.

So do I run newperms /mnt/disk2 via telnet on disk2, or do I run the script on the whole array through the webui button?
itimpi Posted March 9, 2013

It is much quicker to run it via a telnet session on /mnt/disk2, and that is the approach I would recommend. Running it via the GUI will achieve the same effect in the end, but it takes much longer because it is run against all disks.
crakhed Posted March 10, 2013

Well, the drive is accessible now, although EVERYTHING on it was relegated to the lost+found folder. Some of it is recoverable, but most of this drive was a big chunk of my music collection, and it's been almost completely thrashed.

Also, I'm seeing 3 undeletable files that are impossible... 1 file showing a size of 300 petabytes, and 2 showing 900 petabytes, each of which alone is hundreds of thousands of times the size of the drive, lol. Should I run a parity check with it like that? Or should I salvage the stuff I can and reset/preclear/reassign this drive as new? Maybe Tom has unlocked the ultimate file compression?

I'll just proceed with sorting through the other 18,000 unidentified files/folders. At least a few 2nd-level directories were preserved, but not many. I've looked around here a bit; are there any good tools for at least identifying some of this stuff?

Anyway, I guess this can be tagged as solved. Thanks a lot for the help. I definitely learned my lesson about trying to fix it myself; next time I'll be quicker at asking for help.
crakhed Posted March 11, 2013

Just wanted to update: I went on a quest. I dug up a cool program called TrID (http://mark0.net/soft-trid-e.html) for identifying and renaming (adding a coherent extension to) all the loose files. Copy-pasta'd some automated script commands and am happy to report that it is chugging along nicely, at least identifying mp3s. And it seems that the id3 tags are preserved, so thanks to a super customized version of Mediamonkey and an archive index to check against, I'll be able to reorganize them properly with a few scripts.

Thanks a lot again, this was almost worth the learning experience.
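For readers facing a similar pile of nameless files in lost+found, the idea behind this kind of identification can be sketched in a few lines of shell. This is not TrID and does not use its definitions; it is a much cruder stand-in that looks only at the first four bytes of a file. The helper name guess_ext and the short signature list are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: guess an audio extension from a file's first bytes.
guess_ext() {
    local sig
    # First 4 bytes as a lowercase hex string, e.g. "49443303"
    sig=$(head -c 4 "$1" | od -A n -t x1 | tr -d ' \n')
    case "$sig" in
        494433*)            echo mp3 ;;      # "ID3": MP3 carrying an ID3v2 tag
        fffb*|fffa*|fff3*)  echo mp3 ;;      # common MPEG audio frame sync bytes
        664c6143)           echo flac ;;     # "fLaC"
        4f676753)           echo ogg ;;      # "OggS"
        *)                  echo unknown ;;
    esac
}

# Demo on a throwaway file that starts like an ID3v2-tagged MP3.
sample=$(mktemp)
printf 'ID3\003\000\000' > "$sample"
ext=$(guess_ext "$sample")
echo "would rename: $sample -> $sample.$ext"
rm -f "$sample"
```

Printing the intended rename instead of performing it makes it possible to review the guesses first. A dedicated tool checks far more than four bytes, so anything reported as unknown here may still be identifiable.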