October 11, 2014
I have a problem with one of the drives in my array. I am running Version 5.0 with 4 data disks, 1 parity disk and a cache drive.

A few days ago I got an "access denied" error when trying to write to a user folder. I went to my server and there were a number of messages on the screen about a ReiserFS error. After running a search I came across http://lime-technology.com/wiki/index.php/Check_Disk_Filesystems and ran:

    reiserfsck --check /dev/md2

(md2 was reporting the error.) The result recommended running a --rebuild-tree on md2. That process aborted with the message 'killed'.

Now the 2nd drive is appearing in the array as unformatted. The files on that drive are not being presented as they usually are when one drive is unavailable. I tried a rebuild of the drive but that didn't work.

Any advice on how I can access these files, or are they gone? Thanks
October 11, 2014
The drive will continue to show as unformatted until reiserfsck has run to completion. More detail on what was reported when reiserfsck failed might help to work out why it failed.

You can probably also get data off the drive by unplugging it from the array, attaching it to a PC, and running the Linux Reader program.
October 12, 2014 (Author)
I'll try accessing the drive from my Windows desktop. I don't have a copy of the error message from reiserfsck. If I try to run reiserfsck on the disk now, it aborts with the message 'killed'.

The array is still reported as 12TB (4 x 3) but the drive lists as unformatted, and Unraid isn't presenting the 'missing' data using the parity check, which makes me think the data is gone :'(
October 12, 2014
> The array is still reported as 12TB (4 x 3) but the drive lists as unformatted and Unraid isn't presenting the 'missing' data using the parity check which makes me think the data is gone :'(

I think you misunderstand how parity works. All it does is reconstruct any sector that cannot be read from the physical disk. It does not fix any file system corruption. If the disk is not physically damaged then the data is still there and should still be retrievable, so going the PC route should work.

Another possibility is to run reiserfsck with the physical drive removed, so that it is trying to fix the 'logical' (emulated) drive. If reiserfsck was aborting because of an error on the physical drive, it might now be able to complete, repair the 'logical' drive, and make the data accessible on the array again.
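To illustrate the point about parity above, here is a minimal sketch assuming a simplified single-parity model in which the parity disk holds the byte-wise XOR of the data disks. The block contents and the names `xor_blocks` and `emulate_missing` are made up for the illustration and are not part of Unraid. It shows that an emulated disk comes back byte for byte as it was last written, corrupted structures included.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR several equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def emulate_missing(parity, surviving):
    """Reconstruct the one missing block from parity plus the surviving blocks."""
    return xor_blocks([parity, *surviving])

# Hypothetical 12-byte "disks"; disk2 stands in for a disk whose
# filesystem structures were already corrupted when parity was updated.
disk1 = b"good data 1 "
disk2 = b"BAD ROOT BLK"
disk3 = b"good data 3 "

parity = xor_blocks([disk1, disk2, disk3])

# Pull disk2 and emulate it from parity and the other disks.
emulated_disk2 = emulate_missing(parity, [disk1, disk3])
print(emulated_disk2 == disk2)  # the corruption is reproduced exactly
```

Under this model parity has no notion of files or filesystems, which is why the emulated disk still needs reiserfsck before it can be mounted.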
October 12, 2014 (Author)
Hi itimpi, thanks for your response. I might be misunderstanding parity. Last time I physically removed a drive from an array it still appeared in the array as a 'logical' drive. Now, when I remove it and start the array, that doesn't happen. Disk 2 is just listed as uninstalled and unformatted.

I put the drive in my Windows PC with Linux Reader running. It recognized it as ReiserFS but couldn't open it ('Unable to open drive').
October 12, 2014
> thanks for your response. I might be misunderstanding parity. Last time I physically removed a drive from an array it still appeared in the array as a 'logical' drive. Now, when I remove it and start the array that doesn't happen. Disk 2 is just listed as uninstalled and unformatted.

When you remove a drive physically, it is 'emulated' using the other drives. In this case the emulated drive has file system corruption, which is why it cannot be mounted and still shows as unformatted. Running reiserfsck against the logical drive is the way to fix the file system corruption and make the data visible again. If reiserfsck fails, knowing the exact details of the error message can suggest a way forward.

> I put the drive in my windows PC with Linux Reader running. It could recognize it as a reiserfs but couldn't open it ('Unable to open drive').

Hmm, it is possible that the disk has really failed. A useful program for getting a SMART report is disk checkup. You could also try running the manufacturer's diagnostics against the drive to see what is reported. The diagnostics will include an option for non-destructive testing of the drive to find out whether all sectors on it are physically accessible.
October 12, 2014 (Author)
> When you remove a drive physically then it is 'emulated' using the other drives. In this case the emulated drive has file system corruption which is why it cannot be mounted and still shows as unformatted. Running reiserfsck against the logical drive is the way to fix the file system corruption and make the data visible again. If reiserfsck fails then knowing the exact details of the error message can suggest a way forward.

Unfortunately I can't do that because unRaid isn't presenting the emulated disk. I'm not sure if the parity for the error disk is still being maintained either. I'm going to try putting another 'new' disk in the array and try to get it built as disk 2 to see if that works...

> Hmm - it is possible that the disk has really failed. A useful program for getting a SMART report is disk checkup. You could also try running the manufacturers diagnostics against the drive to see what is reported.

Thanks, I'll give that a try.
October 12, 2014 (Author)
Tried to run a reiserfsck check on md2 with no luck... Output:

    root@Tower:/# reiserfsck --check /dev/md2
    reiserfsck 3.6.21 (2009 www.namesys.com)

    *************************************************************
    ** If you are using the latest reiserfsprogs and it fails **
    ** please email bug reports to [email protected], **
    ** providing as much information as possible -- your **
    ** hardware, kernel, patches, settings, all reiserfsck **
    ** messages (including version), the reiserfsck logfile, **
    ** check the syslog file for any related information. **
    ** If you would like advice on using this program, support **
    ** is available for $25 at www.namesys.com/support.html. **
    *************************************************************

    Will read-only check consistency of the filesystem on /dev/md2
    Will put log info to 'stdout'

    Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
    ###########
    reiserfsck --check started at Sun Oct 12 19:51:27 2014
    ###########
    Replaying journal: Done.
    Reiserfs journal '/dev/md2' in blocks [18..8211]: 0 transactions replayed
    Checking internal tree..

    Bad root block 0. (--rebuild-tree did not complete)

    Aborted (core dumped)
October 12, 2014
> Unfortunately I can't do that because unRaid isn't presenting the emulated disk. I'm not sure if the parity for the error disk is still being maintained either.

I thought you said disk2 was showing as uninstalled and unformatted? That means the disk IS being emulated.

> I'm going to try putting another 'new' disk in the array and try to get it built as disk 2 to see if that works...

You can only do a rebuild if the disk IS being emulated. Even if the rebuild completes successfully, the disk will still show as unformatted and will still require reiserfsck to fix the problem. A rebuild does not fix file system corruption; it exactly mirrors the emulated disk. Therefore if the disk shows as 'unformatted' before the rebuild, it will still show as unformatted afterwards.
October 12, 2014
> Tried to run a reiserfsck check in md2 with no luck... Output:
> root@Tower:/# reiserfsck --check /dev/md2
> [...]
> Bad root block 0. (--rebuild-tree did not complete)
> Aborted (core dumped)

Once you have attempted a reiserfsck with the --rebuild-tree option, the --check option will not work until a --rebuild-tree has completed. You should retry it with the --rebuild-tree option.
October 12, 2014 (Author)
Thanks. Tried that again. Same result...

    root@Tower:/# reiserfsck --rebuild-tree /dev/md2
    reiserfsck 3.6.21 (2009 www.namesys.com)

    *************************************************************
    ** Do not run the program with --rebuild-tree unless **
    ** something is broken and MAKE A BACKUP before using it. **
    ** If you have bad sectors on a drive it is usually a bad **
    ** idea to continue using it. Then you probably should get **
    ** a working hard drive, copy the file system from the bad **
    ** drive to the good one -- dd_rescue is a good tool for **
    ** that -- and only then run this program. **
    ** If you are using the latest reiserfsprogs and it fails **
    ** please email bug reports to [email protected], **
    ** providing as much information as possible -- your **
    ** hardware, kernel, patches, settings, all reiserfsck **
    ** messages (including version), the reiserfsck logfile, **
    ** check the syslog file for any related information. **
    ** If you would like advice on using this program, support **
    ** is available for $25 at www.namesys.com/support.html. **
    *************************************************************

    Will rebuild the filesystem (/dev/md2) tree
    Will put log info to 'stdout'

    Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
    Replaying journal: Done.
    Reiserfs journal '/dev/md2' in blocks [18..8211]: 0 transactions replayed
    ###########
    reiserfsck --rebuild-tree started at Sun Oct 12 20:22:45 2014
    ###########

    Pass 0:
    ####### Pass 0 #######
    Loading on-disk bitmap .. ok, 558849451 blocks marked used
    Skipping 30567 blocks (super block, journal, bitmaps) 558818884 blocks will be read
    Killed
October 12, 2014
I wonder what is killing it? How much RAM do you have? Do you have many plugins? If so, it might be worth trying to run in Safe Mode so that no plugins are running.

Another thought: are you running this via a console or a telnet session?
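One way to investigate the question of what is killing the process: a bare "Killed" is what the shell typically prints when a process receives SIGKILL, and the Linux out-of-memory killer is a common sender. The sketch below scans kernel log text for the killer's report lines. The helper name `find_oom_kills` and the sample log lines are assumptions for illustration, not output from this server, and the exact kernel wording varies between kernel versions.

```python
import re

# Matches kernel lines such as "Killed process 4321 (reiserfsck) ...".
# The wording is an assumption; it differs between kernel versions.
OOM_PATTERN = re.compile(r"Killed process (\d+) \(([^)]+)\)")

def find_oom_kills(log_text):
    """Return (pid, process_name) pairs for OOM-killer victims in log text."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_PATTERN.finditer(log_text)]

# Illustrative log lines only; not taken from the server in this thread.
SAMPLE_LOG = """\
Oct 12 20:25:01 Tower kernel: Out of memory: Kill process 4321 (reiserfsck) score 901 or sacrifice child
Oct 12 20:25:01 Tower kernel: Killed process 4321 (reiserfsck) total-vm:1900000kB, anon-rss:1800000kB, file-rss:0kB
"""

print(find_oom_kills(SAMPLE_LOG))
```

On a live system the same function could be fed the output of `dmesg` or the contents of the syslog file. If reiserfsck shows up in the result, lack of memory is the likely cause.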
October 12, 2014 (Author)
I was running via Telnet. I hooked up a monitor and ran the same command (reiserfsck --rebuild-tree /dev/md2) from the console. The output was the same as via Telnet.

I have 2GB on the box. I am running in maintenance mode; I'm not sure if that is the same as safe mode. I'll do a search tonight after work.
October 14, 2014 (Author)
Quick update: As per itimpi's hypothesis, the problem appears to be a lack of RAM. Instead of trying to work out how to free up system memory I just grabbed a 4GB stick from my desktop (it runs 8GB normally) and put it in the server. The reiserfsck --rebuild-tree is running now. Total memory usage has jumped above 2GB... Fingers crossed it makes it to the end and I can get a good recovery.

Thanks to itimpi for all the help.
October 14, 2014
I will be interested to know if RAM was the issue. I have always had plenty of RAM in my server so would not see such an issue. What size disk are we talking about? I am not sure whether RAM usage is determined by disk size, by the number of files on the disk, or by some combination.

As far as I know reiserfsck does not have any options for controlling RAM usage. I think the equivalent for XFS (xfs_repair) does.
October 14, 2014 (Author)
> I will be interested to know if RAM was the issue? I have always had plenty of RAM in my server so would not see such an issue. What size disk are we talking about? I am not sure if RAM is determined by disk size, by the number of files on the disk; or some combination.

I'm pretty sure RAM was the reason for the 'killed' abort message earlier. The process has never survived this long before. It's a 3TB drive. I don't have an exact count, but there would be a pretty large number of files.

The system is showing RAM usage as Cached 2.43GB, Used 538MB and Free 917MB. Prior to reiserfsck, cached+used were just below 2GB (the prior memory limit). Once reiserfsck was started, cached+used jumped above 2GB within a few minutes, which is how long the process would run before the 'killed' abort. The RAM increase from 2GB to 4GB is the only change I have made between attempts.

About another 7 hours to go at the current run-rate...
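The memory figures quoted above can also be read programmatically before starting a long repair. This is a minimal sketch assuming the standard `/proc/meminfo` text format; the helper names `parse_meminfo` and `reclaimable_mb` are hypothetical, and the sample values are illustrative, only loosely modelled on the numbers in the post. Free plus buffers plus cache is a rough upper bound on what a new process can claim, since not all cached memory can be released.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values

def reclaimable_mb(meminfo):
    """Rough upper bound: free memory plus buffers and page cache, in MB."""
    kb = meminfo.get("MemFree", 0) + meminfo.get("Buffers", 0) + meminfo.get("Cached", 0)
    return kb // 1024

# Illustrative values only; on a live server use open("/proc/meminfo").read().
SAMPLE = """\
MemTotal:        4045000 kB
MemFree:          938000 kB
Buffers:           20000 kB
Cached:          2548000 kB
"""

info = parse_meminfo(SAMPLE)
print(reclaimable_mb(info), "MB potentially available")
```

How much memory reiserfsck needs for a given disk is not established in this thread, so a figure like this is only useful for comparing one attempt against another.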
October 15, 2014 (Author)
OK, the results are in and it looks like good news. The reiserfsck completed. There are a few files in lost+found but not much. There was 1 unreachable item. Chances are I will never know what it was...

Big thanks to itimpi for all the help. Unraid was originally recommended to me in part because of the strong community support and I've just been a big beneficiary of that!