May 2, 2013

Please read this thread: http://lime-technology.com/forum/index.php?topic=27247.0

The situation has now gone from bad to worse. I had what appeared to be a failing 2TB WD drive. As it only had a few GB on it, I copied the data off the drive. I then made a mistake by taking the drive out of the array (following the instructions in the wiki) and starting a parity check, as there was now one less drive. I left it running whilst I went to work.

Halfway through the day I got an email from the tower warning of an overheating HDD (47 degrees C), which seemed to heat up the adjacent drives. When I got home the 3rd drive, one of those that got hot, was showing errors and taking ages to mount. I tried to copy the data off it, to no avail. The drive now shows as unformatted. As a parity check was in progress at the time, I now have no parity either.

I have tried the reiserfsck command, but it tells me there is a hardware fault and will not carry on. I have looked up some other threads, and some people in similar situations have recovered their drives by moving the drive from one controller to another and trying reiserfsck again. When I do this, reiserfsck again tells me there is a hardware fault and will not carry on:

"The problem has occurred looks like a hardware problem. If you have bad blocks, we advise you to get a new hard drive, because once you get one bad block that the disk drive internals cannot hide from your sight,the chances of getting more are generally said to become much higher (precise statistics are unknown to us), and this disk drive is probably not expensive enough for you to you to risk your time and data on it. If you don't want to follow that follow that advice then if you have just a few bad blocks, try writing to the bad blocks and see if the drive remaps the bad blocks (that means it takes a block it has in reserve and allocates it for use for of that block number). If it cannot remap the block, use badblock option (-B) with reiserfs utils to handle this block correctly. bread: Cannot read the block (2): (Input/output error)."

I have ordered a couple of new drives, but as there is no valid parity, replacing the drive will not invoke a valid rebuild. Any ideas about what I can try as a last-ditch attempt to get anything back from the drive? Someone on another thread suggested copying zeros to the superblock(?). If this is a valid idea as a last-ditch effort, how do I do it?
May 3, 2013

Unless the data is worthless, do NOT write to the drive with ANYTHING until recovery efforts are all exhausted. If you write zeros to the drive, it will be erased. You will lose your data with no chance of recovery.

If it has badblocks, then the first step will be to get a smart report on the drive. Then, based on the output of the smart report you can see what might be possible.

Joe L.
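For anyone following along, getting a SMART report from the unRAID console might look like the sketch below. It only reads from the drive. The helper name smart_report and the /dev/sdX placeholder are assumptions for illustration; substitute the device shown for the disk on the unRAID main page.

```shell
#!/bin/sh
# Sketch only: smart_report is a hypothetical helper, /dev/sdX a placeholder.
# Prints a full SMART report for one drive without writing anything to it.
smart_report() {
    dev="$1"
    # Refuse anything that is not a block device (for example a mistyped name).
    if [ ! -b "$dev" ]; then
        echo "not a block device: $dev" >&2
        return 1
    fi
    # -a prints identity, health status, attributes, and the error and self-test logs.
    smartctl -a "$dev"
}

# Usage:
#   smart_report /dev/sdX
```

If the device name is wrong or the drive has dropped offline, the helper stops before smartctl is called.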
May 4, 2013 Author

OK, I have run a SHORT SMART report (the long one I am running at the moment). At first I could not get the SMART report to run: even though the drive was shown in the unRAID console as sdh, smartctl reported no such device when I tried to run the report. At the time UNRAID was also reporting a temperature of 0 degrees C. After a reboot UNRAID reports a more sensible temperature and the SMART report ran. This drive does not look good...

****
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.11 family
Device Model:     ST31500341AS
Serial Number:    6VS0EY9K
Firmware Version: CC3H
User Capacity:    1,500,301,910,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sat May 4 10:57:14 2013
Local time zone must be set--see zic m
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 642) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       226275730
  3 Spin_Up_Time            0x0003   100   089   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   091   091   020    Old_age   Always       -       9408
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       2
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       14779277
  9 Power_On_Hours          0x0032   064   064   000    Old_age   Always       -       32024
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       1
 12 Power_Cycle_Count       0x0032   093   093   020    Old_age   Always       -       7297
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   066   066   000    Old_age   Always       -       34
190 Airflow_Temperature_Cel 0x0022   078   045   045    Old_age   Always   In_the_past 22 (Lifetime Min/Max 22/22)
194 Temperature_Celsius     0x0022   022   055   000    Old_age   Always       -       22 (0 6 0 0)
195 Hardware_ECC_Recovered  0x001a   044   026   000    Old_age   Always       -       226275730
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       111441516437485
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1853856889
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       2988060042

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     32024         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
*******

Any ideas?
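A report this long is easier to read if the sector-related attributes are pulled out on their own. The sketch below filters the attribute table for IDs 5, 197 and 198 (reallocated, pending and offline-uncorrectable sectors) and prints each name with its raw value. The function name smart_key_attrs is an assumption for illustration, and the filter relies on the column layout shown in the report above.

```shell
#!/bin/sh
# Sketch only: smart_key_attrs is a hypothetical helper name.
# Reads smartctl attribute output on stdin and prints NAME=RAW_VALUE for
# attributes 5, 197 and 198. The NF >= 10 guard skips the selective
# self-test table, whose rows also begin with small numbers.
smart_key_attrs() {
    awk 'NF >= 10 && ($1 == 5 || $1 == 197 || $1 == 198) { print $2 "=" $NF }'
}

# Usage (-A prints only the attribute table; /dev/sdX is a placeholder):
#   smartctl -A /dev/sdX | smart_key_attrs
```

Non-zero raw values for any of the three are the usual signs of bad or suspect sectors.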
May 4, 2013 Author

It is proving "difficult" to run the long test, as I think the drive keeps going offline. Certainly the drive in unRAID seems to cycle between "*" (i.e. not spun up), 0 degrees C and 22 degrees C, and SMART reports the extended offline test as "Aborted by Host". The drive is also "clicking".

Am I right in thinking that this is hopeless? If so, does UNRAID cache the directory information anywhere? As the drive was part of the shares, I do not know exactly what was on it.

(I know that it is planned, but what would I give for a dual parity drive system right now.)
May 4, 2013

> It is proving "difficult" to run the long test, as I think the drive keeps going offline. Certainly the drive in unRAID seems to cycle between "*" (i.e. not spun up), 0 degrees C and 22 degrees C, and SMART reports the extended offline test as "Aborted by Host".

The extended SMART test will abort if the drive is spun down. DISABLE unRAID's spin-down feature while you run it.

> The drive is also "clicking".

That is not a really good sign, but it does not indicate all is lost. (Grinding sounds and ear-piercing screeches are far worse.)

> Am I right in thinking that this is hopeless?

No, not entirely.

> If so, does UNRAID cache the directory information anywhere? As the drive was part of the shares, I do not know exactly what was on it.

No, it does not contain a separate listing of files.

> (I know that it is planned, but what would I give for a dual parity drive system right now.)

Dual parity would not help you when your array overheats, or when you take disks out of the array (as you described in your first post).

Your SMART report only shows 2 re-allocated sectors, and none pending re-allocation. That's actually pretty decent. The reason reiserfsck failed originally was that the disk was not responding. Now that it is, it might work. Just be certain you run reiserfsck on the first partition, not on the raw drive. If the disk is still assigned to the array, you must use the /dev/mdX device; if it is not assigned to the array, use the /dev/sdX1 device (note the trailing "1" designating the first partition).
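The device rule above can be sketched as a small helper. The name fsck_target and its two modes ("array" and "raw") are assumptions made for illustration, not unRAID commands; the helper only prints the path that reiserfsck should be given.

```shell
#!/bin/sh
# Sketch only: fsck_target is a hypothetical helper name.
#   fsck_target array 3   -> /dev/md3   (disk 3, still assigned to the array)
#   fsck_target raw sdh   -> /dev/sdh1  (first partition of an unassigned drive)
fsck_target() {
    case "$1" in
        array) echo "/dev/md$2" ;;
        raw)   echo "/dev/${2}1" ;;
        *)     echo "usage: fsck_target array N | raw sdX" >&2; return 1 ;;
    esac
}

# Usage (read-only check of disk 3 while it is assigned to the array):
#   reiserfsck --check "$(fsck_target array 3)"
```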
May 4, 2013 Author

reiserfsck seems to be working. It recommends running --rebuild-sb:

*****
If the partition table has not been changed, and the partition is
valid and it really contains a reiserfs partition, then the
superblock is corrupted and you need to run this utility with
--rebuild-sb.
****

However, when I do that it asks me:

reiserfs_open: the reiserfs superblock cannot be found on /dev/sdh1.
what the version of ReiserFS do you use[1-4]
    (1) 3.6.x
    (2) >=3.5.9 (introduced in the middle of 1999) (if you use linux 2.2, choose this one)
    (3) < 3.5.9 converted to new format (don't choose if unsure)
    (4) < 3.5.9 (this is very old format, don't choose if unsure)
    (X) exit

What is the correct version to try, or should I not be trying this?
May 5, 2013 Author

I managed to get the reiserfsck --check command to run, and it reported as follows.

****
root@Tower:~# reiserfsck --check /dev/sdh1
reiserfsck 3.6.21 (2009 www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/sdh1
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
########### reiserfsck --check started at Sat May 4 23:42:38 2013 ###########
Replaying journal: Done.
Reiserfs journal '/dev/sdh1' in blocks [18..8211]: 0 transactions replayed
Checking internal tree.. finished
Comparing bitmaps..finished
Checking Semantic tree: finished
No corruptions found
There are on the filesystem:
        Leaves 312933
        Internal nodes 1926
        Directories 1210
        Other files 17902
        Data block pointers 315054145 (15925 of them are zero)
        Safe links 0
########### reiserfsck finished at Sun May 5 00:02:42 2013 ###########
**************

It seems to say there are no corruptions, but unRAID still shows the drive as unformatted. Any ideas?

If all else fails, is there a way I can save the output of the reiserfsck --check command, as it seemed to list all the files on the drive as it ran?
May 5, 2013 Author

How would I do that? I have tried using the mount command, but I am probably doing it wrong:

mount /dev/md3 /mnt/disk3

The command comes back with "mount point /mnt/disk3 does not exist".

I have managed to run

reiserfsck --check /dev/sdh1 >/boot/reiser.txt

which has produced a file that appears to show me all the files that are/were on the drive, all 17902 of them :-(

Does anyone have any ideas about what I can do to try to bring this drive back? reiserfsck seems to think the data is there, but UNRAID is still showing the drive as unformatted.
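Saving the check output while still seeing it on screen could be done roughly as below. This is a sketch under assumptions: save_check_log is a hypothetical helper name, and the --yes option (documented in the reiserfsck man page as skipping the confirmation prompt) should be confirmed against the installed reiserfsprogs before relying on it. On unRAID, /boot is the flash drive, so a log written there survives a reboot.

```shell
#!/bin/sh
# Sketch only: save_check_log is a hypothetical helper name.
# Runs a read-only reiserfsck check and copies everything it prints
# (stdout and stderr) into a log file while still showing it on screen.
save_check_log() {
    dev="$1"
    log="$2"
    # --yes answers the "Do you want to run this program?" prompt, which
    # would otherwise be hard to see once the output is piped through tee.
    reiserfsck --check --yes "$dev" 2>&1 | tee "$log"
}

# Usage:
#   save_check_log /dev/sdh1 /boot/reiser.txt
```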
May 5, 2013 Author

I have attached a syslog; here is an excerpt from it.

*****
May 5 10:54:10 Tower logger: mount: wrong fs type, bad option, bad superblock on /dev/md3,
May 5 10:54:10 Tower logger: missing codepage or helper program, or other error
May 5 10:54:10 Tower logger: In some cases useful info is found in syslog - try
May 5 10:54:10 Tower logger: dmesg | tail or so
May 5 10:54:10 Tower logger:
May 5 10:54:10 Tower emhttp: _shcmd: shcmd (356): exit status: 32
May 5 10:54:10 Tower emhttp: disk3 mount error: 32
May 5 10:54:10 Tower emhttp: shcmd (357): rmdir /mnt/disk3
May 5 10:54:10 Tower kernel: REISERFS warning (device md3): sh-2006 read_super_block: bread failed (dev md3, block 2, size 4096)
May 5 10:54:10 Tower kernel: REISERFS warning (device md3): sh-2006 read_super_block: bread failed (dev md3, block 16, size 4096)
May 5 10:54:10 Tower kernel: REISERFS warning (device md3): sh-2021 reiserfs_fill_super: can not find reiserfs on md3
***********

How come when I run reiserfsck --check /dev/sdh1 (this is disk 3), reiserfsck reports no errors but UNRAID is unable to mount the drive? Or am I running the command wrong? Should I be running reiserfsck with the --rebuild-sb or --rebuild-tree options?

syslog.zip
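For readers who find the syslog hard to scan, the lines that matter here can be filtered out with grep. The sketch below is an illustration only: reiserfs_mount_errors is a hypothetical helper name, and the two patterns are taken from the excerpt above rather than from any complete list of failure messages.

```shell
#!/bin/sh
# Sketch only: reiserfs_mount_errors is a hypothetical helper name.
# Prints the syslog lines where the kernel's ReiserFS driver complained
# or where emhttp recorded a failed disk mount.
reiserfs_mount_errors() {
    grep -E 'REISERFS (warning|error)|mount error' "$1"
}

# Usage (the live syslog on the server):
#   reiserfs_mount_errors /var/log/syslog
```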
May 5, 2013

You should slow down... and wait for some advice. You might have already taken steps you cannot undo.

By now you've figured out that you must use reiserfsck on either the /dev/mdX device (which maintains parity) or on the /dev/sdX1 device (which does NOT keep parity updated; after fixing the file system you must re-sync parity).

You should always start with reiserfsck --check and then follow its instructions. If the drive does not mount subsequently, you can usually get it to mount by running reiserfsck --rebuild-tree. That would be your next step:

reiserfsck --rebuild-tree /dev/sdh1

Do NOT use the -S (--scan-whole-partition) option unless you want to recover old deleted files.

Then, to mount it:

mkdir /mnt/ridley
mount -t reiserfs /dev/sdh1 /mnt/ridley

or perhaps

mkdir /mnt/disk3
mount -t reiserfs /dev/md3 /mnt/disk3

Joe L.
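A cautious variant of the manual mount is sketched below. Mounting read-only (-o ro) is an assumption added here, not part of the advice above; the idea is to copy files off before anything else touches the disk. The helper name mount_ro is hypothetical.

```shell
#!/bin/sh
# Sketch only: mount_ro is a hypothetical helper name.
# Creates the mount point if it is missing, then mounts the ReiserFS
# partition read-only so files can be copied off it.
mount_ro() {
    dev="$1"
    mnt="$2"
    mkdir -p "$mnt" || return 1
    mount -t reiserfs -o ro "$dev" "$mnt"
}

# Usage:
#   mount_ro /dev/sdh1 /mnt/ridley
#   ls /mnt/ridley
#   umount /mnt/ridley
```

Creating the directory first avoids the "mount point does not exist" error that mount reports when the target directory is missing.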
May 5, 2013 Author

Thanks for the reply. So, as reiserfsck --check finishes and reports no problems, and so does not recommend anything further, but UNRAID still will not mount the drive, I should go ahead and run the reiserfsck --rebuild-tree /dev/sdh1 command?
May 5, 2013

Yes. After you run the rebuild tree, reboot the server and see if it will mount. You MUST then run a correcting parity check/sync once the disks are back mounted. It will probably find some differences representing your corrections.
May 6, 2013 Author

Ran it, but UNRAID still will not mount the drive.

****
root@Tower:~# reiserfsck --rebuild-tree /dev/sdh1
reiserfsck 3.6.21 (2009 www.namesys.com)

*************************************************************
** Do not run the program with --rebuild-tree unless **
** something is broken and MAKE A BACKUP before using it. **
** If you have bad sectors on a drive it is usually a bad **
** idea to continue using it. Then you probably should get **
** a working hard drive, copy the file system from the bad **
** drive to the good one -- dd_rescue is a good tool for **
** that -- and only then run this program. **
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at www.namesys.com/support.html. **
*************************************************************

Will rebuild the filesystem (/dev/sdh1) tree
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
Replaying journal: Done.
Reiserfs journal '/dev/sdh1' in blocks [18..8211]: 0 transactions replayed
########### reiserfsck --rebuild-tree started at Sun May 5 19:52:06 2013 ###########

Pass 0:
####### Pass 0 #######
Loading on-disk bitmap .. ok, 315372468 blocks marked used
Skipping 19389 blocks (super block, journal, bitmaps) 315353079 blocks will be read
0%... left 272378554, 27233 /s left 0, 23598 /secc
19111 directory entries were hashed with "r5" hash.
"r5" hash is selected
Flushing..finished
        Read blocks (but not data blocks) 315353079
                Leaves among those 312933
                Objectids found 19152

Pass 1 (will try to insert 312933 leaves):
####### Pass 1 #######
Looking for allocable blocks .. finished
0%....20%....40%....60%....80%....100% left 0, 324 /sec
Flushing..finished
        312933 leaves read
                312905 inserted
                28 not inserted
####### Pass 2 #######
Pass 2:
0%....20%....40%....60%....80%....100% left 0, 18 /sec
Flushing..finished
        Leaves inserted item by item 28
Pass 3 (semantic):
####### Pass 3 #########
Flushing..finished
        Files found: 17902
        Directories found: 1211
Pass 3a (looking for lost dir/files):
####### Pass 3a (lost+found pass) #########
Looking for lost directories:
Flushing..finished2, 230 /sec
Pass 4 - finisheddone 309831, 166 /sec
Flushing..finished
Syncing..finished
########### reiserfsck finished at Mon May 6 00:24:47 2013 ###########
root@Tower:~#
************

Any ideas as to what to do now?
May 7, 2013 Author

Yesterday I downloaded YAREG and installed it on my Windows PC. When I connected the drive that UNRAID reports as unformatted to the PC, YAREG saw the drive and the data! I have been copying the data from the drive ever since. YAREG might have an incredibly slow transfer rate, but it is copying data, and I am prepared to wait if I get the data back.

When the problems started I moved the drive from an on-motherboard controller to a port on the Supermicro MV8 controller.

I do not suppose you could be more specific about the errors, as I am a noob when it comes to Unix and find the syslog difficult to make heads or tails of.
May 8, 2013 Author

I think I have now recovered all the data from the drive. I am now going to attempt to add a replacement drive to the array and allow a parity rebuild. When that is complete I will copy the data back to the replacement drive. Fingers crossed.

Then I will have to upgrade UNRAID so I can integrate these 3TB drives I bought.
May 9, 2013 Author

I have replaced the drive, got the data back and rebuilt parity, so at the moment (fingers crossed) things are looking better, a lot better, than they were.

Thank you for all the help.