Vocatus Posted October 15, 2014

Please help, I stand to lose months of critical work if I can't recover these files. I was moving a number of large VM disk files to the tower. About 80% of the way through, it crashed (hard lock). I hard reset it, and it came back up just fine. However, the directory I was moving everything to is empty. Additionally, the cache drive appears as unformatted. I'm on version 5.0.5.

Please help, how do I recover the data? I assume/hope it's sitting on the cache drive...
trurl Posted October 15, 2014

When you say moving to tower, do you mean they were on some other machine, and instead of just copying them, you moved them so you no longer have the originals?

If something is important you should have multiple copies. unRAID is not a backup solution unless you use it as an additional copy of your files. You should never have only one copy of anything important.

What are you seeing that makes you think the directory is empty? Are you looking at it from the unRAID command line, Windows Explorer, or what? Are you looking at a disk share or a user share? A syslog and maybe a screenshot might better demonstrate what you are working with.
itimpi Posted October 15, 2014

What format is the cache drive meant to be? The fact that it is showing as unformatted suggests that some sort of file system corruption has happened, so that unRAID can no longer mount it (do not format it unless you want to lose your data). That would explain why your files are not showing up, as they were almost certainly written to the cache drive.

The recovery tools differ depending on which format the drive is meant to be.
Vocatus (Author) Posted October 15, 2014

Quoting trurl: "What are you seeing that makes you think the directory is empty? Are you looking at it from the unRAID command line, Windows Explorer, or what? Are you looking at a disk share or a user share? A syslog and maybe a screenshot might better demonstrate what you are working with."

Hi trurl,

Yes, I was moving the files to the tower. The reason I was moving instead of copying was I needed to temporarily clear local disk space for a different operation, and do not have enough space to hold three copies (originally one copy was on workstation, one copy on tower array).

When I browse to the directory in Windows 7 file explorer I do not see any of the files I moved prior to the crash sitting there. The directory is empty.

Quoting itimpi: "What format is the cache drive meant to be? The fact that it is showing as unformatted suggests that some sort of file system corruption has happened so that unRAID can no longer mount it (do not format it unless you want to lose your data). That would explain why your files are not showing up as they were almost certainly written to the cache drive. Depending on what format it is meant to be the recovery tools will be different."

The cache drive was in whatever filesystem unRAID defaults to; ReiserFS, I believe? And yes, it's showing as unformatted and that's new. I have not hit "format" yet.
Vocatus (Author) Posted October 16, 2014

What do I need to do to recover the files from the cache drive? Thanks for helping with this, seriously.
itimpi Posted October 16, 2014

Quoting Vocatus: "What do I need to do to recover the files from the cache drive? Thanks for helping with this, seriously."

You need to first look up the device name for the cache in the GUI. Also check that the GUI shows reiserfs format for the cache (as other formats require different actions). Then from a telnet/console session run a command of the form

reiserfsck --check /dev/sd?1

where sd? is the device name you looked up. Do not forget the 1 on the end, which specifies the partition. If there is any corruption, that command will report it along with the recommended action to fix it.
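As a rough sketch of the command form described in this post: the helper below only prints the read-only check command for a given device name, so that the trailing partition number is not forgotten. The function name `reiserfs_check_cmd` and the name validation are illustrative assumptions, not part of unRAID or reiserfsprogs.

```shell
#!/bin/sh
# Hypothetical helper: turn a device name looked up in the GUI (e.g. "sdf")
# into the read-only check command from the post. It only prints the command;
# nothing is run against the disk.
reiserfs_check_cmd() {
    dev="$1"
    case "$dev" in
        sd[a-z]|sd[a-z][a-z]) ;;   # accept bare names such as sdf or sdaa
        *) echo "expected a device name such as sdf, got '$dev'" >&2
           return 1 ;;
    esac
    # The trailing 1 selects the first partition, as the post stresses.
    echo "reiserfsck --check /dev/${dev}1"
}

reiserfs_check_cmd sdf   # prints: reiserfsck --check /dev/sdf1
```

Passing a name that already includes the partition number (for example `sdf1`) is rejected, which guards against ending up with `/dev/sdf11`.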
Vocatus (Author) Posted October 16, 2014

OK, I followed your instructions and this was the result:

Trans replayed: mountid 21, transid 161379, desc 4709, len 72, commit 4782, next trans offset 4765
<40 lines of output omitted>
Trans replayed: mountid 21, transid 161380, desc 4783, len 561, commit 5345, next trans offset 5328
Replaying journal: Done.
Reiserfs journal '/dev/sdf1' in blocks [18..8211]: 40 transactions replayed
Checking internal tree.. finished
Comparing bitmaps..finished
Checking Semantic tree: finished
No corruptions found
There are on the filesystem:
        Leaves 35883
        Internal nodes 217
        Directories 41
        Other files 811
        Data block pointers 36156914 (495824 of them are zero)
        Safe links 0
########### reiserfsck finished at Thu Oct 16 12:32:32 2014 ###########

Should I reboot the server now? In the GUI the cache drive still shows "unformatted".

EDIT: Stopped and restarted the array, and all files are back. Thank you for the help, you saved me months of work. I learned my lesson: copy THEN delete, not just move, critical files.
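The "copy THEN delete" lesson can be sketched as a small shell function. This is a minimal illustration under stated assumptions: the function name `safe_move` and the example paths are hypothetical, and the comparison uses the standard `cmp` utility rather than anything specific to unRAID.

```shell
#!/bin/sh
# Hypothetical helper: copy a file, verify the copy byte for byte, and only
# then delete the original. Paths passed in are placeholders chosen by the
# caller; none of them come from the thread.
safe_move() {
    src="$1"
    dest_dir="$2"
    dest="$dest_dir/$(basename "$src")"

    cp -p "$src" "$dest" || return 1   # copy first, preserving timestamps
    if cmp -s "$src" "$dest"; then     # byte-for-byte comparison
        rm "$src"                      # delete only after a verified copy
    else
        echo "verification failed, keeping $src" >&2
        return 1
    fi
}

# Example (hypothetical paths):
#   safe_move /vms/disk1.vmdk /mnt/tower/vms
```

Note that a verified copy is still only one copy once the original is removed, and the comparison may be served from the operating system's cache rather than from the disk itself, so this reduces the risk of a failed move but does not replace a backup.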
WeeboTech Posted October 16, 2014

You should be able to stop and start the array. If the drive can be mounted, unRAID will mount it. If the drive cannot be mounted, it will falsely be reported as 'unformatted', which is why you need to fsck it.

Also look in the lost+found directory, just in case.
Vocatus (Author) Posted October 16, 2014

Well, some more bad news. I ran the mover script to get everything off the cache drive and onto the array, and it failed. When re-running reiserfsck, I get this message:

########### reiserfsck --check started at Thu Oct 16 13:55:21 2014 ###########
Replaying journal:
Trans replayed: mountid 22, transid 161381, desc 5346, len 1, commit 5348, next trans offset 5331
Trans replayed: mountid 22, transid 161382, desc 5349, len 1, commit 5351, next trans offset 5334
Trans replayed: mountid 22, transid 161383, desc 5352, len 10, commit 5363, next trans offset 5346

The problem has occurred looks like a hardware problem. If you have bad blocks, we advise you to get a new hard drive, because once you get one bad block that the disk drive internals cannot hide from your sight,the chances of getting more are generally said to become much higher (precise statistics are unknown to us), and this disk drive is probably not expensive enough for you to you to risk your time and data on it. If you don't want to follow that follow that advice then if you have just a few bad blocks, try writing to the bad blocks and see if the drive remaps the bad blocks (that means it takes a block it has in reserve and allocates it for use for of that block number). If it cannot remap the block, use badblock option (-B) with reiserfs utils to handle this block correctly.

bread: Cannot read the block (5358): (Input/output error).

Aborted (core dumped)

Anything else I can try?
WeeboTech Posted October 16, 2014

Quoting Vocatus: "Well, some more bad news. I ran the mover script to get everything off the cache drive and onto the array, and it failed. When re-running [...] Anything else I can try?"

Can you post a SMART report of the cache drive?

I had something like this once and I had to use ddrescue to copy a bad drive to another drive in order to repair it.
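For readers unfamiliar with the ddrescue approach mentioned here, the sketch below prints a commonly used two-pass GNU ddrescue plan without running anything. The function name `ddrescue_plan`, the device names, and the map file name are all placeholders; the thread does not say which options WeeboTech actually used.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a two-pass GNU ddrescue plan for
# cloning a failing drive onto a spare one. All arguments are placeholders.
ddrescue_plan() {
    src="$1"   # failing drive
    dst="$2"   # spare drive, at least as large; it will be overwritten
    map="$3"   # map file that lets ddrescue resume between runs
    if [ "$src" = "$dst" ]; then
        echo "source and target must differ" >&2
        return 1
    fi
    # Pass 1: copy everything that reads cleanly and skip the slow
    # scraping phase (-n). -f is required to write to a block device.
    echo "ddrescue -f -n $src $dst $map"
    # Pass 2: return to the bad areas and retry them up to 3 times.
    echo "ddrescue -f -r3 $src $dst $map"
}

ddrescue_plan /dev/sdX /dev/sdY rescue.map
```

The idea is that the file system check and any repair are then run against the clone, so the failing drive is read as little as possible. Swapping source and target destroys the data, so the printed commands are worth reading twice before running them.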
itimpi Posted October 17, 2014

I would carefully check SATA/power cabling as well. It is possible that a connection is loose which is causing the drive to drop offline for some reason.
Vocatus (Author) Posted October 17, 2014

Quoting WeeboTech: "Can you post a smart report of the cache drive? I had something like this once and I had to use ddrescue to copy a bad drive to another drive in order to repair it."

Sure thing.

root@TheBrain:~# smartctl --all /dev/sdf
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10
Device Model:     ST3500830AS
Serial Number:    6QG0G750
Firmware Version: 3.AAE
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Fri Oct 17 07:13:52 2014 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 163) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   088   006    Pre-fail  Always       -       229052527
  3 Spin_Up_Time            0x0003   098   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1948
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       531914233
  9 Power_On_Hours          0x0032   060   060   000    Old_age   Always       -       35522
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       161
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   068   044   045    Old_age   Always   In_the_past 32 (Min/Max 32/32)
194 Temperature_Celsius     0x0022   032   056   000    Old_age   Always       -       32 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   062   054   000    Old_age   Always       -       243699885
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Quoting itimpi: "I would carefully check SATA/power cabling as well. It is possible that a connection is loose which is causing the drive to drop offline for some reason."

Thanks, I'll check this when I get home today.
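When scanning a report like the one above, one quick check is to list any attribute whose WHEN_FAILED column is not "-". The sketch below does that with awk, assuming the standard ten-column attribute table printed by smartctl; the function name `flag_failed_attrs` is hypothetical, and the two sample rows are taken from the report in this post.

```shell
#!/bin/sh
# Hypothetical helper: read a smartctl attribute table on stdin and print the
# name and WHEN_FAILED value of every attribute that has ever failed.
# Column 9 is WHEN_FAILED in the standard table layout.
flag_failed_attrs() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $9 != "-" { print $2, $9 }'
}

# Two rows from the report above, so the sketch runs without smartctl.
flag_failed_attrs <<'EOF'
  5 Reallocated_Sector_Ct   0x0033 100 100 036 Pre-fail Always - 0
190 Airflow_Temperature_Cel 0x0022 068 044 045 Old_age Always In_the_past 32 (Min/Max 32/32)
EOF
# prints: Airflow_Temperature_Cel In_the_past
```

On a live system the same function could be fed with `smartctl -A /dev/sdf | flag_failed_attrs`. In this report the only attribute it picks out is the airflow temperature, whose worst value (044) once dropped below its threshold (045); the reallocated, pending, and uncorrectable sector counts are all zero.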
WeeboTech Posted October 17, 2014

Nothing jumps out at me in the SMART report. Can you post the syslog from the time of the bread failure? i.e.

bread: Cannot read the block (5358): (Input/output error).

As a confidence test, I would suggest running a long SMART self-test (smartctl -t long) on the drive. Make sure you turn off spin down on the drive before issuing the test, and do not access it until the test is done. Upon initiating the test, it will provide an ETA of how long it will take.

It could be a cabling issue. Vibration can cause intermittent connection issues.
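The suggested long self-test can be sketched as a short plan. The helper below only prints the commands; the function name `selftest_plan` is hypothetical, and the 163-minute figure is the extended self-test polling time from the SMART report posted earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the steps for a long SMART self-test.
# $1 is the device, $2 the extended self-test polling time in minutes as
# reported by smartctl.
selftest_plan() {
    dev="$1"
    minutes="$2"
    # Start the extended (long) self-test; it runs inside the drive.
    echo "smartctl -t long $dev"
    # Convert the polling time into hours and minutes for readability.
    echo "# wait about $((minutes / 60))h $((minutes % 60))m, leaving the drive idle"
    # Read the self-test log to see whether the test completed or hit an error.
    echo "smartctl -l selftest $dev"
}

selftest_plan /dev/sdf 163
```

Spin down still has to be disabled separately before starting, as the post says, since a drive that spins down mid-test may abort the test.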
Vocatus (Author) Posted October 19, 2014

Well, I managed to get everything off the cache drive by rebooting multiple times over a few days. I noticed that after a reboot the drive would work for about an hour before crashing again, so I used that window to get everything off slowly. The drive is very old and, I suspect, failing, so now that everything is recovered I'm replacing it.

Thank you again for the help.
This topic is now archived and is closed to further replies.