January 24, 2012

I found an especially stupid way to lose data to add to the wonderful post I just read about the 10 ways to lose data:

1. Don't install smartctl or unMENU.
2. Don't even TRY to understand or set up PuTTY or telnet before loading 12TB of movies, photos and music on the unRAID server.
3. What is linux anyway? An air conditioner, right?
4. Delete backups on WHS so I can add 2 more disks at the same time to the unRAID box.
5. Skip preclear on these 2 disks, just do it from the GUI.
6. Read some advice that deselecting the parity drive during big copies over the network speeds up the process (it did, of course).
7. Decide to reorganize shares and start copying vast amounts of data to 1 and only 1 of the new drives (yes, with parity OFFLINE!).
8. Wake up in the AM to find md8 (the last disk added) is at a rather cool temp - 0. And hung.
9. Realize I didn't copy, but cut/paste to the new drive from Explorer (Win 7).
10. Further realize I just deleted the files from WHS backup to get these 2 disks.

Yes, I am an idiot and a noob at linux. Here is what I have done since:

1. Read every forum post I could, the unRAID official and unofficial guides, wiki.
2. Could not access the GUI for unRAID management, it was frozen on screen. Could not kill processes (learned a little from a post).
3. Killed it with manual power off (eeks!).
4. On reboot, md8 was normal temp and mounted! But I had to go to work.
5. Later, red ball of death, an unmounted md8, temp 0. This time I stopped the array, powered down correctly, rebooted. It showed up as unformatted now! Real progress - not.
6. Took out the disk physically and used a dock to read it in Win 7. Of course I couldn't - reiser file system, I found out.
7. Downloaded YAReG and started recovering the data.
8. Downloaded smartctl - smartmontools-5.39.1-i486-1
9. Read how to do a syslog finally and attached it.
10. I set the md8 disk to unassigned (it is not physically in the box), but even after reboot it shows disk 8 as "not installed" and "unformatted". Is this because of the parity disk?

I am confused about how to get the new smartctl onto the flash drive. The package has 3 folders (etc, install, usr). In install there is "doinst.sh", but I haven't read enough about linux commands to know if this is what I should run to install the package, or what folder structure to put it in on the flash drive.

Since I copied data from drives without parity, when I enabled the parity drive again will it rebuild my stupidity? Or will it find many, many errors?

So the GUI shows disk 8 as not installed and unformatted, and I have no idea how to (or even if I should) check parity. There is no "Check" button.

Thanks in advance for the help.

MOBO: ASUS M4A88T-M LE
CPU: Sempron 145
unRAID Version 4.7Pro
HDD: 8 2TB mix of WD EARX, EARS, Seagate + Parity 2TB WD EARX

syslog-Jan-23-2012-11.55pm.txt
January 24, 2012 (Author)

Added note - the 1st syslog I tried to attach was 10MB and I couldn't upload it. Why was that?

EDIT - here is the zipped syslog from the 1st boot. Duh!

syslog-jan23-2012-22.46pm.zip
January 24, 2012

I didn't look at your syslog, but I've seen some instances of drive temps at 0 (drive not recognized) resulting from PSU issues, so you may want to look into that. Sometimes I've had drives die, and sometimes I find the drives are fine and I just need to re-work the PSU cabling and/or remove drives to reduce PSU load.
January 24, 2012

I had a 500GB drive die today, and replaced it with a spare empty 500GB drive. That one did not spin up. And I could smell burning. This was bad. I started to sweat, thinking maybe the PSU had melted down and was taking drives with it.

Looking at the drive, I saw that a chip on it had melted. I realised what had happened -- I had stupidly left the spare drive on a metal surface while the machine was powered up and it shorted out.

I found a small (320GB) empty drive and decided to connect it up to make sure it wasn't something other than my stupidity that was killing drives. I wasn't too worried if the 320GB died. It would have sacrificed itself in the name of testing.

Fortunately that worked, and I'm now copying the contents of the dead drive onto another server. I don't have any other spare drives to replace the (now 2) dead drives, so unRAID is simulating the missing one while I copy the data. So far, no data lost, fingers crossed!
January 24, 2012 (Author)

> Fortunately that worked, and I'm now copying the contents of the dead drive on to another server. I don't have any other spare drives to replace the (now 2) dead drives, so unRAID is simulating the missing one while I copy the data. So far, no data lost, fingers crossed!

Unfortunately I think I did lose data. You see, I was in the middle of transferring data from 1 disk to another WITHOUT parity when the crash occurred. Suddenly I couldn't see the data on the new drive and I couldn't stop the processes. It was hung trying to access the new disk. And, stupidly, I didn't "COPY/PASTE", I "CUT/PASTE." It didn't wait to verify the new disk to check for errors before assuming the data was moved and deleting it off the original disk. Or maybe the hang and reboot wiped it out. I don't know. In Windows I use teracopy and it verifies after copy/move. Not so on the unRAID box. Bummer!

But I will try a new disk today at that spot to make sure. No, I will wait until told to do that by a wiser one.
January 24, 2012

Before you go farther, you can likely recover the files off disk1, assuming you deleted them during the move and haven't copied anything new to the disk yet. I can't recall the exact command and could find it later, but I believe it's "reiserfsck --rebuild-tree /dev/sda1", where sda1 is the device you are rebuilding, so figure out the device ID for the disk you want to recover and don't just use sda1. Do a forum search and you'll likely find threads about file recovery.

Peter
January 24, 2012

> It was hung trying to access the new disk. And, stupidly, I didn't "COPY/PASTE" I "CUT/PASTE." It didn't wait to verify the new disk to check for errors before assuming it was moved and deleting it off the original disk.

That's a Microsoft "feature", I believe. Another variant is this... If you try to cut/paste a file that is open in some application that you've forgotten about, with the intent of overwriting a file of the same name elsewhere, the cut will fail (since the file is open) but the file at the destination that you were going to overwrite is deleted just the same. If you then screw up the file that was open as well - bye-bye data (especially if the deleted one was too big for the recycle bin).
January 24, 2012

There's a good chance that all of your data is recoverable. All of it should still be on the source disk. Even if you ran a 'cut-paste' and the source files were deleted, as long as you haven't written any new data to the disk you should be able to get the data back.

Since the disk came from WHS I'm going to assume it was formatted as NTFS. Try GetDataBack NTFS. It isn't free, but there's a good chance it will let you recover your data.

I personally wouldn't bother with data recovery on the unRAID disk because you know that it will be incomplete at best. Focus your data recovery attempts on the WHS source disk, then take a second look at the unRAID disk if you don't have success with the WHS disk.
January 25, 2012 (Author)

> There's a good chance that all of your data is recoverable. All of it should still be on the source disk. Even if you ran a 'cut-paste' and the source files were deleted, as long as you haven't written any new data to the disk you should be able to get the data back. Since the disk came from WHS I'm going to assume it was formatted as NTFS. Try GetDataBack NTFS. It isn't free, but there's a good chance it will let you recover your data. I personally wouldn't bother with data recovery on the unRAID disk because you know that it will be incomplete at best. Focus your data recovery attempts on the WHS source disk, then take a second look at the unRAID disk if you don't have success with the WHS disk.

I understand this. But I didn't really make it very clear in my 1st post. The unRAID box was working great but getting full. I added 2 disks (empty 2TB each) from my WHS that I am phasing out for media. While I was transferring FROM UNRAID DISKS (md1, md2, md3, md4) to the NEW DISK (md8) to make more room, the new disk hung and the move couldn't be completed, even though the files were now gone from the unRAID disks (md1, md2, md3, md4).

No parity at the time, so I assume that the parity disk thinks the files are still on the md1-md4 disks, right? And it sees that I had 8 disks, but it never ran a parity check, so it shouldn't think anything is on md8 (or md7, the other disk added), right? But when I remounted the parity disk, does it do any automatic parity checks? I am confused whether this is an automatic process when it is brought online or a manual one from the console.

UPDATE: I recovered at least 80% of the data. I had some backed up and recovered many files using YAReG. But now I am more worried about the whole array:

1. Do I add the trouble disk back in and remove it from the array the correct way? (I think I can find the correct way to remove a failing disk without replacing it...)
2. Or do I add in another 2TB disk from another computer? I can scrounge one up or open one of my external enclosures.
3. Will my array be in trouble since this problem occurred while parity was unmounted?
4. Currently the md8 disk is not attached to the unRAID array, but it still shows up on the web console as Not Installed and Unformatted. And I see no way to do a parity check from this screen. Do I need to run reiserfsck --rebuild-tree etc., or can I do a parity check from the unRAID box console?

Please someone steer me in the right direction. I do not want to lose the whole array. The only web console choice I have showing is Format, but there is no disk there to format!

And last, will a rebuild command restore the lost files on each disk? So for md1 (sdf) I should do:

reiserfsck --rebuild-tree /dev/sdf1

And do the same for each disk that lost data?
January 25, 2012

As soon as you unassigned the parity drive you no longer had a protected array. You have to unassign the bad drive, re-assign the parity drive, and then run initconfig to reset the array. Start the array and build parity. If you want to re-arrange the other disks, do that before running initconfig.

As for the rebuild - I suggested that to try and recover the files that you posted that you had just deleted off of disk1. There are 2 ways to do it. If you want to first restore parity, then you leave the array started, unmount each disk, and use the md1, md2 etc. devices. If you want to try it before restoring parity, then you just stop the array and use sda1, sdb1 etc. I also believe the command is actually reiserfsck --scan-whole-partition --rebuild-tree /dev/md1.

Once again, you can put the rest of your data in jeopardy running these commands, so I suggest you search the commands and understand what they do before blindly doing more potentially bad things.

Peter
January 25, 2012 (Author)

> As soon as you unassigned the parity drive you no longer had a protected array. You have to unassign the bad drive and re-assign parity drive and then run initconfig to reset the array. Start the array and build parity. Do any other disk re-arranging before running initconfig if you so desire to re-arrange the other disks. As for the rebuild - I suggested that to try and recover the files that you posted that you had just deleted off of disk1. There are 2 ways to do it. If you want to first restore parity then you leave the array started and unmount each disk and then use the md1, md2 etc devices. If you want to first try it before restoring parity then you just stop the array and use sda1, sdb1 etc. I also believe the command is actually reiserfsck --scan-whole-partition --rebuild-tree /dev/md1. Once again, you can put the rest of your data in jeopardy running these commands so I suggest you search the commands and understand what they do before blindly doing more potentially bad things. Peter

Thanks. I immediately unassigned the bad drive and re-assigned the parity drive after I realized what an idiot I was. But now the drive shows up as missing and unformatted. I have no options in the GUI, so I have to run a command from the linux prompt, and frankly I am scared of screwing things up worse. I will keep reading unless some expert (I saw Joe L refer to an expert as someone who has read the man) tells me to go ahead.

I am not worried about the lost data so much - it was my stupidity, and they were mostly IMAX documentaries that I wasn't likely to watch anyways and I should have backups somewhere. I just want to have a stable unRAID box again, so I am trying to learn more and do things right this time.
January 25, 2012

You stop the array, type initconfig on the command line, answer Yes and then start the array. The missing drive will be forgotten. You will have to build the parity again when you start, so let the parity build complete.

FYI, you can't do a parity check with a missing drive. A parity check requires all drives to be healthy.

Peter
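To illustrate why a parity check needs every drive present, here is a minimal Python sketch of single-drive XOR parity, which is how unRAID's one parity disk is generally described as working. The 4-byte "disks" and the function names are made up for illustration and are not part of unRAID itself.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks, parity):
    """Reconstruct one missing block from the surviving blocks plus parity."""
    return xor_parity(list(surviving_blocks) + [parity])

# Hypothetical 4-byte "disks", purely for illustration.
disk1 = bytes([0x10, 0x20, 0x30, 0x40])
disk2 = bytes([0x01, 0x02, 0x03, 0x04])
disk3 = bytes([0xAA, 0xBB, 0xCC, 0xDD])

parity = xor_parity([disk1, disk2, disk3])

# Pretend disk2 is missing: the survivors plus parity give it back.
recovered = rebuild_missing([disk1, disk3], parity)
print(recovered == disk2)
```

With one drive missing, parity is already being used up to stand in for that drive, so there is nothing independent left to check it against. The same arithmetic also suggests why parity has to be rebuilt after data was written while the parity drive was unassigned: the stored parity no longer matches the XOR of the data disks.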
January 26, 2012 (Author)

In parity sync now. At 41MB/sec, it should be done in 787 minutes. Zzzzz.... I could hit refresh all day or just take a nap.

Thanks for the help. Next time I will preclear and stress the disk first, and maybe just be more careful to keep parity. Live and learn. And I actually only lost 1 file, which I have the Blu-ray disc for! Born lucky I guess....
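As a rough sanity check on that estimate, the arithmetic can be sketched in Python. This assumes the parity drive has the same 2,000,398,934,016-byte capacity that the SMART report later in this thread shows for the 2TB EARS, that the rate stays constant, and that 1 MB means 10**6 bytes; the function name is hypothetical.

```python
def eta_minutes(remaining_bytes, rate_mb_per_s):
    """Estimate minutes left at a steady rate (1 MB taken as 10**6 bytes)."""
    return remaining_bytes / (rate_mb_per_s * 10**6) / 60

# Assumed capacity of a 2TB drive, taken from the SMART report in this thread.
capacity_bytes = 2_000_398_934_016

full_pass = eta_minutes(capacity_bytes, 41)
print(round(full_pass))  # about 813 minutes for a full pass at 41 MB/s
```

A full pass at 41 MB/s comes out in the same range as the 787 minutes reported, which fits a sync that had only recently started.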
January 27, 2012 (Author)

OK. Parity check went fine. I wanted to see if I really had a bad disk or just blew it by user error. Tried to run preclear_disk.sh -1 but it always gave me the help screen. I was very careful on entry. Then I tried to preclear the unassigned 2TB EARS (sdb) with:

/boot/preclear_disk.sh -d ata -A /dev/sdb

It gave me an error about not reading the block. Darn, the screen is closed and I can't see it now. So I mounted the disk, and am now letting unRAID clear it. Did preclear not work because the disk had already been cleared before? Should I have used sdb1 since it did have this partition? I won't put data on it until I am positive it is ok. Here is the smart report:

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EARS-00S8B1
Serial Number:    WD-WCAVY1955119
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Jan 26 18:13:42 2012 Local time zone must be set--see zic m
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                 (41580) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       2
  3 Spin_Up_Time            0x0027   147   138   021    Pre-fail  Always       -       9608
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1375
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       3
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       5
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       17004
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       75
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       63
193 Load_Cycle_Count        0x0032   056   056   000    Old_age   Always       -       432618
194 Temperature_Celsius     0x0022   118   098   000    Old_age   Always       -       34
196 Reallocated_Event_Count 0x0032   198   198   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       48
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       56
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       76

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     16992         38395295
# 2  Short offline       Completed: read failure       10%     16992         38442687

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

So it says PASSED but:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     16992         38395295
# 2  Short offline       Completed: read failure       10%     16992         38442687

I have no idea what to trust. A little help?
January 27, 2012

That is starting to fail. Replace it ASAP. You can run a few pre-clears on it to see what happens with the Current_Pending_Sector and Reallocated_Sector_Ct values. Post a new SMART report after the pre-clears.
January 27, 2012 (Author)

> That is starting to fail. Replace it ASAP. You can run a few pre-clears on it to see what happens with the Current_Pending_Sector Reallocated_Sector_Ct values. Post a new SMART report after the pre-clears.

Thanks. I will do that. Not worth losing data though. If it is an unjumpered 2TB EARS, should I run it as:

/boot/preclear_disk.sh -A -d ata /dev/sdb

OR

/boot/preclear_disk.sh -A -d ata /dev/sdb1

?? I am a linux noob and after reading I still cannot tell which I should use. I know sdX1 refers to the 1st partition, but I don't know if preclear needs this or the whole disk as sdX. Thanks for the help. I am learning...
January 27, 2012

> OK. Parity check went fine. Wanting to see if I really had a bad disk or just blew by user error. Tried to run preclear_disk.sh -1 but always gave me help screen. I was very careful on entry.

Probably because the command argument is the lower case letter "L", not the number 1.

This:
preclear_disk.sh -l

NOT:
preclear_disk.sh -1
January 27, 2012

> I am a linux noob and after reading I still cannot tell which I should use. I know sdX1 refers to the 1st partition, but I don't know if preclear needs this or the whole disk as sdX. Thanks for the help. I am learning...

The preclear script ALWAYS uses the "whole disk" and not the partition name.
January 27, 2012

> So it says PASSED but :
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed: read failure       90%     16992         38395295
> # 2  Short offline       Completed: read failure       10%     16992         38442687
> I have no idea what to trust. A little help?

The short and long tests abort on their first read error. Both are aborting when a read of a sector fails. The disk has about 48 sectors it has marked (so far) as unreadable. It has already re-allocated a few others.

The SMART overall result will often not say FAILED until the drive is nearly dead and a given parameter has dropped below its affiliated threshold.
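The point about PASSED versus the raw counts can be sketched in Python. This is a minimal illustration, not a smartmontools feature: it assumes the attribute rows look like the ones in the report above (ten whitespace-separated columns with an integer raw value), and the function names and the choice of which attributes to watch are assumptions.

```python
# Attributes whose raw count is worth watching even when the overall result is PASSED.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def parse_attributes(report):
    """Map attribute name -> (value, thresh, raw) from smartctl-style attribute rows."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = (int(fields[3]), int(fields[5]), int(fields[9]))
    return attrs

def suspect_attributes(attrs):
    """Names at or below their threshold, or watched attributes with a nonzero raw count."""
    suspects = []
    for name, (value, thresh, raw) in attrs.items():
        below_threshold = thresh > 0 and value <= thresh
        if below_threshold or (name in WATCHED and raw > 0):
            suspects.append(name)
    return suspects

# Rows copied from the SMART report posted earlier in this thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       3
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       48
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       56
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

attrs = parse_attributes(SAMPLE)
print(suspect_attributes(attrs))
```

None of the normalized values here is anywhere near its threshold, which is why the overall result still reads PASSED, yet three raw counts are already nonzero.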
January 29, 2012 (Author)

> Probably because the command argument is the lower case letter "L" not the number 1. This: preclear_disk.sh -l NOT preclear_disk.sh -1

OUCH!!!

> The short and long tests abort on their first read error. Both are aborting when a read of a sector fails. The disk has about 48 sectors it has marked (so far) as unreadable. It has already re-allocated a few others. The SMART reports are not always going to say FAILED until the drive is nearly dead and a given parameter has dropped below its affiliated threshold.

Here is an updated syslog and smart report. The disk passed the short test after clearing. The long test was apparently aborted by me. My Win7 machine decided to do a reboot for fun, and the PuTTY session closed. Will that abort the smart test? I believe so. I am running the long test now. This has taken long enough for me to get a Seagate 2TB in the mail. I still want to know if this EARS disk is salvageable, or really I should say trustworthy. I will post the long test after it finishes.

Thanks Joe! Thanks dgaschk!

smart01282012.txt
syslog-2012-01-28.txt
January 29, 2012

The drive has 21 pending sectors:

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       21

Repeat pre-clear until pending goes to zero and stays there for at least one additional pre-clear cycle.
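That stopping rule can be written down as a small Python check. This is only a sketch of the advice above; the function name is hypothetical, and the zero readings in the example are made up (only the 48 and 21 come from the reports in this thread).

```python
def pending_cleared(pending_after_each_cycle, extra_cycles=1):
    """True if pending reached 0 and stayed 0 for `extra_cycles` further pre-clear cycles."""
    needed = extra_cycles + 1  # the first zero reading plus the confirming cycle(s)
    tail = pending_after_each_cycle[-needed:]
    return len(tail) == needed and all(count == 0 for count in tail)

# 48 and 21 are the Current_Pending_Sector readings reported so far in this thread;
# the zeros are hypothetical future readings.
print(pending_cleared([48, 21]))        # still pending sectors
print(pending_cleared([48, 21, 0]))     # reached zero, not yet confirmed
print(pending_cleared([48, 21, 0, 0]))  # zero and held for one more cycle
```

Requiring a confirming cycle guards against a count that reads zero once and then climbs again on the next full pass over the disk.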
January 29, 2012 (Author)

> The drive has 21 pending sectors: 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 21 Repeat pre-clear until pending goes to zero and stays there for at least one additional pre-clear cycle.

Started preclear from the unRAID box this time. Going well so far....
January 29, 2012

It's a bad disk. Look at the current pending sectors. Those are sectors on the disk that can't be read.

Peter