April 30, 2009

Greetings,

I was attempting to swap out a 250GB drive (drive 9) for a shiny new 750GB drive. I followed the instructions and all was going well until the data rebuild got stuck at 20.6% complete (see the attached screenshot). It looks like drive 6 died during the data-rebuild process. Drive 9 is the new drive.

I just wanted to confirm my options with the folks here on the forum. It would appear that the data on drive 6 will be lost. I still have the original drive 9 and can reinstall it in the Unraid server, but since the data-rebuild process has already begun, that won't help restore the contents of drive 6, right?

Assuming I cannot recover the data on drive 6, is the best course of action to remove the faulty drive and initialize Unraid as new to regenerate parity? I figured I could then mount the old drive 9 on my PC and copy its contents to the Unraid server once the initialization is complete.

Thanks for any help. Bummer, but at least the content of the lost drive can be recreated, as it was primarily video/music.
April 30, 2009

There is a way to recover, as long as you still have the original drive9 to re-install.

The basic game plan is this:

1. Before you reboot, or do anything else, log in via telnet and grab a copy of the syslog so we have a better idea of exactly what is happening. (Attach it to your next post.) Instructions on how to do this are in the wiki here.
2. Grab or print a screenshot of the "Devices" page (for your own records).
3. Log in via telnet and make a copy of the entire "config" folder with the following command (it copies the entire folder to one named for today's date):
   cp -r /boot/config /boot/config20090430
4. Next, if the management console is still functional, stop the array.
5. Then power down.
6. Take the shiny new drive out of the disk9 slot and put the old 250Gig drive back in. That drive has not been written to and should still have your data.
7. Take a new drive somewhere between 250Gig and 750Gig (it can even be the one you were going to use to upgrade disk9) and use it to replace the disk in slot 6.
8. Power up. Odds are the server will notice the change in drives and not start. If it does start, stop it immediately. Go to the Devices page, assign the old (original) disk9 to disk9, and assign the new replacement drive to disk6. (Now the new drive will be used to replace a failed disk6 instead of a working disk9.)

Now we need to get the server to come back online thinking that drive6 has failed and that drive9 is valid. That way, it will use parity and all the other data drives to rebuild disk6 (the drive that actually failed) onto the replacement drive. That requires a very specific set of steps:

A. Press the "Restore" button, but DO NOT "Start" the array just yet. All the drive indicators should turn blue, and the array status will be "Stopped - Initial Configuration".
B. Now log in via telnet, or at the system console, and type these two commands:
   cd
   mdcmd set invalidslot 6
   It should respond with:
   cmdOper=set cmdResult=ok
   That command tells the server that disk6 is the drive that needs reconstruction and that parity should be trusted. (Without you telling the server that disk6 is the bad drive, the array would throw away your parity and start re-computing it based on the new configuration. That would cause the loss of disk6's data.)
C. Once you have typed mdcmd set invalidslot 6 and seen its response, you can press the "Start" button. You should then see disk6 being written to and all the other drives being read. Once it is completely reconstructed, you should have everything back as you wanted, with all the data.

The reconstruction will take a number of hours, much like a parity check.

If you have any questions about this procedure, ask first, before you do anything that will invalidate your current parity.
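For anyone following along at the console, the telnet-side parts of the steps above (saving the syslog, backing up "config", and the invalidslot command) can be wrapped in a few shell functions. This is a minimal sketch, not part of Joe's instructions: the syslog path /var/log/syslog, the function names, and the MDCMD override are assumptions added for illustration. The one real addition is that mark_slot_invalid only reports success when mdcmd answers with cmdResult=ok, which is your cue that it is safe to press "Start".

```shell
#!/bin/sh
# Hypothetical helpers for the telnet-side steps of the recovery procedure.
# Assumptions: the syslog is at /var/log/syslog, the flash drive is at /boot,
# and mdcmd can be found on the PATH (the post runs it right after "cd").
MDCMD="${MDCMD:-mdcmd}"

# Step 1: copy the in-memory syslog to the flash drive so it survives a reboot.
save_syslog() {
    src="${1:-/var/log/syslog}"
    dest="${2:-/boot/syslog.txt}"
    cp "$src" "$dest"
}

# Step 3: copy the whole config folder to one named for today,
# e.g. /boot/config -> /boot/config20090430.
backup_config() {
    root="${1:-/boot}"
    stamp="${2:-$(date +%Y%m%d)}"
    # cp -r would nest the copy inside an existing folder, so refuse instead.
    if [ -e "$root/config$stamp" ]; then
        echo "$root/config$stamp already exists; not overwriting" >&2
        return 1
    fi
    cp -r "$root/config" "$root/config$stamp"
}

# Step B: tell the driver which slot to reconstruct so parity is trusted.
# Prints mdcmd's reply; returns 0 only if the reply contains cmdResult=ok.
mark_slot_invalid() {
    slot="$1"
    case "$slot" in
        ''|*[!0-9]*) echo "slot must be a number, got '$slot'" >&2; return 2 ;;
    esac
    reply=$("$MDCMD" set invalidslot "$slot") || return 1
    echo "$reply"
    case "$reply" in
        *cmdResult=ok*) return 0 ;;
        *) echo "no cmdResult=ok from mdcmd; do NOT press Start" >&2; return 1 ;;
    esac
}
```

Usage would be along these lines: source the file, run save_syslog and backup_config before powering down, then, after pressing "Restore" but before "Start", run mark_slot_invalid 6 and only continue if it succeeds.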
If you press "Restore" without telling the server that disk6 is bad, odds are you will lose disk6's data. Normally the "Restore" button only sets an initial configuration and tells the server that the parity drive needs to be rebuilt (that is, it normally invalidates parity). By following the steps outlined above, you should be able to recover all of disk6.

Joe L.
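To see why telling the server which slot is invalid matters, it helps to remember that Unraid's single parity drive holds the XOR of the corresponding bytes on all the data drives. The toy sketch below uses one made-up byte per "disk" (nothing like real drive sizes) to show both outcomes described above: trusting the existing parity rebuilds disk6's byte from parity plus the surviving disks, while recomputing parity from the new configuration, with a blank replacement in slot 6, bakes the blank into parity and the original byte is gone.

```shell
#!/bin/sh
# Toy model of single-parity reconstruction: each "disk" is one byte and
# parity is the XOR of every data byte. All values are made up.

# parity_of b1 b2 ... : XOR of all arguments (what the parity drive stores).
parity_of() {
    p=0
    for b in "$@"; do
        p=$(( p ^ b ))
    done
    echo "$p"
}

# rebuild_missing parity s1 s2 ... : XOR parity with every surviving disk;
# whatever is left over is the byte of the one missing disk.
rebuild_missing() {
    parity_of "$@"
}

# Three healthy disks plus disk6, whose byte is 200.
good_parity=$(parity_of 17 42 99 200)

# Trusted parity (the invalidslot route): disk6 comes back.
echo "rebuilt from trusted parity: $(rebuild_missing "$good_parity" 17 42 99)"     # -> 200

# Parity recomputed with a blank (0) drive in slot 6: the 200 is lost.
new_parity=$(parity_of 17 42 99 0)
echo "rebuilt from recomputed parity: $(rebuild_missing "$new_parity" 17 42 99)"  # -> 0
```

The same XOR relationship is why nothing may be written to the other drives between the failure and the rebuild: any change on a surviving disk that parity does not reflect would corrupt the reconstructed disk6.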
April 30, 2009

BTW, this brings up something I have been meaning to mention. When upgrading a drive, don't use the array until the upgrade has finished. Consider killing Samba and NFS so that no one can map or write to the drives over your network. If you don't change any data on the other drives in the interim, you can recover from a drive failure during the upgrade process.
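One way to follow the "kill Samba and NFS" advice from the console is sketched below. The Samba part reuses the killall smbd nmbd command that appears later in the thread; stopping NFS with rpc.nfsd 0 is an assumption here (it is how nfs-utils stops the kernel NFS server threads, but Unraid may manage NFS differently), and still_running is a hypothetical helper that relies on pgrep being installed. It simply reports any sharing daemon that is still alive, so you know whether the network can still reach the array.

```shell
#!/bin/sh
# Hypothetical helpers to keep the network off the array during an upgrade.
# Assumptions: pgrep is installed; NFS is the kernel server from nfs-utils.

# Print every named process that is still running; return 1 if any are.
still_running() {
    found=0
    for name in "$@"; do
        if pgrep -x "$name" >/dev/null 2>&1; then
            echo "$name"
            found=1
        fi
    done
    [ "$found" -eq 0 ]
}

stop_sharing() {
    killall smbd nmbd 2>/dev/null || true   # Samba (command from the thread)
    rpc.nfsd 0 2>/dev/null || true          # NFS: assumed nfs-utils behaviour
    sleep 2                                 # give the daemons time to exit
    if still_running smbd nmbd nfsd; then
        echo "Samba and NFS are stopped"
    else
        echo "still running (listed above); don't start the upgrade yet" >&2
        return 1
    fi
}
```

Run stop_sharing before assigning the new drive; if it lists anything, track that process down before starting the rebuild.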
April 30, 2009 (Author)

I have attached the syslog (zipped, as it was 12MB) and have a screenshot of the Devices page. I am no longer getting a response from the server, so I am not sure whether it stopped gracefully. The telnet session is still responsive; should I issue a shutdown command?

I appreciate the timely response to this post.

Thanks,
Kevin
April 30, 2009

Yes, you can easily see the errors in the syslog: tons of them, from trying to read the failed disk.

See if the emhttp process is still running. It might be, or it might have been killed off when the syslog used up all the available space in memory.

Now that you have captured the syslog, we can power down as I said. Make the copy of the "config" folder now, before you power down, if you have not yet done so.

You can "try" to take the array off-line cleanly by typing the following series of commands:

cd
killall smbd nmbd
sync
for disk in /mnt/disk*
do
    umount "$disk"
done
mdcmd stop

Then you can power down by typing:

poweroff

Of course, if you have the "powerdown" add-on package installed, you can just type:

powerdown

as it does all of the above individual commands for you.

If a drive cannot be unmounted, it is probably "busy" (it has an open file, or it is the current directory of some process), and you will not be able to cleanly stop the array. I would not let that worry you too much, since you will be forcing the array to trust parity anyway later, when you tell it that disk6 is the one that is invalid.

Joe L.
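The same clean-stop sequence can be wrapped so that it tells you which disk is busy instead of leaving you to spot a failed umount. This is a hedged sketch, not a copy of what the powerdown package does: it assumes fuser (from psmisc) and a procps-style ps are installed, and MNT_GLOB and MDCMD are hypothetical overrides so it can be tried without a real array. Unlike the plain series of commands, it only runs mdcmd stop if every disk unmounted; otherwise it leaves the array alone, which, as noted above, is not fatal here because parity will be trusted later anyway.

```shell
#!/bin/sh
# Hypothetical wrapper around the clean-stop commands from the post.
# Assumptions: fuser (psmisc) and a procps-style ps are installed;
# MNT_GLOB and MDCMD may be overridden for testing.
MDCMD="${MDCMD:-mdcmd}"
MNT_GLOB="${MNT_GLOB:-/mnt/disk*}"

# "See if the emhttp process is still running."
emhttp_running() {
    ps -e -o comm= 2>/dev/null | grep -qx emhttp
}

# Unmount every data disk; name (and show users of) any that are busy.
unmount_disks() {
    busy=0
    for disk in $MNT_GLOB; do             # unquoted so the glob expands
        [ -d "$disk" ] || continue        # the glob matched nothing
        if ! umount "$disk"; then
            echo "busy: $disk" >&2
            fuser -vm "$disk" >&2 || true # list processes holding it open
            busy=1
        fi
    done
    return "$busy"
}

clean_stop() {
    cd "${HOME:-/}" || return 1           # like the plain "cd": leave /mnt/diskN
    killall smbd nmbd 2>/dev/null || true
    sync
    if unmount_disks; then
        "$MDCMD" stop
    else
        echo "array not stopped cleanly; see busy disks above" >&2
        return 1
    fi
}
```

If clean_stop reports a busy disk, the fuser listing shows which process to kill before trying again; poweroff (or powerdown) still comes afterwards, as in the post.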
April 30, 2009 (Author)

I performed the steps as directed, and it appears to be rebuilding drive 6. As you said, the reconstruction may take a while (~900 minutes).

Thanks again for the accurate instructions and rapid response.

Kevin
April 30, 2009

Let us know how it works out. Glad it is able to save you some effort. (Re-ripping 250Gig of media is not fun.)

Might I suggest several things once you are stable again:

1. Monthly full parity checks. Not to check parity, but to detect bad drives and let the SMART firmware on the drives fix what it can.
2. Upgrade to 4.5-beta4. It is very stable and provides a fair number of fixes since your version. 4.5-beta5 is due out shortly. (I'll bet Tom is waiting until May 1st so he does not blow out his monthly bandwidth allotment, but that is tomorrow, so hopefully it will be out soon.)

The upgrade only involves replacing two files on your flash drive. You can even rename the existing ones in case the new version has any issues. You don't need to re-configure or re-format; just download the new release, unzip it, copy the two files to the flash drive, and reboot.

Joe L.
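The upgrade described above boils down to "rename the old pair, copy in the new pair, reboot". A minimal sketch follows. It assumes the two files are bzimage and bzroot at the top of the flash drive (mounted at /boot), which is the usual layout for 4.x releases, but treat those names as an assumption and check the release's own instructions. upgrade_flash is a hypothetical helper: it checks that both new files exist before touching anything and keeps the old ones under a dated name, so rolling back is just renaming them again.

```shell
#!/bin/sh
# Hypothetical helper for the two-file upgrade described above.
# Assumption: the two files are bzimage and bzroot in the flash drive's root.

upgrade_flash() {
    new_dir="$1"                   # folder holding the unzipped new release
    flash="${2:-/boot}"            # where the flash drive is mounted
    stamp="${3:-$(date +%Y%m%d)}"  # suffix for the old copies
    # Check both new files first so a bad download changes nothing.
    for f in bzimage bzroot; do
        if [ ! -f "$new_dir/$f" ]; then
            echo "missing $new_dir/$f; flash drive left unchanged" >&2
            return 1
        fi
    done
    for f in bzimage bzroot; do
        # Keep the current file under a dated name for easy rollback.
        if [ -f "$flash/$f" ]; then
            mv "$flash/$f" "$flash/$f.$stamp" || return 1
        fi
        cp "$new_dir/$f" "$flash/$f" || return 1
    done
    sync                           # flush the copies to the flash drive
    echo "new release staged on $flash; reboot to use it"
}
```

The config folder is never touched, which matches the point that nothing needs to be re-configured or re-formatted.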
May 1, 2009 (Author)

The rebuild just completed successfully. Thanks again for the help! I will follow up on the remaining recommendations after a day or two of trouble-free operation.

Kevin