August 18, 2007
So as an exercise I built my unRAID server using a 200 GB IDE drive in lieu of one of the 500 GB SATA drives that I had here, and moved some data onto the drives. This morning I pulled the 200 GB IDE drive and replaced it with the 500 GB SATA drive. The idea was to test a drive failure situation. The online documentation merely says to stop the array, power down the server, replace the drive, power up, and start the array; the data is supposed to be rebuilt. I wish I'd documented what I actually did, but it seems to me that I was presented with "missing disks" or something similar, and I wasn't able to start the array until I did some combination of restore, adding the new disk on the devices page, heck, maybe another restore, a format, and finally the lengthy parity check. Of course, when it was over, the array showed the parity disk and the two 500 GB SATA drives and, as I suspected, no data on drive 2. As this was just an exercise, I haven't lost any real data. But obviously I did something wrong here, and next time it might not be an exercise. So, explaining it like I'm a complete idiot, what EXACT steps am I supposed to take when swapping out a failed drive?
August 18, 2007
You should never use "restore" in that situation. It initializes the configuration and assumes you do not want to recover from a failed disk but instead want to eliminate it from the array. It then rebuilds parity without that disk; effectively, it is gone and forgotten. In my opinion, that button is improperly labeled. It should probably be labeled "Initialize Configuration". Assuming you have at least one data drive and a parity drive, you can simulate a failure by unassigning the data drive or, as you did, by swapping it with the replacement. You will then need to go to the devices page and assign the new drive to the slot that held the failed drive. Lastly, just start the array. (I don't remember the prompts it gives, but you might have to check a checkbox to indicate that you want to rebuild data onto the new disk.) To simulate a failure, all you really need to do is stop the array, unassign a data drive, and then restart it. It will come up in a failed mode that still gives you access to the unassigned drive's data contents. To rebuild from parity, you stop the array once more, reassign the drive to its logical slot in the array, and restart it. It will then rebuild the drive from parity.
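To see why the new drive has to go back into the failed drive's slot, and why "restore" leaves nothing to recover, here is a toy Python model of single parity. It is only a sketch: it assumes the parity disk holds the byte-wise XOR of all data disks and, for simplicity, that every disk is the same size. The function names `compute_parity` and `rebuild_missing` are hypothetical and are not part of unRAID.

```python
from __future__ import annotations

from functools import reduce


def compute_parity(disks: list[bytes]) -> bytes:
    """Toy model: parity is the byte-wise XOR of all disks (equal sizes assumed)."""
    if len({len(d) for d in disks}) != 1:
        raise ValueError("toy model assumes a non-empty list of equal-sized disks")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing data disk by XOR-ing parity with every surviving disk."""
    return compute_parity(surviving + [parity])


disk1, disk2, disk3 = b"movies", b"photos", b"music!"
parity = compute_parity([disk1, disk2, disk3])

# Correct path: disk2 fails, a blank replacement is assigned to the SAME slot,
# and its contents are regenerated from parity plus the other data disks.
rebuilt = rebuild_missing([disk1, disk3], parity)
print(rebuilt)  # b'photos'

# "Restore" path: the configuration is reinitialized and parity is recomputed
# from the remaining disks only, so disk2's contribution is dropped.
new_parity = compute_parity([disk1, disk3])
after_restore = rebuild_missing([disk1, disk3], new_parity)
print(after_restore)  # b'\x00\x00\x00\x00\x00\x00' (the old data is gone)
```

In the model, the rebuild works only because parity still includes the failed disk's contribution; once parity has been recomputed without that disk, there is nothing left to recover it from.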
August 18, 2007 (Author)
Thanks. I figured I had messed it up, which is why I wanted to try it out before I really needed to recover something. I'm going to try it again (I have a 4th drive to add now that I have my Pro key), but it seems to me that it came up with the missing disk error, and when I went to the devices page it listed the three original disks, with no option to add the brand-new disk (or remove the old one). That was probably when I went back and hit the restore button, which removed the missing disk and allowed me to add the new one. But maybe I'm wrong about the sequence of events. I'll give it another go once I'm able to get the 4th drive out of where it is now, and I'll document my steps better.
August 18, 2007
Joe, I very much agree with you about the value of clearer and/or additional explanatory text for the options that appear when something has gone wrong with our unRAID system. I think I understand the feeling behind fountainhead's subject title, although I would express it more as 'I want to be 100% confident in my understanding and decisions concerning my data'. Those 'Are you sure' check boxes are somewhat ironic, in that I doubt I have ever checked one and been 100% sure that it was the correct option to take. Sometimes I have felt about 90% sure, but other times maybe only 60% sure of my choice. It's not that the documentation is unclear or even seems lacking. After reading the online docs, I've always felt I understood them and would know what to do. But when an actual system failure happened, the options presented rarely seemed to be what I had expected, as fountainhead found above. Surprise options, apparently missing options, and unclear options do not build confidence. And at that moment, faced with the potential loss of our data, there is a strong need for a hand-holding level of assurance about what has happened and the absolutely correct steps to take. With hopefully very little programming work, it would be nice to have some explanatory text added to each option, or even general text added at the bottom of the web page, below the Command area. It could explain why these options have appeared above, and describe in more detail what each option will do and what will happen to the various drives. Ideally, in a future version, the help text will be context-sensitive: it will state precisely what has gone wrong, and precisely explain each option and its consequences.
For example: "By selecting ___this___ option, the parity drive will be unchanged/unassigned/sync'ed/checked/etc., Drive 1 will be unchanged/cleared/formatted/rebuilt/etc., and Drive N will be ...". Tom, I suggest these things with great respect. I know from long experience how quick others are to suggest 'easy' changes with very little idea of how much work the change will entail.
August 19, 2007 (Author)
I think you make some very good points, Rob. Thanks for posting. And you're correct (as was Joe) that it's not always immediately obvious to the uninitiated exactly what the ramifications of selecting certain options are. But I wanted to follow up by saying that I've tried it again. I put the IDE drive back in as a 4th drive and copied some data to it. Today I pulled it out and replaced it with another 500 GB SATA drive. This time I was able to go to the devices page, assign the new drive as drive 3, go back to the main page, start the array, and initiate the rebuild. It's still running, but I can see that the small amount of data has already been restored. I can't say what I did wrong the first time around, but I feel much better now that it seems to be working as expected.
August 19, 2007
As a former IT manager, I strongly recommend that folks PRINT the instructions for doing basic tasks, put them in one of those cheapie plastic folders, and tape the folder to the side of their server. In the biz, this is called a run book (http://searchnetworking.techtarget.com/sDefinition/0,290660,sid7_gci537193,00.html). It always seems that when something goes wrong, we are either in too much of a hurry to look up the instructions or our web service happens to be down. Given the "not as clear as they should be" button labels, the possibility of permanently screwing something up is quite real. Here is the URL: http://www.lime-technology.com/wordpress/?page_id=16 Print it, folder it, tape it, use it. BTW, I hadn't done this until I saw this thread, so consider me appropriately admonished. Cheers, Bill