rattlehead05 Posted December 8, 2011 I have a very small, low-power, high-performance system that I'm testing unRAID on. It has two drive bays, both populated: one is set to parity, the other to disk1. Because I like to tinker and test, I removed the disk1 drive during a read operation. The system switched to parity; I never saw a service hiccup. Now I am having trouble re-introducing the disk1 drive back into the system. The system sees the drive, and I set it back to disk1, but it has a red status indicator. How do I force the system to use the drive again?
lionelhutz Posted December 8, 2011 initconfig or the equivalent on the web interface if you are using the beta.

Moderator Comment: I don't usually edit an existing post, but since the above advice would lead to data loss in most cases where a drive has failed, I've struck it out and would suggest anyone reading this thread read a few more posts in this thread before doing anything to recover a drive in their array that shows as "disabled" (red indicator). Joe L.
rattlehead05 (Author) Posted December 8, 2011 I see; I was under the impression that the initconfig command would make data on the drives inaccessible. I noticed that the system has to rebuild parity after running initconfig. I wonder if data would be lost if I was writing to a drive while it was removed. Time to test that. Thanks for the quick response!
Joe L. Posted December 8, 2011

Quote: initconfig or the equivalent on the web interface if you are using the beta.

That is NOT the best advice. The disk is showing as "red" because a write to it failed. Forcing it back into operation (by setting a new disk configuration, invalidating parity, and re-calculating parity based on the improperly written contents) will leave you without the correct contents of the failed drive.

To restore the disk to service you either need to replace it with a new drive, and then unRAID will reconstruct it based on parity and the other data drives, OR force unRAID to forget the model/serial number of the failed drive so that the next time it sees it, it will treat it as a new drive and re-construct the correct contents onto it. To do that:

1. Stop the array.
2. Un-assign the failed drive.
3. Start the array with the disk un-assigned (this forces unRAID to forget the old model/serial number of the failed drive).
4. Stop the array once more.
5. Re-assign the failed drive. (It will then be considered a "new" drive, since its model/serial number will not be recognized.)
6. Start the array once more.

When you start the array that second time, the failed/replaced data disk will be completely re-constructed based on parity in combination with all the other data disks. When the re-construction is complete you will be protected from a disk failure once more.

You would almost NEVER use "initconfig" or its equivalent in the Utilities tab of the management console unless you are removing a disk from the array and not replacing it. It IMMEDIATELY invalidates parity, and reconstruction of a failed drive would then be impossible. The only reason it may appear to work in this case is that the data disk does have data on it, even though we are guaranteed it is corrupted in some way. (Remember, the write to it failed, so we know the disk has bad contents somewhere.)
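The rebuild Joe L. describes relies on parity being the byte-wise XOR of all data drives. As a rough illustration only (plain Python, not actual unRAID code; all function names here are invented for the example), a missing drive's block is recovered by XOR-ing the parity block with the corresponding blocks of every surviving data drive:

```python
# Illustrative sketch only -- unRAID's real implementation lives in its
# kernel md driver; these helpers are invented for the demonstration.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def compute_parity(data_blocks: list[bytes]) -> bytes:
    """Parity is the XOR of the corresponding blocks on all data drives."""
    parity = bytes(len(data_blocks[0]))
    for block in data_blocks:
        parity = xor_blocks(parity, block)
    return parity

def reconstruct(parity: bytes, surviving_blocks: list[bytes]) -> bytes:
    """Recover a failed drive's block from parity plus the surviving drives."""
    block = parity
    for b in surviving_blocks:
        block = xor_blocks(block, b)
    return block

# Two-bay array (one parity + one data drive): with no other data drives,
# the reconstructed block is simply the parity block itself.
disk1 = b"\x01\x02\x03\x04"
parity = compute_parity([disk1])
assert reconstruct(parity, []) == disk1
```

In the OP's two-bay case the "combination with all the other data disks" is empty, which is why the parity drive alone can rebuild disk1.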
rattlehead05 (Author) Posted December 8, 2011 Great information! I was wondering why parity would rebuild itself instead of restoring data to the new drive; it seemed like the wrong way to be doing things. As soon as the parity drive finishes rebuilding, I will test this method. Thanks for the response!
Joe L. Posted December 8, 2011

Quote: I see, I was under the impression that the initconfig command would make data on the drives inaccessible.

It immediately invalidates parity and sets a new disk configuration based on the currently assigned and working disks. When you next start the array, parity is completely re-calculated based on the new configuration of disks.

Quote: I noticed that the system has to rebuild parity after running initconfig.

Exactly, as you specifically invalidated any prior parity calculations by invoking the new configuration command.

Quote: I wonder if data would be lost if I was writing to a drive while it was removed.

Yes. The disk was disabled when a write to it failed. (You may think it was not being written to, but it was.) You would lose the contents of everything written. That could potentially be the entire contents of the disk if it failed within minutes of being put in service but you continued to fill it with your files. The contents would look fine, as they would be simulated by parity in combination with all the other data disks, but the failed disk would have hardly anything written to it.

Quote: Time to test that. Thanks for the quick response!

You can test, but as I said, drives are not taken off-line on a read failure; they are taken off-line when a write to them fails. On a read failure, the desired contents are re-constructed from parity in combination with the other data drives. In the event of an unreadable sector on the disk, the SMART firmware on the disk would mark the sector as pending re-allocation. To assist in this data fix-up, when unRAID sees a "read" failure, it writes that same re-constructed sector back to the disk. If re-allocation of that sector was set by the SMART firmware, that should then allow the correct contents of the sector to be written to the re-allocated sector. That self-healing of unreadable sectors is only possible with a RAID system that knows (or can re-construct) the correct contents.
Joe L.
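The self-healing read path described above can be sketched as follows. This is a hedged, in-memory illustration only (not unRAID source; the `Disk` class and function names are invented for the demo): on a read error, the sector is reconstructed from parity plus the other data drives and written back so the firmware gets a chance to re-allocate it.

```python
# Hedged sketch of the self-healing read path described above.
# Not unRAID code: Disk, read_sector, etc. are invented for this demo.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class Disk:
    def __init__(self, data: dict[int, bytes]):
        self.sectors = data
        self.bad = set()            # sector numbers that fail to read

    def read(self, n: int) -> bytes:
        if n in self.bad:
            raise IOError(f"read error on sector {n}")
        return self.sectors[n]

    def write(self, n: int, block: bytes) -> None:
        self.sectors[n] = block     # a real disk may re-allocate the sector here
        self.bad.discard(n)

def read_sector(target: Disk, n: int, parity: Disk, others: list[Disk]) -> bytes:
    """Read a sector; on failure, reconstruct it from parity + the other
    data drives, then write it back (the corrective write Joe L. mentions)."""
    try:
        return target.read(n)
    except IOError:
        block = parity.read(n)
        for d in others:
            block = xor_blocks(block, d.read(n))
        target.write(n, block)      # if THIS write fails, the disk is disabled
        return block
```

In the two-bay case `others` is empty, so the reconstructed sector is simply the parity sector; and it is the failure of that final corrective write, not the read error itself, that turns the indicator red.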
rattlehead05 (Author) Posted December 8, 2011 I see what you mean. I am only able to use two physical disks in this instance, so I would not be able to fail over share write ops to a different drive. Oh well, the software still does what I need it to do. Thanks for the clarification!
rattlehead05 (Author) Posted December 8, 2011 Oh nice, it does still do failover writing with only one data drive. I had the system running with one parity drive and disk1, began transferring an ISO file to a share, pulled disk1 out in the middle of the transfer, and the copy continued uninterrupted. After I shut down the array and had the software rebuild disk1, the completed ISO file was in the correct place.
Joe L. Posted December 8, 2011

Quote: I see what you mean. I am only able to use two physical disks in this instance, so I would not be able to fail over share write ops to a different drive. Oh well, the software still does what I need it to do. Thanks for the clarification!

There is no such thing as a "failover". unRAID will never write to a different data drive if one in the array has failed. ALL writes involve only two drives: the specific data drive being written to, and the parity drive. If the drive being written to is disabled because a write to it has failed, then only the parity drive is written. Parity in this case is based on the re-constructed block of data at the location being written (as calculated from parity and all the remaining data drives) combined with the new block being written.

If your one data drive fails, the parity disk is still written to exactly as if the data drive were still online. There is no difference. You can still read and write to the array; you probably would never even notice the difference in most cases. (If you had multiple data drives, you would see them all spin up so they could be read to reconstruct the data on the failed drive being accessed.)

When you re-construct, the parity drive (usually in combination with all the other remaining data drives) is used to reconstruct the data of the failed drive. Since you have no other data drives, the parity drive alone is used to re-construct your data on the one data drive being re-constructed.
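The two-drive write Joe L. describes is a read-modify-write parity update. As a hedged sketch (plain Python, not unRAID code; function names are invented): XOR-ing the old data block out of parity and the new block in updates parity correctly without touching any other drive, and if the data drive is disabled, only the parity side of the operation is actually written.

```python
# Hedged sketch of the two-drive parity update described above.
# Not unRAID code; names are invented for the illustration.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def new_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """XOR out the old block, XOR in the new one. No other data drive
    needs to be read, so only two drives are involved in a write."""
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)

# If the target data drive is disabled, its old block is first reconstructed
# from parity + the remaining data drives; then ONLY the parity drive is
# written, leaving the array consistent for a later rebuild of that drive.
```

This is why the OP's ISO transfer continued uninterrupted after pulling disk1: every write still landed (in simulated form) on the parity drive, and the rebuild later materialized the completed file on the physical disk.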
lionelhutz Posted December 8, 2011 He posted that he pulled the drive during a read operation to test that the data would be simulated from parity, and that he now wanted to put the disk back. If this is true, then an initconfig is a fine way to put the disk back. If the OP also wrote to the simulated disk and wanted to keep that new data, then he should have said so. Don't edit my posts and comment that my advice is wrong when I answer the OP's question.
Joe L. Posted December 8, 2011

Quote: He posted that he pulled the drive during a read operation to test that the data would be simulated from parity, and that he now wanted to put the disk back. If this is true, then an initconfig is a fine way to put the disk back. If the OP also wrote to the simulated disk and wanted to keep that new data, then he should have said so. Don't edit my posts and comment that my advice is wrong when I answer the OP's question.

As I said, failed reads do not take a disk off-line. He said he removed the drive while it was being read. Read failures do not take a disk off-line; they just show on the statistics web page in the management utility as read errors. The only way a drive is disabled (marked as "red") is if a write to it has failed. Therefore, a write failure had occurred. This is not a matter of opinion, or my thoughts vs. yours: unRAID detected that a write failed, and it disabled the disk.

I apologize for editing your post, but the advice to use "initconfig", even if it has some potential to be correct in this specific thread since the user is just playing with his array and has no critical data stored on it, is in my opinion basically bad advice for a less experienced unRAID user who searches in the future for "how to re-introduce a failed drive into an array" and finds it. The better advice is to re-construct the failed disk, not invalidate parity and assume the failed disk's contents are perfectly fine.

Joe L.
lionelhutz Posted December 8, 2011 What happens to the disk indicator when you physically disconnect a drive, check the "I'm sure I want to start" box and then start the array? I'm pretty sure the disk indicator doesn't stay green... I'm also positive you can not stop the array and re-assign the disk and have it accepted again... all without doing any writing to the server. Also, have you pulled, i.e. unplugged, a disk while it was in use? Does it stay green if you do that?

As for confusing some newbie that searches: I answer the question. I don't answer every thread assuming someone with a different problem will read it in the future, not understand the thread, yet still follow what the answer says. Otherwise, every answer needs to be a long explanation covering any possible similar circumstance that might come up in the future.

And finally, I would rather see the OP re-use his data disk if possible rather than reconstruct it. What happens to the rebuild if his parity isn't correct? The data disk either has problems or all data is lost. I consider it riskier to rebuild a disk compared to rebuilding parity and wouldn't recommend doing it when it's not necessary.

Peter
Joe L. Posted December 9, 2011

Quote: What happens to the disk indicator when you physically disconnect a drive, check the "I'm sure I want to start" box and then start the array? I'm pretty sure the disk indicator doesn't stay green...

If the disk was disconnected before you rebooted, it would be marked as DISK_MISSING (different from DISK_DISABLED, but still a "red" indicator). If you disconnected the disk after booting, but before starting the array, the indicator will turn red, but that is because the first "write" to the disk fails. That first write occurs when the disk is mounted: whenever any file-system is mounted, unless it is mounted read-only, its superblock is updated, and so the disk is "written". You can see this by rebooting your array and looking at the read and write statistics; you will see that all the data disks have had writes to them. Read failures do not force the disk to be disabled; that first failed write does.

The user said he pulled the disk while it was being read. The read failure would result in parity and all the other disks being used to compute the disk block that could not be read. It is my understanding that unRAID then writes the computed block back so the SMART firmware can re-allocate the sector if it can. If that write fails, the disk is taken out of service.

Quote: I'm also positive you can not stop the array and re-assign the disk and have it accepted again...

A different disk will be accepted. It is only because we are trying to get unRAID to forget that it is the same disk that we have to jump through hoops.

Quote: All without doing any writing to the server.

Even though you are not writing files, the disks are being written to.

Quote: Also, have you pulled, i.e. unplugged, a disk while it was in use? Does it stay green if you do that?

No, it will turn red, but only because the corrective "write" fails.

Quote: As for confusing some newbie that searches: I answer the question. I don't answer every thread assuming someone with a different problem will read it in the future, not understand the thread, yet still follow what the answer says. Otherwise, every answer needs to be a long explanation covering any possible similar circumstance that might come up in the future. And finally, I would rather see the OP re-use his data disk if possible rather than reconstruct it. What happens to the rebuild if his parity isn't correct? The data disk either has problems or all data is lost. I consider it riskier to rebuild a disk compared to rebuilding parity and wouldn't recommend doing it when it's not necessary. Peter

I know you are helpful, and I do not want to make any more of an issue of this other than to explain my position. I've helped enough users through awful messes they've gotten into because they half-read what has been written in other threads and in the wiki; in some cases, stuff I've written. I've learned that a simple answer is not enough in many cases. This, I felt, was one where the initialization of a new disk configuration was the wrong answer. The user expressed surprise that parity was being re-computed rather than the data disk being re-constructed. I explained why. Please accept my explanation, even if you do not agree with the best way to handle a disk that is failing.

I do see your point in not re-constructing a disk if you suspect parity is wrong. In that situation, it would be better to recover the data first. No matter what we say in this thread, each situation must take into consideration all that is known. That task is not always easy when someone's priceless family photos are the files at risk.
This topic is now archived and is closed to further replies.