January 19, 2009

Question for you guys. I was in the middle of upgrading the parity disk to a larger one when one of the disks in my array started having a ton of errors. I'm not sure if this drive is tanking or not, but I want to restore the array with the old parity drive so I can get the info off this potentially dead drive before I change out the parity disk. My problem is that when I switched out the parity drives, the old drive still needs a parity sync, which won't finish because of the bad drive's errors. Isn't there a way to restore the previous parity drive without a parity sync? Can someone help me out with the exact steps I should take to repair this?

Another issue I'm going to run into is that the only drive I have to replace the potentially bad drive is larger than the previous parity drive (which is why I was upgrading parity). I'm somewhat at a loss as to how to fix this. Maybe move the data off the bad disk, remove the disk from the array, and then upgrade parity? I have room in the array for the data that is on the bad drive, so this could possibly work...
January 19, 2009

"Question for you guys. I was in the middle of upgrading the parity disk to a larger one when one of the disks in my array started having a ton of errors. I'm not sure if this drive is tanking or not, but I want to restore the array with the old parity drive so I can get the info off this potentially dead drive before I change out the parity disk. My problem is that when I switched out the parity drives, the old drive still needs a parity sync, which won't finish because of the bad drive's errors. Isn't there a way to restore the previous parity drive without a parity sync? Can someone help me out with the exact steps I should take to repair this?"

Just below is a procedure to restore the array with the old parity drive, but I think the first step is to figure out what happened to that data drive with errors. If you happened to capture the syslog from that session, great! Please attach it. If not, then obtain a SMART report for that drive, and post it along with your current syslog so we can check them. When you get a whole bunch of errors all of a sudden, it could be a very simple problem, like a cable that has come loose or an IRQ that was disabled, and not really a drive problem at all.

Here's a procedure that I think should work, if you don't want to wait. Re-install your old parity drive, and boot the array. If the array starts, Stop it. If a parity check begins, Cancel it immediately. Now use the Trust My Array procedure to re-validate your array. Follow the instructions very carefully, except for the very last one about clicking the Start button: there, you want to be ready to immediately click the Cancel button to abort the parity check. If you don't see a Cancel button, refresh the screen and it should appear. You don't want the parity check to make any 'corrections' to your good parity drive. This all depends on there not being any errors at the very beginning of that possibly failing drive.
"Another issue I'm going to run into is that the only drive I have to replace the potentially bad drive is larger than the previous parity drive (which is why I was upgrading parity). I'm somewhat at a loss as to how to fix this. Maybe move the data off the bad disk, remove the disk from the array, and then upgrade parity? I have room in the array for the data that is on the bad drive, so this could possibly work..."

The next step, I think, depends on what is determined about the 'bad' drive. I don't recommend a Swap-Disable procedure here until you can run a complete parity check without any errors. Since one drive produced errors, I would say there is a higher chance of other problems on other drives, and I suspect you have not run a parity check recently. The one step to take for sure: copy the most important data elsewhere.
January 19, 2009

This description is pretty vague. I am not sure if, by errors, we are talking about parity sync errors or about drive errors (the error column getting incremented). It's also not clear whether the drive has a red ball or not.

If it is drive errors, then you need to run a SMART report. If the SMART report is showing sector relocations, then you are likely really having a problem with the drive. But if the SMART report is pretty clean, with perhaps some logged errors of unknown commands, THAT means you have some sort of connectivity problem between the computer and the drive. It might be a bad cable, it might be a bad backplane, or it could mean a bad or incompatible motherboard.

I had a very nasty cabling problem that came and went (it seemed that every time I put my hand inside the case I'd get a slightly different symptom). I rebuilt (successfully) onto 2 new drives, each one failing after a time. It turned out that I needed a locking SATA cable due to a loose SATA port on the motherboard. Now that the array is healthy, I have thoroughly checked the two drives I replaced, and they are fine.

RobJ has laid out a good plan if the drive is really bad - but if it were me, I'd try to make sure that was the case before proceeding. If you do have connectivity problems, replacing the drive may not fix anything.
January 20, 2009 (Author)

Thanks for the replies, guys! Here is a copy of the syslog from the problem time. I also finally figured out how to get a SMART report on 4.4 final.

The errors I'm seeing are the ones listed in the reads, writes, errors table. I took the drive that was causing the problem and connected it with another SATA cable, and still had the same issue. Basically, it gets about 42% or so through the Parity-Sync at around 40 to 50 MB/s, then drops to 300 to 500 KB/s and starts listing errors in the table for disk 3. The drive does not have a red ball.

I was able to access the drive and copy almost all the information off of it onto another drive in the array. It would not copy one file completely. I didn't copy down the entire error message Windows gave me, but it was along the lines of "can not continue copy, lost communication with drive" or something to that effect. Sorry I don't have the exact message. I can get it if I need to, but it takes 20 minutes of copying to get the error, and I didn't want to do it again if I didn't need to.

Thanks for looking into this!

Chris

Guess the syslog is too big even after zipping it... here it is online... http://www.mediafire.com/?sharekey=a6816afc7639f1b307258ee67c679e4ae04e75f6e8ebb871
January 20, 2009

This disk is in seriously bad shape and definitely in need of replacement. There are 201 reallocated sectors and 563 sectors pending reallocation. unRAID had a terrible time with the drive - hope that parity is still usable for a drive rebuild.
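The triage reasoning used earlier in the thread (sector relocations point at the drive itself, while a clean report with only logged command errors points at cabling or the controller) can be written down as a small Python sketch. This is only an illustration of that rule of thumb, not anything unRAID or the SMART tooling does: the function name, parameters, and return strings are all made up here, and the counts are assumed to have been read off a SMART report by hand.

```python
def triage_drive(reallocated: int, pending: int, logged_command_errors: int = 0) -> str:
    """Rough triage following the rule of thumb in this thread.

    All names here are hypothetical. The inputs are assumed to be counts
    read manually from a SMART report; no thresholds beyond "non-zero"
    are implied by the thread.
    """
    if reallocated > 0 or pending > 0:
        # Relocated or pending sectors mean the media itself is degrading.
        return "drive likely failing"
    if logged_command_errors > 0:
        # Clean sector counts plus logged errors suggest a link problem.
        return "suspect cabling or controller"
    return "no evidence in SMART data"


# Counts reported for disk 3 in this thread.
print(triage_drive(reallocated=201, pending=563))
```

With the numbers reported for disk 3, the first branch applies, which matches the conclusion that the drive needs replacing and that a different SATA cable would not help.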
January 20, 2009 (Author)

Well, that isn't good... OK, so I got most of the information off the disk and am willing to sacrifice the remaining data to get the array back up and running. What steps should I take now? Is this the time to use the Restore button? I'm thinking remove the bad drive and rebuild parity on the remaining drives.
January 20, 2009

"Well, that isn't good... OK, so I got most of the information off the disk and am willing to sacrifice the remaining data to get the array back up and running. What steps should I take now? Is this the time to use the Restore button? I'm thinking remove the bad drive and rebuild parity on the remaining drives."

DO NOT PRESS THE RESTORE BUTTON!!!! It will not restore anything (horrible name). If you check the newbie section of the "Best of the Forums" (see the link in my sig), you can read about how bad the name is. Restore will invalidate your parity and will not allow unRAID to rebuild a disk.

(Continued in next post - I wanted to send this ASAP)
January 20, 2009

I would suggest powering down and physically removing the defective drive from the computer, or at least disconnecting its cable. Then reboot. The array should come online with a red ball beside the removed disk. In this mode, unRAID will simulate the failed disk. If you were to do a drive rebuild, this is what you would get.

You should go to that disk and look around at the simulated data. If it contains music, try to play a song. Movies, try to play the movie. Zip files, do an integrity test. Anything you can do to spot-check data accuracy is a good idea. If it looks bad or you are getting weird errors, we'll need to consider what to do next. If it looks good and there is important data that you have not been able to copy off, copy it off now.

You should then get a replacement disk and install it. When you bring up unRAID, the disk should have a blue ball next to it, and there should be a message that it plans to rebuild the disk. When you start the array, the rebuild will occur. (There are more detailed directions on how to do a drive rebuild in the FAQ.)
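For anyone wondering how a missing disk can be "simulated": assuming single parity computed as a bytewise XOR across the data disks, the missing disk's contents are the XOR of the parity with every remaining data disk. The Python sketch below shows the idea on tiny made-up byte strings. The disk contents and the helper name `xor_blocks` are hypothetical, and this is not unRAID's actual code. It also shows why the spot checks matter: if the parity is stale or only partly built, the simulated data comes out wrong.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Hypothetical 4-byte "disks" standing in for Disk 1, 3, 4 and 5.
disk1 = bytes([0x10, 0x20, 0x30, 0x40])
disk3 = bytes([0xDE, 0xAD, 0xBE, 0xEF])  # the disk that will be pulled
disk4 = bytes([0x01, 0x02, 0x03, 0x04])
disk5 = bytes([0x55, 0x66, 0x77, 0x88])

# Valid parity covers every data disk, including disk 3.
parity = xor_blocks([disk1, disk3, disk4, disk5])

# With disk 3 removed, its contents are simulated from parity + the rest.
simulated_disk3 = xor_blocks([parity, disk1, disk4, disk5])
print(simulated_disk3 == disk3)  # True

# Parity that was only half built (second half still zeros) gives bad data.
stale_parity = parity[:2] + bytes(2)
bad_simulation = xor_blocks([stale_parity, disk1, disk4, disk5])
print(bad_simulation == disk3)  # False
```

The second case is the reason to play the song or test the zip file before trusting a rebuild: the simulated disk is only as good as the parity behind it.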
January 21, 2009

I believe that Brian, probably because he was immediately concerned about the dangers of improper use of the Restore button, may have forgotten or not realized that you did not have a valid parity drive installed. Because you don't, the picture changes quite a bit.

I see 2 possible courses of action. One would be the course you already suggested: drop the bad Disk 3 and rebuild parity without it. That would mean clicking the Restore button, then Starting the array, which will build parity onto the new parity drive from the remaining data drives (Disk 1, Disk 4, Disk 5). The contents of Disk 3 would be lost.

The other choice is to keep Disk 3 installed, re-install the old parity drive, and do the Trust My Array procedure to re-validate your array. Just remember to immediately Cancel the parity check. At this point, you are back where you started, but with considerably better knowledge of the state of your array. I would not access Disk 3 at all, but the rest of the array can be used while you purchase a replacement drive for Disk 3, as large as or larger than the current Disk 3 but less than or equal to the size of the *old* parity drive. Once you have it and have tested it, remove the current Disk 3, install the replacement drive, and rebuild it. Once the array is good, I would run a last parity check to make sure there are no other hidden problems. Then you can return to replacing the parity drive, where all of this started.
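The size rule for the replacement drive in the second option boils down to one comparison. Here is a minimal Python sketch of it; the function name and the sizes in GB are made up for illustration, since the thread never states the actual drive sizes.

```python
def replacement_fits(replacement_gb: float, failed_disk_gb: float, parity_gb: float) -> bool:
    """True if the replacement is at least as large as the disk it replaces
    and no larger than the parity drive currently protecting the array."""
    return failed_disk_gb <= replacement_gb <= parity_gb


# Hypothetical sizes: a 500 GB Disk 3 protected by a 750 GB old parity drive.
print(replacement_fits(750, failed_disk_gb=500, parity_gb=750))   # True
print(replacement_fits(1000, failed_disk_gb=500, parity_gb=750))  # False
```

The second call is the situation described at the top of the thread: the only spare drive on hand is larger than the old parity drive, so it cannot be used to rebuild Disk 3 while the old parity drive is in place.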
January 21, 2009

No, I was just brain dead. Thanks for unearthing the error of my ways. I wasn't thinking that, since the parity build didn't finish, he had no valid parity.