July 11, 2014

I have a failed drive, and I figured I'll take this opportunity to upgrade my largest drive to 3TB. I will also need to upgrade parity to 3TB since otherwise I won't be able to utilize the full drive; all my other drives are 2TB. I wanted to check if my steps look OK before I hose the array.

Here are the actions to be taken:

- Replace failed drive with cache drive
- Replace 2TB parity with a new 3TB drive
- Add a new 3TB data drive
- Switch old parity drive to become new cache drive

Detailed steps:

1. move stuff to safety off cache drive
2. array offline
3. disable failed drive
4. start the array so it starts with a missing drive and warns the array is unprotected
5. stop the array
6. set cache drive to be data drive that was just disabled
7. start array
8. wait till data drive rebuild is complete
9. shut down array
10. physically remove failed drive, install new drives
11. start array
12. pre-clear new data drives
13. array offline
14. assign first new 3TB drive as parity
15. start array
16. wait till data drive rebuild is complete
17. array offline
18. add second new 3TB drive to array as data drive
19. assign old parity drive as cache drive
20. array online
21. rejoice

About right?

edit: updated to incorporate suggestions in the first two replies.
July 11, 2014

Am I right in the following two assumptions?

- Your cache drive is at least as large as the failed drive and you wish to rebuild the failed drive onto it?
- You want to end up with one more drive than you have at present (and this will be a 3TB drive).

On that basis here are some comments on the proposed steps:

- After step 3 you should start the array (so it starts with a missing drive and warns the array is unprotected). Stop the array again and proceed with step 4.
- Step 8 could be done at any time after step 3. I assume that it is delayed just for convenience?

You could also remove quite a few steps by following the swap-disable process, where you upgrade the parity drive and rebuild a failed drive at the same time using the old parity drive. This would remove the need for you to do anything with the cache disk. This basically involves:

1. Carry out any confidence checks on the 3TB drives that you want to do prior to using them in anger. You could probably do this by carrying out a pre_clear on both the 3TB disk to be used as replacement parity and the other 3TB to be added as an additional drive. These could be done in parallel. Also, if necessary, the failed disk could be removed if its space is needed and put in a safe place until recovery is completed (in case it is required for emergency attempts at data recovery if a step fails).
2. Stop the array.
3. Assign the old parity drive as a replacement for the failed drive, and the new 3TB drive as parity.
4. Start the array. This will first copy the old parity drive contents to the new 3TB parity drive, and then rebuild the failed drive onto the old parity drive. The GUI should advise you that it is going to do this before you press the Start Array button.
5. Carry out a non-correcting parity check as a confidence check when the rebuild of the failed disk has completed.
6. Add the other 3TB drive as another data disk to the array. Ideally this will have been pre-cleared to make the addition fast.
7. Assuming it has not physically failed, you might want to carry out a pre_clear cycle on the 'failed disk' to see if it is a permanent failure or was just a glitch.

However, if you are happier going through the longer process you detailed then I do not see any issue.
July 11, 2014

Some additional thoughts. Probably best if you compile our suggestions and restate your intended steps for further confirmation.

Step 6 says "wait till parity is ok". Shouldn't this be something like "wait till data drive rebuild is complete"?

Also, you say you want to upgrade your largest drive to 3TB, but then step 16 says "add". Shouldn't this be a "replace" step? And then step 18 would be another "rebuild" instead of "wait till parity is ok"? Or do you actually intend to add a drive instead of upgrade an existing one?

If instead you do mean to add a precleared drive to the array then there won't be any step 18 as a precleared drive already matches parity.
July 11, 2014 (Author)

OP updated with both suggestions, thank you!

Quote: "Am I right in the following two assumptions? Your cache drive is at least as large as the failed drive and you wish to rebuild the failed drive onto it? You want to end up with one more drive than you have at present (and this will be a 3TB drive)."

That is correct. I'll take a closer look at your simpler suggestion!
July 11, 2014

Quote: "OP updated with both suggestions, thank you!"

I assume that is on your original steps rather than the swap-disable based one (as with the swap-disable approach there would not be any steps involving the cache drive)?
July 11, 2014

Your step 16 should now be parity rebuild instead of data rebuild.

If you don't want to do swap-disable as suggested by itimpi, you could shorten this by:

1. move stuff to safety off cache drive
2. stop the array
3. unassign cache drive
4. shut down array
5. physically remove failed drive, install new drives
6. start the array so it starts with a missing drive and warns the array is unprotected
7. assign old cache drive to missing data drive that was just removed
8. start array
9. wait till data drive rebuild is complete
10. pre-clear new data drives
11. array offline
12. assign first new 3TB drive as parity
13. start array
14. wait till parity drive rebuild is complete
15. array offline
16. add second new 3TB drive to array as data drive
17. assign old parity drive as cache drive
18. array online
19. rejoice

You could do the preclears in parallel at any time after step 5, but it won't make a lot of difference. The 2TB rebuild at step 9 should only take a few hours whereas one cycle of preclear on 3TB takes me about 30 hours. While waiting on the preclears you might do a non-correcting parity check after step 9 to make sure everything went OK.

I think when you assign the old parity as the new cache it should just format it, which will only take a few minutes. No need for a cache drive to be clear.
July 11, 2014

Only 1 major thing to add, but I happen to believe it's pretty important.

0. Take SMART reports of all involved drives, and examine them for issues, posting them for others to look at if you wish.

Any time you are recreating a drive using parity, you are asking every other array drive to perform a full flawless read, so it's nice to have an idea ahead of time if another drive is marginal.

Also, if you have addons, I'd disable those until the process is complete. The only thing I personally would have running would be apcupsd and screen.
July 11, 2014 (Author)

Quote: "Your step 16 should now be parity rebuild instead of data rebuild. If you don't want to do swap-disable as suggested by itimpi, you could shorten this by: [...]"

Thank you, this looks better! The SMART reports are a good idea, I'll take a look at them to see if there's anything worrying. Several hours to go on copying stuff off cache drive...
July 12, 2014 (Author)

Ran SMART reports on all good disks. Below Disk 3, which has a normal Raw Read Error Rate, no reallocated sectors, but its Seek Error Rate has a value of 71, and a very high raw value. Further down is disk 5, with 1200 reallocated sectors; but the Value is only 99 suggesting that the reallocated sector count is within limits? It also has similarly high Seek Error Rate as on Disk 3.

What do these mean? Any actions I should take before moving on with the array overhaul?

Other disks have no reallocated sectors or seek errors, and raw read error rate is at or near 100 (+/-6)

Disk 3:

ID#  ATTRIBUTE NAME           FLAG    VALUE  WORST  THRESH  TYPE      UPDATED  FAILED  RAW VALUE
  1  Raw Read Error Rate      0x000f  114    099    006     Pre-fail  Always   Never   66742376
  3  Spin Up Time             0x0003  093    092    000     Pre-fail  Always   Never   0
  4  Start Stop Count         0x0032  100    100    020     Old age   Always   Never   243
  5  Reallocated Sector Ct    0x0033  100    100    036     Pre-fail  Always   Never   0
  7  Seek Error Rate          0x000f  071    060    030     Pre-fail  Always   Never   12897904
  9  Power On Hours           0x0032  091    091    000     Old age   Always   Never   8361
 10  Spin Retry Count         0x0013  100    100    097     Pre-fail  Always   Never   0
 12  Power Cycle Count        0x0032  100    100    020     Old age   Always   Never   55
183  Runtime Bad Block        0x0032  100    100    000     Old age   Always   Never   0
184  End-to-End Error         0x0032  100    100    099     Old age   Always   Never   0
187  Reported Uncorrect       0x0032  100    100    000     Old age   Always   Never   0
188  Command Timeout          0x0032  100    100    000     Old age   Always   Never   0
189  High Fly Writes          0x003a  100    100    000     Old age   Always   Never   0
190  Airflow Temperature Cel  0x0022  069    057    045     Old age   Always   Never   31 (Min/Max 23/42)
191  G-Sense Error Rate       0x0032  100    100    000     Old age   Always   Never   0
192  Power-Off Retract Count  0x0032  100    100    000     Old age   Always   Never   22
193  Load Cycle Count         0x0032  100    100    000     Old age   Always   Never   243
194  Temperature Celsius      0x0022  031    043    000     Old age   Always   Never   31 (0 21 0 0 0)
195  Hardware ECC Recovered   0x001a  018    009    000     Old age   Always   Never   66742376
197  Current Pending Sector   0x0012  100    100    000     Old age   Always   Never   0
198  Offline Uncorrectable    0x0010  100    100    000     Old age   Offline  Never   0
199  UDMA CRC Error Count     0x003e  200    200    000     Old age   Always   Never   0
240  Head Flying Hours        0x0000  100    253    000     Old age   Offline  Never   88695369632457
241  Total LBAs Written       0x0000  100    253    000     Old age   Offline  Never   1133177506
242  Total LBAs Read          0x0000  100    253    000     Old age   Offline  Never   971490899

Disk 5:

ID#  ATTRIBUTE NAME           FLAG    VALUE  WORST  THRESH  TYPE      UPDATED  FAILED  RAW VALUE
  1  Raw Read Error Rate      0x000f  112    090    006     Pre-fail  Always   Never   42470816
  3  Spin Up Time             0x0003  093    091    000     Pre-fail  Always   Never   0
  4  Start Stop Count         0x0032  098    098    020     Old age   Always   Never   2898
  5  Reallocated Sector Ct    0x0033  099    099    036     Pre-fail  Always   Never   1200
  7  Seek Error Rate          0x000f  074    060    030     Pre-fail  Always   Never   25954151150
  9  Power On Hours           0x0032  058    058    000     Old age   Always   Never   37041
 10  Spin Retry Count         0x0013  100    100    097     Pre-fail  Always   Never   0
 12  Power Cycle Count        0x0032  100    100    020     Old age   Always   Never   233
183  Runtime Bad Block        0x0032  099    099    000     Old age   Always   Never   1
184  End-to-End Error         0x0032  100    100    099     Old age   Always   Never   0
187  Reported Uncorrect       0x0032  001    001    000     Old age   Always   Never   1160
188  Command Timeout          0x0032  100    097    000     Old age   Always   Never   8590065670
189  High Fly Writes          0x003a  084    084    000     Old age   Always   Never   16
190  Airflow Temperature Cel  0x0022  069    046    045     Old age   Always   Never   31 (Min/Max 23/43)
191  G-Sense Error Rate       0x0032  100    100    000     Old age   Always   Never   0
192  Power-Off Retract Count  0x0032  100    100    000     Old age   Always   Never   80
193  Load Cycle Count         0x0032  099    099    000     Old age   Always   Never   2911
194  Temperature Celsius      0x0022  031    054    000     Old age   Always   Never   31 (0 17 0 0 0)
195  Hardware ECC Recovered   0x001a  016    005    000     Old age   Always   Never   42470816
197  Current Pending Sector   0x0012  100    094    000     Old age   Always   Never   0
198  Offline Uncorrectable    0x0010  100    094    000     Old age   Offline  Never   0
199  UDMA CRC Error Count     0x003e  200    200    000     Old age   Always   Never   0
240  Head Flying Hours        0x0000  100    253    000     Old age   Offline  Never   108954730383465
241  Total LBAs Written       0x0000  100    253    000     Old age   Offline  Never   1061879919
242  Total LBAs Read          0x0000  100    253    000     Old age   Offline  Never   1982410072
July 12, 2014

I tend to only worry about the re-allocated (actual or pending) counts. Many raw values can be manufacturer dependent, so their significance is hard to determine.

Disk 5 with 1200 reallocated sectors is certainly one I would be looking at very closely and consider replacing. With reallocated sectors the raw value is the actual count of reallocated sectors. In principle, as long as the reallocated count stays stable the actual value does not matter, but I have found that once it gets to anything other than a relatively small value the count keeps growing. If it is continually growing then it is certainly a sign that failure might be imminent.

If the disk is still under warranty then I would expect a count that high to be sufficient for an RMA to be accepted by the manufacturer.
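The rule of thumb above (watch the raw reallocated and pending counts, and compare the normalized VALUE against THRESH) can be sketched in a few lines of Python. This is only an illustration, not part of unRAID or smartmontools: the function name `review_smart`, the set of watched attribute IDs, and the assumption that the report is laid out in the column order posted in this thread are all choices made for the example.

```python
import re

# Attribute IDs watched in this sketch: reallocated sectors, pending sectors,
# offline uncorrectable. The choice of IDs is an assumption for illustration.
WATCHED_IDS = {5, 197, 198}

# One table row: ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED, FAILED, RAW.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>.+?)\s+0x[0-9a-f]{4}\s+"
    r"(?P<value>\d{3})\s+(?P<worst>\d{3})\s+(?P<thresh>\d{3})\s+"
    r"(?:Pre-fail|Old[ _]age)\s+\S+\s+\S+\s+(?P<raw>\d+)"
)


def review_smart(report: str) -> list[str]:
    """Return human-readable findings for one drive's attribute table."""
    findings = []
    for line in report.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header or unrelated line
        value, thresh, raw = int(m["value"]), int(m["thresh"]), int(m["raw"])
        # Normalized value at or below a non-zero threshold means the
        # attribute has reached its failure limit.
        if thresh > 0 and value <= thresh:
            findings.append(f"{m['name']}: value {value} at or below threshold {thresh}")
        # For the watched counters the raw number is an actual sector count.
        if int(m["id"]) in WATCHED_IDS and raw > 0:
            findings.append(f"{m['name']}: raw count {raw}")
    return findings


# Three rows copied from the Disk 5 report in this thread.
SAMPLE = """\
  5  Reallocated Sector Ct    0x0033  099    099    036     Pre-fail  Always   Never   1200
  7  Seek Error Rate          0x000f  074    060    030     Pre-fail  Always   Never   25954151150
197  Current Pending Sector   0x0012  100    094    000     Old age   Always   Never   0
"""

for finding in review_smart(SAMPLE):
    print(finding)
```

On these rows the only finding is the reallocated sector count of 1200. The Seek Error Rate is skipped because its normalized value (74) is still above its threshold (30) and its raw value is manufacturer-specific.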
July 12, 2014 (Author)

Data rebuild on-going, and pre-clearing the first 3TB disk (2nd one is hosting my cache files).

The failed drive is a WD Green from Feb/Mar, and I ran two pre-clear cycles on it with no issues to report. I wonder if a third cycle would help in the future.

Thanks itimpi, I'll keep an eye on Disk 5 to see if it sprouts more re-allocated sectors.
July 14, 2014 (Author)

So I decided to run an extra parity check after the failed drive's data rebuild was done (after step 9), as suggested by Jonathan. Good thing I did: disk 1 has 872 errors on it now, and went red. That was a bit too close a call, as there were only a few hours between the data rebuild finishing and the array going yellow again.

I wonder if there are plans to include the option to have a second parity drive. I see a lot of talk about VMs in the future versions, which don't do anything for me, but a second parity drive would improve security for everyone who has a lot of drives in their arrays.
July 14, 2014

Quote: "I wonder if there are plans to include the option to have a second parity drive. I see a lot of talk about VMs in the future versions which don't do anything to me, but second parity drive would improve security for everyone who has a lot of drives in their arrays."

This has been on the "Wish List" for some time. What has been talked about is so-called diagonal parity for the second parity drive (as opposed to a simple additional copy of the parity). The advantage of this approach would be that in the event of a parity fail one can pin it down to the actual drive that has the faulty sector - something that is not always possible with the current single parity drive.

However, implementing this will be non-trivial so I would be extremely surprised if it appeared in the first v6 production release. Testing would have to be extremely thorough as if it went wrong there would be a high likelihood of data loss. Whether it becomes a roadmap item for beyond the initial v6 release we will have to wait and see.
July 14, 2014

Quote (edited): "The advantage of this approach would be that in the event of a parity fail one can pin it down to the actual drive that has the faulty sector - something that is not possible with the current single parity drive."

Fixed that for you. The current system cannot ever tell you for sure which drive(s) are in error. We can make educated guesses based on SMART reports and possible data corruption, but without checksum comparison on all your data, there is no way to tell which drive is wrong.
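To see why a single parity drive can rebuild one missing drive but cannot say which drive is wrong, here is a minimal Python sketch using byte-wise XOR parity over tiny in-memory "disks". It illustrates the general single-parity idea only; the helper names `xor_parity` and `mismatch_positions` and the two-byte disks are made up for the example and say nothing about how unRAID lays out data internally.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR across equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def mismatch_positions(disks: list[bytes], parity: bytes) -> list[int]:
    """Byte offsets where the stored parity disagrees with the recomputed one."""
    recomputed = xor_parity(disks)
    return [i for i, (a, b) in enumerate(zip(recomputed, parity)) if a != b]


# Three tiny hypothetical data disks and their parity.
data = [b"\x10\x20", b"\x0f\x01", b"\xaa\x55"]
parity = xor_parity(data)

# Rebuild: a missing disk is the XOR of all surviving disks plus parity.
rebuilt_disk1 = xor_parity([data[0], data[2], parity])
print(rebuilt_disk1 == data[1])

# Silent corruption: flip the same bit on disk 0 in one case, disk 2 in another.
bad_disk0 = [bytes([data[0][0] ^ 0x01, data[0][1]]), data[1], data[2]]
bad_disk2 = [data[0], data[1], bytes([data[2][0] ^ 0x01, data[2][1]])]

# Both cases produce the identical parity error, so the check alone
# cannot tell which disk holds the bad byte.
print(mismatch_positions(bad_disk0, parity))
print(mismatch_positions(bad_disk2, parity))
```

The rebuild works because the position of the missing disk is known. In the corruption case the position is unknown, and every disk (parity included) is an equally valid suspect, which is why checksums or SMART data are needed to narrow it down.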
July 15, 2014 (Author)

Quote: "This has been on the 'Wish List' for some time. What has been talked about is so-called diagonal parity for the second parity drive (as opposed to a simple additional copy of the parity). [...]"

That would be a great feature. I don't know the needs of the overall unRAID community, but I would imagine a second parity drive would be a feature benefiting a large portion of the user base, and open unRAID for even more secure applications.
July 16, 2014 (Author)

Now that I have a second dead drive, it's back to the drawing board as I'm running into further problems on how to manage data migration. I have two 3TB drives, one empty and pre-cleared installed in The Monolith, and another on my desktop as backup. The 4TB drive which will become the new parity drive is on its way.

I found good suggestions in this thread. It looks like the best option is to move the data from the second failed drive, Disk 1, to safety on the 3TB drive on my desktop. After that is done I will make a new config, excluding failed Disk 1. The new 4TB disk will be made parity, and the old 2TB parity drive will become the new cache. Whew.

Below are the updated steps; I also tried to streamline the approach.

1. move stuff to safety off cache drive
2. stop the array
3. unassign cache drive
4. shut down array
5. physically remove failed drive, install new drives
6. start the array so it starts with a missing drive and warns the array is unprotected
7. assign old cache drive to missing data drive that was just removed
8. start array
9. wait till data drive rebuild is complete <--- Disk 1 failed right after this step
10. pre-clear first 3TB data drive
11. array offline
12. move stuff away from Disk 1 to safety
13. shut down array
14. physically install 4TB drive
15. turn array on, but don't start it
16. pre-clear 4TB drive
17. reset array setup via unRAID Web UI: Utils/New Config
18. reassign the drives you want to KEEP only, leaving failed Disk 1 unassigned, set pre-cleared 4TB drive as parity, and old parity drive as a cache drive
19. start array
20. wait till parity drive rebuild is complete
21. move stuff back to array
22. run an extra parity check
23. shut down array
24. physically add second new 3TB drive to array as data drive
25. array online
26. pre-clear second 3TB drive
27. array offline
28. add second 3TB drive to the array as data drive
29. start array
30. rejoice

I would be adding the first empty pre-cleared 3TB data drive into the array in step 18.

I assume it is OK to add an empty drive and a new parity drive to a "new config" at the same time? Or should I first add the 4TB parity, let the parity drive build parity, then add the 3TB data drive and let parity rebuild again?
July 16, 2014

After step 17, "New config", you can just assign all the drives as you wish, and build parity from there. The 2nd 3TB drive can be added at that point, I see no need to build parity and then assign it.

However, after reading back through your post, I'm wondering if that 2nd 3TB drive is where your data that you copy off of disk1 is going to live temporarily.

I'm also not sure why you want to rebuild parity again in step 24. Valid parity is not disturbed by adding a data drive, that's why unraid either clears it, or you run a preclear pass to set all zeroes so parity is maintained.

If you only have 1 red ball, you should be able to copy the data off of disk1 (emulated by parity) onto your newly installed blank 3TB.

As you can tell, I'm a little confused as to where you actually are in the process, and what state your array is in right now, and what data is where.
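The point about zeroed drives can be checked with a few lines of Python: XOR-ing in a block of zeroes leaves parity unchanged, while any non-zero byte would invalidate it. This is a sketch of the arithmetic only, with hypothetical three-byte disks and a helper named `xor_parity` invented for the example.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR across equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two hypothetical existing data disks and their current, valid parity.
existing = [b"\x12\x34\x56", b"\xab\xcd\xef"]
parity_before = xor_parity(existing)

# A cleared drive is all zeroes, and x ^ 0 == x for every byte.
precleared = bytes(3)
parity_with_cleared = xor_parity(existing + [precleared])
print(parity_with_cleared == parity_before)

# A drive with even one leftover non-zero byte would not match parity.
not_cleared = b"\x00\x01\x00"
parity_with_leftovers = xor_parity(existing + [not_cleared])
print(parity_with_leftovers == parity_before)
```

This is why adding a cleared drive needs no parity rebuild, whereas a drive with old contents has to be zeroed first (or parity recomputed).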
July 16, 2014 (Author)

Quote: "After step 17, "New config", you can just assign all the drives as you wish, and build parity from there. The 2nd 3TB drive can be added at that point, I see no need to build parity and then assign it."

Thanks. I won't be adding the 2nd 3TB drive at this point, as it's hosting backups. The 1st 3TB pre-cleared drive will be added to the array at this point, see below for further.

Quote: "However, after reading back through your post, I'm wondering if that 2nd 3TB drive is where your data that you copy off of disk1 is going to live temporarily."

Correct. The 2nd 3TB drive is currently in a USB dock attached to my desktop.

Quote: "I'm also not sure why you want to rebuild parity again in step 24. Valid parity is not disturbed by adding a data drive, that's why unraid either clears it, or you run a preclear pass to set all zeroes so parity is maintained."

Good catch, deleted.

Quote: "If you only have 1 red ball, you should be able to copy the data off of disk1 (emulated by parity) onto your newly installed blank 3TB."

I can't add the 1st 3TB drive to the array as a data drive, as the current parity drive is only 2TB. I don't want to mess with utils to turn it into a 2TB drive as there are already way too many ways things can go wrong as is. Therefore I need to add the 4TB drive as new parity, and for that to happen I need to rebuild parity - and I can't rebuild parity with a failed drive in the array.
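The size constraint driving this plan (no data drive may be larger than the parity drive) is easy to sanity-check before touching the array. The sketch below is a hypothetical planning helper, not an unRAID tool; the function name, the slot names, and the sizes in TB are assumptions taken loosely from this thread.

```python
def drives_too_big_for_parity(parity_tb: float, data_tb: dict[str, float]) -> list[str]:
    """Return the slots whose drive is larger than the parity drive."""
    return [slot for slot, size in data_tb.items() if size > parity_tb]


# Hypothetical planned layout: mostly 2TB drives plus the new 3TB drive.
planned = {"disk1": 2.0, "disk2": 2.0, "disk12": 3.0}

print(drives_too_big_for_parity(2.0, planned))  # with the current 2TB parity
print(drives_too_big_for_parity(4.0, planned))  # after the 4TB parity upgrade
```

With the 2TB parity the 3TB slot is flagged, and with the 4TB parity nothing is, which matches the ordering above: upgrade parity first, then add the larger data drive.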
July 17, 2014

Ok. I think I've got a little better understanding of where you are. Apparently at this point there is not enough free space on the array as is to move the data off of drive1 onto another array drive.

So, at this point, I think you have a pretty good handle on it. Only 1 more nit I can think of right now, and that is between 21 and 22, where you add the 3TB as a data drive, you should probably run 1 preclear cycle on it after you are sure the data that was on it is safely on the array. That way you won't have to wait several hours while unraid clears it to add to the parity protected array.

On second thought, if you don't already have any backups, just keep that drive packed safely away, and get another 4TB to add to the array. That way you have a good start on a backup routine. As you have found out, unraid is NOT a backup by itself.
July 17, 2014 (Author)

Yes, you're right, I forgot to add the proper steps to add the second 3TB drive, updated. I also added another parity check round right after moving stuff back to the array to be safe.

I'm fully aware unRAID is not backup. I have multiple backups of the critical stuff, fortunately.

Thanks!
July 20, 2014 (Author)

More tweaks to the process after I realized that if I moved the old 2TB parity drive (to be replaced by a new 4TB drive) to cache, it would not work as a warm spare for the 3TB data drives, or the 4TB parity drive. I decided to bite the bullet, and ordered a second 4TB drive to use as cache. I'll use it as a warm spare, like I used the previous cache drive.

The first 3TB data drive and the 4TB parity drive to-be are now both pre-cleared with two cycles. Setting up a "new config", I moved the old parity drive to be the old Disk 1, and the first new 3TB drive to the end of the array, as Disk 12.

Doing "new config" and re-assigning all the drives is a pretty nerve-racking experience. I triple-checked that the serial numbers were correct, and that the order and slots are the same (not sure if that is necessary).

unRAID asked to format the old parity (now Disk 1) and the new 3TB drive (Disk 12), which took less than five minutes. Curiously, it didn't ask to format the new 4TB parity drive, and just went straight to parity sync after the other two drives were formatted. Working as intended I'm sure; I don't recall whether the same happened when I first set her up.

After rebuild is complete, I'll move stuff back to the array, pre-clear the 3TB drive to assign as data drive, and start RMAing the old 2TB WD Greens.
July 20, 2014

Quote: "... Curiously, it didn't ask to format the new 4TB parity drive, and just went straight to parity sync after the other two drives were formatted. Working as intended I'm sure; I don't recall whether the same happened when I first set her up. ..."

Parity does not contain any files, and so has no file system, so it does not need to be formatted. Formatting means creating an empty file system. It does not mean erasing a drive or similar. If it had wanted to format parity then I would have thought something was wrong somewhere. It is still a good idea to preclear a parity drive just to test it though.

When you get your warm spare, it is also a good idea to preclear a cache disk (as long as it's not an SSD) to test it even though a cache drive does not need to be clear (it doesn't need to match parity). A cache drive does have a file system, though, so it will be formatted.
July 21, 2014

Quote: "After rebuild is complete, I'll move stuff back to the array, pre-clear the 3TB drive to assign as data drive, and start RMAing the old 2TB WD Greens."

Since you have been having issues, I'd do a non-correcting parity check after the parity build is complete, and pull new SMART reports on all the drives. That way you have a fresh benchmark of the health of your array before you start trusting it again.
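A fresh benchmark is most useful when a later report is compared against it. The Python sketch below shows one way to do that comparison for the counters discussed earlier in the thread. It is illustrative only: the function name `grown_counters` is invented, and while the baseline numbers are the Disk 5 raw values posted above, the "later" snapshot is made-up data for the example.

```python
# Counters where growth over time is the warning sign (an assumption of this sketch).
WATCHED = ("Reallocated Sector Ct", "Current Pending Sector")


def grown_counters(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return how much each watched raw counter grew between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in WATCHED
        if name in before and name in after and after[name] > before[name]
    }


# Baseline: Disk 5 raw values from the report posted in this thread.
baseline = {"Reallocated Sector Ct": 1200, "Current Pending Sector": 0}

# Hypothetical later snapshot, invented for this example.
later = {"Reallocated Sector Ct": 1216, "Current Pending Sector": 0}

print(grown_counters(baseline, later))
print(grown_counters(baseline, baseline))
```

An empty result means the counts are stable; any non-empty result on a drive that already has a large reallocated count is a reason to plan its replacement.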
August 4, 2014 (Author)

Well, it took a while, but now the array is back up and running with new 3TB data drives, a new 4TB parity drive, and a cache drive as a warm spare.

Thanks to multiple backups, I was able to recover everything. I had some pretty scary situations, not least of which was a fire in the restaurant downstairs while I was doing all the recovery. Reminded me to get my backups up and running.

Thank you for all your help, couldn't have done this without you!