Failed data drive, switching it with cache and upgrading parity and data drives

I have a failed drive, and I figured I'll take this opportunity to upgrade my largest drive to 3TB. I will also need to upgrade parity to 3TB since otherwise I won't be able to utilize the full drive; all my other drives are 2TB. I wanted to check if my steps look ok before I hose the array.

 

Here are the actions to be taken:

 

  • Replace failed drive with cache drive
  • Replace 2TB parity with a new 3TB drive
  • Add a new 3TB data drive
  • Switch old parity drive to become new cache drive

 

Detailed steps:

  1. move stuff to safety off cache drive
  2. array offline
  3. disable failed drive
  4. start the array so it starts with a missing drive and warns the array is unprotected
  5. stop the array
  6. set cache drive to be data drive that was just disabled
  7. start array
  8. wait till data drive rebuild is complete
  9. shut down array
  10. physically remove failed drive, install new drives
  11. start array
  12. pre-clear new data drives
  13. array offline
  14. assign first new 3TB drive as parity
  15. start array
  16. wait till data drive rebuild is complete
  17. array offline
  18. add second new 3TB drive to array as data drive
  19. assign old parity drive as cache drive
  20. array online
  21. rejoice

 

About right?

 

edit: updated to incorporate suggestions in two first replies.

Am I right in the following two assumptions?

  • Your cache drive is at least as large as the failed drive and you wish to rebuild the failed drive onto it?
  • You want to end up with one more drive than you have at present (and this will be a 3TB drive).

 

On that basis here are some comments on the proposed steps:

  • After step 3 you should start the array (so it starts with a missing drive and warns the array is unprotected)
  • Stop the array again and proceed with step 4.
  • Step 8 could be done at any time after step 3. I assume that it is delayed just for convenience?

You could also remove quite a few steps by following the swap-disable process, where you upgrade the parity drive and rebuild a failed drive at the same time using the old parity drive. This would remove the need for you to do anything with the cache disk. This basically involves:

  • Carry out any confidence checks on the 3TB drives that you want to do prior to using them in anger. You could probably do this by carrying out a pre_clear on both the 3TB disk to be used as replacement parity and the other 3TB disk to be added as an additional drive. These could be done in parallel. Also, if necessary, the failed disk could be removed if its space is needed and put in a safe place until recovery is completed (in case it is required for emergency attempts at data recovery if a step fails).
  • Stop the array
  • Assign the old parity drive as a replacement for the failed drive, and the new 3TB drive as parity.
  • Start the array - this will first copy the old parity drive contents to the new 3TB parity drive, and then rebuild the failed drive onto the old drive. The GUI should advise you that it is going to do this before you press the Start Array button.
  • Carry out a non-correcting parity check as a confidence check when the rebuild of the failed disk has completed
  • Add the other 3TB drive as another data disk to the array.  Ideally this will have been pre-cleared to make the addition fast.
  • Assuming it has not physically failed you might want to carry out a pre_clear cycle on the 'failed disk' to see if it is a permanent failure or was just a glitch

However if you are happier going through the longer process you detailed then I do not see any issue.

Some additional thoughts. Probably best if you compile our suggestions and restate your intended steps for further confirmation.

 

Step 6 says "wait till parity is ok". Shouldn't this be something like "wait till data drive rebuild is complete"?

 

Also, you say you want to upgrade your largest drive to 3TB, but then step 16 says "add". Shouldn't this be a "replace" step? And then step 18 would be another "rebuild" instead of "wait till parity is ok"? Or do you actually intend to add a drive instead of upgrade an existing one? If instead you do mean to add a precleared drive to the array then there won't be any step 18 as a precleared drive already matches parity.

 

  • Author

OP updated with both suggestions, thank you!

 

Am I right in the following two assumptions?

  • Your cache drive is at least as large as the failed drive and you wish to rebuild the failed drive onto it?
  • You want to end up with one more drive than you have at present (and this will be a 3TB drive).

 

That is correct. I'll take a closer look at your simpler suggestion!

OP updated with both suggestions, thank you!

I assume that is on your original steps rather than the swap-disable based one (as on the swap-disable approach there would not be any steps involving the cache drive)?

Your step 16 should now be parity rebuild instead of data rebuild. If you don't want to do swap-disable as suggested by itimpi, you could shorten this by:

  1. move stuff to safety off cache drive
  2. stop the array
  3. unassign cache drive
  4. shut down array
  5. physically remove failed drive, install new drives
  6. start the array so it starts with a missing drive and warns the array is unprotected
  7. assign old cache drive to missing data drive that was just removed
  8. start array
  9. wait till data drive rebuild is complete
  10. pre-clear new data drives
  11. array offline
  12. assign first new 3TB drive as parity
  13. start array
  14. wait till parity drive rebuild is complete
  15. array offline
  16. add second new 3TB drive to array as data drive
  17. assign old parity drive as cache drive
  18. array online
  19. rejoice

You could do the preclears in parallel at any time after step 5, but it won't make a lot of difference. The 2TB rebuild at step 9 should only take a few hours whereas one cycle of preclear on 3TB takes me about 30 hours. While waiting on the preclears you might do a non-correcting parity check after step 9 to make sure everything went OK.

 

I think when you assign the old parity as the new cache it should just format it, which will only take a few minutes. No need for a cache drive to be clear.

 

Only 1 major thing to add, but I happen to believe it's pretty important.

 

0. Take SMART reports of all involved drives, and examine them for issues, posting them for others to look at if you wish.

 

Any time you are recreating a drive using parity, you are asking every other array drive to perform a full flawless read, so it's nice to have an idea ahead of time if another drive is marginal.
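To illustrate why every surviving drive has to read cleanly, here is a minimal sketch that assumes single parity is a plain bytewise XOR across the data drives. The three-byte "disks" and the `xor_blocks` helper are made up for illustration and are not how unRAID is implemented internally.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three toy "data drives" and the parity computed across them.
disks = [b"\x10\x20\x30", b"\x0f\x0e\x0d", b"\xaa\xbb\xcc"]
parity = xor_blocks(disks)

# Rebuilding a failed disk 1 needs parity plus EVERY surviving disk.
rebuilt = xor_blocks([parity, disks[0], disks[2]])
assert rebuilt == disks[1]

# If a surviving disk returns even one wrong byte during the rebuild,
# the rebuilt disk silently inherits the error at that position.
bad_disk0 = b"\x11\x20\x30"
corrupt = xor_blocks([parity, bad_disk0, disks[2]])
assert corrupt != disks[1]
```

This is why a marginal drive matters most during a rebuild: its read errors end up in the reconstructed data.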

 

Also, if you have addons, I'd disable those until the process is complete. The only thing I personally would have running would be apcupsd and screen.

  • Author


Thank you, this looks better!

 

The SMART reports are a good idea, I'll take a look at them to see if there's anything worrying.

 

Several hours to go on copying stuff off cache drive...

  • Author

Ran SMART reports on all good disks. Below is Disk 3, which has a normal Raw Read Error Rate and no reallocated sectors, but its Seek Error Rate has a value of 71 and a very high raw value.

 

Further down is disk 5, with 1200 reallocated sectors; but the Value is only 99 suggesting that the reallocated sector count is within limits? It also has similarly high Seek Error Rate as on Disk 3.

 

What do these mean? Any actions I should take before moving on with the array overhaul?

 

Other disks have no reallocated sectors or seek errors, and raw read error rate is at or near 100 (+/-6)

 

Disk 3:

ID#	ATTRIBUTE NAME	FLAG	VALUE	WORST	THRESH	TYPE	UPDATED	FAILED	RAW VALUE
1	Raw Read Error Rate	0x000f	114	099	006	Pre-fail	Always	Never	66742376
3	Spin Up Time	0x0003	093	092	000	Pre-fail	Always	Never	0
4	Start Stop Count	0x0032	100	100	020	Old age	Always	Never	243
5	Reallocated Sector Ct	0x0033	100	100	036	Pre-fail	Always	Never	0
7	Seek Error Rate	0x000f	071	060	030	Pre-fail	Always	Never	12897904
9	Power On Hours	0x0032	091	091	000	Old age	Always	Never	8361
10	Spin Retry Count	0x0013	100	100	097	Pre-fail	Always	Never	0
12	Power Cycle Count	0x0032	100	100	020	Old age	Always	Never	55
183	Runtime Bad Block	0x0032	100	100	000	Old age	Always	Never	0
184	End-to-End Error	0x0032	100	100	099	Old age	Always	Never	0
187	Reported Uncorrect	0x0032	100	100	000	Old age	Always	Never	0
188	Command Timeout	0x0032	100	100	000	Old age	Always	Never	0
189	High Fly Writes	0x003a	100	100	000	Old age	Always	Never	0
190	Airflow Temperature Cel	0x0022	069	057	045	Old age	Always	Never	31 (Min/Max 23/42)
191	G-Sense Error Rate	0x0032	100	100	000	Old age	Always	Never	0
192	Power-Off Retract Count	0x0032	100	100	000	Old age	Always	Never	22
193	Load Cycle Count	0x0032	100	100	000	Old age	Always	Never	243
194	Temperature Celsius	0x0022	031	043	000	Old age	Always	Never	31 (0 21 0 0 0)
195	Hardware ECC Recovered	0x001a	018	009	000	Old age	Always	Never	66742376
197	Current Pending Sector	0x0012	100	100	000	Old age	Always	Never	0
198	Offline Uncorrectable	0x0010	100	100	000	Old age	Offline	Never	0
199	UDMA CRC Error Count	0x003e	200	200	000	Old age	Always	Never	0
240	Head Flying Hours	0x0000	100	253	000	Old age	Offline	Never	88695369632457
241	Total LBAs Written	0x0000	100	253	000	Old age	Offline	Never	1133177506
242	Total LBAs Read	0x0000	100	253	000	Old age	Offline	Never	971490899

 

Disk 5:

ID#	ATTRIBUTE NAME	FLAG	VALUE	WORST	THRESH	TYPE	UPDATED	FAILED	RAW VALUE
1	Raw Read Error Rate	0x000f	112	090	006	Pre-fail	Always	Never	42470816
3	Spin Up Time	0x0003	093	091	000	Pre-fail	Always	Never	0
4	Start Stop Count	0x0032	098	098	020	Old age	Always	Never	2898
5	Reallocated Sector Ct	0x0033	099	099	036	Pre-fail	Always	Never	1200
7	Seek Error Rate	0x000f	074	060	030	Pre-fail	Always	Never	25954151150
9	Power On Hours	0x0032	058	058	000	Old age	Always	Never	37041
10	Spin Retry Count	0x0013	100	100	097	Pre-fail	Always	Never	0
12	Power Cycle Count	0x0032	100	100	020	Old age	Always	Never	233
183	Runtime Bad Block	0x0032	099	099	000	Old age	Always	Never	1
184	End-to-End Error	0x0032	100	100	099	Old age	Always	Never	0
187	Reported Uncorrect	0x0032	001	001	000	Old age	Always	Never	1160
188	Command Timeout	0x0032	100	097	000	Old age	Always	Never	8590065670
189	High Fly Writes	0x003a	084	084	000	Old age	Always	Never	16
190	Airflow Temperature Cel	0x0022	069	046	045	Old age	Always	Never	31 (Min/Max 23/43)
191	G-Sense Error Rate	0x0032	100	100	000	Old age	Always	Never	0
192	Power-Off Retract Count	0x0032	100	100	000	Old age	Always	Never	80
193	Load Cycle Count	0x0032	099	099	000	Old age	Always	Never	2911
194	Temperature Celsius	0x0022	031	054	000	Old age	Always	Never	31 (0 17 0 0 0)
195	Hardware ECC Recovered	0x001a	016	005	000	Old age	Always	Never	42470816
197	Current Pending Sector	0x0012	100	094	000	Old age	Always	Never	0
198	Offline Uncorrectable	0x0010	100	094	000	Old age	Offline	Never	0
199	UDMA CRC Error Count	0x003e	200	200	000	Old age	Always	Never	0
240	Head Flying Hours	0x0000	100	253	000	Old age	Offline	Never	108954730383465
241	Total LBAs Written	0x0000	100	253	000	Old age	Offline	Never	1061879919
242	Total LBAs Read	0x0000	100	253	000	Old age	Offline	Never	1982410072

I tend to only worry about the re-allocated (actual or pending) counts. Many raw values can be manufacturer-dependent and therefore their significance is hard to determine.

 

Disk 5 with 1200 reallocated sectors is certainly one I would be looking at very closely and consider replacing. With reallocated sectors the raw value is the actual count of reallocated sectors. In principle, as long as the reallocated count stays stable the actual value does not matter, but I have found that once it gets to anything other than a relatively small value the count keeps growing. If it is continually growing then it is certainly a sign that failure might be imminent. If the disk is still under warranty then I would expect a count that high to be sufficient for an RMA to be accepted by the manufacturer.
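As a rough sketch of that rule of thumb, the snippet below scans a SMART attribute table for non-zero raw counts on the reallocated, pending and offline-uncorrectable attributes (IDs 5, 197 and 198). It assumes the tab-separated layout pasted in this thread; real `smartctl` output is space-aligned and would need different parsing. The function name `flag_sector_problems` is hypothetical.

```python
WATCHED_IDS = {5, 197, 198}  # reallocated, pending, offline uncorrectable

def flag_sector_problems(report):
    """Return {attribute name: raw count} for watched attributes with raw > 0."""
    flagged = {}
    for line in report.strip().splitlines():
        fields = line.split("\t")
        if len(fields) < 10 or not fields[0].strip().isdigit():
            continue  # skip the header row and anything malformed
        attr_id = int(fields[0])
        raw = int(fields[9].split()[0])  # raw column may carry extra text
        if attr_id in WATCHED_IDS and raw > 0:
            flagged[fields[1]] = raw
    return flagged

# Four rows copied from the Disk 5 report above.
disk5_excerpt = "\n".join([
    "5\tReallocated Sector Ct\t0x0033\t099\t099\t036\tPre-fail\tAlways\tNever\t1200",
    "194\tTemperature Celsius\t0x0022\t031\t054\t000\tOld age\tAlways\tNever\t31 (0 17 0 0 0)",
    "197\tCurrent Pending Sector\t0x0012\t100\t094\t000\tOld age\tAlways\tNever\t0",
    "198\tOffline Uncorrectable\t0x0010\t100\t094\t000\tOld age\tOffline\tNever\t0",
])
print(flag_sector_problems(disk5_excerpt))  # {'Reallocated Sector Ct': 1200}
```

Running the same check on reports taken a few days apart would show whether the count is stable or growing.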

  • Author

Data rebuild on-going, and pre-clearing the first 3TB disk (2nd one is hosting my cache files).

 

The failed drive is a WD Green from Feb/Mar, and I ran two pre-clear cycles on it with no issues to report. I wonder if a third cycle would help in the future.

 

Thanks itimpi, I'll keep an eye on Disk 5 to see if it sprouts more re-allocated sectors.

  • Author

So I decided to run an extra parity check after failed drive data rebuild was done (after step 9) as suggested by Jonathan. Good thing I did: disk 1 has 872 errors on it now, and went red.

 

That was a bit too close a call, as there were only a few hours between the data rebuild finishing and the array going yellow again  :o

 

I wonder if there are plans to include the option to have a second parity drive. I see a lot of talk about VMs in future versions, which don't do anything for me, but a second parity drive would improve security for everyone who has a lot of drives in their arrays.

I wonder if there are plans to include the option to have a second parity drive. I see a lot of talk about VMs in future versions, which don't do anything for me, but a second parity drive would improve security for everyone who has a lot of drives in their arrays.

This has been on the "Wish List" for some time.  What has been talked about is so-called diagonal parity for the second parity drive (as opposed to a simple additional copy of the parity).  The advantage of this approach would be that in the event of a parity fail one can pin it down to the actual drive that has the faulty sector - something that is not always possible with the current single parity drive.

 

However, implementing this will be non-trivial so I would be extremely surprised if it appeared in the first v6 production release. Testing would have to be extremely thorough as if it went wrong there would be a high likelihood of data loss.  Whether it becomes a roadmap item for beyond the initial v6 release we will have to wait and see.

The advantage of this approach would be that in the event of a parity fail one can pin it down to the actual drive that has the faulty sector - something that is not possible with the current single parity drive.

Fixed that for you. The current system cannot ever tell you for sure which drive(s) are in error. We can make educated guesses based on SMART reports and possible data corruption, but without checksum comparison on all your data, there is no way to tell which drive is wrong.

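A small sketch of why a single XOR parity cannot locate the bad drive, again assuming plain XOR parity and using made-up one-byte "drives": the mismatch seen by a parity check is identical no matter which drive holds the flipped bit.

```python
from functools import reduce

def xor_all(values):
    """XOR a sequence of integers together."""
    return reduce(lambda a, b: a ^ b, values, 0)

data = [0x10, 0x0F, 0xAA]   # one byte from each of three toy data drives
parity = xor_all(data)      # what the parity drive stores for this position

def mismatch_after_flip(bad_index, flip_mask=0x04):
    """Corrupt one drive, then return what a parity check would observe."""
    damaged = list(data)
    damaged[bad_index] ^= flip_mask
    return xor_all(damaged) ^ parity

mismatches = [mismatch_after_flip(i) for i in range(len(data))]
print(mismatches)  # [4, 4, 4]
```

The check reports that something is wrong at that position, but the result carries no information about which drive caused it.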
  • Author

That would be a great feature. I don't know the needs of the overall unRAID community, but I would imagine a second parity drive would be a feature benefiting a large portion of the user base, and open unRAID for even more secure applications.

  • Author

Now that I have a second dead drive, it's back to the drawing board as I'm running into further problems on how to manage data migration. I have two 3TB drives, one empty and pre-cleared installed in The Monolith, and another on my desktop as backup. The 4TB drive which will become the new parity drive is on its way.

 

I found good suggestions in this thread. It looks like the best option is to move the data from the second failed Disk 1 to safety to the 3TB drive on my desktop. After that is done I will make a new config, excluding failed Disk 1. The new 4TB disk will be made parity, and old 2TB parity drive will become new cache.

 

Whew.

 

Updated steps are below (steps 11 onward are new); I also tried to streamline the approach.

 

  1. move stuff to safety off cache drive
  2. stop the array
  3. unassign cache drive
  4. shut down array
  5. physically remove failed drive, install new drives
  6. start the array so it starts with a missing drive and warns the array is unprotected
  7. assign old cache drive to missing data drive that was just removed
  8. start array
  9. wait till data drive rebuild is complete <--- Disk 1 failed right after this step
  10. pre-clear first 3TB data drive
  11. array offline
  12. move stuff away from Disk 1 to safety
  13. shut down array
  14. physically install 4TB drive
  15. turn array on, but don't start it
  16. pre-clear 4TB drive
  17. reset array setup via unRAID Web UI: Utils/New Config
  18. reassign the drives you want to KEEP only, leaving failed Disk 1 unassigned, set pre-cleared 4TB drive as parity, and old parity drive as a cache drive
  19. start array
  20. wait till parity drive rebuild is complete
  21. move stuff back to array
  22. run an extra parity check
  23. shut down array
  24. physically add second new 3TB drive to array as data drive
  25. array online
  26. pre-clear second 3TB drive
  27. array offline
  28. add second 3TB drive to the array as data drive
  29. start array
  30. rejoice

 

I will add the first empty, pre-cleared 3TB data drive to the array in step 18. I assume it is OK to add an empty drive and a new parity drive to a "new config" at the same time? Or should I first add the 4TB parity drive, let it build parity, then add the 3TB data drive and let parity rebuild again?

After step 17, "New config", you can just assign all the drives as you wish, and build parity from there. The 2nd 3TB drive can be added at that point, I see no need to build parity and then assign it.

 

However, after reading back through your post, I'm wondering if that 2nd 3TB drive is where your data that you copy off of disk1 is going to live temporarily.

 

I'm also not sure why you want to rebuild parity again in step 24. Valid parity is not disturbed by adding a data drive, that's why unraid either clears it, or you run a preclear pass to set all zeroes so parity is maintained.
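A toy demonstration of that point, assuming plain XOR parity: an all-zero drive contributes nothing to the XOR, so existing parity stays valid, whereas a drive with any leftover data would not match. The byte strings and the `parity_of` helper are illustrative only.

```python
def parity_of(disks):
    """Bytewise XOR parity across equal-length toy disks."""
    out = bytearray(len(disks[0]))
    for disk in disks:
        for i, byte in enumerate(disk):
            out[i] ^= byte
    return bytes(out)

array = [b"\x10\x20\x30", b"\x0f\x0e\x0d"]
before = parity_of(array)

# A cleared (all-zero) drive leaves parity exactly as it was.
precleared = bytes(3)
assert parity_of(array + [precleared]) == before

# A drive with leftover data would invalidate the existing parity.
not_cleared = b"\x00\x01\x00"
assert parity_of(array + [not_cleared]) != before
```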

 

If you only have 1 red ball, you should be able to copy the data off of disk1 (emulated by parity) onto your newly installed blank 3TB.

 

As you can tell, I'm a little confused as to where you actually are in the process, and what state your array is in right now, and what data is where.

  • Author

After step 17, "New config", you can just assign all the drives as you wish, and build parity from there. The 2nd 3TB drive can be added at that point, I see no need to build parity and then assign it.

 

Thanks. I won't be adding the 2nd 3TB drive at this point, as it's hosting backups. The 1st 3TB pre-cleared drive will be added to the array at this point, see below for further.

 

However, after reading back through your post, I'm wondering if that 2nd 3TB drive is where your data that you copy off of disk1 is going to live temporarily.

 

Correct. The 2nd 3TB drive is currently in a USB dock attached to my desktop.

 

I'm also not sure why you want to rebuild parity again in step 24. Valid parity is not disturbed by adding a data drive, that's why unraid either clears it, or you run a preclear pass to set all zeroes so parity is maintained.

 

Good catch, deleted.

 

If you only have 1 red ball, you should be able to copy the data off of disk1 (emulated by parity) onto your newly installed blank 3TB.

 

I can't add the 1st 3TB drive to the array as a data drive, as the current parity drive is only 2TB. I don't want to mess with utils to turn it into a 2TB drive as there are already way too many ways things can go wrong as is. Therefore I need to add the 4TB drive as new parity, and for that to happen I need to rebuild parity - and I can't rebuild parity with a failed drive in the array.
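The size constraint behind this can be sketched as a simple check: no data drive may be larger than the parity drive. The drive list below is a hypothetical stand-in for the array, and `oversized_data_drives` is an illustrative helper, not an unRAID tool.

```python
def oversized_data_drives(parity_tb, data_drives_tb):
    """Return the data drive sizes (in TB) that exceed the parity drive."""
    return [size for size in data_drives_tb if size > parity_tb]

# Hypothetical layout: a few 2TB data drives plus the new 3TB drive.
planned = [2.0, 2.0, 2.0, 3.0]

print(oversized_data_drives(2.0, planned))  # [3.0]  -> 2TB parity is too small
print(oversized_data_drives(4.0, planned))  # []     -> 4TB parity covers everything
```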

Ok. I think I've got a little better understanding of where you are. Apparently at this point there is not enough free space on the array as is to move the data off of drive1 onto another array drive.

 

So, at this point, I think you have a pretty good handle on it. Only 1 more nit I can think of right now, and that is between 21 and 22, where you add the 3TB as a data drive, you should probably run 1 preclear cycle on it after you are sure the data that was on it is safely on the array. That way you won't have to wait several hours while unraid clears it to add to the parity protected array.

 

On second thought, if you don't already have any backups, just keep that drive packed safely away, and get another 4TB to add to the array. That way you have a good start on a backup routine.

 

As you have found out, unraid is NOT a backup by itself.

  • Author

Yes, you're right, I forgot to add the proper steps to add the second 3TB drive, updated. I also added another parity check round right after moving stuff back to the array to be safe.

 

I'm fully aware unRAID is not backup. I have multiple backups of the critical stuff, fortunately.

 

Thanks!

  • Author

More tweaks to the process after I realized that if I moved the old 2TB parity drive (to be replaced by a new 4TB drive) to cache, it would not work as a warm spare for the 3TB data drives, or the 4TB parity drive. I decided to bite the bullet, and ordered a second 4TB drive to use as cache. I'll use it as a warm spare, like I used the previous cache drive.

 

The first 3TB data drive and 4TB parity drive to-be are now both pre-cleared with two cycles. Setting up a "new config" I moved the old parity drive to be the old Disk 1, and the first new 3TB drive to the end of the array, as Disk 12.

 

Doing "new config" and re-assigning all the drives is a pretty nerve-racking experience :) I triple-checked that the serial numbers were correct, and that the order and slots were the same (not sure if that is necessary). unRAID asked to format the old parity (now Disk 1) and the new 3TB drive (Disk 12), which took less than five minutes.

 

Curiously, it didn't ask to format the new 4TB parity drive, and just went straight to parity sync after the other two drives were formatted. Working as intended I'm sure; I don't recall whether the same happened when I first set her up.

 

After rebuild is complete, I'll move stuff back to the array, pre-clear the 3TB drive to assign as data drive, and start RMAing the old 2TB WD Greens.

...

Curiously, it didn't ask to format the new 4TB parity drive, and just went straight to parity sync after the other two drives were formatted. Working as intended I'm sure; I don't recall whether the same happened when I first set her up.

...

Parity does not contain any files, and so has no file system, so it does not need to be formatted. Formatting means creating an empty file system. It does not mean erasing a drive or similar. If it had wanted to format parity then I would have thought something was wrong somewhere. It is still a good idea to preclear a parity drive just to test it though.

 

When you get your warm spare, it is also a good idea to preclear a cache disk (as long as it's not an SSD) to test it even though a cache drive does not need to be clear (it doesn't need to match parity). A cache drive does have a file system, though, so it will be formatted.

After rebuild is complete, I'll move stuff back to the array, pre-clear the 3TB drive to assign as data drive, and start RMAing the old 2TB WD Greens.

Since you have been having issues, I'd do a non-correcting parity check after the parity build is complete, and pull new SMART reports on all the drives. That way you have a fresh benchmark of the health of your array before you start trusting it again.

  • 2 weeks later...
  • Author

Well, it took a while, but now the array is back up and running with new 3TB data drives, new 4TB parity and cache drive as a warm spare  :) Thanks to multiple backups, I was able to recover everything. I had some pretty scary situations, not least of which was a fire in the restaurant downstairs while I was doing all the recovery. Reminded me to get my backups up and running.

 

Thank you for all your help, couldn't have done this without you!

Archived

This topic is now archived and is closed to further replies.
