Better drive replacement tools



Recently one of the drives in my pool went into pre-failure, showing multiple SMART errors. First thing I did was empty the contents of the drive to prevent any potential data loss. After the drive was emptied I got an alert that my parity drive was entering pre-failure as well.

 

I purchased 2 new drives to place into the system and started by replacing the drive in the pool. The array had to do a full rebuild for the 0GB drive that took 24 hours. After that I put in the new parity drive, which took another 24 hours to sync up, which was understandable. Finally I went to remove the failing parity drive and was notified that I would need to rebuild the entire parity drive again, wasting ANOTHER 24 hours.

At this point I am looking at a total of 4 million useless reads/writes off my entire array and 72 hours of downtime.

 

This seems incredibly inefficient to do and is not acceptable at all. There must be a better way that the system can handle removing one of two parity drives without resulting in 48 hours of downtime and millions of pointless writes.

Link to comment
18 minutes ago, unconcerned-beloved6864 said:

First thing I did was empty the contents of the drive to prevent any potential data loss

Why? You should have just rebuilt the drive instead of emptying it. That was the first and main thing that caused a lot of useless activity.

 

Wish you had asked for advice; we could have given you some options.

 

I also wonder why you had multiple disks giving problems. Do you have Notifications set up to alert you immediately as soon as a problem is detected? Don't allow one ignored problem to become multiple problems and data loss.

 

If you attach your diagnostics to your NEXT post in this thread we can take a look to see if you have any other potential issues.

 

 

Link to comment

Thanks for the reply, I appreciate it. This has been frustrating for me and I know the frustration was likely caused by my own doing. I am trying not to let my frustration get the best of me here and apologize if it comes through.

 

Quote

Why? You should have just rebuilt the drive instead of emptying it.

Probably the way I should have done it. However, when I originally got the first alert I was planning to just remove it from the pool and not replace it. After I got the second alert for the newer, larger parity drive I decided that I should just get 2 drives and do them both.

 

Quote

Wish you had asked for advice; we could have given you some options.

I don't know why I'd ever reach out for advice on replacing two drives; the process should have been pretty straightforward.

 

Quote

I also wonder why you had multiple disks giving problems.

The 2TB drives in my system are approaching 6-7 years old, and the 8TB is just over 2 years old. The 8TB failure was unexpected, and the drive probably could have lived for a while longer as it had only 8 uncorrectable sectors. However, since its use as a parity device is critical, I opted for replacement.


 

Quote

If you attach your diagnostics to your NEXT post in this thread we can take a look to see if you have any other potential issues.

I've rebooted recently so I don't expect there to be much present here, but attached.


Also to note, I did find online that when making changes to the array via "new config" there should have been an option to check saying that parity is valid. For some reason I didn't have this option but after a reboot it did! I figured while time consuming, I should run a parity check to be safe. 
 

titan-02-diagnostics-20230915-1258.zip

Link to comment
4 minutes ago, unconcerned-beloved6864 said:

when making changes to the array via "new config" there should have been an option to check saying that parity is valid

If you in fact do make changes to the array with New Config, parity will not be valid and must be rebuilt.

 

1 hour ago, unconcerned-beloved6864 said:

had to do a full rebuild for the 0GB drive that took 24 hours

Parity doesn't know anything about how much data is on a disk, it is all just bits and has to be completely rebuilt.

 

And

14 minutes ago, unconcerned-beloved6864 said:

use as a parity device is critical

Parity is arguably the least important disk since it contains none of your data. Parity by itself can recover nothing. All disks must be reliably read to reliably rebuild a disk.

 

Parity, whether Unraid or some other implementation, is just an extra bit that allows a missing bit to be calculated from all the other bits.
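To make the points above concrete, here is a minimal sketch of single XOR parity in Python. It is a generic illustration, not Unraid's actual code: the function names and the tiny three-byte "disks" are made up for the example. It shows that a missing disk is recalculated from all the other disks plus parity, that an empty (all-zero) disk still has to be rebuilt position by position, and that parity on its own recovers nothing.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_disks, parity):
    """Recalculate one missing disk from every surviving disk plus parity."""
    return xor_blocks(list(surviving_disks) + [parity])


# Hypothetical three-disk array; disk 1 is "empty" (all zero bytes).
disks = [b"\x12\x34\x56", b"\x00\x00\x00", b"\xab\xcd\xef"]
parity = xor_blocks(disks)

# Losing a full disk: every other disk and parity must be read in full.
assert rebuild([disks[1], disks[2]], parity) == disks[0]

# Losing the empty disk: the same full pass is needed, the result just
# happens to be zeros. Parity has no idea the disk held no files.
assert rebuild([disks[0], disks[2]], parity) == disks[1]

# Parity alone is not a copy of any disk, so it recovers nothing by itself.
assert parity not in disks
```

The rebuild loop touches every byte position regardless of what is stored there, which is why the length of a rebuild follows disk size and not the amount of data on the disk.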

 

https://docs.unraid.net/unraid-os/manual/what-is-unraid/#parity-protected-array

 

Since you seemed to have some misunderstandings, please ask for advice in the future.

 

I see you are running an extended self-test on disk6. Probably a good idea to do that with your other old disks too.

 

Do you have another copy of everything important and irreplaceable? Parity is not a backup.

 

Link to comment
  • Solution

Thanks for the help.


After looking online it does seem like there are ways to remove 1 of 2 parity drives without leaving the array unprotected but this is not done by default. My original feature request stands, it would be nice to be able to remove 1 of 2 parity drives without potentially putting data at risk.
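As a rough illustration of why removing one of two parity drives does not have to leave the array unprotected, here is a sketch of a generic P+Q dual-parity scheme in Python. This is an assumption-laden toy, not Unraid's implementation: the GF(2^8) polynomial 0x11D, the generator 2, and all function names are choices made for the example. The point is only that the two parities are computed independently, so discarding Q leaves P valid.

```python
from functools import reduce


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11D."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return product


def p_parity(disks):
    """First parity: plain XOR across all disks."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*disks))


def q_parity(disks):
    """Second parity: XOR of each disk scaled by a per-disk coefficient."""
    out = bytearray(len(disks[0]))
    coeff = 1
    for disk in disks:
        for pos, byte in enumerate(disk):
            out[pos] ^= gf_mul(coeff, byte)
        coeff = gf_mul(coeff, 2)  # next power of the generator
    return bytes(out)


# Hypothetical two-byte "disks".
disks = [b"\x10\x20", b"\x0a\x0b", b"\xff\x01"]
p = p_parity(disks)
q = q_parity(disks)

# Pretend the Q drive is pulled. P was never derived from Q, so it can
# still rebuild any single missing data disk on its own.
rebuilt = p_parity([disks[0], disks[2], p])
assert rebuilt == disks[1]
```

Under these assumptions the remaining parity needs no recalculation after the other one is removed, which is the behaviour the feature request is asking for by default.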


Additionally, I understand how parity bits, backups and SMART error detection work, and I know that 2 parity drives are better than 1. It is incredibly insulting to be talked down by anyone, let alone a moderator. Once I can afford it I will be moving away from Unraid.

 

I am closing this as the solution and turning off notifications.

Link to comment

Just to chime in here with my two cents:

 

I can see both points of view here. You seem to not understand some things, so asking for advice is probably a good idea in a case like this in future.

 

That said, I've requested better drive management features in Unraid before. All too often the advice is "replace it and let parity rebuild", which to me (working in enterprise storage) is a terrible default option. Yes, homelab is different to enterprise, but leaving user data in a zero-redundancy state shouldn't be the default option. Unraid should have native evac/replace type functionality, and not rely on a parity rebuild to save the day, falling back on "well you should have a backup" if something goes wrong.

Link to comment
On 9/15/2023 at 4:23 PM, unconcerned-beloved6864 said:

talked down by anyone

I apologize, it certainly wasn't my intention. If you understand parity you are way ahead of most Unraid users.

 

But then why did you decide to exercise parity so much by emptying the disk when rebuild would have put everything back in sync in one operation?

Link to comment
  • 2 weeks later...

I think the spirit of this FR gets at something that is missing native to Unraid.  It would be nice to have a wizard that guides you through what you are trying to do.  In this case, it would have been nice for OP to have selected drive > what do you want to do? > replace it and then the wizard walks you through the steps and what to expect, even stopping and starting the array.

 

The current process is "just know it" or google it, but for the spirit of Unraid, the OS itself should be capable of guiding you through an expected mechanic of owning an Unraid system, changing out disks either for maintenance or upgrading.
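As a purely hypothetical sketch of what such a guided flow could look like, here is a small Python example. Nothing in it is an Unraid API: the `Step` type, the `plan` function and the `hot_swap` flag are invented for illustration, and the step wording is borrowed from the parity replacement procedure quoted later in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    text: str
    needs_power_cycle: bool = False  # can be skipped with hot-swap bays


# Hypothetical "replace parity disk" flow, based on the quoted procedure.
REPLACE_PARITY_STEPS = [
    Step("Stop the array."),
    Step("Power down the server.", needs_power_cycle=True),
    Step("Install the new larger parity disk."),
    Step("Power up the server.", needs_power_cycle=True),
    Step("Assign the larger disk to the parity slot."),
    Step("Start the array."),
]


def plan(steps, hot_swap: bool):
    """Return the step texts a wizard would show for this hardware."""
    return [s.text for s in steps if not (hot_swap and s.needs_power_cycle)]


# Show the shortened flow for a server with hot-swap bays.
for number, text in enumerate(plan(REPLACE_PARITY_STEPS, hot_swap=True), 1):
    print(f"{number}. {text}")
```

A real wizard would also need to check array state and explain what to expect at each step; this sketch only shows the idea of the OS holding the step list instead of the user.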

  • Upvote 2
Link to comment
  • 4 weeks later...
On 10/2/2023 at 1:03 PM, user2579 said:

I think the spirit of this FR gets at something that is missing native to Unraid.  It would be nice to have a wizard that guides you through what you are trying to do.  In this case, it would have been nice for OP to have selected drive > what do you want to do? > replace it and then the wizard walks you through the steps and what to expect, even stopping and starting the array.

 

The current process is "just know it" or google it, but for the spirit of Unraid, the OS itself should be capable of guiding you through an expected mechanic of owning an Unraid system, changing out disks either for maintenance or upgrading.

Absolutely 100% agree.  I've had problems with drive replacements as well and the current process/steps seem VERY fragile.  It needs a wizard.

Link to comment
  • 4 months later...

Just came by to request this same feature. Seeing as this is something most people won't do often enough to commit to long-term memory, it would be nice to not have to google and cross your fingers to do a couple of basic maintenance tasks (or emergency maintenance tasks) with possibly old/unconfirmed answers and the potential of doing real damage to your setup.

Link to comment
5 hours ago, trurl said:

Wizards can help users do the wrong things. Replacing a disk isn't always the solution. If you are uncertain, better to ask for help.


Replacing a parity drive with a newer, larger one shouldn't require a 30-minute google journey or pleading for help on a Discord server. There should either be concise instructions easily accessible in the GUI, or a built-in function to do so.

Link to comment
2 hours ago, JonathanM said:

 

  1. Search should work.  It doesn't.  At all. 
  2. I'm pretty sure the process described in there doesn't work as that's what I initially did.  Unraid said "wrong disk" when I tried to assign the new drive parity.
  3. I wound up having to stop the array,
    1. put the old drive back in,
    2. start the array,
    3. stop the array,
    4. unassign it,
    5. put the new drive in,
    6. do "new config",
    7. assign the new drive as parity.

Piece of cake and totally intuitive.   OR there could be a function "Replace/Upgrade Parity Drive(s)" that covers that.

Link to comment
2 minutes ago, tiny-e said:

Piece of cake and totally intuitive. 

Replacing parity is basically the same as replacing a data disk. Rebuild is all the same parity calculation across all the other disks.

 

New Config not required at all.

Link to comment
27 minutes ago, trurl said:

Replacing parity is basically the same as replacing a data disk. Rebuild is all the same parity calculation across all the other disks.

 

New Config not required at all.

Well... tell that to the "wrong disk" message.  The array refused to start until I did the hokey pokey described above.   Which is why I feel there should be a function for this or some accurate, specific, instructions. 

 

I did exactly this *see below*  (minus the power down/up as I have hot swap bays) --  Didn't work.  Wound up searching / going to the discord (that didn't work either.. but usually does).   Finally stumbled across an article somewhere that got me back up and running. 

Quote

The procedure to remove a parity drive is as follows:

  1. Stop the array.
  2. Power down the server.
  3. Install new larger parity disks. Note if you do this as your first step then steps 2 & 4 listed here are not needed.
  4. Power up the server.
  5. Assign the larger disk to the parity slot (replacing the former parity device).
  6. Start the array



To that end, I've spent enough of my life on this issue to last me for a while. So, take my feedback into consideration, or don't. I'm a relatively happy customer, but if customers take the time to suggest things they'd like to see, I'd at least pretend to care about their feedback.

Link to comment
