
[6.9.x] Pool device replacement doesn't work

  • Solved
  • Minor

How to reproduce:

 

- start with a dual-device pool in the default raid1 profile
- stop array, assign a new device as a replacement, start array
- result: Unmountable: Too many missing/misplaced devices

 

It worked on -beta25 but doesn't since -beta30, so the regression was introduced somewhere in between. Diags below.

tower15-diagnostics-20210107-1504.zip
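
For anyone who hits the same error, a quick read-only check from the console (a minimal sketch; no Unraid-specific paths or device names assumed) will show which pool member btrfs thinks is missing:

# list detected btrfs filesystems; a degraded pool is shown with "*** Some devices missing"
btrfs filesystem show
# the kernel log shows why btrfs refused to mount
dmesg | grep -i btrfs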


Recommended Comments

Alex.b

Members

Found a workaround:

 

- stop array, remove 1 device in the pool
- start array, let btrfs work
- stop array, add replacement device
- start array, let btrfs work
- enjoy
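
For reference, a rough console equivalent of this workaround (a sketch only, using hypothetical names: sdY1 for the surviving member, sdX1 for the replacement partition, and /mnt/cache for the mount point; note the add comes before the remove here because raw btrfs won't let a raid1 drop below two devices):

# mount the surviving member degraded if the pool isn't already mounted
mount -o degraded /dev/sdY1 /mnt/cache
# add the replacement first so raid1 keeps its minimum device count
btrfs device add -f /dev/sdX1 /mnt/cache
# then drop the missing/failed member
btrfs device remove missing /mnt/cache
# rebalance so any chunks written while degraded are raid1 again
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/cache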

 


hawihoney

Members

This is how BTRFS is designed.

 

That is one of the reasons why I threw BTRFS out of my systems. Consider a broken disk in a 2-disk RAID. What is needed as the first step? Put stress on the remaining disk (let BTRFS work #1). After that you can replace the failing disk and put stress on the remaining disk again (let BTRFS work #2).

 

All other RAID systems I know about don't need #1.

 

10 minutes ago, hawihoney said:

This is how BTRFS is designed.

No, this is a bug.

Quote

It worked on -beta25 but doesn't since -beta30

 

Skylinar

Members

Any news on this one? I will have to upgrade my Cache Pool as well (RAID1).

John_M

Members

Do we know if this is a btrfs bug or a GUI bug?

It's an Unraid/GUI bug; you can still do it manually using the console.

 

 

 

Skylinar

Members

Is there any thread where I can see how it's done "manually using the console"?

 

Thanks

6 hours ago, Skylinar said:

Is there any thread where I can see how it's done "manually using the console"?

If you want, start a thread in the general support forum, post the diags and what you want to do, and I can post the instructions.

Hammer8

Members

Hi, trying to follow the discussion...this looks like the suggested workaround to replace a bad drive:

 

- stop array, remove 1 device in the pool
- start array, let btrfs work
- stop array, add replacement device
- start array, let btrfs work

 

I can do the above via the GUI, but the thread seems to suggest this is an Unraid GUI bug. Is there a way I can replace the bad disk with a good disk in one step using the CLI?

 

Thanks! 

3 hours ago, Hammer8 said:

Is there a way I can replace the bad disk with a good disk in one step using the CLI?

You can do it manually using the console to replace a device in a pool (assuming you have enough ports to keep both the old and new devices connected simultaneously):

 

You first need to partition the new device; to do that, format it using the UD plugin (any filesystem will do). Then, with the array started, type:

btrfs replace start -f /dev/sdX1 /dev/sdY1 /mnt/poolname

Replace X with the source and Y with the target, noting the 1 at the end of both. You can check replacement progress with:

btrfs replace status /mnt/poolname
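
As a concrete (hypothetical) example of the two commands above, with sdb1 as the failing source, sdc1 as the new partition, and a pool mounted at /mnt/cache:

# start the online replacement; -f allows a target that already contains a filesystem
btrfs replace start -f /dev/sdb1 /dev/sdc1 /mnt/cache
# monitors progress until the replace finishes (add -1 to print the status once and exit)
btrfs replace status /mnt/cache
# optional: if the new device is larger, grow it afterwards (devid from 'btrfs filesystem show')
btrfs filesystem resize <devid>:max /mnt/cache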

When it's done you need to reset the pool assignments. To do that:

- Stop the array; if Docker/VM services are using that pool, disable them first.
- Unassign all pool devices, then start the array to make Unraid "forget" the current pool config.
- Stop the array and reassign all pool devices, now including the replacement (there can't be an "All existing data on this device will be OVERWRITTEN when array is Started" warning for any pool device).
- Re-enable Docker/VMs if needed and start the array.

 

 

Hammer8

Members

Thank you very much.  I will give it a try and report back.  Really appreciate your help!  I must say I am very grateful to the members of this forum who are so helpful in making a complex product like UnRAID much more manageable for newbies like myself. 

Hammer8

Members

Hi, when I try to do these steps:

 

- stop array, remove 1 device in the pool
- start array, let btrfs work
- stop array, add replacement device
- start array, let btrfs work

 

I stopped the array and deselected the faulty drive from the pool using the drop-down box, but when I go to start the array it says "Start will remove the missing cache disk and then bring the array on-line."

 

Should I check the box to okay the removal of the missing cache disk to start the array?  I can't tell if the message means removing the faulty disk I just deselected (which I want to do) or removing the entire pool (which I don't want to do).

 

Thanks!

 

 

Hammer8

Members

Hi, so I went ahead and started the array without the bad disk in the pool, and while the array started, all the drives in the pool are labeled as "Unmountable: Too many missing/misplaced devices".

 

I've also tried scrubbing the pool, and while many errors were fixed, some were uncorrectable.  The pool is formatted as RAID6, so I'm wondering why a single drive failure can lead to this.

 

Any thoughts on what I should do next to try and recover?

 

Thanks!
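
If the pool still mounts, a few read-only checks (hypothetical mount point /mnt/cache; none of these modify the pool) can help narrow down what's failing before posting diagnostics:

# summary of the last scrub, including corrected vs. uncorrectable errors
btrfs scrub status /mnt/cache
# per-device I/O and corruption counters, to spot which member is actually failing
btrfs device stats /mnt/cache
# chunk profiles and free space, to confirm data and metadata are really raid6
btrfs filesystem usage /mnt/cache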

On 10/16/2021 at 9:09 AM, Hammer8 said:

Any thoughts on what I should do next to try and recover?

Please start a new thread in the general support section; don't forget to include the diagnostics (after array start).

DarkKnight

Members

Bumping this, since I am experiencing another cache drive failure and this really important basic feature isn't working. I did pre-clear these drives, but they are a set of 3 used (though in-warranty) enterprise drives I bought from a guy on Facebook. He shipped them to me in a bubble-wrapped envelope. SMFH.

 

Anyway, I'd really like to not have to nuke my cache pool or dip into the command line when I need to do a drive replacement. I took a look through all the 6.10-rc2 bug reports and a lot of them look much more serious than this, but this is also a really basic feature. Seems like the secondary cache pool should have stayed a beta feature until recovery was correctly worked out IMHO. 


On 2/15/2022 at 10:18 AM, JorgeB said:

This is supposedly getting fixed in the next release.

 

I have been testing the latest internal build, and can report the issue is solved.

Please wait until rc3 is released.

 
