
[6.9.x] Pool device replacement doesn't work

  • Solved
  • Minor

How to reproduce:

 

- start with a dual-device pool in the default raid1 profile
- stop array, assign a new device as a replacement, start array
- result: Unmountable: Too many missing/misplaced devices

 

It worked on -beta25 but doesn't since -beta30, so the regression was introduced somewhere in between. Diags below.

tower15-diagnostics-20210107-1504.zip
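
For anyone who hits the same error, a quick read-only check from the console (a minimal sketch; no Unraid-specific paths or device names assumed) will show which pool member btrfs thinks is missing:

# list detected btrfs filesystems; a degraded pool is shown with "*** Some devices missing"
btrfs filesystem show
# the kernel log shows why btrfs refused to mount
dmesg | grep -i btrfs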


Recommended Comments

Alex.b

Members

Found a workaround:

 

- stop array, remove 1 device in the pool
- start array, let btrfs work
- stop array, add replacement device
- start array, let btrfs work
- enjoy
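
For reference, a rough console equivalent of this workaround (a sketch only, using hypothetical names: sdY1 for the surviving member, sdX1 for the replacement partition, and /mnt/cache for the mount point; note the add comes before the remove here because raw btrfs won't let a raid1 drop below two devices):

# mount the surviving member degraded if the pool isn't already mounted
mount -o degraded /dev/sdY1 /mnt/cache
# add the replacement first so raid1 keeps its minimum device count
btrfs device add -f /dev/sdX1 /mnt/cache
# then drop the missing/failed member
btrfs device remove missing /mnt/cache
# rebalance so any chunks written while degraded are raid1 again
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/cache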

 


hawihoney

Members

This is how BTRFS is designed.

 

That is one of the reasons why I threw BTRFS out of my systems. Consider a broken disk in a 2-disk RAID. What is needed as the first step? Put stress on the remaining disk (let BTRFS work #1). After that you can replace the failing disk and put stress on the remaining disk again (let BTRFS work #2).

 

All other RAID systems I know about don't need #1.

 

10 minutes ago, hawihoney said:

This is how BTRFS is designed.

No, this is a bug.

Quote

It worked on -beta25 but doesn't since -beta30

 

Skylinar

Members

Any news on this one? I will have to upgrade my Cache Pool as well (RAID1).

John_M

Members

Do we know if this is a btrfs bug or a GUI bug?

It's an Unraid/GUI bug; you can still do it manually using the console.

 

 

 

Skylinar

Members

Is there any thread where I can see how it's done "manually using the console"?

 

Thanks

6 hours ago, Skylinar said:

Is there any thread where I can see how it's done "manually using the console"?

If you want, start a thread in the general support forum, post the diags and what you want to do, and I can post the instructions.

Hammer8

Members

Hi, trying to follow the discussion...this looks like the suggested workaround to replace a bad drive:

 

- stop array, remove 1 device in the pool
- start array, let btrfs work
- stop array, add replacement device
- start array, let btrfs work

 

I can do the above via the GUI, but the thread seems to suggest this is an Unraid GUI bug. Is there a way I can replace the bad disk with a good disk in one step using the CLI?

 

Thanks! 

3 hours ago, Hammer8 said:

Is there a way I can replace the bad disk with a good disk in one step using the CLI?

You can do it manually using the console to replace a device in a pool (assuming you have enough ports to keep both the old and new devices connected simultaneously):

 

You first need to partition the new device; to do that, format it using the UD plugin (any filesystem will do). Then, with the array started, type:

btrfs replace start -f /dev/sdX1 /dev/sdY1 /mnt/poolname

Replace X with the source and Y with the target, noting the 1 at the end of both. You can check replacement progress with:

btrfs replace status /mnt/poolname
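
As a concrete (hypothetical) example of the two commands above, with sdb1 as the failing source, sdc1 as the new partition, and a pool mounted at /mnt/cache:

# start the online replacement; -f allows a target that already contains a filesystem
btrfs replace start -f /dev/sdb1 /dev/sdc1 /mnt/cache
# monitors progress until the replace finishes (add -1 to print the status once and exit)
btrfs replace status /mnt/cache
# optional: if the new device is larger, grow it afterwards (devid from 'btrfs filesystem show')
btrfs filesystem resize <devid>:max /mnt/cache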

When it's done you need to reset the pool assignments. To do that:

- Stop the array; if Docker/VM services are using that pool, disable them first.
- Unassign all pool devices, then start the array to make Unraid "forget" the current pool config.
- Stop the array and reassign all pool devices, now including the replacement (there can't be an "All existing data on this device will be OVERWRITTEN when array is Started" warning for any pool device).
- Re-enable Docker/VMs if needed and start the array.

 

 

Hammer8

Members

Thank you very much.  I will give it a try and report back.  Really appreciate your help!  I must say I am very grateful to the members of this forum who are so helpful in making a complex product like UnRAID much more manageable for newbies like myself. 

Hammer8

Members

Hi, when I try to do these steps:

 

- stop array, remove 1 device in the pool
- start array, let btrfs work
- stop array, add replacement device
- start array, let btrfs work

 

I stopped the array and deselected the faulty drive from the pool using the drop-down box, but when I go to start the array it says "Start will remove the missing cache disk and then bring the array on-line."

 

Should I check the box to okay the removal of the missing cache disk to start the array?  I can't tell if the message means removing the faulty disk I just deselected (which I want to do) or removing the entire pool (which I don't want to do).

 

Thanks!

 

 

Hammer8

Members

Hi, so I went ahead and started the array without the bad disk in the pool, and while the array started, all the drives in the pool are labeled as "Unmountable: Too many missing/misplaced devices".

 

I've also tried scrubbing the pool, and while many errors were fixed, some were uncorrectable.  The pool is formatted as RAID6, so I'm wondering why a single drive failure can lead to this.

 

Any thoughts on what I should do next to try and recover?

 

Thanks!
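
If the pool still mounts, a few read-only checks (hypothetical mount point /mnt/cache; none of these modify the pool) can help narrow down what's failing before posting diagnostics:

# summary of the last scrub, including corrected vs. uncorrectable errors
btrfs scrub status /mnt/cache
# per-device I/O and corruption counters, to spot which member is actually failing
btrfs device stats /mnt/cache
# chunk profiles and free space, to confirm data and metadata are really raid6
btrfs filesystem usage /mnt/cache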

On 10/16/2021 at 9:09 AM, Hammer8 said:

Any thoughts on what I should do next to try and recover?

Please start a new thread in the general support section; don't forget to include the diagnostics (after array start).

DarkKnight

Members

Bumping this, since I am experiencing another cache drive failure and this really important basic feature isn't working. I did pre-clear these drives, but they are a set of 3 used (though in-warranty) enterprise drives I bought from a guy on Facebook. He shipped them to me in a bubble-wrapped envelope. SMFH.

 

Anyway, I'd really like to not have to nuke my cache pool or dip into the command line when I need to do a drive replacement. I took a look through all the 6.10-rc2 bug reports and a lot of them look much more serious than this, but this is also a really basic feature. Seems like the secondary cache pool should have stayed a beta feature until recovery was correctly worked out IMHO. 


On 2/15/2022 at 10:18 AM, JorgeB said:

This is supposedly getting fixed in the next release.

 

I have been testing the latest internal build, and can report the issue is solved.

Please wait until rc3 is released.

 
