• [6.9.x] Pool device replacement doesn't work


    JorgeB
    • Solved Minor

    How to reproduce:

     

    -start with a dual device pool in default raid1 profile

    -stop array, assign new device as a replacement, start array

    -result:

    Unmountable: Too many missing/misplaced devices
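
    For reference, even while the pool shows as unmountable, what btrfs itself sees can be checked from the console; run with no arguments it lists every btrfs filesystem it finds and flags missing members:

    btrfs filesystem show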

     

    It worked on -beta25, it doesn't since -beta30, so the regression was introduced somewhere in between; diags below.

    tower15-diagnostics-20210107-1504.zip




    Recommended Comments

    Found a workaround:

     

    -stop array, remove 1 device in the pool

    -start array, let BTRFS work

    -stop array, add replacement device

    -start array, let btrfs work

    -enjoy
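
    If you want to see what "let BTRFS work" is actually doing between those steps, it can be watched from the console; the pool path is a placeholder (e.g. /mnt/cache):

    btrfs balance status /mnt/poolname     # shows a running balance, if Unraid started one
    btrfs filesystem show /mnt/poolname    # lists the current pool members and how much data each holds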

     

    Edited by Alex.b

    This is how BTRFS is designed.

     

    That is one of the reasons why I threw BTRFS out of my systems. Consider a broken disk in a 2-disk RAID. What is needed as the first step? Put stress on the remaining disk (let BTRFS work #1). After that you can replace the failing disk and put stress on the remaining disk again (let BTRFS work #2).

     

    All other RAID systems I know about don't need #1.
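
    For what it's worth, plain btrfs does have a one-pass path for a failed disk: mount the surviving member degraded and run a replace against the missing device, so the remaining disk is only read once. A rough sketch, outside the Unraid GUI, with placeholder device names and pool path:

    mount -o degraded /dev/sdX1 /mnt/poolname        # sdX1 = surviving member
    btrfs filesystem show /mnt/poolname              # the missing member is the devid not listed here
    btrfs replace start -f <missing-devid> /dev/sdY1 /mnt/poolname   # sdY1 = new device; a missing source is given by its devid
    btrfs replace status /mnt/poolname               # watch progress until it reports finished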

     

    10 minutes ago, hawihoney said:

    This is how BTRFS is designed.

    No, this is a bug.

    Quote

    It worked on -beta25, it doesn't since -beta30

     

    6 hours ago, Skylinar said:

    Is there any thread where I can see how it's done "manually using the console"?

    If you want, start a thread in the general support forum, post the diags and what you want to do, and I can post the instructions.


    Hi, trying to follow the discussion...this looks like the suggested workaround to replace a bad drive:

     

    -stop array, remove 1 device in the pool

    -start array, let BTRFS work

    -stop array, add replacement device

    -start array, let btrfs work

     

    I can do the above via the GUI.  But the thread seems to suggest this is an Unraid GUI bug.  Is there a way I can replace the bad disk with a good disk in "one" step using CLI?

     

    Thanks! 

    3 hours ago, Hammer8 said:

    Is there a way I can replace the bad disk with a good disk in "one" step using CLI?

    You can do it manually using the console. To replace a device in a pool (if you have enough ports to have both the old and new devices connected simultaneously):

     

    You first need to partition the new device; to do that, format it using the UD plugin (any filesystem will do), then, with the array started, type:

    btrfs replace start -f /dev/sdX1 /dev/sdY1 /mnt/poolname

    Replace X with the source and Y with the target; note the 1 at the end of both. You can check the replacement progress with:

    btrfs replace status /mnt/poolname

    When done, you need to reset the pool assignments; to do that:

     

    -stop the array; if Docker/VM services are using that pool, disable them
    -unassign all pool devices
    -start the array to make Unraid "forget" the current pool config
    -stop the array
    -reassign all pool devices, now with the replaced device (there can't be an "All existing data on this device will be OVERWRITTEN when array is Started" warning for any pool device)
    -re-enable Docker/VMs if needed and start the array
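
    If it helps, here is the console part of the above gathered into one sketch; the device names and pool path are placeholders, and the resize line is only needed when the new device is larger than the old one:

    lsblk -o NAME,SIZE,MODEL                            # identify the old (sdX) and new (sdY) devices
    btrfs filesystem show /mnt/poolname                 # confirm the pool and its current members
    btrfs replace start -f /dev/sdX1 /dev/sdY1 /mnt/poolname
    btrfs replace status /mnt/poolname                  # repeat until it reports finished
    btrfs filesystem resize <devid>:max /mnt/poolname   # optional; devid of the new device from btrfs filesystem show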

     

     


    Thank you very much.  I will give it a try and report back.  Really appreciate your help!  I must say I am very grateful to the members of this forum who are so helpful in making a complex product like UnRAID much more manageable for newbies like myself. 

    On 6/18/2021 at 6:23 PM, John_M said:

    Do we know if this is a btrfs bug or a GUI bug?

     

    On 10/11/2021 at 4:54 AM, Hammer8 said:

    Hi, trying to follow the discussion...this looks like the suggested workaround to replace a bad drive:

     

    -stop array, remove 1 device in the pool

    -start array, let BTRFS work

    -stop array, add replacement device

    -start array, let btrfs work

     

    I can do the above via the GUI.  But the thread seems to suggest this is an Unraid GUI bug.  Is there a way I can replace the bad disk with a good disk in "one" step using CLI?

     

    Thanks! 

    Hi, when I try to do these steps:

     

    -stop array, remove 1 device in the pool

    -start array, let BTRFS work

    -stop array, add replacement device

    -start array, let btrfs work

     

    I stopped the array and deselected the faulty drive from the pool using the drop-down box; when I go to start the array, it says "Start will remove the missing cache disk and then bring the array on-line."

     

    Should I check the box to okay the removal of the missing cache disk to start the array?  I can't tell if the message means removing the faulty disk I just deselected (which I want to do) or removing the entire pool (which I don't want to do).

     

    Thanks!

     

     


    Hi, so I went ahead and started the array without the bad disk from the pool, and while the array started, all the drives in the pool are labeled as Unmountable: Too many missing/misplaced devices.

     

    I've also tried scrubbing the pool, and while many errors were fixed, some were uncorrectable.  The pool is formatted as RAID6, so I'm wondering why a single drive failure can lead to this.

     

    Any thoughts on what I should do next to try and recover?

     

    Thanks!

    On 10/16/2021 at 9:09 AM, Hammer8 said:

    Any thoughts on what I should do next to try and recover?

    Please start a new thread in the general support section and don't forget to include the diagnostics (after array start).
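
    In the meantime, the scrub result and the per-device error counters can be read from the console and are worth including in that thread; the pool path is a placeholder:

    btrfs scrub status /mnt/poolname    # summary of the last scrub, including uncorrectable errors
    btrfs device stats /mnt/poolname    # read/write/corruption counters for each pool member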


    Bumping this, since I am experiencing another cache drive failure, and this really important basic feature isn't working. I did pre-clear these drives, but they are a set of 3 used, but in-warranty, enterprise drives I bought from a guy on Facebook. He shipped them to me in a bubble-wrapped envelope. SMFH.

     

    Anyway, I'd really like to not have to nuke my cache pool or dip into the command line when I need to do a drive replacement. I took a look through all the 6.10-rc2 bug reports and a lot of them look much more serious than this, but this is also a really basic feature. Seems like the secondary cache pool should have stayed a beta feature until recovery was correctly worked out IMHO. 

    Edited by DarkKnight
    On 2/15/2022 at 10:18 AM, JorgeB said:

    This is supposedly getting fixed for the next release.

     

    I have been testing the latest internal build, and can report the issue is solved.

    Please wait until rc3 is released.

     




