• [6.7.x] New cache pools are not redundant


    JorgeB
    • Solved Minor

    Likely related to this bug, but this one is more serious: any new multi-device pool created on v6.7+ will be created with the raid1 profile for data but the single (or DUP, if HDDs are used) profile for metadata, so if one of the devices fails the pool will be toast.

    • Thanks 2



    User Feedback

    Recommended Comments

    Also, any user who created what should be a redundant pool on v6.7.0 or later should convert the metadata to raid1 now, since even after this bug is fixed any existing pools will remain as they were. Use:

     

    btrfs balance start -mconvert=raid1 /mnt/cache

     

    To check whether it's using the correct profiles:

     

    btrfs fi usage -T /mnt/cache

     

    Example of a pool created on v6.7. Note that while data is raid1, metadata and system are single profile, i.e. part of the metadata is on each device and will be incomplete if either one fails; all chunk types need to be raid1 for the pool to be redundant:


     

                 Data      Metadata  System              
    Id Path      RAID1     single    single   Unallocated
    -- --------- --------- --------- -------- -----------
     2 /dev/sdg1 166.00GiB   1.00GiB        -   764.51GiB
     1 /dev/sdi1 166.00GiB   1.01GiB  4.00MiB   764.50GiB
    -- --------- --------- --------- -------- -----------
       Total     166.00GiB   2.01GiB  4.00MiB     1.49TiB
       Used      148.08GiB 555.02MiB 48.00KiB            
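As a sketch only (not an official Unraid tool, and the /mnt/cache path is an assumption), the check above can be scripted by parsing `btrfs filesystem df` output and warning when any chunk type is not raid1:

```shell
#!/bin/sh
# Sketch: warn when any chunk type on a btrfs pool is not raid1.
# Assumes the pool is mounted at /mnt/cache; adjust as needed.

# Print the profile for a given chunk type ("Data", "Metadata", "System")
# from `btrfs filesystem df` output supplied on stdin, e.g.
#   "Metadata, single: total=1.01GiB, used=555.02MiB"  ->  "single"
profile_of() {
    awk -v t="$1" '$1 == t "," { gsub(":", "", $2); print $2; exit }'
}

check_pool() {
    pool="$1"
    df_out=$(btrfs filesystem df "$pool") || return 1
    for t in Data Metadata System; do
        p=$(printf '%s\n' "$df_out" | profile_of "$t")
        [ "$p" = "RAID1" ] || echo "WARNING: $t on $pool is '$p', not RAID1"
    done
}

# Usage: check_pool /mnt/cache
```

A pool affected by this bug would print warnings for Metadata and System; a healthy default raid1 pool prints nothing.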

     

    Edited by johnnie.black
    • Like 1
    • Thanks 3
    Link to comment

    Just dropping by to say this affected me when adding a new RAID1 member to the cache. Thankfully johnnie.black is on the forums directing users here, so I was able to make sure the metadata is now correctly converted to RAID1. :)

    Link to comment

    I'll take a look at this when I get home. I also noticed that my initial cache pool refused to be raid1 (I had to go back down to two drives, and then re-add the second drive before raid1 kicked in).

     

    I think the UI needs to be much clearer about what's going on (or the code shouldn't break in the first place). Going to all the expense and effort of a second cache drive... only to find out it isn't even working isn't fun.

    Link to comment
    root@Tower:~# btrfs fi usage -T /mnt/cache
    Overall:
        Device size:                   1.86TiB
        Device allocated:              1.01TiB
        Device unallocated:          870.73GiB
        Device missing:                  0.00B
        Used:                          1.01TiB
        Free (estimated):            437.38GiB      (min: 437.38GiB)
        Data ratio:                       2.00
        Metadata ratio:                   1.00
        Global reserve:               16.00MiB      (used: 0.00B)
    
                      Data      Metadata System               
    Id Path           RAID1     single   single    Unallocated
    -- -------------- --------- -------- --------- -----------
     1 /dev/nvme0n1p1 518.00GiB  1.01GiB   4.00MiB   434.86GiB
     2 /dev/nvme1n1p1 518.00GiB        -         -   435.87GiB
    -- -------------- --------- -------- --------- -----------
       Total          518.00GiB  1.01GiB   4.00MiB   870.73GiB
       Used           515.98GiB  5.30MiB 112.00KiB            
    root@Tower:~# btrfs balance start -mconvert=raid1 /mnt/cache
    Done, had to relocate 3 out of 521 chunks
    root@Tower:~# btrfs fi usage -T /mnt/cache
    Overall:
        Device size:                   1.86TiB
        Device allocated:              1.02TiB
        Device unallocated:          867.68GiB
        Device missing:                  0.00B
        Used:                          1.01TiB
        Free (estimated):            435.86GiB      (min: 435.86GiB)
        Data ratio:                       2.00
        Metadata ratio:                   2.00
        Global reserve:               16.00MiB      (used: 0.00B)
    
                      Data      Metadata System               
    Id Path           RAID1     RAID1    RAID1     Unallocated
    -- -------------- --------- -------- --------- -----------
     1 /dev/nvme0n1p1 518.00GiB  2.00GiB  32.00MiB   433.84GiB
     2 /dev/nvme1n1p1 518.00GiB  2.00GiB  32.00MiB   433.84GiB
    -- -------------- --------- -------- --------- -----------
       Total          518.00GiB  2.00GiB  32.00MiB   867.68GiB
       Used           515.98GiB  5.39MiB 112.00KiB        
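A quick way to confirm the result of the balance, sketched here on the assumption that the overall section of `btrfs fi usage` always contains the ratio lines shown in the transcript above: a Metadata ratio of 2.00 means two copies of every metadata chunk exist, while 1.00 means there is only one.

```shell
#!/bin/sh
# Sketch: extract "Metadata ratio" from `btrfs fi usage` output on stdin.
# 2.00 means metadata is duplicated across devices; 1.00 means it is not.
metadata_ratio() {
    awk -F: '/Metadata ratio/ { gsub(/ /, "", $2); print $2; exit }'
}

# Usage: btrfs fi usage /mnt/cache | metadata_ratio
```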


    Yep, reproduced and fixed on my system.  Thanks!

    Edited by NPSF3000
    Link to comment

    Wow, thought mine was redundant, but it seems it was not. Got it fixed now with the balance command. This post probably saved me some future trouble, so thanks!

    Edited by hypercoffeedude
    Link to comment

    Whoa, so much for investing in two SSDs to have a redundant cache... I also had this problem.

     

    Is this reported to the dev team? I'm sure they would want to fix it for a future release.

    Edited by je82
    Link to comment
    44 minutes ago, je82 said:

    Is this reported to the dev team? I'm sure they would want to fix it for a future release.

     

    On 10/17/2019 at 4:54 AM, johnnie.black said:

    Changed Status to Solved (v6.8rc1)

     

    Link to comment

    Oh wow, I actually noticed this during my tests on the excessive writes by the Docker container.

    Noticed metadata and system were in RAID1 on Debian (which I created as a test), but not on my unRAID box.

    I was actually planning on raising a topic once my Docker issue was solved, since I wasn't 100% sure whether or not it was a bug.

     

    Thank you very much for pointing this out!

    I will start conversion tonight.

    Link to comment

    This bit me yesterday on my 6.8.3 install when a disk was unmountable on my 2-disk btrfs "raid1" cache array 😩, just like in this post:

     

     

    I'm glad it's fixed in 6.8, but it would have been nice if Unraid checked for this bug and fixed it during future OS upgrades. Even a check and notification in the Fix Common Problems plugin would have been helpful.

     

    -JesterEE

    Link to comment

    I was just about to swap two of my three cache pool SSDs for one new, bigger one and found this while looking for the best way to do it.

     

    I was shocked that my btrfs cache pool hadn't had the redundancy I thought it had for over a year :/

    Link to comment
    On 7/29/2019 at 8:38 AM, JorgeB said:

    Also, any user who created what should be a redundant pool on v6.7.0 or later should convert the metadata to raid1 now, since even after this bug is fixed any existing pools will remain as they were. Use:

     

    btrfs balance start -mconvert=raid1 /mnt/cache

     

    To check whether it's using the correct profiles:

     

    btrfs fi usage -T /mnt/cache

     

    Example of a pool created on v6.7. Note that while data is raid1, metadata and system are single profile, i.e. part of the metadata is on each device and will be incomplete if either one fails; all chunk types need to be raid1 for the pool to be redundant:


     

                 Data      Metadata  System              
    Id Path      RAID1     single    single   Unallocated
    -- --------- --------- --------- -------- -----------
     2 /dev/sdg1 166.00GiB   1.00GiB        -   764.51GiB
     1 /dev/sdi1 166.00GiB   1.01GiB  4.00MiB   764.50GiB
    -- --------- --------- --------- -------- -----------
       Total     166.00GiB   2.01GiB  4.00MiB     1.49TiB
       Used      148.08GiB 555.02MiB 48.00KiB            

     

    Can you provide clarity on what we should be seeing from the usage command, please?

    You've given an example of what it shouldn't look like, but not what it should look like; that would be useful for everyone's understanding.

    Link to comment
    18 hours ago, boomam said:

    Can you provide clarity on what we should be seeing from the usage command, please?

    For a default raid1 pool, Data, Metadata, and System should all be using the raid1 profile, instead of single for Metadata and System as in the example above.

    Link to comment




  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.