• [6.10.0-RC1] Parity rebuild uses unassigned device(?)


    Tomr
    • Minor

    I removed one of the drives from the array, created a new config, and let it rebuild parity. But for some reason it seems to be using the unassigned device (the one I removed). You can see the read count in the screenshot: it is on par with the array disks and grows along with them. dstat and the GUI show 0 MB/s read and write on it, though, so it might be a visual bug. On the other hand, it won't let me spin down this drive; the log shows:

     

    Quote

    Oct 16 11:10:40 Tower emhttpd: spinning down /dev/sdc
    Oct 16 11:10:40 Tower emhttpd: read SMART /dev/sdc

     

    I don't know whether this is just a visual bug or a real Unraid bug that is corrupting my parity.

     

    I first asked in UA's thread and got a response, so I'm reporting it here now:

     

    unraid parity.png





    6 minutes ago, dlandon said:

    Remove the UD plugin and see if the reads and writes continue to increment on the stock UD page.

    I did that; the counts still increment.

     

    EDIT: I ran some write tests, and it seems that the unassigned device "copies" its read/write count from disk1.


    The only thing I see is that you have 3 data disks, but have 5 slots for data disks - two unused.  I think you set the total slots to 7, but are only using 5.

    1 hour ago, Tomr said:

    EDIT: I ran some write tests, and it seems that the unassigned device "copies" its read/write count from disk1.

    That would indicate an issue with the routine that tracks the reads and writes.

     

    @bonienl It appears that diskio may have an issue.


    The parity rebuild is done, but the unassigned drive is still mirroring the stats, and that still prevents spindown. I rebooted the machine to see if that would fix it; it didn't. I changed the slot count from 7 to 5 (to remove the empty ones); no change. It only got "fixed" when I created a new pool and assigned this drive to it. But that's not really a fix if I want to keep it as an unassigned device.


    Sounds like both disks were previously part of the same btrfs pool. A new config doesn't clear the array disk, so you'd need to wipe one of the devices (or both) before using it in the array.


    They were both array disks, not pool devices. Both were btrfs, but as separate disks, not part of a JBOD or RAID1 pool.

    49 minutes ago, Tomr said:

    They were both in the array, not pool devices.

    Wasn't it one array disk and one unassigned disk?

     

    But anyway, if accessing disk1 also accessed a different unassigned (or assigned) disk, they were almost certainly part of the same pool. Since it's fixed now, that can't be confirmed; we'd need to have seen the state before you did this:

    On 10/17/2021 at 11:16 AM, Tomr said:

    It only got "fixed" when I created new pool and assigned this drive to it.

    Because doing this wiped that device.


    They were both in the array (one was disk1, the other was disk3). I created a new config, removed disk3 from the array, and started the array to rebuild parity. I wanted disk3 to be unassigned.

     

    The thing is, it had no activity: the GUI and other monitoring apps showed it at 0 MB/s read and write while parity was rebuilding. Only the read and write counts in the GUI were inaccurate (mirroring disk1's counts), and because of that it wouldn't spin down.

     

    To test your theory about wiping (that the drives were somehow connected, which would be a different Unraid bug), I removed the drive from its own pool, and it is again mirroring the read and write counts from disk1.

    1 hour ago, Tomr said:

    and it's again mirroring reads & writes count from disk1.

    Then please post new diags and the output of:

    btrfs fi usage -T /mnt/disk1

     

    8 minutes ago, JorgeB said:

    Then please post new diags and the output of:

    btrfs fi usage -T /mnt/disk1

     

    The disks are not connected; how could they be? The drive with the incorrect counters is not actually being used for anything, only its counters are wrong.

     

    Quote


    root@Tower:~# btrfs fi usage -T /mnt/disk1
    Overall:
        Device size:                  10.91TiB
        Device allocated:              8.41TiB
        Device unallocated:            2.50TiB
        Device missing:                  0.00B
        Used:                          8.07TiB
        Free (estimated):              2.83TiB      (min: 1.58TiB)
        Free (statfs, df):             2.83TiB
        Data ratio:                       1.00
        Metadata ratio:                   2.00
        Global reserve:              512.00MiB      (used: 0.00B)
        Multiple profiles:                  no

                Data    Metadata System
    Id Path     single  DUP      DUP      Unallocated
    -- -------- ------- -------- -------- -----------
     1 /dev/md1 8.38TiB 32.00GiB 16.00MiB     2.50TiB
    -- -------- ------- -------- -------- -----------
       Total    8.38TiB 16.00GiB  8.00MiB     2.50TiB
       Used     8.05TiB  9.36GiB  1.25MiB
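
    For what it's worth, the single-device question can also be checked mechanically. This is only a minimal sketch, assuming the per-device table keeps the layout shown above; the function name `list_member_devices` is made up for illustration and is not part of any btrfs tooling:

```python
import re

# Device table copied from the `btrfs fi usage -T /mnt/disk1` output above.
SAMPLE = """\
            Data    Metadata System
Id Path     single  DUP      DUP      Unallocated
-- -------- ------- -------- -------- -----------
 1 /dev/md1 8.38TiB 32.00GiB 16.00MiB     2.50TiB
-- -------- ------- -------- -------- -----------
   Total    8.38TiB 16.00GiB  8.00MiB     2.50TiB
   Used     8.05TiB  9.36GiB  1.25MiB
"""

# Assumption: a device row starts with a numeric Id followed by a path.
DEVICE_ROW = re.compile(r"^\s*(\d+)\s+(/\S+)")


def list_member_devices(usage_table: str) -> list[str]:
    """Return the device paths listed in the per-device table."""
    devices = []
    for line in usage_table.splitlines():
        match = DEVICE_ROW.match(line)
        if match:
            devices.append(match.group(2))
    return devices


if __name__ == "__main__":
    members = list_member_devices(SAMPLE)
    print(members)
    print("multi-device" if len(members) > 1 else "single-device")
```

    A filesystem that spanned two disks would list two device rows here; the output above lists only /dev/md1.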

     

    Just now, Tomr said:

    just the counters are wrong.

    OK, if it's only that and it keeps happening, it might really be a visual bug, but it's very strange. You didn't post new diags this time, but in the old ones the actual read/write values for the two devices are completely different, not what the GUI was showing when the screenshot was taken.

    11 minutes ago, JorgeB said:

    OK, if it's only that and it keeps happening, it might really be a visual bug, but it's very strange. You didn't post new diags this time, but in the old ones the actual read/write values for the two devices are completely different, not what the GUI was showing when the screenshot was taken.

    It's not only visual, since it prevented spindown and/or woke the disk immediately when I tried to spin it down manually. I have now made new diags and a screenshot after clearing the stats and doing a small write to disk1. Nothing is written to the unassigned device.

    unraid2.png

    tower-diagnostics-20211018-1941.zip


    Same here: the values in the actual diags are correct, so it looks like just a GUI issue, apart of course from the fact that it prevents spindown:

     

        [sde] => 0 0 22791858 13954498
        [sdc] => 0 0 111481 174

     

    3rd column is total reads, 4th is total writes.
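
    If anyone wants to compare these across several diags, here is a rough sketch, assuming the lines keep the `[name] => a b reads writes` shape quoted above. The helper names `parse_totals` and `counters_mirrored` are hypothetical, not something from Unraid:

```python
import re

# The two lines quoted above from the diagnostics.
SAMPLE = """\
    [sde] => 0 0 22791858 13954498
    [sdc] => 0 0 111481 174
"""

# Assumption: device name in brackets, then four integer columns.
LINE = re.compile(r"\[(\w+)\]\s*=>\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_totals(text: str) -> dict[str, tuple[int, int]]:
    """Map device name -> (total reads, total writes), i.e. columns 3 and 4."""
    totals = {}
    for match in LINE.finditer(text):
        totals[match.group(1)] = (int(match.group(4)), int(match.group(5)))
    return totals


def counters_mirrored(totals: dict[str, tuple[int, int]], a: str, b: str) -> bool:
    """True if two devices report identical read and write totals."""
    return totals[a] == totals[b]


if __name__ == "__main__":
    totals = parse_totals(SAMPLE)
    print(totals)
    print(counters_mirrored(totals, "sde", "sdc"))
```

    With the values above the two devices clearly do not share totals, which is why the mirroring looks like a display problem rather than real I/O.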

    On 10/16/2021 at 9:16 PM, dlandon said:

    That would indicate an issue with the routine that tracks the reads and writes.

     

    @bonienl It appears that diskio may have an issue.

    I get the same issue when I switch to R/W counter mode. All the figures are copied from the array R/W figures and shown on different UD disks.

     

    Or, to state the symptom more clearly: the array Disk 3 and 4 figures are copied to Dev 3 and 4, and so on (because of the delay between capture and update the screenshots show different figures, but they are actually the same numbers).

     

    Array:

    image.thumb.png.84097ee4a7b71bb23ee23c92fbee8f75.png

     

    UD:

    image.png.1601d1c439b76c511ee246319effde4c.png

     

    This also prevents UD Dev 3 and 4 from spinning down.

     

     

     

    7 hours ago, Vr2Io said:

    I get the same issue when I switch to R/W counter mode. All the figures are copied from the array R/W figures and shown on different UD disks.

     

    Or, to state the symptom more clearly: the array Disk 3 and 4 figures are copied to Dev 3 and 4, and so on (because of the delay between capture and update the screenshots show different figures, but they are actually the same numbers).

     

    Array:

    image.thumb.png.84097ee4a7b71bb23ee23c92fbee8f75.png

     

    UD:

    image.png.1601d1c439b76c511ee246319effde4c.png

     

    This also prevents UD Dev 3 and 4 from spinning down.

     

     

     

    Remove the UD plugin and see if it clears up.

    2 hours ago, dlandon said:

    Remove the UD plugin and see if it clears up.

    Note: I just reinstalled UD and rebooted. I have 10 data disks, 16 UD disks and 2 parity disks.

     

    Below is the screen capture taken when the array starts. The Disk 1-10 counter figures show up at Dev 1-10, i.e. Dev 1 = Disk 1, Dev 2 = Disk 2, and so on, while Dev 11-16 stay at zero.

     

    image.png.da05716b90d054e85061ff8ff5bf873a.png   image.png.dead3850b20a1c9c93f742bc3c52edd9.png

     

     

    As for the spindown issue, I can also reproduce it:

    - Start the array

    - Spin down all disks

    - Clear all counters with "clear stats"

    - Generate activity on, say, Disk 1 and 5; Dev 1 and 5 then spin up as well

    - The same counter figures show up on UD
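
    To separate what the GUI shows from what the kernel actually counts while running the steps above, one option is to snapshot `/proc/diskstats` before and after generating activity. The sketch below is only an illustration under that assumption; the helper names are hypothetical, and which `sdX` name belongs to which Disk or Dev slot has to be looked up on the system in question:

```python
from pathlib import Path


def parse_diskstats(text: str) -> dict[str, tuple[int, int]]:
    """Map device name -> (reads completed, writes completed)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        # Field order: major, minor, name, reads completed, reads merged,
        # sectors read, ms reading, writes completed, ...
        stats[fields[2]] = (int(fields[3]), int(fields[7]))
    return stats


def changed_devices(before: dict, after: dict) -> list[str]:
    """Names whose read or write count differs between two snapshots."""
    return sorted(
        name for name in after
        if name in before and after[name] != before[name]
    )


def snapshot(path: str = "/proc/diskstats") -> dict[str, tuple[int, int]]:
    """Read and parse the current kernel counters (Linux only)."""
    return parse_diskstats(Path(path).read_text())
```

    Calling `snapshot()` once after "clear stats" and once after writing to Disk 1, then passing both results to `changed_devices`, would list only the devices with real kernel-level I/O. If the UD device is missing from that list while its GUI counters moved, the mirroring is in the counter display and not in the I/O itself.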

     

    image.png.919b09628eda3902633c275af2b8166d.png

     

     image.png.1a2409501163e883352e16f0031611c6.png

     

    @bonienl @limetech @dlandon So if I run a parity check/sync with my 10 data disks, UD Dev 1-10 will spin up and waste 100 W+ unnecessarily.

     

     

     

     




