
Report Comments posted by atconc

  1. On 10/14/2020 at 4:33 PM, limetech said:

    There is something misconfigured, please post diagnostics.zip.

     


    Bumping this - I just temporarily reinstalled Beta 30 for another reason and took the opportunity to check if this was still happening. It is: very slow web UI and apps in docker containers, with extremely high CPU usage from shfs while this is happening. Reverting to b25 again solves this for me. Any idea what's going on? I'm trying to avoid ending up with this issue on the next stable.

  2. On 10/14/2020 at 9:57 AM, JorgeB said:

    I've been ruminating on this Samba aio issue because the very large read performance difference first reported by @trypowercycle reminded me of an issue I've seen before, but I was having trouble finding that post. Now I know why - because those forums are gone? I did finally find it in my content:

     

    [screenshot: the original rc4 forum post]

     

    And this is the comparison I posted at the time:

    [comparison chart: b21 vs rc4 read speeds, xfs]

     

    So I believe I noticed this issue at around the same time aio was introduced in Samba, and at the time disabling SMB3 fixed it. Now I wonder if it was already the same issue, and disabling SMB3 was also disabling aio - the symptoms are very similar. The problem wasn't controller related but device related: some brands/models perform worse than others. So I now did some more tests with -beta30 and different disks.

     

    Ignore the normal max speed difference from brand to brand - I used whatever disks I had at hand, so some disks are older and slower than others. The important part is the aio on/off difference. Tests were done with disk shares so there's no shfs interference, all disks connected to the same Intel SATA controller, and each test was repeated 3 times to make sure the results are consistent. Read speed is as reported by robocopy after transferring the same large file.
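    (For anyone wanting to run a similar read test from a Windows client, a minimal sketch - the share path and file name here are placeholders; robocopy prints the average transfer speed in its closing summary, and the destination copy has to be deleted between runs or robocopy will skip the unchanged file:)

    robocopy \\tower\disk1 C:\temp largefile.bin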

     

    [table: read speeds per disk model with aio on vs. off]

     

     

    I think the results are very clear, and by luck (or bad luck) the tests this past weekend were done with the only disk that now doesn't show a significant difference. Note that I don't think this is a disk brand issue but a disk model issue, likely firmware related - possibly worse in older disks?

     

    I know you already plan to leave aio disabled, but here's one more data point that I believe really confirms it should be left disabled.
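    (For anyone who wants to test aio on/off on their own server before the default changes, a minimal sketch - on Unraid the usual place for this is /boot/config/smb-extra.conf, which gets included into the generated smb.conf; restart Samba or the array for it to take effect:)

    [global]
    # setting both to 0 disables Samba asynchronous I/O for reads and writes
    aio read size = 0
    aio write size = 0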

     

     


    5 hours ago, JorgeB said:

    It will be recognized. -beta25 is where the new alignment was introduced. Curiously, I just recently found out that the new alignment works even in previous releases, but only for pools - though you can still have a single-device btrfs "pool".

     

     

     

    I rolled back to beta 25 and the apps in docker containers are noticeably more responsive again, so there's definitely a regression in beta 29 and beta 30 for me. Happy to help troubleshoot - let me know if there's anything I can do.

  3. On 10/14/2020 at 4:40 PM, atconc said:

    Is there anything else I can try to troubleshoot this (the issue with very slow apps in docker and high shfs CPU use)? Or if I roll back to beta 25, where I didn't have this issue, will the new partition layout be recognized, or will I have to rebuild my cache again?

  4. Has anyone else noticed slower performance from cache pools since the partition layout changed in beta 29?

     

    This is really noticeable for me using applications in docker containers - Plex loading thumbnails in the web interface, Tautulli loading history, Sonarr v3 showing its witty quotes while it loads - all are noticeably slower since repartitioning with beta 29. Sonarr takes about 15 seconds to load now when it was a second or two before, and I don't remember any noticeable lag loading the Plex thumbs or Tautulli history before this change.

     

    At first I thought that the combination of the write amplification issue and several rebalances had finally killed my 2 480GB SanDisk SSDs (they had 3+ years of power-on time and were showing several hundred bad blocks), so I replaced them with new Samsung Pro drives, but I haven't seen any improvement. I also tried switching from a docker img file to a directory on the share, which also doesn't seem to help.

     

    I also noticed that there are a lot of shfs processes that are often using the most CPU of anything; one has had 48 hours of CPU time on a machine with 6 days of uptime.

    (filtered htop screenshot attached)
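    (A quick way to see the same thing from the console, a sketch using standard procps ps - it lists processes by accumulated CPU time, so heavy shfs processes show up near the top:)

    ps -eo pid,time,comm --sort=-time | head   # top processes by total CPU time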

     

    After reading the earlier posts in this thread I was wondering if this might be related. My appdata and docker folders are both on cache-only shares, if that matters. The array is single parity with 5x8TB and 5x3TB drives; the cache is 2x500GB SATA SSDs in btrfs RAID 1. I also have a single 3TB HDD defined as another pool and an old 128GB SSD as an unassigned device.

     

     

    Screen Shot 2020-10-14 at 16.00.26.png

  5. 4 hours ago, limetech said:

    Alright, I see an issue with that procedure with an encrypted pool. I'll fix it in the next release, but the workaround is this:

     

    Unassign/Re-assign Method (workaround for encrypted pool):

    First, create a single-slot temporary pool, I'll call it 'temp' here.

    Then follow these steps:

    1. Stop the array and unassign one of the devices from your existing pool, then assign that device to the 'temp' pool.
    2. Start the array. A balance will take place on your existing pool; let it complete. The 'temp' pool will appear 'unformatted' - just leave it (don't Format).
    3. Stop the array. Unassign the device from 'temp', adding it back to your existing pool.
    4. Start the array. The added device will get re-partitioned and a balance will start moving data to the new device. Let the balance complete.

    Repeat steps 1-4 for the other device in your existing pool.

     

     

    The workaround seems to have worked for me, the only difference being that in step 2 no balance was triggered. I carried on, everything else worked as expected, and my partitions now show a start at 2048.
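    (If it's unclear whether a balance ran, a quick sketch for checking from the console - /mnt/cache being the usual Unraid mount point for the cache pool:)

    btrfs balance status /mnt/cache    # reports a running balance, or "No balance found"
    btrfs filesystem show /mnt/cache   # per-device usage, to see whether data moved between devices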

  6. 3 hours ago, limetech said:

    Seems not to have re-partitioned. Under the Start column it should say 2048 for those devices.

     

    Quick sanity check: type these commands; both should return '0':

    
    cat /sys/block/sdm/queue/rotational   # 0 = kernel sees a non-rotational device (SSD)
    cat /sys/block/sdn/queue/rotational   # 1 would mean it's being treated as a spinner

    You can repeat the procedure, but after the first device unassign/re-assign, post your diags.

     

    I tried again; this time no balance seems to have been triggered at all. I can switch to the mover method and recreate the cache contents, but I wanted to help troubleshoot this first. Diags attached - let me know what else I can do.

     

    Edit to add - I'm using the Nvidia build in case that's relevant here.

     

    bb8-diagnostics-20201001-1605.zip

  7. How do I check whether the partition alignment actually worked? I tried the 2nd method (removing the drives from the pool one by one), but when I re-added them a balance didn't seem to be triggered automatically, so I manually triggered one.

     

    From my quick bit of research this seems to be a way to check, but I'm not sure how to interpret the output:

    Quote

    fdisk -l -u /dev/sdm

    Disk /dev/sdm: 447.13 GiB, 480103981056 bytes, 937703088 sectors
    Disk model: SanDisk Ultra II
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: dos
    Disk identifier: 0x00000000

    Device     Boot Start       End   Sectors   Size Id Type
    /dev/sdm1          64 937703087 937703024 447.1G 83 Linux

    fdisk -l -u /dev/sdn

    Disk /dev/sdn: 447.13 GiB, 480103981056 bytes, 937703088 sectors
    Disk model: SSD PLUS 480GB
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: dos
    Disk identifier: 0x00000000

    Device     Boot Start       End   Sectors   Size Id Type
    /dev/sdn1          64 937703087 937703024 447.1G 83 Linux
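    (To interpret this: per limetech's note in the previous comment, a re-partitioned device should show 2048 - a 1 MiB-aligned start - in the Start column. 64 here is the old layout, so neither device had been re-partitioned at this point. A one-liner sketch to pull out just the start sectors, assuming the same device names:)

    # prints "<partition> <start sector>"; 2048 = new alignment, 64 = old layout
    fdisk -l /dev/sdm /dev/sdn | awk '$1 ~ /^\/dev\// {print $1, $2}'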

     
