• [6.7.x] Very slow array concurrent performance


    JorgeB
    • Solved Urgent

    Since I can remember, Unraid has never been great at simultaneous array disk performance, but it was pretty acceptable. Since v6.7, however, various users have been complaining of very poor performance, for example when running the mover and trying to stream a movie at the same time.

     

    I noticed this myself yesterday when I couldn't even start watching an SD video in Kodi just because writes were going on to a different array disk, and this server doesn't even have a parity drive. So I did a quick test on my test server: the problem is easily reproducible and started with the first v6.7 release candidate, rc1.

     

    How to reproduce:

     

    - Server just needs 2 assigned array data devices (no parity needed, but the same happens with parity) and one cache device; no encryption, all devices btrfs formatted

    - Used cp to copy a few video files from cache to disk2

    - While cp was running, tried to stream a movie from disk1; it took a long time to start and kept stalling/buffering (a rough command-line sketch of the test is below)
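
    The same test, sketched from the console (the share names and file paths below are examples only, not the exact files used):

    # copy a few large files from cache to disk2 (stands in for the mover)
    cp /mnt/cache/Media/*.mkv /mnt/disk2/Media/ &

    # in another shell, read a file from disk1 to stand in for the video stream
    dd if=/mnt/disk1/Media/movie.mkv of=/dev/null bs=1M status=progress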

     

    Tried to copy one file from disk1 (while the cp to disk2 was still going on), with v6.6.7:

     

    [Screenshot, 2019-08-05 11:58: copy speed from disk1 on v6.6.7]

     

    with v6.7rc1:

     

    [Screenshot, 2019-08-05 11:54: copy speed from disk1 on v6.7-rc1]

     

    A few times the transfer will go higher for a couple of seconds, but most of the time it's at a few KB/s or completely stalled.

     

    Also tried with all devices unencrypted and xfs formatted, and it was the same:

     

    [Screenshot, 2019-08-05 12:21: copy speed with xfs-formatted devices]

     

    The server where the problem was detected and the test server have no hardware in common: one is based on a Supermicro X11 board, the test server on an X9 series board; one uses HDDs, the test server SSDs. So it's very unlikely to be hardware related.




    User Feedback

    Recommended Comments



    I'm also experiencing this with mc, just copying files from array to array. It does the same thing as the mover: locks up and maxes the CPU, and even the mc transfer freezes up every 5 seconds or so and pauses for a few seconds.

    Link to comment
    On 8/21/2019 at 4:33 AM, Marshalleq said:

    OK, thanks for clarifying. Can you advise what was slow though: other file copying, the GUI, etc.?

    Everything bar the copy. The video I was playing on the HTPC stalled, and doing an Unraid directory listing on my PC was very slow, with the top bar slowly filling. The copy that I triggered via Krusader seemed to be slowed as well.

     

    I really wish the Lime boys would at least acknowledge they are aware of this issue. 

    Edited by dalben
    Link to comment

    I will downgrade to 6.6.x to test. I only got 2.3 MB/s when using MakeMKV to remux a file, and total disk read/write was only 1x MB/s; iotop output is shown below. I don't think it is normal.

     

     

    [Screenshot: iotop output on 6.7.2]

    Link to comment
    8 hours ago, rclifton said:

    Is there anything major to look out for when downgrading?

    I asked in the thread below if there is anything specific that needs to happen.  It looks like all you have to do is download the zip, stop the array, copy over the bz* files to /boot and then reboot.
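
    For reference, a rough console sketch of those steps (the zip location/name and backup directory here are assumptions, and the bz* files are assumed to sit at the top level of the zip; stop the array from the GUI before copying):

    mkdir -p /tmp/unraid-667 && cd /tmp/unraid-667
    unzip /path/to/unRAIDServer-6.6.7-x86_64.zip "bz*"          # extract only the bz* files
    mkdir -p /boot/bz-backup && cp /boot/bz* /boot/bz-backup/   # back up the current files
    cp bz* /boot/                                               # copy the 6.6.7 files to the flash drive
    reboot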

     

     

    Link to comment
    5 minutes ago, zinderr said:

    I asked in the thread below if there is anything specific that needs to happen.  It looks like all you have to do is download the zip, stop the array, copy over the bz* files to /boot and then reboot.

     

     

    The only other thing would be to check whether you have any specific 6.7 plugins that won't work in 6.6.

    Link to comment

    I want to report that enabling NCQ helped me a lot with this issue; my MakeMKV got 35 MB/s compared with 2.3 MB/s before in the same situation.

     

    But Unraid seems to have a bug enabling NCQ: even if you set Tunable (enable NCQ) to Yes in Disk Settings, the queue_depth is still 1. I have to manually set queue_depth in the CLI for each disk, like below:

     

    echo 31 > /sys/block/sdf/device/queue_depth
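
    If it helps anyone, a small loop sketch that applies the same setting to every rotational sd* device at once (illustration only; device names differ per system, and the USB boot flash may also be matched, where a failed write is harmless):

    for dev in /sys/block/sd*; do
        # only touch rotational (spinning) devices
        [ "$(cat "$dev/queue/rotational")" = "1" ] || continue
        echo 31 > "$dev/device/queue_depth"
        echo "$dev: queue_depth now $(cat "$dev/device/queue_depth")"
    done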

    Link to comment
    3 hours ago, trott said:

    I want to report that enabling NCQ helped me a lot with this issue; my MakeMKV got 35 MB/s compared with 2.3 MB/s before in the same situation.

     

    But Unraid seems to have a bug enabling NCQ: even if you set Tunable (enable NCQ) to Yes in Disk Settings, the queue_depth is still 1. I have to manually set queue_depth in the CLI for each disk, like below:

     

    echo 31 > /sys/block/sdf/device/queue_depth

    Interesting find. Setting the same on mine has helped a lot as well. Thank you!

    Link to comment
    7 hours ago, trott said:

    I have to manually set queue_depth in the CLI for each disk, like below

    Good to know the queue depth is still 1.

    Edited by Benson
    Link to comment

    Hmm, this is an excellent find. I can see that my SSDs have defaulted to a queue depth of 32, however all the rotational media, including both that on the PCIe SATA card and that on the onboard ports, has defaulted to 1. This would definitely make a marked performance difference when reading and writing simultaneously. I'm not 100% convinced though that it is the 'whole' issue. @trott perhaps you could log this as a separate bug, as it's easy to replicate and I assume easy to solve?

    Edited by Marshalleq
    Link to comment

    Just wanted to add my 2c,

    Exact same issue, most obvious when NZBGet is processing/extracting a download, resulting in a bunch of reads and writes at once. It completely freezes up any other disk I/O, with upwards of 50% CPU time in IOWAIT, even when just trying to do a simple file read from a disk not in use at all by NZBGet or other processes.

    Reverting the Unraid version has eliminated the issue, so something between 6.6.7 and current is definitely the cause.

    Hopefully we get a fix soon :)

    Link to comment
    20 minutes ago, patchrules2000 said:

    Just wanted to add my 2c,

    Exact same issue, most obvious when NZBGet is processing/extracting a download, resulting in a bunch of reads and writes at once. It completely freezes up any other disk I/O, with upwards of 50% CPU time in IOWAIT, even when just trying to do a simple file read from a disk not in use at all by NZBGet or other processes.

    Reverting the Unraid version has eliminated the issue, so something between 6.6.7 and current is definitely the cause.

    Hopefully we get a fix soon :)

    Hi, out of interest can you check the above queue depth issue on your earlier version?

     

    cat /sys/block/sdX/device/queue_depth (replace sdX with your spinning HDD device names from the Main tab).
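
    For convenience, a small loop sketch that prints the value for every sd* device at once (purely illustrative; if a device lacks the queue_depth attribute, "n/a" is printed instead):

    for dev in /sys/block/sd*; do
        # read queue_depth if present, otherwise report "n/a"
        qd=$(cat "$dev/device/queue_depth" 2>/dev/null || echo "n/a")
        echo "$dev: rotational=$(cat "$dev/queue/rotational") queue_depth=$qd"
    done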

     

    Many thanks,

     

    Marshalleq

    Link to comment
    7 minutes ago, Marshalleq said:

    Hi, out of interest can you check the above queue depth issue on your earlier version?

     

    cat /sys/block/sdX/device/queue_depth (replace sdX with your spinning HDD device names from the Main tab).

     

    Many thanks,

     

    Marshalleq

    I had NCQ disabled on both the newest version and 6.6.7 when seeing the huge performance improvement.

     

    I have since enabled it on 6.6.7 and haven't noticed any tangible difference, besides everything just working again after the version downgrade.

     

    Queue depth defaulted to 1 after enabling it in the menu; I changed it to 31 on all drives.

    It might be noticeable in a benchmark, but at the moment there's no obvious change (although the server is currently not under a very large load).

     

    Don't really feel like upgrading again to a broken version just to test NCQ on 6.7.*, but if it is a turning point in getting the issue fixed I can give it a try.

     

    Hope this info has been helpful :)

    Link to comment

    Anecdotal evidence here (I have been watching this thread carefully), but when I was doing some large copies (multi-TB moves using unBALANCE) I found that all of a sudden my Windows VM was experiencing significant latency issues. This had not been an issue previously. I just found it odd that it surfaced, and the thread was fresh in my mind. Maybe just Baader-Meinhof.

     

    Probably unrelated (the VM exists on a dedicated NVMe that sits in UD, unmounted when not in use), but I thought it worth noting.

     

    I am going to try something more objective (large copies while streaming UHD from Plex, which has never stuttered).

     

    Cheers.

     

    EDIT: Hmmmm, slight performance degradation, but still not the stalling and constant buffering that others noted. (On a side note: Thor Ragnarok, great fun.)

     

    EDIT2: Interesting. I am again moving a large data set. I went to start a VM and the page just hung. The VM manager is no longer accessible via the GUI either. Again, I don't know if it's related, but it's odd. Will see if I can access it once the move completes.

     

     

    Edited by J.Nerdy
    Link to comment

    My 2 cents here (I've been back on 6.6.7 for 12 days and all is as good as it ever was):

     

    Disk Settings: Tunable (md_write_method): Auto (have never touched it)

    cat /sys/block/sdX/device/queue_depth for all rotational HDDs is "1"

     

    QD for the cache NVMe drive is unknown (it doesn't have the same path to print the value).

     

    Wouldn't this contradict the opinion that the 6.6.x series performs better because it has a higher QD value?

    Link to comment

    I am also back on 6.6.7, and the performance is back to what it was prior to 6.7.x.

     

    I have never changed the NCQ setting, and so it is set to 'Auto'.

    cat /sys/block/sdX/device/queue_depth = 1 for all spinning disks, but is 31 for the (single) SSD cache drive.

     

    I don't know if this was the case before upgrading to 6.7.x, or if this was set during the upgrade and the new config survived the downgrade. However, as it is set to 1 at the moment and the performance is fine (with 6.6.7), I am not sure this is the underlying problem.

     

    Tunable (md_write_method) is set to 'Auto', but I do have the 'CA Auto Turbo Write Mode' plugin installed, which turns on Turbo Write when all disks are spinning (and greatly increases my write speed, to ~110 MB/s when transferring over SMB). I did not think to turn this off when having issues in 6.7.2, to see whether it was part of the problem.

     

    I don't really want to re-upgrade to test things at the moment, until there is some acknowledgement/movement on this issue, as 6.6.7 is working for me. 

     

    Link to comment

    Does anyone know if the new beta with the new Linux kernel solves this?

    Quote

    Version 6.7.3-rc1 2019-07-22

    Linux kernel:

    version: 4.19.60

    Thanks,

    craigr

    Link to comment
    9 minutes ago, craigr said:

    Does anyone know if the new beta with the new Linux kernel solves this?

    Thanks,

    craigr

     

    It does not.

    Link to comment
    18 minutes ago, craigr said:

    Does anyone know if the new beta with the new Linux kernel solves this?

    Thanks,

    craigr

    And that's not the latest RC - there's been RC2 out for a while now.

    Edited by Marshalleq
    Link to comment
    4 hours ago, s.Oliver said:

    My 2 cents here (I've been back on 6.6.7 for 12 days and all is as good as it ever was):

     

    Disk Settings: Tunable (md_write_method): Auto (have never touched it)

    cat /sys/block/sdX/device/queue_depth for all rotational HDDs is "1"

     

    QD for the cache NVMe drive is unknown (it doesn't have the same path to print the value).

     

    Wouldn't this contradict the opinion that the 6.6.x series performs better because it has a higher QD value?

    I'm not sure what you're saying here. Queue depth is set to 1 on both 6.6.7 and the latest stable. So how does 6.6.x have a higher queue depth?

    Link to comment
    6 minutes ago, Marshalleq said:

    And that's not the latest RC - there's been RC2 out for a while now.

    But the only change from rc1 to rc2 was a Docker version bump. It was specifically targeted at users with SQLite corruption.

    Edited by StevenD
    Link to comment



