• [6.7.x] Very slow array concurrent performance


    JorgeB
    • Solved Urgent

    For as long as I can remember, Unraid has never been great at simultaneous array disk performance, but it used to be acceptable. Since v6.7 there have been various users complaining of, for example, very poor performance when running the mover and trying to stream a movie at the same time.

     

    I noticed this myself yesterday when I couldn't even start watching an SD video in Kodi just because writes were going on to a different array disk, and this server doesn't even have a parity drive. So I did a quick test on my test server: the problem is easily reproducible and started with the first v6.7 release candidate, rc1.

     

    How to reproduce:

     

    -Server just needs 2 assigned array data devices (no parity needed, but the same happens with parity) and one cache device, no encryption, all devices btrfs formatted

    -Used cp to copy a few video files from cache to disk2

    -While the cp was going on, tried to stream a movie from disk1; it took a long time to start and kept stalling/buffering (a rough command-line sketch of the test follows below)
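
    A minimal sketch of the test from the command line, assuming example paths and filenames (adjust to your own shares and disks):

    # sustained write to disk2 in the background
    cp /mnt/cache/Videos/*.mkv /mnt/disk2/Videos/ &
    # concurrent read from disk1 while the copy runs; watch the reported throughput
    dd if=/mnt/disk1/Videos/somemovie.mkv of=/dev/null bs=1M status=progress
    wait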

     

    Tried to copy one file from disk1 (still while the cp was going on on disk2), with v6.6.7:

     

    [Screenshot: copy speed from disk1 on v6.6.7]

     

    with v6.7rc1:

     

    [Screenshot: copy speed from disk1 on v6.7-rc1]

     

    A few times the transfer will go higher for a couple of seconds, but most of the time it's at a few KB/s or completely stalled.

     

    Also tried with all devices unencrypted and XFS formatted, and it was the same:

     

    [Screenshot: same test with XFS formatted devices]

     

    The server where the problem was detected and the test server have no hardware in common: one is based on a Supermicro X11 board, the test server on an X9 series; one uses HDDs, the test server SSDs. So it's very unlikely to be hardware related.

    • Like 1
    • Upvote 22



    User Feedback

    Recommended Comments



    51 minutes ago, Marshalleq said:

    I'm not sure what you're saying here.  Queue Depth is set to 1 on both 6.6.7 and latest stable.  So how does 6.6.x have a higher Queue depth?

    I think what he was saying is that, now that he is back on 6.6.7, he checked and the queue depth is 1 on 6.6.7 as well. Which means the speculation that NCQ in 6.7 might be part of the problem with that release would be incorrect, since for him queue depth was 1 on both 6.6.7 and 6.7 but he has no issues with 6.6.7. Or at least that's how I read what he said.

    Link to comment
    11 hours ago, Marshalleq said:

    I'm not sure what you're saying here.  Queue Depth is set to 1 on both 6.6.7 and latest stable.  So how does 6.6.x have a higher Queue depth?

    I was just reacting to @patchrules2000's post; he was setting all drives to QD=32 (even on 6.6.x).

    Link to comment
    10 hours ago, rclifton said:

    I think what he was saying is that, now that he is back on 6.6.7, he checked and the queue depth is 1 on 6.6.7 as well. Which means the speculation that NCQ in 6.7 might be part of the problem with that release would be incorrect, since for him queue depth was 1 on both 6.6.7 and 6.7 but he has no issues with 6.6.7. Or at least that's how I read what he said.

    Nearly perfect ;-)

    I hadn't checked my own QD settings on 6.7.x before I left (no one had brought up QD as a possible reason), but I looked at a friend's unRAID system, a fresh setup (just a few weeks old), and there all spinners are also at QD=1.
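
    For anyone who wants to check their own system, a quick sketch that lists the current queue depth of every SATA (sdX) device at once; NVMe devices don't expose this attribute in the same place:

    grep . /sys/block/sd*/device/queue_depth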

    Link to comment

     

    1 hour ago, s.Oliver said:

    I was just reacting to @patchrules2000's post; he was setting all drives to QD=32 (even on 6.6.x).

    OK - regardless of it not being 'the' issue - yes, I'd agree that these should be set to 32 on all drives, unless Unraid has also invented a way of replacing NCQ (which I highly doubt), or there's some reason to do with their Unraid parity logic or something.  So this is an unexpected and hopefully additional performance benefit.  It IS enabled on my SSDs if I recall correctly.

     

    That said, the article below actually outlines areas where performance is decreased and specifically mentions RAID.  So perhaps Limetech's testing found it works better switched off.

     

    https://en.wikipedia.org/wiki/Native_Command_Queuing

    Edited by Marshalleq
    Link to comment
    13 hours ago, Marshalleq said:

    That said, the article below actually outlines areas where performance is decreased and specifically mentions RAID.  So perhaps Limetech's testing found it works better switched off.

     

    https://en.wikipedia.org/wiki/Native_Command_Queuing

    On normal SATA SSDs (at least as seen on one machine used as a cache drive) it is set to 32, but those are fast enough to handle it and they are not embedded in that special "RAID" operation like the data/parity drives.

    Because of the nature of unRAID's "RAID" mode, I guess, the drives are "faster" if they work on small chunks of data in 'sequential' order.

    Edited by s.Oliver
    Link to comment
    1 hour ago, s.Oliver said:

    Because of the nature of unRAID's "RAID" mode, I guess, the drives are "faster" if they work on small chunks of data in 'sequential' order.

    I think it's more that when a sector is written on a data drive, then for parity to be consistent the same sector needs to be updated in near real time on the parity drive as well. The individual drives don't understand this concept, and allowing a drive to update in whichever order it chooses would increase the chance of the parity drive being out of sync with the actual data drives, especially when you have updates on multiple data drives simultaneously.

    Imagine having to update sector 13456 on drive 3 and sector 25789 on drive 4, in that order, and the parity drive deciding that it should update sector 25789 first and then 13456, with a power failure happening in between those writes. You would end up with two sectors of invalid parity data, even though your data drives both have the correct information.
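
    A toy illustration of why that matters, using made-up numbers and plain XOR parity in the shell, not Unraid's actual parity code:

    d3=1111; d4=2222
    parity=$(( d3 ^ d4 ))            # parity agrees with both "sectors"
    d3=3333                          # the data-drive write lands...
    # ...but power is lost before the matching parity write does:
    echo "stored parity: $parity, expected: $(( d3 ^ d4 ))"   # parity is now stale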

     

    Link to comment
    8 hours ago, simalex said:

    I think it's more that when a sector is written on a data drive, then for parity to be consistent the same sector needs to be updated in near real time on the parity drive as well. The individual drives don't understand this concept, and allowing a drive to update in whichever order it chooses would increase the chance of the parity drive being out of sync with the actual data drives, especially when you have updates on multiple data drives simultaneously.

    Imagine having to update sector 13456 on drive 3 and sector 25789 on drive 4, in that order, and the parity drive deciding that it should update sector 25789 first and then 13456, with a power failure happening in between those writes. You would end up with two sectors of invalid parity data, even though your data drives both have the correct information.

     

    I didn't want to go deep into the concept of unRAID's parity algorithm. So you're right, unRAID needs to be strict about writing the same sector to the data and parity drive(s) at (more or less) the same time, given how fast the different drives complete the request. So the slowest drive involved in that write cycle (doesn't matter if parity or data) determines how fast the write cycle completes.

     

    But unRAID is not immune to data loss from unfinished write operations (whatever the reason) and has no concept of a journal (to my knowledge). So a file whose write was abruptly ended is damaged/incomplete, and parity can't change anything here and probably isn't in sync anyway. That's why unRAID usually forces a parity sync on the next start of the array (and it rebuilds the parity information completely, based only on the values of the data drives).

     

    unRAID would need some concept of journaling to replay the writes and find the missing part; it does not have one (again, to my knowledge). ZFS is one file system that has a mechanism to prevent exactly this.

     

    My observation is that it is a pretty much synchronous write operation (all drives that need to write data write the sectors in the same order and at the same time; otherwise I imagine I could hear much more 'noise' from my drives, especially during a rebuild).

     

    But I do confess, that is only my understanding of unRAID's way of writing data to the array.

    Edited by s.Oliver
    Link to comment

    FWIW, new first-time Unraid build here. I have an array speed issue (posted separately) that may be related: drives within the array are maxing out at 40 MB/s, including the clearing of 2 new drives, while the same drives outside the array (unassigned) transfer disk to disk via Krusader at approx. 170 MB/s. The drives within the array have an NCQ queue depth of 1 (all drives including parity); the same identical drives outside the array have a queue depth of 32 and are blazing quick. How can I set the queue depth to 32 for the array drives, or should I wait for a patch?

    Link to comment

    Sadly, this kind of speed is normal for Unraid when writing.  It's due to parity calculations.  Like you I do struggle to see how it can be 'that' much slower, but it really is.  Unraid trades this for allowing differently sized drives and being able to power down unused drives.  Mostly this speed is enough, but when doing large copies it does become rather noticeable.  That's why there is a cache option, to offset the worst part of Unraid: writing.  Some day, when SSDs are affordable for the array, we'll get a speed increase.  Until then the choice is to put up with it, or to move to something like Proxmox, FreeNAS etc... I get about the same speed BTW and my disks perform at around 230MB/s individually.  Quite a decrease!

    Link to comment

    OK, google-fu... I changed the queue depth of the array drives from 1 to 31 and speed has increased dramatically. Clearing 2 drives went from 38 MB/s to 130 MB/s as soon as I made the change. I am a happy camper! I was about to hit the go button on another HBA controller!

     

    Command to change your NCQ queue depth:

    echo 31 > /sys/block/sdX/device/queue_depth

    sdX is your device; the number you echo is the queue depth you want to set (max is 32).

     

    Command to check what your queue depth is set to:

    cat /sys/block/sdX/device/queue_depth

    sdX is your device.
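
    If you want to apply the workaround to every spinner in one go, here is a rough sketch (not an official Unraid command; check that the sdX list really matches your array drives before running it):

    for dev in /sys/block/sd?; do
        # only touch rotational (spinning) drives
        if [ "$(cat "$dev/queue/rotational")" = "1" ]; then
            echo 31 > "$dev/device/queue_depth"
            echo "$dev: queue_depth now $(cat "$dev/device/queue_depth")"
        fi
    done

    Note that the setting does not survive a reboot, so it would have to be reapplied (for example from the flash drive's go file) for as long as you keep the workaround.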

    Edited by DaMAN
    Link to comment
    On 8/24/2019 at 11:05 AM, Marshalleq said:

    OK - regardless of it not being 'the' issue - yes I'd agree that these should be set to 32 on all drives - unless Unraid has also invented a way of replacing NCQ - which I highly doubt,

    It was determined a long time ago that the Linux kernel is better at queueing up disk I/O requests than the firmware on the drives themselves (the reverse of the situation with Windows), so in Unraid NCQ is disabled by default. See here.

     

    It looks like something's broken in the Unraid 6.7.x kernel at the moment and enabling NCQ with a value of 32 helps with mitigation, but users reverting to Unraid 6.6.7 should set it to the default "Auto", which in this case means "Off".
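
    For what it's worth, the kernel-side queueing mentioned above is handled by the block I/O scheduler; a quick way to see which one is active for a given drive (the current scheduler is shown in brackets):

    cat /sys/block/sdX/queue/scheduler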

    Link to comment
    6 hours ago, Marshalleq said:

    Sadly, this kind of speed is normal for Unraid when writing.  It's due to parity calculations.  Like you I do struggle to see how it can be 'that' much slower, but it really is.

    It is not to do with parity calculations, per se, because they are quick. It is to do with the way the parity drive is updated. There are two modes of operation.

     

    The default is read-modify-write, which requires a block to be read from the parity disk, modified and rewritten. This depends on the rotational latency of the mechanical hard drive since, after reading, it has to wait for a whole rotation before it can write the data back in the same place on the platter. The advantage of this method is that only the data disk being written to and the parity disk need to be spun up - the rest of the disks can be spun down.

     

    The alternative mode is reconstruct write, sometimes called "turbo write", which reads all the data disks simultaneously and calculates then writes parity without having to read it first. It's faster because it doesn't depend on the rotational latency of the parity drive but it requires that all disks be spun up.
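
    For reference, a sketch of switching the mode from the shell; the md_write_method mapping here is my assumption of what the webGUI option under Settings -> Disk Settings corresponds to, so verify it against your own release before relying on it:

    # assumed values: 0 = read/modify/write, 1 = reconstruct write ("turbo write"), auto = let Unraid decide
    /usr/local/sbin/mdcmd set md_write_method 1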

    Link to comment

    Do we think that intermittent latency problems with a VM could be traced back to this as well?  The VM is not on the array (nor cache) but booted from a dedicated nvme drive passed through.

     

    It is incredibly frustrating to try and chase down intermittent problems...

    Link to comment
    40 minutes ago, J.Nerdy said:

    Do we think that intermittent latency problems with a VM could be traced back to this as well?  The VM is not on the array (nor cache) but booted from a dedicated nvme drive passed through.

     

    It is incredibly frustrating to try and chase down intermittent problems...

    If the NVMe is passed through (PT), setting any of these commands for it on the host won't help.

    Link to comment
    6 hours ago, John_M said:

    It is not to do with parity calculations, per se, because they are quick. It is to do with the way the parity drive is updated. There are two modes of operation.

     

    The default is read-modify-write, which requires a block to be read from the parity disk, modified and rewritten. This depends on the rotational latency of the mechanical hard drive since, after reading, it has to wait for a whole rotation before it can write the data back in the same place on the platter. The advantage of this method is that only the data disk being written to and the parity disk need to be spun up - the rest of the disks can be spun down.

     

    The alternative mode is reconstruct write, sometimes called "turbo write", which reads all the data disks simultaneously and calculates then writes parity without having to read it first. It's faster because it doesn't depend on the rotational latency of the parity drive but it requires that all disks be spun up.

    Even so, in either read-modify-write or reconstruct-write mode (I've done both) it is still an awful lot slower than RAID 5 done with striping.  At least that's what came out of my testing.  Perhaps I'm wrong.

    Link to comment
    1 hour ago, Benson said:

    If the NVMe is passed through (PT), setting any of these commands for it on the host won't help.

    I assume you're taking into account the normal performance issues on NVMe caused by heat.

    Link to comment
    20 minutes ago, Marshalleq said:

    Even so, in either read-modify-write or reconstruct-write mode (I've done both) it is still an awful lot slower than RAID 5 done with striping.  At least that's what came out of my testing.  Perhaps I'm wrong.

    I guess hardware plays a role too.

    I have pretty fast drives and with reconstruct write on, I do get satisfying results.

    Below a 12GB file copied from my PC (nvme disk) directly to the array over a 10G connection.

     

    [Screenshot: 12GB file copied to the array over the 10G connection]

     

    It starts off at 1 GBps (10 Gbps) until the RAM cache is full, then continues writing at around 140 MBps (1.12 Gbps).
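
    If anyone wants to factor out the RAM-cache head start, a rough sketch of a sustained write test straight to an array disk (path and size are examples; oflag=direct bypasses the page cache):

    dd if=/dev/zero of=/mnt/disk1/ddtest.bin bs=1M count=4096 oflag=direct status=progress
    rm /mnt/disk1/ddtest.bin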

    Link to comment

    Very nice - I've yet to buy one end of my 10G connection.  Anyway, as I said above, it's a trade-off and I've chosen having the drives spin down rather than the extra speed.  I guess you can always turn reconstruct write on if you know you're going to do a large copy.  But now we're starting to move away from the topic of this thread. :D

    Link to comment

    Of course I wasn't talking about concurrent drive access...

     

    (disclosure: I am testing on a newer kernel)

    Edited by bonienl
    Link to comment
    8 minutes ago, bonienl said:

    Of course I wasn't talking about concurrent drive access...

     

    (disclosure: I am testing on a newer kernel)

    So, I'm guessing a new kernel fixes this problem?

     

    I see similar speeds on 6.6.x, but not on any version of 6.7.x.

    Link to comment
    1 minute ago, StevenD said:

    So, I'm guessing a new kernel fixes this problem?

     

    I see similar speeds on 6.6.x, but not on any version of 6.7.x.

    So far nothing has been found that fixes this and @limetech continues to be silent.

    • Upvote 1
    Link to comment
    4 minutes ago, Marshalleq said:

    Very nice - I've yet to buy one end of my 10G connection.  Anyway, as I said above, it's a trade-off and I've chosen having the drives spin down rather than the extra speed.  I guess you can always turn reconstruct write on if you know you're going to do a large copy.  But now we're starting to move away from the topic of this thread. :D

    You are so interesting.

    Link to comment
    17 minutes ago, StevenD said:

    So, I'm guessing a new kernel fixes this problem?

    I am afraid that simultaneous transfers to the array (to different disks) are hampered.

     

    Here are the results of the same 12GB file copied to two different disks simultaneously.

     

    [Screenshot: the same file copied to two different array disks simultaneously]

     

    This is not extremely low, but it pales in comparison to when I read this 12GB file from the array 😐

     

    [Screenshot: reading the file back from the array]

     

    Okay, I was cheating a little when reading from the array (things were cached in RAM).
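
    A sketch of how to repeat the read test without the RAM cache helping (the file name is a placeholder):

    sync
    echo 3 > /proc/sys/vm/drop_caches
    dd if=/mnt/disk1/somebigfile.mkv of=/dev/null bs=1M status=progress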

     

    I do get around 250 MBps in real world circumstances.

     

    [Screenshot: real-world read speed of around 250 MBps]

    Edited by bonienl
    Link to comment
    40 minutes ago, bonienl said:

    Of course I wasn't talking about concurrent drive access...

     

    (disclosure: I am testing on a newer kernel)

    Unraid 6.8?

    Link to comment




