• [v6.8.x] Parity check slower than before in certain cases


    JorgeB
    • Annoyance

There is a known issue with v6.8 where parity syncs can be slower than before, mostly, but not exclusively, on faster and/or very wide arrays.

     

Although this isn't an urgent issue by any means, this thread is meant to make it easier for LT to track the problem, to get a better idea of how many users are affected, and to see if there is a typical server configuration where the slowdown becomes more noticeable.

     

This issue results from changes made to the md driver and tunables to fix the "writes starve reads" problem in v6.7. Based on my testing there's a general slowdown of the maximum possible speed for any array size, but since most servers will be disk/controller limited, most users shouldn't notice any slowdown; users with very fast and/or wide arrays can notice it.

     

    This is from my test server using only SSDs:

[chart: parity check speed by array size, v6.7.2 vs v6.8.0, SSD test server]

You can see that parity check speed with v6.8 is always slower, independent of array size; in fact, v6.7.2 is faster with dual parity than v6.8.0 with single parity. There's one curious exception: a read check (no parity) with 8 or 12 devices, where v6.8.0 is noticeably faster. All results are consistent on repeat testing (+/- 5MB/s).

     

Since most servers are disk/controller limited, I ran another test, this time always including one slower SSD with a max read speed of 220MB/s (with single parity):

     

[chart: parity check speed by array size with one 220MB/s SSD in the array]

So in a typical HDD array, where disks are limited to 150 to 200MB/s, the user would only start noticing the issue with more than 20 devices or so.
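To put rough numbers on that threshold, here is a toy model of my own construction (nothing from the actual md driver): assume the md layer imposes some fixed aggregate throughput ceiling that is split evenly across all drives, so the per-drive check speed is the smaller of the slowest disk's speed and that ceiling divided by the device count. The 4400MB/s ceiling below is a hypothetical figure chosen only because it fits the chart, where a 220MB/s disk stops being the limit past 20 devices.

```python
def parity_check_speed(n_drives: int, slowest_drive_mbs: float,
                       aggregate_cap_mbs: float = 4400.0) -> float:
    """Toy per-drive parity-check speed (MB/s): the check runs at the
    slowest drive's speed until an assumed aggregate ceiling, divided
    across all drives, becomes the bottleneck. The 4400MB/s default is
    a hypothetical figure fitted to the charts, not a measured one."""
    return min(slowest_drive_mbs, aggregate_cap_mbs / n_drives)

# With a 220MB/s disk the array stays disk-limited up to 20 devices:
print(parity_check_speed(12, 220))  # → 220.0 (disk limited)
print(parity_check_speed(30, 220))  # ceiling limited, ~146.7
```

Under this model a 20-plus device array of 150-200MB/s HDDs would start dipping below the disk limit, which is consistent with the real-world numbers further down.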

     

I'm limited by the hardware I can test on, but to rule it out as much as possible, mostly the disk controllers, I also ran some tests using 2 old SAS2LP controllers (also single parity):

[chart: parity check speed by array size on 2 SAS2LP controllers]

I can only connect up to 16 devices across both controllers (parity is on an Intel onboard SATA port), but the trend seems clear, though interestingly it's faster than before with 8 devices. This could be because the tunables used with v6.7.2 are better tuned for the LSI, or because the new code simply works better with the SAS2LP and few devices.

     

Now for some real world examples. I have two 30-disk servers: on the first one the parity check used to start at 200MB/s (disk limited), and now it starts at 150MB/s; on the second one, which uses older hardware and slower disks, the parity check used to start at 140MB/s (also disk limited) and slows down just a little with v6.8, to 130MB/s. Speed remains unchanged on all my other servers, mostly between 10 and 14 disks.

     

Anyone having the same issue please post here, together with a general server description, mainly CPU, board, controller and disk types used, or post the diagnostics, which contain all the info we need to see the hardware used. Note that if there is a slowdown it should be most evident at the start of the parity check, which should be the max speed; average speed will likely not suffer as much. So compare parity check start speed after a minute or so, once it has stabilized, and make sure nothing else is using the array, or do a quick test in maintenance mode so you can be certain nothing else is.

     

Also note that anyone using default or non-optimized tunables with v6.7 might even notice a performance improvement, due to the new code which attempts to auto-configure the tunables for best performance.



    User Feedback

    Recommended Comments



Same here, only more drastic. With 6.7.x I got 85-90MB/s; now, with the mq-deadline scheduler, only 55MB/s, and with the scheduler set to none, 67MB/s. I only have a slow J1900 and it seems the new code does not play well with slow CPUs. Diagnostics added. The system has eight 8TB data drives and two 8TB parity drives. I tried various disk settings (num_stripes, queue_limit, NCQ, nr_requests) with no improvement over defaults.
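For anyone who wants to compare schedulers the same way: on a stock Linux system the active I/O scheduler is exposed per device under sysfs. This is a generic Linux sketch, not Unraid-specific advice, and the device name in the commented line is purely an example; check your own devices with `lsblk` first.

```shell
# List the available schedulers for each SATA disk; the active one
# is shown in brackets, e.g. "[mq-deadline] none".
for q in /sys/block/sd*/queue/scheduler; do
    [ -e "$q" ] || continue
    printf '%s: %s\n' "${q%/queue/scheduler}" "$(cat "$q")"
done

# Switch a single disk (sdb here, purely an example) to "none".
# Needs root, and the change does not survive a reboot:
# echo none > /sys/block/sdb/queue/scheduler
```

The write is left commented out on purpose; run it per disk once you have confirmed the device names on your own system.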

    tower-diagnostics-20191121-0717.zip

    Edited by Videodr0me

    Unfortunately, I do not have version history. That would be nice to have in the Parity Check History.  But, I usually update fairly shortly after an update is released, with the exception of 6.7.  That one took me a while, IIRC.  I was using "tuned" settings, but reverted to defaults with 6.8.

     

    For a couple of years, I have had the same size array:  2 x Parity + 13 x Data.  All HGST 8TB HDN728080ALE604.

     

[screenshot: parity check history]

     

     

    Edited by StevenD

I have also been experiencing these slow parity checks for at least 6 months now.

    When I start a parity check, I get anywhere from 95 MB/s to 115 MB/s for the first few minutes. Definitely not maxing out my controllers or each individual drive.

     

Based on my configuration I should be seeing speeds of 175MB/s+ at the beginning, since my slowest drive tops out at around 180MB/s on average.

     

In the past, on older versions, I was definitely seeing much higher speeds at the beginning of the check, even when I had some older, slower drives in the array at the time (4TB, maxing out at 145-150MB/s).

     

    Here is my current drive configuration:

[screenshot: drive configuration]

     

    Also here is my parity check history:

[screenshot: parity check history]

     

    Also attaching my diagnostics file.

    ds1-diagnostics-20191210-1216.zip


Retested with 6.8.0 and it's as slow as the 6.8.0 RCs. This rules out the 5.x kernels as the reason for the slowdown and points to the new multistream code as a potential cause; it especially seems to not play well with slowish CPUs. Also, one CPU core is maxed out at 100% while the others are close to idle during a parity check. Maybe it's possible to distribute the load more evenly.

    Edited by Videodr0me

    Recently upgraded to Unraid 6.8.0 - after a few hiccups with the Nerd Pack - and have also noticed a drop in performance (see attached). Storage is a very old array that hasn't been changed in years - 4TB max, across 4 data drives, 1 parity, 1 cache (SSD/SATA). 

     

    System hardware was upgraded last year to a TR2950X with 32GB RAM, which should be more than capable!

     

While I don't have massive or fast data access needs (hoping to upgrade and add faster drives soon), it would be great to maintain performance. Is there any workaround yet, or should I just wait for the next release? I can live with it for now; it's stable at least.

     

    Cheers,

     

    Xav.

    Unraid Parity Check History.JPG

    22 minutes ago, xavierda said:

is there any workaround yet?

    Not yet, we'll need to wait.

     

    Taking the opportunity to move this to stable reports since it's still an issue.


I am glad that I am not the only one who is experiencing slow parity checks.

     

Looking at the above results, can you please inform me of what package you are using? I would like to start recording the information and share my results with the group.

    7 minutes ago, chris_netsmart said:

    can you please inform me on what package you are using

On the main GUI page there's a "History" button, just below the parity "Check" button.


Thanks. Here are my results; as you can see, my speed has dropped over time:

[screenshot: parity check history]

And here are my current hard drives:

[screenshot: drive configuration]

At the moment I don't have a cache drive, but one is scheduled to be installed this month.

     

Update: I have now added a cache drive and it is still slow when it does the parity check.

     

    Edited by chris_netsmart

16 4TB SSDs, that's a nice array :o, and one where I would expect the slowdown to be very noticeable. I assume 395MB/s was with v6.7 and now you get 295MB/s?

    5 hours ago, johnnie.black said:

16 4TB SSDs, that's a nice array :o, and one where I would expect the slowdown to be very noticeable. I assume 395MB/s was with v6.7 and now you get 295MB/s?

    Yes, that is correct. The last result was with 6.8.1 and the previous results were with 6.7.2. I'm using the nvidia unraid build.


No improvement with 6.9-beta1. Parity speeds are still at about 55-65MB/s instead of approx. 90MB/s. I rechecked very early logs (pre 6.x) and back then I had over 100MB/s. I hope this gets addressed soon, as a parity check now takes 36-48 hours.
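As a quick sanity check on that 36-48 hour figure (assuming the 8TB parity drives mentioned in the earlier post): a full check reads every parity byte once, so duration is simply parity size divided by sustained speed.

```python
def parity_check_hours(parity_tb: float, speed_mbs: float) -> float:
    """Rough duration of a full parity check: decimal TB over MB/s,
    the units drive vendors use, converted to hours."""
    return parity_tb * 1e12 / (speed_mbs * 1e6) / 3600.0

print(round(parity_check_hours(8, 55), 1))  # → 40.4 hours at the slow speed
print(round(parity_check_hours(8, 90), 1))  # → 24.7 hours at the old speed
```

So the drop from ~90MB/s to 55-65MB/s adds roughly 10 to 15 hours to each check on an 8TB parity drive.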

    1 hour ago, Videodr0me said:

No improvement with 6.9-beta1. Parity speeds are still at about 55-65MB/s instead of approx. 90MB/s. I rechecked very early logs (pre 6.x) and back then I had over 100MB/s. I hope this gets addressed soon, as a parity check now takes 36-48 hours.

Not really an answer to your question, but have you looked into using the Parity Check Tuning plugin to avoid the parity check running during prime time, with its adverse effect on performance?


I simply pause the parity check whenever I really need full server performance. My main concern with the unnecessarily long parity checks is not only performance, but that resources are not utilized efficiently, leading to higher energy consumption, more wear on drives, more noise, and chassis/drive temperatures that stay higher for longer periods of time.
     

This is awkward, as it seems to be a solvable problem. Granted, it affects fast CPU systems less, but I always liked Unraid because it worked well on low-power, older CPU systems. It's annoying that one CPU core is maxed out while the other cores idle or carry minimal load. To me it seems that some of the RAID code (maybe the read buffers or the actual parity calculation) is not well multithreaded, but there may be other reasons for the performance degradation. I just hope this gets fixed.

    On 3/14/2020 at 8:02 AM, jwoolen said:

[screenshot: parity check history]

     

    Still no improvement with 6.8.3.

I thought I would check back here: mine seems to have improved back to what it was before, now under 6.8.3. I will see if the next parity check is about the same; perhaps it was a one-off.


No idea if this is related (same ballpark)... I went from 6.7.0 to 6.8.3, and while my parity checks appear to be a similar speed, 85-100MB/s...

     

Video playback via Kodi was horrible during a parity check and also during a rebuild; it would buffer every minute or so. SMB and NFS.

     

Reverted to 6.7.0, running a parity check now, and playback appears fine. I will be running a rebuild after the check and will confirm that as well.

     

Update: Confirmed no playback issues during a rebuild on 6.7.0.

    Edited by dandirk

Are there any updates on this? I have a PCIe 4.0 HBA that I was planning to use, but the slow parity check speeds negate the performance benefit. I tried 6.9.0-beta22 and it was just as slow.

    Edited by jwoolen
    4 hours ago, jwoolen said:

    Are there any updates on this?

Not yet. v6.9 should be very similar to v6.8; my tests show very small differences (checks with single parity are a little faster, with dual parity a little slower), but those are very small differences, +/- 5MB/s to 10MB/s, and likely just because it's using a different kernel, so basically it will be the same. I would guess that, understandably, LT is looking at more important issues first, but I do hope this will eventually be fixed.





