• [v6.8.x] Parity check slower than before in certain cases


    JorgeB
    • Annoyance

    There is a known issue with v6.8 where parity syncs can be slower than before, mostly, but not only, on faster and/or very wide arrays.

     

    Although this isn't an urgent issue by any means, this thread is meant to make it easier for LT to track the problem, and also to get a better idea of how many users are affected and whether there is a typical server configuration where the slowdown becomes more noticeable.

     

    This issue results from changes made to the md driver and tunables to fix the "writes starve reads" problem with v6.7. Based on my testing there's a general slowdown of the max possible speed for any array size, but since most servers will be disk/controller limited, those shouldn't notice any slowdown. Users with very fast and/or wide arrays can notice it.

     

    This is from my test server using only SSDs:

    [Chart pc12.PNG: parity check speed by array size, v6.7.2 vs v6.8.0, all-SSD test server]

    You can see parity check speed with v6.8 is always slower regardless of array size; in fact v6.7.2 is faster with dual parity than v6.8.0 with single parity. There's one curious exception: a read check (no parity) with 8 or 12 devices, where v6.8.0 is noticeably faster. All results are consistent on repeat testing (+/- 5MB/s).

     

    Since most servers are disk/controller limited, I ran another test, this time always using one slower SSD with a max read speed of 220MB/s (with single parity):

     

    [Chart pc9.PNG: parity check speed by array size with one slower SSD (max 220MB/s read), single parity]

    So in a typical HDD array, where disks are limited to between 150 and 200MB/s, the user would only start noticing the issue with more than 20 devices or so.

     

    I'm limited by the hardware I can test on, but to try to rule out the hardware as much as possible, mostly the disk controllers, I also did some tests using 2 old SAS2LP controllers (also single parity):

    [Chart pc8.PNG: parity check speed by array size on 2 SAS2LP controllers, single parity]

    I can only connect up to 16 devices on both controllers (parity is on an Intel onboard SATA port), but the trend seems clear. Interestingly, it's faster than before with 8 devices; this may be because the tunables used with v6.7.2 are more tuned for the LSI, or because the new code just works better with the SAS2LP and few devices.

     

    Now for some real world examples: I have two 30-disk servers. On the first one parity check started at 200MB/s (disk limited) and now starts at 150MB/s. On the second server, which uses older hardware and slower disks, parity check started at 140MB/s (also disk limited) and slows down just a little with v6.8, to 130MB/s. Speed remains unchanged for all my other servers, mostly between 10 and 14 disks.
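
To put those two servers side by side, the drop can be expressed as a percentage of the old start speed. This is only a small arithmetic sketch in Python; the function name is made up for illustration and the only inputs are the speeds quoted above.

```python
def slowdown_percent(before_mb_s: float, after_mb_s: float) -> float:
    """Return the drop in speed as a percentage of the old speed."""
    if before_mb_s <= 0:
        raise ValueError("before_mb_s must be positive")
    return (before_mb_s - after_mb_s) / before_mb_s * 100.0


# Parity check start speeds quoted in the post (MB/s): before v6.8, with v6.8.
servers = {
    "30-disk server 1": (200.0, 150.0),
    "30-disk server 2": (140.0, 130.0),
}

for name, (before, after) in servers.items():
    print(f"{name}: {slowdown_percent(before, after):.1f}% slower")
```

That works out to 25.0% for the first server and about 7.1% for the second, which fits the idea that the slower, older server is still mostly disk limited.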

     

    Anyone having the same issue, please post here together with a general server description, mainly CPU, board, controller and disk types used, or post the diagnostics, which contain all the info we need to see the hardware used. Note that if there is a slowdown it should be more evident at the start of the parity check, which should be the max speed; average speed will likely not suffer as much. So try to compare the parity check start speed, after a minute or so to let it stabilize. Make sure nothing else is using the array, or do a quick test in maintenance mode so you can be certain nothing else is.
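
For anyone who wants a repeatable number rather than reading it off the GUI, the start speed can be worked out from two progress samples taken a short time apart, after letting the check settle. The Python below is a rough sketch only: `read_position_kib` is a hypothetical callable (not an Unraid API) that returns the current check position in KiB from whatever source you have, and the 60 s settle time and 30 s window are assumed defaults.

```python
import time
from typing import Callable


def speed_mb_s(pos_start_kib: int, pos_end_kib: int, seconds: float) -> float:
    """Convert progress between two position samples (in KiB) to MB/s."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    bytes_done = (pos_end_kib - pos_start_kib) * 1024
    return bytes_done / seconds / 1_000_000


def measure_start_speed(
    read_position_kib: Callable[[], int],  # hypothetical: supply your own reader
    settle_s: float = 60.0,                # assumed: let the speed stabilize first
    window_s: float = 30.0,                # assumed: sampling window length
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Wait settle_s, then sample the position twice, window_s apart."""
    sleep(settle_s)
    first = read_position_kib()
    sleep(window_s)
    second = read_position_kib()
    # Uses the nominal window length, so the result is approximate.
    return speed_mb_s(first, second, window_s)
```

Run the same measurement on both releases with nothing else touching the array, and compare the two results rather than the averages shown in the parity history.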

     

    Also note that anyone using default or non-optimized tunables with 6.7 might even notice a performance improvement, due to the new code which attempts to auto-configure the tunables for best performance.



    User Feedback

    Recommended Comments



    I was directed to this thread from one I started recently in General Support here. Lots of my config info, etc., is posted there. In brief, I have an Intel J3160 CPU with 16 GB RAM and an array of 6x 6TB 7200 rpm data drives with two parity drives (2x 6TB), all the same model HGST Deskstars. I also have 2x Samsung 850 EVO 1 TB for the cache pool. I have not changed any hardware since the beginning.

     

    I am also experiencing a VERY DRASTIC SLOWDOWN. I can't tell when I upgraded to each newer version of UnraidOS, but I am guessing from my Parity History that it parallels what other users here have seen. I started with a 160 MB/s checking speed, which dropped to about 85 MB/s, and now on 6.9.0-b30 I am seeing a paltry 23.5 MB/s. Testing with diskspeed, all disks and controllers report throughput of 100-200 MB/s. I DO also have a wimpy CPU which, after reading all the posts, sounds like the particular piece of hardware that is not taking it well. There are some outlier 'high throughput speed' entries in my parity check log, but I am attributing those to the way the log seems to calculate speed, since I pause my check during the day (too hot) and restart it at night.
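
To show what those average speeds mean in hours, here is a small Python sketch of the arithmetic. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), a check that runs the full length of a 6 TB disk, and a constant average speed with no pauses; the function name is made up for illustration.

```python
def check_hours(disk_tb: float, avg_mb_s: float) -> float:
    """Estimate parity check duration in hours at a constant average speed."""
    if avg_mb_s <= 0:
        raise ValueError("avg_mb_s must be positive")
    total_bytes = disk_tb * 1e12           # assumed: decimal terabytes
    seconds = total_bytes / (avg_mb_s * 1e6)
    return seconds / 3600


# Average speeds quoted in the comment above, for 6 TB drives.
for speed in (160.0, 85.0, 23.5):
    print(f"{speed:5.1f} MB/s -> {check_hours(6.0, speed):.1f} h")
```

Under those assumptions the check goes from roughly 10.4 hours at 160 MB/s to about 19.6 hours at 85 MB/s and about 70.9 hours at 23.5 MB/s.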

    parity-history.txt


    Same problem here.

     

    Very slow parity check speed of 40 MB/s (7x 8 TB drives + 3x 18 TB drives).

     

    I've tested my 18 TB drives -> All are doing at least 200 MB/s.

    Also I've tested my 8 TB drives -> All are doing at least 100 MB/s.

     

    I think the newer unraid versions (6.8+) are having a problem with very large drives.

     

    Please fix this behaviour or at least please offer us more options to tinker with like in older unraid versions.

    16 hours ago, amigenius said:

    I think the newer unraid versions (6.8+) are having a problem with very large drives.

    This issue is about the number of drives, not their size. You might have another issue or some controller bottleneck; diags grabbed during a parity check might help.


    Fixed my problem.

     

    The reason was that somehow the CPU was in power saving mode.

     

    With Performance / Schedutil mode I get ~80 MB/s.
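
For anyone who wants to check for the same thing, the active CPU governor can be read from the standard Linux cpufreq sysfs files. This is a minimal Python sketch, assuming the server exposes `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor` (not every platform or kernel configuration does); the function name is made up for illustration.

```python
from collections import Counter
from pathlib import Path


def read_governors(cpu_root: str = "/sys/devices/system/cpu") -> dict[str, str]:
    """Map each CPU name (cpu0, cpu1, ...) to its current cpufreq governor."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in sorted(Path(cpu_root).glob(pattern)):
        # path is .../cpuN/cpufreq/scaling_governor, so two parents up is cpuN.
        governors[path.parent.parent.name] = path.read_text().strip()
    return governors


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("no cpufreq governors exposed on this system")
    for governor, count in Counter(found.values()).items():
        print(f"{governor}: {count} CPU(s)")
```

If every CPU reports `powersave`, that matches the situation described above and is worth ruling out before blaming the release.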

     

    But maybe the parity check could be optimized further by using more CPU cores instead of practically only one...

     





