• [v6.8.x] Parity check slower than before in certain cases


    JorgeB
    • Annoyance

There is a known issue with v6.8 where parity syncs can be slower than before, mostly, but not only, on faster and/or very wide arrays.

     

Although this isn't an urgent issue by any means, this thread is meant to make it easier for LT to track the problem, and also to get a better idea of how many users are affected and whether there is a typical server configuration where the slowdown starts to become noticeable.

     

This issue results from changes made to the md driver and tunables to fix the "writes starve reads" problem with v6.7. Based on my testing there's a general slowdown of the maximum possible speed for any array size, but since most servers will be disk/controller limited, those shouldn't notice any slowdown; users with very fast and/or wide arrays can notice it.

     

    This is from my test server using only SSDs:

[chart: parity check speed vs. number of array devices, v6.7.2 vs v6.8.0, SSD-only test server]

You can see parity check speed with v6.8 is always slower regardless of array size; in fact v6.7.2 is faster with dual parity than v6.8.0 with single parity. There's one curious exception: a read check (no parity) with 8 or 12 devices, where v6.8.0 is noticeably faster. All results are consistent on repeat testing (+/- 5MB/s).

     

Since most servers are disk/controller limited I ran another test, this time always including one slower SSD with a max read speed of 220MB/s (with single parity):

     

[chart: parity check speed vs. number of devices with one 220MB/s SSD in the array, single parity]

So in a typical HDD array where disks are limited to 150-200MB/s the user would only start noticing the issue with more than 20 devices or so.

     

I'm limited by the hardware I can test on, but to try and rule it out as much as possible, mostly the disk controllers, I also did some tests using two old SAS2LP controllers (also single parity):

[chart: parity check speed vs. number of devices on two SAS2LP controllers, single parity]

I can only connect up to 16 devices on both controllers (parity is on an Intel onboard SATA port), but the trend seems clear, though interestingly it's faster than before with 8 devices. This could be because the tunables used with v6.7.2 were more tuned for the LSI, or because the new code just works better with the SAS2LP and few devices.

     

Now for some real world examples: I have two 30-disk servers. On the first one the parity check used to start at 200MB/s (disk limited); now it starts at 150MB/s. On the second server, which uses older hardware and slower disks, the parity check used to start at 140MB/s (also disk limited) and slows down just a little with v6.8, to 130MB/s. Speed remains unchanged for all my other servers, mostly between 10 and 14 disks.

     

Anyone having the same issue please post here together with a general server description, mainly CPU, board, controller and disk types used, or post the diagnostics, which contain all the info we need to see the hardware used. Note that if there is a slowdown it should be more evident at the start of the parity check, which should be the max speed; average speed will likely not suffer as much. So try to compare parity check start speed after a minute or so, so it can stabilize, and make sure nothing else is using the array, or do a quick test in maintenance mode so you can be certain nothing else is.
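 

    If you'd rather grab the start speed from the console than from the GUI, here's a minimal sketch that samples the resync position from Unraid's /proc/mdstat twice and works out the rate. It assumes the mdResyncPos counter is reported in 1K blocks, which may vary between releases, so treat the result as a rough check only:

    # sample parity check progress twice, 60s apart (assumes mdResyncPos is in 1K blocks)
    p1=$(grep -o 'mdResyncPos=[0-9]*' /proc/mdstat | cut -d= -f2)
    sleep 60
    p2=$(grep -o 'mdResyncPos=[0-9]*' /proc/mdstat | cut -d= -f2)
    # delta in KiB over 60s, converted to an approximate MB/s figure
    echo "$(( (p2 - p1) / 60 / 1000 )) MB/s (approx.)"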

     

Also note that anyone using default or non-optimized tunables with v6.7 might even notice a performance improvement, due to the new code which attempts to auto-configure the tunables for best performance.




    User Feedback

    Recommended Comments



I was directed to this thread from one I started recently in General Support here. Lots of my config info, etc., is posted there. In brief, I have an Intel J3160 CPU with 16 GB RAM and an array of 6x 6TB 7200 rpm data drives plus dual parity (2x 6TB), all the same model HGST Deskstars. I also have 2x Samsung 850 EVO 1TB for the cache pool. I have not changed any hardware since the beginning.

     

I am also experiencing a VERY DRASTIC SLOWDOWN. I can't tell when I upgraded to each newer version of Unraid OS but I am guessing from my Parity History that it parallels what other users here have seen. I started with 160 MB/s checking speed, dropping to about 85 MB/s, and now on 6.9.0-b30 I am seeing a paltry 23.5 MB/s. After testing with diskspeed, all disks and controllers report throughput of 100-200 MB/s. I DO have a wimpy CPU also which, after reading all the posts, sounds like it is the particular piece of hardware that is not taking it well. There are some outlier 'high throughput speed' entries in my parity check log, but I am attributing that to the way the log seems to calculate since I pause my check during the day (too hot) and restart again at night.
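 

    For anyone wanting a quick sanity check of raw combined read throughput outside of the diskspeed plugin, a rough sketch is to read the start of every array disk in parallel with dd; the sd* names below are placeholders for your actual array members, and this only measures sequential reads, not parity math:

    # read 4GB from the start of each listed disk in parallel (device names are examples)
    for d in sdb sdc sdd sde; do
        dd if=/dev/$d of=/dev/null bs=1M count=4096 iflag=direct &
    done
    wait   # each dd prints its own MB/s figure when it finishes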

    parity-history.txt

    Link to comment

    Same problem here.

     

Very slow parity check speed of 40 MB/s (7x 8TB drives + 3x 18TB drives).

     

    I've tested my 18 TB drives -> All are doing at least 200 MB/s.

    Also I've tested my 8 TB drives -> All are doing at least 100 MB/s.

     

    I think the newer unraid versions (6.8+) are having a problem with very large drives.

     

Please fix this behaviour or at least offer us more options to tinker with, like in older Unraid versions.

    Link to comment
    16 hours ago, amigenius said:

    I think the newer unraid versions (6.8+) are having a problem with very large drives.

This issue is about the number of drives, not their size; you might have another issue or some controller bottleneck, diags grabbed during a parity check might help.

    Link to comment

    Fixed my problem.

     

The reason was that somehow the CPU was in power saving mode.

     

With Performance / Schedutil mode I get ~80 MB/s.
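 

    For anyone checking for the same thing, a minimal sketch to view and change the CPU frequency governor from the console, using the standard Linux cpufreq sysfs interface (the change is not persistent across reboots):

    # show the current governor for each core
    cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c
    # switch every core to the performance governor (reverts on reboot)
    for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        echo performance > "$g"
    done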

     

But maybe the parity check could be optimized further by using more CPU cores instead of practically only one...

     

    Link to comment

Any updates on this? I am also experiencing slow parity speeds.

I have dual parity with 16TB drives and an array of 24 data drives.

    I am running 6.8.3.

    Any workarounds for this issue?

    Link to comment
    1 hour ago, swamiforlife said:

    running 6.8.3

    Why?

     

    1 hour ago, swamiforlife said:

    slow parity speeds

    How slow?

     

    1 hour ago, swamiforlife said:

    24 data drives

    Controller bottleneck? Port multiplier?

     

    Post diagnostics.

    Link to comment

I didn't need any of the new features from the newer versions so never bothered to upgrade.

But if the problem is solved then I don't mind upgrading.

     

I used to have speeds around 116MB/s.

Now I have changed to some faster parity drives and am getting around 85Mbps

I was getting faster speeds before the parity drive upgrades with the same number of drives in the array.

     

    Diagnostics are attached. 

     

    tower-diagnostics-20211027-2109.zip

    Link to comment
    5 minutes ago, swamiforlife said:

Now I have changed to some faster parity drives and am getting around 85Mbps

Assuming you mean 85MB/s, that's about right for 24 drives on a SAS2 expander with single link. Do you have one or two cables connected to the HBA? Or just post the output of:

     

    cat /sys/class/sas_host/host7/device/port-7\:0/sas_port/port-7\:0/num_phys

     

With dual link you should be able to get around 110MB/s; the bottleneck then would be the PCIe 2.0 slot.

    Link to comment
    8 minutes ago, swamiforlife said:

    4

This means single link, so the speed you're getting is normal. A single SAS2 link's theoretical max is 2400MB/s, of which around 2200MB/s is usable, so dividing that by 26 drives (since parity devices also count) you get 84.62MB/s max speed. Like mentioned, you can increase that by connecting a second cable from the HBA to the expander, assuming it supports dual link, and it should since it's an LSI.
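 

    The same back-of-the-envelope math as a small sketch you can adapt to your own setup; the 2200MB/s usable-per-SAS2-wide-link figure is the estimate used above, not a measured value:

    # per-drive ceiling = usable link bandwidth / drives read during the check
    links=1      # SAS cables between HBA and expander
    drives=26    # data + parity drives
    awk -v l=$links -v d=$drives 'BEGIN { printf "%.1f MB/s per drive\n", l * 2200 / d }'

    With links=2 the SAS side stops being the limit and, as noted above, the PCIe 2.0 slot then caps it at around 110MB/s per drive.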

    Link to comment

Is anything further being looked into to address the issue with large arrays? I am myself experiencing painful parity rebuild speeds while upgrading disks in my array from 8TB to 14TB. Each disk I replace costs me 72 hours. It's going to take me a good month or two to get through these.

     

I have tried everything I can think of and no issues found in my syslog. Adjusted all tunables to defaults and every which way with no difference. Disabling the docker engine or VMs does not matter (they are all on the cache pool anyway).

     

Currently I am on 6.10.0-RC2 and hoping maybe something else is up the sleeve to alleviate this in 6.10 final.

    Link to comment

First make sure that's your actual problem; 72h seems like a lot, could be some controller bottleneck or other config issue, difficult to say without any more info.

    Link to comment

This is an LSI SAS 9305-24i running @ PCIe 3.0 x8 with 24 SATA3 drives connected, ranging from 8TB and up. Should have more than enough bandwidth running at that link speed. Definitely was faster previously.

     

Rest of the system specs are a Supermicro X9DRi-LN4+ with dual E5-2650 v2 CPUs and 384GB RAM.
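 

    A quick way to confirm the card really negotiated PCIe 3.0 x8, rather than a downgraded link, is to check the kernel's view with lspci; this is standard lspci usage, and the bus address below is only an example, yours will differ:

    # find the HBA's bus address, then read its negotiated link status
    lspci | grep -i sas
    lspci -s 01:00.0 -vv | grep -i 'LnkSta:'    # expect "Speed 8GT/s, Width x8" for PCIe 3.0 x8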

    Link to comment
    27 minutes ago, sirkuz said:

    Should have more than enough bandwidth running at that link speed.

It should, and with 24 devices and dual parity I could still do 140MB/s on my test server; that's about 12TB per 24h. What speed do you get at the start of the check?

    Link to comment
    4 minutes ago, JorgeB said:

It should, and with 24 devices and dual parity I could still do 140MB/s on my test server; that's about 12TB per 24h. What speed do you get at the start of the check?

    If I recall correctly I'll start out around 60MB/s or slightly above before settling in around 50-55MB/s. 

     

     


    Link to comment

    With it being so slow I am currently living dangerously and doing a two drive upgrade since I have dual parity.

    Currently at ~55MB/s each around the 46% mark.

    Link to comment

That still seems too slow to be this issue; I would recommend downgrading to v6.7 and doing a quick test. You just need to run a check for a couple of minutes to check the starting speed and confirm.

    Link to comment
    4 minutes ago, JorgeB said:

That still seems too slow to be this issue; I would recommend downgrading to v6.7 and doing a quick test. You just need to run a check for a couple of minutes to check the starting speed and confirm.

    Not sure I can downgrade without breaking a bunch of stuff. Specifically running my containers with ipvlan rather than macvlan. Might be better off rolling a new trial USB and testing that way if I can prevent ruining my current configuration :)

    Link to comment
    15 minutes ago, sirkuz said:

    Not sure I can downgrade without breaking a bunch of stuff.

You could stop the docker/VM services, then just run a 2-min parity check and upgrade back up.

    Link to comment
    1 minute ago, JorgeB said:

You could stop the docker/VM services, then just run a 2-min parity check and upgrade back up.

    I'll give it a go after current sync is done in a couple days to see what the speeds look like. Appreciate the help!

    Link to comment
    29 minutes ago, JorgeB said:

You could stop the docker/VM services, then just run a 2-min parity check and upgrade back up.

Sorry, but I am coming up empty trying to find a link to download 6.7.2. Do you happen to have a link to older versions like this? I have a secondary server with the same controller that I might try it on first to see if it makes a difference. That system already has slower drives but similar hardware and a similar number of drives.

     

    Thanks in advance!

    Link to comment




