• [v6.8.x] Parity check slower than before in certain cases


    JorgeB
    • Annoyance

    There is a known issue with v6.8 where parity syncs can be slower than before, mostly, but not only, on faster and/or very wide arrays.

     

    Although this isn't an urgent issue by any means, this thread is meant to make it easier for LT to track the problem, and also to get a better idea of how many users are affected and whether there is a typical server configuration where the slowdown starts to be noticeable.

     

    This issue results from changes made to the md driver and tunables to fix the "writes starve reads" problem in v6.7. Based on my testing there's a general slowdown of the maximum possible speed for any array size, but since most servers will be disk/controller limited, those shouldn't notice any slowdown; users with very fast and/or wide arrays can notice it.
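
    To make the "disk limited" point concrete, here's a minimal sketch of the idea in Python; the ceiling numbers are made up for illustration, not measured values:

    # Illustration only: the ceiling values below are made up, not measurements.
    # The observed parity check speed is capped by whichever is lower: the md
    # driver's maximum throughput or the slowest disk in the array.

    def observed_speed(driver_ceiling_mbs, slowest_disk_mbs):
        """Parity check speed is bounded by the lower of the two limits."""
        return min(driver_ceiling_mbs, slowest_disk_mbs)

    V672_CEILING = 900.0  # hypothetical driver ceiling before the md changes
    V680_CEILING = 600.0  # hypothetical lower ceiling after the md changes

    for disk_limit in (180.0, 500.0, 800.0):
        before = observed_speed(V672_CEILING, disk_limit)
        after = observed_speed(V680_CEILING, disk_limit)
        print(f"disk limit {disk_limit:.0f} MB/s: {before:.0f} -> {after:.0f} MB/s")

    # A 180MB/s HDD array shows no change (disk limited either way); only
    # arrays fast enough to reach the driver ceiling see the slowdown.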

     

    This is from my test server using only SSDs:

    [Chart: parity check speed by array size, v6.7.2 vs v6.8.0, SSD test server]

    You can see that parity check speed with v6.8 is always slower, independent of array size; in fact, v6.7.2 is faster with dual parity than v6.8.0 with single parity. There's one curious exception: a read check (no parity) with 8 or 12 devices, where v6.8.0 is noticeably faster. All results are consistent on repeat testing (+/- 5MB/s).

     

    Since most servers are disk/controller limited, I ran another test, this time always including one slower SSD with a max read speed of 220MB/s (single parity):

     

    [Chart: parity check speed with one 220MB/s SSD in the array, v6.7.2 vs v6.8.0]

    So in a typical HDD array, where disks are limited to 150-200MB/s, the user would only start noticing the issue with more than 20 devices or so.

     

    I'm limited by the hardware I can test on, but to try and rule it out as much as possible, mostly the disk controllers, I also did some tests using two old SAS2LP controllers (also single parity):

    [Chart: parity check speed on two SAS2LP controllers, v6.7.2 vs v6.8.0]

    I can only connect up to 16 devices on both controllers (parity is on an Intel onboard SATA port), but the trend seems clear, though interestingly it's faster than before with 8 devices. This could be because the tunables used with v6.7.2 are more tuned for the LSI, or because the new code just works better with the SAS2LP and few devices.

     

    Now for some real world examples: I have two 30-disk servers. On the first one the parity check used to start at 200MB/s (disk limited); now it starts at 150MB/s. On the second server, which uses older hardware and slower disks, the parity check started at 140MB/s (also disk limited) and slows down just a little with v6.8, to 130MB/s. Speed remains unchanged on all my other servers, mostly between 10 and 14 disks.
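
    To put those start speeds into wall-clock terms, here's a rough back-of-the-envelope calculation; the 8TB parity size is an assumption for illustration, and real checks slow down towards the end of the disks, so actual times will be longer:

    # Rough estimate only: assumes a constant speed over an assumed 8TB parity
    # disk; real checks slow down towards the inner tracks of the disks.

    PARITY_SIZE_TB = 8.0  # assumed parity disk size, for illustration

    def check_hours(speed_mbs, size_tb=PARITY_SIZE_TB):
        return size_tb * 1e12 / (speed_mbs * 1e6) / 3600

    for speed in (200, 150, 140, 130):
        print(f"{speed} MB/s -> ~{check_hours(speed):.1f} hours")

    # 200MB/s -> ~11.1h vs 150MB/s -> ~14.8h: that 25% start-speed drop
    # adds a few hours to every full check on this assumed disk size.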

     

    Anyone having the same issue please post here together with a general server description, mainly CPU, board, controller, and disk types used, or post the diagnostics, which contain all the info we need to see the hardware used. Note that if there is a slowdown it should be more evident at the start of the parity check, which should be the max speed; average speed will likely not suffer as much. So compare the parity check start speed after a minute or so, once it has had time to stabilize, and make sure nothing else is using the array, or do a quick test in maintenance mode so you can be certain nothing else is.
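
    If you want to take that start-speed reading in a repeatable way, the sketch below shows the idea; read_current_speed_mbs() is a hypothetical placeholder for however you obtain the speed (e.g. noting it from the webGUI), not a real Unraid API:

    import time

    def read_current_speed_mbs():
        """Hypothetical placeholder, not a real Unraid API: return the
        current parity check speed in MB/s however you obtain it."""
        raise NotImplementedError

    def start_speed_mbs(warmup_s=60, samples=5, interval_s=5):
        """Let the check run for a minute to stabilize, then average a
        few readings so a single blip doesn't skew the comparison."""
        time.sleep(warmup_s)
        readings = []
        for _ in range(samples):
            readings.append(read_current_speed_mbs())
            time.sleep(interval_s)
        return sum(readings) / len(readings)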

     

    Also note that anyone using default or non-optimized tunables with v6.7 might even notice a performance improvement, due to the new code which attempts to auto-configure the tunables for best performance.




    User Feedback

    Recommended Comments



    20 minutes ago, JorgeB said:

    You can download v6.7.2 from here.

    Thanks, got that system downgraded. Very similar hardware other than the drives themselves: same generation CPU and RAM, and the exact same controller model. The drives are slower.

     

    Downgrading from 6.10.0-rc2 to 6.7.2, my parity check read speed almost doubled. Seeing as high as 110MB/s on that array compared to the previous 50-60 range.

     

    [Screenshots: parity check speed on v6.7.2 and v6.10.0-rc2, plus v6.10.0-rc2 parity check history]

    Link to comment

    Yep, that's a big difference, more than I'm used to seeing, but different hardware can produce very different results; looks like yours is considerably affected by the changes.

    Link to comment

    And I might be wrong, but I doubt LT considers this issue a high priority, so you might need to live with it for some time.

    Link to comment

    It is unfortunate. At this point it might be worth my time to actually downgrade the fast server that I am upgrading, just so I can complete the drive replacements in a more timely manner. Three days for each is a killer when doing 10 drives.

    I'll send in some feedback directly from my systems for them as well.

    Link to comment

    FYI - if memory serves me right, the 6.7.x series had SQLite corruption and will corrupt Docker SQLite DBs. Either go back to 6.8.3 or to 6.6.7.

     

    Link to comment

    Did some more testing today on my faster-drive machine. I doubled my parity speed by enabling safe mode, and gained another 50% speed increase when booting 6.7.2:

     

    • 6.10.0-rc2, safe mode: ~100MB/s
    • 6.7.2, safe mode: ~150MB/s, sometimes as high as 170-180MB/s
    • 6.10.0-rc2, non-safe mode (containers/VMs disabled): 45-50MB/s
    • 6.10.0-rc2, safe mode again after reboot: ~100MB/s

    Link to comment
    22 minutes ago, JorgeB said:

    That suggests a plugin issue; safe mode by itself shouldn't make any difference for this.

    Right, trying to track down which one might be the culprit next. Right now I'm trying to get my drive updates done as quickly as possible, then I'll have some time to start removing plugins and testing each time.

    Link to comment

    I am doing some further testing while doing a parity rebuild in safe mode... I have also discovered that simply enabling the docker service results in a performance hit, even with all containers stopped. In my case I estimate it to be around 30MB/s. As soon as I stop the docker service I get those 30MB/s back.

     

    This could take a lot of time to try and track down if I have both plugins and docker service causing issues with parity speeds. 🤪

    Link to comment
    2 hours ago, sirkuz said:

    I am doing some further testing while doing a parity rebuild in safe mode... I have also discovered that simply enabling the docker service results in a performance hit, even with all containers stopped. In my case I estimate it to be around 30MB/s. As soon as I stop the docker service I get those 30MB/s back.

     

    This could take a lot of time to try and track down if I have both plugins and docker service causing issues with parity speeds. 🤪

    Are you sure you don't have appdata or the docker image on the array?

    Link to comment
    3 hours ago, ChatNoir said:

    Are you sure you don't have appdata or the docker image on the array?

    Yes, both exist on the cache pool, which is a pair of mirrored 1TB NVMe drives.

    Link to comment




