• ZFS write speed issue (array disk, no parity) on 6.12.0-rc6


    fritzdis
    • Minor

    I'm seeing a write speed issue on a ZFS array disk (no parity) when the source disk is able to provide data faster than the destination disk can write it.  In a controlled test, write speed was about 80 MB/s, whereas by limiting the source speed, write speed was over 170 MB/s.

     

I would expect this to be reproducible on at least some systems.  My system has dual Xeon E5-2450s (not V2).  Their age and fairly weak single-core performance may have something to do with the issue, but given how drastic the difference is, I don't think CPU limitations fully explain it.

     

    --

     

    Here are the details:

     

    I added a new drive (12TB Red Plus) to my unprotected array.  It had no previous partition, and I let unRAID format it as ZFS (no compression).  I created a share called "Test" on only that disk with no secondary storage.

     

    On my existing SSD cache pool, I have a "Download" share (shows as exclusive access).  In that share, I copied several very large media files into a "_test" folder (83 GB total).

     

    At the command line, I navigated to the Download test folder and ran the following:

    rsync -h --progress --stats -r -tgo -p -l * /mnt/user/Test/

This took 17m57s, for a speed of 79 MB/s (calculated, not just taking rsync's word for it).  As you can see from the first image, write activity was very sporadic.  During the dips, one or two CPU threads were pegged at 100% (usually two).
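For anyone checking my math, here is the quick back-of-the-envelope calculation (this assumes the 83 GB total is really 83 GiB, as most tools report it):

echo "scale=1; (83 * 1024) / (17*60 + 57)" | bc    # ~78.9 MiB/s over 17m57s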

     

    Next, I removed the files from the Test share and reran the command, this time adding the --bwlimit option:

    rsync -h --progress --stats -r -tgo -p -l --bwlimit=180000 * /mnt/user/Test/

    I may have been able to go higher, but I wanted to make sure to use a speed under the max capabilities of the destination disk.  It completed in just 8m12s, for a speed of 173 MB/s.  The second image shows a drastically different profile for the write activity.
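One note on units in case anyone reproduces this: rsync's --bwlimit value is in units of 1024 bytes per second by default (so 180000 is roughly 184 MB/s), and rsync 3.1+ also accepts size suffixes, so something like this should behave about the same:

rsync -h --progress --stats -r -tgo -p -l --bwlimit=176m * /mnt/user/Test/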

     

    Finally, just in case the order of operations had an impact, I removed the files again and reran the original command without the limit.  The speed & activity profile was the same as before.

     

    --

     

I'm thinking of reformatting the test disk to BTRFS to see how it behaves (I still want to use ZFS long-term).  I also don't know what will happen once I have dual parity; even using reconstruct write, I would expect to still be limited by this issue.

     

    I'm actually not that concerned about array write speed once I have the data populated, but the performance loss is so substantial that I thought I should report it in case anything can be done.  Let me know if there's any additional testing I could do that would help.

     

    Test1.PNG

    Test2.PNG

    sf-unraid-diagnostics-20230526-2129.zip




    User Feedback

    Recommended Comments

    Quote

    I'm seeing a write speed issue on a ZFS array disk (no parity) when the source disk is able to provide data faster than the destination disk can write it.

     

This is a known issue when transferring data from a faster source, and it affects the array only, not pools.  I think it's some interaction between the unRAID driver and ZFS that possibly needs to be optimized.  I found it some time ago and LT is already aware, but it still needs more investigating.  It's especially noticeable with no parity, but some performance penalty is always there, and it also varies with the disk brand/model used.  Here are a few tests I did with two different servers, the first using older disks, the second using more recent models:

     

    image.png

     

    image.png

     

And this, like you mentioned, shows that the problem only appears when the source is faster than the disk being written to:

     

    image.png

     

     

    Link to comment
    17 minutes ago, JorgeB said:

This is a known issue, I found it some time ago and LT is already aware, but it still needs more investigating; it's especially noticeable with no parity, but some performance penalty is always there.

    Good to know, thanks!

     

    For the big initial data transfer, it seems I can just use the bwlimit option for rsync to get reasonably good performance.  After that, I'll have parity and cache.  As I said, I won't be particularly concerned about array write speeds at that point.

     

    Is there any way to set a speed limit for the mover, since that may actually improve its performance?

    Link to comment
    7 minutes ago, fritzdis said:

    Is there any way to set a speed limit for the mover

Not that I'm aware of.  There's a mover plugin, but I'm also not sure if it includes that functionality.

    Link to comment
    12 minutes ago, JorgeB said:

Not that I'm aware of.  There's a mover plugin, but I'm also not sure if it includes that functionality.

    Yeah, I don't imagine limiting mover speeds was ever a very common request.

    Link to comment

    Small follow up for reference:

     

Even using the bwlimit option for rsync can be tricky.  It's unclear, but it appears to me that even for a fresh disk, the writes may not always just work their way inward from the outer, faster tracks.  So it can be hard to predict what speed the disk will be able to handle before the choppiness appears.

     

The only surefire approach seems to be using a limit below the minimum sustained write speed of the disk, which sacrifices potential performance.  If you monitor it periodically, you can dial the limit in somewhat better, but even then, speeds could still drop off a cliff at any moment, it seems.
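For anyone else trying to dial a limit in, something like iostat (from the sysstat package) shows what the destination disk is actually sustaining while the copy runs; sdX here is just a placeholder for whatever device the array disk is:

iostat -dm 5 sdX    # watch the MB_wrtn/s column, refreshed every 5 seconds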

    Link to comment
    10 hours ago, fritzdis said:

    but it appears to me that even for a fresh disk, the writes may not always just work their way inward from the outer, faster tracks.

Yes, I also noticed this with ZFS; writes can land anywhere on the disk, not sequentially like they usually are with other filesystems.

    Link to comment

I have tried the BW throttle, and it seemed to work at first.

    But this was only when the target drive was totally empty.

After some hours of copying, the speed dropped again, regardless of the BW limit.  I killed it and lowered the BW again (maybe I set it too high at the beginning?).  After a restart it went up again, only to drop down after an hour or so.

Now the disk is about 50% full and the speed is down to 50 MB/s even with a BW limit of 100000.

    (and the source is filled up to 97%, I expect to see even lower rates soon)

     

(BTW: I'm not doing some artificial test, I'm really copying over 17.7 TB of data from an XFS drive to a ZFS drive)

     

    Looks like a serious fragmentation problem to me...

    Link to comment
    4 hours ago, MAM59 said:

I have tried the BW throttle, and it seemed to work at first.

    But this was only when the target drive was totally empty.

After some hours of copying, the speed dropped again, regardless of the BW limit.  I killed it and lowered the BW again (maybe I set it too high at the beginning?).  After a restart it went up again, only to drop down after an hour or so.

Now the disk is about 50% full and the speed is down to 50 MB/s even with a BW limit of 100000.

    (and the source is filled up to 97%, I expect to see even lower rates soon)

     

(BTW: I'm not doing some artificial test, I'm really copying over 17.7 TB of data from an XFS drive to a ZFS drive)

     

    Looks like a serious fragmentation problem to me...

I believe this is because of what JorgeB said - ZFS writes don't seem to fill the drive in the way you would expect.  This doesn't necessarily mean individual files are significantly fragmented, but it definitely makes the bwlimit approach tricky.

     

I'll bet that if you dropped all the way down to something like 75000, you'd get close to that speed, since it's likely slower than the HDD's worst-case sequential speed.  But of course, that could be leaving a lot of performance on the table if there are still big stripes of empty space on the outer tracks (I don't know if there's a way to "view" that for ZFS).
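Maybe something like this could give a rough view: zpool reports free-space fragmentation, and zdb can dump per-metaslab usage, which loosely maps to regions of the disk.  The pool name below is just a placeholder, and I haven't verified how readable this is for an Unraid array disk.

zpool list -v poolname    # FRAG column = fragmentation of the remaining free space
zdb -m poolname           # per-metaslab free space (add -e if the pool isn't in the cachefile)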

    Link to comment
    14 minutes ago, fritzdis said:

I'll bet that if you dropped all the way down to something like 75000,

Why would somebody castrate himself voluntarily?

160000 or more is ok-ish, but 75? That's much too slow; it would take over a week for the copy to finish. I'm retired and patient, but NOT THAT PATIENT :-)))) 😜

     

BTW: speed has recovered a bit, now back in the 100 MB/s range. Maybe the "zone usage" idea contains some truth, although even in the slowest zone the disk should handle more than 60 MB/s without problems.

    Link to comment
    4 hours ago, MAM59 said:

Why would somebody castrate himself voluntarily?

160000 or more is ok-ish, but 75? That's much too slow; it would take over a week for the copy to finish. I'm retired and patient, but NOT THAT PATIENT :-)))) 😜

     

BTW: speed has recovered a bit, now back in the 100 MB/s range. Maybe the "zone usage" idea contains some truth, although even in the slowest zone the disk should handle more than 60 MB/s without problems.

    Yeah, the issue is the extra drop-off once write speed is below read speed (either from the source or limited by bwlimit).  The only way to be SURE to avoid that is to go ridiculously low.

     

    There may be a sweet spot value for every individual drive where you spend a small enough amount of time in the "degraded speed" zone that the performance loss there is offset by faster overall speeds.  Whether that's the case would depend on just how degraded the speed is.
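Purely to illustrate the trade-off with made-up numbers: if a 170 MB/s limit means 30% of the data ends up going through a degraded ~60 MB/s phase, the effective average is already worse than a steady 120 MB/s limit would be:

echo "scale=4; 1 / (0.3/60 + 0.7/170)" | bc    # ~110 MB/s effective average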

    Link to comment

    I am seeing this as well.  I was planning on converting my current XFS disks to ZFS to take advantage of some of the ZFS features but I think I will wait until this gets sorted. I don't want to move around 70TB of data between my disks using unBalance with write speeds of ~55MB/s.

    Link to comment
    6 minutes ago, B_Sinn3d said:

    I am seeing this as well.  I was planning on converting my current XFS disks to ZFS to take advantage of some of the ZFS features but I think I will wait until this gets sorted. I don't want to move around 70TB of data between my disks using unBalance with write speeds of ~55MB/s.

    I have a feeling that ZFS may have inherent performance issues when in the main array because of the way Unraid parity is handled.   I would love to be proved wrong but I think users should be aware there may be no easy solution.

    Link to comment
    30 minutes ago, itimpi said:

    ZFS may have inherent performance issues when in the main array because of the way Unraid parity is handled.   I would love to be proved wrong but I think users should be aware there may be no easy solution.

Not fully correct. The speed degradation also happens if there is NO PARITY drive at all!

    (I've tested all combinations, ZFS was always bad :-( )

    Link to comment
    1 minute ago, MAM59 said:

Not fully correct. The speed degradation also happens if there is NO PARITY drive at all!

    (I've tested all combinations, ZFS was always bad :-( )

If that is the case, maybe it is an inherent problem when ZFS is used on a single drive file system?  I cannot see why Limetech would have done anything special about ZFS being used in the array.  It would be nice to be proved wrong though.

    Link to comment
    56 minutes ago, itimpi said:

    maybe it is an inherent problem when ZFS is used on a single drive file system

It works well in other OSes. The slowdown in Unraid also seems to happen only on certain hardware (still no clue which; I have AMD only).

     

    Link to comment
    3 hours ago, itimpi said:

    If that is the case maybe it is an inherent problem when ZFS is used on a single drive file system?

It works fine with a single-device pool; I believe the Unraid driver (md driver) needs some tuning to perform better with ZFS.

    Link to comment
    On 9/3/2023 at 9:58 PM, JorgeB said:

It works fine with a single-device pool; I believe the Unraid driver (md driver) needs some tuning to perform better with ZFS.

     

I think I see the same problem on a single pool drive.  I use a 500 GB 7200 rpm 2.5" disk for always-on services, as I have CCTV software that's rumored to chew up SSD drives.

     

The symptoms are one or two pegged cores on the dashboard, but nothing out of the ordinary shows up in htop. The transfer runs at full gigabit speed, 112-113 MB/s, but drops to 0 once in a while.

     

The same transfer to an NVMe drive uses much less CPU.

     

    For some reason I can't get iotop or the stats plugin to show any transfer speeds.
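I should probably try ZFS's own throughput view, which doesn't depend on iotop or the stats plugin (the pool name below is just a placeholder for whatever the pool is called):

zpool iostat -v poolname 5    # read/write ops and bandwidth per vdev, refreshed every 5 seconds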

     

I have not done any ZFS tuning yet. I have never had to do that on a modern computer with gigabit Ethernet, but I have seen erratic performance with ZFS when using hardware that was too old for testing.

    Link to comment
    8 hours ago, JorgenK said:

    I think that I see the same problem on a single pool drive

If the drive is really in a pool, not the array, it can't be the same problem, since the md driver is not used.  Maybe it's an SMR drive?  If not, I suggest creating a new thread in the general support forum and posting the diagnostics.

    Link to comment
    On 9/19/2023 at 8:38 AM, JorgeB said:

If the drive is really in a pool, not the array, it can't be the same problem, since the md driver is not used.  Maybe it's an SMR drive?  If not, I suggest creating a new thread in the general support forum and posting the diagnostics.

     

I was led to believe that the md driver was used when exclusive access was not set up for the share, but I guess I was mistaken.

     

    It's not SMR, but a slow 2.5" drive only just capable of Gb transfer rates. 

     

I will do some more tests with and without ZFS. I will probably let ZFS go for now anyway, as the drives with ZFS keep spinning up. I use the B460 chipset's SATA controllers.

    Link to comment



