• [6.8.0] Massive write amplification on Raid 1 BTRFS SSD Cache pool with sparse files.


    kurai
    • Minor

    [Note:

    I originally added this as a reply to [6.7.2] DOCKER IMAGE HUGE AMOUNT OF UNNECESSARY WRITES ON CACHE

    Didn't notice it was raised against 6.7.2, not 6.8.0.  Sorry]

     

    For what it's worth:-

     

    Is this perhaps an issue with sparse files ? 

     

    I'm having a somewhat similar problem when writing a torrent of an ISO to a share on a mirrored cache pool.

    (2 x Crucial MX500 SSDs, BTRFS, Raid 1)

     

    6.7.2 was fine, on 6.8.0 there's huge write amplification whenever the torrent client tries to write out a chunk.

     

    e.g. the torrent client creates a 3.5GB sparse file, then starts downloading chunks to its internal RAM cache.  When a 4MB chunk has been completely received, it writes it to disk into the pre-allocated sparse ISO file.

    However - instead of the expected 4MB disk write it seems to be rewriting the *entire* 3.5GB file every time it sends a new chunk to the disk file.

    This leads to the SSDs writing continuously for hours, for what *should* take less than a second (and also getting very hot).
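
    For anyone who wants to reproduce the write pattern without a torrent client, here is a minimal Python sketch of what is described above: create a sparse file, then overwrite a single 4MB chunk in place. The file name and the small 64-chunk size are illustrative assumptions, not taken from the actual setup, and the sketch only mimics the pattern; it does not prove anything about the root cause.

```python
import os
import tempfile

CHUNK = 4 * 1024 * 1024  # 4 MiB, the chunk size mentioned in the post


def create_sparse(path, size):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)


def write_chunk(path, index, data):
    """Overwrite one chunk in place at byte offset index * CHUNK."""
    with open(path, "r+b") as f:
        f.seek(index * CHUNK)
        f.write(data)


def sizes(path):
    """Return (apparent_bytes, allocated_bytes); allocated is None if unknown."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # not available on Windows
    # On Linux, st_blocks is counted in 512-byte units.
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.iso")  # hypothetical file name
        create_sparse(path, 64 * CHUNK)  # small stand-in for the 3.5GB file
        write_chunk(path, 10, os.urandom(CHUNK))
        print(sizes(path))
```

    On a filesystem that supports sparse files, the allocated size should stay far below the apparent size after a single chunk write. Pointing `path` at the cache pool share while watching SSD activity would be one way to check whether the whole file gets rewritten.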

     

    Other types of disk write activity (copying, moving, file creation etc) behave normally, at expected speeds and levels of SSD activity.  i.e. Copying a regular 3.5GB file to the share takes < 10 seconds.

    Also -  I reverted from 6.8.0 to 6.7.2 and the problem disappeared, so it's not related to a bad Unraid cache pool config, or anything in the, unchanged, torrent client config.

     

    I found this bug report before I raised one of my own and I wonder if it was the same root cause as (I believe) the Docker IMG files are also created as sparse files.

     

    If I'm way off base and my issue is unrelated, please let me know and I'll raise mine as a separate bug report.

    --

    kurai




    User Feedback

    Recommended Comments

    This may be related to the issue I'm having on 6.8 where any torrenting causes huge lag-spikes making the system essentially unusable (and pinning the CPU).  Running qbittorrent, not sure if it's unique to that client, but certainly wasn't the case in 6.7.2.  It's making the entire 6.8 chain unusable for me.  Normal SMB file writes are fine.  Since my torrent client didn't change, it must've been something on the unRAID side.

     

    If it's related to the mitigation of the database corruption, I'd prefer an option to "un-mitigate it" for those of us affected, as I had no database problems at all.

    Edited by dgriff

    Going through my SMB parameters, the only lead I have right now (and it's getting too late to test tonight, maybe tomorrow) is that 6.8 mandates SMB2_02 as the minimum protocol, where 6.7.2 has CORE as the minimum protocol.  Wondering if overriding that back to CORE would perhaps do something productive?

     

    There are also minor changes in the prefork settings, and server min protocol is now NT1 instead of LANMAN1.  SMB configs attached here.

    smb672.txt smb680.txt
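
    A quick way to compare two config dumps like these is a small Python sketch along the following lines. It assumes the files contain plain `key = value` lines (as in smb.conf or testparm output), which may not match the attachments exactly. Section headers are ignored, so a key that appears in several sections keeps only its last value.

```python
def parse_settings(text):
    """Collect 'key = value' lines, skipping comments and [section] headers."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip().lower()] = value.strip()
    return settings


def diff_settings(old_text, new_text):
    """Return {key: (old_value, new_value)} for every key that differs."""
    old, new = parse_settings(old_text), parse_settings(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


if __name__ == "__main__":
    # Illustrative input only, not the real contents of the attachments.
    old = "[global]\n\tserver min protocol = LANMAN1\n"
    new = "[global]\n\tserver min protocol = NT1\n"
    print(diff_settings(old, new))
```

    To run it against the real dumps, read both files with `open(...).read()` and pass the two strings to `diff_settings`.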


    Hoping we'll see some progress on this one, as it will murder SSD cache drives in short order if it's re-writing a multi-gig file on every sparse file update while torrenting. Didn't have any luck with SMB parameters.
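
    To put a number on the re-writing, one rough approach on Linux is to compare the "sectors written" counter in `/sys/block/<dev>/stat` before and after a known amount of application data is written. The sketch below assumes that stat layout (seventh field is sectors written, in 512-byte units). The device name in the usage note is hypothetical.

```python
SECTOR_BYTES = 512  # /sys/block/<dev>/stat counts 512-byte sectors


def sectors_written(stat_text):
    """Extract the 'sectors written' counter (7th field) from a stat line."""
    return int(stat_text.split()[6])


def write_amplification(stat_before, stat_after, logical_bytes):
    """Device bytes written divided by the bytes the application wrote."""
    delta = sectors_written(stat_after) - sectors_written(stat_before)
    return delta * SECTOR_BYTES / logical_bytes


def read_stat(device):
    """Read the raw stat line for a block device name such as 'sdb'."""
    with open(f"/sys/block/{device}/stat") as f:
        return f.read()
```

    Usage would be something like `before = read_stat("sdb")`, write one chunk, `after = read_stat("sdb")`, then `write_amplification(before, after, 4 * 1024 * 1024)`. Other activity on the pool also lands in the counter, so it should be measured on an otherwise idle system. Treating the figures from the original report as binary units, a full 3.5GB rewrite per 4MB chunk would come out at a factor of roughly 896 on each mirror member.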

    13 minutes ago, dgriff said:

    Hoping we'll see some progress on this one, as it will murder SSD cache drives in short order if it's re-writing a multi-gig file on every sparse file update while torrenting. Didn't have any luck with SMB parameters.

    diagnostics.zip please

     

    Is your torrent client running on another host on your network or in a docker container in your server?


    Also wondering if you can test it with a xfs drive to see if it's btrfs specific?

    What you described sounds a bit like COW (Copy-on-Write) gone wild.
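
    When comparing btrfs and xfs targets, it helps to confirm which filesystem a given path actually lands on. Below is a minimal Python sketch that looks a path up in `/proc/mounts`-style text. The mount points in the example are assumptions based on the usual Unraid layout and may differ on a particular server.

```python
import posixpath


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path."""
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # '>=' lets a later mount on the same mount point win.
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        # Hypothetical path; substitute the share being tested.
        print(fs_type_for("/mnt/cache/downloads", f.read()))
```

    Note that paths under a user share may report a FUSE filesystem rather than the underlying btrfs or xfs device, so checking the direct disk or cache path is more informative.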

    22 hours ago, limetech said:

    diagnostics.zip please

     

    Is your torrent client running on another host on your network or in a docker container in your server?

    Sure thing, I'll reinstall 6.8 this weekend and generate a new diagnostics file (or should I go with the newly released 6.8.1 RC1?)

     

    The torrent client is running on my Windows 10 PC and writing to the share (with cache enabled).  If it helps narrow it down at all, and maybe with regards to testdasi's post, I could also try disabling caching on that share so it writes directly to the XFS share and bypasses the btrfs cache drive.


    Update:

    TL;DR - Turning off "Disable Windows caching of disk writes" in torrent client(s) seems to resolve this particular issue.

     

    I had some time this weekend so I decided to try again with Unraid 6.8.2.

    Initially I had the same issue when using the existing, working, settings from 6.7.2.

     

    I went through many combinations of option changes in Unraid, the hosted Windows 10 VM, and a few torrent applications, and finally discovered an option that works ... disabling the torrent client's internal option to bypass OS-level write caching.

    This is a performance setting that exempts the client's disk writes from standard OS handling, so that the client looks after caching/coalescing large numbers of small writes itself. It was designed to prevent "double caching" and excessive memory usage and/or swapfile thrashing in some situations, and it was useful back in the day if you had little spare RAM over and above Windows "baseline" usage, e.g. running Windows 7 with 1GB.
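
    As a toy illustration of what "coalescing" small writes means, the following Python sketch merges buffered writes whose byte ranges are contiguous into fewer, larger writes. It is a simplified model that assumes non-overlapping writes, and it is not how any particular torrent client or OS cache is actually implemented.

```python
def coalesce(writes):
    """Merge (offset, data) writes with contiguous byte ranges.

    Assumes the writes do not overlap. Returns a new list sorted by offset.
    """
    merged = []
    for offset, data in sorted(writes, key=lambda w: w[0]):
        if merged and merged[-1][0] + len(merged[-1][1]) == offset:
            # This write starts exactly where the previous one ends.
            prev_offset, prev_data = merged[-1]
            merged[-1] = (prev_offset, prev_data + data)
        else:
            merged.append((offset, data))
    return merged


if __name__ == "__main__":
    pending = [(8, b"cc"), (0, b"aaaa"), (4, b"bbbb"), (100, b"z")]
    for offset, data in coalesce(pending):
        print(offset, len(data))
```

    In this example four buffered writes collapse into two, which is the kind of saving a write cache is meant to provide, whether it lives in the application or in the OS.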

    These days, with widespread usage of much larger/cheaper RAM configurations it's rarely a relevant option.

    (I have 32GB in my Unraid server, of which 6GB is available to the VM, so not really an issue for me now)

    Leaving all other Unraid/VM/Windows settings as per the working 6.7.2 config and only changing this option (tried it in 3 different torrent clients) has stopped the massive SSD cache write overhead problem.

     

    I still don't have any real idea what particular element of the Unraid 6.7.2 -> 6.8.x update was the root cause of the altered behaviour but this setting change, if not ideal, at least stops my SSDs being murdered.

     

     

    On 2/16/2020 at 10:58 PM, kurai said:

    Update:

    TL;DR - Turning off "Disable Windows caching of disk writes" in torrent client(s) seems to resolve this particular issue.

    Can I ask which Torrent client you're using? This particular option doesn't appear in qbittorrent.  I've tried playing with the cache options in qb but no luck yet.


    My preferred option for my current setup is uTorrent (the ancient 2.2.1 build, before it got annoying with all the ads and worthless extra `features`) - I just need lightweight/fast/reliable without all the bells and whistles.

     

    In qBittorrent (4.2.1) the equivalent setting is in Options->Advanced->libtorrent_section: "Enable OS cache"

    (note that the wording is reversed in qB, so setting the checkbox turns the OS caching ON, whereas in uT it turns the OS caching OFF; see https://www.libtorrent.org/reference-Settings.html#disk_io_write_mode for detail)

     

    Also note that some of the caching configuration options (on Windows hosts, at least) are set at application start and require an application restart to take effect.




