• [6.8.0] 3-4x slower SMB directory listing + other SMB performance regressions


    golli53
    • Minor

    2 issues I've noticed through testing:

     

    1. [detectable on all directory sizes] Each directory listing is ~3-4x slower on average on v6.8 vs v6.7.2 (2.5s vs 0.7s for a 6k file directory)
    2. [detectable for very large directory sizes] When trying to open a single NONEXISTENT file in a 250k file directory on 6.8 (should be microseconds per attempt), every 100 or so attempts the call hangs for ~5s

     

    Initially, I thought it had to do with concurrent calls, but it doesn't.  Concurrent calls are simply additive in terms of execution time; concurrency itself doesn't seem to be the problem, but it makes the slowdown more apparent.

     

    This is the result of comparison testing of 6.8.0rc7 through stable vs 6.7.2 from a Windows 10 client on gigabit LAN.  See SMB config readouts and Python code for benchmarking below.

     

    [OLD POST PRE ADDITIONAL TESTING]

    Noticing this on 6.8.0rc7 (previously using 6.7.2). I'm using a Windows 10 client and have a bunch of Python scripts that list files on an unRAID samba share over gigabit LAN. This slows down to a crawl (several seconds just to refresh a ~100 file directory listing, whether using Python or Windows built-in file explorer). Was running the exact same scripts on 6.7.2 without this issue.

     

    Transfer speeds seem normal. When accessing from a different Windows 10 client (at the same time that the first client has many concurrent requests), the directory listings SEEM normal.

     

    Were there any settings changed for SMB that I might be able to try tweaking?




    User Feedback

    Recommended Comments

    Rolled back to 6.7.2 and this goes back to normal. Directory listing times with concurrent requests from same client are back to near instant vs 2-10s

     

    Tried 6.8.0rc8 and same issues as rc7

     

    Sticking with 6.7.2 for now, because the update practically makes my Samba client operations unusable. Unfortunately, that leaves me with the concurrent write performance issues on 6.7.2

    Link to comment
    Quote

    Were there any settings changed for SMB that I might be able to try tweaking?

    No, but the Samba team changes defaults all the time from release to release (kinda maddening they do this).

    Please type this command running 6.7.2:

    testparm -sv > /boot/smb672.txt

    and then boot 6.8-rc8 and type:

    testparm -sv > /boot/smb680.txt

    You now have those two text files on your flash which you can post here.  Note: those files will contain your share names, if you don't want to post here then send to me via PM.

     

    Also would be helpful to describe to me how to reproduce this issue.
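
    Once both files exist, the changed settings can be pulled out with a short script. This is only a minimal sketch: it assumes the dumps use the usual `[section]` headers followed by `key = value` lines, and the function names `parse_testparm` and `changed_settings` are made up for illustration.

```python
from pathlib import Path


def parse_testparm(text):
    """Parse a testparm dump into {(section, key): value}."""
    settings = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if "=" in line and section is not None:
            key, value = line.split("=", 1)
            settings[(section, key.strip())] = value.strip()
    return settings


def changed_settings(old_text, new_text):
    """Return {(section, key): (old value, new value)} for every difference."""
    old, new = parse_testparm(old_text), parse_testparm(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


if __name__ == "__main__":
    old_path = Path("/boot/smb672.txt")
    new_path = Path("/boot/smb680.txt")
    if old_path.exists() and new_path.exists():
        diff = changed_settings(old_path.read_text(), new_path.read_text())
        for (section, key), (before, after) in diff.items():
            print(f"[{section}] {key}: {before!r} -> {after!r}")
```

    A value of None on one side means the setting only appears in the other dump.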

    Link to comment
    5 hours ago, limetech said:

    No but Samba team changes defaults all the time from release-to-release (kinda maddening they do this).

    Please type this command running 6.7.2:

    
    testparm -sv > /boot/smb672.txt

    and then boot 6.8-rc8 and type:

    
    testparm -sv > /boot/smb680.txt

    You now have those two text files on your flash which you can post here.  Note: those files will contain your share names, if you don't want to post here then send to me via PM.

     

    Also would be helpful to describe to me how to reproduce this issue.

    Thanks, I will try as soon as I get a chance (I'm running a production environment).

     

    I'm essentially calling code like the below on several (~5) network folders with 5k subdirectories each from a Win10 client. That slows SMB directory listing to a halt for that client, including normal browsing using Windows file explorer.

    import os

    def recursive_ls(path):
        # List the directory, then descend into every subdirectory.
        for f in os.listdir(path):
            subpath = os.path.join(path, f)
            if os.path.isdir(subpath):
                recursive_ls(subpath)

    [edit] Each subdirectory only has a couple of files in it, so the number of files may not be the issue, but rather just concurrent ls requests. Calling ls in a loop from multiple threads on 1 client may do the same thing (and may be easier to set up using a shell script)
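
    The multi-threaded variant mentioned in the edit can be sketched roughly as follows. This is an untested illustration rather than the script actually used: the helper names, the worker count of 5 and the share path are assumptions.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def timed_listdir(path):
    """List one directory and return (entry count, elapsed seconds)."""
    start = time.perf_counter()
    entries = os.listdir(path)
    return len(entries), time.perf_counter() - start


def concurrent_listing(paths, workers=5):
    """Run timed_listdir on every path using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(timed_listdir, paths))


if __name__ == "__main__":
    # Hypothetical share path; replace with real folders on the server.
    share = "//192.168.1.100/share/path"
    if os.path.isdir(share):
        for count, seconds in concurrent_listing([share] * 5):
            print(f"{count} entries in {seconds:.3f}s")
```

    Comparing the per-call times with workers=1 against workers=5 should show whether concurrency adds anything beyond the sum of the individual calls.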

    Edited by golli53
    Link to comment

    smb672.txt  smb680.txt

    Attaching the testparm outputs. I also tried some debugging after rolling back again to the previous version. I see two differences in behavior:

     

    - Each os.listdir call is ~3-4x slower on average on v6.8 vs v6.7.2 (2.5s vs 0.7s for a 6k file directory)

    - When calling os.stat on a single NONEXISTENT file in a 250k file directory on 6.8 (should be microseconds per call), every 100 or so calls, it hangs for ~5s

     

    Concurrent calls are simply additive in terms of execution time and concurrency itself doesn't seem to be the problem, but it makes it more apparent.

     

    Note that under the hood, Python is just using the native Windows SMB client for listing / accessing stats on these files; using Python just makes it easier to issue and time many requests.

     

    My code for reproducing this is below:

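    from datetime import datetime as dt
    import os
    
    # Run each loop on its own; both run until interrupted (Ctrl+C).
    
    # observation 1: time each directory listing
    while True:
        start = dt.now()
        os.listdir('//192.168.1.100/share/path')
        print((dt.now() - start).total_seconds())
    
    # observation 2: time each stat call on a nonexistent file
    while True:
        start = dt.now()
        try:
            os.stat('//192.168.1.100/share/bigpath/nonexistent.txt')
        except OSError:
            pass  # expected: the file does not exist
        print((dt.now() - start).total_seconds())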
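
    To make the occasional ~5s hang easier to spot than by reading raw output, the timings from the second loop can be collected and summarized. This is a minimal sketch; the helper names and the 1 second stall threshold are assumptions, not measured values.

```python
import os
import statistics
import time


def time_stat_calls(path, attempts=1000):
    """Time os.stat on path repeatedly; return a list of latencies in seconds."""
    latencies = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            os.stat(path)
        except OSError:
            pass  # a missing file is the expected case here
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies, stall_threshold=1.0):
    """Return the median latency and the indices of calls slower than the threshold."""
    stalls = [i for i, t in enumerate(latencies) if t > stall_threshold]
    return statistics.median(latencies), stalls


if __name__ == "__main__":
    path = "//192.168.1.100/share/bigpath/nonexistent.txt"
    median, stalls = summarize(time_stat_calls(path))
    print(f"median {median:.6f}s, stalled calls at indices {stalls}")
```

    If the hang really occurs every 100 or so calls, the stall indices should be spaced roughly 100 apart.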

     

    Edited by golli53
    added imports to code
    Link to comment
    19 minutes ago, limetech said:

    Are you saying there are 250,000 files in a single directory?

    😀 Yes, it's a very big one for automatically archiving JSON files. There's no natural categorization for assigning subdirectories, so it wouldn't improve the speed for my app.

    Link to comment

    Subdir by some designation, like by create or archive date, is required.

     

    That sort of directory even on native WinOS with local drives is going to be excruciatingly painful. The pain starts around the 30K range.
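
    When there is no natural designation such as a date, one possible option is to derive the subdirectory from the file name itself, so a reader can compute the location directly instead of checking every subdirectory. This is only a sketch under that assumption: `sharded_path`, the SHA-1 prefix and the width of 2 are illustrative choices, not something tested on Unraid.

```python
import hashlib
import posixpath


def sharded_path(root, filename, width=2):
    """Map a file name to root/<hash prefix>/filename deterministically."""
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    return posixpath.join(root, digest[:width], filename)


if __name__ == "__main__":
    # Hypothetical share root and file name.
    print(sharded_path("//192.168.1.100/share/bigpath", "example.json"))
```

    With a 2 character hex prefix there are 256 possible subdirectories, so 250k files would average roughly a thousand per directory.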

    Link to comment
    7 minutes ago, BRiT said:

    Subdir by some designation, like by create or archive date, is required.

     

    That sort of directory even on native WinOS with local drives is going to be excruciatingly painful. The pain starts around the 30K range.

    I never call a directory listing in that directory. I only open specific files by naming convention. So, adding subdirs would make things more inefficient because I would have to check for a file in each subdir. My current setup works very fast using a normal Samba server, e.g. 6.7.2 or Ubuntu.

     

    The first issue is a problem for much smaller directories (a few thousand).

    Link to comment
    19 minutes ago, BRiT said:

    But your sample code is getting directory listings.

    There are two issues and the sample code has a section for each (preceded by a comment header)

     

    Part 1 of the code is getting listings. That seems to be slower on 6.8.0 for all directories and is noticeable to a human on a single call without concurrency starting with a couple thousand files.

     

    Part 2 of the sample code is only calling stat. I can only reproduce this issue for very large directories, but maybe that's because it requires large directories to produce a measurable difference.

    Link to comment

    I am not sure why we are even discussing your code when you clearly state that the same dir list calls are much slower on the new release than on the old. If it's a slow call due to a large directory, fine, so be it, but then it should be just as slow in the new release as in the old one.

    I see a similar issue on my OS X clients, and on top of that read performance has dropped by about 200+MB/s over my 10G link, with the only difference being an upgrade or downgrade, so I also moved back. There is definitely something there that affects some, but clearly not all, people.

     

    When you do the exact same thing and the only difference is the release, we should focus on what has changed that can cause the changed behavior, and not question the code other than to acknowledge its great capability to identify a weak spot that was not there before.

     

    Edited by glennv
    Link to comment

    I can confirm the problem. In my case, I have large Backup directories that are synced with a 3rd-Party Software between my computer and my Unraid system. With 6.7.2, recursive directory listing via SMB of thousands of subdirectories with tens of thousands of files took seconds to minutes, now it takes minutes to hours with 6.8.0 stable.

     

    It is already very slow when you right click -> Properties on a directory and Windows starts to sum up the size of all files inside the directory. This is definitely much slower than before; you can nearly see it sum up file by file.

    It also looks like much more data is transferred than before: I see a constant 20-30 kb/s while listing directories.

    When I capture the SMB2 packets with Wireshark, there is quite some time between requests and responses.

     

    BTW: I have the Cache dirs plugin active, but it does not seem to have any impact whether enabled or disabled.
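
    The size-summing workload described above can be approximated from a script, which makes it easier to compare releases with numbers. This is a rough sketch and not what Windows Explorer does internally; `tree_size` and the share path are assumptions.

```python
import os
import time


def tree_size(path):
    """Return (total bytes, file count) for everything below path."""
    total, count = 0, 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sub_total, sub_count = tree_size(entry.path)
                total += sub_total
                count += sub_count
            elif entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
                count += 1
    return total, count


if __name__ == "__main__":
    # Hypothetical share path; replace with a real backup directory.
    root = "//192.168.1.100/share/path"
    if os.path.isdir(root):
        start = time.perf_counter()
        total, count = tree_size(root)
        elapsed = time.perf_counter() - start
        print(f"{count} files, {total} bytes in {elapsed:.1f}s")
```

    Running it against the same directory on 6.7.2 and 6.8.0 would give directly comparable timings.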

    Edited by Addy90
    Link to comment

    Interesting, this is probably the same issue I'm facing. In a folder where I have 2777 media files, each file takes at least 3 times as long to open on 6.8.0 stable compared to 6.7.2 stable.

     

    The issue only occurs on 6.8.x when there are plenty of files in one folder.

    Edited by je82
    Link to comment

    I am also seeing this issue on 6.8.0 from a Windows 10 Desktop browsing a "Media" share with 5000+ directories with each directory having one or two media files at most. It literally takes 5-6 seconds for the directory to refresh while navigating from the sub-folder back to the media share again.

     

    I have a mirror of all of this media running in a Synology 1815+ which the same Desktop can navigate in and out of directories instantly. So this tells me it is an UnRAID problem and not something with my Desktop configuration.

     

    I did not experience this issue on UnRAID v6.6.7.

    Edited by Xed
    Link to comment

    Ended up rolling back to 6.7.2. SMB performance is just key for me to use Unraid. It was not great on 6.8.0, but it may be due to kernel changes rather than something Unraid did. I guess time will tell!

    Link to comment

    Samba performance from a Mac running an older OS (10.11) is unusable on large directories. No such issues exist accessing Microsoft SMB shares hosted in Windows. Opening a directory with 1000-2000 files can take multiple minutes.

    Link to comment

    Folks, I too have been experiencing this very slow opening of share directories: when I click on my Unraid shared drive directory, it takes a very long time to list the directory contents.  The disk that the shares are on is already spun up (not sleeping), so it's not due to a spin-up delay.

    I am currently on the latest Unraid Version 6.8.3 2020-03-05.  I am very unhappy with this performance as it is frustrating to always have to wait for Unraid to access my shared directories.  Yes there are directories that contain thousands of files in them.  But still the performance really stinks on Unraid compared to my FreeNAS server with the same backup directory shares and content.  FreeNAS wakes up and shows the drive contents almost instantaneously while Unraid takes minutes to wake up and show the directory contents listings.  I am a paid Unraid user and I wish that they would get to the bottom of this and fix it.  if FreeNAS works properly to list huge file directories quickly, then so should Unraid.  Please fix this annoying bug.

    Edited by wyee
    Link to comment
    On 11/18/2020 at 9:58 PM, wyee said:

    Folks, I too have been experiencing this very slow opening of share directories when I click on my Unraid shared drive directory it takes a very long time to list out the directory contents.  The disk that the shares are on is already in a spun up state (not sleeping) so it's not due to disk wake up spin up delay.  

    I am currently on the latest Unraid Version 6.8.3 2020-03-05.  I am very unhappy with this performance as it is frustrating to always have to wait for Unraid to access my shared directories.  Yes there are directories that contain thousands of files in them.  But still the performance really stinks on Unraid compared to my FreeNAS server with the same backup directory shares and content.  FreeNAS wakes up and shows the drive contents almost instantaneously while Unraid takes minutes to wake up and show the directory contents listings.  I am a paid Unraid user and I wish that they would get to the bottom of this and fix it.  if FreeNAS works properly to list huge file directories quickly, then so should Unraid.  Please fix this annoying bug.

    Please upgrade to 6.9-beta and be sure 'Settings/SMB/Enhanced macOS interoperability' is set to Yes.

    Link to comment

    @limetech If I change over to the 6.9-beta would I be able to roll back to the 6.8.3 version without corrupting my data on all disk drives?  Also you mention to make sure to enable Enhanced macOS interoperability.  Are you assuming my problem report above was for using Apple iOS devices only?  I am having the slow wake up response problem on Windows 10 OS, Android devices.  Will going to the 6.9-beta version fix that too?  One last question, are you just guessing that this will fix the issue and want me to just try it or are you sure it will fix the slow wake up response issue?  I don't want to risk causing problems corrupting my Unraid system by migrating to a beta and then have to roll back in case something incompatible happens and I lose disk data.

    Link to comment
    1 hour ago, wyee said:

    @limetech If I change over to the 6.9-beta would I be able to roll back to the 6.8.3 version without corrupting my data on all disk drives?  Also you mention to make sure to enable Enhanced macOS interoperability.  Are you assuming my problem report above was for using Apple iOS devices only?  I am having the slow wake up response problem on Windows 10 OS, Android devices.  Will going to the 6.9-beta version fix that too?  One last question, are you just guessing that this will fix the issue and want me to just try it or are you sure it will fix the slow wake up response issue?  I don't want to risk causing problems corrupting my Unraid system by migrating to a beta and then have to roll back in case something incompatible happens and I lose disk data.

     

    You are posting this in the 6.9 'prerelease' forum, which means you should be running 6.9.  This is where current development and bug fixing are occurring.  You didn't specify what client you are using, previous posts mention macOS, and there is a known directory listing slowdown in Finder if 'Enhanced macOS interoperability' is not on.

     

    What do you mean by 'slow wake up response'?

     

    Also, it's necessary to post diagnostics when creating a post in here.

    Link to comment



