• 6.8.0 Samba 4.11.3: significant performance decrease when opening files in folders with 1000+ files in them.


    je82
    • Annoyance

    Hello,

     

    I noticed today after upgrading that my SMB performance had decreased by a lot, but not in the way you would expect. The write/read speeds are fine.

     

    The problem seems to occur only when you open a media file in a share where there are 1000+ media files stored in the same root folder.

     

    If you take 50 of the media files, create a new folder and move them into it so there are only 50 files in the folder, playback is normal. The new SMB is doing something that adds a ton of delay.

     

    Doing the exact same thing on 6.7.2 is much faster.

     

    You can re-create the issue by putting 1000+ media files in a folder and browsing between them. The more files you have in the folder, the slower the performance.

     

    Thanks.

    User Feedback

    Recommended Comments



    I experience this as well. When I'm doing SFTP transfers to these folders, it can cause my whole VM to lag while attempting to list them. It can take a few seconds to bring up all the folders, but honestly I have the folders as condensed as I can without Plex hating me.

    Link to comment
    10 minutes ago, Jerky_san said:

    I experience this as well. When I'm doing SFTP transfers to these folders, it can cause my whole VM to lag while attempting to list them. It can take a few seconds to bring up all the folders, but honestly I have the folders as condensed as I can without Plex hating me.

    Turn off hard link support, as I already suggested in this topic, and see if this makes a difference.

    Link to comment

    @limetech

    Turning off hard link support indeed increases listing performance by a factor of more than ten as far as I can tell. My backup listing now takes around two minutes instead of the half an hour it took with hard link support enabled.

    This is much better! I will leave it disabled; I never used hard links in my Unraid storage anyway.

    Thank you for this suggestion!

    Link to comment
    On 1/20/2020 at 2:49 PM, limetech said:

    Turn off hard link support, as I already suggested in this topic, and see if this makes a difference.

    It really did help with my FTP transfers. The program no longer locks up. Do you believe it will eventually be able to run with hard links enabled, or is that up in the air at this time? Also, sorry for not completely reading the thread before responding.

    Link to comment
    55 minutes ago, Jerky_san said:

    It really did help with my FTP transfers. The program no longer locks up. Do you believe it will eventually be able to run with hard links enabled, or is that up in the air at this time? Also, sorry for not completely reading the thread before responding.

    It has to do with how POSIX-compliant we want to be.  Here are the issues:

     

    If 2 dirents (directory entries) refer to the same file, then if you 'stat' either dirent it should return:

    a) 'st_nlink' will be set to 2 in this case, and

    b) the same inode number in 'st_ino'.
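
    For illustration, a minimal shell sketch of this expected behavior on an ordinary local Linux filesystem (the file names here are just placeholders):

    # create a file plus a second hard link to it
    echo test > file_a
    ln file_a file_b

    # both dirents report the same inode number and a link count of 2
    stat -c 'name=%n inode=%i links=%h' file_a file_b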

     

    Prior to the 6.8 release, a) was correct but b) was not (it returned an internal FUSE inode number associated with the dirent).  This is incorrect behavior and can confuse programs such as 'rsync', but it avoids the NFS stale file handle issue.

     

    To fix this, you can tell FUSE to pass along the actual st_ino of the underlying file instead of its own FUSE inode number.  This works except for 2 problems:

    1. If the file is physically moved to a different file system, the st_ino field changes.  This causes NFS stale file handles.

    2. There is still a FUSE delay because it caches stat data (by default for 1 second).  For example, if the kernel asks for stat data for a file (or directory), FUSE will ask the user-space filesystem to provide it.  Then, if the kernel asks for stat data again for the same object and the timeout hasn't expired, FUSE will just return the value it read last time.  If the timeout has expired, FUSE will again ask the user-space filesystem to provide it.  Hence, in our example above, one could remove one of the dirents for a file and then immediately 'stat' the other dirent, and that stat data will not reflect the fact that 'st_nlink' is now 1 - it will still say 2.  Obviously, whether this is an issue depends entirely on timing (the worst kind of bug).
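
    As an illustration of that timing window (a sketch only - the file names are placeholders, and whether you actually see the stale value depends on the FUSE timeouts in effect):

    ln data.bin alias.bin        # both names now report st_nlink == 2
    stat -c '%h' data.bin        # FUSE caches this attr data (default attr_timeout of 1 second)
    rm alias.bin                 # the on-disk link count drops to 1
    stat -c '%h' data.bin        # within the cache window this can still report 2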

     

    In the FUSE example code there is this comment regarding hard link support:

    static void *xmp_init(struct fuse_conn_info *conn,
                          struct fuse_config *cfg)
    {
            (void) conn;
            cfg->use_ino = 1;
            cfg->nullpath_ok = 1;
    
            /* Pick up changes from lower filesystem right away. This is
               also necessary for better hardlink support. When the kernel
               calls the unlink() handler, it does not know the inode of
               the to-be-removed entry and can therefore not invalidate
               the cache of the associated inode - resulting in an
               incorrect st_nlink value being reported for any remaining
               hardlinks to this inode. */
            cfg->entry_timeout = 0;
            cfg->attr_timeout = 0;
            cfg->negative_timeout = 0;
    
            return NULL;
    }

    But the problem is that the kernel is very "chatty" when it comes to directory listings.  Basically it re-'stat's the entire parent directory tree each time it wants to 'stat' a file returned by READDIR.  If we have 'attr_timeout' set to 0, then each one of those 'stat's results in a round trip from kernel space to user space (plus processing done by the user-space filesystem).  I have set it up so that if you enable hard link support, those timeouts are set as above, and hence you see a huge slowdown because of all the overhead.
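
    One way to observe this chattiness yourself (a rough sketch - run it on the server against the FUSE mount; the path is made up, and the stat-family syscall names can vary by kernel/libc):

    # count the stat-family syscalls issued for a single directory listing
    strace -f -c -e trace=stat,lstat,newfstatat,statx ls -l /mnt/user/yourshare/bigfolder > /dev/null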

     

    I could remove the code that sets the timeouts to 0, but as I mentioned, I'm not sure what "bugs" this might cause for other users - our policy is: better to be slow than to be wrong.

     

    So this is kind of where it stands.  We have ideas for fixing it, but they will involve modifying FUSE, which is not a small project.

    • Like 3
    Link to comment
    On 1/20/2020 at 3:49 PM, limetech said:

    Turn off hard link support, as I already suggested in this topic, and see if this makes a difference.

    @limetech That did the trick.  Thank you, thank you, thank you!!!

    Link to comment
    3 hours ago, limetech said:

    It has to do with how POSIX-compliant we want to be. Here are the issues: […]

    Thank you very much for the thorough explanation.

    Link to comment
    On 1/23/2020 at 11:40 PM, limetech said:

    It has to do with how POSIX-compliant we want to be. Here are the issues: […]

    Thank you for explaining. I wish I hadn't rolled back so quickly... What effects will turning off hard links have, other than speeding up directory listings?

     

    I am running 6.7.2 now with a couple of SMB shares; the shares are spanned over multiple drives. I am wondering if this feature is hard-link dependent, or if hard links are something I could live without?

     

    Thanks again for all the hard work you and the team are putting into the project!

    Link to comment
    22 minutes ago, je82 said:

    Thank you for explaining. I wish I hadn't rolled back so quickly... What effects will turning off hard links have, other than speeding up directory listings?

     

    I am running 6.7.2 now with a couple of SMB shares; the shares are spanned over multiple drives. I am wondering if this feature is hard-link dependent, or if hard links are something I could live without?

     

    Thanks again for all the hard work you and the team are putting into the project!

    Hard link support was added because certain Docker apps use hard links in the appdata share.

    • Thanks 1
    Link to comment
    1 minute ago, limetech said:

    Hard link support was added because certain Docker apps use hard links in the appdata share.

    So hard linking does not exist in 6.7.2? Then I wouldn't be affected by upgrading and disabling it, since it is obviously something I do not use - even though I thought my Radarr/Sonarr hard-link setup was working correctly, I guess it never was :) Thanks for the heads up!

    Link to comment

    This happens outside of Unraid too; Windows in particular tries to enumerate each file in the folder. Unless you only ever access the folder programmatically, the solution is simply not to store 1000+ files in a single folder.
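
    If you do want to break such a folder up, something along these lines works (a rough sketch only - the path and chunk size are made-up examples, so test it on a copy first):

    #!/bin/bash
    # Move files from one flat folder into numbered subfolders of ~500 files each.
    SRC="/mnt/user/myshare/bigfolder"   # example path
    CHUNK=500                           # files per subfolder
    i=0; n=0
    for f in "$SRC"/*; do
        [ -f "$f" ] || continue                          # skip anything that isn't a regular file
        if (( i % CHUNK == 0 )); then
            n=$(( n + 1 ))
            mkdir -p "$SRC/part$(printf '%03d' "$n")"    # part001, part002, ...
        fi
        mv "$f" "$SRC/part$(printf '%03d' "$n")/"
        i=$(( i + 1 ))
    done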

    Link to comment

    @limetech First of all, thank you for taking the time to dig into this. From my much more limited testing, the issue seems to be a painful one to track down.

     

    I upgraded yesterday, and while this tweak fixes listdir times, stat times for missing files in large directories are still bugged (observation 2 in the below post):

    For convenience, I reproduced in Linux and wrote this simple script in bash:

    # unraid
    cd /mnt/user/myshare
    mkdir testdir
    cd testdir
    touch dummy{000000..200000}   # create 200k empty files (may hit ARG_MAX; split into batches if you see "Argument list too long")
    
    # client
    sudo mkdir /myshare
    sudo mount -t cifs -o username=guest //192.168.1.100/myshare /myshare
    while true; do start=$SECONDS; stat /myshare/testdir/does_not_exist > /dev/null 2>&1 ; end=$SECONDS; echo "$((end-start)) "; done

    On 6.8.x, each call takes 7-8s (vs 0-1s on previous versions), regardless of hard link support. The time complexity is nonlinear with the number of files (calls go to 15s if I increase the number of files by 50% to 300k).

    Edited by golli53
    • Thanks 1
    Link to comment
    3 hours ago, golli53 said:

    For convenience, I reproduced in Linux and wrote this simple script in bash:

    Where do all those dummy files get created?  Is that on cache, on disk1..N, or spread among them all?

    Link to comment
    54 minutes ago, limetech said:

    Where do all those dummy files get created?  Is that on cache, on disk1..N, or spread among them all?

    All on cache (which is 2xSSD RAID1 btrfs for me). The same issue occurs with a folder that's on the array, though (spread across disks). It seems to be an SMB issue, because I don't see extra lag when calling stat from the unRAID shell or through NFS from a Linux client.

    Edited by golli53
    Link to comment
    13 hours ago, golli53 said:

    @limetech First of all, thank you for taking the time to dig into this. […] I upgraded yesterday, and while this tweak fixes listdir times, stat times for missing files in large directories are still bugged. […]

    Thanks for the heads up. I have re-opened this issue, as it seems to persist even when hard link support is disabled.

    Link to comment

    Pretty sure we got to the bottom of this.  It turns out Samba operations on large directories have been problematic for a long time.  It has to do with how case is handled in file names.  Google 'samba very large directories' for a good survey of the problem.

     

    With the array Started, if you include this option for one of your shares in the file /etc/samba/smb-shares.conf, it will speed things up:

    case sensitive = Yes

    After editing the file, type:

    samba restart

    You should see a dramatic speedup.  Unfortunately, your edit will not be preserved across an array Stop/Start.  Also, there are a couple more options related to case; read about them here.

    (See below - just put 'case sensitive = Yes' in your SMB Extras settings.)

     

    We will add a new share config option that specifies whether to set 'case sensitive' to yes or no.
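
    For example, a share section in /etc/samba/smb-shares.conf might end up looking roughly like this (the share name and the other lines are made up for illustration - 'case sensitive = Yes' is the only line being added):

    [myshare]
        path = /mnt/user/myshare
        browseable = yes
        writeable = yes
        case sensitive = Yes

    Then, with the array started, run 'samba restart' and repeat the directory listing from a client to compare.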

    Link to comment

    I have these entries in the "Samba extra configuration", see Settings -> SMB -> SMB Extras

    [global]
    case sensitive = true
    default case = lower
    preserve case = true
    short preserve case = true

     

    Link to comment
    26 minutes ago, bonienl said:

    I have these entries in the "Samba extra configuration", see Settings -> SMB -> SMB Extras

    Wasn't sure 'case sensitive' was global - it's documented as per-share.

    As for the other options, those are the current defaults.

    Link to comment
    45 minutes ago, bonienl said:

    I have these entries in the "Samba extra configuration", see Settings -> SMB -> SMB Extras

    
    [global]
    case sensitive = true
    default case = lower
    preserve case = true
    short preserve case = true

     

    Verified, yes that works.

     

    You only need to add a single line in SMB Extras:

    case sensitive = yes

    Note: "yes" == "true" and is case insensitive

    • Like 1
    Link to comment

    Also having this issue now... My CCC backups kept failing, and I didn't think it was Unraid until I stumbled across this... I'm fairly sure this is affecting me too.

     

    All of the above has had zero effect for me.

     

    One folder with 22k files in it (a time-lapse from 9 months ago) either shows nothing or an incorrect number of files in Finder.

     

    Going to roll back to 6.7.2 to test...

    Edited by Interstellar
    Link to comment
    On 2/7/2020 at 1:48 AM, limetech said:

    Verified, yes that works.

     

    You only need to add a single line in SMB Extras:

    
    case sensitive = yes

    Note: "yes" == "true" and is case insensitive

    This fixes the stat issue for very large folders - thanks for your hard work! Unfortunately, SMB is still quite slow - I think the listdir calls are still ~2x slower than with prior versions, even with hard links disabled. With the tweaks, my scripts now run instead of stalling, though they are still noticeably slower. I'll try to reproduce and compare when I get a chance to try 6.8.2 again. Regardless, thanks for your efforts here.

    Link to comment

    Just an FYI for anyone running the Poste.io email server or any other email system using the Dovecot IMAP server: it seems that hard links are required for the Dovecot IMAP engine within Poste.io to operate correctly, as it uses hard links to copy messages. I had to re-enable hard links to get IMAP back. It looks like it is possible to configure Dovecot not to use hard links, but I haven't tried that yet.

    Link to comment

    In this comment I mentioned that backing up to sparse bundles on macOS had become incredibly slow. Setting "case sensitive = yes" in SMB Extras has resolved that issue for me (including Time Machine backups).

    Link to comment

    +1 here for poor network performance (well actually, I have 2 with poor network performance; but I digress)

     

    Sorry for the noob questions, but consider the following:

    #disable SMB1 for security reasons
    
    [global]
       min protocol = SMB2
    #Are the 3 spaces before the above setting necessary?
    
    [global]
    #Set Case Sensitivity - can comments go after [global]?
    case sensitive = true
    
    default case = lower #can comments go on the same line?
    preserve case = true
    short preserve case = true
    
    #vfs_recycle_start
    #Recycle bin configuration
    [global]
       syslog only = No
       log level = 0 vfs:0
    #vfs_recycle_end

     

    * Is the syntax/formatting correct? (Does placement of CRs matter, placing comment lines after [global], etc.?)

    * Is it better to have just 1 [global] heading and put all global settings after it?

    * How does unRAID know when the global instructions end? There's no [/global] or anything to indicate it.

     

    Thanks for any assistance that is offered!

    Edited by Joseph
    typos/clarification
    Link to comment
    On 2/23/2020 at 10:01 PM, Vynce said:

    In this comment I mentioned that backing up to sparse bundles on macOS had become incredibly slow. Setting "case sensitive = yes" in SMB Extras has resolved that issue for me (including Time Machine backups).

    Setting case sensitive to yes in the SMB config - does that mean that in order to access a file on an SMB path, you have to write out the path and filename exactly as it is, matching the case?

    Link to comment




