• Slow SMB performance


    ptr727
    • Minor



    User Feedback

    Recommended Comments



    1 minute ago, mgutt said:

    @TexasUnraid

    @Juzzotec

    @CS01-HS

Which CPU does your server have?

I have a Xeon 1235, basically a 2600K. Not the fastest CPU, but it should be ample for basic file transfers. I was actually using a slower CPU when the server was running on Windows, and the SMB performance was WAY, WAY faster.

     

    ~1 hour for backups vs 6-8+ hours now.

    Link to comment
    9 minutes ago, TexasUnraid said:

    have a Xeon 1235

That's sufficient. Must be something else.

     

Did the old server have Multichannel?

     

     

    Link to comment
    31 minutes ago, mgutt said:

That's sufficient. Must be something else.

     

Did the old server have Multichannel?

     

     

    Multichannel memory?

     

It is the same hardware minus the CPU; I used to have an i5-2500 but found this Xeon for $20, so I tossed it in as an upgrade. Both setups had 32GB of memory in 4 sticks.

     

The network setup is unchanged as well, except that I added a 10Gb NIC in a peer-to-peer setup since then, but I get basically the same results using the 10Gb vs the normal 1Gb when dealing with small files (slightly faster with the 10Gb).

    Link to comment
    2 hours ago, TexasUnraid said:

    Multichannel memory?

SMB Multichannel. It splits the transfer across all CPU cores on the client AND the server, which is default behaviour in Windows. If the network adapters even support RDMA, the CPU load is super low. You can see this in this video at 11:00.
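
By the way, Samba has its own multichannel switch, although it is still flagged as experimental at this time. An untested sketch for Unraid, assuming the usual smb-extra.conf include and the stock Samba rc script:

    # append to Unraid's extra Samba config (assumed include path)
    cat >> /boot/config/smb-extra.conf <<'EOF'
    server multi channel support = yes
    EOF
    # restart Samba so the setting takes effect
    /etc/rc.d/rc.samba restart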

     

And compared to other operating systems, Unraid adds overhead through FUSE/SHFS, which @limetech described on the first page of this bug report. He even explained that Unraid itself bypasses SHFS for VMs by replacing /mnt/user paths with direct-disk paths like /mnt/cache.

     

    I used the same trick to boost my Plex server:

    https://forums.unraid.net/topic/88999-unraid-tweaks-for-media-server-performance/?tab=comments#comment-898167

     

Of course limetech still needs to find a way to optimize the SMB<>SHFS situation, but you already have multiple options to bypass SHFS yourself. You'll find many of them in my guide:

    https://forums.unraid.net/topic/97165-smb-performance-tuning/

     

Regarding the bug itself: it's only a guess, but as the SMB session count explodes for small files, I would say that something like a "chunk size" between SMB and SHFS does not fit. Sadly we can't help limetech here, as the SHFS mount command and flags are part of the Unraid source code. My other guess is that the Samba process and the SHFS process(es) often land on the same CPU core, so Samba is not able to fully utilize one core exclusively.
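
One could at least probe the shared-core guess by pinning the processes apart; a hypothetical sketch (the core numbers are made up):

    # pin all smbd workers to cores 2-3 so they stop competing with shfs
    for pid in $(pidof smbd); do
        taskset -cp 2,3 "$pid"
    done
    # show which cores the shfs process(es) are currently allowed to use
    for pid in $(pidof shfs); do
        taskset -cp "$pid"
    done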

    Link to comment

Pretty much positive that it is not a CPU issue or the like, as a packet capture showed a lot of failed and re-sent packets, among other things, pointing to a deeper issue than a mere hardware bottleneck. Also, netdata shows basically no CPU load during small-file transfers.

     

It is so bad that even doing the file comparison for a backup can be as slow as 2-4 files a second at times. On Windows they go so fast I can't even read them, closer to 300 files a second.

     

I have also tried (and currently use) disk shares, which bypass the FUSE file system; performance is better, but only slightly. It took backups from 12+ hours down to ~6-8 hours.
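
For anyone who wants to put numbers on the SHFS part locally, a quick and dirty test over SSH is something like this (share and disk names are examples, assuming a share named "test" on disk1):

    mkdir -p /mnt/disk1/test
    # 1,000 tiny files through the FUSE/SHFS path
    time for n in {1..1000}; do head -c 4096 /dev/urandom > "/mnt/user/test/shfs_$n.bin"; done
    # 1,000 tiny files straight to the disk
    time for n in {1..1000}; do head -c 4096 /dev/urandom > "/mnt/disk1/test/disk_$n.bin"; done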

     

    I tried all the SMB tweaks people suggested earlier in this thread, nothing seemed to make a difference.

    Link to comment
    24 minutes ago, TexasUnraid said:

    as a packet capture showed a lot of failed and re-sent packets

    I found your capture results here:

    https://forums.unraid.net/bug-reports/stable-releases/slow-smb-performance-r566/page/2/?tab=comments#comment-9639

     

I repeated this test as follows. First, I generated 200 random files and downloaded them to my W10 client:

share_name="Music"
mkdir -p "/mnt/cache/${share_name}/randomfiles"
# create 200 files of 4-20 KiB filled with random data
for n in {1..200}; do
    dd status=none if=/dev/urandom of="/mnt/cache/${share_name}/randomfiles/$( printf %03d "$n" ).bin" bs=4k count=$(( RANDOM % 5 + 1 ))
done
    

    Then I started tcpdump as follows:

# capture the first 200 bytes of each packet, SMB traffic (port 445) to/from the client only
tcpdump -s 200 -nn -i eth0 -w "/mnt/cache/tcpdump_$(date +'%Y%m%d_%H%M%S').cap" host 192.168.178.21 and port 445

    And then I uploaded the random files to a different path.

     

    Then I opened your dump and mine and used this filter to get only the SMB errors:

    smb2.error.context_count == 0
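
(If you prefer the terminal, tshark should accept the same display filter, e.g. to just count the hits; the capture file name is an example:)

    tshark -r tcpdump_20210113_035340.cap -Y 'smb2.error.context_count == 0' | wc -l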

    And the results are completely different:

     

[screenshot: comparison of SMB errors in both captures]

     

Then I reviewed your dump and found out that you are not using Windows to copy your files, as your process is much more complex:

[screenshot: packet sequence of the client's file-creation process]

     

First it asks the server whether "FileGen 26139278788.bin" exists, which returns a "STATUS_NO_SUCH_FILE" error. Then it asks the server for "~vv1.tmp", which returns a "STATUS_OBJECT_NAME_NOT_FOUND" error; then it creates this tmp file and finally renames it to "FileGen 26139278788.bin".

     

According to my research, "~vv1.tmp" files are created by ViceVersa. Is this correct? What do I need to set in this app to emulate your situation?

    Link to comment

It has been known for some time that small-file writes to user shares are considerably slow, and they've been getting slower with every new release. Anyone dealing with small files should always use disk shares if at all possible, even if you need to reserve just one disk or a pool for that.

    Link to comment
10 hours ago, mgutt said:

According to my research, "~vv1.tmp" files are created by ViceVersa. Is this correct? What do I need to set in this app to emulate your situation?


Correct, I use ViceVersa for the backups. It's pretty simple to set up: at the most basic level, pick a folder and tell it to back it up to another folder. It has advanced features, but they should not affect much in this case outside of the Windows copy option.

     

I did a test with both Windows file copy and ViceVersa. ViceVersa has an option to use the Windows file copy instead of its built-in system, but I get basically the same speed with either, and ViceVersa allows for some extra features, so I just use that.

     

    @_wateever was able to replicate the results on the next page after the one you linked.

     

What size files were you testing with? Large files seem to work fine; the issue only presents itself with small ~4KB files, of which I have a million or so it has to scan during backups.

     

As far as the errors being different, that is honestly outside my knowledge base. I just know that the raw speed between the Windows file copy and ViceVersa was close enough that it was not worth using the Windows file copy mode when doing the complete backup; it only saved a little time.

     

I also did tests with plain Windows Explorer, and the results were similar as well.

     

The first ~thousand files will sometimes go faster, but then the speed drops to the point that I can read each file name as it goes along.

     

A slight performance penalty I could understand, but ~8-10x worse is quite annoying. It turned my weekly backups from a quick morning task into an all-day grind. It has gotten to the point that I will sometimes zip up the small files, manually copy the archive to the server, and unzip it there; it takes a fraction of the time but is a lot more work for me, and I have to replace all the files instead of just the ones that need updating.
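
That archive trick can be scripted, too; a rough sketch of the same idea run from WSL or similar (hostname and both paths are placeholders):

    # stream the folder as one tar archive over SSH straight onto the cache
    tar -C "/mnt/c/backup/source" -cf - . | ssh root@tower 'tar -C /mnt/cache/backup -xf -'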

    Link to comment

I'm running an i5-8400 CPU, which is plenty for a Win10 VM.

     

By the way, my SMB transfer seems to be capped at 4MB/s when trying to transfer a 4K MKV video file. That seems a lot different from the issues reported here... are you guys seeing slow transfers for large files or small files only?

    Link to comment
On 1/14/2021 at 2:33 AM, Juzzotec said:

are you guys seeing slow transfers for large files or small files only?

    Only small files. Your problem must be something else.

    Link to comment

Hi everyone, I just found this thread and am amazed by the content - until now, I was not aware you could have disk shares.

Using 6.8.3, I just enabled disk shares and I can transfer at pretty much the max speed of my HDD (over the network), vs user shares, which were unbearably slow (crawling speeds).

Even outside of SMB, another of my use cases was a Docker container with /mnt/user/sharename mounted as /media, which was crazy slow; when changed to /mnt/disk4/sharename, it reached the max speed of the actual HDD!!

Is there an update on this SHFS issue? There seem to be two different issues mentioned in this thread: the SMB one and the FUSE (aka SHFS) one.

I'd be interested in knowing whether newer releases like the latest 6.9 RC address the issue, but I have no spare server to try this on.

    Thanks in advance!

    Link to comment
    7 minutes ago, xxxliqu1dxxx said:

    enabled disk shares

Do not mix user shares and disk shares when moving or copying files. Linux doesn't know they are different views of the same files, so it could try to overwrite what it is trying to read if the source and destination paths work out that way.
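
A hypothetical example of the pattern to avoid: through FUSE the two paths have different inodes, so cp cannot tell that source and destination are the same file, and may truncate it before reading.

    cp /mnt/user/docs/report.txt /mnt/disk1/docs/report.txt   # DON'T - may be the same underlying file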

    Link to comment
Just now, trurl said:

Do not mix user shares and disk shares when moving or copying files.

    Thanks - I am not. I move from disk to disk and avoid this situation.

    Link to comment
    19 hours ago, xxxliqu1dxxx said:

Even outside of SMB, another of my use cases was a Docker container with /mnt/user/sharename mounted as /media, which was crazy slow; when changed to /mnt/disk4/sharename, it reached the max speed of the actual HDD!!

     

That's the reason my Plex configuration looks like this:

     

[screenshot: Plex Docker container path mappings]

     

/tmp = RAM disk inside the container

/dev/dri = iGPU

/mnt/user/Movie = share path, as movies could be on multiple disks and a little SHFS overhead is acceptable

/mnt/cache/appdata/ = disk path, as the Plex database, thumbnails, covers, etc. should be loaded as fast as possible

     

    Conclusion: If you want the best performance, you must understand how Unraid works and change paths accordingly.
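
For illustration, the equivalent docker run flags would look roughly like this (image name and container paths are examples, not my exact template):

    docker run -d --name plex \
      --device /dev/dri \
      --mount type=tmpfs,destination=/tmp \
      -v /mnt/user/Movie:/movie \
      -v /mnt/cache/appdata/plex:/config \
      plexinc/pms-docker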

     

     

    Link to comment

    Just wondering why UNRAID doesn't comment on this?

    I pay for a license, and my system gets slower with each update.

    The only people that care are other customers that have given money to UNRAID. Seems that it might be time to move on to another system?

I'm very disappointed in UNRAID's lack of concern and responsiveness.

    Thanks to all those who have posted and spent hours and hours trying to fix a software problem that UNRAID doesn't seem to care about nor has any desire to fix.

    Link to comment
    26 minutes ago, John_M said:

    with kernel 5.10 replacing 4.19

     

    I don't think the SHFS overhead will be solved by a newer kernel.

    Link to comment
    5 minutes ago, mgutt said:

     

    I don't think the SHFS overhead will be solved by a newer kernel.

     

The raw filesystem overhead is an issue, but the far more significant one is the SMB overhead. Dealing with lots of small files using Krusader is slower than expected but still tolerable.

     

    Trying to work with those same files over SMB is aging.

     

Now, the filesystem issue could be causing the SMB issues, but SMB is the far more annoying one for me.

    Link to comment
    1 hour ago, TexasUnraid said:

    Trying to work with those same files over SMB is aging.

     

     

     

I still think it's related to your setup, settings, or paths.

     

    Here is a test of mine:

     

5,000 random-sized files, transferred in 60 seconds:

     

[screenshot: transfer statistics for the 5,000-file test]

    Link to comment

    Thanks for testing it on your end.

     

Try it with 1,000,000+ (I think I am closer to 1.5 million now); the first few thousand do seem to go faster for some reason, and I have no idea why. It adds up real quick though when you have over a million files it has to crawl through.

     

Then try the same thing on a Windows-to-Windows connection. That's all that really matters: the Windows > Windows vs the Windows > Unraid speed.

     

All I know for sure is that the backup process used to take about 45 minutes to 1 hour start to finish on Windows. I had it timed as part of my weekend morning routine.

     

Now it takes 4-6 hours on average since switching to disk shares exclusively. It took 6-8 with user shares.

     

Also, little things like searching those files take forever now over SMB. I have to use Krusader for any searching now, as it takes a fraction of the time.
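
(A plain find over SSH does the same job without the per-file SMB round trips; path and pattern are examples:)

    find /mnt/disk1/backups -name '*.cfg'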

     

I don't know what the issue is, but I have sanity-checked myself several times by testing speeds with a Windows system, and it is always back to what I remember. Unraid is always much, much slower than Windows SMB when dealing with random files.

     

If you are really getting the same speeds Windows > Windows as Windows > Linux, I am all ears as to how your setup differs from mine.

    Link to comment
    38 minutes ago, TexasUnraid said:

Then try the same thing on a Windows-to-Windows connection. That's all that really matters: the Windows > Windows vs the Windows > Unraid speed.

     

    You mean a Windows VM as the target on the Unraid server, while still using ViceVersa? Or using a different tool or different machine?

     

I suppose Windows-to-Windows automatically uses SMB Multichannel, which will be multi-threaded if both network adapters support RSS.

     

While transferring, you can check this and compare the results between win2win and win2unraid by opening PowerShell as Admin and executing the following:

    Get-SmbMultichannelConnection -IncludeNotSelected

     

If you get a result = Multichannel is enabled.

If "Client RSS Capable" is true = RSS is enabled = multi-threading is active.

     

    45 minutes ago, TexasUnraid said:

    Try it with 1,000,000+

     

    I started creating 1M random files with this command:

share_name="Music"
mkdir -p "/mnt/cache/${share_name}/randomfiles"
# single process: one small random file (4-20 KiB) per iteration
for n in {1..1000000}; do
    dd status=none if=/dev/urandom of="/mnt/cache/${share_name}/randomfiles/$( printf %03d "$n" ).bin" bs=4k count=$(( RANDOM % 5 + 1 ))
done
    

     

    Maximum write speed to my SATA SSD is 1 to 3 MB/s.

     

    I stopped it and repeated it with these four commands in four different terminal windows:

# terminal 1
share_name="Music"
mkdir -p "/mnt/cache/${share_name}/randomfiles"
for n in {1..250000}; do
    dd status=none if=/dev/urandom of="/mnt/cache/${share_name}/randomfiles/$( printf %03d "$n" ).bin" bs=4k count=$(( RANDOM % 5 + 1 ))
done

# terminal 2
share_name="Music"
mkdir -p "/mnt/cache/${share_name}/randomfiles"
for n in {250001..500000}; do
    dd status=none if=/dev/urandom of="/mnt/cache/${share_name}/randomfiles/$( printf %03d "$n" ).bin" bs=4k count=$(( RANDOM % 5 + 1 ))
done

# terminal 3
share_name="Music"
mkdir -p "/mnt/cache/${share_name}/randomfiles"
for n in {500001..750000}; do
    dd status=none if=/dev/urandom of="/mnt/cache/${share_name}/randomfiles/$( printf %03d "$n" ).bin" bs=4k count=$(( RANDOM % 5 + 1 ))
done

# terminal 4
share_name="Music"
mkdir -p "/mnt/cache/${share_name}/randomfiles"
for n in {750001..1000000}; do
    dd status=none if=/dev/urandom of="/mnt/cache/${share_name}/randomfiles/$( printf %03d "$n" ).bin" bs=4k count=$(( RANDOM % 5 + 1 ))
done

     

    Now maximum write speed is 9 to 19 MB/s.

     

Multi-threading is king.
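
For what it's worth, the four terminals could probably be replaced by a single xargs invocation; an untested sketch (padding widened to 7 digits so the names stay unique and sortable):

    export share_name="Music"
    mkdir -p "/mnt/cache/${share_name}/randomfiles"
    # fan the 1M file creations out over 4 parallel workers
    seq 1 1000000 | xargs -P 4 -n 1 bash -c '
        dd status=none if=/dev/urandom \
           of="/mnt/cache/${share_name}/randomfiles/$(printf %07d "$1").bin" \
           bs=4k count=$(( RANDOM % 5 + 1 ))
    ' _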

    Link to comment

I am talking about another bare-metal Windows computer entirely; I have six of them in the house to play with.

     

I tried that PowerShell command, but it just goes to a new line and does not give any information, with a transfer active or not.

     

Those random-file commands appear to be run natively on Linux, so it does not surprise me that they work fine; my issue is only when using a Windows client to access the Unraid host via SMB (NFS runs at full speed, although I had permission issues when using it).

     

If I use Krusader or anything else directly on Unraid, performance is OK.

     

One other point that might play a role is that I am on beta 30, although I saw the same results on 6.8.3.

     

An easy way I can tell it takes a lot longer over SMB is that when comparing files, it will scan the local files much, much faster than it will scan the Unraid files.

    Link to comment




