golli53

Comments posted by golli53

  1. On 4/1/2020 at 4:02 PM, rhard said:

Hi guys, unfortunately I see the same issue with SMB. Here are my results comparing the Performance and On Demand CPU profiles. P-states are disabled in my config, so my CPU runs at max frequency even at idle.

Which version are you using? I saw a significant performance drop starting in 6.8.x, with only partial recovery after modifying the Tunable Direct IO and SMB case settings. 6.6.7 should be quite a bit faster, at least, and doesn't suffer from the multi-stream read/write issues in 6.7.x.

  2. On 2/7/2020 at 1:48 AM, limetech said:

    Verified, yes that works.

     

    You only need to add a single line in SMB Extras:

    
    case sensitive = yes

    Note: "yes" == "true" and is case insensitive

This fixes the stat issue for very large folders - thanks for your hard work! Unfortunately, SMB is still quite slow - I think the listdir calls are still ~2x slower than on prior versions, even with Hard Links disabled. With the tweaks, my scripts now run instead of stalling, though they're still noticeably slower. I'll try to reproduce and compare when I get a chance to try 6.8.2 again. Regardless, thanks for your efforts here.

  3. 54 minutes ago, limetech said:

Where do all those dummy files get created? Is that on cache, on disk1..N, or spread among them all?

All on cache (which for me is 2xSSD RAID1 btrfs). The same issue occurs with a folder on the array, though (spread across disks). It seems to be an SMB issue, because I don't see the extra lag when calling stat from the unRAID shell or via NFS from a Linux client.

  4. @limetech First of all, thank you for taking the time to dig into this. From my much more limited testing, the issue seems to be a painful one to track down.

     

I upgraded yesterday, and while this tweak solves the listdir times, stat times for missing files in large directories are still bugged (observation 2 in the post below):

For convenience, I reproduced it on Linux and wrote this simple bash script:

    # unraid
    cd /mnt/user/myshare
    mkdir testdir
    cd testdir
    touch dummy{000000..200000}
    
    # client
    sudo mkdir /myshare
    sudo mount -t cifs -o username=guest //192.168.1.100/myshare /myshare
while true; do
    start=$SECONDS
    stat /myshare/testdir/does_not_exist > /dev/null 2>&1
    echo "$((SECONDS - start))s"
done

    On 6.8.x, each call takes 7-8s (vs 0-1s on previous versions), regardless of hard link support. The time complexity is nonlinear with the number of files (calls go to 15s if I increase the number of files by 50% to 300k).
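
For what it's worth, the scaling claim can be checked with a small harness like the one below. This is a sketch, not my original test: the directory sizes are scaled way down, and the temp directory is a local stand-in; pointing it at the SMB mount (with sizes like 100k/200k/300k) is what actually reproduces the 7-8s calls and the nonlinear growth.

```python
import os
import tempfile
import time

def time_missing_stat(dirpath, calls=3):
    """Average wall time of stat() on a nonexistent name in dirpath."""
    target = os.path.join(dirpath, "does_not_exist")
    start = time.perf_counter()
    for _ in range(calls):
        try:
            os.stat(target)
        except FileNotFoundError:
            pass  # expected: the file is missing by construction
    return (time.perf_counter() - start) / calls

# Small local sizes for illustration; on an SMB mount, much larger
# directories are needed to show the nonlinear growth described above.
with tempfile.TemporaryDirectory() as base:
    for n in (500, 1000, 2000):
        d = os.path.join(base, f"testdir_{n}")
        os.mkdir(d)
        for i in range(n):
            open(os.path.join(d, f"dummy{i:06d}"), "w").close()
        print(f"{n} files: {time_missing_stat(d):.6f}s per stat")
```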

I don't have a test server for unRAID, so I can only try out these suggestions on a weekend when I don't need my production environment up and running. For now, I'm going back to 6.6.7 to avoid the slow SMB and the concurrent-disk problem in 6.7.2.

     

    Also, I think there was something else going on in addition to the 3-4x slower directory listings. Some of my apps would lag for 20 minutes compared to 5 seconds, so I think there were additional SMB performance regressions. I detailed some other slow behavior in the Prerelease thread, but those were just the regressions I happened to notice from debugging the code in a couple of my apps one weekend, so I may have missed others.

  6. 16 hours ago, limetech said:

    FWIW I tried accessing a share on a remote server which has 3088 items in the top-level and it populated windows explorer near instantaneously.  This was via WireGuard connection where the remote server has crappy DSL internet access with mere 4Mbits upload.  These were all music directories and files and playing them via VLC worked ok, there was a slight pause to read the files but I attribute this to the aforementioned crappy DSL link.  Clearly this is not exhibiting the issue being mentioned.

I guess it comes down to the definition of "near instantaneously". In my testing over many thousands of calls, I was averaging 2.5s vs 0.7s (on 6.7.2) for 3k items. When 2 programs access SMB simultaneously, that becomes 5s vs 1.4s. For 10 programs, 25s vs 7s. I think it's common for services to access SMB shares on a server simultaneously.

  7. 19 minutes ago, BRiT said:

    But your sample code is getting directory listings.

There are two issues, and the sample code has a section for each (preceded by a comment header).

     

Part 1 of the code is getting listings. That seems to be slower on 6.8.0 for all directories, and it's noticeable to a human on a single call, without concurrency, starting at a couple thousand files.

     

    Part 2 of the sample code is only calling stat. I can only reproduce this issue for very large directories, but maybe that's because it requires large directories to produce a measurable difference.

  8. 7 minutes ago, BRiT said:

    Subdir by some designation, like by create or archive date, is required.

     

    That sort of directory even on native WinOS with local drives is going to be excruciatingly painful. The pain starts around the 30K range.

I never run a directory listing on that directory; I only open specific files by naming convention. So adding subdirs would actually make things less efficient, because I'd have to check for a file in each subdir. My current setup works very fast on a normal Samba server, e.g. 6.7.2 or Ubuntu.

     

    The first issue is a problem for much smaller directories (a few thousand).
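
The direct-access pattern described above can be sketched roughly as follows. This is hypothetical: `rec_<id>.dat` is a made-up naming convention and the temp directory stands in for the share. The point is that a known name maps straight to a path, so a single open or stat works regardless of how many files sit in the directory, whereas a subdir-per-date layout would force a probe into each subdir.

```python
import os
import tempfile

def record_path(base, rec_id):
    """Map a record id straight to a flat path; no listdir needed."""
    return os.path.join(base, f"rec_{rec_id:06d}.dat")

with tempfile.TemporaryDirectory() as share:  # stand-in for the SMB share
    for i in (7, 42):
        with open(record_path(share, i), "w") as f:
            f.write("payload")
    # Direct access: one open() by convention, independent of how many
    # other files sit in the directory.
    with open(record_path(share, 42)) as f:
        assert f.read() == "payload"
```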

Attachments: smb672.txt, smb680.txt

Attaching the testparm outputs. I also tried some debugging after rolling back to the previous version again. I see two differences in behavior:

     

- Each os.listdir call is ~3-4x slower on average on v6.8 vs v6.7.2 (2.5s vs 0.7s for a 6k-file directory)

- When calling os.stat on a single nonexistent file in a 250k-file directory on 6.8 (which should take microseconds per call), every 100 or so calls it hangs for ~5s

     

Concurrent calls are simply additive in terms of execution time; concurrency itself doesn't seem to be the problem, but it does make the slowdown more apparent.

     

Note that under the hood, Python just uses the native Windows calls for listing and statting these files; Python simply makes it easier to drive and time many requests.

     

My code for reproducing this is below:

    from datetime import datetime as dt
    import os
    
    # observation 1
    while True:
        start = dt.now()
        os.listdir('//192.168.1.100/share/path')
        print((dt.now() - start).total_seconds())
    
    # observation 2
    while True:
        start = dt.now()
    try:
        os.stat('//192.168.1.100/share/bigpath/nonexistent.txt')
    except OSError:  # the file is intentionally missing
        pass
        print((dt.now() - start).total_seconds())
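
A variant of the observation-2 loop that records per-call latencies instead of printing each one may make the periodic hang easier to quantify. A sketch; the /tmp path is a local placeholder, and the real test points at a nonexistent file on the mount:

```python
import os
import statistics
import time

def stat_latencies(path, calls=100):
    """Wall time of each stat() attempt on a (missing) path."""
    times = []
    for _ in range(calls):
        start = time.perf_counter()
        try:
            os.stat(path)
        except OSError:
            pass  # the path is expected to be missing
        times.append(time.perf_counter() - start)
    return times

# Against 6.8 over SMB, the median should stay tiny while the max spikes
# to ~5s roughly every 100 calls; locally both stay in the microseconds.
times = stat_latencies("/tmp/does_not_exist_smbtest")
print(f"median={statistics.median(times):.6f}s max={max(times):.6f}s")
```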

     

  10. 5 hours ago, limetech said:

    No but Samba team changes defaults all the time from release-to-release (kinda maddening they do this).

    Please type this command running 6.7.2:

    
    testparm -sv > /boot/smb672.txt

    and then boot 6.8-rc8 and type:

    
    testparm -sv > /boot/smb680.txt

    You now have those two text files on your flash which you can post here.  Note: those files will contain your share names, if you don't want to post here then send to me via PM.

     

    Also would be helpful to describe to me how to reproduce this issue.

Thanks - I'll try as soon as I get a chance (I'm running a production environment).

     

I'm essentially calling code like the below on several (~5) network folders, each with 5k subdirectories, from a Win10 client. That grinds SMB directory listing to a halt for that client, including normal browsing in Windows file explorer.

import os

def recursive_ls(path):
    files = os.listdir(path)
    for f in files:
        subpath = os.path.join(path, f)
        if os.path.isdir(subpath):
            recursive_ls(subpath)

[edit] Each subdirectory only has a couple of files in it, so the number of files may not be the issue, just concurrent ls requests; calling ls in a loop from multiple threads on one client may do the same thing (it may be easier to set up as a shell script)
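
The multi-threaded variant suggested in the edit might look roughly like this (a sketch in Python rather than shell; the temp directory is a local stand-in for the network folder, and the thread count and duration are arbitrary):

```python
import os
import tempfile
import threading
import time

def hammer_listdir(path, stop, counts, idx):
    """Call os.listdir in a tight loop until stop is set."""
    n = 0
    while not stop.is_set():
        os.listdir(path)
        n += 1
    counts[idx] = n

with tempfile.TemporaryDirectory() as share:  # stand-in for the SMB path
    stop = threading.Event()
    counts = [0] * 4
    threads = [threading.Thread(target=hammer_listdir,
                                args=(share, stop, counts, i))
               for i in range(4)]
    for t in threads:
        t.start()
    time.sleep(0.5)  # let the concurrent listings run briefly
    stop.set()
    for t in threads:
        t.join()
    print("listdir calls per thread:", counts)
```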

  11. 10 hours ago, veruszetec said:

    Hey, quick question: Why is User0 considered deprecated?

     

    Is there something I should be using instead to replace this functionality?

I'm interested in this also. I use user0 for several applications that do large background file transfers, which I want to bypass my cache (to avoid filling it up all the time).

  12. On 9/12/2019 at 2:56 PM, yendi said:

    Is there any ETA for next version? It's getting out of control, my server is simply unusable.

    This is simply by running the mover or copying a file. PLEASE RELEASE AN EMERGENCY FIX.

    And no, I will not downgrade and risk other side effects, i'm in a production env.

[attached image: graph]

    Just curious - what is this graph from?

  13. On 9/5/2019 at 6:33 AM, GHunter said:


I ran across this problem too and found that it was related to this reported problem: "[6.7.x] Very slow array concurrent performance" by Johnnie.Black.

    I don't run plex but do have several other dockers that use sqlite and have never had a problem. Appdata is set to "cache only"

I also think these two are related. I'm experiencing both on 6.7.2.