mgutt Posted September 24, 2020

I have been using Unraid for a while now and collected some experience on how to boost SMB transfer speeds:

1.) Choose the right CPU

The most important thing to understand is that SMB is single-threaded: SMB uses only one CPU core to transfer a file. This is true for both the server and the client. Usually this is not a problem, as SMB does not fully utilize a CPU core (except on really low-powered CPUs). But because of its ability to split shares across multiple disks, Unraid adds an additional process called SHFS, and its load rises proportionally with the transfer speed, which can overload your CPU core. So the most important step is to choose the right CPU.

At the moment I'm using an i3-8100, which has 4 cores and 2257 single-thread Passmark points. With this single-thread power I'm able to use the full bandwidth of my 10G network adapter, which was not possible with my previous Intel Atom C3758 (857 points), although both have comparable total performance. With the Atom I was not even able to reach 1G speeds while a parallel Windows backup was running (see the next section to bypass this limitation). Now I'm able to transfer thousands of small files and, in parallel, transfer a huge file at 250 MB/s.

Based on this experience I suggest a CPU with around 1400 single-thread Passmark points to fully utilize a 1G Ethernet port. As an example: the smallest CPU I would suggest for Unraid is an Intel Pentium Silver J5040.

P.S. Passmark has a list sorted by single-thread performance for desktop CPUs and server CPUs.

2.) Bypass the single-thread limitation

The single-thread limitation of SMB and SHFS can be bypassed by opening multiple connections to your server, which means connecting to "different" servers. The easiest way to accomplish that is to use the IP address of your server as a "second" server while using the same user login:

\\tower\sharename -> best option for user access through the file explorer, as it is displayed automatically
\\10.0.0.2\sharename -> best option for backup software; you can map it as a network drive

If you need more connections, you can add multiple entries to your Windows hosts file (Win+R and execute "notepad c:\windows\system32\drivers\etc\hosts"):

10.0.0.2 tower2
10.0.0.2 tower3

(A drive-mapping example follows at the end of this post.)

Results

If you now download a file from your Unraid server through \\10.0.0.2 while a backup is running against \\tower, the download reaches maximum speed, while a download from \\tower is massively throttled.

3.) Bypass Unraid's SHFS process

If you enable access directly to the cache disk and upload a file to \\tower\cache, this bypasses the SHFS process.

Beware: Do not move/copy files between the cache disk and shares, as this could cause data loss! The eligible user account will be able to see all cached files, even those of other users.

Temporary solution, or "for admins only"

As an admin, or for a short test, you can enable "disk shares" under Settings -> Global Share Settings. This makes all array and cache disks accessible as SMB shares for all users. As you don't want that, your first step is to click on each disk in the WebGUI > Shares and forbid user access, except for the cache disk, which gets read/write access only for your "admin" account.

Beware: Do not create folders in the root of the cache disk, as this will create new SMB shares.

Safer permanent solution

Use this explanation.

Results

In this thread you can see the huge difference between copying to a cached share and copying directly to the cache disk.
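To make the extra connections from section 2 easy to use, you can map them as drive letters on the Windows client. This is only a sketch with example names: tower2 is the hosts-file alias from above, Y:/Z: and sharename are placeholders, and you may need to append /user:youruser if your credentials are not already cached. Run it in a Command Prompt on the client:

REM Map the hosts-file alias and the raw IP as two "different" servers,
REM so the backup job and manual copies each get their own SMB session.
net use Y: \\tower2\sharename /persistent:yes
net use Z: \\10.0.0.2\sharename /persistent:yes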
4.) Enable SMB Multichannel + RSS

SMB Multichannel is a feature of SMB3 that allows splitting file transfers across multiple NICs (Multichannel) and creating multiple TCP connections depending on the number of CPU cores (RSS), available since Windows 8. This raises your throughput depending on the number of NICs, NIC bandwidth, CPU and the settings used.

This feature is experimental

SMB Multichannel has been considered experimental since its release with Samba 4.4. The main bug behind this status is resolved in Samba 4.13, and the Samba developers plan to resolve the remaining bugs with 4.14. Unraid 6.8.3 contains Samba 4.11. This means you use Multichannel at your own risk!

Multichannel for multiple NICs

Let's say your mainboard has four 1G NICs and your client has a 2.5G NIC. Without Multichannel the transfer speed is limited to 1G (117.5 MByte/s). But if you enable Multichannel, it splits the file transfer across the four 1G NICs, boosting your transfer speed to 2.5G (294 MByte/s). Additionally it uses multiple CPU cores, which helps avoid overloading smaller CPUs.

To enable Multichannel, open the Unraid web terminal and enter the following (the file is usually empty, so don't be surprised):

nano /boot/config/smb-extra.conf

And add the following to it:

server multi channel support = yes

Press "Ctrl+X", confirm with "Y" and "Enter" to save the file. Then restart the Samba service with this command:

samba restart

You may need to reboot your Windows client, but after that Multichannel is enabled and should work.

Multichannel + RSS for single and multiple NICs

But what happens if your server has only one NIC? Then Multichannel has nothing to split across, but it has a sub-feature called RSS which is able to split file transfers across multiple TCP connections over a single NIC. Of course this feature works with multiple NICs, too. However, it requires RSS capability on both sides.

You can check your server's NIC by opening the Unraid web terminal and entering this command (this could become obsolete with Samba 4.13, as it adds RSS autodetection):

egrep 'CPU|eth*' /proc/interrupts

It must return multiple lines (one per CPU core) like this:

egrep 'CPU|eth0' /proc/interrupts
          CPU0       CPU1       CPU2       CPU3
 129:  29144060          0          0          0  IR-PCI-MSI 524288-edge  eth0
 131:         0   25511547          0          0  IR-PCI-MSI 524289-edge  eth0
 132:         0          0   40776464          0  IR-PCI-MSI 524290-edge  eth0
 134:         0          0          0   17121614  IR-PCI-MSI 524291-edge  eth0

Now you can check your Windows 8 / Windows 10 client by opening PowerShell as admin and entering this command:

Get-SmbClientNetworkInterface

It must return "True" for "RSS Capable":

Interface Index RSS Capable RDMA Capable Speed   IpAddresses Friendly Name
--------------- ----------- ------------ -----   ----------- -------------
11              True        False        10 Gbps {10.0.0.10} Ethernet 3

Once you are sure that RSS is supported on your server, you can enable Multichannel + RSS by opening the Unraid web terminal and entering the following (the file is usually empty, so don't be surprised):

nano /boot/config/smb-extra.conf

Add the following, change 10.10.10.10 to your Unraid server's IP, and set the speed to "10000000000" for a 10G adapter or "1000000000" for a 1G adapter:

server multi channel support = yes
interfaces = "10.10.10.10;capability=RSS,speed=10000000000"

If you are using multiple NICs, the syntax looks like this (add the RSS capability only for NICs that support it!):

interfaces = "10.10.10.10;capability=RSS,speed=10000000000" "10.10.10.11;capability=RSS,speed=10000000000"

Press "Ctrl+X" and confirm with "Y" and "Enter" to save the file.
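If you want a second opinion on the RSS check above before restarting Samba, ethtool can show the NIC's queue layout directly. This is only an extra, optional check; eth0 is an assumption, and the indirection table is related to the ETHTOOL data that Samba 4.13's autodetection queries:

# Show the NIC's channel (queue) configuration; more than one RX/combined channel usually means RSS is available
ethtool -l eth0
# Show the RX flow hash indirection table used to spread incoming flows across those queues
ethtool -x eth0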
Now restart the SMB service:

samba restart

Does it work?

After rebooting your Windows client (this seems to be a must), download a file from your server (so a connection is established). Then you can check whether Multichannel + RSS works by opening Windows PowerShell as admin and entering this command:

Get-SmbMultichannelConnection -IncludeNotSelected

It must return a line similar to this (a returned line = Multichannel works), and if you want to benefit from RSS, "Client RSS Capable" must be "True":

Server Name Selected Client IP    Server IP   Client Interface Index Server Interface Index Client RSS Capable Client RDMA Capable
----------- -------- ---------    ---------   ---------------------- ---------------------- ------------------ -------------------
tower       True     10.10.10.100 10.10.10.10 11                     13                     True               False

In Linux you can verify RSS with this command, which returns one open TCP connection per CPU core (in this case we see 4 connections, as my client has only 4 CPU cores, although my server has 6):

netstat -tnp | grep smb
tcp        0      0 192.168.178.8:445       192.168.178.88:55975    ESTABLISHED 3195/smbd
tcp        0      0 192.168.178.8:445       192.168.178.88:55977    ESTABLISHED 3195/smbd
tcp        0      0 192.168.178.8:445       192.168.178.88:55976    ESTABLISHED 3195/smbd
tcp        0      0 192.168.178.8:445       192.168.178.88:55974    ESTABLISHED 3195/smbd

Note: Sadly, Samba does not create multiple smbd processes, which means we still need a CPU with high single-thread performance to benefit from RSS. This is even mentioned in the presentation.

If you are interested in test results, look here.

5.) smb.conf Settings Tuning

I did massive testing with a huge number of smb.conf settings suggested by the following websites, and really NOTHING resulted in a noticeable speed gain:

https://wiki.samba.org/index.php/Performance_Tuning
https://wiki.samba.org/index.php/Linux_Performance
https://wiki.samba.org/index.php/Server-Side_Copy
https://www.samba.org/~ab/output/htmldocs/Samba3-HOWTO/speed.html
https://www.samba.org/samba/docs/current/man-html/smb.conf.5.html
https://lists.samba.org/archive/samba-technical/attachments/20140519/642160aa/attachment.pdf
https://www.samba.org/samba/docs/Samba-HOWTO-Collection.pdf
https://www.samba.org/samba/docs/current/man-html/ (search for "vfs")
https://lists.samba.org/archive/samba/2016-September/202697.html
https://codeinsecurity.wordpress.com/2020/05/18/setting-up-smb-multi-channel-between-freenas-or-any-bsd-linux-and-windows-for-20gbps-transfers/
https://www.snia.org/sites/default/files/SDC/2019/presentations/SMB/Metzmacher_Stefan_Samba_Async_VFS_Future.pdf
https://www.heise.de/newsticker/meldung/Samba-4-12-beschleunigt-Verschluesselung-und-Datentransfer-4677717.html

I would say the recent Samba versions are already optimized by default.

6.) Choose a proper SSD for your cache

You could use Unraid without an SSD, but if you want fast SMB transfers, an SSD cache is absolutely required. Otherwise you are limited by slow parity writes and/or your slow HDDs. But many SSDs on the market are not suitable for use as an Unraid SSD cache.

DRAM

Many cheap models do not have a DRAM cache. This small buffer is used to collect very small files or random writes before they are finally written to the flash, and/or it serves as a high-speed area for the file mapping table. In short: you need a DRAM cache in your SSD. No exception.

SLC Cache

While DRAM is only absent in cheap SSDs, an SLC cache can be missing in any price range. Some cheap models use a small SLC cache to "fake" their spec-sheet numbers.
Some mid-range models use a big SLC cache to raise durability and speed when installed in a client PC. And some high-end models do not have an SLC cache at all, as their flash cells are fast enough without it. In the end you are not interested in the SLC cache itself; you are only interested in continuous write speed (see "Verify Continuous Writing Speed").

Determine the Required Writing Speed

Before you can select the right SSD model, you need to determine your minimum required transfer speed. This should be simple: how many Ethernet ports do you want to use, and do you plan to install a faster network adapter? Let's say you have two 5G ports. With SMB Multichannel it's possible to use their combined bandwidth, and as you plan to install a 10G card in your client, you could use 10G in total. Now we can calculate: 10 x 117.5 MByte/s (real throughput per 1G of Ethernet) = 1175 MByte/s, which leaves two options:

buy one M.2 NVMe (assuming your motherboard has such a slot) with a minimum write speed of 1175 MByte/s
buy two or more SATA SSDs and use them in a RAID0, each with a minimum write speed of 550 MByte/s

Verify Continuous Writing Speed of the SSD

As an SLC cache hides the real transfer speed, you need to invest some time to check whether your desired SSD model has an SLC cache and how much the SSD throttles after it is full. One approach is to search for "review slc cache" in combination with the model name. The image search can be helpful as well (maybe you spot a graph with a falling line). If you do not find anything, use YouTube; many people out there test their new SSD by simply copying a huge amount of files onto it.

Note: CrystalDiskMark, AS SSD and similar benchmarks are useless here, as they only test a really small amount of data (which fits into the fast cache).

Durability

You could look at the "TBW" value of the SSD, but in practice you won't be able to kill the SSD within the warranty period, as long as the very first filling of your Unraid server is done without the SSD cache. As an example, a 1TB Samsung 970 EVO has a TBW of 600, and if your server has a total size of 100TB you would waste 100TBW on your first fill for nothing. If you plan to use Plex, consider using RAM as your transcoding storage, which saves a huge amount of writes to your SSD. Conclusion: optimize your writes instead of buying an expensive SSD.

NAS SSD

Do not buy "special" NAS SSDs. They do not offer any benefit over high-end consumer models, but cost more.

7.) More RAM

More RAM means more caching, and as RAM is even faster than the fastest SSDs, this adds an additional boost to your SMB transfers. I recommend installing two identical RAM modules (or more, depending on the number of slots) to benefit from "Dual Channel" speeds. RAM frequency is not as important as RAM size.

Read Cache for Downloads

If you download a file twice, the second download does not read the file from your disk; instead it is served from RAM only. The same happens if you load the covers of your MP3s or movies, or if Windows generates thumbnails of your photo collection. More RAM means more files in your cache. The read cache uses by default 100% of your free RAM.

Write Cache for Uploads

Linux uses by default 20% of your free RAM to cache writes before they are written to the disk.
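If you want to see what your server is currently using, the relevant kernel settings can be read out at any time. This is a read-only check using standard Linux sysctl keys:

# Show the current dirty-page (write cache) settings
sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs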
You can use the Tips and Tweaks plugin to change this value, or add this to your Go file (with the Config Editor plugin):

sysctl vm.dirty_ratio=20

But before changing this value, you need to be sure you understand the consequences:

Never use your NAS without a UPS if you use write caching, as this could cause huge data loss!
The bigger the write cache, the smaller the read cache (so using 100% of your RAM as write cache is not a good idea!).
If you upload files to your server, they are written to your disk 30 seconds later (vm.dirty_expire_centisecs).
Without SSD cache: if your upload size is generally larger than your write cache, the kernel starts cleaning up the cache while writing the transfer to your HDD(s) in parallel, which can result in slow SMB transfers. Either raise your cache size so it never fills up, or consider disabling the write cache entirely.
With SSD cache: SSDs love parallel transfers (read #6 of this guide), so a huge write cache, or even a full one, is not a problem.

But which dirty_ratio value should you set? This is something you need to determine yourself, as it is completely individual.

First, think about the highest RAM usage that is possible (active VMs, RAM disks, Docker containers, etc.). That gives you the smallest amount of free RAM on your server:

Total RAM size - RAM reserved for VMs - RAM used by Docker containers - RAM disks = Free RAM

Now the harder part: determine how much RAM is needed for your read cache. Do not forget that VMs, Docker containers, processes etc. load files from disks, and these are all cached as well. I thought about this and came up with this command, which counts "hot" files:

find /mnt/cache -type f -amin -1440 ! -size +1G -exec du -bc {} + | grep total$ | cut -f1 | awk '{ total += $1 }; END { print total }' | numfmt --to=iec-i --suffix=B

It counts the size of all files on your SSD cache that were accessed in the last 24 hours (1440 minutes).
The maximum file size is 1GiB, to exclude VM images, Docker containers, etc.
This only works if you (hopefully) use your cache for your hot shares like appdata, system, etc.
Of course you can repeat this command over several days to check how the value fluctuates.
This command must be executed after the mover has finished its work.
This command isn't perfect, as it does not count hot files inside a VM image.

Now we can calculate (a small worked sketch follows at the end of this post):

100 / Total RAM x (Free RAM - Command Result) = vm.dirty_ratio

If your calculated "vm.dirty_ratio" is
lower than 5% (or even negative): set it to 5 and buy more RAM.
between 5% and 20%: set it accordingly, but consider buying more RAM.
between 20% and 90%: set it accordingly.
higher than 90%: you are probably not using your SSD cache for your hot shares (as you should), or your RAM is huge as hell (congratulations ^^). I suggest not setting a value higher than 90.

Of course you need to recalculate this value if you add more VMs or Docker containers.

#8 Disable haveged

Unraid does not trust the randomness of Linux and uses haveged instead. As a result, all encryption processes on the server use haveged, which produces extra load. If you don't need it, disable it through your Go file (CA Config Editor) as follows:

# -------------------------------------------------
# disable haveged as we trust /dev/random
# https://forums.unraid.net/topic/79616-haveged-daemon/?tab=comments#comment-903452
# -------------------------------------------------
/etc/rc.d/rc.haveged stop
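To make the dirty_ratio formula above concrete, here is a rough helper sketch. The numbers are pure assumptions; replace them with your own values (all in GiB, with the hot-file size taken from the find command above):

#!/bin/bash
# Worked example of: 100 / Total RAM x (Free RAM - hot files) = vm.dirty_ratio
TOTAL_RAM=64   # GiB installed
FREE_RAM=48    # GiB left after VMs, Docker containers and RAM disks
HOT_FILES=8    # GiB of recently accessed files reported by the find command
echo "vm.dirty_ratio = $(( 100 * (FREE_RAM - HOT_FILES) / TOTAL_RAM ))"
# -> prints "vm.dirty_ratio = 62" for these example numbers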
testdasi Posted September 24, 2020

For (3), a safer alternative to enabling disk shares universally is to have a custom SMB config file pointing to a top-level folder on a disk (e.g. for a share called sharename, have the custom SMB config point to /mnt/cache/sharename or /mnt/disk1/sharename). Then have the SMB Extras in SMB Settings "include" that config file. That way you only need to restart SMB to change the config file (instead of needing to stop the array to change SMB Extras). Works really well with my cache-only NVMe RAID0 share.

More detailed guide:

Let's say you have a cache-only share called "sharename" that you want a user called "windows" to access via SMB with the SHFS bypass.

Create a smb-custom.conf with the content below and save it in /boot/config:

[sharename-custom]
path = /mnt/cache/sharename
comment =
browseable = no
Force User = nobody
valid users = windows
write list = windows
vfs objects =

Then, with the array stopped, go to Settings -> SMB and add this line to the Samba extra configuration box:

include = /boot/config/smb-custom.conf

Apply, done, start the array. You can now access the bypassed share at \\tower\sharename-custom or \\server-ip\sharename-custom.

Some hints

It's critical that the name of the bypassed share (e.g. sharename-custom) is DIFFERENT from the normal share name, or you will run into weird quirks, i.e. the Unraid share conflicts with your custom share.
To add more shares, just copy-paste the above block in smb-custom.conf and make the appropriate changes (e.g. name, path, user), save, and then restart SMB. No need to stop the array.
Similarly, to edit, just edit smb-custom.conf, save and restart SMB.

Edited September 25, 2020 by testdasi: Added more detailed guide
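As an extra safeguard (not part of testdasi's original steps), you can ask Samba to parse the combined configuration before relying on it. testparm is Samba's standard config checker and will report syntax errors in the included file:

# Parse the active configuration (including the files pulled in via "include") and report errors
testparm -s /etc/samba/smb.conf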
falconexe Posted September 24, 2020

7 hours ago, mgutt said:
I have been using Unraid for a while now and collected some experience to boost SMB transfer speeds. I will post my results after all tests have been finished. By now I would say it does not really influence the performance, as recent Samba versions are already optimized, but we will see.

I appreciate your work and analysis on this. I'm still not clear whether this is something that @SpencerJ @limetech can improve in a future update. Per the tail end of my thread, you mentioned that it may be possible. I'm hoping they can officially shed some light on this SHFS overhead and how it might be optimized for those of us who use 10GbE network cards to transfer large amounts of data on a regular basis.
mgutt Posted September 24, 2020

11 minutes ago, falconexe said:
This is actually what I am doing in my "Fix"

Are you? He is only able to access \\tower\cache\sharename and not \\tower\cache. I will set this up too; it really is safer. Of course the guide will be updated as well.
Energen Posted September 24, 2020

What we as an Unraid community really need is an up-to-date guide on all things related to SMB speed. There are so many threads with people asking about SMB speed and performance, and I don't think there is any single source to go to for answers. I was always kind of "eh" about my SMB speed; I figured it is what it is, and my 10-50 MB/s transfers were the best I was going to get. But when I see people with 100-200-300+ MB/s transfers, it really gets me wondering what I'm doing wrong. I don't do a ton of transferring where I really need that speed, but if I could get it, and should get it, I WANT it.

Right now, transferring a 1.4 GB file from my Unraid to my new PC build with an NVMe drive runs at around 112 MB/s, which is an improvement over my old system. Copying the file back to Unraid (cache enabled) runs at the same speed. But it's certainly not 710 MB/s like in the image above. Share to share, that same 1.4 GB file copied at about 500 MB/s just now, so that's not terrible (not using mapped drives at the moment), if it matters.

So this thread is a good start, but if we could somehow gather all the SMB information from the entire forum and make it usable, that would be something special. Another related topic is how your network hardware affects your SMB speeds; that should be included as well. mgutt is using a 10G network adapter, so clearly his results cannot be expected for someone using a 100 Mbit LAN.

@mgutt have you tweaked your Unraid smb.conf for the SMB protocol version at all? I'm currently using

server min protocol = SMB3_11
client min protocol = SMB3_11

to theoretically use the latest (fastest?) version of SMB for Unraid <-> Windows 10.
mgutt Posted September 24, 2020

3 hours ago, Energen said:
Have you tweaked your Unraid smb.conf for the SMB protocol version at all? I'm currently using server min protocol = SMB3_11 / client min protocol = SMB3_11 to theoretically use the latest (fastest?) version of SMB for Unraid <-> Windows 10.

Nope. My configuration contains only default values, except for the Multichannel part as explained in this guide. SMB 3.1.1 is used automatically by my Windows 10 client (remove your min protocol settings and use "Get-SmbConnection" in PowerShell to verify that 3.1.1 is in use; a small example follows at the end of this post). But even without the Multichannel setting my performance was already good. It was only bad with my old CPU. What is your hardware setup?

EDIT: Ok, found it here. Regarding my single-thread theory, your Pentium G4560 should be sufficient. Use this little checklist (replace "sharename" with one of your cached shares):

1. Is your SSD cache fast enough (with a full SLC cache)? Open the web terminal, execute htop and leave it open. Open a second web terminal and generate a huge random file on your cache with

dd if=/dev/urandom iflag=fullblock of=/mnt/cache/sharename/10GB.bin bs=1GiB count=10

2. Download this file from \\tower\sharename to your Windows client, note the speed and check the load/processes in htop.
3. Download it again and check the read speed of your SSD in the Unraid WebGUI. Did it fall to zero? (It should, as the 10GB file should fit into your RAM = Linux cache.)
4. Download it from \\server.ip\sharename (#2 of this guide), note the speed and check the load/processes in htop.
5. Enable (temporarily) disk shares in Global Share Settings and download it from \\server.ip\cache\sharename (#3 of this guide, to bypass SHFS), note the speed and check the load/processes in htop. Disable disk shares afterwards (for the security reasons described above).
6. Disable Hyper-Threading in your BIOS and repeat the test. My theory is that SMB indeed uses a different thread, but (randomly) the same core as the SHFS process, which would make Hyper-Threading absolutely useless for Unraid. But until now nobody has helped me verify this.

What is your conclusion? What happens with the load and processes in the different tests, and how fast can you go without SHFS?
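For reference, a minimal PowerShell sketch of the dialect check mentioned above. Run it on the Windows client after opening a file on the server so a connection exists; the properties shown are standard fields of Get-SmbConnection:

# List active SMB connections and the negotiated protocol version per share
Get-SmbConnection | Select-Object ServerName, ShareName, Dialect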
mgutt Posted September 25, 2020

#3 "Completely bypass Unraid's additional process" rewritten because of @testdasi
#4 "Multichannel + RSS" rewritten to reflect different numbers of NICs and their bandwidth
#6 "Choose a proper SSD for your cache" added
#7 "More RAM" added
JorgeB Posted September 26, 2020

On 9/24/2020 at 12:14 PM, mgutt said:
Enable SMB Multichannel

I recommend adding that Samba SMB Multichannel is still experimental and not recommended for production, as in some rare cases it can cause data corruption. From the Samba manpage:

Warning: Note that this feature is still considered experimental. Use it at your own risk: Even though it may seem to work well in testing, it may result in data corruption under some race conditions. Future releases may improve this situation.
mgutt Posted September 26, 2020

Will add a warning. The last status is that the data corruption issue is solved in Samba 4.13, with some minor bugs left (6.8.3 uses 4.11): https://bugzilla.samba.org/show_bug.cgi?id=11897

Which Samba version is used in the recent Unraid beta?

P.S. Samba 4.12 adds an extra boost with the vfs object "io_uring": https://www.heise.de/newsticker/meldung/Samba-4-12-beschleunigt-Verschluesselung-und-Datentransfer-4677717.html
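For reference, on a build that ships Samba 4.12 or newer, enabling that module would roughly look like the snippet below inside a share definition. This is a hedged sketch only: the share name is hypothetical, it does not apply to Unraid 6.8.3 (Samba 4.11), and the vfs_io_uring module must be present in the Samba build:

[sharename]
    # hypothetical share; vfs_io_uring hands reads/writes to the kernel's io_uring interface
    vfs objects = io_uring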
mgutt Posted September 27, 2020

#4: the "experimental" warning has been added.

I found another interesting thing about Multichannel. With Samba 4.13 the RSS capability of the network adapters is auto-detected: https://www.samba.org/~metze/presentations/2020/SDC/StefanMetzmacher_sdc2020_multichannel_io-uring-live-compact.pdf

Quote: On Linux we autodetect the RSS capability. We use ETHTOOL GRXRINGS in order to detect it.
mgutt Posted September 28, 2020

There is a downside to enabling Multichannel + RSS when downloading from the HDD array (the screenshots compared downloading without RSS and with RSS). We know that enabling RSS splits the SMB transfer across my 4 CPU cores into 4 separate file streams, but I never expected those split streams to read from different HDD sectors. I thought it would read one stream, split it via the CPU into RAM and then send it to the downloading client. But that is obviously not the case.

Note: Uploading to the HDD array is not affected by RSS.

So enabling Multichannel with multiple NICs, or enabling RSS with a single NIC, has a positive impact on SSDs, but a negative impact on downloads from HDDs. At least with Samba 4.11 as used by Unraid 6.8.3. Maybe this becomes better in later Samba versions. We will see.

But there is more... I found a new step for the guide: installing a lot of RAM boosts all uploads and recurring downloads. I noticed this after I wanted to show a friend how slow Unraid can be if you write directly to the HDDs, and yes, these benchmarks were done without an SSD cache. Of course the high speed could only happen because the transfer goes to the server's RAM instead of to the HDDs, so I uploaded some really huge files, and after around 15GB the 1 GB/s transfer fell down to 70 MB/s as expected. After interrupting the transfer it took several minutes until the dashboard stopped showing writes. This means the file transfer filled the server's RAM, and as the RAM is much faster than the HDDs, it needs additional time to finally move the data from RAM to the HDD. So the server's RAM works similarly to the SLC cache of an SSD. Nice 😎

But why is this feature not part of the Unraid settings? I mean, using an Unraid server without a UPS is extremely risky as long as files are not finally written to the array. And why does it "only" use 10-15GB of my RAM as write cache, although much more was available? Has anyone heard about this write cache before?

Many benchmarks for nothing...

As announced in #5 of this guide, I wanted to determine which Samba configurations could add some extra boost. And now, after 5 days of testing, I found this RAM cache thing. This means all my benchmarks are "wrong" and need re-testing 😭 Ok, they are not really wrong, because they show the real performance of my server, but as not all users install so much RAM, we need to test with a smaller RAM cache as well. It's time for a huuuuge RAM disk to block caching 😁
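For reference, a RAM disk for that kind of test could be created with tmpfs. This is only a sketch of the idea with assumed size and mount point: fill the tmpfs so that most of the free RAM is pinned and hardly any is left for the page cache during the benchmark:

# Create a large tmpfs mount (size is an example) and fill it to occupy RAM
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=48G tmpfs /mnt/ramdisk
dd if=/dev/zero of=/mnt/ramdisk/filler.bin bs=1M count=46000   # leaves only a little free RAM
# Remove it again after testing
umount /mnt/ramdisk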
JorgeB Posted September 29, 2020

9 hours ago, mgutt said:
Has anyone heard about this write cache before?

By default Linux uses 20% of free RAM as write cache; this can be adjusted manually or, for example, with the Tips and Tweaks plugin.
mgutt Posted September 29, 2020

1 hour ago, JorgeB said:
By default Linux uses 20% of free RAM as write cache

I found a nice write-up of which settings exist and how they work. Now I don't really know what to test, because everyone is using this cache. Maybe I'll fully disable it and compare the results with the already existing benchmarks. That way we can compare what happens when the cache is full versus empty.

P.S. I no longer use the Tips and Tweaks plugin, because it can set network adapter values that aren't supported, so the server loses its internet connection and/or goes completely offline. This happened to me multiple times, and after uninstalling the plugin it never happened again.
mgutt Posted September 30, 2020

#7 "More RAM" completely rewritten
mgutt Posted September 30, 2020

At the moment I'm playing around with the vm.dirty_ratio setting and found something interesting.

The first test was to benchmark my NVMe. For that I reduced the dirty pages ratio to 1, which means 1% of 63 GB free RAM = 630 MB of space for dirty pages. This should help to get real benchmarks which are not influenced by the RAM. So I started my benchmark with dd as follows:

sysctl vm.dirty_ratio=20
rm -r /mnt/cache/Music/Benchmark/*
fstrim -v /mnt/cache
sleep 60
for bs in 512 4k 16k 64k 128k 256k 512k 1M
do
  echo ---- $bs: ----
  dd if=/dev/zero of=/mnt/cache/Music/Benchmark/10G_${bs}.bin bs=$bs iflag=count_bytes count=10G
done

---- 512: ----
20971520+0 records in
20971520+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 29.0343 s, 370 MB/s
---- 4k: ----
2621440+0 records in
2621440+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 5.41366 s, 2.0 GB/s
---- 16k: ----
655360+0 records in
655360+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 4.31003 s, 2.5 GB/s
---- 64k: ----
163840+0 records in
163840+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 4.11446 s, 2.6 GB/s
---- 128k: ----
81920+0 records in
81920+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 7.68483 s, 1.4 GB/s
---- 256k: ----
40960+0 records in
40960+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 8.7388 s, 1.2 GB/s
---- 512k: ----
20480+0 records in
20480+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 8.7285 s, 1.2 GB/s
---- 1M: ----
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 8.66134 s, 1.2 GB/s

I was surprised that "16k" and "64k" are the only block sizes that allow writing to the SSD at full speed, although everything is 4K-aligned. I guess this is because a higher block size means writing to more flash cells at the same time; for example, 16k writes to four 4K blocks and 64k writes to sixteen 4K blocks at once. But why does it become slower when the block size gets even bigger? Then I remembered that my 970 Evo NVMe has an SLC cache of 42 GB, and a full cache means a maximum writing speed of around 1.2 GByte/s. So I repeated the test without the smaller block sizes:

sysctl vm.dirty_ratio=1
rm -r /mnt/cache/Music/Benchmark/*
fstrim -v /mnt/cache
sleep 60
for bs in 128k 256k 512k 1M 4M
do
  echo ---- $bs: ----
  dd if=/dev/zero of=/mnt/cache/Music/Benchmark/10G_${bs}.bin bs=$bs iflag=count_bytes count=10G
done

---- 128k: ----
81920+0 records in
81920+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 3.96074 s, 2.7 GB/s
---- 256k: ----
40960+0 records in
40960+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 4.24214 s, 2.5 GB/s
---- 512k: ----
20480+0 records in
20480+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 4.19673 s, 2.6 GB/s
---- 1M: ----
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 4.17924 s, 2.6 GB/s
---- 4M: ----
2560+0 records in
2560+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 8.35124 s, 1.3 GB/s

Ok, the theory is correct.
But what happens if we fully disable dirty pages by setting the ratio to "0"? (I reduced the file size to 100M after it took ages.)

sysctl vm.dirty_ratio=0
rm -r /mnt/cache/Music/Benchmark/*
fstrim -v /mnt/cache
sleep 60
for bs in 512 4k 16k 64k 128k 256k 512k 1M
do
  echo ---- $bs: ----
  dd if=/dev/zero of=/mnt/cache/Music/Benchmark/100M_${bs}.bin bs=$bs iflag=count_bytes count=100M
done

---- 512: ----
204800+0 records in
204800+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 21.2512 s, 4.9 MB/s
---- 4k: ----
25600+0 records in
25600+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 21.5886 s, 4.9 MB/s
---- 16k: ----
6400+0 records in
6400+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 19.723 s, 5.3 MB/s
---- 64k: ----
1600+0 records in
1600+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 21.3909 s, 4.9 MB/s
---- 128k: ----
800+0 records in
800+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 21.1618 s, 5.0 MB/s
---- 256k: ----
400+0 records in
400+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 16.1569 s, 6.5 MB/s
---- 512k: ----
200+0 records in
200+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 20.0074 s, 5.2 MB/s
---- 1M: ----
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 27.8239 s, 3.8 MB/s

Wow, this looked really, really bad. But is it only because of dd? I verified it with a regular upload from my client, and then set the dirty ratio back to 1 while the transfer was still running.

Conclusion: It's a really bad idea to set dirty pages to "0".
mgutt Posted October 2, 2020

Even after reducing the dirty pages limit to only 100MB:

sysctl vm.dirty_bytes=100000000

it does not influence transfer speeds. This means I do not need to repeat all the benchmarks with simulated low-RAM conditions; even the tiniest RAM setup works. Ok, this is not valid for the read cache, as that would hold the full file, but a) I do not know how to disable the read cache and b) reading has no real potential to become faster through changing Samba settings.

Now it's time to analyze all the benchmarks.

SMB.conf Tuning

I ran many tests by adding different Samba settings to smb-extra.conf (3-4 runs for each setting). This is the default /etc/samba/smb.conf of Unraid 6.8.3 (only the performance-related part):

use sendfile = Yes
aio read size = 0
aio write size = 4096
allocation roundup size = 4096

In addition I enabled RSS, as this allows the usage of all CPU cores. Now the results (CrystalDiskMark test: speed; best smb.conf setting; gain of the fastest setting over default):

SEQ1M Q8T1 Read: >1170 MB/s; default or aio write size or aio read size or write cache size; no gain
SEQ1M Q1T1 Read: >980 MB/s; default or strict allocate or write cache size or aio write size (fastest); +10 MB/s
RND4K Q32T16 Read: >300 MB/s; default or strict allocate or write cache size (fastest); +30 MB/s
RND4K Q1T1 Read: >39 MB/s; default (fastest) or strict allocate or write cache size; no gain
SEQ1M Q8T1 Write: >1170 MB/s; default or strict allocate or write cache size or aio read size or aio write size; no gain
SEQ1M Q1T1 Write: >840 MB/s; write cache size (fastest) or aio read size + aio write size; +40 MB/s
RND4K Q32T16 Write: >270 MB/s; write cache size; +20 MB/s
RND4K Q1T1 Write: >40 MB/s; default or write cache size or aio write size; no gain

NAS Performance Tester:

Sequential Write Avg: >1165 MB/s; default or strict allocate or write cache size or aio write size; no gain
Sequential Read Avg: >1100 MB/s; default or strict allocate or write cache size (fastest); no gain

As we can see, "write cache size" seems to be the only setting that is sometimes faster than the default. So I retested it with the following sizes:

write cache size = 131072
write cache size = 262144
write cache size = 2097152
write cache size = 20971520
write cache size = 209715200
write cache size = 2097152000

Result: no (stable) difference compared to the default settings. So the earlier gains were obviously random, depending on the minimal load on the server and/or client while the benchmarks were running.

Next I will try to override the default Unraid settings. Maybe this has an influence.
mgutt Posted October 5, 2020

My next test will cover downloading from HDDs. I don't think the result shown above is the best we can get. I will transfer the file to the server's NVMe to check the highest possible speed without SMB. Then I'll test FTP, and maybe I'll find a way to boot Ubuntu on my client machine to test NFS as well. The most interesting part are these fluctuations.
pltaylor Posted November 18, 2020

On 9/26/2020 at 10:29 AM, mgutt said:
Which Samba version is used in the recent Unraid beta?

6.9.0-beta35 uses Samba 4.12.9.
frodr Posted March 24, 2021

Copying files from the Unraid server to a Windows client with the Multichannel + RSS setup will be limited by the disk transfer speed (if not cached), right? From the array I see up to 250 MB/s.
mgutt Posted March 24, 2021

@frodr Yes. With RSS it could be even less, as it opens multiple transfer channels and HDDs do not really like parallel access.
frodr Posted March 24, 2021

I have changed my setup: an Unraid Docker Plex container now streams from a Windows Server three-tier (NVMe / 2.5" SSD / HDD) Storage Spaces share. The Unraid server and the Windows Server are connected through a high-speed NIC-to-NIC link. Downloads are done on Unraid, which also serves as a parity-checked reserve for the Windows Server. Plex feels very snappy.
falconexe Posted March 24, 2021

5 minutes ago, frodr said:
I have changed my setup: an Unraid Docker Plex container now streams from a Windows Server three-tier (NVMe / 2.5" SSD / HDD) Storage Spaces share. The Unraid server and the Windows Server are connected through a high-speed NIC-to-NIC link. Downloads are done on Unraid, which also serves as a parity-checked reserve for the Windows Server. Plex feels very snappy.

Can you explain this further and detail your architecture? I'm out of PCI slots on my UNRAID server, but I have a serious gaming rig connected to it via a 10Gig NIC-to-NIC link. I'd like to leverage my RTX 3080 to transcode on the Windows PC. Thoughts?
frodr Posted March 24, 2021

15 minutes ago, falconexe said:
Can you explain this further and detail your architecture? I'm out of PCI slots on my UNRAID server, but I have a serious gaming rig connected to it via a 10Gig NIC-to-NIC link. I'd like to leverage my RTX 3080 to transcode on the Windows PC. Thoughts?

Yes, but since we are off topic here, we should start a new thread.
falconexe Posted March 25, 2021

5 hours ago, frodr said:
Yes, but since we are off topic here, we should start a new thread.

Sure. Please send me the link. Thanks.
Gee1 Posted March 31, 2021

RSS still doesn't work in the current Unraid version.