[SOLVED - WORKAROUND] How to Write to NVMe PCIe Cache at FULL 1 GB/s with 10GbE NIC!



Former Title: "1TB NVMe PCIe Cache & 10GbE NIC = Very Odd Network Issue"

 

Renamed for Better Searchability and Tutorial Purposes.

 

Hello, I have a 10GbE peer-to-peer network connection to a dedicated Windows 10 PC (CAT6A, about a 75-foot run). Both machines have the exact same ASUS XG-C100C NIC. Until recently I was running a 1TB SATA SSD as my primary cache drive, and I was peaking around 300 MB/s on writes to the cache (not the array). Though these drives peak at 500 MB/s on writes, I was not too concerned about losing 200 MB/s; I figured it was some kind of bottleneck or overhead. I have my MTU set to 9014 on both ends, and I have fine-tuned the NIC like crazy on the Windows side, so it is fully optimized for this workload.
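
(Side note for anyone replicating this: assuming the 9014-byte jumbo setting corresponds to a 9000-byte IP MTU, as it does on most Windows NIC drivers, you can confirm jumbo frames work end-to-end with a do-not-fragment ping from the Windows box — the address here is an example:

ping -f -l 8972 192.168.1.100

8972 bytes of payload plus 28 bytes of ICMP/IP headers equals the full 9000-byte MTU; if this ping fails or fragments, jumbo frames are not actually active along the whole path.)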

 

Today, I installed a Samsung 970 EVO Plus 1TB NVMe PCIe x4 SSD as my primary cache drive. I have two of these in my desktop and peak at 3350 MB/s on disk-to-disk writes. Given that a 10GbE NIC should peak around 1250 MB/s fully saturated, and that my new NVMe cache drive has higher throughput than the NIC (so the drive is not the bottleneck), I assumed I would be hitting 1 GB/s transfers uploading files to my UNRAID server.
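
(Quick sanity math behind that expectation:

10 Gb/s ÷ 8 bits per byte = 1250 MB/s raw line rate
1250 MB/s minus TCP/IP and SMB overhead ≈ 1.1 to 1.2 GB/s realistic best case

So a sustained ~1 GB/s for a single large transfer is a reasonable target.)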

 

Well...there is something VERY ODD about my outcome. I do and I don't. 😆

 

If I write to the share, which really lands on the cache (Cache is set to "Yes" on this share), I lose 2/3 of my throughput and peak at a sustained ~300 MB/s.

 

\\UNRAID\Share

 

[Screenshot: transfer to \\UNRAID\Share sustaining ~300 MB/s]

 

 

If I write directly to the Cache drive (only did this as a test) to the same exact folder path, I suddenly hit a rock solid 1 GB/s sustained write speed as expected.

 

\\UNRAID\Cache\Share

 

[Screenshot: transfer to \\UNRAID\Cache\Share sustaining ~1 GB/s]

 

 

WHAT AM I MISSING? How is this even happening? 🙄

 

I would certainly expect some kind of performance drop if I were writing directly to the array, but I am writing to CACHE. This is freaking killing me. I just spent a crap ton of money to upgrade this server, and I am basically in the same place as with a standard SSD. And now I'm thinking that if I had tested writing to cache directly with the old SSD cache drive, I would have hit its peak 500 MB/s after all, given that I saw the same ~300 MB/s max write speeds on that drive.

 

This is clearly a software issue. The same hardware is writing the same file in both instances; *HOW* the file is written is the only difference. Technically, SMB/UNRAID is agnostic in that it presents the folder/file on the SMB share path without delineating whether it actually sits on the cache drive or the array. It just serves up the file. However, Windows or UNRAID certainly behaves differently over SMB depending on whether I direct the file to the share (which lands on cache) or to cache\share directly.

 

I know you SHOULD NOT write to Cache directly or it can mess stuff up. So my question is, HOW DO I GET FULL THROUGHPUT while writing to Cache the proper way?

 

Please help! Thanks everyone!

Edited by falconexe

Also, I have Direct IO set to "YES"

 

[Screenshot: Unraid settings with Direct IO set to Yes]

 

I have also been reading a lot about WRITE CACHE. Since this drive is NVMe (PCIe) and not SATA/SAS, I can't check whether it is on via the usual SATA tools. I am also not sure this would even matter in the test I did, since writes DO hit full speed in one scenario.
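
(For anyone who wants to check this on an NVMe drive: the volatile write cache is NVMe feature 0x06, and it can be queried from the console with nvme-cli if it is installed — the device path here is an example:

nvme get-feature /dev/nvme0 -f 0x06 -H

The -H flag prints a human-readable decode of the current value.)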

 



Here is what I can find in the WIKI regarding speed and how it works. Keep in mind I have been using UNRAID avidly since 2014. I consider myself a pretty advanced user, and I have had a cache disk for a very long time. I have to imagine that what is causing this issue is either something really dumb and basic, or something really technical and beyond my level of expertise.

 

I'm looking forward to anyone's feedback. Thanks again in advance!

 

[Screenshot: Cache disk wiki excerpt on write speed]

 

Cache Wiki Page:

https://wiki.unraid.net/Cache_disk

 

 


So here is the full cache drive disk log. It does appear that DISK CACHING IS ON, per the tail end of the log.

 

[Screenshot: cache drive log showing disk write caching enabled]

 

Sorry about all of the posts in a row... I'm just trying to present as much info as possible for everyone to assist.

 

In the meantime, I am going to fully Power Down and test again to see if there is any change. I'm guessing not, but you never know LOL.

 

EDIT: Reboot had NO effect. Welp...

Edited by falconexe

Thanks for responding @johnnie.black.

 

Why is there any overhead at all when writing to CACHE (a cache-enabled share in this case), given that parity is not written on the array until later via the MOVER? What can possibly be costing me almost 700 MB/s? I have tried this same test with DOCKER disabled and minimal services running. As you can see from my signature, we are running a seriously ROBUST server.

 

If this is just "known" overhead, I may return the roughly $400 I put into this NVMe cache setup, including the PCIe adapters, etc. In my case, there is literally ZERO benefit over a standard SATA 6Gb/s SSD if this is expected behavior.

 

I have seen PLEX run MUCH more quickly with appdata on this NVMe, so there's that. Overall, I am super disappointed if this is par for the course. We run a media production company and quickly offloading TBs of raw footage is critical to our workflow.

 

@limetech Tom, this is the first time I have reached out to you directly in the 6 years that I have owned UNRAID. Do you have any thoughts or suggestions to navigate this issue? Is there any technical reason why UNRAID cannot support what I am asking? It appears that I have the proper hardware (professional IT guy here), and that this is a software issue or limitation of UNRAID itself. Perhaps I have missed something? Thanks so much for your help. This is 1 of 2 massive servers we operate. We seriously LOVE UNRAID, and we have one of the largest single arrays out there.

37 minutes ago, testdasi said:

This is shfs overhead (the Unraid share functionality). I worked around it by creating custom smb config to expose my /mnt/cache/share

 

@testdasi

 

So I "share" my actual Cache drive directly (\\UNRAID\Cache) and this is how I was able to get true 1GB/s transfers.

 

[Screenshot: the Cache disk shared directly over SMB]

 

However, this gets dicey given the risks of writing directly to the cache disk, how the mover handles things, and the native share schema itself. Sure, I could create the entire folder path for the content I want to write directly to the cache, but this is a ton of work and, from what I have read over the years, UNSAFE. I'd rather just go to my share and also see the parallel files in that share (the entire share contents).

 

Is there a way to integrate proper Share Handling while "exposing" the Cache drive? For instance, can I open my "Media" share and write to this normally (to Cache) and obtain the speeds I'm looking for?

 

Can you elaborate on exactly how you accomplished this? What exactly is in your custom SMB config that allows this? I have no issues with exposing my Cache in a different way to solve this issue. Thanks so much for the info!

 

Here is my current SMB config:

 

store dos attributes = yes

#unassigned_devices_start
#Unassigned devices share includes

   include = /tmp/unassigned.devices/smb-settings.conf
#unassigned_devices_end

#vfs_recycle_start
#Recycle bin configuration

[global]
   syslog only = No
   log level = 0 vfs:0

#vfs_recycle_end

#Prevent OSX Files From Showing Up
veto files = /._*/.DS_Store/.AppleDouble/.Trashes/.TemporaryItems/.Spotlight-V100/
delete veto files = yes

#Added Security Settings
min protocol = SMB2
client min protocol = SMB2
client max protocol = SMB3
guest ok = no
null passwords = no

#Speed Up Windows File Browsing
case sensitive = yes
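
For anyone who finds this thread later: as far as I can tell, the kind of custom export testdasi describes is just a plain Samba share definition pointing straight at the cache path, so SMB bypasses the shfs user-share layer entirely. A minimal sketch, with a hypothetical share name and path, added to /boot/config/smb-extra.conf:

[cache_media]
   path = /mnt/cache/Media
   browseable = yes
   writeable = yes
   valid users = youruser

The same caveats as any other direct-to-cache access would still apply.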

Edited by falconexe
19 hours ago, johnnie.black said:

You can still write and use the user shares normally while avoiding the overhead by writing directly to a disk share; just enable disk shares and then write to:

 

\\tower\cache\your_share

 

instead of

 

\\tower\your_share

 

Thanks everyone. So what you describe is EXACTLY what I did per my first post in order to hit a solid 1 GB/s sustained upload. I was writing files directly to the cache disk share, instead of the Media share.

 

Though this works, it is not ideal for one reason. Once the Mover completes (mine runs nightly), it deletes the folder structure (schema) of the Media share, including all sub-folders of anything that is successfully FULLY moved to the Array. Therefore, it would be a huge pain in the butt to have to recreate the folder structure (top 2 levels) on the cache drive's direct share DAILY, and then copy the content over.

 

I run a 2-folder-deep split level on this share:

 

[Screenshot: share split level settings]

 

 

For instance: if I create the following directory structure on the cache disk share, I can keep dropping files into it UNTIL the mover runs. Once the mover completes, it destroys this architecture.

 

Before Mover:

\\UNRAID\cache\Media\RawFootage

 

After Mover:

\\UNRAID\cache\DELETED_BY_MOVER\DELETED_BY_MOVER

 

I know this is working as designed, but in my opinion it is not a REAL workaround for the 2/3 loss in network throughput described above, because it is not a perpetual solution.

 

Maybe I need to just get over this and move on. Let me know if I have missed something, or if I am correct.

 

 

@testdasi Is this what you deal with too? Or have you solved this part?

 

If the Mover moved all files to the array, BUT DID NOT DELETE THE TOP 2 LEVEL FOLDER STRUCTURE (on the cache disk share), this WOULD work for me as an accepted solution. Any thoughts?

 

 

Edited by falconexe
5 hours ago, falconexe said:

If the Mover moved all files to the array, BUT DID NOT DELETE THE TOP 2 LEVEL FOLDER STRUCTURE (on the cache disk share), this WOULD work for me as an accepted solution. Any thoughts?

Since mover won't touch open files, a workaround could be to keep a token file held open in the deepest folder that you need to keep.
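
(A minimal sketch of that idea, with an example path — it could be launched at array start, e.g. via the User Scripts plugin:

touch /mnt/cache/Media/RawFootage/.keepopen
nohup tail -f /mnt/cache/Media/RawFootage/.keepopen >/dev/null 2>&1 &

The tail -f process holds the file open indefinitely, so mover skips it and the folder stays in place.)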


Thanks @johnnie.black @jonathanm.

 

I ended up going with a file called ".moverignore" (much like the syntax of .plexignore). There's nothing inside the file (unless you want to put a sentence or two about what the crap this file is for, in case you lose your memory for some reason 😂). I simply created it from a blank txt file, removed the name, and changed the extension. The period in front keeps it hidden in Windows *if* you choose to hide hidden files (I don't...).

 

First, I created the token files in the source share folders at the second split level.

 

\\UNRAID\Media\Subfolder1\.moverignore

\\UNRAID\Media\Subfolder2\.moverignore

\\UNRAID\Projects\Subfolder1\.moverignore

 

 

I noticed that once I did this, they immediately showed up on the cache drive share (that's fine and expected). So I ran the MOVER, which then deleted the files from the cache drive share ONLY (again as expected, since they were moved to the array). Then I manually recreated the folder structure, with the token files, on ONLY the cache drive share.

 

\\UNRAID\cache\Media\Subfolder1\.moverignore

\\UNRAID\cache\Media\Subfolder2\.moverignore

\\UNRAID\cache\Projects\Subfolder1\.moverignore

 

Finally, I ran the mover again, and voila, the structure remained! Nothing happened, as expected: since the token files already exist on the array, the mover skips them. If I put a new file in any of these folders on the cache drive share, parallel to the dummy file, the mover still works and moves only the net-new files, and the directory structure persists.

 

So far so good. I'll continue to monitor. It Ain't Pretty, but it WORKS! Now I can write to my cache drive at full 1 GB/s speed and bypass the shfs overhead, but still have the MOVER do its job and write the files to the array/parity normally later!
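
For anyone repeating this from the Unraid console instead of over SMB, here is the same setup as a sketch (the paths are examples matching my layout; adjust to your own shares):

for d in /mnt/cache/Media/Subfolder1 /mnt/cache/Media/Subfolder2 /mnt/cache/Projects/Subfolder1; do
   mkdir -p "$d"
   touch "$d/.moverignore"
done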

 

Thanks so much everyone. I'll mark this as solved with a workaround.

56 minutes ago, falconexe said:

Nothing inside the file (unless you want to put a sentence or two about what the crap this file is for in case you lose your memory for some reason 😂)

I strongly advise populating the file with the why and how this is currently working.

 

It may work so well that you totally forget about it in a few years, and then something changes to make it not work, or you need to "fix" something. Even worse, it might start to look like the file is being parsed on purpose by Unraid, or someone else managing the server might think the file has special meaning to Unraid.

  • 3 months later...

As this is only a workaround: will this be solved in the future? My overhead/loss is 40%. To me it looks like the shfs process cannot use the full CPU performance, as none of my cores goes above 65% load while a transfer is running. A transfer to \\tower\cache\music produces much more load, spread across multiple smbd processes, while a transfer to \\tower\music has only one single smbd process with high load. It seems SMB cannot use its multi-thread power as long as the shfs "layer" sits in front?!
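
(One way to observe this with standard Linux tools while a transfer is running:

top -bH -n 1 | grep -E 'shfs|smbd'

This prints per-thread CPU usage, so you can see a single shfs or smbd thread pegged while the remaining cores sit idle.)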

Edited by mgutt
  • 3 weeks later...

@falconexe

You have two E5-2620 CPUs, correct? That means ≈865 Passmark points per core. I had an Atom C3758 with ≈584 points per core, which resulted in 450 MB/s. And as of today I have an i3-8100 with ≈1538 points per core, and now I'm reaching 700 MB/s to \\tower\share and 800 MB/s to \\tower\cache\share (same 10G cards, same 10G switch, same NVMe, same cables).

 

My conclusion: there is no "overhead", it's simply missing multithreading.

 

I bet my other problem (low transfer speeds with many parallel open SMB sessions) is much less present now, too.

Edited by mgutt

Yep, this is exactly the same for my Xeon...

If I use \\tower\share I get ~500 MB/s, and if I use \\tower\cache\share I get ~980 MB/s...

On user share access, one single core sits at 100% usage...

However, as I have 128 GB of RAM and my RAM write cache is 10% of that, not even the NVMe cache is touched during a file copy until the copy has finished.

 

After that, it gets flushed to the cache at 3-4 GB/s.

So writing to RAM should always be fast, but it isn't, because writing to the user share is single-threaded.
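
(The RAM buffering described here is standard Linux page-cache writeback, nothing Unraid-specific. The thresholds that control how much of RAM may hold unwritten data before flushing kicks in can be inspected with:

sysctl vm.dirty_ratio vm.dirty_background_ratio

On a 128 GB box, a 10% dirty ratio means roughly 12.8 GB of writes can land in RAM before the disks are even touched.)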

 

Greetings

Dark

30 minutes ago, JorgeB said:

Samba is also single threaded

Since SMB 3.0 this is no longer true.

[Image: SMB Multichannel diagram]


But you need to enable "server multi channel support = yes" in smb.conf. This is something I will test with regard to my other problem with multiple SMB connections/sessions. But ultimately this is not as important as the shfs issue, since the smbd process is not as CPU-hungry. I used 10G for a long time with my Synology NAS (Intel Atom C3538, ≈399 Passmark points per core, no SSD cache) and never enabled multichannel.
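
For reference, the setting belongs in the [global] section — on Unraid that would be /boot/config/smb-extra.conf — keeping in mind that Samba still flagged multi channel support as experimental at the time:

[global]
   server multi channel support = yes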

Edited by mgutt
