Poor file transfer performance over WireGuard



UPDATE 3/3/2021: I have definitively determined my performance issues are caused by WireGuard. I do not yet know if or when I'll find a solution.

 

UPDATE 8/1/2022: This is still very much broken. I try a file transfer every couple of months and it continues to be horribly slow. Using rsync over SSH outside the WireGuard tunnel works great and is what I will continue to use until I can figure this sh*t out.


FINAL UPDATE 11/24/2022: See my last post here for the solution and a TL;DR:

 

Let me preface all of this by saying I'm not sure where my issue lies, so I'm going to lay out what I know and hopefully get some ideas on where to look for my performance woes.
 

The before times:

Before setting up WireGuard, I had SSH open to the world (with security precautions in place) on my main server so that once a month my backup server could connect and push and pull content as defined in my backup script. This all worked splendidly for years, and I always got my full speeds up to the bandwidth limit I set in my rsync parameters.

 

Now: 

With the release of WireGuard for UnRAID, I quickly shut down my SSH port forward and set up WireGuard. I have one tunnel for my administrative devices and a second tunnel that serves as server-to-server access between NODE and VOID.

 

NODE is my main server, and runs 6.8.3 stable. It is located on a 100Mbps/100Mbps fiber line.

UPDATE: As a last-ditch effort I upgraded NODE to 6.9.0-RC2 as well; no change in the issue.

 

VOID is my backup, runs 6.9.0-RC2 and lives in my home on a 400Mbps/20Mbps cable line.

 

 

In this setup, my initial rsync session will go full speed for anywhere from 5-30 minutes, then suddenly and dramatically drop in speed, down to 10Mbps or less and stay there until I cancel the transfer. I can restart the transfer immediately and regain full speed for a time, but it always eventually falls again.

 

Here is my rsync call: 

rsync -avu --stats --numeric-ids --progress --delete -e "ssh -i /mnt/cache/.watch/id_rsa -T -o Compression=no -x -o StrictHostKeyChecking=no" [email protected]:/mnt/user/TV/Popeye/ /mnt/user/TV/Popeye/

 

Here is a small sample of the rsync transfer log to illustrate the sudden and sharp drop in speed:

Season 1938/Popeye - S1938E09 - Mutiny Ain't Nice DVD [BTN].mkv
    112,422,538 100%   10.80MB/s    0:00:09 (xfr#24, to-chk=58/135)
Season 1938/Popeye - S1938E10 - Goonland DVD [BTN].avi
     72,034,304 100%    9.76MB/s    0:00:07 (xfr#25, to-chk=57/135)
Season 1938/Popeye - S1938E11 - A Date to Skate DVD [BTN].mkv
    138,619,127 100%   10.44MB/s    0:00:12 (xfr#26, to-chk=56/135)
Season 1938/Popeye - S1938E12 - Cops Is Always Right DVD [BTN].mkv
    127,109,972 100%   11.02MB/s    0:00:10 (xfr#27, to-chk=55/135)
Season 1939/Popeye - S1939E01 - Customers Wanted DVD [BTN].mkv
    114,673,044 100%   10.50MB/s    0:00:10 (xfr#28, to-chk=54/135)
Season 1939/Popeye - S1939E02 - Aladdin and His Wonderful Lamp DVD [BTN].mkv
    325,996,501 100%   11.69MB/s    0:00:26 (xfr#29, to-chk=53/135)
Season 1939/Popeye - S1939E03 - Leave Well Enough Alone DVD [BTN].mkv
    105,089,182 100%   11.30MB/s    0:00:08 (xfr#30, to-chk=52/135)
Season 1939/Popeye - S1939E04 - Wotta Nitemare DVD [BTN].mkv
    149,742,115 100%  754.78kB/s    0:03:13 (xfr#31, to-chk=51/135)
Season 1939/Popeye - S1939E05 - Ghosks Is The Bunk DVD [BTN].mkv
    114,536,257 100%  675.53kB/s    0:02:45 (xfr#32, to-chk=50/135)
Season 1939/Popeye - S1939E06 - Hello, How Am I DVD [BTN].mkv
     92,083,730 100%  700.03kB/s    0:02:08 (xfr#33, to-chk=49/135)
Season 1939/Popeye - S1939E07 - It's The Natural Thing to Do DVD [BTN].mkv
    110,484,799 100%  715.66kB/s    0:02:30 (xfr#34, to-chk=48/135)
Season 1939/Popeye - S1939E08 - Never Sock a Baby DVD [BTN].mkv
     97,660,132 100%  716.88kB/s    0:02:13 (xfr#35, to-chk=47/135)
Season 1940/Popeye - S1940E01 - Shakespearian Spinach DVD [BTN].mkv
    102,543,357 100%  632.64kB/s    0:02:38 (xfr#36, to-chk=46/135)
Season 1940/Popeye - S1940E02 - Females is Fickle DVD [BTN].mkv
    102,363,188 100%  674.34kB/s    0:02:28 (xfr#37, to-chk=45/135)
Season 1940/Popeye - S1940E03 - Stealin' Ain't Honest DVD [BTN].mkv
    100,702,236 100%  732.80kB/s    0:02:14 (xfr#38, to-chk=44/135)
Season 1940/Popeye - S1940E04 - Me Feelins is Hurt DVD [BTN].mkv
    111,018,052 100%  672.35kB/s    0:02:41 (xfr#39, to-chk=43/135)
Season 1940/Popeye - S1940E05 - Onion Pacific DVD [BTN].mkv
    103,088,015 100%  650.18kB/s    0:02:34 (xfr#40, to-chk=42/135)
Season 1940/Popeye - S1940E06 - Wimmin is a Myskery DVD [BTN].mkv
     61,440,000  59%  757.02kB/s    0:00:56  ^C
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(701) [generator=3.2.3]

 

and my accompanying stats page during the same transfer. You can see the sudden decline around 11:46 which coincides with my sudden drop in transfer speed above:

[Image: stats page activity graph for this transfer, showing the drop around 11:46]

 

I don't see anything telling in the system logs on either server when this speed drop happens. It almost seems like a buffer is filling up and not being emptied quickly enough, causing the speed to tank.
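To put the rates in the log above in line-speed terms, here is a minimal sketch (a hypothetical helper, not part of rsync) that converts the rate column of rsync --progress lines into Mbit/s. It assumes rsync's kB/s and MB/s units are 1024-based; if your build differs, change the multipliers. Under that assumption, 10.80MB/s is about 90.6 Mbit/s (close to NODE's 100Mbps upload) and 754.78kB/s is about 6.2 Mbit/s.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read rsync --progress output on stdin and print the
# rate column of each progress line in Mbit/s (1 Mbit = 1,000,000 bits).
# Assumption: rsync's kB/s, MB/s and GB/s units are 1024-based.
rsync_rate_mbps() {
  awk '
    $3 ~ /^[0-9.]+[kMG]B\/s$/ {
      unit  = substr($3, length($3) - 3)          # "kB/s", "MB/s" or "GB/s"
      value = substr($3, 1, length($3) - 4) + 0   # numeric part, e.g. 10.80
      if (unit == "kB/s")      mult = 1024
      else if (unit == "MB/s") mult = 1024 * 1024
      else                     mult = 1024 * 1024 * 1024
      printf "%.1f\n", value * mult * 8 / 1000000
    }'
}

# Usage: rsync_rate_mbps < transfer.log
```

Filename lines are skipped because their third field is not a rate.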

 

 

What I don't think it is:


I don't think my issue is with WireGuard or my ISP speeds on either end. While the transfer is crawling along over SSH at sub-par speeds, I can easily browse to NODE over WireGuard from my Windows or Mac computer, pick any file to copy over the tunnel, and fully saturate the sending server's upload with no issues while SSH is choking in the background:

[Image: file copy over WireGuard saturating the sending server's upload]

 

 

Could it have something to do with the SSH changes that took place between 6.8.3 and 6.9.0? None of the changes I'm aware of sound like the culprit, but I could be wrong. Besides that, I'm pretty much out of ideas on what it could be short of playing with random SSH and rsync options.

 

Let me know if there is some other info I can provide, below are both servers diagnostic files:

 

node-diagnostics-20210204-0751.zip

void-diagnostics-20210204-0752.zip

 

EDIT: I just realized LimeTech has a guide about this published: https://unraid.net/blog/unraid-server-to-server-backups-with-rsync-and-wireguard

 

I looked it over and I'm not really doing anything different except not passing -z (compression) to rsync and disabling compression for the SSH connection. A lot of what I transfer is video, which doesn't compress well, so why waste the CPU cycles on it?

Edited by weirdcrap
latest update at top of thread
  • 2 weeks later...

It's my monthly data transfer and the performance is still crap.

 

I've played with every rsync and SSH option I can think of (--whole-file, --inplace). It doesn't matter if the data goes straight into the array or all gets written to a cache disk first.

 

Copying outside of SSH (i.e., using Windows File Explorer) saturates my bandwidth (the full 10MB/s) as expected, but SSH can barely maintain 1MB/s.

 

EDIT: Copying via rsync/SSH from other servers I have access to does not suffer from performance issues. I'm able to max out my home internet bandwidth.

 

EDIT2: I'm going to try setting up an SSHFS mount between the servers and see if it behaves any differently. I'm grasping at straws here so if anyone has a better idea I'm all ears.

 

EDIT3: SSHFS seemed to work at first, but its speed seems to crater eventually as well.

 

EDIT4: This is driving me up a wall. I have hundreds of gigabytes to transfer and at the horrendous dial-up level speeds I'm getting, I could WALK the files to the backup server faster than I could move them over the internet.

 

EDIT5: I'm playing with NFS shares over WireGuard since I can't figure out what is wrong with my SSH speeds between these two servers. Results are promising, though I'm going to have to rescript my entire backup process.

 

EDIT6: I think I cracked it! I never considered stopping the few dockers on the server running the script as the stats in the webui and htop never indicated the CPU was anywhere near busy (it hovered around 10-15%). But sure enough, with everything stopped my speeds are back to normal and have stayed consistent. 

 

Specifically, it seems to be Plex's automatic library change scanning that was absolutely crippling my transfer rate. With dockers started and that turned off my speeds are holding steady so far.

 

EDIT7: EDIT6 is wrong, see latest posts.

Edited by weirdcrap
On 2/2/2021 at 2:18 AM, weirdcrap said:

EDIT6: I think I cracked it! I never considered stopping the few dockers on the server running the script as the stats in the webui and htop never indicated the CPU was anywhere near busy (it hovered around 10-15%). But sure enough, with everything stopped my speeds are back to normal and have stayed consistent. 

 

Specifically, it seems to be Plex's automatic library change scanning that was absolutely crippling my transfer rate. With dockers started and that turned off my speeds are holding steady so far.

 

Very interesting, thanks for documenting what you went through

18 hours ago, ljm42 said:

 

Very interesting, thanks for documenting what you went through

Yeah it has been an extremely frustrating journey for me, but I'm glad I've finally cracked it.

 

What led me to suspect resource utilization, despite no clues that the server was resource starved, was that with all other things equal, I could use rsync/SSH to push from NODE at my full speeds, while having VOID pull the files resulted in terrible speeds (no matter what protocol I used). So I assumed I had to have some sort of resource limitation wherever rsync/SSH were running from.

 

It's been running since this morning and speeds are stable, something I could never accomplish over the last couple of months for more than an hour at a time.

 

I've always had Plex monitoring the library for changes, so I'm not sure why it has suddenly become a big deal. The only major Plex change that comes to mind around that time was them adding intro detection to the server.

 

^This is wrong, the issue is back and as hard to pin down as ever.

Edited by weirdcrap
  • weirdcrap changed the title to Poor SSH performance over WireGuard

Alright, well, I spoke too soon.

 

I was having some issues with my plex docker running poorly on NODE so I rolled it back to 6.8.3 as a test. This appears to have fixed my plex issue but now VOID is only managing to pull about 1MB/s again (I was getting 10MB/s before). I can push from NODE to VOID at full speed as always...

 

It's late and I'm tired of messing with this, if the speed issue continues I may try to upgrade back to 6.9.0-rc2 and see if that makes the problem go away again.

 

Alternatively, could I downgrade back to 6.8.3 on VOID? I can't roll back using the update tool, can I just download 6.8.3 from the website and extract it onto my flash? 

 

EDIT: The answer is yes, just replace all the bz files. Downgrading VOID to 6.8.3 to see if it makes a difference.

 

EDIT2: JFC, so I can't downgrade to 6.8.3 on VOID because of the new f*cking partition layout for BTRFS. So my choices are either blow away my cache again (I just upgraded it on Monday) or stay on 6.9.0-RC2 and never know if my problems are because of the beta or not.

Edited by weirdcrap

It would be really nice if someone, anyone, could come in here and post about their experiences with the RC vs the stable. Is anyone else doing what I'm doing with SSH and RSYNC? Are you having the head smashingly frustrating performance problems I have?

 

Who wrote LimeTech's guide on this? Are they having these kinds of problems? I'm absolutely shocked that in the three weeks I've been posting here about this not a single person has come forward to either confirm or deny that this is a legitimate issue vs something with my setup.

 

I'm sorry for my ranting, I'm beyond frustrated and disappointed that I keep thinking I've found the answer, only to be proven wrong time and time again mere hours later.

 

EDIT: OK so I've got NODE and VOID on RC2 again and NODE is finally behaving itself after discovering my CPU governor setting got changed.

 

I'm not sure what sort of voodoo magic I summoned yesterday when upgrading NODE to RC2, but the performance gains I saw the entire day yesterday disappeared that night. rsync over SSH is back to 1MB/s or less when using VOID to pull from NODE. Whether dockers are running or not makes no difference.

Pushing from NODE to VOID continues to give me my full 10MB/s.

 

I've placed new diagnostics files in the OP.

 

Every single time I think I'm starting to figure out the problem, UnRAID throws me a god damn curveball. So now on 6.9.0-RC2, whether I run the script on NODE or VOID, the performance starts out good and then falls to 1MB/s or less.

 

Edited by weirdcrap

@ljm42 I'm noticing an odd behavior with my server-to-server WireGuard tunnel...

 

When I'm signed into NODE, I can't ping VOID initially. Once I start a ping from VOID to NODE however, replies start flowing both ways...

 

This is not something I had noticed before. Should I enable keepalive to prevent this?

 

I'm going to snag a video of this behavior.

 

EDIT: So as you can see in the video, I start to ping VOID's WG IP on NODE and get no replies until I start a ping from VOID to NODE. Then like magic all of a sudden NODE realizes VOID is, in fact, available.

[Video: pinging VOID's WireGuard IP from NODE, 2021-02-04]

 

I'm to the point where I'm willing to offer a cash reward if someone can just tell me WTF is wrong with rsync/SSH.

Edited by weirdcrap
5 hours ago, weirdcrap said:

Who wrote LimeTech's guide on this? Are they having these kinds of problems? I'm absolutely shocked that in the three weeks I've been posting here about this not a single person has come forward to either confirm or deny that this is a legitimate issue vs something with my setup.

 

I think you are referring to the guest blog? It is credited at the top of the page and there is a link to discuss the blog at the bottom of the page.

 

I'm sorry I do not have a lot of experience with rsync myself. I don't do a lot of file transfers over WireGuard at the moment.

 

5 hours ago, weirdcrap said:

whether I run the script on NODE or VOID, the performance starts out good and then falls to 1MB/s or less.

 

Out of curiosity, how much data is transferred before this happens?

 

You have already turned off dockers on both ends to try and simplify things; let's take the next step and bypass the array too. Try doing your rsync transfers from cache drive to cache drive, or even from one unassigned device to another.
 

Any chance you have a second Unraid machine on the same network so you can rsync without WireGuard? 

 

2 hours ago, weirdcrap said:

So as you can see in the video, I start to ping VOID's WG IP on NODE and get no replies until I start a ping from VOID to NODE. Then like magic all of a sudden NODE realizes VOID is, in fact, available.

 

Setting a keepalive will mask the problem, but if it were me I would want either side to be able to start the connection.

 

If the connection can only be started from one side, then you need to check the port forwarding and "peer endpoints" for the other side.

 

It sounds like VOID has the proper peer endpoint to be able to reach NODE, and port forwarding is set up on NODE's side to accept it. But there is a problem going the other way: likely the peer endpoint is wrong or there is a problem with the port forwarding. Connection problems are very difficult to troubleshoot, as WireGuard fails silently and there are no logs. I've tried to consolidate a bunch of info in the first two posts here: https://forums.unraid.net/topic/84226-wireguard-quickstart/
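One way to see from each side whether the tunnel has completed a handshake is `wg show <interface> latest-handshakes`, which prints each peer's public key and the Unix time of its last handshake (0 if there has never been one). Below is a minimal sketch that flags peers with no handshake or an old one; the function name, the interface name wg0, and the 180-second limit are assumptions. An idle tunnel also stops handshaking, so check it while traffic (for example a ping) is trying to flow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list WireGuard peers whose last handshake is missing or
# older than a limit. Input is the output of `wg show <interface> latest-handshakes`:
# one "public-key<TAB>unix-timestamp" pair per line, where 0 means never.
# Usage on a live system (wg0 is a placeholder interface name):
#   wg show wg0 latest-handshakes | stale_peers "$(date +%s)" 180
stale_peers() {
  local now=$1 max_age=$2
  awk -v now="$now" -v max="$max_age" '
    NF >= 2 {
      age = now - $2
      if ($2 + 0 == 0)         print $1, "never"
      else if (age > max + 0)  print $1, age "s ago"
    }'
}
```

A peer that shows "never" on NODE while VOID can connect points back at the endpoint and port-forward checks described above.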

 

Quote

I think you are referring to the guest blog? It is credited at the top of the page and there is a link to discuss the blog at the bottom of the page.

 

I'm sorry I do not have a lot of experience with rsync myself. I don't do a lot of file transfers over WireGuard at the moment.

I am. For some reason I thought it was written by one of the Unraid admins; I didn't realize it was a user guest blog. Either way, I didn't mean to come off like I expected him to come help me. I just meant that, comparing my self-made setup to theirs, I did everything I could "by the book," so to speak.

 

Quote

Out of curiosity, how much data is transferred before this happens?

 

You have already turned off dockers on both ends to try and simplify things; let's take the next step and bypass the array too. Try doing your rsync transfers from cache drive to cache drive, or even from one unassigned device to another.
 

Any chance you have a second Unraid machine on the same network so you can rsync without WireGuard? 

It varies, sometimes I can get through 10-20 files at 1-2GB a pop. Other times the speed will tank half way through the first file.

 

I did test a cache to cache transfer in my flurry of work yesterday. It did not improve the situation from what I recall but I did not document the results well so it bears another test to be sure.

 

Unfortunately I do not, and the primary server is actually hosted about 4 hours away from me, so it's not something I can just visit on a whim.

Quote

Setting a keepalive will mask the problem, but if it were me I would want either side to be able to start the connection.

 

If the connection can only be started from one side, then you need to check the port forwarding and "peer endpoints" for the other side.

 

It sounds like VOID has the proper peer endpoint to be able to reach NODE, and port forwarding is set up on NODE's side to accept it. But there is a problem going the other way: likely the peer endpoint is wrong or there is a problem with the port forwarding.

When I first set up WireGuard I definitely tested this, and both ends were able to start the connection, so I'm not sure what changed there.

 

My router here with VOID is a pfSense router, and from what I can tell my port forwards are set up correctly:

[Screenshot: pfSense port forward rules on VOID's router]
I'll look those threads over again, as I'll admit my grasp of WireGuard when I configured it was tentative at best, and looking back at it now it's all Greek to me.

 

I think I see my problem though, on NODE I have the peer address set to the same thing as my local endpoint. The peer for VOID should be my dynamic DNS name for my home IP right?

[Screenshot: NODE's WireGuard peer settings for VOID]

 

 

EDIT: 

 

Started a cache (NODE) to cache (VOID) transfer and barely made it into the first file before the speed tanked.

 

 

[Screenshot: cache-to-cache transfer speed]

 

Meanwhile I've been uploading files from VOID to NODE for the last hour at full speed: 10GB files at 2.8MB/s (the max for my crappy Comcast upload).

 

This is the part that drives me nuts, I can't seem to find a pattern to what transfers utilize their full potential speed while others just languish in the land of dial-up.

Edited by weirdcrap
added cache to cache test results
40 minutes ago, weirdcrap said:

I think I see my problem though, on NODE I have the peer address set to the same thing as my local endpoint. The peer for VOID should be my dynamic DNS name for my home IP right?

Yeah on NODE, the peer address for VOID should resolve to VOID's public WAN IP, which is then port-forwarded to VOID's internal IP. That peer address is what NODE will use to contact VOID to start the tunnel.
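A quick sanity check for the mistake found above: a peer endpoint that is a private (RFC 1918) address can only be reached from inside that LAN, so from NODE's side VOID's endpoint should never resolve to one. This is a minimal IPv4-only sketch; is_private_ipv4 is a hypothetical helper, and the hostname in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if an IPv4 address is in a private RFC 1918 range
# (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), fail otherwise.
is_private_ipv4() {
  local a b rest
  IFS=. read -r a b rest <<< "$1"
  [ "$a" = 10 ] && return 0
  [ "$a" = 172 ] && [ "$b" -ge 16 ] && [ "$b" -le 31 ] && return 0
  [ "$a" = 192 ] && [ "$b" = 168 ] && return 0
  return 1
}

# Usage with a DNS name (getent is from glibc; the hostname is a placeholder):
#   ip=$(getent ahostsv4 void.example-ddns.net | awk 'NR == 1 { print $1 }')
#   is_private_ipv4 "$ip" && echo "endpoint $ip is a LAN address, not the WAN IP"
```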

 

8 minutes ago, weirdcrap said:

Started a cache to cache transfer and barely made it into the first file before the speed tanked.

Well that is good really, it means parity calculations on the Unraid array are not the issue. But I'm struggling to think of what the issue could be.

 

 

1 hour ago, ljm42 said:

Yeah on NODE, the peer address for VOID should resolve to VOID's public WAN IP, which is then port-forwarded to VOID's internal IP. That peer address is what NODE will use to contact VOID to start the tunnel.

 

Well that is good really, it means parity calculations on the Unraid array are not the issue. But I'm struggling to think of what the issue could be.

 

 

Ok the WireGuard issue is fixed, now either server can restart the connection.

 

It shouldn't have mattered but I'm going to start another transfer and see if the WireGuard fix made any difference.

 

I'm really hoping to rally the vast community knowledge here (and reddit), I can't imagine I'm the only person using rsync and ssh in this manner for backups.

 

EDIT: Nope, no difference with the WG issue fixed.

Edited by weirdcrap
22 minutes ago, weirdcrap said:

Ok the WireGuard issue is fixed, now either server can restart the connection.

great! 

 

22 minutes ago, weirdcrap said:

I can't imagine I'm the only person using rsync and ssh in this manner for backups.

I did a Google search for "rsync slows down over time". There are a lot of results :)

 

Maybe some ideas here?

 

1 hour ago, ljm42 said:

great! 

 

I did a Google search for "rsync slows down over time". There are a lot of results :)

 

Maybe some ideas here?

 

Yeah I've been looking through Google, I was just hoping someone here would have encountered this issue before and would be able to guide me to a solution quicker than I could find one just trying random stuff I find online. 

 

A lot of what I find on Google seems to be people expecting rsync to saturate gigabit Ethernet consistently, or somehow magically overcome the limitations of their USB 2.0 drive, HDD I/O, CPU, or whatever bottleneck they may have:

 

https://serverfault.com/questions/377598/why-is-my-rsync-so-slow

https://unix.stackexchange.com/questions/303937/why-my-rsync-slow-down-over-time

https://superuser.com/questions/424512/why-do-file-copy-operations-in-linux-get-slower-over-time

 

The consensus online seems to be rsync and SSH are not and were never meant to be the most performant pieces of software out there, which I totally get. There are threads full of alternatives to try. However, in my googling I'm regularly seeing all these people who complain about the slowness of rsync easily achieving the (IMO) very low speeds I'm wanting. Their worst reported speeds are honestly what I'm aiming for here.

 

Like your example from Reddit, I'd be absolutely thrilled if I could maintain the 7MB/s or even 5MB/s they complain of in that thread. Everywhere I've seen stats reported, rsync over SSH should be completely capable of maintaining this paltry speed despite the encryption and protocol overhead and my so-so hardware. I normally limit my rsync jobs to 5MB/s (--bwlimit=5000) since I have to share bandwidth with other servers where NODE is hosted, and I'm positive it can handle a consistent 5MB/s stream of data both sending and receiving. I'm not seeing high iowait, CPU, RAM, or anything really when the transfers slow down. That is what has made this so hard to diagnose: a complete and utter lack of clues.
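For reference, the rsync man page says a --bwlimit value without a suffix is in units of 1024 bytes per second, so --bwlimit=5000 is a little under 5MB/s. The sketch below (hypothetical helper names) converts that into the decimal Mbps figures ISPs quote, to show how much of NODE's 100Mbps upload the limit should use.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: translate an rsync --bwlimit value (no suffix, so
# KiB/s per the rsync man page) into decimal kbit/s and into a share of an uplink.
bwlimit_to_kbit() {
  local kib_per_s=$1
  # KiB/s -> bytes/s -> bits/s -> kbit/s
  echo $(( kib_per_s * 1024 * 8 / 1000 ))
}

# Percentage (rounded down) of an uplink, given in Mbit/s, that the limit allows.
bwlimit_share_of_uplink() {
  local kib_per_s=$1 uplink_mbit=$2
  echo $(( $(bwlimit_to_kbit "$kib_per_s") * 100 / (uplink_mbit * 1000) ))
}

# Usage: bwlimit_to_kbit 5000              -> 40960 (about 41 Mbit/s)
#        bwlimit_share_of_uplink 5000 100  -> 40
```

Integer shell arithmetic is enough here since the result only needs to be a rough budget.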

 

 

https://www.raspberrypi.org/forums/viewtopic.php?p=1404560&sid=ac2739c958d835a87f2afff7ad0df267#p1404560

This suggestion is interesting; I had not considered that it could be the network equipment in between. However, I'm able to use other TCP-heavy protocols like SCP, SFTP, and FTP at maximum speed when downloading files from the internet to each of these servers separately. I simply can't make them talk quickly to each other for extended periods of time.

 

Finally, I want to say thank you for taking time out of your day to look at this with me. Trying to figure out a problem you've been working on for weeks and weeks requires new perspectives sometimes.

Edited by weirdcrap

Hi @weirdcrap,

 

Unfortunately I'm not here to offer a solution.

 

But to share my experience, which is basically the same problem: performance drops drastically after some time.

 

However my environment is completely different, to be precise:

 

  • I do not use UnRAID at all.
  • Both my endpoints are on 600/600 FTTH.
  • I tried NFS4 over UDP and TCP, SMB3 over TCP, SSHFS over TCP, and rsync over TCP, all through WireGuard.
  • iperf3 shows 350Mbps (half of the link speed, but, well, it's OK).
  • Speed starts at about 300Mbps, then suddenly drops to 90Mbps. The amount transferred is irrelevant; it seems random. After more than half an hour with no transfer, the speeds go back to ~300Mbps.
  • Maximum disk speed on the source is 500MBytes/sec; on the destination it's 300MBytes/sec.
  • The server is a VMware ESXi VM with 4GB of RAM and 2 cores from an AMD Ryzen 3400.
  • The client is a native Xeon E5650 with 48GB of RAM.
  • Both client and server run Arch Linux with the latest updates.
  • When the speed drops there is no CPU usage (less than 5%), memory usage is minimal (megabytes), and the network is, well, at 90Mbps.
  • When the speed drops, running iperf3 immediately shows the slow speeds as well.

To me it pretty much seems to be something about WireGuard, or maybe its ChaCha algorithm.


@claunia That is interesting; I have been blaming my setup for the issues. With how widespread WireGuard adoption is becoming and its touted speed, I would have thought someone would have noticed this before us if it was in fact being caused by WireGuard.


I also wonder why it only seems to affect SSH/rsync transfers for me. I can use Windows File Explorer to copy data at full speed.

 

I've used up most of my data cap from my ISP for this month, so I can't do more testing right now. However, next month I plan on re-enabling direct SSH access with public-key-only auth set up for a few days of testing to compare what speeds I can get.

 

I will be quite disappointed if it turns out I can't use WireGuard for backups...

Edited by weirdcrap
  • 2 weeks later...

I've been playing with iperf3 to see if I can saturate my WireGuard tunnel with other programs/protocols. My hope is to determine whether this is in fact an SSH/rsync issue or whether the random speed drops are caused by WireGuard itself.

 

With NODE running "iperf3 -s" and VOID acting as the client with "iperf3 -c 10.253.1.1 -t 30 -P 8" I can only seem to get about 20-25Mbps....

 

Cranking the parallel streams up to 30 made no difference; I still maxed out at a meager 25Mbps. Is this indicative of a bigger problem, or am I just not using iperf correctly? This is my first time messing with it, so I'm inclined to think I'm doing something wrong with my test.

 

EDIT: Could my speed woes be an MTU issue?

https://www.reddit.com/r/WireGuard/comments/jn7d7e/slow_file_transfer/

https://keremerkan.net/posts/wireguard-mtu-fixes/

https://superuser.com/questions/1537638/wireguard-tunnel-slow-and-intermittent

 

Running traceroute --mtu <endpoint> on both ends of the server-to-server tunnel shows a first-hop MTU of 1500, so I assume WireGuard's default of 1420 should be fine for me, right?
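The 1420 default comes from subtracting WireGuard's per-packet overhead from a 1500-byte path MTU, assuming the worst case of an IPv6 outer header: 40 bytes IPv6 + 8 bytes UDP + 32 bytes of WireGuard header and authentication tag. A minimal sketch of that arithmetic is below (the function name is hypothetical). A 1500 reading on the first hop does not by itself rule out a smaller MTU somewhere further along the path; plugging in a smaller path value shows how far the tunnel MTU would need to drop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: largest WireGuard interface MTU that keeps the
# encrypted UDP packets within a given path MTU.
# Overhead per packet: outer IP header (20 bytes IPv4, 40 bytes IPv6)
# + 8 bytes UDP + 32 bytes WireGuard (16-byte data header + 16-byte auth tag).
wg_inner_mtu() {
  local path_mtu=$1 ip_version=${2:-6}
  local ip_header=20
  if [ "$ip_version" = "6" ]; then
    ip_header=40
  fi
  echo $(( path_mtu - ip_header - 8 - 32 ))
}

# Usage: wg_inner_mtu 1500 6  -> 1420 (WireGuard's default)
#        wg_inner_mtu 1500 4  -> 1440
#        wg_inner_mtu 1492 4  -> 1432 (e.g. a PPPoE link)
```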

 

I may try tuning it lower like the second link suggests, just to test during my monthly data transfer whether it has any real effect on the issue.

 

 

EDIT2: Well, my iperf3 issues appear to be caused by my home internet's limited upload. In the setup I described above, VOID as the client was uploading data to the server rather than downloading data from it. Using --reverse fixes it, and I seem to be able to fully max out my bandwidth in both directions.

 

I'll have to pick a time this evening to run an iperf3 test where I can saturate the server bandwidth for like 45 minutes and see if it is able to maintain it the entire time.
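For a long saturation run like that, a small filter can pick out the intervals where throughput sagged instead of scrolling through thousands of lines. This is a minimal sketch assuming iperf3's default text output with -P greater than 1, where each interval gets a [SUM] line ending in a rate and a unit such as Mbits/sec; the function name and the 50 Mbit/s threshold are illustrative, and the host and duration in the usage comment just reuse the tunnel IP above with 2700 seconds (45 minutes).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read iperf3 text output on stdin and print each [SUM]
# interval whose rate fell below a threshold (in Mbit/s). The final
# sender/receiver summary lines are skipped.
# Example: iperf3 -c 10.253.1.1 -R -P 8 -t 2700 | tee iperf.log | iperf3_slow_intervals 50
iperf3_slow_intervals() {
  local threshold_mbit=$1
  awk -v t="$threshold_mbit" '
    /^\[SUM\]/ && !/sender|receiver/ {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /bits\/sec$/) {
          rate = $(i - 1) + 0
          if ($i ~ /^K/)             rate /= 1000
          else if ($i ~ /^G/)        rate *= 1000
          else if ($i == "bits/sec") rate /= 1000000
          if (rate < t + 0) printf "%s %.2f Mbit/s\n", $2, rate
          break
        }
      }
    }'
}
```

Printing the interval ($2) makes it easy to line a drop up against the stats page graph.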

Edited by weirdcrap
added stuff about MTUs

It's that time of the month again! Just upgraded both servers to 6.9.0 stable.

 

I haven't done the iperf3 test I mentioned above yet; I figured I'd try adjusting my MTU first and see what happens.

 

I have lowered MTU to 1400 on both ends of the tunnel and have started a data transfer.

 

EDIT: 1400 didn't work, trying 1380 as recommended in that link I posted. I don't know what else to try.
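Rather than stepping the MTU down by guesswork, one option is to measure the largest packet that actually crosses the path between the two WAN addresses with fragmentation prohibited. This is a minimal sketch assuming Linux iputils ping (whose -M do option sets Don't Fragment) and IPv4, where the ping payload plus 28 bytes of IPv4 and ICMP headers equals the packet size; find_path_mtu, probe_df and PEER_HOST are hypothetical names. The WireGuard MTU would then be that path MTU minus 60 bytes for an IPv4 outer header (20 IP + 8 UDP + 32 WireGuard). A lost reply looks the same as a too-big packet, so repeat the search if a result looks odd.

```shell
#!/usr/bin/env bash
# Hypothetical helper: binary-search the largest ICMP payload that passes
# with Don't Fragment set, then print the path MTU (payload + 20 IPv4 + 8 ICMP).
# $1 is a command that takes a payload size and succeeds if that size gets through.
# A real probe with iputils ping could look like (PEER_HOST is a placeholder):
#   probe_df() { ping -M do -c 1 -W 1 -s "$1" "$PEER_HOST" >/dev/null 2>&1; }
find_path_mtu() {
  local probe=$1 lo=${2:-1000} hi=${3:-1472}
  if ! "$probe" "$lo"; then
    echo "probe failed even at $lo bytes" >&2
    return 1
  fi
  # Invariant: payload $lo passes; the largest passing payload lies in [lo, hi].
  while [ "$lo" -lt "$hi" ]; do
    local mid=$(( (lo + hi + 1) / 2 ))
    if "$probe" "$mid"; then
      lo=$mid
    else
      hi=$(( mid - 1 ))
    fi
  done
  echo $(( lo + 28 ))
}

# Usage: find_path_mtu probe_df
```

Passing the probe in as a command keeps the search logic separate from the network, so it can be checked with a fake probe before pointing it at the real endpoint.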

Edited by weirdcrap
added details about my upgrade to 6.9 STABLE

Some promising results with an MTU of 1380. I was able to complete a 60GB transfer from start to finish with zero drop in speed. I've started a much larger 250GB transfer, we shall see if I can maintain speed through the entire thing.

 

EDIT: ANNNNDDDDD just like that there it goes. This is just ridiculous.

 

EDIT2: I've re-opened SSH to the world, restricted to my current Comcast IP only. I'm running my 250GB transfer completely outside of the WireGuard tunnel, and so far so good, but I've said that dozens of times up to this point, so I won't hold my breath.

 

For whatever reason, despite making the appropriate changes in sshd_config and rebooting for good measure, I cannot get UnRAID to completely disable password auth.

Edited by weirdcrap

Final update. I regret to report that I am now 100% positive this entire issue is caused by WireGuard itself as @claunia mentioned and no amount of setting changes is going to fix it.

 

I re-enabled direct SSH access to NODE through the firewall and restricted it to my current Comcast IP. Using the exact same setup but going entirely around the WG tunnel, I get my full, consistent speeds without any random drops or issues.

 

I transferred roughly 300GB of data over the course of about 12 hours yesterday and NOT ONCE did the speed drop to an unacceptable level (only minor variances of +/-0.5MBps). This is the performance I have been seeking from the WireGuard tunnel and I am still no closer to figuring out why this simply doesn't work consistently.

 

@ljm42 I assume WireGuard was not built with CONFIG_DYNAMIC_DEBUG enabled in the kernel?

 

 

I would really like to see this issue solved. However with nothing but anecdotal evidence and my personal testing I don't feel right just showing up on the WireGuard mailing list (or wherever the devs hang out) and saying there's a problem without some actionable proof. That and I have no flipping clue what about WireGuard causes this performance issue.

 

EDIT: I guess I'll join #Wireguard on Freenode and see if any of the devs have a clue on what, if anything else, I can try to change to fix this or get some actionable proof and a fix made.

 

 

Edited by weirdcrap
  • 5 weeks later...

Well after upgrading my existing gaming PC I decided to take the old Mobo and CPU and put it into VOID as a major upgrade. So now VOID has an i7-4790K and an ASUS ROG Maximus VII Hero with an Intel Killer NIC. A complete change of hardware has made no difference in this issue either.

 

I made it about 30 minutes into copying a 50GB ISO file and the speed has tanked already.

Edited by weirdcrap
  • 1 month later...

Hi, just joining the club.

All my downloads stall. The speed drops from ~50mb/s to a few kb/s and then it stalls. My Unraid server loses all internet connection, and I have to deactivate WireGuard, wait a bit, and re-enable it. It's strange, but I too think it's a problem with WireGuard.

 

Sorry I can't help, I'm feeling your pain!


I'm kind of in the same boat: two unRAID servers connected through WireGuard, accessing SMB shares mounted through Unassigned Devices (but it also happens when mounted manually). unRAID and all plugins are up to date.

 

Here's what happens: the download starts and, most of the time (though not always), drops from 250 MBit/s to 10 MBit/s at some point. Sometimes after a minute, sometimes after hours; sometimes while copying a file and sometimes while starting to copy a new one, with rsync or Midnight Commander. I can mostly fix it by unmounting and remounting the share, and then the game starts again. I've spent hours troubleshooting with Google, the unRAID forum, and plain trial and error, and more than once I thought I had it... but it always comes back.

 

What I think I should mention is the CPU load/multi-core behavior. When WireGuard and SMB are working, multiple cores are used and the load jumps all over them, sometimes spiking up to 98% on a core for a split second before spreading across multiple cores again. But when the problem occurs, only one core is used: it shoots up to about 90%, some milliseconds later to 100%, and stays there for maybe two seconds. Then it jumps to another core: 90%, 100%, 2 seconds, next core. But it doesn't seem to make a difference in "top"; the CPU load per process seems to stay the same as if it were still using multiple cores.
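One way to capture that one-core-at-a-time pattern with numbers rather than by eye is to diff two snapshots of /proc/stat, which keeps cumulative time counters per core regardless of how top attributes CPU time to processes. This is a minimal sketch; core_busy is a hypothetical name, and it treats idle plus iowait as "not busy".

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-core CPU busy percentage between two /proc/stat snapshots.
# Fields after "cpuN": user nice system idle iowait irq softirq steal guest guest_nice.
# Guest time is already included in user, so only the first eight are summed.
core_busy() {
  awk '
    FNR == NR && /^cpu[0-9]/ {            # first snapshot: remember totals
      t0[$1] = $2+$3+$4+$5+$6+$7+$8+$9
      i0[$1] = $5+$6
      next
    }
    /^cpu[0-9]/ {                          # second snapshot: compute deltas
      dt = ($2+$3+$4+$5+$6+$7+$8+$9) - t0[$1]
      di = ($5+$6) - i0[$1]
      if (dt > 0) printf "%s %.0f%%\n", $1, 100 * (dt - di) / dt
    }' "$1" "$2"
}

# Usage: cat /proc/stat > /tmp/s1; sleep 1; cat /proc/stat > /tmp/s2
#        core_busy /tmp/s1 /tmp/s2
```

Running it in a loop during a slow transfer would show whether a single core is pinned while the others sit idle.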

 

Unfortunately I have no idea what info could be useful to continue troubleshooting, so please let me know if you have any suggestions.

Edited by Torben
  • weirdcrap changed the title to Poor file transfer performance over WireGuard
