
[SOLVED] High inward network traffic upsets Docker


December 28, 2023

Hi all,

I've been facing a problem with my Unraid server for a while whereby a large file transfer to the server will somehow upset Docker and cause problems, including:

- The Docker portions of the webGUI become slow
- Attempts to stop, remove, or kill Docker containers fail with error messages along the lines of "attempted to kill container but did not receive a stop event from the container"
  - This is regardless of whether I use the webGUI, the command line, or Compose
- Connections to Docker containers from elsewhere on the network fail
- (Particularly annoying) my ADS-B receiver Docker container stops working and needs to be manually restarted before it'll work again, even if the offending transfer has already finished

Examples of big inbound transfers that can cause this problem:

- A backup coming in to the server from offsite (through rsync directly on Unraid)
- Windows File History backing up to the server (through Unraid's built-in SMB server functionality)
- A large download through Lancache on the server (through the Lancache Docker container)

You'll notice that the first two of those should have nothing to do with Docker.

Frustratingly, the following do not cause the problem:

- A backup going offsite from the server (rsync again)
- macOS backing up to the server (through a Time Machine Docker container)
- Media being served out to other devices (built-in SMB)

Clearly, my server is running out of some resource that Docker needs, but I'm at a loss as to what it is. Despite the problem being associated with a lot of network traffic, I doubt it's bandwidth itself, since one of the triggers comes from offsite, and it's unlikely my 100Mb connection (less VPN overheads) is saturating the server's gigabit Ethernet. While the CPU does run up during problematic transfers, it doesn't seem to be totally overloaded, and there's bags and bags of free memory. Let's take as a specimen what happened yesterday evening and see what Netdata recorded:

[Image: Netdata screenshot]
As you can see, the ADS-B receiver fell over sometime around 17:00. I believe this was triggered by Windows backing up to Unraid's SMB server. The CPU runs up a bit for about 20 minutes at 16:30, then a bit more an hour later, but it never goes above 60%:

There's some disk IO at the same times:

Nonetheless, we have all of the memory:

And there's a network traffic spike, but nothing gigabit Ethernet shouldn't be able to handle:

Can anyone suggest other metrics I should examine? Or is there a way to decrease the general niceness of the Docker daemon so it will just take a bigger share of what resources there are?
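One further metric that may be worth sampling is CPU iowait, which can be high even when overall CPU usage looks moderate. Below is a minimal sketch, assuming Python 3 is installed on the server and a Linux-style `/proc/stat`; the function names are made up for illustration and are not part of any Unraid tooling.

```python
import time


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a list of ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples.

    /proc/stat field order: user, nice, system, idle, iowait, irq,
    softirq, steal, guest, guest_nice. Guest time is already counted
    in user, so only the first eight fields are summed.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_iowait(interval_seconds=5.0, path="/proc/stat"):
    """Take two readings of /proc/stat and return the iowait percentage between them."""
    with open(path) as handle:
        before = parse_cpu_line(handle.read())
    time.sleep(interval_seconds)
    with open(path) as handle:
        after = parse_cpu_line(handle.read())
    return iowait_percent(before, after)
```

Calling `sample_iowait()` in a loop during one of the problematic transfers, and again while the server is idle, would show whether the CPUs are mostly waiting on disk rather than doing work.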

Vital statistics:

Unraid v6.12.6
Dell PowerEdge R720XD
Diagnostics are attached

Edited July 18, 2024 by ScottAS2

Solved by tpill90

1 month later...

February 21, 2024

Solution

What drives (model number is helpful) are you running in the server? Do they have a dram cache?

This sounds like your drives simply can't handle the write workload. The SLC cache on the drive is exhausted, so the remaining writes go at the actual full speed of the drive. The huge drop in performance causes huge contention between the applications writing to disk, amplifying the performance issues they all are having.

The reads aren't affected by this because reads don't suffer from the same kind of issues. They will generally be at full speed.
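One rough way to check for this behaviour is to write a large file in fixed-size chunks, forcing each chunk to disk, and watch whether the per-chunk throughput drops sharply partway through. The following is a sketch only, assuming Python 3 is installed; `measure_write_throughput` is a made-up helper name, and the data has to be larger than the drive's fast cache for any drop to show up.

```python
import os
import time


def measure_write_throughput(path, total_bytes, chunk_bytes):
    """Write total_bytes to path in chunks and return the MiB/s achieved for each chunk."""
    payload = os.urandom(chunk_bytes)  # random data, so compression cannot flatter the result
    rates = []
    written = 0
    with open(path, "wb") as handle:
        while written < total_bytes:
            start = time.perf_counter()
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push this chunk to the device
            elapsed = time.perf_counter() - start
            rates.append(chunk_bytes / (1024 * 1024) / max(elapsed, 1e-9))
            written += chunk_bytes
    return rates
```

For example, `measure_write_throughput("/mnt/cache/throughput_test.bin", 64 * 1024**3, 1024**3)` would write 64 GiB in 1 GiB chunks. The path and sizes here are assumptions, so adjust them to the pool being tested, and delete the test file afterwards. A run that starts fast and then settles at a much lower rate would be consistent with the fast cache on the drive being exhausted.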

February 21, 2024

Author

14 hours ago, tpill90 said:

What drives (model number is helpful) are you running in the server? Do they have a dram cache?

This sounds like your drives simply can't handle the write workload. The SLC cache on the drive is exhausted, so the remaining writes go at the actual full speed of the drive. The huge drop in performance causes huge contention between the applications writing to disk, amplifying the performance issues they all are having.

The reads aren't affected by this because reads don't suffer from the same kind of issues. They will generally be at full speed.

This chimes with a discovery I've made in the meantime: bypassing the cache and writing directly to the array seems to solve the problem. Both cache drives (BTRFS/RAID1) are Crucial BX500 1TB SATA SSDs; part number CT1000BX500SSD1. I'm not sure how to find out if they have a DRAM cache, although the data sheet makes no mention of it, which probably means "no".

4 months later...

July 18, 2024

Author

Ultimately, I "solved" this by adding two new SSDs and creating a second pool on them for the appdata and system shares. That suggests tpill90 was very much on the right track, if not correct.

SarahAS2 changed the title to [SOLVED] High inward network traffic upsets Docker

July 18, 2024

Community Expert

On 2/21/2024 at 10:08 AM, ScottAS2 said:

I'm not sure how to find out if they have a DRAM cache, although the data sheet makes no mention of it, which probably means "no".

Here is a review on the Crucial BX500 series of SSD drives:

https://www.pcworld.com/article/394337/crucial-bx500-sata-ssd-review.html

Read the section under Performance about what happens to the BX500 when the SLC cache is filled and the slower QLC is being written in real time. This information, apparently, is not included on most spec sheets, so you have to look for reviews that actually test for it! (SLC memory is much more expensive per bit than QLC memory and it is more power hungry.)
