Networking severely degraded on versions > 4.12 #13393

Closed
3 tasks done
rg9400 opened this issue Apr 13, 2023 · 10 comments
Comments


rg9400 commented Apr 13, 2023

  • I have tried with the latest version of Docker Desktop
  • I have tried disabling enabled experimental features
  • I have uploaded Diagnostics
  • Diagnostics ID: 7E746511-651C-4A74-8C84-91189E8962C1/20230406225924

Actual behavior

I created a very detailed ticket for paid support, but they told me to just come and create it here.

I am facing numerous degradations in networking performance on any version later than 4.12.0. I will list the symptoms below. I have updated to each version after 4.12 multiple times, including updating to 4.18 at least 5 times, and these issues are always introduced. As soon as I revert to 4.12.0, they instantly go away. Something changed in 4.13 (maybe related to the proxy changes) that has resulted in significant network degradation.

  1. My download speed on my Windows host is around 1 Gbps, usually hitting 900 Mbps. Because of Docker/WSL, I was only ever getting about 200 Mbps inside Docker containers on any version up to 4.12.0. On versions greater than 4.12, speedtest drops to an average of 40 Mbps, which is only 20% of the previous in-container throughput.

  2. I run Uptime-Kuma in my Docker Compose stack. I have it checking multiple container health statuses by pinging each container at a specific port. I use Docker DNS resolution, so I set the container name as the host and the relevant port in the monitors. I have 60 monitors, one per container. This works perfectly in 4.12. As soon as I update, the monitors immediately start getting intermittent Connection Refused errors. To be clear, this does not happen on every single check, but it happens often enough that the monitors suddenly keep flipping between healthy and unhealthy.

  3. I have a CIFS volume for a shared Windows folder in my compose, attached to multiple containers. After updating, multiple containers suddenly see the volume disappear. It reappears, but, like point 2, this happens intermittently.

All of these look like symptoms of dropped packets in the networking stack, and I am sure other significant issues are occurring as well. I noticed several other reports that might be related. For example, this reply seems to pin the change to 4.13+: #13092 (comment)

These are pretty significant downgrades in network performance, and they make it basically impossible for me to use anything higher than 4.12 in production. I assumed these would have been fixed by now after half a year, but it seems they may not even be on the team's radar.

Expected behavior

Networking should work like in 4.12. Maybe there is demand for the proxy changes, but this level of network degradation is not an acceptable tradeoff for it.

Information

  • Windows Version: Windows 11 Pro 22H2 22621.1555
  • Docker Desktop Version: 4.18.0
  • WSL2 or Hyper-V backend? WSL2
  • Are you running inside a virtualized Windows e.g. on a cloud server or a VM: Bare metal Windows 11

Output of & "C:\Program Files\Docker\Docker\resources\com.docker.diagnose.exe" check

It seems the diagnostics might be broken in 4.18.0. Someone else reported the same spurious complaints about virtualization and related features not being enabled after running the self-diagnostics (#13388), and my self-diagnostics give the exact same errors. These are clearly not correct, as they do not occur on 4.12.0, and I would not be able to run a 60-container compose stack in production if those basics were actually missing. This noise makes the output of the command unhelpful for digging into what is actually happening.

Steps to reproduce the behavior

First Issue

  1. Run a speedtest container on 4.12 such as the one below. Ideally the host's overall download speed is high enough to notice reductions
speedtest:
    image: henrywhitaker3/speedtest-tracker:dev
    container_name: speedtest
    environment:
      - OOKLA_EULA_GDPR=true
    ports:
      - 8765:80
    restart: unless-stopped
  2. Collect and calculate average speedtest performance
  3. Upgrade to 4.18
  4. Collect and calculate average speedtest performance
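
A rough sketch of a quicker way to get a container-to-host throughput number, if a full speedtest is too variable (this assumes iperf3 is installed and listening on the Windows host; the host LAN IP is a placeholder, and djs55/iperf3 is simply an image that contains iperf3 — any equivalent image works):

iperf3 -s                                                       # on the Windows host
docker run -it --rm djs55/iperf3 iperf3 -c <windows-host-lan-ip>  # from Docker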

Second Issue

  1. Run Uptime-Kuma on 4.12 such as the example below. Ideally, have a handful of containers/monitors to check (I have 60; not sure if the issue is noticeable with fewer)
  uptime-kuma:
    image: louislam/uptime-kuma:beta
    container_name: uptime-kuma
    volumes:
      - ~/docker/configs/uptime-kuma:/app/data
    ports:
      - 3001:3001
    restart: unless-stopped
  2. Create a handful of monitors to check containers in the compose. Use the container name as the host and monitor the relevant port
  3. Verify that none of the monitors are failing randomly in the Docker logs for the container
  4. Upgrade to 4.18
  5. Monitor the Docker logs and verify whether monitors start failing with Connection Refused errors
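
If you don't want to set up Uptime-Kuma just to see the flapping, here is a rough sketch of a stand-in checker (the pingcheck service name and the speedtest:80 target are only examples, not part of my stack) that logs every refused TCP connect to another container via Docker DNS:

  pingcheck:
    image: alpine
    container_name: pingcheck
    # probe another container by its Docker DNS name every 5 seconds;
    # "$$" stops compose from trying to interpolate the "$"
    command: sh -c 'while true; do nc -z -w 2 speedtest 80 || echo "$$(date) connect to speedtest:80 refused/failed"; sleep 5; done'
    restart: unless-stopped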

Third Issue

  1. Create a CIFS volume in your compose as below on 4.12
  backups:
    driver_opts:
      type: cifs
      o: username=${HOSTUSER},password=${HOSTPASS},iocharset=utf8,serverino,rw,uid=${PUID},gid=${PGID},file_mode=0777,dir_mode=0777
      device: \\${HOSTIP}\backups
  2. Mount this volume into a container that just checks for the existence of the volume every few seconds (I don't have an easy-to-use example for this off hand, but it should be very easy to create; a rough sketch is included after this list)
  3. Verify that the mount stays accessible
  4. Upgrade to 4.18
  5. Monitor the logs in the container and notice intermittent disconnections/disappearances of the volume
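
A rough sketch of the checker container from step 2 (the volcheck name is just an example); it lists the mounted CIFS volume every few seconds and logs whenever the mount is unreachable:

  volcheck:
    image: alpine
    container_name: volcheck
    volumes:
      - backups:/backups
    # "$$" stops compose from trying to interpolate the "$"
    command: sh -c 'while true; do ls /backups > /dev/null 2>&1 || echo "$$(date) /backups not accessible"; sleep 5; done'
    restart: unless-stopped
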
@jmto-cambri

I think the networking in 4.18.0 is totally bonkers. I've had big trouble even contacting a port that the container is listening on.
Running the diagnostics:

"C:\Program Files\Docker\Docker\resources\com.docker.diagnose.exe" check

It fails to ping the IPC endpoints and fails some of the lookup checks. I reverted to 4.16.3 and the diagnose works much better.

@jmto-cambri

I figured out my problem. I have Windows 11, running Ubuntu in WSL2, which is where I run Docker.
If my containers bind ports to 127.0.0.1, the ports do not work and are not exposed to either Ubuntu or Windows.
If I bind to 0.0.0.0 instead, I can see the ports exposed in both Ubuntu and Windows.

Something has changed.


noselasd commented Apr 17, 2023

I think the networking in 4.18.0 is totally bonkers. I've had big trouble even contacting a port that the container is listening on. Running the diagnostics:

Yes, there's something not working with Docker networking on Windows in 4.18.0.
All of my projects with a docker-compose.yml that binds ports to localhost are broken. The port is in an odd state: it can be connected to, but the connection is immediately dropped. For example, this is no longer working:

ports:
      - 127.0.0.1:3306:3306

Changing it to

ports:
      - 0.0.0.0:3306:3306

Or just

ports:
      - 3306:3306

works around it. However, that's of course not generally a good idea, since it exposes the service to the outside world, even if in theory the Windows firewall would save you, provided it has actually been thought of and properly configured.
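
One way to see the symptom from the Windows host, as a rough sketch (assumes a service such as MySQL published on 127.0.0.1:3306): the TCP connect itself succeeds, but the first read returns 0 bytes because the other end closes immediately, whereas a healthy MySQL port would return its greeting packet.

$client = [System.Net.Sockets.TcpClient]::new('127.0.0.1', 3306)
$stream = $client.GetStream()
$buffer = New-Object byte[] 128
$read = $stream.Read($buffer, 0, $buffer.Length)   # 0 bytes read = connection was closed immediately
Write-Host "Bytes received: $read"
$client.Close()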

@jmto-cambri

Yes. That was what I was referring to with my comment.

I need to bind 0.0.0.0 and I can see the ports exposed in Ubuntu and Windows.

I'm starting to wonder whether the hosts file issue could somehow explain this or be relevant to the cause, e.g. failing to resolve the host. At least I haven't seen an issue that better describes the listening problem itself.

Ref. #13388 and #13398


djs55 commented Apr 18, 2023

Hi,

I've got an experimental build with lots of Windows networking fixes if you'd like to give it a go:

In particular, container outgoing bandwidth should be improved. For example, I have a 1G network and I can now saturate it from a container (with iperf3 -s running on a local Mac and the client running under Docker/WSL2):

>  docker run -it djs55/iperf3 iperf3 -c 192.168.1.17
Connecting to host 192.168.1.17, port 5201
[  5] local 172.17.0.2 port 35714 connected to 192.168.1.17 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   119 MBytes   999 Mbits/sec    0    527 KBytes
[  5]   1.00-2.00   sec   112 MBytes   942 Mbits/sec    0    527 KBytes
[  5]   2.00-3.00   sec   113 MBytes   948 Mbits/sec    0    527 KBytes
[  5]   3.00-4.00   sec   110 MBytes   923 Mbits/sec    0    527 KBytes
[  5]   4.00-5.00   sec   112 MBytes   937 Mbits/sec    0    527 KBytes
[  5]   5.00-6.00   sec   112 MBytes   941 Mbits/sec    0    527 KBytes
[  5]   6.00-7.00   sec   114 MBytes   952 Mbits/sec    0    527 KBytes
[  5]   7.00-8.00   sec   112 MBytes   939 Mbits/sec    0    527 KBytes
[  5]   8.00-9.00   sec   112 MBytes   939 Mbits/sec    0    527 KBytes
[  5]   9.00-10.00  sec   113 MBytes   952 Mbits/sec    0    527 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   947 Mbits/sec    0             sender
[  5]   0.00-10.03  sec  1.10 GBytes   941 Mbits/sec                  receiver

iperf Done.

I also found a long-standing bug in the port-forwarding code which could cause it to stop working (particularly if under load) until Docker was restarted.

Regarding port forwarding, watch out for clashes between Docker Desktop's port-forwarding and WSL 2's port-forwarding. If you're still having trouble, perhaps try disabling the localhostForwarding option in .wslconfig (some docs here), stopping Docker, and running wsl --shutdown to see if it works any better.
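
For reference, a minimal sketch of that .wslconfig change (the file lives at %USERPROFILE%\.wslconfig on the Windows host; only the localhostForwarding line matters here):

[wsl2]
localhostForwarding=false

Then quit Docker Desktop, run wsl --shutdown from a Windows terminal, and start Docker Desktop again so the setting takes effect.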

Thanks for all your patience on this.
Dave


rg9400 commented Apr 18, 2023

Hello @djs55,

Thanks for the experimental build.

My issues seem to be resolved with this build.

  1. I ran the iperf container like you did and was able to get close to 1 Gbps connecting to my Windows host. However, my speedtest to an external server is still only around 170 Mbps, which is also lower than the 500 Mbps I get running speedtest directly in WSL. That said, on 4.18.0 I was averaging 40 Mbps, so this experimental build seems to have returned bandwidth throughput to roughly the 4.12.0 baseline.

  2. The internal container-to-container networking that was throwing CONNREFUSED seems to be resolved. I will keep monitoring, but on earlier builds I would instantly start seeing those errors, and after running this build for a few hours there have been no issues so far.

  3. Similarly, I am not seeing my CIFS volume disappear so far. I will monitor this as well, but again, on 4.18 I noticed the problem quickly, and I haven't seen it here yet.

BONUS: If you remember, you worked with me to reproduce this issue #8861 a few years ago. I reran the curl testing that we did to reproduce the issue. The issue still persists, but it takes much longer for the connections to start hitting the timeout limit.

@yuriy-boyko

Hello, on Windows the experimental version 4.19.0 + localhostForwarding=false helped me, and Docker now works, but I have another problem.

Projects that use an address like http://localhost:3000/ (React, Vue, Node) won't work.

For now, my solution is to remove localhostForwarding=false when working with React, Vue, Node, etc. and put that config back when using Docker.

Is there any way to get both working?
Thanks


rg9400 commented Apr 22, 2023

One thing to highlight with the experimental build is that the RAM usage seems incredibly high. I have 128 GB of RAM, with 48 GB set as the limit for WSL. Docker Desktop is using almost that entire amount on top of the RAM already reserved by WSL (which is idle, since nothing is running in WSL except Docker).

This is a separate issue from the networking one, but since it's related to the build you shared, I'm not sure where else to post it.



rg9400 commented Apr 27, 2023

Closing this issue as the recently released 4.19 resolves the issues in the initial post. The timeouts to host are also greatly improved but still exist, so keeping that issue open.

@rg9400 rg9400 closed this as completed Apr 27, 2023

docker-robot bot commented Jun 23, 2023

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

/lifecycle locked

@docker-robot docker-robot bot locked and limited conversation to collaborators Jun 23, 2023