Inter-container traffic always NAT'd post-packer change? #193
Comments
FWIW, the recipes that used to build the boxes would apply replace-SNAT-with-MASQ.patch while the new builds seem to use warden-linux without changes. I've actually wondered about the use of SNAT vs MASQ in warden/garden in the past; might be interesting to see if there are opinions there about changes to enable this use case without breaking the required encapsulation of containers on the DEAs. |
I understand why you'd want this on DEAs - but BOSH-Lite is supposed to be a general purpose tool, used for developing more than CF core. For example, using a Cassandra bosh-release through this SNAT is likely to result in Weird Stuff Happening. |
We recently changed how the bosh-lite boxes are built, and this feature was skipped. None of our tests caught this, so we decided to postpone it until someone asked for it. It seems that the proper place for this would be as a configuration option for garden-linux. Recently we have experimented with splitting garden off into its own BOSH release, with the London team taking over its maintenance. Bosh-lite has not yet switched over to that release, pending some changes. I will talk with that team and see how hard this would be to support. I'll update this issue with more info shortly. If you need a temporary solution, you can always apply the patch mentioned above via your Vagrantfile. -dmitriy |
Thanks Dmitriy. Yes, I have a workaround for my use case. I'm just concerned that other people will run into mysterious problems related to this change and not know the cause. It seems this happened to CF London with Cassandra Thrift. |
+1 for addressing this, somehow. Cassandra (Thrift?) doesn't like the NAT between bosh-lite VMs. Making the following changes (thanks @james-masson) on our bosh-lite (Vagrant) VM fixed the problem:
# remove post-routing NAT rule
iptables -t nat -F w--postrouting
# skip NAT for internal network
iptables -t nat -A w--postrouting --source 10.244.0.0/16 ! --destination 10.244.0.0/16 --jump MASQUERADE |
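For readers unsure what the replacement rule matches, here is a minimal sketch of its match logic in plain shell. It does not call iptables or change any firewall state; `in_container_net` and `would_masquerade` are hypothetical helper names introduced only for illustration, and the `10.244.0.0/16` range is taken from the rule above.

```shell
#!/bin/sh
# Illustrative only: mimics the match logic of the MASQUERADE rule above.
# It does not read or modify any iptables state.

# Hypothetical helper: succeeds if the address is inside 10.244.0.0/16.
# A /16 fixes the first two octets, so a prefix match is sufficient here.
in_container_net() {
    case "$1" in
        10.244.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: prints "yes" if a packet from $1 to $2 would match
# "--source 10.244.0.0/16 ! --destination 10.244.0.0/16", otherwise "no".
would_masquerade() {
    if in_container_net "$1" && ! in_container_net "$2"; then
        echo yes
    else
        echo no
    fi
}

would_masquerade 10.244.0.2 10.244.1.6   # container to container: prints "no"
would_masquerade 10.244.0.2 8.8.8.8      # container to outside: prints "yes"
```

Under these assumptions, traffic between two containers keeps its original source address, while traffic leaving the container network is still masqueraded.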
Tracker story in CF Garden: https://www.pivotaltracker.com/story/show/79557632 |
In the tracker story @aramprice mentioned that his problems went away on current versions of bosh-lite. Mine haven't - had to apply the fix mentioned by @aramprice above on box v370 |
@james-masson No changes have been made to the garden that bosh-lite uses, so the patch is still required to make this work. -dmitriy |
@cppforlife We're still having this problem, but now it is inconsistent across infrastructures. Everything works fine on virtualbox but when using bosh lite on aws we are still seeing the above problem. |
We were wrong about it being inconsistent across infrastructures. Created a PR to address the issue: |
Pre-packer (e.g. box 237), the NAT rules between containers were something like...
This meant all inter-container traffic was direct routed.
Post-packer (e.g. head or box 293), the NAT rules seem to be SNATing all inter-container traffic.
This means that all traffic arrives with the wrong source IP.
This may work for traffic that doesn't infer meaning from the source/destination IP, but it will confuse applications that rely on those addresses.
Is this SNAT deliberate? I wouldn't normally expect this behaviour on real networks.
thanks
James M