Asymmetrical Traffic Flows and Complexity

One of my readers sent me a list of questions on asymmetrical traffic flows in IP networks, particularly in heavily meshed environments (where it’s really hard to ensure both directions use the same path) and in combination with stateful devices (firewalls in particular) in the forwarding path.

Unfortunately, there’s no silver bullet (and the more I think about this problem, the more I feel it’s not worth solving).

IP was designed as a datagram protocol where (A) every packet is independently forwarded across the network and (B) paths are unidirectional – the path taken from source to destination is in no way related to the return path. Even MPLS (which is effectively a form of circuit switching) doesn’t have bidirectional paths… and this fact annoyed traditional transport equipment vendors (and their customers) so much that they tweaked MPLS until they got MPLS-TP. I’ve seen many networks using MPLS, and I have yet to see one using MPLS-TP (or maybe I’m speaking with the wrong people).
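Point (B) is easy to demonstrate. In link-state protocols such as OSPF, the cost is configured on each router's outgoing interface, so a link can have a different cost in each direction, and every router computes its own shortest-path tree. The following is a minimal Python sketch (plain Dijkstra over a hypothetical four-node topology with made-up directed costs, not any router's implementation) showing the two directions of the same conversation taking different paths:

```python
import heapq

# Directed link costs: GRAPH[node][neighbor] is the cost of the outgoing
# interface on `node` toward `neighbor`. Topology and values are hypothetical.
GRAPH = {
    "A":  {"R1": 10, "R2": 10},
    "R1": {"A": 10, "B": 10},
    "R2": {"A": 10, "B": 20},
    "B":  {"R1": 50, "R2": 10},
}


def shortest_path(graph, src, dst):
    """Plain Dijkstra over directed edge costs; returns the list of nodes."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    visited = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == dst:
            break
        for neighbor, link_cost in graph[node].items():
            candidate = cost + link_cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if dst not in visited:
        raise ValueError(f"no path from {src} to {dst}")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


print(shortest_path(GRAPH, "A", "B"))  # ['A', 'R1', 'B']  (cost 20)
print(shortest_path(GRAPH, "B", "A"))  # ['B', 'R2', 'A']  (cost 20)
```

Nothing is broken here: each direction is simply an independent shortest-path decision made with per-direction costs.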

As long as you understand these principles and don’t try to tweak IP into something it was never designed to be, life’s good and your network is simple. The moment you try to enforce traffic flow symmetry, you’re trying to squeeze an overripe tomato into a small square hole – it’s bound to get messy, and you’ll probably increase network complexity to ridiculous levels.

I was involved in one such attempt: the customer had VLANs stretched across multiple data centers and wanted to ensure symmetrical traffic flow through the "preferred" device. We got it semi-working (on the whiteboard), but it got ridiculously complex.

Speaking of stateful devices: one of the underlying assumptions of IP networks is end-to-end transparency, and stateful devices break that assumption. No wonder things get complex.

The only "reliable" way to enforce symmetrical traffic flow across a stateful device (that I’m aware of) is to use source address NAT, forcing the return traffic to go through the same device. Alternatively, make sure that the stateful device is the only path between the endpoints – that's why we use pass-through load balancers and make them default gateways for server segments.
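To see why source NAT works, consider a toy model (Python; all names and addresses are hypothetical) with two firewalls, where the client's routing sends requests through FW1 but the server-side routing sends anything addressed to the client subnet through FW2. Without NAT, the reply reaches a firewall with no session state; with source NAT, the server replies to an address owned by FW1, so the reply is routed back to it. The model skips translating the reply back to the client's real address.

```python
from dataclasses import dataclass, field


@dataclass
class Firewall:
    name: str
    snat_address: str  # hypothetical address owned by (and routed to) this firewall
    sessions: set = field(default_factory=set)

    def forward(self, src: str, dst: str, snat: bool) -> str:
        """Create session state for an outbound packet; return the source the server sees."""
        seen_src = self.snat_address if snat else src
        self.sessions.add((seen_src, dst))
        return seen_src

    def accepts_reply(self, src: str, dst: str) -> bool:
        """A reply (server -> client) passes only if this firewall holds the matching session."""
        return (dst, src) in self.sessions


def route_reply(dst: str, firewalls: list) -> Firewall:
    """Hypothetical server-side routing: a firewall's own SNAT address is routed to
    that firewall; everything else (the client subnet) uses the last firewall."""
    for fw in firewalls:
        if dst == fw.snat_address:
            return fw
    return firewalls[-1]


def simulate(snat: bool) -> tuple:
    fw1 = Firewall("FW1", "172.16.0.1")
    fw2 = Firewall("FW2", "172.16.0.2")
    client, server = "10.1.1.10", "192.168.2.20"
    # The client's routing sends the request through FW1 ...
    seen_src = fw1.forward(client, server, snat)
    # ... but the reply follows the server-side routing table.
    reply_fw = route_reply(seen_src, [fw1, fw2])
    return reply_fw.name, reply_fw.accepts_reply(server, seen_src)


print(simulate(snat=False))  # ('FW2', False): reply hits a firewall with no session
print(simulate(snat=True))   # ('FW1', True): SNAT pulls the reply back to FW1
```

The same model also shows why the "single path" alternative works: if FW1 were the only route back to the client subnet, `route_reply` could never return FW2.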

Of course you can also deploy a cluster of stateful devices across all possible paths, and exchange state across them. I wish you luck… let’s talk after your concoction survives the first DDoS attack, a serious port scan, and a split-brain-inducing link failure.
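Part of what makes state exchange fragile is timing. The sketch below is a toy Python model (the fixed sync delay, the loss of updates while the sync link is down, and all names are assumptions, not how any particular product replicates state): the SYN creates a session on FW1, and the SYN-ACK comes back through FW2 one round-trip time later.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    sessions: set = field(default_factory=set)


class AsyncStateSync:
    """Toy model (assumption): session state reaches peers after a fixed delay,
    and updates generated while the sync link is down are simply lost."""

    def __init__(self, members: list, sync_delay_ms: float):
        self.members = members
        self.sync_delay_ms = sync_delay_ms
        self.link_up = True
        self.pending = []  # (deliver_at_ms, peer, session)

    def create_session(self, member: Member, session: tuple, now_ms: float) -> None:
        member.sessions.add(session)
        if not self.link_up:
            return  # split brain: peers never learn about this session
        for peer in self.members:
            if peer is not member:
                self.pending.append((now_ms + self.sync_delay_ms, peer, session))

    def advance(self, now_ms: float) -> None:
        # Deliver every update whose sync delay has elapsed; keep the rest queued.
        remaining = []
        for deliver_at, peer, session in self.pending:
            if deliver_at <= now_ms:
                peer.sessions.add(session)
            else:
                remaining.append((deliver_at, peer, session))
        self.pending = remaining


def reply_accepted(sync_delay_ms: float, rtt_ms: float, link_up: bool = True) -> bool:
    fw1, fw2 = Member("FW1"), Member("FW2")
    cluster = AsyncStateSync([fw1, fw2], sync_delay_ms)
    cluster.link_up = link_up
    session = ("10.1.1.10", 51000, "192.168.2.20", 443)
    cluster.create_session(fw1, session, now_ms=0)  # SYN leaves via FW1
    cluster.advance(now_ms=rtt_ms)                  # SYN-ACK arrives one RTT later ...
    return session in fw2.sessions                  # ... at FW2


print(reply_accepted(sync_delay_ms=5, rtt_ms=1))                  # False: reply beat the update
print(reply_accepted(sync_delay_ms=5, rtt_ms=20))                 # True
print(reply_accepted(sync_delay_ms=5, rtt_ms=20, link_up=False))  # False: split brain
```

In this model every new session also produces one sync update per peer, so anything that creates sessions quickly (a SYN flood or a port scan) multiplies the sync load.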

Some of the specific problems my reader encountered in his networks include:

Please also consider VRRP as a source of asymmetry (tracking the WAN interface is not a solution when you have many redundant WAN links, some of them on the active (master) router and others on the backup router).

VRRP is another ugly hack trying to fix a wrong assumption ("we'll never need more than one gateway per LAN"). CLNP had ES-IS built in; there was nothing like that in IPv4, and even in the IPv6 world some people want to replace RA with the kludge they know (DHCP-assigned default gateway + VRRP).

Also, interface tracking (or any other means of selecting the active VRRP peer) is there just to ensure most of the outbound traffic doesn't traverse too many hops. Trying to solve asymmetrical traffic flows with it is another great way of making your network unnecessarily complex.
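A minimal Python sketch of the election logic makes the limitation visible. As defined in RFC 5798, the highest priority wins and a tie goes to the higher primary IP address; interface tracking is a vendor feature that lowers the priority when the tracked link fails, and the decrement used here is hypothetical. Timers are ignored and preemption is assumed to be enabled. Tracking only changes which router forwards traffic leaving the LAN.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class VrrpRouter:
    name: str
    primary_ip: IPv4Address
    priority: int              # configured priority, 1-254
    wan_up: bool = True        # state of the tracked WAN interface
    track_decrement: int = 20  # hypothetical vendor-style tracking penalty

    def effective_priority(self) -> int:
        # Interface tracking lowers the advertised priority when the tracked link fails.
        return self.priority - (0 if self.wan_up else self.track_decrement)


def elect_master(routers: list) -> VrrpRouter:
    # Highest priority wins; on a tie, the higher primary IP address wins.
    return max(routers, key=lambda r: (r.effective_priority(), r.primary_ip))


r1 = VrrpRouter("R1", IPv4Address("10.0.0.2"), priority=110)
r2 = VrrpRouter("R2", IPv4Address("10.0.0.3"), priority=100)
print(elect_master([r1, r2]).name)  # R1
r1.wan_up = False                   # R1 drops to 90
print(elect_master([r1, r2]).name)  # R2
# Only the first hop of outbound traffic moves; how return traffic
# reaches the LAN is decided by upstream routing, not by VRRP.
```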

Finally, there’s IP multicast and RPF checks… and no, I’m not opening that can of worms.

6 comments:

  1. Very interesting thought. How about disabling sequence checking on the firewalls (OK, we may lose AV and IPS, but let's suppose we don't need them)? We give up stateful inspection, but this could be a solution without a real increase in risk (I was also reading your related article http://blog.ipspace.net/2016/02/should-firewalls-track-tcp-sequence.html).

    State exchange could also be a solution with a rate limit on the number of sessions that can be synchronized (a DDoS or a big scan would then not be a problem). If the DCI link goes down (split brain), everything would still work as if there were no firewall, so this would not increase the risk.
  2. You could also move the firewall to the edge... I haven't followed what's been happening in the industry lately, but isn't this one of the promises of SDN / OpenFlow / VMware NSX etc.? I think Cisco tried to put more brains into the switches with the PISA SUP some years ago, but it probably didn't work out.
  3. As always, it's only a problem if it causes a problem for the business. Networkers, including myself, can get a bit 'OCD' sometimes, performing IT for IT's sake and crafting the network as if it were a sculpture.

    To be sure, the asymmetric flows could be classed as technical debt that makes troubleshooting harder during an issue. But that troubleshooting effort needs to be balanced against the effort and additional complexity that forced symmetry brings.
  4. Symmetric routing could be important for special applications such as dynamic delay compensation based on RTT measurements.
    The best solution is to move to a hybrid SDN architecture and just control the interesting traffic flows.
    Based on the knowledge of the full topology you could select the same paths for both directions and then execute this policy in the network.
    Of course, it has a price, so do it just for those flows where it is really necessary.
    Replies
    1. Many customers just don't know which traffic is "interesting". Not everybody is Facebook or another single-content provider that knows exactly what's interesting.
      From my point of view, in a network with "normal" latency this kind of problem should not bother a normally behaving application.
  5. Broad topic; just to touch on it. Using the same stateful device with different interfaces for the ingress and egress paths is easy as long as both interfaces share the same security zone, because most devices have zone-level policies (not interface-level ones). Not hitting the same stateful device (ingress and egress traffic going through different devices) can sometimes be solved with HA clusters. The worst case is when we do not hit the same HA cluster (different paths). It depends on the topology.
    There are no quick, general fixes. Disabling the TCP SYN check takes us back to the 1990s, when we needed ACLs for both directions because return traffic was not automatically permitted. Not always acceptable.
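The hybrid SDN approach suggested in the fourth comment (select a path for an interesting flow, then pin both directions to it) can be sketched as follows. This is a hypothetical Python model, not any controller's API: the flow table is a plain dictionary keyed on (switch, source IP, destination IP), and the path is assumed to have already been chosen from the full topology.

```python
def pin_symmetric(path: list, src_ip: str, dst_ip: str, table: dict) -> None:
    """Install per-flow entries so both directions follow the same node list."""
    for here, nxt in zip(path, path[1:]):
        table[(here, src_ip, dst_ip)] = nxt  # forward direction
    back = path[::-1]
    for here, nxt in zip(back, back[1:]):
        table[(here, dst_ip, src_ip)] = nxt  # return direction: same nodes, reversed


def walk(table: dict, start: str, src_ip: str, dst_ip: str, last: str) -> list:
    """Follow the installed entries hop by hop from `start` until `last`."""
    hops = [start]
    while hops[-1] != last:
        hops.append(table[(hops[-1], src_ip, dst_ip)])
    return hops


flows = {}
pin_symmetric(["S1", "S3", "S4"], "10.0.0.1", "10.0.9.9", flows)
print(walk(flows, "S1", "10.0.0.1", "10.0.9.9", "S4"))  # ['S1', 'S3', 'S4']
print(walk(flows, "S4", "10.0.9.9", "10.0.0.1", "S1"))  # ['S4', 'S3', 'S1']
```

Each pinned flow consumes an entry on every hop in each direction, which is one concrete part of the price mentioned in that comment.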