Per-Packet Load Balancing on WAN Links
One of my readers got an interesting idea: he’s trying to make the most of his WAN links by doing per-packet load balancing between a 30 Mbps and a 50 Mbps link. Not exactly surprisingly, the results are not what he expected.
The obvious problems
Per-packet load balancing on stateless packet-by-packet devices (routers or switches) is inherently a bad idea: it inevitably results in packet reordering and reduced TCP throughput (I won't even try to figure out what it could do to some UDP traffic). The only corner case where you might think you need it is when you're trying to push traffic from a single (or a few) TCP sessions across multiple WAN uplinks, but even then packet reordering might give you worse link utilization than using a single uplink for the elephant TCP session.
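To see why the reordering is unavoidable, here's a back-of-the-napkin simulation in plain Python, with link speeds matching my reader's setup. It's a deliberately simplistic model (instant injection, serial delivery per link), not how real routers behave:

```python
# Toy simulation: round-robin per-packet load balancing across two links
# with different serialization delays. All numbers are illustrative.

LINKS_MBPS = [30, 50]          # the reader's two WAN uplinks
PACKET_BITS = 1500 * 8         # 1500-byte packets

# Per-packet serialization delay on each link (seconds)
delay = [PACKET_BITS / (mbps * 1_000_000) for mbps in LINKS_MBPS]

# Send packets 0..19 round-robin; each link delivers packets serially,
# so a packet arrives when its link finishes transmitting it.
free_at = [0.0, 0.0]
arrivals = []                  # (arrival_time, packet_number)
for pkt in range(20):
    link = pkt % 2
    free_at[link] += delay[link]
    arrivals.append((free_at[link], pkt))

arrivals.sort()                # the order in which the receiver sees packets
received = [pkt for _, pkt in arrivals]
backward_steps = sum(1 for a, b in zip(received, received[1:]) if b < a)
print("receive order:", received)
print("out-of-order arrivals (sequence going backwards):", backward_steps)
```

Because the 50 Mbps link drains its packets faster than the 30 Mbps one, odd-numbered packets keep overtaking even-numbered ones, and the receiver sees a permanently scrambled sequence.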
Doing stateless per-packet load balancing across unequal-bandwidth links is usually a Really Bad Idea. Ignoring the effects of packet reordering on TCP throughput, you'll never get more than N times the bandwidth of the slowest link (N being the number of parallel links), unless you're using tricks that result in unequal-cost load balancing (DMZ link bandwidth with BGP, parallel MPLS-TE tunnels, or EIGRP variance). The proof is left as an exercise for the reader.
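Here's the arithmetic for my reader's 30-and-50-Mbps scenario (a trivial sketch, assuming plain equal-weight round robin): every link carries the same packet rate, so the slowest link saturates first and caps all the others.

```python
# Equal-weight round robin across N unequal links: the slowest link
# saturates first and limits the packet rate on every other link.

links_mbps = [30, 50]                    # the reader's uplinks
n = len(links_mbps)

per_packet_cap = n * min(links_mbps)     # 2 * 30 = 60 Mbps
ideal_sum = sum(links_mbps)              # 80 Mbps with per-flow weighting

print(f"per-packet round-robin cap: {per_packet_cap} Mbps")
print(f"sum of link bandwidths:     {ideal_sum} Mbps")
print(f"bandwidth wasted:           {ideal_sum - per_packet_cap} Mbps")
```

In other words, the per-packet scheme throws away a quarter of the bandwidth he paid for, before reordering makes things even worse.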
Finally, the bandwidth-delay product might further limit the throughput of a single TCP session: a session can never transfer more than one window per round-trip time, so a small receive window can leave even a single link underutilized. See also the Mathis formula and the TCP throughput calculator.
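For illustration, here's the simplified Mathis bound and the window/RTT limit in a few lines of Python. All the numbers (MSS, RTT, loss rate, window size) are made-up examples, not measurements from my reader's network:

```python
from math import sqrt

def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Simplified Mathis et al. upper bound: BW <= (MSS / RTT) * (1 / sqrt(p))."""
    return (mss_bytes * 8 / rtt_s) * (1 / sqrt(loss))

def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """A TCP session can never move more than one window per RTT."""
    return window_bytes * 8 / rtt_s

# Illustrative numbers: 1460-byte MSS, 50 ms RTT, 0.01% packet loss,
# 64 KB receive window (i.e. no window scaling).
rtt = 0.050
print(f"Mathis bound:   {mathis_throughput_bps(1460, rtt, 1e-4) / 1e6:.1f} Mbps")
print(f"Window bound:   {window_limited_bps(65535, rtt) / 1e6:.1f} Mbps")
print(f"BDP at 50 Mbps: {50e6 * rtt / 8 / 1024:.0f} KB")
```

With these (made-up) numbers a single session tops out around 10 Mbps because of the 64 KB window; the 50 Mbps link alone would need a ~305 KB window to stay full, never mind a bundle of links.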
WAN optimization products (recently relabeled as Software Defined WAN) like VeloCloud (and some others) solve the problem by buffering and re-sequencing the packets before delivering them to the end host, resulting in pretty decent aggregate bandwidth… and you can always use MP-TCP.
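Here's roughly the kind of re-sequencing logic such a product has to run at the receiving end. This is a minimal sketch of the idea, not VeloCloud's actual implementation; a real product would also need a timeout so a lost packet doesn't stall delivery forever:

```python
class ReorderBuffer:
    """Minimal re-sequencing buffer: release packets to the host in
    sequence order, holding out-of-order arrivals until the gap fills."""

    def __init__(self):
        self.next_seq = 0      # next sequence number the host expects
        self.held = {}         # seq -> packet, waiting for the gap to fill

    def receive(self, seq, packet):
        """Returns the (possibly empty) list of packets to deliver in order."""
        self.held[seq] = packet
        deliverable = []
        while self.next_seq in self.held:
            deliverable.append(self.held.pop(self.next_seq))
            self.next_seq += 1
        return deliverable

# Feed it an out-of-order arrival pattern like the one in the simulation above
buf = ReorderBuffer()
for seq in [1, 0, 3, 5, 2, 4]:
    print(seq, "->", buf.receive(seq, f"pkt{seq}"))
```

The price you pay is buffering delay: packet 5 sits in the buffer until packets 2, 3, and 4 show up, which is exactly the jitter such boxes trade for in-order delivery.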
Disclosure: I totally enjoyed the VeloCloud presentation @ NFD9 (this video prompted the previous paragraph). You probably know that presenting companies indirectly cover the travel expenses of NFD delegates, but that never stopped me from having my own opinions ;)
Multipath TCP was designed for cases exactly like this one. If the network just does normal flow-based load balancing, the routers in front of the asymmetrical uplinks will simply hash each subflow onto one of the links as they normally would, and each subflow will increase its window size until it fills its pipe.
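The resulting math is pleasantly boring: with one subflow hashed onto each uplink, the aggregate throughput is simply the sum of what each subflow can squeeze through its own pipe (a sketch, assuming large-enough windows and one subflow per link):

```python
# Each MPTCP subflow looks like an ordinary TCP flow to the routers, so
# flow-based hashing pins it to one link and its window grows until it
# fills that pipe: per-subflow throughput = min(link bw, window / RTT).

def subflow_mbps(link_mbps: float, window_bytes: int, rtt_s: float) -> float:
    return min(link_mbps, window_bytes * 8 / rtt_s / 1e6)

links = [30, 50]                # one subflow hashed onto each uplink
rtt, window = 0.050, 1 << 20    # 50 ms RTT, 1 MB window (illustrative)

total = sum(subflow_mbps(bw, window, rtt) for bw in links)
print(f"aggregate MPTCP throughput: {total:.0f} Mbps")   # ~80 Mbps
```

Compare that with the 60 Mbps ceiling of per-packet round robin computed earlier: MPTCP gets the full 80 Mbps without reordering a single packet within a subflow.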