Why Are High-Speed Links Better than Port Channels or ECMP
I’m positive I’ve answered this question a dozen times in various blog posts and webinars, but it keeps coming back:
You always mention that high-speed links are always better than parallel low-speed links, for example 2 x 40GE is better than 8 x 10GE. What is the rationale behind this?
Here’s the N+1-th answer (hoping I’m being consistent):
I'm talking about port channels in this blog post. The exact same reasoning applies to parallel L3 links with ECMP. I'm explaining the difference (or lack thereof) between port channels and ECMP L3 links in the Leaf-and-Spine Fabric Designs webinar.
It's really hard to push a single 20 Gbps TCP session across a bundle of 10 Gbps links. Brocade is the only vendor that has (had?) a good answer, and as they patented the idea a long while ago, I doubt anyone else will go down the same route. Everyone else is limited to 5-tuple load balancing (each TCP/UDP session is pinned to a single physical link) because packet reordering causes too much performance loss.
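As a rough illustration of why a single session can't use more than one member link, here's a toy Python sketch of 5-tuple load balancing (my own example, not any vendor's actual hashing code; real switches use hardware CRC/XOR hash functions rather than MD5):

```python
import hashlib

def pick_member_link(src_ip, dst_ip, proto, src_port, dst_port, num_links):
    """Hash the 5-tuple and pin the flow to one member of the bundle.

    Toy sketch: the hash function is made up, but the effect matches
    what real gear does -- every packet of a TCP/UDP session lands on
    the same physical link, so the session never gets more bandwidth
    than a single member link can provide.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_links

# A single 20 Gbps TCP session always hashes to the same member link:
link = pick_member_link("10.1.1.1", "10.2.2.2", 6, 49152, 443, num_links=8)
print(f"All packets of this session use member link {link}")
```

No matter how many parallel 10GE links you bundle, that one session is stuck on a single link.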
Also, if you get stuck behind an elephant flow on a 10GE link, it hurts more (latency-wise) than if you're stuck behind that same behemoth on a 40GE link.
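Back-of-the-envelope numbers (my own illustration, assuming 10 MB of elephant-flow data sitting in the queue ahead of your packet): the same backlog drains four times faster on the higher-speed link.

```python
def queueing_delay_ms(queued_bytes, link_gbps):
    """Time (in ms) to drain the bytes already sitting in the queue."""
    return queued_bytes * 8 / (link_gbps * 1e9) * 1000

elephant_backlog = 10_000_000  # assumed: 10 MB of elephant-flow data ahead of you
print(f"10GE: {queueing_delay_ms(elephant_backlog, 10):.1f} ms")  # ~8 ms
print(f"40GE: {queueing_delay_ms(elephant_backlog, 40):.1f} ms")  # ~2 ms
```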
Flowlet-based load balancing (reshuffling idle flows on less-congested links) helps, as do end-system-based solutions like FlowBender.
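Here's a rough sketch of the flowlet idea (hypothetical Python, loosely modeled on the flowlet literature; the gap threshold and load metric are made up): when a flow pauses for longer than the reordering-safe gap, its next burst can be moved to the least-loaded link without risking out-of-order delivery.

```python
FLOWLET_GAP = 0.0005  # assumed gap threshold: 500 microseconds

class FlowletBalancer:
    def __init__(self, num_links):
        self.link_load = [0] * num_links   # bytes recently sent per link (toy metric)
        self.flow_state = {}               # 5-tuple -> (assigned link, last packet time)

    def pick_link(self, five_tuple, now, pkt_len):
        link, last_seen = self.flow_state.get(five_tuple, (None, None))
        # Start a new flowlet if the flow is new, or has been idle long enough
        # that sending its next burst on a different link can't cause reordering.
        if link is None or now - last_seen > FLOWLET_GAP:
            link = self.link_load.index(min(self.link_load))  # least-loaded link
        self.flow_state[five_tuple] = (link, now)
        self.link_load[link] += pkt_len
        return link
```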
Finally, losing a member in a port channel (as opposed to a complete higher-speed link) results in a non-symmetrical forwarding fabric, which might cause unwanted congestion. I described the problem in detail in the Leaf-and-Spine Fabric Designs webinar, and the way Arista EOS addresses this challenge with the BGP DMZ link bandwidth attribute in the 2016 update of the Data Center Fabric Architectures webinar.
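To illustrate what DMZ link bandwidth (or any weighted-ECMP mechanism) buys you, here's a toy Python sketch (my own example, not Arista's implementation): instead of spreading flows evenly across next hops, hash buckets are allocated in proportion to the advertised bandwidth, so a spine reached over a degraded 3 x 10GE bundle attracts fewer flows than one reached over a healthy 4 x 10GE bundle.

```python
import zlib

def build_hash_buckets(next_hops, num_buckets=64):
    """Allocate hash buckets to next hops in proportion to their bandwidth.

    next_hops: dict of next-hop name -> available bandwidth (Gbps).
    Toy illustration of weighted ECMP, not any vendor's actual code.
    """
    total_bw = sum(next_hops.values())
    buckets = []
    for hop, bw in next_hops.items():
        buckets += [hop] * round(num_buckets * bw / total_bw)
    return buckets

def pick_next_hop(buckets, five_tuple):
    return buckets[zlib.crc32(repr(five_tuple).encode()) % len(buckets)]

# spine-1 lost one port-channel member (3 x 10GE left), spine-2 is intact (4 x 10GE)
buckets = build_hash_buckets({"spine-1": 30, "spine-2": 40})
print(buckets.count("spine-1"), buckets.count("spine-2"))  # roughly 27 vs 37 buckets
```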