Linux operating system is used as the foundation for numerous network operating systems including Arista EOS and Cumulus Linux. It provides most networking constructs we grew familiar with including interfaces, VLANs, routing tables, VRFs and contexts, but they behave slightly differently from what we’re used to.
In Software Gone Wild Episode 86 Roopa Prabhu and David Ahern explained the fundamentals of packet forwarding on Linux, and the differences between Linux and more traditional network operating systems.
Long time ago in a podcast far far away Greg and Ethan pondered whether networking solutions need to scale or not, and obviously one cannot disagree with their generic conclusion that enterprises need just-good-enough solutions and not Google-scale architectures.
However, do keep in mind that:
My beloved source of meaningless marketing messages led me to a blog post with a catchy headline: are open-source SDN controllers ready for carrier-grade services?
It turned out the whole thing was a simple marketing gig for Ixia testers, but supposedly “the response of the attendees of an SDN event was overwhelming”, which worries me… or makes me happy, because it’s easy to see plenty of fix-and-redesign work in the future.
I got an interesting question from one of my readers:
If every device talking to a centralized control plane uses an out-of-band channel to talk to the OpenFlow controller, isn’t this a scaling concern?
A year or so ago I would have said NO (arguing that the $0.02 CPU found in most networking devices is too slow to overload a controller or reasonably-fast control-plane network).
These presentations focus more on the application-level technologies (client- and server side), but I’m positive you’ll find some useful content in the caching and scale-out applications with load balancing sections.
Network Address Translation (NAT) is one of those stateful services that’s almost impossible to scale out, because you have to distribute the state of the service (NAT mappings) across all potential ingress and egress points.
Midokura implemented distributed stateful services architecture in their Midonet product, but faced severe scalability challenges, which they claim to have solved with more intelligent state distribution.
Security groups (or Endpoint Groups if you’re a Cisco ACI fan) are a nice traffic policy abstraction: instead of dealing with subnets and ACLs, define groups of hosts and the rules of traffic control between them… and let the orchestration system deal with IP addresses and TCP/UDP port numbers.
My usual answer: You have to understand (A) what type of gateway you need, (B) what performance you need and (C) what form factor will give you that performance. For more details, watch the Hardware Gateways video from Scaling Overlay Virtual Networks webinar
I was listening to one of the HP SDN Packet Pushers podcasts in which Greg made an interesting comment along the lines of “people say that OpenFlow doesn’t scale, but what HP does with its IMC is it verifies the amount of TCAM in the switches, checks whether it can install new flows, and throws an alert if it runs out of TCAM.”
One of my readers sent me this question:
I have an Internet edge setup with two routers connected to two upstream ISPs and receiving full BGP routing table from them. I’m running iBGP between my Internet routers. Is there a formula to estimate convergence time if one of my uplinks fail? How many updates will I need to get the entire 512K routes in BGP table and also how much time it would take?
As always, the answer is it depends.
A week or so ago I described why a properly implemented hypervisor-based overlay virtual networking data plane is not a scalability challenge; even though the performance might decrease slightly as the total number of forwarding entries grow, modern implementations easily saturate 10GE server uplinks.
Scalability of the central controller or orchestration system is a totally different can of worms. As I explained in the Scaling Overlay Networks, the only approach that avoids single failure domain and guarantees scalability is scale-out control plane architecture.
In the Myths That Refuse to Die: Scalability of Overlay Virtual Networking blog post I wrote “number of MAC addresses has absolutely no impact on the forwarding performance until the MAC hash table overflows”, which happens to be almost true.
If you watched the Network Field Day videos, you might have noticed an interesting (somewhat one-sided) argument I had with Sunay Tripathi, CTO and co-founder of Pluribus Networks (start watching at around 32:00 to get the context). Let’s try to get the record straight.
“Thou Shalt Have No Chokepoints” is one of those simple scalability rules that are pretty hard to implement in real-life products. In the Distributed Data Plane part of Scaling Overlay Networks webinar I listed data plane components that can be easily distributed (layer-2 and layer-3 switching), some that are harder to implement but still doable (firewalling) and a few that are close to mission-impossible (NAT and load balancing).