Category: scalability
Packet Forwarding on Linux on Software Gone Wild
Linux operating system is used as the foundation for numerous network operating systems including Arista EOS and Cumulus Linux. It provides most networking constructs we grew familiar with including interfaces, VLANs, routing tables, VRFs and contexts, but they behave slightly differently from what we’re used to.
In Software Gone Wild Episode 86 Roopa Prabhu and David Ahern explained the fundamentals of packet forwarding on Linux, and the differences between Linux and more traditional network operating systems.
Every Product Needs to Scale… to a Point
Long time ago in a podcast far far away Greg and Ethan pondered whether networking solutions need to scale or not, and obviously one cannot disagree with their generic conclusion that enterprises need just-good-enough solutions and not Google-scale architectures.
However, do keep in mind that:
Response: Are Open-Source Controllers Ready for Carrier-Grade Services?
My beloved source of meaningless marketing messages led me to a blog post with a catchy headline: are open-source SDN controllers ready for carrier-grade services?
It turned out the whole thing was a simple marketing gig for Ixia testers, but supposedly “the response of the attendees of an SDN event was overwhelming”, which worries me… or makes me happy, because it’s easy to see plenty of fix-and-redesign work in the future.
Scalability of OpenFlow Control Plane Network
I got an interesting question from one of my readers:
If every device talking to a centralized control plane uses an out-of-band channel to talk to the OpenFlow controller, isn’t this a scaling concern?
A year or so ago I would have said NO (arguing that the $0.02 CPU found in most networking devices is too slow to overload a controller or reasonably-fast control-plane network).
Published: Designing Scalable Web Applications (Part 2)
I published the second part of my Designing Scalable Web Applications course on my free content web site.
These presentations focus more on the application-level technologies (client- and server side), but I’m positive you’ll find some useful content in the caching and scale-out applications with load balancing sections.
Published: Designing Scalable Web Applications
The first batch of the latest materials for my Designing Scalable Web Applications course have been published on my free content web site.
Video: Scale-Out NAT
Network Address Translation (NAT) is one of those stateful services that’s almost impossible to scale out, because you have to distribute the state of the service (NAT mappings) across all potential ingress and egress points.
Midokura implemented distributed stateful services architecture in their Midonet product, but faced severe scalability challenges, which they claim to have solved with more intelligent state distribution.
Scaling OpenStack Security Groups
Security groups (or Endpoint Groups if you’re a Cisco ACI fan) are a nice traffic policy abstraction: instead of dealing with subnets and ACLs, define groups of hosts and the rules of traffic control between them… and let the orchestration system deal with IP addresses and TCP/UDP port numbers.
Hardware Gateways in Overlay Virtual Networks
Whenever I’m running an SDDC workshop or doing on-site SDN/SDDC-related consulting, the question of hardware gateways between overlay virtual networks and physical world inevitably pops up.
My usual answer: You have to understand (A) what type of gateway you need, (B) what performance you need and (C) what form factor will give you that performance. For more details, watch the Hardware Gateways video from Scaling Overlay Virtual Networks webinar
There’s a Difference between Scaling and Not Being Stupid
I was listening to one of the HP SDN Packet Pushers podcasts in which Greg made an interesting comment along the lines of “people say that OpenFlow doesn’t scale, but what HP does with its IMC is it verifies the amount of TCAM in the switches, checks whether it can install new flows, and throws an alert if it runs out of TCAM.”
Estimating BGP Convergence Time
One of my readers sent me this question:
I have an Internet edge setup with two routers connected to two upstream ISPs and receiving full BGP routing table from them. I’m running iBGP between my Internet routers. Is there a formula to estimate convergence time if one of my uplinks fail? How many updates will I need to get the entire 512K routes in BGP table and also how much time it would take?
As always, the answer is it depends.
Scaling Overlay Networks: Scale-Out Control Plane
A week or so ago I described why a properly implemented hypervisor-based overlay virtual networking data plane is not a scalability challenge; even though the performance might decrease slightly as the total number of forwarding entries grow, modern implementations easily saturate 10GE server uplinks.
Scalability of the central controller or orchestration system is a totally different can of worms. As I explained in the Scaling Overlay Networks, the only approach that avoids single failure domain and guarantees scalability is scale-out control plane architecture.
Update: Performance of Hash Table Lookups
In the Myths That Refuse to Die: Scalability of Overlay Virtual Networking blog post I wrote “number of MAC addresses has absolutely no impact on the forwarding performance until the MAC hash table overflows”, which happens to be almost true.
Myths That Refuse to Die: Scalability of Overlay Virtual Networking
If you watched the Network Field Day videos, you might have noticed an interesting (somewhat one-sided) argument I had with Sunay Tripathi, CTO and co-founder of Pluribus Networks (start watching at around 32:00 to get the context). Let’s try to get the record straight.
Scaling Overlay Networks: Distributed Data Plane
“Thou Shalt Have No Chokepoints” is one of those simple scalability rules that are pretty hard to implement in real-life products. In the Distributed Data Plane part of Scaling Overlay Networks webinar I listed data plane components that can be easily distributed (layer-2 and layer-3 switching), some that are harder to implement but still doable (firewalling) and a few that are close to mission-impossible (NAT and load balancing).