My beloved source of meaningless marketing messages led me to a blog post with a catchy headline: are open-source SDN controllers ready for carrier-grade services?
It turned out the whole thing was a simple marketing gig for Ixia testers, but supposedly “the response of the attendees of an SDN event was overwhelming”, which worries me… or makes me happy, because it’s easy to see plenty of fix-and-redesign work in the future.
Anyway, let’s walk through the presentation.
What was the testbed? Ixia software emulated numerous OpenFlow switches connecting to a single instance of an open-source OpenFlow controller. The switches were connected in a linear topology (N 2-port switches in sequence), which is the least likely topology you’ll ever see in a network.
What were they measuring? Pretty useless stuff that’s easy to measure:
- How many OpenFlow switches can connect to a single controller instance?
- How long does it take the controller to install a single flow across all switches?
- How long does it take a controller to discover network topology?
Also, it’s impossible (from the presentation published on the Ixia web site) to figure out what exactly they were measuring, and whether it’s relevant. For example, they assume the controller discovered the network topology when the LLDP packets generated by the controller were delivered back to the controller.
Why are those metrics useless? Let’s go through them one-by-one:
- How many OpenFlow switches can connect to a controller? A single OpenFlow domain is a single failure domain, and unless you plan to use overlay virtual networking (= mimic wireless controllers) you don’t want your failure domain to be too large. Also, a decent carrier-grade controller would have a scale-out architecture (no, not a cluster of two controllers, but a real scale-out architecture with eventual consistency), which would make this metric moot.
- How long does it take the controller to install a single flow? This one might expose internal workings of a controller (is the controller programming flows in switch-by-switch sequence or in parallel), but measuring anything beyond a few dozen switches (= number of hops across the network) is plain ridiculous. Not surprisingly, the “interesting” behavior emerges in the totally-ridiculous territory (500+ switches in sequence), so let’s put that on the slide and claim victory.
- How long does it take to discover network topology? Measuring this on a chain of 100 switches in linear topology is absolutely meaningless. What would make sense are questions like “how quickly is a topology change that is not signaled via an interface down message detected?” or “how quickly are N thousand flows rerouted after a topology change?” We still don’t know.
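To see why the single-flow metric says more about chain length than about the controller, here is a minimal back-of-the-envelope sketch. It is a toy model, not anything taken from the Ixia presentation: the function names and the 2 ms per-switch programming time are assumptions picked purely for illustration.

```python
def sequential_install_ms(switches: int, per_switch_ms: float) -> float:
    """Controller programs one switch, waits for it to finish, then moves on."""
    return switches * per_switch_ms


def parallel_install_ms(switches: int, per_switch_ms: float) -> float:
    """Controller sends all flow-mods at once; total time is one switch's time."""
    return per_switch_ms if switches > 0 else 0.0


if __name__ == "__main__":
    PER_SWITCH_MS = 2.0  # assumed value, not a measurement
    for hops in (5, 20, 100, 500):
        seq = sequential_install_ms(hops, PER_SWITCH_MS)
        par = parallel_install_ms(hops, PER_SWITCH_MS)
        print(f"{hops:>4} hops: sequential {seq:7.1f} ms, parallel {par:5.1f} ms")
```

Under these assumptions the two strategies are hard to tell apart at realistic path lengths (a handful of hops), while a sequential controller needs a full second at 500 hops. The difference only becomes visible in territory no real network path would ever reach.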
Finally, while it seems (at least from presentations like this one) that the main focus of SDN is reinventing bridges (because dynamic MAC learning really needs to get reinvented), everyone conveniently ignores the scalability challenges of running linecard protocols across hundreds of switches from a central controller. BFD anyone?
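A rough sketch of the arithmetic behind the BFD remark follows. All inputs are hypothetical (200 switches, 4 BFD sessions per switch, 50 ms transmit interval, detect multiplier of 3), and the function names are made up for this example; the only protocol fact used is that BFD detection time in asynchronous mode is the detect multiplier times the transmit interval.

```python
def bfd_controller_pps(switches: int, sessions_per_switch: int,
                       tx_interval_ms: float) -> float:
    """Packets per second a central controller handles if it terminates
    every BFD session itself (one packet sent and one received per
    session per interval)."""
    sessions = switches * sessions_per_switch
    per_direction_pps = sessions * 1000.0 / tx_interval_ms
    return 2 * per_direction_pps  # transmit plus receive


def bfd_detection_time_ms(tx_interval_ms: float, detect_multiplier: int) -> float:
    """Time without received packets after which a session is declared down."""
    return tx_interval_ms * detect_multiplier


if __name__ == "__main__":
    # Assumed, illustrative values only
    pps = bfd_controller_pps(switches=200, sessions_per_switch=4,
                             tx_interval_ms=50.0)
    budget = bfd_detection_time_ms(tx_interval_ms=50.0, detect_multiplier=3)
    print(f"controller load: {pps:.0f} packets/s")
    print(f"detection budget: {budget:.0f} ms")
```

With these numbers the controller would process 32,000 BFD packets per second, and any control-channel delay or processing hiccup longer than 150 ms would look like a link failure. That is the kind of load a linecard handles locally without anyone noticing.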
What has this to do with readiness for carrier-grade services? Absolutely nothing. The setup is irrelevant (no carrier would use a single-instance controller), the switches used (2-port switches) and the linear topology are meaningless, and the metrics they measured don’t reflect real-life scenarios.
The only link to carrier-grade services I could find is the need for a catchy headline.
Ready for a dose of reality?
- Start with the free Introduction to SDN webinar if you need the answer to the “What is SDN?” question.
- Read the SDN and OpenFlow (the Harsh Reality) digital book, because it’s easier to read a book than recursively read over 200 blog posts.
- Watch the OpenFlow Deep Dive webinar to discover true OpenFlow scalability limitations.