Some more OpenFlow Q&A
InformationWeek has recently published an OpenFlow article by Jeff Doyle in which they graced me with a single grumpy quote taken out of three pages of excellent questions that Jeff asked me when preparing for the article. Jeff has agreed that I publish the original questions and my answers to them. Here they are (totally unedited):
You’ve posted several blogs that voice skepticism not so much of OpenFlow itself but of the hype surrounding it. You also state that looking beyond the hype, the protocol has some promise. Do you see any areas in which OpenFlow can do things that reasonably sophisticated router or switch operating systems cannot?
I can’t see a single area where a TCAM download protocol (which is what OpenFlow is) can do things that a router or switch could not do. There are things that can be done with OpenFlow that cannot be implemented with current protocols, but then one has to ask oneself: why has nobody yet developed a protocol to address those things?
OpenFlow will make it easier to implement customized functionality. It will also allow a third-party software package to exercise TCAM-level control in a multi-vendor environment (which is a total mission-impossible today). It remains to be seen whether average enterprise or SP users will risk going down that route. I am positive academics and Amazons/Googles will do so (and even write their own OpenFlow controllers), but they are outliers.
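To make the “TCAM download protocol” point concrete, here is a minimal Python sketch of the model: a controller writes match-action entries into a switch’s flow table, and the switch forwards purely by table lookup. All names and structures here are invented for illustration; the real protocol encodes this as binary flow_mod messages over a TCP/TLS session.

```python
# Toy model of OpenFlow-style flow programming. Names are illustrative,
# not the actual OpenFlow wire format or any real controller API.
from dataclasses import dataclass

@dataclass
class FlowEntry:
    match: dict      # e.g. {"eth_dst": "aa:bb:cc:dd:ee:ff"}
    actions: list    # e.g. ["output:2"]
    priority: int = 0

class Switch:
    """A switch whose forwarding table is populated remotely."""
    def __init__(self):
        self.table = []

    def install(self, entry: FlowEntry) -> None:
        # This is what a flow_mod effectively does: write a match-action
        # rule straight into the device's TCAM, highest priority first.
        self.table.append(entry)
        self.table.sort(key=lambda e: -e.priority)

    def forward(self, packet: dict) -> list:
        for entry in self.table:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                return entry.actions
        return []

# A (possibly third-party, multi-vendor) controller programs the switch:
sw = Switch()
sw.install(FlowEntry({"eth_dst": "aa:bb:cc:dd:ee:ff"}, ["output:2"], priority=10))
sw.install(FlowEntry({}, ["controller"]))  # low-priority table-miss rule
print(sw.forward({"eth_dst": "aa:bb:cc:dd:ee:ff"}))  # ['output:2']
print(sw.forward({"eth_dst": "11:22:33:44:55:66"}))  # ['controller']
```

The point of the sketch is that the switch holds no protocol intelligence at all: whatever policy the controller can express as match-action entries, any OpenFlow switch will execute.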
Do you see the possibility of OpenFlow creating a divergence in the networking industry, in which some vendors specialize in control plane products and some vendors specialize in forwarding plane products?
The divergence will definitely happen. Companies excelling in low-cost manufacturing and logistics (HP comes to mind) will try to get rid of R&D costs and push their hardware, while others (including numerous startups) will try to get a foothold in the networking software business.
The traditional networking vendors (Cisco and Juniper) will probably try to stick to the old model as long as possible. They use hardware markup to finance R&D instead of charging the true cost of their software; moving toward an OpenFlow-based economy will be highly disruptive for them.
For the most part, OpenFlow implementations have been in university environments where the controller is close to the switch. Data centers seem like another fit for the protocol. But in more geographically distributed networks, it seems like the delays between controller and forwarding planes could cause forwarding table entries to have periods of inconsistency. Would you agree with this concern, or do you think there can be reasonable safeguards against this?
As you know, a centralized control plane has been tried before. Traditional WAN networks (ATM and Frame Relay) immediately come to mind, as do LANE/ATM-to-the-desktop ideas and multi-layer switching experiments in the late 1990s (I know Cabletron had an architecture close to today’s OpenFlow). There are plenty of reasons why some of them never took off and some of them are dying; lack of resilience and scalability are just two of them.
A centralized control plane requires either a perfectly reliable network or an out-of-band control network. Welcome back to the 1980s. In both cases, if a node loses its connectivity to the control plane (the OpenFlow controller), it cannot adapt to local changes even if it continues to forward packets based on the frozen state of its TCAM.
Given these limitations, OpenFlow is an ideal solution for hypervisor-based virtual switches and access points (both wired and wireless). If those devices lose their uplinks, they stop functioning anyway, so losing connectivity to the OpenFlow controller is not a major issue. Also, the TCAM in those devices stays pretty stable until users try to connect/disconnect (more so if you bundle all uplinks in a single LAG), so they can continue to operate in a frozen state until the OpenFlow control channel is reestablished.
Core networking devices have a totally different set of requirements.
Something of an opposite view of the previous topic is that a centralized control plane could eliminate certain problems inherent in traditional distributed control planes, such as transient routing loops, slow convergence (particularly in BGP networks), and flooding of reachability information. Do you think there are advantages of control plane centralization that might outweigh liabilities?
Someone should tackle this issue from a very formal perspective – it’s almost like quantum physics’ uncertainty principle for the networks. You can have either a resilient network with distributed control plane (and all the associated drawbacks that you’ve mentioned) or a brittle network that relies exclusively on a centralized control entity.
The situation reminds me of people building inter-site clusters ... and forgetting that they’ll lose half of the nodes (due to quorum loss) if the inter-site link goes down.
In any case, you have to decide what you need. If it’s resiliency, you usually need distributed intelligence.
The ONF seems to have support from most of the major vendors. Do you think this support will continue, or will traditional vendors like Cisco or Juniper see OpenFlow products as a threat to the control of their gear?
Of course nobody would say “we’re against OpenFlow because it will hurt our margins”. Cisco and Juniper are in a perfect position to drop their hardware business and become the Oracle/VMware/Microsoft of networking. Will they manage to do it or will they see OpenFlow as a threat? We’ll find out in a few years.
There has been talk of potential benefits for the mobile industry. Do you think these benefits are valid, in a way that cannot be supported with existing protocols?
As mentioned above, OpenFlow does allow you to easily implement edge policies that are not available in standard software that comes with access devices. It does give you the ability to deploy new features very quickly without having to wait for the networking vendor(s) to support them in their software.
As long as you manage to implement your desired functionality within the access-layer (edge) devices, you’re pretty safe, more so if your policies push user traffic in end-to-end tunnels (MPLS LSPs or IP-based tunnels). OpenFlow is thus an ideal mechanism if you want to deploy creative access layer features and could be a good fit for the mobile industry.
Looks like Cisco is "in".
On the "what is possible that's impossible now" point - I think I have an example. With OpenFlow you could potentially create a multi-tenanted network, where users are given edge ports on one or more devices, and total freedom over how and what travels between these ports. Without affecting other tenants.
Another point that comes to mind is the ability to support arbitrary new protocols - for example's sake, IPv7 or whatever. All you need to do is get your controller to understand it; no changes needed on the devices themselves.
(Hopefully I did get what OpenFlow could potentially do correctly - please let me know if I'm off the mark)
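The multi-tenant idea in the comment above can be modeled roughly like this (a toy sketch, not real controller code; port numbers and tenant names are invented): the controller tags each edge port with a tenant and simply refuses to install any flow whose input and output ports belong to different tenants.

```python
# Toy model of per-tenant isolation enforced by a central controller.
# All names and port assignments are hypothetical, for illustration only.

PORT_TENANT = {1: "alice", 2: "alice", 3: "bob", 4: "bob"}

def install_flow(in_port: int, out_port: int, flows: list) -> None:
    """Accept a flow only if both ports belong to the same tenant."""
    if PORT_TENANT[in_port] != PORT_TENANT[out_port]:
        raise PermissionError("flow would cross tenant boundary")
    flows.append((in_port, out_port))

flows = []
install_flow(1, 2, flows)      # alice -> alice: allowed
try:
    install_flow(1, 3, flows)  # alice -> bob: rejected by the controller
except PermissionError as err:
    print(err)                 # flow would cross tenant boundary
```

Because every flow entry passes through one policy point, each tenant can get apparent “total freedom” between its own ports while the controller guarantees no rule ever bridges two tenants.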
Arbitrary new protocols - not before OpenFlow becomes a generic pattern matching mechanism. Today you can match on MAC, 802.1Q, MPLS or IP fields (not even IPv6). Nicira has some pattern matching extensions, but they're proprietary (isn't the world of emerging standards beautiful 8-) )
However, it's pretty hard to implement generic pattern matching in existing hardware tailored to the needs of MAC- and IPv4/IPv6 forwarding.
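To illustrate the fixed-match-field limitation: an early-OpenFlow device only parses a known set of header fields, so a rule for a hypothetical new protocol simply cannot be expressed, no matter how clever the controller is. A rough sketch (field names simplified and invented for illustration):

```python
# Sketch of the fixed-match-field limitation discussed above.
# The field set loosely mirrors "MAC, 802.1Q, MPLS or IP fields";
# real OpenFlow versions differ in exactly which fields they support.

SUPPORTED_FIELDS = {
    "in_port",
    "eth_src", "eth_dst", "eth_type",    # MAC header
    "vlan_vid", "vlan_pcp",              # 802.1Q
    "mpls_label",                        # MPLS
    "ipv4_src", "ipv4_dst", "ip_proto",  # IPv4 (no IPv6 in early versions)
    "tcp_src", "tcp_dst",
}

def validate_match(match: dict) -> None:
    """Reject flow entries matching on fields the hardware can't parse."""
    for field in match:
        if field not in SUPPORTED_FIELDS:
            raise ValueError(f"unsupported match field: {field}")

validate_match({"ipv4_dst": "10.0.0.1"})      # fine
try:
    validate_match({"ipv7_dst": "whatever"})  # hypothetical future protocol
except ValueError as err:
    print(err)                                # unsupported match field: ipv7_dst
```

Generic pattern matching would replace the whitelist with arbitrary offset/length/mask matching on the packet, which is exactly what existing MAC/IP-oriented forwarding ASICs were never built to do.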
These share a single control plane, so not much freedom to choose how data travels between the edge ports. OpenFlow, on the other hand, from what I understand, allows somewhat organised anarchy in the control plane department.
Question: does OpenFlow today have hooks into physical layer state changes (like loss of light or carrier)? (Yes, I could read the specs and figure it out myself, but I'm lazy, or a bit busy right now, or both) ;)
> not before OpenFlow becomes a generic pattern matching mechanism
Hope this is a part of the plan. And yes, I realise this will likely require forklift upgrade, but hopefully where market needs lead, the R&D shall follow. Either that, or you can't deny man a hope! :) (That's whole lot of "hopes" in one paragraph, but hey - it's Friday!) :)
Need to go back to the docs, but I do remember seeing some hooks for fast reroute after physical layer state change.
You're right, I don't! :) But, isn't it exactly what SDN proponents are promising us - hiding all this horrible complexity under layers of abstraction, the same way it has happened in programming?