Link Aggregation in OpenFlow Environment

One of my readers couldn’t figure out how to combine Link Aggregation Groups (LAG, aka Port Channel) with OpenFlow:

I believe that in LAG, every traditional switch would know how to forward the packet from its FIB. Now with OpenFlow, does the controller communicate with every single switch and populate their tables with one group ID for each switch? Or how does the controller figure out the information for multiple switches in the LAG?

As always, the answer is “it depends”, and this time we’re dealing with a pretty complex issue.

There are at least four ways you can deal with LAG in an OpenFlow world:

Ignore it. Who needs it anyway ;)

Leave it to the switch to figure it out. Switches that support OpenFlow in combination with a traditional control plane can run LACP locally and present the port channel (LAG) as a single interface to the OpenFlow controller.

The OpenFlow controller uses the LAG (Port Channel) interface in its forwarding rules (flow entries) and the switch automatically performs intra-LAG load balancing when a forwarding rule sends a packet to a LAG interface.
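The intra-LAG load balancing is usually a hash of packet header fields that pins every flow to one member link (to avoid packet reordering). Here is a minimal Python sketch of that idea; the LAG name, member port numbers, and hashed fields are made up for illustration, and real switches do this in hardware:

```python
import hashlib

# Hypothetical LAG: logical interface "po1" maps to three physical member ports.
LAG_MEMBERS = {"po1": [5, 6, 7]}

def select_member_port(lag_name, src_mac, dst_mac, src_ip, dst_ip):
    """Hash flow fields to pick a member link, so a given flow always
    uses the same physical port and its packets are never reordered."""
    key = f"{src_mac}{dst_mac}{src_ip}{dst_ip}".encode()
    digest = int(hashlib.md5(key).hexdigest(), 16)
    members = LAG_MEMBERS[lag_name]
    return members[digest % len(members)]

# The controller's flow entry just says "output to po1"; the switch
# resolves that to a physical member port on its own:
print(select_member_port("po1", "00:00:5e:00:aa:01", "00:00:5e:00:bb:02",
                         "10.0.0.1", "10.0.0.2"))
```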

Multi-chassis LAG (MLAG/vPC) implemented on the switch does not work with OpenFlow. It’s impossible to tell the controller about the two halves of a LAG: each physical switch is an independent OpenFlow instance, and its part of the MLAG bundle would be presented to the controller as a separate interface.

Handle it in the controller. LACP is a control-plane protocol, and the controller runs it toward the outside world just like it would run any other control-plane protocol. This design obviously leads to severe scalability challenges, which prompted NEC to fall back to static port channels in their ProgrammableFlow implementation.

In an OpenFlow-only world it’s trivial to terminate a LAG on multiple switches (assuming we ignore the scalability challenges for a moment). The LACP traffic is sent to the controller anyway, and the controller programs the forwarding entries in all switches (similar to what the switches would do in an MLAG environment).
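To illustrate the idea, here’s a rough Python sketch of a controller that learns LAG membership from LACPDUs punted by several switches and then points the forwarding entries on every member switch at its local link. The install_flow callback, switch names, and port numbers are hypothetical stand-ins for whatever southbound mechanism (OpenFlow FLOW_MOD messages) a real controller would use:

```python
from collections import defaultdict

# LACP partner system-id -> list of (switch, port) learned from LACPDUs
# punted to the controller by different switches.
lag_membership = defaultdict(list)

def lacpdu_received(switch, port, partner_system_id):
    """Called whenever a switch punts an LACPDU to the controller."""
    member = (switch, port)
    if member not in lag_membership[partner_system_id]:
        lag_membership[partner_system_id].append(member)

def install_host_flows(dst_mac, partner_system_id, install_flow):
    """Point traffic for dst_mac at the local LAG member link on every
    switch that has one (roughly what MLAG members do in hardware)."""
    for switch, port in lag_membership[partner_system_id]:
        install_flow(switch, match={"eth_dst": dst_mac},
                     actions=[("output", port)])

# Two ToR switches each report one link toward the same server NIC team.
lacpdu_received("tor-1", 10, "server-A")
lacpdu_received("tor-2", 10, "server-A")
install_host_flows("00:50:56:aa:bb:cc", "server-A",
                   install_flow=lambda sw, match, actions: print(sw, match, actions))
```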

Obviously the OpenFlow controller has to reinvent all the wheels, including orphan port handling, loop prevention across MLAG members, and so on. I still think it doesn’t make sense to reinvent all those wheels, but people working on OpenFlow controllers that manage physical devices clearly disagree with that sentiment.

Offload LACP to the switches. While the OpenFlow controller owns all LACP sessions, the individual switches send, receive, and process the periodic LACP packets to improve the scalability of the solution; this is how Big Switch Networks made their fabric work.
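Conceptually, the offload splits LACP between the controller (which decides the actor parameters such as system ID and key) and the switch (which sources and times out the periodic LACPDUs locally). The following Python sketch only models that division of labor; it is not how Big Switch actually implemented it, and every name and value in it is made up:

```python
from dataclasses import dataclass

@dataclass
class LacpActorConfig:
    system_id: str       # chosen by the controller for the whole fabric
    key: int             # ties the member ports into one aggregate
    port_priority: int

class SwitchLacpAgent:
    """Runs on the switch: only the periodic/timeout machinery lives here,
    so losing the controller doesn't immediately flap the LAG."""
    def __init__(self, config: LacpActorConfig):
        self.config = config

    def every_second(self, send):
        # The switch, not the controller, sources the periodic LACPDU.
        send(f"LACPDU actor={self.config.system_id} key={self.config.key}")

agent = SwitchLacpAgent(LacpActorConfig("02:00:00:00:00:01", key=17, port_priority=32768))
agent.every_second(print)
```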

There’s no way you could pull this off with the OpenFlow we know today (as of OpenFlow 1.5); the only way to make OpenFlow work in a large-scale environment is still through proprietary extensions.

Finally, let me mention that Nexus 1000V has supported LACP offload for years… and it took OpenFlow vendors at least as long to get past their religious beliefs and implement what makes sense from an engineering perspective.

Need more?

Six hours of OpenFlow deep dive and 20+ hours of SDN training are just a few clicks away… or you could go for the subscription package which gives you access to 100+ hours of high-quality advanced networking materials.

4 comments:

  1. Good post Ivan. Do you know if host-mlag is supported by any of the vendors? I believe Bigswitch has a mechanism for mlag between fabric switches. I could not find even proprietary mechanisms to establish host-mlag which is a critical requirement in Enterprises.
  2. What would "host-mlag" be? Connecting the same host to multiple ToR switches or something else?
    Replies
    1. Yes Ivan, same host connecting to two different ToR switches to form NIC teaming or bonding. Sorry for the use of confusing terminology.
    2. I think BSN do that with their BCF, as does NEC with their ProgrammableFlow (NEC didn't have LACP the last time I checked).