VXLAN, OTV and LISP

Immediately after VXLAN was announced @ VMworld, the twittersphere erupted in speculation and questions, many of them focusing on how VXLAN relates to OTV and LISP, and why we might need a new encapsulation method.

VXLAN, OTV and LISP are point solutions targeting different markets. VXLAN is an IaaS infrastructure solution, OTV is an enterprise L2 DCI solution and LISP is ... whatever you want it to be.

VXLAN tries to solve a very specific IaaS infrastructure problem: replace VLANs with something that might scale better. In a massive multi-tenant data center with thousands of customers, each asking for multiple isolated IP subnets, you quickly run out of VLANs. VMware tried to solve the problem with MAC-in-MAC encapsulation (vCDNI), and you could potentially do the same with the right combination of EVB (802.1Qbg) and PBB (802.1ah), very clever tricks à la Network Janitor, or even with MPLS.

Compared to all these, VXLAN has a very powerful advantage: it runs over IP. You don’t have to touch your existing well-designed L3 data center network to start offering IaaS services. The need for multipath bridging voodoo magic that a decent-sized vCDNI deployment would require is gone. VXLAN gives Cisco and VMware the ability to start offering a reasonably-well-scaling IaaS cloud infrastructure. It also gives them something to compete against the Open vSwitch/Nicira combo.
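To make the scaling argument concrete, here’s a minimal sketch of the data-plane encapsulation described in the draft: the inner Ethernet frame gets an 8-byte VXLAN header carrying a 24-bit segment ID (VNI) and is carried as a UDP payload between hypervisor VTEPs. The field layout below follows the draft; the helper name and sample values are mine.

```python
import struct

VNI_BITS = 24                    # VXLAN Network Identifier width
MAX_SEGMENTS = 2 ** VNI_BITS     # ~16M segments, vs. 4094 usable 802.1Q VLANs

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags ("I" bit set), 24 reserved bits,
    24-bit VNI, 8 reserved bits."""
    assert 0 <= vni < MAX_SEGMENTS
    first_word = 0x08 << 24      # "I" flag = VNI present; remaining bits reserved
    second_word = vni << 8       # VNI occupies the top 24 bits of the second word
    return struct.pack("!II", first_word, second_word)

# The hypervisor VTEP prepends this header to the original (inner) Ethernet frame
# and sends the result inside an outer UDP/IP packet toward the remote VTEP.
vxlan_payload = vxlan_header(5000) + b"<inner Ethernet frame>"
```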

Reading the VXLAN draft, you might notice that all the control-plane aspects are solved with handwaving. Segment ID values just happen, IP multicast addresses are defined at the management layer and the hypervisors hosting the same VXLAN segment don’t even talk to each other, but rely on layer-2 mechanisms (flooding and dynamic MAC address learning) to establish inter-VM communication. VXLAN is obviously a QDS (Quick-and-Dirty-Solution) addressing a specific need – increasing the scalability of IaaS networking infrastructure.
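To illustrate what “relying on layer-2 mechanisms” means in practice, here’s a minimal sketch of the flood-and-learn behavior the draft implies, assuming the management layer has already provisioned the VNI-to-multicast-group mapping. The class and method names are illustrative, not taken from any actual VTEP implementation.

```python
class Vtep:
    """Hedged sketch of a VTEP data plane with no real control plane."""

    def __init__(self, vni_to_mcast_group: dict):
        # VNI -> IP multicast group, provisioned by the management layer
        self.vni_to_mcast_group = vni_to_mcast_group
        # (VNI, inner MAC) -> remote VTEP IP, learned from received traffic
        self.mac_table = {}

    def outer_destination(self, vni: int, inner_dst_mac: str) -> str:
        """Known MACs are sent unicast to the VTEP that last sourced them;
        unknown or broadcast destinations are flooded to the segment's multicast group."""
        return self.mac_table.get((vni, inner_dst_mac),
                                  self.vni_to_mcast_group[vni])

    def learn(self, vni: int, inner_src_mac: str, outer_src_ip: str) -> None:
        """Classic dynamic MAC learning, keyed on the outer VTEP IP instead of a switch port."""
        self.mac_table[(vni, inner_src_mac)] = outer_src_ip

# Illustrative usage: flood first, then unicast once the MAC has been learned
vtep = Vtep({5000: "239.1.1.1"})
vtep.outer_destination(5000, "aa:bb:cc:dd:ee:ff")   # -> "239.1.1.1" (flood)
vtep.learn(5000, "aa:bb:cc:dd:ee:ff", "10.0.0.2")
vtep.outer_destination(5000, "aa:bb:cc:dd:ee:ff")   # -> "10.0.0.2" (unicast)
```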

VXLAN will indeed scale way better than a VLAN-based solution, as it provides total separation between the virtualized segments and the physical network (no need to provision VLANs on the physical switches), and it will scale somewhat better than MAC-in-MAC encapsulation because it relies on L3 transport (and can thus work well in existing networks), but it’s still a very far cry from Amazon EC2. People with extensive (bad) IP multicast experience are also questioning the wisdom of using IP multicast instead of source-based unicast replication ... but if you want to remain control-plane ignorant, you have to rely on third parties (read: IP multicast) to help you find your way around.
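For contrast, here’s a rough sketch of the two BUM-replication approaches, assuming hypothetical send_udp and per-VNI lookup dictionaries (none of these names come from the draft). Multicast pushes the replication work into the transport network (which then has to run PIM or something similar), while source-based unicast replication keeps the core dumb but requires the sending VTEP to know every remote VTEP in the segment, in other words some kind of control plane.

```python
import socket

VXLAN_UDP_PORT = 4789   # the port IANA later assigned; the original draft predates the assignment

def send_udp(payload: bytes, dst_ip: str, port: int = VXLAN_UDP_PORT) -> None:
    """Minimal UDP send helper, standing in for the VTEP's encapsulation path."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(payload, (dst_ip, port))

def flood_multicast(payload: bytes, vni: int, vni_to_mcast_group: dict) -> None:
    """Draft behavior: send one copy, let the IP multicast tree handle delivery."""
    send_udp(payload, vni_to_mcast_group[vni])

def flood_unicast(payload: bytes, vni: int, remote_vteps: dict) -> None:
    """Source-based (head-end) replication: N copies, no multicast in the core,
    but the sender must somehow learn the list of remote VTEPs."""
    for vtep_ip in remote_vteps[vni]:
        send_udp(payload, vtep_ip)
```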

It seems there have already been claims that VXLAN solves inter-DC VM mobility (I sincerely hope I’ve got the wrong impression from Duncan Epping’s summary of Steve Herrod’s general session @ VMworld). If you’ve ever heard about traffic trombones, you should know better (but it does prove a point @etherealmind made recently). Regardless of the wishful thinking and beliefs in flat earth, holy grails and unicorn tears, a pure bridging solution (and VXLAN is no more than that) will never work well over long distances.

Here’s where OTV kicks in: if you do become tempted to implement long-distance bridging, OTV is the least horrendous option (BGP MPLS-based MAC VPN would be even better, but it still seems to work primarily in PowerPoint). It replaces dynamic MAC address learning with deterministic routing-like behavior, provides proxy ARP services, and stops unicast flooding. Until we’re willing to change the fundamentals of transparent bridging, that’s almost as good as it gets.
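As a rough illustration of that behavioral difference (this is not OTV’s actual IS-IS machinery, and all the names below are made up), an OTV edge device forwards only MAC addresses it has learned through the control plane and drops unknown unicast instead of flooding it across the DCI link:

```python
from typing import Dict, Optional

class OtvEdgeSketch:
    """Illustrative model of OTV-style forwarding: control-plane-learned MACs only."""

    def __init__(self):
        self.mac_table: Dict[str, str] = {}   # MAC address -> remote edge device IP

    def control_plane_update(self, mac: str, edge_ip: str) -> None:
        """MAC reachability is advertised proactively (OTV uses IS-IS for this),
        not gleaned from flooded data-plane traffic."""
        self.mac_table[mac] = edge_ip

    def forward(self, dst_mac: str) -> Optional[str]:
        """Unknown unicast is dropped (None) rather than flooded over the overlay."""
        return self.mac_table.get(dst_mac)
```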

As you can see, it makes no sense to compare OTV and VXLAN; it’s like comparing a racing car to a downhill mountain bike. Unfortunately, you can’t combine them to get the best of both worlds; at the moment, OTV and VXLAN live in two parallel universes. OTV provides long-distance bridging-like behavior for individual VLANs, and VXLAN cannot even be transformed into a VLAN.

LISP is yet another story. It provides a very rudimentary approximation of IP address mobility across layer-3 subnets, and it might be able to do it better once everyone realizes the hypervisor is the only place to do it properly¹. However, it’s a layer-3 solution running on top of layer-2 subnets, which means you might run LISP in combination with OTV (not sure it makes sense, but nonetheless). You would also be able to run LISP in combination with VXLAN once you can terminate VXLAN on a LISP-capable L3 device.
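For readers unfamiliar with LISP, the core idea is a level of indirection: endpoint identifiers (EIDs) are resolved through a mapping system into routing locators (RLOCs), and traffic is then tunneled between ingress and egress tunnel routers. The sketch below is purely conceptual; the mapping entries and function names are made up for illustration.

```python
from typing import Optional, Tuple

# EID prefix -> RLOC of the egress tunnel router (xTR) currently announcing it.
# When a VM (and thus its EID) moves, only this mapping has to change.
mapping_system = {
    "10.1.1.10/32": "192.0.2.1",
}

def lisp_encapsulate(packet: bytes, outer_dst: str) -> Tuple[str, bytes]:
    """Stand-in for LISP-in-UDP encapsulation toward the RLOC."""
    return (outer_dst, packet)

def itr_forward(dst_eid_prefix: str, packet: bytes) -> Optional[Tuple[str, bytes]]:
    """Ingress tunnel router behavior: look up the EID, tunnel toward its current RLOC.
    A lookup miss would trigger a Map-Request in a real implementation."""
    rloc = mapping_system.get(dst_eid_prefix)
    if rloc is None:
        return None
    return lisp_encapsulate(packet, outer_dst=rloc)
```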

So, with the introduction of VXLAN, the networking world hasn’t changed a bit: the vendors are still serving us all isolated incompatible technologies ... and all we’re asking for is tightly integrated and well-architected designs.


  1. This idea died before it got anywhere close to a product. ↩︎

14 comments:

  1. Shouldn't it be titled "The good, the bad and the ugly"? Oh wait, there is no good ;)
  2. VXLAN really seems to be a bottom-up design, bringing networking up into virtualization, rather than a top-down one (starting from config and operational perspectives) that extends better control into network operation. You've pointed out the key issue that there's just "handwaving" around management. OF and its ilk at least seem to start from the premise that there has to be management, no matter how poorly defined it is today.
  3. Add to the fun that there's an 'L2 in LISP' draft out there.
  4. According to the Nexus 1000v product manager, we should "stay tuned" for the ability to terminate VXLANs on other devices. That is, if I interpreted his comment correctly. At the moment, VXLANs are appropriate for L2-only VLAN needs. For example, a VLAN behind a virtual firewall or load balancer.
  5. Ivan, thanks for the insightful perspective on how all these interrelate. Agree that IaaS is supposed to be the main target/beneficiary of this. But just how many of them want to be running Cisco and VMware???

    Another point that's not widely appreciated is that getting in and out of these VxLANs starts to look a whole lot like routing, very likely creating islands of VxLANs within the provider that are routed out to the rest of the world.

    And while I agree that 'introduction of VXLAN, the networking world hasn’t changed a bit: the vendors are still serving us all isolated incompatible technologies', I'll also argue it makes the remaining gaps more apparent and important.
  6. I'm not holding my breath. Even if you can terminate it on another device, it will be a Nexus. I don't need 16 million VLAN domains. It would be nice if they made a passing attempt at supporting an existing technology like Q-in-Q or MPLS. It's not like they don't have the code to do it...
  7. Great post Ivan! Like you said in the post, one big question I had after reading the VXLAN draft was: how does a VTEP allocate a VNI to the VMs directly attached to it? This would create issues when a VM migrates to a physical server connected to a different VTEP. Since VMs belonging to different tenants can have the same MAC and IP addresses, how does the new VTEP know the VNI of the migrated VM?
    Replies
    1. VNI is assigned to a virtual network by the management plane. For example, you could do it in Nexus 1000V configuration or it could be assigned to a port group automatically by vCloud Director. In both cases, all hypervisors would know which VNI to use for which port group via the VDS configuration.
  8. Just as an update / question - I think PBB-EVPN has replaced the BGP MAC VPN ... yes?

    (That is to say, I believe http://tools.ietf.org/html/draft-ietf-l2vpn-pbb-evpn-04 replaces http://tools.ietf.org/html/draft-raggarwa-mac-vpn-01 ...)

    Sound about right?
    Any updated thoughts on OTV/VxLAN(/FabricPath)/etc. when factoring PBB-EVPN into the conversation?
    Replies
    1. No. PBB-EVPN might be an interesting SP technology (to replace VPLS), but not a DC one (regardless of what Contrail tells you).
    2. Hi Ivan,

      Why do you consider PBB-EVPN a poor technology for interconnecting DCs? Do you know if there is an existing comparison between PBB-EVPN and OTV to see which technology is better in this case?
  9. Hi,

    In a secure DC with multiple security zones, who will allow traffic to tunnel through firewalls? That is what LISP proposes, up to now, from what I can see.

    So at some point outside the DC, the correct entrance point to the DC (based on the VM location) is chosen by consulting the Mapping Server. The traffic is then tunneled into the DC in LISP encapsulation, which can't be properly inspected by the firewall... unless this point has been changed.
  10. You have to shift your perspective - the internal DC network (IP transport) becomes a single security zone (like a VLAN today), with firewalling done in VXLAN-capable firewalls (or virtual appliances or NIC-level firewalls)
  11. Ok, thanks. I'm sure you are right; I have been stuck on this point with LISP. I'm wondering whether, as it grows in use, firewall vendors will inspect the internal IP packet of LISP-encapsulated data. Maybe it won't happen, and like you said, the future of DC security is all about VM/hypervisor-level firewalls.
  12. One of the purposes of LISP is to reduce the size of the Internet routing table. As I see it, before the whole Internet is LISP-enabled, LISP-enabled sites still need to set up a PITR (Proxy ITR) and advertise their EID subnets to attract traffic initiated by non-LISP clients. During the transition period, there is no reduction in the size of the Internet routing table.

    Since the traffic is proxied (tunneled) from the PITR to the ETR, the path for non-LISP-enabled sites is Client-PITR-ETR-EID. This might not be the optimal path and may have a performance impact that hinders the adoption of this technology.

    To put it in one statement: enabling LISP only benefits customers that have LISP enabled, which is a minority of the Internet community. The majority of customers may get worse performance until they switch to LISP. This makes me think adopting LISP might not be a wise decision.

    If non-LISP-enabled customers got the same performance, it would be an easier decision.

    If LISP is where the Internet is going, a shared (or free) PITR infrastructure may accelerate the implementation.
  13. A more significant potential show-stopper for LISP adoption is its security vulnerabilities: http://datatracker.ietf.org/doc/draft-ietf-lisp-threats/
  14. Suddenly BGP EVPN kicks in instead of LISP as the control plane of VXLAN.

    I am still not convinced: why not just improve the MPLS encapsulation rather than introduce a whole new system to solve some old issues?
    Replies
    1. Things have changed a lot after a decade. The way the control and data planes operate now makes every desired capability available.

      Now we have BGP EVPN, which can use VXLAN purely as encapsulation, and Cisco's stubborn way of doing EVPN with LISP/VXLAN (SDA).

      I have yet to understand Cisco's reasoning for choosing LISP over BGP; LISP can also be used as an encapsulation, but why isn't it there even now?

    2. LISP is preferred by Cisco for SDA, since they do not need the complexity of the BGP path-selection algorithm. Enterprise LANs are moving to WiFi, private 5G, multiple branches, home offices, etc., and as generic mobility becomes a basic need, LISP is better suited for including mobility with network address session continuity in enterprise connectivity. Mobility with BGP is terrible in performance; there were some failures in history, such as Boeing Connexion. The handover time with LISP could be in the low-millisecond range, and it could support tens of thousands of frequently moving users. This was tested with moving robots; Victor Moreno has a YouTube video and slide set on the LISP mobility scalability tests. Additionally, LISP could also provide active multi-link mobility. This is currently under standardization for aviation networks. You can find more at the CORDIS portal under PJ14 Solution 77 FCI.
