VPLS versus OTV for L2 Data Center Interconnect (DCI)

DJ Spry asked an interesting question in a comment to my MPLS/VPN in DCI designs post: “Why would one choose OTV over MPLS/VPN?” The answer is simple: it depends on what you need. MPLS/VPN provides path isolation between layer-3 domains (routed networks) across MPLS or IP infrastructure, whereas OTV provides layer-2 transport (and VLAN-based path isolation) across IP infrastructure. However, it does make sense to compare OTV with VPLS (which was DJ Spry’s next question). Apart from the obvious platform dependence (OTV runs on Nexus 7000, VPLS runs on Catalyst 6500/Cisco 7600 and a few other routers), which might disappear once ASR1K gets the rumored OTV support, there’s a huge gap in functionality and complexity between the two layer-2 transport technologies.

VPLS (Virtual Private LAN Service) is a hodgepodge of kludges grown “organically” over the last decade. It all started with the MPLS/VPN revolution – service providers were replacing their existing L2 gear (ATM and Frame Relay switches) with routers, but still needed a life support mechanism for legacy cash cows. It didn’t take long for someone to invent pseudowires over MPLS, which supported Frame Relay and frame-mode ATM first, with cell-mode ATM and Ethernet eventually added to the mix. Not surprisingly, the technology is called AToM (Any Transport over MPLS).

A bit later, L3 vendors (Cisco and Juniper) got stuck in their MPLS/VPN blitzkrieg: their customers (traditional service providers) didn’t have the knowledge needed to roll out enterprise-grade L3 service (MPLS/VPN), so they wanted to keep it simple (like in the good old days) and provide L2 transport. Sure, why not ... it took just a few more kludges to provision a full-mesh of pseudowires, add dynamic MAC learning and later BGP-based PE-router autodiscovery and the brand-new VPLS technology was ready for business.

When the Data Center engineers wanted to implement L2 DC interconnects (a huge mistake if there ever was one), the VPLS technology was readily available, and in another “doing more with less” epiphany a square peg was quickly hammered into a round hole.

To say that VPLS was less than a perfect fit for L2 DCI needs would be the understatement of the year. VPLS never provided standard PE-router (DC edge device) redundancy (the ICCP protocol is in its early stages, although it does seem to be working on ASR9K), so you had to heap a number of additional kludges on top of it (there is a whole book describing the kludges you have to use to get VPLS working in DCI scenarios) or merge two edge devices into a single logical device (with VSS and A-VPLS).

Furthermore, VPLS (at least Cisco’s implementation of it) relies on MPLS transport; if your DCI link has to use IP infrastructure, you have to configure MPLS over GRE tunnels before you can configure VPLS.

Last but definitely not least, as Cisco never supported point-to-multipoint LSP, the multicast and broadcast packets sent over VPLS (as well as unknown unicast floods) are replicated at the ingress device (inbound DC edge device) with multiple copies sent across DCI infrastructure.
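
To make the replication cost concrete, here is a minimal back-of-the-envelope sketch in Python. The function names and the site counts are illustrative assumptions, not anything taken from a vendor implementation; the sketch only counts how many copies of a single flooded frame (broadcast, multicast or unknown unicast) the ingress edge device has to push into the DCI infrastructure.

```python
def ingress_replication_copies(num_sites: int) -> int:
    """Copies sent by the ingress device when it replicates a flooded
    frame itself, one copy per remote site (full mesh of pseudowires)."""
    if num_sites < 2:
        return 0
    return num_sites - 1


def core_replication_copies(num_sites: int) -> int:
    """Copies sent by the ingress device when the transport core does the
    replication (IP multicast or a point-to-multipoint LSP)."""
    return 1 if num_sites >= 2 else 0


if __name__ == "__main__":
    # Hypothetical site counts, chosen only for illustration.
    for sites in (2, 4, 8):
        print(
            f"{sites} sites: ingress replication sends "
            f"{ingress_replication_copies(sites)} copies, "
            f"core replication sends {core_replication_copies(sites)}"
        )
```

The gap grows linearly with the number of sites, which is why it hurts most on the (usually expensive) DCI uplink of the ingress site.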

OTV (Overlay Transport Virtualization) is a clean-slate L2 transport over IP design. It does not use tunnels (at least not conceptually, you could argue that the whole OTV cloud is a single multipoint tunnel) or intermediate shim technologies but encapsulates MAC frames directly into UDP datagrams. It does not rely on dynamic MAC address learning but uses IS-IS to propagate MAC reachability information. There is no unknown unicast flooding (bridging-abusing brokenware is supported with manual configuration) and L2 multicasts are turned into IP multicasts for optimal transport across DCI transport backbone (assuming the transport backbone can provide IP multicast services).
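
The difference between flood-and-learn bridging and control-plane MAC distribution can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class, the method names and the addresses are hypothetical, and the `advertise` call merely stands in for whatever the routing protocol (IS-IS in OTV's case) delivers to the edge device. It is not a description of the NX-OS implementation.

```python
from typing import Dict, Optional, Tuple


class OverlayMacTable:
    """Toy MAC reachability table populated by the control plane only."""

    def __init__(self) -> None:
        # (vlan, mac) -> IP address of the remote edge device advertising it
        self._routes: Dict[Tuple[int, str], str] = {}

    def advertise(self, vlan: int, mac: str, edge_ip: str) -> None:
        """Install a MAC route received from a remote edge device."""
        self._routes[(vlan, mac.lower())] = edge_ip

    def withdraw(self, vlan: int, mac: str) -> None:
        """Remove a MAC route; silently ignore unknown entries."""
        self._routes.pop((vlan, mac.lower()), None)

    def lookup(self, vlan: int, mac: str) -> Optional[str]:
        return self._routes.get((vlan, mac.lower()))


def forward_unicast(table: OverlayMacTable, vlan: int, dst_mac: str) -> str:
    """Known MACs go to the advertising edge device; unknown unicast is
    dropped instead of being flooded across the overlay."""
    edge_ip = table.lookup(vlan, dst_mac)
    if edge_ip is None:
        return "drop"
    return f"encapsulate to {edge_ip}"


if __name__ == "__main__":
    table = OverlayMacTable()
    table.advertise(100, "00:11:22:33:44:55", "192.0.2.1")  # example values
    print(forward_unicast(table, 100, "00:11:22:33:44:55"))
    print(forward_unicast(table, 100, "00:aa:bb:cc:dd:ee"))
```

A traditional bridge would flood the second frame to every remote site; here it never leaves the data center.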

OTV (like other modern L2 technologies) also solves multihoming issues – it uses an Authoritative Edge Device approach very similar to TRILL’s appointed forwarder. There are additional goodies like ARP snooping and active-active forwarding (with VPC) ... and the icing on the cake is its beautifully simple configuration syntax (until, of course, large customers start asking for knobs solving their particular broken designs and a full-speed feature creep kicks in).
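
The idea behind an authoritative edge device can be illustrated with a short Python sketch: for every VLAN, exactly one edge device at a multihomed site forwards traffic to and from the overlay, so frames cannot loop through the second device. The selection rule below (sort the device names, then take the VLAN ID modulo the number of devices) and the device names are assumptions made for illustration; the actual election procedure is not described in this post.

```python
from typing import Dict, List


def authoritative_edge_device(vlan: int, site_devices: List[str]) -> str:
    """Pick the single device responsible for a VLAN at one site.

    Hypothetical rule: every device sorts the same list and applies the
    same modulo, so all of them reach the same answer independently.
    """
    if not site_devices:
        raise ValueError("a site needs at least one edge device")
    ordered = sorted(site_devices)
    return ordered[vlan % len(ordered)]


def vlan_assignment(vlans: List[int], site_devices: List[str]) -> Dict[int, str]:
    """Map every extended VLAN to its authoritative device."""
    return {vlan: authoritative_edge_device(vlan, site_devices) for vlan in vlans}


if __name__ == "__main__":
    # Two hypothetical edge devices at one site.
    print(vlan_assignment([10, 11, 12, 13], ["edge-b", "edge-a"]))
```

With two devices the rule splits the VLANs between them, so both are used while each VLAN still has a single forwarder.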

The only grudge I have with OTV at the moment is that its current implementation still feels like a proof-of-concept (I know OTV aficionados are jumping up and down at this point ;):

  • Maximum number of devices in an OTV overlay is 6; 3 sites @ 2 devices each.
  • It requires IP multicast in the transport IP core (if your IP transport infrastructure doesn’t provide IP multicast, you have to insert an extra layer of devices running IP MC over GRE tunnels); unicast mode is supposedly coming with NX-OS release 5.2.
  • Nexus 7000 behaves like an IP host on the OTV side. It must use physical interfaces; the only redundancy you can get is a port channel (loopback interface support with routing protocol-based redundancy was promised for a future release).

More information

The Data Center Interconnects webinar describes numerous L2 DCI technologies, including VPLS, A-VPLS, OTV, TRILL, BGP MPLS based MAC VPN (from Juniper) and EtherIP between load balancers (F5). You’ll also discover things you never wanted to know about L2 DCI caveats and challenges.

The Choose the Optimal VPN Service webinar describes numerous VPN services (including MPLS/VPN, pseudowires and VPLS) from the customer’s perspective.

Recordings of both webinars are available as part of the yearly subscription.

The Favela Nova Friburgo and House on Mount Radmore Crescent photographs are from Wikimedia Commons.

15 comments:

  1. Amen. VPLS and A-VPLS are really for folks who have a vast majority of Cat6K and don't want to spend extra $$ to redesign their infra with N7K. That might be the wise thing to do: wait until OTV matures.

  2. About a year ago, I tried making a blog post on exactly the same topic :) Back then OTV was still more of an "obscure" technology, so I had to dig up Cisco's patent dating back to 2007-2008. I believe Cisco has a pretty good comparison of OTV/VPLS on their website as well:

    http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9402/white_paper_c11-574984.html

    In my opinion, it is more correct to compare OTV to MAC-based MPLS VPNs than to VPLS. The MAC-based VPNs are progressing slowly through the IETF, though, so I'm not sure they will make it to the end :) Oh, another good reference to compare with would be IPLS, which implemented MAC address learning based on protocol signaling.

    As for P2MP support in VPLS (we have IGMP snooping at least!), I hope Cisco will finally make it available this year - now that they have had mLDP implemented for quite a while.

  3. Oh, that was me, by the way :)

  4. Hi Ivan, nice topic!
    During my broadcast experience (IPTV), I have appreciated multicast, especially IGMPv3 with double encapsulation source and same IP source, destination M group, to achieve the best reliability in the core network (squared topology with double PE and P).

    In and between the DCs, why should we use multicast? Which applications make intensive use of multicast between and inside DCs? I suppose unicast is predominant in and between DCs, isn't it?
    Could I use multicast for backup purposes?

    thanks a lot

    P.S.
    However, I think P2MP LSPs and P2P MPLS-TE tunnels are not necessary for IPTV (it is difficult to manage a double source with the same address). Yes, 50 ms of FRR is not so bad, but fine-tuning the IGP (in global) offers 150 ms in my case. Sure, we have a private backbone and that makes everything easier. Yes, we had used the classical MPBGP MPLS-VPN (I read your book) for intranet, management and VoIP.

  5. or don't use Cisco hardware

  6. Lots of kludges, I can't say I truly understand, I'm hoping someday to catch one of your courses.

    But in my company's implementation of metro-ethernet/application vrf/L3-VPN I believe we use VPLS (over MPLS) between the CE and PE. Then in the core it's BGP L3 VPN.

    To me this all seems like kludges. OTV seems to have a chance of simplifying these architectures, that is, until AT&T, Verizon and the rest start throwing their weight around at Cisco and Juniper to throw in their kludges.

  7. Ivan Pepelnjak, 16 April, 2011 07:49

    OTV uses multicast on the overlay to distribute L2 broadcasts/multicasts between edge devices. You have to propagate broadcasts across the WAN if you want to emulate L2 infrastructure and using IP MC for that is the optimal solution (assuming, of course, your WAN backbone supports IP MC).

  8. Ivan Pepelnjak, 16 April, 2011 07:51

    Whether it's kludgy or not depends on what you're trying to achieve. For example, L3 VPN is best for path isolation. If that's what you need, you can't get a more scalable solution than MPLS/VPN.

  9. “assuming, of course, your WAN backbone supports IP MC”
    And I completely agree with you if we want to deliver transparent L2 intra-datacenter and across a private/service provider WAN, for example to use the vMotion service.

    But for DC interconnectivity across an IP cloud like the Internet (no native multicast support, only L3 end-to-end connectivity), can DMVPN be a valid/the best solution today?

  10. Another great post, Ivan, and I enjoy the analogies. Kludges, true. I went through the “Interconnecting VPLS” book this past summer. Though it was interesting in terms of how things were done and the use of certain protocols to achieve something unnatural to those protocols (analogy alert!), the appearance of a round peg in a square hole is present. The use of EEM scripts to clear MAC address tables et al. makes for a bowl of kludge.

    VPLS for certain solutions (pseudowires/xconnect) may be the choice in unique situations, but not for data center bridging/linking.

    OTV appears, as others here have mentioned, to be the opportunity to simplify the architecture and approach to handling data center bridging needs. Let’s hope so. Otherwise, more “split brain” for the network architect/engineer.

    Regards…

  11. Hello Ivan, thanks for this article. I think the DF (Don't Fragment) bit set in all 'OTV packets' is also worth mentioning here. Regarding tunnels and OTV, the transport of the control plane and data plane is all GRE-based today, but is likely to evolve to UDP-based in future NX-OS releases. Otherwise I think it's just a great concept for some deployments, especially for layer 2 multicast applications.
    Cheers !
    michal

  12. Ivan Pepelnjak, 27 April, 2011 09:40

    Thanks for reminding me. The DF bit is a must; without it, the receiver gets burnt out in the reassembly process.

  13. As you say, probably not easy for a wire-speed switch. In general the NX-7K is not able to fragment, but reassembly seems to be a more difficult task in terms of processing, so that's why they are setting DF in each IP header of the OTV packet.
    The missing part for me is the interoperability of the future UDP-based OTV ( http://tools.ietf.org/html/draft-hasmit-otv-01 ) with the current GRE-based OTV.

  14. Hi,

    I wondered if you have any comments on running multicast across OTV extensions? We have hit a whole bunch of bugs and problems over the last year or so, and it doesn't seem to have been thought through all that thoroughly. The technology was deployed before I joined the company, and I wouldn't have deployed something that looks so untested had I been here earlier. It also doesn't appear to be fit for purpose anyway, based on the limits:

    We have messaging applications that use a single mcast address as an index, so this group is sent and received on all clients. Although we have a small number of users, the OTV mcast groups soon mount up and have hit the ~2000 limit. I do like the ground-up design of OTV, but it seems pretty crazy to have such a low limit on the number of OTV mcast routes. I don't see how this could be considered a viable or best alternative to other L2 extension technologies when running multicast.

  15. Please excuse the anonymous. Just to add, the mcast routes are doubling up per vlan and after extending 5-10 vlans have reached the limits of OTV mroutes.


Ivan Pepelnjak, CCIE#1354, is the chief technology advisor for NIL Data Communications. He has been designing and implementing large-scale data communications networks as well as teaching and writing books about advanced technologies since 1990. See his full profile, contact him or follow @ioshints on Twitter.