
Replacing FabricPath with VXLAN, EVPN or ACI?

One of my friends plans to replace an existing FabricPath data center infrastructure and asked whether it would make sense to stay with FabricPath (using the new Nexus 5600 switches) or migrate to ACI.

I proposed a third option: go with simple VXLAN encapsulation on Nexus 9000 switches. Here’s why:

  • I don’t think FabricPath has a long-term future. It was one of those crazy kludges invented to solve the wrong problem. To make matters worse, it uses a proprietary encapsulation that is not supported (AFAIK) by any merchant silicon out there.
  • I don’t think L2 fabrics like TRILL or SPB have a bright future. Avaya is the only vendor still actively promoting an L2 fabric in the data center; almost everyone else is talking about VXLAN (including Brocade).
  • L2 fabrics are reinventing the routing wheel. VXLAN runs on top of a well-greased wheel that has been around for decades: IP transport.
  • ACI is interesting, but it’s still relatively new and has many moving parts. Using something more conservative might make more sense in some environments.
  • ACI (and FabricPath) are single-vendor solutions. Bare-bones VXLAN using either IP multicast or configured head-end replication works across multiple vendors.

Want to know more about data center fabric solutions from various vendors? Data Center Fabrics webinar has dozens of videos describing solutions from all major vendors.

Which VXLAN are we talking about?

After that recommendation my friend sent me a follow-up question:

I assume I’d have to configure multicast and OSPF on the N9Ks so we have a routed L3 leaf/spine architecture, and then enable VTEP functionality on the leaf switches so they act as gateways from the classic LAN to VXLAN, which can route over the fabric?

He got two out of three right:

  • Build a routed leaf-spine architecture;
  • Configure leaf switches to run as bridges between VLANs and VXLAN segments.

However, you don’t need IP multicast for VXLAN to work. You could use EVPN control plane or configure VXLAN head-end replication on the leaf switches.

If the above paragraph doesn’t make much sense, you need the VXLAN Deep Dive webinar.
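
To make the first step concrete, the routed leaf-and-spine underlay could look roughly like this on a Nexus 9000 leaf. Treat this as a hedged sketch: the OSPF process name, interface numbers and addresses are made-up examples, not a tested configuration.

```
! Rough NX-OS sketch of the routed underlay on a leaf switch.
! Process name, interfaces and addresses are invented for illustration.
feature ospf

router ospf UNDERLAY

interface loopback0
  ip address 10.0.0.1/32
  ip router ospf UNDERLAY area 0.0.0.0

interface Ethernet1/49
  description uplink to spine-1
  no switchport
  ip address 172.16.1.1/31
  ip router ospf UNDERLAY area 0.0.0.0
```

The loopback doubles as the VTEP source address later on, which is why it goes into the underlay routing protocol.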

To EVPN or not to EVPN?

EVPN control plane for VXLAN is probably the best long-term approach. However, it’s still fresh, not supported by all the interesting vendors, and might have interoperability challenges (ever deployed SIP?).
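
If you do go down the EVPN path on Nexus 9000s, the configuration is conceptually along these lines; this is a hedged sketch based on the NX-OS VXLAN/EVPN documentation, with made-up AS and VNI numbers:

```
! Rough NX-OS sketch: BGP EVPN as the VXLAN control plane.
! AS number and VNI are made-up examples.
nv overlay evpn
feature bgp
feature nv overlay
feature vn-segment-vlan-based

router bgp 65000
  address-family l2vpn evpn

interface nve1
  no shutdown
  source-interface loopback0
  host-reachability protocol bgp
  member vni 10100
```

The key line is host-reachability protocol bgp, which replaces flood-and-learn with BGP-advertised MAC/IP reachability.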

On the other hand, if you’re not interested in EVPN’s layer-3 capabilities (for example, ARP proxies or host-based routing), static VXLAN with head-end replication works pretty well, the “only” challenge being consistent configuration across the whole fabric (you have to configure a list of all peers for every VNI on every VTEP).
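
That per-VNI peer list is exactly what makes the static approach tedious; on a Nexus 9000 it would look something like this hedged sketch (VNI and peer addresses are made-up examples):

```
! Rough NX-OS sketch: static VXLAN with head-end (ingress) replication.
! Every VTEP needs the full list of its peers for every VNI it serves.
interface nve1
  no shutdown
  source-interface loopback0
  member vni 10100
    ingress-replication protocol static
      peer-ip 10.0.0.2
      peer-ip 10.0.0.3
```

Add a VTEP (or move a VNI) and you have to touch every other switch carrying that VNI, which is why you want this generated, not hand-typed.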

Wondering what I’m talking about? You’ll find plenty of details in the Leaf-and-Spine Fabric Designs webinar.

Most data center switch vendors have some VXLAN configuration solution (example: Arista’s CloudVision or Cisco’s VTS) that you might want to look at, but if you have even basic automation skills, it’s really easy to build something lightweight on top of Ansible.
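
As an illustration of how lightweight that automation can be, here's a minimal Python sketch that computes the per-VNI head-end replication peer list for every VTEP. The inventory format is invented for this example; in practice you'd pull it from an Ansible inventory or IPAM and render the result through a Jinja2 template.

```python
# Minimal sketch: derive static head-end replication flood lists from a
# fabric inventory. The inventory format is made up for illustration.
def flood_lists(vteps):
    """vteps maps VTEP loopback IP -> set of VNIs it serves.
    Returns a dict mapping (vtep_ip, vni) -> sorted list of peer IPs."""
    result = {}
    for ip, vnis in vteps.items():
        for vni in vnis:
            # Peers for a VNI are all other VTEPs serving the same VNI
            result[(ip, vni)] = sorted(
                peer for peer, served in vteps.items()
                if vni in served and peer != ip)
    return result

fabric = {
    "10.0.0.1": {10100, 10200},
    "10.0.0.2": {10100},
    "10.0.0.3": {10100, 10200},
}
print(flood_lists(fabric)[("10.0.0.2", 10100)])  # ['10.0.0.1', '10.0.0.3']
```

Rendering those lists into per-switch configuration snippets is then a straightforward templating exercise.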

As always, you have to decide whether it makes more sense to build or buy, and it’s never an easy decision.

Don’t know how to get started? Register for my network automation workshop or explore my network automation webinars.

15 comments:

  1. There are more options for choosing an automated fabric along with EVPN, like Dell OS9 or Cisco NFM. Products more mature than EVPN are NSX, Contrail and Nuage. This is similar to a discussion we had together several years ago: which was more mature, VPLS or the new OTV with its built-in timer before convergence starts? At the time it turned out VPLS was better and converged faster; after 2-3 additional years OTV was better, at least for enterprise-class data centers. The same goes for EVPN, with the exception that NSX, Contrail and Nuage are still being developed and provide more features than EVPN. Of course there are pros and cons as always, depending on the business and technical requirements.

    1. In most cases it comes down to "how much does it cost" or "which vendor will throw in the licenses for free just to get the business" ;)

      Also, in most enterprise environments Contrail or Nuage are a non-starter.

  2. I keep thinking that the VXLAN + BGP EVPN + HSRP Anycast combo, all of which tend to go together, is too complex for a typical enterprise to configure and maintain. For small and medium-sized "cloud" providers it may be a good fit, as they can automate and reduce some of the configuration complexity. Having worked with both NSX and ACI, and other similar products (e.g. Mirantis for OpenStack), I think they are a better fit for a typical enterprise customer that may be running 2000 or fewer VMs, as they hide much of the complexity. I feel that's a necessity for smaller shops that have only 5-10 people handling all of the data center infrastructure. It's like using any version of Windows without necessarily needing to know about every single registry entry.

    1. case in point:
      http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/7-x/vxlan/configuration/guide/b_Cisco_Nexus_9000_Series_NX-OS_VXLAN_Configuration_Guide_7x/b_Cisco_Nexus_9000_Series_NX-OS_VXLAN_Configuration_Guide_7x_chapter_0100.html

    2. Hi Salman,

      For 2000 VMs you need two switches with simple VLAN trunk between them. No need for FP, VXLAN, EVPN or ACI. In that environment VXLAN might only be useful for people who firmly believe in L2 DCI.

    3. Well - I am thinking of a typical enterprise, 70-75% virtualized, with the rest being physical workloads (Exchange, DBs, appliances, maybe even a mainframe or other Unix servers) - then you'll end up needing much more than two switches :)

    4. Time to write "extending the 2-switch DC" blog post :)

  3. Although I agree with everything mentioned, I recently heard that L2 fabrics are huge in Asia. Huawei seems to have pushed TRILL further than anybody expected and is solving real problems with it. Maybe somebody in the know can comment further. Also, Brocade hasn't stopped developing their TRILL solution; their IP fabric and their VCS fabric just suit different needs, the main argument being scale (though it is true that the new SLX platform is a pure IP fabric).

    1. 25 years ago I heard Banyan Vines was huge in Australia ;))

      As for Brocade, let's see how many new features we'll see in VCS Fabric versus IP Fabric in the next 24 months. I'm not saying L2 fabrics are dead today, I'm just seeing where everyone is running.

    2. Marian Ďurkovič, 30 September 2016, 15:01

      Well, TRILL has much more use cases than DC fabric.

      2 years ago we implemented an internet exchange point using TRILL switches (http://www.six.sk), and as you can check in our looking glass, we still have several BGP sessions with 2 years of uptime.

      This year we used TRILL for 100GE national research & educational backbone (http://www.sanet.sk/en/siet_topologia.shtm) where 40 TRILL switches cover the whole country.

      But yes, complexity sells better, thus simple, straightforward and cost-effective solutions don't get big attention from vendors...

    3. Looks like a perfect use case for SPB-M in a large Metro Ethernet network.

  4. Cisco VTS is EoL already... Probably not a long term fit for anybody. Better invest resources in something else:

    http://www.cisco.com/c/en/us/products/collateral/cloud-systems-management/virtual-topology-system/eos-eol-notice-c51-736748.html

    1. I'm a little bit confused; I think VTS is not EoL. At the link above, Cisco suggested that customers migrate to VTS 2.0, which is a VXLAN-based overlay with an EVPN control plane.

  5. Agree with everything here. One minor correction, I think: "However, you don’t need IP multicast for VXLAN to work. You could use EVPN control plane or configure VXLAN head-end replication on the leaf switches." The EVPN control plane doesn't negate the need for multicast or head-end replication, but your statement seems to read like that. EVPN still requires ingress replication or multicast to deal with (thankfully now limited) BUM traffic. EVPN just handles learning, so there is much less BUM. BUM still exists; you can't escape that.

    1. Agreed. It depends on how you read that sentence ;) We're effectively saying the same thing.

