Replacing FabricPath with VXLAN, EVPN or ACI?
One of my friends plans to replace existing FabricPath data center infrastructure, and asked whether it would make sense to stay with FabricPath (using the new Nexus 5600 switches) or migrate to ACI.
I proposed a third option: go with simple VXLAN encapsulation on Nexus 9000 switches. Here’s why:
- I don’t think FabricPath has a long-term future. It was one of those crazy kludges invented to solve the wrong problem. To make it worse, it uses a proprietary encapsulation that is not supported (AFAIK) by any merchant silicon out there.
- I don’t think L2 fabrics like TRILL or SPB have a bright future. Avaya is the only vendor still actively promoting an L2 fabric in the data center; most everyone else is talking about VXLAN (including Brocade).
- L2 fabrics are reinventing the routing wheel. VXLAN runs on top of a well-greased wheel that has been around for decades (IP transport).
- ACI is interesting, but it’s still relatively new and has many moving parts. Using something more conservative might make more sense in some environments.
- ACI (and FabricPath) are single-vendor solutions. Bare-bones VXLAN using either IP multicast or configured head-end replication works across multiple vendors.
Want to know more about data center fabric solutions from various vendors? The Data Center Fabrics webinar has dozens of videos describing solutions from all major vendors.
Which VXLAN are we talking about?
After that recommendation my friend sent me a follow-up question:
I assume I’d have to configure multicast and OSPF on the N9ks so we have a routed L3 leaf/spine architecture, then enabling VTEP on the leaf switches to act as gateways from classic LAN to VXLAN which can route over the fabric?
He got two out of three right:
- Build a routed leaf-spine architecture;
- Configure leaf switches to run as bridges between VLANs and VXLAN segments.
However, you don’t need IP multicast for VXLAN to work. You could use the EVPN control plane or configure VXLAN head-end replication on the leaf switches.
If the above paragraph doesn’t make much sense, you need the VXLAN Deep Dive webinar.
To EVPN or not to EVPN?
The EVPN control plane for VXLAN is probably the best long-term approach. However, it’s still fresh, not supported by all the interesting vendors, and might have interoperability challenges (ever deployed SIP?).
On the other hand, if you’re not interested in EVPN’s layer-3 capabilities (for example, ARP proxies or host-based routing), static VXLAN with head-end replication works pretty well. The “only” challenge is consistent configuration across the whole fabric: you have to configure a list of all peers for every VNI on every VTEP.
Wondering what I’m talking about? You’ll find plenty of details in the Leaf-and-Spine Fabric Designs webinar.
Most data center switch vendors have some VXLAN configuration solution (example: Arista’s CloudVision or Cisco’s VTS) that you might want to look at, but if you have even basic automation skills, it’s really easy to build something lightweight on top of Ansible.
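To give you a rough idea of how small the core of such a tool could be, here’s a minimal Python sketch that turns a single description of the fabric into per-VTEP flood lists for static head-end replication. Everything in it is an assumption made for illustration: the `build_flood_lists` function, the `vni_members` data model, the VNI numbers and the loopback addresses are hypothetical, and it assumes each VNI only needs the peers that actually host that VNI.

```python
from collections import defaultdict


def build_flood_lists(vni_members):
    """Return {vtep: {vni: [peer VTEPs]}} for static head-end replication.

    vni_members maps each VNI to the VTEP addresses of the leaf switches
    hosting it (a hypothetical data model, not a vendor format).
    """
    flood_lists = defaultdict(dict)
    for vni, vteps in vni_members.items():
        members = sorted(set(vteps))  # drop duplicates, keep output stable
        for vtep in members:
            # Every VTEP floods to all other members of the VNI, never to itself
            flood_lists[vtep][vni] = [peer for peer in members if peer != vtep]
    return dict(flood_lists)


# Hypothetical fabric: VNI -> loopback addresses of the leaves hosting it
vni_members = {
    10100: ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    10200: ["10.0.0.2", "10.0.0.3"],
}

flood_lists = build_flood_lists(vni_members)
for vtep in sorted(flood_lists):
    for vni, peers in sorted(flood_lists[vtep].items()):
        print(f"{vtep} vni {vni} peers {', '.join(peers)}")
```

Because every flood list is derived from one source of truth, adding a leaf to a VNI means changing one line of data and regenerating everything, which is the consistency you’re after. The resulting dictionary could then be handed to whatever templating you prefer (Ansible included) to produce device configurations.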
As always, you have to decide whether it makes sense to build or buy, and it’s never an easy decision.
Don’t know how to get started? Register for my network automation workshop or explore my network automation webinars.
Also, in most enterprise environments Contrail or Nuage are a non-starter.
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/7-x/vxlan/configuration/guide/b_Cisco_Nexus_9000_Series_NX-OS_VXLAN_Configuration_Guide_7x/b_Cisco_Nexus_9000_Series_NX-OS_VXLAN_Configuration_Guide_7x_chapter_0100.html
For 2000 VMs you need two switches with a simple VLAN trunk between them. No need for FP, VXLAN, EVPN or ACI. In that environment VXLAN might only be useful for people who firmly believe in L2 DCI.
As for Brocade, let's see how many new features we'll see in VCS Fabric versus IP Fabric in the next 24 months. I'm not saying L2 fabrics are dead today, I'm just seeing where everyone is running.
Two years ago we implemented an internet exchange point using TRILL switches (http://www.six.sk) and as you can check in our looking glass, we still have several BGP sessions with two years of uptime.
This year we used TRILL for a 100GE national research & educational backbone (http://www.sanet.sk/en/siet_topologia.shtm) where 40 TRILL switches cover the whole country.
But yes, complexity sells better, thus simple, straightforward and cost-effective solutions don't get big attention from vendors...
http://www.cisco.com/c/en/us/products/collateral/cloud-systems-management/virtual-topology-system/eos-eol-notice-c51-736748.html