Could We Build an IXP on Top of VXLAN Infrastructure?
Andy sent me this question:
I'm currently playing around with BGP and VXLAN and wondering: is there anything preventing someone from building a virtual IXP with VXLAN? This would then be a large layer-2 network, but why has nobody built this until now, or why don't Internet exchanges provide this?
There was at least one IXP that was running on top of VXLAN. I wanted to do a podcast about it with people who helped them build it in early 2015 but one of them got a gag order.
In the meantime, several IXPs deployed VXLAN in production including:
- INEX (they also open-sourced their management software) – pointer provided by Anonymous, more information from Nick Hillard in the comments;
- LONAP – pointer provided by Blake, more information from Will Hargrave in the comments;
- Equinix in several metro fabrics.
Want to know why you need L2 network to run an IXP? I wrote about that in 2012.
This leads me to another topic: IXPs are mostly local; nobody has yet stretched a single layer-2 VLAN across the whole of America or Europe. I've tried finding some information, but I don't know what I am missing. What prevents somebody from building such a large layer-2 network?
Point-to-point layer-2 networks spanning continents have been a reality since (at least) Frame Relay days, and there’s at least one SP offering L2-over-VXLAN across the US, possibly with EVPN as the control plane. The trick to making these things work is to keep the L2 domain small and to minimize the impact of stupidities or bad-hair days in either the customer network or the transport infrastructure.
Large L2 domains spanning continents or countries? It has been tried many times before, and failed miserably every single time. I’m positive someone will try to do it again now that you can move VMs across the continent.
Of course, latency may be an issue, but if you have a quite flat design STP should not be your problem ...
How about the fact that a single endpoint could bring down the whole network with a broadcast storm? All it takes is a broken NIC.
Keep in mind that even the regular broadcast traffic caused by ARP gets so damaging in large L2 domains that IXPs like AMS-IX had to deploy ARP Sponge to limit its impact.
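To make the ARP Sponge idea more concrete, here is a toy model of the general mechanism: count ARP requests for an address that nobody seems to answer, and once a threshold is crossed, have the sponge answer on the dead host's behalf so the broadcasts stop. This is only a sketch of the concept, not the AMS-IX implementation; the class name, method names, and threshold value are all hypothetical.

```python
from collections import defaultdict


class ArpSponge:
    """Toy model: answer ARP for addresses that look dead (hypothetical API)."""

    def __init__(self, threshold=5):
        self.threshold = threshold       # assumed value, tune per LAN
        self.pending = defaultdict(int)  # target IP -> requests seen so far
        self.sponged = set()             # IPs the sponge currently answers for

    def on_arp_request(self, target_ip):
        """Return True if the sponge should reply for target_ip."""
        if target_ip in self.sponged:
            return True
        self.pending[target_ip] += 1
        if self.pending[target_ip] > self.threshold:
            self.sponged.add(target_ip)
            return True
        return False

    def on_traffic_from(self, ip):
        """The real host is alive again: stop answering on its behalf."""
        self.pending.pop(ip, None)
        self.sponged.discard(ip)
```

The point of the design is that repeated broadcasts for an unreachable peer turn into a single cached answer on every requester, which matters when hundreds of routers share one LAN.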
Long story short: Friends don’t let friends build large layer-2 domains, more so if the said domain spans more than a single site. Or as Ethan Banks said once, nuked earth is not a nice sight.
Want to know more?
- Lukas Krattiger and I will talk about multi-site and multi-pod data center fabrics (and how to build them in a relatively sane way) in another live session of Leaf-and-Spine Fabric Architectures webinar on March 29th;
- You’ll find even more information about data center fabrics in the Designing and Building Data Center Fabrics online course;
- Dinesh Dutt will talk about EVPN-with-VXLAN details in the second part of EVPN Technical Deep Dive webinar on April 5th.
VXLAN - would love to hear about more of them. Any pointers you can share?
Love the ixpmanager you found. Thanks a million!
e.g.
http://www.trefor.net/2016/05/25/lonap-ripe/
We're a mid-sized IXP (~200 networks connected, ~3Tbit connected capacity) who have been running VXLAN on Arista for around two years now, with great results. This is in a 'flood and learn' config with Head-End-Replication (HER), i.e. we are replicating BUM traffic at the edge to all other edge nodes. We are doing some testing with EVPN-on-VXLAN, although it is worth noting that it lacks some of the compelling advantages for an IXP that it has for an L2/L3 datacentre network.
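The flood-and-learn behaviour with head-end replication described above can be sketched as a small model: the ingress VTEP sends one unicast copy of every BUM frame to each remote VTEP in its flood list, and a single copy once the destination MAC has been learned. The class and field names are hypothetical and do not correspond to any vendor's configuration or API.

```python
from dataclasses import dataclass, field


def is_multicast(mac: str) -> bool:
    # The group bit is the least-significant bit of the first octet;
    # the broadcast address ff:ff:ff:ff:ff:ff has it set as well.
    return (int(mac.split(":")[0], 16) & 1) == 1


@dataclass
class IngressVtep:
    ip: str
    flood_list: list = field(default_factory=list)  # remote VTEP IPs for one VNI
    mac_table: dict = field(default_factory=dict)   # learned MAC -> remote VTEP IP

    def learn(self, src_mac, remote_vtep_ip):
        """Data-plane learning: remember which VTEP a source MAC sits behind."""
        self.mac_table[src_mac] = remote_vtep_ip

    def encapsulate(self, dst_mac, frame):
        """Return the (remote VTEP IP, frame) unicast copies to transmit."""
        remote = self.mac_table.get(dst_mac)
        if remote is not None and not is_multicast(dst_mac):
            return [(remote, frame)]  # known unicast: one copy
        # Broadcast, unknown unicast, multicast: replicate at the head end.
        return [(peer, frame) for peer in self.flood_list]
```

In this model the replication cost of a BUM frame grows linearly with the number of edge nodes, which is acceptable while the flood list stays short.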
Our friends over at INEX are in a similar setup for their primary LAN, which I think they deployed during 2017.
We started down this road in mid-2015 after a failed deployment of VPLS/MPLS with another vendor, and it swiftly became clear that a 'datacentre class' leaf-spine architecture with something like VXLAN was the way to go for a growing IXP of our size. ECMP has let us scale easily from n*10G to n*100G in the core, and with VXLAN the entropy encoded in the source UDP port means intermediate network elements can load-balance the traffic effectively.
As regards the topic of large L2 networks 'spanning the globe', I think we need to take a step back from technology and look at human and commercial factors. By far the most popular model for IXP charging is a low flat-rate per-port model. It is more difficult to keep control of costs if you have expensive leased capacity there, which is why successful IXPs keep to the metro where they can scale easily and avoid competing with their own members.
Moreover, there is an expectation among network operators that the endpoint of their BGP session across the fabric is relatively nearby. Long-stretched L2 domains are unpopular among many as they mess up this assumption, cause hairpinning and thus a bad end-user experience. There is a role for a stretched IXP model, i.e. 'reseller' programs and the like, under controlled conditions. But most operators prefer to meet over a fast, local fabric in the metro.
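The source-port entropy mentioned above can be illustrated with a short sketch. RFC 7348 recommends deriving the outer UDP source port from a hash of inner-frame fields and keeping it in the 49152-65535 range; core switches then spread flows across equal-cost links by hashing the outer headers. The hash function (CRC32), the chosen fields, and both function names are assumptions made for illustration, since real switches use their own hardware hashes.

```python
import zlib

VXLAN_DST_PORT = 4789               # IANA-assigned VXLAN destination port
PORT_MIN, PORT_MAX = 49152, 65535   # dynamic/private range recommended by RFC 7348


def vxlan_source_port(src_mac, dst_mac, src_ip, dst_ip, src_port, dst_port):
    """Map an inner flow to an outer UDP source port (illustrative hash)."""
    key = f"{src_mac}|{dst_mac}|{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return PORT_MIN + zlib.crc32(key) % (PORT_MAX - PORT_MIN + 1)


def pick_uplink(outer_src_ip, outer_dst_ip, outer_src_port, uplinks):
    """ECMP choice in the core: hash only the outer headers."""
    key = f"{outer_src_ip}|{outer_dst_ip}|{outer_src_port}|{VXLAN_DST_PORT}".encode()
    return uplinks[zlib.crc32(key) % len(uplinks)]
```

Without the varying source port, all traffic between two edge nodes would carry identical outer headers and hash onto a single core link; with it, every inner flow stays on one path (so packets are not reordered) while different flows can take different paths.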
LONAP in her 21 years of existence has seen many such 'global IX' operators come and go. :)
The EVPN control plane for VXLAN isn't ready for production networks yet. Hopefully soon.
Thanks for the feedback (and yet again: I love your software).