One of my subscribers sent me a question along these lines (heavily abridged):
My customer is running a colocation business, and has to provide L2 connectivity between racks, sometimes even across multiple data centers. They were using Q-in-Q to deliver that in a traditional fabric, and would like to replace that with multi-site EVPN fabric with ~100 ToR switches in each data center. However, Cisco doesn’t support Q-in-Q with multi-site EVPN. Any ideas?
As Lukas Krattiger explained in his part of the Multi-Site Leaf-and-Spine Fabrics section of the Leaf-and-Spine Fabric Architectures webinar, multi-site EVPN (VXLAN-to-VXLAN bridging) is hard. Don’t expect miracles like Q-in-Q over VNI any time soon ;)
Also, Cisco’s Cloud Scale ASICs are the only chipsets supporting VXLAN-to-VXLAN bridging at this point - most recent merchant silicon can do VXLAN routing, including VXLAN-to-VXLAN routing, but I haven’t seen anyone else but Cisco doing VXLAN-to-VXLAN bridging or intra-VXLAN split-horizon flooding.
However, you might not need multi-site VXLAN bridging and/or IP multicast (or even EVPN). Figure out:
- What services the customer plans to offer. Are they looking for P2P pseudowires over VXLAN (E-line in Carrier Ethernet terminology) or any-to-any connectivity (E-LAN)? Can your preferred $vendor provide Q-in-Q with P2P VXLAN? If not, can any other vendor do that?
- What are the hardware limitations (number of VTEPs in the ingress replication list and total number of remote VTEPs per switch)? Have some serious discussions with the vendor engineers and get the answers in writing (and change vendors if they can’t/won’t provide them). Make your RFP contingent on the hardware meeting or exceeding the numbers you need.
- Is there a scenario where these hardware limitations could be exceeded? In the case of my subscriber, we’re not talking about any-to-any networking over a generic virtualized compute/networking infrastructure but about implementing Carrier Ethernet functionality with VXLAN.
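The last two questions can be answered with a quick back-of-the-envelope check against the planned services. The following is a minimal sketch, assuming each ToR switch is a single VTEP and that every service maps to one VNI; the service records, switch names, and limit values are all hypothetical placeholders, and the real limits should be the numbers your vendor confirmed in writing.

```python
from collections import defaultdict

# Hypothetical service records: (service name, VNI, ToR switches with attachment points).
# Two switches = E-Line-style service, more than two = E-LAN-style service.
services = [
    ("cust-a-eline", 10001, {"dc1-tor01", "dc2-tor17"}),
    ("cust-b-elan", 10002, {"dc1-tor01", "dc1-tor02", "dc2-tor03"}),
    ("cust-c-eline", 10003, {"dc1-tor02", "dc2-tor03"}),
]

# Placeholder limits for illustration only; replace with vendor-confirmed values.
MAX_FLOOD_LIST_ENTRIES = 2
MAX_REMOTE_VTEPS = 2


def vtep_usage(services):
    """Per switch: longest per-VNI flood list and the set of all remote VTEPs."""
    usage = defaultdict(lambda: {"max_flood_list": 0, "remote_vteps": set()})
    for _name, _vni, switches in services:
        for switch in switches:
            peers = switches - {switch}  # every other switch in the same service
            entry = usage[switch]
            entry["max_flood_list"] = max(entry["max_flood_list"], len(peers))
            entry["remote_vteps"] |= peers
    return usage


def violations(services, max_flood, max_remote):
    """Return (switch, limit name, observed value) for every exceeded limit."""
    problems = []
    for switch, entry in sorted(vtep_usage(services).items()):
        if entry["max_flood_list"] > max_flood:
            problems.append((switch, "flood list", entry["max_flood_list"]))
        if len(entry["remote_vteps"]) > max_remote:
            problems.append((switch, "remote VTEPs", len(entry["remote_vteps"])))
    return problems


for switch, limit, value in violations(services, MAX_FLOOD_LIST_ENTRIES, MAX_REMOTE_VTEPS):
    print(f"{switch}: {limit} = {value} exceeds the limit")
```

Running the same check against a worst-case service mix (for example, the largest E-LAN you would ever sell) shows whether the limits could be exceeded in practice. A design using anycast VTEPs on MLAG pairs would need a different switch-to-VTEP mapping than the one-to-one assumption used here.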
Check out the podcast we did with PacketFabric – they went through this same process and deployed a large-scale VXLAN-based solution in production without the complexities of multi-site EVPN.
If you figure out that the hardware limitations will not be exceeded, stop cramming complex technologies into places where they’re not needed. You might not need EVPN at all; automated flood list configuration based on a customer/services database might be good enough (and more reliable/secure than a dynamic routing protocol ;).
INEX did exactly this. Instead of using EVPN, they automated flood list management with IXP Manager (an open-source IXP orchestration tool). For more details, watch the presentation Nick Hilliard gave in the Building Network Automation Solutions online course (available to course attendees and users with Expert Subscription).
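To illustrate the general idea of deriving flood lists from a services database, here is a minimal sketch. It is not how IXP Manager works internally; the data layout, switch names, VTEP addresses, and the `build_flood_lists` function are all hypothetical. For every switch and VNI it computes the list of remote VTEP addresses that should receive flooded (BUM) traffic.

```python
from collections import defaultdict

# Hypothetical inventory: switch name -> VTEP IP address.
VTEP_IP = {
    "dc1-tor01": "10.0.1.1",
    "dc1-tor02": "10.0.1.2",
    "dc2-tor03": "10.0.2.3",
}

# Hypothetical services database: VNI -> switches where the customer is attached.
SERVICES = {
    10001: ["dc1-tor01", "dc2-tor03"],
    10002: ["dc1-tor01", "dc1-tor02", "dc2-tor03"],
}


def build_flood_lists(services, vtep_ip):
    """Return {switch: {vni: [remote VTEP IPs]}} for static ingress replication."""
    flood = defaultdict(dict)
    for vni, switches in services.items():
        for switch in switches:
            # Flood to every other VTEP participating in the same VNI.
            # Sorting keeps the output stable, so generated configs diff cleanly.
            flood[switch][vni] = sorted(
                vtep_ip[peer] for peer in switches if peer != switch
            )
    return dict(flood)


flood_lists = build_flood_lists(SERVICES, VTEP_IP)
for switch in sorted(flood_lists):
    for vni, peers in sorted(flood_lists[switch].items()):
        print(f"{switch} vni {vni}: flood to {', '.join(peers)}")
```

The resulting dictionary would typically feed a per-platform configuration template; the template syntax is vendor-specific and deliberately left out. Because the lists come straight from the services database, a switch only ever learns about VTEPs that share a customer service with it.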
In any case, building a multi-site fabric with hundreds of edge switches is a complex problem, and like I said in the introduction to the Data Center Fabric Architectures webinar: if you’re building a fabric of this size, find someone who did something similar in the past (and we have a few people like that in the ExpertExpress team). Relying solely on information available on the Internet (including my webinars) is probably not good enough, as we can never cover the specifics of your particular deployment.
Looking for design guidelines? You’ll find them in:
- Leaf-and-Spine Fabrics webinar if you prefer studying on your own,
- Designing and Building Data Center Fabrics self-paced course if you prefer to have a guided tour, reviewed design assignments and support, or
- Building Next-Generation Data Center online course if you’d like to learn about more than just network fabrics.