Q-in-Q Support in Multi-Site EVPN
One of my subscribers sent me a question along these lines (heavily abridged):
My customer runs a colocation business and has to provide L2 connectivity between racks, sometimes even across multiple data centers. They were using Q-in-Q to deliver that in a traditional fabric and would like to replace it with a multi-site EVPN fabric with ~100 ToR switches in each data center. However, Cisco doesn’t support Q-in-Q with multi-site EVPN. Any ideas?
As Lukas Krattiger explained in his part of the Multi-Site Leaf-and-Spine Fabrics section of the Leaf-and-Spine Fabric Architectures webinar, multi-site EVPN (VXLAN-to-VXLAN bridging) is hard. Don’t expect miracles like Q-in-Q over VNI any time soon ;)
Also, very few chipsets support VXLAN-to-VXLAN bridging. Most recent merchant silicon can do VXLAN routing, including VXLAN-to-VXLAN routing, but you need high-end ASICs to do VXLAN-to-VXLAN bridging or intra-VXLAN split-horizon flooding.
However, you might not need multi-site VXLAN bridging and/or IP multicast (or even EVPN). Figure out:
- What services the customer plans to offer. Are they looking for P2P pseudowires over VXLAN (E-line in Carrier Ethernet terminology) or any-to-any connectivity (E-LAN)? Can your preferred $vendor provide Q-in-Q with P2P VXLAN? If not, can any other vendor do that?
- What are the hardware limitations (number of VTEPs in the ingress node replication list and total number of remote VTEPs per switch)? Have some serious discussions with the vendor engineers and get the answers in writing (and change vendors if they can’t/won’t provide them). Make your RFP contingent on meeting or exceeding these limitations; a quick sanity check against those numbers is sketched after this list.
- Is there a scenario where these hardware limitations could be exceeded? For my subscriber, we’re not talking about any-to-any networking over a generic virtualized compute/networking infrastructure but about implementing Carrier Ethernet functionality with VXLAN.
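For example, once you have the vendor’s numbers in writing, you can check the planned services against them before committing to a design. A minimal Python sketch, assuming a toy service database; the limit values and all the names (`services`, `MAX_REMOTE_VTEPS_PER_SWITCH`…) are made-up placeholders, not anyone’s real figures:

```python
# Back-of-the-envelope check of per-switch VTEP fan-out against vendor limits.
# The data model and both limits below are hypothetical placeholders.
from collections import defaultdict

MAX_REMOTE_VTEPS_PER_SWITCH = 256   # assumed platform limit -- get the real one in writing
MAX_VTEPS_PER_FLOOD_LIST = 64       # assumed ingress-replication list limit

# Each E-LAN/E-Line service is a set of participating ToR switches (VTEPs).
services = {
    "cust-a-elan": {"tor-1", "tor-7", "tor-42"},
    "cust-b-eline": {"tor-1", "tor-99"},
}

remote_vteps = defaultdict(set)
for name, members in services.items():
    if len(members) - 1 > MAX_VTEPS_PER_FLOOD_LIST:
        print(f"{name}: flood list too long ({len(members) - 1} remote VTEPs)")
    for tor in members:
        # every other member of the service becomes a remote VTEP of this ToR
        remote_vteps[tor] |= members - {tor}

for tor, peers in sorted(remote_vteps.items()):
    if len(peers) > MAX_REMOTE_VTEPS_PER_SWITCH:
        print(f"{tor}: {len(peers)} remote VTEPs exceeds the assumed platform limit")
```

Running this against the full provisioning database (instead of the two toy services above) tells you immediately whether the design fits the hardware.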
If you determine that the hardware limitations will not be exceeded, stop cramming complex technologies into places where they’re not needed. You might not need EVPN; automated flood-list configuration driven by a customer/services database might be good enough (and more reliable/secure than a dynamic routing protocol).
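To illustrate that last point: the same customer/services database could drive the static flood lists directly. A hedged sketch that emits vendor-neutral pseudo-configuration; the CLI keywords printed below are invented, so map them onto whatever static ingress replication mechanism your platform actually offers:

```python
# Generate static per-VNI flood lists from the service database.
# Service names, VNIs, addresses, and the printed CLI syntax are all illustrative.
services = {
    # service name: (VNI, {ToR switch: VTEP loopback IP})
    "cust-a-elan": (10100, {"tor-1": "10.0.0.1", "tor-7": "10.0.0.7"}),
    "cust-b-eline": (10200, {"tor-1": "10.0.0.1", "tor-99": "10.0.0.99"}),
}

for name, (vni, members) in services.items():
    for switch, _my_ip in sorted(members.items()):
        # flood list = every other VTEP participating in this service
        peers = sorted(ip for tor, ip in members.items() if tor != switch)
        print(f"# {switch}: static flood list for {name}")
        print(f"vxlan vni {vni}")
        for peer in peers:
            print(f"  flood vtep {peer}")
```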
In any case, building a multi-site fabric with hundreds of edge switches is a complex problem, and like I said in the introduction to the Data Center Fabric Architectures webinar: if you’re building a fabric of this size, find someone who did something similar in the past (and we have a few people like that on the ExpertExpress team). Relying solely on information provided on the Internet (including my webinars) is probably not good enough, as we can never cover the specifics of your particular deployment.
http://yves-louis.com/DCI/?p=1381
>>>has to provide L2 connectivity between racks, sometimes even across multiple data centers
>>>They were using Q-in-Q to deliver that in a traditional fabric
As I understand it, they were using Q-in-Q to achieve tenant separation in the core/DCI part of their network: a separate S-VLAN for every tenant.
EVPN has the VLAN-aware bundle service for exactly that use case. All you need to do is configure a separate EVPN instance for every tenant and provide that customer’s C-VLANs inside that instance.
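If it helps, here’s that per-tenant mapping in a few lines of Python; the tenant names, route targets, and C-VLAN lists are invented, and the printed lines are vendor-neutral pseudo-configuration rather than any specific NOS syntax:

```python
# One VLAN-aware bundle EVPN instance (EVI) per tenant; the tenant's C-VLANs
# live inside that instance, so the core no longer needs per-tenant S-VLANs.
# All values below are invented for illustration.
tenants = {
    "tenant-red":  {"evi": 100, "cvlans": [10, 11, 12]},
    "tenant-blue": {"evi": 200, "cvlans": [10, 20]},  # C-VLAN 10 overlaps with tenant-red -- that's fine
}

for name, t in tenants.items():
    print(f"evpn instance {name} vlan-aware-bundle")
    print(f"  route-target 65000:{t['evi']}")
    print(f"  vlan-id-list {','.join(map(str, t['cvlans']))}")
```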
And if you still think you need Q-in-Q in EVPN, look at Juniper: https://www.juniper.net/documentation/en_US/junos/topics/topic-map/evpn-vxlan-flexible-vlan-tag.html
But expect a LOT of limitations, especially on the QFX5100 hardware.
I have no practical experience with EVPN, but I have some experience with Q-in-Q.
In Q-in-Q (it’s NOT MAC-in-MAC), you need to take care of MAC address uniqueness across the whole network to avoid issues when the same MAC address appears in different VLANs (depending on the topology, of course).
Does this issue affect topologies where EVPN is used?
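To make the previous point concrete: a Q-in-Q core switch learns customer MACs per S-VLAN and never uses the C-VLAN tag in its learning key, so the same MAC address arriving in two C-VLANs that share an S-VLAN keeps flapping between ports. A toy Python model of that behavior (all values invented):

```python
# Toy model of per-S-VLAN MAC learning in a Q-in-Q core switch.
mac_table = {}  # (s_vlan, mac) -> port

def learn(s_vlan, c_vlan, mac, port):
    key = (s_vlan, mac)  # the C-VLAN tag is invisible to the core's learning process
    if mac_table.get(key, port) != port:
        print(f"MAC {mac} moved from {mac_table[key]} to {port} (S-VLAN {s_vlan})")
    mac_table[key] = port

learn(100, 10, "00:11:22:33:44:55", "eth1")  # customer MAC seen in C-VLAN 10
learn(100, 20, "00:11:22:33:44:55", "eth2")  # same MAC in C-VLAN 20 -> entry flaps
```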
The VXLAN EVPN solution in IP Infusion’s OcNOS whitebox NOS supports Q-in-Q.
It has been deployed at LINX across multiple sites; see LON2 at https://portal.linx.net/
Note that BUM traffic is limited in this deployment.