BGP, EVPN, VXLAN, or SRv6?

Daniel Dib asked an interesting question on LinkedIn when considering an RT5-only EVPN design:

I’m curious what EVPN provides if all you need is L3. For example, you could run pure L3 BGP fabric if you don’t need VRFs or a limited amount of them. If many VRFs are needed, there is MPLS/VPN, SR-MPLS, and SRv6.

I received a similar question numerous times in my previous life as a consultant. It’s usually caused by vendor marketing polluting PowerPoint slide decks with acronyms without explaining the fundamentals[1]. Let’s fix that.

IP networks perform hop-by-hop, destination-only forwarding based on a single forwarding table. If that’s what you need, step away from the acronyms; all you have to decide is which routing protocol to use. We have had that discussion a gazillion times in the past, and the only sane recommendation is, “It usually doesn’t matter[2]; use whatever is familiar.”
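To make “destination-only forwarding” concrete: every router looks only at the destination address, picks the longest matching prefix in its one forwarding table, and sends the packet to that entry's next hop. Below is a minimal Python sketch of that lookup. The prefixes and next-hop names are made up for illustration, and real forwarding hardware uses tries or TCAMs rather than a linear scan.

```python
import ipaddress

# Hypothetical forwarding table: prefix -> next hop (illustrative values only)
FIB = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream",
    ipaddress.ip_network("10.0.0.0/8"): "spine-1",
    ipaddress.ip_network("10.1.0.0/16"): "spine-2",
}


def lookup(fib, destination):
    """Return the next hop of the longest prefix that contains the destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    return fib[max(matches, key=lambda net: net.prefixlen)]


print(lookup(FIB, "10.1.2.3"))   # spine-2
print(lookup(FIB, "10.9.9.9"))   # spine-1
print(lookup(FIB, "192.0.2.1"))  # upstream
```

Nothing in the lookup depends on the source address or on anything other than the destination, which is exactly why diverting traffic onto another path needs extra machinery.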

Traffic Diversions

Next, you might want to divert some traffic (still within a single routing domain) onto a path that might be considered suboptimal. You might want to do some traffic engineering or push traffic around a failure before everyone involved realizes the network topology has changed (more details).

In that case, you must hide the traffic from the intermediate nodes into a tunnel or a virtual circuit – you need a data-plane packet-hiding technology. MPLS (data-plane encapsulation) is the most commonly used one; vendors trying to sell new ASICs are preaching the beauties of SRv6, and I don’t think I’ve seen VXLAN used in this scenario (primarily because it does not allow you to do loose source routing).

You obviously need a control plane to support the traffic-hiding shenanigans[3]. A decade ago, we used RSVP to build MPLS-TE tunnels. Today, I’d recommend using an SR-aware IGP, and I’m sure someone could make it work with BGP-LU.

To recap, this is what I would do in 2024:

  • Select an IGP that works with Segment Routing (the MPLS variant)
  • Enable MPLS and SR-MPLS in your network
  • Provision traffic engineering paths on head-end routers or through a controller
  • Configure TI-LFA to support fast rerouting around failures.
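SR-MPLS keeps the label arithmetic simple: a prefix-SID label is the SRGB base plus the SID index, and transit nodes forward on the top label without looking at the payload. A minimal sketch of a head-end building a traffic-engineered label stack follows, assuming every node uses the same SRGB starting at 16000 and made-up node-SID indexes (both are assumptions for illustration, not recommendations).

```python
# Assumptions (hypothetical): all nodes share an SRGB starting at 16000,
# and the node-SID indexes below are made up for illustration.
SRGB_BASE = 16000
NODE_SID_INDEX = {"PE1": 1, "P2": 2, "P3": 3, "PE4": 4}


def node_sid_label(node):
    """Prefix-SID label = SRGB base + SID index (same SRGB on every node)."""
    return SRGB_BASE + NODE_SID_INDEX[node]


def label_stack(waypoints):
    """Head-end: label stack steering traffic through waypoints, top label first."""
    return [node_sid_label(n) for n in waypoints]


def process_at(node, stack):
    """Pop the top label only at the node that owns it (no PHP, for simplicity)."""
    if stack and stack[0] == node_sid_label(node):
        return stack[1:]
    return stack  # transit node: forward on the top label, payload stays hidden


stack = label_stack(["P3", "PE4"])  # force the path through P3 before PE4
print(stack)                        # [16003, 16004]
print(process_at("P2", stack))      # [16003, 16004]
print(process_at("P3", stack))      # [16004]
```

The head-end is the only node that knows about the detour; everyone else just follows the top label toward its owner.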

Multiple Forwarding Domains

Now for the fun part: you want multiple independent forwarding domains running on top of a shared infrastructure. The marketing term for that is Virtual Private Networks (VPN), and they could use MAC addresses (L2VPN) or IP addresses (L3VPN) to make forwarding decisions.

You could involve all network devices in making those forwarding decisions:

  • The ingress device would mark a packet with a VPN identifier
  • The intermediate devices would look at the VPN identifier and select the desired forwarding table
  • The egress device would remove the VPN identifier from the packet and forward the packet to the final destination.
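The three steps can be modeled in a few lines of Python. This is a toy sketch: the VLAN tags, VRF names, and port names are hypothetical, and both VRFs deliberately reuse 10.0.0.0/24 to show that the VPN identifier, not the destination address alone, selects the forwarding table.

```python
import ipaddress

# Hypothetical VPN identifiers (VLAN tags) and per-VRF forwarding tables
VRF_BY_VLAN = {100: "red", 200: "blue"}
VRF_TABLES = {
    "red": {ipaddress.ip_network("10.0.0.0/24"): "port-1"},
    "blue": {ipaddress.ip_network("10.0.0.0/24"): "port-7"},  # overlapping space
}


def ingress(packet, vlan):
    """Ingress device: mark the packet with a VPN identifier."""
    return dict(packet, vlan=vlan)


def select_egress_port(tagged):
    """Any device on the path: pick the table by VPN identifier, then match."""
    table = VRF_TABLES[VRF_BY_VLAN[tagged["vlan"]]]
    dst = ipaddress.ip_address(tagged["dst"])
    for net, port in table.items():  # first match; each table has one prefix here
        if dst in net:
            return port
    return None


def egress(tagged):
    """Egress device: remove the VPN identifier before final delivery."""
    return {k: v for k, v in tagged.items() if k != "vlan"}


red, blue = ingress({"dst": "10.0.0.5"}, 100), ingress({"dst": "10.0.0.5"}, 200)
print(select_egress_port(red), select_egress_port(blue))  # port-1 port-7
print(egress(red))                                        # {'dst': '10.0.0.5'}
```

Note that every device in the path needs every VRF table, which is one reason this approach scales poorly.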

Did you notice I described VLANs and VLAN-based VRF-lite? Sometimes, they are all you need to get the job done.

OK, now let’s assume you’ve seen as many VLAN-based fabric meltdowns as I did[4] and want something better. Yet again, we need two components:

  • A data plane component that will hide the VPN traffic from the intermediate nodes. The usual candidates are MPLS, VXLAN, GRE, SRv6, or even PBB (802.1ah) or the LISP data-plane encapsulation.
  • A control plane component that will distribute the VPN endpoint information from egress to ingress nodes. You could choose between L3VPN (RFC 4364, aka MPLS/VPN), EVPN, LISP, or even SPB (802.1aq) if you want a standard solution.
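The split between the two components can be sketched like this: the control plane gives the ingress node a (VPN, prefix) → (egress node, VPN identifier) mapping, and the data plane uses it to wrap the packet so transit nodes only need to reach the egress node. The route values and field names are hypothetical, and the lookup is exact-match for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VpnRoute:
    """Simplified VPN endpoint information advertised from egress to ingress."""
    egress: str     # egress node, i.e. the tunnel endpoint
    vpn_id: int     # VPN identifier carried in the packet (MPLS label, VNI, ...)


# Control plane result on one ingress node (hypothetical values)
VPN_ROUTES = {
    ("red", "10.0.0.0/24"): VpnRoute(egress="PE2", vpn_id=5001),
    ("blue", "10.0.0.0/24"): VpnRoute(egress="PE3", vpn_id=5002),
}


def encapsulate(vpn, prefix, payload):
    """Data plane: wrap the payload; transit nodes only route on outer_dst."""
    route = VPN_ROUTES[(vpn, prefix)]
    return {"outer_dst": route.egress, "vpn_id": route.vpn_id, "payload": payload}


frame = encapsulate("blue", "10.0.0.0/24", b"original packet")
print(frame["outer_dst"], frame["vpn_id"])  # PE3 5002
```

Unlike the VLAN model, the intermediate nodes never need the per-VPN tables; only the edge nodes do.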

You can’t willy-nilly combine data-plane encapsulations with control plane protocols:

  • L3VPN usually works with MPLS encapsulation. Some vendors automatically add GRE tunnels as needed.
  • SPB (802.1aq) works with PBB (802.1ah) encapsulation. Yet again, there are proprietary implementations involving GRE tunnels.
  • SRv6 is just a data-plane encapsulation and needs a control plane. RFC 9252 defines two control planes you can use[5]: EVPN is the usual answer, but you could also use L3VPN (RFC 4364)[6].
  • EVPN is the clear winner. EVPN implementations use MPLS, VXLAN, PBB, or SRv6 data-plane encapsulations.
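The pairings listed above (standard solutions only, leaving out the proprietary GRE variants) fit in a small lookup table. This sketch merely restates the bullets; it is not an exhaustive interoperability matrix.

```python
# Control plane -> data-plane encapsulations, as listed above (standard pairings only)
COMBINATIONS = {
    "L3VPN (RFC 4364)": {"MPLS", "SRv6"},
    "SPB (802.1aq)": {"PBB"},
    "EVPN": {"MPLS", "VXLAN", "PBB", "SRv6"},
}


def control_planes_for(encapsulation):
    """Which of the listed control planes can drive a given encapsulation."""
    return sorted(cp for cp, encaps in COMBINATIONS.items() if encapsulation in encaps)


print(control_planes_for("SRv6"))   # ['EVPN', 'L3VPN (RFC 4364)']
print(control_planes_for("VXLAN"))  # ['EVPN']
```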

I’m positive there are a half-dozen RFCs out there describing other combinations. Please leave pointers in the comments.

Layer-3-Only VPNs

It took us a long time, but we finally reached the question we started with: What can you use to implement layer-3-only VPNs?

L3VPN (RFC 4364) is the obvious answer but usually requires MPLS encapsulation (unless your vendor implemented layer-3 VPNs with SRv6 using the RFC 4364 control plane). That’s not a big deal; most modern ASICs support MPLS in one way or another, and SR-MPLS makes MPLS fabrics as simple as possible. The real problem is that L3VPN is not cool enough for the vendor marketing departments (hey, everyone has been doing that for 25 years). They might use it to power their SRv6 implementations but then keep mum about it and pretend that SRv6 magic gets the job done.

EVPN is another (more popular) answer, but there’s just a tiny gotcha: it was designed to be a layer-2 VPN solution. Making it into a layer-3 VPN is possible but requires significant deviations from vendor defaults (= tweaking nerd knobs) (see Jeff Doyle and Jeff Tantsura discussing it for more details).

The beauty of EVPN is that you can use it with whatever your vendor is pushing you into: it always worked with MPLS, it works beautifully with VXLAN, and most vendors preaching the benefits of SRv6 can use it as the VPN control plane on top of it.
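For reference, an RT5-only design advertises EVPN IP Prefix routes (Route Type 5, RFC 9136). The sketch below packs the route-type-specific fields of the IPv4 variant. The 24-bit label field carries a VXLAN VNI or an MPLS label (in its top 20 bits) depending on the encapsulation, while SRv6 services signal their SIDs in the BGP Prefix-SID attribute (RFC 9252). It is a simplified illustration with placeholder RD and ESI values, not a complete BGP encoder: there is no NLRI type/length header and no path attributes.

```python
import ipaddress
import struct
from dataclasses import dataclass


@dataclass
class EvpnIpPrefixRoute:
    """Simplified EVPN IP Prefix route (Route Type 5, RFC 9136), IPv4 variant."""
    rd: bytes                 # 8-octet Route Distinguisher, already encoded (placeholder)
    prefix: str               # advertised IPv4 prefix, e.g. "10.1.0.0/16"
    label_field: int          # raw 24 bits: VXLAN VNI, or MPLS label in the top 20 bits
    esi: bytes = bytes(10)    # Ethernet Segment Identifier, all zeros in this example
    ethernet_tag: int = 0
    gateway: str = "0.0.0.0"  # GW IP Address, zero in this example

    def pack(self) -> bytes:
        """Encode the route-type-specific fields (34 octets for IPv4)."""
        if len(self.rd) != 8 or len(self.esi) != 10:
            raise ValueError("RD must be 8 octets and ESI 10 octets")
        net = ipaddress.IPv4Network(self.prefix)
        return (
            self.rd
            + self.esi
            + struct.pack("!IB", self.ethernet_tag, net.prefixlen)
            + net.network_address.packed
            + ipaddress.IPv4Address(self.gateway).packed
            + self.label_field.to_bytes(3, "big")
        )


route = EvpnIpPrefixRoute(rd=bytes(8), prefix="10.1.0.0/16", label_field=10010)
print(len(route.pack()))  # 34
```

Nothing in the route itself cares about MACs, which is why RT5 can carry a pure layer-3 VPN; the nerd knobs are about turning off the layer-2 machinery around it.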


  1. To be fair, it’s not their job to explain the fundamentals to enable a customer to make an informed decision. Their job is to sell their boxes; a confused customer is always an excellent mark.

  2. If your network is extensive enough so that it matters, and you can’t decide what to use, stop reading blog posts and hire someone with a cluebat.

  3. You could also prove RFC 1925 Rule 6 and offload the problem to an SDN controller.

  4. Or was told about as part of my consulting work.

  5. Unless you believe in the magic powers of SDN controllers.

  6. Who would have thought such an ancient technology could be used to control the shiny new stuff?

4 comments:

  1. What's the point of SRv6? To me it's like those esoteric Linux distributions - yes, you can make them work, but why? I use Arch BTW.

    Replies
    1. I always claim it's a solution in search of a problem unless you happen to be a vendor desperate to sell expensive new ASICs, in which case it's manna from heaven ;)

    2. I'm not from a vendor but from an SP, and I don't agree with you. SRv6 has been working stably (as the primary data plane, including TE) in our production network for a year. But yes, not everything is perfect yet. Still, in 2024, if someone can deliver their services with either SR-MPLS or SRv6, I would choose SRv6.

    3. So nice to hear someone is running SRv6 in production. Is there a public description of what you're doing that you could point us to? Thank you!

  2. The reason L2VPN is becoming more popular with service providers and customers is provisioning complexity.

    With an L3VPN, you usually have to spend weeks agreeing on configuration details with your service provider. This process is very difficult to automate. With an L2VPN, you just have to accept a few parameters, and you can connect immediately. You can even get it Network-as-a-Service style. The only disadvantage is that you have to install your own edge routers to separate the service provider section from your private section.

    By the way, this is where private LISP could be an easy-to-use alternative to private MPLS for your private WAN on top of L2VPN telco services. :-)

    L2VPNs can also be provisioned automatically across a number of different service provider sections. L3VPNs are more difficult to manage across multiple service provider sections. It is doable, but not easy to automate.

    The Metro Ethernet Forum has tons of specifications on the different VPN services, where you can clearly see the differences.

    Replies
    1. > What's the point of SRv6?

      Firstly, I'm not speaking for my employer here. I have hardly worked on SRv6 myself yet. My team-mates do that. I focus on other stuff for now.

      There are several benefits of SRv6 that I've heard of.

      1) You'll have one address space for addresses and labels. Some people like that.

      2) You have network programmability inside your SIDs. You can do stuff like NFV, etc. I have no idea how many networks really use this.

      3) You can have non-SRv6 speakers in your SRv6 network. As the destination addresses, which are SIDs/locators, are still just IPv6 addresses, any router that does IPv6 but not SRv6 can still be deployed in your SRv6 network.

      Personally, I don't care about that reason. I really dislike it when a nice clean new technology gets mucked up just so it is a bit easier to migrate an existing network. The migrations will be finished within a few years. And then you have those ugly details and hacks remaining for the next 3 decades.

      4) You can summarize locators.

      This is the big benefit in my eyes. We know how to build a network with 1000 or so routers in one IGP domain. But once we go bigger, things are not trivial anymore. We can use BGP to glue parts of the network together. We can divide an IGP into areas. But that causes other problems or inefficiencies. E.g., if you do SR-MPLS, what MPLS labels are your L1L2 routers gonna advertise? They'll need to advertise all MPLS labels for all the loopbacks in the L1 area. And the other way around (L2->L1) is even worse (even more loopbacks). We cannot (really) summarize MPLS labels in the control plane. And we sure as hell cannot summarize MPLS labels in the data plane.

      With SRv6, this problem is solved. The L1L2 routers can summarize not only the regular prefixes in the area, but also the locators, the flex-algo locators, etc. The only remaining problem is when BGP-speakers expect to see /32s (or /128s) for every peer they talk to. There are no elegant solutions for this yet, so it would be the next problem to fix.
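      A quick way to see the summarization argument: if an area's routers draw their locators from a contiguous range, the L1L2 router can replace them all with one prefix. The sketch assumes a hypothetical area holding router IDs 0x0100 to 0x01FF and reuses the 1:2:3::/48 locator block described further down in this thread; it relies only on the standard-library ipaddress.collapse_addresses.

```python
import ipaddress

# Hypothetical area: router IDs 0x0100-0x01FF, one /64 locator each (1:2:3:N::/64)
area_locators = [
    ipaddress.IPv6Network(f"1:2:3:{n:x}::/64") for n in range(0x0100, 0x0200)
]

# What the L1L2 router could advertise into L2 instead of 256 locators
summary = list(ipaddress.collapse_addresses(area_locators))
print(len(area_locators), summary)  # 256 [IPv6Network('1:2:3:100::/56')]
```

      The equivalent SR-MPLS prefix SIDs would remain 256 unrelated labels, because label lookups are exact-match.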

      You probably don't care about summarizable locators, or networks with 10k+ routers. But in the end: "the only problem is scalability". (I thought that was a quote from Randy Bush, but Google can't find it, so I'm not sure.) My big interest in my career has been scalability in routing protocols (and performance, robustness, and convergence, as they are all closely related). So I like it when SRv6 locators can be summarized.

      Or maybe someone with a lot of cash to spend on new routers can ask their supplier for summarizable MPLS labels? :) In today's companies, nobody will listen to a simple engineer like myself. But if a large ISP or hyperscaler with a fat wallet asks for something, that could make a difference. :) But then, nobody asked for summarizable MPLS labels during the last 25 years, so I expect nobody will any time soon.

    2. I don't really understand the first 2 arguments, how is that different from SR-MPLS?

      The summarization argument is interesting. This obviously is applicable only to hierarchical designs, with multiple IGP domains. So with appropriate addressing of locators, you can advertise only a summary route to other domains.

      My question is whether there are ASICs that support longest prefix match for SRv6 locators. LPM is more complicated and expensive than exact match, and now this has to be implemented for a new encapsulation. And how is this done? I presume with recirculation, similar to what happens in L3VPN with a per-VRF label? I haven't worked with SRv6 but am very curious to look at hardware limitations and caveats. What I know for sure is that SID depth for SRv6 is typically much lower than for SR-MPLS, which limits traffic engineering capabilities.

      Also, while SR-MPLS strictly speaking doesn't have summarization, you can use IS-IS area proxy and advertise only the area segment into L2, so routers won't receive L1 routes/SIDs. This will save FIB space. The downside is that a controller will be required to reach routers inside L1, so it's not the most robust design, but it allows "sort of" summarization with SR-MPLS.

    3. > advertise only a summary route to other domains

      Domains or areas, yes. As I heard tli once say: "the only real tool we have for scalability is hierarchy with summarization". We have that in IGPs with areas. Or multiple IGP domains with redistribution (which is not much different). But if the areas have limitations, and you can't really use them, we have no real tool for scalability.

      > there are ASICs that support longest prefix match for SRv6 locators?

      Remember that SRv6 locators look just like IPv6 addresses. And hardware has been capable of doing LPM for IPv6 for decades. So yeah.

      > SID depth for SRv6 is typically much lower than for SR-MPLS

      Yes. But how many SIDs do you really need? For TI-LFA, 2 SIDs are usually enough (for the P-node and the Q-node). For uloop avoidance, 1 or 2 SIDs are usually enough. For VPNs, you might need 2 SIDs. Only for real TE might you need more. I've heard that in the real world, 2-3 SIDs are usually enough for TE. Also, I think MTUs are huge these days, so the overhead of a few SIDs doesn't matter. The only issue is how many SIDs a router can deal with itself. (I have no idea, tbh. I am not a hardware guy.)

      > you can use IS-IS area proxy and advertise only area segment into L2, so routers won't receive L1 routes/SID.

      I don't think this does what you want it to do. There are other, simpler ways to advertise "a prefix of MPLS labels"; see binding SIDs. The problem is that while the control plane knows about these "MPLS prefixes", the hardware does not. So any packet destined for a label out of such an "MPLS prefix" cannot be forwarded. If it were up to me (but it isn't), the next feature for SR-MPLS forwarding hardware from all vendors would be the ability to do LPM for labels. I think that would make it easier to build larger, more scalable networks. But if we don't have that in SR-MPLS, we do have it in SRv6.

      And that was my point. Summarizable SRv6 SIDs (aka locators) allow you to build much larger networks.

    4. > how is that different from SR-MPLS?

      The first argument is not very impressive. Right now, people have to give every box in their network one or more IPv4 addresses, one or more IPv6 addresses, and one or more SR-MPLS labels. That's 3 address spaces to take numbers from. If you do SRv6, where labels are just IPv6 addresses (or IPv6 prefixes, depending on how you look at locators), you have only 2 address spaces. I don't think that is a huge benefit. I don't run a network. I was just repeating what I heard.

      The 2nd argument requires a bit more knowledge about SRv6. Remember, SRv6 locators are just IPv6 addresses. An operator reserves a prefix to cut locators out of. So suppose they assign a /48 for the locators. Let's say we use 1:2:3::0/48. Then you need to cut individual locators for each router. Let's take 16 bits for that. So now the locator for router N is 1:2:3:N::/64. Now you want to assign locators for flex-algos. Let's use 8 bits for that. So the locator for router N, flex-algo F is 1:2:3:N:F0::/72. That's it. Those are the bits you need for SRv6 routing.
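      The carving scheme above is plain bit arithmetic. A minimal sketch, assuming the same /48 block, a 16-bit router ID, and an 8-bit flex-algo field, where "F0" means the algorithm number fills the top byte of the fifth hextet:

```python
import ipaddress

LOCATOR_BLOCK = ipaddress.IPv6Network("1:2:3::/48")  # block from the example


def locator(router_id: int, flex_algo: int = 0) -> ipaddress.IPv6Network:
    """Router N, flex-algo F -> 1:2:3:N:F0::/72 (16-bit router ID, 8-bit algo)."""
    if not (0 <= router_id < 2**16 and 0 <= flex_algo < 2**8):
        raise ValueError("router ID must fit in 16 bits, flex-algo in 8 bits")
    value = (int(LOCATOR_BLOCK.network_address)
             | router_id << 64    # bits 48-63, counting from the most significant bit
             | flex_algo << 56)   # bits 64-71, counting from the most significant bit
    return ipaddress.IPv6Network((value, 72))


print(locator(5))       # 1:2:3:5::/72
print(locator(5, 128))  # 1:2:3:5:8000::/72
```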

      But you still have the last 56 bits that are zeros. Unused. You could use them. The idea is that you can put "instructions" into those 56 bits. An operator and a parameter. Or an operator and two parameters. And you might be able to fit more than one instruction in those 56 bits. The instructions could be: "this packet needs to go through a firewall". Or "this packet needs to go through NAT". I don't know much about real-world applications, but that is the idea.
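      RFC 8986 calls this structure LOC:FUNCT:ARG. The sketch below fills the bits after a locator with a function and an argument; the 16-bit function width and the values used are assumptions for illustration, not something the thread or any vendor specifies.

```python
import ipaddress


def sid(locator: ipaddress.IPv6Network, function: int, argument: int = 0,
        function_bits: int = 16) -> ipaddress.IPv6Address:
    """Build LOC:FUNCT:ARG by filling the bits after the locator.

    The 16-bit function width is an assumption for illustration only.
    """
    arg_bits = 128 - locator.prefixlen - function_bits
    if arg_bits < 0 or not 0 <= function < 2**function_bits or not 0 <= argument < 2**arg_bits:
        raise ValueError("function/argument do not fit after the locator")
    value = int(locator.network_address) | function << arg_bits | argument
    return ipaddress.IPv6Address(value)


loc = ipaddress.IPv6Network("1:2:3:5:8000::/72")  # router 5, flex-algo 128
print(sid(loc, function=0x42))                    # 1:2:3:5:8000:4200::
```

      With a /72 locator and a 16-bit function, 40 bits remain for arguments.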

      You're not gonna be able to do this with SR-MPLS. With SR-MPLS, you get a 20-bit address space. And that is it. Not enough to do fancy things. (Unless you stack a shitload of labels, maybe.)

    5. > Remember that SRv6 locators look just like IPv6 addresses.

      Is this also true for uSID?

      > So suppose they assign a /48 for the locators. Lets say we use 1:2:3::0/48. Then you need to cut individual locators for each router. Let's take 16 bits for that. So now the locator for router N is 1:2:3:N::/64. Now you want to assign locators for flex-algos. Let's use 8 bits for that. So the locator for router N, flex-algo F is: 1:2:3:N:F0::/72. That's it. That's the bits you need for SRv6 routing.

      Okay, this makes sense. Easier administration than a variety of MPLS labels. BTW, does this work for uSID which is much shorter?

      Also I'm curious to see an existing design leveraging SRv6 to solve actual problems that cannot be solved or are difficult to solve with SR-MPLS.

    6. > Also I'm curious to see an existing design leveraging SRv6 to solve actual problems that cannot be solved or are difficult to solve with SR-MPLS.

      Come on, did you have to ask for that? 🤣🤣 Spoilsport! 😜

  3. > Is this also true for uSID?

    Yes.

    SRv6 works like this: after the normal IPv6 header, there is a new header, the SRH (Segment Routing Header). The SRH holds a list of all IPv6 addresses/locators of intermediate routers that have to be traversed in the path to the destination. Each router on this list must be an SRv6 speaker. But the other intermediate routers do not need to be SRv6 capable.

    The originating (or ingress) router puts the locator/IPv6 address of the first hop in the destination address of the normal IPv6 header. (And fills the SRH with all locators to traverse). Then the packet gets routed (via SRv6 speakers and/or non-SRv6 speakers) to the first hop. Like the IPv4 loose source routing option was supposed to work (before it was banned from real networks).

    When the packet reaches the first intermediate hop, that router replaces the destination address in the IPv6 header with the next locator from the list in the SRH and sends the packet to the next hop in the SRv6 path. And so on.
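    The per-hop behavior described above can be sketched in Python, following the RFC 8754 convention that the Segment List is stored in reverse order (last segment first) and Segments Left points at the active segment. Addresses come from the 2001:db8::/32 documentation prefix; the SRv6Packet class is a hypothetical stand-in, not a real header encoding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SRv6Packet:
    """Hypothetical stand-in for an IPv6 header plus SRH (no real encoding)."""
    dst: str                  # IPv6 destination address = active segment
    segment_list: List[str]   # SRH Segment List, last segment first (RFC 8754)
    segments_left: int        # index of the active segment in segment_list


def ingress_encap(path: List[str]) -> SRv6Packet:
    """Ingress: put the first segment in the DA and the whole path in the SRH."""
    return SRv6Packet(dst=path[0], segment_list=path[::-1],
                      segments_left=len(path) - 1)


def endpoint(pkt: SRv6Packet) -> None:
    """Run by the SRv6 node owning pkt.dst; other routers just route on pkt.dst."""
    if pkt.segments_left == 0:
        return  # last segment: the packet is processed locally
    pkt.segments_left -= 1
    pkt.dst = pkt.segment_list[pkt.segments_left]


pkt = ingress_encap(["2001:db8::a", "2001:db8::b", "2001:db8::c"])
endpoint(pkt)
print(pkt.dst, pkt.segments_left)  # 2001:db8::b 1
```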

    The SRH contains a list of N IPv6 addresses for N intermediate hops. So if you send the packet across 5 hops, your SRH will be 5 * 16 = 80 octets (plus 8 for the fixed-size part of the SRH) = 88 octets. That's a lot of overhead.

    See:
    https://datatracker.ietf.org/doc/html/rfc8754#name-segment-routing-header
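    The overhead arithmetic above, next to an MPLS label stack (4 octets per label stack entry), as a small sketch. It counts only the SRH without optional TLVs, not the 40-octet outer IPv6 header an encapsulating head-end also adds.

```python
def srh_bytes(num_segments: int) -> int:
    """SRH size: 8-octet fixed part + 16 octets per segment (no TLVs)."""
    return 8 + 16 * num_segments


def mpls_stack_bytes(num_labels: int) -> int:
    """MPLS label stack size: 4 octets per label stack entry."""
    return 4 * num_labels


for n in (1, 3, 5):
    print(n, srh_bytes(n), mpls_stack_bytes(n))
# 1 24 4
# 3 56 12
# 5 88 20
```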

    With uSIDs, you use a slightly different format than the straightforward SRH. I think the SRH is still used, but instead of holding N * 128 bits, it holds the info in a smarter way. E.g., in my example above, I suggested we'd carve all locators out of a /48 block. If you do that, you don't need to repeat that /48 in every SID. You'd have a one-time "Locator-block" in the SRH, and then the SIDs don't need to mention/repeat those 48 bits. Similarly, if you use 16 bits to identify a router and 8 bits for the flex-algo number (or 0 for algo-0), then you still have 56 bits set to 0 in every locator. So you don't need to put those bits in the SRH either (unless you wanna put instructions & arguments in those 56 bits). That's the idea. Of course, there are all these little details that make things more complex.

    Replies
    1. > I'm curious to see an existing design leveraging SRv6 to solve actual problems

      I work for a vendor. I'm just a simple programmer. And as I said, I don't do much work on SRv6. I work on IS-IS, but currently my interest lies in flooding (performance, robustness, scalability, etc.). I've also done a bit of work to improve the visibility of IS-IS (a new show command, improved existing show commands, CPU profiling, kicking the shit out of IS-IS in my testbed and seeing what code consumes the CPU, lots of fun things).

      Most customers don't like it when vendors educate their competitors about the details and secrets of their networks. So even if I knew more about SRv6 in the real world, I probably couldn't tell you. Vendors have PMs. They know these things, and they know what they can share. Some customers don't mind being references to other customers regarding SRv6. Talk to a PM, and he might be able to tell you more. Sorry.

  4. I found that this presentation explains SRv6 uSID and compares it to classic SRv6:

    Introduction to SRv6 uSID Technology. www.ciscolive.com/c/dam/r/ciscolive/emea/docs/2024/pdf/BRKSPG-2203.pdf

    Assuming that a network implements the F3216 uSID format and the operator reserved FC00::/24 for SRv6:

    > Summarize locators:
    >   • FC00:00TT::/32 for a block; different blocks are assigned to each flex-algo slice and domain, max 57k routers per domain/slice.
    >   • FC00:00TT:SS00::/40 per set; one or more sets are assigned to an area or region. Up to 224 sets; E000 to FFFF is for 8192 functions local to the node.
    >   • FC00:00TT:SSII::/48 per node, up to 256 nodes per set.

    "SSII" is a 16-bit uSID.

    Access areas could have just two /24 routes via the area border routers, maybe multiple /32s if more path control is required for flex-algo. Aggregation and core areas would have the /40 aggregates of the other areas and the /32s of other domains. Inside each area are the /48 routes for the locator uSIDs of the other routers within that area.
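    The F3216 plan quoted above can be checked with a few lines of Python. The TT, SS, and II values are hypothetical, and the check only confirms that a node's /48 falls inside its /40 set summary and its /32 block summary, plus the 224 × 256 arithmetic behind "max 57k routers".

```python
import ipaddress


def usid_locator(tt: int, ss: int, ii: int) -> ipaddress.IPv6Network:
    """Node locator FC00:00TT:SSII::/48 in the F3216 plan quoted above."""
    if not (0 <= tt <= 0xFF and 0 <= ii <= 0xFF):
        raise ValueError("TT and II are 8-bit fields")
    if not 0 <= ss < 0xE0:
        raise ValueError("SS must be below E0; uSIDs E000-FFFF are local to the node")
    return ipaddress.IPv6Network(f"fc00:{tt:04x}:{ss:02x}{ii:02x}::/48")


node = usid_locator(tt=0x01, ss=0x12, ii=0x34)  # hypothetical node
set_summary = ipaddress.IPv6Network("fc00:1:1200::/40")
block_summary = ipaddress.IPv6Network("fc00:1::/32")
print(node)                                                        # fc00:1:1234::/48
print(node.subnet_of(set_summary), node.subnet_of(block_summary))  # True True
print(0xE0 * 256)                                                  # 57344
```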

    > SID depth for SRv6 is typically much lower than for SR-MPLS

    Not for the F3216 uSID format: a single IPv6 header can take 6 uSIDs without even using the SR extension header. If a platform can push 3 more uSID containers in the SR extension header, that gives a total depth of 24 uSIDs. I think most use cases will not need an SR extension header: 5 transport uSIDs + 1 function uSID (replacing the MPLS service label). Flex-algo removed the need for long label stacks to force a path calculated on TE metric or constraints like "use only routers on plane-A". The extension header is needed when the first 32 bits of the destination address must be changed along the path for inter-domain routing.
    There are other formats that pack even more uSIDs into one IPv6 header, but they result in uSIDs with only 8 bits.
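    To illustrate why no SRH is needed for short paths, here is a sketch of an F3216 uSID container: a 32-bit block followed by up to six 16-bit uSIDs, with unused slots left at zero, plus the left shift a node performs when it consumes the active uSID. The block and uSID values are hypothetical.

```python
import ipaddress
from typing import List

BLOCK_BITS, USID_BITS = 32, 16            # F3216 format
CARRIER_BITS = 128 - BLOCK_BITS           # bits available for uSIDs
SLOTS = CARRIER_BITS // USID_BITS         # 6 uSIDs per IPv6 address
CARRIER_MASK = (1 << CARRIER_BITS) - 1


def container(block: str, usids: List[int]) -> ipaddress.IPv6Address:
    """Pack up to six 16-bit uSIDs after a /32 block; unused slots stay zero."""
    if len(usids) > SLOTS:
        raise ValueError("more than six uSIDs do not fit in the destination address")
    value = int(ipaddress.IPv6Address(block))
    for i, usid in enumerate(usids):
        value |= usid << (CARRIER_BITS - USID_BITS * (i + 1))
    return ipaddress.IPv6Address(value)


def shift(dst: ipaddress.IPv6Address) -> ipaddress.IPv6Address:
    """Consume the active uSID: shift the remaining uSIDs left, pad with zeros."""
    value = int(dst)
    block = value & ~CARRIER_MASK
    carrier = ((value & CARRIER_MASK) << USID_BITS) & CARRIER_MASK
    return ipaddress.IPv6Address(block | carrier)


# Hypothetical path: two node uSIDs, then a local (E000-FFFF) function uSID
dst = container("fc00:1::", [0x1234, 0x5678, 0xE001])
print(dst)         # fc00:1:1234:5678:e001::
print(shift(dst))  # fc00:1:5678:e001::
```

    Transit routers that know nothing about uSIDs still forward on the destination address via ordinary LPM, for example on the fc00:1:5678::/48 route after the shift.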

    > different from SR-MPLS

    A large-network (> 1000 routers) design with SR-MPLS requires binding SIDs between areas: a set of PCEs must see the full topology via BGP-LS feeds from each region and then give small access routers the path via the anycast binding SIDs using PCEP with on-demand next hop. It seems easier to just aggregate inter-region routes. SRv6 seems to be gaining more acceptance than MPLS in the datacentre as an alternative to VXLAN; it's supported in Cilium, eBPF, FD.io, and SONiC.
