… updated on Monday, May 11, 2026 16:47 +0200
SRv6 as a Host-to-Host Overlay
During the LinkedIn discussion of the On Applicability of MPLS Segment Routing (SR-MPLS) blog post, someone made an off-the-cuff remark that…
SRv6 as an host2host overlay - in some cases not a bad idea
It’s probably just my myopic view, but I fail to see the above idea as anything but another tiny chapter in the “Solution in Search of a Problem” SRv6 saga1.
There are two well-known reasons one might want to use a host-to-host overlay:
- Implement virtual networks
- Implement service insertion
According to a comment by an Anonymous Friend of the Blog, hyperscalers and friends use SRv6 for host-based traffic engineering.
Yeah, I missed that use case, but I doubt it’s relevant to many readers of this blog. Most data center fabrics don’t need it, no ready-to-use product does it, and the amount of prerequisite engineering (not to mention the ASIC requirements) is non-trivial. However, that’s the least of your problems when you’re spending hundreds of billions of dollars on hardware. Also, there’s sometimes a bit of a gap between what tech companies claim they do and what they actually run in most of their production networks.
It’s like claiming front and rear wings make perfect sense on cars because F1 cars use them.
On the virtual networks front, we’ve had GRE for decades. We got VXLAN almost a decade ago, and GENEVE a few years later. GRE and VXLAN each address a specific use case – GRE is primarily used for some-L3-over-IP transport2, while VXLAN excels when you have to transport Ethernet frames over IP.
GENEVE extends VXLAN with multiprotocol capabilities and TLV-encoded metadata. It’s not Turing-complete, but it’s probably pretty close to an overlay kitchen sink3. SRv6 brings nothing to the table apart from the “one protocol to rule them all” Kool-Aid4 and larger headers.
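To put rough numbers on the header sizes, here is a minimal sketch of the per-packet encapsulation overhead. It assumes standard header sizes (IPv4 without options, GRE without key or sequence fields, an SRH with 8 fixed bytes plus 16 bytes per segment) and ignores the outer Ethernet header. The function names are made up for this illustration. Keep in mind that VXLAN and GENEVE usually carry an inner Ethernet header as well, which counts as payload here.

```python
# Per-packet encapsulation overhead in bytes, excluding the outer Ethernet header.
IPV4, IPV6, UDP = 20, 40, 8
GRE_BASE = 4            # GRE header without key, sequence, or checksum fields
VXLAN = 8               # fixed VXLAN header
GENEVE_BASE = 8         # fixed GENEVE header; options are extra
SRH_FIXED, SID = 8, 16  # Segment Routing Header: 8 fixed bytes + 16 per segment


def gre(outer_ip: int = IPV4) -> int:
    return outer_ip + GRE_BASE


def vxlan(outer_ip: int = IPV4) -> int:
    return outer_ip + UDP + VXLAN


def geneve(outer_ip: int = IPV4, option_bytes: int = 0) -> int:
    if option_bytes % 4:
        raise ValueError("GENEVE options are sized in multiples of 4 bytes")
    return outer_ip + UDP + GENEVE_BASE + option_bytes


def srv6(segments: int = 1, omit_srh: bool = False) -> int:
    # With a single segment, the SID can sit in the IPv6 destination address
    # and the SRH can be left out (reduced encapsulation).
    if omit_srh:
        if segments != 1:
            raise ValueError("the SRH can only be omitted with a single segment")
        return IPV6
    return IPV6 + SRH_FIXED + SID * segments


if __name__ == "__main__":
    print("GRE over IPv4:", gre())
    print("VXLAN over IPv4:", vxlan())
    print("VXLAN over IPv6:", vxlan(IPV6))
    print("GENEVE over IPv4 with 8 option bytes:", geneve(option_bytes=8))
    print("SRv6 with 3 segments:", srv6(3))
```

Under these assumptions, the SRv6 overhead grows by 16 bytes with every additional segment, while a single-segment encapsulation without the SRH is just the 40-byte IPv6 header.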
Maybe we’re looking at the wrong problem. Watching various SRv6 (marketing) presentations, one gets the impression that SRv6 shines in the Service Insertion arena, so maybe that’s why we should use it instead of VXLAN or GENEVE. This is how the service insertion fairy tale is usually told:
- A controller figures out what needs to be done
- The controller programs the ingress node with a stack of entries listing all the services a packet must visit
- The ingress node adds that stack of entries to the incoming packet, ensuring the packet will traverse all the required services.
- Every service in the list receives the packet, removes itself from the list of services, processes the packet, and sends the packet to the next service.
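The four steps above can be sketched as a toy model. This is only an illustration of the fairy tale as told, not of any real SRv6, NSH, or vendor implementation; the policy table, the service names, and all function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    payload: str
    service_stack: list[str] = field(default_factory=list)  # next service first
    visited: list[str] = field(default_factory=list)


def controller_policy(flow: str) -> list[str]:
    """Steps 1-2: the controller decides which services a flow must visit."""
    policies = {"web": ["firewall", "ids", "load-balancer"]}  # made-up policy
    return policies.get(flow, [])


def ingress(payload: str, flow: str) -> Packet:
    """Step 3: the ingress node attaches the service stack to the packet."""
    return Packet(payload, service_stack=list(controller_policy(flow)))


def service_node(name: str, packet: Packet) -> Packet:
    """Step 4: a service removes itself from the stack and processes the packet."""
    if not packet.service_stack or packet.service_stack[0] != name:
        raise ValueError(f"{name} is not the next service for this packet")
    packet.service_stack.pop(0)
    packet.visited.append(name)  # stands in for the actual packet processing
    return packet


def forward(packet: Packet) -> Packet:
    """Deliver the packet to each listed service until the stack is empty."""
    while packet.service_stack:
        packet = service_node(packet.service_stack[0], packet)
    return packet


if __name__ == "__main__":
    result = forward(ingress("GET /", "web"))
    print(result.visited, result.service_stack)
```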
Ignoring for the moment the stupendous complexity of real-life service insertion (anyone remember Cisco’s Virtual Security Gateway?), there’s a tiny detail usually glossed over: all the services have to be aware of the “service processing” header and handle that header together with the user packet. That’s why the Network Service Header (NSH) idea never took off. For more details, watch the Service Insertion part of the SDN Use Cases webinar.
Now ask yourself: how many commercial network appliances can do something along those lines? Let me help you: all those that are integrated with AWS Gateway Load Balancer, Azure Gateway Load Balancer, or VMware NSX east-west packet inspection. How many of those use SRv6? None.
It’s not surprising that VMware chose GENEVE as the east-west service insertion transport protocol5 – GENEVE is the default overlay protocol in VMware NSX-T. It’s more interesting that AWS Gateway Load Balancer uses GENEVE even though they use VXLAN for Transit Gateway Connect. Finally, there’s Azure Gateway Load Balancer using two VXLAN tunnels between the load balancer and each appliance, proving the age-old wisdom that as long as service insertion means VLAN stitching, you can do it with VXLAN and EVPN6. Is there a shipping implementation of service insertion using SRv6? I’m not aware of one.
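To make the difference between the two encapsulations concrete, here is a minimal sketch of the fixed VXLAN header next to a GENEVE header carrying a single TLV option, following the layouts in RFC 7348 and RFC 8926. The option class, option type, and option payload are hypothetical values chosen for illustration; they are not what NSX, AWS, or Azure put on the wire.

```python
import struct

ETHERTYPE_TEB = 0x6558  # Transparent Ethernet Bridging: the payload is an Ethernet frame


def vxlan_header(vni: int) -> bytes:
    """8 bytes: flags (I bit set), 24 reserved bits, 24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", 0x08 << 24, vni << 8)


def geneve_option(opt_class: int, opt_type: int, data: bytes) -> bytes:
    """One TLV: 16-bit class, 8-bit type, 5-bit length in 4-byte words, then data."""
    if len(data) % 4 or len(data) > 124:
        raise ValueError("option data must be a multiple of 4 bytes, at most 124")
    return struct.pack("!HBB", opt_class, opt_type, len(data) // 4) + data


def geneve_header(vni: int, protocol: int, options: bytes = b"") -> bytes:
    """8 fixed bytes (version 0, O and C bits clear) followed by the options."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    if len(options) % 4 or len(options) > 252:
        raise ValueError("options must be a multiple of 4 bytes, at most 252")
    ver_optlen = len(options) // 4  # version 0 in the top 2 bits
    fixed = struct.pack("!BBH", ver_optlen, 0, protocol) + struct.pack("!I", vni << 8)
    return fixed + options


if __name__ == "__main__":
    # Hypothetical metadata: a 4-byte identifier attached to the packet.
    option = geneve_option(0xFFFF, 0x01, (42).to_bytes(4, "big"))
    print(vxlan_header(5001).hex())
    print(geneve_header(5001, ETHERTYPE_TEB, option).hex())
```

The protocol type field is what gives GENEVE its multiprotocol capability, and the options are where per-packet metadata can travel; VXLAN has room for neither.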
Back to the original SRv6 as host-to-host overlay idea:
- I see no good reason to use SRv6 instead of VXLAN or GENEVE to implement overlay virtual networks. It seems no commercial data center overlay virtual networking product is using it7.
- Large-scale commercial service insertion implementations use VXLAN or GENEVE.
As always, I might be missing something obvious, in which case I’d appreciate your comments.
More Information
- Service insertion challenges are described in the SDN Use Cases webinar.
- VMware NSX-T east/west and north/south service insertion is covered in the Firewalling and Security part of the VMware NSX Technical Deep Dive webinar.
- AWS Gateway Load Balancer and AWS Transit Gateway Connect are part of the Amazon Web Services Networking webinar.
- Azure Gateway Load Balancer will get a brief mention8 in the autumn 2022 update of the Microsoft Azure Networking webinar.
Revision History
- 2026-05-11
- According to an Anonymous Commenter, hyperscalers use SRv6 for host-based traffic engineering.
1. Still not as bad as the “we could use LISP to implement global VM mobility” idea followed by a demo of a single VM moved across Europe.
2. Although we did use GRE for bridging decades ago, and one could always consider NVGRE just a variant of GRE.
3. As in “you can throw anything into it without clogging it too much”
4. … and the awesome opportunity to enhance your resume
5. North-South service insertion in VMware NSX-T is simple VLAN stitching.
6. I’m not implying that Azure uses EVPN, just that you can do VLAN stitching with the EVPN control plane.
7. There is a home-grown OpenStack implementation
8. The documentation is approximately two pages long and mostly says “we’re working with our integration partners to bring you the best possible experience.”
100% my position. BCM introduced basic SRv6 support in Tomahawk 4. In terms of implementation - to my knowledge Line Japan has implemented host2host SRv6, and Rakuten is trying to do something with it.
This article didn't age well at all... as of May 2026 OpenAI is performing the pretraining for all their models using SRv6. Microsoft and Oracle are also using it.
OpenAI: "With MRC, dynamic routing became less necessary. If packets are lost on a path, MRC stops using that path. We took the more radical approach of disabling dynamic routing and using IPv6 Segment Routing (or SRv6), instead. "
https://openai.com/index/mrc-supercomputer-networking/
Oracle Cloud Infra: "Oracle Acceleron instead uses source-based routing, using SRv6 (Segment Routing over IPv6) which allows NICs to control the exact path that packets take through the datacenter"
https://blogs.oracle.com/cloud-infrastructure/first-principles-multipath-reliable-connection
Microsoft: "Static SRv6 ensures paths are deterministic, making problems easier to reproduce, debug and, ultimately, more stable over time. "
https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/building-resilient-networks-for-ai-supercomputers/4516919
Thanks for the pointers; added more information to the blog post.
As for the perceived quality of the aging process 😜 -- that depends entirely on who you think this blog's audience is.
Hi Ivan,
The "not ready to use" argument for SRv6 is no longer valid. The ecosystem is fully there. In the case of OpenAI, it runs on commercial merchant silicon with open-source SONiC; for the rest of us, traditional vendors are all in. I challenge you to name a major networking vendor that doesn't support SRv6 today.
Regarding the supposed gap between what tech companies claim and what they actually do in production, OpenAI proves otherwise. Read the "Resilient AI Supercomputer Networking using MRC and SRv6" paper. It details their actual, at-scale production deployment and the concrete results of running SRv6 across their massive AI training clusters. https://cdn.openai.com/pdf/resilient-ai-supercomputer-networking-using-mrc-and-srv6.pdf
Last but not least, give your readers some credit! We aren't a bunch of techno-dinosaurs clinging to our legacy protocols. We're actually pretty up to speed with modern networking technologies. The real dinosaurs are still out there trying to make RSVP-TE or VXLAN work for everything, and let's be honest, those guys don't read blogs anyway, they only read 15-year-old certification books. 😉