LISP started as yet-another ocean-boiling project focused initially on solving the “we use locators as identifiers” mess (not quite), and providing scalable IPv6 connectivity over IPv4-only transport networks by adding another layer of indirection and thus yet again proving RFC 1925 rule 6a. At least those are the diagrams I remember from the early “look at this wonderful tool” presentations explaining for example how Facebook is using LISP to deploy IPv6 (more details in this presentation).
Somehow that use case failed to gain traction, and so the pivots[1] started, explaining how one can use LISP to solve IP mobility, IP multihoming, or live VM migration, or to implement an IP version of conversational learning in Cisco SD-Access. After a few years of those pivots, I started dismissing LISP with a short “cache-based forwarding never worked well” counterargument.
Looks like the team developing LISP came to the same conclusions. According to Bela Varkony:
Nowadays [LISP] is used with reliable transport and full PubSub. There is no caching behavior at all. Each xTR has a full routing table.
Also, engineers with exposure to LISP (but no skin in the game) view it as an alternative to existing VPN control-plane mechanisms (smart people used it as a replacement for DMVPN years ago). Back to Bela:
My view of LISP is not a solution for the global public Internet. In this aspect I agree, that it has a lot of issues. I see a potential in LISP as a private overlay replacing MPLS VPN solutions with something that is better in performance and has a built-in support for multi-link mobility. In that environment LISP has a much better performance and scalability than BGP… I agree that in some aspect LISP is a kind of a next generation BGP when PubSub is used with reliable transport.
Time to go back to the first principles. In any VPN control plane solution you have:
- Egress edge nodes that collect local reachability information using MAC learning, DHCP/ARP/ND snooping, or routing protocols. Egress edge nodes advertise that information to some central repository.
- (Hopefully redundant) central repository that distributes local reachability information received from the edge nodes to other edge nodes.
- Ingress edge nodes that receive information from one or more repository servers, select information they’re interested in (based on local VPN configuration), compare it with local reachability information to deal with multihomed endpoints, and install the results into the forwarding table.
The above description applies equally well to L3VPN or EVPN BGP address families, PubSub LISP, or most SD-WAN implementations.
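The three roles above can be sketched in a few lines of Python. This is a toy model, not how any BGP, LISP, or SD-WAN implementation works: the `Route`, `Repository`, and `EdgeNode` names are made up for illustration, the repository floods everything to every subscriber, and "dealing with multihomed endpoints" is reduced to "prefer the locally attached endpoint".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    vpn: str        # VPN identifier (think route target or VNI); hypothetical
    prefix: str     # MAC address or IP prefix of the endpoint
    next_hop: str   # edge node that advertised the endpoint


class Repository:
    """Central repository: route reflector, map server, or SD-WAN controller."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, node):
        self.subscribers.append(node)

    def publish(self, route):
        # Distribute the update to every edge node except its originator.
        for node in self.subscribers:
            if node.name != route.next_hop:
                node.receive(route)


class EdgeNode:
    def __init__(self, name, vpns, repository):
        self.name = name
        self.vpns = set(vpns)   # locally configured VPNs
        self.local = set()      # (vpn, prefix) pairs learned locally
        self.fib = {}           # (vpn, prefix) -> next hop
        self.repository = repository
        repository.subscribe(self)

    def learn_local(self, vpn, prefix):
        # Egress role: record local reachability and advertise it.
        self.local.add((vpn, prefix))
        self.fib[(vpn, prefix)] = "local"
        self.repository.publish(Route(vpn, prefix, self.name))

    def receive(self, route):
        # Ingress role: select interesting routes, compare with local state.
        if route.vpn not in self.vpns:
            return
        if (route.vpn, route.prefix) in self.local:
            return  # endpoint is also attached locally; keep the local entry
        self.fib[(route.vpn, route.prefix)] = route.next_hop


repo = Repository()
a = EdgeNode("A", ["red"], repo)
b = EdgeNode("B", ["red", "blue"], repo)
c = EdgeNode("C", ["blue"], repo)
a.learn_local("red", "10.0.0.1/32")
b.learn_local("blue", "10.0.0.2/32")
print(b.fib)
print(c.fib)
```

Node B ends up with the remote red endpoint pointing to A and its own blue endpoint marked as local, while C (configured only for the blue VPN) silently drops the red update it never needed.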
Bela is effectively saying “LISP does this faster than BGP” and even mentions a reason or two:
There it is a big advantage that there is no best path selection. So it is very fast for mobility.
Well, you still have to compare remote information with local information, but if you want to go down the path of “the last update we received from the controller cluster is the one we believe”, I see no reason why the same mechanism couldn’t be implemented in something like BGP Lite, should that be a problem worth solving. Maybe we’re due for another round of “IS-IS is better than OSPF” arguments that were somewhat justified just because the team writing IS-IS code for a major vendor happened to be slightly better than the team writing OSPF code (or so it was explained to me).
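To make the difference between “the last update wins” and best path selection a bit more tangible, here is a deliberately oversimplified sketch. The `Update` fields and both `install_*` functions are hypothetical; the "best path" rule compares only two attributes, whereas real BGP implementations walk through a much longer list of tie-breakers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    prefix: str
    next_hop: str
    local_pref: int = 100   # assumed attribute, higher is better
    as_path_len: int = 0    # assumed attribute, shorter is better


def install_last_update(table, update):
    """Whatever the controller sent last is what we believe."""
    table[update.prefix] = update.next_hop


def install_best_path(candidates, table, update):
    """Keep every known path per prefix, then pick the best one."""
    paths = candidates.setdefault(update.prefix, {})
    paths[update.next_hop] = update
    # Ties go to the path that was stored first (an artifact of max()).
    best = max(paths.values(), key=lambda u: (u.local_pref, -u.as_path_len))
    table[update.prefix] = best.next_hop


# An endpoint moves from edge node A to edge node B.
updates = [Update("10.0.0.1/32", "A"), Update("10.0.0.1/32", "B")]
last_wins, best_path, candidates = {}, {}, {}
for u in updates:
    install_last_update(last_wins, u)
    install_best_path(candidates, best_path, u)

print(last_wins)   # {'10.0.0.1/32': 'B'}
print(best_path)   # {'10.0.0.1/32': 'A'}
```

In this toy model the last-update table follows the move immediately, while the best-path table keeps pointing to A until something (a withdrawal or a better attribute) changes the comparison. Nothing stops a BGP-like protocol from using the first function instead of the second one.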
Back to LISP and BGP. There are still some things where BGP seems to be better than LISP. It looks like at least Cisco and Juniper implemented Route Target Constrained Route Distribution (RFC 4684)[2], whereas Bela claims a corresponding feature is still missing in LISP:
The future opportunity is to use selective subscription in LISP. Then you can have a full control of destinations that are interesting for you. So you can still reduce the memory needs, you do not need to have a full routing table everywhere, but you will not suffer by the generic caching algorithm issues. Your LISP map-cache will be under your full control […] Unfortunately, selective subscription implementation is lagging behind. But it might come soon…
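The idea behind both RT-constrained distribution and selective subscription is that the repository does the filtering before sending anything to an edge node. A minimal sketch, assuming edge nodes express their interest as a set of VPN names (the `FilteringRepository` class and that granularity are my assumptions, not something taken from RFC 4684 or the LISP PubSub specification):

```python
class FilteringRepository:
    """Repository that sends an update only to nodes that asked for it."""

    def __init__(self):
        self.subscriptions = {}   # node name -> set of VPNs it subscribed to
        self.delivered = {}       # node name -> list of updates sent to it

    def subscribe(self, node, vpns):
        self.subscriptions.setdefault(node, set()).update(vpns)
        self.delivered.setdefault(node, [])

    def publish(self, vpn, prefix, next_hop):
        for node, vpns in self.subscriptions.items():
            # Skip the originator and every node without a matching interest.
            if node != next_hop and vpn in vpns:
                self.delivered[node].append((vpn, prefix, next_hop))


repo = FilteringRepository()
repo.subscribe("B", {"red"})
repo.subscribe("C", {"blue"})
repo.publish("red", "10.0.0.1/32", "A")
print(repo.delivered)
```

Node C never sees the red update, so it spends neither bandwidth nor memory on it, and unlike a map-cache populated on demand, the content of its table is fully determined by what it subscribed to.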
Looking at all of the above, I have a funny feeling that we’re dealing with another instance of XKCD Standards. I’m not working with sufficiently-large-scale fabrics and will stick with BGP[3]. If you decide to deploy LISP, please let me know how it worked out.
[1] Also known as “solutions looking for a problem”.
[2] Or I got it all wrong, in which case please fix my ignorance (= write a comment) and I’ll update the blog post.
[3] Potentially avoiding implementations that need up to 300 seconds to set up a BGP session with 2000 VNIs.