Why Is Cisco Pushing LISP in Enterprise Campus?
I got several questions along the lines of “why is Cisco pushing LISP instead of using EVPN in VXLAN-based Enterprise campus solutions?”
Honestly, I’m wondering that myself (and maybe I’ll get the answer in a few days @ NFD16). However, let’s start at the very beginning…
What Do You Really Need?
It looks like Cisco (and a few other vendors, each one in its own way) still believes in the dire need for large layer-2 domains. I keep wondering why it seems everyone’s so obsessed with large VLANs stretching all across campus. If you have a good use case, please let me know.
Do keep in mind that traffic separation is not a VLAN use case. It only seems easier to solve with VLANs than with VRFs because you don’t appreciate how brittle or convoluted the behind-the-curtain stuff is. In other words, you’re trading explicit complexity (VRFs + associated routing protocols) for hidden complexity (MLAG or TRILL or SPB or VXLAN with EVPN or LISP or…).
I also stopped believing that IP address mobility is the driving force behind large VLANs. I know people using Mobile IP (and it’s even easier with IPv6) on mobile phones, and most phones today can use mobile data and Wi-Fi at the same time anyway. On top of that, wireless access points tend to handle roaming pretty well, and in many cases use their own flavor of IP tunneling.
Long story short: ask yourself whether you really need large VLANs or whether you need a simpler IP network and smart apps (and as I said, do report your findings in the comments).
Back to VXLAN
It looks like the networking industry is in another lemming rush. Everyone is rolling out VXLAN to solve large VLAN challenges, or even replacing MPLS with VXLAN for L3VPN deployments. Every single vendor is rolling out EVPN as the control plane for VXLAN. The current list includes at least Arista, Brocade (aka Extreme), Cisco, Cumulus, and Juniper.
Yet Cisco decided to use a completely different control plane (LISP) in campus networks. I can’t possibly grasp why they’d do that apart from having a solution that has been searching for a problem to solve for years. If you know a really good technical reason why LISP is better in a campus network than EVPN (potentially with conversational learning in case Cisco yet again has hardware challenges) please share it with me.
I’m not the only baffled engineer out there. Here’s what one of my readers wrote:
Cisco DNA is not fulfilling my needs; it is more complex and looks like a marketing solution. Why would I use LISP to do the same thing, given that we are already doing it with EVPN [in the data center]?
He asked around whether he could use Nexus switches in campus to get the functionality he needs, and (not surprisingly) got an answer along the lines of “it might work, but we haven’t tested it”. Or as I told him:
Don't fight the vendor. If your use case is not on their radar, don't try to push it through and make it work (though it makes perfect sense technically) - you'll hit all sorts of bugs because you'll be using untested combinations of features... or you might discover that they don't have features you need in your particular environment.
Fortunately, there's more than one networking vendor out there, and some of them are small enough that they might work with you to get an interesting use case off the ground (I’m looking at you, Cumulus Networks). Just saying ;)
And for wireless there is a CAPWAP tunnel between the AP and the WLC.
The main issues (at least from what I see at my customers) are: first, they are used to working in an L2 environment and are afraid of L3 and routing protocols.
The other issue is firewall clusters; if you have a FW cluster, most of them demand L2.
One example would be a user who moves from "A" to "B" within the campus. "B" is attached to a different distribution layer than "A" -> the user gets a new IP address.
Assuming this user or device has unique firewall rules pointing to the old address, a few applications won't work after the move.
The reactive action is to create a new DHCP reservation for "B" and alter the firewall object. Depending on the IT organization's agility this could take hours or days :)
Proactive solutions could be:
- Change your firewall and/or rule/object design
- Use an identity-based firewall feature, i.e., use names instead of IPs (see the sketch after this list)
- Keep the same IP address throughout the campus.
- Enhance the organizational move process to somehow alter the firewall object before the actual move
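To make the identity-based option concrete, here is a minimal Python sketch of the idea: rules reference user names, and the current user-to-IP binding is resolved at lookup time, so a move to a new address only changes the binding, never the rule base. All names, bindings, and rules are hypothetical.

```python
# Sketch of identity-based firewall matching: rules reference user names,
# and the user-to-IP binding is resolved at lookup time, so a move (new IP
# address) does not require touching any firewall object. Hypothetical data.

identity_bindings = {"alice": "10.20.30.7"}   # updated by 802.1X/DHCP events

rules = [
    {"user": "alice", "dst": "payroll-app", "action": "permit"},
]

def lookup_user(src_ip: str):
    """Reverse-resolve the source IP to a user identity."""
    for user, ip in identity_bindings.items():
        if ip == src_ip:
            return user
    return None

def evaluate(src_ip: str, dst: str) -> str:
    user = lookup_user(src_ip)
    for rule in rules:
        if rule["user"] == user and rule["dst"] == dst:
            return rule["action"]
    return "deny"

# User moves from building A to B and gets a new address; only the binding
# changes, the rule base stays untouched.
identity_bindings["alice"] = "10.40.50.9"
print(evaluate("10.40.50.9", "payroll-app"))  # permit
```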
Those are just my thoughts about a potential, not completely useless, use case.
Why they use LISP instead of EVPN for the control plane ... no clue...
Let's assume a Catalyst 3850 switch, which is a potential Campus Fabric edge device. This device supports 24,000 IPv4 routes according to the data sheet. The "SD-Access" scale is 8k IPv4 routes and 16k IPv4 host entries. So maybe this is the reason to use LISP: to maintain a conversational routing table at the border nodes.
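To illustrate the conversational idea, here is a rough Python sketch of a demand-filled forwarding cache: the edge switch resolves destinations through the mapping system only when traffic needs them, and evicts idle entries once the data-sheet-derived 16k host-entry limit is hit. This is just the general principle, not Cisco's actual implementation.

```python
# Rough sketch of why a pull-based control plane fits a small hardware FIB:
# instead of installing all host routes, the edge switch resolves
# destinations on demand and keeps only active conversations, evicting idle
# entries when the limit is reached. Illustrative only, not Cisco's code.
from collections import OrderedDict

FIB_LIMIT = 16_000  # host-entry limit taken from the data-sheet numbers above

class ConversationalFib:
    def __init__(self, resolver):
        self.cache = OrderedDict()   # host prefix -> locator, in LRU order
        self.resolver = resolver     # e.g. a query to the LISP mapping system

    def forward(self, dst_host: str) -> str:
        if dst_host in self.cache:
            self.cache.move_to_end(dst_host)       # refresh LRU position
        else:
            if len(self.cache) >= FIB_LIMIT:
                self.cache.popitem(last=False)     # evict least recently used
            self.cache[dst_host] = self.resolver(dst_host)  # pull on demand
        return self.cache[dst_host]

fib = ConversationalFib(resolver=lambda host: f"rloc-for-{host}")
print(fib.forward("10.1.1.10"))  # miss: pulled from the mapping system
print(fib.forward("10.1.1.10"))  # hit: served from the local cache
```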
We have long passed the point of diminishing returns in the campus LAN. Speeds and feeds stopped being a reason to upgrade campus LAN switches when we reached gigabit to the desktop and N x 1G (or N x 10G) to the aggregation/core. This is terrible for Cisco's (and other networking vendors') business, since the lifespan of campus gear went from 3 years in the early 2000s to 10-15 years now. Even wireless has hit a point of 'good enough'; if we never got a faster wireless standard than 802.11ac, we could make things work indefinitely by creating smaller, lower-powered cells. Sure, we can find corner use cases that require more speed, but for 95+% of users, we have enough.
So how do networking vendors rectify this? They invent 'compelling' reasons to upgrade that are not based on speeds and feeds. Many of these are dubious, and several are downright harmful to business success. I leave it to the reader to name their favorite unneeded campus technology (mine is/was NAC). These solutions are needed by a limited audience, but that will not stop networking vendors from attempting to force them on all customers. It is our job as network architects/managers/engineers to keep asking the question "Why does my business need this technology?".
Jeremy
The reliance on DNS-like resolution is not bad. It is a proven approach in the telecommunications industry; just look at ENUM/DNS used all over the place in SIP-based voice routing. With the advent of VoLTE, it will be THE solution. LISP just applies something similar to packet routing.
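As a toy illustration of the analogy (with made-up data): ENUM resolves a phone number to a SIP URI, while LISP resolves an endpoint identifier (EID) to a routing locator (RLOC). Both are on-demand directory lookups rather than entries pushed into every node's table.

```python
# Toy illustration of the analogy: ENUM maps a phone number to a SIP URI,
# LISP maps an EID to an RLOC; both are directory lookups. Data is made up.

enum_directory = {"+12125551234": "sip:alice@example.com"}
lisp_mapping_system = {"10.1.1.10/32": "192.0.2.1"}   # EID -> RLOC

def resolve(directory: dict, key: str) -> str:
    return directory.get(key, "unknown")

print(resolve(enum_directory, "+12125551234"))       # sip:alice@example.com
print(resolve(lisp_mapping_system, "10.1.1.10/32"))  # 192.0.2.1
```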
However, not everyone likes the dependence on a centralized resolution service.
We will see if it will succeed or not.
Please be aware that it is not a pure Cisco technology. HPE fully supports it, and Huawei also invests a lot in LISP; for example, Huawei is developing the missing LISP pieces for ONOS. So they would probably like to use it in telecommunications networks, not just in the enterprise... :-)
https://datatracker.ietf.org/wg/lisp/about/
NFV is nothing else than the rediscovery of the SS7 IN or the SIP/IMS/iFC mechanism, just for packets instead of SIP sessions. LISP will be used together with NFV in the future, further extending this long-established idea... There is nothing really new under the sun, just new clothes for the same old thing... :-)
However, Cisco for some reason rejected both TRILL and the IEEE's IS-IS-routed Ethernet (SPB).
Ethernet addresses do not have a locator component, but the IS-IS layer could add one.
The real problem with Ethernet is cut-through switching, since it results in unchecked propagation of frame errors.
There are lots of technologies, and you cannot decide which one will stay with us based on technical merits alone; the business case will decide. There is nothing wrong with SDH; actually, it is much better for a lot of applications, and it has had SDN-style central control for a long time already. But if there are very low production volumes, it becomes so expensive that it dies out...
Remember VHS winning against the others? But now we have H.26x video and real-time streaming... :-)
I think it comes down to the engineering team that takes on the product development. The "go with what you know" attitude is a big problem in all engineering companies. Architects and engineering leaders don't put themselves at risk by looking at what is best; they use what they know will work, regardless of whether it's fit for purpose or passes any sort of commercial, strategic, or architectural governance.
A better question to ask is why Cisco's CTO office does not enforce standards or governance in their R&D projects where there's no competitive advantage to be gained.
They continue to pay lip service to open-source engagement; why is that?
The impact of mobility events in a LISP network (as you know from past reviews published on your blog) is limited to signaling amongst the network elements involved in active connections between the devices. The impact of mobility events in a BGP network, however, is unbounded: even if you have conditional FIB programming, all changes are pushed to all participants. You can try to mitigate this with summarization, but that will have little effect in the case of access networks. This is just one of the many lessons the industry has learnt after years of building overlays with traditional push mechanisms.
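A back-of-the-envelope sketch of that fan-out difference, under deliberately simplified and hypothetical numbers: a push-based control plane advertises a moved host route to every participant, while a pull-based one only has to fix up the nodes holding an active map-cache entry for that host.

```python
# Back-of-the-envelope sketch of the fan-out difference described above.
# A push control plane (BGP-style) sends the moved /32 to every node; a
# pull control plane (LISP-style) only notifies nodes with an active
# map-cache entry for that host. Numbers are illustrative only.

total_edge_nodes = 500        # all fabric edge switches
nodes_talking_to_host = 6     # nodes with active conversations to the host

push_updates = total_edge_nodes        # everyone receives the update
pull_updates = nodes_talking_to_host   # only interested caches are fixed up

print(f"push: {push_updates} updates, pull: {pull_updates} updates")
```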
I happened to be in the process of posting a document (give me a few hours as it propagates through the system so I can give you a URL) that describes a wealth of other functionality that is possible by the simple principle of the demand based control plane and a discussion on why this is best realized with a demand protocol. One thing to remember is that the overlay problem is one of maintaining a directory of locations. This is not necessarily a routing problem. The use of a directory of locations (and other interesting information) allows us to evolve the services that are provided in these networks. This goes well beyond traditional routing services to include policy driven services. That said, even traditional routing services such as multicast and route leaking are improved. For instance, if you have ever set up (and you probably have) multicast across multiple VRFs in an extranet route-leaking arrangement, you would appreciate a solution that can simplify the machinery involved. There is also the fact that LISP can do this without creating any additional state (vs. copying all routes across all VRFs in the traditional solutions).
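As a hedged illustration of the "no additional state" point (with entirely hypothetical data structures): in a traditional extranet, shared routes get copied into every consumer VRF's table, whereas a directory-based approach can keep a single mapping entry and apply an extranet policy at lookup time.

```python
# Sketch of extranet route leaking without state copying: one mapping entry
# is kept in the directory, and a policy decides at lookup time which VRFs
# may resolve it, instead of copying the route into every VRF's table.
# Hypothetical data structures, just to show where the state lives.

mapping_system = {("shared-svc", "10.9.9.9/32"): "192.0.2.77"}  # one copy

extranet_policy = {
    "vrf-eng":   {"shared-svc"},   # VRFs allowed to resolve shared-svc EIDs
    "vrf-sales": {"shared-svc"},
}

def resolve(vrf: str, eid: str):
    for (provider_vrf, prefix), rloc in mapping_system.items():
        if prefix == eid and provider_vrf in extranet_policy.get(vrf, set()):
            return rloc
    return None

print(resolve("vrf-eng", "10.9.9.9/32"))  # 192.0.2.77, no per-VRF route copy
```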
Hopefully my comments and the pointer provided are good evidence that this is the product of much thought and indeed an evolution in networking. LISP has come a very long way in the last few years as a product of lessons learnt on numerous successful deployments. I am confident that, upon further review, you will find that the SD-Access implementation of LISP is a much richer solution than what you may have explored before and you’d appreciate how LISP is enabling much needed innovation in this space.
I would also be interested in this document if it's possible for you to share. Do you have any document that discusses reactive control plane vs proactive control plane?
Interested to hear your thoughts on complexity of LISP, cache size, and failure modes etc.
Regarding complexity: it is a different way of looking at things, very similar to DNS from a flow perspective. The configuration and operations are significantly simpler than BGP's, but they are perceived as complex because they don't follow the same principles as a routing protocol; it's a matter of coming to terms with the fact that this isn't a routing problem/task. As for cache size, memory, and CPU requirements, we have done benchmarking that shows a footprint that is about 10% of what BGP requires. The control plane is capacity-planned following guidelines similar to those for capacity planning a DNS server. As for failure modes, this is a broad topic, but there is a recursive reliance on the underlay control plane, which does use traditional routing protocols with all their functionality; at the borders of the fabric there are mechanisms to maintain visibility into remote network health and circumvent indirect failures (something that most/all overlay mechanisms have failed to address to date).
My slightly wider (and shallower) take is that, as Victor mentions above, the inclusion of LISP is a mobility play, not an L2-stretching thing (though I notice Cisco have put that into some of their early education material). The most important part of the SD-Access solution for me is actually not the LISP control plane but the SGT policy stuff. This is the point where you create a separation between host ID (i.e. IP address) and the security, QoS etc. that gets applied to traffic. Based on who you are (or the type of device you are using) you can get access to different stuff, right? Now this isn't new, but have you seen how easy it is to deploy in the DNA Center GUI? That's the big play, because DNA Center will create the VRFs, the VNIs and all the config under the hood to stitch this stuff together and make it feasible. (Yes, a CCIE will still be needed to troubleshoot it when it all goes wrong, but hey, we all need jobs, right?)
The mobility piece is a nod to legacy as much as anything I think - with SGTs you shouldn't need to care what your IP address is in the access network as you are granted access based on who/what you are. As we know though, sometimes keeping the same IP address is important (especially in legacy apps or legacy networks that don't talk SGT) and so being able to move an address around a network without having to reauthenticate becomes important. And LISP gives you a (nearly) standardised and (relatively) well-worn approach to that without resorting to trying to maintain a distributed database of /32s across an arbitrary topology of switches. We all know a campus network is not like a DC. Traffic flows are very different, volumes are different, connectivity requirements too, and so topologies, control planes and policy enforcement are totally different. I like my DCs being separate from my campus, so am I worried about a different control plane? Not so long as I can translate between them or orchestrate. Centralised identity has been progressively more important in an enterprise, so why not centralised forwarding control plane? Back in the day we were always taught they could be a good thing (off-router route/path calculation etc) and if it's good enough for the DC, why not the campus?
OK, yes, I've drunk the Kool-Aid and I'm sold on the Cisco solution - but I work for a Cisco partner so I would be. I have been waiting for the first viable campus access network overlay solution and it looks like it's arrived from an unexpected source! There will doubtless be others. But Cisco's usual approach of taking a few previously unrelated features and bundling them together to create an "architecture" looks like it might just work this time. Now we can stitch overlays (campus, SD-WAN and DC) together to give a proper end-to-end solution and we can concentrate on giving customers a slicker, easy-to-consume, automated network experience while we continue to do battle with CLI under the hood!!
Hi Jeremy - understood, but SGTs are not just about security at the edge. While we can make it very cool (who wouldn't want to quarantine unpatched machines or remove a PC's access if it starts misbehaving?), SGTs can also just be used to deploy traditional firewall policy without resorting to IP addressing. They are also about other policy, such as QoS and PBR, if you want them to be (indeed, they are now called Scalable Group Tags for that reason). And as SGTs can be named, you're now expressing policy by its intent (bingo!)
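For readers who haven't met group-based policy before, a minimal sketch (group names and classifications invented for illustration): endpoints are classified into named groups, and policy becomes a small group-to-group matrix instead of a pile of IP-based rules, so addresses can change without touching policy.

```python
# Minimal sketch of group-based policy: endpoints are classified into named
# groups, and policy is a group-to-group matrix instead of IP-based rules;
# IP addresses can change freely without touching the policy. Hypothetical
# group names and classifications.

endpoint_groups = {"10.1.1.15": "Contractors", "10.2.2.20": "Payroll-Servers"}

policy_matrix = {
    ("Employees",   "Payroll-Servers"): "permit",
    ("Contractors", "Payroll-Servers"): "deny",
}

def enforce(src_ip: str, dst_ip: str) -> str:
    src_grp = endpoint_groups.get(src_ip, "Unknown")
    dst_grp = endpoint_groups.get(dst_ip, "Unknown")
    return policy_matrix.get((src_grp, dst_grp), "deny")  # default deny

print(enforce("10.1.1.15", "10.2.2.20"))  # deny: contractor -> payroll
```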
My experience is that many organizations "require" large stretched VLANs simply because they have not adopted DHCP/DDNS in place of statically assigned addresses for printers and other devices. This design oversight introduces a significant amount of operational overhead, and the volume of adds/moves/changes becomes unsustainable without stretched VLANs.
There are good reasons to have that type of tech in the campus...
Thank you for this. I am a bit of a novice engineer, with only my CCNA, but as I began my CCNP studies, as soon as I noticed LISP I thought... too complex. The adage of "work smarter, not harder" popped into my head. IMO that should only apply to network devices, whether they be virtual or physical; humans should work smarter and harder. Putting our networks through such monotony would make sense if it bettered the endgame with greater speed or decreased latency, but it just seems to be a project Cisco worked on, and now they have decided to force it on us.