BGP Route Selection: a Failure of Intent-Based Networking

Tuesday, January 9, 2018 09:42 +0100


It’s interesting how the same pundits who loudly complain about the complexities of BGP (and how it will soon be dead and replaced by an SDN miracle) also praise the beauties of intent-based networking… without realizing that the hated BGP route selection process represents one of the first failures of the intent-based approach to networking.

Let’s start with some definitions. There are two ways to get a job done by someone else:

You tell them how to do the job (algorithmic or imperative approach)
You tell them what should be done but not how to do it (declarative approach). Marketers love to call this the intent-based approach.

Now think about any routing protocol implementation. Do you tell a router how to get the job done or do you configure what should be done, for example:

On which interfaces should a routing protocol work
Which interfaces are better or more preferred than others
Which prefixes should it advertise
Which prefixes should be summarized
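To make the "what, not how" point concrete, here's a minimal Python sketch of that kind of intent expressed as data. The RoutingIntent class, its fields, and the preferred_interface() and advertisements() helpers are hypothetical and don't model any vendor's configuration. The operator lists interfaces, costs, prefixes, and summaries; the software, not the operator, works out what actually gets announced.

```python
from dataclasses import dataclass, field
from ipaddress import ip_network


@dataclass
class RoutingIntent:
    """Hypothetical declarative intent: what the routing protocol should do, not how."""
    interfaces: dict[str, int]          # interfaces to run on -> cost (lower is preferred)
    advertise: list[str]                # prefixes that should be advertised
    summaries: list[str] = field(default_factory=list)  # aggregates replacing covered prefixes


def preferred_interface(intent: RoutingIntent) -> str:
    """Return the interface with the lowest configured cost."""
    return min(intent.interfaces, key=intent.interfaces.__getitem__)


def advertisements(intent: RoutingIntent) -> list:
    """Derive the announcements from the intent.

    A prefix covered by a configured summary is replaced by that summary;
    a summary is announced only if at least one covered prefix exists.
    """
    summaries = [ip_network(s) for s in intent.summaries]
    announced = set()
    for prefix in map(ip_network, intent.advertise):
        covering = [s for s in summaries if prefix.subnet_of(s)]
        announced.update(covering or [prefix])
    return sorted(announced)


intent = RoutingIntent(
    interfaces={"eth0": 10, "eth1": 100},
    advertise=["10.1.1.0/24", "10.1.2.0/24", "192.168.5.0/24"],
    summaries=["10.1.0.0/16"],
)
print(preferred_interface(intent))  # eth0
print(advertisements(intent))       # [IPv4Network('10.1.0.0/16'), IPv4Network('192.168.5.0/24')]
```

Nothing in the intent says how to suppress the more-specific prefixes; that logic lives in the code you never see.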

Got it? Routing protocols were an early implementation of the intent-based paradigm… but of course the marketers touting the benefits of intent-based gizmos don’t want to hear that, because at the same time they’re telling you how much routing protocols suck.

Now imagine that you don’t like the results of a routing protocol. You can’t change the route selection algorithm, but you can tweak your intent, and hope that the results will be what you want them to be. There are people who tweak IGP costs to push the traffic the right way, and Cariden made a significant business out of tools that predicted how changes in costs would affect traffic flow. They even calculated the optimal link costs for your network based on the network topology and traffic matrix.
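Here's a minimal Python sketch of that "tweak the intent, keep the algorithm" loop, assuming a toy four-router topology with made-up symmetric link costs. shortest_path() is plain Dijkstra and stands in for the SPF algorithm you can't change; the only thing the operator touches is the cost table.

```python
import heapq


def build(links):
    """Turn (router, router, cost) tuples into a symmetric adjacency map."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest_path(graph, src, dst):
    """Plain Dijkstra: the fixed route selection algorithm."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None  # destination unreachable


links = [("A", "B", 10), ("B", "D", 10), ("A", "C", 10), ("C", "D", 15)]
print(shortest_path(build(links), "A", "D"))    # (20, ['A', 'B', 'D'])

# Same algorithm, tweaked intent: make the B-D link less attractive.
tweaked = [("A", "B", 10), ("B", "D", 20), ("A", "C", 10), ("C", "D", 15)]
print(shortest_path(build(tweaked), "A", "D"))  # (25, ['A', 'C', 'D'])
```

Raising a single cost from 10 to 20 moves the A-to-D traffic onto the other path. Tools like Cariden's did this kind of what-if analysis with real topologies and traffic matrices; this sketch only shows the mechanism.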

I briefly mentioned Cariden in the SDN Use Cases webinar. I also wanted to do a deep-dive podcast with them, but unfortunately they were acquired before we managed to schedule the recording.

Things are simple in the IGP land because we have simple goals:

Survive failures;
Keep things relatively stable;
Spread the load across the whole infrastructure.

Now imagine business rules, commercial preferences, and contractual limitations entering the picture. Welcome to the problem BGP is trying to solve.

In an environment that would define the problem before trying to solve it, the final solution would probably include a centralized controller where you’d implement business-driven decisions in your own custom path selection logic, and use simple local versions of that same algorithm as a fallback plan in case of central controller failure.
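A minimal Python sketch of that architecture might look like the code below. Everything in it is hypothetical: the Route fields, the per-upstream cost table, the allow-list standing in for a contractual limitation, and the ConnectionError used to signal that the controller is unreachable. The controller runs the full business logic; the local fallback is a simplified version of the same algorithm that keeps the hard constraint but ignores the cost data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    upstream: str      # hypothetical: transit provider or peer the route was learned from
    as_path_len: int


# Hypothetical business input that normally only the controller would have.
COST_PER_MBPS = {"transit-a": 0.50, "transit-b": 0.35, "peer-x": 0.0}
# Hypothetical contractual limitation: this prefix may only use transit-a.
ALLOWED_UPSTREAMS = {"203.0.113.0/24": {"transit-a"}}


def permitted(route: Route) -> bool:
    allowed = ALLOWED_UPSTREAMS.get(route.prefix)
    return allowed is None or route.upstream in allowed


def controller_select(candidates: Iterable[Route]) -> Optional[Route]:
    """Full business logic: cheapest permitted upstream, then shortest AS path."""
    return min(
        (r for r in candidates if permitted(r)),
        key=lambda r: (COST_PER_MBPS.get(r.upstream, float("inf")), r.as_path_len, r.next_hop),
        default=None,
    )


def local_select(candidates: Iterable[Route]) -> Optional[Route]:
    """Simplified local fallback: same hard constraint, but no cost data."""
    return min(
        (r for r in candidates if permitted(r)),
        key=lambda r: (r.as_path_len, r.next_hop),
        default=None,
    )


def select(candidates: list[Route],
           controller: Callable[[list[Route]], Optional[Route]] = controller_select) -> Optional[Route]:
    """Ask the controller; if it can't be reached, fall back to local logic."""
    try:
        return controller(candidates)
    except ConnectionError:
        return local_select(candidates)


def unreachable_controller(_candidates):
    raise ConnectionError("controller down")  # simulates a central controller failure


routes = [
    Route("198.51.100.0/24", "192.0.2.1", "transit-a", as_path_len=2),
    Route("198.51.100.0/24", "192.0.2.2", "transit-b", as_path_len=4),
]
print(select(routes).upstream)                                     # transit-b (cheaper)
print(select(routes, controller=unreachable_controller).upstream)  # transit-a (shorter AS path)
```

The fallback picks a different exit, and that's the design choice: when the controller is gone you still forward traffic within the contractual constraint, you just lose the cost optimization.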

There are probably people doing exactly that, and I would love to hear from them. Most of the great ideas in networking have already been implemented by someone… it’s just that they’re not bragging about it at Something-Open-Something-Something conferences.

Guess what: almost nobody wanted to go down that path (assuming they did the homework and realized what would need to be done), because it’s a messy business and really hard to get right. Everyone wanted the networking vendors to tweak their code to solve business problems one at a time (without ever seeing the whole picture) by adding knobs to the intent-driven data model. That’s how we got:

Weights (I want to influence how a box does route selection)
Local preference (I want to influence route selection within my system)
MED (I want to influence how others send traffic to me)
Communities (I want to influence others but can’t use any other tool, so let’s hope they interpret my intent correctly)
Lists of communities to use (this is how you can signal your intent in a way that I’ll understand it)
Weird rules about route reflector attributes affecting route selection (because it’s always possible to build a broken network and then claim it’s the vendor’s or the protocol’s fault)
Crazy stuff like copying the IGP metric into a BGP extended community, because we want even more ways to tweak intent without having to write the code ourselves.
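To see how those knobs pile up, here's a heavily simplified Python sketch of a BGP best-path comparison with a community-based ingress policy in front of it. It assumes a Cisco-style order (weight first), compares MED across all paths as if always-compare-med were configured, and omits several steps (locally originated routes, oldest eBGP path, route-reflector tie-breakers, neighbor address). The community values and their local-preference mapping are hypothetical, standing in for the "list of communities to use" a provider might publish.

```python
from dataclasses import dataclass, replace

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}  # lower origin code is preferred

# Hypothetical community list a provider publishes: "tag your routes like this".
COMMUNITY_TO_LOCAL_PREF = {"65000:80": 80, "65000:120": 120}


@dataclass(frozen=True)
class Path:
    name: str
    weight: int = 0              # Cisco-specific, never leaves this router
    local_pref: int = 100
    as_path: tuple = ()
    origin: str = "igp"
    med: int = 0
    ebgp: bool = True
    igp_metric: int = 0          # IGP cost to the BGP next hop
    router_id: str = "0.0.0.0"
    communities: frozenset = frozenset()


def apply_ingress_policy(path: Path) -> Path:
    """Translate the neighbor's signalled intent (communities) into local preference.

    The first matching entry in COMMUNITY_TO_LOCAL_PREF wins.
    """
    for community, local_pref in COMMUNITY_TO_LOCAL_PREF.items():
        if community in path.communities:
            return replace(path, local_pref=local_pref)
    return path


def best_path_key(p: Path) -> tuple:
    """Lower tuple wins; 'higher is better' attributes are negated."""
    return (
        -p.weight,
        -p.local_pref,
        len(p.as_path),
        ORIGIN_RANK[p.origin],
        p.med,
        0 if p.ebgp else 1,       # prefer eBGP over iBGP
        p.igp_metric,
        tuple(int(octet) for octet in p.router_id.split(".")),
    )


def best_path(paths: list[Path]) -> Path:
    return min((apply_ingress_policy(p) for p in paths), key=best_path_key)


r1 = Path("via-R1", as_path=(65001, 65010), med=50, router_id="10.0.0.1")
r2 = Path("via-R2", as_path=(65002, 65010), med=10, router_id="10.0.0.2")
print(best_path([r1, r2]).name)      # via-R2: everything ties until MED
tagged = replace(r1, communities=frozenset({"65000:120"}))
print(best_path([tagged, r2]).name)  # via-R1: the community raised its local preference
```

The community only works because both sides agree on what 65000:120 means; if the receiving AS doesn't implement that mapping, the tag is silently ignored and your intent goes nowhere.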

In my biased view (because I don’t believe in fairy tales and magic), BGP is a pretty obvious lesson in what happens when you try to solve vague business rules with an intent-driven approach instead of writing your own code that does what you want done.

It will be great fun to watch how the next generation of intent-based solutions fares. I haven’t seen anyone beat the laws of physics or RFC 1925 Rule 11 yet.

Need more (cynical) details on the intent-based hype spreading through the networking industry? You’ll find them in the Network Automation Concepts webinar.


11 comments:

Joshua Morgan 09 January 2018 22:27

Hi Ivan,

The 'define the problem before trying to solve it' link appears to be broken (BlobNotFound).

Regards,

Josh

Replies

Ivan Pepelnjak 10 January 2018 15:27

Fixed. Thank you!

HEMANTH RAJ 10 January 2018 08:41

Hi Ivan

Intent-based networking has to work end to end; it shouldn't stop at our next hop. BGP solves the exit and entry point problems, which are basically next-hop problems, not the entire end-to-end path.
Honouring the intent end to end is the goal, just like RSVP reserves the required bandwidth end to end and doesn't rely on hop counts or delay.

Intent-based networking has to be honoured end to end and has to be universally acceptable, like BGP communities or RSVP reservations, not just at my next hop.

Replies

Ivan Pepelnjak 10 January 2018 15:30

Hi Hemanth,

Ignoring the fact that what you wrote doesn't exactly apply to IBGP-based networks (you know, the other way of using BGP that's being deployed by every ISP worldwide), and that in a WAN network usually you cannot guarantee anything beyond AS boundary (because that's the very definition of an AS), you totally missed the point of the article, which was that trying to express your business logic with declarative intent tends to get messier and messier.

Anonymous 11 January 2018 12:36

The point I got is that this article, and most of the articles lately, are there to promote the online courses. There are no longer articles that comment on technology because it is new, interesting, or controversial. Every article has to end with "If you want to know more, subscribe to online courses..." Not that this is a bad thing per se, but it is becoming more and more annoying.

adam 12 January 2018 09:26

Well, maybe it’s because you folks really need to be educated on something.
Ivan actually summed up my thoughts on this very well.
You either have a protocol that you tweak in the most weird and wonderful ways and hope for the best, that is, hope that when it churns through the selection algorithm it will produce the result you intended in the first place. Or, for folks who are fed up with this, or when there’s no way in hell to tweak the protocol to get the intended result, there’s the DIY way using source-based routing, where you program the intended result yourself (by defining the hop-by-hop path each packet should take).
As you can see, the first approach is like setting up pins on a Galton board so that most beans end up in one bucket: the more buckets you want to fill, the more pins you need, and the whole thing becomes very complicated.

Adam Vitkovsky
adamv0025.netconsultings.com

Sander Steffann 10 January 2018 09:36

Talk to Andrew Alston; he's doing simple local rules plus a centralised controller based on business rules, with segment routing.

Unknown 11 January 2018 16:31

Hi Ivan - nice article... in terms of the "final solution" you described, this is exactly how the Plexxi controller works. The operator describes the "workload intent", i.e. the performance and security parameters or constraints that are required, and our affinity algorithms compute topologies, flow policies, etc. that meet those needs. Everything starts as non-differentiated, so you don't need to involve the controller up front, only when you have specific declarative needs, and yes, it falls back to the standard load-shared network algorithms. And yes, we never started as an "SDN" company; we started as a company that wanted to define specific workload management problems for networks and leverage SDN and modern constraint-based algorithms to help network operators get to a better answer faster.

Bela 16 January 2018 11:13

Typically, we would need both: intent-based (declarative) and procedural. They are for different audiences: the declarative one for the average user or operator, the procedural one for the very experienced designer or developer. This is a hierarchy of reuse, too. You create something useful procedurally, and others can reuse it declaratively without getting into the details.
A contemporary network would have both local intelligence in the network nodes and customization in a central controller. Formerly, the central element was called the network management system. Now we can have fancier names... :-)

DRL 18 January 2018 23:54

As Ivan pointed out, any definition of intent-based which consists entirely of declarative inputs would only work in a fantasy world. A large group of innovators from both vendors and network operators worked to create a much more specific and meaningful definition. In this type of intent-based system the power and value comes not from claiming that you can build infrastructure without prescriptive inputs to the implementation, but from carefully separating the description of the high level human/business goals from the description of the implementation choice specifics. Such a system has two types of input:

Intent: description of intended state that is completely free of any implementation-specific references. Interfaces, devices, addresses, protocol types, vendors, and OSI layers are not allowed; by this definition they are not intent. Thus you can't have a meaningful conversation about intent-based BGP. It's already too implementation-specific to qualify.

Mapping: implementation details and choices provided by M2M interaction and/or human expert input. It consists of application telemetry (description of app and user identity, etc.) and infrastructure telemetry (inventory and status/state of paths, interfaces, devices, servers, etc.).
In this architecture the two types of input feed the "intent engine", a process that runs forever, automatically changing device configuration and behavior in response to any change in either the intent input or the mapping input.

Example: the intent is that workloads A, B, and C have logical isolation.
Mappings would provide:
A has address X, B has address Y, C has address Z.
Logical isolation is implemented using VXLAN (or MPLS VPN)

The intent engine sets up tunnels/forwarding to make the intent come true. If A gets a new address, the intent engine changes the configuration to make it true again. And so on.

The intent can be created/described by any non expert.
The intent can be moved, unmodified to any other vendor, protocol, device, cloud, etc.
The intent can be sharded, unmodified, across an arbitrary number of administrative domains.
The intent never changes as a result of changes in network state.

The address info might come from e.g. vCenter pub/sub.
The choice of virtualization technologies might be made by an IT expert.
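A minimal Python sketch of the intent-plus-mapping model described above, assuming the A/B/C isolation example. The mapping structure, the segment numbering, and the configuration strings are hypothetical (they're not any real device syntax), and a real intent engine would run this continuously rather than once.

```python
# Intent: implementation-free; it only says which workloads must be isolated.
INTENT = {"isolated": ["A", "B", "C"]}


def render(intent: dict, mapping: dict) -> dict:
    """Combine the invariant intent with the current mapping into desired per-workload config."""
    desired = {}
    for index, workload in enumerate(sorted(intent["isolated"])):
        segment = mapping["segment_base"] + index  # one isolated segment per workload
        address = mapping["addresses"][workload]
        desired[workload] = f"{mapping['isolation']} segment {segment} member {address}"
    return desired


def reconcile(running: dict, desired: dict) -> list:
    """Return the changes needed to move the running state to the desired state."""
    changes = [("set", w, line) for w, line in desired.items() if running.get(w) != line]
    changes += [("delete", w, running[w]) for w in sorted(running.keys() - desired.keys())]
    return changes


def apply_changes(running: dict, changes: list) -> None:
    """Apply the computed changes to the (simulated) running state."""
    for action, workload, line in changes:
        if action == "set":
            running[workload] = line
        else:
            running.pop(workload, None)


mapping = {
    "isolation": "vxlan",            # implementation choice made by an expert
    "segment_base": 10000,
    "addresses": {"A": "10.0.0.1", "B": "10.0.0.2", "C": "10.0.0.3"},  # e.g. from an inventory feed
}
running: dict = {}
apply_changes(running, reconcile(running, render(INTENT, mapping)))  # initial build: three "set" changes

mapping["addresses"]["A"] = "10.0.0.99"                               # A gets a new address
print(reconcile(running, render(INTENT, mapping)))
# [('set', 'A', 'vxlan segment 10000 member 10.0.0.99')]
```

INTENT never changes in this sketch; only the mapping does, and the engine recomputes the configuration from both.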

The above is just the tip of the iceberg of a definition of intent-based networking that actually has some value in the evolving IT landscape. Lots of people have worked hard to validate and define this model. After several years of agreeing on and publishing a definition in the ONF NBI WG, which I chaired, we are now continuing the work in the MEF Forum's Intent Based Orchestration WG, which I will co-chair with John Strassner from Huawei. Anyone who wants to clear up all the FUD and intent-washing and work on an important and challenging project should reach out and get involved.

Invariant Intent + app telemetry + infra telemetry are sufficient inputs to build an autonomous system with infrequent manual inputs mostly from non-experts. There is still a need for some experts at the ecosystem level, but not every system user needs to be one, unlike today's operating model. Lower OPEX, shorter time to service and repair, eliminate many causes of human error. Better SLAs. Love to discuss this more with folks...

Dave Lenrow, CN3 Systems

Ethan Banks 19 January 2018 15:23

A thought: some intent-based marketing preys upon the fears of engineers who lack BGP competency, i.e., BGP is "hard" while IBN purports to be "easy."
