Reading academic articles about Internet-wide routing challenges you might stumble upon valley-free routing – a pretty important concept with applications in WAN and data center routing design.
If you’re interested in the academic discussions, you’ll find a pretty exhaustive list of papers on this topic in the Informative References section of RFC 7908; here’s the over-simplified version.
Imagine a typical multi-layer somewhat-hierarchical routing structure. It could be a hierarchical WAN design, data center fabric, or the global Internet (we’ll focus on the latter in this blog post). You’d expect the traffic to flow from the source leaf node up the layers until it hits a path toward the destination leaf node where it makes a turn and starts flowing down the layers until it gets to the destination leaf node.
Unfortunately life isn’t always as neat and tidy. For example, a dual-homed customer could create a route leak due to incompetence or fat-finger incident and become a transit AS between ISP Left and ISP Right. Assuming our understanding of routing hierarchy is correct, the traffic from Customer-A toward Customer-B goes uphill, then dips into a valley (dual-homed customer), goes uphill again, and finally goes downhill to reach Customer-B.
Loose definition: A routed network with no traffic flow valleys for any source-destination pair has valley-free routing.
Removing the Valleys
Two mechanisms are commonly used to remove traffic flow valleys from a hierarchical network:
- If you can’t influence the routing protocol policies, install a peer link between elements at the same hierarchical level to create a lower-cost path than a path through the valley.
- Ideally, you’d use routing protocol policies to stop (or at least discourage) route leaks that result in traffic flow valleys. As explained in tedious details in RFC 7454, the two ISPs in our example should filter transit routes received from their shared customers. In an environment that favors reachability over stability or predictable traffic patterns, a better solution might be to increase the cost (or reduce local preference) of routes that would result in traffic flow valleys.
I’m positive at least one of the academic papers listed in RFC 7908 contains a rigorous definition of valley-free routing (pointers welcome); here’s a pretty loose attempt:
Assuming we can assign hierarchical level to network nodes such that the core nodes have the lowest level and the edge nodes have the highest level, a valley-free traffic flow traverses nodes in non-increasing order of levels until it reaches the innermost node, at which point it starts traversing nodes in non-decreasing order of levels.
Here’s an alternative definition assuming we can figure out the contractual relationships between autonomous systems (good luck with that):
- Number links between autonomous systems as (+1,0,-1) for (provider, peer, customer).
- A valley-free path has a sequence of +1s, followed by at most one 0, followed by a sequence of -1s.
Assigning hierarchical levels to network nodes is a fun problem to solve. You could do it manually, automate the process, or try to deduce node’s level in the hierarchy based on its surroundings – RIFT and OpenFabric are both trying to do that to enable plug-and-pray fabric autoconfiguration.
The challenge is relatively easy to solve if you’re allowed to label the core (or edge) nodes. This is how you could do it in the Internet case:
- Assume we could agree on a list of Tier-1 providers;
- Anyone connected to Tier-N provider is Tier-N+1 provider;
- Anyone at tier N not having a connection to a tier N+1 network is an edge node. Whether they want to be called customers is not a technical question ;)
The real fun starts when you’re trying to figure out the hierarchical levels of network elements based solely on network topology… and you have to deal with peer links on any level of hierarchy.
Anyways, I’m an old-school micromanager and prefer to have a structured network design and automated deployment. If you’re like me, you’ll find a data center fabric automation use case by Dinesh Dutt in Network Automation Use Cases webinar, deep dive into the tool he used in Ansible for Networking Engineers webinar… or you could go for a full-blown course on designing, developing and deploying a network automation solution.
So... What we are left with is probably automatic filtering based on what is in route databases BUT:
ISPs are lazy
They care about moving packets not suboptimal routing or security
Route databases often aren't fully accurate
Not all ISPs have automated their filters
So the unfortunate outcome is that we'll see route leaks even in the future. Our hope relies upon people like Job Snijders who are passionate about building a better internet.
Valley free routing (which I call more generically no-zigzag routing, you can easily figure out what I mean, you go north until you change direction and go south which you do with the 0 and +/-, unsurprisingly major principle in RIFT again ;-) is a very desirable, very hard to achieve property unless you have a sense of N-S on the substrate (fabric, AS-mesh, whatever) which preconditions sense of direction (to be bit bloated, you need a partial mesh with a max-upper and lower bound). And hence, if done extremely well you are not even bound by ECMP anymore ;-)
I do obviously think for locally provisioned bandwith (such as IP fabrics) this is a natural way to progress and will become stronger with the demise of "artisanal local network configuration skills". Whether it will be Clos or some variant thereof is not very interesting, routing will adjust, the more "interconnected" your fabric in terms of shortcut the more routing information you'll have to slosh around @ convergence cost @ failure and so on. For things like WAN & AS meshes this is much, much harder thing to achieve, see also the "peer complex role" route leaking discussions in GROW. Hence I don't think lots of progress will be made unless some kind of economic cost will be imposed on hot potato routing ;-)
Good article ...
BTW, that's where "plug-and-pray" comes from. I hope RIFT track record will be infinitely better than that.
Proof's in the pudding ;-) Pudding is downloadable in two implementations, feedback welcome ...
ahh... i see... thank you for the insight