Category: IP routing
A while ago (eons before AWS introduced Gateway Load Balancer) I discussed the intricacies of AWS and Azure networking with a very smart engineer working for a security appliance vendor, and he said something along the lines of “it shows these things were designed by software developers – they have no idea how networks should work.”
In reality, at least some aspects of public cloud networking come closer to the original ideas of how IP and data-link layers should fit together than today’s flat earth theories, so he probably wanted to say “they make it so hard for me to insert my virtual appliance into their network.”
In the blog post introducing fast failover challenge I mentioned several typical topologies used in fast failover designs. It’s time to explore them.
Fast failover is (by definition) adjustment to a change in network topology that happens before a routing protocol wakes up and deals with the change. It can therefore use only locally available information, and cannot involve changes in upstream devices. The node adjacent to the failed link has to deal with the failure on its own without involving anyone else.
From historical perspective, any idea why OSPF guys invented their own transport protocol instead of just relying upon TCP?
I wasn’t there when OSPF was designed, but I have a few possible explanations. Let’s start with the what functionality should the transport protocol provide reasons:
In the introductory fast failover blog post I mentioned the challenge of fast link- and node failure detection, and how it makes little sense to waste your efforts on fast failover tricks if the routing protocol convergence time has the same order of magnitude as failure detection time.
Now let’s focus on realistic failure detection mechanisms and detection times. Imagine a system connecting a hardware switching platform (example: data center switch or a high-end router) with a software switching platform (midrange router):
Sometimes you’re asked to design a network that will reroute around a failure in milliseconds. Is that feasible? Maybe. Is it simple? Absolutely not.
In this series of blog posts we’ll start with the basics, explore the technologies that you can use to reach that goal, and discover one or two unexpected rabbit holes.
One of my readers sent me a question along these lines:
Imagine you have a router with four equal-cost paths to prefix X, two toward upstream-1 and two toward upstream-2. Now let’s suppose that one of those links goes down and you want to have link protection. Do I really need Loop-Free Alternate (LFA) or MPLS Fast Reroute (FRR) to get fast (= immediate) failover or could I rely on multiple equal-cost paths to get the job done? I’m getting different answers from different vendors…
Please note that we’re talking about a very specific question of whether in scenarios with equal-cost layer-3 paths the hardware forwarding data structures get adjusted automatically on link failure (without CPU reprogramming them), and whether LFA needs to be configured to make the adjustment happen.
Did you ever experience an out-of-the-blue BGP session flap after you were running that peering for months? As Dmytro Shypovalov explains in his latest blog post, it’s always MTU (just kidding, of course it’s always DNS, but MTU blackholes nonetheless result in some crazy behavior).
I should have known better, but I got pulled into another stretched VLANs for disaster recovery tweetfest. Surprisingly, most of the tweets were along the lines of you really shouldn’t be doing that and that would never work well, but then I guess I was only exposed to a small curated bubble of common sense… until this gem appeared in my timeline:
Interestingly, that’s exactly how IP works:
When I still cared about CCIE certification, I was always tripped up by the weird scenario with (A) mismatched ARP and MAC timeouts and (B) default gateway outside of the forwarding path. When done just right you could get persistent unicast flooding, and I’ve met someone who reported average unicast flooding reaching ~1 Gbps in his data center fabric.
One would hope that we wouldn’t experience similar problems in modern leaf-and-spine fabrics, but one of my readers managed to reproduce the problem within a single subnet in FabricPath with anycast gateway on spine switches when someone misconfigured a subnet mask in one of the servers.
We all knew it for a long time, now it’s finally official: IP fragmentation is broken, or as the ever-so-diplomatic IETF likes to call it, IP Fragmentation is Considered Fragile.
The can we trust routing protocols series of blog posts I wrote in April 2020 (part 1, part 2, response from Jeff Tantsura) culminated in an interesting discussion with Russ White and Nick Russo now published as The Hedge Episode 43.
While packets should never be reordered in transit in transparent bridging, there’s no such guarantee in IP networks, and IP applications should tolerate out-of-order packets.
One of my regular readers who designs and builds networks supporting VoIP applications disagreed with that citing numerous real-life examples.
Of course he was right, but let’s get the facts straight first:
Two weeks ago I started with a seemingly simple question:
If a BGP speaker R is advertising a prefix A with next hop N, how does the network know that N is actually alive and can be used to reach A?
… and answered it for the case of directly-connected BGP neighbors (TL&DR: Hope for the best).
Jeff Tantsura provided an EVPN perspective, starting with “the common non-arguable logic is reachability != functionality”.
Now let’s see what happens when we add route reflectors to the mix. Here’s a simple scenario:
How does the network know that a VTEP is actually alive? (1) from the point of view of the control plane and (2) from the point of view of the data plane? And how do you ensure that control and data plane liveness monitoring has the same view? BFD for BGP is a possible solution for (1) but it’s not meant for 3rd party next hops, i.e. it doesn’t address (2).
Let’s stop right there (or you’ll stop reading in the next 10 milliseconds). I will also try to rephrase the question in more generic terms, hoping Aldrin won’t mind a slight detour… we’ll get back to the original question in another blog post.
After a brief overview of FRRouting suite Donald Sharp continued with a deep dive into FRR architecture, including the various routing daemons, role of Zebra and ZAPI, interface between RIB (Zebra) and FIB (Linux Kernel), sample data flow for route installation, and multi-threading in Zebra and BGP daemons.