Building network automation solutions

9 module online course

Start now!

Category: IP routing

Reviving Old Content, Part 3

We had the usual gloomy December weather during the end-of-year holidays, and together with the partial lockdown (with confusing ever-changing rules only someone in Balkans could dream up) it managed to put me in OCD mood… and so I decided to remove broken links from the old blog posts.

While doing that I figured out how fragile our industry is – I encountered a graveyard of ideas and products that would make Google proud. Some of those blog posts were removed, I left others intact because they still have some technical merits, and I made sure to write sarcastic update notices on product-focused ones. Consider those comments Easter eggs… now go and find them ;))

read more add comment

What Exactly Happens after a Link Failure?

Imagine the following network running OSPF as the routing protocol. PE1–P1–PE2 is the primary path and PE1–P2–PE2 is the backup path. What happens on PE1 when the PE1–P1 link fails? What happens on PE2?

Sample 4-router network with a primary and a backup path

Sample 4-router network with a primary and a backup path

The second question is much easier to answer, and the answer is totally unambiguous as it only involves OSPF:

read more add comment

Fast Failover: Techniques and Technologies

Continuing our Fast Failover saga, let’s focus on techniques and technologies available to implement it (assuming you still think it’s worth the effort).

The following text is heavily based on comments Jeff Tantsura wrote on one of my LinkedIn posts as well as the original blog post. Thank you!

There are numerous technologies you can use to implement fast reroute, from the most complex to the easiest one:

read more see 13 comments

Fast Failover: Hardware and Software Implementations

In previous blog posts in this series we discussed whether it makes sense to invest into fast failover network designs, the topologies you can use in such designs, and the fault detection techniques. I also hinted at different fast failover implementations; this blog post focuses on some of them.

Hardware-based failover changes the hardware forwarding tables after a hardware-detectable link failure, most likely loss-of-light or transceiver-reported link fault. Forwarding hardware cannot do extensive calculations; the alternate paths are thus usually pre-programmed (more details below).

read more see 13 comments

Why Is Public Cloud Networking So Different?

A while ago (eons before AWS introduced Gateway Load Balancer) I discussed the intricacies of AWS and Azure networking with a very smart engineer working for a security appliance vendor, and he said something along the lines of “it shows these things were designed by software developers – they have no idea how networks should work.

In reality, at least some aspects of public cloud networking come closer to the original ideas of how IP and data-link layers should fit together than today’s flat earth theories, so he probably wanted to say “they make it so hard for me to insert my virtual appliance into their network.

read more see 1 comments

Fast Failover: Topologies

In the blog post introducing fast failover challenge I mentioned several typical topologies used in fast failover designs. It’s time to explore them.

The Basics

Fast failover is (by definition) adjustment to a change in network topology that happens before a routing protocol wakes up and deals with the change. It can therefore use only locally available information, and cannot involve changes in upstream devices. The node adjacent to the failed link has to deal with the failure on its own without involving anyone else.

read more add comment

Why Is OSPF not Using TCP?

A Network Artist sent me a long list of OSPF-related questions after watching the Routing Protocols section of our How Networks Really Work webinar. Starting with an easy one:

From historical perspective, any idea why OSPF guys invented their own transport protocol instead of just relying upon TCP?

I wasn’t there when OSPF was designed, but I have a few possible explanations. Let’s start with the what functionality should the transport protocol provide reasons:

read more see 9 comments

How Fast Can We Detect a Network Failure?

In the introductory fast failover blog post I mentioned the challenge of fast link- and node failure detection, and how it makes little sense to waste your efforts on fast failover tricks if the routing protocol convergence time has the same order of magnitude as failure detection time.

Now let’s focus on realistic failure detection mechanisms and detection times. Imagine a system connecting a hardware switching platform (example: data center switch or a high-end router) with a software switching platform (midrange router):

read more see 1 comments

Fast Failover: The Challenge

Sometimes you’re asked to design a network that will reroute around a failure in milliseconds. Is that feasible? Maybe. Is it simple? Absolutely not.

In this series of blog posts we’ll start with the basics, explore the technologies that you can use to reach that goal, and discover one or two unexpected rabbit holes.

Fast failover is just one of the topics we’ll discuss in Advanced Routing Protocol Features part of How Networks Really Work webinar.
read more see 3 comments

Do We Need LFA or FRR for Fast Failover in ECMP Designs?

One of my readers sent me a question along these lines:

Imagine you have a router with four equal-cost paths to prefix X, two toward upstream-1 and two toward upstream-2. Now let’s suppose that one of those links goes down and you want to have link protection. Do I really need Loop-Free Alternate (LFA) or MPLS Fast Reroute (FRR) to get fast (= immediate) failover or could I rely on multiple equal-cost paths to get the job done? I’m getting different answers from different vendors…

Please note that we’re talking about a very specific question of whether in scenarios with equal-cost layer-3 paths the hardware forwarding data structures get adjusted automatically on link failure (without CPU reprogramming them), and whether LFA needs to be configured to make the adjustment happen.

read more see 18 comments

Grasp the Fundamentals before Spreading Opinions

I should have known better, but I got pulled into another stretched VLANs for disaster recovery tweetfest. Surprisingly, most of the tweets were along the lines of you really shouldn’t be doing that and that would never work well, but then I guess I was only exposed to a small curated bubble of common sense… until this gem appeared in my timeline:

Networking Needs ZIP codes

Interestingly, that’s exactly how IP works:

read more see 4 comments

Weird: Wrong Subnet Mask Causing Unicast Flooding

When I still cared about CCIE certification, I was always tripped up by the weird scenario with (A) mismatched ARP and MAC timeouts and (B) default gateway outside of the forwarding path. When done just right you could get persistent unicast flooding, and I’ve met someone who reported average unicast flooding reaching ~1 Gbps in his data center fabric.

One would hope that we wouldn’t experience similar problems in modern leaf-and-spine fabrics, but one of my readers managed to reproduce the problem within a single subnet in FabricPath with anycast gateway on spine switches when someone misconfigured a subnet mask in one of the servers.

read more see 5 comments
Sidebar