IP routing « ipSpace.net blog

Wednesday, September 7, 2022 07:05 UTC

How Routers Became Bridges

Network terminology was easy in the 1980s: bridges forwarded frames between Ethernet segments based on MAC addresses, and routers forwarded network layer packets between network segments. That nirvana couldn’t last long; eventually, a big enough customer told Cisco: “I don’t want to buy another box if I already have your too-expensive router. I want your router to be a bridge.”

Turning a router into a bridge is easier than going the other way round¹: add MAC table and dynamic MAC learning, and spend an evening implementing STP.

Router Interfaces and Switch Ports

When I started implementing the netlab VLAN module, I encountered (at least) three different ways of configuring physical interfaces and bridging domains even though the underlying packet forwarding operations (and sometimes even the forwarding hardware) are the same. That confusopoly is guaranteed to make your head spin for years, and the only way to figure out what’s going on behind the scenes is to go back to the fundamentals.

Living with Small Forwarding Tables

A friend of mine working for a mid-sized networking vendor sent me an intriguing question:

We have a product using an old ASIC that has 12K forwarding entries, and would like to extend its lifetime. I know you were mentioning some useful tricks, would you happen to remember what they were?

This challenge has no perfect solution, but there are at least three tricks I’ve encountered so far (as always, comments are most welcome):

Worth Watching: Source Routing on the Edge (iNOG::14v)

Most large content providers use some sort of egress traffic engineering on edge web proxy/caching servers to optimize the end-user experience (avoid congested transit autonomous systems) and link utilization on egress links.

I was planning to write a blog post about the tricks they use for ages, and never found time to do it… but if you don’t mind watching a video, the Source Routing on the Edge presentation Oliver Herms had at iNOG::14v does a pretty good job explaining the concepts and a particular implementation.

add comment

IP routing

Sunday, April 3, 2022 08:34 UTC

Worth Reading: Higher Levels of Address Aggregation

Every now and then someone tells me how much better the global Internet would be if only we were using recursive layers (RINA) and hierarchical addresses. I always answer “that’s a business problem, not a technical one, and you cannot solve business problems by throwing technology at them”, but of course that has never persuaded anyone who hasn’t been running a large-enough business for long enough.

Geoff Huston is doing a much better job in the March 2022 ISP Column – read the Higher Levels of Address Aggregation, and if you still need more technical details, there’s 30+ pages of RFC 4984.

see 1 comments

Monday, March 28, 2022 08:42 UTC… updated on Friday, May 31, 2024 13:51 +0200

Combining BGP and IGP in an Enterprise Network

Syed Khalid Ali left the following question on an old blog post describing the use of IBGP and EBGP in an enterprise network:

From an enterprise customer perspective, should I run iBGP, iBGP+IGP (OSPF/ISIS/EIGRP), or IGP with mutual redistribution on the edge routers? I was hoping you could share some thoughtful insight on when to select one over the other.

We covered many relevant details in the January 2022 Design Clinic; here’s the CliffNotes version. Remember that the road to hell (and broken designs) is paved with great recipes and best practices and that I’m presenting a black-and-white picture because I don’t feel like transcribing our discussion into an oversized blog post. People wrote books on this topic; search for “Russ White books” to find a few.

Finally, there’s no good substitute for understanding how things work (which brings me to another webinar ;).

ICMP Redirects Considered Harmful

One of my readers sent me an intriguing challenge based on the following design:

He has a data center with two core switches (C1 and C2) and two Cisco Nexus edge switches (E1 and E2).
He’s using static default routing from core to edge switches with HSRP on the edge switches.
E1 is the active HSRP gateway connected to the primary WAN link.

The following picture shows the simplified network diagram:

Highlights: Multi-Threaded Routing Daemons

The multi-threaded routing daemons blog post generated numerous in-depth comments here and on LinkedIn. As always, thanks a million for keeping me honest and providing more details or additional perspectives. Here are some of the best bits.

Jeff Tantsura provided the first dose of reality:

All modern routing protocols implementations are multi-threaded, with a minimum separation of adjacency handling, route calculations and update generation. Note - writing multi-threaded code for complex tasks is a non trivial exercise (you could search for thread safety and similar artifacts and what happens when not implemented correctly). Moving to a multi-threaded code in early 2010s resulted in a multi-release (year) effort and 100s of related bugs all around.

Dr. Tony Przygienda added his hands-on experience (he’s been developing routing protocol software for ages):

Scalable Policy Routing

More than a decade ago (before SD-WAN was even a thing) I wrote an article describing how easy it is to route different applications onto different links (MPLS/VPN versus IPsec tunnels) using a distance vector routing protocol (preferably BGP, although even RIP would work).

You might find it interesting that it’s possible to solve tough problems with good network design instead of proprietary unicorn dust, so I salvaged the article from some dusty archive, cleaned it up, polished it, and published it on ipSpace.net.

Keep reading

see 1 comments

Wednesday, November 24, 2021 07:15 UTC

Anycast Fundamentals

I got into an interesting debate after I published the Anycast Works Just Fine with MPLS/LDP blog post, and after a while it turned out we have a slightly different understanding what anycast means. Time to fall back to a Wikipedia definition:

Anycast is a network addressing and routing methodology in which a single destination IP address is shared by devices (generally servers) in multiple locations. Routers direct packets addressed to this destination to the location nearest the sender, using their normal decision-making algorithms, typically the lowest number of BGP network hops.

Based on that definition, any transport technology that allows the same IP address or prefix to be announced from several locations supports anycast. To make it a bit more challenging, I would add “and if there are multiple paths to the anycast destination that could be used for multipath forwarding¹, they should all be used”.

Multi-Threaded Routing Daemons

When I wrote the Why Does Internet Keep Breaking? blog post a few weeks ago, I claimed that FRR still uses single-threaded routing daemons (after a too-cursory read of their documentation).

Donald Sharp and Quentin Young politely told me ~~I was an idiot~~ I should get my facts straight, I removed the offending part of the blog post, promised to write another one going into the details, and Quentin improved the documentation in the meantime, so here we are…

Non-Stop Routing (NSR) 101

After Non-Stop Forwarding, Stateful Switchover and Graceful Restart, it’s time for the pinnacle of high-availability switching: Non-Stop Routing (NSR)¹.

The PowerPoint-level description of this idea sounds fantastic:

A device runs two active copies of its control plane.
There is no cold/warm start of the backup control plane. The failover is almost instantaneous.
The state of all control plane protocols is continuously synchronized between the two control plane instances. If one of them fails, the other one continues running.
A failure of a control plane instance is thus invisible from the outside.

If this sounds an awful lot like VMware Fault Tolerance, you’re not too far off the mark.

Interactions Between BFD and Graceful Restart

We have school holidays this week, so I’m reposting wonderful comments that would otherwise be lost somewhere in the page margins. Today: Dmitry Perets on the interactions between BFD and GR.

Well, assuming that the C-bit is set honestly (will be funny if not) and assuming that the Helper is using this bit correctly (and I think it’s pretty well defined what “correctly” means - see section 4.3 in RFC 5882), the answer is pretty clear.

Graceful Restart and BFD

The whole High Availability Switching series started with a question along the lines of “does it make sense to run BFD together with Graceful Restart”. After Non-Stop Forwarding 101, Graceful Restart 101, and Graceful Restart and Convergence Speed we finally have enough information to answer that question.

TL&DR: Most probably not.

A more nuanced answer depends (as always) on a gazillion implementation details.

Graceful Restart and Routing Protocol Convergence

I’m always amazed when I encounter networking engineers who want to have a fast-converging network using Non-Stop Forwarding (which implies Graceful Restart). It’s even worse than asking for smooth-running heptagonal wheels.

As we discussed in the Fast Failover series, any decent router uses a variety of mechanisms to detect adjacent device failure:

Physical link failure;
Routing protocol timeouts;
Next-hop liveliness checks (BFD, CFM…)

Category: IP routing