Category: IP routing
A while ago, I wrote a blog post explaining why we should (mostly) disable ICMP redirects, triggering a series of comments discussing the root cause of ICMP redirects. A few of those blamed static routes, including:
Put another way, the presence or absence of ICMP Redirects is a red herring, usually pointing to architectural/design issues instead. In this example, using vPC Peer Gateway or, better yet, running a minimal IGP instead of relying on static routes eliminates ICMP Redirects from both the problem and solution spaces simultaneously.
Unfortunately, that’s not the case. You can get suboptimal routing that sometimes triggers ICMP redirects in well-designed networks running more than one routing protocol.
Imagine you built a layer-2 fabric with tons of VLANs stretched all over the place. Now the users want to exchange traffic between those VLANs, and the obvious question is: which devices should do layer-2 forwarding (bridging) and which ones should do layer-3 forwarding (routing)?
There are four typical designs you can use to solve that challenge:
- Exchange traffic between VLANs outside of the fabric (edge routing)
- Route on core switches (centralized routing)
- Route on ingress (asymmetric IRB)
- Route on ingress and egress (symmetric IRB)
This blog post is an overview of the design models; we’ll cover each design in a separate blog post.
After discussing network addressing and switching, routing, and bridging in the How Networks Really Work webinar, it was high time for a deep dive into routing protocols, starting (as always) with an overview.
Mark Seery wrote a fantastic must-read article explaining why routing will never be a solved problem.
You might want to enjoy it as a relaxing antidote after a painful exposure to SD-WAN (or SD-something-else) brainwashing.
Early bridges implemented a single bridging domain across all ports. Within a few years, we got multiple bridging domains within a single device (including bridging implementation in Cisco IOS). The capability to have multiple bridging domains stretched across several devices was still missing… until the modern-day Pandora opened the VLAN box and forever swamped us in the complexities of large-scale bridging.
Network terminology was easy in the 1980s: bridges forwarded frames between Ethernet segments based on MAC addresses, and routers forwarded network layer packets between network segments. That nirvana couldn’t last long; eventually, a big-enough customer told Cisco: “I don’t want to buy another box if I already have your too-expensive router. I want your router to be a bridge.”
Turning a router into a bridge is easier than going the other way round1: add MAC table and dynamic MAC learning, and spend an evening implementing STP.
When I started implementing the netlab VLAN module, I encountered (at least) three different ways of configuring physical interfaces and bridging domains even though the underlying packet forwarding operations (and sometimes even the forwarding hardware) are the same. That confusopoly is guaranteed to make your head spin for years, and the only way to figure out what’s going on behind the scenes is to go back to the fundamentals.
A friend of mine working for a mid-sized networking vendor sent me an intriguing question:
We have a product using an old ASIC that has 12K forwarding entries, and would like to extend its lifetime. I know you were mentioning some useful tricks, would you happen to remember what they were?
This challenge has no perfect solution, but there are at least three tricks I’ve encountered so far (as always, comments are most welcome):
Most large content providers use some sort of egress traffic engineering on edge web proxy/caching servers to optimize the end-user experience (avoid congested transit autonomous systems) and link utilization on egress links.
I was planning to write a blog post about the tricks they use for ages, and never found time to do it… but if you don’t mind watching a video, the Source Routing on the Edge presentation Oliver Herms had at iNOG::14v does a pretty good job explaining the concepts and a particular implementation.
Every now and then someone tells me how much better the global Internet would be if only we were using recursive layers (RINA) and hierarchical addresses. I always answer “that’s a business problem, not a technical one, and you cannot solve business problems by throwing technology at them”, but of course that has never persuaded anyone who hasn’t been running a large-enough business for long enough.
Syed Khalid Ali left the following question on an old blog post describing the use of IBGP and EBGP in an enterprise network:
From an enterprise customer perspective, should I run iBGP or iBGP+IGP (OSPF/ISIS/EIGRP) or IGP while doing mutual redistribution on the edge routers. I was hoping if you could share some thoughtful insight on when to select one over the another?
We covered tons of relevant details in the January 2022 Design Clinic, here’s the CliffNotes version. Keep in mind that the road to hell (and broken designs) is paved with great recipes and best practices, and that I’m presenting a black-and-white picture because I don’t feel like transcribing the discussion we had into an oversized blog post. People wrote books on this topic; I’m pretty sure you can search for Russ White and find a few of them.
Finally, there’s no good substitute for understanding how things work (which brings me to another webinar ;).
One of my readers sent me an intriguing challenge based on the following design:
- He has a data center with two core switches (C1 and C2) and two Cisco Nexus edge switches (E1 and E2).
- He’s using static default routing from core to edge switches with HSRP on the edge switches.
- E1 is the active HSRP gateway connected to the primary WAN link.
The following picture shows the simplified network diagram:
The multi-threaded routing daemons blog post generated numerous in-depth comments here and on LinkedIn. As always, thanks a million for keeping me honest and providing more details or additional perspectives. Here are some of the best bits.
Jeff Tantsura provided the first dose of reality:
All modern routing protocols implementations are multi-threaded, with a minimum separation of adjacency handling, route calculations and update generation. Note - writing multi-threaded code for complex tasks is a non trivial exercise (you could search for thread safety and similar artifacts and what happens when not implemented correctly). Moving to a multi-threaded code in early 2010s resulted in a multi-release (year) effort and 100s of related bugs all around.
Dr. Tony Przygienda added his hands-on experience (he’s been developing routing protocol software for ages):
More than a decade ago (before SD-WAN was even a thing) I wrote an article describing how easy it is to route different applications onto different links (MPLS/VPN versus IPsec tunnels) using a distance vector routing protocol (preferably BGP, although even RIP would work).
You might find it interesting that it’s possible to solve tough problems with good network design instead of proprietary unicorn dust, so I salvaged the article from some dusty archive, cleaned it up, polished it, and published it on ipSpace.net.
I got into an interesting debate after I published the Anycast Works Just Fine with MPLS/LDP blog post, and after a while it turned out we have a slightly different understanding what anycast means. Time to fall back to a Wikipedia definition:
Anycast is a network addressing and routing methodology in which a single destination IP address is shared by devices (generally servers) in multiple locations. Routers direct packets addressed to this destination to the location nearest the sender, using their normal decision-making algorithms, typically the lowest number of BGP network hops.
Based on that definition, any transport technology that allows the same IP address or prefix to be announced from several locations supports anycast. To make it a bit more challenging, I would add “and if there are multiple paths to the anycast destination that could be used for multipath forwarding1, they should all be used”.