Category: IP routing
More than a decade ago (before SD-WAN was even a thing) I wrote an article describing how easy it is to route different applications onto different links (MPLS/VPN versus IPsec tunnels) using a distance vector routing protocol (preferably BGP, although even RIP would work).
You might find it interesting that it’s possible to solve tough problems with good network design instead of proprietary unicorn dust, so I salvaged the article from some dusty archive, cleaned it up, polished it, and published it on ipSpace.net.
I got into an interesting debate after I published the Anycast Works Just Fine with MPLS/LDP blog post, and after a while it turned out we have a slightly different understanding what anycast means. Time to fall back to a Wikipedia definition:
Anycast is a network addressing and routing methodology in which a single destination IP address is shared by devices (generally servers) in multiple locations. Routers direct packets addressed to this destination to the location nearest the sender, using their normal decision-making algorithms, typically the lowest number of BGP network hops.
Based on that definition, any transport technology that allows the same IP address or prefix to be announced from several locations supports anycast. To make it a bit more challenging, I would add “and if there are multiple paths to the anycast destination that could be used for multipath forwarding1, they should all be used”.
When I wrote the Why Does Internet Keep Breaking? blog post a few weeks ago, I claimed that FRR still uses single-threaded routing daemons (after a too-cursory read of their documentation).
Donald Sharp and Quentin Young politely told me
I was an idiot I should get my facts straight, I removed the offending part of the blog post, promised to write another one going into the details, and Quentin improved the documentation in the meantime, so here we are…
The PowerPoint-level description of this idea sounds fantastic:
- A device runs two active copies of its control plane.
- There is no cold/warm start of the backup control plane. The failover is almost instantaneous.
- The state of all control plane protocols is continuously synchronized between the two control plane instances. If one of them fails, the other one continues running.
- A failure of a control plane instance is thus invisible from the outside.
If this sounds an awful lot like VMware Fault Tolerance, you’re not too far off the mark.
We have school holidays this week, so I’m reposting wonderful comments that would otherwise be lost somewhere in the page margins. Today: Dmitry Perets on the interactions between BFD and GR.
Well, assuming that the C-bit is set honestly (will be funny if not) and assuming that the Helper is using this bit correctly (and I think it’s pretty well defined what “correctly” means - see section 4.3 in RFC 5882), the answer is pretty clear.
The whole High Availability Switching series started with a question along the lines of “does it make sense to run BFD together with Graceful Restart”. After Non-Stop Forwarding 101, Graceful Restart 101, and Graceful Restart and Convergence Speed we finally have enough information to answer that question.
TL&DR: Most probably not.
A more nuanced answer depends (as always) on a gazillion implementation details.
I’m always amazed when I encounter networking engineers who want to have a fast-converging network using Non-Stop Forwarding (which implies Graceful Restart). It’s even worse than asking for smooth-running heptagonal wheels.
As we discussed in the Fast Failover series, any decent router uses a variety of mechanisms to detect adjacent device failure:
- Physical link failure;
- Routing protocol timeouts;
- Next-hop liveliness checks (BFD, CFM…)
In the Graceful Restart 101 blog post, I promised to discuss the ugly parts of this concept in a follow-up post. It turns out we’ll need more than one; today, we’ll focus on other control plane protocols in an access network scenario.
Imagine an access router with multiple uplinks serving a bunch of non-redundantly-connected customers:
Even though you need plenty of traditional networking constructs to deploy a complex application stack in a public cloud (packet filters, firewalls, load balancers, VPN, BGP…), once you start digging deep into the bowels of public cloud virtual networking, you’ll find out it’s significantly different from the traditional Ethernet+IP implementations common in enterprise data centers.
For an overview of the differences watch the Public Cloud Networking Is Different video (part of Introduction to Cloud Computing webinar), for more details start with AWS Networking 101 and Azure Networking 101 blog posts, and continue with corresponding cloud networking webinars.
In the Non-Stop Forwarding (NSF) article, I mentioned that the routers adjacent to the device using NSF have to play along to make the idea work. That capability is called Graceful Restart. Today we’ll explore its intricate details, be diplomatic, and leave the shortcomings and tradeoffs for the next blog post.
Imagine an access (provider edge) router providing connectivity services to its clients and running a routing protocol with one or more upstream devices.
It started with an interesting question tweeted by @pilgrimdave81
I’ve seen on Cisco NX-OS that it’s preferring a (ospf->bgp) locally redistributed route over a learned EBGP route, until/unless you clear the route, then it correctly prefers the learned BGP one. Seems to be just ooo but don’t remember this being an issue?
Ignoring the “why would you get the same route over OSPF and EBGP, and why would you redistribute an alternate copy of a route you’re getting over EBGP into BGP” aspect, Peter Palúch wrote a detailed explanation of what’s going on and allowed me to copy into a blog post to make it more permanent:
One of my readers sent me a sad story describing how Chromium service discovery broke a large multicast-enabled network.
The last couple of weeks found me helping a customer trying to find and resolve a very hard to find “network performance” issue. In the end it turned out to be a combination of ill conceived application nonsense and a setup with a too large blast radius/failure domain/fate sharing. The latter most probably based upon very valid decisions in the past (business needs, uniformity of configuration and management).
Last week we explored the basics of unnumbered IPv4 Ethernet interfaces, and how you could use them to save IPv4 address space in routed access networks. I also mentioned that you could simplify the head-end router configuration if you’re using DHCP instead of per-host static routes.
Obviously you’d need a smart DHCP server/relay implementation to make this work. Simplistic local DHCP server would allocate an IP address to a client requesting one, send a response and move on. Likewise, a DHCP relay would forward a DHCP request to a remote DHCP server (adding enough information to allow the DHCP server to select the desired DHCP pool) and forward its response to the client.
In the previous blog post in this series, I described why it’s (almost) impossible to implement unequal-cost multipathing for anycast services (multiple servers advertising the same IP address or range) with OSPF. Now let’s see how easy it is to solve the same challenge with BGP DMZ Link Bandwidth attribute.
I didn’t want to listen to the fan noise generated by my measly Intel NUC when simulating a full leaf-and-spine fabric, so I decided to implement a slightly smaller network: