Non-Stop Routing (NSR) 101
After Non-Stop Forwarding, Stateful Switchover and Graceful Restart, it’s time for the pinnacle of high-availability switching: Non-Stop Routing (NSR)1.
The PowerPoint-level description of this idea sounds fantastic:
- A device runs two active copies of its control plane.
- There is no cold/warm start of the backup control plane. The failover is almost instantaneous.
- The state of all control plane protocols is continuously synchronized between the two control plane instances. If one of them fails, the other one continues running.
- A failure of a control plane instance is thus invisible from the outside.
If this sounds an awful lot like VMware Fault Tolerance, you’re not too far off the mark.
Building a Separate Infrastructure for Guest Access
One of my readers sent me an age-old question:
I have my current guest network built on top of my production network. The separation between guest- and corporate network is done using a VLAN – once you connect to the wireless guest network, you’re in guest VLAN that forwards your packets to a guest router and off toward the Internet.
Our security team claims that this design is not secure enough. They claim a user would be able to attach somehow to the switch and jump between VLANs, suggesting that it would be better to run guest access over a separate physical network.
Decades ago, VLAN implementations were buggy, and it was possible (using a carefully crafted stack of VLAN tags) to insert packets from one VLAN to another (see also: VLAN hopping).
… updated on Tuesday, February 15, 2022 15:42 UTC
Creating BGP Multipath Lab with netlab
I was editing the BGP Multipathing video in the Advanced Routing Protocols section of How Networks Really Work webinar, got to the diagram I used to explain the intricacies of IBGP multipathing and said to myself “that should be easy (and fun) to set up with netlab”.
 
Fifteen minutes later1 I had the lab up and running and could verify that BGP works exactly the way I explained it in the webinar.
Feedback: Business Aspects of Networking
Every other blue moon someone asks me to do a not-so-technical presentation at an event, and being a firm believer in frugality I turn most of them into live webinar sessions collected under the Business Aspects of Networking umbrella.
At least some networking engineers find that perspective useful. Here’s what Adrian Giacometti had to say about that webinar:
Managing Hierarchical Device Configurations
Parsing and modifying IOS-like hierarchical device configurations is an interesting challenge, more so if you have no idea what the configuration commands mean or whether their order is relevant (I’m looking at you, Ansible ;).
Network to Code team decided to solve that problem for good, open-sourced Hierarchical Configuration Python library, and published a getting started article on their blog.
Soap Opera: SRv6 Is Insecure
I heard about SRv6 when it was still on the drawing board, and my initial reaction was “Another attempt to implement source routing. We know how that ends.” The then-counter-argument by one of the proponents went along the lines of “but we’ll use signed headers to prevent abuse” and I thought “yeah, that will work really well in silicon implementations”.
Years later, Andrew Alston decided to document the state of the emperor’s wardrobe (TL&DR: of course SRv6 is insecure and can be easily abused) and the counter-argument this time was “but that applies to any tunnel technology”. Thank you, we knew that all along, and that’s not what was promised.
You might want to browse the rest of that email thread; it’s fun reading unless you built your next-generation network design on SRv6 running across third-party networks… which was another PowerPoint case study used by SRv6 proponents.
Video: How Can You Master Public Cloud Networking?
If you’re a regular reader of this blog, you’ve probably realized there’s still need for networking in public clouds, and mastering it requires slightly different set of skills. What could you as a networking engineer to get fluent in this different world? I collected a few hints in the last video in Introduction to Cloud Computing webinar.
… updated on Saturday, November 6, 2021 13:31 UTC
Why Does Internet Keep Breaking?
James Miles sent me a long list of really good questions along the lines of “why do we see so many Internet-related outages lately and is it due to BGP and DNS creaking of old age”. He started with:
Over the last few years there are more “high profile” incidents relating to Internet connectivity. I raise the question, why?
The most obvious reason: Internet became mission-critical infrastructure and well-publicized incidents attract eyeballs.
Ignoring the click baits, the underlying root cause is in many cases the race to the bottom. Large service providers brought that onto themselves when they thought they could undersell the early ISPs and compensate their losses with voice calls (only to discover that voice-over-Internet works too well).
Even Simple Data Models Are a Huge Win
Dan Augustine sent me a wonderful example illustrating how even a very simple data model together with some automation templates can simplify a large-scale deployment.
We have a 100 router installation coming up for our schools and both of our installation vendors do not use open source templating tools and they are not willing to share.
Having taken the Data Models in Network Automation part of your Network Automation Concepts webinar, I decided to install GitLab, make an Ansible project and invite our installation partners to the project.
Where Would You Need DNS Anycast?
One of the publicly observable artifacts of the October 2021 Facebook outage was an intricate interaction between BGP routing and their DNS servers needed to support optimal anycast configuration. Not surprisingly, it was all networking engineers’ fault according to some opinions1
There’s no need for anycast2/BGP advertisement for DNS servers. DNS is already highly available by design. Only network people never understand that, which leads to overengineering.
It’s not that hard to find a counter-argument3: while it looks like there are only 13 root name servers4, each one of them is a large set of instances advertising the same IP prefix5 to the Internet.
netsim-tools Release 1.0
It looks like netsim-tools reached a somewhat stable state, so it was time to do a cleanup and publish release 1.0 (also available on PyPi, use pip3 install –upgrade netsim-tools to fetch it).
During the cleanup, I removed all references to the obsolete scripts, leaving only the netlab command. I also found an old bash script that enabled LLDP passthrough on Linux bridges and made it part of netlab up process; your libvirt-based labs will have LLDP enabled by default.
Interested? Install the tools and follow the tutorials to get started.
Worth Reading: Operators and the IETF
Long long time ago (seven years to be precise), ISOC naively tried to bridge the gap between network operators and Internet Vendor Engineering Task Force1. They started with a widespread survey asking operators why they’re hesitant to participate in IETF mailing lists and meetings.
The result: Operators and the IETF draft that never moved beyond -00 version. A quick glimpse into the Potential Challenges will tell you why IETF preferred to kill the messenger (and why I published this blog post on Halloween).
Worth Reading: Programming Sucks
Just FYI: if you’re wondering about the wisdom of every networking engineer should become a programmer religion, you might benefit from the Programming Sucks reality check. I had just enough exposure to programming to realize how spot-on it is (and couldn’t decide whether to laugh or cry).
Nonlinear Effects of Optimization-Induced Complexity
We have school holidays this week, so I’m reposting wonderful comments that would otherwise be lost somewhere in the page margins. Today: Minh Ha on recent Facebook failure and overly complex systems (slightly edited).
I incidentally commented on your NSF post some 3 weeks before […the Facebook outage…] happened, on the unpredictable nature of nonlinear effects resulting from optimization-induced complexity. Their outage just drives home the point that optimization is a dumb process and leads to combinations of circular dependency that no one can account for and test.
Big Picture: BFD, Non-Stop Forwarding, and Graceful Restart
We have school holidays this week, so I’m reposting wonderful comments that would otherwise be lost somewhere in the page margins. Today: Erik Auerswald’s excellent summary of BFD, NSF, and GR.
I’d suggest to step back a bit and consider the bigger picture: What is BFD good for? What is GR/NSF/NSR/SSO good for?
BFD and GR/NSF/NSR/SSO have different goals: one enables quick fail over, the other prevents fail over. Combining both promises to be interesting.