Stateful Switchover (SSO) 101
Stateful Switchover (SSO) is another seemingly awesome technology that can help you implement high availability when facing a broken non-redundant network design. Here’s how it’s supposed to work:
- A network device runs two copies of the control plane (primary and backup);
- Primary control plane continuously synchronizes its state with the backup control plane;
- When the primary control plane crashes, the backup control plane already has all the required state and is ready to take over in moments.
Delighted? You might be disappointed once you start digging into the details.
Configuring NSX-T Firewall with a CI/CD Pipeline
Initial implementation of Noël Boulene’s automated provisioning of NSX-T distributed firewall rules changed NSX-T firewall configuration based on Terraform configuration files. To make the deployment fully automated he went a step further and added a full-blown CI/CD pipeline using GitHub Actions and Terraform Cloud.
Not everyone is as lucky as Noël – developers in his organization already use GitHub and Terraform Cloud, making his choices totally frictionless.
Worth Reading: Ops Questions in Software Engineering Interviews
Charity Majors published another must-read article: why every software engineering interview should include ops questions. Just a quick teaser:
The only way to unwind this is to reset expectations, and make it clear that:
- You are still responsible for your code after it’s been deployed to production, and
- Operational excellence is everyone’s job.
Adhering to these simple principles would remove an enormous amount of complexity from typical enterprise IT infrastructure… but I’m afraid it’s not going to happen anytime soon.
Lessons Learned: Fundamentals Haven't Changed
Here’s another bitter pill to swallow if you desperately want to believe in the magic powers of unicorn dust: laws of physics and networking fundamentals haven’t changed (see also: RFC 1925 Rule 11).
Whenever someone is promising a miracle solution, it’s probably due to them working in marketing or having no clue what they’re talking about (or both)… or it might be another case of adding another layer of abstraction and pretending the problems disappeared because you can’t see them anymore.
… updated on Saturday, August 27, 2022 16:44 UTC
netlab Overview
In December 2020, I got sick-and-tired of handcrafting Vagrantfiles and decided to write a tool that would, given a target networking lab topology in a text file, produce the corresponding Vagrantfile for my favorite environment (libvirt on Ubuntu). Nine months later, that idea turned into a pretty comprehensive tool targeting networking engineers who like to work with CLI and text-based configuration files. If you happen to be of the GUI/mouse persuasion, please stop reading; this tool is not for you.
During those nine months, I slowly addressed most of the challenges I always had creating networking labs. Here’s how I would typically approach testing a novel technology or software feature:
… updated on Sunday, September 26, 2021 14:48 UTC
Open-Source DMVPN Alternatives
When I started collecting topics for the September 2021 ipSpace.net Design Clinic one of the subscribers sent me an interesting challenge: are there any open-source alternatives to Cisco’s DMVPN?
I had no idea and posted the question on Twitter, resulting in numerous responses pointing to a half-dozen alternatives. Thanks a million to @MarcelWiget, @FlorianHeigl1, @PacketGeekNet, @DubbelDelta, @Tomm3h, @Joy, @RoganDawes, @Yassers_za, @MeNotYouSharp, @Arko95, @DavidThurm, Brian Faulkner, and several others who chimed in with additional information.
Here’s what I learned:
Non-Stop Forwarding (NSF) 101
Non-Stop Forwarding (NSF) is one of those ideas that look great in a slide deck and marketing collaterals, but might turn into a giant can of worms once you try to implement them properly (see also: stackable switches or VMware Fault Tolerance).
Comparing Forwarding Performance of Data Center Switches
One of my subscribers is trying to decide whether to buy an -EX or an -FX version of a Cisco Nexus data center switch:
I was comparing Cisco Nexus 93180YC-FX and Nexus 93180YC-EX. They have the same port distribution (48x 10/25G + 6x40/100G), 3.6 Tbps switching capacity, but the -FX version has just 1200 Mpps forwarding rate while EX version goes up to 2600 Mpps. What could be the reason for the difference in forwarding performance?
Both switches are single-ASIC switches. They have the same total switching bandwidth, thus it must take longer for the FX switch to forward a packet, resulting in reduced packet-per-seconds figure. It looks like the ASIC in the -FX switch is configured in more complex way: more functionality results in more complexity which results in either reduced performance or higher cost.
Video: Introduction to Network Addressing
A friend of mine pointed out this quote by John Shoch when I started preparing the Network Stack Addressing slide deck for my How Networks Really Work webinar:
The name of a resource indicates what we seek, an address indicates where it is, and a route tells us how to get there.
You might wonder when that document was written… it’s from January 1978. They got it absolutely right 42 years ago, and we completely messed it up in the meantime with the crazy ideas of making IP addresses resource identifiers.
Automating NSX-T Firewall Configuration
Noël Boulene decided to automate provisioning of NSX-T distributed firewall rules as part of his Building Network Automation Solutions hands-on work.
What makes his solution even more interesting is the choice of automation tool: instead of using the universal automation hammer (aka Ansible) he used Terraform, a much better choice if you want to automate service provisioning, and you happen to be using vendors that invested time into writing Terraform provisioners.
netlab Python Package and Unified CLI
One of the major challenges of using netsim-tools (now renamed to netlab) was the installation process – pull the code from GitHub, install the prerequisites, set up search paths… I knew how to fix it (turn the whole thing into a Python package) but I was always too busy to open that enormous can of worms.
That omission got fixed; netlab is now available on PyPI and installed with pip3 install networklab.
Worth Reading: A Historical Perspective On The Usage Of IP Version 9
As early as 1994 (on April 1st, to be precise) a satire disguised as an Informational RFC was published describing the deployment of IPv9 in a parallel universe.
Any similarity with a protocol that started as a second-system academic idea and is still experiencing hiccups in real world even though it could order its own beer in US is purely coincidental.
Worth Reading: Simplifying Networks
Justin Pietsch wrote another fantastic blog post, this time describing how they simplified Amazon’s internal network, got rid of large-scale VLANs and multi-NIC hosts, moved load balancing functionality into a proxy layer managed by application teams, and finally introduced merchant silicon routers.
MUST Read: Operational Security Considerations for IPv6 Networks (RFC 9099)
After almost a decade of bickering and haggling (trust me, I got my scars to prove how the consensus building works), the authors of Operational Security Considerations for IPv6 Networks (many of them dear old friends I haven’t seen for way too long) finally managed to turn a brilliant document into an Informational RFC.
Regardless of whether you already implemented IPv6 in your network or believe it will never be production-ready (alongside other crazy stuff like vaccines) I’d consider this RFC a mandatory reading.
netsim-tools Release 0.8.1: Cumulus VX and Nokia SR Linux Containers
Two interesting container images were released in June/July 2021:
- Michael Kashin managed to package Cumulus VX into a container image (more details).
- Roman Dodin managed to persuade Nokia to release SR Linux as a container.
Both images can be downloaded with no strings attached (two major wins for the good guys) and are supported with the latest release of netsim-tools: