Blog Posts in September 2021
… updated on Thursday, September 30, 2021 16:19 UTC
Reusing Underlay Network for Infrastructure Services
Boris Lazarov sent me an excellent question:
Does it make sense and are there any inherent problems from design perspective to use the underlay not only for transport of overlay packets, but also for some services. For example: VMWare cluster, vMotion, VXLAN traffic, and some basic infrastructure services that are prerequisite for the rest (DNS).
Before answering it, let’s define some terminology which will inevitably lead us to the it’s tunnels all the way down endstate.
Watch Out: ISR Performance License
Bill Dagy sent me an annoying ISR gotcha. In his own words:
Since you have a large audience I thought I would throw this out here. Maybe it will help someone avoid spending 80 man hours troubleshooting network slowdowns.
Here’s the root cause of that behavior:
Cisco is now shipping routers that have some specified maximum throughput, but you have to buy a “boost license” to run them unthrottled. Maybe everyone already knew this but it sure took us by surprise.
Don’t believe it? Here’s a snapshot from Cisco 4000 Family Integrated Services Router Data Sheet:
Graceful Restart (GR) 101
In the Non-Stop Forwarding (NSF) article, I mentioned that the routers adjacent to the device using NSF have to play along to make the idea work. That capability is called Graceful Restart. Today we’ll explore its intricate details, be diplomatic, and leave the shortcomings and tradeoffs for the next blog post.
Imagine an access (provider edge) router providing connectivity services to its clients and running a routing protocol with one or more upstream devices.
State of IT Security in 2021
Patrik Schindler sent me his views on code quality and resulting security nightmares after reading the Cisco SD-WAN SQL Injection saga. Enjoy!
I think we have a global problem with code quality. Both from a security perspective, and from a less problematic but still annoying bugs-everywhere perspective. I’m not sure if the issue is largely ignored, or we’ve given up on it (see also: Cloud Complexity Lies or Cisco ACI Complexity).
Worth Reading: Breaking Down Silos
Here’s another masterpiece by Charity Majors: Why I hate the phrase “breaking down silos”. A teaser in case you can’t decide whether to click the link:
When someone says they are “breaking down silos”, whether in an interview, a panel, or casual conversation, it tells me jack shit about what they actually did.
Building a Small Data Center Fabric with Four Switches
One of my subscribers has to build a small data center fabric that’s just a tad too big for two switch design.
For my datacenter I would need two 48 ports 10GBASE-T switches and two 48 port 10/25G fibber switches. So I was watching the Small Fabrics and Lower-Speed Interfaces part of Physical Fabric Design to make up my mind. There you talk about the possibility to do a leaf and spine with 4 switches and connect servers to the spine.
A picture is worth a thousand words, so here’s the diagram of what I had in mind:
IS-IS Flooding Details
Last week I published an unrolled version of Peter Paluch’s explanation of flooding differences between OSPF and IS-IS. Here’s the second part of the saga: IS-IS flooding details (yet again, reposted in a more traditional format with Peter’s permission).
In IS-IS, DIS1 is best described as a “baseline benchmark” – a reference point that other routers compare themselves to, but it does not sit in the middle of the flow of updates (Link State PDUs, LSPs).
A quick and simplified refresher on packet types in IS-IS: A LSP carries topological information about its originating router – its System ID, its links to other routers and its attached prefixes. It is similar to an OSPF LSU containing one or more LSAs of different types.
… updated on Friday, September 24, 2021 07:05 UTC
Another SD-WAN Security SNAFU: SQL Injections in Cisco SD-WAN Admin Interface
Christoph Jaggi sent me a link to an interesting article describing security vulnerabilities pentesters found in Cisco SD-WAN admin/management code.
I’m positive the bugs have been fixed in the meantime, but what riled me most was the root cause: Little Bobby Tables (aka SQL injection) dropped by. Come on, it’s 2021, SD-WAN is supposed to be about building secure replacements for MPLS/VPN networks, and they couldn’t get someone who could write SQL-injection-safe code (the top web application security risk)?
netlab Network Topology Graphs
A netlab user sent me an intriguing question: “Would it be possible to get network topology graphs out of the tool?”
I did something similar a long while ago for a simple network automation project (and numerous networking engineers built really interesting stuff while attending the Building Network Automation Solutions course), so it seemed like a no-brainer. As always, things aren’t as easy as they look.
Interesting Concept: Time Dilation
I loved the Time Dilation blog post by Seth Godin. It explains so much, including why I won’t accept a “quick conf call to touch base and hash out ideas” from someone coming out of the blue sky – why should I be interested if they can’t invest the time to organize their thoughts and pour them into an email.
The concept of “creation-to-consumption” ratio is also interesting. Now I understand why I hate unedited opinionated chinwagging (many podcasts sadly fall into this category) or videos where someone blabbers into a camera while visibly trying to organize their thoughts.
Just FYI, these are some of the typical ratios I had to deal in the past:
LSA/LSP Flooding in OSPF and IS-IS
Peter Paluch loves blogging in microchunks on Twitter ;) This time, he described the differences between OSPF and IS-IS, and gracefully allowed me to repost the explanation in a more traditional format.
My friends, I happen to have a different opinion. It will take a while to explain it and I will have to seemingly go off on a tangent. Please have patience. As a teaser, though: The 2Way state between DRothers does not improve flooding efficiency – in fact, it worsens it.
New: ipSpace.net Design Clinic
In early September, I started yet another project that’s been on the back burner for over a year: ipSpace.net Design Clinic (aka Ask Me Anything Reasonable in a more structured format). Instead of collecting questions and answering them in a podcast (example: Deep Questions podcast), I decided to make it more interactive with a live audience and real-time discussions. I also wanted to keep it valuable to anyone interested in watching the recordings, so we won’t discuss obscure failures of broken designs or dirty tricks that should have remained in CCIE lab exams.
Stateful Switchover (SSO) 101
Stateful Switchover (SSO) is another seemingly awesome technology that can help you implement high availability when facing a
broken non-redundant network design. Here’s how it’s supposed to work:
- A network device runs two copies of the control plane (primary and backup);
- Primary control plane continuously synchronizes its state with the backup control plane;
- When the primary control plane crashes, the backup control plane already has all the required state and is ready to take over in moments.
Delighted? You might be disappointed once you start digging into the details.
Configuring NSX-T Firewall with a CI/CD Pipeline
Initial implementation of Noël Boulene’s automated provisioning of NSX-T distributed firewall rules changed NSX-T firewall configuration based on Terraform configuration files. To make the deployment fully automated he went a step further and added a full-blown CI/CD pipeline using GitHub Actions and Terraform Cloud.
Not everyone is as lucky as Noël – developers in his organization already use GitHub and Terraform Cloud, making his choices totally frictionless.
Worth Reading: Ops Questions in Software Engineering Interviews
Charity Majors published another must-read article: why every software engineering interview should include ops questions. Just a quick teaser:
The only way to unwind this is to reset expectations, and make it clear that:
- You are still responsible for your code after it’s been deployed to production, and
- Operational excellence is everyone’s job.
Adhering to these simple principles would remove an enormous amount of complexity from typical enterprise IT infrastructure… but I’m afraid it’s not going to happen anytime soon.
Lessons Learned: Fundamentals Haven't Changed
Here’s another bitter pill to swallow if you desperately want to believe in the magic powers of unicorn dust: laws of physics and networking fundamentals haven’t changed (see also: RFC 1925 Rule 11).
Whenever someone is promising a miracle solution, it’s probably due to them working in marketing or having no clue what they’re talking about (or both)… or it might be another case of adding another layer of abstraction and pretending the problems disappeared because you can’t see them anymore.
… updated on Saturday, August 27, 2022 16:44 UTC
In December 2020, I got sick-and-tired of handcrafting Vagrantfiles and decided to write a tool that would, given a target networking lab topology in a text file, produce the corresponding Vagrantfile for my favorite environment (libvirt on Ubuntu). Nine months later, that idea turned into a pretty comprehensive tool targeting networking engineers who like to work with CLI and text-based configuration files. If you happen to be of the GUI/mouse persuasion, please stop reading; this tool is not for you.
During those nine months, I slowly addressed most of the challenges I always had creating networking labs. Here’s how I would typically approach testing a novel technology or software feature:
… updated on Sunday, September 26, 2021 14:48 UTC
Open-Source DMVPN Alternatives
When I started collecting topics for the September 2021 ipSpace.net Design Clinic one of the subscribers sent me an interesting challenge: are there any open-source alternatives to Cisco’s DMVPN?
I had no idea and posted the question on Twitter, resulting in numerous responses pointing to a half-dozen alternatives. Thanks a million to @MarcelWiget, @FlorianHeigl1, @PacketGeekNet, @DubbelDelta, @Tomm3h, @Joy, @RoganDawes, @Yassers_za, @MeNotYouSharp, @Arko95, @DavidThurm, Brian Faulkner, and several others who chimed in with additional information.
Here’s what I learned:
Non-Stop Forwarding (NSF) 101
Non-Stop Forwarding (NSF) is one of those ideas that look great in a slide deck and marketing collaterals, but might turn into a giant can of worms once you try to implement them properly (see also: stackable switches or VMware Fault Tolerance).
Comparing Forwarding Performance of Data Center Switches
One of my subscribers is trying to decide whether to buy an -EX or an -FX version of a Cisco Nexus data center switch:
I was comparing Cisco Nexus 93180YC-FX and Nexus 93180YC-EX. They have the same port distribution (48x 10/25G + 6x40/100G), 3.6 Tbps switching capacity, but the -FX version has just 1200 Mpps forwarding rate while EX version goes up to 2600 Mpps. What could be the reason for the difference in forwarding performance?
Both switches are single-ASIC switches. They have the same total switching bandwidth, thus it must take longer for the FX switch to forward a packet, resulting in reduced packet-per-seconds figure. It looks like the ASIC in the -FX switch is configured in more complex way: more functionality results in more complexity which results in either reduced performance or higher cost.
Video: Introduction to Network Addressing
A friend of mine pointed out this quote by John Shoch when I started preparing the Network Stack Addressing slide deck for my How Networks Really Work webinar:
The name of a resource indicates what we seek, an address indicates where it is, and a route tells us how to get there.
You might wonder when that document was written… it’s from January 1978. They got it absolutely right 42 years ago, and we completely messed it up in the meantime with the crazy ideas of making IP addresses resource identifiers.
Automating NSX-T Firewall Configuration
Noël Boulene decided to automate provisioning of NSX-T distributed firewall rules as part of his Building Network Automation Solutions hands-on work.
What makes his solution even more interesting is the choice of automation tool: instead of using the universal automation hammer (aka Ansible) he used Terraform, a much better choice if you want to automate service provisioning, and you happen to be using vendors that invested time into writing Terraform provisioners.
netlab Python Package and Unified CLI
One of the major challenges of using netsim-tools (now renamed to netlab) was the installation process – pull the code from GitHub, install the prerequisites, set up search paths… I knew how to fix it (turn the whole thing into a Python package) but I was always too busy to open that enormous can of worms.
That omission got fixed; netlab is now available on PyPI and installed with pip3 install networklab.