Blog Posts in September 2021

Reusing Underlay Network for Infrastructure Services

Boris Lazarov sent me an excellent question:

Does it make sense and are there any inherent problems from design perspective to use the underlay not only for transport of overlay packets, but also for some services. For example: VMWare cluster, vMotion, VXLAN traffic, and some basic infrastructure services that are prerequisite for the rest (DNS).

Before answering it, let’s define some terminology, which will inevitably lead us to the “it’s tunnels all the way down” end state.

read more

Watch Out: ISR Performance License

Bill Dagy sent me an annoying ISR gotcha. In his own words:

Since you have a large audience I thought I would throw this out here. Maybe it will help someone avoid spending 80 man hours troubleshooting network slowdowns.

Here’s the root cause of that behavior:

Cisco is now shipping routers that have some specified maximum throughput, but you have to buy a “boost license” to run them unthrottled. Maybe everyone already knew this but it sure took us by surprise.

Don’t believe it? Here’s a snapshot from Cisco 4000 Family Integrated Services Router Data Sheet:

read more

Graceful Restart (GR) 101

In the Non-Stop Forwarding (NSF) article, I mentioned that the routers adjacent to the device using NSF have to play along to make the idea work. That capability is called Graceful Restart. Today we’ll explore its intricate details, be diplomatic, and leave the shortcomings and tradeoffs for the next blog post.

The Problem

Imagine an access (provider edge) router providing connectivity services to its clients and running a routing protocol with one or more upstream devices.

read more

State of IT Security in 2021

Patrik Schindler sent me his views on code quality and resulting security nightmares after reading the Cisco SD-WAN SQL Injection saga. Enjoy!


I think we have a global problem with code quality. Both from a security perspective, and from a less problematic but still annoying bugs-everywhere perspective. I’m not sure if the issue is largely ignored, or we’ve given up on it (see also: Cloud Complexity Lies or Cisco ACI Complexity).

read more

Building a Small Data Center Fabric with Four Switches

One of my subscribers has to build a small data center fabric that’s just a tad too big for a two-switch design.

For my datacenter I would need two 48-port 10GBASE-T switches and two 48-port 10/25G fiber switches. So I was watching the Small Fabrics and Lower-Speed Interfaces part of Physical Fabric Design to make up my mind. There you talk about the possibility to do a leaf and spine with 4 switches and connect servers to the spine.

A picture is worth a thousand words, so here’s the diagram of what I had in mind:

read more

IS-IS Flooding Details

Last week I published an unrolled version of Peter Paluch’s explanation of flooding differences between OSPF and IS-IS. Here’s the second part of the saga: IS-IS flooding details (yet again, reposted in a more traditional format with Peter’s permission).


In IS-IS, the DIS (Designated Intermediate System) is best described as a “baseline benchmark” – a reference point that other routers compare themselves to, but it does not sit in the middle of the flow of updates (Link State PDUs, LSPs).

A quick and simplified refresher on packet types in IS-IS: An LSP carries topological information about its originating router – its System ID, its links to other routers and its attached prefixes. It is similar to an OSPF LSU containing one or more LSAs of different types.
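To make the “reference point” idea a bit more concrete, here is a minimal Python sketch of how a router might compare its own LSP database with a summary advertised by the DIS (in IS-IS that summary travels in CSNPs, a detail not covered in the excerpt above). The compare_with_dis function, the dictionary layout, and the LSP IDs are illustrative assumptions, not something taken from Peter’s explanation or from a real implementation; the sketch compares sequence numbers only and ignores lifetimes and checksums.

```python
def compare_with_dis(local_db, dis_summary):
    """Compare local LSP sequence numbers with the summary sent by the DIS.

    Both arguments map an LSP ID to its sequence number (hypothetical layout).
    A missing LSP is treated as sequence number 0.
    Returns a tuple (to_request, to_flood).
    """
    # LSPs the DIS knows in a newer version: this router should ask for them
    to_request = sorted(
        lsp_id for lsp_id, seq in dis_summary.items()
        if local_db.get(lsp_id, 0) < seq
    )
    # LSPs this router knows in a newer version: it should flood them
    to_flood = sorted(
        lsp_id for lsp_id, seq in local_db.items()
        if dis_summary.get(lsp_id, 0) < seq
    )
    return to_request, to_flood


# Made-up databases for illustration
local_db = {"R1.00-00": 5, "R2.00-00": 7, "R3.00-00": 2}
dis_summary = {"R1.00-00": 5, "R2.00-00": 6, "R4.00-00": 1}

to_request, to_flood = compare_with_dis(local_db, dis_summary)
print("request from the segment:", to_request)
print("flood onto the segment:", to_flood)
```

Note that the DIS never relays anything in this sketch: it only publishes what it has, and every other router decides on its own what to request or flood.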

read more

Another SD-WAN Security SNAFU: SQL Injections in Cisco SD-WAN Admin Interface

Christoph Jaggi sent me a link to an interesting article describing security vulnerabilities pentesters found in Cisco SD-WAN admin/management code.

I’m positive the bugs have been fixed in the meantime, but what riled me most was the root cause: Little Bobby Tables (aka SQL injection) dropped by. Come on, it’s 2021, SD-WAN is supposed to be about building secure replacements for MPLS/VPN networks, and they couldn’t get someone who could write SQL-injection-safe code (the top web application security risk)?

read more

netlab Network Topology Graphs

A netlab user sent me an intriguing question: “Would it be possible to get network topology graphs out of the tool?”

Please note that we’re talking about creating graphs out of network topology described as a YAML data structure, not a generic GUI or “draw my network” tool. If you’re a GUI person, this is not what you’re looking for.

I did something similar a long while ago for a simple network automation project (and numerous networking engineers built really interesting stuff while attending the Building Network Automation Solutions course), so it seemed like a no-brainer. As always, things aren’t as easy as they look.
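Just to illustrate the general idea of turning a topology data structure into a graph description, here is a minimal Python sketch that renders a dictionary (the kind of thing you would get after parsing a YAML file) as Graphviz DOT text. The dictionary layout with nodes and links keys and the topology_to_dot function are assumptions made for this example; they are not the netlab data model or the way the tool produces its graphs.

```python
def topology_to_dot(topology):
    """Render a topology dictionary as an undirected Graphviz DOT graph."""
    lines = ["graph topology {"]
    # One DOT node statement per device
    for node in topology["nodes"]:
        lines.append(f'  "{node}";')
    # One undirected edge per point-to-point link
    for left, right in topology["links"]:
        lines.append(f'  "{left}" -- "{right}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical three-router topology, as it might look after parsing YAML
topology = {
    "nodes": ["r1", "r2", "r3"],
    "links": [("r1", "r2"), ("r2", "r3")],
}

dot_text = topology_to_dot(topology)
print(dot_text)
```

The resulting text can be saved to a file and fed to a Graphviz layout program such as dot to get an image.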

read more

Interesting Concept: Time Dilation

I loved the Time Dilation blog post by Seth Godin. It explains so much, including why I won’t accept a “quick conf call to touch base and hash out ideas” from someone coming out of the blue – why should I be interested if they can’t invest the time to organize their thoughts and pour them into an email?

The concept of “creation-to-consumption” ratio is also interesting. Now I understand why I hate unedited opinionated chinwagging (many podcasts sadly fall into this category) or videos where someone blabbers into a camera while visibly trying to organize their thoughts.

Just FYI, these are some of the typical ratios I had to deal with in the past:

read more

LSA/LSP Flooding in OSPF and IS-IS

Peter Paluch loves blogging in microchunks on Twitter ;) This time, he described the differences between OSPF and IS-IS, and graciously allowed me to repost the explanation in a more traditional format.


My friends, I happen to have a different opinion. It will take a while to explain it and I will have to seemingly go off on a tangent. Please have patience. As a teaser, though: The 2Way state between DRothers does not improve flooding efficiency – in fact, it worsens it.

read more

New: ipSpace.net Design Clinic

In early September, I started yet another project that’s been on the back burner for over a year: ipSpace.net Design Clinic (aka Ask Me Anything Reasonable in a more structured format). Instead of collecting questions and answering them in a podcast (example: Deep Questions podcast), I decided to make it more interactive with a live audience and real-time discussions. I also wanted to keep it valuable to anyone interested in watching the recordings, so we won’t discuss obscure failures of broken designs or dirty tricks that should have remained in CCIE lab exams.

read more

Stateful Switchover (SSO) 101

Stateful Switchover (SSO) is another seemingly awesome technology that can help you implement high availability when facing a broken non-redundant network design. Here’s how it’s supposed to work:

  • A network device runs two copies of the control plane (primary and backup);
  • Primary control plane continuously synchronizes its state with the backup control plane;
  • When the primary control plane crashes, the backup control plane already has all the required state and is ready to take over in moments.
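The three steps above can be sketched as a toy Python model. It is a deliberately naive illustration, assuming the state is a simple key-value store that gets copied on every change; the ControlPlane and Device classes and the state entries are made up for this example and do not describe any vendor implementation.

```python
class ControlPlane:
    """One copy of the control plane with its own state table."""

    def __init__(self, name):
        self.name = name
        self.state = {}


class Device:
    """A device running a primary and a backup control plane."""

    def __init__(self):
        self.primary = ControlPlane("primary")
        self.backup = ControlPlane("backup")
        self.active = self.primary

    def update_state(self, key, value):
        # Change the state on the active control plane...
        self.active.state[key] = value
        # ...and synchronize it to the backup while the primary is still active
        if self.active is self.primary:
            self.backup.state[key] = value

    def primary_crash(self):
        # The backup already holds the synchronized state and takes over
        self.active = self.backup


device = Device()
device.update_state("route 10.0.0.0/24", "Ethernet1")
device.update_state("neighbor 192.0.2.1", "up")
device.primary_crash()
print(device.active.name, device.active.state)
```

Everything that makes the real thing hard (partial updates, state that cannot be copied, software bugs replicated to the backup) is intentionally left out of this sketch.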

Delighted? You might be disappointed once you start digging into the details.

read more

Configuring NSX-T Firewall with a CI/CD Pipeline

The initial implementation of Noël Boulene’s automated provisioning of NSX-T distributed firewall rules changed NSX-T firewall configuration based on Terraform configuration files. To make the deployment fully automated, he went a step further and added a full-blown CI/CD pipeline using GitHub Actions and Terraform Cloud.

Not everyone is as lucky as Noël – developers in his organization already use GitHub and Terraform Cloud, making his choices totally frictionless.

read more

Worth Reading: Ops Questions in Software Engineering Interviews

Charity Majors published another must-read article: why every software engineering interview should include ops questions. Just a quick teaser:

The only way to unwind this is to reset expectations, and make it clear that:

  • You are still responsible for your code after it’s been deployed to production, and
  • Operational excellence is everyone’s job.

Adhering to these simple principles would remove an enormous amount of complexity from typical enterprise IT infrastructure… but I’m afraid it’s not going to happen anytime soon.


Lessons Learned: Fundamentals Haven't Changed

Here’s another bitter pill to swallow if you desperately want to believe in the magic powers of unicorn dust: laws of physics and networking fundamentals haven’t changed (see also: RFC 1925 Rule 11).

Whenever someone is promising a miracle solution, it’s probably due to them working in marketing or having no clue what they’re talking about (or both)… or it might be another case of adding another layer of abstraction and pretending the problems disappeared because you can’t see them anymore.

You’ll need a Free ipSpace.net Subscription to watch the video.

netlab Overview

In December 2020, I got sick and tired of handcrafting Vagrantfiles and decided to write a tool that would, given a target networking lab topology in a text file, produce the corresponding Vagrantfile for my favorite environment (libvirt on Ubuntu). Nine months later, that idea turned into a pretty comprehensive tool targeting networking engineers who like to work with CLI and text-based configuration files. If you happen to be of the GUI/mouse persuasion, please stop reading; this tool is not for you.

During those nine months, I slowly addressed most of the challenges I always had creating networking labs. Here’s how I would typically approach testing a novel technology or software feature:

read more

Open-Source DMVPN Alternatives

When I started collecting topics for the September 2021 ipSpace.net Design Clinic, one of the subscribers sent me an interesting challenge: are there any open-source alternatives to Cisco’s DMVPN?

I had no idea and posted the question on Twitter, resulting in numerous responses pointing to a half-dozen alternatives. Thanks a million to @MarcelWiget, @FlorianHeigl1, @PacketGeekNet, @DubbelDelta, @Tomm3h, @Joy, @RoganDawes, @Yassers_za, @MeNotYouSharp, @Arko95, @DavidThurm, Brian Faulkner, and several others who chimed in with additional information.

Here’s what I learned:

read more

Non-Stop Forwarding (NSF) 101

Non-Stop Forwarding (NSF) is one of those ideas that look great in a slide deck and marketing collateral, but might turn into a giant can of worms once you try to implement them properly (see also: stackable switches or VMware Fault Tolerance).

NSF has been around for at least 15 years, so I’m positive at least some vendors got most of the details right; I’m also pretty sure a few people have scars to prove they’ve been around the non-optimal implementations.
read more

Comparing Forwarding Performance of Data Center Switches

One of my subscribers is trying to decide whether to buy an -EX or an -FX version of a Cisco Nexus data center switch:

I was comparing Cisco Nexus 93180YC-FX and Nexus 93180YC-EX. They have the same port distribution (48x 10/25G + 6x40/100G), 3.6 Tbps switching capacity, but the -FX version has just 1200 Mpps forwarding rate while EX version goes up to 2600 Mpps. What could be the reason for the difference in forwarding performance?

Both switches are single-ASIC switches. They have the same total switching bandwidth, thus it must take longer for the FX switch to forward a packet, resulting in a reduced packets-per-second figure. It looks like the ASIC in the -FX switch is configured in a more complex way: more functionality results in more complexity which results in either reduced performance or higher cost.
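A bit of back-of-the-envelope arithmetic shows what the two forwarding rates mean in practice. The Python sketch below estimates the smallest Ethernet frame size at which each switch could still keep all ports busy at line rate. It rests on a few assumptions that are not in the data sheet quote: that the 3.6 Tbps figure counts both directions (48 x 25G + 6 x 100G is 1.8 Tbps per direction), that every packet is counted once in the Mpps figure, and that each frame carries 20 bytes of on-wire overhead (preamble, start-of-frame delimiter and inter-frame gap). The function name is made up for this example.

```python
# Preamble + start-of-frame delimiter (8 bytes) and inter-frame gap (12 bytes)
ETHERNET_OVERHEAD_BYTES = 20


def min_line_rate_frame_bytes(total_port_gbps, forwarding_mpps):
    """Smallest frame size (bytes) at which the packet rate covers all ports."""
    bits_per_packet = (total_port_gbps * 1e9) / (forwarding_mpps * 1e6)
    return bits_per_packet / 8 - ETHERNET_OVERHEAD_BYTES


# Port counts and forwarding rates as quoted in the subscriber's question
total_port_gbps = 48 * 25 + 6 * 100
fx_frame = min_line_rate_frame_bytes(total_port_gbps, 1200)
ex_frame = min_line_rate_frame_bytes(total_port_gbps, 2600)

print(f"-FX: line rate down to {fx_frame:.1f}-byte frames")
print(f"-EX: line rate down to {ex_frame:.1f}-byte frames")
```

Under these assumptions the -EX switch gets close to line rate with minimum-size (64-byte) frames, while the -FX switch needs an average frame size of roughly 170 bytes to fill all its ports.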

read more

Video: Introduction to Network Addressing

A friend of mine pointed out this quote by John Shoch when I started preparing the Network Stack Addressing slide deck for my How Networks Really Work webinar:

The name of a resource indicates what we seek, an address indicates where it is, and a route tells us how to get there.

You might wonder when that document was written… it’s from January 1978. They got it absolutely right 42 years ago, and we completely messed it up in the meantime with the crazy ideas of making IP addresses resource identifiers.

read more

Automating NSX-T Firewall Configuration

Noël Boulene decided to automate provisioning of NSX-T distributed firewall rules as part of his Building Network Automation Solutions hands-on work.

What makes his solution even more interesting is the choice of automation tool: instead of using the universal automation hammer (aka Ansible) he used Terraform, a much better choice if you want to automate service provisioning, and you happen to be using vendors that invested time into writing Terraform providers.


netlab Python Package and Unified CLI

One of the major challenges of using netsim-tools (now renamed to netlab) was the installation process – pull the code from GitHub, install the prerequisites, set up search paths… I knew how to fix it (turn the whole thing into a Python package) but I was always too busy to open that enormous can of worms.

That omission got fixed; netlab is now available on PyPI and can be installed with pip3 install networklab.

read more