Why Does Internet Keep Breaking?

James Miles sent me a long list of really good questions along the lines of “why do we see so many Internet-related outages lately and is it due to BGP and DNS creaking of old age”. He started with:

Over the last few years there are more “high profile” incidents relating to Internet connectivity. I raise the question, why?

The most obvious reason: Internet became mission-critical infrastructure and well-publicized incidents attract eyeballs.

Ignoring the click baits, the underlying root cause is in many cases the race to the bottom. Large service providers brought that onto themselves when they thought they could undersell the early ISPs and compensate their losses with voice calls (only to discover that voice-over-Internet works too well).

read more see 3 comments

Even Simple Data Models Are a Huge Win

Dan Augustine sent me a wonderful example illustrating how even a very simple data model together with some automation templates can simplify a large-scale deployment.


We have a 100 router installation coming up for our schools and both of our installation vendors do not use open source templating tools and they are not willing to share.

Having taken the Data Models in Network Automation part of your Network Automation Concepts webinar, I decided to install GitLab, make an Ansible project and invite our installation partners to the project.

read more add comment

Where Would You Need DNS Anycast?

One of the publicly observable artifacts of the October 2021 Facebook outage was an intricate interaction between BGP routing and their DNS servers needed to support optimal anycast configuration. Not surprisingly, it was all networking engineers’ fault according to some opinions1

There’s no need for anycast2/BGP advertisement for DNS servers. DNS is already highly available by design. Only network people never understand that, which leads to overengineering.

It’s not that hard to find a counter-argument3: while it looks like there are only 13 root name servers4, each one of them is a large set of instances advertising the same IP prefix5 to the Internet.

read more add comment

netsim-tools Release 1.0

It looks like netsim-tools reached a somewhat stable state, so it was time to do a cleanup and publish release 1.0 (also available on PyPi, use pip3 install –upgrade netsim-tools to fetch it).

During the cleanup, I removed all references to the obsolete scripts, leaving only the netlab command. I also found an old bash script that enabled LLDP passthrough on Linux bridges and made it part of netlab up process; your libvirt-based labs will have LLDP enabled by default.

Interested? Install the tools and follow the tutorials to get started.

add comment

Worth Reading: Operators and the IETF

Long long time ago (seven years to be precise), ISOC naively tried to bridge the gap between network operators and Internet Vendor Engineering Task Force1. They started with a widespread survey asking operators why they’re hesitant to participate in IETF mailing lists and meetings.

The result: Operators and the IETF draft that never moved beyond -00 version. A quick glimpse into the Potential Challenges will tell you why IETF preferred to kill the messenger (and why I published this blog post on Halloween).

read more see 1 comments

Nonlinear Effects of Optimization-Induced Complexity

We have school holidays this week, so I’m reposting wonderful comments that would otherwise be lost somewhere in the page margins. Today: Minh Ha on recent Facebook failure and overly complex systems (slightly edited).


I incidentally commented on your NSF post some 3 weeks before […the Facebook outage…] happened, on the unpredictable nature of nonlinear effects resulting from optimization-induced complexity. Their outage just drives home the point that optimization is a dumb process and leads to combinations of circular dependency that no one can account for and test.

read more add comment

Big Picture: BFD, Non-Stop Forwarding, and Graceful Restart

We have school holidays this week, so I’m reposting wonderful comments that would otherwise be lost somewhere in the page margins. Today: Erik Auerswald’s excellent summary of BFD, NSF, and GR.


I’d suggest to step back a bit and consider the bigger picture: What is BFD good for? What is GR/NSF/NSR/SSO good for?

BFD and GR/NSF/NSR/SSO have different goals: one enables quick fail over, the other prevents fail over. Combining both promises to be interesting.

read more see 1 comments

EVPN/VXLAN Complexity

We have school holidays this week, so I’m reposting wonderful comments that would otherwise be lost somewhere in the page margins. Today: Minh Ha on complexity of emulating layer-2 networks with VXLAN and EVPN.


Dmytro Shypovalov is a master networker who has a sophisticated grasp of some of the most advanced topics in networking. He doesn’t write often, but when he does, he writes exceptional content, both deep and broad. Have to say I agree with him 300% on “If an L2 network doesn’t scale, design a proper L3 network. But if people want to step on rakes, why discourage them.

read more add comment

Interactions Between BFD and Graceful Restart

We have school holidays this week, so I’m reposting wonderful comments that would otherwise be lost somewhere in the page margins. Today: Dmitry Perets on the interactions between BFD and GR.


Well, assuming that the C-bit is set honestly (will be funny if not) and assuming that the Helper is using this bit correctly (and I think it’s pretty well defined what “correctly” means - see section 4.3 in RFC 5882), the answer is pretty clear.

read more see 1 comments

Worth Reading: Network Validation Evolution at Hostinger

Network validation is becoming another overhyped buzzword with many opinionated pundits talking about it and few environments using it in practice (why am I not surprised?)

As always, there are exceptions. They don’t have to be members of the FAANG club, and some of them get the job done with open-source tools regardless of what vendor marketers would like you to believe. For example, Donatas Abraitis described how the Hostinger networking team gradually implemented network validation using Cumulus VX, Vagrant, SuzieQ, PyTest and Test Kitchen. Enjoy!

add comment

Video: Introduction to AI/ML Hype

In May 2021, Javier Antich ran a great webinar explaining the principles of Artificial Intelligence and Machine learning and how they apply (or not) to networking.

He started with a brief overview of AI/ML hype that should help you understand why there’s a bit of a difference between self-driving cars (not that we got there) and self-driving networks.

You need Free ipSpace.net Subscription to access this webinar.
add comment

Circular Dependencies Considered Harmful

A while ago, my friend Nicola Modena sent me another intriguing curveball:

Imagine a CTO who has invested millions in a super-secure data center and wants to consolidate all compute workloads. If you were asked to run a BGP Route Reflector as a VM in that environment, and would like to bring OSPF or ISIS to that box to enable BGP ORR, would you use a GRE tunnel to avoid a dedicated VLAN or boring other hosts with routing protocol hello messages?

While there might be good reasons for doing that, my first knee-jerk reaction was:

read more see 3 comments
Sidebar