Blog Posts in October 2019

Saved: TCP Is the Most Expensive Part of Your Data Center

Years ago Dan Hughes wrote a great blog post explaining how expensive TCP is. His web site is long gone, but I managed to grab the blog post before it disappeared and he kindly allowed me to republish it.


If you ask a CIO which part of their infrastructure costs them the most, I’m sure they’ll mention power, cooling, server hardware, support costs, getting the right people and all the usual answers. I’d argue one the the biggest costs is TCP, or more accurately badly implemented TCP.

read more see 5 comments

Whitebox Hardware and Open-Source Software

One of my subscribers was interested in trying out whitebox solutions. He wrote:

What open source/whitebox software/hardware should I look at if I wanted to build a leaf-and-spine VXLAN/EVPN/BGP data center.

I don’t think you can get a fully-open-source solution because the ASIC manufacturers hide their SDK behind a mountain of NDAs (that strategy must make perfect sense – after all, it generated such awesome PR for NVIDIA). Anyway, the closest you can get (AFAIK) if you're a mere mortal is Cumulus Linux, and you just choose any whitebox hardware off their Hardware Compatibility List.

read more see 3 comments

OpenBGPD with Claudio Jeker on Software Gone Wild

Everyone is talking about FRRouting suite these days, while hidden somewhere in the background OpenBGPD has been making continuous progress for years. Interestingly, OpenBGPD project was started for the same reason FRR was forked - developers were unhappy with Zebra or Quagga routing suite and decided to fix it.

We discussed the history of OpenBGPD, its current deployments and future plans with Claudio Jeker, one of the main OpenBGPD developers, in Episode 106 of Software Gone Wild.

add comment

Master the Alternate "Public Cloud Networking" Universe

You probably heard me say “networking engineer encountering a public cloud feels like Alice in Wonderland” - packet forwarding works in a different way in every public cloud, subnets are a mix between routed interfaces and VRFs, you cannot change IP addresses without involving the orchestration system…

We covered the networking aspects of Amazon Web Services and Azure in our cloud webinars, but you might need a bigger picture:

read more add comment

Auto-MLAG and Auto-BGP in Cumulus Linux

When I first met Cumulus Networks engineers (during NFD9) their focus on simplifying switch configurations totally delighted me (video).

I was ranting about the more traditional approach to data center fabric configuration resulting in dozens if not hundreds of device configuration commands in 2013… and other vendors still haven't done much in this respect in the meantime.

After solving the BGP configuration challenge (could you imagine configuring BGP in a leaf-and-spine fabric with just a few commands in 2015), they did the same thing with EVPN configuration, where they decided to implement the simplest possible design (EBGP-only fabric running EBGP EVPN sessions on leaf-to-spine links), resulting in another round of configuration simplicity.

read more see 2 comments

Can We Make REST API Transactional Across Multiple Calls?

I got interesting feedback from one of my readers after publishing my REST API Is Not Transactional blog post:

One would think a transactional REST interface wouldn’t be too difficult to implement. Using HTTP1/1, it is possible to multiplex several REST calls into one connection to a specific server. The first call then is a request for start a transaction, returning a transaction ID, to be used in subsequent calls. Since we’re not primarily interested in the massive scalability of stateless REST calls, all the REST calls will be handled by the same frontend. Obviously the last call would be a commit.

I wouldn’t count on HTTP pipelining to keep all requests in one HTTP session (mixing too many layers in a stack never ends well) but we wouldn’t need it anyway the moment we’d have a transaction ID which would be identical to session ID (or session cookie) traditional web apps use.

read more see 5 comments

MUST READ: The NTP Bible

A few months ago Johannes Weber sent me a short email saying “hey, I plan to write a few NTP posts” and I replied “well, ping me when you have something ready”.

In the meantime he wrote a veritable NTP bible - a series of NTP-related blog posts covering everything from Why Should I Run My Own NTP Servers to authentication, security and monitoring - definitely a MUST READ if you care about knowing what time it is.

add comment

You Cannot Have a Public Cloud without Networking

Listening to (some) industry evangelists you would believe that there’s no future in being a networking engineer. After all, all workloads will move into the cloud, and all clients will connect through a universal 5G network… but even if that utopia eventually comes true, you can’t get away from the laws of physics (and the need networking infrastructure).

TL&DR: our new online course will help you master the shiny new world. You can register right now or keep reading ;)

read more add comment

Disaster Recovery Faking, Take Two

An anonymous (for reasons that will be obvious pretty soon) commenter left a gem on my Disaster Recovery Test Faking blog post that is way too valuable to be left hidden and unannotated.

Here’s what he did:

Once I was tasked to do a DR test before handing over the solution to the customer. To simulate the loss of a data center I suggested to physically shutdown all core switches in the active data center.

That’s the right first step: simulate the simplest scenario (total DC failure) in a way that is easy to recover (power on the switches). It worked…

read more see 11 comments

How Did We End with 1500-byte MTU?

A subscriber sent me this intriguing question:

Is it not theoretically possible for Ethernet frames to be 64k long if ASIC vendors simply bothered or decided to design/make chipsets that supported it? How did we end up in the 1.5k neighborhood? In whose best interest did this happen?

Remember that Ethernet started as a shared-cable 10 Mbps technology. Transmitting a 64k frame on that technology would take approximately 50 msec (or as long as getting from East Coast to West Coast). Also, Ethernet had no tight media access control like Token Ring, so it would be possible for a single host to transmit multiple frames without anyone else getting airtime, resulting in unacceptable delays.

read more see 4 comments

How Do You Provision a 500-Switch Network in a Few Days?

TL&DR: You automate the whole process. What else do you expect?

During the Tech Field Day Extra @ Cisco Live Europe 2019 we were taken on a behind-the-stage tour that included a chat with people who built the Cisco Live network, and of course I had to ask how they automated the whole thing. They said “well, we have the guy that wrote the whole system onsite and he’ll be able to tell you more”. Turns out the guy was my good friend Andrew Yourtchenko who graciously showed the system they built and explained the behind-the-scenes details.

read more see 1 comments

Video: Retransmissions and Flow Control in Computer Networks

Grouping the features needed in a networking stack in a bunch of layered modules is a great idea. Unfortunately, you could place several essential features like error recovery, retransmission, and flow control in several different layers, from the data link layer dealing with individual network segments, to the transport layer dealing with reliable end-to-end transmissions.

Where should we put those modules? As always, the correct answer is it depends, in this particular case, on transmission reliability, latency, and bandwidth cost. You’ll find more details in the Retransmissions and Flow Control part of How Networks Really Work webinar.

You need free ipSpace.net subscription to watch the video, or a paid ipSpace.net subscription to watch the whole webinar.
add comment

Automation Solution: Network Health State Report

How nice would it be to have a fabric health dashboard displaying a summary of numerous parameters you’re interested in (number of operational uplinks, number of BGP sessions…) for every switch in your fabric.

I’m positive you could hack something together using the customization capabilities of your favorite network management system… or you could write a simple data gathering solution like Stephen Harding did while attending the Building Network Automation Solutions online course.

I collected dozens of automation solutions created by course attendees in the last few years. Enjoy!
add comment

VMware NSX Killed My EVPN Fabric

I had an interesting discussion with someone running VMware NSX on top of VXLAN+EVPN fabric a while ago. That’s a pretty common scenario considering:

  • NSX’s insistence on having all VXLAN uplink from the same server in the same subnet;
  • Data center switching vendors being on a lemming-like run praising EVPN+VXLAN;
  • The reluctance of non-FAANG environments to connect a server to a single switch.

Apart from the weird times when someone started tons of new VMs, his fabric was running well.

read more see 2 comments

The Cost of Disruptiveness and Guerrilla Marketing

A Docker networking rant coming from my good friend Marko Milivojević triggered a severe case of Deja-Moo, resulting in a flood of unpleasant memories caused by too-successful “disruptive” IT vendors.

Before moving on, please note that the following observations were made from my outsider perspective. If I got something badly wrong, please correct me in a comment.

Imagine you’re working for a startup creating a cool new product in the IT infrastructure space (if you have an oversized ego you would call yourself “disruptive thought leader” on your LinkedIn profile) but nobody is taking you seriously. How about some guerrilla warfare: advertising your product to people who hate the IT operations (today we’d call that Shadow IT).

read more see 3 comments

Worth Reading: Anycast DNS in Enterprise Networks

Anycast (advertising the same IP address from multiple servers/locations) has long been used to implement scale-out public DNS services (the whole root DNS system runs on massive anycast), but it’s not as common in enterprise networks.

The blog posts written by Tom Bowles should get you there. He started with the idea and described his implementation using Infoblox DNS.

Want to know even more? I covered numerous load balancing mechanisms including anycast in Data Centers Infrastructure for Networking Engineers webinar.

add comment

Redundant BGP Connectivity on a Single ISP Connection

A while ago Johannes Weber tweeted about an interesting challenge:

We want to advertise our AS and PI space over a single ISP connection. How would a setup look like with 2 Cisco routers, using them for hardware redundancy? Is this possible with only 1 neighboring to the ISP?

Hmm, so you have one cable and two router ports that you want to connect to that cable. There’s something wrong with this picture ;)

read more see 2 comments

Network Automation Beyond Configuration Templating

Remember Nicky Davey describing how he got large DMVPN deployment back on track with configuration templating? In his own words…:

Configuration templating is still as big win a win for us as it was a year ago. We have since expanded the automation solution, and reading the old blog post makes me realise how far we have come. I began working with this particular customer in May 2017, so 2 years now. At that time the new WAN project was on the horizon and the approach to network configuration was entirely manual.

Here’s how far he got in the meantime:

read more add comment

New Content: Azure Networking and Automation Source-of-Truth

Last week I covered network security groups, application security groups and user-defined routes in the second live session of Azure Networking webinar.

We also had a great guest speaker on the Network Automation course: Damien Garros explained how he used central source-of-truth based on NetBox and Git to set up a network automation stack from the grounds up.

Recordings are already online; you’ll need Standard ipSpace.net Subscription to access the Azure Networking webinar, and Expert ipSpace.net Subscription to access Damien’s presentation. Azure Networking webinar is also part of our new Networking in Public Clouds online course.

add comment

Changing Cisco IOS BGP Policies Based on IP SLA Measurements

This is a guest blog post by Philippe Jounin, Senior Network Architect at Orange Business Services.


You could use track objects in Cisco IOS to track route reachability or metric, the status of an interface, or IP SLA compliance for a long time. Initially you could use them to implement reliable static routing (or even shut down a BGP session) or trigger EEM scripts. With a bit more work (and a few more EEM scripts) you could use object tracking to create time-dependent static routes.

Cisco IOS 15 has introduced Enhanced Object Tracking that allows first-hop router protocols like VRRP or HSRP to use tracking state to modify their behavior.

read more see 10 comments
Sidebar