Building network automation solutions

9 module online course

Start now!

Category: WAN

Must Watch: How NOT to Measure Latency

A while ago someone pointed me to an interesting talk explaining why 99th percentile represents a pretty good approximation of user-experienced latency on a typical web page (way longer version: Understanding Latency and Application Responsiveness, also How I Learned to Stop Worrying and Love Misery)

If you prefer reading instead of watching videos, there’s also everything you know about latency is wrong.

add comment

Automation Example: Drain a Circuit

One of the attendees of our Building Network Automation Solutions online course asked an interesting question in the course Slack team:

Has anyone wrote a playbook for putting a circuit into maintenance mode — i.e. adjusting metrics to drain traffic away from a circuit that is going to be taken down for maintenance?

As always, you have to figure out what you want to do before you can start automating stuff.

read more add comment

Video: Bandwidth Is Neither Infinite Nor Cheap

After decades of riding the Moore’s law curve the networking bandwidth should be (almost) infinite and (almost) free, right? WRONG, as I explained in the Bandwidth Is (Not) Infinite and Free video (part of How Networks Really Work webinar).

There are still pockets of Internet desert where mobile- or residential users have to deal with traffic caps, and if you decide to move your applications into any public cloud you better check how much bandwidth those applications consume or you’ll be the next victim of the Great Bandwidth Swindle. For more details, watch the video.

You need Free Subscription to watch the video, and the Standard Subscription to register for upcoming live sessions.
add comment

Video: End-to-End Latency Is Not Zero

After the “shocking” revelation that a network can never be totally reliable, I addressed another widespread lack of common sense: due to laws of physics, the client-server latency is never zero (and never even close to what a developer gets from the laptop’s loopback interface).

You need Free Subscription to watch the video, and the Standard Subscription to register for upcoming live sessions.
add comment

Connecting Your Legacy WAN to Cloud is Harder than You Think

Unless you’re working for a cloud-only startup, you’ll always have to connect applications running in a public cloud with existing systems or databases running in a more traditional environment, or connect your users to public cloud workloads.

Public cloud providers love stable and robust solutions, and they took the same approach when implementing their legacy connectivity solutions: you could use routed Ethernet connections or IPsec VPN, and run BGP across them, turning the problem into a well-understood routing problem.

read more see 1 comments

Public Cloud Cannot Change the Laws of Physics

Listening to public cloud evangelists and marketing departments of vendors selling over-the-cloud networking solutions or multi-cloud orchestration systems, you could start to believe that migrating your workload to a public cloud would solve all your problems… and if you’re gullible enough to listen to them, you’ll get the results you deserve.

Unfortunately, nothing can change the fundamental laws of physics, networking, or application architectures:

read more add comment

Figure Out What Problem You’re Trying to Solve

A long while ago I got into an hilarious Tweetfest (note to self: don’t… not that I would ever listen) starting with:

Which feature and which Cisco router for layer2 extension over internet 100Mbps with 1500 Bytes MTU

The knee-jerk reaction was obvious: OMG, not again. The ugly ghost of BRouters (or is it RBridges or WAN Extenders?) has awoken. The best reply in this category was definitely:

I cannot fathom the conversation where this was a legitimate design option. May the odds forever be in your favor.

A dozen “this is a dumpster fire” tweets later the problem was rephrased as:

read more add comment

You Don't Need IP Renumbering for Disaster Recovery

This is a common objection I get when trying to persuade network architects they don’t need stretched VLANs (and IP subnets) to implement data center disaster recovery:

Changing IP addresses when activating DR is hard. You’d have to weigh the manageability of stretching L2 and protecting it, with the added complexity of breaking the two sites into separate domains [and subnets]. We all have apps with hardcoded IP’s, outdated IPAM’s, Firewall rules that need updating, etc.

Let’s get one thing straight: when you’re doing disaster recovery there are no live subnets, IP addresses or anything else along those lines. The disaster has struck, and your data center infrastructure is gone.

read more add comment

Disaster Recovery and Failure Domains

One of the responses to my Disaster Recovery Faking blog post focused on failure domains:

What is the difference between supporting L2 stretched between two pods in your DC (which everyone does for seamless vMotion), and having a 30ms link between these two pods because they happen to be in different buildings?

I hope you agree that a single broadcast domain is a single failure domain. If not, let agree to disagree and move on - my life is too short to argue about obvious stuff.

read more add comment

The EVPN Dilemma

Got an interesting set of questions from a networking engineer who got stuck with the infamous “let’s push the **** down the stack” challenge:

So I am a rather green network engineer trying to solve the typical layer two stretch problem.

I could start the usual “friends don’t let friends stretch layer-2” or “your business doesn’t really need that” windmill fight, but let’s focus on how the vendors are trying to sell him the “perfect” solution:

read more see 13 comments

Stretched Layer-2 Subnets in Azure

Last Thursday morning I found this gem in my Twitter feed (courtesy of Stefan de Kooter)

Greg Cusanza in #BRK3192 just announced #Azure Extended Network, for stretching Layer 2 subnets into Azure!

As I know a little bit about how networking works within Azure, and I’ve seen something very similar a few times in the past, I was able to figure out what’s really going on behind the scenes in a few seconds… and got reminded of an old Russian joke I found somewhere on Quora:

read more see 3 comments

Impact of Controller Failures in Software-Defined Networks

Christoph Jaggi sent me this observation during one of our SD-WAN discussions:

The centralized controller is another shortcoming of SD-WAN that hasn’t been really addressed yet. In a global WAN it can and does happen that a region might be cut off due to a cut cable or an attack. Without connection to the central SD-WAN controller the part that is cut off cannot even communicate within itself as there is no control plane…

A controller (or management/provisioning) system is obviously the central point of failure in any network, but we have to go beyond that and ask a simple question: “What happens when the controller cluster fails and/or when nodes lose connectivity to the controller?”

read more see 4 comments