Blog Posts in March 2021
TL&DR: Client clock skew could result in AWS authentication failure when running terraform apply
When I wanted to compare AWS and Azure orchestration speeds I encountered a crazy Terraform error message when running terraform apply:
Error: Error creating VPC: AuthFailure:
AWS was not able to validate the provided access credentials
status code: 401, request id: ...
Obviously I did all the usual stuff before googling for a solution:
Here’s a message I got from one of my subscribers (probably based on one of my recent public cloud rants):
I often think the cloud stuff has been sent to try us in IT – the struggle could be tough enough when we were dealing with waterfall development and monolithic projects. When products took years to develop, and years to understand.
And now we’re being asked to be agile and learn new stuff all the time about moving targets that barely have documentation at all, never mind accurate doco! We had obviously got into our comfort zone and needed shaking out of it!
Always interested to hear your experiences with the cloud networking though – it’s what I subscribed to ipspace.net for TBH as I think it’s the most complete reference source for that purpose and a vital part of enterprise networking these days!
TL&DR: The new release of netsim-tools includes unnumbered interfaces, configuration modules, and OSPF configuration.
In mid-March, we enjoyed another excellent presentation by Dinesh Dutt, this time focused on running OSPF in leaf-and-spine fabrics. He astonished me when he mentioned unnumbered Ethernet interfaces being available on all major network operating systems. It was time to test things out, and I wanted to use my networking simulation builder to build the test lab.
We’re in an unfortunate industry where you can’t learn everything there’s to know in 3 years and keep doing the same stuff for the next 30 years… but how do you keep learning? Andrew Owen documented what works for him in Learning without Burnout.
It’s amazing how easy it is to create a chatbot that will send messages to a Discord channel… just follow John Capobianco’s step by step tutorial.
In the second half of my chat with David Bombal we focused on automation and AI in networking. Even though we discussed many things, including the dangers of doing a repeatable job, and how to make yourself unique, David chose a nice click-bait headline Will AI Replace the Networking Engineers?. According to Betteridge’s law of headlines the answer is still NO, but it’s obvious AI will replace the low-level easy-to-automate jobs (as textile workers found out almost 200 years ago).
While pondering that statement, keep in mind that AI is more than just machine learning (the overhyped stuff). According to one loose definition, “Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions”
When I was complaining about the speed (or lack thereof) of Azure orchestration system, someone replied “I tried to do $somethingComplicated on AWS and it also took forever”
Following the “opinions are great, data is better” mantra (as opposed to “never let facts get in the way of a good story” supposedly practiced by some podcasters), I decided to do a short experiment: create a very similar environment with Azure and AWS.
I took simple Terraform deployment configuration for AWS and Azure. Both included a virtual network, two subnets, a route table, a packet filter, and a VM with public IP address. Here are the observed times:
TL&DR: Azure Route Server works as advertised. Setting it up is excruciatingly slow. You might want to start the process just before taking a long lunch break.
I decided to take Azure Route Server for a ride. Simple setup, two Networking Virtual Appliance (NVA) instances running Quagga to advertise a single prefix (just to see how multipathing works).
Here’s the diagram of what I set up:
TL&DR: You get unequal-cost multipath for free with distance-vector routing protocols. Implementing it in link state routing protocols is an order of magnitude more CPU-consuming.
Continuing our exploration of the Unequal-Cost Multipath world, why was it implemented in EIGRP decades ago, but not in OSPF or IS-IS?
Ignoring for the moment the “does it make sense” dilemma: finding downstream paths (paths strictly shorter than the current best path) is a side effect of running distance vector algorithms.
TL&DR: There cannot be a simple and easy recipe for success, or everyone else would be using it.
My recent chat with David Bombal about networking careers’ future resulted in tons of comments, including a few complaints effectively saying I was pontificating instead of giving out easy-to-follow recipes. Here’s one of the more polite ones:
No tangible solutions given, no path provided, no actionable advice detailed.
I totally understand the resentment. Like a lot of other people, I spent way too much time looking for recipes for success. It was tough to admit there are none for a simple reason: if there was a recipe for easy success, everyone would be using it, and then we’d have to redefine success. Nobody would admit that being average is a success, or as Jeroen van Bemmel said in his LinkedIn comment:
Success requires differentiation, which cannot be achieved by copying others. As Steve Jobs put it: “Be hungry, stay foolish”
I hope you’re aware that the venerable ping (and most of its variants) measures round-trip-time – how long it takes to get to the destination and back – but is there a way to measure one-way latency or find out asymmetric transit times?
Ben Cox found a way to use ICMP timestamps together with reasonably accurate NTP-derived time to do just that. More details in Splitting the ping (HT: Drew Conry-Murray).
A few weeks ago I enjoyed a long-overdue chat with David Bombal. David published the first part of it under the click-bait headline Is Networking Dead (he renamed it Is There any Future for Networking Engineers in the meantime).
According to Betteridge’s law of headlines the answer to his original headline is NO (and the second headline violates that law – there you go 🤷♂️). If you’re still interested in the details, watch the interview.
Does anyone know what secret networking magic the Cloud providers are doing deep in their fabrics which are not exposed to consumers of their services?
TL&DR: Of course not… and I’m guessing it would be pretty expensive if I knew and told you.
In the Does Unequal-Cost Multipathing Make Sense blog post I wrote (paraphrased):
The trick to successful utilization of unequal uplinks is to use them wisely […] It’s how multipath TCP (MP-TCP) could be used for latency-critical applications like Siri.
Minh Ha quickly pointed out (some) limitations of MP-TCP and as is usually the case, his comment was too valuable to be left as a small print at the bottom of a blog post.
One of the attendees of our network automation course asked a question along these lines:
In a previous Ansible-based project I used Excel sheet to contain all relevant customer data. I converted this spreadsheet using python (xls_to_fact) and pushed the configurations to network devices accordingly. I know some people use YAML to define the variables in Git. What would be the advantages of doing that over Excel/xsl_to_fact?
Whenever you’re choosing a data store for your network automation solution you have to consider a number of aspects including:
Last week I described the challenges Azure Route Server is supposed to solve. Now let’s dive deeper into how it’s implemented and what those implementation details mean for your design.
The whole thing looks relatively simple:
If you want to grow beyond being a CLI (or Python) jockey, it’s worth trying to understand things work… not only how frames get from one end of the world to another, but also how applications work, and why they’re structured they way they are.
After reviewing Cisco SD-WAN policies, it’s time to dig into the routing design. In this section, David Penaloza enumerated several possible topologies, types of transport, their advantages and drawbacks, considerations for tunnel count and regional presence, and what you should consider beforehand when designing the solution from the control plane’s perspective.
When preparing an answer to an interesting idea left as a comment to my unequal-cost load balancing blog post, I realized I never described the difference between topology-based and congestion-driven load balancing.
To keep things simple, let’s start with an easy leaf-and-spine fabric:
Imagine you decided to deploy an SD-WAN (or DMVPN) network and make an Azure region one of the sites in the new network because you already deployed some workloads in that region and would like to replace the VPN connectivity you’re using today with the new shiny expensive gadget.
Everyone told you to deploy two SD-WAN instances in the public cloud virtual network to be redundant, so this is what you deploy:
It looks like JSON Schema is the new black. Last week I wrote about a new Ansible module using JSON Schema to validate data structures passed to it; a few weeks ago NetworkToCode released Schema Enforcer, a similar CLI tool (which means it’s easy to use it in any CI/CD pipeline).
Here are just a few things Schema Enforcer can do:
A few weeks ago I got an excited tweet from someone working at Oracle Cloud Infrastructure: they launched full-blown layer-2 virtual networks in their public cloud to support customers migrating existing enterprise spaghetti mess into the cloud.
Let’s skip the usual does everyone using the applications now have to pay for Oracle licenses and I wonder what the lock in might be when I migrate my workloads into an Oracle cloud jokes and focus on the technical aspects of what they claim they implemented. Here’s my immediate reaction (limited to the usual 280 characters, because that’s the absolute upper limit of consumable content these days):
The one and only Avery Pennarun (of the world in which IPv6 was a good design fame) is back with another absolutely-must-read article explaining how various archetypes apply to real-world challenges, including:
- Hierarchies and decentralization (and why decentralization is a myth)
- Chicken-and-egg problem (and why some good things fail)
- Second-system effect (or why it’s better to refactor than to rewrite)
- Innovator’s dilemma (or why large corporations become obsolete)
If you think none of these applies to networking, you’re probably wrong… but of course please write a comment if you still feel that way after reading Avery’s article.
David Bombal invited me for another short chat – this time on what I recommend young networking engineers just starting their career. As I did a bit of a research I stumbled upon some great recommendations on Quora:
- How to identify a good electrical engineer
- What advice would you give young engineers early in their career?
- What are the most important things of working as an engineer that nobody mentioned in college?
I couldn’t save the pages to Internet Archive (looks like it’s not friendly with Quora), so I can only hope they won’t disappear ;)
In the previous video in this series, I described how path discovery works in source routing and virtual circuit environments. I couldn’t squeeze the discussion of hop-by-hop forwarding into the same video (it would make the video way too long); you’ll find it in the next video in the same section.
A few months ago I described how you could use JSON Schema to validate your automation data models, host/group variable files, or even Ansible inventory file.
I had to use a weird toolchain to get it done – either ansible-inventory to build a complete data model from various inventory sources, or yq to convert YAML to JSON… and just for the giggles jsonschema CLI command requires the JSON input to reside in a file, so you have to use a temporary file to get the job done.
One of my readers sent me this question:
My job required me to determine if one IP address is unicast or anycast. Is it possible to get this information from the bgp dump?
TL&DR: Not with anything close to 100% reliability. An academic research paper (HT: Andrea di Donato) documents a false-positive rate of around 10%.
If you’re not familiar with IP anycast: it’s a brilliant idea of advertising the same prefix from multiple independent locations, or the same IP address from multiple servers. Works like a charm for UDP (that’s how all root DNS servers are built) and supposedly pretty well across distant-enough locations for TCP (with a long list of caveats when used within a data center).
As I understand it, subnets in Azure span availability zones. Do you see any drawback to this? You mentioned that it’s difficult to create application swimlanes that way. But does subnet matter if your VMs are in different AZs?
It’s time I explain the concepts of application swimlanes and how they apply to availability zones in public clouds.
A while ago Antti Leimio wrote a long twitter thread describing his frustrations with Cisco ACI object model. I asked him for permission to repost the whole thread as those things tend to get lost, and he graciously allowed me to do it, so here we go.
I took a 5 days Cisco DCACI course. This is all new to me. I’m confused. Who is ACI for? Capabilities and completeness of features is fantastic but how to manage this complex system?