Azure and AWS have decent documentation (I always found it relatively easy to figure out what they’re doing), but what they implemented is sometimes so far away from what we’re used to that it’s hard to bridge the gap. Here’s how Olle Wilhelmsson solved that challenge:
I would just like to send a huge thank you. I’ve been a fan of your appearances on Tech Field Day as a voice of reason, and of different podcasts all around. Happy to finally be able to contribute and purchase an ipSpace.net subscription, and I was not disappointed.
This series on Azure networking was fantastic; it’s been frustrating trying to find any kind of good material on this topic. Even if Microsoft’s documentation is generally good, they really don’t have any resources comparing it to “regular” networking on physical equipment. So just a huge thank you, this has definitely saved me countless hours of reading and googling!
I know the title sounds like buzzword-bingo-winning clickbait, but it’s true. Adrian Giacometti decided to merge the topics of two ipSpace.net online courses and automated the deployment of AWS security rules with Terraform within a GitLab CI pipeline, with Slack messages serving as manual checks and approvals.
Not only did he do a great job mastering and gluing together so many diverse bits and pieces, he also documented the solution and published the source code (a minimal Terraform sketch follows the links below):
- Part 1: Cloud & Network automation challenge: Deploy Security Rules in a DevOps/GitOps world with AWS, Terraform, GitLab CI, Slack, and Python (special guest FastAPI)
- Part 2: AWS, Terraform and FastAPI
- Part 3: GitLab CI, Slack, and Python
- Source code: aegiacometti/devops_cloud_challenge · GitLab
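If you’re wondering what the Terraform piece of such a pipeline might look like, here’s a minimal sketch of an AWS security group with a single inbound rule. Everything in it (names, port, CIDR range) is made up for illustration and has nothing to do with Adrian’s actual code:

```hcl
# Hypothetical example: one security group with a single inbound rule,
# meant to be created or updated by the CI pipeline.

variable "vpc_id" {
  description = "ID of the VPC the security group belongs to"
  type        = string
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_security_group" "app" {
  name        = "app-inbound"
  description = "Inbound rules managed through the pipeline"
  vpc_id      = var.vpc_id

  ingress {
    description = "Allow HTTPS from the corporate range"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["198.51.100.0/24"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

In a typical GitLab CI setup, terraform plan runs automatically on every commit, while terraform apply waits for an approval step; in Adrian’s solution that approval is handled through Slack messages.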
Want to build something similar? Join our Network Automation and/or Public Cloud course and get started. Need something similar in your environment? Adrian is an independent consultant and ready to work on your projects.
I’ve been saying the same thing for years, but never as succinctly as Alastair Cooke did in his Understand Your Single Points of Failure (SPOF) blog post:
The problem is that each time we eliminated a SPOF, we at least doubled our cost and complexity. The additional cost and complexity are precisely why we may choose to leave a SPOF; eliminating the SPOF may be more expensive than an outage cost due to the SPOF.
Obviously, that assumes you’re able to follow business objectives and not some artificial measure like uptime. Speaking of artificial measures, you might like the discussion about the taxonomy of indecision.
Recently I joked that there’s a significant difference between how AWS and Azure launch features:
- AWS launches a production-ready feature that you can consume the next day.
- Azure launches a preview that might work in 6 months.
Those with long enough memories shouldn’t be surprised. It’s not the first time Microsoft has used these tactics.
One of my readers sent me a series of “how do I get started with…” questions including:
I’ve been doing networking and security for 5 years, and now I am responsible for our cloud infrastructure. Anything to do with networking and security in the cloud is my responsibility along with another team member. It is all good experience but I am starting to get concerned about not knowing automation, IaC, or any programming language.
No need to worry about that; what you need (to get started) is extremely simple and easy to master. Infrastructure-as-Code is a simple concept: the infrastructure configuration is defined in a machine-readable format (mostly text files these days) and used by a remediation tool like Terraform, which compares the actual state of the deployed infrastructure with the desired state defined in the configuration files, and changes the actual state to bring it in line with how it should look.
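For example, the desired state of a minimal AWS environment might be described in a Terraform file like the one below (a sketch with made-up names and address ranges). Running terraform apply compares this description with what’s actually deployed and creates, changes, or removes resources until the two match:

```hcl
# Desired state: one VPC with a single subnet.
# Names and prefixes are made up for illustration.

provider "aws" {
  region = "us-east-1"
}

resource "aws_vpc" "lab" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "lab-vpc"
  }
}

resource "aws_subnet" "web" {
  vpc_id     = aws_vpc.lab.id
  cidr_block = "10.0.1.0/24"

  tags = {
    Name = "web-subnet"
  }
}
```

Add another subnet to the file or change a prefix, run terraform apply again, and Terraform works out the minimal set of API calls needed to bring the deployed infrastructure back in line with the description.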
TL&DR: Client clock skew could result in AWS authentication failure when running terraform apply
When I wanted to compare AWS and Azure orchestration speeds, I encountered a crazy Terraform error message when running terraform apply:
    module.network.aws_vpc.My_VPC: Creating...

    Error: Error creating VPC: AuthFailure: AWS was not able to validate the provided access credentials
      status code: 401, request id: ...
Obviously I did all the usual stuff before googling for a solution:
Here’s a message I got from one of my subscribers (probably based on one of my recent public cloud rants):
I often think the cloud stuff has been sent to try us in IT – the struggle could be tough enough when we were dealing with waterfall development and monolithic projects, when products took years to develop and years to understand.
And now we’re being asked to be agile and learn new stuff all the time about moving targets that barely have documentation at all, never mind accurate doco! We had obviously got into our comfort zone and needed shaking out of it!
Always interested to hear your experiences with the cloud networking though – it’s what I subscribed to ipspace.net for TBH as I think it’s the most complete reference source for that purpose and a vital part of enterprise networking these days!
When I was complaining about the speed (or lack thereof) of the Azure orchestration system, someone replied “I tried to do $somethingComplicated on AWS and it also took forever.”
Following the “opinions are great, data is better” mantra (as opposed to “never let facts get in the way of a good story” supposedly practiced by some podcasters), I decided to do a short experiment: create a very similar environment with Azure and AWS.
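Here’s roughly what the Azure half of such an environment looks like when described in Terraform. This is a sketch with hypothetical names and only the core networking resources, not the exact configuration used in the test:

```hcl
# Sketch of the Azure side: resource group, virtual network, one subnet.
# Names and address ranges are hypothetical.

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "lab" {
  name     = "lab-rg"
  location = "westeurope"
}

resource "azurerm_virtual_network" "lab" {
  name                = "lab-vnet"
  resource_group_name = azurerm_resource_group.lab.name
  location            = azurerm_resource_group.lab.location
  address_space       = ["10.0.0.0/16"]
}

resource "azurerm_subnet" "web" {
  name                 = "web"
  resource_group_name  = azurerm_resource_group.lab.name
  virtual_network_name = azurerm_virtual_network.lab.name
  address_prefixes     = ["10.0.1.0/24"]
}
```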
I took simple Terraform deployment configurations for AWS and Azure. Both included a virtual network, two subnets, a route table, a packet filter, and a VM with a public IP address. Here are the observed times:
TL&DR: Azure Route Server works as advertised. Setting it up is excruciatingly slow. You might want to start the process just before taking a long lunch break.
I decided to take Azure Route Server for a ride. Simple setup: two Network Virtual Appliance (NVA) instances running Quagga to advertise a single prefix (just to see how multipathing works).
Here’s the diagram of what I set up:
Does anyone know what secret networking magic the cloud providers are doing deep in their fabrics that is not exposed to consumers of their services?
TL&DR: Of course not… and I’m guessing it would be pretty expensive if I knew and told you.
Last week I described the challenges Azure Route Server is supposed to solve. Now let’s dive deeper into how it’s implemented and what those implementation details mean for your design.
The whole thing looks relatively simple:
Imagine you decided to deploy an SD-WAN (or DMVPN) network and make an Azure region one of the sites in the new network, because you already deployed some workloads in that region and would like to replace the VPN connectivity you’re using today with the shiny new (and expensive) gadget.
Everyone told you to deploy two SD-WAN instances in the public cloud virtual network for redundancy, so this is what you deploy:
A few weeks ago I got an excited tweet from someone working at Oracle Cloud Infrastructure: they launched full-blown layer-2 virtual networks in their public cloud to support customers migrating existing enterprise spaghetti mess into the cloud.
Let’s skip the usual “does everyone using the applications now have to pay for Oracle licenses” and “I wonder what the lock-in might be when I migrate my workloads into an Oracle cloud” jokes and focus on the technical aspects of what they claim they implemented. Here’s my immediate reaction (limited to the usual 280 characters, because that’s the absolute upper limit of consumable content these days):
As I understand it, subnets in Azure span availability zones. Do you see any drawback to this? You mentioned that it’s difficult to create application swimlanes that way. But does subnet matter if your VMs are in different AZs?
It’s time I explain the concepts of application swimlanes and how they apply to availability zones in public clouds.
Now that we know what regions and availability zones are, let’s go back to Daniel Dib’s question:
As I understand it, subnets in Azure span availability zones. Do you see any drawback to this? Does subnet matter if your VMs are in different AZs?
Wait, what? A subnet is stretched across multiple failure domains? Didn’t Ivan claim that’s ridiculous?
TL&DR: What I claimed was that a single layer-2 network is a single failure domain. Things are a bit more complex in public clouds. Keep reading and you’ll find out why.
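One way to see the difference between the two models is to look at how the corresponding Terraform resources are declared. The snippet below is a sketch with made-up names (the referenced VPC and resource group are assumed to exist): an AWS subnet is pinned to a single availability zone, while an Azure subnet has no zone attribute at all, and zone placement happens on the individual resources you deploy into it:

```hcl
# AWS: a subnet is tied to exactly one availability zone
# (aws_vpc.lab is assumed to exist, as in the earlier sketch).
resource "aws_subnet" "web_az1" {
  vpc_id            = aws_vpc.lab.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"
}

# Azure: azurerm_subnet has no zone argument; the subnet spans all
# availability zones in the region. Zones are specified on resources
# deployed into the subnet, for example on a public IP address.
resource "azurerm_public_ip" "web" {
  name                = "web-pip"
  resource_group_name = azurerm_resource_group.lab.name
  location            = azurerm_resource_group.lab.location
  allocation_method   = "Static"
  sku                 = "Standard"
  zones               = ["1"]
}
```

That’s also why it’s harder to use subnets as application swimlane boundaries in Azure than in AWS, where per-AZ subnets fall out naturally.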