Most of the public cloud training seems focused on developers. No surprise there, they are the usual beachhead public cloud services need to get into large organizations. Unfortunately, once the production applications start getting deployed into public cloud infrastructure, someone has to take over operations, and that’s where the fun starts.
For whatever reason, there aren’t that many resources helping the infrastructure operations teams understand how to deal with this weird new world, at least according to the feedback Jawed left on Azure Networking webinar:
Even though you need plenty of traditional networking constructs to deploy a complex application stack in a public cloud (packet filters, firewalls, load balancers, VPN, BGP…), once you start digging deep into the bowels of public cloud virtual networking, you’ll find out it’s significantly different from the traditional Ethernet+IP implementations common in enterprise data centers.
For an overview of the differences watch the Public Cloud Networking Is Different video (part of Introduction to Cloud Computing webinar), for more details start with AWS Networking 101 and Azure Networking 101 blog posts, and continue with corresponding cloud networking webinars.
However, it looks like most of those materials focus on developers (no wonder – they are the most significant audience), with little thought being given to the needs of network engineers… at least according to the feedback left by one of ipSpace.net subscribers.
Whenever someone starts mansplaining that we need no networking when we move the workloads into a public cloud, please walk away – he has just proved how clueless he is.
He might be a tiny bit correct when talking about software-as-a-service (after all, it’s just someone else’s web site), but when it comes to complex infrastructure virtual networks, there’s plenty of networking involved, from packet filters and subnets to NAT, load balancers, firewalls, BGP and IPsec.
Azure and AWS have decent documentation (I always found it relatively easy to figure out what they’re doing), but what they implemented is sometimes so far away from what we’re used to that it’s hard to bridge the gap. Here’s how Olle Wilhelmsson solved that challenge:
I would just like to send a huge thank you, I’ve been a fan of your appearances on tech field day as a voice of reason, and different podcasts all around. Happy to finally be able to contribute and purchase an IPspace subscription, and was not disappointed.
This series on Azure networking was fantastic, it’s been frustrating to find any kind of good material on this topic. Even if Microsofts documentation is generally good, they really don’t have any resources to compare it to “regular” networking in physical equipment. So just a huge thank you, this has definitely saved me countless hours of reading and googling questions!
I know the title sounds like a buzzword-bingo-winning clickbait, but it’s true. Adrian Giacometti decided to merge the topics of two ipSpace.net online courses and automated deployment of AWS security rules using Terraform within GitLab CI pipeline, with Slack messages serving as manual checks and approvals.
Not only did he do a great job mastering- and gluing together so many diverse bits and pieces, he also documented the solution and published the source code:
- Part 1: Cloud & Network automation challenge: Deploy Security Rules in a DevOps/GitOps world with AWS, Terraform, GitLab CI, Slack, and Python (special guest FastAPI)
- Part 2: AWS, Terraform and FastAPI
- Part 3: GitLab CI, Slack, and Python
- Source code: aegiacometti/devops_cloud_challenge · GitLab
Want to build something similar? Join our Network Automation and/or Public Cloud course and get started. Need something similar in your environment? Adrian is an independent consultant and ready to work on your projects.
I’ve been saying the same thing for years, but never as succinctly as Alastair Cooke did in his Understand Your Single Points of Failure (SPOF) blog post:
The problem is that each time we eliminated a SPOF, we at least doubled our cost and complexity. The additional cost and complexity are precisely why we may choose to leave a SPOF; eliminating the SPOF may be more expensive than an outage cost due to the SPOF.
Obviously that assumes that you’re able to follow business objectives and not some artificial measure like uptime. Speaking of artificial measures, you might like the discussion about taxonomy of indecision.
Recently I joked there’s significant difference between AWS and Azure launching features:
- AWS launches a production-ready feature that you can consume the next day.
- Azure launches a preview that might work in 6 months.
Those with long enough memories shouldn’t be surprised. It’s not the first time Microsoft is using the same tactics.
One of my readers sent me a series of “how do I get started with…” questions including:
I’ve been doing networking and security for 5 years, and now I am responsible for our cloud infrastructure. Anything to do with networking and security in the cloud is my responsibility along with another team member. It is all good experience but I am starting to get concerned about not knowing automation, IaC, or any programming language.
No need to worry about that, what you need (to start with) is extremely simple and easy-to-master. Infrastructure-as-Code is a simple concept: infrastructure configuration is defined in machine-readable format (mostly text files these days) and used by a remediation tool like Terraform that compares the actual state of the deployed infrastructure with the desired state as defined in the configuration files, and makes changes to the actual state to bring it in line with how it should look like.
TL&DR: Client clock skew could result in AWS authentication failure when running terraform apply
When I wanted to compare AWS and Azure orchestration speeds I encountered a crazy Terraform error message when running terraform apply:
module.network.aws_vpc.My_VPC: Creating... Error: Error creating VPC: AuthFailure: AWS was not able to validate the provided access credentials status code: 401, request id: ...
Obviously I did all the usual stuff before googling for a solution:
Here’s a message I got from one of my subscribers (probably based on one of my recent public cloud rants):
I often think the cloud stuff has been sent to try us in IT – the struggle could be tough enough when we were dealing with waterfall development and monolithic projects. When products took years to develop, and years to understand.
And now we’re being asked to be agile and learn new stuff all the time about moving targets that barely have documentation at all, never mind accurate doco! We had obviously got into our comfort zone and needed shaking out of it!
Always interested to hear your experiences with the cloud networking though – it’s what I subscribed to ipspace.net for TBH as I think it’s the most complete reference source for that purpose and a vital part of enterprise networking these days!
When I was complaining about the speed (or lack thereof) of Azure orchestration system, someone replied “I tried to do $somethingComplicated on AWS and it also took forever”
Following the “opinions are great, data is better” mantra (as opposed to “never let facts get in the way of a good story” supposedly practiced by some podcasters), I decided to do a short experiment: create a very similar environment with Azure and AWS.
I took simple Terraform deployment configuration for AWS and Azure. Both included a virtual network, two subnets, a route table, a packet filter, and a VM with public IP address. Here are the observed times:
TL&DR: Azure Route Server works as advertised. Setting it up is excruciatingly slow. You might want to start the process just before taking a long lunch break.
I decided to take Azure Route Server for a ride. Simple setup, two Networking Virtual Appliance (NVA) instances running Quagga to advertise a single prefix (just to see how multipathing works).
Here’s the diagram of what I set up:
Does anyone know what secret networking magic the Cloud providers are doing deep in their fabrics which are not exposed to consumers of their services?
TL&DR: Of course not… and I’m guessing it would be pretty expensive if I knew and told you.
Last week I described the challenges Azure Route Server is supposed to solve. Now let’s dive deeper into how it’s implemented and what those implementation details mean for your design.
The whole thing looks relatively simple: