TL&DR: Client clock skew could result in AWS authentication failure when running terraform apply
When I wanted to compare AWS and Azure orchestration speeds I encountered a crazy Terraform error message when running terraform apply:
module.network.aws_vpc.My_VPC: Creating... Error: Error creating VPC: AuthFailure: AWS was not able to validate the provided access credentials status code: 401, request id: ...
Obviously I did all the usual stuff before googling for a solution:
When I was complaining about the speed (or lack thereof) of Azure orchestration system, someone replied “I tried to do $somethingComplicated on AWS and it also took forever”
Following the “opinions are great, data is better” mantra (as opposed to “never let facts get in the way of a good story” supposedly practiced by some podcasters), I decided to do a short experiment: create a very similar environment with Azure and AWS.
I took simple Terraform deployment configuration for AWS and Azure. Both included a virtual network, two subnets, a route table, a packet filter, and a VM with public IP address. Here are the observed times:
Miha Markočič created sample automation scripts (mostly Terraform configuration files + AWS CLI commands where needed) deploying these features described in AWS Networking webinar:
- IP multicast deployment (video)
- Web Application Firewall deployment (video)
- Network Load Balancer deployment (video)
- Inter-region VPC Peering deployment (video)
To recreate them, clone the GitHub repository and follow the instructions.
Deciding to create AWS Networking and Azure Networking webinars wasn’t easy – after all, there’s so much content out there covering all aspects of public cloud services, and a plethora of certification trainings (including free training from AWS).
Having that in mind, it’s so nice to hear from people who found our AWS webinar useful ;)
Even though we are working with these technologies and have the certifications, there are always nuggets of information in these webinars that make it totally worthwhile. A good example in this series was the ingress routing feature updates in AWS.
It can be hard to filter through the noise from cloud providers to get to the new features that actually make a difference to what we are doing. This series does exactly that for me. Brilliant as always.
AWS Networking and Azure Networking webinars are available with Standard ipSpace.net Subscription. For even deeper dive into cloud networking check out our Networking in Public Cloud Deployments online course.
A while ago (eons before AWS introduced Gateway Load Balancer) I discussed the intricacies of AWS and Azure networking with a very smart engineer working for a security appliance vendor, and he said something along the lines of “it shows these things were designed by software developers – they have no idea how networks should work.”
In reality, at least some aspects of public cloud networking come closer to the original ideas of how IP and data-link layers should fit together than today’s flat earth theories, so he probably wanted to say “they make it so hard for me to insert my virtual appliance into their network.”
In last week’s update session we covered the new features AWS introduced since the creation of AWS Networking webinar in 2019:
- AWS Local Zones, Wavelengths, and Outposts
- VPC Sharing
- Bring Your Own Addresses
- IP Multicast support
- Managed Prefix Lists in security groups and route tables
- VPC Traffic Mirroring
- Web Application Firewall
- AWS Shield
- VPC Ingress Routing
- Inter-region VPC peering with Transit Gateways
The videos are already online; you need Standard or Expert ipSpace.net subscription to watch them.
One of my readers sent me this question (probably after stumbling upon a remark I made in the AWS Networking webinar):
You had mentioned that AWS is probably not using EVPN for their overlay control-plane because it doesn’t work for their scale. Can you elaborate please? I’m going through an EVPN PoC and curious to learn more.
It’s safe to assume AWS uses some sort of overlay virtual networking (like every other sane large-scale cloud provider). We don’t know any details; AWS never felt the need to use conferences as recruitment drives, and what little they told us at re:Invent described the system mostly from the customer perspective.
- Security groups to restrict access to web server and SSH bastion host;
- An IAM policy and associated user that has read-only access to EC2 and VPC resources (used for monitoring)
- An IAM policy that has full access to as single S3 bucket (used to modify static content hosted on S3)
- An IAM role for AWS CloudWatch logs
- Logging SSH events from the SSH bastion host into CloudWatch logs.
I got a question along these lines from a friend of mine:
Google recently announced a huge data center build in country to open new GCP regions. Does that mean I should invest into mastering GCP or should I focus on some other public cloud platform?
As always, the right answer is “it depends”, for example:
Regular readers of my blog probably remember the detailed explanations Erik Auerswald creates while solving hands-on exercises from our Networking in Public Cloud Deployments online course (previous ones: create a virtual network, deploy a web server).
This time he documented the process he went through to develop a Terraform configuration file that deploys full-blown AWS networking infrastructure (VPC, subnets, Internet gateway, route tables, security groups) and multiple servers include an SSH bastion host. You’ll also see what he found out when he used Elastic Network Interfaces (spoiler: routing on multi-interface hosts is tough).
There’s one thing no cloud vendor ever managed to change: virtual machines running on top of cloud infrastructure expect to have Ethernet interfaces.
It doesn’t matter if the virtual Ethernet Network Interface Cards (NICs) are implemented with software emulation of actual hardware (VMware emulated the ancient Novell NE1000 NIC) or with paravirtual drivers - the virtual machines expect to send and receive Ethernet frames. What happens beyond the Ethernet NIC depends on the cloud implementation details.
Last week we finally made it work - unfortunately only in a virtual event, so I got none of the famous Irish beer - and the video about alternate universes of public cloud networking is already online.
Maximilian Wilhelm had great fun turning my usual black-and-white statements into tweets, including:
IPv6 is old enough to buy its own beer (in US, not just in Europe), but there are still tons of naysayers explaining how hard it is to deploy. That’s probably true if you’re forced to work with decades-old boxes, or if you handcrafted your environment with a gazillion clicks in a fancy GUI, but if you used Terraform to deploy your application in AWS, it’s as hard as adding a few extra lines in your configuration files.
Nadeem Lughmani did a great job documenting the exact changes needed to get IPv6 working in AWS VPC, including adjusting the IPv6 routing tables, and security groups. Enjoy ;)
There was an obvious invisible elephant in the virtual Cloud Field Day 7 (CFD7v) event I attended in late April 2020. Most everyone was talking about AWS, how their stuff runs on AWS, how it integrates with AWS, or how it will help others leapfrog AWS (yeah, sure…).
Although you REALLY SHOULD watch my AWS Networking webinar (or something equivalent) to understand what problems vendors like VMWare or Pensando are facing or solving, I’m pretty sure a lot of people think they can get away with CliffsNotes version of it, so here they are ;)
Considering VMware’s enrapturement with vMotion the following news (reported by Salman Naqvi in a comment to my blog post) was clearly inevitable:
I was surprised to learn that LIVE vMotion is supported between on-premise and Vmware on AWS Cloud
What’s more interesting is how did they manage to do it?